Our Google Summer of Code interview this week is with Alexander Yakushev, whom I first interviewed two years ago in Issue 9.
Clojure Gazette: Where are you from? Where and what do you study?
Alexander Yakushev: I'm from Ukraine and am currently studying in Norway for my Master's degree in applied computer science.
CG: What project are you working on?
AY: I'm working on a project called "Lean Clojure/JVM runtime", or Project Skummet, as I've nicknamed it. It is no secret that Clojure programs aren't particularly quick to start up and that they occupy a considerable amount of memory. The mission of Project Skummet is to identify parts of the Clojure runtime that can be simplified and to patch the compiler so that it produces bytecode that is "leaner" in terms of loading time and memory footprint. This is mostly achieved by giving up some dynamic capabilities that are useless in static AOT-compiled builds anyway.
CG: Are you familiar with the other project called "Lean Clojure"? How are they different?
AY: In fact I learned about it only recently and haven't yet had a chance to talk to Reid [McKenzie, also a GSoC participant]. At first glance our projects seem fairly similar, at least judging by the short descriptions on Melange. Daniel [Solano Gómez] mentioned that multiple people were interested in the idea, so I guess two were picked as a safety measure. This is a non-trivial project, so having two different takes on it increases the chance of something viable being produced.
CG: That makes sense. What kinds of things will Project Skummet remove from the compiled code?
AY: The primary thing to do is to get rid of Vars as we currently know them. The concept of a Var gives a lot of flexibility for redefining values and functions while you develop in the REPL, but this flexibility can be sacrificed in a release build that you are not going to modify dynamically. By replacing most Vars with static fields we can save the time spent calling bindRoot() during namespace initialization, reduce the size of compiled classes (so they load faster), and avoid the round trip of calling getRawRoot() each time the var is referenced.
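To make the difference concrete, here is a simplified Java sketch contrasting the two access paths. SimpleVar is a hypothetical, heavily reduced stand-in for clojure.lang.Var (only the bindRoot/getRawRoot idea is kept), and LeanNs is a hypothetical illustration of a "lean" static field; neither is Clojure's or Skummet's actual generated code.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical, heavily simplified stand-in for clojure.lang.Var:
// the root value lives in a volatile field and is fetched on every reference.
final class SimpleVar {
    private volatile Object root;

    void bindRoot(Object value) { root = value; }  // done once per var at namespace init
    Object getRawRoot() { return root; }           // done on every reference
}

// Hypothetical "lean" alternative: the function object is assigned once to a
// static final field during class initialization; references are plain field reads.
final class LeanNs {
    static final LongUnaryOperator INC = x -> x + 1;
}

final class VarDemo {
    // Var-based binding: bindRoot() at load time, getRawRoot() plus a cast on each call.
    static final SimpleVar INC_VAR = new SimpleVar();
    static {
        INC_VAR.bindRoot((LongUnaryOperator) x -> x + 1);
    }

    static long callThroughVar(long x) {
        return ((LongUnaryOperator) INC_VAR.getRawRoot()).applyAsLong(x);
    }

    static long callStatic(long x) {
        return LeanNs.INC.applyAsLong(x);
    }

    public static void main(String[] args) {
        System.out.println(callThroughVar(41)); // 42
        System.out.println(callStatic(41));     // 42
        // Only the Var-based binding can be redefined at runtime, REPL-style:
        INC_VAR.bindRoot((LongUnaryOperator) x -> x + 100);
        System.out.println(callThroughVar(1));  // 101
    }
}
```

The static field gives up exactly the capability the interview mentions: LeanNs.INC cannot be re-bound from a REPL once the class is initialized.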
Because of this change, many of Clojure's other concepts that rely on Vars won't work anymore, so simple replacements will have to be found for them as well. For instance, protocols might be replaced with interfaces, and multimethods could be compiled into switch-cases. This will probably require making the compiler aware of some features that are currently implemented in pure Clojure.
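As an illustration of the multimethod idea, the Java sketch below contrasts an open, table-based dispatch (roughly how a multimethod behaves: defmethod can add entries at any time) with a closed switch that a static compiler could emit once every method is known. The shape/area example, the class names, and the string dispatch values are all hypothetical; real multimethods also honor hierarchies (isa?) and a :default method, which this sketch omits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Open form: dispatch value -> implementation, in a mutable table
// that can be extended at runtime (like defmethod).
final class OpenMulti {
    private final Map<String, Function<double[], Double>> methods = new HashMap<>();

    void defmethod(String dispatchValue, Function<double[], Double> impl) {
        methods.put(dispatchValue, impl);
    }

    double invoke(String dispatchValue, double[] args) {
        Function<double[], Double> impl = methods.get(dispatchValue);
        if (impl == null) {
            throw new IllegalArgumentException("No method for dispatch value: " + dispatchValue);
        }
        return impl.apply(args);
    }
}

// Closed form: all methods known at compile time, so dispatch becomes a switch
// and no runtime table (or Var holding it) is needed.
final class ClosedMulti {
    static double area(String shape, double[] args) {
        switch (shape) {
            case "circle": return Math.PI * args[0] * args[0];
            case "rect":   return args[0] * args[1];
            default: throw new IllegalArgumentException("No method for dispatch value: " + shape);
        }
    }
}

final class MultiDemo {
    public static void main(String[] args) {
        OpenMulti area = new OpenMulti();
        area.defmethod("circle", a -> Math.PI * a[0] * a[0]);
        area.defmethod("rect", a -> a[0] * a[1]);
        System.out.println(area.invoke("rect", new double[] {2, 3}));      // 6.0
        System.out.println(ClosedMulti.area("rect", new double[] {2, 3})); // 6.0
    }
}
```

The trade-off mirrors the Var case: the switch is fixed at compile time, so a library that adds a defmethod at runtime would not be seen by it.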
The most ambitious ideas are to compile Clojure functions into Java methods and to implement tree-shaking at compile time, so that only the functions actually used from a namespace end up in the resulting classes. Changes like these cannot be introduced into the compiler with minor surgery; they require a massive shift in the way the compiler works. For example, the compiler has to become multi-pass to do such things, which by itself is a significant innovation. That's why I view these ideas as experimental for now.
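Tree-shaking is at heart a reachability problem: starting from the program's entry points, keep every function that is transitively referenced and drop the rest. The Java sketch below shows that core step over a hypothetical call graph (function name to referenced names); it is a generic illustration, not Skummet's design, and it assumes the graph is already known, which is exactly why a multi-pass compiler would be needed.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TreeShaker {
    // callGraph: function name -> names of the functions it references.
    // Returns every function reachable from the entry points (in discovery order).
    static Set<String> reachable(Map<String, List<String>> callGraph, List<String> entryPoints) {
        Set<String> keep = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String fn = work.pop();
            if (keep.add(fn)) { // first visit only, so cycles terminate
                for (String callee : callGraph.getOrDefault(fn, Collections.emptyList())) {
                    work.push(callee);
                }
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("-main", Arrays.asList("parse-args", "run"));
        graph.put("run", Arrays.asList("helper"));
        graph.put("unused", Arrays.asList("helper")); // never called from -main
        System.out.println(reachable(graph, Arrays.asList("-main")));
        // prints [-main, run, helper, parse-args]; "unused" is shaken out
    }
}
```

A real implementation would have to stay conservative: code that looks functions up dynamically (for example via resolve or eval) is invisible to a static call graph.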
CG: How will dynamic vars work? Will they need to be first-class?
AY: I tend to think that they should be left as is. If dynamic vars were reimplemented using this static-field approach, I guess the result would be very similar to the existing dynamic vars in both design and performance characteristics. Besides, the vast majority of functions and values are bound to non-dynamic vars, so the dynamic ones are certainly not the prime target for optimization.
Which leads to another proviso: by removing vars I don't actually mean deleting the clojure.lang.Var class and the supporting logic in c.l.Namespace. Vars and "lean-compiled vars" can peacefully coexist in an environment compiled with Skummet. This is actually important, because with such avant-garde changes to the compiler I expect a decent number of existing libraries to break. That's why it will be useful to be able to say "OK, I need this particular Var as is, don't optimize it". Code that relies on "with-redefs" is probably the main candidate for this.
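The per-var choice described above boils down to a small decision the compiler makes for each var. The Java sketch below is purely hypothetical: it assumes var metadata is available as a string-keyed map, treats "dynamic" vars as untouched (as suggested earlier in the interview), and invents a "keep-var" opt-out flag for illustration; it is not an actual Skummet option.

```java
import java.util.HashMap;
import java.util.Map;

final class VarLowering {
    enum Strategy { KEEP_VAR, STATIC_FIELD }

    // varMeta: metadata attached to a var, e.g. {"dynamic" -> true}.
    static Strategy choose(Map<String, Object> varMeta) {
        // Dynamic vars keep their thread-binding semantics, so leave them as real Vars.
        if (Boolean.TRUE.equals(varMeta.get("dynamic"))) return Strategy.KEEP_VAR;
        // Hypothetical opt-out flag, e.g. for vars that tests replace via with-redefs.
        if (Boolean.TRUE.equals(varMeta.get("keep-var"))) return Strategy.KEEP_VAR;
        // Everything else can be compiled to a static field.
        return Strategy.STATIC_FIELD;
    }

    public static void main(String[] args) {
        Map<String, Object> meta = new HashMap<>();
        System.out.println(choose(meta)); // STATIC_FIELD
        meta.put("keep-var", Boolean.TRUE);
        System.out.println(choose(meta)); // KEEP_VAR
    }
}
```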
CG: You've mentioned two types of overhead: the time to bootstrap Clojure code and the time to deref a var. How much time do you expect to save for each type? Is the deref time mitigated at all by the JIT compiler?
AY: There is an excellent post by Nicholas Kariniemi in which he analyzes where exactly the time is spent during initialization. His results show that var and metadata creation and assignment take almost 50% of clojure.core bootstrap time on the desktop and more than a third on Android. I hope to shave off a significant portion of it, although it is hard to talk numbers before at least a proof of concept is ready. Also, with the removal of vars and metadata the class files will become smaller, which means they load faster and occupy less PermGen.
As for the Var dereferencing overhead, it is not the primary concern of this project, but I expect a perceptible performance boost from cutting it down. It isn't something that the JIT or some sort of caching can help us with: the JVM has no guarantee that Var.deref() will return the same result, so it has to call it every time. Removing the deref() and addressing an AFunction object directly is also one step closer to being able to use MethodHandle and invokedynamic from Java 7 (if Clojure ever drops support for Java 6). Michael Fogus wrote a great article about this a few years back.
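For readers unfamiliar with the Java 7 invocation API mentioned here, the sketch below uses the real java.lang.invoke classes to contrast a call site whose target can be swapped later (MutableCallSite, loosely analogous to a redefinable Var) with one whose target is fixed forever (ConstantCallSite, loosely analogous to a "lean" binding). The analogy and the inc/dec functions are illustrative assumptions, not how Clojure, Skummet, or Fogus's article actually wire things up.

```java
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.util.Arrays;

final class CallSiteDemo {
    static long inc(long x) { return x + 1; }
    static long dec(long x) { return x - 1; }

    // Look up one of the static (long)long methods above by name.
    static MethodHandle find(String name) throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(
                CallSiteDemo.class, name, MethodType.methodType(long.class, long.class));
    }

    // Returns {before retarget, after retarget, constant site} results.
    static long[] demo() {
        try {
            // Var-like: the target can be replaced later with setTarget().
            MutableCallSite redefinable = new MutableCallSite(find("inc"));
            MethodHandle viaMutable = redefinable.dynamicInvoker();
            long before = (long) viaMutable.invokeExact(41L);
            redefinable.setTarget(find("dec"));
            long after = (long) viaMutable.invokeExact(41L);

            // "Lean": a ConstantCallSite's target can never change.
            ConstantCallSite fixed = new ConstantCallSite(find("inc"));
            long constant = (long) fixed.dynamicInvoker().invokeExact(41L);
            return new long[] {before, after, constant};
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [42, 40, 42]
    }
}
```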
CG: It sounds like a valuable project. What obstacles or challenges do you foresee?
AY: As I've already said, when modifying the core of the runtime it is hard to get everything right from the beginning. Library developers don't use Vars just as mutable references: they put metadata on them, they attach hooks to them with robert-hooke, they use with-redefs, and so on. Then there are protocols and multimethods, which have their own logic that depends on vars. If any of these or other corner cases are not addressed, some libraries will break. And the ultimate goal of Project Skummet is to provide a compiler/runtime that is fully compatible with the current one.
When we talk about more substantial modifications like tree-shaking or compiling functions to methods, I think the most challenging part is implementing them without changing the compiler semantics too much, because otherwise it can again lead to unforeseen consequences, with existing code becoming broken. There is currently no specification of how the Clojure AOT compiler should behave compared to dynamic compilation, so developers just rely on the way it works right now, and we must meet their expectations.
A sort of meta-challenge is to integrate the Skummet compiler into Clojure in a way that the two don't interfere, and to provide users with a convenient way to use either. Build Profiles, an idea that was proposed a couple of years ago and has gotten some attention recently, might be the solution.
CG: Are there plans to integrate Skummet with the Clojure-on-Android work?
AY: Absolutely. My main personal interest in this project is fixing the most limiting Clojure-Android issues, which are startup time and memory consumption. So far I can't see any reason why the Skummet modifications would work on the JVM but not on Dalvik/ART, but in any case, being able to produce Android-compatible bytecode is a high-priority requirement in my plan.
CG: How can interested people follow your progress?
AY: I use the Clojure-Android blog to publish my ideas and discoveries. I'll also post a GitHub link there once I start committing something useful.
CG: I can't wait! I would love to see faster startup times as well for desktop apps. Is there anything else you'd like to mention before we close?
AY: That's probably all I have to say for now. The active phase of GSoC has just begun, so I hope to have more to share in the near future. I'd like to thank my mentors, Daniel Solano Gómez and Timothy Baldridge, and the whole Clojure GSoC team for giving me the opportunity to work on this project. And of course, thank you, Eric, for this great conversation!
CG: Thank you and good luck!