Problems with the JVM
From OO to Clojure Workshop!
Watch my free workshop to help you learn Clojure faster and shift your paradigm to functional.
Well, my last few articles might make you think I'm all rosy on the JVM. It's got all sorts of investment in performance, cross-platform compatibility, and backwards compatibility. It has industry adoption and a giant ecosystem of libraries and tooling. But there's still stuff that was just a mistake.
Null pointers are completely unnecessary. They muddy up the whole mess.
You have to do
null checks before method calls. Their inventor calls
them "My Billion Dollar
Who hasn't been bitten by a
Java has null pointers. And so does the JVM. Clojure would be a better
language without null pointers (in my opinion), but being on the JVM,
you're going to get one sooner or later. So Clojure has made
common, first-class value. That means you deal with it all the time. I
think I use
nil way more in Clojure than I ever did in Java. And it's
less dangerous, since Clojure uses extensive nil
For instance, in Clojure
nil has a type (which is
nil), while Java's
null has a weird type without a name.
nil is a seq and an empty
associative collection. It's not perfect, but
nils aren't as big of a
nulls are in Java.
Tail call elimination
Tail call elimination (TCE) (also incorrectly called tail call optimization) is a standard technique when compiling functions, but the JVM does not do it.
Tail calls are function calls that are the last thing a function will ever do. For instance, in this function:
(defn math [x y z] (+ x (* y z)))
The last thing the function will do is add, so that call to
+ is a
tail call. The multiplication isn't a tail call because there's more
work to do.
Because it's the last thing the function will do, you don't really need
to remember to come back to the function after the addition is
calculated. You can skip that and go straight to where the call to
math would return. Typically, programming languages will use a stack
to store the return address. You can save a lot of memory in the stack
by eliminating these unnecessary return addresses. Moreover, you save so
much that you eliminate a whole class of stack overflow errors.
Languages that use a lot of recursion take advantage of TCE. Recursive
definitions are difficult or impossible without it.
It's one of the central techniques Guy Steele and Gerald Sussman talk about in the Lambda the Ultimate papers. Guy Steele has been campaigning for tail call elimination since Java was originally released. Here is a succint and recent plea based on an older essay.
However, Clojure has made minor practical compromises to eliminate what
would be a problem in a language focused on recursion. First of all, the
Clojure compiler will set locals to
null just before doing a tail call
This lets the garbage collector get started on memory that otherwise
would have a reference from the stack.
Another compromise is the
recur form. It is known as
explicit tail recursion. A big reason to have TCE is to allow for tail
self-recursion, that is, a function calling itself from a tail position.
recur form does exactly that. It's only allowed in the tail
position (the last function called from a function) and it does
self-recursion without adding to the stack. It's an interesting
compromise and it solves one of the biggest uses of TCE. However,
pervasive TCE is still important and un-addressed.
When there is a lot of mutual tail calls (meaning functions that call each other in tail position), Clojure has another feature called trampolines. The solution is clever: instead of doing a tail call, just return a function that would do the tail call. Ugh. It's hard to explain in English, easy in code. Let's say you had this setup:
(defn even? [x] (if (zero? x) true (odd? (dec x)))) (defn odd? [x] (even? (dec x)))
Alright, this is totally contrived, and it only works for positive
integers. But just notice that
even? are in tail positions.
If I call this with a big number
(even? 100000000), I get a stack
overflow. Each call to
odd? adds another frame to the
stack. I can't use
recur because they aren't tail calls. I have to use
(defn even? [x] (if (zero? x) true (fn  (odd? (dec x))))) (defn odd? [x] (fn  (even? (dec x))))
Now I can just call
(trampoline (even? 100000000)) and it works just
fine. It's slower than TCE, but it doesn't take all of the memory of
non-TCE function calls.
All trampoline does is check the value you pass it. If it's a function, it calls it and loops with the return value. If it's not a function, it returns it. So it loops until it doesn't have a function any more. It's quite a simple thing. Check out the source.
Notice also that, if you know it, you could use the
# syntax to turn
those tail calls into functions, which makes it look a little nicer:
(defn even? [x] (if (zero? x) true #(odd? (dec x)))) (defn odd? [x] #(even? (dec x)))
The last thing I want to mention about tail call elimination with Clojure is that someone did try to add TCE to the Clojure compiler by doing source code transformations. I don't recommend you use it, but it might be interesting to check out.
Are JVM startup times really slow? This article has gone long, so I'll deal with this next time.
Meanwhile, if you learned something about Clojure and the JVM and want to learn more, check out JVM Fundamentals for Clojure. It's a five-hour course covering lots of JVM stuff that I use every day when programming in Clojure. It is not a complete JVM course, of course, just the important stuff for using Clojure. Buy it today or buy a membership.