Java Generational Garbage Collection
Persistent data structures require really good garbage collection. Lisp has always had persistent data structures. The cons list is persistent because it can share structure. When Clojure came out, it featured immutable persistent data structures. And not just the list. It also has vectors, maps, and sets.
Because they're immutable, once an object is instantiated, it is never changed. That means Clojure has to use a "copy-on-write" discipline. Instead of modifying the object, you make a modified copy. That means you've got lots of garbage. You're making copies all the time.
Luckily, because the data structures are persistent, it means that they share a lot of their structure. The amount of garbage memory per modification can be quite low relative to the amount of memory that can be reused.
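To make structure sharing concrete, here is a minimal sketch of a persistent cons list in Java. It is not Clojure's actual implementation; the `ConsList` class and its `cons` and `toList` helpers are names I made up for illustration. "Adding" an element allocates one new node and reuses the whole existing list as its tail.

```java
import java.util.ArrayList;
import java.util.List;

/** A minimal immutable cons list, sketched to illustrate structural sharing. */
final class ConsList<T> {
    final T head;            // first element
    final ConsList<T> tail;  // rest of the list; null marks the empty list

    ConsList(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    /** Returns a new list with value in front; the existing list is reused as the tail. */
    static <T> ConsList<T> cons(T value, ConsList<T> list) {
        return new ConsList<>(value, list);
    }

    /** Copies the elements into a java.util.List so they are easy to print. */
    static <T> List<T> toList(ConsList<T> list) {
        List<T> out = new ArrayList<>();
        for (ConsList<T> node = list; node != null; node = node.tail) {
            out.add(node.head);
        }
        return out;
    }

    public static void main(String[] args) {
        ConsList<Integer> base = cons(2, cons(3, null)); // (2 3)
        ConsList<Integer> a = cons(1, base);             // (1 2 3)
        ConsList<Integer> b = cons(0, base);             // (0 2 3)

        System.out.println(toList(a));        // [1, 2, 3]
        System.out.println(toList(b));        // [0, 2, 3]
        System.out.println(a.tail == b.tail); // true: both lists share the same tail nodes
        System.out.println(toList(base));     // [2, 3]: the original list is unchanged
    }
}
```

Each new list costs a single node. If `a` is later dropped, only that one node becomes garbage, because `base` is still reachable through `b`.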
But it's still a lot of garbage. Every seq allocates. Every time you modify a vector or a map, it allocates. Garbage is the name of the game in Clojure. Handling garbage like this is what garbage collection was invented for. A lot of research has gone into garbage collection, particularly into reducing the amount of time the GC has to pause the program.
One of the coolest ways to reduce that pause time is to consider the age of the objects. Objects that are already old (they were instantiated a while ago) tend to stick around, so why check them frequently? It also turns out that most objects are discarded very soon after being created. So the collector checks young objects often and old objects rarely, and it gets rid of most of your garbage very quickly. This type of memory management is called Generational Garbage Collection.
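You can watch the generations at work from inside a running JVM. The sketch below creates a lot of short-lived garbage and then asks the standard `java.lang.management` API how often each collector ran. The `GcActivity` class and its `churn` method are names I made up for the example. The collector names and counts depend on your JVM version and flags, and the JIT may optimize away some of the allocations, so treat the output as a rough indication only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: create short-lived garbage, then report how often each collector ran. */
final class GcActivity {

    /** Allocates many small arrays that become unreachable almost immediately. */
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            int[] shortLived = new int[16]; // garbage as soon as this iteration ends
            shortLived[0] = i;
            checksum += shortLived[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        System.out.println("checksum: " + churn(50_000_000));

        // One bean per collector; generational collectors typically expose
        // separate beans for the young and old generations.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms");
        }
    }
}
```

On a JVM running a generational collector such as G1, I would expect the young-generation collector to report many more collections than the old-generation one, which is the idea described above.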
Generational GC is one of the reasons Clojure can be so fast. Clojure creates so much "ephemeral" garbage that it's important to be able to allocate and collect it quickly. Java's memory management has been tuned and worked on for years. I tried to find numbers to quantify how much effort has been put into it, but I couldn't. I think it's safe to say that it's in the millions of dollars. For short-lived objects, allocation and collection are down to a handful of instructions per object.
Clojure's data structures exercise the JVM's fast GC. While using them is never going to be as fast as pure Java, the JVM does allow Clojure to be practical. Of course, the memory usage still needs to be configured and managed, but the JVM allows us to program at a much higher level and to take advantage of shared-memory parallelism.
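As a starting point for managing memory, it helps to know how much heap the JVM thinks it has. Here is a small sketch using the standard `Runtime` API. The `HeapReport` class and its `toMiB` helper are names I made up for the example. The maximum is usually set with the `-Xmx` flag when the JVM starts, for example `java -Xmx512m HeapReport`.

```java
/** Sketch: print the JVM's current heap limits and usage. */
final class HeapReport {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently occupied by objects

        System.out.println("max heap (MiB):       " + toMiB(rt.maxMemory()));   // upper limit the heap may grow to
        System.out.println("committed heap (MiB): " + toMiB(rt.totalMemory())); // heap currently reserved by the JVM
        System.out.println("used heap (MiB):      " + toMiB(used));
    }
}
```

If the used heap keeps approaching the maximum, that is typically when out of memory errors start to show up.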
If you're not experienced with the JVM, all of those details can be overwhelming. People have told me flat out that they would love to write Clojure but they're afraid of the JVM. This is a shame, because the JVM is what enables Clojure. I wanted to create something that would help people feel comfortable with the JVM without spending years gaining experience.
So I created a five-hour module called JVM Fundamentals for Clojure. It won't turn you into a JVM expert, but it will give you an in-depth tour. It teaches you all of those things that I do day-to-day that I've seen people get stuck with. How do you deal with out of memory errors? How do you navigate this huge standard library? All of that stuff is covered. You can get the course as part of Beginner Clojure: An Eric Normand Signature Course.
Of course, not everything is rosy on the JVM. Besides the complexity, there are a few things that are not so great that actually make it a less-than-ideal host.
