Atom code explanation

Summary: I go over a real-world example of how atoms and immutable values allow you to compose constructs in ways that are easy to reason about and less prone to error.

The other day I was in IRC #clojure and someone asked a good question. They had code like the following, and they couldn't understand why they couldn't modify a map.

(def state (atom {}))

(doseq [x [1 2 3]]
  (assoc @state :x x))

(println @state)

What does this print? Well, the asker wanted it to print {:x 3}. But it printed {}. To understand what's happening, let's go step by step.

{} creates an empty map. It's literal syntax for a constructor for a map. This one happens to be empty.

(atom {}) takes the empty map that was just created and passes it to the function atom, which constructs a new clojure.lang.Atom. Atoms are objects, and its current state is the empty map we just passed in.

(def state (atom {})) defines a new var called state in the current namespace.

At this point, we've got a variable called state whose value is an atom that holds an empty map.

(doseq [x [1 2 3]] loops over the numbers 1, 2, and 3. x will be bound to each of those numbers, in turn.

@state gets transformed into (deref state), which returns the current value of state. :x is a literal keyword, and x is a reference to the x bound inside the loop.

(assoc @state :x x) creates a new map by taking the current value of state (which happens to be {}) and associating :x with x (which will be 1, 2, and 3 as the loop happens). The value is returned by assoc, and then thrown away, since it isn't bound to anything.

Then (println @state) will print the current value of state, which still is {}.

This code shows a common problem that beginners face in Clojure: how do immutable data structures (like maps) and the concurrency primitives (like atom) work together to manage state?

The answer is quite simple (in the Rich Hickeyan sense) and elegant. By separating the ideas of value and state, Clojure has made it easy to express precisely the behavior you want in concurrent systems.

The value is the map. It is immutable. It cannot change. It is a single value, and it will always be the same. That means threads can share the value with no worries that one of them will change it.

The state is the atom. It's a mutable object. And being an object, it has methods that define its interface. In the code above, we saw that you can call deref on an atom to get its current value. deref is basically a getter.

The main way to change the value of an atom is using swap!. swap! takes an atom and a function (plus optional arguments) and calls the function on the current value of the atom. It then sets the value of the atom to the return value of the function. So let's use that to fix the code.

(def state (atom {}))

(doseq [x [1 2 3]]
  (swap! state assoc :x x))

(println @state)

swap! takes the atom (state) and a function (assoc) and some arguments (:x x). It calls assoc on the current value of state with those extra arguments and sets the value of the atom to the return value of the function.

The swap! expression is almost (but not) the same as this code:

(reset! state (assoc @state :x x)) ;; never do this

reset! changes the state of the atom but without regard to the current value. This new code is bad because it's not thread-safe. Use swap! if you need to use the current value to determine the new value.

So what does an atom do? What does it represent?

Atoms guarantee one very important thing: that each state is calculated from the last state. The swap! operation is atomic. No matter how many threads are trying to change the value, each change is calculated from the previous value and no previous values are lost. That's its contract as an object and it's one of the important ways that Clojure helps with concurrency.

How can a value be lost?

If we have two threads, each trying to change state in the same incorrect way (using reset!), the order of evaluation will have several steps:

  1. (deref state) ;; call this value *1
  2. (assoc *1 :x x) ;; call this value *2
  3. (reset! state *2)

Because the threads are running concurrently, the operations have a chance of interleaving their steps in unwanted ways. For instance, threads A and B might interleave like this:

  1. A: (deref state) ;; call this value *1A
  2. A: (assoc *1A :x x) ;; call this value *2A
  3. B: (deref state) ;; call this value *1B
  4. B: (assoc *1B :x x) ;; call this value *2B
  5. B: (reset! state *2B)
  6. A: (reset! state *1A)

What happened? On line 6, A set the value of state to the value it calculated on line 2. So B's work is completely discarded. That's probably not what was intended. What's worse is that that is one of many possible interleavings, some of which work and some don't. Welcome to concurrency!

What you probably wanted was to make sure that no work is discarded. You want the operation to be atomic. That's why it's called an atom. swap! is atomic. A swap! to an atom occurs "all at once", instead of on three lines like the reset! example. If two threads are doing swap!, there are two possible interleavings.

  1. A: (swap! state assoc :x x)
  2. B: (swap! state assoc :x x)

And

  1. B: (swap! state assoc :x x)
  2. A: (swap! state assoc :x x)

These are usually what you want. If only one or neither one works, atom is not the right construct for you.

So there you go. Atomic mutable state with immutable values gives you a nice, composable concurrency semantics. You could do it with locks but it's harder to ensure you're doing it correctly. It's slightly higher-level than locks yet it provides tremendous value. Atoms are easier to reason about and less prone to errors.

If you'd like to learn the basics of Clojure, I recommend my video course called LispCast Introduction to Clojure. I don't go over concurrency, but you will learn lots of functional programming. Go check out the description to see if it's right for you.