Atom code explanation
Summary: I go over a real-world example of how atoms and immutable values allow you to compose constructs in ways that are easy to reason about and less prone to error.
The other day I was in IRC #clojure and someone asked a good question. They had code like the following, and they couldn't understand why they couldn't modify a map.
(def state (atom {}))
(doseq [x [1 2 3]]
  (assoc @state :x x))
(println @state)
What does this print? Well, the asker wanted it to print {:x 3}. But it printed {}. To understand what's happening, let's go step by step.
{} creates an empty map. It's literal syntax for constructing a map. This one happens to be empty.
(atom {}) takes the empty map that was just created and passes it to the function atom, which constructs a new clojure.lang.Atom. The atom is an object, and its current state is the empty map we just passed in.
(def state (atom {})) defines a new var called state in the current namespace.
At this point, we've got a var called state whose value is an atom that holds an empty map.
(doseq [x [1 2 3]] loops over the numbers 1, 2, and 3. x will be bound to each of those numbers, in turn.
@state gets transformed into (deref state), which returns the current value of state. :x is a literal keyword, and x is a reference to the x bound inside the loop.
(assoc @state :x x) creates a new map by taking the current value of state (which happens to be {}) and associating :x with x (which will be 1, 2, and 3 as the loop runs). The new map is returned by assoc and then thrown away, since nothing stores it or binds it to a name.
Then (println @state) will print the current value of state, which is still {}.
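To see the discarded value for yourself, here is a minimal sketch you could paste into a REPL. The name result is my own, introduced only so we can hold on to what assoc returns.

```clojure
(def state (atom {}))

;; assoc builds and returns a new map. Nothing stores it back into the atom.
(def result (assoc @state :x 1))

(println result)  ; prints {:x 1}
(println @state)  ; prints {}
```

The new map exists, but only because we bound it to result. The atom never heard about it.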
This code shows a common problem that beginners face in Clojure: how do immutable data structures (like maps) and the concurrency primitives (like atom) work together to manage state?
The answer is quite simple (in the Rich Hickeyan sense) and elegant. By separating the ideas of value and state, Clojure has made it easy to express precisely the behavior you want in concurrent systems.
The value is the map. It is immutable. It cannot change. It is a single value, and it will always be the same. That means threads can share the value with no worries that one of them will change it.
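Here is a small sketch of what "it cannot change" means in practice. The names original and updated are hypothetical, chosen just for this illustration.

```clojure
(def original {:x 1})

;; assoc does not modify original. It returns a different map.
(def updated (assoc original :y 2))

(println original)  ; prints {:x 1}
(println updated)   ; prints {:x 1, :y 2}
```

Anyone holding original, on any thread, keeps seeing {:x 1} no matter how many new maps are derived from it.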
The state is the atom. It's a mutable object. And being an object, it has methods that define its interface. In the code above, we saw that you can call deref on an atom to get its current value. deref is basically a getter.
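As a quick sanity check, this sketch shows that @state and (deref state) read the same thing, and that reading does not change the atom.

```clojure
(def state (atom {:x 1}))

;; The atom is an instance of clojure.lang.Atom.
(println (instance? clojure.lang.Atom state))  ; prints true

;; @state is reader shorthand for (deref state).
(println (= @state (deref state)))             ; prints true

;; Reading the atom any number of times leaves its value alone.
(println @state)                               ; prints {:x 1}
```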
The main way to change the value of an atom is using swap!. swap! takes an atom and a function (plus optional arguments) and calls the function on the current value of the atom. It then sets the value of the atom to the return value of the function. So let's use that to fix the code.
(def state (atom {}))
(doseq [x [1 2 3]]
  (swap! state assoc :x x))
(println @state)
swap! takes the atom (state) and a function (assoc) and some arguments (:x x). It calls assoc on the current value of state with those extra arguments and sets the value of the atom to the return value of the function.
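The same pattern works with any function, not just assoc. Here is a short sketch with a few more calls. The counter atom is my own addition for illustration. Note that swap! also returns the new value.

```clojure
(def counter (atom 0))

(println (swap! counter inc))    ; calls (inc 0), prints 1
(println (swap! counter + 10))   ; calls (+ 1 10), prints 11

(def state (atom {}))

;; Extra arguments are passed along after the current value:
;; this calls (assoc {} :x 1 :y 2).
(swap! state assoc :x 1 :y 2)

;; This calls (update {:x 1, :y 2} :x inc).
(swap! state update :x inc)

(println @state)                 ; prints {:x 2, :y 2}
```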
The swap! expression is almost (but not quite) the same as this code:
(reset! state (assoc @state :x x)) ;; never do this
reset! sets the state of the atom without regard to its current value. This new code is bad because it's not thread-safe: another thread can change the atom between the deref and the reset!. Use swap! if you need to use the current value to determine the new value.
So what does an atom do? What does it represent?
Atoms guarantee one very important thing: that each state is calculated from the last state. The swap! operation is atomic. No matter how many threads are trying to change the value, each change is calculated from the previous value and no previous values are lost. That's its contract as an object and it's one of the important ways that Clojure helps with concurrency.
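If you're curious how that guarantee can be kept without locks, here is a rough sketch built on compare-and-set!, which only stores the new value if the atom still holds the value we read. The function swap-sketch! is hypothetical and exists only for this illustration. It is not how you should write your own code, and the real swap! lives inside Clojure's implementation.

```clojure
;; A simplified, illustrative stand-in for swap!.
(defn swap-sketch! [a f & args]
  (loop []
    (let [old-val @a
          new-val (apply f old-val args)]
      ;; compare-and-set! succeeds only if the atom still holds old-val.
      ;; If another thread got in first, we retry from the fresh value.
      (if (compare-and-set! a old-val new-val)
        new-val
        (recur)))))

(def state (atom {}))
(swap-sketch! state assoc :x 1)
(println @state)  ; prints {:x 1}
```

One consequence of the retry: the function you pass to swap! may be called more than once, so it should be free of side effects.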
How can a value be lost?
If we have two threads, each trying to change state in the same incorrect way (using reset!), the evaluation will proceed in several steps:
(deref state) ;; call this value *1
(assoc *1 :x x) ;; call this value *2
(reset! state *2)
Because the threads are running concurrently, the operations have a chance of interleaving their steps in unwanted ways. For instance, threads A and B might interleave like this:
1. A: (deref state) ;; call this value *1A
2. A: (assoc *1A :x x) ;; call this value *2A
3. B: (deref state) ;; call this value *1B
4. B: (assoc *1B :x x) ;; call this value *2B
5. B: (reset! state *2B)
6. A: (reset! state *2A)
What happened? In step 6, A set the value of state to the value it calculated in step 2. So B's work is completely discarded. That's probably not what was intended. What's worse is that this is just one of many possible interleavings, some of which work and some of which don't. Welcome to concurrency!
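You can replay that exact interleaving on a single thread to watch the loss happen. This is only a simulation: the names a1, a2, b1, and b2 stand in for the intermediate values of threads A and B, and I've assumed A and B write different keys (:a and :b) so the lost work is visible.

```clojure
(def state (atom {}))

(let [a1 @state              ; A reads {}
      a2 (assoc a1 :a 1)     ; A computes {:a 1}
      b1 @state              ; B reads {} as well
      b2 (assoc b1 :b 2)]    ; B computes {:b 2}
  (reset! state b2)          ; B writes {:b 2}
  (reset! state a2))         ; A overwrites it with {:a 1}

(println @state)  ; prints {:a 1}, so B's :b 2 is gone
```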
What you probably wanted was to make sure that no work is discarded.
You want the operation to be atomic. That's why it's called an atom.
swap! is atomic. A swap! to an atom occurs "all at once", instead of in three separate steps like the reset! example. If two threads are doing swap!, there are two possible interleavings.
1. A: (swap! state assoc :x x)
2. B: (swap! state assoc :x x)
And
1. B: (swap! state assoc :x x)
2. A: (swap! state assoc :x x)
Either ordering is usually what you want. If your program is only correct under one of these orderings, or under neither, an atom is not the right construct for you.
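To compare the two approaches under real concurrency, here is a hedged sketch using futures and a shared counter. The helper run-in-threads, the atoms safe and racy, and the thread and iteration counts are all my own choices for illustration. The racy total depends on timing, so I make no claim about what it will be on your machine.

```clojure
(def safe (atom 0))
(def racy (atom 0))

;; Hypothetical helper: run f 1000 times on each of n futures, then wait.
(defn run-in-threads [n f]
  (let [futures (doall (repeatedly n #(future (dotimes [_ 1000] (f)))))]
    (doseq [fut futures]
      @fut)))

;; Correct: every increment is computed from the previous value.
(run-in-threads 8 #(swap! safe inc))

;; Incorrect: the read and the write are separate steps, so updates can be lost.
(run-in-threads 8 #(reset! racy (inc @racy)))

(println @safe)  ; prints 8000
(println @racy)  ; at most 8000, and often less

;; When running this as a script, call (shutdown-agents) at the end
;; so the JVM exits promptly.
```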
So there you go. Atomic mutable state with immutable values gives you a nice, composable concurrency semantics. You could do it with locks but it's harder to ensure you're doing it correctly. It's slightly higher-level than locks yet it provides tremendous value. Atoms are easier to reason about and less prone to errors.