Clojure Gazette 184: The Onslaught of States
The Onslaught of States
Issue 184 - August 01, 2016
Computers have given us the ability to encode information in a very
elegant and reductive way. We use the smallest unit that can carry
information --- the bit --- and by turning it on or off, we have two
possible states. Putting multiple bits in an order gives us a way to
multiply their states. One bit is two states. Two bits have four states.
Four bits have sixteen states. And thirty-two bits have 4,294,967,296
states. The number of possible states grows exponentially with the
number of bits.
b bits have
2b possible states.
How many bits of RAM does your computer have? Mine has 8 gigabytes, which is 64 billion bits. How many possible states is that? My calculator says: "Error". That means it's a lot :) If you add one more bit, it doubles. Joe Armstrong gave a great talk about the difficulties of dealing with such a huge amount of state.
And so we find ourselves with a problem: we've got this awesome power of exponentially growing possible states. The vast majority of those states are worthless, just like most random sequences of letters are not meaningful sentences --- let alone true sentences or answers to your questions. Our job as programmers is to keep this exponential monster load of bits set correctly. If we can't set it to the exact right answer then at least to a meaningful answer.
We've got to tame that exponentiation somehow. Luckily, exponents have
an inverse operation called logarithm. If we can use logarithms, we can
"undo" the exponentiation and get some control over it. There is a
mathematical structure that can come to the rescue: the tree. If you add
n nodes to a tree, the depth of the tree grows at
log n. If you
store data in the nodes, you can access it logarithmically by walking
down the tree. Hence, the concept of the "trie" was invented in 1959 by
René de la Briandais and
later named by Edward
Fredkin . Tries get
my vote for the third most important idea in computer science.
"Trie" stands for reTRIEval. Tries are a way to store data in a tree structure with an index. The index determines the path into the tree that you need to follow to find your data. They give us a way to manage large amounts of memory efficiently. Flat arrays are natural to work with in memory. They're just contiguous blocks. It's easy and fast to access any element by offset. You can say "give me the 1001th element" and get it right away.
But offsets are rarely the key that you want to look something up by. You want to say "Find me the record for Jane Smith." Her name tells you the path to take. Go to the "S" drawer, skip to the "M" section, then riffle through the small number of records in that section until you find Smith, Jane. Of the thousands of records, you can get to the record you need quickly. This is the power of tries.
Clojure's famous persistent collections use tries for their
implementations. A Clojure vector is a trie. Vectors are indexed by a
signed 32-bit integer, so they can store
231 elements (one bit is
reserved for the sign +/-). Because the branching factor of the trie is
32, that means accessing any element at worst requires
log32 231 = 6.2
steps in the path. Because you can't have a partial step, the worst case
is 7 steps. Clojure's hash maps use the hash code of the key --- 32 bits
--- to follow a path down the branches of the tree. You've got the same
maximum number of steps: 7.
And then what do we do when we get Jane's record? We look up the page we
want. Then we look up the section and field we want. We can use paths
all the way up and all the way down. And so we find ourselves building
trees out of our tries. We use
get-in to walk paths. And that way, we
can easily manage even more data.
In Clojure you can look up any of 4 billion values with 7 memory fetches. Tries can theoretically get bigger, too, with bigger indexes (64-bit) and higher branching factors. Trees (and tries) let us manage huge amounts of data for little cost. They give us a tool to deal with the exponentiation of the state space. And they're baked into Clojure. Two of the three biggest ideas in computer science at the heart of Clojure (Universal Turing Machine and Tries). That's not bad. But what if it had the third (building reliable systems from unreliable parts)? Well, a person can dream :)