The 100 Most Used Clojure Expressions
Sign up for weekly Clojure tips, software design, and a Clojure coding challenge.
Summary: Would you like to optimize your learning of Clojure? Would you like to focus on learning only the most useful parts of the language first? Take this lesson from second language learning: learn the expressions in order of frequency of use.
When I was learning Spanish, I liked to use Anki to drill new vocabulary. It's a flashcard program. I found that someone had made a set of cards from an analysis of thousands of newspapers. They read in all of the words from the newspapers, counted them up, and figured out what the most common words were. The top 1000 made it into the deck.
It turns out that this is a very good strategy for learning words. Word frequency follows a hockey stick distribution. The most common words are used so much more than the less common words. For instance, the 100 most common English words make up more than 50% of text. If you've got limited time, you should learn those most common words first.
People who are trying to learn Clojure have been asking me "how do I learn all of this stuff? There's so much!" It's a valid question and I haven't had a good answer. I remembered the Spanish newspaper analysis and I thought I'd try to do a similar analysis of Clojure expressions.
Clojars provides a list of all of their
Many of them include a git repo. So I got that list and fetched the git
repo. There were over seven thousand git repos pointed to by Clojars.
That was a lot of stuff, and git repos have a lot of data I wasn't
interested in, so I filtered all the files looking for Clojure files.
/\.clj(s|c|x)?$/ That left 59,665 Clojure source files with over four
million lines total.
I read in all of the code in the files and parse it as Clojure. Then I walked it, looking for lists. I counted the head of all lists in each file, aggregating it all together. Then I sorted by frequency. I also only counted symbols in the head position.
Just a few warnings before I continue. This is a very informal study. I didn't try to remove duplicate files. Some files failed to parse. And this obviously isn't all Clojure code on GitHub. It even misses quite a few important projects because they're not in Clojars. But, it's a large dataset and probably good enough for figuring out relative frequencies of expressions to use for learning.
Here are successive graphs zooming in on the word frequencies. In each, I've ordered the expressions by frequency and plotted the frequency. Notice the nice long tail curve.
Top 1000 symbols ordered most frequent first
Top 100 symbols ordered most frequent first
Top 25 symbols ordered most frequent first
This means that learning to read and write expressions at the left end
of the graph will have an outsized impact on being able to read and
write code in general. Even the most frequent 25 expressions are
vastly more common than the rest. But looking at the expressions by
hand, I think if you learned about 200 of them, you'd cover most code
easily. Between 100 and 200, lots of third party library functions and
lots of common variable names (
f for function) start to show up.
Either way, you should start at the left hand side and work your way
right. Most of what you would learn in LispCast Introduction to
is included in the top 100.
I think the results show that learning the most common 100 expressions will have an outsized usefulness. If you're interested in learning Clojure, you should begin with those. I've created flashcards in PDF (with grid, without grid) form that you can print, and also a CSV version that you can import into Anki for spaced repetition.