Why does Clojure's conj add to the end of vectors but the beginning of lists?
Summary: conj
can be confusing if you're used to other languages. It
is not a commonly defined operation. Most languages define common
positional adding operations. conj
, however, has more useful
semantics. How do you use conj
usefully if you can't guarantee the
position?
When I was at university, we were taught object-oriented programming as it exists in Java. We learned about interfaces, inheritance, and the Liskov Substitution Principle. It makes sense. If you're claiming that you've got a (sub)type of car, it still has to do everything a car can do. Otherwise it's not really a subtype of car.
The point of confusion
Whenever I'm teaching someone Clojure, there's a point in the journey where everyone gets at least curious, if not outright confused.
(conj '(1 2 3) 4) ;;=> '(4 1 2 3)
vs
(conj [1 2 3] 4) ;;=> [1 2 3 4]
What's the deal? Does conj
add to the beginning or the end? What
possible contract could allow both of these behaviors?
Then I show them that the confusion goes even deeper:
(conj #{1 2 3} 4) ;;=> #{1 4 2 3} ;; or some other random order
and
(conj {1 2 3 4} [5 6]) ;;=> {1 2 3 4 5 6}
What gives? How is this even possible? Do hashmaps and linked lists even share a common class ancestor?
The answer is, sure, if you want them to. The student must get over a tiny conceptual bump. And once that bump is surmounted, poof, a new way of seeing is discovered.
The bump
Let's get over that bump right here, right now.
Let's imagine a traditional collection interface, let's call it List
.
It has two methods, addFirst
and addLast
that add new elements. So
you write an algorithm that adds a bunch of items to the end with
addLast
. It takes a List
as argument, because that's the least
subtype you need to perform the algorithm.
You call that algorithm with an ArrayList
, which has the nice property
that addLast
is constant time. Woohoo! Your algorithm is fast and
great.
A few months later, you get a phone call from another developer. He's complaining that he used your routine and can't figure out why it's so slow. It was working fine for a while, but as the users generated more records in the database, the routine was grinding to a hault.
You check out the code and immediately see the problem: the database
query was returning not an ArrayList
but a LinkedList
. The
implementation of addLast
on LinkedList
s is actually linear.
Adding a bunch of stuff to the end was turning into a quadratic
operation.
Let's say that again: even though the location
semantics of the operation were
the same, addLast
on one had constant time and on the other had linear
time. They both gave equivalent lists, but one of them was too slow.
Does this satisfy the Liskov Substitution Principle? In practice,
can you really substitute one for the other? Algorithmic complexity
matters.
Clojure avoids that mess (while swapping it for another, which I'll get
to shortly). It defines conj
, which means not "put this at the
beginning" or "put this at the end", but "hey, collection, you know
yourself better than I ever can. Please add this wherever it makes sense
for you as long as you do it in constant time. Thanks."^1
Practically, that means that conj
on LinkedList
adds to the front,
because that's constant time. And conj
on ArrayList
adds to the end.
But, because the operation doesn't talk about order, like addFirst
and
addLast
do, you can now extend conj
to Set
and even Map
if you
consider key/value pairs as single items. And that means that linear
algorithms using conj
will remain linear regardless of which
collection you use.
The mess that Clojure chooses over the other mess
Does this satisfy the Liskov Substitution Principle? Well, that depends on how you look at it. You certainly don't guarantee that you get the same or even equivalent answers out. Consider this:
(def a [1 2 3])
(def b '(1 2 3))
(= a b) ;;=> true
(= (conj a 4) (conj b 4)) ;;=> false
So, here, performing the same operation on two equal values does not
give equal results. That's kind of hard to reason about. But it's a
similar tradeoff that you see with other operations that don't guarantee
order. For instance, imagine two sets a
and b
.^2
(= a b) ;;=> true
(= (seq a) (seq b)) ;;=> could be false!
The order of most sets is not guaranteed! This means that Clojure has
some operations that do not maintain equality. conj
just happens to be
one of them.
What's the point?
So Clojure does not provide add
operations that guarantee order
regardless of collection type. Fine. What's the point?
The point is that, in practice, conj
is more useful than addFirst
and addLast
combined. By defining a function using conj
, it will
work on a broader number of collections. It might give different
answers for each, but it won't explode on one and do fine on the rest.
And often the answers it gives are just fine. A basic version of
into
can be defined very easily. It works on all
collections (for both to
and from
).
(defn into [to from]
(reduce conj to from))
Common usage
One last thing before I wrap up: because the collection itself defines
where the item will be added, I often find myself choosing the
collection based on where I need it. A common idiom in Common Lisp was
to make a new list by cons
ing onto the front, then reversing it at the
end because you really wanted them in the other order. In Clojure,
there's no need, because you can just use a
vector (and use conj
). As long
as the vector is local to the algorithm, it's not part of the contract,
so it's your choice.
Conclusion
Java was wrong. addFirst
and addLast
cannot be substituted in
LinkedList
and ArrayList
. They have different algorithmic
complexities and at some point one's performance will be totally
unacceptable. The operation that does allow for substitutability in
algorithm complexity is conj
, which is always constant time. But then
it doesn't maintain equality. However, I find that conj
is way more
natural and helps algorithmic reasoning more than guaranteeing where
the item is placed.
If you'd like to learn Clojure, I recommend my video course LispCast Introduction to Clojure. It's a great introduction to the language using animations, exercises, and screencasts. It's designed to give a deep dive straight to what makes Clojure interesting. It begins with syntax, goes through functional programming, and ends with data-driven programming.
- It's even polite!
(def a (set (range 100))) (def b (apply sorted-set (range 100)))