Why does Clojure's conj add to the end of vectors but the beginning of lists?

Written by Eric Normand. Published: August 1, 2019.

Want the best way to learn Clojure?

Invest in yourself with my Beginner Clojure Signature Course.

8 fundamental modules
240 fun lessons
42 hours of video

Beginner Clojure: An Eric Normand Signature Course

Summary: conj can be confusing if you're used to other languages. It is not a commonly defined operation. Most languages define common positional adding operations. conj, however, has more useful semantics. How do you use conj usefully if you can't guarantee the position?

When I was at university, we were taught object-oriented programming as it exists in Java. We learned about interfaces, inheritance, and the Liskov Substitution Principle. It makes sense. If you're claiming that you've got a (sub)type of car, it still has to do everything a car can do. Otherwise it's not really a subtype of car.

The point of confusion

Whenever I'm teaching someone Clojure, there's a point in the journey where everyone gets at least curious, if not outright confused.

    (conj '(1 2 3) 4) ;;=> '(4 1 2 3)

    (conj [1 2 3] 4) ;;=> [1 2 3 4]

What's the deal? Does conj add to the beginning or the end? What possible contract could allow both of these behaviors?

Then I show them that the confusion goes even deeper:

    (conj #{1 2 3} 4) ;;=> #{1 4 2 3} ;; or some other random order

and

    (conj {1 2 3 4} [5 6]) ;;=> {1 2 3 4 5 6}

What gives? How is this even possible? Do hashmaps and linked lists even share a common class ancestor?

The answer is, sure, if you want them to. The student must get over a tiny conceptual bump. And once that bump is surmounted, poof, a new way of seeing is discovered.

The bump

Let's get over that bump right here, right now.

Let's imagine a traditional collection interface, let's call it List. It has two methods, addFirst and addLast that add new elements. So you write an algorithm that adds a bunch of items to the end with addLast. It takes a List as argument, because that's the least subtype you need to perform the algorithm.

You call that algorithm with an ArrayList, which has the nice property that addLast is constant time. Woohoo! Your algorithm is fast and great.

A few months later, you get a phone call from another developer. He's complaining that he used your routine and can't figure out why it's so slow. It was working fine for a while, but as the users generated more records in the database, the routine was grinding to a hault.

You check out the code and immediately see the problem: the database query was returning not an ArrayList but a LinkedList. The implementation of addLast on LinkedLists is actually linear. Adding a bunch of stuff to the end was turning into a quadratic operation.

Let's say that again: even though the location semantics of the operation were the same, addLast on one had constant time and on the other had linear time. They both gave equivalent lists, but one of them was too slow. Does this satisfy the Liskov Substitution Principle? In practice, can you really substitute one for the other? Algorithmic complexity matters.

Clojure avoids that mess (while swapping it for another, which I'll get to shortly). It defines conj, which means not "put this at the beginning" or "put this at the end", but "hey, collection, you know yourself better than I ever can. Please add this wherever it makes sense for you as long as you do it in constant time. Thanks."^1

Practically, that means that conj on LinkedList adds to the front, because that's constant time. And conj on ArrayList adds to the end. But, because the operation doesn't talk about order, like addFirst and addLast do, you can now extend conj to Set and even Map if you consider key/value pairs as single items. And that means that linear algorithms using conj will remain linear regardless of which collection you use.

The mess that Clojure chooses over the other mess

Does this satisfy the Liskov Substitution Principle? Well, that depends on how you look at it. You certainly don't guarantee that you get the same or even equivalent answers out. Consider this:

    (def a [1 2 3])
    (def b '(1 2 3))

    (= a b) ;;=> true
    (= (conj a 4) (conj b 4)) ;;=> false

So, here, performing the same operation on two equal values does not give equal results. That's kind of hard to reason about. But it's a similar tradeoff that you see with other operations that don't guarantee order. For instance, imagine two sets a and b.^2

    (= a b) ;;=> true

    (= (seq a) (seq b)) ;;=> could be false!

The order of most sets is not guaranteed! This means that Clojure has some operations that do not maintain equality. conj just happens to be one of them.

What's the point?

So Clojure does not provide add operations that guarantee order regardless of collection type. Fine. What's the point?

The point is that, in practice, conj is more useful than addFirst and addLast combined. By defining a function using conj, it will work on a broader number of collections. It might give different answers for each, but it won't explode on one and do fine on the rest. And often the answers it gives are just fine. A basic version of into can be defined very easily. It works on all collections (for both to and from).

    (defn into [to from]
      (reduce conj to from))

Common usage

One last thing before I wrap up: because the collection itself defines where the item will be added, I often find myself choosing the collection based on where I need it. A common idiom in Common Lisp was to make a new list by consing onto the front, then reversing it at the end because you really wanted them in the other order. In Clojure, there's no need, because you can just use a vector (and use conj). As long as the vector is local to the algorithm, it's not part of the contract, so it's your choice.

Conclusion

Java was wrong. addFirst and addLast cannot be substituted in LinkedList and ArrayList. They have different algorithmic complexities and at some point one's performance will be totally unacceptable. The operation that does allow for substitutability in algorithm complexity is conj, which is always constant time. But then it doesn't maintain equality. However, I find that conj is way more natural and helps algorithmic reasoning more than guaranteeing where the item is placed.

If you'd like to learn Clojure, I recommend my video course LispCast Introduction to Clojure. It's a great introduction to the language using animations, exercises, and screencasts. It's designed to give a deep dive straight to what makes Clojure interesting. It begins with syntax, goes through functional programming, and ends with data-driven programming.

It's even polite!
(def a (set (range 100))) (def b (apply sorted-set (range 100)))