All about clojure.set

Written by Eric Normand. Published: August 1, 2019.

Summary: clojure.set is part of the standard library that comes with Clojure. It has functions for doing set operations and relational algebra.

Clojure comes with a namespace called clojure.set in the standard library. It's something I turn to all the time, even though it's not so big. The namespace is fundamentally about standard operations on sets. I want to give you a quick tour of the library.

Including the library

The library comes standard, so there's no extra dependency you have to add to your project. However, you will have to add a :require statement to your ns form at the top of namespaces where you want to use it. It's typically aliased to set.

    (ns com.lispcast.my-ns
      (:require [clojure.set :as set]))

Set-theoretic operations

When I think about sets, I think back to my math classes where we learned to do certain set operations like union, intersection, and difference. These operations are all available in clojure.set and they act like your math teacher would expect.

(set/union a b) is the set containing all elements from both a and b.

(set/intersection a b) is the set containing only the elements that are in both a and b.

(set/difference a b) is a set containing things that are in a but not in b.

(set/subset? a b) is true if b has all the elements in a.

(set/superset? a b) is true if a has all the elements in b.

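Warning: these operations assume all the arguments are sets. They don't check the types, so if you pass in a vector or a list you can silently get a wrong answer, or a result that isn't a set, instead of an error.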
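Here's a quick sketch of those five operations at the REPL. The sets a and b are sample data I made up for illustration, and sets may print in a different order than shown.

```clojure
(require '[clojure.set :as set])

;; Sample data, invented for illustration.
(def a #{1 2 3})
(def b #{3 4 5})

(set/union a b)              ;=> #{1 2 3 4 5}
(set/intersection a b)       ;=> #{3}
(set/difference a b)         ;=> #{1 2}
(set/subset? #{1 2} a)       ;=> true
(set/superset? a #{1 2})     ;=> true

;; If you aren't sure an argument is a set, coerce it with `set` first.
(set/union a (set [3 4 5]))  ;=> #{1 2 3 4 5}
```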

A few map operations

It may seem weird to put these in with the set operations, but these map operations were useful for implementing the operations in the next section. They're still public and handy when you need them.

(set/rename-keys {:x 1 :y 2} {:x :a}) is a map where the key :x is swapped for :a, namely {:a 1 :y 2}.

(set/map-invert {:x 1 :y 2}) is a map where the keys and values are swapped, namely {1 :x 2 :y}.
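As a small sketch of where these come in handy, here is a lookup table flipped around with map-invert, plus a rename-keys call where the key map mentions a key the map doesn't have. The code->name table is hypothetical sample data.

```clojure
(require '[clojure.set :as set])

;; Hypothetical lookup table from country code to country name.
(def code->name {"US" "United States" "FR" "France"})

;; Invert it so we can look up codes by name.
(def name->code (set/map-invert code->name))
(get name->code "France")                    ;=> "FR"

;; Keys not mentioned in the key map are left alone, and entries in the
;; key map for keys the map doesn't have (:z here) are ignored.
(set/rename-keys {:x 1 :y 2} {:x :a :z :c})  ;=> {:a 1, :y 2}
```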

Relational algebra operations

Relational algebra is what gave relational databases their name. If you imagine SQL tables are sets of records, you can see why these operations belong with other set operations. clojure.set has a complete set of the relational algebra basic operations. I don't use these nearly as often as I should, even though I know they're there. In clojure.set operations, relations are sets of maps, all with the same keys.

(set/select (fn [row] (>= (:age row) 18)) people) is a set of all people 18 or over.
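To make that concrete, here is a sketch with a tiny people relation. The names and ages are made up. Notice that the predicate comes first, like with filter, but you get a set back.

```clojure
(require '[clojure.set :as set])

;; A made-up relation: a set of maps that all have the same keys.
(def people
  #{{:name "Ana"  :age 34}
    {:name "Ben"  :age 17}
    {:name "Cleo" :age 18}})

(def adults
  (set/select (fn [row] (>= (:age row) 18)) people))

adults
;=> #{{:name "Ana", :age 34} {:name "Cleo", :age 18}}
```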

(set/join movie-appearances actors) is the natural join between movie-appearances and actors. Imagine movie-appearances is a set of maps that look like {:actor-name "Mark Hamill" :movie-name "Star Wars" :character-name "Luke Skywalker" ...} and actors is a set of maps that look like {:actor-name "Mark Hamill" :nationality "USA" ...}. To find the natural join, you merge each pair of maps, one from each relation, that have the same values for the keys they share. Here, :actor-name is the only shared key. This join would contain the map

    {:actor-name "Mark Hamill"
     :movie-name "Star Wars"
     :character-name "Luke Skywalker"
     :nationality "USA"
     ...}
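Here's a runnable sketch of that join. The relations are cut down to a few keys and a couple of rows that I picked for illustration. Rows with no match in the other relation are dropped from the result.

```clojure
(require '[clojure.set :as set])

;; Trimmed-down sample relations, for illustration only.
(def movie-appearances
  #{{:actor-name "Mark Hamill"   :movie-name "Star Wars" :character-name "Luke Skywalker"}
    {:actor-name "Carrie Fisher" :movie-name "Star Wars" :character-name "Leia Organa"}})

(def actors
  #{{:actor-name "Mark Hamill"   :nationality "USA"}
    {:actor-name "Harrison Ford" :nationality "USA"}})

;; Only "Mark Hamill" appears in both relations, so only his row survives.
(def joined (set/join movie-appearances actors))

joined
;=> #{{:actor-name "Mark Hamill", :movie-name "Star Wars",
;      :character-name "Luke Skywalker", :nationality "USA"}}
```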

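Sometimes, though, the keys aren't named the same in both relations, or you only want to join on some of the shared keys. In that case, you can pass set/join a third argument: a map of key-to-key correspondences, from keys in the first relation to the matching keys in the second.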
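As a sketch, suppose the actors relation stored the name under :name instead of :actor-name. That variant is hypothetical, made up to show the three-argument join. The joined rows keep the keys from both sides.

```clojure
(require '[clojure.set :as set])

(def movie-appearances
  #{{:actor-name "Mark Hamill" :movie-name "Star Wars" :character-name "Luke Skywalker"}})

;; Hypothetical variant of actors where the name lives under :name.
(def actors
  #{{:name "Mark Hamill" :nationality "USA"}})

;; The key map says: :actor-name in the first relation matches :name in the second.
(def joined (set/join movie-appearances actors {:actor-name :name}))

joined
;=> #{{:actor-name "Mark Hamill", :name "Mark Hamill", :movie-name "Star Wars",
;      :character-name "Luke Skywalker", :nationality "USA"}}
```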

(set/project movie-appearances [:actor-name :character-name]) is the same relation with only the keys in that list. Remember, it's still a set (no duplicates), so it may have fewer tuples than the original.
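Here's a sketch of how the tuples can collapse. The two sample rows below differ only in :movie-name, so once that key is projected away they become the same map.

```clojure
(require '[clojure.set :as set])

;; Two sample rows that differ only in :movie-name.
(def movie-appearances
  #{{:actor-name "Mark Hamill" :movie-name "Star Wars" :character-name "Luke Skywalker"}
    {:actor-name "Mark Hamill" :movie-name "The Empire Strikes Back" :character-name "Luke Skywalker"}})

(def projected
  (set/project movie-appearances [:actor-name :character-name]))

projected
;=> #{{:actor-name "Mark Hamill", :character-name "Luke Skywalker"}}

(count projected)  ;=> 1
```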

(set/rename movie-appearances {:movie-name :movie-title}) is the same relation but with :movie-name changed to :movie-title.
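A minimal sketch of that, using a one-row sample relation:

```clojure
(require '[clojure.set :as set])

;; One-row sample relation, for illustration.
(def movie-appearances
  #{{:actor-name "Mark Hamill" :movie-name "Star Wars" :character-name "Luke Skywalker"}})

(def renamed
  (set/rename movie-appearances {:movie-name :movie-title}))

renamed
;=> #{{:actor-name "Mark Hamill", :movie-title "Star Wars",
;      :character-name "Luke Skywalker"}}
```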

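(set/index movie-appearances [:actor-name]) builds an index of who was in what movie, looked up by the actor's name. The result is a map whose keys are maps like {:actor-name "Mark Hamill"} and whose values are the sets of rows that match.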
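Here's a sketch of building and using such an index on a few sample rows. The thing to remember is that you look things up with a map of the indexed keys, not with the bare value.

```clojure
(require '[clojure.set :as set])

;; Sample rows, for illustration.
(def movie-appearances
  #{{:actor-name "Mark Hamill"   :movie-name "Star Wars" :character-name "Luke Skywalker"}
    {:actor-name "Mark Hamill"   :movie-name "The Empire Strikes Back" :character-name "Luke Skywalker"}
    {:actor-name "Carrie Fisher" :movie-name "Star Wars" :character-name "Leia Organa"}})

(def by-actor (set/index movie-appearances [:actor-name]))

;; The lookup key is a map of the indexed keys.
(def hamill-rows (get by-actor {:actor-name "Mark Hamill"}))

(set (map :movie-name hamill-rows))
;=> #{"Star Wars" "The Empire Strikes Back"}
```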

Conclusions

Clojure comes with the standard set operations you're used to, plus some useful relational algebra operations if you're feeling frisky. Many for loops are actually set operations in disguise! And how many complex programs could be rewritten as a few relational algebra operations? We'll never know. But I bet you can find some uses for these operations in your code!