All about clojure.set
Summary: clojure.set is part of the standard library that comes with Clojure. It has functions for doing set operations and relational algebra.
Clojure comes with a namespace called clojure.set in the standard library. It's something I turn to all the time, even though it's not so big. The namespace is fundamentally about standard operations on sets. I want to give you a quick tour of the library.
Including the library
The library comes standard, so there's no extra dependency you have to add to your project. However, you will have to add a :require clause to the ns form at the top of any namespace where you want to use it. It's typically aliased to set.
(ns com.lispcast.my-ns
  (:require [clojure.set :as set]))
Set-theoretic operations
When I think about sets, I think back to my math classes where we learned to do certain set operations like union, intersection, and difference. These operations are all available in clojure.set and they act like your math teacher would expect.
(set/union a b) is the set containing every element that is in a, in b, or in both.
(set/intersection a b) is the set containing only the elements that are in both a and b.
(set/difference a b) is the set containing the elements that are in a but not in b.
(set/subset? a b) is true if b has all the elements in a.
(set/superset? a b) is true if a has all the elements in b.
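Here's a quick REPL-style sketch of those five operations. The namespace name and the example sets a and b are made up for illustration, and the ;; => comments show the values I'd expect back. The last line shows that union also accepts more than two sets.

```clojure
(ns example.set-ops
  (:require [clojure.set :as set]))

;; Two small, made-up sets to play with.
(def a #{1 2 3})
(def b #{2 3 4})

;; Everything that is in a, in b, or in both.
(set/union a b)          ;; => #{1 2 3 4}

;; Only what the two sets share.
(set/intersection a b)   ;; => #{2 3}

;; What is in a but not in b.
(set/difference a b)     ;; => #{1}

;; #{2 3} is contained in a, so both of these are true.
(set/subset? #{2 3} a)   ;; => true
(set/superset? a #{2 3}) ;; => true

;; union is variadic: pass as many sets as you like.
(set/union a b #{5})     ;; => #{1 2 3 4 5}
```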
Warning: these operations assume all the arguments are sets. They don't check the types.
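To see why that warning matters, here's a small sketch that passes vectors to set/union. In the Clojure versions I've looked at, the two-argument union just conj's the elements of one collection onto the other, so with vectors you get a vector back, duplicate and all, and no error. Calling set on the arguments first avoids the problem. The namespace name is hypothetical.

```clojure
(ns example.set-warning
  (:require [clojure.set :as set]))

;; Vectors instead of sets: no exception, but the result is a
;; vector that still contains the duplicate 2.
(set/union [1 2] [2 3])             ;; => [1 2 2 3]

;; Converting the arguments to sets first gives the answer you'd expect.
(set/union (set [1 2]) (set [2 3])) ;; => #{1 2 3}
```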
A few map operations
It may seem weird to put these in with the set operations, but these map operations are used to implement the relational operations in the next section. They're still public and handy when you need them.
(set/rename-keys {:x 1 :y 2} {:x :a}) is a map where the key :x is swapped for :a, namely {:a 1 :y 2}.
(set/map-invert {:x 1 :y 2}) is a map where the keys and values are swapped, namely {1 :x 2 :y}.
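A short sketch of both map operations, using the same inputs as above plus one extra case: a key in the renaming map that isn't present in the input is simply ignored. The namespace name is made up.

```clojure
(ns example.map-ops
  (:require [clojure.set :as set]))

;; The second argument maps old key -> new key.
(set/rename-keys {:x 1 :y 2} {:x :a}) ;; => {:a 1, :y 2}

;; :x isn't in the input map, so nothing gets renamed.
(set/rename-keys {:y 2} {:x :a})      ;; => {:y 2}

;; Keys become values and values become keys.
(set/map-invert {:x 1 :y 2})          ;; => {1 :x, 2 :y}
```

One thing to keep in mind with map-invert: if two keys share the same value, the inverted map can only keep one of them, so an entry gets lost.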
Relational algebra operations
Relational algebra is what gave relational databases their name. If you imagine SQL tables as sets of records, you can see why these operations belong with the other set operations. clojure.set has a complete set of the basic relational algebra operations. I don't use these nearly as often as I should, even though I know they're there. In clojure.set operations, relations are sets of maps, all with the same keys.
(set/select (fn [row] (>= (:age row) 18)) people) is the set of all people 18 or over.
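Here's that select call run against a tiny, hypothetical people relation (a set of maps that all have :name and :age). The rows where the predicate returns false are dropped, and what comes back is still a set.

```clojure
(ns example.select
  (:require [clojure.set :as set]))

;; A made-up relation: a set of maps that all share the same keys.
(def people
  #{{:name "Ann"   :age 34}
    {:name "Bob"   :age 17}
    {:name "Chloe" :age 18}})

;; Keep only the rows for which the predicate is truthy.
(set/select (fn [row] (>= (:age row) 18)) people)
;; => #{{:name "Ann", :age 34} {:name "Chloe", :age 18}}
```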
(set/join movie-appearances actors) is the natural join between movie-appearances and actors. Imagine if movie-appearances was a set of maps that looked like {:actor-name "Mark Hamill" :movie-name "Star Wars" :character-name "Luke Skywalker" ...} and actors was a set of maps like {:actor-name "Mark Hamill" :nationality "USA" ...}. Well, to find the natural join, you merge each pair of maps that have the same values for the keys the two relations share. Here, :actor-name is shared between both relations. This join would contain the map
{:actor-name "Mark Hamill"
 :movie-name "Star Wars"
 :character-name "Luke Skywalker"
 :nationality "USA"
 ...}
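Here's a runnable version of that join with small, made-up relations trimmed down from the maps above. I added a third actor with no appearances to show that rows without a match on the shared key are left out of the result, just like an inner join in SQL.

```clojure
(ns example.join
  (:require [clojure.set :as set]))

;; Hypothetical sample data.
(def movie-appearances
  #{{:actor-name "Mark Hamill"   :movie-name "Star Wars" :character-name "Luke Skywalker"}
    {:actor-name "Harrison Ford" :movie-name "Star Wars" :character-name "Han Solo"}})

(def actors
  #{{:actor-name "Mark Hamill"   :nationality "USA"}
    {:actor-name "Harrison Ford" :nationality "USA"}
    {:actor-name "Carrie Fisher" :nationality "USA"}})

;; :actor-name is the only key both relations share, so rows are matched
;; on it and each matching pair is merged into one map. Carrie Fisher has
;; no row in movie-appearances, so she doesn't show up in the result.
(set/join movie-appearances actors)
;; => #{{:actor-name "Mark Hamill", :movie-name "Star Wars",
;;       :character-name "Luke Skywalker", :nationality "USA"}
;;      {:actor-name "Harrison Ford", :movie-name "Star Wars",
;;       :character-name "Han Solo", :nationality "USA"}}
```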
Sometimes, though, the keys aren't named the same in both relations, or you only want to join on some of the shared keys. In that case, you can pass a third argument: a map from keys in the first relation to the corresponding keys in the second.
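A sketch of the three-argument join, assuming a hypothetical actors relation that calls the column :name instead of :actor-name. The key map {:actor-name :name} tells join that :actor-name in movie-appearances corresponds to :name in actors. Both columns survive in the merged result.

```clojure
(ns example.join-keymap
  (:require [clojure.set :as set]))

(def movie-appearances
  #{{:actor-name "Mark Hamill" :movie-name "Star Wars" :character-name "Luke Skywalker"}})

;; Same data, but this relation names the column :name.
(def actors
  #{{:name "Mark Hamill" :nationality "USA"}})

;; Keys of the map come from the first relation, values from the second.
(set/join movie-appearances actors {:actor-name :name})
;; => #{{:actor-name "Mark Hamill", :movie-name "Star Wars",
;;       :character-name "Luke Skywalker", :name "Mark Hamill", :nationality "USA"}}
```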
(set/project movie-appearances [:actor-name :character-name]) is a relation that keeps only the keys in that list and drops all the others. Remember, it's still a set (no duplicates), so it may have fewer tuples than the original.
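To see the "fewer tuples" effect, here's a sketch with two made-up appearance rows that differ only in :movie-name. Once project throws that key away, the two rows become identical, and the resulting set holds just one.

```clojure
(ns example.project
  (:require [clojure.set :as set]))

(def movie-appearances
  #{{:actor-name "Mark Hamill" :movie-name "Star Wars"
     :character-name "Luke Skywalker"}
    {:actor-name "Mark Hamill" :movie-name "The Empire Strikes Back"
     :character-name "Luke Skywalker"}})

;; Two rows in, one row out: the projected maps are equal.
(set/project movie-appearances [:actor-name :character-name])
;; => #{{:actor-name "Mark Hamill", :character-name "Luke Skywalker"}}
```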
(set/rename movie-appearances {:movie-name :movie-title}) is the same relation but with :movie-name changed to :movie-title.
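A minimal sketch of rename on a one-row, hypothetical relation. It applies the same key renaming to every tuple, leaving the other keys alone.

```clojure
(ns example.rename
  (:require [clojure.set :as set]))

(def movie-appearances
  #{{:actor-name "Mark Hamill" :movie-name "Star Wars"
     :character-name "Luke Skywalker"}})

;; Every tuple gets :movie-name renamed to :movie-title.
(set/rename movie-appearances {:movie-name :movie-title})
;; => #{{:actor-name "Mark Hamill", :character-name "Luke Skywalker",
;;       :movie-title "Star Wars"}}
```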
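(set/index movie-appearances [:actor-name]) is a way to build an index of who was in what movie, looked up by the actor's name. The result is a map whose keys are maps holding just the indexed keys, like {:actor-name "Mark Hamill"}, and whose values are the sets of matching tuples.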
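Here's a sketch of building and querying that index over some made-up appearances. Note that you look things up with a map containing the indexed key, not with the bare name string. The name by-actor is just something I picked for the example.

```clojure
(ns example.index
  (:require [clojure.set :as set]))

(def movie-appearances
  #{{:actor-name "Mark Hamill"   :movie-name "Star Wars"
     :character-name "Luke Skywalker"}
    {:actor-name "Mark Hamill"   :movie-name "The Empire Strikes Back"
     :character-name "Luke Skywalker"}
    {:actor-name "Harrison Ford" :movie-name "Star Wars"
     :character-name "Han Solo"}})

;; One entry per distinct actor name.
(def by-actor (set/index movie-appearances [:actor-name]))

;; Look up with a map of the indexed key, then pull out the movie names.
(set (map :movie-name (get by-actor {:actor-name "Mark Hamill"})))
;; => #{"Star Wars" "The Empire Strikes Back"}
```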
Conclusions
Clojure comes with the standard set operations you're used to, plus some useful relational algebra operations if you're feeling frisky. Many for loops are actually set operations in disguise! And how many complex programs could be rewritten as a few relational algebra operations? We'll never know. But I bet you can find some uses for these operations in your code!
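As a small illustration of a for loop in disguise, here's a hand-written comprehension that keeps the elements of one set that also appear in another, next to the single set/intersection call that does the same job. The sets and names are made up for the example.

```clojure
(ns example.for-vs-set
  (:require [clojure.set :as set]))

(def a #{1 2 3 4})
(def b #{3 4 5})

;; The loop version: walk a and keep whatever b also contains.
(def shared-by-loop
  (set (for [x a :when (contains? b x)] x)))

;; The same thing, said directly.
(def shared-by-set (set/intersection a b))

(= shared-by-loop shared-by-set) ;; => true
```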