Pre-West Interview: Leon Barrett

Talk: Clojure Parallelism: Beyond Futures

Leon Barrett's talk at Clojure/West is about parallelism in Clojure.


Clojure is well known for its parallel programming superpowers. Immutable data structures, concurrency primitives, and a few convenient constructs like future and pmap have been there since the beginning. But what's even cooler is how people have been able to build on the strong foundation Clojure established to create new parallel abstractions. Leon Barrett will talk about some of these. The description mentions reducers, tesser, and claypoole.

Rich Hickey gave a talk about reducers back in 2012, focusing on the ideas and abstractions they are based on. A more practical talk was given by Renzo Borgatti at Strange Loop 2013. Kyle Kingsbury gave a talk about tesser, a library which extends Clojure's parallel abstractions to execute in a distributed manner. And Leon Barrett himself wrote a recent blog post about Claypoole.

About Leon Barrett

Homepage - GitHub - Google+


Leon Barrett was gracious enough to agree to an interview. He is giving a talk at Clojure/West about parallel programming in Clojure. The background to his talk is available, if you like.

Interview with Leon Barrett

How did you get into Clojure?

Leon Barrett: Actually, I wasn't even aware of Clojure until I started at The Climate Corporation 2.5 years ago. I had used Lisp a number of times in school, so it wasn't too foreign, but I'd never imagined that I'd write Lisp for a living.

Clojure is great for shared-memory parallelism. Is it good for distributed programming?

LB: It can be; the same set of abstractions that are helpful locally can be helpful when distributed. For instance, there are some nice Hadoop MapReduce tools (though I happen to dislike Cascalog; it doesn't mesh with my mental model of MapReduce). Of course, you have to do some extra work to support distributed computing, but all in all it's relatively pleasant.

What are your preferred distributed Clojure abstractions?

LB: You know, I don't feel qualified to opine on this too much. I mostly end up working on tasks that are fairly embarrassingly parallel, and I spend more of my time worrying about single-machine parallelism (hence my need for Claypoole). The core idea in the better Clojure distribution work I've done was, for both Storm and Hadoop, to just write simple, functional data transformations and then let the distribution framework worry about everything else.

Where should someone get started with distributed programming in Clojure?

LB: I think both Parkour and Storm are very nice, though I had some issues using Parkour on Amazon's EMR. Just working with those, it's pretty easy to write distributed tools without worrying about the hard distributed parts (data movement, reliability, etc.) yourself.

What makes Clojure great for parallel programming?

LB: Clojure is great for parallel programming because of three things: immutability, good core libraries, and macros.

First, the bane of parallel programming is mutable state; state makes parallel programs much harder to reason about. While it's possible to avoid shared state by writing functional programs with immutable data even in other languages, it's much easier in Clojure because all the standard libraries support it, so Clojure is easier to do parallelism with from the very beginning.

Also, Clojure's parallelism features, such as future and deref, are well-designed and easy to use, making it very easy to get started with parallelism.

Finally, macros make it possible to write parallel things without so much fuss; for instance, Clojure's futures (built with macros) are easier to use than Java's futures because they don't require any boilerplate. Similarly, the library core.async uses macros to add amazing parallel and asynchronous features as a library, whereas in other languages such features would need to be designed into the core language.

Can you briefly explain Claypoole? What does it do? Why did you write it?

LB: I wrote Claypoole because I was dissatisfied with Clojure's built-in pmap for several reasons, two in particular. First, I wanted to control the degree of parallelism, but core pmap's parallelism is determined only by the number of CPUs. Second, I wanted my pmaps to get things done as fast as possible, but core pmap is lazy and may be inefficient when the individual tasks have a high variance in duration.
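To make those two complaints concrete, here is a minimal sketch that runs the same workload through core pmap and through Claypoole's pmap on an explicitly sized pool. It assumes the Claypoole library (com.climate/claypoole) is on the classpath; `uneven-task`, `sum-with-core-pmap`, and `sum-with-claypoole` are hypothetical names invented for this illustration, and the sleep times are arbitrary.

```clojure
;; Assumption: the Claypoole library (com.climate/claypoole) is on the classpath.
(require '[com.climate.claypoole :as cp])

;; Hypothetical workload: every tenth task is much slower than the rest,
;; giving the kind of high variance in duration described above.
(defn uneven-task [n]
  (Thread/sleep (if (zero? (mod n 10)) 200 10))
  (* n n))

(defn sum-with-core-pmap
  "Core pmap: lazy, with parallelism derived from the CPU count.
  Work only advances as the result sequence is consumed."
  [inputs]
  (reduce + (pmap uneven-task inputs)))

(defn sum-with-claypoole
  "Claypoole pmap: eager, running on a pool whose size the caller chooses."
  [n-threads inputs]
  ;; with-shutdown! shuts the pool down when the body finishes, so the
  ;; results are fully consumed (by reduce) inside the body.
  (cp/with-shutdown! [pool (cp/threadpool n-threads)]
    (reduce + (cp/pmap pool uneven-task inputs))))

(println (sum-with-core-pmap (range 40)))
(println (sum-with-claypoole 4 (range 40)))
```

Both functions compute the same sum; the difference lies in who decides the thread count and when the work is started. When running this as a script, calling `(shutdown-agents)` at the very end lets the JVM exit promptly, since core pmap runs on Clojure's future thread pool.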

Claypoole's core feature is a pmap that meets my demands: it's eager, and I can control the degree of parallelism, even across multiple simultaneous pmaps. As a bonus, it turns out that once one has good control over sharing threadpools in pmaps, it's easy to add other such features, so Claypoole also does a number of related, handy things, such as parallel for (pfor), unordered pmaps that return results in the order they're completed (upmap), and so on. I see Claypoole as a tool to provide an advanced degree of control over parallelism.

Those sound really handy! Do you use reducers in your work? Where have you found them most helpful?

LB: I don't actually use reducers a lot. I used them while working with Parkour, and they were very cute there. However, in my own work I've tended to prefer Claypoole. Reducers is good because it has much less overhead than chained maps, which is great for functionally combining small tasks. However, I find that I tend to map bigger tasks, so I benefit more from fine-grained control of parallelism in Claypoole than I would from the efficiency of reducers.

Do you have any good background materials for people who want to do a little pre-reading/watching?

LB: Nope. My talk will start with some basics to make sure that everyone's on the same page, and my initial test audiences seemed to indicate that my intro works well for both beginners and more advanced programmers.

How can people help with Claypoole?

LB: I love pull requests, and I appreciate bug reports (obviously, repeatable test cases make my life easier). But mostly, my goal with open-sourcing Claypoole was to have the community do this stuff right once, rather than having lots of people making partial reimplementations that work on just their case. So, just use it! I want people to use Claypoole rather than reimplementing it.

Where can people follow you online?

LB: I'm not terribly vocal online. I guess I post to the Climate Corp engineering blog every so often.

If Clojure were a food, what food would it be?

LB: One could claim that it'd be Cajun blackened catfish, because it's lean and full of flavor.