Some thoughts on map, filter, and reduce
This is an episode of Thoughts on Functional Programming, a podcast by Eric Normand.
Are map, filter, and reduce popular for a reason? Do these things capture some essence of iteration? Are they just better for loops?
Eric Normand: What's the deal with map, filter, and reduce? Why do we use them so much in functional programming? Why do I call them the three basic functional tools?
I have mentored people over the years. I'm always surprised by how far people can get without them when they are using a functional language. They get stuff done. Then I'm surprised that people that I think are at a really high level of functional programming have not adopted these things.
I think that they are a gateway. They're a milestone on the way to becoming a better functional programmer. I wanted to talk about them a little bit more deeply than what they do and stuff like that.
Functional programmers use map, filter, and reduce. One reason we do that is we have a lot of data. We have a lot of sequences of data that we are passing around and manipulating.
They do form a really nice way of doing pipelines, data transformation pipelines. You have data in one format coming into the pipeline, through a series of steps, it gets transformed one step at a time until it gets to the last step. Once it comes out the last step, it's the data you need.
This is a standard functional programming pattern: data transformation pipelines. Each step in those pipelines, usually, is a map, a filter, or a reduce.
Map transforms one sequence into another sequence. Same number of elements, just each of the items from the first sequence has a function applied to it and it becomes an item in the output sequence.
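As a minimal sketch of that idea in Python (the `square` function and the sample numbers are made up for illustration, they are not from the episode), a map might look like this:

```python
def square(n):
    """Hypothetical per-item function, used only for this example."""
    return n * n

numbers = [1, 2, 3, 4]

# map applies square to each item; the output has as many items as the input.
squares = list(map(square, numbers))
```

Notice that `square` knows nothing about lists. It only describes what happens to one item.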
Filter, depending on how you look at it, will keep or remove the items you want or don't want. It does both. It's keeping some and removing others. You could filter for evens — so you can have all these numbers coming in and then coming out — only even numbers are coming out. The odd ones are skipped.
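Filtering for evens could be sketched like this in Python (the `is_even` helper and the input list are illustrative assumptions):

```python
def is_even(n):
    """Hypothetical predicate: True for the items we want to keep."""
    return n % 2 == 0

numbers = [1, 2, 3, 4, 5, 6]

# filter keeps the items where the predicate is true and skips the rest.
evens = list(filter(is_even, numbers))
```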
Reduce is all about taking this sequence and reducing it down into some aggregate value. If you wanted to find the sum — the sum is an aggregate — the sum of some numbers, you can reduce plus over them.
You reduce the plus function over the sequence of numbers. You start at zero because that's the identity for plus. We learned that in algebra class.
Zero's where you start counting stuff. You reduce that down into an aggregate, which is the sum. You can do the same thing for the product. It's also an aggregate.
You start with one. You reduce a bunch of numbers using times. Then you have the aggregate, which is the product.
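Here is a rough Python sketch of the sum and the product as reduces, using the standard library's `functools.reduce` and the `operator` module (the sample numbers are made up):

```python
from functools import reduce
from operator import add, mul

numbers = [1, 2, 3, 4]

# Sum: reduce plus over the numbers, starting at 0 (the identity for plus).
total = reduce(add, numbers, 0)

# Product: reduce times over the numbers, starting at 1 (the identity for times).
product = reduce(mul, numbers, 1)

# Starting from the identity also gives a sensible answer for an empty sequence.
empty_total = reduce(add, [], 0)
empty_product = reduce(mul, [], 1)
```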
You can reduce adding stuff to a different collection. You want to add stuff to a set. That's your aggregate, the set. It starts empty and you just add one thing at a time until you've got the whole thing.
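Building up a set with reduce might look something like this in Python. The `add_to_set` helper is a name invented for this sketch, and it returns a new set at each step rather than mutating the old one:

```python
from functools import reduce

def add_to_set(acc, item):
    """Hypothetical step function: return a new set that also contains item."""
    return acc | {item}

items = ["a", "b", "a", "c", "b"]

# The aggregate starts as an empty set and grows one item at a time.
unique = reduce(add_to_set, items, frozenset())
```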
We use these because they allow us to talk about the operations on the items instead of operations over the whole thing. We're separating out the fact that it's working on a collection from how we're operating on the items.
It's a simple thing. Let's just build a pipeline. We can say first, square all the numbers. We map the square function over the numbers.
Then we're going to filter for evens. Now we only want the evens. We're going to look up those numbers in a database. We're getting new data back for each number as the ID. We're going to map the age attribute over all those things because they were people and they have ages.
Now we have the list of ages. Now we're going to reduce and average them. That's our aggregate, the average. That's a typical pipeline, data transformation pipeline.
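That pipeline could be sketched in Python roughly as follows. Everything named here is an assumption for illustration: the "database" is just a dictionary called `PEOPLE_BY_ID`, the people and their ages are invented, and the average is computed by reducing to a running (sum, count) pair, which assumes at least one age comes out of the pipeline.

```python
from functools import reduce

# Hypothetical stand-in for the database: ID -> person record.
PEOPLE_BY_ID = {
    4: {"name": "Ana", "age": 30},
    16: {"name": "Bo", "age": 40},
    36: {"name": "Cy", "age": 50},
}

def square(n):
    return n * n

def is_even(n):
    return n % 2 == 0

def look_up_person(person_id):
    """Pretend database lookup by ID."""
    return PEOPLE_BY_ID[person_id]

def get_age(person):
    return person["age"]

def average(values):
    """Reduce to a (sum, count) pair, then divide. Assumes values is non-empty."""
    total, count = reduce(
        lambda acc, v: (acc[0] + v, acc[1] + 1), values, (0, 0)
    )
    return total / count

numbers = [1, 2, 3, 4, 5, 6]

squares = map(square, numbers)               # 1, 4, 9, 16, 25, 36
even_squares = filter(is_even, squares)      # 4, 16, 36
people = map(look_up_person, even_squares)   # one record per ID
ages = map(get_age, people)                  # 30, 40, 50
average_age = average(ages)
```

Each step only says what happens to one item, or how two values combine. None of the helper functions mentions a loop.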
Notice you probably could have done all that in a single for loop. You could have done each step as a single for loop as well. That's the origin of map, filter, and reduce.
People noticed that, "Well, I could do this for loop where I have to manage an index variable and increment it and remember to exit the loop when I'm off the end of the list."
You could do that, or you could say, "Hey. There's this pattern I do all the time where I'm creating a new list that's like the old list except each of the elements is modified." That's a map.
Before my for loop I'm initializing this empty list. In the for loop I'm adding a thing to the list each time and then I'm returning it or I'm using it in the next step.
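To make the comparison concrete, here is a small Python sketch of the same job done both ways. The `double` function and the input list are made up, and the loop is written with an explicit index to mirror the bookkeeping described above:

```python
def double(n):
    """Hypothetical per-item function for this example."""
    return n * 2

numbers = [10, 20, 30]

# Hand-written loop: initialize an empty list, manage the index,
# add one thing each time, and stop at the end of the input.
doubled_loop = []
i = 0
while i < len(numbers):
    doubled_loop.append(double(numbers[i]))
    i += 1

# The same pattern, named: a map.
doubled_map = list(map(double, numbers))
```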
What functional programmers have noticed is that there are only a handful of useful applications of for loops. Let me put that in another way. It's not that there's only a handful, it's that this handful comes up almost all the time.
We're talking about 80, 90 percent of the time. You still need a for loop for some weird cases, but it's not clear that you're not in one of those weird cases.
The danger with for loops is that you can do anything with them, when there's only one thing you really want to do. Say you want to map, or you want to filter, to eliminate some of the items.
That's what happened. Over time, people have realized it's safer, easier, clearer, and more convenient to use map, filter, and reduce than to use for loops.
There are some conditions to using map, filter, and reduce. They are higher-order functions, meaning they take functions as arguments, which means you need first-class functions. It's a prerequisite for having map, filter, and reduce in your language.
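One way to see why first-class functions are the prerequisite is to write simplified versions yourself. This is only a sketch: the names `my_map`, `my_filter`, and `my_reduce` are invented here, they work eagerly on lists, and Python's built-in versions behave differently in the details (for example, they are lazy).

```python
def my_map(f, items):
    """Call f on each item and collect the results."""
    result = []
    for item in items:
        result.append(f(item))
    return result

def my_filter(pred, items):
    """Keep only the items for which pred returns true."""
    result = []
    for item in items:
        if pred(item):
            result.append(item)
    return result

def my_reduce(f, items, initial):
    """Combine the items into one aggregate, starting from initial."""
    acc = initial
    for item in items:
        acc = f(acc, item)
    return acc
```

The for loop is written once, inside each helper. The caller passes in a function and never touches the loop.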
From my perspective, they're just better for loops. They're better applications of for loops. You still need a for loop for those other weird cases. In the cases where map, filter, and reduce are used, they're superior in almost every way.
I would say that one way to increase your skill with functional programming, the tactical skills of functional programming, how to actually take this problem and code it up in a functional way, is learning map, filter, and reduce and trying to apply them in new ways.
I remember when I first started doing Clojure. Maybe in my first year or two doing Clojure. I didn't use reduce so much. I was used to other languages. I did map and filter. Those I used quite a bit.
Because Clojure's immutable data structures really encourage it, I started using reduce a lot more. I realized there were a million places where it could be used that I didn't think of before. I would have used a for loop or I would have written it recursively. I would do my own recursion.
It's so much better to use reduce than to do a for loop or a recursion. It just works better. You don't have to worry about tail calls and all that stuff. It just works better.
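A small Python sketch of the difference, with invented function names. The remark about tail calls was made about Clojure; the analogous worry in Python, which does not optimize tail calls, is the recursion depth limit.

```python
from functools import reduce
from operator import add

def recursive_sum(numbers):
    """Hand-written recursion: one stack frame per item in the list."""
    if not numbers:
        return 0
    return numbers[0] + recursive_sum(numbers[1:])

def reduce_sum(numbers):
    """The same aggregate as a reduce: no stack growth to worry about."""
    return reduce(add, numbers, 0)
```

Both give the same answer on a short list. On a list of, say, 100,000 numbers, `recursive_sum` would be expected to hit CPython's recursion limit, while `reduce_sum` just works.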
You still need a general purpose recursion but when you can, use a reduce. That's my recommendation. Go through your code and whenever you're operating on a collection think, "Is this a reduce that I'm doing?"
Map and filter are pretty clear. Reduce is the one that I think takes a little more pause and thought. Once you make it into a reduce it's so much easier, so much simpler.
My name is Eric Normand. I would love to hear your stories about applying map, filter, and reduce. Specifically reduce because that one always gets interesting. Do you like using reduce? Do you prefer something else? Do you like nested reduces? Please say no, please say no.
My name is Eric Normand. You can find me on Twitter @ericnormand. Please follow me. You will learn about all these thoughts that I have had on functional programming. I share them all over the place on Twitter.
You can ask me questions, tell me your opinions, get into a heated yet friendly debate over email at firstname.lastname@example.org. Please do. I love getting into discussions.
Finally, I'm starting to get onto LinkedIn. You can find me there. LinkedIn. I don't know what the URL is. Just search for Eric Normand Clojure. You'll find me. I'm on there.
Follow me. I'll try to follow you back. I share the same kinds of ideas. More business centric. If you want your business to get into Clojure. If you think the company you work for needs to learn a little bit more about Clojure, hit like. Hit thumbs up. Share it with your team.
Awesome. See you later. Bye.