3 Examples of algebraic thinking
This is an episode of Thoughts on Functional Programming, a podcast by Eric Normand.
In a recent episode, I said algebraic thinking was the third level of functional thinking. In this episode, I give some concrete examples.
Eric Normand: Can you give me some examples of algebraic thinking, that third level of functional thinking that you're talking about? I sure can. My name is Eric Normand and I help people thrive with functional programming. I got a question about an episode I did a week or two ago, about the three levels of functional thinking.
The three levels were, the first level is thinking in terms of actions, calculations, and data. Just dealing with the fact that you got a mutable data. You want to separate your actions from your calculations, those kinds of things.
Second level is start thinking in terms of data transformations and the pipelines. You're using map fill to reduce those things to get the data that you have to look like the data that you need in a very high-level, higher or function way.
Three, the third level was what I was calling algebraic thinking. You're creating an algebra to model your domain. I got a question about this in a YouTube comment. It was from P1MPS01. Not to be confused with P1MP02. P1MPS01 asks if I can give some examples so that it's clear what I mean by this algebraic thinking.
I came up with three tiny little examples that I can talk about. That's what we're going to do in this episode. The first example. Imagine you're writing a video editor. I don't know if you've ever edited video, but if you have a GUI editor you can put clips in and overlay the clips and make transitions between things and cut stuff out. You can do all sorts stuff.
Now, I want you to think for a moment how you might implement this. While I'm describing this algebra, you might see how this...There's the technical details of how to actually put the video together, how to render that final video so that you can put it YouTube or whatever you do with your final video.
What the algebraic thinking is about, is about how do you represent that video before you've rendered it? You're representing it in memory in a way that you can manipulate and will also feed into whatever rendering engine you've got.
This is how I would begin to describe an algebra of video editing. We have the notion of a clip. This is a video clip. Now, at heart and the essence of the video clip it's basically just a function. A function from time to image. You give it a time and it gives you an image.
This will allow you to sample the video. This is more implementation details, but it will allow you to sample the video at whatever frame rate you need and get a frame out at whatever time you need for that frame rate.
Then we have to think about, how do we compose these things together? There's really two main ways to compose clips together. One is in sequence. When this one stops playing, we'll start playing that one. The other one is overlay. This is if you want to put some text on top of a video, you can overly that text on top.
You're deferring to the overlay of images. You would ask for the frame at time T from one bit from video A and B is on top of that, overlaid on top. You'd also ask for the frame of time T of B and you would overly the images.
We deferred however our image system works that's how we're going to overlay the images. Overlaying and images is also...you could also think of that as an algebra. We're not going over that. We're just going over these clips.
There's also stuff like trimming. You want to cut a sub-clip from this clip. You can actually break it into two operations. You can trim from the left, and you can trim from the right, and you'll have something left in the middle. With these operations, you can do basically anything you need to do with the video.
There's going to be some things that are done more on the frames of the video. For instance, you could transition, fade to black kind of thing. That would be done at the image level, but you need some way of representing that as part of this algebra. That this is a transformation add-on...
OK. Here's the thing. We're all playing with time here. This is a function of time. The only input you have is time. Then, you have the output. The input you have is time, so when I take clip A and I take clip B, and I sequence them so that B plays after A, I make a new clip.
That clip, when I ask for a frame at a certain time, if it's less than the length of A, it will give me the frame from A. It will ask the function A, "What's your frame?" If it's greater than the length of A, then it'll subtract the length of A from the time and use that time to pass to B, and it will get a frame from that.
This is a very simple composition, a simple way to compose that doesn't require complex math or a real understanding of video formats and things like that. It's simply a very easy implementation. It's either ask A or do a simple subtraction and then, ask B. That allows us to compose these two things in sequence.
We've already seen what happens if I put B on top of A. We're going to ask A for its frame at that time, and we're going to ask B for its frame at that time. Then, we're going to do whatever image overlay we have for those two frames.
Now, you can also see that we're going to have some issues where we have to solve like, "What if I want B to start later than A starts?" I want to overlay B, but I don't want them to start at the same time.
You can make an empty video, an empty clip that returns an image that's just transparent. The overlaying a transparent on top of A is not going to change it, but it makes it start at the right time. You can sequence that empty clip which has a length. You can sequence that empty clip with B, and it'll start at a later time.
You can see we can start to play with these primitives that we've made, and we defined a formal way that they compose. We've got basically everything we need from just these very small pieces.
Let's look at another similar algebra. This is parser combinators. You can define a parser in terms of other parsers, this is known as parser combinators. If you have some very primitive parsers — let's call them parsers — it takes an input string. You could say it's a string. The only thing it does is it looks for one character.
It succeeds if it finds that character. It fails if it doesn't find the character right at the beginning. The first character of that string, it's going to compare. You've got this very primitive, easy to implement operation. Now, how do you use those primitives to make a parser that parses something more complicated?
Well, you have sequence. You say, "We'll have to match A, and then a B, and then a C, and then a D," but the sequence can work with only two parses, so you take a parser that matches A, and you take a parser that matches B, you compose them into a parser that matches A, then B.
You could implement it very simply. You can already see how to do it. First you match A on the string. If it succeeds, you drop the first character from the string, and then you try to match B on that new string. If that succeeds, you're done. If either of them fail, the whole thing fails. Very simple means of combining. That's not enough because you need some choice.
What if I want to match A or B? You need a thing called, alternative. Alt, something like that, that takes an A, and if A fails, then it tries B on the same string. It doesn't remove it but if A does pass, if it succeeds, then it doesn't call the B.
In this way, you can compose up from these very primitive pieces an entire parse tree, a parser that will match certain strings and this is a context-free grammar that you can create.
Notice, it's got a lot of similarities with this video-editing algebra. The video-editing algebra had sequence. This one also has sequence and notice that there are monoids. We went over this in another episode. Monoids are binary operations that take two things of the same type and the thing they return is of that same type, too.
The three pieces, the two inputs and the one output, are all the same type. Take two parsers, you return a parser. You take two clips, you return a clip.
There is a monoid. The other thing about monoids is that the grouping doesn't matter so the order of operations doesn't matter. You should look that up in past episodes. I just did one on the difference between associative and commutative monoids or associative, and they have an identity. The empty clip would be an identity. It doesn't do anything when you sequence.
If you have a clip of zero length and you sequence it with some other clip, it's just going to be the same result as that other clip.
Same with a parser, if you sequence, you could consider a parser that always parses and does not consume input, that's an identity for that sequence operation. Notice when we talk about monoids, that something very useful when you're developing algebras is to find these places where a well-known, well-understood, usually named algebraic concept matches what you're doing.
Monoids are really easy and they're very powerful so I often look for monoids when I'm doing this but there's a lot of other algebraic properties as well. This is where a lot of people find value in category theory when they're programming because a lot of these algebraic properties were developed as part of category theory.
I want to go over one more. This is the third example. This one is not a monoid as far as I can see. It's more like a little state machine but it's modeling an asynchronous value. This is the value stored on a server. Let's say you're on the browser, you're programming the browser, the front end, and there's a value on the server.
You need to get that value, but there's time between when you initiate the request and when the response finally comes back. You want to model that somehow. You want to model that the request is in flight, maybe you need to show that loading spinner. Often, if you do it in a like a jQuery way, you don't really model it as much as rely on a sequence of actions to happen.
You make the request, and then that fires off and while it's still in flight, you modify a DOM element to show a spinner — you make the spinner visible — and then when the request comes back, you hide the spinner and set the value. You do everything in an imperative way.
This works out for a lot of cases where a lot of paths that could happen, but it doesn't completely capture it and it doesn't allow you to reason at a higher level about it. For instance, if you have multiple things you're waiting on now, it's like, "I don't know what to do anymore." What if we modeled this thing as a value? The whole state? You could draw out the timeline.
The value is like uninitialized. It's unrequested yet, it's undefined, unknown, however you want to call it. It's like this initial state where you don't have the value yet, but you haven't even started looking for it yet.
Then you realize you need that value so you're going to transition to a new state, which is like a loading state. Then that request comes back. Now, it either succeeds or it fails.
Let's say that those are the two options. You need to have a state that for each of those that it transitions to, if it succeeds, boom, you can store the value, and state, saying that you got it, you're ready to show the value, you don't need to show loading anymore. Maybe it failed, you might have some other thing we're going to do like, "We're going to retry three times."
You've got to keep track of that state but notice, this is also an algebra because you can look at every state that the value could be in. You could look at every possible event that could happen to it so we can make another request. What's interesting is, you could have these cases that show up in your Web app and you just haven't even thought of.
For instance, what if we make the request but we already have a value? What state do we transition to? Looking at it algebraically, and seeing that there's all these states and all these events, you have to multiply them out and see what to do in each possible case but when you do that, you create a kind of algebra that composes the value with the events that could happen to it.
The events that could happen to it are by initiator request. I get a positive successful response back or I get an error back. These are all possible transitions and you need to have a state that works for any possible transition. Yes, it's a combination of current state and results, it gives you a current state. It's not a monoid because the results or that event is not the same type as the current state.
You might be able to finagle it so that it is the same type and then you have a monoid, but I think it's best not to try that, just leave it as current state, an event gives you the next current state.
When I wrote these out and came up with the examples, it was very clear to me that there was some principles going on here so I want to share those. One principle is that the combining forms are simple. They are very easy to write. Often if you start by modeling, let's say in a classical OO way, you might model the video editor.
You have some operation that's like, add frames to a video...First of all, you are mutating stuff so that's already complex. You get into like weird, corner cases. Like adding to the video is going to change the length, then what if something comes after it? You just have this weird...all these things to keep track of.
Whereas in the algebra way, it's much clearer how things should combine. You create this unified abstraction of clip. It has a length, it's just a function of time to image, to frame. The combinations you can do because it's such a limited abstraction, there's not much you can do with it. That constraints you into coming up with these simple combining forms.
They have a minimum of concepts. Remember with this video editor, we just had clips and then we had a few operations like sequence overlay, left-trim, right-trim. We came up with MD clip. There's a minimum of concepts to put together.
The same with the parser combinators. You start with a basic one-character matcher. You also have a matcher that doesn't consume input but always succeeds. You probably have one that doesn't consume input but always fails. Then you have these operations to combine them.
The final thing is this idea of whole values. To me it's like this mysterious...not mysterious, ineffable principle. It's a very hard to describe what it means. I have a whole episode trying to explore this idea. This is one of John Hughes's ideas. He's a big functional programming...I don't know what you call them, researcher.
He has this notion of whole values which basically means you have everything you need in the value to make a decision, make a computation from it. Often when we're programming, we will separate the values we need into different variables. We'll have, "Oh, this piece here, this piece here, this piece here."
It works great for this one algorithm we've got, but if you put them all together into a single value, then that value could become useful in its own right. It's got all the context it needs around it to make useful decisions to do useful computations.
That's all I got so I'm going to recap. I went over the three levels of thinking. Again, this is just a scheme. I'm coming up with the organize my teachings and stuff. I don't know if it has any real...Only anecdotally do I have any sense that this is real.
That is really how people progress while they're learning functional programming. I got this question in the comment about examples. So the three examples I came up with are a video editing algebra, a parser combinator library and this asynchronous value that's like a state machine.
Then I went over three principles, simple combining form, so we talked about the different forms are very easy to write, a few lines each, a minimum of concepts.
We could have added other interesting concepts to the video algebra. Notice, instead of coming up with like a cut-clip operation, I decided to turn it into two operations. One is trimmed from the left and one is trimmed from the right, and with those two you've broken it down more. Each one is easier to write and so you have a more minimum notion of the pieces that you need.
Then the whole values which is that ineffable idea that when you put stuff that goes together, has a strong relationship that it really is very cohesive. It contains its own context and can be useful outside of that little piece of the algorithm that you're writing right now.
OK. If you like this episode, you can find this and all past and future episodes at lispcast.com/podcast. You'll also find links to subscribe there, every episode has video, audio and text, depending on how you prefer that time to consume and then also on the pages, you'll find links to my email and social media.
I love to get into discussions, I love reading the comments and the questions that I get on Twitter and even in the YouTube comments. Thank you very much and rock on.