A Theory of Functional Programming 0002
This is an episode of Thoughts on Functional Programming, a podcast by Eric Normand.
...there. How are you doing?
All right. I am talking about, "A Theory of Functional Programming," which is a book that I'm working on, and this is actually me working on it right now. I am going to talk about the topic, and hopefully, the transcript will turn into my book.
I talked about a lot of things last time. That was last Friday, and I know that the last thing I talked about was composition, and I don't think I got very far into that, so I'll talk about it.
We talked about the three domains, and how you can be in each domain, and stay in the domain. I think that's important to prove that they do have an integrity to them, that there's something about them. Once they're in that domain, you can just travel around in it without changing domains.
Now one question is, "What happens when you compose across domains?" That's interesting. If you have two pieces of data and you put them together, what do you get? You get a new piece of data. It's still data.
If you take two calculations and you put them together, you get a new calculation. Finally, you take two actions. You put them together. You do one after the other, or you can do them both in parallel, then that is a new action. You've stayed in the domain.
What if you take a piece of data and you compose it with a calculation? You don't get data, you get a calculation. The calculation will infect the data, and now you have calculation.
Similarly, if you take an action and a calculation and you put them together, you get a new action. Say you have an action, which is to read a file into a string. Then you have a calculation, which is to parse the string into a data structure. The thing that you get when you put them together, where you read the string in from the file and then parse it, that whole thing is an action.
Similarly, if you needed to take a data structure, serialize it to a string, and then write that to disk, and you took that thing as a whole, as a unit, that's actually an action. We see that things infect downwards. I'm saying that actions are contagious downwards. Picture actions as the big whole, and then inside of that you have calculations, and inside of that you have data.
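Here is a minimal sketch of that read-then-parse and serialize-then-write composition in Python. The function names are made up for illustration, and JSON is just an assumed format; the point is only that wrapping a calculation together with an action gives you an action.

```python
import json
import os
import tempfile


def parse_config(text: str) -> dict:
    """Calculation: the same input string always gives the same result."""
    return json.loads(text)


def read_file(path: str) -> str:
    """Action: the result depends on when it runs (the file may change)."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def load_config(path: str) -> dict:
    """Action composed with a calculation: the whole thing is an action."""
    return parse_config(read_file(path))


def save_config(path: str, config: dict) -> None:
    """Calculation (serialize) composed with an action (write): an action."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps(config))


# Usage: round-trip a piece of data through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    save_config(path, {"retries": 3})
    loaded = load_config(path)
```

Only parse_config can be tested without touching the disk. That is the practical payoff of keeping it separate.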
We talked about this hierarchy where we know that actions are universal. We know that there are some things within that sphere of action which we can consider to be timeless.
Yes, the plus operation does change the memory of our computer. Every operation in there, including addition, modifies a register. That's where you store the answer. It reads from memory, and that memory could change at any point.
In a general sense, every operation depends on when it is run and how many times it is run. Using our language or some other discipline, we can say, "Well, that memory or the register, it's kind of special. We're not going to store anything important in there so that at any point we can overwrite it."
You make all these rules, which we've developed. As programmers, these are good practices that we've developed.
If you have enough rules, you can say, "I've made a function that does plus, that uses the plus operator or the plus instruction that my machine understands, and it uses a stack discipline, and it does all this stuff, so you never are overwriting any memory and it's using the arguments that are on the stack and nothing else is writing them."
You have this illusion that you're setting up in functional programming, that it is safe, that these things are timeless: no matter when I run this function, I'm going to get the same answer.
It depends on this entire stack of disciplines that our compilers enforce, that we enforce as programmers, or that our data structures enforce. It's a stack of millions of little decisions that we've put in there in order to enforce this illusion.
The idea is that calculations are a subset of actions, the ones that are timeless. Then, within that, we know that calculations, because they're basically mathematical functions like in the lambda calculus, can represent any data.
They can represent numbers using the Church encoding, and you can do operations on these numbers. You can represent everything using functions, also known as lambdas, if you're into that.
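As a small sketch of what representing numbers with functions can look like, here are Church numerals written with Python lambdas. The names zero, succ, add and to_int are chosen for this example; to_int is only a helper for looking at the result.

```python
# Church numerals: the number n is the function that applies f n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))


def to_int(n) -> int:
    """Interpret a Church numeral by counting how many times it applies f."""
    return n(lambda k: k + 1)(0)


one = succ(zero)
two = succ(one)
three = add(one)(two)
```

Nothing here is a number in the usual sense until to_int interprets it; the "data" is made entirely of calculations.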
That means that data is a subset of calculations. The reason things infect down, the reason actions infect calculations, is that by composing these two things you're pulling the result up to the higher level.
Just naturally through entropy, just programming away and changing stuff without real effort, if you don't put effort toward this, through entropy things will just naturally move up.
Your data will become calculations, which will become actions, and so then your whole application is written out of actions. Through discipline and effort, through energy expenditure, we can push stuff back down, down, down, down.
This is one of those imperatives of functional programming: push stuff down the stack as far as you can, because it's going to come back up naturally. You have to put in the effort to make it go down.
Here's another thing, when you have these compositions like an action plus a calculation equals an action, you have the reverse operation too, which is the refactoring. Instead of composing these two things together, as you would when you're writing a function out, do this, then do that, then do that, you can pull each of those steps out as a separate component. You can actually separate out the calculation from the action.
Here's the nice thing, you can separate it. If you have an action, you might be able to separate out a calculation from that action, and so now more of your code is in calculation as opposed to action. Similarly, you can separate out data from an action, and more of your code is in data, and less in action, than before. Then you can separate out data from a calculation.
Now more of your stuff is in data than in calculations, so you're making progress and you can also, obviously, take two data and separate them. That's fine, too. You're not going to gain very much in that.
But notice something very important, which is that, if you have data, there's no way to pull an action out, there's no way to pull a calculation out. That data is at the bottom of this three-tiered stack. There's no way to get that stuff out.
That's an important thing. If you're looking at a calculation, you're not going to be able to pull out an action. That's good to know. It's good to keep that in mind, that it's not possible to get that.
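To make that refactoring concrete, here is a hedged sketch in Python. The greeting example and all of its names are invented for illustration. The "before" function is one action; the "after" version pulls out a piece of data and a calculation, and leaves only a thin action around them.

```python
import datetime


# Before: one action that mixes a clock read (action), a rule (calculation),
# and a hard-coded threshold (data).
def greeting_now() -> str:
    hour = datetime.datetime.now().hour
    return "Good morning" if hour < 12 else "Good afternoon"


# After: the data and the calculation are pulled out; only the clock read
# remains an action.
NOON = 12  # data


def greeting_for(hour: int, noon: int = NOON) -> str:
    """Calculation: depends only on its arguments."""
    return "Good morning" if hour < noon else "Good afternoon"


def greeting_now_refactored() -> str:
    """Action: a thin shell around the calculation."""
    return greeting_for(datetime.datetime.now().hour)
```

Going the other way is not possible: there is no action hiding inside greeting_for or NOON to pull out.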
I want to talk again about these three domains. I'm going to go into each of them and just talk about them. First, let's talk about data. Data has several properties. The most important one is that it's inert. This means that it does nothing.
It is also self-identical which is another way of saying it is what it is. It just is. We already defined it. It's facts about an event that you've recorded. These are records. These are things you will want to keep. We spend so much energy in the real world making sure that our records last a long time.
If you go into a doctor's office, they will have a huge filing cabinet full of old records that is actually required by law that they keep them. Maybe now they can be all electronic. I'm not up-to-date on the laws. But the point is, they're spending a lot of time keeping those records around.
It's a shame that when we moved to computers, we defaulted to stuff that can change. Because we're basically writing all over those old records, throwing them out. Every time we change them, it's like, "We don't need that anymore." It would be like if you went to the doctor. You got sick. You went to the doctor.
They said, "You have the flu. Here's some medicine. Come see me in a week." Come back in a week, and you say, "Doctor, I'm feeling better." They say, "Great!" And they rip up that paper like you never got sick. Just throw it away. That's not what they do. They put a new paper in that says, "New visit. Yes, now they're better."
They have the one that said they're sick. They say, "You take the medicine. You're better." Put that in there too. Put it in the folder. File it away. We'll see you later. Next time you come, we'll have that. What we want to do in functional programming is this: once we make some data, we don't change it. We can always make new data.
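The doctor's folder can be sketched in Python like this. The Visit class, the add_visit function, and the dates are all made up for the example; it assumes a frozen dataclass and a tuple as one reasonable way to get records that cannot be changed in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """One record of one visit; frozen, so it cannot be changed afterwards."""
    date: str
    note: str


def add_visit(chart: tuple, visit: Visit) -> tuple:
    """Return a new chart with the visit appended; the old chart is untouched."""
    return chart + (visit,)


chart_v1 = (Visit("2024-01-08", "flu, prescribed medicine"),)
chart_v2 = add_visit(chart_v1, Visit("2024-01-15", "feeling better"))
```

After the second visit, chart_v1 still says what it said before. The new paper went into a new chart, and the old record was never ripped up.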
We have a lot of RAM now. I don't want to name a date when this started happening. But we have a lot of RAM now and a lot of hard drive space and a lot of cloud storage space. We can store way more than we actually can even conceive of. For a business, you're writing business software. It's probably just a very negligible cost per customer.
Back in the day, what day? I don't know. We had less. It's getting cheaper and cheaper. There must have been a time when we had less. It was important to be able to use that space. But nowadays it doesn't make sense. It just doesn't make sense to throw it away for no good reason.
You might have a reason, and you might need to throw it away just for the cost reason. Otherwise, it doesn't make sense. Another thing about data: data is serializable. Because it's inert, you can represent it in bytes on a disk and read it in later. You can send it over the wire. This is a very important property of it.
Machine code, like a function, is often not something you can send over. It's very hard to send over the wire. It's hard to store it and read it back in and use it later. It's just not something that we as an industry have focused on being able to do. I think it is just a generally hard problem, because that function probably calls other functions.
You need those functions too. It's a problem that we have not solved definitively. But for data, we have definitely solved it, many times over: we've come up with serializable data formats. It's just something we can do very easily. Something that's cross-platform, that's understood across.
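A quick sketch of that contrast, using Python's standard json module as one such data format. The record contents and the double function are invented for the example.

```python
import json

record = {"patient": "A-17", "visits": ["flu", "recovered"]}

# Data survives a round trip through bytes unchanged.
wire_bytes = json.dumps(record).encode("utf-8")
restored = json.loads(wire_bytes.decode("utf-8"))


def double(x):
    return 2 * x


# A function is not inert data; the json module refuses to encode it.
try:
    json.dumps(double)
    function_serialized = True
except TypeError:
    function_serialized = False
```

The bytes in wire_bytes could be written to disk or sent to a program in another language. The function could not, at least not with this format.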
It's not a problem right now. We've solved it. You can use JSON, you can use XML, you can use flat text files. Whatever. It's data. It's serializable. The next thing is that it requires interpretation. What do I mean by this? One of its limitations is this: if you have a function, you can do science on it and figure out what the function does.
You can pass some arguments. You might be able to figure it out. But data, like the number five, what does it mean? You need to know the context. You need to know what it's all about. It's something that requires meaning to be imposed on it. The cool thing about that is that it's almost neutral to meaning.
That means you can interpret it in two different ways. What's a good example of this? We have records from, let's say, ancient Sumer, where they recorded the accounting of the trade of cattle. Someone paid taxes in head of cattle, they counted them, they made a record, and we still have that record. We found it. We dug it up.
When it was originally recorded, it was probably done for accounting, so they could keep track over the years. Are we getting more taxes every year? What parts of the country are paying more taxes? These kinds of questions are easily answered by having all these records. But we're not using it for that, are we? We're asking what was their economy like?
As anthropologists, we're asking what the economy was like. How did they collect taxes? We're interpreting the same data in an entirely new way. Another example, a more modern one: say you store logs of web requests. You just log every web request to a file. When that web request came in, your server interpreted it in one way.
Like, "It's a request for this file, let me serve that file." "It's a request for this, let me serve that." Later, you're doing analysis, you're doing some analytics. You want to know how many visitors we had on each page. That's a different analysis of the same data. You're imposing a different meaning on it. This is an important aspect of data.
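The web log example might be sketched like this. The log format ("METHOD PATH") and the sample lines are assumptions made up for the illustration, and the second function counts requests per page, which is only a stand-in for a real visitor count.

```python
from collections import Counter

# Hypothetical log lines in a made-up "METHOD PATH" format.
REQUEST_LOG = [
    "GET /index.html",
    "GET /about.html",
    "GET /index.html",
]


def path_to_serve(line: str) -> str:
    """Interpretation 1 (the server): which file does this request want?"""
    _method, path = line.split(" ", 1)
    return path


def hits_per_page(lines: list) -> Counter:
    """Interpretation 2 (analytics): how many requests did each page get?"""
    return Counter(path_to_serve(line) for line in lines)


hits = hits_per_page(REQUEST_LOG)
```

The list of strings never changes. Only the meaning imposed on it does.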
Eric: I mentioned Sumer, and I think this is an important point that I didn't get into enough. This idea of data as inert, as serializable (meaning that you can make a record of it, like you can store it on a disk), and as requiring interpretation, this idea goes back to the beginning of history.
People have been writing down records for a long time, and they have developed techniques for how they should be stored and what kind of stuff needs to be recorded. Someone might have written down the number five, the number six, the number seven, and someone's like, 100 years later, "What does this mean? Was this how many pieces of corn they ate?"
I don't know what this... They've learned, "Let's write it out a little more completely so that someone can understand it later."
We've learned over time to make our data more self-describing, to maybe make it more redundant by having two copies of everything. All of those are techniques that have been passed down since those earliest times.
I think that in computerization, and in certain practices that we do now in software, we've neglected that history. One of the things functional programming is doing is resurrecting that, at least in industry.
It is resurrecting all these practices, that record keeping is about keeping a memory of an event. A fact about an event for as long as possible, as long as you can foresee it being needed.
Eric: Another thing that we've done in computer science in general is develop data structures that can store data with certain operations, and those operations have well understood complexities, meaning their access patterns have certain speeds associated with them.
Not absolute speeds but relative speeds. It's relative to the size of the container. One example is that a balanced tree data structure has logarithmic access to everything inside. The linked list has linear access. We've built these data structures that have useful and coherent access patterns to them.
A HashMap has constant-time lookup, on average, given the key. That's something that we can rely on. A linked list has linear-time lookup given the index, the integer position that you jump to. We have known complexities for all of these things, and this really helps us deal with data.
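To illustrate the difference in access patterns, here is a rough sketch. The linked list is hand-built from nested pairs, and from_iterable and nth are names invented for the example; the hash map is Python's built-in dict, which is a hash table.

```python
def from_iterable(items):
    """Build a singly linked list as nested pairs: (value, rest), ending in None."""
    cell = None
    for item in reversed(list(items)):
        cell = (item, cell)
    return cell


def nth(cell, index: int):
    """Walk `index` links from the head; returns (value, links_followed)."""
    steps = 0
    while steps < index:
        cell = cell[1]
        steps += 1
    return cell[0], steps


numbers = from_iterable(range(1000))
by_key = {n: n * n for n in range(1000)}  # hash map: lookup by key

value, steps = nth(numbers, 750)  # follows 750 links to get there
square = by_key[750]              # one hashed lookup, no walking
```

The number of links followed grows with the index, which is the linear access being described. The dict lookup does not depend on where the key sits.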
This helps us make better use of data, store it better, access it better. This, I believe, is talked about more in the functional programming community, that data and its patterns of access and storage is an important part of the story of data.
Here is one more cool thing about data, data is also universal, though it requires interpretation, like I said before. You can take a string of bytes, or you can store a computer program as a string of bytes.
You can store a computer program as a file or buffer of characters that gets sent to a compiler. All of this is data. Everything is represented as data in our computers. You can represent calculations as data, and you can represent actions as data.
Definitely, you can represent data as data, but you can also represent a program as data, and that is a pattern. I'll stay on it before I go on, because these two are related. Representing a program as data is a thing that we do very often in functional programming. Very often.
It's very common to see an interpreter for some data written right in the language. Typically, when we think of an interpreter, you have a string of text that's going to get parsed.
Very often in a functional language, you will have a representation of a calculation or an action written right out in the literal data structures that the language gives you, which is really cool. We're writing programs that contain programs, that run programs that have an effect that... This is where it starts to get very circular.
We're on a universal Turing machine. We can write a machine that runs another program that is also Turing complete, which runs another machine that's Turing complete, etc. In functional programming, you see this a lot.
It's possible in other paradigms, but this is something that's very commonly done in functional programming.
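A tiny sketch of this interpreter pattern in Python follows. The tuple shape (operator, left, right) and the operator names are assumptions made for the example, not a standard format.

```python
# A calculation written as plain data: nested tuples of (operator, left, right).
program = ("add", 1, ("mul", 2, 3))

OPERATIONS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def evaluate(expr):
    """Interpret the data: numbers are themselves, tuples are operations."""
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    return OPERATIONS[op](evaluate(left), evaluate(right))


def count_operations(expr) -> int:
    """A second interpretation of the same data: how big is the program?"""
    if isinstance(expr, (int, float)):
        return 0
    _op, left, right = expr
    return 1 + count_operations(left) + count_operations(right)
```

Because the program is just nested tuples, it can be stored, sent, inspected, or run, and evaluate is only one of the interpretations you can impose on it.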
Two other patterns, besides this interpreter pattern, have to do with data. The first is configuration as data. We see this very often.
We'll have a data file, a file on the disk, maybe it's in XML, maybe it's in JSON. Maybe it's key value, like an INI file, and it configures your software.
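As a minimal sketch of configuration as data: the configuration text below stands in for a file on disk, and the keys, the defaults, and the load_settings name are all made up for the example.

```python
import json

# Hypothetical configuration text, as it might sit in a file on disk.
CONFIG_TEXT = '{"port": 8080, "greeting": "hello"}'

DEFAULTS = {"port": 80, "greeting": "hi", "debug": False}


def load_settings(text: str, defaults: dict) -> dict:
    """Calculation: merge parsed configuration over defaults."""
    return {**defaults, **json.loads(text)}


settings = load_settings(CONFIG_TEXT, DEFAULTS)
```

The file holds only data. The decision about what a port or a greeting means stays in the program that reads it.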
A lot of the pieces of data that you normally hard code in there are now in this file. It's a very common pattern. You keep all the calculations, all the actions out of there, you just put the data. Then databases. The idea of a database is a place to store your data safely and securely, and forever, basically.
What you see in functional programming is more and more the emergence of either append-only databases, where you never delete old records and you only add new records, like in a doctor's office, or databases that are logging.
They're essentially the same pattern, except you're writing the actions to take. You write actions to a log and you can always replay the log and figure out the current state.
The reason I differentiate them conceptually is because the logging ones tend to only give you the most current state. You can run it up to a certain time, but they only keep the current ones in memory. That is the most common access, but you could always run it to a certain point of time.
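The logging flavor can be sketched as a fold over a list of events. The event shape (time, account, amount) and the replay function are assumptions for this example, and the sketch assumes the log is already ordered by time.

```python
# Each event is (time, account, amount), ordered by time.
EVENT_LOG = (
    (1, "alice", 100),
    (2, "bob", 50),
    (3, "alice", -30),
)


def replay(log, up_to=None) -> dict:
    """Fold the log into balances, optionally stopping at time `up_to`."""
    balances = {}
    for time, account, amount in log:
        if up_to is not None and time > up_to:
            break  # relies on the log being in time order
        balances[account] = balances.get(account, 0) + amount
    return balances


current = replay(EVENT_LOG)
at_time_2 = replay(EVENT_LOG, up_to=2)
```

Usually you only keep the result of replay(EVENT_LOG) around, the current state, but the same log can always be run up to an earlier point.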
Anyway, it's a very minor distinction. I'm running up on 30 minutes, which was my goal for this. I'm just going to keep talking about data instead of moving on. Another thing about data is it gives you an audit trail.
If you have this database that is append only, often in an audit — a financial audit, or a security audit, or even a bug audit — you want to know, what was the state of this system on this day, or at this time when this bug happened?
What financial transactions did this system know about at this point in time? A traditional database, meaning one that updates values in place, if it's not designed specifically to do this, it won't work. You won't be able to ask, what was this system like?
I've written so many databases where if the user changes their password, or if the user updates some information about themselves, I just change what they had before, right in that column in their row, and we've forgotten what they had before.
That's no good. Now we can't ask this question anymore. The reason this is important is that these kinds of audit trails are often overlooked in requirements gathering.
It's an implicit assumption that the requirements gatherers don't write down: "Hey, at any point I'm going to come and ask you what the system knew yesterday, as opposed to what it knows today." This is something that functional programmers are thinking about and working on.
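Here is one hedged sketch of how an append-only store can answer "what did the system know at that time?" The fact layout (time, user, field, value), the function names, and the sample values are invented for illustration. A real database would do this very differently, but the idea is the same.

```python
def record(facts: tuple, time: int, user: str, field: str, value: str) -> tuple:
    """Return a new tuple of facts with one more appended; nothing is overwritten."""
    return facts + ((time, user, field, value),)


def as_of(facts: tuple, time: int, user: str, field: str):
    """The latest value known at `time` for this user and field, or None."""
    known = [f for f in facts if f[1] == user and f[2] == field and f[0] <= time]
    return max(known, key=lambda f: f[0])[3] if known else None


facts = ()
facts = record(facts, 1, "ann", "email", "ann@old.example")
facts = record(facts, 5, "ann", "email", "ann@new.example")

then = as_of(facts, 3, "ann", "email")  # what the system knew at time 3
now = as_of(facts, 5, "ann", "email")   # what it knows after the change
```

With update-in-place, the old email would be gone after the second write, and the question about time 3 could not be answered.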
Eric: I talked about how data is transferable, meaning you could send it over the wire to another computer, it could read it in. That's really just part of it being inert, and part of us having standard data formats that things can be read in regardless of the programming language. Storable forever, I talked about that. Multiple interpretations, I talked about that.
Here, I don't want to talk too much about the problems with other paradigms, but one of the problems that functional programming addresses is this one of data being inert but requiring interpretation. Object-oriented programming is trying to address this by attaching the runnable code to the object.
It knows how to interpret itself. It's the object, meaning the data and the code together. You can send it a message, and it knows how to go into the data and answer the question that you're asking it. It's very valuable to be able to do that, it's a great goal.
I personally believe that it is something of a pipe dream. Let me put it a better way. With the technology that we have available and the history of data gathering that we've got as a civilization, storing inert data as is, without trying to attach code that we have to then run, is our best bet right now.
It is a fine research topic to figure out a way of attaching code that will be able to interpret the data later, but we are not there yet. As an industrial practice, I cannot recommend it. As research, I actually think it's a pretty cool topic.
You could still do better than data, because you could have that data be self-interpreting. The practical problems that we've run up against are that if you wrap the data in methods (in object-oriented speak, the method), you then lose all sorts of cool properties, like being able to join it.
You have a collection of them, of the pieces of data, and each one is an object. How do you join them together, using relational calculus or something similar? It's a very useful operation, and I don't think that there's a good answer.
A lot of the patterns in object-oriented programming are trying to work around this kind of problem. In functional programming, what we do is we say, we know the properties of our data structures. Just store it as data, it's just data, put it into a data structure that allows you to join, and you're done.
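As a sketch of what "just store it as data and join it" can look like, here is a small inner join over lists of dictionaries in Python. The field names, the sample rows, and the join function itself are made up for the example.

```python
# Two collections of plain data.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
orders = [
    {"customer_id": 1, "item": "book"},
    {"customer_id": 1, "item": "pen"},
    {"customer_id": 2, "item": "lamp"},
]


def join(left, right, left_key, right_key):
    """Inner join: merge each pair of rows whose key values match."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [
        {**l, **r}
        for l in left
        for r in index.get(l[left_key], [])
    ]


joined = join(customers, orders, "id", "customer_id")
```

The join function knows nothing about customers or orders. It works because the rows are open, inert data whose structure it can see.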
We don't need to have some kind of protocol of methods, and then some algorithm for running those methods. It's an extra problem that we don't need. Like I said, it's a wonderful research problem. It's the kind of thing where you're like, "Look out, OK, in 100 years, maybe we'll have solved it."
How do we get there? You step back, then you say, OK, this is the next thing which I need to do. That's cool. That's awesome. As practical industry programmers, writing commercial software, trying to make reliable systems, we're just not there yet.
It's my opinion. OK. I'm done. My name is Eric Normand. I'm writing a book called A Theory of Functional Programming. You have heard me talking mostly about data. I talked about composition, I talked about data, and this, hopefully, will make it into the book.
Thank you for joining me. Please subscribe, like it, do whatever, mash all the buttons. I want people to discover this far and wide. We need a definition of functional programming. Thank you very much. See you later.