Why taming complex software?

This is an episode of Thoughts on Functional Programming, a podcast by Eric Normand.

Subscribe: RSS · Apple Podcasts · Google Play · Overcast

My book is called Taming Complex Software. What's that all about? In this episode, I go into why complexity is a major problem and how functional programming can help.

Transcript

Eric Normand: Why would we want to tame complex software? My name is Eric Normand, and I help people thrive with functional programming. I'm writing this book, as you may know.

The title was chosen for me by my publisher. They work with a lot of books, a lot of titles, so they know what does well. I didn't have a better idea than this.

They used a lot of elements from what I was thinking for a title. The title is "Taming Complex Software — A Friendly Guide to Functional Thinking." A couple episodes ago, I talked about what functional thinking meant, and why I like the term.

I didn't choose it, but I like it. It's grown on me since they chose it.

I want to talk about this idea of taming complex software and what that really means, what it has to do with functional programming. Why is that the title of the book? This was actually something I came up with very early in my discussions with the publisher.

They do this really cool exercise and they said, "Why don't you write down the super-secret subtitle of the book? It doesn't even have to make any sense. It doesn't have to be coherent. It doesn't have to be catchy, but just write down the subtitle of the book and the title."

The title and subtitle. I came up with "Taming Software Complexity" as one of the phrases.

They chewed on that for a while. They liked it, but they changed it to Taming Complex Software. I think that there was some issue with "complexity" being complexity theory and I was not going to go there in the book and so they turned it into Taming Complex Software. I still like Taming Complex Software.

It probably makes more sense, even though I think of myself as going directly at the complexity. I think that I am operating on the software, and the software is what's getting tamed, so it makes a lot of sense.

Why complexity at all? Why are we talking about this complex software? That's what I want to go into in this episode. Software complexity is the reason why software gets harder to write as it gets bigger. As you add lines of code, the complexity goes up much faster than linearly.

Every language, every paradigm, has this curve of scale that it can reach. This is also true in other systems. If you can only communicate with handwritten notes, because you don't have a printing press yet, the size of your kingdom can only get so big. You can only rule and organize, and work with a kingdom of a certain size.

A lot of energy is put into systems of communication, systems of record-keeping. All these things help manage the complexity of a bigger and bigger kingdom. As your kingdom gets bigger, it actually gets bigger...what do I mean? As the radius of your [inaudible 4:34] , let's imagine it's just an expanding circle.

As the radius increases, the amount of land you have grows quadratically. It's going up with the radius squared, so you do have issues where the distance over which you can communicate effectively is limited to a certain radius. You've got interesting issues with the communication and how things work as it grows.

Software has this similar problem, which is, as you write lines of code, the complexity grows more than quadratically. You need a lot of structure and a lot of help to get it to go bigger.

Where are those sources of complexity? I talk about this in much more depth in another episode. If you want to look at the sources of software complexity, search for that on my site, lispcast.com. I'll go over two of them right now, briefly and probably more directly.

Here's the thing. Every conditional at least doubles the number of code paths. Every time you add an if statement, there's at least two branches. There's the then and the else.

Even if you don't have the else — it's a do-nothing branch — it doubles the number of code paths. That means potentially something will be different in each of those code paths.

It's more to test. It's more stuff that can go wrong. It's more you have to think about and analyze to know that yes, that code path is possible, and it doesn't break anything. It does give the right answer. It doubles.
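To make that doubling concrete, here's a small sketch (my example, not from the episode) showing how independent conditionals multiply the number of code paths:

```python
# Each independent if statement multiplies the number of code paths by 2.
def process(a: bool, b: bool, c: bool) -> str:
    result = ""
    if a:  # 2 paths so far
        result += "a"
    if b:  # 4 paths
        result += "b"
    if c:  # 8 paths
        result += "c"
    return result

# 3 conditionals -> 2**3 = 8 distinct code paths, each needing a test:
paths = {process(a, b, c)
         for a in (False, True)
         for b in (False, True)
         for c in (False, True)}
print(len(paths))  # 8
```

Note that none of these ifs even have an else branch; the do-nothing case still counts as a path.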

If you look at a typical program, a typical piece of software, it must have millions of code paths in it. Millions, trillions. You don't even notice like, "Oh, if I just remove one if statement." Let's say you're refactoring and you remove an if statement. You've halved the complexity.

But it's so high already. You don't even feel it. It's already too complex to comprehend, even though you've cut it in half. You've made a significant change.

That's really interesting that we get to this point where it doesn't even matter if we add one or remove one. We don't feel the difference.

If you have $10 billion and you double your money, like, "Eh, I don't care anymore." [laughs] "I can't spend it all. I can't even comprehend all that money." Or if you lose half of it, you're like, "Ah, I'm still super rich," which is really interesting.

I think that we're at that point in our software that we can't intuit. We can't have this good idea of what our complexity actually is. Functional programming has some ways of managing this. The main way is that through data modeling, we can reduce the number of conditionals down to the essential.

You're going to need conditionals in your software, because you're modeling some complex system that operates with the world. It needs a certain amount of complexity, but often we add conditionals we don't need. We add problems, more complexity than necessary.

With good data modeling, in theory, you can reduce that number down to the essential bit. I have an episode on this if you want to go deeper into that. I go into the math of it and stuff.

The other source of complexity is something that we're dealing with more and more these days. Probably for the last 20 years, it's been pretty apparent that every action in sequence, when you're doing programming in parallel or in a distributed system, increases the complexity.

It increases it in a combinatorial way. Your operations, let's say they're running on different CPUs in your machine, different cores. These are operations on a different CPU. They can interleave in different ways every time they run. You need to make sure that every possible way that they can interleave is a valid way, because you can't control it.

You can't control the scheduler, how the CPUs are running, and what they're doing at what time unless you coordinate between them. In general, the baseline is you can't control how they interleave.

You need to make sure that every possible interleaving gives you the right answer. The number of possible interleavings has a factorial in it. It grows really fast.

The way I like to show how bad it is: if you have 12 operations in sequence on two different threads — 12 operations, two different threads — you're already at over a million interleavings. Do they all do the right thing? That's your job as the engineer: to make sure that they either do the right thing or they can't happen.
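Reading "12 operations on two different threads" as 12 operations per thread, the count works out like this (a quick sketch of the math, not code from the episode):

```python
from math import comb

def interleavings(n: int, m: int) -> int:
    """Ways two independent sequences of n and m operations can
    interleave, each keeping its own order: (n+m)! / (n! * m!)."""
    return comb(n + m, n)

print(interleavings(12, 12))  # 2,704,156 (over a million)
# Shortening the sequences shrinks the count combinatorially:
print(interleavings(6, 6))    # 924
```

This is the factorial growth mentioned above, and it's also why shortening your sequences of actions pays off so dramatically.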

Functional programming doesn't have all the answers to this, but it has some answers, and it's asking these questions. It's been coming up with ways of limiting the number of interleavings. It's been coming up with ways of making it so that the interleavings don't matter so much.

If you've got all immutable data, it doesn't matter. If all you're doing is reading immutable data, even if it's shared, it's OK. Functional programming has been working on stuff with distributed systems.

Even if you've got a single-threaded system like you do in JavaScript, once you make a call to the server, you're now a distributed system, and you've got the same problems. Stuff happening on the server is interleaved with stuff happening in the browser, and the messages going back and forth are being interleaved. You've got the same problem.

I've said this before. I'll say it again. I don't think that functional programming is gaining popularity because of the cores that we've got. We're trying to make all this parallel software.

It's not happening. Not as much as we thought it would. Of course, in some places, it is happening, but it doesn't seem to be like, "Oh, it's so necessary to make everything parallel all the time."

What is necessary is we're making Web apps. We're making distributed apps. We're making an app on your phone that's talking to a server. Now, you need to horizontally scale out your service but you need to have some kind of consistency between the different servers.

Now, it's all distributed systems. That's where functional programming is really showing its value. It's because we've been thinking about these problems for a long time, and we have some answers.

What do you do? Like I was saying, you can reduce the complexity that's inherent in this distributed system and this parallel system. You can reduce it by limiting the number of interleavings, by eliminating certain possible interleavings from possibility, by limiting the number of things in sequence.

The longer your sequence is, the more interleavings there are by a combinatorial factor. If you shorten your sequence, you actually have fewer interleavings, which is another way of doing it.

That means doing less stuff with actions, with side effects. Doing more stuff with calculations that don't lengthen your timeline. That's why I talk about complexity.

Functional programming, I guess, doesn't have all the answers. It lets you get to a bigger scale before you have problems, because you are still going to hit complexity limits. It's also been thinking about the problems and has solutions where other paradigms don't. The solutions are baked into the paradigm.

This first notion of "Is it an action, a calculation, or data?" already divides the problem up into the hard parts, the actions; the necessary but medium-difficulty stuff, the calculations; and then the data, which is the easiest part.

Once you have immutable data, it's pretty settled. It doesn't cause problems. Then the data modeling is actually not so easy. It requires a lot of experience and design skills to model a problem as data, but that actually helps reduce the complexity.

Just dividing things into those three things really helps clarify where is my complexity. What can I do to reduce it to a minimum? I believe that the paradigm itself has baked in a lot of the solutions to eliminate complexity.
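A minimal sketch of that three-way split (my example, with made-up names, not the book's):

```python
import time

# Data: inert, immutable facts. The easiest part to reason about.
order = {"id": 42, "total": 100.0}

# Calculation: same inputs, same outputs, no effects. It doesn't care
# when, where, or how many times it runs, so it adds no interleavings.
def with_tax(total: float, rate: float) -> float:
    return round(total * (1 + rate), 2)

# Action: depends on when and how often it runs (the clock, the console).
# This is the hard part, so keep it small and push work into calculations.
def log_receipt(order: dict, rate: float) -> None:
    print(time.time(), order["id"], with_tax(order["total"], rate))
```

The design move is that `log_receipt` stays a thin shell: the logic lives in `with_tax`, where it's trivially testable and can't contribute to the complexity of the actions.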

Awesome. I'll just recap real quick. Complexity is the reason software gets harder to write as it gets bigger. I don't know why, but you see every piece of software has this huge blossoming and blooming at first — all these new features, all this stuff. Then it slows down.

It's not that people run out of ideas for features to add. It's just that they can't add more features without breaking existing stuff. It's too hard. It just gets harder. Every conditional doubles the number of code paths, and every action in sequence, when you're talking about parallel or distributed systems, increases the number of interleavings.

What we're talking about is the same with code paths and with interleavings. What will happen next? What is the next thing that will happen? Is that going to give us the right answer?

That's always the hard part because we have to reason it out in our heads. We have to play it out. What's going to happen next? If there's 12 things in sequence, and there's a million possible things that could [laughs] happen next, it's no good. It's too much to keep in your head.

The same with the number of code paths. Got all these branches, I don't know what's happening next. I don't even know what just executed.

FP doesn't have all the answers. It can't reduce this to nothing, but it does have frameworks for thinking about it. It does have a nice set of concepts that map nicely to these sources of complexity. Pull out the things that are more complex, and deal with them specially.

You got calculations that are much easier to deal with because they don't add to the number of interleavings. They are going to have conditionals, but with good data modeling, you can limit it down to the bare minimum.

All right, thank you so much. You can find this and all the other ones — the past, the present, the future episodes from the beginning of time to the end of time on lispcast.com/podcast. I've got video, audio, and text transcripts of all the episodes depending on your mood and your predilection for different media.

If you like the text, you like reading, go for it. If you like to listen in the car, you can subscribe. There's links to subscribe.

There's also links to social media including email, Twitter, LinkedIn — however you want to get in touch with me. I am happy to answer questions. I love getting questions, and talking about them in the podcast.

Tell me that you appreciate me. Make me feel good. All right people, thank you so much, and rock on.