How variants can reduce complexity

If we don't limit it, complexity will get out of hand. One way to limit complexity is by collapsing the number of possible states down to a few known states that we know how to handle.


Eric Normand: So if we've got these exploding, multiplicative cases, where each time we have a corner case, or even any kind of branch, we multiply it with the other branches, it seems like our software would explode with complexity and we'd just never get out of it. So how do we?

My name is Eric Normand, and these are my thoughts on functional programming. One way I want to talk about today that we can handle this complexity, or at least limit it somehow, is by using a limited number of cases. Say you have some function with a branch in it that has seven different cases, and who knows what happens in each of those branches.

Each branch could have three or four cases itself because it calls another function that has branches in it. How do we ever deal with the complexity at all, if that's what's happening? There's this giant tree with branches. One of those branches is going to execute and it's going to have our answer. Well, the answer is that we limit the number of cases that we will return.

What that means is if you have 10 branches, you might say, "Well, there's only going to be three possibilities coming out of here." Each of those 10 branches has to return one of those three cases.

In that way, you're taking this thing that could have, you know, if you multiplied it all out, it could have 20,000 branches, it could have 1,000 branches and you are limiting it to three cases, so you've reduced the complexity by bucketing.

You're bucketing all those different possibilities into a smaller number. This is the thing that a language like Haskell has an advantage for. They have what are known as discriminated unions, also known as variants, where you define a new type that has different constructors.

Each constructor is a different case that has the data required for that case. An example is different ways of communicating with someone. You might have communication method as a type.

The three cases are email with a string, phone number with a string, which would represent the phone number, and then maybe you have their address, which is some other, more complex type that has the street, the city, the state, the zip code, that kind of thing.

What you're doing is you're defining a limited set of ways to communicate with someone that your system can handle. What that does is it means that no matter what branch you're on, you're going to get back a valid way to communicate with someone that your system knows how to handle.
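As a rough sketch of that communication method type, here is one way it might look in Python rather than Haskell. The class names, field names, and the `describe` function are all assumptions made for illustration; the point is only that every path has to hand back one of three known cases.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Email:
    address: str


@dataclass(frozen=True)
class Phone:
    number: str


@dataclass(frozen=True)
class PostalAddress:
    street: str
    city: str
    state: str
    zip_code: str


# The closed set of cases: any branch that produces a way to reach
# someone has to return one of these three.
CommunicationMethod = Union[Email, Phone, PostalAddress]


def describe(method: CommunicationMethod) -> str:
    """Handle each of the three cases exactly once."""
    if isinstance(method, Email):
        return f"send an email to {method.address}"
    if isinstance(method, Phone):
        return f"call {method.number}"
    if isinstance(method, PostalAddress):
        return (
            f"mail a letter to {method.street}, "
            f"{method.city}, {method.state} {method.zip_code}"
        )
    # Anything else is outside the three cases the system knows about.
    raise TypeError(f"not a communication method: {method!r}")
```

However many branches exist upstream, code like `describe` only ever needs these three arms.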

It could be 20,000 branches, 20,000 different paths through your code to find an answer. It's going to be one of those three. That's a powerful thing. We should constantly be trying to rein in this complexity because it happens...the complexity multiplies. It multiplies too easily without any work on our part.

It's almost like entropy is on the side of complexity. We need to put energy into bucketing those things and making decisions about which one of these three are we going to handle.

When I see code, I often look at it in terms of how much complexity is this eliminating or creating. Very often it's just creating complexity.

For example, there's a pattern that I think of as generating complexity, which is when a function returns different types and each type is supposed to mean something different. It might return a string. Let's say in the case of...Well, I'll get to that later. Let me finish this example.

If we have different types we might say, "Well, a string is going to be the name of the person but if we don't know the name we'll return nil. Maybe they have multiple names. We'll return a list of names in that case."

Now, the problem is you've generated three cases and one of them is super complex. The case where you return a list of names. Now you're just opening up the door for complexity because a list has this other problem which is it can be empty. That's a case that you might not have thought about.

You also have the case where like, what do you do with all those extra names, right? You're just passing on the decision to some other piece of code. Then that code has to have branches. It has to have all three of these branches again, the string, the nil, and the list, and it has to figure out what to do with it.
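To make the problem concrete, here is a small Python sketch of that pattern, with `None` standing in for nil. The functions `lookup_name` and `greeting` are hypothetical names invented for this illustration.

```python
from typing import List, Union


def lookup_name(names: List[str]) -> Union[str, None, List[str]]:
    """Hypothetical lookup that returns a different type per case."""
    if len(names) == 0:
        return None        # name unknown
    if len(names) == 1:
        return names[0]    # exactly one name
    return names           # several names


def greeting(result: Union[str, None, List[str]]) -> str:
    """Every consumer has to repeat the same three-way branch."""
    if result is None:
        return "Hello, whoever you are"
    if isinstance(result, str):
        return f"Hello, {result}"
    if len(result) == 0:
        # lookup_name never returns an empty list, but the list type
        # allows it, so the consumer still has to decide what it means.
        return "Hello, whoever you are"
    # The decision about the extra names got pushed down to here.
    return f"Hello, {result[0]}"
```

The producer has three cases, and the consumer ends up with four, because the list brings its own empty case along.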

What you want to do is think about whether you can reduce this. Just go through each one and say, "Well, nil is obviously not good enough as an answer to cover all three cases." It's not. I mean, you can't represent anything with nil. A string might be.

You could say, "Well, if we don't know the name we'll return an empty string, or we'll return something like unknown name or anonymous," something like that. Whatever fits the software. If they have multiple names, well, we could just say, "We're just going to pick one now. We're going to pick the first one." Maybe a string will work. That might be a really good answer.

Then you could also say, "Well, the list of names can work for everything because we can have zero names if they don't have a name, so an empty list." We can have one name, which is the other string case. Then we can have multiple names, if that's what we want. The list also works as a single case to eliminate this complexity.

You still might have to deal with the zero names, the one name and the multiple names as separate cases. Those are kind of implied by the list already so you'll probably be aware of that when you're writing your code and so you won't get them mixed up.

Also, maybe what you want is to turn the lack, the singular or the plural, into a single thing, which is potentially plural but it's a list, because you want to iterate through it or you want to do something else that's list like. That really depends on your use case.
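Both ways of collapsing the three cases into one can be sketched in Python. The function names and the "Anonymous" placeholder are assumptions for illustration; which version fits depends on what the application does with the result.

```python
from typing import List


def display_name(names: List[str]) -> str:
    """Always a string: a placeholder for no name, the first of several."""
    return names[0] if names else "Anonymous"


def all_names(names: List[str]) -> List[str]:
    """Always a list: empty, one element, or many."""
    return list(names)


def greetings(names: List[str]) -> List[str]:
    # Zero, one, and many names all go through the same iteration,
    # so the consumer needs no type checks.
    return [f"Hello, {name}" for name in all_names(names)]
```

Either way the caller sees one type and no longer has to pick apart which branch produced it.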

You see what I've done is I've taken these three possible cases and I've turned them into one in two different ways. I eliminated nil. It's not going to work. If I just need a string to print to the user, maybe "no name" is the best string, or an empty string is what you should write, so why don't I just return that?

I mean, deciding whether you have a string or a list is actually up to your application and what you're going to do with that. That's the kind of analysis I do. When I see someone returning different things, I'm like, "Just pick one."

Make that one case. Eliminate some complexity there, because otherwise you're going to have to use that value somewhere else, have those same three cases to pick apart your value, and figure out what branch it was on over there. You've coupled the two implementations.

Another thing I see is something like the case that we had before, where you had a string for the email address and also a string for the phone number. I see people using the string and returning just the string. Let's say you had those two cases, an email and a phone number. They'll just return a string with the value in it.

Later in some other code they have to use a regex or something to figure out what type of thing it is. That sucks because the code where the regex lives probably shouldn't be responsible for knowing all the possible things that this function can return. What you want is some way to capture what the thing is outside of the string itself.

I mean, there's probably no overlap between emails and phone numbers, but it probably does get complicated. Let's say you have email and text message. That sounds like an email address and a phone number, but now you're going to support Apple Messages, where you can use an email address to address someone.

Now, you've lost it. You don't know what's what anymore just by using a regex. What you should have had is some way of tagging that value for what it is, so you don't have to do the regex anymore. You might still have to do the regex for some other reason like validating it, but you don't have to do it to determine what case you're in.

You can use the tag, which should be well known, and maybe even you have compiler help to make sure that you're getting all the tags like you would in Haskell. If there's a very limited amount of information, I would use a tuple or in Haskell I would use a dedicated type with different cases.

In Clojure, I would use a tuple. The first element of the tuple would be a tag as a keyword. I would put email and then the string. Then, if I was doing a phone number, it would be the keyword phone number and then the string with the phone number in there.
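The same tagged-tuple idea can be approximated in Python, using plain strings where Clojure would use keywords. The tag spellings and the helper names below are assumptions made for this sketch.

```python
from typing import Tuple

# A tagged value: the first element says which case it is, the second
# carries the data.
Tagged = Tuple[str, str]


def email(address: str) -> Tagged:
    return ("email", address)


def phone_number(number: str) -> Tagged:
    return ("phone-number", number)


def send(method: Tagged) -> str:
    """Dispatch on the tag instead of inspecting the string."""
    tag, value = method
    if tag == "email":
        return f"emailing {value}"
    if tag == "phone-number":
        return f"texting {value}"
    raise ValueError(f"unknown tag: {tag!r}")
```

Because the case is read from the tag, two values that look alike as strings can still be told apart, and no regex is needed to decide which arm to take.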

You still need to know somewhere else that there are two cases. It doesn't eliminate that problem. Well, I mean, you might not because you can wrap that up. You can wrap that up in an API, in a set of operations that are valid on communication methods. You can have a set of functions that have the cases, but that's all implementation that's hidden behind the interface.

That's another issue which I see a lot, which is people just putting stuff in a map. Maps are great. You should put stuff in a map, but you should also define an interface for it and wrap it up. You shouldn't just say, "Someone else has to know the structure of this."

Now you're coupling the thing that is consuming the map with the thing that produces the map. Some other piece of code shouldn't have to know the exact structure of the data that you're returning. It should be wrapped up in an interface. That's all about eliminating complexity.
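One possible shape for that kind of wrapping, sketched in Python with a dict playing the role of the map. The `contact` example, its keys, and the function names are all hypothetical; the idea is just that only these few functions know the structure.

```python
from typing import Dict


def make_contact(name: str, email: str) -> Dict[str, str]:
    """Constructor: one of the few places that knows the key names."""
    return {"name": name, "email": email}


def contact_name(contact: Dict[str, str]) -> str:
    return contact["name"]


def contact_email(contact: Dict[str, str]) -> str:
    return contact["email"]


def with_email(contact: Dict[str, str], email: str) -> Dict[str, str]:
    """Return an updated copy and leave the original map untouched."""
    return {**contact, "email": email}
```

Consumers call `contact_name` and `contact_email` instead of reaching into the map, so the keys can change without touching the code that uses them.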

I think I've teased that idea of wrapping stuff up in an interface. We'll talk more about that at a different time.

Awesome. Thank you so much. Please subscribe, like, star, plus one, all those things. They really help people find this. If they could be useful to people, I hope that people do find it. You're the ones who have already found it and maybe you think it is worth finding. Please share it with people.

Also, I've been getting lots of cool comments on Twitter, by email. My Twitter is @ericnormand. Please, whatever you think, if you agree, if you disagree, I would really love to have a discussion with you. That's what this is all about.

All right, great. See you later.