How do you create a semantic base layer?
This is an episode of Thoughts on Functional Programming, a podcast by Eric Normand.
In stratified design, we are looking for layers of meaning, each one implemented on top of the last. But how do you go about building those in an existing codebase? While it remains more of an exploration than a step-by-step method, we can still describe some techniques that help find them. In this episode, I talk about four of them.
Eric Normand: How do you begin to turn an existing code base into a stratified design?
Hi, my name is Eric Normand, and these are my thoughts on functional programming. In a previous episode, I talked about stratified design and how it's a design characterized by different layers — meaning semantic layers — one built on top of the other. This was a good way to design and structure your application.
It's a way that suggests a good structure to your code, like what code goes in what module, things like that. It's also good architecturally. It puts things that change frequently together and things that change seldom together, which is another good thing architecturally. I had a question. A nice listener posed a very good question, which was the following.
"How do I begin to come up with these base layers?" The base layer is the bottom layer if you think of it like a pyramid. Start at the bottom. The ground is the programming language layer. That's all the stuff the programming language gives you: functions, objects, data types, the mathematical operations, all that stuff that you get in the language layer.
Then the base layer. What he's referring to is what's defined directly on top of that, which is a thin layer of semantic meaning on top of that language layer. It's thin, meaning it doesn't do much. It doesn't provide much functionality. Because it doesn't provide much functionality, it can be really solid.
Then you build another layer on top, another layer on top, and another layer on top until you're at the layer that changes the most. It's up at the top. It's short because it's built out of more powerful pieces underneath. You don't have a lot of code that's changing. What about the bottom? It changes very, very seldom. Maybe you'll add to it but you shouldn't have to change the things.
Now, just as a really simple example, in your software, you might have to deal with email addresses. You might have these operations on email addresses that are universal, like they are timeless. Email addresses don't change. It's a standard. It's going to be stuff like pull out the domain part.
You know how you can add plusses to your email address? Well, you might want to have a thing that can remove that to canonicalize it. Maybe there's a lowercase operation you do. These are very standard email address-specific operations.
Your email address is going to be a string. That's the language layer. On top of that, you're going to build these email address-specific operations out of string operations. It might use regex or toLowerCase or whatever functions you have in your language.
Then on top of this email address layer, you're going to add the stuff that you need to do with the email addresses as your software's domain. What are you doing with them? That's their user ID for logging in and whatever you're doing with that.
That's going to change a lot more frequently than these email address operations. Those email address operations could probably be useful as a library that would be shared across multiple applications in multiple domains, because email address is such a common data type in software in general.
That's what you're looking for. Something that is so solid and universal that you can build the whole application on top of it and never touch it. Like I said, you might add to it because you might realize we're missing something, but you're not going to change the thing that finds the domain. Once you get that right, it's done.
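Here is a minimal sketch of that email address layer in Python. All the function names are hypothetical, and it assumes the application treats addresses as case-insensitive and treats "+tag" suffixes as ignorable, which is a common convention rather than something the standard guarantees.

```python
import re

# Base layer: email-address operations built directly on string operations.
# Every name here is hypothetical, chosen only for illustration.

def lowercase(email: str) -> str:
    """Lowercase the whole address."""
    return email.lower()

def domain(email: str) -> str:
    """Return the part after the last '@'."""
    return email.rsplit("@", 1)[1]

def remove_plus_tag(email: str) -> str:
    """Drop a '+tag' suffix from the local part: a+b@x.com -> a@x.com."""
    local, dom = email.rsplit("@", 1)
    return re.sub(r"\+.*$", "", local) + "@" + dom

def canonicalize(email: str) -> str:
    """Combine the operations above into one canonical form."""
    return remove_plus_tag(lowercase(email))

# Domain layer: what this particular application does with email addresses.
# This is the part expected to change more often.

def same_login(email_a: str, email_b: str) -> bool:
    """Treat two addresses as the same user ID if they canonicalize equally."""
    return canonicalize(email_a) == canonicalize(email_b)
```

The base-layer functions only know about strings. The login function only knows about email addresses, so it could change without touching anything below it.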
How do you go about finding those things in your code? Email addresses are easy because there is a standard already. You're looking for it in something that maybe hasn't been done before. You're looking for this semantic layer in the domain-specific concepts.
Without knowing the specifics of the code, it might be kind of hard to give general advice, but I'm going to try. I'm going to give the advice that I would give knowing [laughs] almost nothing about this software.
I'm also assuming that this is existing working software that has been made by a typical process where you weren't thinking about layers while you were writing it.
My first go-to refactoring, for when I don't know what other refactoring to do, when it's not clear and there isn't enough semantic information to figure out what to do next, is to shorten functions.
If I have a function that has 10 lines in it, I try to extract out smaller functions from within it. I try to come up with good names for them and make that original function shorter. I'll give an example. You might have a function that's a reduce, and it has an anonymous function in it, and then it's got the data that it's reducing over.
I would take that anonymous function and I would pull it out and name it, like at the top level. For that name, I'm trying to come up with, like, "What is the meaningful operation here? What level of semantics is it at?"
Say I have a list of employees, and so now this reduce operation is taking the employees and doing something to them. Is it treating them like employees? Or is it treating them like data? What's going on?
I'm trying to name it and I'm just coming up with random examples, but this is summing up all the salaries of all the employees so I know how much I need to pay them this month.
I do a reduce and then I'm like, "Oh, wait, inside the reduce I'm pulling out the salary of each employee." That reduce function, the function I just pulled out, it might have 15 lines by itself. I go in and I say, "What are the things, what's going on? Can I pull things out and name them?"
This is my go-to refactoring in general when things need to be cleaned up and I don't know specifically what to do. It's usually a good thing because there's a lot of mess hidden inside big functions.
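As a rough illustration of that refactoring, here is a sketch in Python. The employee records and the function names are made up for the example. It shows the same reduce before and after the anonymous function is pulled out and named.

```python
from functools import reduce

# Hypothetical employee records, represented as plain dicts.
employees = [
    {"name": "Ana", "salary": 5000},
    {"name": "Ben", "salary": 4200},
]

# Before: an anonymous function buried inside the reduce.
total_before = reduce(lambda acc, e: acc + e["salary"], employees, 0)

# After: the anonymous function is pulled out and named at the top level.
def add_salary(total, employee):
    """Add one employee's monthly salary to a running total."""
    return total + employee["salary"]

def monthly_payroll(employees):
    """How much needs to be paid to all employees this month."""
    return reduce(add_salary, employees, 0)
```

Both versions compute the same number. The difference is that the second one gives you two names to argue about, and each name asks which level of meaning the operation lives at.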
What I'm trying to do is find all the different layers. I'm trying to find the different operations that happen at different levels of meaning by pulling them out. If I've got a one or two line function, I know that's pretty succinct. It's probably just one layer more than what it's built on.
If I have 5 to 10 lines, it's probably skipping layers and there's a layer in between that I could be building on. This is what to do with existing code where you don't know where to begin. I think it's a good place to begin. It'll help clarify your code in general, even if you never arrive at some kind of solid base layer that is unchanging forever, like a universal base layer.
It's good in general. It might lead you to some understanding. The next thing is, I would be trying to find monoids. That's just a thing I have. I try to find monoids. Why monoids? There's a thing about monoids, which is that they're binary operations. A monoid operation takes two entities of the same type, two values of the same type, and it returns a value of the same type.
What that means in this discussion is that you're staying at the same semantic level. If I'm renting something from a car dealership, I take...That's a bad example. Let's say I want to combine two discounts. I'm going to have a sale and I need to represent the discount that you get if you're in the sale.
I combine two discounts. How do they combine? There's 10 percent off and another 10 percent off. Do I add them? I don't want to treat it just like a number, because what if you multiply them instead of add them?
A discount, you can start to think of it like, "OK. This is a real semantic thing, because I could have 10 percent off. I could have a fixed amount off, a constant amount like $10 off." I have this operation where I realize I'm adding two numbers. What I really want to do is combine two discounts.
I'm taking two discounts and returning a new discount. I'm looking for things like that. Usually, they happen with reduces, like that reduction where I'm adding the salaries of all the employees. Maybe I don't want to be, inside the reduction function, pulling out the salary and adding it to the accumulator value.
What I really want to be doing is taking two salaries and combining them into a new thing. Maybe I wouldn't consider it a salary. I would consider it an amount of money. It's just a quantity of money, number of dollars, something like that. I combine those two.
It's addition, but I get a new quantity of money out. Then it's not just a reduce. It's a map over the employees, a map converting all these employees into their salaries. A list of employees into a list of salaries, and then I reduce over that.
I've extracted out that part of the reduce function that was doing two things. It was both adding the salaries and extracting them. Now, I have a monoid. It takes two sums of money and it gives you a new sum of money.
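A sketch of that split in Python might look like the following. The `Money` type, the choice to store cents, and all the function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Money:
    """A quantity of money, stored in cents (an assumption of this sketch)."""
    cents: int

# The identity value: adding it to any amount changes nothing.
NO_MONEY = Money(0)

def add_money(a: Money, b: Money) -> Money:
    """Monoid operation: two amounts of money in, one amount of money out."""
    return Money(a.cents + b.cents)

def salary(employee: dict) -> Money:
    """Extraction step: an employee in, that employee's salary out."""
    return employee["salary"]

def total_salaries(employees: list) -> Money:
    """Map employees to salaries, then reduce the salaries with add_money."""
    return reduce(add_money, map(salary, employees), NO_MONEY)

# Hypothetical data for trying it out.
employees = [
    {"name": "Ana", "salary": Money(500000)},
    {"name": "Ben", "salary": Money(420000)},
]
```

Notice that `add_money` never mentions employees. It stays at the level of money, which is what makes it a candidate for a solid base layer.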
This is an operation that is a candidate for something that is solid. Now, you can have a library. Think of it like, "I have a library of money operations that my accounting department can start to use. I have a library of sale operations."
That's stuff like combining two discounts. Maybe they don't combine. They don't add. Maybe when you combine two discounts, it just chooses the greater one. If I have a coupon and there's a sale going on, you don't want to give 50 percent off of something.
The 30 percent off coupon trumps the 20 percent sale. That might be a company policy. Instead of doing an addition, which is what you had before, you should be calling this other operation because it gives you a place to define the semantics of that operation.
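To make that concrete, here is a hedged sketch in Python of a discount-combining operation whose policy is "the greater discount wins." It only models percentage discounts, not fixed amounts, and the `Discount` type and function names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Discount:
    """A percentage discount. Fixed-amount discounts are left out of this sketch."""
    percent_off: int

# The identity value: combining with it never changes the other discount.
NO_DISCOUNT = Discount(0)

def combine_discounts(a: Discount, b: Discount) -> Discount:
    """Assumed company policy: discounts don't add, the greater one wins."""
    return a if a.percent_off >= b.percent_off else b

def best_discount(discounts: list) -> Discount:
    """Combine any number of discounts into the one that applies."""
    return reduce(combine_discounts, discounts, NO_DISCOUNT)
```

If the policy changes later, say to adding percentages with a cap, the change happens inside `combine_discounts` and the callers stay the same.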
I'm going to recap. Number one was pull out smaller bits of functions to make your functions shorter. You start to get a whole bunch of functions and now you want to organize. Now, you organize them along the dependency lines. That's going to be number three.
The whole idea is refactor big functions into lots of smaller functions. That's one. Number two is look for monoids, because monoids are operations that stay, by definition, at a particular semantic level. This is a good number three, too.
Also, monoids are usually combiners, combining operations. The thing about combining operations that makes them nice is that they're the most complicated kind of operation you'll find.
Eric: There's a marching band in the street. You want to do those first. Since they are the most complicated, they're the ones that are going to require you to model the data you need most specifically.
If you do your easy operations first, which is what most people do — they leave the hard operations until later — what happens is they finally get to those hard operations and they realize they don't have the data they need for them.
I have a whole episode about this with a really good example from real software that I've used, so I'm not going to go into it anymore. Go listen to that, or just imagine missing data because you didn't think about it until later.
The thing you're trying to do is to find these combining operations and define them. They might not be monoids. You might combine two cars into a fleet. Then, of course, you have fleet combining. That's a cool operation.
You might have operations like that. They're not monoids but they are combining, meaning they're returning a different type of thing. They might involve three types but they're still combiners. Those are good things to focus on at first.
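Here is a small sketch of the car and fleet example in Python. The `Car` and `Fleet` types and the use of a VIN as an identifier are assumptions for illustration. One combiner changes type (cars in, fleet out), and the other is a monoid-style operation (fleets in, fleet out).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Car:
    vin: str  # hypothetical identifier for a car

@dataclass(frozen=True)
class Fleet:
    cars: frozenset  # the set of Car values in the fleet

def make_fleet(a: Car, b: Car) -> Fleet:
    """A combiner that is not a monoid: two cars in, a fleet out."""
    return Fleet(frozenset({a, b}))

def combine_fleets(a: Fleet, b: Fleet) -> Fleet:
    """A monoid-style combiner: two fleets in, a fleet out."""
    return Fleet(a.cars | b.cars)
```

Writing even these two functions forces a decision about what a fleet is, for example whether the same car can appear in it twice. This sketch says no by using a set.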
When you find them, they will help you flesh out the semantic entity that you need to be focused on. Then the other operations become easy around them. You can do that same refactoring we talked about in number one where you pull them out.
This will be number four if I can remember it now. It's something I was thinking about coming back to — the directionality of dependencies. As you are pulling these things out and let's say you leave them all in the same module, which is fine, this module starts to get bigger and bigger.
What you should be noticing, what you should be looking for is what is depending on what. Function A calls function B. That means A depends on B. If A depends on B, there's two possible choices.
One is A and B are in the same semantic level, absolutely the same. Or A is in a higher semantic level than B. You've got to use your judgment here and you shouldn't just look at two. You should look at them all as a whole. Start pulling this apart, but you should be looking for a directionality. If A calls B, and B calls C, and C calls D, almost certainly A and D are not in the same semantic level.
This could happen a lot where you're doing something like a map over an entity. By calling map...Map is a sequence operation, so you are treating this entity like a sequence. You're probably skipping layers. It is a smell.
Why am I doing map here? Now, you might be doing map because you have something like a sale has a collection of vehicles that are in the sale. Collection, that's fine. You're going to do a map at some point, because that's part of the semantics of the sale.
In general, that's what you're looking for. You're looking for stuff like, "Wait. Why is it that A calls B, B calls C, C calls D, and then D calls B? Is it because it just happened to have the same shape of data? Maybe that shouldn't happen.
Or why is it that B calls C, B calls D, and C calls D? Maybe I need to create more of a hierarchy between these things. Maybe D is at a lower level than both B and C, but maybe B has skipped a level. Maybe there is an operation in C."
It's not strict. It's not that a layer can only call stuff one layer below it. These are smells. They're guiding your nose, guiding you through the discovery of these layers. You should be able to graph it.
You should be able to say like, "If you graphed them, if you graph the dependencies in something like Graphviz and you let its algorithm bubble it up like a tree, you should be able to see these layers."
You should be able to see, "OK. These five operations at the top, these are the highest level operations, and then they call this other layer that goes in here and this other layer. Then at the bottom, at the leaves, are these operations that just call basic language things." You should be able to see that.
I've never done that, actually visualized it with a graph visualization algorithm. That might be an interesting thing to do, but you're always looking to say like, "Relative to these other things that are in its graph, where does it belong? Does it belong with this other thing? Does it belong on this layer or that layer?"
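Short of a full visualization, one rough way to see the layers is to compute them from a call graph. The sketch below is in Python. The call graph and function names are invented, and the layering rule (a function sits one level above its deepest callee) is just one reasonable choice, not a method from this episode.

```python
# Hypothetical call graph: each function maps to the functions it calls.
calls = {
    "run_sale": ["apply_discount", "cars_in_sale"],
    "apply_discount": ["combine_discounts"],
    "cars_in_sale": [],
    "combine_discounts": [],
}

def layers(calls):
    """Group functions by layer: 0 for leaves, else 1 + the deepest callee.

    Raises ValueError on a cycle (for example D calling back up to B),
    since a cycle means there is no clean layering to report.
    """
    depth = {}
    visiting = set()

    def visit(fn):
        if fn in depth:
            return depth[fn]
        if fn in visiting:
            raise ValueError(f"dependency cycle through {fn!r}")
        visiting.add(fn)
        callees = calls.get(fn, [])
        d = 0 if not callees else 1 + max(visit(c) for c in callees)
        visiting.discard(fn)
        depth[fn] = d
        return d

    by_layer = {}
    for fn in calls:
        by_layer.setdefault(visit(fn), []).append(fn)
    return by_layer
```

For the graph above, `run_sale` lands on layer 2, `apply_discount` on layer 1, and the other two on layer 0. `run_sale` calls `cars_in_sale` two layers down, which is allowed, but it is the kind of skip worth a second look.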
It's not very specific advice but I think that that's what I do. Now, the last thing I want to say is I do have a talk called "Building Composable Abstractions" that basically tries to approach this if you've got a greenfield project, a greenfield abstraction.
It tries to come from the other direction and do a lot more upfront thinking about how to build this abstraction. I'll briefly talk about it. I don't want to go too deep into it because I have a whole hour-long talk that was already condensed from the hours and hours that I could talk about it.
The idea is you pick some core concept in your domain, in your app. If it's car sales, it might be the sale, the promotion that you're running. Let's say it is. Like on a whiteboard, you write down all the operations.
When I say pick the sale, what I mean is you pick that concept and you really develop the metaphor for it. Develop it in your mind. Think about what is this like. This sale is like if you went to a clothing store and they had a sale.
Before the sale starts, you go around and you put a ticket like a red sticker on every sale item. That red sticker means it's 10 percent off. Then you go through and put a blue sticker on it, and that means 20 percent off.
Boom. Automatically, you have a picture in your mind of what a sale looks like. You're basically going through all your inventory and you're tagging cars with what discount they're going to get.
That's just one possible way to do a sale. I just made that up. Your sale might be different. It might be, "If your name starts with an F, you get 50 percent off. But if your name starts with a J, you get..." Whatever you want to do. It's up to you. It's up to your business.
You need to have that in mind before you go into the next step, because that metaphor is going to give you answers. Then you go through and you figure out what the operations are on the sale.
In my case, the operations would be tagging. Given a car, I give it a tag, which represents a discount. I'm also going to need a representation for discount, what the colored tag means.
I'm going to have maybe something that maps blue means 20 percent, green means 30 percent, something like that. I'll have that also as a concept. You go through and you find all the operations and you start with the combining operations.
Combining operation in a sale might be something like if you have the sale, you want to add a new car to that sale. Not all the cars are on sale but you add cars to the sale. How do you do that?
Do you add one car at a time? Do you add, "All the 2018 cars are now on sale"? You come up with that operation, and these combining operations inform us.
Now, this is an iterative process, so you might not get it right the first time. You do that and you try to figure out what these operations are. If you have to, you start over and you find different operations.
Then you take those operations and you implement them as functions on data. Then you can test it out. It's not implement. It's, let's say, model it with code. [laughs] This isn't the final implementation.
The final implementation is going to involve the database and Ajax requests and stuff like that. This is just model it in memory with code, so you can play with it, maybe visualize it. Then step four is implement it once you've worked out all the kinks in it.
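As an example of that "model it in memory" step, here is a sketch of the sticker metaphor in Python. Everything in it is hypothetical: the tag colors and percentages follow the examples above, a sale is modeled as a plain dict from car ID to tag, and prices are integers in cents.

```python
# What each colored tag means, as a percentage off.
TAG_DISCOUNTS = {"red": 10, "blue": 20, "green": 30}

def empty_sale():
    """A sale with no cars tagged yet."""
    return {}

def tag_car(sale, car_id, tag):
    """Combining operation: a sale and a car in, a new sale out."""
    if tag not in TAG_DISCOUNTS:
        raise ValueError(f"unknown tag: {tag!r}")
    return {**sale, car_id: tag}

def tag_cars(sale, car_ids, tag):
    """Tag many cars at once, e.g. every 2018 car."""
    for car_id in car_ids:
        sale = tag_car(sale, car_id, tag)
    return sale

def discount_percent(sale, car_id):
    """Percent off for a car. Untagged cars get no discount."""
    return TAG_DISCOUNTS.get(sale.get(car_id), 0)

def sale_price(sale, car_id, list_price_cents):
    """Price in cents after the car's discount, rounded down."""
    return list_price_cents * (100 - discount_percent(sale, car_id)) // 100
```

None of this touches a database. It is only there so you can play with the operations, find out where the metaphor breaks, and start over cheaply if it does.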
They're both on my site, LispCast.com. You can find them there. I hope this answered the question. I hope it wasn't too ranty and rambly. It probably was. I apologize, but this is a deep topic, so it had to be.
My name is Eric Normand. This has been my thought on functional programming. If you want to get in touch with me, ask me more questions. I love it. I love getting questions. I'm at the audience size now, where I feel like I'm getting a regular stream of questions and I really appreciate that.
Get in touch with me. I'm on Twitter, @ericnormand. You can also email me. Probably better for questions, email@example.com. LispCast, L-I-S-P-C-A-S-T. You can also find me on LinkedIn if that's your bag. See you later. Bye.