Stratified design and functional architecture

I spoke at Øredev 2023 about stratified design and functional architecture. The slides were a little messed up in the video, but in general they're readable.

Download slides

Transcript

Welcome. My name is Eric Normand, and I'm going to be talking about stratified design and functional architecture, functional programming stuff. I wrote a book on functional programming. I'm going to show this slide again at the end, but there's a discount code if you'd like to buy the book. I also have a copy of the book if you want to look through it after the talk.

So this is the summary slide of the talk. We're going to talk about pure functions and stratified design, and how that leads to the onion architecture very naturally. Show of hands: who knows what a pure function is? Okay, so I know who I'm talking to. Cool. Stratified design, do you know what that is? Couple of people, three people. All right. And onion architecture? Okay, more like the first one. All right. We're going to go through each of these.

So here's some snippets of code. The first one is about sending an email. The next one is about saving a user to the database. We have get current time. Then we have some code for summing numbers and calculating the length of a string, and then at the end, we just have an object and an array. Now, this is all normal code. We would see this in any kind of code base, whether it's object-oriented or functional. The big difference is that functional programmers are going to draw a line here, and the top things, the ones that affect the world or are affected by the world, they're going to call something different. I call them actions. We'll see some other names for them later. Then they're going to draw another line, and they're going to call those calculations, or something else: pure functions, something like that. Then at the bottom, we have just plain data. So inert stuff. It doesn't run, it's just data. When I talk to functional programmers, everybody says that this is essential. You cannot do functional programming without making this primary distinction between code that affects the outside world and code that doesn't.
So I call this the fundamental distinction in functional programming. Let's go through each of them. We'll start with the easy one, which is data. We kind of all know what that means. It's facts about events. You get an HTTP request. It's got some payload. You learn what time it was sent. You learn what IP it was sent from. If it's a create user HTTP request, it's going to have the username that they want to have. All this stuff is facts about this particular event, and you can derive other ideas from it, other pieces of data. It's the easy stuff that you would guess: numbers, strings, enums, collections, stuff like that.

The second one is calculations. These are computations from input to output. They're timeless computations. It doesn't matter when you run them or how many times you run them. You're always going to get the same answer if you give them the same inputs. They're also known as pure functions and mathematical functions. I don't call them that in this talk or in my book, because "pure" just makes it sound like there's some moral aspect to it. And using the word "function" also confuses it with the programming language feature, you know, like a JavaScript function. And it's not always a function. Sometimes it's operators. So we have our arithmetic operators. Those are calculations. If you give it three plus five, you're always going to get the same answer. Stuff like absolute value. String concatenation: same two strings, you're always going to get an equivalent string at the end. Or something like validating an email address. It's either valid or it's not. It's not going to change.

Lastly, we have actions. And like I said before, these affect or are affected by the outside world. You'll also hear them called impure functions or side-effecting functions or functions with side effects. There's a rule of thumb. If you can't tell if it's an action, you can apply this rule. How many times does it run?
It depends on that, or on when it is run. So for example, if you read from the database, it depends on when you read. If you read before it gets changed, you'll get a different answer from after it gets changed. So sending an email. That depends on how many times you send it, right? You probably only want to send an email one time. Not zero times and not 10 times. Reading from the database, I just said that one. Or writing to a file. Same thing. If you write to a file before someone else reads it, you're going to have a different result than if you write after.

So actions are just harder to deal with. They're harder to test. So let's say we have an action. That's the little red box that's in production. It sends an email. And then the green box is a calculation. So we have our build server. It's going to test our action before it sends it over into production. That action, if we want to test it, we have to set up some kind of fake email server. Or maybe it's a real email server, but it sends to a fake email address. We have to do something. It's going to make it harder to test than that calculation, which can run as many times as you want. Wherever you want it to run, it's going to be the same answer.

They're also harder to run safely in production. You can have stuff like race conditions. You can have stuff like, you tell a server to send an email, but it times out. Did it send the email or not? You have these troubles you have to deal with in production. There's solutions to them, but they're just harder to run. And then they're harder to debug. Something like a race condition is also difficult to debug. So because of this, functional programmers tend to choose calculations over actions when they can. They have to do actions. All our software needs actions. We need to send the email. But as much as we can, we're going to move our code into calculations.
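The slides' code is not part of the transcript, so here is a minimal TypeScript sketch of the three categories. Every name in it (`SignupRequest`, `looksLikeEmail`, `outbox`, and so on) is made up for illustration, and the in-memory `outbox` only stands in for a real email server.

```typescript
// Data: inert facts about an event. Field names are illustrative.
interface SignupRequest {
  username: string;
  email: string;
  receivedAt: string; // timestamp recorded when the request arrived
}

const request: SignupRequest = {
  username: "kim",
  email: "kim@example.com",
  receivedAt: "2023-01-01T10:15:00Z",
};

// Calculations: same inputs, same output, whenever and however often they run.
function sum(numbers: number[]): number {
  return numbers.reduce((total, n) => total + n, 0);
}

// Deliberately simplistic check; it is only here to show a calculation.
function looksLikeEmail(address: string): boolean {
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(address);
}

// Actions: the result depends on when, or how many times, they run.
function getCurrentTime(): number {
  return Date.now(); // a different answer depending on when it is called
}

const outbox: string[] = []; // hypothetical stand-in for an email server

function sendEmail(to: string, body: string): void {
  outbox.push(`${to}: ${body}`); // calling this twice sends two emails
}
```

Testing `sum` or `looksLikeEmail` needs nothing but inputs and expected outputs, while testing `sendEmail` needs some stand-in for the outside world, which is the extra cost described above.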
Functional programmers also come up with systems for managing their actions and controlling them. You might have heard of stuff like effect systems. So that's the kind of thing functional programmers do. They know that actions are hard to manage, so they come up with whole systems to deal with them. And so over time, you'll see that functional programmers write more code in their calculations and less code in actions.

Let's look at another way that actions are kind of insidious and hard to use. It's called the spreading rule. So we have some code here, and we'll start at the bottom with main. So the main function calls this function called affiliate payout, which is defined just above. It loops through all the affiliates and calls this function figure payout. Figure payout is defined right above, and it does some math, and it checks if this number is over 100, and then it does a send payout, which transfers money from one account to another. So definitely an action. I've highlighted it in red because it's an action. So we might say, hey, this code is pretty functional. It only has one action in it. But you haven't applied the spreading rule yet. The spreading rule is very basic, easy to apply. If a function calls an action, then that function itself is an action. So send payout is an action, so figure payout is an action. But it's called here, and so then affiliate payout is also an action, which means that main is an action. So this whole code is just all actions. Once you use an action somewhere, it affects everything up the call tree.

Another way to look at it is as a call stack. So we have main calling affiliate payout, which calls figure payout. Now let's say figure payout also calls some calculations. What if we find that, at the end, one of those calculations calls an action? This isn't possible, right? Because if the calculation called an action, it would be an action, and all the way down. So the whole thing would be red.
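The payout code from the slide is not in the transcript. A rough TypeScript reconstruction of the shape described, with the `Affiliate` fields, the commission math, and the in-memory `transfers` log all assumed for illustration, might look like this:

```typescript
// Hypothetical data shape; the talk does not show the real fields.
interface Affiliate {
  name: string;
  sales: number;
  commissionRate: number;
}

const transfers: string[] = []; // stand-in for a real money transfer

// Action: moves money (here it only records the transfer).
function sendPayout(affiliate: Affiliate, amount: number): void {
  transfers.push(`${affiliate.name}: ${amount}`);
}

// Action by the spreading rule: it calls sendPayout.
function figurePayout(affiliate: Affiliate): void {
  const owed = affiliate.sales * affiliate.commissionRate;
  if (owed > 100) {
    sendPayout(affiliate, owed);
  }
}

// Action by the spreading rule: it calls figurePayout.
function affiliatePayout(affiliates: Affiliate[]): void {
  for (const affiliate of affiliates) {
    figurePayout(affiliate);
  }
}

// Action by the spreading rule: it calls affiliatePayout.
function main(): void {
  affiliatePayout([
    { name: "Ana", sales: 1000, commissionRate: 0.25 },
    { name: "Bo", sales: 100, commissionRate: 0.5 },
  ]);
}
```

Only `sendPayout` touches the outside world directly, yet none of the four functions can be called freely in a test without something happening to `transfers`.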
So what you find is that there's going to be some last action on the stack. And above that action, there's going to be only calculations. These kinds of regularities, where you have these rules that you can apply and trust, to me they mean that these categorizations are something. They mean something, they have real applicability, and you can reason about them like this. The spreading rule, and this effect on the stack, indicate that these categories are useful.

There's a refactoring that we do as functional programmers. I call it extracting calculations. Let's say we work at this company called CouponDog, and they send out a weekly newsletter with coupons for products you might want. And here's their code. It's going to fetch some coupons from the database. It's going to get all the subscribers, and it loops through all the subscribers and sends each one an email with the coupons in it. We can highlight our actions just so we know. And we see that the stuff reading from the database is obviously actions, and then the email system send is an action. And we might look at this and say, it's all actions. There's no calculations in there. But we can actually pull some calculations out. And if we make that its own function, then we've got less code in actions. It's easier to test this email for subscriber, because it's just a pure function, a calculation. And so now we have less hard-to-test code.

Another thing we might do, and this is not the same refactoring: what we have here is we're looping through each subscriber, constructing the email, and sending it. As a functional programmer, I tend to make all the emails first. And I could actually pull that out into its own thing. It's kind of a one-liner, so there's no point, really. But I might pull it out just to be able to test it.
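As a sketch of the extraction just described, assuming hypothetical `Database` and `EmailSystem` interfaces and made-up email wording (none of which come from the talk's slides), the newsletter code might end up shaped like this in TypeScript:

```typescript
interface Subscriber {
  email: string;
  name: string;
}

interface Email {
  to: string;
  subject: string;
  body: string;
}

// Calculation: pulled out of the loop, so it can be tested with plain values.
function emailForSubscriber(subscriber: Subscriber, coupons: string[]): Email {
  return {
    to: subscriber.email,
    subject: "Your weekly coupons",
    body: `Hi ${subscriber.name}, this week's coupons: ${coupons.join(", ")}`,
  };
}

// Calculation: the one-liner that builds every email up front.
function emailsForSubscribers(subscribers: Subscriber[], coupons: string[]): Email[] {
  return subscribers.map((subscriber) => emailForSubscriber(subscriber, coupons));
}

// Hypothetical ports to the outside world; every method here is an action.
interface Database {
  fetchCoupons(): string[];
  fetchSubscribers(): Subscriber[];
}

interface EmailSystem {
  send(email: Email): void;
}

// Action: reads from the database and sends emails.
function sendIssue(db: Database, emailSystem: EmailSystem): void {
  const coupons = db.fetchCoupons();
  const subscribers = db.fetchSubscribers();
  for (const email of emailsForSubscribers(subscribers, coupons)) {
    emailSystem.send(email);
  }
}
```

The calls are synchronous only to keep the sketch short; a real database or mail client would most likely be asynchronous.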
And so we construct all the emails up front and then loop through them and send each one. When I show this, I very commonly get a question. It's: isn't it inefficient to create all the emails up front? What if you have millions or billions of users? Well, if you have that many users, congratulations, first of all. But I just want to point a couple of things out. One is, even before we did any functional refactoring, we already had the problem. We were reading all the users from the database to begin with. So if we had millions of users, we had all the records for all the users in memory already. It doesn't make it better that we're also generating all the emails, but this problem was already there.

But we'll solve it, and we'll do it in a functional way. The standard way of dealing with something that doesn't fit into memory is to fetch one page at a time. So you fetch a small, fixed-size subset each time. We can modify our send issue so that, instead of reading all the subscribers, we just read one page. Start at page zero, read it. Then, while we're still getting stuff from the database, we do the same logic we did before, but at the end, we increment page and query again. But notice we didn't have to change our calculation. We're still generating the email in exactly the same way.

And you'll notice that send issue, and this issue of having too many users, is itself an action. If you apply the rule of thumb that I talked about, the when or how many times it's run, this optimization depends on when it is run. If you run this action on the first day of your company, when you have one or two users and everything fits in memory, you're going to get a different result from today, when you have millions of users. So this was a problem in actions. And so that's where we solved it. We solved it as an action.
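A paginated version could be sketched as follows. The `fetchSubscribersPage` method, its convention of returning an empty array once the pages run out, and the email wording are assumptions made for this example, not the slide's actual code.

```typescript
interface Subscriber {
  email: string;
  name: string;
}

interface Email {
  to: string;
  subject: string;
  body: string;
}

// Calculation: unchanged by the move to paging.
function emailForSubscriber(subscriber: Subscriber, coupons: string[]): Email {
  return {
    to: subscriber.email,
    subject: "Your weekly coupons",
    body: `Hi ${subscriber.name}, this week's coupons: ${coupons.join(", ")}`,
  };
}

// Hypothetical ports; assumed to return an empty page when nothing is left.
interface Database {
  fetchCoupons(): string[];
  fetchSubscribersPage(page: number): Subscriber[];
}

interface EmailSystem {
  send(email: Email): void;
}

// Action: the memory problem lives here, so the fix lives here too.
function sendIssue(db: Database, emailSystem: EmailSystem): void {
  const coupons = db.fetchCoupons();
  let page = 0;
  let subscribers = db.fetchSubscribersPage(page);
  while (subscribers.length > 0) {
    for (const subscriber of subscribers) {
      emailSystem.send(emailForSubscriber(subscriber, coupons));
    }
    page += 1;
    subscribers = db.fetchSubscribersPage(page);
  }
}
```

Only one page of subscribers and one email at a time need to be held in memory, and `emailForSubscriber` is identical to the non-paginated version.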
Okay, so we've checked off pure functions. Now we're going to go on to stratified design. Same code, but now we're going to look at it in a different way. We're looking at it as a call graph. And we're going to look at it as a directed acyclic graph. So a tree, I'm just going to ignore like if there's any cycles or anything. Usually there's no cycles. So we have send issue at the top and we're going to make sure all the arrows point down. So it's nice and hierarchical. Send issue is going to fetch those things from the database. It's also going to send email and it's going to calculate that email for a subscriber. And each of those things has things it calls. I want to focus on the green part where the email for a subscriber is doing some string concatenation operations, it's doing other stuff too. But that string concatenation is using string and it's also doing some kind of object stuff. Now if I squint, I'm using my design sense here. There's like a dotted line there that I see that the stuff under at the bottom, it's about strings and objects and concatenation. It's very generic. It has nothing to do with my emailing coupons domain. But email for subscriber, just by the name alone, that's written in my domain terms. So it's at a different layer of meaning. So I draw a line there and we're starting to see these layers form. And as your code gets more developed, I know I didn't have a lot of code. It's like two functions. But as you get, your system gets bigger and bigger and you're writing, you're refactoring and pulling out important functions and naming them, you'll start to see on the call graph these layers form, these layers of meaning. And that's what we mean by stratified design, stratified meaning separated out into layers. We see this in kind of every domain of human endeavor, I'll say. So this is Swedish cuisine. 
So at the bottom, we have this base layer of chemistry. Modern cuisine tends to put that at the bottom, like there's these chemical things we have to understand, like protein, heat, etc. And then on top of that, we build these fundamental cooking techniques of chopping, stirring, applying heat. And that's very generic, right? You could go to any country and their cuisine would have chopping and stirring and stuff in it. But once you start getting into a particular cuisine, you start getting into their building blocks, stuff like redning and långkok. Thanks for that giggle. And then on top of those building blocks, those techniques that are typical for the cuisine, you start having dishes. And I won't try to pronounce those.

So if we had a pizza shop app, we would see the same kind of stratification happening. At the bottom, we have JavaScript, and then these libraries built in JavaScript. Then we have e-commerce ideas, like products and shopping carts and prices, coupons, discounts, that kind of thing. And then we would have a domain layer, a semantic layer about pizza shops, about how to order a pizza, like how to describe an order with the different toppings that you put on your pizza. And then your app at the top is going to be like, how do I make the GUI so that you can modify that order and submit it and all that stuff?

I'll give another example. Let's say you have to do email address validation, and you're using regex. You would be able to stratify it like this. At the bottom, you have your basic type, it's a string. And then there's string operations built on top of that. This would be like substrings and indexes into the string. And then on top of that, you build a regex engine. Usually it comes with your language. And then out of regex, you'd write an email validation function. This stratification happens at all levels, all over the place. All right.
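To make the pizza shop layers concrete, here is a small TypeScript sketch in which each function calls only things in the layer below it. The prices, type names, and the text summary standing in for the GUI are all invented for the example.

```typescript
// Language/library layer: generic, knows nothing about shops or pizza.
function sum(numbers: number[]): number {
  return numbers.reduce((total, n) => total + n, 0);
}

// E-commerce layer: knows about line items and prices, not about pizza.
interface LineItem {
  name: string;
  priceCents: number;
  quantity: number;
}

function cartTotalCents(items: LineItem[]): number {
  return sum(items.map((item) => item.priceCents * item.quantity));
}

// Pizza domain layer: how to describe an order with its toppings.
type Size = "small" | "large";

interface PizzaOrder {
  size: Size;
  toppings: string[];
}

// Hypothetical prices, in cents.
const BASE_PRICE_CENTS: Record<Size, number> = { small: 800, large: 1200 };
const TOPPING_PRICE_CENTS = 150;

function pizzaToLineItem(order: PizzaOrder): LineItem {
  return {
    name: `${order.size} pizza`,
    priceCents: BASE_PRICE_CENTS[order.size] + order.toppings.length * TOPPING_PRICE_CENTS,
    quantity: 1,
  };
}

// App layer: the part closest to the GUI, and the part most likely to change.
function orderSummary(orders: PizzaOrder[]): string {
  const totalCents = cartTotalCents(orders.map(pizzaToLineItem));
  return `Total: ${(totalCents / 100).toFixed(2)}`;
}
```

For instance, a large pizza with two toppings plus a plain small one gives `orderSummary(...)` a total of 1500 + 800 cents, so the string `"Total: 23.00"`.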
So I have a question for you. What is the most reusable? So we're stratifying, right? We're finding a vertical gradient, a vertical spectrum from top to bottom. And if this has any meaning to it, then we're going to see some regularities: the stuff at the top tends to be more x, the stuff at the bottom tends to be more y, and the stuff in the middle is somewhere in between. So I'm going to ask you a question. What tends to be more reusable, the top or the bottom? Shout it out. Bottom. Anybody say top? Okay. Yeah, I agree. The bottom. Especially since we've put JavaScript there. Obviously, that is more reusable than some kind of GUI that we built out of it on top.

Okay. Which one changes more frequently? Top or the bottom? Top. Yes. It probably changes more frequently because it's easier to change. If you change something at the bottom, everything above it might break. But stuff at the top doesn't have anything calling it. So it's easier to change it without breaking everything.

Okay. Last one. What is more valuable to test? I think this one is harder. Is it more valuable to test the stuff at the top or the stuff at the bottom? Top. I hear a couple of people say that. Both. Okay. So there's arguments both ways, and I'm going to try to give them. The argument for the top is that if you test something at the top, you're also testing everything it calls, and everything that calls, and everything that calls. So one test tests all this stuff. The argument for the bottom is that if I test something at the bottom, since it changes less frequently, that test is going to last longer. So it's actually more bang for your buck to test the stuff at the bottom. Plus, if that test fails, you have a more specific thing that you know failed. I tend to be in the bottom group. If you have tests for JavaScript, everybody who uses JavaScript is going to benefit from them. So it's more worthwhile.
Whereas if you test your GUI, like how it gets laid out, and then that changes next week, you've got to change the test, so it wasn't that worth it. Okay. So now that I've given that argument, do you believe me that the bottom stuff is more worth testing, or the top? You've got to test all the layers. Okay. Do you test your GUI? Yes? Okay. All right. Cool. All right. So I'll just leave it like that. That's my opinion. I think that it is up for debate. Okay. One more question. Which one is more general and which one is more specific? The top or the bottom? Top is specific. Good. And the bottom is general. Yeah. Awesome. Okay. We've mastered stratified design.

And now I want to show how it all leads to the onion architecture, just naturally. So when I talk about layers and I talk about architecture, I have to show the traditional layered architecture to show how it's different. Who's familiar with this kind of model, where you have a web interface, then your application logic, and then the database at the bottom? Show of hands? Yeah. Yeah. Okay. Good. This is very, very commonly drawn. We can do it as a call graph just to be able to compare with what we had before. You have this web handler and it has these different operations. I just put two, but imagine there's hundreds of them. And then those all talk to the database and then finally generate a response through the web handler.

Let's move it over to the side, and we're going to show a more functional way to do it, where we have more calculations. Now, remember, if we have calculations, we can't call the database at the bottom. We just can't. We have to move it over to the side. If we put the database at the bottom, everything becomes red above it. So we pull it over to the side. Also the email server, all that stuff gets pulled over to the side. So our domain operations, we try to make them pure functions, calculations. So all that stuff becomes green.
And then we can draw these diagonal lines. So the red stuff goes in the interaction layer. And we have this slice of our domain operations. And then on the bottom, we have what I'm calling the language, the language layer. It includes the libraries in addition to the JavaScript. And then we take those lines and we make them a circle and we make them circles. And so we get this onion shaped multi-layered concentric circle thing. So our interaction layer talks to the database, talks to the email server. It's getting web requests in. And then inside of that, we've got a domain layer and then the language layer. The same rules apply that all of our arrows point in so that interaction layer knows about our domain. But the domain doesn't know that it's being called by the interaction layer. Same as for the language, JavaScript doesn't know what you're doing. It doesn't know that you're building a pizza application. So this is the onion architecture. It has other names besides that. It's got ports and adapters. Don't know why. Hexagonal architecture. I do know why it's called hexagonal. It's a dumb reason. The person who invented it on their slide used a hexagon because they tile well. And so he could put a bunch of them on a slide to show all these little modules that talk to each other. And they looked good on a slide and he used a hexagon. There's no reason structurally why it has to be a hexagon. And then also there's this idea of a functional core imperative shell. So our domain model, remember everything below that first little diagonal line was functional. And so that's the core and then you have the shell on the outside that's doing all the messy and imperative stuff. Okay, so let's go over some questions. Before, well, I could take questions now. Before I go over my standard questions. Does anyone have a question about onion architecture? Yes, in the back. [inaudible] Right, okay. Sure. Right. Right, okay. That's a good question and I'll repeat it. 
So isn't it a waste of time to test the basic JavaScript operations and stuff? I would say on my team, yes, it's a waste of time. I do hope that the JavaScript developers are writing tests, you know, for V8 or whatever JavaScript engine I use. So I was probably unclear. I still think it's worth testing JavaScript. It's just not my job, right? I'm just going to assume that it's well tested. But I do think that it's super valuable, because JavaScript is so popular that any test they write is going to be valuable to everybody. So that's what I meant. I don't mean you have to test it. I just mean that the test for JavaScript is valuable.

Oh, so, like, where do you draw the line? Yeah. A library, you assume that they test it. Right. And so this isn't a functional programming question, but I'll answer it from my personal opinion. I test the code I write. I don't test other people's code. And I do read the code that I pull in also, and make sure it's got tests. So thank you, though.

Okay, so common questions about onion architecture. Any more questions? Okay, good. Thank you for that, because I was unclear. All right. So here's a question I get every time I present the onion architecture, except today. What if your domain rule needs to ask the database for more information? I get this all the time. And I think there's a bunch of things going on, so I have to answer it in like three different ways. So here's what the question is saying. I've got this interaction layer. I've got the domain layer. I've got the language layer. I didn't write the language layer. And the interaction layer is talking to the database. But so is the domain, like it needs to know some data from there. Okay, so what I ask is, is it really a domain rule? Because we use these terms a little too loosely sometimes: business rule, domain rule. All right, here's an example. We have some code.
We have this incomplete migration of our image database. So we have a new image database. We try to get an image with ID 123. And if it doesn't find it, we just get undefined back, and then we're going to look at the old database. Okay, it's a common pattern. Nothing wrong with it. But this is not a domain rule. This is something else. Business rule, something else. Our domain terms are product, image, price, discount. We do see image in there. But we have these non-domain terms too: database, old, new. That has nothing to do with the domain of e-commerce, right? The domain of e-commerce is the stuff on the top. This is something that belongs in the interaction layer.

Another example of something where you would think you need to get more information from the database. So we have this calculation at the top. It's a pure function called generate report. It just does a bunch of string concatenation. It takes all the products and builds this big string. So we have this line where we fetch the products from the database for last year. And then we generate the report from it. And then a new requirement comes in and it says, well, we want to include all the discounts in our report. We're just putting the price and the name, but sometimes there's a coupon code or something, and we're not putting that in there. But the stuff we're getting from the database when we do DB dot fetch products, it just gives us the ID for the discount. It's some string, or it gives us null. So we have the name, the price, and the discount ID, which is some optional ID. So how do we get the actual data for that discount? Well, we could go into our calculation, and in this code at the top where we're doing all the string concatenation, we could fetch it from the database, but then it wouldn't be a calculation anymore. So what do we do? So at the top, I've assumed in my product that I've got the discount, or not if it doesn't have one.
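The slide itself is not in the transcript. A minimal TypeScript sketch of this arrangement, with the `Database` interface, the `fetchDiscount` method, the field names, and the report format all assumed for illustration, could look like this: the calculation receives products with the discount already attached, and a separate action does all the fetching first.

```typescript
interface Discount {
  id: string;
  percentOff: number;
}

// What the database returns: only the discount's ID, or null.
interface ProductRow {
  name: string;
  price: number;
  discountId: string | null;
}

// What the calculation assumes: the discount itself, or null if there is none.
interface Product {
  name: string;
  price: number;
  discount: Discount | null;
}

// Calculation: string building only, no database access.
function generateReport(products: Product[]): string {
  return products
    .map((product) => {
      const discountNote = product.discount ? ` (${product.discount.percentOff}% off)` : "";
      return `${product.name}: ${product.price}${discountNote}`;
    })
    .join("\n");
}

// Hypothetical port to the database; both methods are actions.
interface Database {
  fetchProducts(year: number): ProductRow[];
  fetchDiscount(id: string): Discount;
}

// Action: does every fetch up front, then hands plain data to the calculation.
function buildReport(db: Database, year: number): string {
  const rows = db.fetchProducts(year);
  const products = rows.map((row): Product => ({
    name: row.name,
    price: row.price,
    discount: row.discountId === null ? null : db.fetchDiscount(row.discountId),
  }));
  return generateReport(products);
}
```

Fetching discounts one product at a time is the simplest thing that shows the idea; batching those lookups would be a change inside the action and would leave `generateReport` untouched.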
I still fetch the products from the database. But then in the middle, I'm going to map through all of them and do the fetch to the discount database and add it to the product. And so then I still call my pure calculation at the end. Okay. Is this clear? Yeah. All right. So we're still doing all the database fetches up front, and then we're generating our big report. So that's one way to avoid putting database calls in your domain logic.

This is the final way I'm going to explain it. We've got to not overcomplicate this. It is a natural effect of stratifying and turning stuff into calculations and moving all your actions to the side. Then you just draw your diagonal lines. It is just a mental framework to help us organize our code. If it doesn't work for you, you just work around it. It doesn't have to be some difficult thing. If it's easier to write your code so that you have actions, like you take a calculation and you turn it into an action, that's fine. You just move it over to the side and it'll be okay. Right. Natural division. This is just trying to illustrate this again. It's just a natural thing. When you've got your call graph looking like this, you just draw the lines. That's all the onion architecture is saying.

I think a lot of people think of it like this, that the domain is this brain in a jar that's making all these decisions. Because it's a pure function, it needs some way of telling the interaction layer, "Hey, it's time to save this value to the database or send this email." The brain is talking to this henchman, this Igor guy who's doing its bidding. It's actually the opposite. Remember, the application starts at main, with red, with actions. Your application actually is an action. It just calls down: some things it's going to call are actions, and some things it's going to call are calculations. We can see this again in this other example. I just want to make it clear that your application is the onion.
Your application isn't the core in the middle with somehow some communication going back and forth. You should think of it more like this: your domain model, your domain calculations, are a language that your application uses to help describe the problems it's having and wants answers to. That's onion architecture.

Here's that slide again. You can get to the book with that link at the top. There's a code, groctev23. It'll give you 40% off. They do charge shipping, but with that 40% off it's cheaper than getting it on Amazon and getting free shipping. If you're interested in more ideas like this, you can go to my site. I have a podcast and a newsletter, and you can get on those. Thank you very much. Any questions? Yes.

>> I do it both directions depending on the situation. If I know I need a feature in my existing app, I'll often start with putting the button in the GUI and clicking it and making it do a thing and then realizing, oh, this could be a calculation. Sometimes it's more complicated. I want to play with it and test it first. I start with the insides before I wire a button up to it. Yeah, it really depends. I would say it's not just testability. Testability is a big one. Reusability of your code. Calculations are just inherently more reusable. They're reusable because they don't send the email. You can use them in places where you don't want to send the email. And then the change frequency. We all have to modify and maintain existing code. Being able to stratify it so that the stuff that changes more frequently is easier to change is another benefit. Clear decoupling.

>> Sure. That's right. That's right. Yes.

>> No, I think I do. Yeah. No, I tend to put my code together semantically. I used the example before: there's a function, construct the email, which is a calculation, but then there's a function that sends an email, and that's an action. They're both about email, so I'll put them in the same email library module. Thank you very much.