The Art of Domain Modeling

I was delighted to speak at Re:Clojure 2021. Here is the recording of my talk.

Transcript

Hello, my name is Eric Normand, and I'm going to be speaking about the art of domain modeling. I want to thank you for attending my talk, and I also want to thank the organizers for putting on this conference. I know it's a lot of work.

I believe that modeling the world is a human way to understand how things work. Since the beginning of humanity, we have created models. We imagine what an animal might do so that we can better hunt it. We tell stories about situations so we can learn from experiences we never had. We use science to develop models of many aspects of the world, and now we have computers to build and test models very quickly.

Why do we model? Models help us see further. We can only experience the present. Models place the present in perspective with the past and the future. For instance, if it's raining today, we can place that rain in a larger cycle, most of which we can't easily experience directly. Models help us learn and understand, cure disease, and feed more people. And from a capitalist, or maybe just economic, standpoint, better models help us create more value and lower costs.

One way software adds value to a business is through new features, so we can translate "more value" as "new features." And since maintenance is the most expensive cost of software, the primary way to lower cost is to require less maintenance. Unfortunately, both the speed at which we can add features and the cost of maintaining the existing features are unknown until after we build them. We can estimate, but we get the estimates wrong. In business, we call this a lagging indicator: you can't control the metric directly, because by the time you measure it, it's too late to change it. So intelligent people in the software industry have developed a leading indicator for maintenance cost and feature speed. They call it software design. Software design is a bunch of rules and advice. Some of it is measurable, things you can use as a proxy for cost and speed.
For example, software design might recommend functions of no more than five lines of code, meaningful names, and low coupling with high cohesion. The argument goes that if your code is messy and hard to understand, it will be harder to maintain and modify. I believe that. It makes sense to me. But why does our code get so messy in the first place? Is it because we're lazy, or under intense pressure from management? Maybe. But I also believe that good software designers have an intuition for modeling that they don't know how to express. They can give us rules of thumb, but the software those designers write isn't just short functions and good names. There's something more that they're not telling us, because it's very hard to pin down what their expertise is. Let me explain.

I want to tell a story. Maybe it's a fairy tale, but I think it has some truth to it. Here's a diagram showing the geocentric model of the universe. It's neat and tidy. Everything is named well. The lines are all clear. You can follow them all around the circle. And it works pretty well: at any point on the calendar, you can figure out where the planets will be, mostly. But the diagram is complicated. There are all these curlicues and crossing lines, and if you want more accuracy, you need more curlicues. A chart like this requires lots of maintenance.

There are three programmers. Management goes to the first one and says, "We need a new feature. Please add another planet." The programmer responds, "Well, that's going to take a long time. Can't we clean it up first?" So management asks the second programmer, "Can't you clean it up a little?" She responds, "I've been to the clean diagramming workshops. We use all the best practices, like sharp pencils and clear labels. We could get rid of some lines, but then we'd lose accuracy." Finally, management hires a software design consultant and asks her, "Please help us with our technical debt.
We can't add new features quickly, and it costs a lot to maintain our diagram." The designer says, "Well, what if we put the sun in the middle?" The programmers rejoice. The diagram is easier to maintain, it's easier to add planets, and management is very happy.

I hope the moral of this story is clear: the model makes a huge difference. I've seen this happen, not with this particular model, but I've seen it. A team follows a software designer's rules, but when that designer sees the code, they think, "Wait, there's a better design than this." And the team says, "But we followed your rules. We've got short methods. We've named everything well. What else could we do?" It's a bit like talking about how sharp your pencil was when you drew the lines, or how good the labels on your diagram are.

One of the issues is that most software design advice is about the code: long methods, too many classes. The advice fails to look at the system you are modeling. It never even mentions that the thing at the center might make a difference, whether it's the sun or the earth. It only talks about what your code looks like, and I think that's the main problem. I believe that the quality of the model is a leading indicator for software design. A better model leads to better software, which leads to lower maintenance costs and faster features.

So I've taken it upon myself to figure out how to make better models. I think it's stuff that people already know and already do, that you gain with lots of experience, and some people are better at it than others. I want to figure out what they're doing, write it down, systematize it, and make it accessible to others. I'm still very early in that process. The ideas I'm going to introduce in this talk are not 100% done yet, and they all follow one principle: iterate early, while it's cheap.
Another way to put it: let's focus on the domain complexity and get it right before we add in all the other stuff, like architecture, language, and implementation, that adds complexity of its own. So let's work on just the domain first.

To do that, we build a domain model. A domain model is an abstract specification of how a small part of the world works. That small part of the world is the domain. Abstract means it leaves out unnecessary details, like what database it runs on or what language you write it in. And specification means it has to be complete and correct.

Domain models have three parts. There's information, the relevant information that you need to capture. There are operations on that information. And there are invariants about what must always be true within the model. We're going to go over these in a little more detail.

At the risk of being too concrete, I've divided the modeling process into three phases. They go from more abstract to more concrete, and they're in this order because, as I said before, if you iterate early, while you're still in an abstract phase, it's cheaper. If you learn something in the abstract phase, the cost of that learning is very low, but its value is still high. Once you're in implementation, what you learn is harder to act on; it becomes costlier to re-implement that learning. It's not a strict ordering of phases. You can also think of them as different perspectives you can take at different times. You can move around between them; you don't have to move strictly forward.

In the first phase, the abstract phase, you work out what your operations will be in an abstract way. You don't implement them. You work from the signatures. We're going to use the example domain of a pizza shop. In this abstract phase, your operations are just going to be function signatures.
That means just the name, the arguments, their types, and the return type. Here I'm using Clojure. Notice I don't have any implementations; it's just the first line of each function. Since Clojure isn't statically typed, I've put the return type in a comment afterward, and the argument names imply their types, because there are no static type annotations to put there. If you're using something like Haskell or TypeScript or Java, you have types, so use them. They're a very nice way of working with just the function signatures.

I've given two example invariants; you're going to have many more. Notice that these invariants talk about the behavior of the operations, especially how they work together, and notice that we're already answering a lot of questions about the model with just these two.

For example, the first invariant answers the question, can we have toppings with no cost? It says no: for all pizzas and all toppings, the cost of a pizza is less than the cost of that same pizza with a new topping t. In other words, toppings always add cost to the pizza, which is a reasonable assumption to make in your model. Other assumptions are also reasonable. You could say the cost is less than or equal, because some toppings are free. Or you could say some toppings reduce the cost of a pizza, and then you couldn't say much at all about toppings in general. The second invariant says that if you add a topping and then remove that topping, it's a no-op: you get the same pizza as before. That makes sense.

For the information in this phase, I like to keep it very minimal and just make notes about which concepts we need to track.
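The talk's slides show these signatures in Clojure. As a rough analogue only, here is a Python sketch in the same signature-only style. The operation names `make_pizza`, `add_topping`, `remove_topping`, and `pizza_cost` are hypothetical, chosen for illustration; the type hints play the role of the signatures, and the two invariants are written as predicates that any later implementation can be checked against.

```python
from typing import Any, Protocol

# Placeholder aliases: the abstract phase names the concepts
# without choosing a data representation for any of them.
Pizza = Any
Topping = Any
Size = Any
Cost = Any  # the first invariant assumes costs can be compared with <


class PizzaOps(Protocol):
    """Signatures only; in the abstract phase there are no bodies yet."""

    def make_pizza(self, size: Size) -> Pizza: ...

    def add_topping(self, pizza: Pizza, topping: Topping) -> Pizza: ...

    def remove_topping(self, pizza: Pizza, topping: Topping) -> Pizza: ...

    def pizza_cost(self, pizza: Pizza) -> Cost: ...


# Invariants, stated against the signatures. They should hold for every
# pizza and every topping, whatever implementation is plugged in later.
def toppings_add_cost(ops: PizzaOps, pizza: Pizza, topping: Topping) -> bool:
    """Adding any topping strictly increases the cost."""
    return ops.pizza_cost(pizza) < ops.pizza_cost(ops.add_topping(pizza, topping))


def add_then_remove_is_noop(ops: PizzaOps, pizza: Pizza, topping: Topping) -> bool:
    """Adding a topping and then removing it gives back the same pizza."""
    return ops.remove_topping(ops.add_topping(pizza, topping), topping) == pizza
```

Nothing here builds a pizza yet; the predicates only become checkable once some implementation of `PizzaOps` exists.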
In my notes, I've written down that we need a notion of cost, because the first invariant compares costs; a notion of size; the pizza and the topping, of course; and equality for pizzas and less-than for costs. I could write those into the operations, but I've just kept them as general notes to myself. We don't want a data model yet; it's too early. We just want to work at this high, abstract level, with the operations and how they relate to each other.

The primary measure of a domain model is fit: how well the model models the phenomena in the domain. One question you might ask at this point to judge fit is, can I make every kind of pizza in the model that I want to make in real life? Then you can ask whether it's simple. That's a much more aesthetic quality. You can ask, is it simpler with the sun in the middle or with the earth in the middle? What about with Venus in the middle? In terms of pizza, do we want to limit the number of toppings? You can imagine a pizza with 200 pounds of mushrooms; is that reasonable to make? Probably not. But that's a business concern, and maybe we can solve it at another layer, on top of this one, so we're going to push that concern out of this model.

In the next phase, the runnable specification phase, we build a very simple implementation of our operations so that we can run scenarios and learn from them. We implement "cheat" functions: we take all of our function signatures and implement them with in-memory representations, reusing whatever operations our language already gives us. We just do it really simply, with whatever we have at hand. What we don't want to do at this point is bring in stuff like a database or make Ajax calls.
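A minimal sketch of such a cheat implementation might look like this in Python, assuming hypothetical size and topping prices in cents (so costs stay exact integers). Everything lives in memory and uses only built-in tuples, dicts, and a frozen dataclass.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical price lists, in cents.
SIZE_PRICES: Dict[str, int] = {"small": 800, "medium": 1000, "large": 1200}
TOPPING_PRICES: Dict[str, int] = {"mushrooms": 150, "peppers": 100, "spinach": 125}


@dataclass(frozen=True)
class Pizza:
    size: str
    toppings: Tuple[str, ...] = ()  # kept sorted so topping order doesn't matter


def make_pizza(size: str) -> Pizza:
    if size not in SIZE_PRICES:
        raise ValueError(f"unknown size: {size}")
    return Pizza(size)


def add_topping(pizza: Pizza, topping: str) -> Pizza:
    # A tuple, not a set, so a topping can be added twice (double mushrooms).
    return Pizza(pizza.size, tuple(sorted(pizza.toppings + (topping,))))


def remove_topping(pizza: Pizza, topping: str) -> Pizza:
    items = list(pizza.toppings)
    if topping in items:
        items.remove(topping)  # drops one portion; the tuple stays sorted
    return Pizza(pizza.size, tuple(items))


def pizza_cost(pizza: Pizza) -> int:
    return SIZE_PRICES[pizza.size] + sum(TOPPING_PRICES[t] for t in pizza.toppings)
```

With these prices, `pizza_cost(add_topping(make_pizza("medium"), "mushrooms"))` returns 1150. The sorted tuple is a deliberate choice: taken together, the two invariants imply that a topping can be added to a pizza that already has it. With set semantics, that second addition would cost nothing extra, and removing the topping afterward would not give back the original pizza. That is exactly the kind of question a runnable specification surfaces early.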
Databases and network calls add complexity that it's too early to build in. We want something very easy to work with and well understood. It works in memory, so you can run it a million times if you need to.

Our invariants become tests. Here I've written a property-based test that generates pizzas: for all pizzas and all toppings, if we add the topping and then remove it, we get the same pizza. It's the same invariant as before. For the information, we just apply basic data modeling, which is a whole book in itself, and we don't have time to get into that.

You'll learn a lot in this phase. You can try to build pizzas. You can tweak the spec when you learn something, like, "Oh, I wanted to build this pizza, and I couldn't figure out how to make it with the operations I had." Then you can go back, modify your operations, change the invariants, and so on.

Once you've learned all you want to learn, you can start to add non-domain concerns. Just make sure they're still abstract. I've put this in the same phase; you could call it a different phase, or a part of this one. For example, if you already know it's going to run in the browser, you know you'll have to make Ajax calls to the server. Ajax calls to a server are very concrete, but asynchrony is an abstract idea that you can start building in and testing without going down to the concrete level. Same thing with concurrency: you don't know yet whether it's going to run on multiple cores, but you can build in concurrency. And if you know that things will need to change over time, you build in mutability. This is the phase where you might say, "Let's use event sourcing." Event sourcing is a very abstract idea, just a log of events, so it fits into this phase.
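The talk's property-based test is written in Clojure. To keep this sketch dependency-free, here is a hand-rolled randomized check in Python rather than a real property-based testing library such as Clojure's test.check or Python's Hypothesis (both of which also shrink failing cases down to small examples). It carries its own compact copy of a hypothetical in-memory model; the generator and helper names (`gen_pizza`, `gen_topping`, `for_all`) are assumptions for illustration.

```python
import random
from typing import Callable, Optional, Tuple

# Compact copy of a hypothetical in-memory model so this block runs on its own.
SIZE_PRICES = {"small": 800, "medium": 1000, "large": 1200}
TOPPING_PRICES = {"mushrooms": 150, "peppers": 100, "spinach": 125}
Pizza = Tuple[str, Tuple[str, ...]]  # (size, sorted toppings)


def add_topping(pizza: Pizza, topping: str) -> Pizza:
    size, toppings = pizza
    return (size, tuple(sorted(toppings + (topping,))))


def remove_topping(pizza: Pizza, topping: str) -> Pizza:
    size, toppings = pizza
    items = list(toppings)
    if topping in items:
        items.remove(topping)
    return (size, tuple(items))


def pizza_cost(pizza: Pizza) -> int:
    size, toppings = pizza
    return SIZE_PRICES[size] + sum(TOPPING_PRICES[t] for t in toppings)


def gen_topping(rng: random.Random) -> str:
    return rng.choice(sorted(TOPPING_PRICES))


def gen_pizza(rng: random.Random) -> Pizza:
    pizza: Pizza = (rng.choice(sorted(SIZE_PRICES)), ())
    for _ in range(rng.randint(0, 5)):  # zero to five toppings, doubles allowed
        pizza = add_topping(pizza, gen_topping(rng))
    return pizza


def for_all(prop: Callable[[Pizza, str], bool],
            trials: int = 1000, seed: int = 0) -> Optional[Tuple[Pizza, str]]:
    """Check prop on random (pizza, topping) pairs; return a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        pizza, topping = gen_pizza(rng), gen_topping(rng)
        if not prop(pizza, topping):
            return pizza, topping
    return None


def add_then_remove_is_noop(pizza: Pizza, topping: str) -> bool:
    return remove_topping(add_topping(pizza, topping), topping) == pizza


def toppings_add_cost(pizza: Pizza, topping: str) -> bool:
    return pizza_cost(pizza) < pizza_cost(add_topping(pizza, topping))


print(for_all(add_then_remove_is_noop))  # prints None: no counterexample found
print(for_all(toppings_add_cost))        # prints None: no counterexample found
```

A fixed seed keeps runs reproducible; a real library would also vary inputs across runs and report a minimal failing pizza.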
You want to think about all this stuff now, before things get complicated and you have to deal with actual deployments.

Then you have the implementation phase. You take your model and build it for real in your language and architecture. You already have tests, so it should work the first time. Well, yeah, right, I'm joking, but at least you'll know how it fails, because you have all those tests. You also have the runnable specification, which can act as an oracle that tells you how your real implementation should behave, so you can build tests using it.

There are many ways to implement invariants, and it really depends on your language. If you have types, use them. A lot of language features have their own built-in invariants. For instance, the synchronized keyword in Java means that at most one thread at a time can be running synchronized methods on the same object, even when they're invoked from multiple threads. Tests are good, especially property-based tests. Data structures can maintain invariants; for instance, sets don't have duplicates. You can also do runtime checks if you have to. And your brain can work out simple invariants too: you can write down proofs, or put the invariants in documentation. Just remember that invariants that aren't checked by your deployment pipeline don't work when you're not there, so you have to make sure you get them right.

There's a lot to say about data modeling, but I want to bring up the idea that you can analyze the domain independently of the language features you're going to use to implement it, and point out some of the things you'll see in your domain. Alternatives mean a choice among several possibilities. For instance, each ingredient is a choice: mushrooms or peppers or spinach. Combinations mean choosing many things, so they have a multiplicative effect when you count how many possibilities there are.
A pizza can have several ingredients, so it's a kind of combination. If your pizza has three ingredients and there are ten to choose from, there are ten choose three, or 120, possible pizzas, counting the same ingredients chosen in a different order as the same pizza. Each ingredient has a cost, so you're mapping ingredients to dollar amounts; that's a mapping. And of course there are quantities, like the dollar amount; names, like the names of the toppings; and collections of things. For instance, an order is a collection of pizzas. You can look at the domain and find those things, and later you map them to features in your language or database, however you're going to implement it.

When you're doing that mapping, you have to take into account one more high-level issue: how often does a thing change? I'm calling that volatility. If a thing never changes, you can just hard-code it. I'm calling that closed. We might bet that we will always have three sizes of pizza, small, medium, and large, so we code them that way, as an enum in Java or something like that. If we ever find out we're wrong, say we want a fourth size, extra large, then we have to modify the enum, change the code, and do a new release. It's rare enough that we'll just eat that cost. It's not that big a deal. That's closed: you just have to change the code. However, we know that we run different discounts monthly, so we don't want to have to modify the existing code to add new types of discounts, because that could introduce bugs. Instead, we want to be able to add new code without touching old code.
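As an illustrative Python sketch (the sizes, topping names, and prices are hypothetical), here is one way the categories above might map onto language features, with the pizza sizes treated as closed. It uses a set of distinct toppings to match the ten-choose-three count; allowing doubles would call for a multiset instead.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from math import comb
from typing import Dict, FrozenSet, List


class Size(Enum):
    """Alternative, and closed: a fourth size means editing this enum and releasing."""

    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


# Names: toppings are identified by name. Mapping: each name has a cost.
# Quantity: money is an exact Decimal rather than a float.
TOPPING_COST: Dict[str, Decimal] = {
    "mushrooms": Decimal("1.50"),
    "peppers": Decimal("1.00"),
    "spinach": Decimal("1.25"),
}


@dataclass(frozen=True)
class Pizza:
    size: Size
    toppings: FrozenSet[str]  # combination: several distinct choices, order irrelevant


@dataclass
class Order:
    pizzas: List[Pizza]  # collection: an order holds many pizzas


def toppings_cost(pizza: Pizza) -> Decimal:
    return sum((TOPPING_COST[t] for t in pizza.toppings), Decimal("0"))


# Combinations multiply: 3 distinct toppings out of 10 gives C(10, 3) pizzas.
print(comb(10, 3))  # 120
```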
One way to add new discount types without touching old code is to implement a subclass in Java; in Clojure you might use multimethods. I'm calling that open. Finally, we know that ingredients can change daily. You can run out of mushrooms one day, or find a new vegetable at the market that you want to offer as an ingredient, and it's only available this week. We don't want to do a daily release just to say we ran out of mushrooms, so we want ingredients to be runtime values, first-class values that we can store in a database and modify at runtime.

All right, there's a ton more to talk about, but I think this gives a pretty good taste of it. I hope it whets your appetite. If you would like to follow along with my exploration of these ideas, please listen to my podcast and get on my newsletter. I discuss these ideas there. I'm also interested to hear your thoughts on domain modeling. Thank you very much.
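The open and runtime levels of volatility described in the talk might be sketched in Python as follows. This is an assumption-laden illustration, not the talk's code. A dict of discount rules keyed on a hypothetical `kind` field stands in for a Clojure multimethod; Python's `functools.singledispatch` dispatches only on the type of the first argument, so a keyed registry is closer here. `IngredientCatalog` is a hypothetical in-memory stand-in for ingredient data that would really live in a database.

```python
from decimal import Decimal
from typing import Callable, Dict, List

# Open: discount kinds dispatch through a registry, loosely like a Clojure
# multimethod keyed on :kind. All names here are hypothetical.
DiscountRule = Callable[[dict, Decimal], Decimal]
DISCOUNT_RULES: Dict[str, DiscountRule] = {}


def discount_rule(kind: str) -> Callable[[DiscountRule], DiscountRule]:
    """Register a rule for one kind of discount without editing existing rules."""
    def register(fn: DiscountRule) -> DiscountRule:
        DISCOUNT_RULES[kind] = fn
        return fn
    return register


def apply_discount(discount: dict, total: Decimal) -> Decimal:
    # Raises KeyError for a kind nobody has registered yet.
    return DISCOUNT_RULES[discount["kind"]](discount, total)


@discount_rule("percent-off")
def _percent_off(discount: dict, total: Decimal) -> Decimal:
    return total * (1 - Decimal(discount["percent"]) / 100)


@discount_rule("amount-off")
def _amount_off(discount: dict, total: Decimal) -> Decimal:
    return max(Decimal("0"), total - Decimal(discount["amount"]))


# Runtime: ingredients are plain data that can change while the program
# runs, with no code change or release.
class IngredientCatalog:
    def __init__(self, prices: Dict[str, Decimal]) -> None:
        self._prices = dict(prices)

    def available(self) -> List[str]:
        return sorted(self._prices)

    def add(self, name: str, price: Decimal) -> None:
        self._prices[name] = price

    def sold_out(self, name: str) -> None:
        self._prices.pop(name, None)

    def price(self, name: str) -> Decimal:
        return self._prices[name]


catalog = IngredientCatalog({"mushrooms": Decimal("1.50"), "peppers": Decimal("1.00")})
catalog.sold_out("mushrooms")  # ran out today: a data change, not a release
print(catalog.available())     # ['peppers']
```

A new monthly promotion would be one more `@discount_rule(...)` function, possibly in a new module, with no edits to the existing rules; the sizes, by contrast, stay closed.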