All about the data lens

In this episode, I introduce the data lens, its questions, and its goal of capturing the relationships among data values.

Transcript

[00:00:00] All about the data lens.

[00:00:03] Hello, my name is Eric Normand, and this is my podcast. Welcome.

[00:00:09] So, as you know, I'm writing a book about executable specifications as an approach for domain modeling. And I'm organizing the content into the lenses. There's seven or eight of them. And each one gives a different perspective on your problem, on your software, so that you can gather more information and make better design decisions.

[00:00:42] And today I'm talking about the data lens.

[00:00:47] The main question in the data lens is what are the relationships between data values? So here's an example. In our coffee shop, we have some sizes. We have small, medium, large, and so these are data values. We could use a string, we could use a keyword, or we could use some other type that our language provides, whatever, an enum, something like that, to have these different values.

[00:01:21] Small, medium, large.

[00:01:24] Likewise, we have three other values. These are the roast. Light roast, medium roast, dark roast. Okay. If we write 'em all down on a piece of paper, we can kind of start grouping them into different categories, different relationships. So the first grouping, the obvious one, is that all the sizes go together and all the roasts go together.

[00:01:57] Now, what is the relationship among these data values? What is the relationship inside the size group? Inside the size, you have to choose one. A cup of coffee can't be small and medium. You have to choose one. And these are exclusive. There's no other ones that you could choose. Small, medium, and large. Choose one.

[00:02:27] I've started calling that relationship an alternative. So the size is an alternative because there's many choices and you have to choose one of them. In this case, three choices, and only one is possible. We see the same relationship with the roast. There's light, medium and dark roasts, and you have to choose one, and so that's an alternative. You choose one of the many possible options.
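A minimal sketch of the alternative relationship, assuming Python's standard `enum` module. The class names `Size` and `Roast` are illustrative and not from the episode:

```python
from enum import Enum

class Size(Enum):
    """An alternative: exactly one of these applies to a cup of coffee."""
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"

class Roast(Enum):
    """A second, independent alternative."""
    LIGHT = "light"
    MEDIUM = "medium"
    DARK = "dark"

# A variable of type Size holds exactly one member, never two at once.
chosen = Size("medium")  # look a member up by its value
```

Asking for a value outside the set, such as `Size("huge")`, raises a `ValueError`, so the "no other ones" part of the relationship is checked as well.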

[00:02:59] But then we're not done because what's the relationship between the values in the size group and the values in the roast group?

[00:03:13] Well, it turns out, if you analyze it, that you need to choose one of each. You choose a size and a roast, so you can get a large medium roast coffee. I'm calling that a combination cuz you're combining the roasts and the sizes and you're getting a new set of possibilities.
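One way to picture the combination relationship is a record with one field per alternative. This is a sketch only; `SizedRoast` is a hypothetical name:

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product

class Size(Enum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3

class Roast(Enum):
    LIGHT = 1
    MEDIUM = 2
    DARK = 3

@dataclass(frozen=True)
class SizedRoast:
    """A combination: one size AND one roast."""
    size: Size
    roast: Roast

# The new set of possibilities is every size paired with every roast.
all_combinations = [SizedRoast(s, r) for s, r in product(Size, Roast)]
print(len(all_combinations))  # 3 sizes * 3 roasts = 9
```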

[00:03:41] This is pretty basic stuff that we do all the time. What I'm trying to do is give it names. And show that there's more advanced things that we could actually be doing that we're not. If we just went back to basics and said, what are we actually doing? Well, what we're actually doing is we're looking at the relationships between the data values.

[00:04:08] So a more sophisticated example is something like all of the add-ins. Because you can have zero or more add-ins. You can use the same add-in multiple times. You can have like two espresso shots, not just one. You can have two shots of hazelnut syrup, whatever flavors you wanna add. And you could also have zero.

[00:04:35] And so now it becomes much more sophisticated. There's a lot more combinations of things you can have. You can have zero, you can have one of the choices, you can have two of the choices, and you can have duplicates. So the combinations become much greater. But still we're doing that same analysis.
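A "zero or more, duplicates allowed" relationship behaves like a multiset. One possible encoding, assuming Python's `collections.Counter` (the `AddIn` enum is an illustrative name):

```python
from collections import Counter
from enum import Enum

class AddIn(Enum):
    ESPRESSO = "espresso"
    HAZELNUT = "hazelnut"
    SOY = "soy"

no_add_ins = Counter()                                  # zero add-ins
double_shot = Counter({AddIn.ESPRESSO: 2})              # same add-in twice
mixed = Counter([AddIn.SOY, AddIn.ESPRESSO, AddIn.SOY])

# Only the counts matter, not the order the add-ins were listed in.
same_mixed = Counter([AddIn.ESPRESSO, AddIn.SOY, AddIn.SOY])
print(mixed == same_mixed)  # True
```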

[00:04:57] What is the relationship among these values and how does it relate to the other values in that same domain?

[00:05:05] One of the things that we're gonna be doing in this lens is giving a bunch of examples of different kinds of relationships that are very common. These aren't an exhaustive set of relationships. Some domains have different kinds of things. We can just look at the common ones.

[00:05:31] Not only do we see the relationships, but we have to choose how are we going to encode it. And so we need to look at our programming language with fresh eyes and come up with all the ways that we have to encode any particular kind of relationship.

[00:05:55] So this alternative relationship, you could use strings. Maybe you do run-time checking. TypeScript lets you have, like, a type that is a set of strings that you could use. I think TypeScript even has a thing that's explicitly called an enum. Java has enums. You know, there's different choices, there's different ways of encoding this, and we need to pick one.
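As a sketch of the "strings plus run-time checking" option: Python's `typing.Literal` is roughly comparable to a TypeScript type made of a set of strings, though Python itself does not enforce it when the program runs, so the check is written by hand. `SizeName` and `parse_size` are hypothetical names:

```python
from typing import Literal, get_args

# Static type checkers understand this; the interpreter does not enforce it.
SizeName = Literal["small", "medium", "large"]

def parse_size(raw: str) -> str:
    """Run-time check: accept only one of the allowed strings."""
    if raw not in get_args(SizeName):
        raise ValueError(f"unknown size: {raw!r}")
    return raw

print(parse_size("small"))  # small
```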

[00:06:27] This lens doesn't really give you that many tools for paring down those options to make the decision easier. It's mostly about seeing what are all the options. The other lenses give you more constraints. Other lenses like operations. Once you start seeing, hey, we're gonna be adding and removing add-ins a lot, like maybe we shouldn't use a list because then it's gonna be a linear time operation, you know, things like that. Maybe that comes up, but it comes up from another lens. This lens is mostly about expanding your ideas about the possibilities for how to encode a thing. The volatility lens tells you whether it needs to be a data value, cuz you have to store it in the database, or it should be something that's much more of a static construct, like a class. You could have a class called small size and a class called medium size that implement the interface of size. Right? Because, you know, well, we're gonna be adding sizes later, but not so often that they need to be like a string in the database. But we wanna have them customized.

[00:07:56] The point is that subclasses or classes that all implement the same interface is a way to implement alternatives, and we can look at our programming language anew and see it that way as this relationship among data values in our domain is the same as the relationship among subclasses of an interface. And so that gives us the opportunity to use it to encode that.
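A rough sketch of alternatives encoded as classes that share an interface, assuming Python's `abc` module. The `ounces` method and its numbers are invented here purely to show per-class customization:

```python
from abc import ABC, abstractmethod

class Size(ABC):
    """The interface; each concrete subclass is one alternative."""
    @abstractmethod
    def ounces(self) -> int: ...

class SmallSize(Size):
    def ounces(self) -> int:
        return 8   # illustrative volume, not from the episode

class MediumSize(Size):
    def ounces(self) -> int:
        return 12  # illustrative volume, not from the episode

# A value of type Size is exactly one of the subclasses.
size: Size = MediumSize()
print(size.ounces())  # 12
```

Adding a size later means adding a class, which fits the case where sizes change rarely but need custom behavior.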

[00:08:33] Now, all this stuff might seem really basic. I'll just be honest: it does seem very basic to me. But when I talk to some people about it, it seems like this is not obvious to them. And so that's a great opportunity for me to write it down and figure out what exactly am I doing when I'm doing this and teach it to them.

[00:09:11] I've had debates with people who were very into object-oriented patterns, you know, the classic design patterns, and they propose things like using decorators to solve this kind of problem. So you have all these add-ins, you have sizes and roasts, and they say, we'll have an interface called coffee. We'll have a class called regular coffee, and that's gonna have a certain size and roast and no add-ins. And then we're gonna have a bunch of wrapper classes, decorators, that implement the same interface. They implement the coffee interface, but they modify the, quote, "behavior." This is where I think that they just went off the rails. There's no behavior being modified except stuff like maybe the price changes when you change the size or the price changes when you do add-ins, but that's not really behavior. That's simply the price. But they would say, well, if you want a large coffee, you would wrap the regular coffee with a new object called large. And if you wanted to put soy milk in it, you would wrap that with a soy milk add-in wrapper. This is how you have this very open and flexible way of modifying coffees without having an explosion of classes, because otherwise you might think you need something like a small, medium roast soy espresso shot class. So you'd need basically every single combination of coffees as a class.

[00:11:22] So to solve this problem of too many classes, because there are a lot of combinations of coffees, you use this wrapper thing. Well, it just sounds bad intuitively to me, and I've had to really dig down deep into my intuition and try to figure how can we objectively say that this is a suboptimal solution, when there's a quite simple one.
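For reference, the wrapper approach described above might look roughly like this. It is a reconstruction of the general decorator pattern, not code from the debates mentioned; the `describe` method is an assumption standing in for whatever the interface would expose:

```python
class Coffee:
    """The shared interface."""
    def describe(self) -> str:
        raise NotImplementedError

class RegularCoffee(Coffee):
    def describe(self) -> str:
        return "regular coffee"

class Large(Coffee):
    """Wrapper (decorator) that marks the inner coffee as large."""
    def __init__(self, inner: Coffee):
        self.inner = inner

    def describe(self) -> str:
        return "large " + self.inner.describe()

class SoyMilk(Coffee):
    """Wrapper (decorator) that adds soy milk to the inner coffee."""
    def __init__(self, inner: Coffee):
        self.inner = inner

    def describe(self) -> str:
        return self.inner.describe() + " + soy milk"

order = SoyMilk(Large(RegularCoffee()))
print(order.describe())  # large regular coffee + soy milk
```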

[00:11:53] The quite simple one is the data modeling approach I've already gone through where you have basically a class called Coffee that has three fields. One the size, one the roast, and one the add-ins, which is some kind of collection. Then you're done, right? You don't have to do all this class hierarchy and wrapper objects. You don't have to do all that stuff, but why is mine better?
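A minimal sketch of that three-field class, assuming enums for the two alternatives and a `Counter` as the add-in collection. The choice of collection is an assumption; the episode only says "some kind of collection":

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum

class Size(Enum):
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"

class Roast(Enum):
    LIGHT = "light"
    MEDIUM = "medium"
    DARK = "dark"

class AddIn(Enum):
    ESPRESSO = "espresso"
    HAZELNUT = "hazelnut"
    SOY = "soy"

@dataclass
class Coffee:
    size: Size                                          # choose one
    roast: Roast                                        # choose one
    add_ins: Counter = field(default_factory=Counter)   # zero or more

order = Coffee(Size.LARGE, Roast.MEDIUM, Counter({AddIn.SOY: 1}))
plain = Coffee(Size.SMALL, Roast.DARK)
```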

[00:12:24] Objectively, there's two things. One, look at the number of combinations that are possible.

[00:12:37] In the data modeling approach, the approach that I like, the number of combinations of coffees that you can construct in your model, that you can code, is very, very close to the number of different coffees you can actually make. And that's the goal. You want to get it close. So we've already said a coffee can't be small and large at the same time. And we've encoded that. And a coffee can't be light roast and dark roast at the same time. We've encoded that. And if we choose our collection correctly, we can ensure that we can have multiple add-ins with the same type. In terms of the number of possibilities, we don't have any possibilities that don't make sense or that maybe are duplicates of other possibilities.

[00:13:43] But when you've got this wrapper object thing, there's nothing in the language or anywhere in the model that says that I can't wrap a large coffee with a small size. So now I have basically conflicting wrappers. One is making it large, one is making it small. There's nothing saying I can't have a light roast and a dark roast at the same time.

[00:14:16] You would have to come up with some rule, like, oh, well the last one you wrap it with is the one that counts. Okay, but still you're able to construct different ways of representing the same coffee. You've just exploded the number of ways you can encode it, but the number of things you can encode is the same.

[00:14:39] So you've got all this ambiguity now. Which is the right way to do it? Do I have to normalize things? You've got an infinite number of times more things that you can encode than things that should be encodable in the model. Okay? That's the first problem.
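To make the ambiguity concrete, here is a small sketch with size wrappers only. The "outermost wrapper wins" rule is the assumed tie-breaker, and the class names are illustrative:

```python
class RegularCoffee:
    def size(self) -> str:
        return "medium"

class Small:
    def __init__(self, inner):
        self.inner = inner

    def size(self) -> str:
        return "small"  # assumed rule: the outermost size wrapper wins

class Large:
    def __init__(self, inner):
        self.inner = inner

    def size(self) -> str:
        return "large"

# Nothing stops conflicting wrappers from being stacked.
a = Large(RegularCoffee())
b = Large(Small(RegularCoffee()))
c = Large(Small(Large(RegularCoffee())))

# Three different object graphs, one coffee from the customer's point of view.
print(a.size(), b.size(), c.size())  # large large large
```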

[00:14:56] The second problem is once you start getting into the operations, which is a different lens, but let's talk about it now because it's fresh. Once you start getting into the operations, yes, maybe this decorator pattern can help you calculate the price, right? And it can maybe help you, like, print out. Remember what our problem is. We went over this in the last one. We're trying to encode an order so that we can print it out and give it to the barista. So that's another operation we need to do, is to print it out. Maybe this can help you print it out, but now I'm starting to think, well, this is actually kind of hard because if I wanna make a coffee that has two soy shots and one espresso shot, there are many ways to encode that now. I can wrap that thing with soy and then wrap it with espresso and then with soy. Like, is that gonna print soy, espresso, soy? Because what I wanted to do is print soy x2 for two soy shots and then espresso. Because I know in my kitchen that the soy comes first, it's on the left, and then the espresso is over here. I don't want them to have to, like, put in the soy, walk over to the espresso machine, pull an espresso shot, and then say, oh look, there's another soy, and go back to the soy station. I don't want them to have to do that. That kind of thing becomes very hard to do because the wrappers are all wrapped up.
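For contrast, here is a sketch of the same printing operation when the add-ins are kept as a multiset. `STATION_ORDER` and `ticket_lines` are hypothetical names, and the position of hazelnut in the station order is a guess:

```python
from collections import Counter

# Assumed kitchen layout: soy on the left, the espresso machine last.
STATION_ORDER = ["soy", "hazelnut", "espresso"]

def ticket_lines(add_ins: Counter) -> list[str]:
    """One entry per add-in kind, in station order, with a count when > 1."""
    lines = []
    for kind in STATION_ORDER:
        count = add_ins[kind]  # a Counter returns 0 for missing keys
        if count == 1:
            lines.append(kind)
        elif count > 1:
            lines.append(f"{kind} x{count}")
    return lines

# The order the add-ins were entered in does not affect the ticket.
print(ticket_lines(Counter(["soy", "espresso", "soy"])))  # ['soy x2', 'espresso']
```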

[00:16:44] How do I remove an espresso shot? Yeah, if it's the last wrapped thing, it's easy. So I guess undo becomes easy cuz you can just unwrap the last thing. But what if I wanna remove the espresso shot when it's three deep? How do I do that? That now seems like, wow, that's gonna be really hard to figure out, because I have to, like, unwrap it and then rewrap. It becomes hard.

[00:17:17] Now we get into stuff like use cases we haven't really discussed, but that are likely to come up like, we actually wanna do accounting on this and figure out how many people use soy. Well, how do I figure out if a thing has soy in it? I have to unwrap, I have to like go through each coffee and unwrap it.
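With add-ins stored as a multiset, removing one and checking for one are both short. This sketch uses `Counter`; the function names are illustrative:

```python
from collections import Counter

def remove_add_in(add_ins: Counter, kind: str) -> Counter:
    """Return a new multiset with one `kind` removed, if any is present."""
    # Counter subtraction drops entries whose count falls to zero or below.
    return add_ins - Counter([kind])

def has_add_in(add_ins: Counter, kind: str) -> bool:
    """True when at least one `kind` is in the order."""
    return add_ins[kind] > 0

order = Counter(["soy", "espresso", "soy"])
without_espresso = remove_add_in(order, "espresso")

print(has_add_in(order, "soy"))                  # True
print(has_add_in(without_espresso, "espresso"))  # False
```

Neither operation depends on the order in which the add-ins were entered, so there is no "three deep" case to handle.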

[00:17:42] How do I store this object onion thing, this layers of wrapped objects, in the database? There's all these problems that come up. To me, it's just not a good solution. The decorator pattern is often a good solution. It's just not a good solution for this, and especially when you have something as straightforward as the data model.

[00:18:10] Now when I bring this up, they're like, well, data modeling is not easy. It's not straightforward. And so that's why I think I need to spend a good amount of time discovering how to data model from my own experience, and try to find experience from others and teach it, right? The challenge is to teach it because I don't think it's that hard and I think it's missing.

[00:18:37] I wanna include in this lens the question about the analysis about states that you can represent. How many ways can you encode something? How many things look different from an encoding perspective versus how many things can you actually encode in your model, right?

[00:19:01] So there's this simple analysis. You can just count the states, like that's the first level. Just count them. How many states can we encode with this decorator pattern thing? It's like infinite, cuz you can always wrap again. There's nothing stopping you. And how many things can we represent in our model? I haven't done the math, it depends on how many add-ins you have, but it's like on the order of thousands. There's an obvious difference there. So just the count can often give you good information about whether your data model is decent.
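The count for the data model can be worked out once the menu is fixed. The numbers below are assumptions chosen only for illustration: three add-in kinds and at most five of each per cup.

```python
from itertools import product

sizes = ["small", "medium", "large"]
roasts = ["light", "medium", "dark"]
add_in_kinds = ["espresso", "hazelnut", "soy"]  # assumed menu
max_per_add_in = 5                              # assumed cap per cup

# Each add-in kind independently takes a count from 0 to max_per_add_in.
add_in_states = list(
    product(range(max_per_add_in + 1), repeat=len(add_in_kinds))
)

total = len(sizes) * len(roasts) * len(add_in_states)
print(total)  # 3 * 3 * 6**3 = 1944
```

Under these assumptions the model has a finite number of states, while a wrapper chain has no upper bound because another wrapper can always be added.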

[00:19:43] The second one is a little bit more sophisticated. You look at the Venn diagram. What states can I represent in this model? What states can I not represent in this model, 'cause I've left something out? Like, for instance, maybe you've left out that sometimes you can have an error in your system. And oh, there's no way to represent that we got an error. Right? You forgot that. So now that goes in that part of the Venn diagram.

[00:20:17] And then there's stuff that you can represent that doesn't mean anything or that duplicates something that's already meaningful. In this case we have stuff like what's the difference between a coffee with espresso and soy and a coffee with soy and espresso? They're the same coffee. The customer doesn't care, but somehow you're able to encode them differently. That shouldn't be.

[00:20:44] So there's these three sets. There's the overlap, the good stuff. You wanna maximize that. And then you wanna minimize the other two. So you want as close as possible, a one-to-one fit between the actual domain and the relationships within that domain that we've analyzed and the encoding.
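The three sets can be computed for a toy case. In this sketch, `fit` is a hypothetical helper, the domain is a tiny set of add-in orders, and the encoding is deliberately flawed: it uses ordered tuples and leaves one order out.

```python
def fit(domain: set, encodings: dict) -> tuple[set, set, set]:
    """Split a model into the three regions of the Venn diagram.

    `encodings` maps each encodable state to the domain value it means,
    or to None when it means nothing.
    """
    covered, extra = set(), set()
    for state, meaning in encodings.items():
        if meaning in domain and meaning not in covered:
            covered.add(meaning)   # the good overlap
        else:
            extra.add(state)       # meaningless or duplicate encoding
    missing = domain - covered     # domain values with no encoding at all
    return covered, missing, extra

domain = {"plain", "soy", "espresso", "soy+espresso"}
encodings = {
    (): "plain",
    ("soy",): "soy",
    ("soy", "espresso"): "soy+espresso",
    ("espresso", "soy"): "soy+espresso",  # same coffee, second encoding
}

covered, missing, extra = fit(domain, encodings)
print(missing)  # {'espresso'}
print(extra)    # {('espresso', 'soy')}
```

A good fit has `missing` and `extra` both empty, which is the one-to-one match between domain and encoding.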

[00:21:06] So let me summarize the things in this lens.

[00:21:10] One is this analysis of the relationships among data values.

[00:21:15] There's gonna be a bunch that are common and easy to spot. If you happen to see it, like, boom, I know that's an alternative. There are other kinds of relationships that you're gonna have to custom build, but there's gonna be a bunch that are so common you should be able to spot them right away.

[00:21:40] Once you've done that analysis, now you need to analyze your programming language and look for constructs that either already have exactly that relationship or allow you to build that relationship easily. You're gonna look at the ways that you can make different data types, the ways that you can build collections and maybe make a new data type that represents that relationship exactly. But look at your language with fresh eyes and see the relationships. And then try to find a construct that exactly matches the relationship or closely matches. They're not always gonna be exact, but closely matches the relationship among your data values in the domain.

[00:22:31] And then finally, how do you analyze the number of possible encodings that you have, compare it to the things in your domain, and use that as a kind of a quality metric to help you decide what kinds of encodings you should be doing or whether you're on track. Other lenses can also help you pare down the number of choices you have, like maybe strings in the database won't work for your situation because of some operation thing that you'll find later in a different lens.

[00:23:07] Okay. My name is Eric Normand. This has been another episode of my podcast. Thanks for listening, and as always, rock on!