Collections in domain models

When do we use collections in domain models, and how do we think about the states they represent?

Transcript

[00:00:00] How should we use collections in a data model? Hello, my name is Eric Norman and this is my podcast. Welcome. I'm doing another car episode today.

[00:00:19] So I've been working on my book, and I've got a whole table of contents now. Obviously it might change, but it's good for now. And one of the things that I need to talk about is collections in a data model. So what am I talking about? We've been using the example of pizza. I think it's a pretty good system to work with because

[00:00:56] it's a domain that we all understand pretty well. We've [00:01:00] all had a pizza before, and so it lets us jump straight into the encoding part of data modeling, which is where you already know your domain. You already have your concepts pretty clear. You just need to write it in code. Okay? So in a pizza, you can have multiple toppings.

[00:01:26] In the example we're using, you can have up to three toppings, and there are four choices of toppings. So how do you represent that? Well, one thing you can do is break it down by how many states are possible in this model. Right? You have a set of possible toppings for your pizza, and then you have this other choice of how many toppings you have.

[00:01:58] So one way to break it [00:02:00] down, to see how many possible topping combinations there are, is to do it as two choices. The first choice is how many toppings you want. Do you want zero, one, two, or three? Okay. And that's an alternative. You can't have three and two, so you have to choose one of those. And we talked about this already: an alternative means it's a sum type, which means you're gonna add up the different sub-components.

[00:02:40] All right, so once you've made this choice of how many ingredients you want, then you have the choice of which ingredients those are, but those choices depend on how many you chose in the first step. So if you chose zero toppings, you just [00:03:00] want a plain pizza, then you have no choices to make.

[00:03:05] So there's only one state; you've already decided it. If you have one pizza, or sorry, not one pizza, if you have one topping, you have one choice to make: which topping is it? Right? So if there are four choices, that means there are four possible states there.

[00:03:28] And just to keep track, we had one state from zero, and we have four states from one ingredient. And when you get to two ingredients, now you have two choices to make, each one out of four. And they're in combination, right? They're not alternatives. You have to choose both: the first topping and the second topping.

[00:03:56] And so that gives you a product type. So [00:04:00] you're gonna do four times four. That gives you 16. And please forgive me if this is maybe too detailed or too obvious. This is actually not obvious to a lot of people, and I want to be able to explain it to them. If you're already able to do this, then this should just be a review.

[00:04:24] Okay. So then if you have three, you're gonna have four times four times four. And so then you add those all up and you get the number of possible states. And this is ignoring some little details that we can get to in another episode. Basically there's the problem of, well, if I choose olives and mushrooms and artichoke hearts,

[00:04:57] is that different from artichoke hearts, [00:05:00] olives, and mushrooms? We've counted them twice, right? We counted it as two different possible states to be in when it's basically the same pizza, right? No one cares, when you're eating the pizza, whether the mushrooms are on top or what. There's no order to the toppings.
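The counting described so far can be checked with a short sketch. It assumes Python, and since the episode names only three toppings, "pepperoni" is a made-up fourth choice. The names `TOPPINGS`, `MAX_TOPPINGS`, and `states_per_count` are illustrative only. Like the spoken walkthrough, it treats different orderings of the same toppings as different states.

```python
from itertools import product

# Three names come from the episode; "pepperoni" is an assumed fourth choice.
TOPPINGS = ["olives", "mushrooms", "artichoke hearts", "pepperoni"]
MAX_TOPPINGS = 3

# How many toppings you want is an alternative, so those counts get added.
# Within one count, each slot is an independent choice, so they multiply.
states_per_count = {
    n: len(list(product(TOPPINGS, repeat=n)))
    for n in range(MAX_TOPPINGS + 1)
}
total_states = sum(states_per_count.values())

print(states_per_count)  # {0: 1, 1: 4, 2: 16, 3: 64}
print(total_states)      # 85
```

So under these assumptions the simplified model has 1 + 4 + 16 + 64 = 85 states.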

[00:05:20] So we haven't dealt with that, and I want to deal with that in a separate episode because it's a complication. Okay. So in this simplified model, where those pizzas are actually different, we have this number of states. Now, you could represent this the way I did when talking about it, as a choice: an alternative of the number of toppings, and then [00:06:00] some combination of choices of topping.

[00:06:08] Right. So it's like this nested thing where if you choose zero, there's a combination of zero things. If you choose one, there's a combination of one thing. If you choose two, it's a combination of two things. You could do that, and you could precisely model exactly the kinds of states that are possible, and stop at three.
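One way to picture that nested model is the following sketch. It assumes Python dataclasses, and every name in it (`Topping`, `NoToppings`, `OneTopping`, `TwoToppings`, `ThreeToppings`, `chosen_toppings`) is made up for illustration. "Pepperoni" is again an assumed fourth topping.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List, Union


class Topping(Enum):
    OLIVES = "olives"
    MUSHROOMS = "mushrooms"
    ARTICHOKE_HEARTS = "artichoke hearts"
    PEPPERONI = "pepperoni"  # assumed fourth choice, not named in the episode


@dataclass(frozen=True)
class NoToppings:
    pass


@dataclass(frozen=True)
class OneTopping:
    first: Topping


@dataclass(frozen=True)
class TwoToppings:
    first: Topping
    second: Topping


@dataclass(frozen=True)
class ThreeToppings:
    first: Topping
    second: Topping
    third: Topping


# The alternative: a pizza's toppings are exactly one of these four shapes.
Toppings = Union[NoToppings, OneTopping, TwoToppings, ThreeToppings]


def chosen_toppings(toppings: Toppings) -> List[Topping]:
    """Read the choices back out in order, whichever shape was picked."""
    return [getattr(toppings, f.name) for f in fields(toppings)]


plain = NoToppings()
veggie = TwoToppings(Topping.OLIVES, Topping.MUSHROOMS)
print(len(chosen_toppings(plain)))   # 0
print(len(chosen_toppings(veggie)))  # 2
```

With these shapes there is no way to write down a pizza with four toppings. The cost is that even reading the toppings back out needs a hand-written helper.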

[00:06:37] But usually what we do, when we have a relationship like this, this kind of multiple-choice thing, is use a collection. A collection already has a length as part of it, like a [00:07:00] size of the collection, and it can hold those toppings, the choices that you make, kind of embedded in it. So if you need to choose three ingredients, well, you just make an array of three different ingredients.

[00:07:20] It already has easy accessors for getting at each of those three. You can just index into the array, you can get the length, and so it's much easier to just use something that exists already. There's a whole bunch of functions already written for dealing with arrays, and so why reinvent this? Arrays handle it just fine.

[00:07:49] Now, one reason you might not wanna use arrays is that they can do so much more than what [00:08:00] your needs are, right? So in our pizza case, we only want up to three ingredients, and arrays can hold many times more than that. And so now you have this new problem: somewhere in your code,

[00:08:17] you have to limit the number of ingredients that go in, right? You have to have some way, some runtime check,

[00:08:31] so that you know you don't have an array of four toppings, because then you've broken your invariant and you've broken your model. You've escaped outside of your model, so you need something else outside of that. Also, there are other things, like, arrays can concatenate.

[00:08:55] Right? Or you can just do things to them that are not [00:09:00] really part of your model yet. There's no notion in your pizza model of concatenating two sets of ingredients into a new set of ingredients. That doesn't exist, right? So you also have to start limiting operations that already exist for arrays, you know, through discipline or something, like, well, we're not gonna do this.

[00:09:33] We're not gonna call sort, we're not gonna call reverse. You have to just start limiting what is possible to be done on this array when you're using it as a list of toppings. And so those are the reasons you might not do it, but those limits are actually pretty easy to enforce. And if you kind of wrap up [00:10:00] the pizza operations, the operations on a pizza, like adding an ingredient, removing an ingredient, and just say, well, we're just gonna treat it not like an array, but like this topping list, if you

[00:10:18] wrap them up and you focus while you're doing it and you just get it right, you test it well, whatever, you can avoid other places in the code that are gonna treat it like an array and do whatever they want to it, which might not make sense in your pizza domain. Okay? So that is, when we have a kind of one-to-many relationship, we often jump straight into collections.
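A minimal sketch of that wrapping, assuming Python: the class name `ToppingList`, its methods, and the constants are hypothetical, and "pepperoni" is an assumed fourth topping. The collection lives inside the wrapper, and only the pizza operations (add, remove) plus reading are exposed, so there is no sort, reverse, or concatenate to misuse.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple

MAX_TOPPINGS = 3
# Three names come from the episode; "pepperoni" is an assumed fourth choice.
ALLOWED_TOPPINGS = frozenset(
    {"olives", "mushrooms", "artichoke hearts", "pepperoni"}
)


@dataclass(frozen=True)
class ToppingList:
    """A pizza's toppings. The tuple inside is a private detail."""

    _items: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Runtime check: never allow a state outside the model.
        if len(self._items) > MAX_TOPPINGS:
            raise ValueError(f"a pizza has at most {MAX_TOPPINGS} toppings")

    def add(self, topping: str) -> "ToppingList":
        if topping not in ALLOWED_TOPPINGS:
            raise ValueError(f"unknown topping: {topping}")
        # Building the new list re-runs the size check above.
        return ToppingList(self._items + (topping,))

    def remove(self, topping: str) -> "ToppingList":
        if topping not in self._items:
            raise ValueError(f"topping not on this pizza: {topping}")
        index = self._items.index(topping)  # first occurrence only
        return ToppingList(self._items[:index] + self._items[index + 1:])

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[str]:
        return iter(self._items)


pizza = ToppingList().add("olives").add("mushrooms")
print(list(pizza))  # ['olives', 'mushrooms']
```

Each operation returns a new `ToppingList` rather than changing the old one, which is one possible design choice. A mutable wrapper with the same checks would serve the same purpose.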

[00:10:55] There are different kinds of collections, and that's kind of what [00:11:00] I want to get to in another episode: that maybe you want collections that maintain the order or don't maintain the order. And this is kind of another level of data modeling, where you start to think about equality. What does it mean for two pizzas to have the same toppings?

[00:11:27] Right? So basically, do they have equal toppings, and does the data model represent that, and am I really counting the states properly that I consider different states? Okay. We'll get to that in another episode. I also, in this episode, want to talk about many-to-many relationships. These are typically hard to do with something like a bunch of pointers, a bunch of [00:12:00] references, in an object-oriented style,

[00:12:02] because keeping the bidirectional pointers in sync is really hard. And that's typically what people try to do. Well, let's give an example. You have a student who needs to register for courses at their university, and obviously the course is gonna have multiple students. And so a student has multiple courses, a course has multiple students, and you want to keep the references that they have to each other in sync.

[00:12:39] So it's kind of a one-to-many and another one-to-many, but it's gotta stay correct. You don't want a student who thinks they're in the course, but the course doesn't think the student is in the course. So how do you do that? Well, using a [00:13:00] collection of references, it's actually kind of hard.

[00:13:03] It's a hard problem to make sure that as a student deregisters from the course, the course knows it and the student knows it. It's not impossible, but it's an intricate problem to solve. But what we do in a more functional style, or a data modeling style, is we use a collection that represents pairs.

[00:13:31] So you would have, let's say, a set that is the registry for the university, that has pairs of students and courses. And of course we're gonna name them. So we'll have, like, the student ID and the course ID in the pair, not the whole structure itself. And this registry just maintains a set of

[00:14:00] everyone who's registered for courses. So it'll have: student with ID one is in course seven, and student with ID two is also in course seven. And now all you have to do is basically a lookup. If you wanna know who all the students in course seven are, you just go through the set and you filter for the ones that have a seven

[00:14:28] in the course spot. And if you wanna learn what all the courses for student one are, then you do something similar. You just filter for the tuples that have a one in the student spot, and you're done. You don't have to worry about things getting out of sync, because there's only one thing to maintain.
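The registry of pairs can be sketched like this, assuming Python sets of `(student_id, course_id)` tuples. The function names are made up for illustration, and the pair `(1, 9)` is an extra registration added to the episode's example so that student one has more than one course.

```python
from typing import FrozenSet, Set, Tuple

Registration = Tuple[int, int]  # (student_id, course_id)
Registry = FrozenSet[Registration]

# Students 1 and 2 in course 7 come from the episode; (1, 9) is assumed.
registry: Registry = frozenset({(1, 7), (2, 7), (1, 9)})


def students_in(registry: Registry, course_id: int) -> Set[int]:
    """Keep the pairs with this course in the course spot."""
    return {student for student, course in registry if course == course_id}


def courses_for(registry: Registry, student_id: int) -> Set[int]:
    """Keep the pairs with this student in the student spot."""
    return {course for student, course in registry if student == student_id}


def register(registry: Registry, student_id: int, course_id: int) -> Registry:
    return registry | {(student_id, course_id)}


def deregister(registry: Registry, student_id: int, course_id: int) -> Registry:
    # Removing one pair updates both directions of the relationship at once.
    return registry - {(student_id, course_id)}


print(sorted(students_in(registry, 7)))  # [1, 2]
print(sorted(courses_for(registry, 1)))  # [7, 9]

after = deregister(registry, 1, 7)
print(sorted(students_in(after, 7)))     # [2]
print(sorted(courses_for(after, 1)))     # [9]
```

Both lookups scan the whole set, which is fine for a sketch. A larger system would likely index the pairs, but that is outside what the episode covers.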

[00:15:00] The thing that makes it, I think, a little difficult for object-oriented programmers is that they have this concept of has-a, and "a student has a course" translates very directly, in most systems of object-oriented analysis, into references. And if they have multiple courses, well, it's a collection of references. And you have to make this leap to say, well, we're not actually modeling the student, we're modeling the registry.

[00:15:38] So the registry can have pairings of students and courses, and that is a leap that is hard to get to with the standard ways that people teach object-oriented programming. I'm not saying it's impossible. You could obviously make a class called Registry and do that, but [00:16:00] people want the student to have a field or a method called getCourses, you know, and it will return

[00:16:11] all the courses they're in. And how do you do that? Because the student doesn't have a registry. It just becomes this problem. And I think the reason it's a problem has a lot to do with the poverty of relationship types that people use in object-oriented analysis.

[00:16:36] They basically use has-a and is-a, right? And maybe they have has-many, but it's the same thing. It's just a collection of has-a's. And there's a lot more nuance to relationships than just is-a and has-a. So [00:17:00] I want to go over that. I wanna explain that this is not even hard, but you have to free yourself of this is-a and has-a, and try to see that representing many-to-many relationships often requires stepping outside of the immediate data entity, right?

[00:17:32] The immediate data structure, and then it becomes very... All right. This episode has gone on long enough. My name is Eric Norman. This has been another episode of my podcast. Thanks for listening, and as always, rock on.