Applying domain modeling to an existing data structure

Domain modeling also works after you've already got lots of code. How can we apply domain modeling analysis to existing data structures?


[00:00:00] How do we apply domain modeling when we've got an existing data structure?

[00:00:10] Hello, my name is Eric Normand and this is my podcast. Welcome.

[00:00:16] So I've been writing my book and found this interesting case that I hadn't planned for, which is that some people, when they're working on an existing code base, already have a data model that they realize is not what they want. Or there's a similar case, which is you want to use an existing data structure and it doesn't do exactly what you want.

[00:00:54] We'll cover those two cases. The answer is you've got two functions that you could write. Two operations. One is validation. It validates your data, and the other is normalization. And we'll go over both of these.

[00:01:16] Okay, so let's say you have some existing data model that's not quite right. I often talk about this example: if you have three Booleans to represent a flow that's actually got four states. First you're drafting the document, then you're editing it, then you are ready for publication, and then you're published. So there's four states and they happen in order. They always happen in order. And one way you could represent the state that it's in is with Booleans.

[00:02:01] Has it been drafted? That's one Boolean. Has it been edited? That's a second Boolean. And then has it been published? That's a third Boolean. And you could figure out what state you're in based on which of those are set. If you've been following along, you can already see the problem, which is that you've got four states, but three Booleans give you two to the third states. You actually have eight possible states. You've got four states that you want to capture, and you're using a data structure that has eight states. You've got four extra states that don't mean anything. How do you get those states? Well, you set a Boolean to true when the things before it are still false.
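Here's a minimal sketch of that counting argument in Python. The function name `follows_order` and the rule it encodes (a flag can only be true if every flag before it is true) are my reading of the description above, not code from a real system.

```python
from itertools import product

def follows_order(drafted, edited, published):
    """True when no flag is set while an earlier flag is still unset."""
    # edited implies drafted, and published implies edited
    return (drafted or not edited) and (edited or not published)

# Every combination of three Booleans: 2 ** 3 of them.
all_states = list(product([False, True], repeat=3))

meaningful = [s for s in all_states if follows_order(*s)]
meaningless = [s for s in all_states if not follows_order(*s)]

print(len(all_states), len(meaningful), len(meaningless))  # 8 4 4
```

The four meaningless combinations are exactly the ones where a later step is marked done while an earlier one is not.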

[00:02:58] So you could say, okay, I've drafted it: true. Skip the editing Boolean somehow and publish it: true. Okay, so you've got this document that has been published. It's true, but somehow the edited Boolean is still marked false. What does that mean? Does it mean that you skipped the editing accidentally and somehow published it without editing it? Or did you do the editing and you just messed up the code?

[00:03:29] You do have this ambiguity now of these four extra states that don't mean anything. So say you've analyzed this problem, but you've got a lot of code and you can't change the data model. What do you do? Well, you could validate. You could say I am in either a valid state or an invalid state.

[00:03:55] So you figure out the four valid states to be in and you figure out the four invalid states to be in and you can now determine, hey, I'm in an invalid state. I can't do anything. I need a human to look at me. You know, whatever. You could just throw an error and not allow any other operations to happen on this.

[00:04:17] Sometimes that helps, because you're checking a little more strictly, which you kind of have to at this point, and it might help your code get more robust. And by centralizing the check in a function, you don't have to write that checking code all over the place.
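A centralized validation function might look something like this sketch. I'm assuming the document is a dict with hypothetical keys `drafted`, `edited`, and `published`; your real records will have their own shape.

```python
# The four combinations that correspond to a real step in the flow.
# Which combination means which step is an assumption for illustration.
VALID_STATES = {
    (False, False, False),  # still drafting
    (True, False, False),   # drafted, now editing
    (True, True, False),    # edited, ready for publication
    (True, True, True),     # published
}

def is_valid(doc):
    """Check the three flags against the known-good combinations."""
    return (doc["drafted"], doc["edited"], doc["published"]) in VALID_STATES

def validate(doc):
    """Return the document unchanged, or raise if it is in a meaningless state."""
    if not is_valid(doc):
        raise ValueError(f"document is in an invalid state: {doc}")
    return doc

good = {"drafted": True, "edited": True, "published": False}
bad = {"drafted": True, "edited": False, "published": True}

validate(good)  # passes silently
try:
    validate(bad)
except ValueError as err:
    print("needs a human to look at it:", err)
```

Every operation on a document can call `validate` first, so the check lives in one place.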

[00:04:37] Now the normalization function is a little different. Normalization is saying I have multiple states that I can encode that actually correspond to the same state. The example I give a lot is a pizza with mushrooms and olives is the same as a pizza with olives and mushrooms. They're the same pizza. They're gonna taste the same.

[00:05:08] But we can represent both because we're using an array. An array keeps the order. And so we can't rely on strict equality, but what we can do is normalize the pizzas into a standard form, a canonical way of representing them, so that they can be compared with equality.

[00:05:31] So that's what normalization is. It's saying we've got these equivalent states, and we want to convert all the equivalent states into the same form. Okay? So one thing you could do on your toppings to normalize is to sort them in alphabetical order. So if you sort all your toppings in alphabetical order, and you have olives and mushrooms, that's gonna turn into mushrooms and olives.

[00:06:03] And if it's already mushrooms and olives, it's gonna stay mushrooms and olives, cause that one's already in order. So all the things that are equivalent are now going to encode into the same normal form and then you can compare them with equals. Okay? So you can normalize and then validate.

[00:06:23] Or validate, then normalize. You can do these things in whichever order you want.
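As a small sketch of the toppings example, assuming toppings are stored as a list of strings (the function name `normalize_toppings` is made up for illustration):

```python
def normalize_toppings(toppings):
    """Return the toppings in alphabetical order, the canonical form."""
    return sorted(toppings)

a = ["olives", "mushrooms"]
b = ["mushrooms", "olives"]

print(a == b)                                          # False
print(normalize_toppings(a) == normalize_toppings(b))  # True
```

Normalizing something that's already in normal form leaves it alone, so it's safe to call the function more than once.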

[00:06:29] Now there's one more thing, a bonus that I've been thinking about and forgot to mention at the beginning.

[00:06:37] Let's say we do have these eight document editing states. To answer "What status is this document in?" you kind of have to check three Booleans every time. Is this one true and this one true and this one false? One thing you can do is a compromise. You can't really change your data model. It's already in the database. There's millions of records already. But you can write a function. I'm gonna call it an adapter function, though that's probably not the right term for it. It's more like an interpreter.

[00:07:13] Anyway, it's gonna take these three Booleans and it's going to return the string that you originally wanted. You're going from three Booleans to an enum of four strings. What do you do with those four extra states that are invalid? Well, you can collapse them all into a single string called invalid.

[00:07:40] So now you are returning five possible states, and notice you've improved a whole bunch. You've gone from having four invalid states to one invalid state, and now you're switching on a string, which is much easier for a human to read than three Booleans.
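Here's a rough sketch of that adapter (or interpreter) function. The status names and the mapping from flags to statuses are assumptions based on the four steps described earlier.

```python
# Hypothetical mapping from the three stored flags to a readable status.
STATUS_BY_FLAGS = {
    (False, False, False): "drafting",
    (True, False, False): "editing",
    (True, True, False): "ready",
    (True, True, True): "published",
}

def document_status(drafted, edited, published):
    """Interpret three Booleans as one of five strings.

    All four meaningless combinations collapse into "invalid".
    """
    return STATUS_BY_FLAGS.get((drafted, edited, published), "invalid")

print(document_status(True, True, False))  # ready
print(document_status(True, False, True))  # invalid
```

The stored data stays exactly as it is. Only the code that reads it changes, and it can now branch on one string instead of three flags.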

[00:08:04] You can use this function that interprets these three Booleans into the enum all over the place and basically remove most of the downside of using three Booleans when you really only have four states. Now, you still have the problem of having an invalid state, but it's only one now instead of four.

[00:08:30] And you still have the problem of figuring out how to go from one state to the next. I mean, it's still not an easy problem, but you're able to deal with it in a much better way. It's a little compromise. Instead of changing all your data models, you just say, okay, we messed up, but let's collapse all our invalid states into one and we'll just call it invalid.

[00:09:01] So those are the three tricks for dealing with existing data models that aren't quite right, or to deal with existing data structures that we want to use for convenience, such as using the array to represent toppings. We can validate. We can normalize. And then we have this adapter for adding some ergonomics back into our poor data model.

[00:09:37] All right. My name is Eric Normand. Thanks for listening. And as always, rock on!