Chapter 1
Data Lens Part 1
Chapter objectives
This chapter presents a challenge to me as an author: I’ve got to reteach something most programmers already do intuitively without making the topic seem obvious and boring.
You probably have an intuitive sense of how to model the data of a domain. You likely do it every day. But sometimes we need to relearn the skills we rely on at a deeper level so we can build on top of the new understanding.
The data lens is all about encoding the relationships we find in our domain using features of our language that have the same structure.
[Figure: three diagrams previewing the chapter: the domain modeling process (Domain → model → Code), the size alternative ("one of" Super, Mega, Galactic in the model encoded as "one of" "super", "mega", "galactic" in code), and a fit Venn diagram. By the end of this chapter, you'll understand these three diagrams.]
Welcome to MegaBuzz!
MegaBuzz is the premier fast-food coffee shop. We pride ourselves on giant servings, coffee that needs milk to taste good, and whatever flavors you need to feel special.
Our baristas have been doing a good job, but we're growing like crazy. We need help from some software.
Can you help us design the data model?
Each coffee consists of one of three sizes, one of three roasts, and optional add-ins.
[Figure: the coffee model: Sizes (Super, Mega, Galactic), Roasts (Raw, Burnt, Charcoal), and Add-ins (Soymilk, Espresso, Almond, Chocolate, Hazelnut)]

Coffee encoding: an example coffee in JSON

{
  "size": "super",
  "roast": "burnt",
  "addIns": ["espresso", "soy"]
}
Each coffee encodes the choices of the customer. The JSON above represents one possible coffee. It may seem obvious how to arrive at that encoding, but let's dive deep into the process to really understand it.
Encoding the size of a coffee
We’ll take the encoding one piece at a time. Let’s start with the size.
In our model, to choose a size, we have to select one among the three different sizes. This structure comes up so much, we can give it a name. We’ll call it alternative. Alternatives mean choosing one from a set of options.
We want to choose an encoding that has the same structure as our model. In this case, we want to preserve the “one of” structure that characterizes alternatives.
We chose to encode the size by representing each choice as a string, then representing the “one of” structure as a TypeScript union type. It means that a Size has to be one of those three strings.
[Figure: alternative in model to union type in code: the "one of" structure (Super, Mega, Galactic) in the model maps to the "one of" structure of a TypeScript union type ("super", "mega", "galactic"). The union type is indicated by |.]

type Size = "super" |
            "mega" |
            "galactic";
Note
I’m using TypeScript types when its notation is very clear. Please don’t take it to mean that domain modeling can be done only with TypeScript or only with static types. It’s just a convenient way to express the encoding.
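To see the "one of" structure enforced, here's a minimal sketch. The isSize guard is an illustrative helper of my own, not something the chapter defines; it's useful when data arrives from outside TypeScript (such as parsed JSON), where static types can't help:

```typescript
// The union type from the chapter: a Size must be one of these three strings.
type Size = "super" | "mega" | "galactic";

const mySize: Size = "mega"; // OK: "mega" is one of the three
// const bad: Size = "grande"; // would not compile: not assignable to Size
console.log(mySize); // "mega"

// Hypothetical runtime guard for untyped input:
function isSize(s: string): s is Size {
  return s === "super" || s === "mega" || s === "galactic";
}

console.log(isSize("galactic")); // true
console.log(isSize("venti"));    // false
```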
Encoding the roast of a coffee
We encode the roast in a similar way to the size. It is a choice of one roast among many options, so it too is an alternative.
We chose to encode the roast as a union type of three strings. Each string corresponds to one of the choices. And the union type maintains the "one of" structure from the model. Since the structure between the roasts is the same as the structure between the sizes ("one of"), it makes sense to use the same encoding.
[Figure: alternative in model to union type in code: "one of" Raw, Burnt, Charcoal maps to "one of" "raw", "burnt", "charcoal"]

type Roast = "raw" |
             "burnt" |
             "charcoal";
[Figure: add-ins in model to encoding in code: a "one of" alternative (AddIn) inside a "zero or more" collection]

type AddIn = "soy" |
             "espresso" |
             "hazelnut" |
             "chocolate" |
             "almond";

type AddIns = AddIn[];
Encoding the add-ins of a coffee
The add-ins have a different structure from alternatives. First of all, you don't choose just one: you can choose multiple. You can also repeat the same add-in, such as two espresso shots. We'll break down the structure into two parts:
We must choose each add-in in the collection, which is very much an alternative. Each one is one of five choices.
Then when we collect them together, there is a zero-or-more structure to the add-ins. We will call this kind of structure a collection.
[Figure: each add-in and its string encoding: Soymilk → "soy", Espresso → "espresso", Almond → "almond", Chocolate → "chocolate", Hazelnut → "hazelnut"]
We've chosen to encode the collection of add-ins as an array. There are many choices we could make since there are many types of collections. We'll revisit this choice soon.
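Since arrays preserve repeats, a double espresso shot is directly representable. Here's a small sketch; countAddIn is a hypothetical helper of mine, not from the chapter:

```typescript
type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";
type AddIns = AddIn[];

// Arrays keep the zero-or-more structure, including repeated add-ins:
const doubleShot: AddIns = ["espresso", "espresso", "soy"];

// Hypothetical helper: how many of a given add-in were chosen?
function countAddIn(addIns: AddIns, addIn: AddIn): number {
  return addIns.filter(a => a === addIn).length;
}

console.log(countAddIn(doubleShot, "espresso")); // 2
console.log(countAddIn([], "soy"));              // 0
```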
[Figure: combination in model to JS object type in code: the "all of" structure combining Size, Roast, and AddIn[] (these are the types we just defined)]

type Coffee = {
  size: Size;
  roast: Roast;
  addIns: AddIn[];
};

Here's an example coffee encoded in this way:

{
  "size": "super",
  "roast": "burnt",
  "addIns": ["soy", "espresso"]
}
Encoding the whole coffee
Now that we've got the three components of a coffee, we can combine them. This time, the structure is "all of" instead of "one of," because a coffee needs a size, a roast, and a collection of add-ins (which could be empty). We call the "all of" structure a combination.
We chose to encode this combination as a JS object type in TypeScript. There were other possibilities. We will explore those later.
But, good news! We’ve finished describing how we’ve encoded this model. Now we will take a closer look at the choices we could have made but didn’t.
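Putting the whole encoding together, here's a sketch of a complete Coffee value, with the types restated so the snippet stands alone:

```typescript
type Size = "super" | "mega" | "galactic";
type Roast = "raw" | "burnt" | "charcoal";
type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";

type Coffee = {
  size: Size;
  roast: Roast;
  addIns: AddIn[];
};

const order: Coffee = {
  size: "super",
  roast: "burnt",
  addIns: ["soy", "espresso"],
};

// The value round-trips through the JSON encoding shown earlier:
const roundTripped = JSON.parse(JSON.stringify(order)) as Coffee;
console.log(roundTripped.size);   // "super"
console.log(roundTripped.addIns); // ["soy", "espresso"]
```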
Revisiting the size encoding

Here's the same view of the size encoding we saw earlier.

We're going to zoom in on the bottom half of the diagram, the part showing the encoding. We'll keep the model (the top half) as given and we'll explore different choices we have for encoding the same model.

One thing I will emphasize repeatedly is that we should consider as many options as possible for each design decision. The quality of your design is proportional to how many possibilities you consider. While we're here, we should look at different ways we could encode the size alternative.

Turn the page to zoom in on the bottom half of this diagram and see other ways to encode it.

[Figure: the size encoding again: "one of" alternative in model (Super, Mega, Galactic) to union type in code. This is just one possibility; we should consider more.]

type Size = "super" |
            "mega" |
            "galactic";
The quality of your design is proportional to how many possibilities you consider.
Possible ways to encode an alternative

Let's zoom into the encoding of the size alternative. Any time we encode something, we have a choice in how it is encoded. Our programming language gives us particular constructs. We have to choose among those constructs the ones that have the same (or similar) structure as the model we are encoding.

In this case, we are encoding an alternative in TypeScript. We can list the constructs TypeScript gives us that share the "one of" structure. We will see next how we can evaluate them to choose the best option.

[Figure: many options for encoding an alternative: the "one of" alternative in the model (Super, Mega, Galactic) and four possibilities in code]

Strings + union type:

type Size = "super" |
            "mega" |
            "galactic";

Strings + enum:

enum Size {
  super = "super",
  mega = "mega",
  galactic = "galactic",
}

Classes + interface:

interface Size {
  name: string;
}

class Super implements Size {
  name = "super";
}

class Mega implements Size {
  name = "mega";
}

class Galactic implements Size {
  name = "galactic";
}

Numbers: 1, 2, 3

Note: another possibility is to use care instead of static types. The same three string values would be legal; we just have to make sure not to use anything else.
Fit: evaluating our encoding
We have lots of options for how to encode our models, and each option is slightly different. We need some way to compare them and to know which ones are better for our model.
The secret is that there is no one way to evaluate them. Why? Software design is hard. It’s multidimensional. It’s too dependent on context for any simple scheme to work every time.
This book is full of lenses, and each lens gives us a different way to evaluate our options. Here in the data lens, we are going to use a concept called fit.
Let’s evaluate the fit of a simplified coffee—one that has only size and roast. To evaluate the fit, we need to count the number of states in the model.
Counting the states in a combination

Our coffee is a combination of two alternatives, each with three options. When counting the states of a combination, we multiply the states of the components. So in this case, 3 sizes × 3 roasts = 9 possible combinations.

Counting the states in a TypeScript object type

We encode a coffee as a TypeScript object type:

type Coffee = {
  size : Size;
  roast: Roast;
};

To count the states, we multiply the states of the two components. In this case, 3 sizes × 3 roasts = 9 possible states.

Both the model and the code allow the same number of states. This is very important. We'll call that perfect fit, and we'll see it graphically on the next page. In addition, the correspondence between the model and the code is clear.
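The multiplication can be checked mechanically. This sketch enumerates every size and roast pair and confirms the count:

```typescript
const sizes = ["super", "mega", "galactic"];
const roasts = ["raw", "burnt", "charcoal"];

// Every (size, roast) pair is one state of the simplified coffee.
const states: Array<{ size: string; roast: string }> = [];
for (const size of sizes) {
  for (const roast of roasts) {
    states.push({ size, roast });
  }
}

console.log(states.length);                // 9
console.log(sizes.length * roasts.length); // 9 = 3 x 3
```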
[Figure: the 9 possible states of the simplified coffee: ("super","raw"), ("mega","raw"), ("galactic","raw"), ("super","burnt"), ("mega","burnt"), ("galactic","burnt"), ("super","charcoal"), ("mega","charcoal"), ("galactic","charcoal")]

[Venn diagram: Model: size × roast vs. Code: JS Object. Representable: 9 states; unrepresentable: 0 states; meaningless: 0 states.]
Fit: Measuring the encoding with the model
Fit gives us a way to quantitatively judge an encoding and how well it represents the same possibilities as the model. Fit is not the only way to judge an encoding, but sometimes it is enough to show that one encoding is clearly worse than another. Fit means we compare the states in our encoding with the states in a model.
The best way to understand fit is with a Venn diagram. In one circle, we put the states that the model can represent. In the other, we put the states our code can represent. The overlap shows the states representable in both. The two non-overlapping sections we will call unrepresentable and meaningless.
Perfect fit
When we compare the simplified model (the combination of size and roast) to using a JS Object type to encode it, we see that they have perfect fit. Perfect fit means that the states the model can represent and the states the encoding can represent are exactly the same. In other words, the unrepresentable and meaningless parts are both zero.
Degenerate case: booleans for size
It’s often very useful to look at a very obviously bad case when you’re trying to understand a concept. We call that a degenerate case—obviously not the right answer. Let’s look at a degenerate case for encoding size, namely using a Boolean.
Booleans have exactly two states: true and false. However, our model needs three states for the size: super, mega, and galactic. Let’s take a look at the Venn diagram.
It’s clear that we shouldn’t use a Boolean to represent the size. A Boolean can really only represent two sizes.
This analysis may seem obvious, but that's only because you've done the analysis. The same thinking extends to many existing encodings. Without doing the analysis, you may end up in a similar situation, with states from your model that you cannot represent in your code.
Prefer having meaningless states over having unrepresentable states. Very rarely can we encode a model with perfect fit. The world is nearly infinitely varied and we have finite tools in our languages. We’ll soon see how to deal with meaningless states with normalization functions and validation functions.
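The mismatch can be stated as a single count: the model needs three states and the encoding offers two. A trivial sketch:

```typescript
const modelStates = ["super", "mega", "galactic"]; // 3 states in the model
const booleanStates = [true, false];               // 2 states in the encoding

// At most one size per Boolean state, so one size is unrepresentable:
const unrepresentable = modelStates.length - booleanStates.length;
console.log(unrepresentable); // 1
```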
[Venn diagram: Model: size vs. Code: Boolean. Representable: 2 states; unrepresentable: 1 state; meaningless: 0 states.]

Prefer having meaningless states over having unrepresentable states.
4 problems encoding coffees with numbers

I mentioned before that we can encode the size using numbers. We can also encode the 9 states in the size × roast model using the first nine natural numbers.

[Figure: the nine states numbered 1 through 9 in a 3 × 3 grid]
1. Bad fit

This encoding has several drawbacks. The first is that the fit is not great. Check out the fit in the Venn diagram below.

We can represent every state using Number, but there are many meaningless states. In this case, we use JavaScript numbers, which are 64-bit numbers. That means that the vast majority of the possible state space doesn't have any meaning.

[Venn diagram: Model: size × roast vs. Code: JS Number. Representable: 9; unrepresentable: 0; meaningless: about 1.845 × 10^19.]

2. Human readability

The next problem is that the encoding is arbitrary. What does 5 represent? What about 8? The human readability is very low.
3. Difficult operations
We’ll look at operations more closely in the next chapter, but just imagine what it might be like to change the size of a coffee from super to mega. Or even writing code to determine the size of a coffee becomes a challenge.
4. Extra operations
You can add two numbers, but can you really add two coffees? What about multiplication? And less than (<)? No, these are meaningless operations.
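Those extra operations are easy to demonstrate. Here's a sketch using a hypothetical 1-to-9 numbering of the size and roast states:

```typescript
// Hypothetical number encoding: states 1..9 from the size x roast grid.
const superRaw = 1;
const megaBurnt = 5;

// JavaScript happily computes operations that mean nothing in the domain:
console.log(superRaw + megaBurnt); // 6, but what coffee is coffee 1 plus coffee 5?
console.log(superRaw < megaBurnt); // true, but is one coffee "less than" another?
```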
Fit: When you can't measure easily

Let's try to figure out the fit of the add-in collection encoded as an array. To review, each coffee can have zero or more add-ins, and each add-in can be one of five choices.

Calculating the states in our encoding (array of add-ins) is straightforward. But how can we calculate the number of states in the model? This is a difficult thing to calculate. Give it a try. How do you count the unique states, taking into account the idea that adding soy then espresso is the same as adding espresso then soy?

We could write some code to try to calculate it, but then we'd have to encode the model, and that's actually the problem we're trying to solve.

So let's find examples in the two non-overlapping parts of the Venn diagram without calculating the actual number of states. These examples are all we need to show poor fit.

[Figure: Model (Soymilk, Espresso, Almond, Chocolate, Hazelnut) and Code:]

type AddIn = "soy" |
             "espresso" |
             "hazelnut" |
             "chocolate" |
             "almond";

type AddIns = AddIn[];

Examples of meaningless differences between states:

["soy", "espresso"]       vs. ["espresso", "soy"]
["soy", "soy", "almond"]  vs. ["almond", "soy", "soy"]  vs. ["soy", "almond", "soy"]

[Venn diagram: Model: add-ins vs. Code: AddIn[]. Unrepresentable: none ("no problems here"); representable: all; meaningless: many.]

We can see clearly that we can represent every possible collection of add-ins. So we can ignore the unrepresentable section.

However, we can list many encodings that are different but in meaningless ways. Arrays are ordered and add-ins are not. We have a misfit between the domain and our encoding. All we have to do is reorder the add-ins in the array. That gives us a different state whose difference doesn't mean anything in the model. Soy and espresso is the same as espresso and soy.
Revisiting the add-ins encoding

Here is the same view of the add-ins model and encoding that we saw earlier.

We're going to zoom into the bottom half of this diagram to see the different choices we have for encoding it, just like we did for the size.

Remember, we broke this concept from our model into two pieces: choosing an add-in (alternative) and collecting them together (collection).

We encoded the concept with a union type of strings for the alternative and an array for the collection. We'll keep the model as given, and we'll keep the union type, but we'll consider our options for representing the collection. We'll see the options on the next page.

[Figure: a "one of" alternative (Soymilk → "soy", Espresso → "espresso", Almond → "almond", Chocolate → "chocolate", Hazelnut → "hazelnut") inside a "zero or more" collection]

type AddIn = "soy" |
             "espresso" |
             "hazelnut" |
             "chocolate" |
             "almond";

type AddIns = AddIn[];

Note: I want to emphasize again the importance of considering different options. It may seem like experienced people don't do this, but they do. They just do it very quickly. To get quick, you have to do it a lot.
Many options for encoding collections

Let's zoom into the encoding of the add-ins collection. We are looking for constructs in our language that give us the same (or similar) structure as the zero-or-more structure we identified in the model.

Here are some constructs TypeScript gives us to encode collections, each with its fit alongside.

Array

type AddIns = AddIn[];

Examples:
["soy", "hazelnut"]
[]
["espresso", "espresso"]

Fit: all states representable; meaningless differences such as ["soy", "almond"] vs. ["almond", "soy"].

Set

type AddIns = Set<AddIn>;

Examples:
new Set(["soy", "almond", "chocolate"])
new Set()
new Set(["almond", "espresso"])

Fit: some states unrepresentable. new Set(["soy", "soy"]) collapses to one element, so repeated add-ins can't be represented.

JS Object

type AddIns = {
  [addIn: string]: number;
};

Examples:
{"almond": 1, "soy": 2}
{"hazelnut": 2, "espresso": 1}

Fit: all states representable; meaningless states such as {"soy": 0}, {"soy": -1}, and {"dfs": 3}.

Map

type AddIns = Map<AddIn, number>;

Examples:
new Map([["soy", 1]])
new Map()
new Map([["almond", 2], ["espresso", 1]])

Fit: all states representable; meaningless states such as new Map([["soy", -1]]).
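The Set misfit is easy to see directly: duplicates collapse, so a double espresso can't be represented:

```typescript
const asArray = ["espresso", "espresso"]; // two shots as an array
const asSet = new Set(asArray);           // the same order as a Set

console.log(asArray.length); // 2
console.log(asSet.size);     // 1: the second shot is unrepresentable
```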
Dealing with meaningless states
We’ve gone over four possible options for encoding a collection of add-ins: Array, Set, JS Object, and Map. We can disqualify Set because it can’t represent all of our states. The three that are left can represent all states, so they pass the fit test, but they each have extra meaningless states, so none have perfect fit.
Why are meaningless states a problem? We will have to handle them one way or another. Either we prevent them from happening, which requires work, or we allow them to happen and make sense of the meaningless values, which also requires work.
Different communities emphasize different strategies. Some try their hardest to get perfect fit. For instance, the F# and ML communities use the phrase “make illegal states unrepresentable”.
There are actually two kinds of meaningless states:
1. Meaningless differences
An example of a meaningless difference is two arrays with the same elements but different orders when the order is irrelevant. The values encode the same add-ins in the domain, but the two arrays appear different to the computer. If you compare them element-wise, they will be unequal. If you serialize them to JSON, the JSON strings will be different. And so on. We can handle these cases with normalization functions.

2. Truly meaningless values
Truly meaningless values have no obvious interpretation. If you read ["almond", "soy"] from a file, your code could interpret this to mean, “Put almond and soy in the coffee.” But if it read {"soy": -54}, what does that mean? What does {"jfjdksfjl": 3} mean?
You can of course assign these values meanings, but it helps to avoid these values because they might indicate a bug somewhere. We can handle these cases with validation functions.
[Fit summary: Array: meaningless states; Set: unrepresentable states; JS Object: meaningless states; Map: meaningless states]
Normalization function

Normalization functions eliminate meaningless differences by converting data into a normal form. Let's look at the example of arrays of add-ins.

const a = ["soy", "almond"];
const b = ["almond", "soy"];
_.isEqual(a, b) //=> false

_.isEqual(a, b) compares a and b using deep value equality. Arrays are compared element by element. (You can find this function, and many other useful ones, in the popular lodash library.)

These two arrays are different, but in the meaningless way. They represent the same set of add-ins. That is, the customer doesn't care about the order you record the add-ins. The barista doesn't care. The manager doesn't care. The accountant doesn't care. To every stakeholder of the domain we asked, these mean the same thing.

The normalization function defines what they mean by converting values to a normal form. We can write our add-ins normalization function like this:

function normalizeAddIns(addIns) { //=> addIns
  return addIns.toSorted();
}

Our normal form is sorted order. The normal form should be a form that you can easily convert any value to and that can be compared for equality. If we sort two arrays with the same elements, they will then have the same order.

Using this function, we can rewrite the above code and get a different result:

const a = normalizeAddIns(["soy", "almond"]);
const b = normalizeAddIns(["almond", "soy"]);
_.isEqual(a, b) //=> true

Although the meaningless differences can still exist, at any time we can normalize our values, get the normal forms, and then compare them for equality.

Signature

Normalization functions have the following signature (although they can be named however you want):

function normalize(t) //=> t

Property

Normalization functions should be idempotent. Normalizing a value already in normal form should be a no-op:

_.isEqual(normalize(a), normalize(normalize(a)))

(We'll see more about properties in the composition lens chapter.)
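Here's a standalone sketch of the normalization function and its idempotence property. It uses a copy-and-sort in place of toSorted() (which requires a recent runtime) and JSON.stringify in place of lodash's _.isEqual, so it runs without any libraries:

```typescript
function normalizeAddIns(addIns: string[]): string[] {
  return [...addIns].sort(); // copy, then sort: the normal form is sorted order
}

const a = normalizeAddIns(["soy", "almond"]);
const b = normalizeAddIns(["almond", "soy"]);

// Both are in normal form now, so a deep comparison succeeds:
console.log(JSON.stringify(a) === JSON.stringify(b)); // true

// Idempotence: normalizing an already-normal value is a no-op.
console.log(JSON.stringify(normalizeAddIns(a)) === JSON.stringify(a)); // true
```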
Validation function

Validation functions identify truly meaningless values. They act as a filter for invalid values. We often use them to signal a programming error.

Here are two add-in collections encoded as JS objects. One of them is invalid:

const a = {"almond": 3}; // valid
const b = {"fdsfs": 2};  // invalid

Having this value in our system will mess things up. We want to detect it as soon as possible and signal it as an error. Let's define two validation functions, one for a single add-in, one for a collection of add-ins:

const validAddIns = ["soy", "espresso",
                     "almond", "chocolate", "hazelnut"];

function isValidAddIn(addIn) { //=> boolean
  return validAddIns.includes(addIn);
}

function isValidAddInCollection(addIns) { //=> boolean
  return Object.keys(addIns).every(isValidAddIn);
}

Now we can distinguish between valid and invalid states, like so:

isValidAddInCollection({"almond": 3}) //=> true
isValidAddInCollection({"fdsfs": 2})  //=> false

And we can use them to signal a programming error. Here we use an assert() to generate an error before trying to add an add-in when the arguments are invalid:

function addAddIn(addIns, addIn) { //=> addIns
  assert(isValidAddInCollection(addIns));
  assert(isValidAddIn(addIn));
  ...
}

(This is but one of many possible ways of using validation functions in your code.)

Signature

Validation functions have the following signature:

function isValid(t) //=> boolean

Property

Validation functions have no special properties.
Meaninglessness is a choice
In the last few pages, we've defined normalization and validation functions in very simple ways. But it's typically not so simple. How you define them depends on your choice of encoding.
For example, what does {"soy": -1} mean? You are writing the program, so you (and your team) get to decide. This value could fall into either of the two kinds of meaninglessness:
You could say that it is a meaningless difference. If you choose that {"soy": -1} means "don't put in any soy," the normalization function should do this:
normalize({"soy": -1}) //=> {}
But if you say that it is a truly meaningless value, the validation function should signal it as invalid:
isValid({"soy": -1}) //=> false
The choice is up to you. It is yet another example of how many design decisions we have to make within the complex systems we call software. What’s important is considering each one.
Reporting validation errors

We saw simple definitions of validation functions that return Booleans. What if we want to report to the user what was invalid? Booleans don't contain user-friendly information.

We can create a new type that distinguishes invalid states and reports a human-readable message about why the value is invalid. Here is the type and function signature, along with two utility functions:

type Validation<T> = {
  isValid: true;
  value: T;
} | {
  isValid: false;
  message: string;
};

function isValid(t) //=> Validation<T>

// utility functions
function valid(value) {
  return {isValid: true, value};
}

function invalid(message) {
  return {isValid: false, message};
}

And here are the validation functions we wrote before, this time with nice error messages:

function isValidAddIn(addIn) { //=> Validation<AddIn>
  if (validAddIns.includes(addIn))
    return valid(addIn);
  else
    return invalid(`"${addIn}" is not a valid add-in`);
}

function isValidAddInCollection(addIns) { //=> Validation<AddIns>
  const invalids = Object.keys(addIns)
    .map(isValidAddIn)
    .filter(result => !result.isValid)
    .map(result => result.message);
  if (invalids.length > 0)
    return invalid(
      `Invalid add-ins collection: ${invalids.join(', ')}.`
    );
  else
    return valid(addIns);
}

You may know what I'm going to say: choosing how to handle invalid data is one of the many design decisions you will have to make.
Normalizing existing data models

I'd like to show you a cool trick for improving legacy data models that I've used in the past. Imagine we write software for a newspaper that needs every article to go through an editorial process. The process is shown below. A document starts as drafting, then when the author is done, the editor edits it, then it is published. There are four states in the model.

[Figure: the editorial process and who has access at each stage: Drafting (author) → Editing (editor) → Ready → Published (readers)]

However, the encoding we have from our legacy code has a different number of states. Here is the type:

type ArticleStatus = {
  drafted: boolean;
  edited: boolean;
  published: boolean;
}

It seems quite clear: as the document ends each step, the appropriate Boolean is flipped to true. But three Booleans can encode 2³ = 8 states! Let's look at the fit:

[Venn diagram: representable: 4 states; unrepresentable: 0 states; meaningless: 4 states]

There were just as many meaningless as representable states!

It seemed impossible in our code, but with thousands of articles, we did encounter some meaningless values in our database. We had documents with:

{
  drafted: false,
  edited: true,
  published: true
}

We never figured out how they happened.

And we had too many articles to change the data model. We came up with a solution that was a good compromise. The solution used normalization and validation functions together. Let's take a look at the solution on the next page.

The encoding we want:

type ArticleStatus =
  "drafting" |
  "editing" |
  "ready" |
  "published";

This has perfect fit!
Using normalization and validation together

Here are our two types (the legacy type and the desired type) along with their fit.

The encoding we have:

type ArticleStatus = {
  drafted: boolean;
  edited: boolean;
  published: boolean;
}

The encoding we want:

type ArticleStatus =
  "drafting" |
  "editing" |
  "ready" |
  "published";

We can't change encodings, but we can adapt our current encoding to be a closer representation of our model. Here's how. We create a new type that is very much like our desired encoding, but has one more state:

type ArticleStatus2 = "drafting" |
                      "editing" |
                      "ready" |
                      "published" |
                      "invalid";

Then we write a function to adapt the old status type to the new type. It's kind of like normalizing the old type to the new type:

function adaptStatus(status) { //=> ArticleStatus2
  if (_.isEqual(status,
      {drafted: false, edited: false, published: false}))
    return "drafting";
  if (_.isEqual(status,
      {drafted: true, edited: false, published: false}))
    return "editing";
  if (_.isEqual(status,
      {drafted: true, edited: true, published: false}))
    return "ready";
  if (_.isEqual(status,
      {drafted: true, edited: true, published: true}))
    return "published";
  return "invalid";
}

Then the validation function considers every stray status as a meaningless value and identifies it:

function isValidStatus(status) {
  return status !== "invalid";
}
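Here's a standalone sketch of the adapter; field-by-field comparisons replace lodash's _.isEqual so it runs without dependencies:

```typescript
type LegacyStatus = { drafted: boolean; edited: boolean; published: boolean };
type ArticleStatus2 = "drafting" | "editing" | "ready" | "published" | "invalid";

function adaptStatus(s: LegacyStatus): ArticleStatus2 {
  if (!s.drafted && !s.edited && !s.published) return "drafting";
  if (s.drafted && !s.edited && !s.published) return "editing";
  if (s.drafted && s.edited && !s.published) return "ready";
  if (s.drafted && s.edited && s.published) return "published";
  return "invalid"; // the four stray Boolean combinations land here
}

function isValidStatus(status: ArticleStatus2): boolean {
  return status !== "invalid";
}

console.log(adaptStatus({ drafted: true, edited: false, published: false })); // "editing"
console.log(adaptStatus({ drafted: false, edited: true, published: true }));  // "invalid"
console.log(isValidStatus("invalid"));                                        // false
```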
An alternative normalization

Let's look at another way to adapt the old encoding to a new encoding. Remember, these decisions are up to you. They depend on all of the context that you have about your particular domain and codebase. The best I can do is recommend that you look at as many options as you can. So here is another one.

Instead of having the four states we want plus one to represent invalid states, we could keep it to just the four we want and get perfect fit:

The encoding we want:

type ArticleStatus2 =
  "drafting" |
  "editing" |
  "ready" |
  "published";

With a little investigation, it turns out that our code handles all the meaningless states just fine. For example, here is the code to determine whether to show a document as published:

function showPublished(document) { //=> boolean
  return document.status.published;
}

It doesn't check all of the other Booleans, even though technically it should. Basically, the code was already ignoring the other Booleans in the status. The code to check if a document was ready was like this:

function isReady(document) { //=> boolean
  return !document.status.published &&
         document.status.edited;
}

We could decide to follow this pattern in our adaptStatus() function:

function adaptStatus(status) { //=> ArticleStatus2
  if (status.published)
    return "published";
  if (status.edited)
    return "ready";
  if (status.drafted)
    return "editing";
  return "drafting";
}

This essentially turns all stray statuses into meaningless differences, then normalizes them to the desired type.

Remember: We always have the choice whether to consider a value truly meaningless or as a meaningless difference.
Revisiting the size model
Here’s the same view of the size model we’ve been working with. We’re going to zoom in again, but this time we’re going to zoom in on the top, which represents the model.
One of the things that makes software design so difficult is how interdependent the choices are. Each decision changes the context, and hence affects all the other decisions.
So far, we’ve been dealing with the model as given. But the model is also a choice. It’s yet another reason that software design is so difficult: Not only do we have to decide how to encode our model, we have to decide what our model is in the first place.
On the next page, we’re going to zoom into the top half of the diagram above, and visit a few options that we have for modeling the size of a coffee. That will let us complete the picture of the domain modeling process.
[Figure: the size model and encoding again: "one of" alternative in model (Super, Mega, Galactic) to union type in code]

type Size = "super" |
            "mega" |
            "galactic";
Many options for the size model

[Figure: possible models for size, each with its own encodings: an alternative (Super, Mega, Galactic); a count (# of ml); a style (latte, cappuccino, espresso), where size and ingredients are related to the style]
We’re zooming into the size model to see a few possible ways we can model it. So far, we’ve been talking about coffee size as an alternative of three options: super, mega, and galactic. But that is far from the only way for the business to model the size.
Another way we could model the size is to sell coffee by volume. For instance, we could sell coffee by the milliliter instead of by the size of the cup. If we did that, we would need to encode the number of milliliters the customer wants. We call this kind of structure a count.

But if we look at traditional espresso bars from Italy, the size is different. Coffees come in different styles; each style dictates the ingredients you find in it, but also the size and style of cup it is served in. We're not going to model it (though you might as an exercise).
There are other ways we could model sizes of coffees, but that’s enough to explain the difficulty of software design. We’re not just choosing the encoding, we also have to choose the model (abstracting). Let’s take a look at the domain modeling process.
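If the business chose the count model instead, the encoding might be a number of milliliters. Here's a sketch; the bounds are invented purely for illustration:

```typescript
// Hypothetical count model: size as milliliters rather than a three-way alternative.
type SizeMl = number;

// Illustrative bounds only; the real limits would come from the business.
function isValidSizeMl(ml: SizeMl): boolean {
  return Number.isInteger(ml) && ml > 0 && ml <= 2000;
}

console.log(isValidSizeMl(473)); // true
console.log(isValidSizeMl(-50)); // false
```

Note the tradeoff: a count has vastly more states than the model needs, so a validation function does the work the union type did for free.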
[Figure: the domain modeling process: Domain → model → Code, with a feedback cycle on each side]
The domain modeling process
1. Abstract
We can now describe the domain modeling process. First, we start with the domain. The domain is the real-world context of the job we need our software to do. Before abstraction, the context is totally undifferentiated and highly complicated. We need to eliminate unnecessary details and analyze the necessary ones. That is the process of abstraction, which takes the domain and creates a model. The model is a set of concepts and their relationships.
2. Encode
Once we have a model, we need to encode its concepts and relationships in terms of our programming language. So far, we’ve seen how to encode the data, but in the next chapter we’ll see how to encode the operations as well. There are always multiple ways to encode something, so we have to make design decisions.
We encode the model in code so that we can run the code and see what it does. We can test it, using manual or automated testing. Or we can create a prototype.
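One way to run the encoding is with a small automated test. Here is a sketch (in TypeScript, an assumption on my part; the Coffee type mirrors the JSON example from earlier in the chapter):

```typescript
// The coffee data model, mirroring the JSON encoding from earlier.
type Coffee = {
  size: "super" | "mega" | "galactic";
  roast: "raw" | "burnt" | "charcoal";
  addIns: string[];
};

// Construct an example coffee and check that it looks right.
const example: Coffee = {
  size: "super",
  roast: "burnt",
  addIns: ["espresso", "soy"],
};

console.assert(example.size === "super", "size should be super");
console.assert(example.addIns.length === 2, "expected two add-ins");
```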
3. Evaluate
Once we have an encoding, we can evaluate it. We might learn that we need a different encoding. We can go through the cycle on the right-hand side multiple times, revising our encoding each time until it is good enough.
On the other hand, we may decide that our encoding is fine, but the model needs to change. In that case, we need to take a new look at the domain.
4. Look anew
Sometimes, what we learn by running our encoding is that our model isn’t going to work. That’s another advantage of encoding our model as code rather than in some other medium: when you run an encoding with good fit, you can see whether the model itself is good. If it is, keep working on the right-hand cycle. But if it’s not, it’s time to look anew at your domain and start abstracting again. So we may iterate through the left-hand cycle as well.
Domain modeling glossary
We’ve been using a bunch of terms without defining them well. It’s now time to give them good definitions, with examples from our MegaBuzz project.
Abstraction
Abstraction is the process of analyzing a domain to synthesize the important concepts and their relationships. The end result is a model. The MegaBuzz model (sizes, roasts, add-ins) was abstracted for us by the business owners.
Data model
Data model refers to the encoding of the information of a model in data values and structures. We’ve developed a data model of coffee at MegaBuzz. We have not yet encoded other aspects of the model, such as operations.
Domain
The domain is the real-world context of the job we need our software to do. In the examples we’ve seen so far, the domain is the business of MegaBuzz, the coffee shop.
Domain expert
A domain expert is a person who understands the model and can serve as a resource for the encoding process. The barista is a domain expert in the preparation of coffees. An accountant is a domain expert in accounting. Hopefully, we programmers become domain experts through the process of domain modeling.
Domain model
Domain model is a loose term, sometimes referring to the model but just as often to the encoding. One might refer to everything we’ve done so far as domain modeling.
Encoding
The encoding is a physical representation of the model used for communication with people and computers. The encoding can be diagrams, natural language, formal logic, or a programming language. We have built an encoding in a programming language, but we also had English-language descriptions and diagrams.
Evaluation
We evaluate an encoding by judging it in various ways. One way is with fit. We evaluate an encoding to learn how to improve it. We evaluated encodings for add-in collections using fit.
Fit
Fit measures how closely an encoding captures a model’s concepts and relationships.
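For example (a sketch in TypeScript, assuming the size model from this chapter): encoding the size as any string has low fit, because it admits values the model forbids, while a three-value union has high fit:

```typescript
// Low fit: any string is accepted, including sizes that don't exist.
type SizeLowFit = string;

// High fit: exactly the three sizes in the model, nothing more.
type SizeHighFit = "super" | "mega" | "galactic";

const ok: SizeHighFit = "galactic";
// const bad: SizeHighFit = "venti"; // rejected by the compiler
```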
Look anew
We look anew to improve a model by understanding problems with its concepts and relationships. We haven’t seen an example yet, but imagine if our coffee encoding taught us that we should change how the business runs.
Model
A model is a set of related domain concepts. As concepts, they exist in the mind. One model we’ve seen is the concepts of 3 sizes, 3 roasts, and add-in collections. Some models we haven’t talked about are money and marketing promotions.
Conclusion
In this chapter we’ve done quite a lot. We saw how to encode a data model, how the domain modeling process works, and how we can evaluate a data model. This process might be done intuitively, but revisiting it at a deeper level can give us insights into how we can improve our design skills.
Summary
Up next . . .
We’ve seen some basic data modeling techniques. In the next chapter, we’ll continue our exploration of encoding structure with data and deepen our understanding of the process.
see the Data Lens Supplement for more common patterns