Runnable Specifications by Eric Normand · Work in progress
Chapter 1
Data Lens Part 1
Chapter objectives
Learn to analyze the structure of a domain and encode
it in your language.
Learn to evaluate data models based on fit.
Understand how domain models are constructed by
abstraction, encoding, and feedback.
This chapter presents a challenge to me as an author: I’ve
got to reteach something most programmers already do
intuitively without making the topic seem obvious and
boring.
You probably have an intuitive sense of how to model
the data of a domain. You likely do it every day. But some-
times we need to relearn the skills we rely on at a deeper
level so we can build on top of the new understanding.
The data lens is all about encoding the relationships
we nd in our domain using features of our language that
have the same structure.
[Figures: by the end of this chapter, you’ll understand these three diagrams: the domain modeling process (Domain, model, Code; 1. abstract, 2. encode, 3. evaluate, 4. look anew), a Venn diagram of fit (0, 4, 0), and the size alternative (one of Super, Mega, Galactic, encoded as "super", "mega", "galactic").]
Welcome to MegaBuzz!
MegaBuzz is the premier fast-food coffee shop. We pride
ourselves on giant servings, coffee that needs milk to taste
good, and whatever flavors you need to feel special.
Our baristas have been doing a good job, but we’re
growing like crazy. We need help from some software.
Can you help us design the data model?
Each coffee consists of one of three sizes, one of three
roasts, and optional add-ins.
Coffee model
Sizes: Super, Mega, Galactic
Roasts: Raw, Burnt, Charcoal
Add-ins: Soymilk, Espresso, Almond, Chocolate, Hazelnut
Coffee encoding: an example coffee in JSON
{
  "size": "super",
  "roast": "burnt",
  "addIns": ["espresso", "soy"]
}
Each coffee encodes the choices of the customer. The JSON
above represents one possible coffee. It may seem
obvious how to encode a coffee as JSON, but let’s dive deep into
the process to really understand it.
Encoding the size of a coffee
We’ll take the encoding one piece at a time. Let’s start with
the size.
In our model, to choose a size, we have to select one
among the three different sizes. This structure comes up
so much, we can give it a name. We’ll call it alternative. Al-
ternatives mean choosing one from a set of options.
We want to choose an encoding that has the same struc-
ture as our model. In this case, we want to preserve the
“one of” structure that characterizes alternatives.
We chose to encode the size by representing each choice
as a string, then representing the “one of” structure as a
TypeScript union type. It means that a Size has to be one
of those three strings.
alternative in model: one of Super, Mega, Galactic
union type in code:
type Size = "super" |
            "mega" |
            "galactic";
(This is a TypeScript type declaration. The TypeScript union type is indicated by |.)
Note
I’m using TypeScript types
when its notation is very
clear. Please don’t take it to
mean that domain model-
ing can be done only with
TypeScript or only with
static types. It’s just a convenient
way to express the
encoding.
Encoding the roast of a coffee
We encode the roast in a similar way to the size. It is a
choice of one roast among many options, so it too is an
alternative.
We choose to encode the roast as a union type of three
strings. Each string corresponds to one of the choices. And
the union type maintains the “one of” structure from the
model. Since the structure between the roasts is the same
as the structure between the sizes (“one of”), it makes
sense to use the same encoding.
alternative in model: one of Raw, Burnt, Charcoal
union type in code:
type Roast = "raw" |
             "burnt" |
             "charcoal";
Encoding the add-ins of a coffee
The add-ins have a different structure from alternatives.
First of all, you don’t choose one. You can choose multiple.
You can also repeat the same add-in, such as two espresso
shots. We’ll break down the structure into two parts:
1. Choosing the add-in
2. Collecting them together
We must choose each add-in in the collection, which is
very much an alternative. Each one is one of five choices.
Then when we collect them together, there is a zero-or-more
structure to the add-ins. We will call this kind of
structure a collection.
alternative in model: one of Soymilk, Espresso, Almond, Chocolate, Hazelnut
collection in model: zero or more AddIn
union type in code:
type AddIn = "soy" |
             "espresso" |
             "hazelnut" |
             "chocolate" |
             "almond";
array in code:
type AddIns = AddIn[];
We’ve chosen to encode the collection of add-ins as an array.
There are many choices we could make since there are
many types of collections. We’ll revisit this choice soon.
Encoding the whole coffee
Now that we’ve got the three components of a coffee, we
can combine them together. This time, the structure is
“all of” instead of “one of” because the coffee needs a size,
roast, and collection of add-ins (which could be empty).
We call the “all of” structure a combination.
combination in model: all of Size, Roast, AddIn[] (these are the types we just defined)
JS object type in code:
type Coffee = {
  size  : Size;
  roast : Roast;
  addIns: AddIn[];
};
Here’s an example coffee encoded in this way:
{
  "size"  : "super",
  "roast" : "burnt",
  "addIns": [
    "soy",
    "espresso"
  ]
}
We chose to encode this combination as a JS object type in
TypeScript. There were other possibilities. We will explore
those later.
But, good news! We’ve finished describing how we’ve
encoded this model. Now we will take a closer look at the
choices we could have made but didn’t.
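As a minimal sketch, the three component types and the combination can be put together in one TypeScript file. The names exampleCoffee and plainCoffee are illustrative assumptions, not part of the MegaBuzz model.

```typescript
type Size = "super" | "mega" | "galactic";
type Roast = "raw" | "burnt" | "charcoal";
type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";

// The combination: a coffee needs all of size, roast, and add-ins.
type Coffee = {
  size: Size;
  roast: Roast;
  addIns: AddIn[];
};

// exampleCoffee is a hypothetical value used only for illustration.
const exampleCoffee: Coffee = {
  size: "super",
  roast: "burnt",
  addIns: ["soy", "espresso"],
};

// An empty add-ins array is still a valid coffee ("zero or more").
const plainCoffee: Coffee = { size: "mega", roast: "raw", addIns: [] };

// The compiler rejects values outside the unions, for example:
// const bad: Coffee = { size: "tiny", roast: "raw", addIns: [] };
```

The commented-out line at the end is the kind of value the union types rule out at compile time.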
this is just one
possibility. we
should consider
more
Revisiting the size encoding
Here’s the same view of the size encoding we saw on page
43.
We’re going to zoom in on the bottom half of the dia-
gram, the part showing the encoding. We’ll keep the mod-
el (the top half) as given and we’ll explore different choic-
es we have for encoding the same model.
One thing I will emphasize repeatedly is that we should
consider as many options as possible for each design deci-
sion. The quality of your design is proportional to how
many possibilities you consider. While we’re here, we
should look at different ways we could encode the size al-
ternative.
Turn the page to zoom in on the bottom half of this dia-
gram and see other ways to encode it.
alternative in model: one of Super, Mega, Galactic
union type in code:
type Size = "super" |
            "mega" |
            "galactic";
Possible ways to encode an alternative
Many options for encoding an alternative
alternative in model: one of Super, Mega, Galactic
possibilities in code:
Strings + union type
type Size = "super" |
            "mega" |
            "galactic";
Another possibility is to use care instead of static types. The same three
string values would be legal; we just have to make sure not to use anything else.
Strings + enum
enum Size {
  super = "super",
  mega = "mega",
  galactic = "galactic",
}
Classes + interface
interface Size {
  name: string;
}
class Super implements Size {
  name = "super";
}
class Mega implements Size {
  name = "mega";
}
class Galactic implements Size {
  name = "galactic";
}
Numbers
1, 2, 3
Let’s zoom into the encoding of the size alternative. Any
time we encode something, we have a choice in how it is
encoded. Our programming language gives us particular
constructs. We have to choose among those constructs
which ones have the same (or similar) structure as the
model we are encoding.
In this case, we are encoding an alternative in Type-
Script. We can list the constructs TypeScript gives us that
share the “one of” structure. We will see next how we can
evaluate them to choose the best option.
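One more construct with the same “one of” structure, sketched here as an assumption rather than something the chapter lists, is a constant array from which the union type is derived. The names SIZES and isSize are hypothetical. The array gives us a runtime list to check untyped data against, while the derived type keeps the compile-time guarantee.

```typescript
// SIZES and isSize are illustrative names, not part of the book's model.
const SIZES = ["super", "mega", "galactic"] as const;

// Derive the union type "super" | "mega" | "galactic" from the array.
type Size = (typeof SIZES)[number];

// Runtime check for data that arrives untyped, such as parsed JSON.
function isSize(value: unknown): value is Size {
  return (
    typeof value === "string" && (SIZES as readonly string[]).includes(value)
  );
}
```

Whether this is a better choice than a plain union type depends on how much untyped data crosses into the program.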
Fit: evaluating our encoding
We have lots of options for how to encode our models. We
need some way to compare them. Each option is slightly
different. We need some way to know which ones are bet-
ter for our model.
The secret is that there is no one way to evaluate them.
Why? Software design is hard. It’s multidimensional. It’s
too dependent on context for any simple scheme to work
every time.
This book is full of lenses, and each lens gives us a dif-
ferent way to evaluate our options. Here in the data lens,
we are going to use a concept called fit.
Let’s evaluate the fit of a simplified coffee, one that has
only size and roast. To evaluate the fit, we need to count
the number of states in the model.
Counting the states in a combination
Our coffee is a combination of two alternatives, each
with three options. When counting the states of a
combination, we multiply the states of the components.
So in this case, 3 sizes times 3 roasts equals 9
possible combinations.
Counting the states in a TypeScript object type
We encode a coffee as a TypeScript object type. To
count the states, we multiply the states of the two
components. In this case, 3 sizes times 3 roasts equals
9 possible states.
Both the model and the code allow the same number of
states. This is very important. We’ll call that perfect fit, and
we’ll see this graphically on the next page. In addition, the
correspondence between the model and code is clear.
type Coffee = {
  size : Size;
  roast: Roast;
};
The nine states:
"super", "raw"        "mega", "raw"        "galactic", "raw"
"super", "burnt"      "mega", "burnt"      "galactic", "burnt"
"super", "charcoal"   "mega", "charcoal"   "galactic", "charcoal"
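The multiplication rule for combinations can be checked with a short sketch that enumerates every size and roast pair. The names SimpleCoffee, allSizes, allRoasts, and allSimpleCoffees are assumptions made for this illustration.

```typescript
type Size = "super" | "mega" | "galactic";
type Roast = "raw" | "burnt" | "charcoal";
type SimpleCoffee = { size: Size; roast: Roast };

const allSizes: Size[] = ["super", "mega", "galactic"];
const allRoasts: Roast[] = ["raw", "burnt", "charcoal"];

// Build every size/roast pair: the states of a combination are the
// product of the states of its components.
function allSimpleCoffees(): SimpleCoffee[] {
  const states: SimpleCoffee[] = [];
  for (const size of allSizes) {
    for (const roast of allRoasts) {
      states.push({ size, roast });
    }
  }
  return states;
}

const simpleCoffeeStates = allSimpleCoffees();
// simpleCoffeeStates.length is 3 * 3 = 9
```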
[Venn diagram of fit. Model: size x roast. Code: JS Object. Unrepresentable: 0 states. Representable: 9 states. Meaningless: 0 states.]
Fit: Measuring the encoding with the model
Fit gives us a way to quantitatively judge an encoding and
how well it represents the same possibilities as the mod-
el. Fit is not the only way to judge an encoding, but some-
times it is enough to show that one encoding is clearly
worse than another. Fit means we compare the states in
our encoding with the states in a model.
The best way to understand fit is with a Venn diagram.
In one circle, we put the states that the model can represent.
In the other, we put the states our code can represent.
The overlap shows which states of the model are representable in our
code. The two non-overlapping sections we will call unrepresentable
(model states the code cannot express) and meaningless (code
states that correspond to nothing in the model).
Perfect t
When we compare the simplied model (the combination
of size and roast) to using a JS Object type to encode it,
we see that they have perfect t. Perfect t means that the
states the model can represent and the states the encod-
ing can represent are exactly the same. In other words,
the unrepresentable and meaningless parts are both zero.
Degenerate case: booleans for size
It’s often very useful to look at a very obviously bad case
when you’re trying to understand a concept. We call that a
degenerate case: obviously not the right answer. Let’s look
at a degenerate case for encoding size, namely using a
Boolean.
Booleans have exactly two states: true and false. However,
our model needs three states for the size: super, mega,
and galactic. Let’s take a look at the Venn diagram.
It’s clear that we shouldn’t use a Boolean to represent the
size. A Boolean can really only represent two sizes.
This analysis may seem obvious, but it’s only because
you’ve done the analysis. The same thinking extends to
many existing encodings. Without doing the analysis, you
may have a similar situation where you have states from
your model you cannot represent in your code.
Prefer having meaningless states over having unrepresentable
states. Very rarely can we encode a model with
perfect fit. The world is nearly infinitely varied and we
have finite tools in our languages. We’ll soon see how to
deal with meaningless states with normalization functions
and validation functions.
[Venn diagram of fit. Model: size. Code: Boolean. Unrepresentable: 1 state. Representable: 2 states. Meaningless: 0 states.]
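When the states are few enough to list, the three sections of the Venn diagram can be counted mechanically. The sketch below is one way to do it, under the assumption that each state is identified by a string label shared between model and code. FitReport and measureFit are hypothetical names, and treating the two Boolean values as "super" and "mega" is an arbitrary choice made for the example.

```typescript
// FitReport and measureFit are hypothetical names for this sketch.
type FitReport = {
  representable: number;
  unrepresentable: number;
  meaningless: number;
};

// Assumes a model state and the code value that encodes it share a label.
function measureFit(modelStates: string[], codeStates: string[]): FitReport {
  const model = new Set(modelStates);
  const code = new Set(codeStates);
  let representable = 0;
  model.forEach((state) => {
    if (code.has(state)) representable += 1;
  });
  return {
    representable,
    unrepresentable: model.size - representable,
    meaningless: code.size - representable,
  };
}

const sizeLabels = ["super", "mega", "galactic"];
const roastLabels = ["raw", "burnt", "charcoal"];
const pairLabels: string[] = [];
for (const s of sizeLabels) {
  for (const r of roastLabels) pairLabels.push(`${s}/${r}`);
}

// Model and object type allow the same nine pairs: perfect fit.
const objectFit = measureFit(pairLabels, pairLabels);
// A two-valued encoding can cover at most two of the three sizes.
const booleanFit = measureFit(sizeLabels, ["super", "mega"]);
```

Here objectFit comes out as 9 representable with nothing in the other two sections, and booleanFit as 2 representable with 1 unrepresentable, matching the two diagrams.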
4 problems encoding coffees with numbers
[Figure: the nine size x roast states numbered 1 through 9.]
I mentioned before that we can encode the size using numbers.
We can also encode these 9 states in the size x roast
model using the first nine natural numbers.
1. Bad t
This encoding has several drawbacks. The
rst is that the t is not great. Check out
the t in the Venn diagram to the right.
We can represent every state using Num-
ber, but there are many meaningless states.
In this case, we use JavaScript numbers,
which are 64-bit numbers. That means
that the vast majority of the possible state
space doesnt have any meaning.
2. Human readability
The next problem is that the encoding is
arbitrary. What does 5 represent? What
about 8? The human readability is very low.
[Venn diagram of fit. Model: size x roast. Code: JS Number. Unrepresentable: 0. Representable: 9. Meaningless: about 1.845 x 10^19.]
3. Difcult operations
We’ll look at operations more closely in the next chapter,
but just imagine what it might be like to change the size of
a coffee from super to mega. Or even writing code to deter-
mine the size of a coffee becomes a challenge.
4. Extra operations
You can add two numbers, but can you really add two cof-
fees? What about multiplication? And less than (<)? No,
these are meaningless operations.
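To make the difficult operations concrete, here is a sketch of what reading and changing the size might look like with the number encoding. The numbering is an assumption (the text does not say which number means which coffee): 1 to 3 are the raw coffees in size order, 4 to 6 the burnt ones, and 7 to 9 the charcoal ones. All function names are hypothetical.

```typescript
type Size = "super" | "mega" | "galactic";
type Roast = "raw" | "burnt" | "charcoal";

const sizeOrder: Size[] = ["super", "mega", "galactic"];
const roastOrder: Roast[] = ["raw", "burnt", "charcoal"];

// Only 1..9 mean anything; every other number is a meaningless state.
function isCoffeeNumber(n: number): boolean {
  return Number.isInteger(n) && n >= 1 && n <= 9;
}

function sizeOfCoffee(n: number): Size {
  if (!isCoffeeNumber(n)) throw new Error(`${n} is not a coffee`);
  return sizeOrder[(n - 1) % 3];
}

function roastOfCoffee(n: number): Roast {
  if (!isCoffeeNumber(n)) throw new Error(`${n} is not a coffee`);
  return roastOrder[Math.floor((n - 1) / 3)];
}

// Changing the size means taking the number apart and rebuilding it.
function withSize(n: number, size: Size): number {
  const roastIndex = roastOrder.indexOf(roastOfCoffee(n));
  return roastIndex * 3 + sizeOrder.indexOf(size) + 1;
}
```

Every operation has to know the arithmetic of the numbering, which is exactly the knowledge the object type keeps visible in its field names.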
Fit: When you can’t measure easily
Examples of meaningless differences between states:
["soy", "espresso"]
["espresso", "soy"]
["soy", "soy", "almond"]
["almond", "soy", "soy"]
["soy", "almond", "soy"]
Lets try to gure out the t of the add-in collection en-
coded as an array. To review, each coffee can have zero or
more add-ins, and each add-in can be one of ve choices.
Calculating the states in our encoding (array of add-ins)
is straightforward. But how can we calculate the number
of states in the model? This is a difcult thing to calculate.
Give it a try. How do you count the unique states, taking
into account the idea that adding soy then espresso is the
same as adding espresso then soy?
We could write some code to try to calculate it, but then
wed have to encode the model, and that’s actually the
problem we’re trying to solve.
So lets nd examples in the two non-overlapping parts
of the Venn diagram without calculating the actual num-
ber of states. These examples are all we need to show poor
t.
[Venn diagram of fit. Model: add-ins (Soymilk, Espresso, Almond, Chocolate, Hazelnut). Code: AddIn[]. The sections are unrepresentable, representable, and meaningless; no counts are given.]
We can see clearly that we can represent every possible
collection of add-ins. So we can ignore the unrepresent-
able section.
However, we can list many encodings that are different
but in meaningless ways. Arrays are ordered and add-ins
are not. We have a mist between the domain and our
encoding. All we have to do is reorder the add-ins in the
array. That gives us a different state whose difference
doesn’t mean anything in the model. Soy and espresso is
the same as espresso and soy.
Revisiting the add-ins encoding
alternative in model: one of Soymilk, Espresso, Almond, Chocolate, Hazelnut
collection in model: zero or more AddIn
union type in code:
type AddIn = "soy" |
             "espresso" |
             "hazelnut" |
             "chocolate" |
             "almond";
array in code:
type AddIns = AddIn[];
Here is the same view of the add-ins model and encoding
that we saw on page 45.
We’re going to zoom into the bottom half of this diagram
to see the different choices we have for encoding it, just
like we did for the size.
Remember, we broke this concept from our model into
two pieces: choosing an add-in (alternative) and collecting
them together (collection).
We encoded the concept with a union type of strings for
the alternative and an array for the collection. We’ll keep
the model as given, and we’ll keep the union type, but we’ll
consider our options for representing the collection. We’ll
see the options on the next page.
I want to emphasize again the importance of considering
different options. It may seem like experienced people
don’t do this, but they do. They just do it very quickly. To
get quick, you have to do it a lot.
Many options for encoding collections
Let’s zoom into the encoding of the add-ins collection. We
are looking for constructs in our language that give us the
same (or similar) structure as the zero-or-more structure
we identified in the model.
Here are some constructs TypeScript gives us to encode
collections. I’ve also put the fit from each one’s Venn
diagram alongside it.
collection in model: zero or more
possibilities in code:
Array
type AddIns = AddIn[];
Examples:
["soy", "hazelnut"]
[]
["espresso", "espresso"]
Fit: unrepresentable 0, representable all, meaningless:
["soy", "almond"]
["almond", "soy"]
...
Set
type AddIns = Set<AddIn>;
Examples:
new Set(["soy", "almond", "chocolate"])
new Set()
new Set(["almond", "espresso"])
Fit: representable some, meaningless 0, unrepresentable:
new Set(["soy", "soy"])
new Set(["almond", "almond"])
...
JS Object
type AddIns = {
  [addIn: string] : number;
};
Examples:
{"almond": 1, "soy": 2}
{"hazelnut": 2, "espresso": 1}
Fit: unrepresentable 0, representable all, meaningless:
{"soy": 0}
{"soy": -1}
{"dfs": 3}
...
Map
type AddIns = Map<AddIn, number>;
Examples:
new Map([["soy", 1]])
new Map()
new Map([["almond", 2], ["espresso", 1]])
Fit: unrepresentable 0, representable all, meaningless:
new Map([["soy", -1]])
...
Dealing with meaningless states
We’ve gone over four possible options for encoding a collection
of add-ins: Array, Set, JS Object, and Map. We can
disqualify Set because it can’t represent all of our states.
The three that are left can represent all states, so they pass
the fit test, but they each have extra meaningless states, so
none have perfect fit.
Why are meaningless states
a problem? We will have
to handle them one way
or another. Either we pre-
vent them from happening,
which requires work, or we
allow them to happen and
make sense of the mean-
ingless values, which also
requires work.
Different communities
emphasize different
strategies. Some try their
hardest to get perfect fit.
For instance, the F# and
ML communities use the
phrase “make illegal states
unrepresentable.”
There are actually two kinds of meaningless states:
1. Meaningless differences
An example of a meaningless difference is two arrays
with the same elements but different orders when the
order is irrelevant. The values encode the same add-ins
in the domain, but the two arrays appear different to the
computer. If you compare them element-wise, they will be
unequal. If you serialize them to JSON, the JSONs will be
different. Etc. We can handle these cases with normaliza-
tion functions.
2. Truly meaningless values
Truly meaningless values have no obvious interpretation.
If you read ["almond", "soy"] from a file, your code
could interpret this to mean, “Put almond and soy in the
coffee.” But if it read {"soy": -54}, what does that mean?
What does {"jfjdksfjl": 3} mean?
You can of course assign these values meanings, but it
helps to avoid these values because they might indicate a
bug somewhere. We can handle these cases with valida-
tion functions.
Array
Meaningless states: same add-ins, different order
Set
Unrepresentable states: duplicate add-ins
JS Object
Meaningless states: add-in count is a negative integer; add-in count is zero; add-in name is not a valid add-in
Map
Meaningless states: add-in count is a negative integer; add-in count is zero
Normalization function
Normalization functions eliminate meaningless differences
by converting data into a normal form. Let’s look at the
example of arrays of add-ins.
const a = ["soy", "almond"];
const b = ["almond", "soy"];
_.isEqual(a, b) //=> false
_.isEqual(a, b)
Compare a and b using deep value equality. Arrays
are compared element by element. You can find this
function, and many other useful ones, in the popular
lodash library.
These two arrays are different, but in a meaningless
way. They represent the same set of add-ins. That is, the
customer doesn’t care about the order you record the add-ins.
The barista doesn’t care. The manager doesn’t care.
The accountant doesn’t care. To every stakeholder of the
domain we asked, these mean the same thing.
The normalization function defines what they mean by
converting values to a normal form. We can write our add-ins
normalization function like this:
function normalizeAddIns(addIns) { //=> addIns
  return addIns.toSorted();
}
Our normal form is sorted order. The normal form should
be a form that you can easily convert any value to and that
can be compared for equality. If we sort two arrays with
the same elements, they will then have the same order.
Using this function, we can rewrite the above code and
get a different result:
const a = normalizeAddIns(["soy", "almond"]);
const b = normalizeAddIns(["almond", "soy"]);
_.isEqual(a, b) //=> true
Although the meaningless differences can still exist, at
any time we can normalize our values, get the normal
forms, and then compare them for equality.
Signature
Normalization functions have the following signature
(although they can be named however you want):
function normalize(t) //=> t
Property
Normalization functions should be idempotent. Normalizing
a value already in normal form should be a no-op.
_.isEqual(normalize(a), normalize(normalize(a)))
(We’ll see more about properties in the composition lens chapter.)
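Here is a typed sketch of the same idea that runs without lodash. It copies the array and sorts the copy instead of calling toSorted(), which is only available in newer JavaScript runtimes. The helper sameAddIns and the values once and twice are hypothetical names used to show equality after normalization and the idempotence property.

```typescript
type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";

// Copy before sorting so the caller's array is left untouched.
function normalizeAddIns(addIns: AddIn[]): AddIn[] {
  return [...addIns].sort();
}

// sameAddIns is a hypothetical helper: two collections are equal when
// their normal forms match element by element.
function sameAddIns(x: AddIn[], y: AddIn[]): boolean {
  const nx = normalizeAddIns(x);
  const ny = normalizeAddIns(y);
  return nx.length === ny.length && nx.every((addIn, i) => addIn === ny[i]);
}

// Idempotence: normalizing a normal form changes nothing.
const once = normalizeAddIns(["soy", "almond", "soy"]);
const twice = normalizeAddIns(once);
```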
Validation function
Validation functions identify truly meaningless values.
They act as a filter for invalid values. We often use them to
signal a programming error.
Here are two add-in collections encoded as JS objects.
One of them is invalid.
const a = {"almond": 3}; // valid
const b = {"fdsfs": 2}; // invalid
Having this value in our system will mess things up. We
want to detect it as soon as possible and signal it as an error.
Let’s define two validation functions, one for a single
add-in, one for a collection of add-ins:
const validAddIns = ["soy", "espresso",
  "almond", "chocolate", "hazelnut"];
function isValidAddIn(addIn) { //=> boolean
  return validAddIns.includes(addIn);
}
function isValidAddInCollection(addIns) { //=> boolean
  return Object.keys(addIns).every(isValidAddIn);
}
Now we can distinguish between valid and invalid states,
like so:
isValidAddInCollection({"almond": 3}) //=> true
isValidAddInCollection({"fdsfs": 2}) //=> false
And we can use it to signal a programming error. Here we
use an assert() to generate an error before trying to add
an add-in when the arguments are invalid:
function addAddIn(addIns, addIn) { //=> addIns
  assert(isValidAddInCollection(addIns));
  assert(isValidAddIn(addIn));
  ...
}
(This is but one of many possible ways of using
validation functions in your code.)
Signature
Validation functions have the following signature:
function isValid(t) //=> boolean
Property
Validation functions have no special properties.
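For readers who want to run the assert example, here is a self-contained sketch. Two parts are assumptions: assertThat is a minimal stand-in for an assert() function, and the last line of addAddIn (incrementing the count) is a guess at a plausible body, since the text elides how addAddIn works.

```typescript
type AddInCounts = { [addIn: string]: number };

const validAddIns = ["soy", "espresso", "almond", "chocolate", "hazelnut"];

function isValidAddIn(addIn: string): boolean {
  return validAddIns.includes(addIn);
}

function isValidAddInCollection(addIns: AddInCounts): boolean {
  return Object.keys(addIns).every(isValidAddIn);
}

// Minimal stand-in for assert(); throws when the condition is false.
function assertThat(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

// The return expression is an assumption made for this sketch.
function addAddIn(addIns: AddInCounts, addIn: string): AddInCounts {
  assertThat(isValidAddInCollection(addIns), "invalid add-in collection");
  assertThat(isValidAddIn(addIn), `"${addIn}" is not a valid add-in`);
  return { ...addIns, [addIn]: (addIns[addIn] || 0) + 1 };
}
```

Calling addAddIn({}, "fdsfs") throws before any data is changed, which is the point of validating at the start of the operation.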
Meaninglessness is a choice
In the last few pages, we’ve defined normalization and
validation functions in very simple ways. But it’s typically
not so simple. How you define them depends on your
choice of encoding.
For example, what does {"soy": -1} mean? You are
writing the program, so you (and your team) get to decide.
This value could fall into either of the two kinds of
meaninglessness:
1. Meaningless difference
2. Truly meaningless value
You could say that it is a meaningless difference. If you
decide that {"soy": -1} means don’t put in any soy shots, the
normalization function should do this:
normalize({"soy": -1}) //=> {}
But if you say that it is a truly meaningless value, the validation
function should signal it as invalid:
isValid({"soy": -1}) //=> false
The choice is up to you. It is yet another example of how
many design decisions we have to make within the complex
systems we call software. What’s important is
considering each one.
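The two choices can be sketched side by side for the JS object encoding. The names normalizeCounts and hasValidCounts are hypothetical, and treating a zero count the same way as a negative one is an assumption of this sketch.

```typescript
type AddInCounts = { [addIn: string]: number };

// Choice 1: treat non-positive counts as a meaningless difference and
// normalize them away.
function normalizeCounts(counts: AddInCounts): AddInCounts {
  const result: AddInCounts = {};
  for (const key of Object.keys(counts)) {
    if (counts[key] > 0) result[key] = counts[key];
  }
  return result;
}

// Choice 2: treat them as truly meaningless values and reject them.
function hasValidCounts(counts: AddInCounts): boolean {
  return Object.keys(counts).every(
    (key) => Number.isInteger(counts[key]) && counts[key] > 0
  );
}
```

A team would normally pick one of these for a given value, not both: either the value is quietly repaired, or it is flagged as a likely bug.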
Reporting validation errors
On page 58 we saw simple definitions of validation
functions that return Booleans. What if we want to report
to the user what was invalid? Booleans don’t contain
user-friendly information.
We can create a new type that distinguishes invalid
states and reports a human-readable message about why
the value is invalid. Here is the type and function signature,
along with two utility functions:
type Validation<T> = {
  isValid: true;
  value: T;
} | {
  isValid: false;
  message: string;
};
function isValid(t) //=> Validation<T>
// utility functions
function valid(value) {
  return {isValid: true, value};
}
function invalid(message) {
  return {isValid: false, message};
}
And here are the validation functions we wrote before, this
time with nice error messages:
function isValidAddIn(addIn) { //=> Validation<AddIn>
  if (validAddIns.includes(addIn))
    return valid(addIn);
  else
    return invalid(`"${addIn}" is not a valid add-in`);
}
function isValidAddInCollection(addIns) { //=> Validation<AddIns>
  // collect the message from every invalid add-in name
  const invalids = Object.keys(addIns)
    .map(isValidAddIn)
    .filter(result => !result.isValid)
    .map(result => result.message);
  if (invalids.length > 0)
    return invalid(
      `Invalid add-ins collection: ${invalids.join(', ')}.`
    );
  else
    return valid(addIns);
}
You may know what I’m going to say: choosing how to han-
dle invalid data is one of the many design decisions you
will have to make.
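In fully typed TypeScript, the Validation type works as a discriminated union: checking isValid tells the compiler which of the two shapes it is looking at. The sketch below shows this with a hypothetical consumer called describeResult.

```typescript
type Validation<T> =
  | { isValid: true; value: T }
  | { isValid: false; message: string };

function valid<T>(value: T): Validation<T> {
  return { isValid: true, value };
}

function invalid<T>(message: string): Validation<T> {
  return { isValid: false, message };
}

// describeResult is a hypothetical consumer. Comparing the isValid flag
// narrows the union so only the matching field is accessible.
function describeResult(result: Validation<string>): string {
  if (result.isValid === true) {
    return `ok: ${result.value}`;
  } else {
    return `error: ${result.message}`;
  }
}
```

Trying to read result.message in the valid branch is a compile-time error, so callers cannot forget to check the flag first.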
Normalizing existing data models
I’d like to show you a cool trick for improving legacy
data models that I’ve used in the past. Imagine we write
software for a newspaper that needs every article to go
through an editorial process. The process is shown in the
figure below. A document starts as drafting, then when the author
is done, the editor edits it, then it is published. There are
four states in the model.
[Figure: the editorial process, Drafting, Editing, Ready, Published, annotated with who has access: author, editor, readers.]
However, the encoding we have from our legacy code
has a different number of states. Here is the type.
type ArticleStatus = {
  drafted: boolean;
  edited: boolean;
  published: boolean;
}
It seems quite clear: as the document ends each step, the
appropriate Boolean is flipped to true. But three Booleans
can encode 2^3 = 8 states! Let’s look at the fit:
Representable: 4. Unrepresentable: 0. Meaningless: 4.
There were just as many meaningless as representable
states!
It seemed impossible in our code, but with thousands
of articles, we did encounter some meaningless values in
our database. We had documents with:
{
  drafted: false,
  edited: true,
  published: true
}
We never figured out how they happened.
And we had too many articles to change the data model.
We came up with a solution that was a good compromise.
The solution used normalization and validation functions
together. Let’s take a look at the solution on the next page.
The encoding we want
type ArticleStatus =
  "drafting" |
  "editing" |
  "ready" |
  "published";
This has perfect fit! Representable: 4. Unrepresentable: 0. Meaningless: 0.
Using normalization and validation together
Here are our two types (the legacy type and the desired
type) along with their fit.
The encoding we have
type ArticleStatus = {
  drafted: boolean;
  edited: boolean;
  published: boolean;
}
Fit: unrepresentable 0, representable 4, meaningless 4
The encoding we want
type ArticleStatus =
  "drafting" |
  "editing" |
  "ready" |
  "published";
Fit: unrepresentable 0, representable 4, meaningless 0
We can’t change encodings, but we can adapt our current
encoding to be a closer representation of our model.
Here’s how. We create a new type that is very much like
our desired encoding, but has one more state:
type ArticleStatus2 = "drafting" |
                      "editing" |
                      "ready" |
                      "published" |
                      "invalid";
Fit: unrepresentable 0, representable 4, meaningless 1
Then we write a function to adapt the old status type to the
new type. It’s kind of like normalizing the old type to the
new type:
function adaptStatus(status) { //=> ArticleStatus2
  if(_.isEqual(status,
      {drafted: false, edited: false, published: false}))
    return "drafting";
  if(_.isEqual(status,
      {drafted: true, edited: false, published: false}))
    return "editing";
  if(_.isEqual(status,
      {drafted: true, edited: true, published: false}))
    return "ready";
  if(_.isEqual(status,
      {drafted: true, edited: true, published: true}))
    return "published";
  return "invalid";
}
Then the validation function considers every stray status
as a meaningless value and identifies them:
function isValidStatus(status) {
  // every status except "invalid" is valid
  return status !== "invalid";
}
An alternative normalization
Let’s look at another way to adapt the old encoding to a
new encoding. Remember, these decisions are up to you.
They depend on all of the context that you have about your
particular domain and codebase. The best I can do is recommend
that you look at as many options as you can. So
here is another one.
Instead of having the four states we want plus one to represent
invalid states, we could keep it to just the four we
want and get perfect fit.
The encoding we want
type ArticleStatus2 =
  "drafting" |
  "editing" |
  "ready" |
  "published";
Fit: unrepresentable 0, representable 4, meaningless 0
With a little investigation, it turns
out that our code handles all the meaningless states just
fine. For example, here is the code to determine whether
to show a document as published:
function showPublished(document) { //=> boolean
  return document.status.published;
}
It doesn’t check all of the other booleans, even though
technically it should. Basically, the code was already ignoring
other booleans in the status. The code to check if it
was ready was like this:
function isReady(document) { //=> boolean
  return !document.status.published &&
    document.status.edited;
}
We could decide to follow this pattern in our adaptStatus()
function:
function adaptStatus(status) { //=> ArticleStatus2
  // the latest flag that is set decides the status
  if(status.published)
    return "published";
  if(status.edited)
    return "ready";
  if(status.drafted)
    return "editing";
  return "drafting";
}
This essentially turns all stray statuses into meaningless
differences, then normalizes them to the desired type.
Remember: We always have the choice whether to consider
a value truly meaningless or as a meaningless difference.
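A typed sketch of this lenient adaptation shows what happens to the stray value from the database and confirms that all eight legacy values land on one of the four desired statuses. LegacyStatus and adaptStatusLenient are hypothetical names chosen so the sketch stands on its own.

```typescript
type LegacyStatus = { drafted: boolean; edited: boolean; published: boolean };
type ArticleStatus2 = "drafting" | "editing" | "ready" | "published";

// Lenient adaptation: the latest flag that is set wins, mirroring how the
// existing code already ignored the earlier Booleans.
function adaptStatusLenient(s: LegacyStatus): ArticleStatus2 {
  if (s.published) return "published";
  if (s.edited) return "ready";
  if (s.drafted) return "editing";
  return "drafting";
}

// The stray value found in the database.
const strayStatus: LegacyStatus = {
  drafted: false,
  edited: true,
  published: true,
};
const adaptedStray = adaptStatusLenient(strayStatus);

// Adapt all eight legacy values and collect the distinct results.
const distinctResults = new Set<ArticleStatus2>();
for (const drafted of [false, true]) {
  for (const edited of [false, true]) {
    for (const published of [false, true]) {
      distinctResults.add(adaptStatusLenient({ drafted, edited, published }));
    }
  }
}
```

With this choice the stray value is read as "published", and there is no fifth "invalid" state left to validate.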
Revisiting the size model
Here’s the same view of the size model we’ve been work-
ing with. We’re going to zoom in again, but this time we’re
going to zoom in on the top, which represents the model.
One of the things that makes software design so difficult is
how interdependent the choices are. Each decision changes
the context, and hence affects all the other decisions.
So far, we’ve been dealing with the model as given. But
the model is also a choice. It’s yet another reason that
software design is so difficult: Not only do we have to decide
how to encode our model, we have to decide what our
model is in the first place.
On the next page, we’re going to zoom into the top half
of the diagram above, and visit a few options that we have
for modeling the size of a coffee. That will let us complete
the picture of the domain modeling process.
alternative in model: one of Super, Mega, Galactic
union type in code:
type Size = "super" |
            "mega" |
            "galactic";
[Figure: possible models for the size: an alternative (Super, Mega, Galactic), a count (# of ml), or size and ingredients related to style (espresso, cappuccino, latte). Each possible model has its own encodings.]
Many options for the size model
We’re zooming into the size model to see a few possible
ways we can model it. So far, we’ve been talking about cof-
fee size as an alternative of three options: super, mega, and
galactic. But that is far from the only way for the business
to model the size.
Another way we could model the size is to sell coffee by
volume. For instance, we could sell coffee by the milliliter
instead of by the size of the cup. If we did that, we would
need to encode the number of milliliters the customer
wants. We call this kind of structure a count.
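As a rough sketch of what encoding a count might look like, a number can hold the milliliters. The names Milliliters, isValidVolume, CoffeeByVolume, and smallPour are hypothetical, and the rule that a volume must be a whole, positive number is an assumption, since the text does not say which volumes the business would allow.

```typescript
// Milliliters is a hypothetical name for the count encoding.
type Milliliters = number;

// Assumed rule: a volume is a whole, positive number of milliliters.
// The number type allows far more values than that, so the encoding has
// meaningless states that a validation function can catch.
function isValidVolume(ml: Milliliters): boolean {
  return Number.isInteger(ml) && ml > 0;
}

type CoffeeByVolume = { volume: Milliliters };
const smallPour: CoffeeByVolume = { volume: 250 };
```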
But if we look at traditional espresso bars from Italy, the
size is different. Coffees come in different styles, each
style dictates the ingredients you find in it, but also the
size and style of cup it is served in. We’re not going to
model it (though you might as an exercise).
There are other ways we could model sizes of coffees, but
that’s enough to explain the difficulty of software design.
We’re not just choosing the encoding, we also have to
choose the model (abstracting). Let’s take a look at the
domain modeling process.
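Figure: the domain modeling process, two cycles joined at the model.
Left-hand cycle: domain, 1. abstract, model, 4. look anew.
Right-hand cycle: model, 2. encode, code, 3. evaluate.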
The domain modeling process
1. Abstract
We can now describe the domain modeling
process. First, we start with the domain.
The domain is the real-world context of the
job we need our software to do. Before ab-
straction, the context is totally undifferen-
tiated and highly complicated. We need to
eliminate unnecessary details and analyze
the necessary ones. That is the process of
abstraction, which takes the domain and
creates a model. The model is a set of con-
cepts and their relationships.
2. Encode
Once we have a model, we need to encode
its concepts and relationships in terms of
our programming language. So far, we’ve
seen how to encode the data, but in the
next chapter we’ll see how to encode the
operations as well. There are always multi-
ple ways to encode something, so we have
to make design decisions.
We encode the model in code so that we
can run the code and see what it does. We
can test it, using manual or automated test-
ing. Or we can create a prototype.
3. Evaluate
Once we have an encoding, we can evaluate
it. We might learn that we need a different
encoding. We can go through the cycle on
the right-hand side multiple times, revis-
ing our encoding each time until it is good
enough.
On the other hand, we may decide that
our encoding is ne, but the model needs
to change. In that case, we need to take a
new look at the domain.
4. Look anew
Sometimes, what we learn by running our
encoding is that our model isn’t going to
work. That’s another advantage of encod-
ing our model as code instead of some oth-
er language. When you run an encoding
with good t, you can see if the model itself
is good. If it is, keep working on the right-
hand cycle. But if it’s not, its time to look
anew at your domain and start abstracting
again. So we may iterate through the left-
hand cycle as well.
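To make the right-hand cycle concrete, here is a small sketch of one pass through encoding and evaluating. The Size type comes from the chapter. The spellings of the Roast and AddIn values, the Coffee type, and the normalizeCoffee function are assumptions made for illustration, as is the idea that the order of add-ins does not matter in the model.

```typescript
// 2. Encode: the MegaBuzz model as types (value spellings are assumed).
type Size = "super" | "mega" | "galactic";
type Roast = "raw" | "burnt" | "charcoal";
type AddIn = "soy" | "espresso" | "almond" | "chocolate" | "hazelnut";

type Coffee = {
  size: Size;
  roast: Roast;
  addIns: AddIn[];
};

// Hypothetical normalization: an array remembers order, but we assume the
// model does not. Sorting a copy of the add-ins gives two coffees with the
// same add-ins the same representation, without mutating the input.
function normalizeCoffee(coffee: Coffee): Coffee {
  return { ...coffee, addIns: [...coffee.addIns].sort() };
}

// 3. Evaluate: run the encoding and see what it does.
const first = normalizeCoffee({
  size: "super",
  roast: "burnt",
  addIns: ["espresso", "soy"],
});
const second = normalizeCoffee({
  size: "super",
  roast: "burnt",
  addIns: ["soy", "espresso"],
});

// If the two orders describe the same coffee in the model, they should
// compare as equal in the encoding too.
const sameCoffee = JSON.stringify(first) === JSON.stringify(second);
```

If running this surprised us, we would first try a different encoding (the right-hand cycle). If every encoding felt wrong, that would be a hint to look anew at the domain (the left-hand cycle).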
Domain modeling glossary
We’ve been using a bunch of terms without defining them
well. It’s now time to give them good definitions with
examples from our MegaBuzz project.
Abstraction
Abstraction is the process of analyzing a
domain to synthesize the important concepts
and their relationships. The end result is a
model. The MegaBuzz model (sizes, roasts,
add-ins) was abstracted for us by the
business owners.
Data model
Data model refers to the encoding of the
information of a model in data values and
structures. We’ve developed a data model
of coffee at MegaBuzz. We have not yet
encoded other aspects of the model, such as
operations.
Domain
The domain is the real-world context of the
job we need our software to do. In the
examples we’ve seen so far, the domain is the
business of MegaBuzz, the coffee shop.
Domain expert
A domain expert is a person who
understands the model and can serve as a
resource for the encoding process. The
barista is a domain expert in the preparation of
coffees. An accountant is a domain expert
in accounting. Hopefully, we programmers
become domain experts through the
process of domain modeling.
Domain model
Domain model is a loose term sometimes
referring to the model, but just as often to
the encoding. One might refer to
everything we’ve done so far as domain modeling.
Encoding
The encoding is a physical representation
of the model used for communication with
people and computers. The encoding can
be diagrams, natural language, formal log-
ic, or a programming language. We have
built an encoding in a programming lan-
guage, but we also had English-language
descriptions and diagrams.
Evaluation
We evaluate an encoding by judging it in
various ways. One way is with fit. We evaluate
an encoding to learn how to improve
it. We evaluated encodings for add-in
collections using fit.
Fit
Fit measures how closely an encoding matches
a model’s domain concept or relationship.
Look anew
We look anew to improve a model by under-
standing problems with its concepts and
relationships. We haven’t seen an example
yet, but imagine if our coffee encoding
taught us that we should change how the
business runs.
Model
A model is a set of related domain concepts.
As concepts, they exist in the mind. One
model we’ve seen consists of three sizes,
three roasts, and add-in collections. Some
models we haven’t talked about are money
and marketing promotions.
Conclusion
In this chapter we’ve done quite a lot. We saw how to en-
code a data model, how the domain modeling process
works, and how we can evaluate a data model. This pro-
cess might be done intuitively, but revisiting it at a deep-
er level can give us insights into how we can improve our
design skills.
Summary
Data models encode information about domain con-
cepts and their relationships. We use data types and
data structures to mimic their structure in our code.
The relationships in a model often fall into several
common patterns such as alternative or combination.
We should understand the semantics of the available
language features so that we can use them to encode
those relationships.
It is rare to nd that a language feature encodes a con-
cept or relationship exactly. We can use normalization
and validation functions when they dont exactly t.
The domain modeling process is about learning. We
abstract concepts from the domain and encode those
concepts. We can evaluate the encoding against the
model and evaluate the model against the domain.
The best way to get good t is to compare multiple en-
codings and choose the best one. Too often, we use the
rst encoding we think of.
Up next . . .
We’ve seen some basic data modeling techniques. In the
next chapter, we’ll continue our exploration of encoding
structure with data and deepen our understanding of the
process.
(Margin note: see the Data Lens Supplement for more common patterns.)