Data Lens Part 1 - Runnable Specifications

Runnable Specifications by Eric Normand · Work in progress

Chapter 1

Data Lens Part 1

Chapter objectives

• Learn to analyze the structure of a domain and encode it in your language.

• Learn to evaluate data models based on fit.

• Understand how domain models are constructed by abstraction, encoding, and feedback.

This chapter presents a challenge to me as an author: I’ve got to reteach something most programmers already do intuitively without making the topic seem obvious and boring.

You probably have an intuitive sense of how to model the data of a domain. You likely do it every day. But sometimes we need to relearn the skills we rely on at a deeper level so we can build on top of the new understanding.

The data lens is all about encoding the relationships we find in our domain using features of our language that have the same structure.

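[Figure: By the end of this chapter, you’ll understand these three diagrams. The first is the domain modeling process, which connects domain, model, and code through 1. abstract, 2. encode, 3. evaluate, and 4. look anew. The second is a fit Venn diagram. The third is the size alternative, “one of” Super, Mega, or Galactic, encoded as "super", "mega", or "galactic".]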

42 Chapter 1

Welcome to MegaBuzz!

MegaBuzz is the premier fast-food coffee shop. We pride ourselves on giant servings, coffee that needs milk to taste good, and whatever flavors you need to feel special.

Our baristas have been doing a good job, but we’re growing like crazy. We need help from some software. Can you help us design the data model?

Each coffee consists of one of three sizes, one of three roasts, and optional add-ins.

[Figure: Coffee model. Sizes: Super, Mega, Galactic. Roasts: Raw, Burnt, Charcoal. Add-ins: Soymilk, Espresso, Almond, Chocolate, Hazelnut.]

Coffee encoding: an example coffee in JSON

{
  "size": "super",
  "roast": "burnt",
  "addIns": ["espresso", "soy"]
}

Each coffee encodes the customer’s choices. The JSON above represents one possible coffee. It may seem obvious how to arrive at that JSON, but let’s dive deep into the process to really understand it.

43 Data Lens Part 1

Encoding the size of a coffee

We’ll take the encoding one piece at a time. Let’s start with the size.

In our model, to choose a size, we have to select one among the three different sizes. This structure comes up so often that we can give it a name: we’ll call it an alternative. An alternative means choosing one from a set of options.

We want to choose an encoding that has the same structure as our model. In this case, we want to preserve the “one of” structure that characterizes alternatives.

We chose to encode the size by representing each choice as a string, then representing the “one of” structure as a TypeScript union type. This means that a Size has to be one of those three strings.

[Figure: the size alternative. In the model, it is “one of” Super, Mega, or Galactic. In code, it is a TypeScript type declaration whose union type (indicated by |) allows one of "super", "mega", or "galactic".]

type Size = "super" | "mega" | "galactic";

Note

I’m using TypeScript types where their notation is very clear. Please don’t take that to mean that domain modeling can be done only with TypeScript or only with static types. It’s just a convenient way to express the encoding.
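Here is a small sketch of the “one of” structure at work. The helper `describeSize` is hypothetical and not part of the chapter’s model. The compiler accepts exactly the three strings and rejects anything else. The `never` check also makes it complain if a fourth size is ever added to the union but not handled:

```typescript
// The size alternative, encoded as a union of three string literal types.
type Size = "super" | "mega" | "galactic";

// Hypothetical helper: turns a Size into a label for the menu board.
// Because Size is a closed union, the switch covers every case; if a new
// size were added to the union, the `never` assignment would fail to compile.
function describeSize(size: Size): string {
  switch (size) {
    case "super":
      return "Super";
    case "mega":
      return "Mega";
    case "galactic":
      return "Galactic";
    default: {
      const unhandled: never = size;
      return unhandled;
    }
  }
}

const ok: Size = "mega"; // one of the three allowed strings

// @ts-expect-error: "medium" is not one of the three allowed strings
const notOk: Size = "medium";
```

If you delete the `@ts-expect-error` comment, the "medium" line becomes a compile error. That error is the union type enforcing the alternative.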


Encoding the roast of a coffee

We encode the roast in a similar way to the size. It is a choice of one roast among three options, so it too is an alternative.

We choose to encode the roast as a union type of three strings. Each string corresponds to one of the choices, and the union type maintains the “one of” structure from the model. Since the structure between the roasts is the same as the structure between the sizes (“one of”), it makes sense to use the same encoding.

[Figure: the roast alternative. In the model, it is “one of” Raw, Burnt, or Charcoal. In code, it is a union type of "raw", "burnt", or "charcoal".]

type Roast = "raw" | "burnt" | "charcoal";


[Figure: the add-ins. Each add-in is “one of” five options (an alternative, AddIn), and a coffee has “zero or more” of them (a collection).]

type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";
type AddIns = AddIn[];

Encoding the add-ins of a coffee

The add-ins have a different structure from alternatives. First of all, you don’t choose just one; you can choose several. You can also repeat the same add-in, such as two espresso shots. We’ll break down the structure into two parts:

1. Choosing the add-in

2. Collecting them together

Choosing each add-in in the collection is very much an alternative: each one is one of five choices. Then, when we collect them together, the add-ins have a zero-or-more structure. We will call this kind of structure a collection.

[Figure: the alternative in the model (Soymilk, Espresso, Almond, Chocolate, Hazelnut) is encoded as a union type of strings ("soy", "espresso", "almond", "chocolate", "hazelnut"). The collection in the model is encoded as an array.]

We’ve chosen to encode the collection of add-ins as an array. We could make many other choices, since there are many types of collections. We’ll revisit this choice soon.


[Figure: the coffee is “all of” a Size, a Roast, and an AddIn[] (a combination). These are the types we just defined. Below the types is an example coffee encoded this way.]

type Coffee = {
  size: Size;
  roast: Roast;
  addIns: AddIn[];
};

{
  "size": "super",
  "roast": "burnt",
  "addIns": ["soy", "espresso"]
}

Encoding the whole coffee

Now that we’ve got the three components of a coffee, we can combine them. This time, the structure is “all of” instead of “one of” because the coffee needs a size, a roast, and a collection of add-ins (which could be empty). We call the “all of” structure a combination.

We chose to encode this combination as a JS object type in TypeScript. There were other possibilities, which we will explore later.

But, good news! We’ve finished describing how we’ve encoded this model. Now we will take a closer look at the choices we could have made but didn’t.

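Here is the whole coffee as one self-contained TypeScript file. It is a sketch that restates the types above. The variable names `example`, `plain`, and `incomplete` are only for illustration:

```typescript
// Alternatives: each value is "one of" a fixed set of strings.
type Size = "super" | "mega" | "galactic";
type Roast = "raw" | "burnt" | "charcoal";
type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";

// Combination: a coffee has "all of" a size, a roast, and a collection of add-ins.
type Coffee = {
  size: Size;
  roast: Roast;
  addIns: AddIn[]; // collection: zero or more add-ins
};

// The example coffee from the start of the chapter, now checked by the compiler.
const example: Coffee = {
  size: "super",
  roast: "burnt",
  addIns: ["espresso", "soy"],
};

// An empty collection still makes a complete coffee.
const plain: Coffee = { size: "mega", roast: "raw", addIns: [] };

// @ts-expect-error: a combination needs all of its parts, and roast is missing
const incomplete: Coffee = { size: "super", addIns: [] };
```

Leaving out any field of the combination is a compile error. That error is how the “all of” structure shows up in code.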


Revisiting the size encoding

Here’s the same view of the size encoding we saw on page 43.

We’re going to zoom in on the bottom half of the diagram, the part showing the encoding. We’ll keep the model (the top half) as given and explore the different choices we have for encoding the same model.

One thing I will emphasize repeatedly is that we should consider as many options as possible for each design decision. The quality of your design is proportional to how many possibilities you consider. While we’re here, we should look at different ways we could encode the size alternative.

Turn the page to zoom in on the bottom half of this diagram and see other ways to encode it.

[Figure: the size alternative from page 43. In the model, it is “one of” Super, Mega, or Galactic. In code, it is a union type of "super", "mega", or "galactic".]

type Size = "super" | "mega" | "galactic";



Many options for encoding an alternative

Let’s zoom into the encoding of the size alternative. Any time we encode something, we have a choice in how to encode it. Our programming language gives us particular constructs. Among those constructs, we have to choose the ones that have the same (or similar) structure as the model we are encoding.

In this case, we are encoding an alternative in TypeScript. We can list the constructs TypeScript gives us that share the “one of” structure. We will see next how we can evaluate them to choose the best option.

Possible ways to encode an alternative

[Figure: the size alternative in the model (“one of” Super, Mega, Galactic) and its possibilities in code: strings with a union type, strings with an enum, classes with an interface, plain strings, and numbers. Plain strings are annotated: another possibility is to use care instead of static types. The same three string values would be legal; we just have to make sure not to use anything else.]

Strings, union type:

type Size = "super" | "mega" | "galactic";

Strings, enum:

enum Size {
  super = "super",
  mega = "mega",
  galactic = "galactic",
}

Classes, interface:

interface Size {
  name: string;
}

class Super implements Size {
  name = "super";
}

class Mega implements Size {
  name = "mega";
}

class Galactic implements Size {
  name = "galactic";
}
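The “use care” option relies on discipline rather than static types. A common TypeScript idiom offers a middle ground. It keeps the legal values in a runtime array and derives the union type from that array. This is one possible approach, not something the chapter prescribes, and the names `SIZES` and `isSize` are assumptions:

```typescript
// The legal sizes as a runtime array; `as const` keeps each string's literal type.
const SIZES = ["super", "mega", "galactic"] as const;

// Derive the union "super" | "mega" | "galactic" from the array, so the static
// type and the runtime list can't drift apart.
type Size = (typeof SIZES)[number];

// Runtime check for values that arrive as plain strings. The `value is Size`
// return type lets TypeScript narrow the string once the check passes.
function isSize(value: string): value is Size {
  return (SIZES as readonly string[]).includes(value);
}

const input: string = "mega";
if (isSize(input)) {
  const size: Size = input; // narrowed from string to Size
  console.log(`valid size: ${size}`);
}
```

The runtime check helps when sizes arrive as plain strings, for example from a form or a JSON file.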


Fit: evaluating our encoding

We have lots of options for how to encode our models, and each option is slightly different. We need some way to compare them and to know which ones are better for our model.

The secret is that there is no one way to evaluate them. Why? Software design is hard. It’s multidimensional. It’s too dependent on context for any simple scheme to work every time.

This book is full of lenses, and each lens gives us a different way to evaluate our options. Here in the data lens, we are going to use a concept called fit.

Let’s evaluate the t of a simplied coffee—one that has

only size and roast. To evaluate the t, we need to count

the number of states in the model.

Counting the states in a combination

Our coffee is a combination of two alternatives, each with three options. When counting the states of a combination, we multiply the states of the components. So in this case, 3 sizes times 3 roasts equals 9 possible combinations.

Counting the states in a TypeScript object type

We encode a coffee as a TypeScript object type. To count the states, we multiply the states of the two components. In this case, 3 sizes times 3 roasts equals 9 possible states.

Both the model and the code allow the same number of states, and they are the same states. This is very important. We’ll call that perfect fit, and we’ll see it graphically on the next page. In addition, the correspondence between the model and the code is clear.

type Coffee = {
  size: Size;
  roast: Roast;
};

The 9 states of the simplified coffee (size, roast):

"super", "raw"        "mega", "raw"        "galactic", "raw"
"super", "burnt"      "mega", "burnt"      "galactic", "burnt"
"super", "charcoal"   "mega", "charcoal"   "galactic", "charcoal"
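Counting by multiplication can also be checked mechanically. This sketch enumerates every value of the simplified Coffee type by pairing each size with each roast. The `sizes` and `roasts` arrays are assumptions introduced only for the enumeration:

```typescript
type Size = "super" | "mega" | "galactic";
type Roast = "raw" | "burnt" | "charcoal";

// The simplified coffee: a combination of two alternatives.
type Coffee = { size: Size; roast: Roast };

const sizes: Size[] = ["super", "mega", "galactic"];
const roasts: Roast[] = ["raw", "burnt", "charcoal"];

// Pair every size with every roast. The number of results is the product
// of the number of states of each component.
const allCoffees: Coffee[] = [];
for (const size of sizes) {
  for (const roast of roasts) {
    allCoffees.push({ size, roast });
  }
}

console.log(allCoffees.length); // 9
```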


[Figure: fit Venn diagram. Model: size x roast. Code: JS Object. Unrepresentable: 0 states. Representable: 9 states. Meaningless: 0 states.]

Fit: Measuring the encoding with the model

Fit gives us a way to quantitatively judge how well an encoding represents the same possibilities as the model. Fit is not the only way to judge an encoding, but sometimes it is enough to show that one encoding is clearly worse than another. To measure fit, we compare the states in our encoding with the states in the model.

The best way to understand fit is with a Venn diagram. In one circle, we put the states that the model can represent. In the other, we put the states our code can represent. The overlap shows the states of the model that our code can represent; we call these representable. We call the two non-overlapping sections unrepresentable (states in the model that the code cannot represent) and meaningless (states the code can represent that mean nothing in the model).

Perfect t

When we compare the simplified model (the combination of size and roast) to a JS object type that encodes it, we see that they have perfect fit. Perfect fit means that the states the model can represent and the states the encoding can represent are exactly the same. In other words, the unrepresentable and meaningless sections both contain zero states.
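For small, finite state spaces, the three regions of the Venn diagram are just set operations. The hypothetical `measureFit` helper below identifies each state by a string key and counts the states in each region. It only works when both state spaces can be listed, which, as we’ll see, is not always possible:

```typescript
// The three regions of the fit Venn diagram.
type Fit = {
  representable: number; // states in both the model and the code
  unrepresentable: number; // model states the code cannot express
  meaningless: number; // code states that mean nothing in the model
};

// Hypothetical helper: compare two finite lists of states. Each state is a
// string key, so structurally equal states are counted once.
function measureFit(modelStates: string[], codeStates: string[]): Fit {
  const model = new Set(modelStates);
  const code = new Set(codeStates);
  let representable = 0;
  model.forEach((state) => {
    if (code.has(state)) representable++;
  });
  return {
    representable,
    unrepresentable: model.size - representable,
    meaningless: code.size - representable,
  };
}

// Simplified coffee: every size/roast pair, as string keys.
const sizes = ["super", "mega", "galactic"];
const roasts = ["raw", "burnt", "charcoal"];
const coffeeStates: string[] = [];
for (const size of sizes) {
  for (const roast of roasts) {
    coffeeStates.push(`${size}/${roast}`);
  }
}

// The JS object type allows exactly these 9 states, so both lists are the same.
console.log(measureFit(coffeeStates, coffeeStates));
// representable: 9, unrepresentable: 0, meaningless: 0 (perfect fit)
```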


Degenerate case: booleans for size

It’s often useful to look at an obviously bad case when you’re trying to understand a concept. We call that a degenerate case: one that is obviously not the right answer. Let’s look at a degenerate case for encoding size, namely using a Boolean.

Booleans have exactly two states: true and false. However, our model needs three states for the size: super, mega, and galactic. Let’s take a look at the Venn diagram.

It’s clear that we shouldn’t use a Boolean to represent the size. A Boolean can represent only two sizes, so at least one size is always unrepresentable.

This analysis may seem obvious, but only because you’ve done the analysis. The same thinking extends to many existing encodings. Without doing the analysis, you may end up in a similar situation, with states from your model that you cannot represent in your code.
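The same conclusion falls out of a few lines of code. In this sketch, `decodeSize` is a hypothetical interpretation of the Boolean, and any mapping you pick has the same problem. Decoding both Boolean values reaches only two sizes, which leaves one size unrepresentable:

```typescript
type Size = "super" | "mega" | "galactic";

// Degenerate encoding: a single Boolean stands in for the size.
// This mapping is an arbitrary assumption; every mapping has the same flaw.
function decodeSize(isBig: boolean): Size {
  return isBig ? "galactic" : "super";
}

// Decode every state the Boolean can hold.
const reachable = new Set<Size>([true, false].map(decodeSize));

// Any size we can't reach is unrepresentable.
const allSizes: Size[] = ["super", "mega", "galactic"];
const unrepresentable = allSizes.filter((size) => !reachable.has(size));

console.log(unrepresentable); // [ 'mega' ]
```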

Prefer having meaningless states over having unrepresentable states. Very rarely can we encode a model with perfect fit. The world is nearly infinitely varied, and our languages give us finite tools. We’ll soon see how to deal with meaningless states using normalization functions and validation functions.

[Figure: fit Venn diagram. Model: size. Code: Boolean. Unrepresentable: 1 state. Representable: 2 states. Meaningless: 0 states.]



4 problems encoding coffees with numbers

I mentioned before that we can encode the size using numbers. We can also encode the 9 states of the size x roast model using the first nine natural numbers.

1. Bad t

This encoding has several drawbacks. The first is that the fit is not great. Check out the fit in the Venn diagram. We can represent every state using Number, but there are many meaningless states. In this case, we use JavaScript numbers, which are 64-bit floating-point values. That means that the vast majority of the possible state space doesn’t have any meaning.

2. Human readability

The next problem is that the encoding is arbitrary. What does 5 represent? What about 8? The human readability is very low.

[Figure: fit Venn diagram. Model: size x roast. Code: JS Number. Unrepresentable: 0 states. Representable: 9 states. Meaningless: roughly 1.845 x 10^19 states (about 2^64).]

3. Difcult operations

We’ll look at operations more closely in the next chapter, but just imagine what it might be like to change the size of a coffee from super to mega. Even writing code to determine the size of a coffee becomes a challenge.

4. Extra operations

You can add two numbers, but can you really add two coffees? What about multiplication? Or less than (<)? No, these are meaningless operations.
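Here is a sketch of what the number encoding might look like in practice. The numbering (by size, then by roast) is an arbitrary assumption, which is exactly problem 2. The helper names are hypothetical. Problems 3 and 4 show up directly in the code:

```typescript
type Size = "super" | "mega" | "galactic";
type Roast = "raw" | "burnt" | "charcoal";

const sizes: Size[] = ["super", "mega", "galactic"];
const roasts: Roast[] = ["raw", "burnt", "charcoal"];

// Arbitrary numbering: 1 = super/raw, 2 = super/burnt, ..., 9 = galactic/charcoal.
function encodeCoffee(size: Size, roast: Roast): number {
  return sizes.indexOf(size) * roasts.length + roasts.indexOf(roast) + 1;
}

// Every number outside 1..9 (and every fraction) is meaningless, so decoding can fail.
function decodeCoffee(n: number): { size: Size; roast: Roast } | null {
  if (!Number.isInteger(n) || n < 1 || n > sizes.length * roasts.length) return null;
  const index = n - 1;
  return {
    size: sizes[Math.floor(index / roasts.length)],
    roast: roasts[index % roasts.length],
  };
}

// Problem 3: a simple change of size needs a full decode and re-encode.
function changeSize(n: number, newSize: Size): number | null {
  const coffee = decodeCoffee(n);
  return coffee === null ? null : encodeCoffee(newSize, coffee.roast);
}

console.log(encodeCoffee("mega", "burnt")); // 5
console.log(changeSize(2, "mega")); // 5 (super/burnt becomes mega/burnt)
console.log(decodeCoffee(42)); // null

// Problem 4: nothing stops arithmetic on coffees.
// super/burnt (2) + mega/burnt (5) = 7, which decodes to galactic/raw.
console.log(decodeCoffee(encodeCoffee("super", "burnt") + encodeCoffee("mega", "burnt")));
```

Nothing stops the addition on the last line. The type checker sees two numbers, and the result happens to decode to a coffee nobody ordered.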


Fit: When you can’t measure easily

Let’s try to figure out the fit of the add-in collection encoded as an array. To review, each coffee can have zero or more add-ins, and each add-in can be one of five choices.

Calculating the states in our encoding (an array of add-ins) is straightforward. But how can we calculate the number of states in the model? This is difficult to calculate. Give it a try. How do you count the unique states, taking into account that adding soy then espresso is the same as adding espresso then soy?

We could write some code to try to calculate it, but then we’d have to encode the model, and that’s actually the problem we’re trying to solve.

So let’s find examples in the two non-overlapping parts of the Venn diagram without calculating the actual number of states. These examples are all we need to show poor fit.

[Figure: fit Venn diagram. Model: add-ins (Soymilk, Espresso, Almond, Chocolate, Hazelnut). Code: AddIn[]. Unrepresentable: none (no problems here). Meaningless: examples of meaningless differences between states, listed below.]

["soy", "espresso"]
["espresso", "soy"]
["soy", "soy", "almond"]
["almond", "soy", "soy"]
["soy", "almond", "soy"]

type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";
type AddIns = AddIn[];

We can clearly represent every possible collection of add-ins, so we can ignore the unrepresentable section.

However, we can list many encodings that differ only in meaningless ways. Arrays are ordered, and add-ins are not. We have a misfit between the domain and our encoding. All we have to do is reorder the add-ins in the array. That gives us a different state whose difference doesn’t mean anything in the model: soy and espresso is the same as espresso and soy.


Revisiting the add-ins encoding

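[Figure: the add-ins model and encoding from page 45. Each add-in is “one of” five options (an alternative, encoded as a union type of strings), and a coffee has “zero or more” of them (a collection, encoded as an array).]

type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";
type AddIns = AddIn[];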

Here is the same view of the add-ins model and encoding that we saw on page 45.

We’re going to zoom into the bottom half of this diagram to see the different choices we have for encoding it, just like we did for the size.

Remember, we broke this concept from our model into two pieces: choosing an add-in (alternative) and collecting them together (collection).

We encoded the concept with a union type of strings for the alternative and an array for the collection. We’ll keep the model as given, and we’ll keep the union type, but we’ll consider our options for representing the collection. We’ll see the options on the next page.


I want to emphasize again the importance of considering different options. It may seem like experienced people don’t do this, but they do. They just do it very quickly. To get quick, you have to do it a lot.


Many options for encoding collections

Let’s zoom into the encoding of the add-ins collection. We are looking for constructs in our language that give us the same (or similar) structure as the zero-or-more structure we identified in the model.

Here are some constructs TypeScript gives us to encode collections. I’ve also put each one’s fit Venn diagram alongside it.

[Figure: the collection in the model (“zero or more”) and its possibilities in code, each with its fit.]

Array

type AddIns = AddIn[];

Examples:
["soy", "hazelnut"]
[]
["espresso", "espresso"]

Fit: represents all states. Meaningless: ["soy", "almond"] vs. ["almond", "soy"], ...

Set

type AddIns = Set<AddIn>;

Examples:
new Set(["soy", "almond", "chocolate"])
new Set()
new Set(["almond", "espresso"])

Fit: represents only some states. Unrepresentable: new Set(["soy", "soy"]), new Set(["almond", "almond"]), ... (a Set silently drops duplicates)

JS Object

type AddIns = {
  [addIn: string]: number;
};

Examples:
{"almond": 1, "soy": 2}
{"hazelnut": 2, "espresso": 1}

Fit: represents all states. Meaningless: {"soy": 0}, {"soy": -1}, {"dfs": 3}, ...

Map

type AddIns = Map<AddIn, number>;

Examples:
new Map([["soy", 1]])
new Map()
new Map([["almond", 2], ["espresso", 1]])

Fit: represents all states. Meaningless: new Map([["soy", -1]]), ...
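The difference between Set and the other three options is easy to check at runtime. In this sketch, a Set silently drops the second espresso shot. A Map of counts keeps the duplicate and forgets the order. The `toCounts` helper is hypothetical:

```typescript
type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";

// A double espresso with soy, encoded as an array: the duplicate survives.
const asArray: AddIn[] = ["espresso", "espresso", "soy"];

// The same add-ins as a Set: duplicates collapse, so one shot is lost.
const asSet = new Set<AddIn>(asArray);
console.log(asSet.size); // 2, not 3

// Hypothetical helper: convert an array to a Map of counts,
// which keeps the duplicates and drops the (meaningless) order.
function toCounts(addIns: AddIn[]): Map<AddIn, number> {
  const counts = new Map<AddIn, number>();
  for (const addIn of addIns) {
    counts.set(addIn, (counts.get(addIn) ?? 0) + 1);
  }
  return counts;
}

console.log(toCounts(asArray).get("espresso")); // 2
```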


Dealing with meaningless states

We’ve gone over four possible options for encoding a collection of add-ins: Array, Set, JS Object, and Map. We can disqualify Set because it can’t represent all of our states: it can’t hold the same add-in twice, such as two espresso shots. The three that are left can represent all states, so they pass the fit test. However, each of them has extra meaningless states, so none has perfect fit.

Why are meaningless states a problem? We will have to handle them one way or another. Either we prevent them from happening, which requires work, or we allow them to happen and make sense of the meaningless values, which also requires work.

Different communities emphasize different strategies. Some try their hardest to get perfect fit. For instance, the F# and ML communities use the phrase “make illegal states unrepresentable.”

There are actually two kinds of meaningless states:

1. Meaningless differences

An example of a meaningless difference is two arrays with the same elements in different orders, when the order is irrelevant. The values encode the same add-ins in the domain, but the two arrays appear different to the computer. If you compare them element-wise, they will be unequal. If you serialize them to JSON, the JSON strings will be different. And so on. We can handle these cases with normalization functions.

2. Truly meaningless values

Truly meaningless values have no obvious interpretation. If you read ["almond", "soy"] from a file, your code could interpret it to mean, “Put almond and soy in the coffee.” But if it read {"soy": -54}, what would that mean? What does {"jfjdksfjl": 3} mean?

You can, of course, assign these values meanings, but it helps to avoid them, because they might indicate a bug somewhere. We can handle these cases with validation functions.

Meaningless and unrepresentable states of each collection encoding:

• Array. Meaningless states: same add-ins, different order.
• Set. Unrepresentable states: duplicate add-ins.
• JS Object. Meaningless states: add-in count is a negative integer; add-in count is zero; add-in name is not a valid add-in.
• Map. Meaningless states: add-in count is a negative integer; add-in count is zero.


Normalization function

Normalization functions eliminate meaningless differences by converting data into a normal form. Let’s look at the example of arrays of add-ins.

import _ from "lodash"; // provides _.isEqual (deep value equality)
const a = ["soy", "almond"];
const b = ["almond", "soy"];
_.isEqual(a, b) //=> false

_.isEqual(a, b) compares a and b using deep value equality. Arrays are compared element by element. (You can find this function, and many other useful ones, in the popular lodash library.)

These two arrays are different, but in a meaningless way. They represent the same add-ins. That is, the customer doesn’t care about the order in which you record the add-ins. The barista doesn’t care. The manager doesn’t care. The accountant doesn’t care. To every stakeholder of the domain we asked, these mean the same thing.

The normalization function defines which values mean the same thing by converting them to a normal form. We can write our add-ins normalization function like this:

function normalizeAddIns(addIns) { //=> addIns
  // toSorted() returns a sorted copy and leaves the original untouched (ES2023)
  return addIns.toSorted();
}

Our normal form is sorted order. The normal form should be a form that you can easily convert any value to and that can be compared for equality. If we sort two arrays with the same elements, they will then have the same order.

Using this function, we can rewrite the above code and get a different result:

const a = normalizeAddIns(["soy", "almond"]);
const b = normalizeAddIns(["almond", "soy"]);
_.isEqual(a, b) //=> true

Although the meaningless differences can still exist, at any time we can normalize our values, get the normal forms, and then compare them for equality.

Signature

Normalization functions have the following signature (although they can be named however you want):

function normalize(t) //=> t

Property

Normalization functions should be idempotent. Normalizing a value already in normal form should be a no-op:

_.isEqual(normalize(a), normalize(normalize(a))) //=> true

(We’ll see more about properties in the composition lens chapter.)
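Here is a typed, dependency-free sketch of the same normalization for readers following along in TypeScript. It sorts a copy with `[...addIns].sort()` instead of `toSorted()`. A small hypothetical `sameArray` helper stands in for lodash’s `_.isEqual` on flat arrays:

```typescript
type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";

// Normal form: sorted order. Sorting a copy leaves the caller's array untouched.
function normalizeAddIns(addIns: AddIn[]): AddIn[] {
  return [...addIns].sort();
}

// Hypothetical helper: element-by-element equality for flat arrays,
// standing in for lodash's _.isEqual in this dependency-free sketch.
function sameArray<T>(a: T[], b: T[]): boolean {
  return a.length === b.length && a.every((x, i) => x === b[i]);
}

const first: AddIn[] = ["soy", "almond"];
const second: AddIn[] = ["almond", "soy"];

console.log(sameArray(first, second)); // false: a meaningless difference
console.log(sameArray(normalizeAddIns(first), normalizeAddIns(second))); // true

// Idempotence: normalizing something already in normal form changes nothing.
const once = normalizeAddIns(first);
console.log(sameArray(once, normalizeAddIns(once))); // true
```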


Validation function

Validation functions identify truly meaningless values. They act as a filter for invalid values. We often use them to signal a programming error.

Here are two add-in collections encoded as JS objects. One of them is invalid:

const a = {"almond": 3}; // valid
const b = {"fdsfs": 2}; // invalid

Having this value in our system will mess things up. We want to detect it as soon as possible and signal it as an error. Let’s define two validation functions, one for a single add-in and one for a collection of add-ins:

const validAddIns = ["soy", "espresso", "almond", "chocolate", "hazelnut"];

function isValidAddIn(addIn) { //=> boolean
  return validAddIns.includes(addIn);
}

function isValidAddInCollection(addIns) { //=> boolean
  // checks the add-in names (the object's keys), not the counts
  return Object.keys(addIns).every(isValidAddIn);
}

Now we can distinguish between valid and invalid states, like so:

isValidAddInCollection({"almond": 3}) //=> true
isValidAddInCollection({"fdsfs": 2}) //=> false

And we can use them to signal a programming error. Here we use assert() to throw an error before trying to add an add-in when the arguments are invalid. This is but one of many possible ways of using validation functions in your code:

import assert from "node:assert"; // assuming Node.js

function addAddIn(addIns, addIn) { //=> addIns
  assert(isValidAddInCollection(addIns));
  assert(isValidAddIn(addIn));
  ...
}

Signature

Validation functions have the following signature:

function isValid(t) //=> boolean

Property

Validation functions have no special properties.
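The collection validator above checks only the names, so {"soy": -1} would pass it. This typed sketch also checks the counts. Treating zero and negative counts as invalid is an assumption made for illustration. As the chapter goes on to show, that treatment is a design choice:

```typescript
type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";

const validAddIns: readonly string[] = ["soy", "espresso", "almond", "chocolate", "hazelnut"];

// Type guard: a string that passes is known to be an AddIn afterwards.
function isValidAddIn(addIn: string): addIn is AddIn {
  return validAddIns.includes(addIn);
}

// Assumption for this sketch: a count is valid only if it is a whole number above zero.
function isValidCount(count: unknown): boolean {
  return typeof count === "number" && Number.isInteger(count) && count > 0;
}

// Checks both the keys (add-in names) and the values (counts) of the JS object encoding.
function isValidAddInCollection(addIns: { [addIn: string]: unknown }): boolean {
  return Object.keys(addIns).every((key) => isValidAddIn(key) && isValidCount(addIns[key]));
}

console.log(isValidAddInCollection({ almond: 3 })); // true
console.log(isValidAddInCollection({ fdsfs: 2 })); // false: unknown add-in name
console.log(isValidAddInCollection({ soy: -1 })); // false: negative count
```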


Meaninglessness is a choice

In the last few pages, we’ve defined normalization and validation functions in very simple ways. But it’s typically not so simple. How you define them depends on your choice of encoding.

For example, what does {"soy": -1} mean? You are writing the program, so you (and your team) get to decide. This value could fall into either of the two kinds of meaninglessness:

1. Meaningless difference

2. Truly meaningless value

You could say that it is a meaningless difference. If you decide that {"soy": -1} means no soy, the normalization function should do this:

normalize({"soy": -1}) //=> {}

But if you say that it is a truly meaningless value, the validation function should signal it as invalid:

isValid({"soy": -1}) //=> false

The choice is up to you. It is yet another example of the many design decisions we have to make within the complex systems we call software. What’s important is considering each one.
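If you take the first option and call {"soy": -1} a meaningless difference, the normalization might look like the sketch below. The `AddInCounts` type and the `normalizeAddInCounts` name are assumptions. The sketch encodes one decision: any count of zero or less means “none of that add-in.”

```typescript
type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";
// JS object encoding with typed keys; counts are plain numbers.
type AddInCounts = { [K in AddIn]?: number };

// Normal form: keep only positive counts, so "zero or fewer" and "absent" look the same.
function normalizeAddInCounts(addIns: AddInCounts): AddInCounts {
  const result: AddInCounts = {};
  for (const key of Object.keys(addIns) as AddIn[]) {
    const count = addIns[key];
    if (count !== undefined && count > 0) {
      result[key] = count;
    }
  }
  return result;
}

console.log(normalizeAddInCounts({ soy: -1 })); // {}
console.log(normalizeAddInCounts({ soy: 2, almond: 0 })); // { soy: 2 }
```

Because the output contains only positive counts, running it a second time changes nothing. That is the idempotence property normalization functions should have.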


Reporting validation errors

On page 58 we saw simple definitions of validation functions that return Booleans. What if we want to report to the user what was invalid? Booleans don’t contain user-friendly information.

We can create a new type that distinguishes invalid states and reports a human-readable message about why the value is invalid. Here are the type and the function signature, along with two utility functions:

type Validation<T> = {
  isValid: true;
  value: T;
} | {
  isValid: false;
  message: string;
};

function isValid(t) //=> Validation<T>

// utility functions
function valid(value) {
  return {isValid: true, value};
}

function invalid(message) {
  return {isValid: false, message};
}

And here are the validation functions we wrote before, this time with nice error messages:

function isValidAddIn(addIn) { //=> Validation<AddIn>
  if (validAddIns.includes(addIn))
    return valid(addIn);
  else
    return invalid(`"${addIn}" is not a valid add-in`);
}

function isValidAddInCollection(addIns) { //=> Validation<AddIns>
  const invalids = Object.keys(addIns)
    .map(isValidAddIn)
    .filter(result => !result.isValid)
    .map(result => result.message);
  if (invalids.length > 0)
    return invalid(
      `Invalid add-ins collection: ${invalids.join(', ')}.`
    );
  else
    return valid(addIns);
}

You may know what I’m going to say: choosing how to handle invalid data is one of the many design decisions you will have to make.
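The snippets above are untyped JavaScript. Here is a self-contained TypeScript sketch of the same idea, written as a discriminated union so the compiler knows that `message` exists only when `isValid` is false. The explicit type arguments on `invalid` and the sample input are illustrative choices:

```typescript
type AddIn = "soy" | "espresso" | "hazelnut" | "chocolate" | "almond";
type AddIns = { [addIn: string]: number };

// Discriminated union: checking `isValid` tells the compiler which fields exist.
type Validation<T> =
  | { isValid: true; value: T }
  | { isValid: false; message: string };

// Utility constructors for the two cases.
function valid<T>(value: T): Validation<T> {
  return { isValid: true, value };
}

function invalid<T>(message: string): Validation<T> {
  return { isValid: false, message };
}

const validAddIns: readonly string[] = ["soy", "espresso", "almond", "chocolate", "hazelnut"];

function isValidAddIn(addIn: string): Validation<AddIn> {
  if (validAddIns.includes(addIn)) return valid(addIn as AddIn);
  return invalid<AddIn>(`"${addIn}" is not a valid add-in`);
}

function isValidAddInCollection(addIns: AddIns): Validation<AddIns> {
  const messages: string[] = [];
  for (const key of Object.keys(addIns)) {
    const result = isValidAddIn(key);
    // Inside this branch the compiler knows `result` has a `message`.
    if (!result.isValid) messages.push(result.message);
  }
  if (messages.length > 0) {
    return invalid<AddIns>(`Invalid add-ins collection: ${messages.join(", ")}.`);
  }
  return valid(addIns);
}

const checked = isValidAddInCollection({ almond: 1, fdsfs: 2 });
if (!checked.isValid) {
  console.log(checked.message);
  // Invalid add-ins collection: "fdsfs" is not a valid add-in.
}
```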


Normalizing existing data models

I’d like to show you a cool trick for improving legacy data models that I’ve used in the past. Imagine we write software for a newspaper that needs every article to go through an editorial process. A document starts in drafting. When the author is done, the editor edits it. Once it is edited, it is ready, and then it is published. There are four states in the model.

[Figure: the editorial process: Drafting → Editing → Ready → Published, annotated with who has access (author, editor, readers).]

However, the encoding we have from our legacy code has a different number of states. Here is the type:

type ArticleStatus = {
  drafted: boolean;
  edited: boolean;
  published: boolean;
};

It seems quite clear: as the document finishes each step, the appropriate Boolean is flipped to true. But three Booleans can encode 2^3 = 8 states! Let’s look at the fit:

[Figure: fit Venn diagram. Representable: 4. Unrepresentable: 0. Meaningless: 4.]

There are just as many meaningless states as representable ones!

It seemed impossible in our code, but with thousands of articles, we did encounter some meaningless values in our database. We had documents with:

{
  drafted: false,
  edited: true,
  published: true
}

We never figured out how they happened.

And we had too many articles to change the data model. We came up with a solution that was a good compromise. The solution used normalization and validation functions together. Let’s take a look at the solution on the next page.

The encoding we want:

type ArticleStatus =
  "drafting" |
  "editing" |
  "ready" |
  "published";

This has perfect fit!


Using normalization and validation together

Here are our two types (the legacy type and the desired type) along with their fit.

The encoding we have:

type ArticleStatus = {
  drafted: boolean;
  edited: boolean;
  published: boolean;
};

The encoding we want:

type ArticleStatus =
  "drafting" |
  "editing" |
  "ready" |
  "published";

We can’t change the encoding of the stored data, but we can adapt our current encoding to be a closer representation of our model. Here’s how. We create a new type that is very much like our desired encoding, but has one more state:

type ArticleStatus2 =
  "drafting" |
  "editing" |
  "ready" |
  "published" |
  "invalid";

Then we write a function to adapt the old status type to the new type. It’s kind of like normalizing the old type to the new type:

function adaptStatus(status) { //=> ArticleStatus2
  if (_.isEqual(status, {drafted: false, edited: false, published: false}))
    return "drafting";
  if (_.isEqual(status, {drafted: true, edited: false, published: false}))
    return "editing";
  if (_.isEqual(status, {drafted: true, edited: true, published: false}))
    return "ready";
  if (_.isEqual(status, {drafted: true, edited: true, published: true}))
    return "published";
  return "invalid"; // any of the 4 stray combinations
}

Then the validation function treats every stray status as a meaningless value and identifies it:

function isValidStatus(status) { //=> boolean
  return status !== "invalid";
}
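Here is a self-contained, typed sketch of the same adapter. It avoids lodash by building a string key from the three Booleans instead of comparing objects with `_.isEqual`. `LegacyStatus` is an assumed name for the old object type:

```typescript
// The legacy encoding: three independent Booleans (8 possible values).
type LegacyStatus = { drafted: boolean; edited: boolean; published: boolean };

// The desired four statuses plus one extra state for stray combinations.
type ArticleStatus2 = "drafting" | "editing" | "ready" | "published" | "invalid";

// Adapt a legacy status: the four combinations the process can produce map to
// real statuses, and the other four map to "invalid".
function adaptStatus(legacy: LegacyStatus): ArticleStatus2 {
  const key = `${legacy.drafted}/${legacy.edited}/${legacy.published}`;
  switch (key) {
    case "false/false/false":
      return "drafting";
    case "true/false/false":
      return "editing";
    case "true/true/false":
      return "ready";
    case "true/true/true":
      return "published";
    default:
      return "invalid";
  }
}

// Validation: only the extra "invalid" state is rejected.
function isValidStatus(s: ArticleStatus2): boolean {
  return s !== "invalid";
}

const stray: LegacyStatus = { drafted: false, edited: true, published: true };
console.log(adaptStatus(stray)); // invalid
console.log(isValidStatus(adaptStatus(stray))); // false
```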


An alternative normalization

Let’s look at another way to adapt the old encoding to a new encoding. Remember, these decisions are up to you. They depend on all of the context that you have about your particular domain and codebase. The best I can do is recommend that you look at as many options as you can. So here is another one.

Instead of having the four states we want plus one to represent invalid states, we could keep just the four we want and get perfect fit. With a little investigation, it turns out that our code handles all the meaningless states just fine. For example, here is the code to determine whether to show a document as published:

function showPublished(document) { //=> boolean
  return document.status.published;
}

It doesn’t check the other Booleans, even though technically it should. Basically, the code was already ignoring the other Booleans in the status. The code to check whether it was ready was like this:

function isReady(document) { //=> boolean
  return !document.status.published &&
    document.status.edited;
}

We could decide to follow this pattern in our adaptStatus() function:

function adaptStatus(status) { //=> ArticleStatus2
  // check the most advanced step first, as showPublished() and isReady() do
  if (status.published)
    return "published";
  if (status.edited)
    return "ready";
  if (status.drafted)
    return "editing";
  return "drafting";
}

The encoding we want:

type ArticleStatus2 =
  "drafting" |
  "editing" |
  "ready" |
  "published";

[Figure: fit Venn diagram. Representable: 4. Unrepresentable: 0. Meaningless: 0.]

This essentially turns all stray statuses into meaningless differences, then normalizes them to the desired type.

Remember: we always have the choice whether to consider a value truly meaningless or a meaningless difference.
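A typed version of this second adapter is short. Every branch returns one of the four desired statuses, so the result type needs no "invalid" case. As before, `LegacyStatus` is an assumed name:

```typescript
type LegacyStatus = { drafted: boolean; edited: boolean; published: boolean };

// Only the four statuses we want: no extra "invalid" state.
type ArticleStatus2 = "drafting" | "editing" | "ready" | "published";

// Check the most advanced step first, the same way showPublished() and
// isReady() already read the legacy flags. All 8 legacy values map somewhere.
function adaptStatus(legacy: LegacyStatus): ArticleStatus2 {
  if (legacy.published) return "published";
  if (legacy.edited) return "ready";
  if (legacy.drafted) return "editing";
  return "drafting";
}

// The stray status from the database now normalizes to "published".
console.log(adaptStatus({ drafted: false, edited: true, published: true })); // published
console.log(adaptStatus({ drafted: false, edited: true, published: false })); // ready
```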


Revisiting the size model

Here’s the same view of the size model we’ve been working with. We’re going to zoom in again, but this time on the top half, which represents the model.

One of the things that makes software design so difficult is how interdependent the choices are. Each decision changes the context and hence affects all the other decisions.

So far, we’ve been treating the model as given. But the model is also a choice. It’s yet another reason that software design is so difficult: not only do we have to decide how to encode our model, we have to decide what our model is in the first place.

On the next page, we’re going to zoom into the top half of this diagram and visit a few options that we have for modeling the size of a coffee. That will let us complete the picture of the domain modeling process.

[Figure: the size alternative. In the model, it is “one of” Super, Mega, or Galactic. In code, it is a union type of "super", "mega", or "galactic".]

type Size = "super" | "mega" | "galactic";


Many options for the size model

[Figure: possible models for the size, with their encodings: an alternative (Super, Mega, Galactic); a count (# of ml); or a style that determines both size and ingredients (latte, cappuccino, espresso).]

We’re zooming into the size model to see a few possible ways we can model it. So far, we’ve been talking about coffee size as an alternative among three options: super, mega, and galactic. But that is far from the only way for the business to model size.

Another way we could model the size is to sell coffee by volume. For instance, we could sell coffee by the milliliter instead of by the size of the cup. If we did that, we would need to encode the number of milliliters the customer wants. We call this kind of structure a count.
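Under the count model, a coffee’s size is a number rather than one of three choices. This is a minimal TypeScript sketch, assuming the shop sells whole, positive numbers of milliliters; VolumeMl and isValidVolume() are hypothetical names. A plain number admits values the model doesn’t, so the encoding doesn’t fit exactly, and a validation function makes up the difference.

```typescript
// Hypothetical count encoding: the number of milliliters ordered.
type VolumeMl = number;

// TypeScript's number type also allows -5, 12.5, NaN, and Infinity, none of
// which is a sensible amount of coffee. This check rejects all of them,
// assuming the shop only sells whole, positive milliliters.
// (NaN > 0 is false, and Infinity % 1 is NaN, so both fail.)
function isValidVolume(ml: VolumeMl): boolean {
  return ml > 0 && ml % 1 === 0;
}

console.log(isValidVolume(350));  // true
console.log(isValidVolume(-20));  // false
console.log(isValidVolume(12.5)); // false
```

Compare this with the Size union type, where every value the type allows is a valid size and no validation step is needed.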

But if we look at traditional espresso bars from Italy, size works differently. Coffees come in different styles, and each style dictates not only the ingredients you find in it but also the size and style of cup it is served in. We’re not going to model it (though you might as an exercise).

There are other ways we could model the size of a coffee, but that’s enough to illustrate the difficulty of software design. We’re not just choosing the encoding; we also have to choose the model (abstracting). Let’s take a look at the domain modeling process.


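[Figure: the domain modeling process. The domain, the model, and the code, linked by two cycles: 1. abstract (domain to model) and 4. look anew (model back to domain) on the left; 2. encode (model to code) and 3. evaluate (code back to model) on the right.]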

The domain modeling process

1. Abstract

We can now describe the domain modeling process. First, we start with the domain. The domain is the real-world context of the job we need our software to do. Before abstraction, the context is totally undifferentiated and highly complicated. We need to eliminate unnecessary details and analyze the necessary ones. That is the process of abstraction, which takes the domain and creates a model. The model is a set of concepts and their relationships.

2. Encode

Once we have a model, we need to encode its concepts and relationships in terms of our programming language. So far, we’ve seen how to encode the data, but in the next chapter we’ll see how to encode the operations as well. There are always multiple ways to encode something, so we have to make design decisions.

We encode the model in code so that we can run the code and see what it does. We can test it, manually or with automated tests, or we can create a prototype.
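As a sketch of what running an encoding looks like, here is one possible TypeScript encoding of the example coffee from the start of the chapter, plus a small function we can call to see what it does. The string values for roasts and add-ins and the describeCoffee() function are assumptions for illustration; the chapter’s own add-in encoding may differ.

```typescript
type Size = "super" | "mega" | "galactic";
type Roast = "raw" | "burnt" | "charcoal";
// Assumed spellings for the five add-ins on the MegaBuzz menu
// ("soy" follows the example JSON rather than "soymilk").
type AddIn = "soy" | "espresso" | "almond" | "chocolate" | "hazelnut";

// A combination: every coffee has a size, a roast, and zero or more add-ins.
type Coffee = {
  size: Size;
  roast: Roast;
  addIns: AddIn[];
};

// The example coffee, written as a typed value instead of raw JSON.
const example: Coffee = {
  size: "super",
  roast: "burnt",
  addIns: ["espresso", "soy"],
};

// A tiny operation we can run to see the encoding in action (hypothetical).
function describeCoffee(coffee: Coffee): string {
  const base = `${coffee.size} ${coffee.roast} coffee`;
  return coffee.addIns.length === 0
    ? base
    : `${base} with ${coffee.addIns.join(", ")}`;
}

console.log(describeCoffee(example)); // prints "super burnt coffee with espresso, soy"
```

Because the types are checked, a typo such as roast: "brunt" is rejected before the code ever runs, which is one kind of feedback the encoding gives us.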

3. Evaluate

Once we have an encoding, we can evaluate it. We might learn that we need a different encoding. We can go through the cycle on the right-hand side multiple times, revising our encoding each time until it is good enough.

On the other hand, we may decide that our encoding is fine, but the model needs to change. In that case, we need to take a new look at the domain.
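As a small illustration of the right-hand cycle, consider two encodings of the size model. A first attempt might use a plain string; evaluating it shows that it admits meaningless values like "venti", so it needs a validation function. Revising to a union type closes that gap. This is a TypeScript sketch, and SizeAsString and isSize() are hypothetical names.

```typescript
// First attempt: any string. Fits poorly, since "venti" or "" type-check too.
type SizeAsString = string;

// Revised encoding: exactly the three sizes in the model.
type Size = "super" | "mega" | "galactic";

const sizes: readonly Size[] = ["super", "mega", "galactic"];

// With the string encoding, we need a validation function to recover the
// fit that the union type gives us for free. As a type guard, it also lets
// the compiler treat a validated string as a Size afterward.
function isSize(s: SizeAsString): s is Size {
  return (sizes as readonly string[]).indexOf(s) !== -1;
}

console.log(isSize("mega"));  // true
console.log(isSize("venti")); // false
```

Every place that accepts a SizeAsString must remember to call isSize(); with the union type, that check is done once, by the compiler.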

4. Look anew

Sometimes, what we learn by running our encoding is that our model isn’t going to work. That’s another advantage of encoding our model as code instead of some other language. When you run an encoding with good fit, you can see if the model itself is good. If it is, keep working on the right-hand cycle. But if it’s not, it’s time to look anew at your domain and start abstracting again. So we may iterate through the left-hand cycle as well.


Domain modeling glossary

We’ve been using a bunch of terms without defining them well. It’s now time to give them good definitions, with examples from our MegaBuzz project.

Abstraction

Abstraction is the process of analyzing a domain to synthesize the important concepts and their relationships. The end result is a model. The MegaBuzz model (sizes, roasts, add-ins) was abstracted for us by the business owners.

Data model

Data model refers to the encoding of the information of a model in data values and structures. We’ve developed a data model of coffee at MegaBuzz. We have not yet encoded other aspects of the model, such as operations.

Domain

The domain is the real-world context of the job we need our software to do. In the examples we’ve seen so far, the domain is the business of MegaBuzz, the coffee shop.

Domain expert

A domain expert is a person who understands the model and can serve as a resource for the encoding process. The barista is a domain expert in the preparation of coffees. An accountant is a domain expert in accounting. Hopefully, we programmers become domain experts through the process of domain modeling.

Domain model

Domain model is a loose term sometimes referring to the model, but just as often to the encoding. One might refer to everything we’ve done so far as domain modeling.

Encoding

The encoding is a physical representation of the model used for communication with people and computers. The encoding can be diagrams, natural language, formal logic, or a programming language. We have built an encoding in a programming language, but we also had English-language descriptions and diagrams.

Evaluation

We evaluate an encoding by judging it in various ways. One way is with fit. We evaluate an encoding to learn how to improve it. We evaluated encodings for add-in collections using fit.

Fit

Fit measures how closely we encode a model’s domain concept or relationship.

Look anew

We look anew to improve a model by understanding problems with its concepts and relationships. We haven’t seen an example yet, but imagine if our coffee encoding taught us that we should change how the business runs.

Model

A model is a set of related domain concepts. As concepts, they exist in the mind. One model we’ve seen consists of the concepts of 3 sizes, 3 roasts, and add-in collections. Some models we haven’t talked about are money and marketing promotions.


Conclusion

In this chapter we’ve done quite a lot. We saw how to encode a data model, how the domain modeling process works, and how we can evaluate a data model. This process might be done intuitively, but revisiting it at a deeper level can give us insights into how we can improve our design skills.

Summary

• Data models encode information about domain concepts and their relationships. We use data types and data structures to mimic their structure in our code.

• The relationships in a model often fall into several common patterns, such as alternative or combination. We should understand the semantics of the available language features so that we can use them to encode those relationships.

• It is rare to find that a language feature encodes a concept or relationship exactly. We can use normalization and validation functions when they don’t fit exactly.

• The domain modeling process is about learning. We abstract concepts from the domain and encode those concepts. We can evaluate the encoding against the model and evaluate the model against the domain.

• The best way to get good fit is to compare multiple encodings and choose the best one. Too often, we use the first encoding we think of.

Up next . . .

We’ve seen some basic data modeling techniques. In the next chapter, we’ll continue our exploration of encoding structure with data and deepen our understanding of the process.

(See the Data Lens Supplement for more common patterns.)