Data Lens Part 2 - Runnable Specifications

Runnable Specifications by Eric Normand · Work in progress

Chapter 2

Data Lens Part 2

Chapter objectives

• Understand the important distinction between component and peer relationships.
• Discover when and how to reify a relationship into its own encoding.
• Learn the various ways we can enforce the structure of our encoding.

In Chapter 1, we began our exploration of encoding the relationships among values in a domain. This chapter continues that exploration, diving deeper into the structure of more complex relationships. We get into some theory, and we end by learning different ways to encode and enforce structure.

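[Figure: (1) A super burnt coffee with soy is encoded as { size: "super", roast: "burnt", addIns: ["soy"] }. In the software, setSize(coffee, "mega") produces { size: "mega", roast: "burnt", addIns: ["soy"] }, which corresponds to "set size to mega" in the domain. Encode and decode arrows connect the domain and the software. (2) Four ways to relate a student and a course: student→course, student←course, student↔course, and a separate relationship between student and course. You’ll understand both of these diagrams by the end of this chapter.]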


Coffee order model

We’ve seen a lot of structure so far in a relatively simple model: that of coffees. Now we’re going to explore the modeling of relationships in more depth. Here is a coffee order model that I’ve simplified to highlight the relationships.

In this model, a coffee order has a customer and a collection of coffees. The customer has an identifier and a name.

How do we encode this? One approach would be to nest the customer inside the order:

type Order = {
  customer: Customer;
  coffees: Coffee[];
};

type Customer = {
  id: CustomerID;
  name: string;
};

However, this approach has major drawbacks that we need to discuss. Here is an example JSON for an order:

{
  customer: {
    id: "123"
  },
  coffees: [{
    size: "super",
    roast: "burnt",
    addIns: []
  }]
}

What happens if I put this JSON into a document database? The order has its own copy of the customer. That same customer is also in the customer table and in every other order the customer has placed. What happens if I change the name field of the order’s customer? The customer table is now out of date. What if I change the customer’s name in the customer table instead? Now the copy inside this order is out of date. We need to wrestle with these questions when we model relationships.

It’s not that the questions don’t have answers. We can come up with some. The problem is that the answers complicate our model. Hard questions like these signal a lack of elegance. It would be nice to find models that don’t raise them.

This kind of thinking is actually a preview of what we’ll see in Chapter 3 (Operation Lens). We’re using the operations duplicate order and change customer name to uncover problems in the data model. We’ll learn how to do this better in that chapter.
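Those two operations can be run directly against the nested encoding. The sketch below is a minimal illustration, not the book’s code. `customerTable`, `copyOrder`, and `renameInTable` are hypothetical names, the table is assumed to be an in-memory `Map`, and the customer names are made-up sample data. The sketch copies an order, renames the customer in the table, and then compares the three versions of customer 123.

```typescript
// Sketch: nesting a peer (the customer) inside the order, and how it goes stale.
// customerTable, copyOrder, and renameInTable are hypothetical names.

type CustomerID = string;
type Customer = { id: CustomerID; name: string };
type Coffee = { size: string; roast: string; addIns: string[] };
type Order = { customer: Customer; coffees: Coffee[] };

// Hypothetical customer table, keyed by identifier (sample data).
const customerTable = new Map<CustomerID, Customer>();
customerTable.set("123", { id: "123", name: "Sam" });

// Operation: copy order. A deep copy also duplicates the nested customer.
function copyOrder(order: Order): Order {
  return JSON.parse(JSON.stringify(order)) as Order;
}

// Operation: change customer name, applied to the table only.
function renameInTable(id: CustomerID, name: string): void {
  const current = customerTable.get(id);
  if (current) customerTable.set(id, { ...current, name });
}

const original: Order = {
  customer: { ...customerTable.get("123")! }, // the order embeds its own copy
  coffees: [{ size: "super", roast: "burnt", addIns: [] }],
};

// Composition: copy the order, then change the customer's name.
const duplicate = copyOrder(original);
renameInTable("123", "Samantha");

console.log(customerTable.get("123")!.name); // Samantha
console.log(original.customer.name);         // Sam
console.log(duplicate.customer.name);        // Sam
```

After these two operations, the table disagrees with both orders. Nothing in the model says which version is correct, and that gap is exactly the hard question described above.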


Component relationships imply ownership

type Order = {
  customer: Customer;
  coffees: Coffee[];
};

When a piece of data nests another piece of data inside of it, it implies ownership. In the case of the order nesting the customer, we saw a number of drawbacks because the customer does not belong to the order.

On the other hand, the coffees that are part of the order don’t seem to have the same problems. Duplicating a coffee into a new order is not a problem, nor is changing the copy. They’re different coffees for different orders! The coffee really does seem to belong to the order.

This kind of relationship is called a component relationship. We encode it by nesting data.

Peer relationships reference by identifier

Another choice for modeling a relationship is as a peer. Here is the type of the peer relationship version of an order, followed by a JSON example:

type Order = {
  customer: CustomerID;
  coffees: Coffee[];
};

{
  customer: "123",
  coffees: [{
    size: "super",
    roast: "burnt",
    addIns: []
  }]
}

In this model, instead of nesting, we use an identifier to refer to the customer. The customer record has to exist somewhere else, and you need a way to look it up by the identifier.

Note that this encoding doesn’t raise the hard questions we had with the nested encoding, because we’re not duplicating the customer record. One might say that this model mimics the domain’s structure more elegantly.
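With the peer encoding, the same two operations no longer cause trouble. The sketch below assumes a hypothetical `customerTable` (a `Map` from identifier to customer) and a hypothetical `customerOf` lookup helper, because the text leaves the storage unspecified. The sample data is made up. Copying an order copies its coffees, which are components, but only the identifier of its customer, which is a peer.

```typescript
// Sketch: the peer encoding. The order stores only the customer's identifier.
// customerTable and customerOf are hypothetical; storage is not specified in the text.

type CustomerID = string;
type Customer = { id: CustomerID; name: string };
type Coffee = { size: string; roast: string; addIns: string[] };
type Order = { customer: CustomerID; coffees: Coffee[] };

const customerTable = new Map<CustomerID, Customer>();
customerTable.set("123", { id: "123", name: "Sam" });

// Look up the referenced peer; undefined if no customer has that id.
function customerOf(order: Order): Customer | undefined {
  return customerTable.get(order.customer);
}

const order: Order = {
  customer: "123",
  coffees: [{ size: "super", roast: "burnt", addIns: [] }],
};

// Copy order: components (coffees) are copied; the peer is copied only as an id.
const duplicate: Order = {
  customer: order.customer,
  coffees: order.coffees.map(c => ({ ...c, addIns: c.addIns.slice() })),
};

// Change customer name: there is exactly one place to change it.
customerTable.set("123", { id: "123", name: "Samantha" });

// Changing the copied coffee leaves the original order's coffee alone.
duplicate.coffees[0].size = "mega";

console.log(customerOf(order)?.name);     // Samantha
console.log(customerOf(duplicate)?.name); // Samantha
console.log(order.coffees[0].size);       // super
```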

Question:

Could we model the coffee relationship as a peer relationship?

Two types of relationships
1. Component relationship
2. Peer relationship


Exercise

type Order = {
  customer: CustomerID;
  paymentMethod: PaymentMethod;
  barista: BaristaID;
  deliveryMethod: DeliveryMethod;
  orderMethod: OrderMethod;
  coffees: Coffee[];
};

For each of the parts of the Order, identify whether it is a peer or component relationship. Check the appropriate box. For collections, answer for the elements of the collection.

Order part        Peer   Component
customer          [ ]    [ ]
paymentMethod     [ ]    [ ]
barista           [ ]    [ ]
deliveryMethod    [ ]    [ ]
orderMethod       [ ]    [ ]
coffees           [ ]    [ ]

type Coffee = {
  size: Size;
  roast: Roast;
  addIns: AddIn[];
};

For each of the parts of the Coffee, identify whether it is a peer or component relationship. Check the appropriate box. For collections, answer for the elements of the collection.

Coffee part       Peer   Component
size              [ ]    [ ]
roast             [ ]    [ ]
addIns            [ ]    [ ]


Nest component relationships

We nest data under a larger piece of data when it is a component of the larger thing. For instance, the coffees that belong to an order are components of the order, so we nest the coffee data structures inside the order data structure.

Example: Coffee nested inside order

{
  ...
  coffees: [{
    size: "super",
    roast: "burnt",
    addIns: []
  }]
}

Reference peer relationships

Peer relationships should not be nested. When you deep copy a piece of data, everything nested inside it is copied too, so a nested peer would be duplicated. Instead, we should reference the peer using an identifier or a name.

Example: Order references customer and barista by identifier

{
  customer: "2312231",
  ...
}

Example: Coffee references size, roast, and add-ins by name

{
  size: "super",
  roast: "charcoal",
  addIns: ["soy", "almond"]
}

Q&A

Doesn’t nesting use pointers to reference data? Why isn’t that good enough?

Referencing by identifier or name is more durable than a pointer to an object in memory. When you send data to another machine, you can no longer refer to a location in memory. An identifier or name is something you can serialize and read back in without losing its meaning.
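A JSON round trip shows the difference. Here it stands in for sending data to another machine. In this sketch, two orders share one in-memory customer object. After serializing and parsing, they hold two unrelated copies, while orders that store the identifier still agree. The sample data is made up for illustration.

```typescript
// Sketch: pointer sharing vs. identifiers across a JSON round trip.
type Customer = { id: string; name: string };

const sam: Customer = { id: "123", name: "Sam" };

// Two orders that nest the *same* in-memory customer object.
const nested = [{ customer: sam }, { customer: sam }];

// Two orders that reference the customer by identifier instead.
const byId = [{ customer: "123" }, { customer: "123" }];

// Serialize and read back in, as if received from another machine.
const nestedIn = JSON.parse(JSON.stringify(nested)) as typeof nested;
const byIdIn = JSON.parse(JSON.stringify(byId)) as typeof byId;

console.log(nested[0].customer === nested[1].customer);     // true: one shared object
console.log(nestedIn[0].customer === nestedIn[1].customer); // false: sharing is lost
console.log(byIdIn[0].customer === byIdIn[1].customer);     // true: still "customer 123"
```

`JSON.stringify` writes the shared object out once per order. That is why the parsed orders no longer share it. The identifier, by contrast, carries its meaning with it.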


Operations and composition constrain data

In the last few pages, we’ve been looking at the decision of whether to nest data or to reference data. We made that decision by noting how many difficult questions we face when we copy the order or when we change the name of the customer. In doing so, we were applying the operation lens to guide our thinking. We thought through the consequences of the copy order and change customer name operations. Knowledge from the operations constrained our data.

The operation lens asks us to consider the domain operations that will have to apply to our data. We can do this a little while we’re modeling our data. But we will see in Chapter 3 that it is better to begin with the operations, before we encode our data, because each operation constrains the decision space. Each constraint eliminates choices, and the more constraints we have, the easier our decisions become.

Further, in addition to considering the operations alone, we considered them in composition. We asked, “What happens when we change the name after making a copy?” With that question, we are applying the composition lens, which we’ll see in Chapter 4. Operations are rarely applied alone. Instead, we apply a set of operations, often in sequence, to achieve a result. We want those sets of operations to guarantee properties, and those composition guarantees constrain our choices further.

It is very hard to make decisions about data structures without some thought about the operations that will use them. There are simply too many choices to rely on fit alone. That is why we’re talking about operations now, before the chapters on the operation and composition lenses.

Data encoding is on the implementation side, while operations and composition are on the specification side. We generally want to model in specifications without regard to implementation. Although that means we should work within the operation and composition lenses first, I have found that people don’t feel comfortable thinking so abstractly before they have learned the concrete skills of modeling and encoding data.

So let’s keep trucking along with the data lens. We’ll get to operations and their composition in due time.

Operations we considered
• copy order
• change customer name

Compositions we considered
• copy order, then change customer name


Reifying relationships

It is common to encode relationships using references, either by name or identifier, within the data structure itself. For example, the customer field is part of the order data structure. Another way to do it, which is often overlooked, is to encode the relationship in its own data structure. We call this reifying the relationship, which means we make the relationship its own thing. We see this often in relational databases, where we make a table just to relate two data items.
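A minimal TypeScript analog of such a table is an array of pairs, one row per related pair. This sketch assumes plain string ids and a hypothetical `Enrollment` row type, with made-up students and courses. Neither side holds a reference to the other. Each query filters the table from whichever side you start on.

```typescript
// Sketch: a reified relationship as a standalone "join table".
// Enrollment is a hypothetical row type; ids are plain strings (sample data).
type Enrollment = { studentId: string; courseId: string };

const enrollments: Enrollment[] = [
  { studentId: "s1", courseId: "math" },
  { studentId: "s1", courseId: "art" },
  { studentId: "s2", courseId: "math" },
];

// Query from the student side.
function coursesOf(studentId: string): string[] {
  return enrollments.filter(e => e.studentId === studentId).map(e => e.courseId);
}

// Query from the course side, using the same table.
function studentsIn(courseId: string): string[] {
  return enrollments.filter(e => e.courseId === courseId).map(e => e.studentId);
}

console.log(coursesOf("s1").join(", "));    // math, art
console.log(studentsIn("math").join(", ")); // s1, s2
```

Each query here scans every row. A database would usually add an index for each direction. The `Registry` type later in this chapter plays a similar role with its two maps.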

to reify: to represent something abstract as a concrete thing
(“reify.” Merriam-Webster.com. 2024. https://www.merriam-webster.com (14 Feb 2024).)

Let’s look at a scenario where reifying a relationship makes sense for practical coding reasons and directly encodes the model.

The Student-Course Problem

Imagine we’ve been hired to write software for keeping track of student registrations. When a student registers for a course, we need to record that fact so that the professor can get a roster of students and the student can see their schedule. Here are the concepts we need to relate:

[Figure: the concepts Student, Schedule, Roster, and Course]

This is a two-way relationship. Students enroll in courses, and courses know their students. Starting on the next page, we’ll look at four different options for modeling two-way relationships.


Encoding two-way relationships

The Student/Course relationship is a two-way relationship. The student needs to know their courses, and the course needs to know its students.

There are four options for modeling two-way relationships. You can prioritize one direction over the other and model it as a one-way relationship, losing some fitness to the domain. There are two ways to do that, shown as options 1 and 2 below. Another option is to maintain both references, which is hard to do. And finally, you can reify the relationship into its own piece of data.

[Figure: four options for encoding the student/course relationship]

1. student→course: If I know the student, I can get the course. This model sacrifices fitness to the domain to make encoding easier.
2. student←course: If I know the course, I can get the student. This model also sacrifices fitness to the domain to make encoding easier.
3. student↔course: If I know either, I can get the other. This model matches the domain, but encoding it is difficult.
4. student and course linked by a separate relationship: If I know either, I can get the other. This model matches the domain, and encoding it is easier.

Maintaining both references is hard because if one changes, we have to change the other. The two references live in two separate data structures, which are now coupled together: when we change the code for one, we have to change the code for the other. This is error prone.

All of these options have their place. The last one, though, is not talked about as much as the others, even though it is often the best fit for the situation. In particular, it decouples the two structures by centralizing both references of the relationship in one place.

Let’s go over each option. We’ll show the types for the encodings of student and course, and we’ll implement some operations to compare the options.

Three parts
1. Domain - the context
2. Model - our concepts
3. Encoding - the software


Prioritizing one direction

A common recommendation (that I don’t agree with) for handling two-way relationships is to prioritize one direction. The reasoning goes like this:

• Encoding two-way relationships is hard.
• When you consider the operations in your model, one direction is often more important.
• To make the encoding easier, only encode the more important direction.

This sacrifices fitness between the model and the domain to make encoding easier. The following encoding prioritizes the student → course relationship:

type Student = {
  id: StudentID;
  courses: CourseID[];
  ...
};

type Course = {
  id: CourseID;
  ...
};

Here are implementations of our two operations:

function generateSchedule(student) { //=> Schedule
  return { student: student.id, courses: student.courses };
}

function generateRoster(allStudents, courseId) { //=> Roster
  return {
    course: courseId,
    students: allStudents
      .filter(s => s.courses.includes(courseId))
      .map(s => s.id)
  };
}

Because we only have one direction, we have to iterate through all students to generate a roster. That is expensive for the computer and awkward to write.

There are three problems with the prioritization argument. First, it fails to consider the option of reifying the relationship. Reification is always an option.

Second, it is rare to find that one direction is unimportant. Both directions are usually important enough; that’s why they’re in the domain to begin with.

Third, even if one direction is significantly more important, the cost of operating on the omitted direction is high. If we have many more students than courses and we generate many more schedules than rosters, that implies student → course is more important. But every roster still requires scanning all of the students, so generating rosters remains very expensive.
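Here is a self-contained, typed version of the two functions above, run against three sample students to show the asymmetry. The type aliases and the `inspected` counter are additions for illustration and are not part of the book’s code.

```typescript
// Sketch: the one-directional encoding, made runnable with sample data.
// The type aliases and the `inspected` counter are illustrative additions.

type StudentID = string;
type CourseID = string;
type Student = { id: StudentID; courses: CourseID[] };

function generateSchedule(student: Student) {
  return { student: student.id, courses: student.courses };
}

let inspected = 0; // how many student records generateRoster has looked at

function generateRoster(allStudents: Student[], courseId: CourseID) {
  return {
    course: courseId,
    students: allStudents
      .filter(s => {
        inspected++;
        return s.courses.includes(courseId);
      })
      .map(s => s.id),
  };
}

const students: Student[] = [
  { id: "s1", courses: ["math", "art"] },
  { id: "s2", courses: ["math"] },
  { id: "s3", courses: ["history"] },
];

const schedule = generateSchedule(students[0]);  // reads one record
const roster = generateRoster(students, "math"); // scans every record

console.log(schedule.courses.join(", ")); // math, art
console.log(roster.students.join(", "));  // s1, s2
console.log(inspected);                   // 3
```

The schedule comes straight from one record. The roster, by contrast, had to inspect every student, including s3, who is not in the course at all.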

Q&A

How can you model a two-way relationship with only one reference? Isn’t that technically impossible?

It is possible to model it however you want. It is like ignoring air resistance in a physics problem. We know there is air resistance, but we assume it is negligible. The process of abstraction is about ignoring details that don’t matter. The important question is: Do both directions matter?


Encoding student↔course

We can encode the two-way relationship using two references:

type Student = {
  id: StudentID;
  courses: CourseID[];
  ...
};

type Course = {
  id: CourseID;
  students: StudentID[];
  ...
};

Generating schedules is easy. We’ve already seen it. And generating rosters is very similar. But let’s find two operations that stress both references at the same time. What about registering and unregistering for courses?

function register(student, course) { //=> [Student, Course]
  return [update(student, 'courses', arrayAddUnique, course.id),
          update(course, 'students', arrayAddUnique, student.id)];
}

function unregister(student, course) { //=> [Student, Course]
  return [update(student, 'courses', arrayRemove, course.id),
          update(course, 'students', arrayRemove, student.id)];
}

Note that we have to return both the student and the course from each operation because they are both modified and we are using a functional approach. The coupling is apparent. Also note that we have to store the new versions of these data structures (not shown) so that the new versions can be looked up by identifier. To keep things consistent, we would prefer the two writes to happen atomically.

With those two notes out of the way, the fit between the encoding, model, and domain is good.

The implementations of these two functions are rather straightforward, though slightly awkward due to the coupling. We have to remember that we cannot change one side of the relationship without changing the other. Having explicit functions to register and unregister is helpful, but we could use some more help so we don’t have to rely exclusively on discipline.
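The functions above rely on `update`, `arrayAddUnique`, and `arrayRemove`, which the text does not define. The sketch below supplies one possible definition of each. This is an assumption: they are written as pure functions that return copies. The sketch also adds a hypothetical `isConsistent` check, the kind of help that lets us rely less on discipline. It confirms that every reference on one side is mirrored on the other.

```typescript
// Sketch: the two-reference encoding with assumed helper definitions.
// update, arrayAddUnique, arrayRemove, and isConsistent are not the book's code.

type StudentID = string;
type CourseID = string;
type Student = { id: StudentID; courses: CourseID[] };
type Course = { id: CourseID; students: StudentID[] };

// Return a shallow copy of obj with obj[key] replaced by f(obj[key], arg).
function update<T extends object, K extends keyof T, A>(
  obj: T, key: K, f: (v: T[K], arg: A) => T[K], arg: A): T {
  const copy = { ...obj };
  copy[key] = f(obj[key], arg);
  return copy;
}

function arrayAddUnique(arr: string[], x: string): string[] {
  return arr.includes(x) ? arr : [...arr, x];
}

function arrayRemove(arr: string[], x: string): string[] {
  return arr.filter(y => y !== x);
}

function register(student: Student, course: Course): [Student, Course] {
  return [update(student, "courses", arrayAddUnique, course.id),
          update(course, "students", arrayAddUnique, student.id)];
}

function unregister(student: Student, course: Course): [Student, Course] {
  return [update(student, "courses", arrayRemove, course.id),
          update(course, "students", arrayRemove, student.id)];
}

// Invariant check: every reference on either side must be mirrored on the other.
function isConsistent(students: Student[], courses: Course[]): boolean {
  const mirrored = (s: Student, c: Course) =>
    s.courses.includes(c.id) && c.students.includes(s.id);
  return students.every(s => s.courses.every(cid =>
           courses.some(c => c.id === cid && mirrored(s, c)))) &&
         courses.every(c => c.students.every(sid =>
           students.some(s => s.id === sid && mirrored(s, c))));
}

let s1: Student = { id: "s1", courses: [] };
let math: Course = { id: "math", students: [] };

[s1, math] = register(s1, math);
console.log(isConsistent([s1], [math])); // true

// Updating only one side (bypassing register/unregister) breaks the invariant.
const halfDone = update(s1, "courses", arrayRemove, "math");
console.log(isConsistent([halfDone], [math])); // false

[s1, math] = unregister(s1, math);
console.log(isConsistent([s1], [math])); // true
```

A check like this could run in tests or before each pair of writes is stored.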

We have one more option to work through. Let’s take a look at it on the next page.


Reifying the student↔course relationship

The last option is to encode the two-way relationship by making a new type that references both:

type Student = {
  id: StudentID;
  ...
};

type Course = {
  id: CourseID;
  ...
};

type Registry = {
  byStudent: { [studentId: string]: CourseID[] };
  byCourse: { [courseId: string]: StudentID[] };
};

Let’s see the implementations of the operations.

function register(registry, studentID, courseID) { //=> Registry
  const r2 = updateIn(registry, ['byStudent', studentID],
                      arrayAddUnique, courseID);
  return updateIn(r2, ['byCourse', courseID],
                  arrayAddUnique, studentID);
}

function unregister(registry, studentID, courseID) { //=> Registry
  const r2 = updateIn(registry, ['byStudent', studentID],
                      arrayRemove, courseID);
  return updateIn(r2, ['byCourse', courseID],
                  arrayRemove, studentID);
}

This encoding combines the two sides of the relationship into one place. That makes it easy to update them in one operation and store them in one write (not shown).

function generateSchedule(registry, studentID) { //=> Schedule
  return { student: studentID,
           courses: registry.byStudent[studentID] || [] };
}

function generateRoster(registry, courseID) { //=> Roster
  return { course: courseID,
           students: registry.byCourse[courseID] || [] };
}

This encoding also has a cool advantage: you don’t need the student or course records, just their identifiers. But there is another advantage that is more subtle. We’ll see it on the next page.
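`updateIn` is not defined in the text either. This sketch assumes a small, hypothetical `updateIn` that handles only the two-level paths used here; a general version would accept paths of any length. It also supplies its own definitions for the `arrayAddUnique` and `arrayRemove` helpers named above, then runs the registry end to end with sample ids.

```typescript
// Sketch: the reified Registry, made runnable.
// updateIn, arrayAddUnique, and arrayRemove are assumed helper definitions.

type StudentID = string;
type CourseID = string;
type Index = { [id: string]: string[] };
type Registry = { byStudent: Index; byCourse: Index };

// Apply f to registry[side][key] (missing entries count as []), returning a new Registry.
function updateIn(
  registry: Registry,
  path: [keyof Registry, string],
  f: (arr: string[], x: string) => string[],
  x: string,
): Registry {
  const [side, key] = path;
  const index = registry[side];
  const nextIndex: Index = { ...index, [key]: f(index[key] || [], x) };
  return side === "byStudent"
    ? { ...registry, byStudent: nextIndex }
    : { ...registry, byCourse: nextIndex };
}

function arrayAddUnique(arr: string[], x: string): string[] {
  return arr.includes(x) ? arr : [...arr, x];
}

function arrayRemove(arr: string[], x: string): string[] {
  return arr.filter(y => y !== x);
}

function register(registry: Registry, studentID: StudentID, courseID: CourseID): Registry {
  const r2 = updateIn(registry, ["byStudent", studentID], arrayAddUnique, courseID);
  return updateIn(r2, ["byCourse", courseID], arrayAddUnique, studentID);
}

function unregister(registry: Registry, studentID: StudentID, courseID: CourseID): Registry {
  const r2 = updateIn(registry, ["byStudent", studentID], arrayRemove, courseID);
  return updateIn(r2, ["byCourse", courseID], arrayRemove, studentID);
}

function generateSchedule(registry: Registry, studentID: StudentID) {
  return { student: studentID, courses: registry.byStudent[studentID] || [] };
}

function generateRoster(registry: Registry, courseID: CourseID) {
  return { course: courseID, students: registry.byCourse[courseID] || [] };
}

let registry: Registry = { byStudent: {}, byCourse: {} };
registry = register(registry, "s1", "math");
registry = register(registry, "s2", "math");
registry = register(registry, "s1", "art");
registry = unregister(registry, "s2", "math");

console.log(generateSchedule(registry, "s1").courses.join(", ")); // math, art
console.log(generateRoster(registry, "math").students.join(", ")); // s1
```

Every call returns a complete new `Registry` and leaves the old one untouched. Persisting a change is therefore a single write of a single value.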


Generalizing a many-to-many relationship

typeManyToMany<A,B>={

indexByA:Map<A,Set<B>>;

indexByB:Map<B,Set<A>>;

};

functionrelate(relationship,a,b){//=>ManyToMany<A,B>

const r2 = updateIn(registry, ['indexByA', a],

setAdd, b);

return updateIn(r2, ['indexByB', b],

setAdd, a);

}

functionunrelate(relationship,a,b){//=>ManyToMany<A,B>

const r2 = updateIn(registry, ['indexByA', a],

setRemove, b);

return updateIn(r2, ['indexByB', b],

setRemove, a);

}

Now for the operations.

functionreportA(relationship,a){//=>Set<B>

return relationship.indexByA[a] || new Set();

}

functioncontains(relationship,a,b){//=>boolean

return reportA(relationship, a).has(b);

}

functionreportB(relationship,b){//=>Set<A>

return relationship.indexByB[b] || new Set();

}

The student/course relationship is an example of a ma-

ny-to-many relationship. Many-to-many relationships are

super common. It would be a shame to have to implement

the same thing every time.

Of course we don’t have to! Let’s encode the general

many-to-many relationship so that we can reuse it. Previ-

ously, we used strings, objects, and array. This time, we’ll

use custom types, maps, and sets. The principle is the

same.

If your model has lots of many-to-many relationships, it

may be worth it to generalize like this.
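The pseudocode above relies on `updateIn`, `setAdd`, and `setRemove`. Below is one self-contained way to get the same behavior in TypeScript. It keeps the relationship immutable by copying the affected `Map` and `Set` on each change. That copying is simple, but it takes time proportional to the size of the copied collections; a persistent-collections library would avoid it. `empty`, `addTo`, and `removeFrom` are hypothetical helpers.

```typescript
// Sketch: a reusable, immutable many-to-many relationship.
// empty, addTo, and removeFrom are hypothetical; updates copy the touched Map and Set.

type ManyToMany<A, B> = {
  indexByA: Map<A, Set<B>>;
  indexByB: Map<B, Set<A>>;
};

function empty<A, B>(): ManyToMany<A, B> {
  return { indexByA: new Map<A, Set<B>>(), indexByB: new Map<B, Set<A>>() };
}

// Copy of index in which key's set also contains value.
function addTo<K, V>(index: Map<K, Set<V>>, key: K, value: V): Map<K, Set<V>> {
  const next = new Map(index);
  const set = new Set<V>(index.get(key) || []);
  set.add(value);
  next.set(key, set);
  return next;
}

// Copy of index in which key's set no longer contains value; empty sets are dropped.
function removeFrom<K, V>(index: Map<K, Set<V>>, key: K, value: V): Map<K, Set<V>> {
  const next = new Map(index);
  const set = new Set<V>(index.get(key) || []);
  set.delete(value);
  if (set.size === 0) next.delete(key);
  else next.set(key, set);
  return next;
}

function relate<A, B>(r: ManyToMany<A, B>, a: A, b: B): ManyToMany<A, B> {
  return { indexByA: addTo(r.indexByA, a, b), indexByB: addTo(r.indexByB, b, a) };
}

function unrelate<A, B>(r: ManyToMany<A, B>, a: A, b: B): ManyToMany<A, B> {
  return { indexByA: removeFrom(r.indexByA, a, b), indexByB: removeFrom(r.indexByB, b, a) };
}

function reportA<A, B>(r: ManyToMany<A, B>, a: A): Set<B> {
  return r.indexByA.get(a) || new Set<B>();
}

function reportB<A, B>(r: ManyToMany<A, B>, b: B): Set<A> {
  return r.indexByB.get(b) || new Set<A>();
}

function contains<A, B>(r: ManyToMany<A, B>, a: A, b: B): boolean {
  return reportA(r, a).has(b);
}

// Usage: student/course registrations as a ManyToMany<string, string>.
let reg = empty<string, string>();
reg = relate(reg, "s1", "math");
reg = relate(reg, "s2", "math");
const before = reg;
reg = unrelate(reg, "s2", "math");

console.log(Array.from(reportB(reg, "math")).join(", ")); // s1
console.log(contains(before, "s2", "math"));              // true
console.log(contains(reg, "s2", "math"));                 // false
```

Dropping a key when its set becomes empty means `unrelate` exactly undoes `relate` for a pair that was not already related. That is the kind of inverse property the automated-tests discussion later in this chapter suggests checking.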


Q&A

These data structures are supposed to encode real-world objects that have relationships. Shouldn’t the relationship be part of the data structure?

Well, that’s an interesting question. When teaching object-oriented design, it is common to instruct students to model real-world objects by encoding each kind of object as a class. However, are we really trying to encode real-world objects? In the Domain Lens chapter, we explore the question of what it is we are actually modeling.

But we can address it briefly here. Not everything in our encoding has to correspond to a part of a real-world object. Does a student walk around with a list of courses they’re in? Do they remember the course identifiers? Have they memorized their student id? Can you send them a “register for course” message? No.

From a different angle, imagine how a university would have kept track of students in 1900. They would probably record the student/course registrations in a book. A secretary would write the student’s identifier and the course identifier, following certain rules. Those rules would likely maintain the many-to-many relationship. That book is called a registry.

So our Registry type encodes that real-world object (the book a university would use), even though it didn’t appear in the problem description at all. Yet the registry book is a plausible idea. And I made it up! What’s going on?

We’re not encoding real-world objects. We’re encoding our concepts of them, which we’ve been calling a model. In particular, we want the concepts that further the work of the software. Coming up with those good concepts is called abstraction.

We shouldn’t be too wedded to our initial, naive concepts, which is what happens when we look at the problem description and immediately encode the stuff in it. This whole book is my attempt to open your mind to the mental space between domain and encoding.


Theory corner: Isomorphisms

The data lens concepts have deep mathematical meanings. Knowing the mathematical notions can enhance our appreciation for the profundity of the work we do.

The domain modeling process we’ve been following is about constructing isomorphisms between the domain and the software. An isomorphism is a structure-preserving correspondence between two representations that can be reversed. Because you can convert between them in both directions, you can operate on either one interchangeably.

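[Figure: the coffee isomorphism. Domain (top): the customer orders a super burnt coffee with soy, then changes the order to mega (“set size to mega”). Software (bottom): { size: "super", roast: "burnt", addIns: ["soy"] } becomes { size: "mega", roast: "burnt", addIns: ["soy"] } via setSize(coffee, "mega"). Encode and decode arrows connect the domain and the software. Path 1 stays in the domain; path 2 goes through the software.]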

This diagram illustrates an isomorphism between a desired coffee and the JSON that encodes it. You can follow the arrows through two paths and get the same result.

Path 1: We start with the super coffee in the top left. This is in the domain. When the customer asks to change the size to mega, the barista could mentally change it to a mega coffee, still within the domain (top arrow).

Path 2: Starting from the super coffee the customer is ordering, we encode it into JSON in our software. Using the software, the barista triggers a setSize() operation, which produces a different JSON. The barista can then read that JSON or some visual representation of it (decode it) to understand what coffee the customer wants.

In an isomorphism, operating on either side of the domain/software wavy line gives the same results, and we can always convert between them. This is what allows our software to do its job. And it’s why it’s important to look to the domain for answers to design questions.
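The two paths can be checked mechanically. For illustration, this sketch invents a domain-side representation: a short note, the way a barista might jot the order down. `encode`, `decode`, and `changeSizeInDomain` are hypothetical; only `setSize` mirrors a name from the diagram. The sketch checks that both paths reach the same coffee and that decoding undoes encoding.

```typescript
// Sketch: checking that the diagram "commutes" for one example.
// The Note representation, encode, decode, and changeSizeInDomain are hypothetical.

type Coffee = { size: string; roast: string; addIns: string[] }; // software side
type Note = string; // domain side, e.g. "super burnt with soy" (names contain no spaces)

function encode(note: Note): Coffee {
  const [base, extras] = note.split(" with ");
  const [size, roast] = base.split(" ");
  return { size, roast, addIns: extras ? extras.split(" and ") : [] };
}

function decode(coffee: Coffee): Note {
  const base = `${coffee.size} ${coffee.roast}`;
  return coffee.addIns.length ? `${base} with ${coffee.addIns.join(" and ")}` : base;
}

// Software operation, as in the diagram.
function setSize(coffee: Coffee, size: string): Coffee {
  return { ...coffee, size };
}

// Domain operation: the barista rewrites the size word in the note.
function changeSizeInDomain(note: Note, size: string): Note {
  const words = note.split(" ");
  words[0] = size;
  return words.join(" ");
}

const ordered: Note = "super burnt with soy";

const path1 = changeSizeInDomain(ordered, "mega");      // stay in the domain
const path2 = decode(setSize(encode(ordered), "mega")); // encode, operate, decode

console.log(path1);                               // mega burnt with soy
console.log(path2);                               // mega burnt with soy
console.log(decode(encode(ordered)) === ordered); // true
```

One passing example is not a proof. A property-based test that generates many coffees and sizes would give more confidence that the two paths always agree.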



Enforcing structure

It is easy to accidentally violate the isomorphism between the domain and the encoding. That is why we try to enlist as much help as we can from the language, tools, and programming practices we have available. We usually use a combination of the following:

Types

Types are the strictest way to enforce structure. Whatever you can encode in the types will be checked by the compiler. However, not every type system can encode every structure, and type systems also influence the way we model our domain. In the volatility lens chapter, we will also consider how volatility in our domain affects our use of types.

Language features

All languages come with some built-in features (like classes, functions, etc.) that have their own semantics. As we mature as modelers, we move past basic recipes by deeply understanding feature semantics and choosing features whose semantics are similar to what we are trying to encode. See the Data Lens Supplement for examples.

Automated tests

Tests can enforce the structure of our encodings. Instead of testing just the behavior of functions, consider testing other properties too, for instance, that two functions are inverses. We’ll explore properties more in the operation and composition lens chapters.

Data structures

Data structures enforce their own structure. If we can find a data structure that has the same structure as our model, we can use it to encode the model.

Runtime checks

We developed normalization and validation functions that can be used to check and enforce structure at runtime.
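For a concrete picture of a runtime check, here is a minimal validator for the coffee encoding. It is not the book’s normalization and validation code. The menus are assumptions taken from names used in this chapter’s examples (`super`/`mega`, `burnt`/`charcoal`, `soy`/`almond`), and `validateCoffee` is a hypothetical name.

```typescript
// Sketch: a runtime check for the coffee encoding.
// validateCoffee and the menus below are assumptions for illustration.

const SIZES = ["super", "mega"];
const ROASTS = ["burnt", "charcoal"];
const ADD_INS = ["soy", "almond"];

// Returns a list of problems; an empty list means the value fits the encoding.
function validateCoffee(x: unknown): string[] {
  if (typeof x !== "object" || x === null) return ["not an object"];
  const c = x as Record<string, unknown>;
  const errors: string[] = [];
  const size = c["size"];
  const roast = c["roast"];
  const addIns = c["addIns"];
  if (typeof size !== "string" || !SIZES.includes(size)) errors.push(`bad size: ${String(size)}`);
  if (typeof roast !== "string" || !ROASTS.includes(roast)) errors.push(`bad roast: ${String(roast)}`);
  if (!Array.isArray(addIns)) {
    errors.push("addIns must be an array");
  } else {
    for (const a of addIns) {
      if (typeof a !== "string" || !ADD_INS.includes(a)) errors.push(`bad add-in: ${String(a)}`);
    }
  }
  return errors;
}

console.log(validateCoffee({ size: "super", roast: "burnt", addIns: ["soy"] }).length); // 0
console.log(validateCoffee({ size: "huge", roast: "burnt", addIns: ["sugar"] }).join("; "));
// bad size: huge; bad add-in: sugar
```

The validator returns a list of problems instead of throwing. That leaves the caller free to reject, repair (normalize), or log the value.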

Your brain and discipline

Using discipline is by far the most error-prone approach.


Conclusion

In this chapter, we’ve seen how to encode the structure of relationships, including reifying relationships into their own data structures. We’ve also discussed the technical term isomorphism. And we’ve surveyed the tools available to enforce the structures we’ve been uncovering: types, language features, automated tests, data structures, and runtime checks.

Summary

• Relationships between entities are either component or peer relationships. We should nest component relationships and use names or identifiers to encode peer relationships.
• Relationships can be reified into their own data structure. We do this when it helps the model fit the domain better.
• We want our domain, model, and encoding to share a common structure, with ways to translate between them. This is called an isomorphism.
• There are many ways to enforce structure in an encoding. We need to decide which ways we will use.

Up next...

Now that we’re good at data modeling, we can forget about it for a moment and focus on operations. The secret is that operations are a more powerful way to think about a model, but we couldn’t learn to model operations without first being good at data modeling. Modeling the operations is the topic of the next chapter. But don’t miss the Data Lens Supplement, which describes relationships commonly found in domains and various options for encoding them.