Runnable Specifications by Eric Normand · Work in progress
Chapter 2
Data Lens Part 2
Chapter objectives
Understand the important distinction between component and peer relationships.
Discover when and how to reify a relationship into its own encoding.
Learn the various ways we can enforce the structure of our encoding.
In Chapter 1, we began our exploration of encoding the re-
lationships among values in a domain. This chapter con-
tinues that exploration, diving deeper into the structure of
more complex relationships. We get into some theory. And
we end it by learning different ways to encode and enforce
structure.
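[Figure: previews of two diagrams from this chapter. The first shows a coffee in the domain and its JSON encoding in software, connected by encode and decode, where "set size to mega" in the domain corresponds to setSize(coffee, "mega") in software. The second shows four ways to encode the student/course relationship: student→course, student←course, student↔course, and a separate relationship placed between student and course. You’ll understand both of these diagrams by the end of this chapter.]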
Coffee order model
We’ve seen a lot of structure so far in a relatively simple model: that of coffees. Now we’re going to explore the modeling of relationships even deeper. Here is a coffee order model that I’ve simplified to highlight the relationships.

[Diagram: an Order is "all of" a Customer and Coffees.]

In this model, a coffee order has a customer and a collection of coffees. The customer has an identifier and a name.

How do we encode this? One approach would be like this:

type Order = {
  customer: Customer;
  coffees: Coffee[];
};

type Customer = {
  id: CustomerID;
  name: string;
};

However, this approach has major drawbacks that we need to discuss. Here is an example JSON for an order:

{
  customer: {
    id: "123",
    name: "John"
  },
  coffees: [{
    size: "super",
    roast: "burnt",
    addIns: []
  }]
}

What happens if I put this JSON into a document database? The order has a copy of the customer, which is also in the customer table, and also in all other orders. What happens if I change the name field of the order’s customer? Or what if I change the name of the customer in the customer table? Now this one will be wrong. We need to wrestle with these questions when we model relationships.

It’s not that the questions don’t have answers. We can come up with some. The problem is that the answers complicate our model. The hard questions imply a lack of elegance. It would be nice to find models that don’t have hard questions.

This kind of thinking is actually a preview of what we’ll see in Chapter 3 (Operation Lens). We’re using the operations of duplicate order and change customer name to uncover problems in the data model. We’ll learn how to do this better in that chapter.
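To make the hard questions concrete, here is a minimal JavaScript sketch of what can go wrong with the nested encoding. The `customers` table and the `renameCustomer` helper are assumptions made up for this illustration; they are not part of the model above.

```javascript
// Hypothetical customer table, keyed by customer id.
const customers = { "123": { id: "123", name: "John" } };

// The order nests its own copy of the customer record.
const order = {
  customer: { ...customers["123"] },
  coffees: [{ size: "super", roast: "burnt", addIns: [] }]
};

// Hypothetical operation: rename a customer in the table.
// It returns a new table and leaves the old one untouched.
function renameCustomer(table, id, name) {
  return { ...table, [id]: { ...table[id], name } };
}

const renamed = renameCustomer(customers, "123", "Jon");

console.log(renamed["123"].name); // "Jon"
console.log(order.customer.name); // "John": the nested copy is now out of date
```

Nothing in the encoding tells us which of the two names is the right one, which is exactly the kind of question we would rather not have to answer.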
Component relationships imply ownership

type Order = {
  customer: Customer;
  coffees: Coffee[];
};

When a piece of data nests another piece of data inside of it, it implies ownership. In the case of the order nesting the customer, we saw a number of drawbacks because the customer does not belong to the order.

On the other hand, the coffee that is part of the order doesn’t seem to have the same problems. Duplicating a coffee into a new order is not a problem, nor is changing the copy. They’re different coffees for different orders! It seems like the coffee does belong to the order.

This kind of relationship is called a component relationship. We encode it by nesting data.

Peer relationships reference by identifier

Another choice for modeling a relationship is as a peer. Here is the type of the peer relationship version of an order, followed by a JSON example:

type Order = {
  customer: CustomerID;
  coffees: Coffee[];
};

{
  customer: "123",
  coffees: [{
    size: "super",
    roast: "burnt",
    addIns: []
  }]
}

In this model, instead of nesting, we use an identifier to refer to the customer. The customer record has to exist somewhere else, and you need a way to look it up by the identifier.

Note that for this encoding, we don’t have the same hard questions we had for the nested encoding because we’re not duplicating the customer record. One might say that this model more elegantly mimics the domain’s structure.

Question:
Could we model the coffee relationship as a peer relationship?

Two types of relationships
1. Component relationship
2. Peer relationship
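As a rough sketch of the peer encoding in use, the JavaScript below resolves the customer reference through a lookup. The `customerTable` object and the `customerName` function are hypothetical names chosen for this example.

```javascript
// Hypothetical customer table: the only place customer data lives.
const customerTable = { "123": { id: "123", name: "John" } };

// The order refers to its customer by identifier only.
const peerOrder = {
  customer: "123",
  coffees: [{ size: "super", roast: "burnt", addIns: [] }]
};

// Hypothetical lookup: resolve the reference when the name is needed.
function customerName(table, order) {
  const customer = table[order.customer];
  return customer ? customer.name : undefined;
}

const nameBefore = customerName(customerTable, peerOrder);

// Renaming the customer touches the table only; the order is unchanged.
const updatedTable = {
  ...customerTable,
  "123": { ...customerTable["123"], name: "Jon" }
};
const nameAfter = customerName(updatedTable, peerOrder);

console.log(nameBefore, nameAfter); // "John" "Jon"
```

Because the order holds no copy of the customer, there is only one name to change and no stale copy to worry about.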
Exercise

For each of the parts of the Order, identify whether it is a peer or component relationship. Check the appropriate box. For collections, answer for the elements of the collection.

type Order = {
  customer: CustomerID;
  paymentMethod: PaymentMethod;
  barista: BaristaID;
  deliveryMethod: DeliveryMethod;
  orderMethod: OrderMethod;
  coffees: Coffee[];
};

Part            Peer  Component
customer        [ ]   [ ]
paymentMethod   [ ]   [ ]
barista         [ ]   [ ]
deliveryMethod  [ ]   [ ]
orderMethod     [ ]   [ ]
coffees         [ ]   [ ]

Exercise

For each of the parts of the Coffee, identify whether it is a peer or component relationship. Check the appropriate box. For collections, answer for the elements of the collection.

type Coffee = {
  size: Size;
  roast: Roast;
  addIns: AddIn[];
};

Part            Peer  Component
size            [ ]   [ ]
roast           [ ]   [ ]
addIns          [ ]   [ ]
Nest component relationships

We nest data under a larger piece of data when it is considered a component of the larger thing. For instance, the coffees that belong to an order are components of the order. We nest the coffee data structures inside of the order data structures.

Example: Coffee nested inside order
{
  ...
  coffees: [{
    size: "super",
    roast: "burnt",
    addIns: []
  }]
}

Reference peer relationships

Peer relationships should not be nested. If you need to deep copy a piece of data, any nested data will also be copied. Instead, we should reference the peer using an identifier or a name.

Example: Order references customer and barista by identifier
{
  customer: "2312231",
  ...
}

Example: Coffee references size, roast, and add-ins by name
{
  size: "super",
  roast: "charcoal",
  addIns: ["soy", "almond"]
}

Q&A

Doesn’t nesting use pointers to reference data? Why isn’t that good enough?

Referencing by id or name is more durable than a pointer to an object in memory. When you send data to another machine, you can’t refer to the location in memory anymore. The identifier or name is something that you can serialize and read back in without losing its meaning.
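The difference in durability can be sketched in a few lines of JavaScript. Here a JSON round trip stands in for sending the data to another machine, and the `barista` record is a made-up example.

```javascript
// Hypothetical barista record held in memory.
const barista = { id: "b-7", name: "Ana" };

const orderByPointer = { barista: barista };  // points at the in-memory object
const orderById = { barista: barista.id };    // references by identifier

// A JSON round trip stands in for sending the order to another machine.
const receivedByPointer = JSON.parse(JSON.stringify(orderByPointer));
const receivedById = JSON.parse(JSON.stringify(orderById));

console.log(orderByPointer.barista === barista);    // true: same object in memory
console.log(receivedByPointer.barista === barista); // false: only a copy arrived
console.log(receivedById.barista === barista.id);   // true: the id kept its meaning
```

After the round trip, the pointer has silently turned into a nested copy, while the identifier still refers to the same barista.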
Operations and composition constrain data
In the last few pages, we’ve been looking at the decision
of whether to nest data or to reference data. We made that
decision by noting how many difficult questions we face
when we copy the order or when we change the name of
the customer. In that case, we’re applying the operation
lens to guide our thinking. We thought through the conse-
quences of the copy order and change customer name opera-
tions. Knowledge from the operations constrained our
data.
The operation lens asks us to consider the domain op-
erations that will have to apply to our data. We can do this
a little bit while we’re modeling our data. But we will see
in Chapter 3 that it is better to begin with the operations
first, before we encode our data, because each operation
constrains the decision space. The more constraints we
have, the easier decisions become. Each constraint may
eliminate choices, making our decisions easier.
Further, in addition to considering the operations
alone, we considered them in composition. We asked,
“What happens when we change the name after making
a copy?” With that question, we are applying the compo-
sition lens, which we’ll see in Chapter 4. Operations are
rarely applied alone. Instead, we apply a set of operations,
often in sequence, to achieve a result. We want those
sets of operations to guarantee properties. Composition
guarantees constrain our choices further.
It is very hard to make decisions about data structures
without some thought about the operations that will use
them. There are simply too many choices to rely on fit alone.
That is why we’re talking about it now, before the chap-
ters on the operation and composition lenses.
Data encoding is on the implementation side, while
operations and composition are on the specification side.
We generally want to be modeling in specifications without
regard to implementation. Although that means we
should work within the operation and composition lenses
first, I have found that people don’t feel comfortable
thinking so abstractly before they have learned the
concrete skills of modeling and encoding data.
So let’s keep trucking along with the data lens. We’ll
get to operations and their composition in due time.
Operations we considered
1. copy order
2. change customer name
Compositions we considered
1. copy order then change customer name
Reifying relationships
It is common to encode relationships using references, either
by name or identifier, within the data structure itself.
For example, the customer field is part of the order data
structure. Another way to do it, which is often overlooked,
is to encode the relationship in its own data structure. We
call this reifying the relationship, which means we make the
relationship its own thing. We see this often in relational
databases where we make a table just to relate two data
items.
Let’s look at a scenario where reifying a relationship
makes sense for practical coding reasons and directly
encodes the model.
The Student-Course Problem
Imagine we’ve been hired to write software for keeping
track of student registrations. When a student registers
for a course, we need to record that fact so that the pro-
fessor can get a roster of students and the student can see
their schedule. Here are the concepts we need to relate:
[Diagram: the concepts to relate are Student, Course, Schedule, and Roster.]

to reify: to represent something abstract as a concrete thing
“reify.” Merriam-Webster.com. 2024. https://www.merriam-webster.com (14 Feb 2024).
This is a two-way relationship. Students enroll in courses,
and courses know their students. We’re going to look at
four different options for modeling two-way relationships
starting on the next page.
Encoding two-way relationships
The Student/Course relationship is a two-way relationship.
The student needs to know their courses and the course
needs to know its students.
There are four options for modeling two-way relation-
ships. You can prioritize one direction over the other and
model it as a one-way relationship, losing some fitness to
the domain. There are two ways to do that, shown as two
different arrow directions in the diagram. Another option
is to maintain both references, which is hard to do.
And finally, you can reify the relationship to its own piece
of data.
[Diagram: four options for encoding the student/course relationship]
student→course: If I know the student, I can get the course.
student←course: If I know the course, I can get the student.
student↔course: If I know either, I can get the other.
student, relationship, course (reified): If I know either, I can get the other.
The reason maintaining both references is hard to do is
that if one changes, we have to change the other. These
are in two separate data structures. They are now coupled
together. When we change the code for one, we have to
change the code for the other. This is error prone.
While all of these options have their place, the last one
is not talked about as much as the others, even though it
is often the best for the situation. In particular, it decou-
ples the two structures by centralizing the two references
of the relationship.
Let’s go over each option. We’ll show the types for the
encodings for student and course. And we’ll implement
some operations to compare the options.
The two one-way models (student→course and student←course) sacrifice fitness to the domain to make encoding easier.
The two-reference model (student↔course) matches the domain, but encoding it is difficult.
The reified relationship model matches the domain, and encoding it is easier.

Three parts
1. Domain - the context
2. Model - our concepts
3. Encoding - the software
Prioritizing one direction
A common recommendation (that I don’t agree with) for handling two-way relationships is to prioritize one direction. The reasoning goes like this:

1. Encoding two-way relationships is hard.
2. When you consider the operations in your model, one direction is often more important.
3. To make the encoding easier, only encode the more important direction.

This sacrifices fitness between the model and the domain to make encoding it easier. The encoding below prioritizes the student → course relationship.

type Student = {
  id: StudentID;
  name: string;
  courses: CourseID[];
  ...
};

type Course = {
  id: CourseID;
  name: string;
  ...
};

Here are implementations of our two operations:

function generateSchedule(student) { //=> Schedule
  return { student: student.id, courses: student.courses };
}

function generateRoster(allStudents, courseId) { //=> Roster
  return {
    course: courseId,
    students: allStudents
      .filter(s => s.courses.includes(courseId))
      .map(s => s.id)
  };
}

Because we only have one direction, we have to iterate through all students to generate the roster. It is expensive and awkward for the computer.

There are three problems with the prioritization argument. First, it fails to consider the option of reifying the relationship. Reification is always an option.

Second, it is rare to find that one direction is not important. Both directions are usually important enough; that’s why they’re in the domain to begin with.

Third, even if one direction is significantly more important, the cost of operating on the omitted direction is high. If we have many more students than courses and we generate many more schedules than rosters, it implies student → courses is more important. But it is very expensive to generate those rosters.
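To see the asymmetry between the two operations, here is a small runnable sketch. The two functions follow the text; the sample students and course ids are invented for illustration.

```javascript
function generateSchedule(student) {
  return { student: student.id, courses: student.courses };
}

function generateRoster(allStudents, courseId) {
  return {
    course: courseId,
    students: allStudents
      .filter(s => s.courses.includes(courseId))
      .map(s => s.id)
  };
}

// Hypothetical sample data.
const allStudents = [
  { id: "s1", name: "Ada", courses: ["c1", "c2"] },
  { id: "s2", name: "Ben", courses: ["c2"] },
  { id: "s3", name: "Cy", courses: [] }
];

// The schedule reads a single student record.
const schedule = generateSchedule(allStudents[0]);

// The roster has to visit every student, even those not in the course.
const roster = generateRoster(allStudents, "c2");

console.log(schedule); // { student: 's1', courses: [ 'c1', 'c2' ] }
console.log(roster);   // { course: 'c2', students: [ 's1', 's2' ] }
```

With three students the scan is harmless, but its cost grows with the total number of students rather than with the size of the course.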
Q&A

How can you model a two-way relationship with only one reference? Isn’t that technically impossible?

It is possible to model it however you want. It is like ignoring air resistance in a physics problem. We know there is air resistance, but we assume it is negligible. The process of abstraction is about ignoring details that don’t matter. The important question is: Do both directions matter?
Encoding student↔course

We can encode the two-way relationship using two references.

type Student = {
  id: StudentID;
  name: string;
  courses: CourseID[];
  ...
};

type Course = {
  id: CourseID;
  name: string;
  students: StudentID[];
  ...
};

Generating schedules is easy. We’ve already seen it. And generating rosters is very similar. But let’s find two operations that stress both references at the same time. What about registering and unregistering for courses?

function register(student, course) { //=> [Student, Course]
  return [update(student, 'courses', arrayAddUnique, course.id),
          update(course, 'students', arrayAddUnique, student.id)];
}

function unregister(student, course) { //=> [Student, Course]
  return [update(student, 'courses', arrayRemove, course.id),
          update(course, 'students', arrayRemove, student.id)];
}

Note that we have to return both the student and the course from each operation because they are both modified and we are using a functional approach. The coupling is apparent. Also note that we have to store the new versions of these data structures (not shown) so that the new versions can be looked up by identifier. To keep things consistent, we would prefer the two storages to happen atomically.

With those two notes out of the way, the fitness between the encoding, model, and domain is good.

The implementations of these two functions are rather straightforward, though slightly awkward due to the coupling. We have to remember that we cannot change one side of the relationship without changing the other. Having explicit functions to register and unregister is helpful, but we could use some more help so we don’t have to rely exclusively on discipline.

We have one more option to work through. Let’s take a look on the next page.
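The two operations can be tried out with a minimal sketch like the one below. The helpers `update`, `arrayAddUnique`, and `arrayRemove` are not defined in this chapter, so the versions here are assumptions about their behavior, and the sample student and course are invented.

```javascript
// Assumed helper: copy obj with obj[key] replaced by f(obj[key], ...args).
function update(obj, key, f, ...args) {
  return { ...obj, [key]: f(obj[key], ...args) };
}

// Assumed helper: add item unless it is already present (no mutation).
function arrayAddUnique(array = [], item) {
  return array.includes(item) ? array : [...array, item];
}

// Assumed helper: remove every occurrence of item (no mutation).
function arrayRemove(array = [], item) {
  return array.filter(x => x !== item);
}

function register(student, course) {
  return [update(student, 'courses', arrayAddUnique, course.id),
          update(course, 'students', arrayAddUnique, student.id)];
}

function unregister(student, course) {
  return [update(student, 'courses', arrayRemove, course.id),
          update(course, 'students', arrayRemove, student.id)];
}

// Hypothetical sample data.
const ada = { id: "s1", name: "Ada", courses: [] };
const algebra = { id: "c1", name: "Algebra", students: [] };

const [ada2, algebra2] = register(ada, algebra);
const [ada3, algebra3] = unregister(ada2, algebra2);

console.log(ada2.courses, algebra2.students); // [ 'c1' ] [ 's1' ]
console.log(ada3.courses, algebra3.students); // [] []
```

Every call produces two new values that must both be stored. Forgetting one of them is the mistake that discipline alone has to prevent.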
Reifying the student↔course relationship

The last option is to encode the two-way relationship by making a new type that references both:

type Student = {
  id: StudentID;
  name: string;
  ...
};

type Course = {
  id: CourseID;
  name: string;
  ...
};

type Registry = {
  byStudent: { [studentId: string]: CourseID[] };
  byCourse: { [courseId: string]: StudentID[] };
};

Let’s see the implementations of the operations.

function register(registry, studentID, courseID) { //=> Registry
  const r2 = updateIn(registry, ['byStudent', studentID],
                      arrayAddUnique, courseID);
  return updateIn(r2, ['byCourse', courseID],
                  arrayAddUnique, studentID);
}

function unregister(registry, studentID, courseID) { //=> Registry
  const r2 = updateIn(registry, ['byStudent', studentID],
                      arrayRemove, courseID);
  return updateIn(r2, ['byCourse', courseID],
                  arrayRemove, studentID);
}

function generateSchedule(registry, studentID) { //=> Schedule
  return { student: studentID,
           courses: registry.byStudent[studentID] || [] };
}

function generateRoster(registry, courseID) { //=> Roster
  return { course: courseID,
           students: registry.byCourse[courseID] || [] };
}

This encoding combines the two sides of the relationship into one place. That makes it easy to update them in one operation and one storage (not shown).

This encoding also has a cool advantage: You don’t need to have the student or the course, just their identifiers. But there is another advantage that is more subtle. We’ll see it on the next page.
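Here is a self-contained sketch of the registry in use. The `updateIn`, `arrayAddUnique`, and `arrayRemove` helpers are assumed implementations written for this example, and the student and course identifiers are made up.

```javascript
// Assumed helper: immutably apply f at a nested path, creating objects as needed.
function updateIn(obj = {}, [key, ...rest], f, ...args) {
  const value = rest.length === 0
    ? f(obj[key], ...args)
    : updateIn(obj[key], rest, f, ...args);
  return { ...obj, [key]: value };
}

// Assumed array helpers; neither mutates its input.
function arrayAddUnique(array = [], item) {
  return array.includes(item) ? array : [...array, item];
}
function arrayRemove(array = [], item) {
  return array.filter(x => x !== item);
}

function register(registry, studentID, courseID) {
  const r2 = updateIn(registry, ['byStudent', studentID], arrayAddUnique, courseID);
  return updateIn(r2, ['byCourse', courseID], arrayAddUnique, studentID);
}

function unregister(registry, studentID, courseID) {
  const r2 = updateIn(registry, ['byStudent', studentID], arrayRemove, courseID);
  return updateIn(r2, ['byCourse', courseID], arrayRemove, studentID);
}

function generateSchedule(registry, studentID) {
  return { student: studentID, courses: registry.byStudent[studentID] || [] };
}

function generateRoster(registry, courseID) {
  return { course: courseID, students: registry.byCourse[courseID] || [] };
}

// Usage with hypothetical identifiers.
const emptyRegistry = { byStudent: {}, byCourse: {} };
const afterTwo = register(register(emptyRegistry, "s1", "c1"), "s2", "c1");
const afterDrop = unregister(afterTwo, "s1", "c1");

console.log(generateRoster(afterTwo, "c1").students);    // [ 's1', 's2' ]
console.log(generateRoster(afterDrop, "c1").students);   // [ 's2' ]
console.log(generateSchedule(afterDrop, "s1").courses);  // []
```

Each operation takes one registry and returns one registry, so there is a single value to store and both directions stay in step.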
Generalizing a many-to-many relationship
typeManyToMany<A,B>={
indexByA:Map<A,Set<B>>;
indexByB:Map<B,Set<A>>;
};
functionrelate(relationship,a,b){//=>ManyToMany<A,B>
const r2 = updateIn(registry, ['indexByA', a],
setAdd, b);
return updateIn(r2, ['indexByB', b],
setAdd, a);
}
functionunrelate(relationship,a,b){//=>ManyToMany<A,B>
const r2 = updateIn(registry, ['indexByA', a],
setRemove, b);
return updateIn(r2, ['indexByB', b],
setRemove, a);
}
Now for the operations.
functionreportA(relationship,a){//=>Set<B>
return relationship.indexByA[a] || new Set();
}
functioncontains(relationship,a,b){//=>boolean
return reportA(relationship, a).has(b);
}
functionreportB(relationship,b){//=>Set<A>
return relationship.indexByB[b] || new Set();
}
The student/course relationship is an example of a ma-
ny-to-many relationship. Many-to-many relationships are
super common. It would be a shame to have to implement
the same thing every time.
Of course we dont have to! Let’s encode the general
many-to-many relationship so that we can reuse it. Previ-
ously, we used strings, objects, and array. This time, we’ll
use custom types, maps, and sets. The principle is the
same.
If your model has lots of many-to-many relationships, it
may be worth it to generalize like this.
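One way to make the general relationship runnable with plain JavaScript `Map` and `Set` is sketched below. Instead of `updateIn`, `setAdd`, and `setRemove`, it uses two small helpers, `indexAdd` and `indexRemove`, which are assumptions introduced for this example. They copy the map and the affected set so that earlier versions of the relationship stay intact.

```javascript
// Assumed helper: copy a Map<K, Set<V>> index with value added under key.
function indexAdd(index, key, value) {
  const next = new Map(index);
  next.set(key, new Set(index.get(key) ?? []).add(value));
  return next;
}

// Assumed helper: copy a Map<K, Set<V>> index with value removed from key.
function indexRemove(index, key, value) {
  const next = new Map(index);
  const values = new Set(index.get(key) ?? []);
  values.delete(value);
  next.set(key, values);
  return next;
}

const emptyRelationship = { indexByA: new Map(), indexByB: new Map() };

function relate(relationship, a, b) {
  return {
    indexByA: indexAdd(relationship.indexByA, a, b),
    indexByB: indexAdd(relationship.indexByB, b, a)
  };
}

function unrelate(relationship, a, b) {
  return {
    indexByA: indexRemove(relationship.indexByA, a, b),
    indexByB: indexRemove(relationship.indexByB, b, a)
  };
}

function reportA(relationship, a) {
  return relationship.indexByA.get(a) ?? new Set();
}

function reportB(relationship, b) {
  return relationship.indexByB.get(b) ?? new Set();
}

function contains(relationship, a, b) {
  return reportA(relationship, a).has(b);
}

// Usage: students as A, courses as B (hypothetical identifiers).
const registrations = relate(relate(emptyRelationship, "s1", "c1"), "s2", "c1");
const afterUnrelate = unrelate(registrations, "s1", "c1");

console.log([...reportB(registrations, "c1")]);    // [ 's1', 's2' ]
console.log(contains(afterUnrelate, "s1", "c1"));  // false
```

The same functions would serve any other pair of types, such as orders and baristas, without change.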
Q&A
These data structures are supposed to encode
real world objects that have relationships.
Shouldn’t the relationship be part of the data
structure?
Well, that’s an interesting question. When teaching ob-
ject-oriented design, it is common to instruct students to
model the real-world objects by encoding each kind of ob-
ject as a class. However, are we really trying to encode re-
al-world objects? In the Domain Lens chapter, we explore
the question of what it is we are actually modeling.
But we can address it briey here. Not everything in
our encoding has to correspond to a part of a real-world
object. Does a student walk around with a list of courses
they’re in? Do they remember the course identifiers?
Have they memorized their student id? Can you send
them a “register for course” message? No.
From a different angle, imagine how a university
would have kept track of students in 1900. They would
probably record the student/course registrations in a
book. A secretary would write the student’s identifier and
the course identifier, following certain rules. Those rules
likely maintain the many-to-many relationship. That
book is called a registry.
So our Registry type is encoding that real-world object
(the book a university would use), which didn’t appear
in the problem description at all. Yet the registry
book is a plausible idea. And I made it up! What’s going
on?
We’re not encoding real-world objects. We’re encoding
our concepts of them, which we’ve been calling a model.
Particularly, we want the concepts that further the work
of the software. Coming up with those good concepts is
called abstraction.
We shouldn’t be too wedded to our initial, naive con-
cepts, which is what happens when we take a look at the
problem description and immediately encode the stuff
in it. This whole book is my attempt to open your mind to
the mental space between domain and encoding.
Theory corner: Isomorphisms
The data lens concepts have deep mathematical mean-
ings. Knowing the mathematical notions can enhance our
appreciation for the profundity of the work we do.
The domain modeling process we’ve been following is
about constructing isomorphisms between the domain
and the software. An isomorphism is when two different
representations have the same structure so that you can
operate on them interchangeably.
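[Diagram: the domain (top) and the software (bottom), separated by a wavy line.
Domain: a super coffee becomes a mega coffee via "set size to mega".
Software: { size: "super", roast: "burnt", addIns: ["soy"] } becomes { size: "mega", roast: "burnt", addIns: ["soy"] } via setSize(coffee, "mega").
Encode arrows lead from the domain to the software; decode arrows lead from the software back to the domain.]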
This diagram illustrates an isomorphism between a de-
sired coffee and the JSON that encodes it. You can follow
the arrows through two paths and get the same result.
Path 1: We start with the super coffee in the top left.
This is in the domain. When the customer asks to change
the size to mega, the barista could mentally change it to a
mega coffee, still within the domain (top arrow).
Path 2: Starting from the super coffee they are order-
ing, we encode it into JSON in our software. Using the
software, the barista triggers a setSize() operation,
which produces a different JSON. The barista can then
read that JSON or some visual representation of it (de-
code it) to understand what coffee the customer wants.
In an isomorphism, operating on either side of the do-
main/software wavy line will give the same results, and
we can always convert between them. This is what allows
our software to do its job. And it’s why it’s important to
look to the domain for answers to design questions.
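The two paths can be imitated in code, with one large caveat: the domain is not software, so the sketch below uses a spoken-order phrase as a stand-in for the domain side. The phrase format and the functions `encode`, `decode`, and `sayWithSize` are assumptions invented for this illustration; only `setSize` appears in the diagram.

```javascript
// Stand-in for the domain: the order as the customer might say it.
// Assumed phrase format: "<size> <roast> coffee with <addIn>, <addIn>".
function encode(phrase) {
  const [head, tail = ""] = phrase.split(" coffee");
  const [size, roast] = head.split(" ");
  const addIns = tail.startsWith(" with ") ? tail.slice(6).split(", ") : [];
  return { size, roast, addIns };
}

function decode(coffee) {
  const extras = coffee.addIns.length ? " with " + coffee.addIns.join(", ") : "";
  return `${coffee.size} ${coffee.roast} coffee${extras}`;
}

// Software-side operation.
function setSize(coffee, size) {
  return { ...coffee, size };
}

// Domain-side operation: the barista mentally swaps the size word.
function sayWithSize(phrase, size) {
  return [size, ...phrase.split(" ").slice(1)].join(" ");
}

const ordered = "super burnt coffee with soy";

// Path 1: stay in the domain.
const path1 = sayWithSize(ordered, "mega");

// Path 2: encode, operate in software, decode.
const path2 = decode(setSize(encode(ordered), "mega"));

console.log(path1 === path2); // true
```

If the two paths ever disagreed, the encoding would no longer be faithful to the domain, and that is the kind of violation we want our tools to catch.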
[Diagram labels: path 1 stays in the domain, leading from the coffee the customer orders to the coffee the customer changes the order to. Path 2 goes through the software.]
Enforcing structure
It is easy to accidentally violate the isomorphism between
the domain and the encoding. That is why we try to enlist
as much help as we can from the language, tools, and pro-
gramming practices we have available. We usually use a
combination of the following:
Types
Types are the strictest way to enforce structure. What-
ever you can encode in the types will be checked by the
compiler. However, not all structure can be encoded in all
type systems. And type systems also influence the way we
model our domain. We will also consider how volatility in
our domain affects our use of types in the volatility lens
chapter.
Language features
All languages come with some built-in features (like class-
es, functions, etc.) that have their own semantics. As we
mature as modelers, we move past basic recipes by deeply
understanding feature semantics and choosing features
that have similar semantics to what we are trying to en-
code. See the Data Lens Supplement for examples.
Automated tests
Tests can enforce the structure of our encodings. Instead
of testing just the behavior of functions, consider testing
other properties too. For instance, that two functions are
inverses. We’ll explore properties more in the operation
and composition lens chapters.
Data structures
Data structures enforce their own structure. If we can find
a data structure that has the same structure as our model,
we can use it to encode it.
Runtime checks
We developed normalization and validation functions that
can be used to check and enforce structure at runtime.
Your brain and discipline
Using discipline is by far the most error-prone approach.
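Two of these approaches, runtime checks and tests of properties, can be sketched briefly in JavaScript. The vocabularies below reuse values from this chapter’s examples but are otherwise assumptions, as are the names `isValidCoffee`, `serialize`, `deserialize`, and `roundTrips`.

```javascript
// Hypothetical vocabularies; the real ones come from the domain.
const SIZES = ["super", "mega"];
const ROASTS = ["burnt", "charcoal"];
const ADD_INS = ["soy", "almond"];

// Runtime check: does this value have the structure of a coffee?
function isValidCoffee(coffee) {
  return SIZES.includes(coffee.size)
    && ROASTS.includes(coffee.roast)
    && Array.isArray(coffee.addIns)
    && coffee.addIns.every(addIn => ADD_INS.includes(addIn));
}

// Two functions that are meant to be inverses of each other.
const serialize = coffee => JSON.stringify(coffee);
const deserialize = text => JSON.parse(text);

// Property: serializing, deserializing, and serializing again changes nothing.
function roundTrips(coffee) {
  return serialize(deserialize(serialize(coffee))) === serialize(coffee);
}

const samples = [
  { size: "super", roast: "burnt", addIns: [] },
  { size: "mega", roast: "charcoal", addIns: ["soy", "almond"] }
];

console.log(samples.every(isValidCoffee)); // true
console.log(samples.every(roundTrips));    // true
console.log(isValidCoffee({ size: "tiny", roast: "burnt", addIns: [] })); // false
```

A property such as `roundTrips` checks a relationship between functions rather than a single output, which is why it can catch structural mistakes that example-based tests miss.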
Conclusion
In this chapter, we’ve seen how to encode the structure of
relationships, including reifying relationships into their
own data structure. We’ve also discussed the technical
term isomorphism. And we’ve seen the common language
features available to enforce the structures we’ve been
uncovering.
Summary
Relationships between entities are either component or peer relationships. We should nest component relationships and use names or identifiers to encode peer relationships.
Relationships can be reified into their own data structure. We do this when it helps us model the domain with a better fit.
We want our domain, model, and encoding to share a common structure, with ways to translate between them. This is called an isomorphism.
There are many ways to enforce structure in an encoding. We need to decide which ways we will use.
Up next . . .
Now that we’re good at data modeling, we can forget it for
a moment and focus on operations. The secret is that op-
erations are a more powerful way to think about a model,
but we couldn’t learn the operations without being good
at data modeling. Modeling the operations is the topic of
the next chapter. But don’t miss the Data Lens Supple-
ment, which describes relationships commonly found in
domains and various options for encoding them.