Chapter 2
Data Lens Part 2
Chapter objectives
In Chapter 1, we began our exploration of encoding the relationships among values in a domain. This chapter continues that exploration, diving deeper into the structure of more complex relationships. We also get into some theory, and we end by learning different ways to encode and enforce structure.
Figure: a preview of two diagrams from this chapter. The first relates a coffee order in the domain ("set size to mega") to its JSON encoding in the software (setSize(coffee, "mega")). The second shows four ways to model the student/course relationship: student→course, student←course, student↔course, and a separate relationship that sits between student and course. You’ll understand both of these diagrams by the end of this chapter.
Coffee order model
We’ve seen a lot of structure so far in a relatively simple model—that of coffees. Now we’re going to explore the modeling of relationships even deeper. Here is a coffee order model that I’ve simplified to highlight the relationships.
In this model, a coffee order has a customer and a collection of coffees. The customer has an identifier and a name.
How do we encode this? One approach is to nest the customer inside the order, as in the Order type below.
However, this approach has major drawbacks that we need to discuss. An example JSON for an order appears below, after the types.
What happens if I put this JSON into a document database? The order has a copy of the customer, which is also in the customer table and in all of that customer’s other orders. What happens if I change the name field of the order’s customer? Or what if I change the name of the customer in the customer table? Now the copy inside this order will be wrong. We need to wrestle with these questions when we model relationships.
It’s not that the questions don’t have answers. We can come up with some. The problem is that the answers complicate our model. The hard questions imply a lack of elegance. It would be nice to find models that don’t have hard questions.
This kind of thinking is actually a preview of what we’ll see in Chapter 3 (Operation Lens). We’re using the operations of duplicate order and change customer name to uncover problems in the data model. We’ll learn how to do this better in that chapter.
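To make the hard questions concrete, here is a minimal sketch of the copy order and change customer name operations applied to the nested encoding. The function names copyOrder and renameCustomerInOrder are hypothetical, and we assume CustomerID is a string and that a deep copy behaves like a round trip through JSON.

```typescript
// Hypothetical types for illustration; CustomerID is assumed to be a string.
type CustomerID = string;
type Customer = { id: CustomerID; name: string };
type Coffee = { size: string; roast: string; addIns: string[] };
type Order = { customer: Customer; coffees: Coffee[] };

// Deep copy by round-tripping through JSON, much as storing and reloading would.
function copyOrder(order: Order): Order {
  return JSON.parse(JSON.stringify(order)) as Order;
}

// Change the customer's name inside one order, leaving the input untouched.
function renameCustomerInOrder(order: Order, newName: string): Order {
  return { ...order, customer: { ...order.customer, name: newName } };
}

const firstOrder: Order = {
  customer: { id: "123", name: "John" },
  coffees: [{ size: "super", roast: "burnt", addIns: [] }],
};
const secondOrder = renameCustomerInOrder(copyOrder(firstOrder), "Jon");

// Both orders claim to describe customer "123", but they disagree on the name.
const sameCustomer = firstOrder.customer.id === secondOrder.customer.id;
const sameName = firstOrder.customer.name === secondOrder.customer.name;
```

Here sameCustomer is true while sameName is false. Nothing in the encoding tells us which name is the right one, and that is the hard question the nested model forces on us.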
type Order = {
  customer: Customer; // nested: each order carries its own copy of the customer
  coffees: Coffee[];
};
type Customer = {
  id: CustomerID;
  name: string;
};
Diagram: an Order is all of a Customer and Coffees.
{
customer: {
id: "123",
name: "John",
},
coffees: [{
size: "super",
roast: "burnt",
addIns: []
}]
}
Component relationships imply ownership
type Order = {
  customer: Customer;
  coffees: Coffee[];
};
When a piece of data nests another piece of data inside of it, it implies ownership. In the case of the order nesting the customer, we saw a number of drawbacks because the customer does not belong to the order.
On the other hand, the coffee that is part of the order doesn’t seem to have the same problems. Duplicating a coffee into a new order is not a problem, nor is changing the copy. They’re different coffees for different orders! It seems like the coffee does belong to the order.
This kind of relationship is called a component relationship. We encode it by nesting data.
Peer relationships reference by identifier
Another choice for modeling a relationship is as a peer. Here is the type of the peer relationship version of an order, followed by a JSON example.
type Order = {
  customer: CustomerID;
  coffees: Coffee[];
};
{
  customer: "123",
  coffees: [{
    size: "super",
    roast: "burnt",
    addIns: []
  }]
}
In this model, instead of nesting, we use an identifier to refer to the customer. The customer record has to exist somewhere else, and you need a way to look it up by the identifier.
Note that for this encoding, we don’t have the same hard questions we had for the nested encoding because we’re not duplicating the customer record. One might say that this model more elegantly mimics the domain’s structure.
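As a minimal sketch of what "a way to look it up by the identifier" might look like, here the customer records live in a plain object keyed by identifier. The CustomerTable type and the lookupCustomer and renameCustomer functions are assumptions made for illustration, not part of the model above.

```typescript
type CustomerID = string; // assumed to be a string for this sketch
type Customer = { id: CustomerID; name: string };
type Coffee = { size: string; roast: string; addIns: string[] };
type Order = { customer: CustomerID; coffees: Coffee[] };

// Hypothetical stand-in for the customer table: one record per identifier.
type CustomerTable = { [id: string]: Customer };

function lookupCustomer(table: CustomerTable, id: CustomerID): Customer | undefined {
  return table[id];
}

// Returns a new table with the one customer record renamed.
function renameCustomer(table: CustomerTable, id: CustomerID, newName: string): CustomerTable {
  const existing = table[id];
  if (existing === undefined) return table; // unknown id: nothing to change
  return { ...table, [id]: { ...existing, name: newName } };
}

const customerTable: CustomerTable = { "123": { id: "123", name: "John" } };
const orderA: Order = {
  customer: "123",
  coffees: [{ size: "super", roast: "burnt", addIns: [] }],
};
const orderB: Order = { customer: "123", coffees: [] };

// One rename, in one place. Neither order has to be touched.
const renamedTable = renameCustomer(customerTable, "123", "Jon");
const nameForA = lookupCustomer(renamedTable, orderA.customer)?.name;
const nameForB = lookupCustomer(renamedTable, orderB.customer)?.name;
```

Both lookups see the same record, so the two orders cannot disagree about the customer’s name.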
Question:
Could we model the coffee relationship as a peer relationship?
Two types of relationships
Exercise
For each of the parts of the Order, identify whether it is a peer or component relationship. Check the appropriate box. For collections, answer for the elements of the collection.
type Order = {
  customer: CustomerID;
  paymentMethod: PaymentMethod;
  barista: BaristaID;
  deliveryMethod: DeliveryMethod;
  orderMethod: OrderMethod;
  coffees: Coffee[];
};
customer: [ ] Peer [ ] Component
paymentMethod: [ ] Peer [ ] Component
barista: [ ] Peer [ ] Component
deliveryMethod: [ ] Peer [ ] Component
orderMethod: [ ] Peer [ ] Component
coffees: [ ] Peer [ ] Component
Exercise
For each of the parts of the Coffee, identify whether it is a peer or component relationship. Check the appropriate box. For collections, answer for the elements of the collection.
type Coffee = {
  size : Size;
  roast : Roast;
  addIns: AddIn[];
};
size: [ ] Peer [ ] Component
roast: [ ] Peer [ ] Component
addIns: [ ] Peer [ ] Component
Nest component relationships
We nest data under a larger piece of data when it is considered a component of the larger thing. For instance, the coffees that belong to an order are components of the order. We nest the coffee data structures inside of the order data structures.
Example: Coffee nested inside order
{
  ...
  coffees: [{
    size: "super",
    roast: "burnt",
    addIns: []
  }]
}
Reference peer relationships
Peer relationships should not be nested. If you need to deep copy a piece of data, any nested data will also be copied. Instead, we should reference the peer using an identifier or a name.
Example: Order references customer and barista by identifier
{
  customer: "2312231",
  ...
}
Example: Coffee references size, roast, and add-ins by name
{
  size: "super",
  roast: "charcoal",
  addIns: ["soy", "almond"]
}
Q&A
Doesn’t nesting use pointers to reference data? Why isn’t that good enough?
Referencing by id or name is more durable than a pointer to an object in memory. When you send data to another machine, you can’t refer to the location in memory anymore. The identifier or name is something that you can serialize and read back in without losing its meaning.
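A small sketch can show the difference. We use a round trip through JSON.stringify and JSON.parse as a stand-in for sending data to another machine. The variable names are illustrative only.

```typescript
type Customer = { id: string; name: string };

const john: Customer = { id: "123", name: "John" };

// Two orders that point at the same customer object in memory.
const inMemory = { orderA: { customer: john }, orderB: { customer: john } };
const sharedBefore = inMemory.orderA.customer === inMemory.orderB.customer;

// After a round trip through JSON, each order holds its own separate copy.
const received = JSON.parse(JSON.stringify(inMemory)) as typeof inMemory;
const sharedAfter = received.orderA.customer === received.orderB.customer;

// An identifier is a plain value, so it means the same thing after the trip.
const byId = { orderA: { customer: john.id }, orderB: { customer: john.id } };
const receivedById = JSON.parse(JSON.stringify(byId)) as typeof byId;
const sameIdAfter = receivedById.orderA.customer === receivedById.orderB.customer;
```

Here sharedBefore is true, sharedAfter is false, and sameIdAfter is true. The pointer’s meaning (these are the same customer) is lost in serialization, while the identifier’s meaning survives.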
Operations and composition constrain data
In the last few pages, we’ve been looking at the decision of whether to nest data or to reference data. We made that decision by noting how many difficult questions we face when we copy the order or when we change the name of the customer. In that case, we’re applying the operation lens to guide our thinking. We thought through the consequences of the copy order and change customer name operations. Knowledge from the operations constrained our data.
The operation lens asks us to consider the domain operations that will have to apply to our data. We can do this a little bit while we’re modeling our data. But we will see in Chapter 3 that it is better to begin with the operations, before we encode our data, because each operation constrains the decision space. Each constraint may eliminate choices, so the more constraints we have, the easier our decisions become.
Further, in addition to considering the operations alone, we considered them in composition. We asked, “What happens when we change the name after making a copy?” With that question, we are applying the composition lens, which we’ll see in Chapter 4. Operations are rarely applied alone. Instead, we apply a set of operations, often in sequence, to achieve a result. We want those sets of operations to guarantee properties. Composition guarantees constrain our choices further.
It is very hard to make decisions about data structures without some thought about the operations that will use them. There are simply too many choices to rely on fit alone. That is why we’re talking about it now, before the chapters on the operation and composition lenses.
Data encoding is on the implementation side, while operations and composition are on the specification side. We generally want to be modeling in specifications without regard to implementation. Although that means we should work within the operation and composition lenses first, I have found that people don’t feel comfortable thinking so abstractly before they have learned the concrete skills of modeling and encoding data.
So let’s keep trucking along with the data lens. We’ll get to operations and their composition in due time.
Operations we considered: copy order, change customer name
Compositions we considered: change the customer name after copying the order
Reifying relationships
It is common to encode relationships using references, either by name or identifier, within the data structure itself. For example, the customer field is part of the order data structure. Another way to do it, which is often overlooked, is to encode the relationship in its own data structure. We call this reifying the relationship, which means we make the relationship its own thing. We see this often in relational databases where we make a table just to relate two data items.
Let’s look at a scenario where reifying a relationship makes sense for practical coding reasons and directly encodes the model.
The Student-Course Problem
Imagine we’ve been hired to write software for keeping track of student registrations. When a student registers for a course, we need to record that fact so that the professor can get a roster of students and the student can see their schedule. Here are the concepts we need to relate:
Student
Schedule
Roster
Course
to reify: to represent something abstract as a concrete thing
“reify.” Merriam-Webster.com. 2024. https://www.merriam-webster.com (14 Feb 2024).
This is a two-way relationship. Students enroll in courses, and courses know their students. Next, we’re going to look at four different options for modeling two-way relationships.
Encoding two-way relationships
The Student/Course relationship is a two-way relationship. The student needs to know their courses and the course needs to know its students.
There are four options for modeling two-way relationships. You can prioritize one direction over the other and model it as a one-way relationship, losing some fitness to the domain. There are two ways to do that, shown as two different arrow directions in the diagram. Another option is to maintain both references, which is hard to do. And finally, you can reify the relationship to its own piece of data.
The four options:
student→course: If I know the student, I can get the course.
student←course: If I know the course, I can get the student.
These two models sacrifice fitness to the domain to make encoding easier.
student↔course: If I know either, I can get the other. This model matches the domain, but encoding it is difficult.
student, relationship, course (the relationship sits between the two): If I know either, I can get the other. This model matches the domain, and encoding it is easier.
The reason maintaining both references is hard to do is that if one changes, we have to change the other. These are in two separate data structures. They are now coupled together. When we change the code for one, we have to change the code for the other. This is error prone.
While all of these options have their place, the last one is not talked about as much as the others, even though it is often the best for the situation. In particular, it decouples the two structures by centralizing the two references of the relationship.
Let’s go over each option. We’ll show the types for the encodings for student and course. And we’ll implement some operations to compare the options.
Three parts
Prioritizing one direction
A common recommendation (that I don’t agree with) for handling two-way relationships is to prioritize one direction. The reasoning is that one direction is usually more important than the other, so we only need to encode that one.
This sacrifices fitness between the model and the domain to make encoding it easier. The encoding below prioritizes the student → course relationship.
Here are implementations of our two operations:
Because we only have one direction, we have to iterate through all students to generate the roster. It is expensive and awkward for the computer.
There are three problems with the prioritization argument. First, it fails to consider the option of reifying the relationship. Reification is always an option.
Second, it is rare to find that one direction is not important. Both directions are usually important enough—that’s why they’re in the domain to begin with.
Third, even if one direction is significantly more important, the cost of operating on the omitted direction is high. If we have many more students than courses and we generate many more schedules than rosters, it implies student → courses is more important. But it is very expensive to generate those rosters.
type Student = {
  id: StudentID;
  name: string;
  courses: CourseID[]; // the only encoded direction: student → course
  ...
};
type Course = {
  id: CourseID;
  name: string;
  ...
};
function generateSchedule(student) { //=> Schedule
return { student: student.id, courses: student.courses };
}
function generateRoster(allStudents, courseId) { //=> Roster
return {
course: courseId,
students: allStudents
.filter(s => s.courses.includes(courseId))
.map(s => s.id)
};
}
Q&A
How can you model a two-way relationship with only one reference? Isn’t that technically impossible?
It is possible to model it however you want. It is like ignoring air resistance in a physics problem. We know there is air resistance, but we assume it is negligible. The process of abstraction is about ignoring details that don’t matter. The important question is: Do both directions matter?
Encoding student↔course
We can encode the two-way relationship using two references.
type Student = {
  id: StudentID;
  name: string;
  courses: CourseID[];
  ...
};
type Course = {
  id: CourseID;
  name: string;
  students: StudentID[];
  ...
};
Generating schedules is easy. We’ve already seen it. And generating rosters is very similar. But let’s find two operations that stress both references at the same time. What about registering and unregistering for courses?
// Both sides of the relationship change together, so both are returned.
function register(student, course) { //=> [Student, Course]
  return [update(student, 'courses', arrayAddUnique, course.id),
          update(course, 'students', arrayAddUnique, student.id)];
}
function unregister(student, course) { //=> [Student, Course]
  return [update(student, 'courses', arrayRemove, course.id),
          update(course, 'students', arrayRemove, student.id)];
}
Note that we have to return both the student and the course from each operation because they are both modified and we are using a functional approach. The coupling is apparent. Also note that we have to store the new versions of these data structures (not shown) so that the new versions can be looked up by identifier. To keep things consistent, we would prefer the two storages to happen atomically.
With those two notes out of the way, the fitness between the encoding, model, and domain is good. The implementations of these two functions are rather straightforward, though slightly awkward due to the coupling. We have to remember that we cannot change one side of the relationship without changing the other. Having explicit functions to register and unregister is helpful, but we could use some more help so we don’t have to rely exclusively on discipline.
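One form that extra help could take is a check that the two sides still agree. The sketch below is an assumption layered on top of the encoding above: isConsistent is a hypothetical function, the types are trimmed to the fields it needs, and identifiers are assumed to be strings.

```typescript
type StudentID = string;
type CourseID = string;
type Student = { id: StudentID; courses: CourseID[] };
type Course = { id: CourseID; students: StudentID[] };

// True when each side of the relationship agrees with the other.
function isConsistent(students: Student[], courses: Course[]): boolean {
  const courseById: { [id: string]: Course } = {};
  courses.forEach(c => { courseById[c.id] = c; });
  const studentById: { [id: string]: Student } = {};
  students.forEach(s => { studentById[s.id] = s; });

  // Every course a student lists must list that student back.
  const forward = students.every(s =>
    s.courses.every(cid => {
      const c = courseById[cid];
      return c !== undefined && c.students.indexOf(s.id) !== -1;
    }));
  // Every student a course lists must list that course back.
  const backward = courses.every(c =>
    c.students.every(sid => {
      const s = studentById[sid];
      return s !== undefined && s.courses.indexOf(c.id) !== -1;
    }));
  return forward && backward;
}

const alice: Student = { id: "s1", courses: ["c1"] };
const algebra: Course = { id: "c1", students: ["s1"] };
const consistentPair = isConsistent([alice], [algebra]);

// Simulate the mistake: the student drops the course, but the course is not updated.
const aliceAfterDrop: Student = { ...alice, courses: [] };
const driftedPair = isConsistent([aliceAfterDrop], [algebra]);
```

Here consistentPair is true and driftedPair is false. A check like this can catch the drift in a test or at runtime, but it only detects the problem after the fact. It does not remove the coupling.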
We have one more option to work through. Let’s take a look.
Reifying the student↔course relationship
The last option is to encode the two-way relationship by making a new type that references both:
type Student = {
  id: StudentID;
  name: string;
  ...
};
type Course = {
  id: CourseID;
  name: string;
  ...
};
type Registry = {
  byStudent: { [studentId: string]: CourseID[] };
  byCourse: { [courseId: string]: StudentID[] };
};
Let’s see the implementations of the operations.
// Both indexes live in one value, so one call updates both directions.
function register(registry, studentID, courseID) { //=> Registry
  const r2 = updateIn(registry, ['byStudent', studentID],
                      arrayAddUnique, courseID);
  return updateIn(r2, ['byCourse', courseID],
                  arrayAddUnique, studentID);
}
function unregister(registry, studentID, courseID) { //=> Registry
  const r2 = updateIn(registry, ['byStudent', studentID],
                      arrayRemove, courseID);
  return updateIn(r2, ['byCourse', courseID],
                  arrayRemove, studentID);
}
This encoding combines the two sides of the relationship into one place. That makes it easy to update them in one operation and one storage (not shown).
This encoding also has a cool advantage: You don’t need to have the student or the course, just their identifiers. But there is another advantage that is more subtle. We’ll see it shortly.
function generateSchedule(registry, studentID) { //=> Schedule
return { student: studentID,
courses: registry.byStudent[studentID] || [] };
}
function generateRoster(registry, courseID) { //=> Roster
return { course: courseID,
students: registry.byCourse[courseID] || [] };
}
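The listings above lean on the helpers updateIn, arrayAddUnique, and arrayRemove, which are not defined in this chapter. Here is a self-contained sketch of the same Registry with hypothetical versions of those helpers written out, assuming identifiers are strings. It spreads the objects directly instead of calling updateIn, so treat it as one possible reading, not the definitive implementation.

```typescript
type StudentID = string;
type CourseID = string;
type Registry = {
  byStudent: { [studentId: string]: CourseID[] };
  byCourse: { [courseId: string]: StudentID[] };
};

// Hypothetical helpers: each returns a new array and leaves its input alone.
function arrayAddUnique<T>(items: T[] | undefined, item: T): T[] {
  const list = items || [];
  return list.indexOf(item) === -1 ? list.concat([item]) : list;
}
function arrayRemove<T>(items: T[] | undefined, item: T): T[] {
  return (items || []).filter(x => x !== item);
}

function register(registry: Registry, studentID: StudentID, courseID: CourseID): Registry {
  return {
    byStudent: { ...registry.byStudent,
                 [studentID]: arrayAddUnique(registry.byStudent[studentID], courseID) },
    byCourse: { ...registry.byCourse,
                [courseID]: arrayAddUnique(registry.byCourse[courseID], studentID) },
  };
}

function unregister(registry: Registry, studentID: StudentID, courseID: CourseID): Registry {
  return {
    byStudent: { ...registry.byStudent,
                 [studentID]: arrayRemove(registry.byStudent[studentID], courseID) },
    byCourse: { ...registry.byCourse,
                [courseID]: arrayRemove(registry.byCourse[courseID], studentID) },
  };
}

const emptyRegistry: Registry = { byStudent: {}, byCourse: {} };
const twoEnrolled = register(register(emptyRegistry, "s1", "c1"), "s2", "c1");
const rosterC1 = twoEnrolled.byCourse["c1"] || [];
const scheduleS1 = twoEnrolled.byStudent["s1"] || [];
const afterDrop = unregister(twoEnrolled, "s1", "c1");
```

After the two registrations, rosterC1 holds both students and scheduleS1 holds the one course. After the drop, course c1 lists only s2, and emptyRegistry is untouched throughout.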
Generalizing a many-to-many relationship
The student/course relationship is an example of a many-to-many relationship. Many-to-many relationships are super common. It would be a shame to have to implement the same thing every time.
Of course we don’t have to! Let’s encode the general many-to-many relationship so that we can reuse it. Previously, we used strings, objects, and arrays. This time, we’ll use custom types, maps, and sets. The principle is the same.
type ManyToMany<A, B> = {
  indexByA: Map<A, Set<B>>;
  indexByB: Map<B, Set<A>>;
};
Now for the operations.
function relate(relationship, a, b) { //=> ManyToMany<A, B>
  const r2 = updateIn(relationship, ['indexByA', a],
                      setAdd, b);
  return updateIn(r2, ['indexByB', b],
                  setAdd, a);
}
function unrelate(relationship, a, b) { //=> ManyToMany<A, B>
  const r2 = updateIn(relationship, ['indexByA', a],
                      setRemove, b);
  return updateIn(r2, ['indexByB', b],
                  setRemove, a);
}
// The indexes are Maps, so we look entries up with get().
function reportA(relationship, a) { //=> Set<B>
  return relationship.indexByA.get(a) || new Set();
}
function reportB(relationship, b) { //=> Set<A>
  return relationship.indexByB.get(b) || new Set();
}
function contains(relationship, a, b) { //=> boolean
  return reportA(relationship, a).has(b);
}
If your model has lots of many-to-many relationships, it may be worth it to generalize like this.
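As a usage sketch, here is a typed, self-contained version of the generic relationship applied back to students and courses. The helpers emptyRelationship and addToIndex are hypothetical stand-ins for updateIn and setAdd, which are not defined in this chapter, and the identifiers are assumed to be strings.

```typescript
type ManyToMany<A, B> = {
  indexByA: Map<A, Set<B>>;
  indexByB: Map<B, Set<A>>;
};

function emptyRelationship<A, B>(): ManyToMany<A, B> {
  return { indexByA: new Map<A, Set<B>>(), indexByB: new Map<B, Set<A>>() };
}

// Hypothetical helper: copy the index and add `value` to the set stored at `key`.
function addToIndex<K, V>(index: Map<K, Set<V>>, key: K, value: V): Map<K, Set<V>> {
  const next = new Map<K, Set<V>>(index);
  const values = new Set<V>(index.get(key) || []);
  values.add(value);
  next.set(key, values);
  return next;
}

function relate<A, B>(rel: ManyToMany<A, B>, a: A, b: B): ManyToMany<A, B> {
  return {
    indexByA: addToIndex(rel.indexByA, a, b),
    indexByB: addToIndex(rel.indexByB, b, a),
  };
}

function reportA<A, B>(rel: ManyToMany<A, B>, a: A): Set<B> {
  return rel.indexByA.get(a) || new Set<B>();
}

function reportB<A, B>(rel: ManyToMany<A, B>, b: B): Set<A> {
  return rel.indexByB.get(b) || new Set<A>();
}

// The student/course registry is now one instance of the general type.
type StudentID = string;
type CourseID = string;
const noRegistrations = emptyRelationship<StudentID, CourseID>();
const enrolled = relate(relate(noRegistrations, "s1", "c1"), "s2", "c1");

const rosterSize = reportB(enrolled, "c1").size;
const s1TakesC1 = reportA(enrolled, "s1").has("c1");
```

Here rosterSize is 2 and s1TakesC1 is true. Any other many-to-many relationship in the model could reuse the same type by choosing different A and B.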
Q&A
These data structures are supposed to encode real world objects that have relationships. Shouldn’t the relationship be part of the data structure?
Well, that’s an interesting question. When teaching object-oriented design, it is common to instruct students to model the real-world objects by encoding each kind of object as a class. However, are we really trying to encode real-world objects? In the Domain Lens chapter, we explore the question of what it is we are actually modeling.
But we can address it briefly here. Not everything in our encoding has to correspond to a part of a real-world object. Does a student walk around with a list of courses they’re in? Do they remember the course identifiers? Have they memorized their student id? Can you send them a “register for course” message? No.
From a different angle, imagine how a university would have kept track of students in 1900. They would probably record the student/course registrations in a book. A secretary would write the student’s identifier and the course identifier, following certain rules. Those rules likely maintain the many-to-many relationship. That book is called a registry.
So our Registry type is encoding that real-world object—the book a university would use—which didn’t appear in the problem description at all. Yet the registry book is a plausible idea. And I made it up! What’s going on?
We’re not encoding real-world objects. We’re encoding our concepts of them, which we’ve been calling a model. Particularly, we want the concepts that further the work of the software. Coming up with those good concepts is called abstraction.
We shouldn’t be too wedded to our initial, naive concepts, which is what happens when we take a look at the problem description and immediately encode the stuff in it. This whole book is my attempt to open your mind to the mental space between domain and encoding.
Theory corner: Isomorphisms
The data lens concepts have deep mathematical meanings. Knowing the mathematical notions can enhance our appreciation for the profundity of the work we do.
The domain modeling process we’ve been following is about constructing isomorphisms between the domain and the software. An isomorphism is when two different representations have the same structure so that you can operate on them interchangeably.
Figure: the domain is above the wavy line and the software is below it.
Domain (path 1): the customer orders this (a super coffee), then "set size to mega", then the customer changes the order to this (a mega coffee).
Software (path 2): the first JSON below, then setSize(coffee, "mega"), then the second JSON below.
{
  size : "super",
  roast : "burnt",
  addIns: ["soy"]
}
{
  size : "mega",
  roast : "burnt",
  addIns: ["soy"]
}
This diagram illustrates an isomorphism between a desired coffee and the JSON that encodes it. You can follow the arrows through two paths and get the same result.
Path 1: We start with the super coffee in the top left. This is in the domain. When the customer asks to change the size to mega, the barista could mentally change it to a mega coffee, still within the domain (top arrow).
Path 2: Starting from the super coffee they are ordering, we encode it into JSON in our software. Using the software, the barista triggers a setSize() operation, which produces a different JSON. The barista can then read that JSON or some visual representation of it (decode it) to understand what coffee the customer wants.
In an isomorphism, operating on either side of the domain/software wavy line will give the same results, and we can always convert between them. This is what allows our software to do its job. And it’s why it’s important to look to the domain for answers to design questions.
Enforcing structure
It is easy to accidentally violate the isomorphism between the domain and the encoding. That is why we try to enlist as much help as we can from the language, tools, and programming practices we have available. We usually use a combination of the following:
Types
Types are the strictest way to enforce structure. Whatever you can encode in the types will be checked by the compiler. However, not all structure can be encoded in all type systems. And type systems also influence the way we model our domain. We will also consider how volatility in our domain affects our use of types in the volatility lens chapter.
Language features
All languages come with some built-in features (like classes, functions, etc.) that have their own semantics. As we mature as modelers, we move past basic recipes by deeply understanding feature semantics and choosing features that have similar semantics to what we are trying to encode. See the Data Lens Supplement for examples.
Automated tests
Tests can enforce the structure of our encodings. Instead of testing just the behavior of functions, consider testing other properties too. For instance, we can test that two functions are inverses. We’ll explore properties more in the operation and composition lens chapters.
Data structures
Data structures enforce their own structure. If we can find a data structure that has the same structure as our model, we can use it to encode it.
Runtime checks
We developed normalization and validation functions that can be used to check and enforce structure at runtime.
Your brain and discipline
Using discipline is by far the most error-prone approach.
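To illustrate the automated tests item, here is a minimal sketch of two properties checked against a registry like the one from earlier in the chapter. The helper names (addUnique, without, pairsByStudent, pairsByCourse) and the two property functions are hypothetical, and the registry is rewritten in a compact form so the block stands alone.

```typescript
type Registry = {
  byStudent: { [studentId: string]: string[] };
  byCourse: { [courseId: string]: string[] };
};

function addUnique(items: string[] | undefined, item: string): string[] {
  const list = items || [];
  return list.indexOf(item) === -1 ? list.concat([item]) : list;
}
function without(items: string[] | undefined, item: string): string[] {
  return (items || []).filter(x => x !== item);
}
function register(r: Registry, s: string, c: string): Registry {
  return {
    byStudent: { ...r.byStudent, [s]: addUnique(r.byStudent[s], c) },
    byCourse: { ...r.byCourse, [c]: addUnique(r.byCourse[c], s) },
  };
}
function unregister(r: Registry, s: string, c: string): Registry {
  return {
    byStudent: { ...r.byStudent, [s]: without(r.byStudent[s], c) },
    byCourse: { ...r.byCourse, [c]: without(r.byCourse[c], s) },
  };
}

// The pairs recorded in each index, as sorted "student/course" strings.
function pairsByStudent(r: Registry): string[] {
  const out: string[] = [];
  Object.keys(r.byStudent).forEach(s =>
    r.byStudent[s].forEach(c => out.push(s + "/" + c)));
  return out.sort();
}
function pairsByCourse(r: Registry): string[] {
  const out: string[] = [];
  Object.keys(r.byCourse).forEach(c =>
    r.byCourse[c].forEach(s => out.push(s + "/" + c)));
  return out.sort();
}

// Property 1: both indexes describe the same set of pairs.
function indexesAgree(r: Registry): boolean {
  return pairsByStudent(r).join(",") === pairsByCourse(r).join(",");
}

// Property 2: for a pair that is not registered yet, unregister undoes register.
function unregisterUndoesRegister(r: Registry, s: string, c: string): boolean {
  const roundTrip = unregister(register(r, s, c), s, c);
  return pairsByStudent(r).join(",") === pairsByStudent(roundTrip).join(",");
}

const sample = register(register({ byStudent: {}, byCourse: {} }, "s1", "c1"), "s2", "c1");
const propertiesHold =
  indexesAgree(sample) &&
  indexesAgree(unregister(sample, "s1", "c1")) &&
  unregisterUndoesRegister(sample, "s1", "c2") &&
  unregisterUndoesRegister(sample, "s3", "c1");

// A hand-built registry that breaks the structure is caught by property 1.
const brokenDetected = !indexesAgree({ byStudent: { s1: ["c1"] }, byCourse: {} });
```

Note that property 2 compares the recorded pairs and not the raw objects, because a round trip can leave behind an empty array for a student. A property-based testing library could generate many registries and pairs instead of the few fixed inputs used here.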
Conclusion
In this chapter, we’ve seen how to encode the structure of relationships, including reifying relationships into their own data structure. We’ve also discussed the technical term isomorphism. And we’ve seen the common language features available to enforce the structures we’ve been uncovering.
Summary
Up next . . .
Now that we’re good at data modeling, we can forget it for a moment and focus on operations. The secret is that operations are a more powerful way to think about a model, but we couldn’t learn the operations without being good at data modeling. Modeling the operations is the topic of the next chapter. But don’t miss the Data Lens Supplement, which describes relationships commonly found in domains and various options for encoding them.