Focus on composition first

Where should we start when we are designing our data structures, especially the data that we expect to last a long time? The answer is in the composition operations.


Eric Normand: What is the type of operation that you should focus on first when you're designing an interface? I'm talking before you start doing anything else — including thinking about the names, or thinking about what kind of data you need to store — what is the thing that you should be focused on first?

Hi, my name is Eric Normand. These are my thoughts on functional programming.

We've got these operations in our interface. We've got some entity. We want to define the meaning of this entity. By meaning, I simply mean what can you do with it? What information can you get out of it? Those are your accessors, if you want to give that a name. How can I change this thing, immutably of course, by returning a changed or modified copy?

There are the operations for how to modify it, and for how I make one, the constructors. But then, how do I compose it, right? To me, that is the hardest part of it. How do I compose?
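To make those four kinds of operations concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the Person fields, the function names, and especially the placeholder policy inside combine, which is exactly the part that deserves the most design attention.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical entity; the fields are illustrative only.
    first_name: str
    last_name: str
    email: Optional[str] = None


# Constructor: how do I make one?
def make_person(first_name: str, last_name: str) -> Person:
    return Person(first_name=first_name, last_name=last_name)


# Accessor: what information can I get out of it?
def full_name(person: Person) -> str:
    return f"{person.first_name} {person.last_name}"


# Modifier: returns a changed copy; the original is left untouched.
def change_first_name(person: Person, first_name: str) -> Person:
    return replace(person, first_name=first_name)


# Composition: takes two values and returns a third.
# Placeholder policy (an assumption): prefer the newer copy,
# but keep the older email if the newer copy has none.
def combine(older: Person, newer: Person) -> Person:
    email = newer.email if newer.email is not None else older.email
    return replace(newer, email=email)
```

The first three are easy to write once the fields are chosen. The last one forces a decision about what "combining" even means, which is why it is worth looking at first.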

Let's think about an abstraction that has existed for a long time, well before computers existed. It's the abstraction of double-entry bookkeeping. It's a system of accounting that everyone uses now. It was invented in the 14th century. You can credit it with the rise of the modern financial system, because it was that important.

It's a system that records data. It's a data system, an information system, that records data about transactions. It has this interesting property to it, which is that it tells you how to add a new transaction.

A naive approach to accounting would be that when someone deposits $10, you erase the old amount in their account, you add $10 to it, and you write that down. In double-entry bookkeeping, every account has a book, or a page in the book.

What you do is you just write down all the information about that transaction. When you boil it down, which they've done over the years, you have an amount. You have the account that is being debited. You have an account that is being credited. There's three pieces of information, and of course, you want the date and stuff like that.

There are three pieces of information that constitute that transaction. The interesting thing to me is, that before you even think about what pieces of data are in that transaction, you have the idea of how you combine transactions.

You have this book, which is a collection of transactions. You have a new transaction. You want to combine it onto the book. What do you do? You always append it to the end. You never modify anything that's already in there. You just append a new transaction on a new line in the book.

This is the discipline of accounting. Even if there's a mistake, you write a note and you append a new transaction to correct that mistake. You never go back and change those old things because you want a record of the mistake, too.

What's interesting about this, from a functional programming perspective, is that it actually starts with this combine function, this compose function. When you add a transaction to the ledger, the book, you are doing it in a disciplined way. It's append-only.
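Here is a rough sketch of that append-only discipline in Python. The names Transaction, append, and correct are my own for illustration, and correcting a mistake by appending a reversing entry is one assumed way to do it, not the only one.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Transaction:
    amount: int   # in cents, to avoid floating-point rounding
    debit: str    # account being debited
    credit: str   # account being credited
    note: str = ""


# The ledger is an ordered, immutable sequence of transactions.
Ledger = Tuple[Transaction, ...]


def append(ledger: Ledger, tx: Transaction) -> Ledger:
    """Combine a ledger and a transaction: always add at the end."""
    return ledger + (tx,)


def correct(ledger: Ledger, mistake: Transaction, note: str) -> Ledger:
    """Fix a mistake by appending a reversing entry, never by editing."""
    reversal = Transaction(
        mistake.amount, debit=mistake.credit, credit=mistake.debit, note=note
    )
    return append(ledger, reversal)


book: Ledger = ()
book = append(book, Transaction(1000, debit="cash", credit="alice"))
wrong = Transaction(500, debit="cash", credit="bob")
book = append(book, wrong)
book = correct(book, wrong, note="entered by mistake")
```

After these calls the book has three lines. The mistaken entry is still there on line two, and the correction sits after it.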

We need to do that when we are designing our data. We need to think: how do I combine this? How do I add new data to my system? I'm not saying you have to do append-only. I'm saying you should start with the combinations of data. You should start with the functions, the operations, the calculations, if you will, that take two pieces of data and combine them into a third piece of data.

These are the hardest to design and they're the ones that are going to constrain the representation of that data the most. Notice in a ledger, in an accounting book, it's append-only which implies an order. It implies a place to add new stuff at the end. You're already constrained in what kind of data structure you can use for your ledger.

You're not so constrained in how you represent each transaction. You'd probably use a map with a debit, a credit, and an amount, and probably the date. Maybe some metadata on it, like who recorded this transaction, stuff like that. In principle, if it's a map you can always add that stuff. Some systems might be interested in it and some might not.
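As a small sketch of that difference, assuming Python dicts for the map and a list for the ledger (make_transaction and the metadata keys are hypothetical names for illustration):

```python
from typing import Any, Dict, List

# A transaction is an open map: three required keys plus whatever
# metadata a particular system cares about.
Transaction = Dict[str, Any]


def make_transaction(amount: int, debit: str, credit: str, **metadata: Any) -> Transaction:
    return {"amount": amount, "debit": debit, "credit": credit, **metadata}


def append(ledger: List[Transaction], tx: Transaction) -> List[Transaction]:
    # The ledger has to be ordered, so a list fits. Returns a new list.
    return ledger + [tx]


basic = make_transaction(1000, "cash", "alice")
detailed = make_transaction(
    1000, "cash", "alice", date="2024-01-15", recorded_by="eric"
)
ledger = append(append([], basic), detailed)
```

Both transactions fit in the same ledger. The append operation does not care which extra keys a transaction carries, only that there is an end to add to.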

It's less constrained. It's obviously less constrained. What is constraining is the operation of adding. It really helps you design, because the constraint will help you navigate all the possible options for how you design this data. It'll eliminate so many that don't make sense. It wouldn't make sense for your ledger to be a map. Not just a map.

How do you add stuff? How do you append to a map? It doesn't make sense because there's no order. My suggestion, when you're building a system using functional programming and you're designing your data first and then asking, "What are our operations on this data?", is to start with a composition operation.

You're designing your data, but to design your data you need to think about something even before that. As a subset of that, the first thing you design is your composition. List out all the operations and look at the ones that are composing things.

Last time I talked about something like a person entity and how you might have a change first name operation. If you're doing something like a transaction ledger, you never change anything. You never return a ledger with a modified transaction.

You might eventually get there with your person entity. The person entity is not just a key-value store, a map of data values. It might be a ledger. It might be a list of all the operations you've performed on it. First, set the name, set the address, set whatever you have to do for this person.

Then if you need to set the name again, you just append another set name operation to it. That might happen, OK? The more functional you make your mutable data, the more it becomes something like that. At least for the main entities in your domain, in your business.
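A minimal sketch of a person entity as a list of operations, in Python. The names PersonLog, set_field, and current_state are assumptions for illustration, and replaying the log so that the latest value wins is one simple policy among several.

```python
from typing import Dict, Tuple

# One operation is a (field, value) pair, e.g. ("name", "Eric").
Operation = Tuple[str, str]
PersonLog = Tuple[Operation, ...]


def set_field(log: PersonLog, field: str, value: str) -> PersonLog:
    """Record a change by appending an operation; nothing is overwritten."""
    return log + ((field, value),)


def current_state(log: PersonLog) -> Dict[str, str]:
    """Replay the operations in order; later values win."""
    state: Dict[str, str] = {}
    for field, value in log:
        state[field] = value
    return state


log: PersonLog = ()
log = set_field(log, "name", "Eric")
log = set_field(log, "address", "123 Main St")
log = set_field(log, "name", "Eric Normand")
```

The log keeps all three operations, including the first name that was later replaced, while current_state(log) gives the familiar key-value view.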

That's not what I'm really talking about right now. What I'm talking about is the combining forms. What I was trying to say was that even a change name operation may start to look more like a combining form. You're combining an existing person entity with a first name. You're getting a new person entity out of it. You can think of them like that if you wish.

What is a good example of a combining form? It's not so obvious with a person entity. You never really combine two entities, not really, two person entities at least. Actually, that's not true. If you're working in a distributed system, you probably are. You're on the web.

One system receives a message that says, "Change the first name." Another system receives a message that says, "Change the last name." You want to reconcile these two changes. You've got two person entities that have the same ID but they have different data. How do you combine them? You might have to combine these two.

I have a perfect example. I have a contact database that I've been maintaining for years. It's just the people I know and their phone numbers in there, email addresses and stuff. Sometimes I update one on my phone because I've got my phone. Sometimes I'll update one on my computer. When they sync, there's usually a problem.

The problem is which one is newer. The computer can't figure it out. I believe that the system is just designed improperly, that they should be able to figure out which operations to keep and what has been overwritten.

What they're basically doing is combining two records for the same person and determining what is the most up-to-date version of all the information. That doesn't seem like such a hard problem. I think what they did is they never thought about the composition until it was a problem. Think about the composition first.

You know you're in a distributed system. You're going to have to combine two. Your syncing is a first-class operation. You should make an operation, maybe it's called sync, that takes two copies of the same entity and determines what it should look like, now knowing that these two things were recorded.

You should start with that. If you start with that, you start to think, "Well, I want some kind of thing that says I want to keep the most recent version." Say you modified the phone number on your phone, and you modified the email address on your desktop. When you combine them, I want to know that that email address is newer.

The one from the desktop is newer than the one on the phone. I need to replace the one on the phone. If you need to know newer, you need a date, timestamp when it was written. You need to keep that. That has to be part of your data. It becomes clear that I need that. It's part of my data to make this thing work.
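Here is one way that sync operation might look, sketched in Python. It assumes every field stores its value together with the timestamp of when it was written, that the two devices' clocks are comparable, and that the most recent write should win. The name sync comes from the discussion above; the types and sample data are made up for illustration.

```python
from typing import Dict, Tuple

# Each field stores (value, written_at); written_at is a timestamp.
Field = Tuple[str, float]
Contact = Dict[str, Field]


def sync(a: Contact, b: Contact) -> Contact:
    """Combine two copies of the same contact, keeping the newest write
    for each field. On an exact timestamp tie, the value from `a` is kept."""
    merged = dict(a)
    for key, (value, written_at) in b.items():
        if key not in merged or written_at > merged[key][1]:
            merged[key] = (value, written_at)
    return merged


# The phone number was edited on the phone (t=200),
# the email address was edited on the desktop (t=300).
phone = {"phone": ("555-0100", 200), "email": ("old@example.com", 100)}
desktop = {"phone": ("555-0000", 100), "email": ("new@example.com", 300)}

merged = sync(phone, desktop)
```

The merged contact keeps the phone number from the phone and the email address from the desktop. Without the timestamps in the data, sync would have nothing to compare.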

Whereas if you do it what I guess I'll call the naive way, you think syncing can wait. You've designed your data to just be a key-value store: first name, last name, phone number, email address. That works for most operations. I'm not going to lie. That's the truth. It will work.

When you come to this operation which you've been putting off, the one we just called sync, it's going to be truly hard if you don't have a timestamp on every piece of information in here. You need to go back and redesign your data structure so that the records can be combined. That's what you get to do. That's why I suggest starting with it first. Start with the composition.

Everything else is trivial once you have that, like modifying. Changing a name, that's easy. Changing an email address, that's easy. What do I need to capture? It's in my data model. Combining forms are the hardest, the most complicated. They, therefore, constrain your problem the most. They make the design of the data structures really easy.

All right. If you have enjoyed this, please share with your friends. Subscribe. You can follow me on Twitter. I always love to hear comments, questions, suggestions, complaints, compliments. On Twitter I'm @ericnormand with a D.

Happy to receive your email. I promise I will respond. Thank you so much. I'll see you next time.