Build an interface around data

Clojure programmers often complain about data structures getting unwieldy and hard to understand. How can we prevent this?

Transcript

Eric Normand: In Clojure and in other languages we often have this problem of unwieldy data structures. They're deeply nested. They've got tons of keys. It's just a big mess. It's hard to understand what's in them. What is the solution? What do we do about this? Hi, my name is Eric Normand. These are my thoughts on functional programming.

I've been doing Clojure programming since 2008. I first started doing Common Lisp. Then I moved to Clojure. I loved the data orientation in Clojure. The fact that there were so many different data types, data structures in it. Compared to Common Lisp it was great.

I already had this idea of data orientation. I found that in Clojure it was much easier to do. It was much stronger. I would say that Clojure is a more functional language than Common Lisp.

Since the beginning, people have been complaining about having these data structures where they...especially, working on a team. You know it's a map. You don't know what's in it. You forget what keys you should be using to get stuff out of the map.

People have typos in their work. They type the wrong keywords all the time. They are getting nils back. They're not sure where the nils come from. Then they eventually find that they forgot an S somewhere in their keyword. That was why it was breaking.
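As a small illustration of the typo problem just described, here is a sketch using a made-up person map. The keys and values are assumptions for the example, not anything from a real codebase.

```clojure
;; Hypothetical person map, for illustration only.
(def person {:first-name "Ada"
             :emails     ["ada@example.com"]})

;; Correct keyword: returns the vector of emails.
(:emails person)

;; Typo (the S is missing): no error is thrown, you just get nil.
(:email person)

;; The nil flows onward quietly. count treats nil as an empty
;; collection, so this returns 0 instead of failing at the typo.
(count (:email person))
```

Nothing points back at the misspelled keyword, which is why these bugs take a while to track down.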

I hear these complaints all the time. I've worked at companies where we had this problem. I believe that this is one of the big reasons why Spec was created. People want some kind of checks that their data structures make sense. They want documentation for what's supposed to be in the maps that they're passing around.

It's definitely a problem since people are complaining about it. I've never had this problem, not seriously. Of course I've made typos and stuff. I never had this problem where the data structures feel like they're out of control. I've been thinking about why it is, and looking at people's code who are complaining to see what's going on here.

It comes down to one thing, and that is that people are not designing their data. They're not thinking about what things should go together, and what makes sense to put together into maps. They're just cramming stuff in there.

This is a problem. I've heard that in Ruby people get into this, where they have these classes that have hundreds of methods and all this state, because it seems to make sense. Everyone talks about the user class getting all these methods. It's a code smell called the God Class or something. It just has everything. It can do everything.

What you should be doing is starting to pull those things out and having smaller classes. That's what happens with Clojure maps. You just think, "Oh, I'll just throw this in a map, you know. I'll pull it out in another place."

What happens is you couple your code all over the place, because this thing is producing a map, this part of the code. Part A is producing this map with a certain structure. To get the values out of the map, this other part of the code has to pull the values out with that known structure. You're duplicating the structure in two places.
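A minimal sketch of that coupling, with hypothetical function names and a made-up map shape: the producer and the consumer each carry their own copy of the structure.

```clojure
;; Part A: produces a map with a particular nested shape.
;; (load-person is a hypothetical name for this example.)
(defn load-person []
  {:name {:first "Ada" :last "Lovelace"}})

;; Part B: somewhere else in the code base, this function repeats
;; the knowledge that the first name lives at [:name :first].
(defn greeting [person]
  (str "Hello, " (get-in person [:name :first])))

(greeting (load-person)) ; "Hello, Ada"
```

If Part A ever changes the shape, Part B keeps compiling and starts returning "Hello, " with nothing after it.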

My first solution, the first cut, is to just build an interface. Build a smaller interface instead of saying the data is its own interface. That's true. It's still true. The data is still its own interface. If you wanted to iterate through the keys and do all that stuff, you can.

As an entity, you should be thinking about all the things that you can do with this. You should be thinking about the meaning of this map. By meaning, I mean all the things you can do with it.

How do you make one? How do you access the values in it? How do you modify the values? Usually, of course, you create a modified copy. Then, also importantly, how do these things combine with other things? How do they compose?

We don't just throw them into a map. We might throw them into a map. We don't throw it into a map willy-nilly. We define how they combine with other things. By define, I mean, you write a function that does it. You write some code that is called change name. If you have a person and you want to change the person's name, it takes a person and the new name.
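Here is one way the interface described above might look for a person entity. It is a sketch under assumptions: the function names and the map layout are invented for the example, and change-name is taken to mean changing the first name.

```clojure
;; Constructor: the one place that knows how a person is built.
(defn make-person [first-name last-name]
  {:name {:first first-name :last last-name}})

;; Accessors: callers ask for what they want, not where it is stored.
(defn first-name [person] (get-in person [:name :first]))
(defn last-name  [person] (get-in person [:name :last]))

;; Modifier: takes a person and the new name, returns a modified copy.
(defn change-name [person new-first-name]
  (assoc-in person [:name :first] new-first-name))

(def ada (make-person "Ada" "Lovelace"))

(first-name (change-name ada "Augusta")) ; "Augusta"
(first-name ada)                         ; still "Ada", the original is untouched
```

The assoc-in lives in exactly one function, so the operation has a name and a single home.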

That's getting very granular. If you're treating it like a person, meaning, a person entity, not an actual person. You're treating it as a person entity. Person is an important concept in your domain. That should be a first-class operation. I would hate to see some assoc-in somewhere else that is digging down into the person and changing the first name, because that's not first class anymore.

You need to respect that operation. It's a thing that you are saying is something that we need to be able to do to this person. You're letting chance and discipline dictate whether you can do that, because you're just treating it like a map. You're not treating it like a person entity anymore.

Like I said, it's still a map. You're still passing a map around. When you want to treat it like a person entity, you go through its interface.

It's going to make everything easier. If you ever have to change the structure of this map, you're going through an interface. You should treat it like a map when you're just treating it like data, like random data. Like, "I don't know what it is. It might be a person, it might not."
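To make the point about changing structure concrete, suppose a person used to be stored as a nested map like {:name {:first "Ada"}} and is now stored flat. In this sketch (all names are hypothetical) only the interface functions know about the new layout, and the caller does not change.

```clojure
;; Interface functions, rewritten for a flat layout.
(defn make-person [first-name last-name]
  {:first-name first-name :last-name last-name})

(defn first-name [person]
  (:first-name person))

(defn change-name [person new-first-name]
  (assoc person :first-name new-first-name))

;; Caller code: written against the interface, so it reads the same
;; whether the map underneath is nested or flat.
(defn greeting [person]
  (str "Hello, " (first-name person)))

(greeting (change-name (make-person "Ada" "Lovelace") "Augusta"))
;; "Hello, Augusta"
```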

When would you treat it like a map? You could have a generic function that printed a table, an HTML table, to display on the web page, of every key and value in your map. It's generic. It works on any map. It's just for debugging. That is when you start treating it like a map.
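A generic debugging function like the one described could be sketched as follows. The name map->html-table is made up, and the sketch does no HTML escaping, so it is only suitable for debugging output.

```clojure
(require '[clojure.string :as string])

;; Works on any map: it knows nothing about people or any other entity.
;; No HTML escaping is done here; this is a debugging sketch only.
(defn map->html-table [m]
  (str "<table>"
       (string/join
        (for [[k v] m]
          (str "<tr><td>" (pr-str k) "</td><td>" (pr-str v) "</td></tr>")))
       "</table>"))

(map->html-table {:a 1})
;; "<table><tr><td>:a</td><td>1</td></tr></table>"
```

This is the kind of code where treating the value as a plain map is the right call.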

I like keeping stuff as a map. Don't create a record. Don't create a new class or anything like that, because it is still data. At some point you're just going to serialize it to JSON, or EDN, or whatever format.
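Because the entity is still a plain map, serializing it takes no extra work. A small sketch of an EDN round trip, using a made-up person value:

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical person value, still just a map.
(def person {:name {:first "Ada" :last "Lovelace"}})

;; Serialize to an EDN string with pr-str, then read it back.
(def serialized (pr-str person))
(def restored   (edn/read-string serialized))

(= person restored) ; true
```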

You're going to send it to a database. At certain points you know that this is a person entity. I want to treat it as such. You need a place to collect all the collected wisdom of your system. The assumptions and the constraints that you set as a developer, as the company, you're saying, "A person's name might be changed so we're going to have that operation."

Their address...they may move or we want to...I'm having trouble thinking of an example. There's all sorts of operations that your company says is important to the domain that they deal with. You want to make an explicit place where those things go. That place is called the interface.

You should be skeptical if you see code that treats a person entity — knows it's a person entity — and it treats it like a map. Because map operations are generic. It's coupling too much with the actual structure of the entity itself.

A good example of this is, I have a friend who was doing some work with...it was Quiescent. It's like Om or Reagent, where you're generating HTML, and then in an onClick handler you need to say, "How do you modify the data?"

One big regret that he has about his software is that they spread the...they did update-ins on the atoms all over the place. Everywhere where there's a handler that needs to modify the database, they just changed something right in the database using an update-in.

They've got a path to a specific point in that map that's deeply nested. They change it using update-in. Whenever they need to change the structure of their database, they have to go to every single one and modify it.

What he wishes they had done is, every time they want to modify the database, they would make a function that did that modification. Of course, before you write that function, you check, do we already have a function that does the same thing?

You would just reuse that existing function. If you need to change the structure of the database you have a much more limited set of things that need to change. Over time you might even start to refactor those functions because they look very similar.

You can factor out the common bits, too. You have a smaller set of functions. Sooner or later you have a very tight interface into that database. They didn't do that. They just have this big regret that it's too much work to take on right now. Every time they have to make a change, it's a lot of work.
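The difference between the two styles might look like this. It is a sketch, not my friend's actual code: the db atom, the cart path, and the function names are all assumptions made for the example.

```clojure
;; Hypothetical app database held in an atom.
(def db (atom {:ui {:cart {:items []}}}))

;; Scattered style: every handler repeats the full path, e.g.
;;   (swap! db update-in [:ui :cart :items] conj item)

;; Interface style: one pure function names the operation and is
;; the only place that knows the path.
(defn add-to-cart [state item]
  (update-in state [:ui :cart :items] conj item))

;; Handlers only call the named operation.
(defn on-add-click [item]
  (swap! db add-to-cart item))

(on-add-click {:sku "a1"})
(get-in @db [:ui :cart :items]) ; [{:sku "a1"}]
```

If the cart moves somewhere else in the database, add-to-cart changes and the handlers stay as they are.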

That's the thing I'm talking about. You just need a place that defines what it means to modify this thing. When we are learning functional programming, and programming functionally, and talking about functional programming, we often talk about data transformation pipelines.

It's a very common pattern. In fact I need to make a note, to make an episode about this. It's a very good pattern. It's not the end-all be-all, because we've been focusing a lot on operations that access or modify entities.

Where the magic really starts to happen is when you define how to combine or compose — they're the same — two entities together. That's what I'm going to talk about next time.

If you want to get in touch, I would really love to hear your comments, your questions, your complaints, your compliments. I'm @ericnormand on Twitter. You can also reach me by email. Subscribe, like, share it with your friends, because sharing is caring. All right, see you later.