You are in a maze of deeply nested maps, all alike [TALK]
People complain a lot about difficulties with deeply nested maps in Clojure. I've never had that problem. I looked at other people's code to see what they were doing to get into trouble. In this talk you'll get a good idea of the wrong turns people take and you'll leave with some techniques to find your way out and to avoid getting lost in the first place.
**Eric Normand**: How is everybody doing today? I'm very happy to be here in India. Thank you to the organizers of IN/Clojure for inviting me. I've just had such a warm welcome. I know this won't be my last time in India. Thank you very much.
I'm going to talk about a problem that I hear a lot of people complain about in the Clojure world. It has a lot to do with using data. We get these deeply nested maps. We forget what keys we have, what entity we're supposed to use in certain places, and what's supposed to be in this map.
Let's talk about the symptoms of this problem. A very common complaint is, "I can't remember what keys I'm supposed to use in this map." Very common. First, show of hands. Who feels this? Oh, wow. About half of you have raised your hands.
I was talking to someone yesterday here at the conference. It just seems like they're all, "Yes, yes. How do I do that?" I'm glad I'm talking about this. I don't know what kind of map I have. What is this even? I got this passed to me in this function. What is it?
Another common problem — awkward code for manipulating these deeply nested structures. Here's an example. This is a map that represents this space voyage. You all recognize him I suppose. Yes? Rakesh Sharma.
Here's some code that will manipulate this. We basically want to capitalize all of the names of the crew members of different missions. We can analyze it just by looking at it. We're doing maps. Mapv, which means we're mapping over a vector. It's a vector of missions. Inside the crew key, there's another vector of crew members.
We're doing two updates. We need to define three anonymous functions. This is awkward code. It takes a while...This is actually well-written, well-formatted, well-organized. Everything is well-named. Still, look at the indentation. Look at how big it is and how deeply nested it is. This is the problem I see a lot.
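The slide itself is not part of this transcript, so the following is only a sketch of the kind of code being described: two `mapv` calls, two `update` calls, and three anonymous functions. The data shape, the names, and the function `capitalize-crew-names` are assumptions made for illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical data: a vector of missions, each with a vector of
;; crew members under :crew.
(def missions
  [{:name "Soyuz T-11"
    :crew [{:name "rakesh"}
           {:name "yury"}]}])

;; Capitalize every crew member's name in every mission.
(defn capitalize-crew-names [missions]
  (mapv (fn [mission]
          (update mission :crew
                  (fn [crew]
                    (mapv (fn [member]
                            (update member :name str/capitalize))
                          crew))))
        missions))

(capitalize-crew-names missions)
;; => [{:name "Soyuz T-11", :crew [{:name "Rakesh"} {:name "Yury"}]}]
```

Even in this small form the nesting is five levels deep, which is the awkwardness the talk points at.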
Then, long functions. We're told that we're supposed to have functions one to three lines. You go into your code base and there is this big monster 30-line function.
It's usually stuff like this, four times. Four things like this, trying to manipulate these data structures, and put new things together, and come up with new combinations of them. We do it all in one big function. It's a problem I see.
There's a couple of technological solutions. These are common problems. There's Spec and there's Specter. Spec is good at at least checking that you're using the right keys. It'll help you with that. It'll also help you remember them because at least you had to write them down somewhere and you can go look it up. Also, what kind of entity you have. It can help you with that.
There's the library, Specter. You're all familiar with Specter? It's not very commonly used as far as I know. It helps manipulate these deep data structures. It has this composable system for reaching deep down into a data structure. You could do stuff like capitalize every other name, something like that. Instead of having to pull it apart and then put it back together, you just do it in one step.
I don't think there's anything really for dealing with long functions. I don't know if it's really even a technological solution. I just wanted to mention these because I think that they do show that the problem is real and that people are feeling this problem.
I'm not going to go into them anymore. I don't want to deal with this on a technological level. I don't think it's a technological problem. I actually think it is a social problem or a design problem let's call it.
The underlying cause of all these symptoms is that we're working at the wrong level of meaning. We have this expression. I actually heard it a few times yesterday here. "It's just data." It's one of the advantages of Clojure. You don't have to make all these classes, other kinds of abstractions, just to start working.
You just treat the thing as data. It's just data, but I think that that is misunderstood. It doesn't mean you don't come up with new levels of meaning and neglect the data. It's data and it's got meaning. As an example, this is some data. What's the outer thing? What is this piece of data? What kind of data is this? What kind of data structure? Someone shout it out.
**Audience Member**: [indecipherable 6:04].
**Eric**: A map. Yes, it's a map, but it's also something else. It's also, this first thing is a character. It's a bunch of characters. This is what I mean by going at different levels. You can actually see that there's this code. It's a map but it's also just a sequence of characters.
Actually, you can go down. It's bytes on a disk, and to turn it into characters, you put it through a Reader to take those characters. You do a read that turns it into syntax. Then you can eval it, which turns it into Clojure semantics.
Then you have all the code that you wrote that makes it, at another level of meaning, about space travel. This is your domain code. This is your job, is this last step.
What I'm trying to say is that "it's just data" means that you can always go up and down this tree, but you should want to work up at the top.
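As a rough illustration of moving between these levels, here is a small sketch that takes one piece of text through the reader and then through `eval`. The map contents are made up for the example; `read-string` and `eval` are standard `clojure.core` functions (for untrusted input, `clojure.edn/read-string` is the safer reader).

```clojure
;; Level 1: characters.
(def text "{:mission \"Soyuz T-11\" :crew-size (+ 1 2)}")

;; Level 2: syntax. The reader produces a map, but (+ 1 2) is still
;; just a list of a symbol and two numbers.
(def form (read-string text))

;; Level 3: Clojure semantics. Evaluating the map evaluates the list.
(def value (eval form))

(:crew-size form)   ; => (+ 1 2)
(:crew-size value)  ; => 3
```

The domain level, where this map means "a space mission", is the part that the language cannot supply; that is the code the speaker says is your job.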
Let's just look at that top part. You have Clojure code at the bottom. This forms the base of your system. Then you're going to import a bunch of libraries and they'll sit on top of Clojure. Then you write your domain code on top. These are like layers of meaning that you're adding.
Clojure code is very abstract. It's general purpose. It doesn't know anything about space travel. Neither do your libraries, usually, but then your domain code is all about that.
Let me show you one of the problems, like an illustration of this problem, and how we get to here where we forget the keys. We have these gigantic maps that we don't know what belongs in them, what they mean.
Here's some reagent code. You'll notice it's got a component up at the top, so it defines an atom, a reagent/atom. It's got a component. Then another component that uses that component.
But I want to focus on this line here. This makes a button, so it has an on-click handler. Notice what it does. It swaps directly into the atom. It's just data. We're just adding something to this map that's in the atom, a favorite color. Now that's cool.
Let me graphically represent this. Let's say we add another component that lets you select your favorite animal. It also just associates right into the atom. We add another thing, and another thing, and another thing.
Pretty soon, we've got all these keywords that could be in that atom, but where are they defined? All throughout the UI. The definition of what we will find in that atom is smeared all over your code. There's no one place where these are listed. When you go to say, "Let's refactor," you don't even know where to look.
How do we know what is modifying this user-info thing? You might be able to grep for assocs to user-info, but even that won't find everything. So, just little by little, we've started with data and we never turned it into a new layer of meaning. We just added one thing after another.
Pretty soon, you've got a team of 10 people. You've got hundreds of components, all of them are adding stuff to this map and you're in a mess. You don't know what to do.
Every time you make one of those things, you're thinking up here at your domain level. You're thinking, "Oh yeah, favorite color, that's a really important concept. We've got to save that to the atom." But really you just write a little swap! with an assoc. You're writing down at the data level. You're just doing this little basic operation.
That's the mismatch. You're thinking up here and you're writing down here. You need to write where you're thinking.
Here's how we could fix this. Here's that same line of code extracted out. It's all just Clojure core libraries. How do you turn it into something at a higher layer of meaning?
Not a higher level of abstraction. It's actually less abstract than...Assoc is very abstract. It's any key-value pair. We want just favorite color. How do we turn this into a new layer of meaning? The only thing we can do — we give it a name. We give this operation a name. We define a function, and there we have it.
I see some maybe skeptical looks that maybe this is too simple, that I'm giving something too basic. It might be basic but we don't do this enough. We don't actually sit there and think about what our operation should be.
Let's look at it some more. Here's the old code. This is what it's going to look like when we change it. I suggest adding a new namespace called user-info where all those operations are going to go. We add this set-favorite-color function there. We can also add the other ones, so set-favorite-animal, set-favorite-fruit.
They all look very similar but now we have this nice check. Before, if you misspelled the keyword, nothing was going to bother you. Nothing would check that you got it wrong. If you're calling a function instead of associating a keyword, and that function doesn't exist, Clojure's going to tell you. That's another advantage to this.
Another objection you might have is that this is a lot of code. It's true. It's more code. Let's count it up. We had one line before. Now we have three. It's three times the code. That's the con. The pros. It's three times the code but you can find it in one-third of the time. You can see all of the keys right there in that one namespace.
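The slides are not reproduced here, so this is only a sketch of what the before and after might look like. A plain `atom` stands in for the `reagent/atom`, and the functions are written as pure functions passed to `swap!`; that signature is an assumption, since the talk only gives the names. In a real project they would live in their own `user-info` namespace.

```clojure
;; Stand-in for the reagent/atom that holds the user's info.
(def state (atom {}))

;; Before: the UI handler knows the key and writes at the data level.
(swap! state assoc :favorite-color "blue")

;; After: named domain operations, all defined in one place.
(defn set-favorite-color [user-info color]
  (assoc user-info :favorite-color color))

(defn set-favorite-animal [user-info animal]
  (assoc user-info :favorite-animal animal))

;; The UI handler now says what it means.
(swap! state set-favorite-color "green")
(swap! state set-favorite-animal :elephant)

@state
;; => {:favorite-color "green", :favorite-animal :elephant}
```

A misspelled keyword such as `:favourite-color` would be stored silently, whereas a misspelled function name fails to compile because the symbol cannot be resolved.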
Another thing is as you do this, as you factor things out into their own operations, you'll probably find some duplication, like you were setting the favorite color in two different places. You won't have to make a separate function for each of those.
Another thing is that the parts can evolve separately. You've got one layer of indirection where you can make changes without having to touch your UI. Before, if you wanted to change the key for favorite color, you'd have to go to the UI and then to whatever else is looking in that atom for the favorite color. Now you can just change it in one place.
Also, it's less to keep in your head. You won't have to think about, "What are all of the keywords I might encounter here?" You have these operations. They can be tested. They're well defined. It's a smaller set to keep in your head.
Let's go back to this layer of meaning diagram. As you add more and more code, you're increasing the size of your domain code. It's getting bigger and bigger. At some point, you're going to reach a limit where you're going to need some other organizing principle for this code. You can't just rely on, "Oh I'm just moving it to namespaces and stuff."
What do you need to do? You make more layers. Your code becomes different layers. Let's take a look at that. Here's a similar map to what we had before with the Soyuz T-11 mission.
Let's say Rakesh needs to do some centrifuge training. His crewmates are wishing him luck. "Good luck. It's really hard. They're going to spin you around in this big machine really fast. Good luck." He passes. It's close but he passes.
We need to set that he passed it in this map. Take a look at this assoc-in. We get a mission and in the crew key, under the research-cosmonaut position, in that person's training map, under the centrifuge training, set the status to pass.
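Read literally, that description corresponds to something like the following sketch. The exact keys and the initial `:scheduled` status are assumptions based on the spoken description, not the slide.

```clojure
;; Hypothetical mission map, nested the way the talk describes.
(def mission
  {:name "Soyuz T-11"
   :crew {:research-cosmonaut
          {:name "Rakesh Sharma"
           :training {:centrifuge {:status :scheduled}}}}})

;; One five-step path that crosses mission, person, and training record.
(def updated-mission
  (assoc-in mission
            [:crew :research-cosmonaut :training :centrifuge :status]
            :pass))

(get-in updated-mission
        [:crew :research-cosmonaut :training :centrifuge :status])
;; => :pass
```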
Who has stuff like this in their code? These really long paths. Yeah, I see a few shy hands going up. [laughs]
This is a problem. This is an indication that you have deeply nested maps. You haven't really thought about segmenting it into different things that have their own coherent meaning to them, that they form a kind of integral concept.
I would break this up. I drew some lines. The crew and the research-cosmonaut, I'm calling that...that's a part of the mission. The mission knows the crew key and that there's going to be position keywords under that. Training is part of the crew member — the person. The centrifuge and status, that is something that I'm going to call training record.
I'm breaking this up into different parts. I think two or maybe three is probably the longest path you should use, so this will work for this.
Sorry, I had the diagram here. Mission, person, and training. That means we're going to have to make namespaces for these.
Up at the top, we have the training. We are going to do a set-status. It takes the training record, and it takes the name of the training, and it sets the status in that record. Very easy. All this code is very easy to write, by the way. That's the nice thing about it.
Person. Instead of knowing what the training record looks like, it calls the training set-status on whatever is in its training key. Mission doesn't know what the person record looks like. It just knows where to find it and then what operation it's trying to call on it. This is a separation so that they can evolve separately. They can change separately.
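A minimal sketch of the three layers follows. To keep it runnable as a single snippet, the namespaces are simulated with name prefixes (`training-`, `person-`, `mission-`); in a real project each group would be its own namespace, as the talk suggests. The function names and argument orders are assumptions.

```clojure
;; --- training layer: knows the shape of a training record ---
(defn training-set-status [record training-name status]
  (assoc-in record [training-name :status] status))

;; --- person layer: knows only that trainings live under :training ---
(defn person-set-training-status [person training-name status]
  (update person :training training-set-status training-name status))

;; --- mission layer: knows only where crew members live ---
(defn mission-set-training-status [mission position training-name status]
  (update-in mission [:crew position]
             person-set-training-status training-name status))

(def mission
  {:crew {:research-cosmonaut
          {:name "Rakesh Sharma"
           :training {:centrifuge {:status :scheduled}}}}})

(mission-set-training-status mission :research-cosmonaut :centrifuge :pass)
;; => {:crew {:research-cosmonaut
;;            {:name "Rakesh Sharma"
;;             :training {:centrifuge {:status :pass}}}}}
```

No single function uses a path longer than two keys, so the shape of a training record can change without touching the person or mission code.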
If you've got a couple of paths like that in your code, some deep assoc-ins or update-ins, you're coupling the structure of those maps all along that path to this other place in the code where it's just being used. You want to have these isolated entities that you can deal with on their own. Modify, update, and evolve on their own.
Then, what this turns into is we have these layers on top of libraries. We have our training layer. Then a person refers to that training layer, so we'll put it above it. Then a mission has many people in it, so that goes above it. There's probably some more in there, but somewhere up top we've got the UI.
In the OO world, they talk about this all the time. They call it the Law of Demeter. In the OO world, you're not supposed to do a long chain where you're like the missions, crew members, research cosmonauts, training...you know, the dot. You can't have a long chain of dots.
That's actually considered bad form because you're tying in that whole chain. You're reaching into the object and you have to know so much about this huge chain of objects. You should know as little as possible about everything down the line, how things are implemented.
I mention object-oriented programming and the Law of Demeter. I expect that someone out there is thinking, "Aren't I just encapsulating? It's all supposed to be data. Why would you want to encapsulate this?"
My answer is we are encapsulating, and it is just data. They're not in conflict. We're encapsulating it mentally. Mentally, we can operate on this map as if it were this very small, well-understood thing. But then, at the end of the day, it is just data.
There's actually a tension between a map, a generic key-value store, and an entity, which is a thing which has meaning and its own operations — the things that are valid — on it. We can move between these two, and we have to be conscious of where we are or how we're thinking about it when we're writing code.
There's nice things about treating it just like a map. It's an entity, but it's also we can treat it like a map. You can print it out. You can serialize it to JSON. You can enumerate the keys if you need to. You can store it in a database, compare it with equals. There's all these things that we can do when we just treat it like data.
That's great. You can't do that with a class, an object-oriented class. You have to write your own equals method. You have to write your own hash code. With this, it's a map. It just works with all these things that work well with maps.
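For example, an entity that is a plain map gets value equality, hashing, key enumeration, and printing without any extra code. This is a small sketch with made-up data; it uses `clojure.edn` for the round trip, while JSON serialization would need a third-party library and is left out.

```clojure
(require '[clojure.edn :as edn])

(def a {:name "Rakesh Sharma" :position :research-cosmonaut})
(def b {:position :research-cosmonaut :name "Rakesh Sharma"})

(= a b)                         ; => true, compared by value
(= (hash a) (hash b))           ; => true, usable as a map key or set member
(set (keys a))                  ; => #{:name :position}
(edn/read-string (pr-str a))    ; printed and read back as an equal map
```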
Lastly, I want to say that you don't have to do the design up front. At the REPL, you can do an assoc. You can do an update-in right there without writing the operation. You can figure it out first. You can write those in your UI. Then later, refactor it out. It's faster, but you should refactor it out, at some point, into the entity.
When we're talking about the entity, we're talking about domain operations — a small, defined set of things that are valid to do on this object. It makes it conceptually smaller, easier to understand. It lets us program at high level.
It also lets us maintain these invariants. You wouldn't want a crew member with an empty name or number for the name. This is something that a map will never be able to do, but if you put constraints in there you can have an entity that makes sense.
At this point, I just want to talk about some more tricks and ways to think about this. What we're talking about here with all these layers of meaning is called stratified design. They talk about it in Structure and Interpretation of Computer Programs.
The idea behind stratified design is that you're building layers of meaning on top of existing layers of meaning, and there's some constraints to it.
I'm using this example here of cooking/cuisine. At the bottom, there's chemistry. You can't escape chemistry, acids and bases, and proteins and starches. Those are all there. On top of that you've got some very basic concepts, like applying heat, and chopping, some skills like that. Then on top, we'll put ingredients.
Now, this is a designed system. It's designed. These concepts do...you could put them in another order but this is how I chose to do it. I put ingredients on top of chopping because you have to...The ingredients will know how they need to be chopped.
You can say a carrot needs to be chopped this way but an onion chopped this way. Then, on top of that, there's the basics. How to make a basic sauce or how to make a dough. Those things will have to know about ingredients. Then finally, at the top, we can start talking about recipes.
We can start talking about how to make a dosa. Something like that. The nice thing about this is when you draw it out like this, there's some nice constraints that help you figure out where your layers are.
The first principle is that you can separate things according to rates of change. Chemistry does not change. It's always the same, but then the dishes can change. The recipes can change. You can imagine if...Here I did Indian cuisine, but if you went to French cuisine, how much stuff would you have to change?
Chemistry obviously is going to work the same there in France. Applying heat and chopping, those are probably the same. Maybe you'd have to add or remove some vegetables or some other ingredients but then, definitely, the dishes and the basics are going to be different.
The stuff at the top changes much more frequently than the stuff at the bottom. That's great because it means that you're actually finding stuff that is universal, that you can use and reuse for the lifetime of your software. Then the stuff at the top, yeah, that changes and it [inaudible 24:59], but look, nothing is depending on it. Nothing is built on top of it and that's fine.
Usually, at the top you'll have something like your business rules or your UI, which changes a lot. I'll move this button over here. That kind of stuff. Those things change very quickly.
Another principle is that the dependencies have to point downward. Just as a kind of an absurd example.
It would be weird if two things at the same level referred to each other. One should be on top, unless...If they don't refer to each other, they can be at the same level. It would be weird if applying heat depended on chopping. It would also be weird if the carrot knew the dishes that it could be made in. You want the dependency to be the other way, so let's X those out.
Similarly, it's not wrong to skip layers but it might be a code smell if you're skipping layers. For instance, you wouldn't want to define your dosa in terms of chemistry. You wouldn't want it to be like, create a starch and water solution with this pH and apply heat. That's your dosa recipe, right? You actually want to build it out of stuff that's higher level than that.
Be careful of skipping layers. Sometimes you're going to skip layers, though. Like if you've got a map, you're probably still going to assoc into it. You're still using the Clojure core when you're talking about the high level, but you can also define it in terms of other stuff.
Another trick that I use for avoiding these kinds of problems is I use constructors. When I define an entity type, I make a function that would generate that entity with all the keys that I need. This function is great because it gives you a place to define the keys, check them for required ones, maintain some invariants, those kinds of things.
Let's look at an example. Here, I'm using this convention of ->person to mean construct a person and I'm using the keyword argument destructuring. I use the ampersand and then a map destructuring with keys so I can get some keys out of there.
This is just a very simple one. I've got name and trainings as the two arguments. Notice I've got a default for training, so an empty vector. I'm also checking that the name is a string. That's an invariant I want to check.
I'm doing a little calculation. I'm trimming it, getting rid of the white space on the outside. Then, I'm making sure that it's not empty. That's an invariant I wouldn't want to mess up. Then I'm naming the keys and making the map.
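The constructor described above might look roughly like this. It is a sketch reconstructed from the spoken description, so the exact checks and messages are assumptions; the `->person` name, keyword-argument destructuring, the default empty vector, the string check, the trim, and the non-empty check all come from the talk.

```clojure
(require '[clojure.string :as str])

(defn ->person
  "Constructs a person entity from keyword arguments."
  [& {:keys [name trainings]
      :or   {trainings []}}]            ; default: no trainings yet
  (assert (string? name) "name must be a string")
  (let [name (str/trim name)]           ; drop surrounding whitespace
    (assert (not (str/blank? name)) "name must not be empty")
    {:name      name                    ; the one place the keys are named
     :trainings trainings}))

(->person :name "  Rakesh Sharma  ")
;; => {:name "Rakesh Sharma", :trainings []}
```

A call such as `(->person :name 42)` or `(->person :name "   ")` throws an `AssertionError` instead of producing a map that violates the invariant. Note that `assert` can be disabled globally via `*assert*`, so a production version might throw `ex-info` instead.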
Does anyone do this? Define constructors? It's very common to just do an inline map. If you want to change something, you want to change a key, now you have to go find all the places where that inline map defines that entity.
It doesn't look that different. It's going to be open paren, ->person, and then key value pairs. It's not that different from just open braces. It gives you a place to change things, if you need to, and a good place to look for all the definitions.
This tip is a little harder to explain, but I think it's really important. As people progress in functional programming, they often get to this stage where they say, "Well, it's all about data transformation pipelines," and that's true.
We use data transformation pipelines a lot and it's a very powerful thing. There's more beyond that and, unfortunately, we don't talk enough about that. That you can actually go beyond just pipelines of data transformation.
This is one thing that's beyond it, combining operations. We choose to do these operations because they're hard. They're the hardest ones. It's really easy to do an operation that is just an assoc. That's easy. It's so obvious, it's kind of pointless. When you've got these combining operations...What do I mean by combining operations?
If I'm writing a contact app, so I'm keeping my friends, their email addresses, their phone numbers, and I want to sync it. What data do I need to combine this stuff on my phone with the stuff on my computer?
I've got this record of their email address on this computer and I change it on this computer. Then when they sync, how does it choose which one? That's a hard operation. It's going to constrain the kinds of data you have to keep.
You can't just keep a map because if you have two maps, and you merge them, something's going to get deleted. You need something more. You need either the time it was changed, or what computer it was changed on. Maybe you need a way to maintain multiple email addresses.
You can say, "Oh, you had this email address and this one," now you'll have both. This is a design problem that you have to think about.
It's easy to say, "Set this person's email address." That's too easy to start with. You want to start with the, "How do we combine the two?" If you start with that, there's all sorts of constraints that actually make the data, make it easier. It's a good thing that you have constraints. We can't just keep a map. We need something more. That's a constraint.
Then also, it gives you a potential for algebraic reasoning. For instance, you might want to say, "Look, I don't want it to depend on what order these computers connected to the Internet." No matter which one connects first, at the end, when they've both connected, I've got the same answer. That's an algebraic reasoning. A plus B is the same as B plus A.
Let's look at some simple code for this. This is in my training namespace. Let's say Rakesh was doing some training in India and some training in the USSR. When you combine these, there's a rule that says if you pass the centrifuge in India but you failed in the USSR, that's fine, you still can go to space. You passed somewhere, that's good enough.
We make a function that captures that idea. We call it combined training statuses. It takes two training statuses, A and B. If A or B is a pass, it returns pass, otherwise it returns fail.
Then, we can make a function that takes two training records and does a merge with this status-combining operation. That way, India can keep its records. Russia keeps its records. Then boom, at the end of the day, you can merge them.
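Here is a sketch of those two functions. The record shape (training name mapped to `{:status ...}`), the function names, and the sample records are assumptions for illustration; `merge-with` is the standard `clojure.core` function.

```clojure
;; A pass anywhere counts as a pass.
(defn combine-training-statuses [a b]
  (if (or (= :pass a) (= :pass b))
    :pass
    :fail))

;; Merge two training records; when both have an entry for the same
;; training, combine the statuses with the rule above.
(defn combine-training-records [r1 r2]
  (merge-with (fn [t1 t2]
                {:status (combine-training-statuses (:status t1)
                                                    (:status t2))})
              r1 r2))

;; Hypothetical records kept separately by each country.
(def india {:centrifuge {:status :pass}
            :survival   {:status :fail}})
(def ussr  {:centrifuge {:status :fail}
            :zero-g     {:status :pass}})

(combine-training-records india ussr)
;; => {:centrifuge {:status :pass}
;;     :survival   {:status :fail}
;;     :zero-g     {:status :pass}}
```

Because the status rule does not care about argument order, `(combine-training-records india ussr)` and `(combine-training-records ussr india)` give the same result, which is the "A plus B is the same as B plus A" property mentioned earlier.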
This is what I mean by combining operations. This isn't like some inline if statement where I'm like, "Oh, if Rakesh did this, and then that, and then..." This is like defining how these records can combine, in one place, as an operation.
Now I want to go over some code smells, for what to look for in your code that could lead you to find layers of these meanings. We went over this one — really deep paths, really long paths. I think three or more is a smell.
Two is OK, because you have some structure in each entity. When you've got three, definitely when you've got four, you're crossing a conceptual boundary. That's going to limit your speed. It's going to make it so that things can't evolve independently and you've got to keep so much more in your head.
Large hash maps. Like I showed before with the user info, where we're setting the favorite color, favorite animal, all that stuff. There's no organization. There's probably a team of 20 people, and in every component, everybody's just throwing stuff into a big map.
If at any point, you would print it out, you would just get this huge list of stuff. What kind of organizing principles are in there? Nothing. Just like a sock drawer where you just throw all your socks.
Here's an example. Here's my information. This is me. You see I have email, phone, street one, street two, some favorite things at the bottom. This isn't that big but it's already got enough stuff that you could start to see, some things are more related than others.
For example, does my email address go with my favorite color? No, probably not. Not as much as, say, the email address and the phone number. This, I kind of refactored it, reorganized it. We can see that we went from 10 top level keys to 3, and contact info becomes its own thing with its own keys.
Then address, "Oh all those things are very related, let's put them in their own map." I even did this other thing with the favorites, where I split up the key. From favorite color, favorite animal, I made favorites and then a little sub map.
You can already start to see that there are little concepts showing up, like maybe address, we use them enough that we make it our own namespace just for address, with its own operations.
Maybe contact info has its own stuff too, so we know we call it email and not email-address. We call it phone and not phone number. The same with favorites. We have to know that the color is a string and animal is a keyword.
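The reorganization can be sketched as follows. The slide's actual keys and values are not in the transcript, so every key beyond email, phone, the two street lines, and the favorites, as well as all the values, are placeholders; the function `flat->nested` is hypothetical.

```clojure
;; Before: ten unrelated-looking top-level keys (placeholder values).
(def flat-user
  {:email           "someone@example.com"
   :phone           "555-0100"
   :street1         "1 Example St"
   :street2         "Apt 2"
   :city            "Exampleville"
   :state           "EX"
   :zip             "00000"
   :favorite-color  "blue"
   :favorite-animal :elephant
   :favorite-fruit  "mango"})

;; After: three top-level keys, each a small concept of its own.
(defn flat->nested [m]
  {:contact-info (select-keys m [:email :phone])
   :address      (select-keys m [:street1 :street2 :city :state :zip])
   :favorites    {:color  (:favorite-color m)    ; a string
                  :animal (:favorite-animal m)   ; a keyword
                  :fruit  (:favorite-fruit m)}})

(keys (flat->nested flat-user))
;; => (:contact-info :address :favorites)
```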
Here's another code smell. When we're talking with other systems, especially third-party systems, we want to isolate ourselves from them. They could change. The data they give us is not necessarily in the structure that we want it.
Here is an example. This is an API I use. This is actually 14 screens tall. It's JSON, but I turned it into EDN. This is something I had to use. Whenever I'd be at the REPL, like, "What keys do I have? Where did they store the title of this thing?" It was three things deep. It was a pain.
I went through and I refactored it. I figured out this is the stuff I actually need. It's very small, fits on a slide. A few of these things are just taken out directly. Sometimes, I renamed the key. Sometimes, I had to dig deep to get one of these things out. Sometimes, I actually had to do a little bit of calculation. I combined two keys, two values into one value.
There it is. Now, this is something stable that my code can rely on. I just have this one little layer of indirection. This one little translation from what the API gives me to what I let into my system that I can trust. Before, what I was doing was, everywhere I needed the thing, I would just write some little code that knew about the structure of this JSON that I was getting back. It was turning into a mess so I did this.
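A translation layer of this kind might look like the sketch below. The API in the talk is not named and its response is not shown, so the response shape, every key, and the function `api->video` are hypothetical. The four kinds of translation (take directly, rename, dig deep, combine) are the ones listed in the talk.

```clojure
;; Hypothetical, heavily trimmed third-party response.
(def api-response
  {:id               42
   :thumb            "https://example.com/t.png"
   :data             {:attributes {:title "Intro to maps"}}
   :duration_minutes 3
   :duration_seconds 25})

;; The only function that knows the third party's structure.
(defn api->video [resp]
  {:id               (:id resp)                             ; taken directly
   :thumbnail-url    (:thumb resp)                          ; renamed
   :title            (get-in resp [:data :attributes :title]) ; dug out
   :duration-seconds (+ (* 60 (:duration_minutes resp))     ; two values
                        (:duration_seconds resp))})         ; combined into one

(api->video api-response)
;; => {:id 42
;;     :thumbnail-url "https://example.com/t.png"
;;     :title "Intro to maps"
;;     :duration-seconds 205}
```

If the third party moves or renames a field, only `api->video` needs to change; the rest of the code keeps relying on the small, stable map.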
Here's my concluding slide. The main thing I want to drill home is know what level of meaning you are at when you're coding. Are you treating it like a hash map? That's fine. That might be the right thing. Or are you working at the domain level? If you're treating it like a domain level, don't use just assoc, or whatever. You should be using an operation that has a name that has meaning.
We want to look for these semantic layers and we don't want these deep paths. We don't want to cross into...For instance, in a person, you don't want to be talking about the training record. You don't want the person to have to know the structure of that training record.
We want to organize these things so that we're not just adding keys willy-nilly all around the code. They should be put into a central place that defines the thing. Combining operations are so important. Because we can, we tend to just start reaching into data, doing stuff inline wherever we are, combining, do some math on it and then, boom, we got our answer.
What does that mean? Can it have a name? Is it possible that you did some complex operation with combining two things that really should be its own operation semantically? Put some gap, some margin, some indirection between third party systems and yours. I like using constructors.
Then use good names. You should be able to have a small number of operations for each of your little entities that have semantic meaning.
Thank you very much. My name is Eric Normand. You can find me on the Internet, lispcast.com, also purelyfunctional.tv. Follow me on Twitter. You can find me on LinkedIn.
While we're here, I'd love to get to know all of you. Thank you very much.
**Eric**: We have a microphone up here. Oh, and back there.
**Audience Member**: I was just curious if you've come across libraries in the Clojure ecosystem where it achieved something along the lines of lenses in Haskell, which solve a similar type of problem? Did you come across something like that?
**Eric**: Specter is very close to lenses. When I first heard the talk, I thought it was a very impressive system, but for a problem that I didn't experience.
He's a very smart guy — Nathan Marz, the guy who wrote Specter — and so he must have this problem. He spent a lot of time on Specter. Check it out if that's what you're looking for, but I prefer to avoid that problem, not deal with nested stuff so much.
Someone once told me the issue is that people are making these really big...If you thought about it in types, that people are making these really big types. They should be thinking in terms of small types. Even a list, you think it can be infinite, but the type is small. It just has a head and a rest; a first and a rest. It's just a cons cell, and you can make a whole list out of that.
You should be thinking in these small pieces that compose together instead of this is all the stuff I have, so I'll put it in one big type.