What is an abstraction barrier?

Structure and Interpretation of Computer Programs talked about abstraction barriers as a way to hide the intricacies of data structures made out of cons cells. Is this concept still useful in a world of literal hash maps with string keys? I say yes, but in a much more limited way than before. In this episode, we go into what abstraction barriers are, why (and why not) to use them, and the limits of their usefulness.

Transcript

Eric Normand: What is an abstraction barrier? This is a concept from "Structure and Interpretation of Computer Programs." In this episode, we're going to talk about what it is, why to use it, and the limits of its usefulness.

My name is Eric Normand, and I help people thrive with functional programming. Like I said, this is a concept that I was first introduced to in Structure and Interpretation of Computer Programs. I imagine it goes back further than that just because it's a natural concept to develop.

Basically, instead of doing some complex operations inline, you extract them into a function and you name that function. Now you have a barrier: you don't have to think about the internals of how this thing gets calculated. You have an operation with a nice, clear, meaningful name that you're working with.

If you do this with a data structure... In Structure and Interpretation of Computer Programs, they're using Scheme, and the data structure they use is basically just cons cells, which are pairs. They're like tuples, but always pairs. They build these intricate little data structures out of them.

For instance, in one example I can remember, they build an associative data structure where you can add keys and values, replace them, and look up values by key. It's all just cons cells: all pairs, deeply nested. They build operations like "add a new key-value pair" and "find the value for this key."

These are operations, and they're giving them nice names. When you look at the raw code, it's just like... I have to explain this. In Scheme, to get the first element from a cons cell, you use a function called car, C-A-R, and to get the second element from a cons cell, you use a function called cdr, which is C-D-R.

You would write things like the car of the cdr of the cdr of the car, and that will give you the element you're looking for. As you're looking at these operations, they're not very meaningful. They just read like first, second, second, first, first, first.

It's just hard to wrap your head around what that's actually doing. It would be really nice if someone gave it a very meaningful name, like "Find value given a key." [laughs] Or just "get," something like that. You do this to help your mind encapsulate the meaning, to put a barrier around it.
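To make the contrast concrete, here is a minimal Python sketch, not SICP's own code. It assumes cons cells can be modeled as 2-tuples; `cons`, `car`, and `cdr` mirror the Scheme names, and `alist_put` and `alist_get` are hypothetical names for the barrier operations on an association list.

```python
# Cons cells modeled as 2-tuples (an assumption; Scheme has built-in pairs).
def cons(a, b):
    return (a, b)


def car(pair):
    return pair[0]  # first element of the pair


def cdr(pair):
    return pair[1]  # second element of the pair


NIL = None  # stands in for the empty list


# Hypothetical barrier operations on an association list: a chain of pairs
# whose cars are (key, value) pairs.
def alist_put(alist, key, value):
    """Return a new alist with key bound to value; the newest binding wins."""
    return cons(cons(key, value), alist)


def alist_get(alist, key, default=None):
    """Walk the chain of pairs and return the first value bound to key."""
    while alist is not NIL:
        entry = car(alist)
        if car(entry) == key:
            return cdr(entry)
        alist = cdr(alist)
    return default


people = alist_put(alist_put(NIL, "alice", 31), "bob", 27)

raw = cdr(car(cdr(people)))          # inline car/cdr chain: what does it mean?
named = alist_get(people, "alice")   # the same value, behind a meaningful name
```

Both `raw` and `named` end up holding 31, but only the second line tells a reader what it is for.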

You can say, "I don't have to think about this anymore. There are three operations I'm going to do on these things, and I have them well named." I don't have to go digging around and remember which cdr, or how many cdrs, I need to get the value of something.

This is the what and the why. Sometimes you've got these deeply nested things, and when you inline those operations, it's hard, just in terms of mental capacity, to reason about the code. What you're doing is that basic naming operation: you're naming this thing that you're going to use a lot.

You're taking what are meaningless operations, like car and cdr. They have a meaning, but their level of meaning is very low. The name elevates that function to a new, higher level of meaning.

It's not data hiding. It is different from data hiding in one very important respect. It's still all there. It's still transparent. If you want to pierce your abstraction barrier, go right ahead. You can still map over it. It's still a list. It's still cons cells. You can still call car on it.
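Because the barrier is only a set of named functions, the raw pairs stay fully visible. A minimal sketch, again assuming cons cells are modeled as Python 2-tuples, with a hypothetical generic `pairs` walker that ignores any named operations entirely:

```python
# An association list as nested 2-tuples; None plays the empty list.
people = (("bob", 27), (("alice", 31), None))


def pairs(chain):
    """Generic walk over a chain of pairs; knows nothing about keys or values."""
    while chain is not None:
        yield chain[0]    # car: the current entry
        chain = chain[1]  # cdr: the rest of the chain


# Piercing the barrier: map over the raw structure directly.
names = [entry[0] for entry in pairs(people)]
ages = [entry[1] for entry in pairs(people)]
```

Nothing here is private: any code that knows the data is pairs can traverse it, which is the difference from an object with hidden fields.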

You are not forced to use those abstraction barriers, those defined operations. Some people say it's a fine line, because you're still saying people should use them. There are arguments like: if this data structure escapes, no one's going to know to use those operations, or they're going to have to use them anyway, so you have to give them the operations.

All that is true. I don't want to argue about that. What I just want to argue is that there is this very important difference, which is that you don't have to use those operations.

It's not like data hiding in an object-oriented system, where you've got all this private stuff. Then the three operations that you want to do on it are the public methods, and you don't know how it's implemented and you're supposed to not care. You're supposed to not even be able to mess with the data.

You can mess with the data if you want to. This is important, because you want to be able to move between those levels. The trouble is, cons cells are not a very rich data structure. They basically suck. I'm just going to say it like that. They're really neat, and they're complete: you can build trees and lists and a whole variety of data structures with them.

They're not self-describing. You don't know what you have. If you printed them out, they'd just be parentheses with stuff inside. They're not human-readable. Well, they are, but not very. You need to know what the different levels of the parentheses mean.

You need some out-of-band communication because they're not self-describing. A lot of the problems we have with these data structures are solved by having self-describing names, like a hash map where the keys are strings. You can have a nice name.

Before, you had to build this associative data structure. Now you can have a data structure that says, "Hey, I am an associative data structure. I have the curly braces in JSON. My keys are strings. My values are also things you can understand."
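For comparison, here is the same kind of associative data as a JSON object, read with Python's standard `json` module. The field names and values are made up for illustration.

```python
import json

# The structure describes itself: curly braces, string keys, readable values.
payload = '{"name": "Alice", "age": 31, "languages": ["Scheme", "Clojure"]}'
person = json.loads(payload)  # parses into a plain dict

# No helper operations needed: the key name is the lookup.
age = person["age"]
first_language = person["languages"][0]

# It serializes back to text a person can read and write by hand.
text = json.dumps(person, sort_keys=True)
```

No `car`/`cdr` chain and no bespoke accessor: the names in the data carry the meaning that the abstraction barrier had to supply for cons cells.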

This is amazing. Now we don't need to have these abstraction barriers to do the same thing. A lot of problems are solved just by having better data structures. You don't need the abstraction. It removes a whole slew of reasons for needing abstraction barriers especially around these intricate data structures.

There's a principle here: if you have public-facing data, you shouldn't really rely on abstraction barriers. You should design that data to be easy to consume and easy to produce without needing specific operations. You want someone to be able to type it in as literal JSON and have it be correct.

You don't want something where people need complex operations that they have to define, basically copying them from your code base, in order to build the thing up. For a public-facing thing, you want very clear names. When you design them and put them out into the world, those names are a commitment.

They're a commitment on your part as an implementer that you're going to honor those names. If you send me this JSON at my API endpoint, I'm going to read it. I'm telling you what this key means and how I'm going to interpret its value. That is a commitment that you're making. The self-describing nature of it is really important.

You shouldn't rely on an abstraction barrier there. However, besides public-facing data, like a public-facing schema or spec, we also use data structures internally in our software. If you need some intermediate index of something, you'll use a hash map to index it.

Sometimes, when you add something to the index, you want to keep track of when you added it and when it was last accessed. You've got all these other bits of information that you have to maintain. Sometimes you want to maintain the order, but the entries are in a hash map, which doesn't have an order, so you track the order somewhere else and have to keep the two in sync.

Now you're talking about an intricate data structure nested inside other data structures. At some point, you're back to the same problem you had with cons cells: deep nesting. You forget how many levels deep you are.

You're inlining all these cdrs and cars, except they're not cdrs and cars. They're like, "Get this internal map. Inside of that, get this thing." That gives you a map, and then you need the value out of that map. It's all deeply nested again, and it's easy to get wrong.

When you add a thing to the index, there are maybe five things you need to do, and they all have to be right. You want to make it easy to get those things right. Doing five things is probably five lines of code, and you're repeating that everywhere. You want to DRY this up: take that duplication, put it in a function, and give it a good name.
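Here is a minimal Python sketch of that kind of internal index and the barrier operations around it. The layout (`by_id`, `order`, `added_at`, `accessed_at`) and the function names are hypothetical, chosen to match the bookkeeping described above; the optional `now` parameter only exists to make the timestamps easy to test.

```python
import time


def make_index():
    """Create an empty index: entries by id, plus an explicit order list.

    Python dicts already keep insertion order; the separate list mirrors the
    general case of a hash map with no order of its own, kept in sync by hand.
    """
    return {"by_id": {}, "order": []}


def index_add(index, item_id, item, now=None):
    """Add or replace an item, recording timestamps and keeping order in sync."""
    now = time.time() if now is None else now
    if item_id not in index["by_id"]:
        index["order"].append(item_id)  # new ids go to the end; re-adds keep their slot
    index["by_id"][item_id] = {"item": item, "added_at": now, "accessed_at": now}
    return index


def index_get(index, item_id, now=None):
    """Return the item for item_id (or None), updating its last-access time."""
    entry = index["by_id"].get(item_id)
    if entry is None:
        return None
    entry["accessed_at"] = time.time() if now is None else now
    return entry["item"]


def index_items(index):
    """Return the items in insertion order."""
    return [index["by_id"][i]["item"] for i in index["order"]]
```

Callers write `index_add(idx, "a", "apple")` instead of repeating the nested updates, yet `idx` is still a plain dict they can print or inspect when they need to pierce the barrier.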

All of a sudden, you're doing abstraction barriers again. It's just the way it is; it just happens when you have these complex things. Like I said, this isn't for something that's going to go external. Externally, you want to be nice, clean, neat, human-writable, and human-readable.

When you're working internally, sometimes you need an index that's really tricky and complicated, or you need some data structure that's super weirdly nested. You want to start extracting out all those operations again.

I want to say another thing. I think SICP gives two reasons for abstraction barriers, and I disagree with one of them. One of the reasons Structure and Interpretation of Computer Programs gives for using abstraction barriers is so that you can change the data structure if you need to.

This is just so overused. We write code today that's more complicated than it needs to be because maybe one day in the future we might want to change it. I just think that's wrong. Why complicate your life today for something that may or may not happen, in a way you can't even predict?

If you know how it's going to change, if you're saying, "Look, I know I'm going to swap out my database in a year. I'm using this database now because I can't afford the one I really want. When my company does better, I'll have the income to pay for the database I do want, and I want to be able to swap it out easily."

Sure. If you've got a plan for changing it and you don't want to have to change all the code again, fine. If being able to change it is part of your plan, yes, put some kind of indirection in there. If you're just doing it just in case, because maybe you'll need it, no, do not do that.
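When a swap really is planned, the indirection can be as small as one interface. A minimal Python sketch using `typing.Protocol`; `KeyValueStore`, `InMemoryStore`, and `save_user_name` are hypothetical names, and a dict stands in for today's database.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The one seam the rest of the code talks to (hypothetical interface)."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Stand-in for the current database. The planned replacement would be
    another class with the same two methods."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


def save_user_name(store: KeyValueStore, user_id: str, name: str) -> None:
    # Application code depends only on the protocol, not on a specific database.
    store.put("user:" + user_id, name)
```

The point of the sketch is that the seam exists because a concrete swap is planned; without that plan, this is exactly the speculative indirection the episode argues against.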

People say it, but I think you should not add abstraction barriers just because you might want to change something. You should add abstraction barriers to make it clear what's going on, especially when you've got intricate, tricky things that are hard to get right.

You shouldn't use it for public-facing data. That should be well-designed, clean, simple, something that another person could write code to generate and not rely on your perfect implementation of all these operations.

You should use it for these intricate data structures that never leave your program. These indexes aren't meant to be printed out and sent over a wire. They're meant to be kept in memory for some algorithm, or for something you're doing where you need constant-time access.

All right, let me recap real fast. Abstraction barriers are simply this: you take operations you're doing on some data structure or some piece of data, operations you keep repeating that don't carry enough meaning, and you extract them out and give them names.

If you can count all the operations you're doing on this data structure, say there are three or four of them, you extract all of them out and give them good names. Now you no longer have to go down into the data structure and manipulate things at the low level. You can operate on it at a higher level. That sounds like a good thing to me.

It differs from data hiding in that you can always pierce the barrier. You can look at it, and it's just raw data. It's not some encapsulated class or object that has some bespoke methods on it that you can't see how it's implemented inside. You can pierce it.

Hash maps and the other, much more descriptive data structures we have in modern languages help because they have literal syntax and descriptive names. They have more well-understood properties; for instance, an array has an order to it. You don't need to use a cons cell, which has almost no meaning behind it.

You've got higher-level stuff, self-describing. You have literal versions of it. You don't have to even think about constructors so much anymore. It's much nicer, but we still build up these intricate, highly nested things for internal use.

I believe that abstraction barriers, as I've defined them here, are useful for that. You want to be able to operate at a higher level even though there's some really intricate machinery turning underneath. It's just normal.

Let me say it a different way. Hash maps, descriptive names, and so on remove a huge part of the need for abstraction barriers. But we reinvent the problem, because we build all these highly nested, intricate data structures.

Again, they are now made of hash maps, and vectors, and sets, and whatever else we have. They're still there. They're just not cons cells anymore. They're just some other thing.

I've seen so many messes in languages like Clojure, which use these data structures a lot. When the data starts getting nested, people forget what they have. They start coupling code together because every piece of code is coupled to "where a value lives," deeply nested in this map.

They're combining a path into the nested data structure with the operation, what they want to do, so the "where" and the "what they want to do" get coupled together. Having a few barriers when you have a mess is like having little bins to put all your stuff in instead of throwing everything into one big bin. It's just a way to organize things and keep a little bit of sanity when they start to get messy.
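A minimal Python sketch of that coupling and a small barrier that separates it. The nested `state` layout and the function names are hypothetical, made up for illustration.

```python
# Hypothetical application state; the path to an email is an internal detail.
state = {"accounts": {"by_id": {42: {"profile": {"email": "ada@example.com"}}}}}


# Coupled: this caller encodes *where* the email lives and *what* to do with it.
def welcome_message_coupled(state, user_id):
    email = state["accounts"]["by_id"][user_id]["profile"]["email"]
    return "Welcome, " + email


# A small barrier: the path is written once, behind a meaningful name.
def user_email(state, user_id):
    return state["accounts"]["by_id"][user_id]["profile"]["email"]


# Callers now say only what they want; the "where" stays inside user_email.
def welcome_message(state, user_id):
    return "Welcome, " + user_email(state, user_id)
```

Both versions return the same string; the difference is how many places in the code have to know the nesting.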

All right. This has been all about abstraction barriers. This might be a tad controversial. I'm not saying abstraction barriers should be used all the time. I still think they're really useful, especially when you've got nested data structures and you're starting to get into a mess.

If you liked this episode, please subscribe. Go to lispcast.com/podcast. There you're going to find all the past episodes. Listen to the one where I talk about building your interface first. Listen to the one where I say just use data. I really think these are subtle issues, and it's not as simple as "use this, don't use that." You've got to allow for some subtlety in there.

You'll find all the past episodes with audio, video, and text transcripts. You can listen to them, watch them, or even read them if that's how you like to do it. You'll also find links to subscribe on the various platforms, and links to find me on social media.

That's email, Twitter, LinkedIn. Get in touch with me. If you disagree with me, I would love to have a discussion about this because I think it is a bit controversial.

I'd love to hear more arguments for and against. If you've got one of those, or you've got a question because I didn't go over something clearly enough, come on, just hit me up, and we'll talk. Awesome.

My name is Eric Normand. This has been my thought on functional programming. Thank you for listening and rock on.