Why you shouldn't hide your data

This is an episode of Thoughts on Functional Programming, a podcast by Eric Normand.


In OOP, we wrap our data in an interface, which is called implementation-hiding or data-hiding. In functional programming, we don't do that. We use our data in the nude. We pass the data around and allow the context to interpret the data as it sees fit. In this episode, we look at this significant difference between OOP and FP and how to do it.

Transcript

Eric Normand: Why you shouldn't hide your data? By the end of this episode, you're going to understand the benefits of having your data exposed to the world, why functional programmers like it that way, and how to avoid some of the major pitfalls that you might come across if you do that. I want to end with a process that you can use that will help you design your data.

Hello, my name is Eric Normand, and I help people thrive with functional programming. This is important because this is one of those very significant, philosophical differences between object-oriented programming and functional programming, so it's worth digging into and emphasizing it, spending some time there.

The first idea that, I think, it's important to understand, if you're an object-oriented programmer, you might be wondering why don't you want to encapsulate your data? Isn't that a good thing? The question is why would you hide it in the first place? Why would an object-oriented programmer hide it in the first place?

In the object-oriented world, you have a bunch of mutable state. Instead of having it all grouped together and living in this big soup of mutable state, you say, "Hey. I'm going to take this little piece of mutable state, one little chunk of it, wrap it up in an object." So you can't directly get at the state.

I'm going to use the methods to ensure that the state is kept consistent. You might have two values that have to be kept in a very precise relationship. If I change one, I need to change the other.

[pause]

Eric: What you're doing is you're creating a level of indirection that does not allow them to get out of sync. The programmer writing to that interface doesn't have to worry about them getting out of sync anymore. They can put that out of their mind. It's compartmentalized. It's modular. It's wrapped up. That's the main reason why an object-oriented programmer wants to not have direct access to the data.

They need the data at some point. They need to know what those pieces of data are, but they have to ask. By asking through this interface, it allows the consistency.

Functional programmers, typically, are either not using mutable data or very sparingly are they using mutable data. A large part of that problem goes away. We don't have the same scale of problem that requires that kind of encapsulation.

If your data isn't changing, once you create it, it's fixed. You can create it in the right relationship and then you're done. You don't need to hide it. You don't need to encapsulate it.
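A minimal Python sketch of that point, under stated assumptions: the `make_date_range` function and its keys are hypothetical, and a read-only `MappingProxyType` stands in for an immutable map. The two related values are put in the right relationship once, at creation, and nothing can move them out of sync afterward.

```python
from datetime import date
from types import MappingProxyType


def make_date_range(start, end):
    """Build the data once, with 'days' already consistent with start and end."""
    if end < start:
        raise ValueError("end must not be before start")
    # MappingProxyType is a read-only view of the dict, so no caller can
    # later change one field and leave the others out of sync.
    return MappingProxyType(
        {"start": start, "end": end, "days": (end - start).days}
    )


trip = make_date_range(date(2024, 3, 1), date(2024, 3, 8))
print(trip["days"])  # 7

try:
    trip["days"] = 1  # rejected: the view does not support assignment
except TypeError as err:
    print("cannot modify:", err)
```

There are no setters to write because there is nothing to set. The data is still fully readable by any code that receives it.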

It might be very useful to go and talk about what data actually is. If you look in the dictionary, it'll say that data is facts about events, so something happens and you write down what happened as much as you want about that thing.

The user bought this shirt. Even more precise, the user clicked the "Buy" button on this shirt. Or the thermometer said 25 degrees. That's at this time. It's an event that happened. We got a thermometer reading, that's the event. We wrote down some information about it, the fact.

What is the current value of the thermometer is different from the value of the reading, which is why we can say it's immutable. At that reading, this is what we got. It doesn't matter what it is now. This is what we got at that time. There's a certain structure to that data, and that structure is also immutable.

What functional programmers like to do is leave it as data. Do not wrap it up. Do not make a new API to talk about this data, or to manipulate this data, to read this data, because we're not modifying it. Do not make a new API, let it be itself. Let it be this piece of data that can move from context to context, and that context determines how they want to read it.

In one context, you might interpret, let's say the temperature, and think of it as you want to classify it into T-shirt, long-sleeve shirt, sweater, or jacket weather. You bucket it into those categories, so that you can tell the user what to wear that day.

Another part of the system would use that same bit of data and say, "Well, I'm collecting these things, and I'm going to show a graph over time of the temperature." Another part of the system might say, "Well, I'm going to record how often the temperature gets read. I can make sure, like a monitoring service, make sure that we're getting frequent enough temperature readings."

These are all valid ways to interpret that same data. The fact is the same, at 3:00 PM, the temperature was 25 degrees, but the usage is different. We want to keep it as raw and factual as possible, so that it can be as useful as possible to all these different contexts. Just to contrast it, not to say it's bad, but in an object-oriented world, you would need to deal with all of these contexts.
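Those three contexts can be sketched in Python as three plain functions reading the same fact. Everything here is illustrative: the key names, the function names, and the temperature thresholds are assumptions, not something from the episode.

```python
# One fact about one event: a thermometer reading at a point in time.
reading = {"time": "15:00", "celsius": 25}


def what_to_wear(reading):
    """Clothing context: bucket the temperature (thresholds are made up)."""
    c = reading["celsius"]
    if c >= 22:
        return "T-shirt"
    if c >= 16:
        return "long-sleeve shirt"
    if c >= 8:
        return "sweater"
    return "jacket"


def graph_points(readings):
    """Graphing context: (time, temperature) pairs to plot."""
    return [(r["time"], r["celsius"]) for r in readings]


def reading_times(readings):
    """Monitoring context: only cares about when readings arrived."""
    return [r["time"] for r in readings]


readings = [{"time": "14:00", "celsius": 24}, reading]
print(what_to_wear(reading))
print(graph_points(readings))
print(reading_times(readings))
```

Each function belongs to its own context, and none of them needed the reading to grow a method.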

Typically, you would say, "Well, we're going to add methods to that temperature-reading class and that reading might be converted into what you should wear." Is it a jacket weather, or is it T-shirt weather? Another reading might be some way to turn it into an average, another method would be turn it into an average.

Typically, what happens is this class has to service so many different contexts that what you wind up doing anyway is putting a getter that says, "Get the temperature," and then those contexts read those getters. It doesn't have any functionality. It's not protecting anything. It's giving you the data.

If you're going to do that anyway, you might as well expose the data. Let it be exposed, be visible all the time, because then if it's reusing an existing data structure, then you don't have to learn a new API. You already know how to access that data structure.

This is one of the objections that functional programmers have, "You're creating this new API that I'm forced to use in all these different contexts, and at the end of the day, you need a backdoor." You put all these getters on there in case the API isn't complete, and then that's what people wind up using anyway. It doesn't make sense to go this roundabout way.

In practice what I've found, this is my experience, is that the contexts wind up defining their own operations on the data themselves, as a way to interpret it. Instead of having some abstract universal API for this piece of data that is useful in all contexts, the only thing that's really useful in all contexts is getters.

Then you have each context having its own interface which determines how to interpret that data. I've really churned through that pretty well. You just treat it as data. Different languages do it in different ways.

In Clojure we tend to use a lot of maps with keywords as keys and then the values in the maps, the data values are in the values of the keys. We use the keywords to name the values and that gives it the structure. It's key-value pairs where the names are well understood among all the contexts.

It's getters. Don't get me wrong, it's getters, except they are not custom for each piece of data. It's still a HashMap, you can still do what you need to do, what you'd normally expect a HashMap can do. You can list all the keys, for instance. You can put the keywords into an array or a vector.

Now you can say, "I only care about these four keys, give me a map that just has those four keys in it." This is an operation called select-keys, as opposed to an object that is custom and bespoke for that one type, for that one usage, that one concept.
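Here is a rough Python equivalent, with a dict standing in for a Clojure map. The `select_keys` helper below is a hypothetical function loosely modeled on Clojure's `select-keys`, and the `user` data is made up for illustration.

```python
def select_keys(m, keys):
    """Return a new dict holding only the requested keys that are present."""
    return {k: m[k] for k in keys if k in m}


user = {
    "name": "Ada",
    "email": "ada@example.com",
    "plan": "pro",
    "theme": "dark",
}

# Generic operations work on any dict, with no per-type API to learn.
print(list(user))                            # list all the keys
print(select_keys(user, ["name", "email"]))  # keep only the keys we care about
```

The helper works on every dict in the program, which is the point: a generic data structure takes part in generic operations.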

That class does not participate in all these other operations that are common. You've lost out on this opportunity to be a deeper part of the ecosystem. I don't want to beat that horse too much. We like to leave things as data. Now you have Haskell, where you typically would define a new type.

They don't use HashMaps the way we do in Clojure. You define a new type to represent the thing and yet, that type is still totally exposed. Once you know the type, you can get all the data inside. It's available. In Haskell, you get more leverage from the type classes that can be automatically derived from the data, like a reader and a printer.

Here's the thing though, even object-oriented programmers nowadays are already doing this and you can tell that they've been doing it for a while, and they've...I don't want to go too deep into the object-oriented programming, but you yourself are probably already doing this.

When you call an API endpoint and you're passing on JSON or XML, that's data. That JSON that you send or that you get back, that's data. It's not an object with methods that are hiding your data. It is a piece of data. It's raw and in a sense, it's immutable because it's a copy.

The server serialized you a big JSON value and you de-serialized it into your own memory and it's a copy. Changing this thing is not going to change what's on the server. In a sense, it's immutable. You know what the server told you, that's the event. This is what the server told me to this request. I have a fact about it. This is what it is.
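A small Python sketch of the "it's a copy" point. The JSON string below is a made-up stand-in for whatever a server might send back.

```python
import json

# What the server told us: a serialized value (hypothetical payload).
server_response = '{"user": "ada", "cart": ["shirt"]}'

# Deserializing builds our own copy in memory.
data = json.loads(server_response)

# Changing our copy changes nothing about what the server said.
data["cart"].append("hat")

print(data["cart"])
print(server_response)
```

The response itself stays a fixed record of what was said at the time of the request, whatever we do with our local copy.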

Now, you notice a lot of enterprise Java systems tend to serialize things a lot, serialize things to XML and then even sometimes read it directly back in, which is weird, but it makes sense when you look at it from the perspective that they're making copies of data and what they really want is the data itself.

They don't want all this interface around it, they're going to serialize it and read it back in again. There's something going on there, there's a reason they're doing it. It's very expensive to do, but they want the immutable data. They want the ability to get at the raw stuff. The thing is that functional programmers...How do I say this?

They want to have that same feeling of, "I called this method," or "I call this API endpoint and I got the JSON back and I have the data." They live in that, all throughout their whole system, not just at the API boundaries.

They're always there. They're always dealing with data. Haskell's going to be much stronger typed when you're within the language and not reading in random JSON from around the Internet. You've got much more control and knowledge of what each type is but you're still handling data. You're converting data from one format to another, from one structure to another. It's data and it feels nice. It really does.

Benefits. The first thing is you don't have to write all those getters and setters. That's huge. You're going to write them anyway, so why not make that the baseline for how you access the object? You can defer writing all this huge interface until later, when you understand the context more and you're starting to find how you're going to use this piece of data, and then refine that into some functions.

Another thing is when you're using the data structures that come with your language, they've already got printers. They've got readers. This is if you've got a decent language. For instance, if you're using objects, arrays, numbers, and strings in JavaScript, you can serialize that to JSON without even thinking about it. Read it back in, not even thinking about it. It's just one function call.

If you've got a custom class, you have to come up with a way to serialize it. How are you going to represent that on the wire when you're going to send it over? To me, as a Clojure programmer who's used to saying, "I'm just working in maps and other data types," I can serialize it to disk if I want to. So freeing, so freeing. I don't have to worry about writing all that stuff myself.
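The same contrast, sketched in Python rather than JavaScript or Clojure. The `Reading` class and the sample data are hypothetical. Only the standard-library `json` module is used.

```python
import json

# Plain built-in data: one call to write it out, one call to read it back.
reading = {"time": "15:00", "celsius": 25, "tags": ["outdoor"]}
wire = json.dumps(reading)
round_tripped = json.loads(wire)
print(round_tripped == reading)


class Reading:
    """A custom class wrapping the same information."""

    def __init__(self, time, celsius):
        self.time = time
        self.celsius = celsius


# The default encoder has no idea how to represent a custom class.
try:
    json.dumps(Reading("15:00", 25))
    custom_serialized = True
except TypeError as err:
    custom_serialized = False
    print("needs a hand-written serializer:", err)
```

To send the class over the wire you would have to decide on a representation and write the conversion both ways, which is exactly the work the plain data avoids.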

Plus, they have known interfaces. If I get a HashMap, I know what I can do on it. I don't have to go look up the API docs. What are the methods on this object? I can print it out at the REPL and just see, "Oh, here are the keys that it has. These are the ones I want to access."

There's something freeing about that. It gives it that immediacy. It feels if you had an object-oriented system, you would be like, "OK, you have all these different components. Each one has its own user manual. You have to read all the user manuals before you can start using the device."

Each class is like its own little device type. In functional, it's much more like, "OK, there's four main things, four main components. You learn how they work and now you're putting them together in different ways." It's like there's four kinds of LEGO blocks. You figure out how they work together and how to use each one individually. Boom, you're done.

Instead of having all these components that are all custom and all bespoke, you have four LEGO blocks. You can create millions of combinations when you have those. The last thing is you can always write functions to canonize or specify how you want to use the object, the data.

I've had another episode, wow, like 8 months ago, where I'd said, "Define your interface first." A couple of people, not a lot of people...I think, I was being pretty clear, but a couple of people were like, "What? That's like object-oriented programming. How can you do that? You want to leave the data as data."

Not saying when you write an interface, you're changing the way you access the data, at all. This is what I'm saying, that each context is going to use that data in a different way, and at some point, you can start taking all that...

In one context, you can start saying, "This is how this context wants to interpret this data," and you write it into functions. Then you call those functions instead of duplicating that functionality all over the context. That's all I'm saying.

By thinking about the interface, and especially if you've got different contexts that you can foresee having, those interfaces will influence the data and how it will be structured. That's what I'm saying. You can always write functions that define how this context is going to interpret this data. That's an interface. It's just that each context has its own interface, and it's still data.

In Clojure, it's still a HashMap. In JavaScript, it's still JSON. It's still data, but this context will say, "I'm looking at these four keys, and I need to manipulate this part of it." The interface can come later. You can defer it, or you can think about it upfront if you want. It's up to you.

How do we do this? Just use data. Use the existing data structures that you have available in your language. Or if it's something like Haskell, that makes it really easy to define new data types, then use those. Rust, Haskell, Elm, these are great languages for defining new data types, but they're data types. They're not classes. They're not objects. They're data.

Start thinking of your data not as the current value of something but as immutable facts. It's like if someone wrote down a fact on a piece of paper, and then boom, that's it. That's the fact.

What if they were wrong? The fact is they were wrong, so you want to remember that. If they were wrong, that's the problem with computers, is garbage in, garbage out. You don't want to overwrite what they said, because what they said is what you got, was what got recorded down. What you need is a new [laughs] event, a new piece of data that represents the correction.
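One way to sketch "record a correction instead of overwriting" in Python. The event shapes, the `corrects` key, and the `current_view` function are all assumptions made for illustration.

```python
# An append-only log of facts. The first reading was recorded wrong (52, not 25).
events = (
    {"id": 1, "type": "reading", "time": "15:00", "celsius": 52},
)

# Rather than editing event 1, record a new fact that corrects it.
correction = {"id": 2, "type": "correction", "corrects": 1, "celsius": 25}
events = events + (correction,)


def current_view(events):
    """Fold the log into our best current knowledge, applying corrections in order."""
    readings = {}
    for e in events:
        if e["type"] == "reading":
            readings[e["id"]] = dict(e)  # copy, so the log entry is untouched
        elif e["type"] == "correction":
            target = e["corrects"]
            readings[target] = {**readings[target], "celsius": e["celsius"]}
    return readings


print(events[0]["celsius"])               # what was originally recorded
print(current_view(events)[1]["celsius"])  # what we now believe
```

Both facts survive: what was written down at the time, and the later correction. Any context can choose which one it needs.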

It's like a little new connection you've got to make, new way of thinking about it, that a fact is a fact and data is writing down the fact. Another thing is that as these pieces of data evolve over time, as your software evolves, they're going to change. The data types are going to change. The data that you represent is going to change.

One thing that helps with that evolution is to assume as little about the data as possible. You're going to have to assume some things, like, "I can't calculate the area of this circle if I don't have a radius. I can't calculate it." You don't need to know the color of the circle to calculate the area, so don't look at the color. Look for the radius.

As it evolves, more keys will be added, and that function will still work. More data will be added to that piece of data, but the radius will still be there. It will still work. That is one way of protecting, future-proofing your code. You shouldn't look at stuff that you're not interested in at that point in the context.

The other thing is, if it has keys that you don't need then let it be. "OK, you can pass me anything else, but as long as I have that radius, I'm good." Don't tightly validate, like, "I need a piece of data with only a radius," because then you're not allowing for future evolution. Let it be, let it pass through. "I'm just going to look at the radius. Whatever else is in there, whatever." That's a tip.
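The circle example can be sketched in Python like this, with a dict standing in for the map. The key names other than `radius` are made up.

```python
import math


def area(circle):
    """Needs only 'radius'; any other keys are ignored, not rejected."""
    return math.pi * circle["radius"] ** 2


plain = {"radius": 2}
evolved = {"radius": 2, "color": "red", "label": "logo"}  # keys added later

print(area(plain) == area(evolved))
```

A stricter version that rejected unknown keys would break the day someone added `color`. This one keeps working as the data evolves.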

Let me recap. Data is data. It's just facts about events. It's in the dictionary. Look it up. It's immutable. A fact is a fact. If you ask someone a question and they answer it, the event is the answering and the fact is what they said.

They might've been wrong. They might've changed their minds, but that time that they said that thing, it happened. That's the only thing we can rely on, unless we ask them again in an hour, "What do you think now?" That's the thing that's going to circulate through the system. We can't know what's inside their head when they change.

You're probably already doing this, that you're doing it between an API boundary, this system talking to that system, this service talking to that service. You're sending JSON, or you're sending XML, or some other data format. It's just data. When you send it or when you get it, it's just data.

Functional programmers extend that more. The whole system is just data, why not? It has a lot of benefits. I certainly enjoy them. Encapsulating your data also has benefits. It does. It's sometimes nice when you have a really expressive, powerful interface to that data that comes along with the data.

It's nice, but it also has some costs. You have to write your own printer. You have to have a reader. You have to learn the interface. You have to read the manual. You have to understand how it's working. You have to trust it, that's another thing.

The benefits of not having to do all those things, it's much faster to get started, all the getters are known. Number two, they have all the printers, and the readers and a known interface. Number three, they already participate in all of the existing stuff that those data structures can participate in.

You can always add an interface: program. That's what programming is. You're making functions that interpret that data in different ways.

So you need immutable data, and then the big one is, if you don't care about the color, do not look at the color. Look at the radius if you want to calculate the area. Number two is, don't worry if someone passes you something you don't understand. You don't need it, don't worry about it, it just goes on to the next function as it is.

One thing that we do in Clojure a lot, which might scare some people but we really benefit from it, is you could have a function that takes the circle data structure. It's a map that has a radius key with a value. We're ignoring all the other keys, but we write a function that takes that map, takes out the radius, calculates the area, πr², and puts it into the map under a new key.

We do that sometimes. We're like, "I want to calculate the area one time. I'll add it to the map, so I don't have to calculate it again." If something needs the area, it's in there.

What we have to do is remember to return the modified map from that function. Because it's immutable, we have to get it out of the function somehow. We don't return a map with just radius and area. We have to return the original map with the area added. That's another tip. Always be adding data. Always have more data.
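A Python sketch of that pattern, assuming a dict in place of the Clojure map. The `with_area` function and the key names are hypothetical.

```python
import math


def with_area(circle):
    """Return a new dict: every original key, plus a computed 'area' key."""
    return {**circle, "area": math.pi * circle["radius"] ** 2}


circle = {"radius": 10, "color": "red"}
enriched = with_area(circle)

print(sorted(enriched))    # original keys are still there, plus 'area'
print("area" in circle)    # the input was not modified
```

Building the result from `{**circle, ...}` rather than from just the radius and area is what lets keys the function knows nothing about pass through to whatever runs next.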

I thought of a cool assignment for this. If you're an object-oriented programmer, and you're used to sitting down and making classes and designing these interfaces, what would it be like to pass around the data? What would that look like?

Where would you put all that functionality that was going into the methods? What if you didn't have to modify the data? What if it was immutable? What would that look like? Where would you put that functionality? What functionality wouldn't you need if you didn't have to modify it?

It's a thought experiment, because this is what we do in functional programming. I don't think I could sit here and tell you, "You're wrong." There's existence proofs that there's a lot of functional systems out there that are doing what I'm saying, that are operating on data, big systems that have been in production for a long time.

Think about what would it be if you didn't do that? I'm not asking you to change your opinions or anything, just thinking from a different perspective.

If you've liked this, wow, it's gone on pretty long. I'm already at 33 and a half minutes. If you've liked this, please do me a favor and subscribe, because then you'll be notified when other things very much like it come out.

If you [laughs] think I went too long on this, if you think it's unimportant, if you disagree, if you agree, I would love to hear from you. I want to meet people. I'm broadcasting these things out here, and I want to know that people are listening.

I'm not always confident of what I'm saying, [laughs] even though I believe it. Once you say it, you're like, "Huh, people might disagree with this," or "I've never talked to anybody about that. I wonder what they say." Please, tell me what you think, because that's what I'm doing here. I'm trying to engage with more people, and that's why I'm broadcasting.

You can email me at eric@lispcast.com. Lispcast, L-I-S-P-C-A-S-T. You can find me on Twitter, I'm @ericnormand, with a D. I'm also on LinkedIn, so find me there. Take care. See you next time. Thanks for being there.