What is Data Orientation?

We often talk about data orientation in functional programming circles. It basically means programming with data, without hiding your data. Our software is information systems, so why not treat the data in the raw? In this episode, we dive into what is data, what data orientation is all about, and how you program with it.

Transcript

Eric Normand: What is data orientation? By the end of this episode, you will have a better idea of what we mean in functional programming when we talk about data orientation. It's a very common idea in functional languages.

Hi, my name is Eric Normand and I help people thrive with functional programming.

Data orientation really quickly, it's programming with data. Let's go a little deeper than that.

What is data? I'm going to start with the basics. Now you look it up in the dictionary and it says it's facts about events. Something happens, an event, and now you have some facts about it. Maybe you measured the temperature, that's a fact about the event of measuring. This is what the thermometer told me, it's 80 degrees.

They're facts about events, that's a very clear definition. When we're doing data orientation, we're programming with data. A lot of our code becomes data transformation. We're getting data from one source and we're processing it. We're storing it, we're transmitting it, we're doing something with this data.

One of the reasons I like data orientation is that we're usually making information systems. Especially on the back end, we're taking in information, we're taking in HTTP requests. We're taking in transactions or some other API calls, or we're reading sensors. We're taking in information and we're doing something with it.

Data orientation is basically saying, "Let's leave it as data. It came in as data, it's going to go out as data. Let's treat it like data the whole way through."
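To make "data the whole way through" concrete, here is a minimal Python sketch. The JSON payload, the field names, and the 85-degree threshold are all made up for illustration; the point is only that the request is parsed into a plain dictionary, transformed into another plain dictionary, and serialized back out.

```python
import json

# Hypothetical incoming payload, e.g. the body of an HTTP request.
incoming = '{"sensor_id": "thermo-1", "temperature": 80, "unit": "F"}'

# Data in: parse it into a plain dict rather than wrapping it in a class.
reading = json.loads(incoming)

# Data transformation: build a new dict with one derived fact added.
# The threshold of 85 is an arbitrary assumption for this example.
enriched = {**reading, "hot": reading["temperature"] > 85}

# Data out: serialize the result again.
outgoing = json.dumps(enriched)
print(outgoing)
```

The original `reading` is left untouched, so another part of the system could still use it for a different purpose.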

Data has some interesting properties that make it good for this. One is that it has structure. We, computer scientists that is, have found different structures of data that are efficient both to store in memory and to use.

They're also ergonomic. We've come up with algorithms that use them. They are convenient for the programmer. I'm talking about things like arrays and hash maps, numbers, strings, all these things that we're used to. These are just data.

You can build structure, you can rely on the structure of those things. Data comes in, it's got certain attributes. Those attributes might have a name and a value. It becomes natural to think maybe this would be a good candidate to use a hash map for because I'm going to be looking up these values by the attribute name.
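As a small illustration of attributes with names and values, here is a sketch using a Python dictionary as the hash map. The attribute names (`sensor_id`, `temperature`, `unit`, `location`) are hypothetical.

```python
# Hypothetical reading: attribute names mapped to their values.
reading = {
    "sensor_id": "thermo-1",
    "temperature": 80,
    "unit": "F",
}

# Look up a value by its attribute name.
temperature = reading["temperature"]

# An attribute that is absent can be given a default instead of raising an error.
location = reading.get("location", "unknown")

print(temperature, location)
```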

That's the structure part. The next property is sort of a plus and a minus. It's a double-edged sword: data requires interpretation. It's not meaningful by itself. If I said the number five, you have no idea what that means. Even if I put a unit on it, five pounds, you still don't know what it means.

It requires context before it can be used. That context might be embedded in the data, it might be from the software or the interpreter's perspective. It really requires interpretation. You have to have a purpose. Why am I reading this? What can I learn from it? What decisions can I make from it?

Without those things, the data doesn't have any meaning by itself. As an example, and I know this is silly, I look at data as a really old tradition. It goes back to the earliest record keeping that we have.

Before the invention of writing, you could mark the number of cattle that you were trading with someone on a stone tablet or something, or a clay tablet. You're counting, and you're keeping records. Each little mark is a fact about an event. A cow passed through this gate, you're marking that, and then you have this aggregate data.

They're just marks on a clay tablet, but if you know how to interpret them you can glean a lot from them. Then the clay tablet gets buried somewhere. 4,000 years later, an archaeologist finds it and now they have to interpret it. They might figure out from context, "Oh, this was counting cattle."

What the person who originally made this thing was concerned with was getting a fair deal. They wanted to make sure that everyone's getting paid the right amount of money. That's their concern. Now this archaeologist can interpret the same marks in a totally different way.

The archaeologist can now say, "Oh well, let's learn about the economy of this civilization. Oh look, this year was a really good year. Look how many cattle came through here. Now this year was really bad because it had fewer. They must have been hungry at that time. Then that might explain why they went to war."

Whatever they can glean from it, that is now a different purpose for the same data. That's what I mean when I say data requires interpretation. It's meaningless by itself, but on the plus side, it can be interpreted in multiple ways. It's neutral. You're writing down the temperature, and what do you do with that? Well, that is up to you.
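The clay tablet story can be sketched in Python. Everything here is invented for illustration: the tallies, the price per head, and the threshold for a "bad year". What it shows is one piece of raw data read by two functions with two different purposes.

```python
# Hypothetical tablet: each entry records how many cattle passed the gate in a year.
tallies = [
    {"year": 1, "cattle": 40},
    {"year": 2, "cattle": 55},
    {"year": 3, "cattle": 12},
]

def payment_due(tallies, price_per_head):
    """The trader's interpretation: what is owed for all the cattle counted."""
    return sum(t["cattle"] for t in tallies) * price_per_head

def lean_years(tallies, threshold):
    """The archaeologist's interpretation: which years look like bad years."""
    return [t["year"] for t in tallies if t["cattle"] < threshold]

print(payment_due(tallies, price_per_head=2))  # 214
print(lean_years(tallies, threshold=20))       # [3]
```

Neither function owns the tallies, and neither needs to know the other exists.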

At this point, I've said data orientation is something I like because we're writing information systems. We're taking in data, we're writing it out again. We're storing it, we're transforming it, processing it, generating new data from that original data. It seems like, "Well, isn't that what programming is?"

I would say, "Yes. Except there are some styles of programming, some paradigms you might say, where things like data hiding are more prominent, are important parts of that paradigm." Like in object-oriented programming, you might say we want to hide this data behind an interface.

We want to hide the pieces of information, the facts that we have. You're going to have to call methods or send me messages. I will do the interpreting for you. That's basically what object orientation is saying — I will do the interpretation for you. Data requires interpretation and so I will be your interpreter.

Sometimes that's nice because that interface lets you do polymorphism and stuff like that. Different things have different data, but they can still have the same interface. That's really nice, but that's not data orientation.

I'm trying to juxtapose the two. If you have hiding, you're adding a layer of interpretation on your data. That is required. You have to go through this interpreter layer to read the data. It makes it much harder to interpret the data in multiple ways.

When you have the data raw, you can interpret it however you want. That is data orientation — leaving it raw so that you can interpret it. Different parts of the system have different purposes. They can read it and do different things with it.

If you need to hide it, then what you're doing is you're going to bake in different interpreters into that interface. This will just make the class get bigger and bigger because you're going to need methods that this thing needs over there. You're going to need methods that that thing needs.

Or you could have some system for having different interfaces on the same data but then how do you get the data between them? It's really hard.

In data orientation, we just prefer to not hide, just expose it. We'll let whatever interpreters who want to run on it, who want to interpret it, do whatever they need to do. That is data orientation.
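Here is a rough Python sketch of the contrast, under made-up names. The `Reading` class stands in for the hiding style, where callers only get the interpretations the class chooses to offer. The plain dictionary stands in for the data-oriented style, where any function can interpret the data. The 85-degree threshold is an arbitrary assumption.

```python
class Reading:
    """Hiding style: the value is kept private behind methods."""

    def __init__(self, fahrenheit):
        self._fahrenheit = fahrenheit

    def is_hot(self):
        # The only interpretation offered. A new one means changing this class.
        return self._fahrenheit > 85


# Data-oriented style: the reading is a plain dict that any code can read.
reading = {"temperature": 80, "unit": "F"}

def is_hot(reading):
    """One interpreter of the raw data."""
    return reading["temperature"] > 85

def to_celsius(reading):
    """A second interpreter, written separately, with no class to modify."""
    return round((reading["temperature"] - 32) * 5 / 9, 1)

print(Reading(90).is_hot(), is_hot(reading), to_celsius(reading))
```

The trade-off mentioned earlier still applies: the class gives you an interface you can use polymorphically, while the dictionary gives you freedom to interpret.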

All right. I'm going to recap. Data orientation is just programming with data. No hiding. Data is facts about events. In data-oriented programming, you tend to do a lot of data transformations. Your system is inputting, storing, transmitting, processing data. That's what it does. Not everything, but most of what you're doing is one of those things.

Data has structure to it. It needs that structure because you need some common way of understanding, of using the data. What I mean by that is you know the structure of a string. You can access each character by index and the characters are going to be some kind of Unicode thing. You can append two of them. It has a structure that we understand.
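For instance, in Python, a string's structure can be relied on like this. The example strings are arbitrary.

```python
# "\u00e9" is the single Unicode code point for "e with acute accent".
greeting = "h\u00e9llo"

first = greeting[0]           # access a character by index
second = greeting[1]          # the accented character
code_point = ord(second)      # its Unicode code point as an integer

combined = greeting + " world"  # append one string to another

print(first, second, code_point, combined)
```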

Then it requires interpretation. It's a double-edged sword. It requires it, but it allows different ways of interpreting the same data.

Awesome. If you like this episode, you should subscribe because I've got two more episodes lined up. They're right here. Here's my notes. Two more of these ready to record, and they'll be coming right down after this one. You should subscribe, and you'll get them.

If you want to, you should go to lispcast.com/podcast. You'll see links to subscribe and also to get in touch with me on social media. I love getting into discussions. A lot of these topics came out of questions that people had because I said something, and they didn't understand it. I was being confusing. I needed to explain more, so this is what this is.

I also love just getting into discussions. Maybe you disagree with me. Maybe you have a different idea of what data orientation means.

On lispcast.com/podcast, you'll also find all the old podcasts, the previous podcasts, the episodes with audio, video, and text transcripts, so you can consume it however you want.

All right. This has been my thought on functional programming. I'm Eric Normand. Thanks for listening and rock on.