What is Data Modeling?

Recorded by Eric Normand. Published: July 29, 2019. Updated: September 20, 2019.

This is an episode of The Eric Normand Podcast.


Data Modeling is a common technique in functional programming. It means capturing the essence of the concepts of your domain, their attributes, and their relationships in data.

Transcript

Eric Normand: What is data modeling? In this episode, I want to talk about what it is, the definition of this term, and why we talk about it so much in the functional programming community.

My name is Eric Normand. I help people thrive with functional programming. I went outside today, doing an old-school season one style walk-around episode. If you're hearing some noise on the audio, that's what's going on.

All right. Data modeling is something that we do in functional programming. Basically, with any kind of programming, you're dealing with some problem domain, some business problem. It might not be a business, but it's some domain that you're trying to represent in the software, in the memory of the computer.

Depending on the paradigm, you would use different parts of the language, different features of the language for representing those things. Data modeling just refers to representing those entities and concepts, their attributes and the relationships between them using data, using just data structures.

In an object-oriented system, you would probably do an object-oriented modeling where you would use a class to represent a concept. Then the class would have member variables. Those would represent the attributes. Then you'd use references between the classes to represent the relationships. That would be like a "has a" or a "has many" relationship.

That's a very standard way of representing a domain using classes in an object-oriented language. We don't do that in functional programming. What we tend to do is just represent it as data.

If you're using a typed language like Haskell, you would define new data types that have the attributes and relationships you need built in. If you're using something like Clojure, we tend to use the built-in data structures like the hash maps, and vectors, and things to represent those attributes.
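A minimal sketch of the second style, written in Python with plain dictionaries and lists standing in for Clojure's hash maps and vectors. The customer and order domain, the field names, and the `orders_for` helper are all made up for illustration and do not come from the episode.

```python
# Hypothetical domain: a customer who has many orders.
# Entities are plain dictionaries; attributes are keys.
customer = {
    "id": 1,
    "name": "Ada",
    "email": "ada@example.com",
}

# The "has many" relationship is just data too: each order
# carries the id of the customer it belongs to.
orders = [
    {"id": 100, "customer_id": 1, "items": ["keyboard", "mouse"]},
    {"id": 101, "customer_id": 1, "items": ["monitor"]},
    {"id": 102, "customer_id": 2, "items": ["cable"]},
]


def orders_for(customer, orders):
    """Return the orders that belong to the given customer."""
    return [o for o in orders if o["customer_id"] == customer["id"]]
```

The function lives apart from the data, so nothing about the dictionaries has to change when a new question needs answering.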

What becomes interesting is very often in a problem, the relationships are more important than the entities. The entities are important, but they're not as important as the relationships. In those cases, you're probably better off making that relationship first class.

Let me give an example. I could have an entity. Let's say I want to build a graph. The nodes are my entities. Then the edges between the nodes, I could represent those as pointers. I go to a node. This could be in OO. It could be functional. It doesn't matter. I have some pointer to another node. The node has a list of the other nodes it is connected to. I can walk this graph.
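The node-first picture could be sketched like this in Python. The three-node graph and the `reachable` helper are assumptions chosen for the example; each node simply maps to the list of nodes it points to.

```python
# Node-first representation: each node holds the nodes it is connected to.
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
}


def reachable(graph, start):
    """Walk the graph from `start` and return every node reached, in visit order."""
    seen = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        # Push neighbors in reverse so they are visited in the order listed.
        stack.extend(reversed(graph[node]))
    return seen
```

Calling `reachable(graph, "A")` follows the connections outward from `A`, which is the "walking" described above.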

What if I invert that, and what if I say, "I'm going to keep a list of all the edges?" An edge would be...Let's say it's just a tuple of node A to node B. It's a pair, so it's a directed graph. I could have edges going both ways, but they're represented separately.

I'd say, "Here's a list of all of the edges." In that case, the relationship, the edge is what's important. If it is important, that's the way I'd want to represent it. Now, I can count edges really easily. I can do stuff like walk through that list.
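As a small sketch of the edge-first view, assuming a made-up graph with nodes `A`, `B`, and `C`, the whole graph can be a list of `(from, to)` tuples in Python:

```python
# Edge-first representation: the graph is just a list of directed edges.
# B -> C and C -> B are two separate entries, one per direction.
edges = [
    ("A", "B"),
    ("A", "C"),
    ("B", "C"),
    ("C", "B"),
]

# Counting edges is a single call on the list.
edge_count = len(edges)

# Checking for a specific edge is a membership test.
has_a_to_c = ("A", "C") in edges
has_c_to_a = ("C", "A") in edges

# The nodes can be recovered by walking through the list.
nodes = sorted({node for edge in edges for node in edge})
```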

I can create an index of, give me all the things that A is pointing to, give me all the things that B is pointing to, all the things that C is pointing to. I can make an index of those, and also the reverse index and make that one too instead of seeing the graph as pointers that I have to walk between.
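One way those two indexes might be built from an edge list, sketched in Python. The function name `build_indexes` and the sample edges are illustrative assumptions.

```python
from collections import defaultdict


def build_indexes(edges):
    """Build a forward index (node -> nodes it points to) and a
    reverse index (node -> nodes pointing to it) from (src, dst) pairs."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for src, dst in edges:
        outgoing[src].append(dst)
        incoming[dst].append(src)
    # Convert to plain dicts so missing keys are not created on lookup.
    return dict(outgoing), dict(incoming)


edges = [("A", "B"), ("A", "C"), ("B", "C")]
outgoing, incoming = build_indexes(edges)
```

Both indexes are derived data: they can be rebuilt from the edge list at any time, so the list stays the single source of truth.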

This is the kind of thinking that we do in data modeling. How can I represent this domain, this domain of graphs, in a way that will allow me to do what I need to do with my algorithms? That's basically all it is.

In functional programming, in opposition to object-oriented programming, we tend to keep our data separate from the functions, from the behavior. That gives us a tremendous amount of flexibility, because there's no thing called a node anymore where the methods go.

I don't know if it's a psychological thing or a mental thing, but when you start making a hierarchical ontology of graphs in your type system or in your class inheritance system, in my experience, I get stuck. I can't do these mental manipulations anymore.

I've got this class called node. I've already added 10 methods to it. I can no longer think of, "Oh, a graph is just a collection of edges." I still see it as a graph is nodes connected with references. It just makes it a lot harder to switch the way I see it.

Whereas I find with data modeling, I see it in a different way. If I started with nodes as primary and had edges secondary, but I have an algorithm that needs the edges primary, I just think, "Oh, I just need to create a function that transforms a node-first, node-primary representation into an edge-first representation, and then I'll have what I need."
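A sketch of what such a transformation could look like in Python, assuming the node-first form is a dictionary of neighbor lists and the edge-first form is a list of pairs. Both function names are hypothetical.

```python
def nodes_to_edges(graph):
    """Turn a node-first graph (node -> list of neighbors)
    into an edge-first list of (src, dst) pairs."""
    return [(src, dst) for src, targets in graph.items() for dst in targets]


def edges_to_nodes(edges):
    """Turn an edge list back into a node-first graph.
    Nodes that appear in no edge cannot be recovered."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])  # make sure the target node exists too
    return graph


graph = {"A": ["B", "C"], "B": ["C"], "C": []}
edges = nodes_to_edges(graph)
round_trip = edges_to_nodes(edges)
```

Because both representations are plain data, switching between them is a short function in each direction rather than a redesign.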

I see it as an advantage. You start to think more flexibly. It's just my opinion. It's my experience, but it's what I see.

All right. This has been me talking about data modeling, why we do it as functional programmers, what it is. You're probably doing it too already. That's what we do. We take a problem domain, and we represent it as data so that we can run algorithms on it. It's pretty simple.

This has been my thought on functional programming. I'm Eric Normand. You can find this episode and past episodes at lispcast.com/podcast. You'll also find links to subscribe and to my social media. I love getting in touch with people. I love having people ask me questions, get into discussions.

If you agree or disagree, please get in touch with me in the best medium for that particular question. You be the judge. No need to ask me how best to get in touch with me. If you think Twitter is the right place, let's do it on Twitter. No problem. Awesome. See you later. Bye.