PurelyFunctional.tv Newsletter 364: Tip: seek the model and enshrine it in code

Issue 364 - February 10, 2020

Clojure Tip 💡

seek the model and enshrine it in code

Let's say you wanted to write a CSV parser in Clojure. You want a function that takes a java.io.File and returns the parsed data.

(defn parse-csv [file]
  ;; placeholder: what should this return?
  nil)

The question is, what should this return?

Many people treat the rows of a CSV file like records. The first row defines the headers, and each following row is a record with each column corresponding to the field defined in the headers. That would imply that you'd parse a CSV file into a sequence of hash maps, because that's how we do records in Clojure. It's very convenient.

You might further try to do some automated inference of types, like Excel might do. You could try to parse the fields in the CSV as numbers, dates, etc. Then the user of your library wouldn't have to do it themselves. Very convenient.

But that's not what you should do. You shouldn't convert to hash maps and you shouldn't infer types. If you did those things, your CSV library might be more convenient for certain use cases, but it would be more difficult to work with for others. Those may well be the most common use cases, but the trade isn't worth it.

Why not?

Well, let's say I'm not in the 80% of use cases where I want hash maps. Now I have to write code that converts them back. But I've lost information! I don't have the order of the columns anymore because they've been put into a map.
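Here is one concrete way the round trip loses information. This sketch assumes the library builds each record with `zipmap` (a likely choice, but an assumption); the header and row values are made up. A repeated column name collapses into a single key, and because a map does not record column positions, turning a record back into a row needs a column order supplied from somewhere else. `record->row` is a hypothetical helper.

```clojure
;; Hypothetical file whose header repeats a column name.
(def header ["id" "note" "note"])
(def row    ["7" "first" "second"])

;; zipmap assocs pairs left to right, so the later "note" overwrites the earlier one.
(def record (zipmap header row))
;; record => {"id" "7", "note" "second"}   ("first" is gone)

;; Hypothetical helper: to rebuild a row you must supply the key order yourself,
;; because the map itself does not remember column positions.
(defn record->row [ks m]
  (mapv m ks))

(record->row header record)
;; => ["7" "second" "second"]
```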

And what does your library do when a row is missing a value? Does it make a guess? Does it put in nil? Your library now has to cope with situations you, its programmer, can't even imagine. To make things easier for me, you're taking on a monumental task. There are billions of CSVs out there, so that 20% is still quite a lot.
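To see that "make a guess" is a real decision, consider what a naive `zipmap`-based conversion does with ragged rows (again assuming `zipmap`; the values are placeholders). `zipmap` stops at whichever input runs out first, so it quietly picks a policy for you in both directions:

```clojure
(def header ["name" "age" "city"])

;; Short row: "city" is simply absent from the map (not nil, not an error).
(zipmap header ["Ada" "36"])
;; => {"name" "Ada", "age" "36"}

;; Long row: the extra value is silently dropped.
(zipmap header ["Alan" "41" "London" "?"])
;; => {"name" "Alan", "age" "41", "city" "London"}
```

Whether a library should drop, pad with nil, or throw here depends on the file, which is exactly what the library author can't know.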

What if I'm not in the 80% of use cases where I want the types parsed? I now have to unparse them. But by parsing those 30-digit decimals into doubles, you've lost information. When you parsed those dates, did you know that the timezone offset was in another column? You couldn't possibly have known the structure of my CSV.
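A quick illustration of the decimal case: a double carries only about 15 to 17 significant decimal digits, so parsing a 30-digit value as a double can't be undone, while `bigdec` keeps every digit. The number below is an arbitrary example.

```clojure
(def big-number "123456789012345678901234567890.12")

;; Parsed as a double, then written back out in plain notation:
;; the low-order digits (and the .12) are gone for good.
(def via-double (.toPlainString (bigdec (Double/parseDouble big-number))))

;; Parsed as a BigDecimal, the value round-trips exactly.
(def via-bigdec (str (bigdec big-number)))
;; via-bigdec => "123456789012345678901234567890.12"
```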

What set of types should you parse? What if they don't parse? Leave the string as-is? You have to make lots of decisions as the library author, and now the end programmer has to cope with potentially different types in their columns. Sometimes it's a string (because it didn't parse as a number) and sometimes it's a number. And sometimes it's a date!
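A hypothetical "helpful" inference function shows how a single column ends up with mixed types. `naive-infer` is not from any library; it is a sketch that tries a long, then a double, and otherwise leaves the string alone.

```clojure
(defn naive-infer
  "Hypothetical type inference: try Long, then Double, else keep the String.
  Note that Double/parseDouble also accepts inputs like \"NaN\" and \"Infinity\",
  one more decision the library author silently inherits."
  [^String s]
  (or (try (Long/parseLong s) (catch NumberFormatException _ nil))
      (try (Double/parseDouble s) (catch NumberFormatException _ nil))
      s))

;; One column, three different types for the consumer to handle.
(map naive-infer ["42" "3.50" "4 2" "N/A"])
;; => (42 3.5 "4 2" "N/A")
```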

In short: assuming more than you can leads to complexity.

In the CSV example, we should do what it says on the tin: parse CSV files. CSV is not quite a spec, but a kind of de facto standard. Rows are separated by newlines, columns are separated by commas, and there are rules for quoting and escaping newlines and commas if you need to. That's it. There are no rules for header rows. There are no rules for types. There are no rules for whether all of the rows have the same number of columns. You as a library programmer shouldn't assume anything more. (But you do have to read the docs!)
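To make "only those rules" concrete, here is a minimal sketch of a parser that encodes nothing beyond them. It is not clojure.data.csv's implementation; `parse-csv-string` is a hypothetical name, and it takes a String rather than a java.io.File to keep the example self-contained. It is also lenient about malformed input (for example, a stray quote in the middle of an unquoted field), which a real library has to decide how to handle.

```clojure
(defn parse-csv-string
  "Hypothetical sketch: parses a CSV string into a vector of rows, each row a
  vector of strings. Commas separate fields, newlines separate rows, and
  double-quoted fields may contain commas, newlines, and \"\" (an escaped quote).
  No headers, no types, no row-length checks."
  [^String s]
  (let [n (count s)]
    (loop [i 0
           ^StringBuilder field (StringBuilder.)
           row []
           rows []
           in-quotes? false]
      (if (>= i n)
        ;; End of input: flush the pending field/row unless nothing is pending.
        (if (and (zero? (.length field)) (empty? row))
          rows
          (conj rows (conj row (str field))))
        (let [c (.charAt s i)]
          (cond
            ;; Inside quotes: "" is an escaped quote; a lone " ends the quoted part.
            in-quotes?
            (cond
              (and (= c \") (< (inc i) n) (= (.charAt s (inc i)) \"))
              (recur (+ i 2) (.append field \") row rows true)

              (= c \")
              (recur (inc i) field row rows false)

              :else
              (recur (inc i) (.append field c) row rows true))

            (= c \")       (recur (inc i) field row rows true)
            (= c \,)       (recur (inc i) (StringBuilder.) (conj row (str field)) rows false)
            ;; Drop CR outside quotes so CRLF line endings behave like LF.
            (= c \return)  (recur (inc i) field row rows false)
            (= c \newline) (recur (inc i) (StringBuilder.) [] (conj rows (conj row (str field))) false)
            :else          (recur (inc i) (.append field c) row rows false)))))))

(parse-csv-string "a,\"b,c\"\n1,2\n")
;; => [["a" "b,c"] ["1" "2"]]
```

Note that a ragged input such as `"a,b\nc"` comes back as `[["a" "b"] ["c"]]`: the parser reports what is in the file and leaves the question of whether that's an error to the caller.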

So what should your CSV parser output? A sequence of sequences of strings. That's all the CSV "spec" tells you. Pass the burden of records, types, and other edge cases to the consumer of your library, because they know their own use case.
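Here is a sketch of what passing the burden to the consumer can look like. The rows are a hard-coded stand-in for parser output, and `rows->records` and `coerce-record` are hypothetical consumer-side functions. The choices they make (a header row, integer quantities, exact decimal prices) are assumptions this particular caller can safely make because it knows its own data.

```clojure
;; Stand-in for parser output: a sequence of sequences of strings.
(def rows
  [["item" "quantity" "price"]
   ["bread" "1" "3.50"]
   ["milk" "2" "1.25"]])

(defn rows->records
  "Consumer's choice: this file has a header row, so build one map per
  remaining row, keyed by keywordized header names."
  [[header & body]]
  (map #(zipmap (map keyword header) %) body))

(defn coerce-record
  "Consumer's choice: quantity is an integer, price is an exact decimal."
  [record]
  (-> record
      (update :quantity #(Long/parseLong %))
      (update :price bigdec)))

(map coerce-record (rows->records rows))
;; => ({:item "bread", :quantity 1, :price 3.50M}
;;     {:item "milk", :quantity 2, :price 1.25M})
```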

This is one of the reasons I feel very at home in the Clojure community. clojure.data.csv does exactly what it should do and nothing more. The library can be finished: once the bugs are found and fixed, there's no reason to add more features. The library seeks the model (in this example, the CSV "spec") and enshrines it in code.

In some languages, the CSV parser continues to get feature updates. They missed the model. They have frameworks for defining records and handling the edge cases, frameworks for parsing types, defining new types, and resolving types that don't quite parse (what happens if there's a space in your number field, etc.). And of course, there are more bugs due to the interactions of those features. Their library diverges because it has the model wrong. When you find the correct model, your library should converge on a correct implementation.

We can see this same principle at play in the Ring Spec, which defines how HTTP requests and responses are represented in Clojure data. Although Ring, the library, does quite a lot, the Ring Spec seeks to encode the model: the HTTP spec. We could imagine the Ring Spec making all sorts of assumptions about what a request is about (such as which CRUD operation it represents). But it doesn't. It is a straightforward mapping from the textual representation of HTTP requests to a Clojure map with keyword keys. It's convergent. And that has let a lot of other libraries bloom in the ecosystem.
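As an illustration, here is a request map using keys from the Ring Spec, plus a handler, which in Ring is just a function from a request map to a response map. The values are made up, and a real request would usually also carry a `:body` input stream. Only the top-level keys are keywords; header names are lowercased strings. Nothing in the map interprets the request as a CRUD operation: `:request-method` just records the HTTP method.

```clojure
;; Illustrative request map (values made up).
(def example-request
  {:server-port    80
   :server-name    "localhost"
   :remote-addr    "127.0.0.1"
   :uri            "/hello"
   :query-string   "name=world"
   :scheme         :http
   :request-method :get
   :protocol       "HTTP/1.1"
   :headers        {"host" "localhost"}})

;; A handler: plain function from request map to response map.
(defn handler [request]
  {:status  200
   :headers {"Content-Type" "text/plain"}
   :body    (str "You requested " (:uri request))})

(handler example-request)
;; => {:status 200, :headers {"Content-Type" "text/plain"}, :body "You requested /hello"}
```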

To sum up: find the model your library should enshrine. Read about the model and experiment with how to encode it. Seek an encoding that converges instead of diverges.

Single-Page Application Course Launch 🚀

The launch sale continues! Buy the course now for 50% off. The price on the page already reflects the discount.

I've been hard at work creating a new course. The course is in a new format. It's project based, so we build something real and specific out of a set of tools. Think of it like a Lego kit. The kit shows you what you will build. It gives you all the pieces you will need. And it gives you instructions. You build the kit, you learn something, then you can add all the pieces to your collection. The kit leaves you a little bit more capable, with more pieces and skills. That's the idea behind these courses.

In my new course, you can learn to build an in-browser Markdown editor with live preview. We use shadow-cljs (with npm) and Netlify to host it. Check out the free introduction video.

Book update 📖

Chapter 6 is out! Buy it now: Grokking Simplicity.

Also, the liveBook is ready for prime time! This means you can preview the content of the book before buying and you can read it online after you buy. Amazing!

You can buy the book and use the coupon code TSSIMPLICITY for 50% off.

Podcast episode 🎙

In my most recent episode, we are reading from The Early History of Smalltalk by Alan Kay. Listen/watch/read here. Be sure to subscribe and tell your friends.

Clojure Challenge 🤔

Last week's challenge

The challenge in Issue 363 was to calculate the moving average of a list of numbers. You can see the submissions.

You can leave comments on these submissions in the gist itself. Please leave comments! You can also hit the Subscribe button to keep abreast of the comments. We're all here to learn.

This week's challenge

total price of a shopping list

I have a shopping list, represented as a vector. Inside the vector are hash maps for each item, which look like this:

{:item :bread
 :quantity 1
 :price 3.50}

There is a 10% tax added to the sum.

Write a function that takes a shopping list and returns a map like this:

{:total-before-tax 75.40
 :tax 7.54
 :total-after-tax 82.94}

As usual, please reply to this email and let me know what you tried. I'll collect them up and share them in the next issue. If you don't want me to share your submission, let me know.

Rock on!
Eric Normand