Clojure Gazette 174: Deepening the tree

Deepening the tree

Clojure
Gazette

Issue 174 - May 22, 2016

Please consider advertising in the Gazette.

Hi Clojurists,

I wrote last week that we need to write more, smaller abstractions. That's the key to reducing the risk of developing the wrong abstraction. However, no matter how great your language, abstractions have a line count overhead. Line count is highly correlated with bug rate and maintenance cost. Should we pay the cost of more abstraction?

Here's a snippet of code from one of my projects:

                  `{"    "}(+ x y))){"               "};; end of function                         above it{"\n"}                         {"                            "};; one blank line of overhead                         for separation{"\n"}(defn sum [nums]{"            "};; one                         line to name the abstraction{"\n"}                         {"  "}(reduce + 0 nums)){"\n"}`
                  
                

I could choose to write the reduce expression inline, where I need it. But instead, I chose to increase the number of abstractions and name it sum at the top level. That adds two lines of overhead, one line for the body, and I still have to call sum in another line. So that's four lines of code where one line sufficed. 4x! Try it in any language. Abstraction has a line count cost. For a small program like this, there are more lines of overhead than lines of useful code. It seems ridiculous to scale this to programs of hundreds of thousands of lines .

But not so fast! Things don't always scale linearly. Let's look at the call graph of an imaginary program. This program is one -main function and all of the lines call library or core functions.

The size of this code is eight lines plus one line of overhead to name it -main (9 lines). It lies at one extreme of the spectrum: there's no abstraction, just straight up system calls.

At the other extreme, we have an equivalent function with one line. That function calls two functions on one line. Each of those calls two functions on one line, etc.

The leaf nodes are the same as before. But our cost is now huge. There's one line to name the main, the one line in the body of main, and each function takes three lines to define. Total: 20 lines. So far, it's not looking good for abstraction.

What if you wanted to maximize the effect of adding one more line of code to the program? In the case of the first program, you could add one more system call to the -main function in one line. But in the second program, you could call one of the functions it already calls again. If you call c, d, e, or f, one extra line would result in two system calls. If you call a or b, your new line of code would result in four system calls! So we know at least in principle that we can get big leverage if we reuse stuff near the top of the tree .

To take advantage of this effect in our programs, we need our code to approximate this ideal system. We can do that by making our tree deeper rather than shallower and making more of our functions reusable. If a or b is not reusable, we can't really call it twice like we did in the example.

To make our tree deeper, we need to have lots of nodes that call other nodes. Practically, we need to write functions that call other functions that we've written. We need to build abstractions on top of our abstractions .

Making functions reusable is somewhat of an art. It requires a lot of knowledge about your language's features and your domain. But a great start is the rule "small abstractions are more reusable" if only because small abstractions have less that you don't want to reuse.

Each abstraction adds a fixed line count cost to our code, but they let us do exponentially more with each line. If your code base is significant, the exponential growth of system calls will outweigh the linear growth of overhead, so smaller abstractions can significantly decrease line count .

The keys to maximizing the exponential growth of the potential of one line of code is to build abstractions in terms of other abstractions you write (building a deeper tree) and making them small (so they're more likely to be reusable.) There's a third factor that we didn't explore, which is the branching factor of the tree. In our example, the branching factor was two. What effect does a higher branching factor have on our code size and the riskiness of abstractions? We'll talk about that next time.

Rock on!

PS Want to get this in your email? Subscribe !
PPS Want to advertise to smart and talented Clojure devs ?