Clojure Gazette 174: Deepening the tree
Your friendly reminder that if you aren't reading Eric's newsletter, you are missing out…
Lots of great content in the latest newsletter! Really glad I subscribed. Thanks, Eric, for your work.
Eric's newsletter is so simply great. Love it!
Deepening the tree
Issue 174 - May 22, 2016
Please consider advertising in the Gazette.
Hi Clojurists,
I wrote last week that we need to write more, smaller abstractions. That's the key to reducing the risk of developing the wrong abstraction. However, no matter how great your language, abstractions have a line count overhead. Line count is highly correlated with bug rate and maintenance cost. Should we pay the cost of more abstraction?
Here's a snippet of code from one of my projects:
`{" "}(+ x y))){" "};; end of function above it{"\n"} {" "};; one blank line of overhead for separation{"\n"}(defn sum [nums]{" "};; one line to name the abstraction{"\n"} {" "}(reduce + 0 nums)){"\n"}`
I could choose to write the reduce
expression inline, where I need it.
But instead, I chose to increase the number of abstractions and name it
sum
at the top level. That adds two lines of overhead, one line for
the body, and I still have to call sum
in another line. So that's four
lines of code where one line sufficed. 4x! Try it in any language.
Abstraction has a line count cost. For a small program like this, there
are more lines of overhead than lines of useful code. It seems
ridiculous to scale this to programs of hundreds of thousands of
lines .
But not so fast! Things don't always scale linearly. Let's look at the
call graph of an imaginary program. This program is one -main
function
and all of the lines call library or core functions.
The size of this code is eight lines plus one line of overhead to name
it -main
(9 lines). It lies at one extreme of the spectrum: there's
no abstraction, just straight up system calls.
At the other extreme, we have an equivalent function with one line. That function calls two functions on one line. Each of those calls two functions on one line, etc.
The leaf nodes are the same as before. But our cost is now huge. There's one line to name the main, the one line in the body of main, and each function takes three lines to define. Total: 20 lines. So far, it's not looking good for abstraction.
What if you wanted to maximize the effect of adding one more line of
code to the program? In the case of the first program, you could add one
more system call to the -main
function in one line. But in the second
program, you could call one of the functions it already calls again. If
you call c
, d
, e
, or f
, one extra line would result in two
system calls. If you call a
or b
, your new line of code would result
in four system calls! So we know at least in principle that we can get
big leverage if we reuse stuff near the top of the tree .
To take advantage of this effect in our programs, we need our code to
approximate this ideal system. We can do that by making our tree
deeper rather than shallower and making more of our functions reusable.
If a
or b
is not reusable, we can't really call it twice like we did
in the example.
To make our tree deeper, we need to have lots of nodes that call other nodes. Practically, we need to write functions that call other functions that we've written. We need to build abstractions on top of our abstractions .
Making functions reusable is somewhat of an art. It requires a lot of knowledge about your language's features and your domain. But a great start is the rule "small abstractions are more reusable" if only because small abstractions have less that you don't want to reuse.
Each abstraction adds a fixed line count cost to our code, but they let us do exponentially more with each line. If your code base is significant, the exponential growth of system calls will outweigh the linear growth of overhead, so smaller abstractions can significantly decrease line count .
The keys to maximizing the exponential growth of the potential of one line of code is to build abstractions in terms of other abstractions you write (building a deeper tree) and making them small (so they're more likely to be reusable.) There's a third factor that we didn't explore, which is the branching factor of the tree. In our example, the branching factor was two. What effect does a higher branching factor have on our code size and the riskiness of abstractions? We'll talk about that next time.
Rock on!
PS Want to get this in your email?
Subscribe !
PPS Want to advertise to smart and talented Clojure
devs
?