Avoid Naming at All Costs

Summary: If naming is one of the two hardest things in programming, it follows that every other possible solution (except those few involving cache invalidation) should be attempted before naming something. As a corollary, bad names are a code smell.

Phil Karlton (attributed):

There are only two hard things in Computer Science: cache invalidation and naming things.

Programs used to be written in binary. That is, the only names we had were those the computer understood directly. Over time, we've improved programming languages so that they are better for people to read and write. A lot of that improvement comes from building in higher-level concepts, such as functions and garbage collection. But the majority of the improvement comes from the ability to name things.

Naming things helps us organize our ideas about the software^1. A program has to deal with many levels of abstraction. We write about how data gets represented in the machine, how that relates to domain concepts, and what the user is intending to do. Naming things helps us organize those, just like good headings in an outline help us organize ideas about a topic.

And yet it is one of the hardest problems we solve regularly. There are times when I have looked for a good name for hours, only to find none. A bad name can cost a lot. Someone coming in later could be confused, wasting precious cognitive resources.

Naming is hard because of a fundamental property of abstraction: the name does not have to relate at all to what it is naming. A name is just a string of letters. It's not meaningful to the machine, just to us. Names can lie, and that's a fundamental part of carrying meaning. If you could not lie, you could not convey new truthful information, either. And even truthful names can begin to diverge from the code they describe over time.

Naming is hard because it's a different kind of thinking from the rest of programming. We are coding along, in a nice engineering flow, and all of a sudden, we need a nice, human-readable name. We need to find compassion for the reader from within our cold, calculating programmer trance. This is very difficult.

Naming is hard because names need to be at the right abstraction level. Are you doing a low-level trie operation? Or is it a concept from the problem domain? Another choice to make. But it gets worse! Domain experts invent new words all the time. They're called jargon. And they're very useful. Maybe you should invent a name, instead of trying to find a name. Another difficult choice.

When I'm having trouble naming something, there is often an easy change to the code that makes the name unnecessary. If we can avoid having to name something (while also keeping the code readable), we've avoided a very costly and error-prone process. Here are a few alternatives I use a lot:

  • Inline the code. Inline expressions don't need names. This works really well with anonymous functions.

  • Use threading. Instead of naming each intermediate value, thread the value through the process without naming it.

  • Name something else at a different level of abstraction. We're constantly switching the level of abstraction we're working at. Try going up or down the levels. It could be that there is something easy to name at an adjacent level that does the same thing.

  • Split it in two. Are you trying to name something that's really two things? If the two parts are easier to name, it's a good sign that you should split.

You'll notice these all play with the means of combination instead of naming. Recombine to avoid naming when naming is hard.
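Here is a rough sketch of the first two alternatives in Python. Python has no built-in threading macro, so `thread` below is a hypothetical helper written for this example, and the `shout_*` functions are made up for illustration. It is one possible way to do it, not the only one.

```python
from functools import partial, reduce


def thread(value, *steps):
    """Pass value through each step in order (hypothetical helper)."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Every intermediate value gets a name, and each name is a chance to get it wrong.
def shout_named(text):
    stripped = text.strip()
    words = stripped.split()
    upper_words = [word.upper() for word in words]
    return " ".join(upper_words)


# The same process, threaded: the intermediate values are never named.
def shout_threaded(text):
    return thread(
        text,
        str.strip,
        str.split,
        partial(map, str.upper),  # inlined instead of a named helper
        " ".join,
    )


print(shout_named("  avoid naming things "))
print(shout_threaded("  avoid naming things "))

# Inlining with an anonymous function: no need to invent a name
# for a one-off "last letter" key function.
print(sorted(["pear", "fig", "banana"], key=lambda fruit: fruit[-1]))
```

Both `shout_*` versions return the same string. The threaded one still reads top to bottom, but the only names left are the ones that already existed.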

Since there are so many alternatives to naming that are easier than naming, it follows that if there is a bad name in our code, it means there might be a better way to organize it that we overlooked. That makes it a code smell. A little (re)factoring can get rid of that name.
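As a small illustration of a bad name pointing at a missed refactoring, here is a sketch in Python. The function names and the comma-separated line format are assumptions invented for the example. The awkward "and" in the first name hints that the function is really two things.

```python
# Hard to name: it parses and it totals, so the name needs an "and".
def parse_and_total(lines):
    total = 0.0
    for line in lines:
        total += float(line.split(",")[1])
    return total


# Split in two: the first half is easy to name, and the second half
# turns out to be the built-in sum, which needs no new name at all.
def parse_amounts(lines):
    return [float(line.split(",")[1]) for line in lines]


receipt = ["coffee,3.50", "bagel,2.25"]
print(parse_and_total(receipt))
print(sum(parse_amounts(receipt)))
```

After the split, the hard name is gone and the caller combines the two pieces itself.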

^1 Abelson and Sussman in SICP 1.1:

A powerful programming language is more than just a means for instructing a computer to perform tasks. The language also serves as a framework within which we organize our ideas about processes. Thus, when we describe a language, we should pay particular attention to the means that the language provides for combining simple ideas to form more complex ideas. Every powerful language has three mechanisms for accomplishing this:

  • primitive expressions, which represent the simplest entities the language is concerned with,

  • means of combination, by which compound elements are built from simpler ones, and

  • means of abstraction, by which compound elements can be named and manipulated as units.