Growth and breakage

Summary: For the most freedom to change, prefer required arguments and optional return values.

In his 2016 Clojure/conj keynote, Rich Hickey talked about the evils of breaking changes. He made the strong claim that every "change" is one of two things:

  1. Growth
    1. provide more
    2. require less
    3. bash bugs
  2. Breakage
    1. require more
    2. provide less
    3. different stuff under same name

I've been wanting to systematically analyze what kinds of changes are breaking and what kinds are growth. For this analysis, we're going to look at a function called request in an imaginary HTTP client library. It makes a web request given a request map and returns a response map.

(request {:method :GET :url "http://example.com/"}) ;=> {:body "OK!" :status 200}

Notice that the function takes a map and returns a map.

About the schema

Let's look at a very simple schema. A key can either be unused, optional, or required. Now we can make a table showing the results of changing from one of those schemes to another.

Often this schema gets overloaded. An API might say a key is optional, but also say "you need one of these three optional keys". That is, each one is optional on its own, but together, one must be present. Or the docs might say "if key X has the value strict, then the optional key Y must be present." In other words, they're conditionally required, and they're overloading the optional scheme.

To me, these show a lack of fit between the simple unused/optional/required schema and the semantics of the API. The interdependencies between keys must be relayed in the documentation and not in the schema itself. However, in my experience, it is common practice, so I will analyze it.

Because these out-of-band schemas are so varied, I'll ignore the details and model them with a fourth scheme: conditionally required.

We have four possible schemes:

  1. unused: the key has no defined semantics (per the library) and so should be ignored.
  2. optional: the key defines semantics for when it is present and when it is absent.
  3. conditionally required: the key defines semantics and must be present depending on the semantics of other keys.
  4. required: the key defines semantics and must be present.

We will assume that unused keys are ignored by the consumer of the key. The consumer is the library function in the case of arguments and the library client in the case of return values.
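To make the schemes concrete, here is a minimal sketch of a schema written as plain data, with a check for required keys. The names request-schema and missing-required, and the optional :headers key, are my own assumptions for illustration and not part of any real library.

```clojure
(require '[clojure.set :as set])

;; Hypothetical schema for the request map. Each known key is :required or
;; :optional. Any key that does not appear here is "unused" and is ignored.
(def request-schema
  {:method  :required
   :url     :required
   :headers :optional})

(defn missing-required
  "Returns the set of required keys that are absent from the map m."
  [schema m]
  (let [required (set (for [[k scheme] schema
                            :when (= scheme :required)]
                        k))]
    (set/difference required (set (keys m)))))

(missing-required request-schema {:method :GET :url "http://example.com/"})
;; => #{}

;; :color is unused, so it is ignored; :method is required and missing.
(missing-required request-schema {:url "http://example.com/" :color :blue})
;; => #{:method}
```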

Arguments

For our example, our function takes a single map as an argument. That gives us a great place for growth and breakage: we can change the keys in the map.

Let's read an example from this table. If we have an optional key and we want to change it to required, that's a breaking change.

Before ↓ / After →       unused         optional       conditionally required   required
unused                   (no change)    growth         questionable             breakage
optional                 questionable   (no change)    questionable             breakage
conditionally required                                 (no change)
required                 questionable   growth         growth                   (no change)

Let's go over each change.

unused → optional

In this case, existing clients have not been sending this key. And now we're making it optional. So the existing call sites are safe, as long as we haven't changed the semantics.

For example, we could add a new optional key :timeout. If you don't pass the key, the function behaves as before (it never times out). If you do pass the key, the request will time out.
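Here is a rough sketch of how the function could treat the new key. The helper request-plan is hypothetical (it only describes what the imaginary client would do, without touching the network), and I'm assuming the timeout is a number of milliseconds.

```clojure
(defn request-plan
  "Describes what the hypothetical client would do for a request map.
  No network call is made."
  [{:keys [method url timeout]}]
  (cond-> {:method method :url url :wait :forever}
    ;; Only when :timeout is present do we depart from the old behavior.
    (some? timeout) (assoc :wait timeout)))

;; An existing call site: no :timeout, so the old behavior is kept.
(request-plan {:method :GET :url "http://example.com/"})
;; => {:method :GET, :url "http://example.com/", :wait :forever}

;; A new call site that opts in.
(request-plan {:method :GET :url "http://example.com/" :timeout 5000})
;; => {:method :GET, :url "http://example.com/", :wait 5000}
```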

unused → conditionally required

This one is questionable. It could be okay, but you have to navigate it carefully.

Here's a case where it breaks things. Let's say you add a conditionally required key :on-invalid-tls that determines the behavior of the request if the TLS certificate doesn't validate. This key is only required if the :url starts with "https". However, existing clients have already been passing URLs with "https", so this will break them.

On the other hand, we could add two keys at the same time to get this functionality. We could add an optional key :validate-tls. If that key is set to true, then we require the :on-invalid-tls key. Notice this doesn't break existing clients. Nobody has been passing the :validate-tls key so far, so no existing client will trigger the condition.
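The safe variant could be checked with something like the sketch below. The function name check-tls-keys and the value :fail are assumptions I made up for the example.

```clojure
(defn check-tls-keys
  "Returns :ok, or throws when :validate-tls is true but :on-invalid-tls
  is missing. Requests without :validate-tls never trigger the condition."
  [req]
  (if (and (true? (:validate-tls req))
           (not (contains? req :on-invalid-tls)))
    (throw (ex-info ":on-invalid-tls is required when :validate-tls is true"
                    {:request req}))
    :ok))

;; An existing call site. It uses https but never sends :validate-tls.
(check-tls-keys {:method :GET :url "https://example.com/"})
;; => :ok

;; A new call site that opts in and supplies the conditionally required key.
(check-tls-keys {:method :GET :url "https://example.com/"
                 :validate-tls true :on-invalid-tls :fail})
;; => :ok
```

Only a call that sets :validate-tls to true without :on-invalid-tls throws, and no existing client can be making that call.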

unused → required

This is a breaking change. If the new version requires a key that was previously unused, all existing call sites are now broken!

If we add a new required key :timeout, we break all existing clients that called that function.

optional → unused

This one you need to be careful with. It's more complicated than it looks.

  1. Your clients either send the value or they don't. And both of those are still valid when the library stops reading that key.
  2. However, it is very hard to make this change without changing the semantics of those function calls.

Let's look at two examples.

Sometimes we want to deprecate an optional key. Let's say we had an optional key called :prefer-https. If you passed it true, it would try https versions of urls first, then fall back to http. But now we want to ignore it. Existing call sites might rely on the semantics of :prefer-https, and now they're broken.

However, sometimes the key is not so important to the semantics. For example, what if we had a function that took an optional sort function? The idea was that you got to choose which sort algorithm was used to sort the results. Over time, though, the developers decided that was too much control for too little benefit. They want to deprecate the :sort-fn key and use the same sort function every time. This is okay. It doesn't change the semantics in any significant way.
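As a sketch of the :sort-fn case, here are two hypothetical versions of such a function. The names list-results-v1 and list-results-v2 and the :results key are assumptions for illustration only.

```clojure
;; Old version: callers may choose the sorting algorithm with :sort-fn.
(defn list-results-v1
  [{:keys [results sort-fn] :or {sort-fn sort}}]
  (sort-fn results))

;; New version: :sort-fn is now unused. It is ignored if present.
(defn list-results-v2
  [{:keys [results]}]
  (sort results))

;; A call site that still passes :sort-fn gets the same sorted results.
(def call {:results [3 1 2] :sort-fn (fn [xs] (sort xs))})

(list-results-v1 call) ;; => (1 2 3)
(list-results-v2 call) ;; => (1 2 3)
```

This holds only as long as every sort function clients passed produced the same ordering. A client that passed a function sorting in a different order would see a change in semantics, which is exactly the risk described above.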

optional → conditionally required

This is questionable for the same reason going from unused to conditionally required was questionable. We may be able to navigate this if we don't break existing clients. See the examples for that above.

optional → required

This one will break your clients. Some clients send the key, and some do not. The ones who do not are now broken.

required → unused

This one is questionable for the same reason moving from optional to unused is questionable: Ignoring a key that was previously required may change the semantics. As long as you're not changing the semantics, it is safe.

required → optional

This one is growth. All of your existing clients are passing the key, so they don't need to change.
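One way to relax a required key is to supply a default. In this sketch I'm assuming, purely for illustration, that :method becomes optional and defaults to :GET. The helper name normalize-request is hypothetical.

```clojure
(defn normalize-request
  "Fills in a default for :method when the caller leaves it out."
  [req]
  ;; Keys in req win over the defaults, so existing callers are unaffected.
  (merge {:method :GET} req))

;; An existing call site, which still passes the key.
(normalize-request {:method :POST :url "http://example.com/"})
;; => {:method :POST, :url "http://example.com/"}

;; A new call site that relies on the default.
(normalize-request {:url "http://example.com/"})
;; => {:method :GET, :url "http://example.com/"}
```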

required → conditionally required

This one is safe because all clients are currently sending the key, and that is valid.

Discussion

On ignored keys

We've been working under the assumption that we are truly ignoring unused keys. It means that a client can pass data that the function doesn't need and knows nothing about.

This can cause a problem if the function considers a key unused but later makes it optional or required with particular semantics. Existing call sites were fine sending their own version of the key, but now it means something different.

The solution here is to reserve keys. In Clojure, we do that by using namespaced keys. The library can say "Please don't use keys from this namespace except for those we specify," because at any point a key in that namespace could be assigned a meaning.

This means that we shouldn't have used namespace-less keywords. Instead, we should have namespaced them:

me.ericnormand.http/url
me.ericnormand.http/method
me.ericnormand.http/timeout
etc
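Written as Clojure keywords, a request map with reserved keys might look like the sketch below. The client key :my.app/trace-id and the helper reserved-keys are hypothetical names I added to show how the library could separate its own keys from everything else.

```clojure
(def req
  {:me.ericnormand.http/method :GET
   :me.ericnormand.http/url    "http://example.com/"
   ;; The client's own key, outside the reserved namespace.
   :my.app/trace-id            "abc123"})

(defn reserved-keys
  "Returns only the entries whose keys are in the library's reserved namespace."
  [m]
  (into {}
        (filter (fn [[k _]]
                  (and (keyword? k)
                       (= "me.ericnormand.http" (namespace k)))))
        m))

(reserved-keys req)
;; => a map with only the :me.ericnormand.http/method and
;;    :me.ericnormand.http/url entries
```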

On which schema to prefer

The table also reveals the preferred schema for arguments: required.

Look at the row for required in the table above. I'll reproduce it here.

Before ↓ / After →   unused         optional   conditionally required   required
required             questionable   growth     growth                   (no change)

Notice that no change in this row is breakage, while the unused and optional rows both contain breakage. The required row does contain a questionable change (required → unused), but that change can be navigated to make it growth. To get the most freedom to change your function in the future, you should use required arguments. You can always make them optional or unused later. One corollary is that you shouldn't make an argument optional without a good reason.

Now, you can't require every key. Many of your keys are going to be unused. The only option for bringing in new keys is to make them optional.

Argument Summary

  • Ignore unused keys.
  • Use keys in a reserved namespace.
  • Prefer required keys at a function's inception.
  • New keys need to be optional.
  • Don't change semantics when deprecating keys.

Returns

Return values are an entirely different story, almost the opposite of arguments.

For the examples, we'll still be using the same function, but this time we're talking about the return map:

(request {:method :GET :url "http://example.com/"}) ;=> {:body "OK!" :status 200}

We're not going to handle conditionally required for return values. In my experience, conditional return values are equivalent to optional values.

Before ↓ / After →   unused        optional      required
unused               (no change)   growth        growth
optional             growth        (no change)   growth
required             breakage      breakage      (no change)

Symmetrical to our assumption about unused arguments, we will assume the client ignores keys they don't need or understand.

unused → optional

Adding new keys to a return map doesn't break anything. Existing clients were not relying on the key, and now the key is present, and they will ignore it.

unused → required

Again, existing clients were not relying on the key, so they won't break.
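A small sketch of why adding return keys is safe: a client that destructures only the keys it needs never sees the new one. The client function ok? and the new key :elapsed-ms are assumptions for illustration.

```clojure
;; A client that only reads :status from the response map.
(defn ok?
  [{:keys [status]}]
  (= 200 status))

;; The old response.
(ok? {:body "OK!" :status 200})
;; => true

;; A newer response with an extra key. The client ignores it.
(ok? {:body "OK!" :status 200 :elapsed-ms 12})
;; => true
```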

optional → unused

This one is trickier but still safe. Because the key was optional, clients had to handle both cases: 1) when the key was present and 2) when the key was not present. The key is now never present, which existing clients could already handle.

optional → required

This one is safe. The existing clients had to handle both the presence and the absence of the key. Now it's always present, so the clients should be able to handle it.

required → unused

This one is unsafe. Clients were previously able to rely on the presence of the key, and now it is absent.

required → optional

This one is unsafe. Clients were previously able to rely on the presence of the key, and now it is sometimes absent.
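Both unsafe changes fail the same way on the client side. Here is a sketch with a hypothetical client function, status-class, that was written against the promise that :status is always present.

```clojure
;; A client that relies on :status always being in the response.
(defn status-class
  [resp]
  (quot (:status resp) 100))

(status-class {:body "OK!" :status 200})
;; => 2

;; If the library stops returning :status, the same client code throws,
;; because arithmetic on nil raises a NullPointerException.
(try
  (status-class {:body "OK!"})
  (catch NullPointerException _ :client-broke))
;; => :client-broke
```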

Discussion

On which schema to prefer

In this case, we see the opposite of what we saw for arguments. Starting with required keys gives you the least freedom. You essentially need to support them forever. unused and optional keys are actually the most free.

This finding directly contradicts my tendency. I want to be nice and please the clients. So one of the first things I do is list all the nice data I'm going to return, sometimes even when no client needs it yet! I'm promising so much. But that promise is a promise that can be broken.

For example, I might add the IP address of the server to the response of the request function. Why not? I have to look it up anyway and it may be useful.

But! Now that commits me to always returning the IP. It may not be so convenient to return that in the future.

I need to rethink that tendency. I'd rather promise that I'll never break my clients than promise some return value I may not be able to provide. I could either not return the IP or make the key optional, saying I'll return it if I have it, but the client should handle both cases.
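If the IP address were an optional key, a client would be written to handle both cases from the start. This is only a sketch: the key name :server-ip and the function describe-server are hypothetical.

```clojure
(defn describe-server
  "Handles both the presence and the absence of the optional :server-ip key."
  [resp]
  (if-let [ip (:server-ip resp)]
    (str "served by " ip)
    "server unknown"))

(describe-server {:body "OK!" :status 200 :server-ip "203.0.113.7"})
;; => "served by 203.0.113.7"

(describe-server {:body "OK!" :status 200})
;; => "server unknown"
```

A client written this way keeps working whether the library always returns the key, sometimes returns it, or stops returning it altogether.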

On namespaced keys

It may be beneficial if the function promised to only add new keys from a namespace the library authors control. This would avoid collisions with other keys that may be in the map.

Return Summary

  • Clients should ignore keys they don't need.
  • Prefer optional keys for return values. Promise as little as possible.