Service upgrades and versioning

Summary: We have many options when choosing how we deal with unknown data in our services and their clients. Each choice has consequences, though none is the right choice for all situations. In this article, we address the options systematically.

In today's environment of highly distributed service-oriented systems, we must address the concern of how the independently deployed services can be updated as requirements change without breaking the system. Each service is both a server (servicing many clients) and a client (connecting to many servers).

The main problem is uncoordinated deploys

One of the main advantages of a service-oriented architecture is that the services may be deployed independently. This feature allows the teams working on those services to act largely without coordination. In theory, this means each team can move faster.

When two services communicate, they must exchange data of a known format. The specification for that format is known as a schema. The client initiates a request of a given schema. And the server responds with data of a given schema.

The problem comes when a schema needs to change. If the client and server could upgrade simultaneously, there would be no problem. But because we want them to upgrade independently, one will inevitably deploy first. After the first one has updated but before the second one has updated, there is a time when they are using different schemas. Problems can arise.

Different problems arise depending on the choices we make for how the client and server enforce the schema. To explore them, we'll use a very simple message with a simple schema. Our message looks like this:

{
  "required": "abc",
  "optional": "xyz",
  "unknown": "jkl"
}

There is one key required in the message called "required" for clarity. Likewise, there is one optional key called "optional" for the same reason. And there is one key that is not mentioned in the schema called "unknown". The values associated with the keys do not matter in this situation. We're talking only about the presence or absence of keys.

If a client sends a message to a server missing the "required" key, the server cannot proceed. That's what required means. The server has very little recourse but to respond with an error. The key must be in the message, and a message without the key is an error.

If a client sends a message without the "optional" key, that's okay. The server can continue with or without that value, usually because there is a default value the server can use. The key may or may not be in the message, and the server should process the message.

The "unknown" key is different. An unknown key does not show up in the schema, so the server has a choice. There are three options:

  1. Respond with an error
  2. Accept the message, but silently filter out the unknown key
  3. Accept the message, leaving the unknown key, and pass it along downstream

These options also exist for a client dealing with a server's response.
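To make the three options concrete, here is a minimal sketch of a receiver that validates a message and then applies one of them. The names (Mode, SchemaError, receive) are ours for illustration and do not come from any particular library.

```python
from enum import Enum


class Mode(Enum):
    STRICT = "strict"  # option 1: respond with an error
    FILTER = "filter"  # option 2: silently drop unknown keys
    PASS = "pass"      # option 3: keep unknown keys and pass them along


class SchemaError(Exception):
    """Raised when a message does not conform to the schema."""


def receive(message, required, optional, mode):
    """Validate `message`, then apply the chosen unknown-key strategy."""
    missing = set(required) - set(message)
    if missing:
        # A missing required key is an error in every mode.
        raise SchemaError(f"missing required keys: {sorted(missing)}")

    known = set(required) | set(optional)
    unknown = set(message) - known
    if unknown and mode is Mode.STRICT:
        raise SchemaError(f"unknown keys: {sorted(unknown)}")
    if mode is Mode.FILTER:
        return {k: v for k, v in message.items() if k in known}
    # Mode.PASS, or Mode.STRICT with nothing unknown: keep everything.
    return dict(message)
```

With the example message, the filter mode returns only the "required" and "optional" keys, the pass mode returns all three, and the strict mode raises SchemaError.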

Although this is a simple message, it will give us enough to analyze. The "required" key stands in for any required key in your schema. Likewise for "optional" and "unknown".

In this article, we systematically go through these three different strategies for dealing with messages received by the server or by the client. And we will see how these options affect the upgradeability of a service. Our upgrades will consist of changing the status of the keys in the message. For instance, we could upgrade the request schema to change the "unknown" key from unknown to required.

Required to Optional

If the request's required key is changed to optional, this has no effect on the client. The server can make this change and update with no need of an update from the client. However, if the client changes to optional first, and the server still treats it as required, the client may send a request without the key. Therefore, servers should make this change and redeploy before notifying the clients to avoid deployment ordering problems.

If the response's required key is changed to optional, there is a problem. The client is expecting the key to be there. It may be unprepared for the previously required key to be missing. For this change to occur without errors, the client must be upgraded first. If there are many clients, it may be impossible to upgrade them all before the server is upgraded.

Recommendation: A request schema may change a required key to optional at any time without consequence, as long as the server is upgraded before any clients. The response schema can never change a required key to optional without significant expense.

One should further consider making more response keys optional than is customary.
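One way a client might treat response keys as optional is to read them with a default. The sketch below is only an illustration: the function name read_response and the fallback value are assumptions, not part of any real API.

```python
def read_response(response, required, defaults):
    """Return the fields the client uses, filling in defaults for optional keys.

    `required` lists keys the client cannot work without; `defaults` maps
    each optional key to the value to use when the server omits it.
    """
    missing = [key for key in required if key not in response]
    if missing:
        raise KeyError(f"response is missing required keys: {missing}")

    fields = {key: response[key] for key in required}
    for key, default in defaults.items():
        # An absent optional key is not an error; fall back to the default.
        fields[key] = response.get(key, default)
    return fields
```

A client written this way keeps working whether or not the server sends the "optional" key, so the server is free to stop sending it later.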

Optional to Required

If the request's optional key is changed to required, the clients who do not send the key will need to upgrade before the server. Again, this is very difficult, if not impossible.

If the response's optional key is changed to required, the clients will not be affected. They should already be prepared to handle the presence of the key. However, the server will need to upgrade before the clients. If clients upgrade first, they may have a problem with responses that do not include the key.

Recommendation: A response schema may change an optional key to required at any time without consequence as long as the server is upgraded first. A request schema can never change an optional key to required without significant expense.

Strict strategy: Responding with an error

A very popular strategy is to disallow all unknown keys. Usually, the message is validated against the schema immediately upon receipt using a strict mode which disallows unknown keys. If it does not validate for any reason, an error is returned. We will call this strategy strict.

There are two main benefits to this strategy:

  1. It prevents unknown and unchecked data from entering the system. It reduces the security surface area.
  2. It detects errors early. Debugging is faster when errors occur closer to their causes.

Now let's see how this strategy deals with upgrades.

Required to unknown

Now that we're dealing with unknown keys, the "error on unknown" strategy comes into play.

If the request's required key is removed from the schema (so it becomes unknown), there's a real problem. If the server upgrades first, it will error on all requests from old clients (since it won't recognize the previously required key). If the client upgrades first, it will omit a key that the server still thinks is required.

The solution is a three-phase approach. It's costly, but if you can pull it off, it will work. The trick is to first change the required key to optional on the server. Then upgrade the clients over time to unknown. Because the key is optional on the server, it can handle old clients that are sending the key and new clients that are not. Finally, when all clients have been upgraded, you can upgrade the server to remove the key.
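The phases can be checked with a small simulation. This is a sketch under assumptions: "legacy" is a hypothetical name for the key being removed, and each client generation is represented by a single example request.

```python
def strict_accepts(message, required, optional):
    """Strict validation: every required key present, no unknown keys."""
    keys = set(message)
    return set(required) <= keys and keys <= set(required) | set(optional)


# Hypothetical requests from the two client generations.
OLD_CLIENT_REQUEST = {"legacy": "abc"}  # still sends the key
NEW_CLIENT_REQUEST = {}                 # no longer sends the key

# Server schema in each phase, as (required keys, optional keys).
SERVER_SCHEMAS = {
    "before": ({"legacy"}, set()),
    "phase 1 and 2": (set(), {"legacy"}),  # key optional while clients migrate
    "phase 3": (set(), set()),             # key removed from the schema
}


def rollout_report():
    """For each phase, report which client generations the server accepts."""
    report = {}
    for phase, (required, optional) in SERVER_SCHEMAS.items():
        report[phase] = {
            "old client": strict_accepts(OLD_CLIENT_REQUEST, required, optional),
            "new client": strict_accepts(NEW_CLIENT_REQUEST, required, optional),
        }
    return report
```

The report shows that only the middle phase accepts both generations of client. The last phase rejects old clients, which is why it has to wait until they are gone.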

If the response's required key is removed from the schema, there will also be problems. If the strict client is upgraded first, the server will be sending messages which cause errors in the client. If the server is upgraded first, the client will fail due to missing keys that it still considers required.

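The solution is a three-phase approach. First, upgrade the clients to make the key optional. Once enough clients are upgraded, upgrade the server to make the key unknown. Finally, the clients can upgrade again to remove the key from their own schema, although nothing breaks if they never do.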

Optional to unknown

Again, the strict strategy comes into play.

If the request's optional key is removed from the schema, the order of the upgrades is important. If the server upgrades first, some clients will send the optional key, which causes an error on the server. If the client upgrades first, there is no problem, since the client will stop sending the key (now unknown), and the server is fine with that.

If you can ensure that all clients are upgraded, it will be safe to upgrade the server. One way to know that is to measure how many requests still send that key. Once three weeks pass with no requests containing that key, you should be safe. Or maybe you should wait three months. Who knows? There will always be a risk if you don't control all clients.
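Measuring this can be as simple as remembering when the key was last seen. The sketch below assumes the server can call a hook on every request; the class name KeyUsageTracker and its methods are hypothetical.

```python
from datetime import datetime, timedelta


class KeyUsageTracker:
    """Remembers when each watched key was last seen in a request."""

    def __init__(self, watched_keys, started_at):
        # Until a key is seen, measure quiet time from the start of tracking.
        self.last_seen = {key: started_at for key in watched_keys}

    def record(self, message, now):
        """Call once per incoming request."""
        for key in self.last_seen:
            if key in message:
                self.last_seen[key] = now

    def quiet_for(self, key, now):
        """How long it has been since `key` last appeared in a request."""
        return now - self.last_seen[key]


# Usage: decide whether the key has been absent for three weeks.
start = datetime(2024, 1, 1)
tracker = KeyUsageTracker({"optional"}, started_at=start)
tracker.record({"required": "abc", "optional": "xyz"}, start + timedelta(days=2))
tracker.record({"required": "abc"}, start + timedelta(days=10))
quiet = tracker.quiet_for("optional", start + timedelta(days=23))
safe_to_remove = quiet >= timedelta(weeks=3)
```

The threshold is a judgment call. The tracker only tells you how long the key has been absent, not whether an old client will come back.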

Otherwise, there are two options. One is to leave the key as optional, but ignore it on the server. Mark it as deprecated in the documentation. Old clients may continue to send it with no effect. And new clients are encouraged not to send it. Perhaps this strategy deserves a new schema mode: deprecated.
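A deprecated mode could look something like the following sketch. It assumes a schema expressed as a mapping from key to status; the names are ours for illustration and are not taken from an existing validation library.

```python
REQUIRED, OPTIONAL, DEPRECATED = "required", "optional", "deprecated"


class SchemaError(Exception):
    """Raised when a message does not conform to the schema."""


def validate_strict(message, schema):
    """Strictly validate `message`, then drop any deprecated keys.

    `schema` maps each known key to REQUIRED, OPTIONAL, or DEPRECATED.
    Deprecated keys are accepted from old clients but never reach the
    rest of the server.
    """
    missing = [k for k, status in schema.items()
               if status == REQUIRED and k not in message]
    if missing:
        raise SchemaError(f"missing required keys: {missing}")

    unknown = [k for k in message if k not in schema]
    if unknown:
        raise SchemaError(f"unknown keys: {unknown}")

    return {k: v for k, v in message.items() if schema[k] != DEPRECATED}
```

Old clients that still send the deprecated key get a normal response, and truly unknown keys are still rejected, so the server stays strict about everything else.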

The other option is to change from a strict strategy to a filter strategy for this particular key. Then you can remove it from the schema.

If the response's optional key is removed from the schema, there is an easier upgrade path. If the client upgrades first, the client might throw an error when the server responds with the key. If the server upgrades first, there is no problem on the clients. In fact, the clients will never need to upgrade to the new schema.

Unknown to required

Sometimes you want to add a completely new key to the schema. How does this work?

If the request's unknown key is changed to required, we have upgrade problems. If the server is upgraded first, it will error due to missing required fields. If the client upgrades first, the server will error due to unknown fields.

The solution is to change the key from unknown to optional on the server and deploy the server first. Then you begin updating the clients so that they always send the key, treating it as required. Once you're comfortable that the clients have been updated, you can change the key to required in the server's schema.

If the response's unknown key is changed to required, we have upgrade problems in either order. If the server upgrades first, the strict client will error when it receives the key it doesn't recognize. If the client upgrades first, the client will error because a required key is not found.

The solution, again, is to move through the optional mode. First, make the unknown key optional on the clients. Once the clients are upgraded, the server can set the key to required and start sending it. Then you can upgrade the clients again to make it required.

These three-phase approaches are very expensive. They require lots of coordination between client and server teams. This is not always possible. And this coordination is one of the downsides of using the strict strategy.

Unknown to optional

Unknown to optional is easier than unknown to required.

If the request's unknown key is changed to optional, the upgrade order is important. If the client is upgraded first, it may start sending the key that the server does not know about yet, which results in an error. However, if the server is upgraded first, the clients won't be sending the key, which is totally allowed by the schema.

If the response's unknown key is changed to optional, the upgrade order is also important. If the server is upgraded first, it may send a key the strict clients are not able to handle yet. If the clients are upgraded first, they can handle the messages without the key until the server is upgraded. However, I should stress again the difficulty of upgrading clients.
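The ordering rules for the strict strategy can be derived mechanically. The sketch below rests on a simplifying assumption: a sender always sends a key it considers required, may or may not send an optional key, and never sends a key it considers unknown. The function names are ours.

```python
REQUIRED, OPTIONAL, UNKNOWN = "required", "optional", "unknown"


def may_reject(sender, receiver):
    """True if a conforming sender can emit a message a strict receiver rejects.

    `sender` and `receiver` are the status each side gives to one key.
    """
    may_omit = sender in (OPTIONAL, UNKNOWN)
    may_send = sender in (REQUIRED, OPTIONAL)
    if receiver == REQUIRED:
        return may_omit
    if receiver == UNKNOWN:
        return may_send
    return False  # an optional key is accepted whether present or absent


def safe_first_movers(old, new, direction):
    """Which side can deploy `new` while the other is still on `old`?"""
    if direction == "request":
        sender, receiver = "client", "server"
    else:
        sender, receiver = "server", "client"
    safe = set()
    if not may_reject(new, old):  # sender upgrades first
        safe.add(sender)
    if not may_reject(old, new):  # receiver upgrades first
        safe.add(receiver)
    return safe
```

For example, safe_first_movers(UNKNOWN, OPTIONAL, "request") gives {"server"}, and safe_first_movers(UNKNOWN, REQUIRED, "request") gives the empty set, which is the case that needs the three-phase approach.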