The User Wizard Scenario

Summary: We start with a Haskell type that models a feature very well. As the feature changes, the data model eventually evolves into something like Clojure's hash maps. Discussion follows.

Here is a thought experiment I call "The User Wizard Scenario". In this scenario, we are on the back-end team of a web application. There is also a front-end team building the in-browser UI. Product managers bring us new requirements, which force changes to the data model. And we are coding in Haskell.

Here is one walk-through of the scenario, written as a narrative.

A simple start

The product manager of the application brought us a new requirement: we needed to store information about a user. Specifically, we needed:

  • userID, an integer that uniquely identifies a user
  • firstName, their first name
  • lastName, their last name
  • email, so that we can contact them with important account information

In Haskell, we described this type as follows:

    data User = User { userID    :: Int
                     , lastName  :: Text
                     , firstName :: Text
                     , email     :: Text
                     }

Saving to the database

We wrote a routine for saving this to the database. For simplicity in this example, we modeled it as an IO action.

    saveUser :: User -> IO ()

But we noticed a problem: the userID that was part of User came from the database. We wouldn't have a userID until after the user's information was stored. There were several options, but we settled on a simple one. We created a new type that represented the user's information before the information was saved to the database.

    data UserInfo = UserInfo { uiFirstName :: Text, uiLastName :: Text, uiEmail :: Text }

    saveUser :: UserInfo -> IO User

To use it, we first filled out a UserInfo and passed it to saveUser, which stored it in the database and returned a User.
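As a rough sketch of that flow, the block below fakes the database with an in-memory counter. The names FakeDb, newFakeDb, and saveUserWith are assumptions made for illustration; the narrative's saveUser takes no database argument and talks to a real database.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.IORef (IORef, atomicModifyIORef', newIORef)
import Data.Text (Text)

data UserInfo = UserInfo
  { uiFirstName :: Text
  , uiLastName  :: Text
  , uiEmail     :: Text
  }

data User = User
  { userID    :: Int
  , firstName :: Text
  , lastName  :: Text
  , email     :: Text
  } deriving (Eq, Show)

-- Hypothetical stand-in for the database: a counter holding the next free ID.
type FakeDb = IORef Int

newFakeDb :: IO FakeDb
newFakeDb = newIORef 1

-- The ID appears only in the result, so callers cannot invent one up front.
saveUserWith :: FakeDb -> UserInfo -> IO User
saveUserWith db info = do
  newID <- atomicModifyIORef' db (\next -> (next + 1, next))
  pure (User newID (uiFirstName info) (uiLastName info) (uiEmail info))
```

The point of the split is that a User value can only come out of the save routine, never go into it.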

Here was our first kind of volatility: even within a single workflow, data requirements vary. The front-end team created a form that captured the first name, last name, and email address and passed them to our server.

Multi-step wizard

After a few weeks, it turned out the form was not working. There was only a 10% completion rate of the form, just because there were too many fields shown at once. The product manager insisted we try a wizard to get the information one step at a time. Each step of the wizard would require a round trip to the server. We knew that by the end of the wizard, we needed a complete UserInfo. We came up with a few new types to represent the progress along the way.

    data UserInfoStep1 = UserInfoStep1 Text
    data UserInfoStep2 = UserInfoStep2 Text Text
    data UserInfoStep3 = UserInfoStep3 Text Text Text

And some operations for transitioning between them:

    captureFirstName ::                  Text -> UserInfoStep1
    captureLastName  :: UserInfoStep1 -> Text -> UserInfoStep2
    captureEmail     :: UserInfoStep2 -> Text -> UserInfoStep3

    saveUser :: UserInfoStep3 -> IO User

This request was a second type of volatility: changing requirements over time.

We coded up the three-step wizard and it worked well.
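The narrative gives only the signatures, so the bodies below are one plausible way to fill them in, assuming each step type simply carries the values captured so far. The value named completed is hypothetical and exists only to show the steps composing.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

data UserInfoStep1 = UserInfoStep1 Text            -- first name
data UserInfoStep2 = UserInfoStep2 Text Text       -- first name, last name
data UserInfoStep3 = UserInfoStep3 Text Text Text  -- first name, last name, email
  deriving (Eq, Show)

captureFirstName :: Text -> UserInfoStep1
captureFirstName = UserInfoStep1

captureLastName :: UserInfoStep1 -> Text -> UserInfoStep2
captureLastName (UserInfoStep1 fn) ln = UserInfoStep2 fn ln

captureEmail :: UserInfoStep2 -> Text -> UserInfoStep3
captureEmail (UserInfoStep2 fn ln) em = UserInfoStep3 fn ln em

-- The steps compose in exactly one order; skipping a step is a type error.
completed :: UserInfoStep3
completed =
  captureEmail (captureLastName (captureFirstName "Ada") "Lovelace") "ada@example.com"
```

Because saveUser accepts only a UserInfoStep3, an incomplete wizard cannot reach the database in this design.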

Changing the order of steps

But then the product team had a new request. They wanted to change the order of the steps. At first, we complied by simply changing the signatures of the functions and letting the compiler's error messages guide us through the refactoring. We wound up with:

    captureFirstName :: UserInfoStep1 -> Text -> UserInfoStep2
    captureLastName  ::                  Text -> UserInfoStep1
    captureEmail     :: UserInfoStep2 -> Text -> UserInfoStep3

This means the last name was captured first, then the first name, followed by the email. That worked and we were thankful for the compiler errors.

However, the next day, the product team asked for a different order. We anticipated these requests would go on for a while, since there were six possible orders. So we tried to nip the whole problem in the bud:

    data UserInfo = NoInfoYet
                  | JustFirstName Text
                  | JustLastName  Text
                  | JustEmail     Text
                  | FirstAndLastNames Text Text
                  | FirstNameAndEmail Text Text
                  | LastNameAndEmail  Text Text
                  | FirstAndLastNamesAndEmail Text Text Text

    captureFirstName :: UserInfo -> Text -> UserInfo
    captureFirstName ui fn =
      case ui of
        NoInfoYet -> JustFirstName fn
        JustFirstName _ -> JustFirstName fn
        JustLastName ln -> FirstAndLastNames fn ln
        JustEmail em    -> FirstNameAndEmail fn em
        FirstAndLastNames _ ln -> FirstAndLastNames fn ln
        FirstNameAndEmail _ em -> FirstNameAndEmail fn em
        LastNameAndEmail ln em -> FirstAndLastNamesAndEmail fn ln em
        FirstAndLastNamesAndEmail _ ln em -> FirstAndLastNamesAndEmail fn ln em

    captureLastName  :: UserInfo -> Text -> UserInfo
    captureEmail     :: UserInfo -> Text -> UserInfo

Of course, the compiler errors helped make this change. It made sure we covered all cases. And it made sure we made the change everywhere that was required. But the compiler could miss more bad situations now. There was a lot of room for error in this intricate code. For example, it couldn't check that the UserInfo was complete at compile time like it could before. Plus, the three functions had identical signatures, and their code looked very similar. Surely, that was a bad sign.

In general, a lot more of the work had been moved to runtime. This was very evident with the change to saveUser.

    saveUser :: UserInfo -> IO User
    saveUser ui =
      case ui of
        FirstAndLastNamesAndEmail fn ln em -> ...
        _ -> error "Incomplete UserInfo record"

We now had to check at runtime that we had all of the data before we could create a User.

Someone on the team suggested we collapse the unknown order into Maybes, like this:

    data UserInfo = UserInfo { uiFirstName :: Maybe Text
                             , uiLastName  :: Maybe Text
                             , uiEmail     :: Maybe Text
                             }

We could see that this captured the same information, but more succinctly. The previous version had 8 cases. This one captured 8 as well, but instead of spelling out 8 separate constructors, it combined 3 fields with 2 states each (present or absent), giving 2^3 = 8 combinations.

The captureX functions looked better, too:

    captureFirstName :: UserInfo -> Text -> UserInfo
    captureFirstName ui fn = ui { uiFirstName = Just fn }

And the saveUser function looked okay still. We just had to pattern match where all fields were Just x.

    saveUser :: UserInfo -> IO User
    saveUser ui =
      case ui of
        UserInfo { uiFirstName = Just fn
                 , uiLastName  = Just ln
                 , uiEmail     = Just em
                 }                         -> ...
        _ -> error "Incomplete UserInfo record"

Well, that looked better, and so we went with that.
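One way to keep the runtime check in a single place is to convert the partial record into a complete one before saving. This is a sketch, not what the team in the narrative did: CompleteUserInfo, emptyUserInfo, and complete are hypothetical names introduced here.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

data UserInfo = UserInfo
  { uiFirstName :: Maybe Text
  , uiLastName  :: Maybe Text
  , uiEmail     :: Maybe Text
  }

-- Hypothetical type: a UserInfo known to have every field.
data CompleteUserInfo = CompleteUserInfo Text Text Text
  deriving (Eq, Show)

emptyUserInfo :: UserInfo
emptyUserInfo = UserInfo Nothing Nothing Nothing

captureFirstName, captureLastName, captureEmail :: UserInfo -> Text -> UserInfo
captureFirstName ui fn = ui { uiFirstName = Just fn }
captureLastName  ui ln = ui { uiLastName  = Just ln }
captureEmail     ui em = ui { uiEmail     = Just em }

-- Nothing if any field is missing; the Applicative instance of Maybe
-- short-circuits on the first absent field.
complete :: UserInfo -> Maybe CompleteUserInfo
complete ui = CompleteUserInfo <$> uiFirstName ui <*> uiLastName ui <*> uiEmail ui
```

With this shape, a save routine could take a CompleteUserInfo and skip the call to error, while the wizard steps stay free to arrive in any order.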

Adding new fields

The wizard was working well. Users could go back and forth between the steps, skip steps, and redo steps at will. But the product team had a new requirement: some people have middle names. We needed to capture those, too.

The obvious thing was to add a new field to the UserInfo type:

    data UserInfo = UserInfo { uiFirstName  :: Maybe Text
                             , uiLastName   :: Maybe Text
                             , uiEmail      :: Maybe Text
                             , uiMiddleName :: Maybe Text
                             }

That was quick and easy. Maybe too quick! Because the feature was done so fast, the product team asked for the following fields shortly afterward:

  • phone number
  • mother's maiden name
  • eye color
  • hair color
  • Social Security Number
  • preferred language

This type was starting to look big again.

    data UserInfo = UserInfo { uiFirstName         :: Maybe Text
                             , uiLastName          :: Maybe Text
                             , uiEmail             :: Maybe Text
                             , uiMiddleName        :: Maybe Text
                             , uiPhoneNumber       :: Maybe Text
                             , uiMothersMaidenName :: Maybe Text
                             , uiEyeColor          :: Maybe Text
                             , uiHairColor         :: Maybe Text
                             , uiSSN               :: Maybe Text
                             , uiPreferredLanguage :: Maybe Text
                             }

Goodness. With so many fields, we imagined all of the captureX functions:

    captureFirstName :: UserInfo -> Text -> UserInfo
    captureLastName  :: UserInfo -> Text -> UserInfo
    ...

And besides, who was to say they wouldn't ask for more next sprint? Someone on the team came up with an idea: why not make a type called UserField and use a Map?

    data UserField = FirstName
                   | LastName
                   | Email
                   | MiddleName
                   | PhoneNumber
                   | MothersMaidenName
                   | EyeColor
                   | HairColor
                   | SSN
                   | PreferredLanguage
                   deriving (Eq, Ord)  -- Map keys need an Ord instance

    type UserInfo = Map UserField Text

The optionality of the fields was represented by the presence or absence of a key in the map, so we could get rid of the Maybes. And it let us collapse all the different captureX functions into one:

    captureUserInfo :: UserInfo -> UserField -> Text -> UserInfo
    captureUserInfo ui f s = insert f s ui

This worked great. The product managers added and removed fields, but the turnaround time was small. We merely had to add or remove lines from the UserField type declaration.
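A small, self-contained sketch of the map-based model follows, using a shortened field list to keep it brief. The missingFields helper and the Enum and Bounded instances are assumptions added for illustration; they are not part of the narrative.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Map (Map)
import qualified Data.Map as Map
import Data.Text (Text)

data UserField = FirstName | LastName | Email | MiddleName
  deriving (Eq, Ord, Show, Enum, Bounded)

type UserInfo = Map UserField Text

captureUserInfo :: UserInfo -> UserField -> Text -> UserInfo
captureUserInfo ui f s = Map.insert f s ui

-- Hypothetical helper: fields the wizard has not collected yet.
missingFields :: UserInfo -> [UserField]
missingFields ui = filter (`Map.notMember` ui) [minBound .. maxBound]
```

Capturing the same field twice simply overwrites the earlier value, which matches a wizard where users can go back and redo a step.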

A new field type

One day, we got a new requirement: we needed to store the user's number of children. Our sense of ease told us to store it like everything else: as a Text. But a tingling sense of pessimism told us this wouldn't be the last new type we would need to store. Would we really parse fields as we needed them?

Someone suggested a new approach (yet another!). We would create a type that captured the field name and the type it required, like so:

    data UserField = FirstName         Text
                   | LastName          Text
                   | Email             Text
                   | MiddleName        Text
                   | PhoneNumber       Text
                   | MothersMaidenName Text
                   | EyeColor          Text
                   | HairColor         Text
                   | SSN               Text
                   | PreferredLanguage Text
                   | NumberOfChildren  Int

Then we could store them in a list:

    type UserInfo = [UserField]

We could deal with duplicates at runtime. There was a certain appeal to it.
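Dealing with duplicates at runtime might look something like the sketch below, which keeps the last value captured for each field. FieldTag, fieldTag, and dedupe are hypothetical names, and the field list is shortened to three constructors.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Map as Map
import Data.Text (Text)

data UserField
  = FirstName Text
  | Email Text
  | NumberOfChildren Int
  deriving (Eq, Show)

type UserInfo = [UserField]

-- Hypothetical tag identifying which field a value belongs to.
data FieldTag = TagFirstName | TagEmail | TagNumberOfChildren
  deriving (Eq, Ord, Show)

fieldTag :: UserField -> FieldTag
fieldTag (FirstName _)        = TagFirstName
fieldTag (Email _)            = TagEmail
fieldTag (NumberOfChildren _) = TagNumberOfChildren

-- Keep one entry per field; later entries in the list win.
-- The result comes back in tag order, not in the original list order.
dedupe :: UserInfo -> UserInfo
dedupe fields = Map.elems (Map.fromList [(fieldTag f, f) | f <- fields])
```

Note that the tag type repeats the constructor list, so every new field has to be added in two places. That is part of the cost of this design.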

Someone else floated a different idea: create a UserValue type that could hold either a string or an integer.

    data UserValue = UVText Text | UVInt Int

    type UserInfo = Map UserField UserValue

This one didn't sit so well. Every value type we added multiplied the number of field and value combinations that the type allowed but our model did not. For instance, a FirstName associated with the value UVInt 3 was allowed by this type, even though it made no sense in our model. We would have to rule that out at runtime, just like we dealt with duplicates at runtime in the list solution.

Both solutions presented runtime problems. The first had the problem of duplicates, which seemed tractable. The second had the problem of incorrect types, which threw out much of the help the type system could provide. We were about to go with the list solution and deal with duplicates when a big request came in from the front-end team.

Front-end freedom

The front-end team was tired of asking us, the back-end team, for each and every field they wanted to add. They wanted to add, remove, and modify fields faster. They were growing the app very quickly and needed the ability to test out many new wizard steps. We, the back-end team, were the bottleneck. We required a description of the field and its type, and it would be deployed in the next release after we got to it. The front-end team needed to cycle faster than that.

We recognized our own annoyance that the front-end team would often change their minds halfway through the sprint. They would request a BirthYear field as an Int on Monday, but by Wednesday it was changed to BirthDate as Date. We couldn't keep up. We thought they were just being wishy-washy. But they were right. We couldn't adapt fast enough.

The requirement was clear: allow the front end to specify field names and values of any type. The back end would accept whatever value was passed to it from the front end and return validation results.

Here are the new types:

    data UserField = UserField Text deriving (Eq, Ord)  -- Map keys need Ord
    data UserValue = UVText Text
                   | UVInt Int
                   | UVDate Date
                   | ...
    type UserInfo = Map UserField UserValue

This allowed the front-end team to quickly try out new field names. And most of the code we had in the back end was drastically simplified. And occasionally they would ask us to add a type. The cost was a bit of type safety for this particular part of the application.

But note that at this point, we were only slightly better off than if we had used JSON and Aeson's Value type for it. We considered going that route, but we liked having a custom type.

There were some type errors sometimes, so we eventually added runtime type validation.

Our first pass at validation was very simple:

    data UserValueType = UVTText
                       | UVTInt
                       | UVTDate

    validType :: UserValue -> UserValueType -> Bool
    validType (UVText _) UVTText = True
    validType (UVInt  _) UVTInt  = True
    validType (UVDate _) UVTDate = True
    validType _          _       = False

We could store the types of fields in a map, like this:

    fieldTypes :: Map UserField UserValueType
    fieldTypes = fromList [(UserField "First Name", UVTText)
                          ,(UserField "Last Name",  UVTText)
                          ,...
                          ]

    isValid :: UserField -> UserValue -> Bool
    isValid uf uv =
      case lookup uf fieldTypes of  -- Data.Map's lookup, not the Prelude's
        Just ut -> validType uv ut
        _       -> choose one
                   -- True                    -- no type defined means no restrictions
                   -- False                   -- no type defined means not allowed
                   -- error "Field not found" -- no type defined should not happen
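For a version that compiles, one option is to avoid choosing a Bool for the unknown-field case at all and return a richer result instead. This is a sketch under assumptions: the Validation type, typeOfValue, and validate are hypothetical names, the "Number of Children" entry is an invented example, and dates are left out because the narrative does not say which Date type was in use.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Map (Map)
import qualified Data.Map as Map
import Data.Text (Text)

newtype UserField = UserField Text deriving (Eq, Ord, Show)

data UserValue = UVText Text | UVInt Int deriving (Eq, Show)

data UserValueType = UVTText | UVTInt deriving (Eq, Show)

-- Hypothetical result type that keeps the three outcomes distinct.
data Validation = Valid | WrongType UserValueType | UnknownField
  deriving (Eq, Show)

typeOfValue :: UserValue -> UserValueType
typeOfValue (UVText _) = UVTText
typeOfValue (UVInt _)  = UVTInt

fieldTypes :: Map UserField UserValueType
fieldTypes = Map.fromList
  [ (UserField "First Name", UVTText)
  , (UserField "Number of Children", UVTInt)
  ]

validate :: UserField -> UserValue -> Validation
validate uf uv =
  case Map.lookup uf fieldTypes of
    Nothing -> UnknownField
    Just expected
      | typeOfValue uv == expected -> Valid
      | otherwise                  -> WrongType expected
```

The caller can then decide whether UnknownField means "no restrictions", "not allowed", or a bug, which defers the policy choice instead of baking it into the check.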

Discussion

We started with a very small, easy-to-type situation. Volatility caused the complexity of the model to balloon quickly. We had a lot of unknowns (optional fields represented with Maybe). We began to rely more on runtime checks. Then, when we added the need for flexibility to add and remove fields of different types, even more of the safety was moved to runtime. Finally, we were left with a model of data very much like what we find in Clojure: a map of string-like keys to values of any type. We then devised a very simple first version of a kind of runtime type check like Clojure Spec.

After all these steps, we landed on the very model that Clojure starts with. Is this a vindication of the Clojure way? No. However, it does explain the kind of situation a Clojure program thrives in. Clojure's design assumes that things often get to this point anyway, so why not start there? When that assumption is true, Clojure is a great tool. I'll state the design premise like this: sufficiently volatile data prefers a flexible model with optional runtime checks. And often, things are sufficiently volatile.

Clojure's data model handles volatility very well. And the language, with its huge library of data-oriented functions, always feels like it has foreseen issues you will run into. These are the parts of the language I miss most when I use a statically typed language. I feel like I eventually rebuild a significant part of Clojure anyway.

But we can also see the other side. There are lots of situations that never get to where we wound up. Many parts of the code don't have so much volatility and it's nice to have the static type system on our side. Clojure doesn't allow us to express those situations. Clojure only gives us the flexible model, which we have to use even when we're not dealing with such volatility. In those times, I miss the help of static typing.

Here's the thing: any significant program will have to model things across a spectrum of volatility. We would like to be able to use static typing for areas of low volatility and switch to a data-oriented, dynamic model for areas of high volatility. We need to see that it is not an either-or situation. We should bridge the gap and demand both. I'd love to see Clojure develop a static-type system, and I'd love to see Haskell have better facilities for dealing with uncertain data.

Bridging the gap: ideas

Of course, when I mention type systems for Clojure, many people will be curious about Core Typed's role. I think Core Typed is a valiant effort. It wants to type idiomatic Clojure. That means it wants to type the dynamic stuff that is useful for highly volatile data. It would be great if we could reduce errors in the code dealing with that. However, it is more valuable to type the stuff dealing with less volatile data. That is, things that Haskell is better at than Clojure. I propose a type system, stricter than Core Typed, that can be used in Clojure on select sections of code that are the least volatile. The type system would lock them down further, so it should only be used on the most stable parts of the code.

Of course, I am not well-versed enough in Haskell to say what it needs to handle the volatile data that Clojure excels at. However, I will open a line of discussion. I imagine an algebraic data type (ADT) that represents edn, similar to the ADT that represents JSON. And then a suite of functions that operate on it in a unitype way. That is, they are partial functions that are accepted by the community as such. It somehow works for Clojure, so why not for the most volatile data that Haskell handles? That unitype ethos, coupled with the "open map" ethos of Clojure, could bridge the gap.
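To make that idea a little more concrete, here is a deliberately incomplete sketch. Everything in it is an assumption made for discussion: the Edn type covers only a few of edn's value kinds, and get and assoc are hypothetical functions that loosely imitate their Clojure namesakes on maps.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Map (Map)
import qualified Data.Map as Map
import Data.Text (Text)

-- Hypothetical, deliberately incomplete model of edn values.
data Edn
  = EdnNil
  | EdnBool Bool
  | EdnInt Integer
  | EdnString Text
  | EdnKeyword Text
  | EdnVector [Edn]
  | EdnMap (Map Edn Edn)
  deriving (Eq, Ord, Show)

-- Like Clojure's get: a missing key, or a non-map, yields nil.
get :: Edn -> Edn -> Edn
get (EdnMap m) k = Map.findWithDefault EdnNil k m
get _          _ = EdnNil

-- Like Clojure's assoc on maps; nil is treated as the empty map.
-- Partial on purpose: any other value is a runtime error.
assoc :: Edn -> Edn -> Edn -> Edn
assoc (EdnMap m) k v = EdnMap (Map.insert k v m)
assoc EdnNil     k v = EdnMap (Map.singleton k v)
assoc other      _ _ = error ("assoc: not a map: " ++ show other)
```

Every function here has the same shape, Edn in and Edn out, which is what lets them compose freely and also what moves the checking to runtime.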

I'd love to hear your constructive thoughts on this.