The User Wizard Scenario
Summary: We start with a Haskell type that models a feature very well. As the feature changes, the data model eventually evolves into something like Clojure's hash maps. Discussion follows.
Here is a thought experiment I call "The User Wizard Scenario". In this scenario, we are on the back-end team of a web application. There is also a front-end team building the in-browser UI. Product managers bring us new requirements, which force changes to the data model. And we are coding in Haskell.
Here is one walk-through of the scenario, written as a narrative.
A simple start
The product manager of the application brought us a new requirement: we needed to store information about a user. Specifically, we needed:
- userID, an integer that uniquely identifies a user
- firstName, their first name
- lastName, their last name
- email, so that we can contact them with important account information
In Haskell, we described this type as follows:
data User = User { userID :: Int
                 , firstName :: Text
                 , lastName :: Text
                 , email :: Text
                 }
Saving to the database
We wrote a routine for saving this to the database. For simplicity in this example, we made it a plain IO action.
saveUser :: User -> IO ()
But we noticed a problem: the userID that was part of User came from the database. We wouldn't have a userID until after the user's information was stored. There were several options, but we settled on a simple one. We created a new type that represented the user's information before it was saved to the database.
data UserInfo = UserInfo { uiFirstName :: Text, uiLastName :: Text, uiEmail :: Text }
saveUser :: UserInfo -> IO User
To use it, we first fill out a UserInfo and pass it to saveUser, which stores it in the database and returns a User.
Here was our first kind of volatility: even within a single workflow, data requirements vary. The front-end team created a form that captured the first name, last name, and email address and passed them to our server.
Multi-step wizard
After a few weeks, it turned out the form was not working: it had only a 10% completion rate, simply because too many fields were shown at once. The product manager insisted we try a wizard to collect the information one step at a time. Each step of the wizard would require a round trip to the server. We knew that by the end of the wizard, we needed a complete UserInfo. We came up with a few new types to represent the progress along the way.
data UserInfoStep1 = UserInfoStep1 Text
data UserInfoStep2 = UserInfoStep2 Text Text
data UserInfoStep3 = UserInfoStep3 Text Text Text
And some operations for transitioning between them:
captureFirstName :: Text -> UserInfoStep1
captureLastName :: UserInfoStep1 -> Text -> UserInfoStep2
captureEmail :: UserInfoStep2 -> Text -> UserInfoStep3
saveUser :: UserInfoStep3 -> IO User
This request was a second type of volatility: changing requirements over time.
We coded up the three-step wizard and it worked well.
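To make the step types concrete, here is a self-contained sketch with minimal implementations of the transition functions. The function bodies and the example names are my own assumptions; the scenario above gives only the signatures.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

data UserInfoStep1 = UserInfoStep1 Text
data UserInfoStep2 = UserInfoStep2 Text Text
data UserInfoStep3 = UserInfoStep3 Text Text Text
  deriving (Eq, Show)

captureFirstName :: Text -> UserInfoStep1
captureFirstName = UserInfoStep1

captureLastName :: UserInfoStep1 -> Text -> UserInfoStep2
captureLastName (UserInfoStep1 fn) ln = UserInfoStep2 fn ln

captureEmail :: UserInfoStep2 -> Text -> UserInfoStep3
captureEmail (UserInfoStep2 fn ln) em = UserInfoStep3 fn ln em

-- A Step3 value can only be produced by going through the earlier
-- steps, so saveUser can never receive incomplete data.
finished :: UserInfoStep3
finished =
  captureEmail
    (captureLastName (captureFirstName "Ada") "Lovelace")
    "ada@example.com"
```

The order of steps is fixed by the types themselves: there is no way to call captureEmail before captureLastName.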
Changing the order of steps
But then the product team had a new request. They wanted to change the order of the steps. At first, we complied by simply changing the signatures of the functions and let the compiler's error messages guide us to complete the refactoring. We wound up with:
captureFirstName :: UserInfoStep1 -> Text -> UserInfoStep2
captureLastName :: Text -> UserInfoStep1
captureEmail :: UserInfoStep2 -> Text -> UserInfoStep3
This means the last name was captured first, then the first name, followed by the email. That worked and we were thankful for the compiler errors.
However, the next day, the product team asked for a different order. We anticipated these requests would go on for a while, since there were six possible orders. So we tried to nip them all in the bud at once:
data UserInfo = NoInfoYet
| JustFirstName Text
| JustLastName Text
| JustEmail Text
| FirstAndLastNames Text Text
| FirstNameAndEmail Text Text
| LastNameAndEmail Text Text
| FirstAndLastNamesAndEmail Text Text Text
captureFirstName :: UserInfo -> Text -> UserInfo
captureFirstName ui fn =
  case ui of
    NoInfoYet                         -> JustFirstName fn
    JustFirstName _                   -> JustFirstName fn
    JustLastName ln                   -> FirstAndLastNames fn ln
    JustEmail em                      -> FirstNameAndEmail fn em
    FirstAndLastNames _ ln            -> FirstAndLastNames fn ln
    FirstNameAndEmail _ em            -> FirstNameAndEmail fn em
    LastNameAndEmail ln em            -> FirstAndLastNamesAndEmail fn ln em
    FirstAndLastNamesAndEmail _ ln em -> FirstAndLastNamesAndEmail fn ln em
captureLastName :: UserInfo -> Text -> UserInfo
captureEmail :: UserInfo -> Text -> UserInfo
Of course, the compiler errors helped make this change. They made sure we covered all cases, and that we made the change everywhere it was required. But the compiler could now miss more bad situations. There was a lot of room for error in this intricate code. For example, it could no longer check at compile time that the UserInfo was complete, like it could before. Plus, the three functions had identical signatures, and their code looked very similar. Surely, that was a bad sign.
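To illustrate the duplication, here is what captureLastName might have looked like under this encoding. This is a sketch of my own; the scenario gives only its signature.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

data UserInfo = NoInfoYet
              | JustFirstName Text
              | JustLastName Text
              | JustEmail Text
              | FirstAndLastNames Text Text
              | FirstNameAndEmail Text Text
              | LastNameAndEmail Text Text
              | FirstAndLastNamesAndEmail Text Text Text
  deriving (Eq, Show)

-- Nearly identical in shape to captureFirstName: all eight cases
-- must be enumerated again, differing only in which slot is filled.
captureLastName :: UserInfo -> Text -> UserInfo
captureLastName ui ln =
  case ui of
    NoInfoYet                         -> JustLastName ln
    JustFirstName fn                  -> FirstAndLastNames fn ln
    JustLastName _                    -> JustLastName ln
    JustEmail em                      -> LastNameAndEmail ln em
    FirstAndLastNames fn _            -> FirstAndLastNames fn ln
    FirstNameAndEmail fn em           -> FirstAndLastNamesAndEmail fn ln em
    LastNameAndEmail _ em             -> LastNameAndEmail ln em
    FirstAndLastNamesAndEmail fn _ em -> FirstAndLastNamesAndEmail fn ln em
```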
In general, a lot more of the work had moved to runtime. This was very evident in the change to saveUser.
saveUser :: UserInfo -> IO User
saveUser ui =
case ui of
FirstAndLastNamesAndEmail fn ln em -> ...
_ -> error "Incomplete UserInfo record"
We now had to check at runtime that we had all of the data before we could create a User.
Someone on the team suggested we collapse the unknown order into Maybes, like this:
data UserInfo = UserInfo { uiFirstName :: Maybe Text
, uiLastName :: Maybe Text
, uiEmail :: Maybe Text
}
We could see that this captured the same information, but more succinctly. The previous version enumerated 8 separate constructors. This one also represents 8 states, but as the product of 3 fields with 2 cases each: 2^3 = 8.
The captureX functions looked better, too:
captureFirstName :: UserInfo -> Text -> UserInfo
captureFirstName ui fn = ui { uiFirstName = Just fn }
And the saveUser function still looked okay. We just had to pattern match on the case where all fields were Just x.
saveUser :: UserInfo -> IO User
saveUser ui =
case ui of
UserInfo { uiFirstName = Just fn
, uiLastName = Just ln
, uiEmail = Just em
} -> ...
_ -> error "Incomplete UserInfo record"
Well, that looked better, and so we went with that.
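As a self-contained sketch of this version, with a pure toComplete standing in for the IO-based saveUser (the helper names emptyInfo and toComplete, and the pure stand-in itself, are my own):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

data UserInfo = UserInfo { uiFirstName :: Maybe Text
                         , uiLastName  :: Maybe Text
                         , uiEmail     :: Maybe Text
                         } deriving (Eq, Show)

emptyInfo :: UserInfo
emptyInfo = UserInfo Nothing Nothing Nothing

captureFirstName :: UserInfo -> Text -> UserInfo
captureFirstName ui fn = ui { uiFirstName = Just fn }

captureLastName :: UserInfo -> Text -> UserInfo
captureLastName ui ln = ui { uiLastName = Just ln }

captureEmail :: UserInfo -> Text -> UserInfo
captureEmail ui em = ui { uiEmail = Just em }

-- The completeness check has moved to runtime: Nothing means the
-- wizard was not finished, whatever order the steps were taken in.
toComplete :: UserInfo -> Maybe (Text, Text, Text)
toComplete (UserInfo (Just fn) (Just ln) (Just em)) = Just (fn, ln, em)
toComplete _ = Nothing
```

Note how the steps can now be applied in any order, which is exactly what the product team wanted, and exactly what the types no longer guarantee.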
Adding new fields
The wizard was working well. Users could go back and forth between the steps, skip steps, and redo steps at will. But the product team had a new requirement: some people have middle names. We needed to capture those, too.
The obvious thing was to add a new field to the UserInfo type:
data UserInfo = UserInfo { uiFirstName :: Maybe Text
, uiLastName :: Maybe Text
, uiEmail :: Maybe Text
, uiMiddleName :: Maybe Text
}
That was quick and easy. Maybe too quick! Because the feature was done so fast, the product team asked for the following fields shortly afterward:
- phone number
- mother's maiden name
- eye color
- hair color
- Social Security Number
- preferred language
This type was starting to look big again.
data UserInfo = UserInfo { uiFirstName :: Maybe Text
, uiLastName :: Maybe Text
, uiEmail :: Maybe Text
, uiMiddleName :: Maybe Text
, uiPhoneNumber :: Maybe Text
, uiMothersMaidenName :: Maybe Text
, uiEyeColor :: Maybe Text
, uiHairColor :: Maybe Text
, uiSSN :: Maybe Text
, uiPreferredLanguage :: Maybe Text
}
Goodness. With so many fields, we imagined all of the captureX functions:
captureFirstName :: UserInfo -> Text -> UserInfo
captureLastName :: UserInfo -> Text -> UserInfo
...
And besides, who was to say they wouldn't ask for more next sprint?
Someone on the team came up with this idea: why didn't we make a type called UserField and use a Map?
data UserField = FirstName
| LastName
| Email
| MiddleName
| PhoneNumber
| MothersMaidenName
| EyeColor
| HairColor
| SSN
| PreferredLanguage
type UserInfo = Map UserField Text
The optionality of the fields was represented by the presence or absence of a key in the map, so we could get rid of the Maybes. And it let us collapse all the different captureX functions into one:
captureUserInfo :: UserInfo -> UserField -> Text -> UserInfo
captureUserInfo ui f s = insert f s ui
This worked great. The product managers added and removed fields, but the turnaround time was small. We merely had to add or remove lines from the UserInfo type declaration.
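As a self-contained sketch of this version (note that UserField needs an Ord instance to serve as a Map key; the derivations and the firstNameOf helper are my additions, and only a few fields are shown):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Text (Text)

-- Ord is required for Map keys.
data UserField = FirstName | LastName | Email
  deriving (Eq, Ord, Show)

type UserInfo = Map UserField Text

captureUserInfo :: UserInfo -> UserField -> Text -> UserInfo
captureUserInfo ui f s = Map.insert f s ui

-- Optionality is now the presence or absence of the key:
firstNameOf :: UserInfo -> Maybe Text
firstNameOf = Map.lookup FirstName
```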
A new field type
One day, we got a new requirement: we needed to store the user's number
of children. Our sense of ease told us to store it like everything else:
as a Text. But a tingling sense of pessimism told us this wouldn't be the last new type we would need to store. Would we really parse fields as we needed them?
Someone suggested a new approach (yet another!). We would create a type that captured the field name and the type it required, like so:
data UserField = FirstName Text
| LastName Text
| Email Text
| MiddleName Text
| PhoneNumber Text
| MothersMaidenName Text
| EyeColor Text
| HairColor Text
| SSN Text
| PreferredLanguage Text
| NumberOfChildren Int
Then we could store them in a list:
type UserInfo = [UserField]
We could deal with duplicates at runtime. There was a certain appeal to it.
Someone else fielded a different idea: create a UserValue type that could hold either a string or an integer.
data UserValue = UVText Text | UVInt Int
type UserInfo = Map UserField UserValue
This one didn't sit so well. It multiplied the number of cases that could hold an incorrect type. For instance, a FirstName associated with the value UVInt 3 was allowed by this type, even though it made no sense in our model. We would have to rule that out at runtime, just as we dealt with duplicates at runtime in the list solution.
Both solutions presented runtime problems. The first had the problem of duplicates, which seemed tractable. The second had the problem of incorrect types, which threw out much of the help the type system could provide. We were about to go with the list solution and deal with duplicates when a big request came in from the front-end team.
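Dealing with duplicates in the list version might have looked like this. This is a sketch of my own; sameField and captureField are assumed names, and only a few constructors are shown.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

data UserField = FirstName Text
               | LastName Text
               | NumberOfChildren Int
  deriving (Eq, Show)

type UserInfo = [UserField]

-- True when two entries are the same field, regardless of value.
sameField :: UserField -> UserField -> Bool
sameField (FirstName _)        (FirstName _)        = True
sameField (LastName _)         (LastName _)         = True
sameField (NumberOfChildren _) (NumberOfChildren _) = True
sameField _                    _                    = False

-- Replace any existing entry for the same field: the deduplication
-- the type system no longer enforces, done at runtime.
captureField :: UserInfo -> UserField -> UserInfo
captureField ui f = f : filter (not . sameField f) ui
```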
Front-end freedom
The front-end team was tired of asking us, the back-end team, for each and every field they wanted to add. They wanted to add, remove, and modify fields faster. They were growing the app very quickly and needed to test out many new wizard steps. We, the back-end team, were the bottleneck: we required a description of each field and its type, and the change would ship only in the next release after we got to it. The front-end team needed to cycle faster than that.
We recognized our own annoyance that the front-end team would often change their minds halfway through the sprint. They would request a BirthYear field as an Int on Monday, but by Wednesday it had changed to BirthDate as a Date. We couldn't keep up. We thought they were just being wishy-washy. But they were right. We couldn't adapt fast enough.
The requirement was clear: allow the front end to specify field names and values of any type. The back end would accept whatever value was passed to it from the front end and return validation results.
Here are the new types:
data UserField = UserField Text
data UserValue = UVText Text
| UVInt Int
| UVDate Date
| ...
type UserInfo = Map UserField UserValue
This allowed the front-end team to try out new field names quickly, and most of our back-end code was drastically simplified. Occasionally they would ask us to add a new value type. The cost was a bit of type safety in this particular part of the application.
But note that at this point, we were only slightly better off than JSON and Aeson's Value type for it. We considered going that route, but we liked having a custom type.
Type errors still cropped up occasionally, so we eventually added runtime type validation.
Our first pass at validation was very simple:
data UserValueType = UVTText
                   | UVTInt
                   | UVTDate

validType :: UserValue -> UserValueType -> Bool
validType (UVText _) UVTText = True
validType (UVInt _)  UVTInt  = True
validType (UVDate _) UVTDate = True
validType _          _       = False
We could store the types of fields in a map, like this:
fieldTypes :: Map UserField UserValueType
fieldTypes = fromList [ (UserField "First Name", UVTText)
                      , (UserField "Last Name",  UVTText)
                      , ...
                      ]

isValid :: UserField -> UserValue -> Bool
isValid uf uv =
  case lookup uf fieldTypes of
    Just ut -> validType uv ut
    Nothing -> ...
      -- choose one:
      --   True                     -- no type defined means no restrictions
      --   False                    -- no type defined means not allowed
      --   error "Field not found"  -- no type defined should not happen
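Putting the pieces together as a runnable sketch, arbitrarily choosing the permissive option for fields without a declared type (UVDate is omitted, and the field names are examples, to keep the sketch self-contained):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Text (Text)

newtype UserField = UserField Text deriving (Eq, Ord, Show)

data UserValue = UVText Text | UVInt Int deriving (Eq, Show)

data UserValueType = UVTText | UVTInt deriving (Eq, Show)

validType :: UserValue -> UserValueType -> Bool
validType (UVText _) UVTText = True
validType (UVInt _)  UVTInt  = True
validType _          _       = False

-- The runtime "schema": which type each known field must carry.
fieldTypes :: Map UserField UserValueType
fieldTypes = Map.fromList [ (UserField "First Name", UVTText)
                          , (UserField "Last Name",  UVTText) ]

isValid :: UserField -> UserValue -> Bool
isValid uf uv =
  case Map.lookup uf fieldTypes of
    Just ut -> validType uv ut
    Nothing -> True  -- permissive: no declared type means no restrictions
```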
Discussion
We started with a very small, easy-to-type situation. Volatility caused the complexity of the model to balloon quickly. We had a lot of unknowns (optional fields represented with Maybe). We began to rely more on runtime checks. Then, when we added the need to flexibly add and remove fields of different types, even more of the safety moved to runtime. Until, finally, we were left with a model of data very much like what we find in Clojure---a map of string-like keys to values of any type. We then devised a very simple first version of a runtime type check, like Clojure Spec.
After all these steps, we landed on the very model that Clojure starts with. Is this a vindication of the Clojure way? No. However, it does explain the kind of situation a Clojure program thrives in. Clojure's design assumes that things often get to this point anyway, so why not start there? When that assumption is true, Clojure is a great tool. I'll state the design premise like this: sufficiently volatile data prefers a flexible model with optional runtime checks. And often, things are sufficiently volatile.
Clojure's data model handles volatility very well. And the language, with its huge library of data-oriented functions, always feels like it has foreseen issues you will run into. These are the parts of the language I miss most when I use a statically typed language. I feel like I eventually rebuild a significant part of Clojure anyway.
But we can also see the other side. There are lots of situations that never get to where we wound up. Many parts of the code don't have so much volatility and it's nice to have the static type system on our side. Clojure doesn't allow us to express those situations. Clojure only gives us the flexible model, which we have to use even when we're not dealing with such volatility. In those times, I miss the help of static typing.
Here's the thing: any significant program will have to model things across a spectrum of volatility. We would like to be able to use static typing for areas of low volatility and switch to a data-oriented, dynamic model for areas of high volatility. We need to see that it is not an either-or situation. We should bridge the gap and demand both. I'd love to see Clojure develop a static-type system, and I'd love to see Haskell have better facilities for dealing with uncertain data.
Bridging the gap: ideas
Of course, when I mention type systems for Clojure, many people will be curious about Core Typed's role. I think Core Typed is a valiant effort. It wants to type idiomatic Clojure. That means it wants to type the dynamic stuff that is useful for highly volatile data. It would be great if we could reduce errors in the code dealing with that. However, it is more valuable to type the stuff dealing with less volatile data. That is, things that Haskell is better at than Clojure. I propose a type system, stricter than Core Typed, that can be used in Clojure on select sections of code that are the least volatile. The type system would lock them down further, so it should only be used on the most stable parts of the code.
Of course, I am not well-versed enough in Haskell to say what it needs to handle the volatile data that Clojure excels at. However, I will open a line of discussion. I imagine an algebraic data type (ADT) that represents edn, similar to the ADT that represents JSON. And then a suite of functions that operate on it in a unitype way. That is, they are partial functions that are accepted by the community as such. It somehow works for Clojure, so why not for the most volatile data that Haskell handles? That unitype ethos, coupled with the "open map" ethos of Clojure, could bridge the gap.
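To make that suggestion concrete, an edn ADT might start like this. This is entirely my own sketch; real edn has more types, such as characters and tagged literals, and the accessor name ednGet is assumed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Text (Text)

-- An ADT for edn values, analogous to Aeson's Value for JSON.
data Edn = EdnNil
         | EdnBool Bool
         | EdnInt Integer
         | EdnDouble Double
         | EdnString Text
         | EdnKeyword Text
         | EdnSymbol Text
         | EdnList [Edn]
         | EdnVector [Edn]
         | EdnSet [Edn]
         | EdnMap (Map Edn Edn)
  deriving (Eq, Ord, Show)

-- A "unitype" accessor in the Clojure style: total in type but
-- permissive by design, returning EdnNil for a missing key or a
-- non-map argument, just as Clojure's get returns nil.
ednGet :: Edn -> Edn -> Edn
ednGet (EdnMap m) k = Map.findWithDefault EdnNil k m
ednGet _          _ = EdnNil
```

A suite of such functions, understood by the community as permissive, is what I mean by bringing the unitype ethos to the most volatile data in a Haskell program.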
I'd love to hear your constructive thoughts on this.