Functional Programming in Practice: Validation with Applicative Functors
deOur first article in the series „Functional Programming in Practice“ on the topic of data validation with applicative functors.
Welcome to our first article in the series „Functional Programming in Practice“. In this series, we present concepts from functional programming using real examples and show how we actually apply them in practice every day.
Today we‘ll focus on data validation and show how applicative functors help us systematically collect and process inconsistencies in data.
Validating JSON
Web services like to communicate with each other using JSON objects
that end up in our application from users and other services. It‘s no
secret that the JSON format is limited by its narrow range of
types. In practice, this means that complex data types are usually
serialized as strings. Additionally, we often have specific
requirements for the form of the concrete data: a "username" should
not be empty, an "email" should contain at least an "@" symbol,
and serialized values must match our expectations to be converted into
internal data types.
For the following example, we‘ll work with this data analysis. A user consists of:
- a name
- an email address
- a role
A role is one of the following:
- a developer
- an admin
- a bug reporter
For the implementation, we‘ll use the Haskell programming language. It could look like this:
data UserRole = ADMIN | DEVELOPER | REPORTER
data User = { userName :: String
, userEmail :: String
, userRole :: UserRole }
-- Examples
marco = User "Marco" "marco.schneider@active-group.de" DEVELOPER
simon = User "Simon" "simon.haerer.at.active-group.de" DEVELOPER
The two examples in the code immediately show a problem: while we know that Simon‘s email address isn‘t valid, if we read it directly from a JSON object, nothing stops us from ignoring such an invariant („every email address contains an ‚@‘ symbol“). So we need to validate the data somehow before it penetrates further into the system.
We could, of course, write a constructor for User that catches such
errors. We‘ll use Either a b = Left a | Right b for the
result. Here:
- An error is signaled with
Left a - A success is signaled with
Right b
type Error = String
makeUser :: String -> String -> Role -> Either Error User
makeUser name email role
| '@' `elem` email = Right (User name email role)
| otherwise = Left "not a valid email"
Great, problem solved. If everything goes smoothly, we get a Right
User, otherwise a Left Error. But wait, what if the username is
empty? A second attempt:
type Error = String
makeUser :: String -> String -> Role -> Either Error User
makeUser name email role
| null name = Left "not a nonempty string"
| '@' `elem` email = Right (User name email role)
| otherwise = Left "not a valid email"
That‘s a bit better. But what if both the name is empty and the
email doesn‘t contain a '@'? After all, we want to know all the
errors, not just the first one. But that also means we need a
different signature (with a list of errors for Left).
type Error = String
type Errors = [Error]
makeUser :: String -> String -> UserRole -> Either Errors User
makeUser name email role =
let nameOk = not (null name)
emailOk = '@' `elem` email
in
if nameOk && emailOk
then Right (User name email role)
else
if nameOk && not emailOk
then Left ["not a valid email"]
else
if not nameOk && emailOk
then Left ["not a nonempty string"]
else Left ["not a valid email" "not a nonempty string"]
Well, that‘s not really satisfying. Moreover, the number of cases we need to check in increasingly nested branches explodes. This won‘t work.
However - I think - we‘ve learned something about what we actually want:
- We want to be able to perform each validation individually
- We want to be able to combine multiple validations
- We want to write a validation function that returns a validated result exactly when everything is fine or a list of all errors if something is wrong.
Validation Functions
To give this whole thing some structure, let‘s first define our own data type that corresponds to our validation results. Such a result is one of the following:
- A successful validation (
Ok <value>) of a value, or - a list of errors that occurred during validation (
Fail Errors). Why a list? Imagine validating a web form with more than one form field. It would be annoying to only be told about one error at a time, only to see the next error after correcting it. We‘d rather have all errors at once.
type Errors = [String]
data Validation c = Ok c
| Fail Errors
Our Validation represents the result of a validation. It‘s
parameterized over the type of a candidate value c. This allows us
to specify validations for values of any type.
Next, we‘ll write a few functions that represent our validations from above:
-- | Validate that `candidate` is not the empty string.
validateNonemptyString :: String -> Validation String
validateNonemptyString candidate
| null candidate = Fail ["not a nonempty string"]
| otherwise = Ok candidate
-- | Validate that a string represents an email.
validateEmail :: String -> Validation String
validateEmail candidate
| '@' `elem` candidate = Ok candidate
| otherwise = Fail ["not an email"]
-- | Validate that a `candidate` represents a `UserRole`.
validateUserRole :: String -> Validation UserRole
validateUserRole candidate
| candidate == "ADMIN" = Ok ADMIN
| candidate == "DEVELOPER" = Ok DEVELOPER
| candidate == "REPORTER" = Ok REPORTER
| otherwise = Fail ["not a role"]
With this, we can already perform individual validations:
validateNonemptyString "ok"
-- Ok "ok" : Validation String
validateNonemptyString ""
-- Fail ["not a nonempty string"] : Validation String
validateEmail "marco.schneider@active-group.de"
-- Ok "marco.schneider@active-group.de" : Validation String
validateEmail "marco.schneider.at.active-group.de"
-- Fail ["not a valid email"] : Validation String
validateUserRole "ADMIN"
-- Ok ADMIN : Validation UserRole
validateUserRole "DEVELOPER"
-- Ok DEVELOPER : Validation UserRole
validateUserRole "unknown"
-- Fail ["not a valid role"]
Point one (perform each validation individually) is done now. Now, however, we want to combine multiple results with each other.
Combining Results
Let‘s first look at combining two error cases: since both errors consist of lists of error messages, we can concatenate them and retain all information. That sounds good!
combineValidations :: Validation a -> Validation a -> Validation a
combineValidations (Fail es) (Fail fs) = Fail (es ++ fs)
The Haskell compiler is, of course, still unsatisfied: what if on the
left, right, or even in both positions of the function there‘s an
Ok?
Our third requirement demands: Either a valid result or all
errors. With just one Fail on the left side, we already know it can
only result in a Fail, even if there‘s a success on the right. So in
that case, we simply ignore the right side:
combineValidations :: Validation a -> Validation a -> Validation a
combineValidations (Fail es) (Fail fs) = Fail (es ++ fs) -- from above
-- When there's an error on the left and a success on the right,
-- we're no longer interested in the success:
combineValidations (Fail es) _ = Fail es
Okay, we‘ve covered two of four cases - so far, so simple. To make the Haskell compiler happy, we still need two more cases:
- Success on the left, error on the right
- Success on the left, success on the right
This might sound strange at first, but: we functional programmers
don‘t have an awful lot of tools at our disposal. We can feed
functions with arguments and see what the result is (or in the case of
Haskell, keep trying until the compiler gives its okay). So we don‘t
have much choice at this point but to simply pretend that on the left
there‘s an Ok whose value is a function that knows what to do with
the value on the right. If we accept that for a moment, we can write a
little helper function for it.
applyOkFunctionToValidation :: (a -> b) -> Validation a -> Validation b
-- Well, a failure remains a failure,
applyOkFunctionToValidation _ (Fail es) = Fail es
-- However, we'd like to keep a success. So we take what's
-- wrapped in the success and apply `f` to it. If you look at
-- the type signature, there's not much else left (except for the trivial solution).
applyOkFunctionToValidation f (Ok c) = Ok (f c)
What does this say? Assuming we get a function from an Ok on the
left side, we can use applyOkFunctionToValidation to apply the value
on the right side to it. A Fail on the right side remains an error
(so it propagates along the computation in a way) and remains
unchanged. An Ok results in another Ok by applying the f wrapped
in the left Ok to it. This way, we‘ve made one success out of two.
Now we could almost use applyOkFunctionToValidation in
combineValidations, if it weren‘t for its type signature Validation
c -> Validation c -> Validation c getting in the way - we need a
Validation (a -> b) -> Validation a -> Validation b. But if you‘ve paid
attention, you noticed: we haven‘t actually used the a in the
signature yet (so far, we‘ve only been interested in the error, which
has nothing to do with a). Thus, we can change the signature without
any problem. With that, combineValidations is finished and can be
used.
applyOkFunctionToValidation :: (a -> b) -> Validation a -> Validation b
applyOkFunctionToValidation _ (Fail es) = Fail es
applyOkFunctionToValidation f (Ok c) = Ok (f c)
combineValidations :: Validation (a -> b) -> Validation a -> Validation b
combineValidations (Fail es) (Fail fs) = Fail (es ++ fs)
combineValidations (Fail es) _ = Fail es
combineValidations (Ok f) v = applyOkFunctionToValidation f v
Isn‘t Something Missing?
Yes, kind of. Unfortunately, we can‘t just use our validation
functions defined above as the first argument for combineValidations
anymore - we want a function in the Ok. To get rid of that problem
too, we‘ll take one last detour.
Imagine we have a validation on the left that, combined with a
validation on the right, always returns the right result. Haskell
already provides the function id : a -> a, whose result is
always exactly its argument. If we wrap that in an Ok, it looks like
this:
(Ok id) `combineValidations` validateNonemptyString ""
-- Fail ["not a nonempty string"]
(Ok id) `combineValidations` validateNonemptyString "Marco"
-- Ok "Marco"
Does that help us? A bit. It doesn‘t really matter which function is
in the Ok. So we can quite generally turn any function into a
validation:
makeValidation x = Ok x
makeValidation reverse `combineValidations` validateNonemptyString "Marco"
-- Ok "ocraM"
Doesn‘t help much, but it works. Although, it does help somehow, because quite inconspicuously a solution for points two and three from above has crept in here.
makeValidation User
`combineValidations` (validateNonemptyString "Marco")
`combineValidations` (validateEmail "marco.schneider@active-group.de")
`combineValidations` (validateUserRole "DEVELOPER")
-- Ok (User {userName = "Marco", userEmail = "marco.schneider@active-group.de", userRole = ADMIN})
If you don‘t believe it, feel free to check the error cases:
-- Everything wrong
makeValidation User
`combineValidations` (validateNonemptyString "")
`combineValidations` (validateEmail "marco.schneider.at.active-group.de")
`combineValidations` (validateUserRole "DEVELOPE")
-- Fail ["not a nonempty string","not an email","not a role"]
-- Partially wrong
makeValidation User
`combineValidations` (validateNonemptyString "Marco")
`combineValidations` (validateEmail "marco.schneider.at.active-group.de")
`combineValidations` (validateUserRole "DEVELOPE")
-- Fail ["not an email","not a role"]
This probably looks magical at first: why does makeValidation User
suddenly become a validation function when it only wraps the original
constructor? There‘s no way around it, we need to briefly look at how
to imagine the individual substitutions (important: this is only for
illustration and doesn‘t really correspond to the execution
machinery). I‘ll separate the representation of the execution steps
with =>:
makeValidation User
`combineValidations` (validateNonemptyString "")
`combineValidations` (validateEmail "marco.schneider@active-group.de")
`combineValidations` (validateUserRole "DEVELOPER")
=> Inserting the definition of `User`
(Ok (\name email role -> User name admin role))
`combineValidations` (validateNonemptyString "Marco")
`combineValidations` (validateEmail "marco.schneider@active-group.de")
`combineValidations` (validateUserRole "DEVELOPER")
=> Result of the first right side
(Ok (\name email role -> User name admin role))
`combineValidations` (Ok "Marco")
`combineValidations` (validateEmail "marco.schneider@active-group.de")
`combineValidations` (validateUserRole "DEVELOPER")
=> Inserting the first result into the left side
(Ok (\email role -> User "Marco" admin role))
`combineValidations` (validateEmail "marco.schneider@active-group.de")
`combineValidations` (validateUserRole "DEVELOPER")
=> Now this repeats for the remaining arguments
(Ok (\email role -> User "Marco" admin role))
`combineValidations` (Ok "marco.schneider@active-group.de")
`combineValidations` (validateUserRole "DEVELOPER")
=>
(Ok (\role -> User "Marco" "marco.schneider@active.group" role))
`combineValidations` (validateUserRole "DEVELOPER")
=>
(Ok (\role -> User "Marco" "marco.schneider@active.group" role))
`combineValidations` (Ok DEVELOPER)
=>
(Ok (User "Marco" "marco.schneider@active.group" DEVELOPER))
Presto! Everything is fine. Analogously for the case when errors occur (we‘ll jump in where it gets interesting):
(Ok (\email role -> User "Marco" admin role))
`combineValidations` (validateEmail "marco.schneider.at.active-group.de")
`combineValidations` (validateUserRole "DEVELOPER")
=>
(Ok (\email role -> User "Marco" admin role))
`combineValidations` (Fail ["not an email"])
`combineValidations` (validateUserRole "DEVELOPER")
=> We remember from above: errors on the right consume `Ok`s on the left
(Fail ["not an email"])
`combineValidations` (validateUserRole "dummy")
=> And once we have the error, we just collect the remaining errors together.
(Fail ["not an email"])
`combineValidations` (Fail ["not a role"])
=>
(Fail ["not an email" "not a role"])
What helps us here is that in Haskell functions can be partially applied. That means a function with three arguments, when we apply one argument, results in a function with two arguments. In languages where this isn‘t the case (like Clojure, for example), we have to work a bit harder.
With that, we actually have everything we need to write our validation function:
validateUser :: String -> String -> String -> Validation User
validateUser name email role =
makeValidation User
`combineValidations` validateNonemptyString name
`combineValidations` validateEmail email
`combineValidations` validateUserRole role
The official part is done, because our three criteria from above are fulfilled. We want to:
- be able to perform each validation individually -> we write
functions that have
Validations as results - chain multiple validations together -> with
combineValidations - write a validation function that returns a validated result when
everything is fine or a list of all errors if something is
wrong → with
validateUser
The practical part is now complete. We‘ve looked at how we can write composable validation functions for arbitrary data types, how to link them together, and be sure that we really capture all errors.
However, if you‘re wondering what‘s so incredibly applicative about this and where the functor is hiding, you can continue enjoying yourself below.
Who or What is an Applicative Functor Here?
As functional programmers, we‘re always interested in finding more general properties and structures in our code (or designing it that way from the start). An example of our approach at home is the „Find-the-Functor-Game“™, because: where a functor is hiding, there might also be an applicative functor, there might also be a monad. Or generally: if you find a bit of structure, there‘s usually more.
Those who already know what makes a functor in programming have already spotted it above. In Haskell, you write it as a type class like this:
class Functor f where
fmap :: (a -> b) -> f a -> f b
To avoid falling into metaphors about what a functor is, I‘d rather
show where the functor was hiding above. If we look at the signature
of fmap, it looks suspiciously like applyOkFunctionToValidation!
fmap :: (a -> b) -> f a -> f b
applyOkFunctionToValidation :: (a -> b) -> Validation a -> Validation b
So we could quite generally say that our Validation is a functor:
instance Functor Validation where
fmap _ (Fail es) = Fail es
fmap f (Ok c) = Ok (f c)
-- or more simply
-- fmap = applyOkFunctionToValidation
Armed with this knowledge, we can make a first small change to the code from above.
combineValidations :: Validation (a -> b) -> Validation a -> Validation b
combineValidations (Fail es) (Fail fs) = Fail (es ++ fs)
combineValidations (Fail es) _ = Fail es
combineValidations (Ok f) v = fmap f v
That doesn‘t look like much, but it‘s already quite a bit. Our use of
fmap at this point (and of course the implementation of Functor)
signals to all readers that:
- certain laws apply here (please don‘t miss the warning further below about laws)
- all code from all libraries that know how to do something with functors is also usable for us.
Applicative Functor
It was somehow already clear that an applicative functor is hiding here - otherwise the title of the post would be quite a dud.
Let‘s first look again at the definition of the corresponding type class in Haskell:
class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
Both signatures might already look familiar to us:
pure :: a -> f a
makeValidation :: a -> Validation a
(<*>) :: f (a -> b) -> f a -> f b
combineValidations :: Validation (a -> b) -> Validation a -> Validation b
Then let‘s implement it:
instance Applicative Validation where
pure = makeValidation
(<*>) = combineValidations
And finally:
validateUser :: String -> String -> String -> Validation User
validateUser name email role =
pure User <*> validateNonemptyString name
<*> validateEmail email
<*> validateUserRole role
Nicely, we get (as already with the Functor) everything that‘s
available for applicative functors for free. Just a small example: the
<$> operator ((<$>) :: Functor f => (a -> b) -> f a -> f b, where
(<$>) = fmap) makes the validation a little bit prettier still:
validateUser email name role =
User <$> validateNonemptyString "Marco"
<*> validateEmail "marco.schneider@active-group.de"
<*> validateUserRole "DEVELOPER"
What Does All This Get Me?
You would right in wondering why we‘re going through all this effort. Why look for the functor? And even applicative functors? Since we‘re dealing here with the practical application of functional programming, I‘d like to pick out just one detail.
If you look at the type signature of (<*>) :: f (a -> b) -> f a -> f
b (or even the definition), you‘ll notice: the individual arguments
are independent of each other. That means, I‘m theoretically free in
the expression
make <$> foo <*> bar <*> baz
to evaluate foo, bar, and baz in any order - or in any context
(as long as the types „fit“)! This means, for example, that I could
decide to write my implementation of <*> such that each argument is
evaluated on its own thread and the result is called when all parallel
computations are finished. This then naturally applies to every
instance of Applicative that adheres to the laws - in this case, the
associativity law.
pure (.) <*> u <*> v <*> w = u <*> (v <*> w)
It should hold that: the application of the composition pure (.) <*>
u <*> v <*> w is equivalent to applying u to the result of v <*>
w (or more simply put: composition of applicative values with pure
(.) behaves like composition of functions with .). So it should
not matter in which order the result is calculated.
Conclusion
We‘ve presented to you a variant of data validation in functional programming practice. To do this, we used an applicative functor and showed that with it we can very easily combine partial results and thus always know exactly what went wrong.
The accompanying source code for this article can be found on Github.
Important Postscript
One more note on the topic of type classes and signatures. Just because the signatures of functions look like they could fit a type class doesn‘t mean it will become a correct type class. Such a class usually consists of:
- A set of operations (
fmaporpureplus<*>) - A set of laws that must hold for values of the type under these operations
Unfortunately, Haskell doesn‘t give us the right means to ensure that the respective laws are also upheld by the instance. However, showing that our instances are correct would exceed the scope of this already long post.
Update July 9, 2022 Of course, applicative validation has nothing to
do with Haskell or even static typing. You can see this, for example,
in our Clojure library
active-clojure. Here
you‘ll find an implementation of applicative functors in the
dynamically typed language Clojure.