Our first article in the series „Functional Programming in Practice“ on the topic of data validation with applicative functors.

Welcome to our first article in the series „Functional Programming in Practice“. In this series, we present concepts from functional programming using real examples and show how we actually apply them in practice every day.

Today we‘ll focus on data validation and show how applicative functors help us systematically collect and process inconsistencies in data.

Validating JSON

Web services like to communicate with each other using JSON objects that end up in our application from users and other services. It‘s no secret that the JSON format is limited by its narrow range of types. In practice, this means that complex data types are usually serialized as strings. Additionally, we often have specific requirements for the form of the concrete data: a "username" should not be empty, an "email" should contain at least an "@" symbol, and serialized values must match our expectations to be converted into internal data types.

For the following example, we‘ll work with this data analysis. A user consists of:

  • a name
  • an email address
  • a role

A role is one of the following:

  • a developer
  • an admin
  • a bug reporter

For the implementation, we‘ll use the Haskell programming language. It could look like this:

data UserRole = ADMIN | DEVELOPER | REPORTER

data User = { userName  :: String
            , userEmail :: String
            , userRole  :: UserRole }

-- Examples
marco = User "Marco" "marco.schneider@active-group.de" DEVELOPER
simon = User "Simon" "simon.haerer.at.active-group.de" DEVELOPER

The two examples in the code immediately show a problem: while we know that Simon‘s email address isn‘t valid, if we read it directly from a JSON object, nothing stops us from ignoring such an invariant („every email address contains an ‚@‘ symbol“). So we need to validate the data somehow before it penetrates further into the system.

We could, of course, write a constructor for User that catches such errors. We‘ll use Either a b = Left a | Right b for the result. Here:

  • An error is signaled with Left a
  • A success is signaled with Right b
type Error = String

makeUser :: String -> String -> Role -> Either Error User
makeUser name email role
  | '@' `elem` email = Right (User name email role)
  | otherwise        = Left  "not a valid email"

Great, problem solved. If everything goes smoothly, we get a Right User, otherwise a Left Error. But wait, what if the username is empty? A second attempt:

type Error = String

makeUser :: String -> String -> Role -> Either Error User
makeUser name email role
  | null name        = Left "not a nonempty string"
  | '@' `elem` email = Right (User name email role)
  | otherwise        = Left "not a valid email"

That‘s a bit better. But what if both the name is empty and the email doesn‘t contain a '@'? After all, we want to know all the errors, not just the first one. But that also means we need a different signature (with a list of errors for Left).

type Error = String
type Errors = [Error]

makeUser :: String -> String -> UserRole -> Either Errors User
makeUser name email role =
  let nameOk = not (null name)
      emailOk = '@' `elem` email
  in
    if nameOk && emailOk
    then Right (User name email role)
    else
      if nameOk && not emailOk
      then Left ["not a valid email"]
      else
        if not nameOk && emailOk
        then Left ["not a nonempty string"]
        else Left ["not a valid email" "not a nonempty string"]

Well, that‘s not really satisfying. Moreover, the number of cases we need to check in increasingly nested branches explodes. This won‘t work.

However - I think - we‘ve learned something about what we actually want:

  1. We want to be able to perform each validation individually
  2. We want to be able to combine multiple validations
  3. We want to write a validation function that returns a validated result exactly when everything is fine or a list of all errors if something is wrong.

Validation Functions

To give this whole thing some structure, let‘s first define our own data type that corresponds to our validation results. Such a result is one of the following:

  • A successful validation (Ok <value>) of a value, or
  • a list of errors that occurred during validation (Fail Errors). Why a list? Imagine validating a web form with more than one form field. It would be annoying to only be told about one error at a time, only to see the next error after correcting it. We‘d rather have all errors at once.
type Errors = [String]

data Validation c = Ok c 
                  | Fail Errors

Our Validation represents the result of a validation. It‘s parameterized over the type of a candidate value c. This allows us to specify validations for values of any type.

Next, we‘ll write a few functions that represent our validations from above:

-- | Validate that `candidate` is not the empty string.
validateNonemptyString :: String -> Validation String
validateNonemptyString candidate
  | null candidate = Fail ["not a nonempty string"]
  | otherwise      = Ok candidate
  
-- | Validate that a string represents an email.
validateEmail :: String -> Validation String
validateEmail candidate
  | '@' `elem` candidate = Ok candidate
  | otherwise            = Fail ["not an email"]

-- | Validate that a `candidate` represents a `UserRole`.
validateUserRole :: String -> Validation UserRole
validateUserRole candidate
  | candidate == "ADMIN"     = Ok ADMIN
  | candidate == "DEVELOPER" = Ok DEVELOPER
  | candidate == "REPORTER"  = Ok REPORTER
  | otherwise                = Fail ["not a role"]

With this, we can already perform individual validations:

validateNonemptyString "ok"
-- Ok "ok" : Validation String
validateNonemptyString ""
-- Fail ["not a nonempty string"] : Validation String

validateEmail "marco.schneider@active-group.de" 
-- Ok "marco.schneider@active-group.de" : Validation String
validateEmail "marco.schneider.at.active-group.de" 
-- Fail ["not a valid email"] : Validation String

validateUserRole "ADMIN"
-- Ok ADMIN : Validation UserRole
validateUserRole "DEVELOPER"
-- Ok DEVELOPER : Validation UserRole
validateUserRole "unknown"
-- Fail ["not a valid role"]

Point one (perform each validation individually) is done now. Now, however, we want to combine multiple results with each other.

Combining Results

Let‘s first look at combining two error cases: since both errors consist of lists of error messages, we can concatenate them and retain all information. That sounds good!

combineValidations :: Validation a -> Validation a -> Validation a
combineValidations (Fail es) (Fail fs) = Fail (es ++ fs)

The Haskell compiler is, of course, still unsatisfied: what if on the left, right, or even in both positions of the function there‘s an Ok?

Our third requirement demands: Either a valid result or all errors. With just one Fail on the left side, we already know it can only result in a Fail, even if there‘s a success on the right. So in that case, we simply ignore the right side:

combineValidations :: Validation a -> Validation a -> Validation a
combineValidations (Fail es) (Fail fs) = Fail (es ++ fs)  -- from above
-- When there's an error on the left and a success on the right,
-- we're no longer interested in the success:
combineValidations (Fail es) _ = Fail es

Okay, we‘ve covered two of four cases - so far, so simple. To make the Haskell compiler happy, we still need two more cases:

  1. Success on the left, error on the right
  2. Success on the left, success on the right

This might sound strange at first, but: we functional programmers don‘t have an awful lot of tools at our disposal. We can feed functions with arguments and see what the result is (or in the case of Haskell, keep trying until the compiler gives its okay). So we don‘t have much choice at this point but to simply pretend that on the left there‘s an Ok whose value is a function that knows what to do with the value on the right. If we accept that for a moment, we can write a little helper function for it.

applyOkFunctionToValidation :: (a -> b) -> Validation a -> Validation b
-- Well, a failure remains a failure, 
applyOkFunctionToValidation _ (Fail es) = Fail es
-- However, we'd like to keep a success. So we take what's
-- wrapped in the success and apply `f` to it. If you look at
-- the type signature, there's not much else left (except for the trivial solution).
applyOkFunctionToValidation f (Ok c) = Ok (f c)

What does this say? Assuming we get a function from an Ok on the left side, we can use applyOkFunctionToValidation to apply the value on the right side to it. A Fail on the right side remains an error (so it propagates along the computation in a way) and remains unchanged. An Ok results in another Ok by applying the f wrapped in the left Ok to it. This way, we‘ve made one success out of two.

Now we could almost use applyOkFunctionToValidation in combineValidations, if it weren‘t for its type signature Validation c -> Validation c -> Validation c getting in the way - we need a Validation (a -> b) -> Validation a -> Validation b. But if you‘ve paid attention, you noticed: we haven‘t actually used the a in the signature yet (so far, we‘ve only been interested in the error, which has nothing to do with a). Thus, we can change the signature without any problem. With that, combineValidations is finished and can be used.

applyOkFunctionToValidation :: (a -> b) -> Validation a -> Validation b
applyOkFunctionToValidation _ (Fail es) = Fail es
applyOkFunctionToValidation f (Ok c) = Ok (f c)

combineValidations :: Validation (a -> b) -> Validation a -> Validation b
combineValidations (Fail es) (Fail fs) = Fail (es ++ fs)
combineValidations (Fail es) _ = Fail es
combineValidations (Ok f) v = applyOkFunctionToValidation f v

Isn‘t Something Missing?

Yes, kind of. Unfortunately, we can‘t just use our validation functions defined above as the first argument for combineValidations anymore - we want a function in the Ok. To get rid of that problem too, we‘ll take one last detour.

Imagine we have a validation on the left that, combined with a validation on the right, always returns the right result. Haskell already provides the function id : a -> a, whose result is always exactly its argument. If we wrap that in an Ok, it looks like this:

(Ok id) `combineValidations` validateNonemptyString ""
-- Fail ["not a nonempty string"]
(Ok id) `combineValidations` validateNonemptyString "Marco"
-- Ok "Marco"

Does that help us? A bit. It doesn‘t really matter which function is in the Ok. So we can quite generally turn any function into a validation:

makeValidation x = Ok x

makeValidation reverse `combineValidations` validateNonemptyString "Marco"
-- Ok "ocraM"

Doesn‘t help much, but it works. Although, it does help somehow, because quite inconspicuously a solution for points two and three from above has crept in here.

makeValidation User
  `combineValidations` (validateNonemptyString "Marco")
  `combineValidations` (validateEmail "marco.schneider@active-group.de")
  `combineValidations` (validateUserRole "DEVELOPER")
-- Ok (User {userName = "Marco", userEmail = "marco.schneider@active-group.de", userRole = ADMIN})

If you don‘t believe it, feel free to check the error cases:

-- Everything wrong
makeValidation User
  `combineValidations` (validateNonemptyString "")
  `combineValidations` (validateEmail "marco.schneider.at.active-group.de")
  `combineValidations` (validateUserRole "DEVELOPE")
-- Fail ["not a nonempty string","not an email","not a role"]

-- Partially wrong
makeValidation User
  `combineValidations` (validateNonemptyString "Marco")
  `combineValidations` (validateEmail "marco.schneider.at.active-group.de")
  `combineValidations` (validateUserRole "DEVELOPE")
-- Fail ["not an email","not a role"]

This probably looks magical at first: why does makeValidation User suddenly become a validation function when it only wraps the original constructor? There‘s no way around it, we need to briefly look at how to imagine the individual substitutions (important: this is only for illustration and doesn‘t really correspond to the execution machinery). I‘ll separate the representation of the execution steps with =>:

makeValidation User
  `combineValidations` (validateNonemptyString "")
  `combineValidations` (validateEmail "marco.schneider@active-group.de")
  `combineValidations` (validateUserRole "DEVELOPER")

=> Inserting the definition of `User`

(Ok (\name email role -> User name admin role))
  `combineValidations` (validateNonemptyString "Marco")
  `combineValidations` (validateEmail "marco.schneider@active-group.de")
  `combineValidations` (validateUserRole "DEVELOPER")

=> Result of the first right side

(Ok (\name email role -> User name admin role))
  `combineValidations` (Ok "Marco")
  `combineValidations` (validateEmail "marco.schneider@active-group.de")
  `combineValidations` (validateUserRole "DEVELOPER")

=> Inserting the first result into the left side

(Ok (\email role -> User "Marco" admin role))
  `combineValidations` (validateEmail "marco.schneider@active-group.de")
  `combineValidations` (validateUserRole "DEVELOPER")

=> Now this repeats for the remaining arguments

(Ok (\email role -> User "Marco" admin role))
  `combineValidations` (Ok "marco.schneider@active-group.de")
  `combineValidations` (validateUserRole "DEVELOPER")

=> 

(Ok (\role -> User "Marco" "marco.schneider@active.group" role))
  `combineValidations` (validateUserRole "DEVELOPER")

=>

(Ok (\role -> User "Marco" "marco.schneider@active.group" role))
  `combineValidations` (Ok DEVELOPER)

=>

(Ok (User "Marco" "marco.schneider@active.group" DEVELOPER))

Presto! Everything is fine. Analogously for the case when errors occur (we‘ll jump in where it gets interesting):

(Ok (\email role -> User "Marco" admin role))
  `combineValidations` (validateEmail "marco.schneider.at.active-group.de")
  `combineValidations` (validateUserRole "DEVELOPER")

=>

(Ok (\email role -> User "Marco" admin role))
  `combineValidations` (Fail ["not an email"])
  `combineValidations` (validateUserRole "DEVELOPER")

=> We remember from above: errors on the right consume `Ok`s on the left

(Fail ["not an email"])
  `combineValidations` (validateUserRole "dummy")

=> And once we have the error, we just collect the remaining errors together.

(Fail ["not an email"])
  `combineValidations` (Fail ["not a role"])

=>

(Fail ["not an email" "not a role"])

What helps us here is that in Haskell functions can be partially applied. That means a function with three arguments, when we apply one argument, results in a function with two arguments. In languages where this isn‘t the case (like Clojure, for example), we have to work a bit harder.

With that, we actually have everything we need to write our validation function:

validateUser :: String -> String -> String -> Validation User
validateUser name email role =
	makeValidation User
	  `combineValidations` validateNonemptyString name
	  `combineValidations` validateEmail email
	  `combineValidations` validateUserRole role

The official part is done, because our three criteria from above are fulfilled. We want to:

  1. be able to perform each validation individually -> we write functions that have Validations as results
  2. chain multiple validations together -> with combineValidations
  3. write a validation function that returns a validated result when everything is fine or a list of all errors if something is wrong → with validateUser

The practical part is now complete. We‘ve looked at how we can write composable validation functions for arbitrary data types, how to link them together, and be sure that we really capture all errors.

However, if you‘re wondering what‘s so incredibly applicative about this and where the functor is hiding, you can continue enjoying yourself below.

Who or What is an Applicative Functor Here?

As functional programmers, we‘re always interested in finding more general properties and structures in our code (or designing it that way from the start). An example of our approach at home is the „Find-the-Functor-Game“™, because: where a functor is hiding, there might also be an applicative functor, there might also be a monad. Or generally: if you find a bit of structure, there‘s usually more.

Those who already know what makes a functor in programming have already spotted it above. In Haskell, you write it as a type class like this:

class Functor f where
  fmap :: (a -> b) -> f a -> f b

To avoid falling into metaphors about what a functor is, I‘d rather show where the functor was hiding above. If we look at the signature of fmap, it looks suspiciously like applyOkFunctionToValidation!

fmap                        :: (a -> b) ->          f a ->          f b
applyOkFunctionToValidation :: (a -> b) -> Validation a -> Validation b

So we could quite generally say that our Validation is a functor:

instance Functor Validation where
  fmap _ (Fail es) = Fail es
  fmap f (Ok c)    = Ok (f c)
  -- or more simply
  -- fmap = applyOkFunctionToValidation

Armed with this knowledge, we can make a first small change to the code from above.

combineValidations :: Validation (a -> b) -> Validation a -> Validation b
combineValidations (Fail es) (Fail fs) = Fail (es ++ fs)
combineValidations (Fail es) _         = Fail es
combineValidations (Ok f)    v         = fmap f v

That doesn‘t look like much, but it‘s already quite a bit. Our use of fmap at this point (and of course the implementation of Functor) signals to all readers that:

  • certain laws apply here (please don‘t miss the warning further below about laws)
  • all code from all libraries that know how to do something with functors is also usable for us.

Applicative Functor

It was somehow already clear that an applicative functor is hiding here - otherwise the title of the post would be quite a dud.

Let‘s first look again at the definition of the corresponding type class in Haskell:

class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

Both signatures might already look familiar to us:

pure           :: a ->          f a
makeValidation :: a -> Validation a

(<*>)              ::          f (a -> b) ->          f a ->          f b
combineValidations :: Validation (a -> b) -> Validation a -> Validation b

Then let‘s implement it:

instance Applicative Validation where
  pure  = makeValidation
  (<*>) = combineValidations

And finally:

validateUser :: String -> String -> String -> Validation User
validateUser name email role =
	pure User <*> validateNonemptyString name
	          <*> validateEmail email
	          <*> validateUserRole role

Nicely, we get (as already with the Functor) everything that‘s available for applicative functors for free. Just a small example: the <$> operator ((<$>) :: Functor f => (a -> b) -> f a -> f b, where (<$>) = fmap) makes the validation a little bit prettier still:

validateUser email name role =
  User <$> validateNonemptyString "Marco"
       <*> validateEmail "marco.schneider@active-group.de"
       <*> validateUserRole "DEVELOPER"

What Does All This Get Me?

You would right in wondering why we‘re going through all this effort. Why look for the functor? And even applicative functors? Since we‘re dealing here with the practical application of functional programming, I‘d like to pick out just one detail.

If you look at the type signature of (<*>) :: f (a -> b) -> f a -> f b (or even the definition), you‘ll notice: the individual arguments are independent of each other. That means, I‘m theoretically free in the expression

make <$> foo <*> bar <*> baz

to evaluate foo, bar, and baz in any order - or in any context (as long as the types „fit“)! This means, for example, that I could decide to write my implementation of <*> such that each argument is evaluated on its own thread and the result is called when all parallel computations are finished. This then naturally applies to every instance of Applicative that adheres to the laws - in this case, the associativity law.

pure (.) <*> u <*> v <*> w = u <*> (v <*> w)

It should hold that: the application of the composition pure (.) <*> u <*> v <*> w is equivalent to applying u to the result of v <*> w (or more simply put: composition of applicative values with pure (.) behaves like composition of functions with .). So it should not matter in which order the result is calculated.

Conclusion

We‘ve presented to you a variant of data validation in functional programming practice. To do this, we used an applicative functor and showed that with it we can very easily combine partial results and thus always know exactly what went wrong.

The accompanying source code for this article can be found on Github.

Important Postscript

One more note on the topic of type classes and signatures. Just because the signatures of functions look like they could fit a type class doesn‘t mean it will become a correct type class. Such a class usually consists of:

  1. A set of operations (fmap or pure plus <*>)
  2. A set of laws that must hold for values of the type under these operations

Unfortunately, Haskell doesn‘t give us the right means to ensure that the respective laws are also upheld by the instance. However, showing that our instances are correct would exceed the scope of this already long post.

Update July 9, 2022 Of course, applicative validation has nothing to do with Haskell or even static typing. You can see this, for example, in our Clojure library active-clojure. Here you‘ll find an implementation of applicative functors in the dynamically typed language Clojure.