Many applications use configurations to customize their behavior at runtime. The parameters in these configurations can have default values that are used if nothing else is specified. Other values, such as passwords, have no default values and must therefore always be provided when starting the application.

In this article, we will go through several iterations to model configurations in our programs using higher-kinded data in Haskell.

Note: Accompanying source code can be found on GitHub.

First Attempt

Suppose we have a program that requires a password, the URL of a service, and its port as configuration. We could then model the configuration with the following data type.

data Config = Config
  { password :: String
  , serviceUrl :: String
  , servicePort :: Int
  }
  deriving (Show)

We assume that the parts of the configuration are read from environment variables at program startup. If the environment variable for the URL or the port is not set, or if its value cannot be interpreted, a default value should be used. The password, in contrast, must always be provided at runtime and therefore has no default value.

The function that implements all of this could look like this:

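import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)
import Text.Read (readMaybe)

-- Assemble the configuration, falling back to defaults where allowed.
getConfig :: IO Config
getConfig = do
  pw <- getPassword
  url <- fromMaybe "localhost" <$> getUrl
  port <- fromMaybe 8080 <$> getPort
  pure (Config pw url port)

-- The password has no default. We fail in IO right away, because a lazy
-- call to error would only fire once the password is actually used.
getPassword :: IO String
getPassword = do
  mPassword <- lookupEnv "PASSWORD"
  maybe (fail "Environment variable PASSWORD not set") pure mPassword

getUrl :: IO (Maybe String)
getUrl = lookupEnv "SERVICE_URL"

-- Nothing if the variable is not set or does not parse as an Int.
getPort :: IO (Maybe Int)
getPort = do
  mPortStr <- lookupEnv "SERVICE_PORT"
  pure (readMaybe =<< mPortStr)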

We use the function:

lookupEnv :: String -> IO (Maybe String)

which returns the value of an environment variable if it is set.

We see three helper functions here that handle reading and interpreting the respective parts of the configuration. In the getConfig function, everything is assembled. This function constructs values of the Config data type and also merges the default values with the values that were read.
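
To make the fallback behavior for the port concrete, here is a small pure sketch. The name parsePort is a hypothetical helper that is not part of the code above; it only mirrors the readMaybe step from getPort combined with the fromMaybe 8080 fallback from getConfig.

```haskell
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

-- Hypothetical helper (not from the article): the parsing step of getPort
-- followed by the default from getConfig, without any IO.
parsePort :: Maybe String -> Int
parsePort mPortStr = fromMaybe 8080 (readMaybe =<< mPortStr)

main :: IO ()
main = do
  print (parsePort Nothing)        -- variable not set
  print (parsePort (Just "9000"))  -- valid number
  print (parsePort (Just "http"))  -- cannot be interpreted as an Int
```

The first and the last call fall back to 8080; only the second one yields the value that was given.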

This approach has at least two problems:

  • The getConfig function mixes building the configuration with reading the individual parameters and combining them with default values. This means it's not immediately clear whether a field has a default value.
  • Other configuration types must write their own getConfig function.

Second Attempt

Another possibility would be to choose the following data type:

data Config' static dynamic = Config
  { password :: dynamic String
  , serviceUrl :: static String
  , servicePort :: static Int
  }

As we can see, two type parameters have been introduced compared to the first Config data type. Both parameters are type functions with kind Type -> Type. By choosing these two type functions, we can achieve the following: dynamic values cannot be specified at compile-time, while static values can.

To implement the semantics behind static and dynamic, we use the two data types Identity (from Data.Functor.Identity) and Proxy (from Data.Proxy), both of which ship with base. Identity a is a data type that contains a value of type a. This means that in all definitions of a configuration, these values must be present. Proxy a, on the other hand, does not contain a value of type a. Thus we can define a value of type Proxy a without having to provide a value of type a.
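
The difference between the two types can be seen in a minimal sketch. The names knownPort and unknownPassword are made up for this illustration.

```haskell
import Data.Functor.Identity (Identity (..))
import Data.Proxy (Proxy (..))

-- Identity always carries a value of the wrapped type.
knownPort :: Identity Int
knownPort = Identity 8080

-- Proxy has a single constructor without fields, so no String is needed.
unknownPassword :: Proxy String
unknownPassword = Proxy

main :: IO ()
main = do
  print (runIdentity knownPort)  -- unwraps the contained Int
  print unknownPassword          -- there is nothing to unwrap
```

Running this prints 8080 and Proxy: the first value can be unwrapped, the second one only records the type.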

We define the following type aliases to make the assignment of static and dynamic clearer in different situations.

type DefaultConfig = Config' Identity Proxy
type PartialConfig = Config' Maybe Identity
type Config = Config' Identity Identity

PartialConfig here replaces static with the Maybe type constructor and dynamic with Identity. Thus, for partial configurations, static values can potentially be provided, while dynamic values must be provided. In a value of type Config, both static and dynamic values must be provided.

With this, we can now define a default configuration:

defaultConfig :: DefaultConfig
defaultConfig =
  Config
    { password = Proxy
    , serviceUrl = Identity "localhost"
    , servicePort = Identity 8080
    }

Although this definition only contains the values we know statically, calls to Proxy and Identity are still necessary here, which impairs readability.

The function that reads the environment variables at runtime now looks like this:

readInPartialConfig :: IO PartialConfig
readInPartialConfig = do
  password <- Identity <$> getPassword
  url <- getUrl
  port <- getPort
  pure (Config password url port)

Note here that both the URL and the port can be read, but do not necessarily have to be present. The password, on the other hand, must be loaded dynamically.

The last missing building block is the function that combines the default and partial configuration into the final Config. Values from the PartialConfig are preferred over those from the DefaultConfig.

combineConfig :: DefaultConfig -> PartialConfig -> Config
combineConfig
  (Config _defaultPasswordProxy defaultServiceURL defaultServicePort)
  (Config pw url port) =
    Config
      pw
      (maybe defaultServiceURL Identity url)
      (maybe defaultServicePort Identity port)
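
The following self-contained sketch runs combineConfig on pure values so that the preference order becomes visible. The value partialExample is a hypothetical stand-in for what would otherwise be read from the environment, and the definitions from above are repeated in compact form so that the example compiles on its own.

```haskell
import Data.Functor.Identity (Identity (..))
import Data.Proxy (Proxy (..))

data Config' static dynamic = Config
  { password :: dynamic String
  , serviceUrl :: static String
  , servicePort :: static Int
  }

type DefaultConfig = Config' Identity Proxy
type PartialConfig = Config' Maybe Identity
type Config = Config' Identity Identity

combineConfig :: DefaultConfig -> PartialConfig -> Config
combineConfig (Config _ defUrl defPort) (Config pw url port) =
  Config pw (maybe defUrl Identity url) (maybe defPort Identity port)

defaultConfig :: DefaultConfig
defaultConfig = Config Proxy (Identity "localhost") (Identity 8080)

-- Hypothetical input: password and port were provided, the URL was not.
partialExample :: PartialConfig
partialExample = Config (Identity "secret") Nothing (Just 9000)

main :: IO ()
main = do
  let cfg = combineConfig defaultConfig partialExample
  putStrLn (runIdentity (serviceUrl cfg))  -- falls back to the default
  print (runIdentity (servicePort cfg))    -- taken from the partial config
```

This prints localhost and 9000: the missing URL is filled in from the default configuration, while the provided port overrides the default 8080.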

The getConfig function now simply uses the functions defined so far:

getConfig :: IO Config
getConfig = combineConfig defaultConfig <$> readInPartialConfig

With our new definition of the configuration data type, we were able to fix the first of the above problems: From the definition of Config', it is clearly recognizable which fields have default values and which do not, and there is a value defaultConfig that contains all these values.

Unfortunately, the code is not reusable in this form. For a new configuration, we still have to rewrite the data type and some functions.

Type Families to the Rescue

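To make the code reusable, we define the type family HKD (Higher-Kinded Data), which models the application of a type function to a type. The definition needs the TypeFamilies and StandaloneKindSignatures extensions as well as Type from Data.Kind: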

type HKD :: (Type -> Type) -> Type -> Type
type family HKD f a where
  HKD Identity a = a
  HKD f a = f a

Through this definition, we save ourselves unnecessary calls to Identity later.
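
To see how the type family reduces, consider this small sketch. The names plainPort and optionalPort are chosen only for this illustration.

```haskell
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Functor.Identity (Identity)
import Data.Kind (Type)

type HKD :: (Type -> Type) -> Type -> Type
type family HKD f a where
  HKD Identity a = a
  HKD f a = f a

-- HKD Identity Int reduces to plain Int, so no wrapper is needed.
plainPort :: HKD Identity Int
plainPort = 8080

-- Any other type function is simply applied: HKD Maybe Int is Maybe Int.
optionalPort :: HKD Maybe Int
optionalPort = Just 9000

main :: IO ()
main = do
  print (plainPort + 1)  -- ordinary Int arithmetic works directly
  print optionalPort
```

Running this prints 8081 and Just 9000. Because the equations of a closed type family are tried from top to bottom, the Identity case is matched first and every other type function falls through to the second equation.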

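Our configuration type is unchanged except for the application of HKD in the field signatures. In addition, we derive an instance of the Generic type class (from GHC.Generics, using the DeriveGeneric extension), so that the combining function can later be written without referring to the concrete fields.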

data Config' static dynamic = Config
  { password :: HKD dynamic String
  , serviceUrl :: HKD static String
  , servicePort :: HKD static Int
  }
  deriving (Generic)

The definition of defaultConfig is also almost the same, except for the missing calls to the Identity constructor:

defaultConfig :: DefaultConfig
defaultConfig =
  Config
    { password = Proxy
    , serviceUrl = "localhost"
    , servicePort = 8080
    }

Similarly for the partial configuration:

readInPartialConfig :: IO PartialConfig
readInPartialConfig = do
  password <- getPassword
  url <- getUrl
  port <- getPort
  pure (Config password url port)

Because we derived the Generic type class in the definition of Config', we can write a function genericApply with the following somewhat simplified type.

genericApply :: c Identity Proxy -> c Maybe Identity -> c Identity Identity

This function takes over the job of the combineConfig function. We can implement it using GHC.Generics without referring to the definition of Config'. Thus, genericApply can be moved to a library and does not have to be rewritten for each configuration type. The exact implementation is beyond the scope of this article; it can be found in the GitHub repo linked above.
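
To give an idea of how such a function can be built, here is one possible sketch. It is not the implementation from the accompanying repository: the class GApply and its method gApply are hypothetical names, and the sketch only covers record types with a single constructor and at least one field.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.Functor.Identity (Identity)
import Data.Kind (Type)
import Data.Maybe (fromMaybe)
import Data.Proxy (Proxy (..))
import GHC.Generics

type HKD :: (Type -> Type) -> Type -> Type
type family HKD f a where
  HKD Identity a = a
  HKD f a = f a

-- Hypothetical helper class: combines the generic representations of a
-- default (d) and a partial (p) value into a complete (c) one.
class GApply d p c where
  gApply :: d x -> p x -> c x

-- Metadata wrappers carry no data of their own; look through them.
instance GApply d p c => GApply (M1 i m d) (M1 i m p) (M1 i m c) where
  gApply (M1 d) (M1 p) = M1 (gApply d p)

-- Several fields: combine them field by field.
instance (GApply d1 p1 c1, GApply d2 p2 c2) => GApply (d1 :*: d2) (p1 :*: p2) (c1 :*: c2) where
  gApply (d1 :*: d2) (p1 :*: p2) = gApply d1 p1 :*: gApply d2 p2

-- Static field: prefer the value that was read, otherwise use the default.
instance GApply (K1 i a) (K1 i (Maybe a)) (K1 i a) where
  gApply (K1 d) (K1 p) = K1 (fromMaybe d p)

-- Dynamic field: there is no default, so take the value that was read.
instance GApply (K1 i (Proxy a)) (K1 i a) (K1 i a) where
  gApply _ (K1 p) = K1 p

genericApply ::
  ( Generic (c Identity Proxy)
  , Generic (c Maybe Identity)
  , Generic (c Identity Identity)
  , GApply (Rep (c Identity Proxy)) (Rep (c Maybe Identity)) (Rep (c Identity Identity))
  ) =>
  c Identity Proxy -> c Maybe Identity -> c Identity Identity
genericApply d p = to (gApply (from d) (from p))

data Config' static dynamic = Config
  { password :: HKD dynamic String
  , serviceUrl :: HKD static String
  , servicePort :: HKD static Int
  }
  deriving (Generic)

defaultConfig :: Config' Identity Proxy
defaultConfig = Config Proxy "localhost" 8080

main :: IO ()
main = do
  -- Hypothetical partial configuration: the URL was not provided.
  let partial = Config "secret" Nothing (Just 9000) :: Config' Maybe Identity
      cfg = genericApply defaultConfig partial
  putStrLn (serviceUrl cfg)
  print (servicePort cfg)
```

The constraints in the signature of genericApply are what the somewhat simplified type above leaves out. The two K1 instances encode the rule for static and dynamic fields once and for all, which is why nothing in GApply mentions Config'.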

By replacing the type variable c with Config', we get the same type as for combineConfig. With this, we have everything needed to build the getConfig function:

getConfig :: IO Config
getConfig = do
  partialConfig <- readInPartialConfig
  pure (genericApply defaultConfig partialConfig)

With this iteration of Config', we finally did it. For a new configuration type, we only need to define the type, the default value, and the reading of dynamic values. We get combining for free. In particular, the HKD type family, the genericApply function, and the necessary helper functions can be moved to a library, since they are always the same regardless of the Config' definition.

With three more small type aliases and a helper value, it's even possible to remove Proxy and Identity from the configuration definition:

type Default c = c Identity Proxy
type Partial c = c Maybe Identity
type Complete c = c Identity Identity

dynamic :: Proxy a
dynamic = Proxy

These definitions can also be moved to the library.

Finally, the complete configuration definition looks like this:

data Config' static dynamic = Config
  { password :: HKD dynamic String
  , serviceUrl :: HKD static String
  , servicePort :: HKD static Int
  }
  deriving (Generic)

type DefaultConfig = Default Config'
type PartialConfig = Partial Config'
type Config = Complete Config'

defaultConfig :: DefaultConfig
defaultConfig =
  Config
    { password = dynamic
    , serviceUrl = "localhost"
    , servicePort = 8080
    }
  
readInPartialConfig :: IO PartialConfig
readInPartialConfig = do
  password <- getPassword
  url <- getUrl
  port <- getPort
  pure (Config password url port)

getConfig :: IO Config
getConfig = do
  partialConfig <- readInPartialConfig
  pure (genericApply defaultConfig partialConfig)

Conclusion

With the help of type functions and Higher-Kinded Data, we managed to separate the default values of our configuration from reading the dynamic values. We were able to make clear at the type level which values are known dynamically and which are known statically. Finally, we were able to extract functions so that only relevant parts need to be rewritten.