Problems with products, part one

You can find this article as a literate Haskell file here.

Where we see a type parameter to the right of a function arrow (or with no arrow at all, since a is isomorphic to () -> a), we have a covariant functor (or simply, "functor").

newtype Decode k v a = Decode{ decode :: Map k v -> a }

Covariant functors quite often admit an applicative aspect that allows us to build up larger stuff within the type parameter.

(★) :: Decode k v a -> Decode k v b -> Decode k v (a, b)

at :: Ord k => k -> Decode k Char Char
at k = Decode (Map.findWithDefault '\0' k)

abc :: Map Int Char
abc = [(1, 'a'), (2, 'b'), (3, 'c')]

(abc & decode (at 1 ★ at 2))  ==  ('a', 'b')

To define (★) is left as an exercise for the reader; it is a good exercise for anyone who does not already immediately know how. The particularly Haskell-savvy, however, will likely choose to use a derived Applicative instance.

deriving newtype instance Functor (Decode k v)
deriving newtype instance Applicative (Decode k v)

(★) = liftA2 (,)

The two-tuple only takes us so far, and sooner or later (but probably sooner, if we're not just playing around) we'll want to start working with larger products. More than two things. (★) can be nested. (I knew this ahead of time, which is why I named it as an infix operator.)

(abc & decode (at 1 ★ at 2))           ==  ('a', 'b')
(abc & decode ((at 1 ★ at 2) ★ at 3))  ==  (('a', 'b'), 'c')
(abc & decode (at 1 ★ (at 2 ★ at 3)))  ==  ('a', ('b', 'c'))

This, of course, sucks, because nobody wants nested tuples like that. You want a three-tuple, right? Thus enters the real clever trick of the Applicative class.

(abc & decode (pure (,,) <*> at 1 <*> at 2 <*> at 3))
    == ('a', 'b', 'c')

Better than tuples, even, we can use types that are more meaningful to us.

data XYZ a = XYZ{ x :: a, y :: a, z :: a } deriving Eq

decodeXYZ :: Decode Int Char (XYZ Char)
decodeXYZ = pure XYZ <*> at 1 <*> at 2 <*> at 3

And we can use do and record sugar to forget about field order.

decodeXYZ' = do x <- at 1; y <- at 2; z <- at 3; pure XYZ{ x, y, z }

decodeXYZ'' = do x <- at 1; z <- at 3; y <- at 2; pure XYZ{ x, y, z }

And it is, I think, fair to call it a trick. Because, for one thing, history implies that this approach wasn't obvious; I've seen nothing predating Applicative Programming with Effects which seems to have started circulating in 2005, the class appears in GHC's base library starting in 2006, and ApplicativeDo doesn't hit the compiler until 2016.

And for another thing, it's sort of weird. Consider the types of the subterms:

pure (,,)                   :: Decode Int Char (Char -> Char -> Char -> (Char, Char, Char))
pure (,,) <*> at 0          :: Decode Int Char (        Char -> Char -> (Char, Char, Char))
pure (,,) <*> at 0 <*> at 1 :: Decode Int Char (                Char -> (Char, Char, Char))

The idea that we have a Decode that represents a parser which produces a result of type Char -> (Char, Char, Char) is, frankly, not even what I'm thinking about when I write an applicative expression. It is only something I think about when I'm explaining why it works. To me this is enough to place any technique squarely in the category of clever trick.

The final reason I call applicative programming a trick is that when we step outside of covariant functors, we find that Applicative is no longer applicable. (Indeed the superclass constraint said this already.)

Just like we encode a product of values such as XYZ from an input map, so too may we be interested in the reverse: turning a record into a Map by turning each field into a Map and merging all the results together.

(This is not so contrived a topic. The problem posed here is one encountered whenever we encode and write process environment variables, HTTP headers, JSON objects, and so on.)

newtype Encode k v a = Encode{ encode :: a -> Map k v }

to :: k -> Encode k v v
to k = Encode \v -> Map.singleton k v

The to function can construct an Encode definition suitable for each of the three fields x, y, z of the XYZ record.

('a' & encode (to 1))  ==  [(1, 'a')]
('b' & encode (to 2))  ==  [(2, 'b')]
('c' & encode (to 3))  ==  [(3, 'c')]

We can define a 2-tuple combination, much like we did for Decode.

(★★) :: Ord k => Encode k v a -> Encode k v b -> Encode k v (a, b)
encodeA ★★ encodeB = Encode \(a, b) ->
    (a & encode encodeA) <>
    (b & encode encodeB)

(('a', 'b') & encode (to 1 ★★ to 2))  ==  [(1, 'a'), (2, 'b')]

And we can define an Encode for XYZ.

encodeXYZ :: Encode Int a (XYZ a)
encodeXYZ = Encode \xyz ->
    (x xyz & encode (to 1)) <>
    (y xyz & encode (to 2)) <>
    (z xyz & encode (to 3))

(XYZ{ x = 'a', y = 'b', z = 'c' } & encode encodeXYZ)
    == [(1, 'a'), (2, 'b'), (3, 'c')]

What we don't have is a combinator that works directly on Encode like <*> does for Decode. This may seem like an unimportant quibble, since encodeXYZ was not particularly difficult to define, but it ends up compounding into a bigger practical problems.

There is also one other concern of more obvious practical relevance:

encodeXYZ' = Encode \xyz ->
    (x xyz & encode (to 1)) <>
    (z xyz & encode (to 3))

If we accidentally neglect to encode one of the fields in the applicative-do Decode expression, the compiler emits a warning. If we omit an Encode step, as in encodeXYZ' above, the mistake goes unremarked upon.

When a type variable appears in both a covariant and a contravariant way, we have an invariant functor. The contravariant aspect prohibits defining fmap, and the covariant aspect prohibits defining contramap.

This happens, for example, when we bundle Decode and Encode together into one type, which seems like a reasonable thing to want to do:

data Codec k v a = Codec{ co :: Encode k v a, dec :: Decode k v a }

key :: Ord k => k -> Codec k Char Char
key k = Codec{ co = to k, dec = at k }

We can concoct combinations for invariant functors of products.

(★★★) :: Ord k => Codec k v a -> Codec k v b -> Codec k v (a, b)
formatA ★★★ formatB = Codec
    { dec = dec formatA ★ dec formatB
    , co = co formatA ★★ co formatB
    }

Exercise: Define Format Int Char XYZ in terms of key 1, key 2, and key 3.

codecXYZ = Codec{ co = undefined, dec = undefined }

It isn't terribly difficult, but it isn't terribly easy either. What you have to do is write the Decode and the Encode separately. They don't compose in parallel. It's up to you to make sure they're defined harmoniously. It's all terribly inconvenient.

Part two, if I am able to finish it, will be about solutions.