Algebraic data types

algebra, jargon, types

Front page
Listing

In the most basic sense, most kinds of types in various programming languages can be constructed by composing several simpler types together.

The name algebraic data types comes from the realization that some kinds of compositions of types can be used to create algebraic structures. The term itself often refers specifically to sum types (and sometimes also product types).

Sum types

A sum type, usually written $T+U$ for some types $T$ and $U$, is a type whose values are either of type $T$ or of type $U$. Sum types can also be called disjoint unions, tagged unions, variant types, or occasionally enum types.

In most languages, sum types are created with type definitions. As an example, the equivalent of the $+$ type constructor looks like the following in Haskell and Rust:

data Either a b = Left a
                | Right b
enum Either<T, U> {
    Left(T),
    Right(U),
}

The type Bool has two values while a type Unit has one value. The type Bool + Unit then has $2+1=3$ values; the number of values in a sum type is the sum of the number of values of its constituent parts.

In a sense, the Either type is the only sum type you need - all others can be represented as just nested Either, which is why more formal type theory models typically only include the $T+U$ way of forming sum types. In fact, it turns out that sums are associative up to isomorphism, which means that the types $(T + U) + V$ and $T + (U + V)$ can be freely converted between each other without any loss of information. The same goes for commutativity; although $T+U$ and $U+T$ are distinct types, a value of one can easily be turned into a value of the other without losing information.

Note that a sum type is different from a union type; although a union like Int or Int is the same as just Int, the sum type Int + Int is not. Concretely, the latter contains one more bit of information (whether or not the value is of the “left” or the ”right” Int.

Product types

A product type, usually written $T\cdot U$, $T*U$, or $T\times U$, is a type whose values contain both a value of type $T$ and a value of type $U$. Product types can also be called records, structs, or codata types.

Like sum types, product types can all be represented with the type constructor $\times$, since product types are also associative and commutative up to isomorphism.

data Both a b = Both a b
enum Both<T, U> {
    Both(T, U),
}

// or more idiomatically

struct Both<T, U>(T, U);

Product types are similar to cartesian products. The number of elements in a product type like Bool * Int is the number of elements in a Bool times the number of elements in an Int.

Identities

Certain types are identity types with respect to these type constructor, in the sense that combining any other type with these adds no extra information to the overall type.

For product types, the identity is the Unit type – also known as $1$ or Void – which has one element (often ()), and since one is identity element for multiplication, doing $T\times 1$ adds no extra information. In particular, given a value $t:T$, we only have one way of constructing a value $T\times 1$, namely $(t, ())$.

For sum types, the identity is the Never type – also known as $0$, Bottom or confusingly, Void – which has no elements. This means that it is impossible to construct a value which has that type, which further means that the only way to construct a value of type $T+0$ given some value $t:T$ is $\mathrm{Left}\;t$.

The bottom type is similar to $0$ in that a product type $T\times 0$ is itself not possible to construct. All of this may seem uncannily close to how natural numbers work: addition and multiplication are associative and commutative operations, adding zero or multiplying by one does nothing, and multiplying by zero gives zero. In fact, sums and products for both the natural numbers and for types with units and bottom types form a so-called semiring, which is a kind of algebraic structure – hence the algebraic.

Read more

Bartosz Milewski has a good introduction to ADTs as part of his “Categories for Programmers” series on category theory.