Algebraic data types
In the most basic sense, most kinds of types in various programming languages can be constructed by composing several simpler types together.
The name algebraic data types comes from the realization that some kinds of compositions of types can be used to create algebraic structures. The term itself often refers specifically to sum types (and sometimes also product types).
Sum types
A sum type, usually written $T+U$ for some types $T$ and $U$, is a type whose values are either of type $T$ or of type $U$. Sum types can also be called disjoint unions, tagged unions, variant types, or occasionally enum types.
In most languages, sum types are created with type definitions. As an example, the equivalent of the $+$ type constructor looks like the following in Haskell and Rust:
data Either a b =Left a |Right b
enum Either <T ,U > {Left (T ),Right (U ), }
The type Bool
has two values while a type
Unit
has one value. The type Bool + Unit
then has $2+1=3$ values; the number of values in a sum type is the
sum of the number of values of its constituent parts.
In a sense, the Either
type is the only sum type you
need - all others can be represented as just nested
Either
, which is why more formal type theory models
typically only include the $T+U$ way of forming sum types. In fact,
it turns out that sums are associative up to isomorphism,
which means that the types $(T + U) + V$ and $T + (U + V)$ can be
freely converted between each other without any loss of information.
The same goes for commutativity; although $T+U$ and $U+T$ are
distinct types, a value of one can easily be turned into a value of
the other without losing information.
Note that a sum type is different from a union type; although a
union like Int or Int
is the same as just
Int
, the sum type Int + Int
is not.
Concretely, the latter contains one more bit of information (whether
or not the value is of the “left” or the
”right” Int
.
Product types
A product type, usually written $T\cdot U$, $T*U$, or $T\times U$, is a type whose values contain both a value of type $T$ and a value of type $U$. Product types can also be called records, structs, or codata types.
Like sum types, product types can all be represented with the type constructor $\times$, since product types are also associative and commutative up to isomorphism.
data Both a b =Both a b
enum Both <T ,U > {Both (T ,U ), }// or more idiomatically struct Both <T ,U >(T ,U );
Product types are similar to cartesian products. The number of
elements in a product type like Bool * Int
is the
number of elements in a Bool
times the number of
elements in an Int
.
Identities
Certain types are identity types with respect to these type constructor, in the sense that combining any other type with these adds no extra information to the overall type.
For product types, the identity is the Unit
type
– also known as $1$ or Void
– which has one
element (often ()
), and since one is identity element
for multiplication, doing $T\times 1$ adds no extra information. In
particular, given a value $t:T$, we only have one way of
constructing a value $T\times 1$, namely $(t, ())$.
For sum types, the identity is the Never
type –
also known as $0$, Bottom
or confusingly,
Void
– which has no elements. This means that it
is impossible to construct a value which has that type, which
further means that the only way to construct a value of type $T+0$
given some value $t:T$ is $\mathrm{Left}\;t$.
The bottom type is similar to $0$ in that a product type $T\times 0$ is itself not possible to construct. All of this may seem uncannily close to how natural numbers work: addition and multiplication are associative and commutative operations, adding zero or multiplying by one does nothing, and multiplying by zero gives zero. In fact, sums and products for both the natural numbers and for types with units and bottom types form a so-called semiring, which is a kind of algebraic structure – hence the algebraic.
Read more
Bartosz Milewski has a good introduction to ADTs as part of his “Categories for Programmers” series on category theory.