Algebraic data types and pattern matching
So far we've only seen typed that can encode single values, such as integers, strings, or functions. Practical programming, however, is nearly impossible without composite types. Tuples, lists, trees, and other datastructures are staples of software development.
The most common building blocks of composite types in OCaml are so called algebraic data types (ADTs). While OCaml has records, objects, and arrays too, the most common built-in types are ADTs.
Algebaric types are closely connected with pattern matching and explaining them has something of a chicken and egg problem: it is impossible to explain pattern matching without explaining algebraic types, but algebraic types make little sense until you know how to use them in pattern matching. We will try to break the circle by learning how to define them by example, and then learning how to actually use them.
→ Product types
Product types are also known as tuples. They get their name from their relation to the cartesian product of sets. A cartesian product of two sets A and B is a set of pairs of all items of A and B. For example, a plane is a product of a line with itself.
This is how we could define a type of points on a plane:
type point = float * float
Note that while you can define a named product type, they are usually left unnamed, and if you call a
float * float
point, the OCaml compiler will not start referring to any value of type
float * float as
* operator is a type constructor that creates a new product type from two other types, in this case
a tuple of two floats. You can create tuples of more than two items in the same fashion:
type point3d = float * float * float
This is just the type though, and naming product types is not a standard practice. Values of product types use comma as an item separator, like in many other languages. Note that parentheses around tuples are optional and required only when leaving them out will cause ambiguity.
let zero = 0.0, 0.0
→ Sum types
Sum types generalize what is often known as enum, union, or variant record. They are also rather harder to explain than product types because they have no direct equivalent in most languages.
Mathemarically, a sum type is a disjoint union: a union of sets where every element is attached to a tag indicating which set it came from.
The simplest kind of a sum type is used to model finite sets. This is equivalent to enums in C-like languages.
type chess_piece = Pawn | Knight | Bishop | Rook | Queen | King
This is the simplest use case however, and their greatest expressive power comes from the ability to attach values to sum type members. For example, suppose we are writing a geometry program and we need a type for shapes. A circle can be defined by its radius, a square by its side length, and a triangle can be defined by lengths of its sides. So our type may look like this:
type shape = Circle of float | Square of float | Triangle of (float * float * float)
Sum types can also be polymorphic. For example, there is a type in OCaml standard library that is meant for functions that can produce some value or no value at all, it is called option. For example, searching for something in a database can produce a list of results, or not find anything, and it is nice to be able to encode the latter case explicitly. The option type can be defined as follows:
type 'a option = Some of 'a | None
There is also a type meant for functions that can explicitly signal error conditions:
type ('a, 'b) result = Ok of 'a | Error of 'b
As you can see, polymorphic types need to have one or more type variables on their left hand side.
→ Terminology and syntax
You should remember that user-defined data constructors must always start with a capital letter.
The anatomy of a sum type definition is shown in the following picture:
The name of the type is referred to as type constructor, because it can used to create new monomorphic types
with different type variables, such as
int option or
string option, or
(int * string) result and
(float * unit) result.
Names of sum type members are called data constructors, since they can construct new values from existing ones,
Some 3 or
Error "Not found".
→ The truth about unit and boolean values
While there is special syntax for the unit value
() and boolean constants
internally, they are just sum types.
If special syntax didn't exist for them, they could be defined as:
type unit = Unit type bool = True | False
→ Pattern matching
We have already seen some basic use of patterns, and learnt that the left hand side of
including function definitions, is a pattern.
Here is what we know is a valid pattern:
- A variable name
- A constant
The wildcard (
Now we can add that a tuple, any defined data constructor, and any combination thereof is also a valid pattern.
→ Patterns in let-expressions
Let's see how we can use tuple and data constructor patterns in
Some languages have special constructs for multiple assignment.
In OCaml, you can do the same by using a tuple pattern, so no special construct is needed.
let a, b = 1, 2 in Printf.printf "%d %d\n" a b
If you have a function that returns a tuple, and you only want one item of that tuple, you can combine the tuple pattern with the wildcard pattern to discard the unwanted part:
let f x y = x, x + y let _, x = f 3 2 let y, _ = f 3 2 let () = Printf.printf "%d %d\n" x y
The relationship of
let-bindings with sum types is more complicated. While they can technically be used on the left hand
side of a
let-expression, if your type has more than one data constructor, it will result in unhandled cases,
which will result in compile time warnings and runtime errors.
Consider this program:
let x = None let (Some y) = x let () = Printf.printf "%d\n" y
If you compile and run it, or paste into the REPL, you will get this warning at the compilation stage:
Warning 8: this pattern-matching is not exhaustive. Here is an example of a case that is not matched: None
And at the execution stage it will fail with
Match_failure exception. While you could catch that exception
(we will learn about exceptions later), it would be a very unidiomatic thing to do. Types with more than one
data constructor should be handled by proper cases analysis with
match expressions that are covered by the
The unit type is an exception to this rule, since it has only one possible value and thus immune to match
failures. That is why it can safely be used in
let-expressions when you want to enforce that the expression
on the right hand side evaluates to unit.
→ Match expressions
match expressions play a role similar to
switch in many other languages, but
due to pattern matching support, have more expressive power.
Their simplest use can be demonstrated on primitive types, such as
int. Constants of any type are valid patterns,
and we can match on them.
Let's compare functions that
determine is given number is zero, written with a conditional expression and pattern matching:
(* With a conditional *) let is_zero n = if n = 0 then true else false (* With a match expression *) let is_zero' n = match n with 0 -> true | _ -> false
match expressions, some people like to add a
| character before the first case as well, for
let is_zero n = match n with | 0 -> true | _ -> false
A function determines if given character is a whitespace character or not can be a more interesting example:
let is_whitespace c = match c with ' ' -> true | '\t' -> true | '\n' -> true | '\r' -> true | _ -> false
You can see that it's rather repetitive. To avoid repetition, OCaml allows conflating cases:
let is_whitespace c = match c with ' ' | '\t' | '\n' | '\r' -> true | _ -> false
So far this is equivalent to
switch statements in C-like languages, so let's move on to
more interesting patterns that allow us to destructure complex values.
To demonstrate using tuple patterns inside
match expressions, we will reimplement the logical AND function.
Logical AND is only true when its both arguments are true, otherwise it's false. With a
and a tuple pattern we can express it consicely:
let (&&) x y = match (x, y) with | true, true -> true | _, _ -> false
We could use nested matches, but that would be unwieldy. Instead, in the
we join both arguments into a tuple so that we can match on them both at the same time in our
cases, and this way we need only two cases.
Now let's see how we can combine data constructors of sum types and tuples in pattern matching. Remember the type for geometric shapes that we introduced earlier. This is how we can write a function for calculating the area of different shapes:
type shape = Circle of float | Square of float | Triangle of (float * float * float) let area s = match s with | Circle r -> Float.pi *. (r ** 2.0) | Square s -> s ** 2.0 | Triangle (s1, s2, s3) -> let s = (s1 +. s2 +. s3) /. 2.0 in sqrt @@ s *. (s -. s1) *. (s -. s2) *. (s -. s3) let () = Printf.printf "%f\n" @@ area (Triangle (3.0, 4.0, 5.0))
→ Nested match expressions
In the logical AND function, we managed to get away with a single
but there are cases when nesting them is unavoidable. The issue you should be aware of is that,
since indentation in OCaml is not significant, nested matches need explicit delimeters.
Like any other expressions, you can wrap them in parentheses, but there is also syntactic sugar
in form of
end keywords. They are syntactically equivalent to parentheses, to the point
that the unit value can be written as
begin end, as in
print_newline begin end, though using them
in this role is obviously bad for readability. But for grouping expressions, they can provide
a readability improvement.
Let's rewrite our logical AND function in an overly verbose way for demonstration.
let (&&) x y = match x with | true -> begin match y with | true -> true | false -> false end | false -> false
If you forget to group
match expressions properly, they will be treated as one long
may cause type errors, or, worse, break your program logic.
→ A note on exhaustiveness check
As we have already seen, the OCaml compiler performs match exhaustiveness checks and warns you if it finds
a case that is not handled. The checking mechanism is consistent (free of false negatives), that is, it will never fail to spot a
genuine unhandled case. However, it is not complete (not free of false positives), and sometimes will
issue warnings when matching is actually exhaustive. This happens especially often if you use nested
The compiler warnings must be taken seriously. Only if you can prove that your matching is indeed exhaustive, then you can safely ignore them.
Write a function
twice that takes a function
f and a value
x and applies
Try to predict its type before you check it with the REPL or another tool.
Simple: write a function with type
'a -> unit.
Somewhat harder: write a function with type
('a -> 'b) -> ('b -> 'a) -> 'a -> 'b.
Define a sum type that models a card deck.
match expression, write a function
is_vowel : char -> char that checks if given
character is a vowel.
Write a function with deliberately non-exsaustive pattern matching and cause it to fail with
Write a logical XOR function using a
match expression and no more than three cases.
shape type and the
area function with one or more new shapes of your choice.