Hello Internet. In this blog post we are going to learn some type theory, a field that lies at the intersection of mathematics and computer science. We are eventually going to learn the variant called Martin-Löf Type Theory, or MLTT for short. To motivate it, we are going to try answering the question, “What is a valid program?”

Not every program makes sense. For example, what would a program consisting of the single expression x + 2 mean? If we don’t know what x is, then we don’t know what that expression means. More importantly, a program can combine expressions that are incompatible with each other. For example, if + is the usual operation that adds two numbers, then an expression like "abc" + 5 can’t be given a meaning. To avoid such meaningless programs, we should talk about the different kinds of data and the valid operations on those pieces of data. In type theory, we call the different kinds of data “types.” Type theory gives each piece of data a type, and the type tells you how the data is constructed and how to write a meaningful program that processes that data. We will be working at an informal level of detail; much more detail would be required if, for example, we wanted to implement a programming language based on type theory. Also, MLTT is quite complex at first glance, so we’ll build our way there in small steps.

System T

System T has two types. The first one is the natural numbers, whose type we’ll call \mathbb{N}. The second one is functions A \to B, which take a value of type A and return a value of type B. So an example function type is \mathbb{N} \to \mathbb{N}; any value of this type is a function that maps natural numbers to natural numbers. Before we jump in, it’ll be useful to introduce some notation. An obviously useful piece of notation is x : A, which means x has the type A. For example, we expect to prove that 5 : \mathbb{N}, meaning 5 is a natural number. Occasionally, I will say 5 is an element of the type of natural numbers, to mean 5 : \mathbb{N}.

Let’s start defining the function type.

What is a function A \to B? It is a machine that allows you to put in a value of type A and get out a value of type B. For example, the statement (+1) : \mathbb{N} \to \mathbb{N} means that assuming you have a natural number x : \mathbb{N}, you can get another one (+1)(x) : \mathbb{N}. To express this idea more formally, we need a way of asserting that “Given certain assumptions, some value has the type A.” We write this as \Gamma \vdash x : A, which means given the assumptions \Gamma, the value x has the type A. This assertion is pronounced “\Gamma entails that x has the type A.” The assumptions \Gamma will be of the form x_1 : A_1, x_2 : A_2, \cdots, which simply means that we are assuming that x_1 has the type A_1, x_2 has the type A_2 and so on. In type theory, we call the rule that tells you how to use a value of a given type the elimination rule. So for functions we get the following:

Elimination Rule for Functions If \Gamma \vdash f : A \to B and \Gamma \vdash a : A, then \Gamma \vdash f(a) : B. In other words, if certain assumptions allow you to make a function, then you can plug in a value of the appropriate type and get a particular type of output. Technically speaking, we need one such axiom for every expression f, A and B, and we need to write down a grammar that defines all the valid expressions. I’ll leave that to you; it’s not that difficult.

Okay, but how do you make a function? Well, let’s say you are making a function with the type \mathbb{N} \to \mathbb{N}, for example. You have to create a program with a variable x, which will be a placeholder for the input to the function. But in order for your function to really have the type \mathbb{N} \to \mathbb{N}, you can’t just use any old expression involving x; you have to make sure that whenever x is a natural number, that expression is also a natural number. So, in conclusion, if you have an expression \phi such that given the assumption x : \mathbb{N} we have \phi : \mathbb{N}, then you can construct a function \mathbb{N} \to \mathbb{N}. The notation for this function is \lambda \, x.\phi, which is shorthand for “the function which takes x as input and returns \phi as output.” If we generalise to arbitrary input and output types, we get this rule:

Introduction Rule for Functions If \Gamma, x : A \vdash \phi : B, then \Gamma \vdash \lambda \, x.\phi : A \to B.

Let me give an example of using the above two rules. Let’s assume that we have a function that adds one to a natural number, (+1) : \mathbb{N} \to \mathbb{N}. How do we define a function that adds two to a natural number? Well, (+1) : \mathbb{N} \to \mathbb{N}, x : \mathbb{N} \vdash (+1)(x) : \mathbb{N} by the elimination rule for functions. Applying the elimination rule again, we have (+1) : \mathbb{N} \to \mathbb{N}, x : \mathbb{N} \vdash (+1)((+1)(x)) : \mathbb{N}. The introduction rule then allows us to form the function \lambda \, x. (+1)((+1)(x)) : \mathbb{N} \to \mathbb{N}, which takes in x and adds one to it twice. I’ll introduce some shorthand for this definition. Let us write f(x) :\equiv y to mean that the function f is defined to be equal to \lambda \, x.y, so for example we would write

\mathrm{plus\_two}(x) :\equiv (+1)((+1)(x))
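As an aside, here is what these two rules look like in a language you can actually run. Below is a minimal Haskell sketch, with Haskell’s built-in Integer standing in for \mathbb{N}; the names plusOne and plusTwo are mine, not part of System T.

```haskell
-- A minimal sketch of the introduction and elimination rules.
-- Integer stands in for the type N of natural numbers.

plusOne :: Integer -> Integer      -- plays the role of (+1) : N -> N
plusOne x = x + 1

-- Introduction rule: \x. phi is written \x -> phi in Haskell.
-- Elimination rule: we apply plusOne to its argument, twice.
plusTwo :: Integer -> Integer
plusTwo = \x -> plusOne (plusOne x)
```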

The elimination rule tells us that we use a function by applying it to input. The natural question is, “What is the resulting value of the expression?” Since, we hope, every function is made using the introduction rule, we just have to specify what (\lambda \, x.\phi)(y) means when y : A. Intuitively, what should happen is that we substitute y for x in \phi; however, defining this precisely is quite difficult. Let’s say that, after a lot of work, you managed to define integration and a type of real numbers \mathbb{R}. Consider the function \lambda \, x.\int^2_1 x - y\, dy. Assuming that y : \mathbb{R}, how do we calculate \left(\lambda \, x. \int^2_1 \, x - y \, dy\right)(y)? The wrong thing to do would be to simply replace the symbol x with the symbol y and get \int^2_1 \, y - y \, dy, because the two uses of y are different: one is the variable we are integrating with respect to, and the other is the input to the function. Making sure that different variables which happen to share a name don’t get confused with each other, so that instead of the wrong answer we get something like \int ^2_1 y - t\, dt, is the primary difficulty with actually defining substitution. For now, we’ll just sweep this detail under the rug. So what we want to say is that the value of (\lambda \, x.\phi)(y) is defined to be \phi but with x substituted with y.

This motivates the idea of definitional equality. We say that two values of the same type, a, b : A for example, are definitionally equal if a is equal to b by definition. This means that whenever we have a, we can always swap it with b and get the same program. To give an example, given an appropriate definition of addition, 2 + 3 will be definitionally equal to 5, but if we assume that x, y : \mathbb{N}, then x + y won’t be definitionally equal to y + x. The definition of addition allows one to simplify 2 + 3 to 5, but x + y can’t be simplified further. Indeed, one actually has to prove that x + y = y + x; it isn’t a simple matter of using the definition of addition to rewrite x + y as y + x. The notation is \Gamma \vdash a \equiv_A b, which means a and b are definitionally equal to each other as elements of the type A. So, without further ado:

Computation rule for functions If \Gamma, x : A \vdash \phi : B, and \Gamma \vdash y : A then \Gamma \vdash (\lambda \, x.\phi)(y) \equiv_B \phi[x/y], where \phi[x/y] just means “\phi with x substituted with y.”

Let me give an example of the above rule. Given our good old (+1) function, we derived that \lambda \, x. (+1)((+1)(x)) : \mathbb{N} \to \mathbb{N}, and so by the computation rule for functions (\lambda \, x. (+1)((+1)(x)))(5) \equiv_{\mathbb{N}} (+1)((+1)(5)). From now on I will be lazy and write \equiv instead of, for example, \equiv_B.
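To make the capture problem concrete, here is a sketch of naive substitution on a toy expression type in Haskell. The datatype and names are hypothetical, purely for illustration; a real implementation would rename bound variables precisely to avoid the capture we discussed above.

```haskell
-- A toy expression type: variables, lambdas and applications.
data Expr = Var String
          | Lam String Expr        -- \x. body
          | App Expr Expr

-- Naive substitution: replace free occurrences of x by e.
subst :: String -> Expr -> Expr -> Expr
subst x e (Var y)
  | x == y    = e
  | otherwise = Var y
subst x e (App f a) = App (subst x e f) (subst x e a)
subst x e (Lam y body)
  | x == y    = Lam y body               -- x is shadowed; stop here
  | otherwise = Lam y (subst x e body)   -- unsafe if y occurs free in e!
```

The last line is exactly where the integration example goes wrong: if e mentions a variable named y, that variable gets captured by the binder. Capture-avoiding substitution renames the bound y to a fresh name first.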

The last thing we want to say is that every function can be written uniquely using \lambda. This results in the totally obvious:

Uniqueness principle for functions If \Gamma \vdash f : A \to B, then \Gamma \vdash f \equiv (\lambda \, x . f(x)).

You might have the question, “How do we talk about functions like +, which take two arguments?” The answer is quite elegant, actually: a function that takes two arguments, one of type A and the other of type B, and returns something of type C is the same thing as a function with the type A \to (B \to C). Such a function f : A \to (B \to C) takes the first argument as input and returns a function that takes the second argument and outputs the result. For example, we will soon be able to define + : \mathbb{N} \to (\mathbb{N} \to \mathbb{N}) such that +(3) : \mathbb{N} \to \mathbb{N} is the function that adds three to its input. In a similar way, a function with three inputs will have a type like A \to (B \to (C \to D)), which is quite cumbersome to write. So I’ll be lazy and write f : A \to B \to C \to D, and while I’m at it I’ll be lazier still and write f(a, b, c) instead of f(a)(b)(c). The idea of defining multi-argument functions like this is called currying.
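Haskell happens to curry functions natively, so this idea is easy to see in code. A small sketch, again with Integer standing in for \mathbb{N}:

```haskell
-- add : N -> (N -> N). The parentheses can be dropped because ->
-- associates to the right, matching our convention above.
add :: Integer -> Integer -> Integer
add x y = x + y

-- Partial application: add 3 is itself a function N -> N,
-- namely the function that adds three to its input.
addThree :: Integer -> Integer
addThree = add 3
```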

Let’s move on from functions and think about how to define the natural numbers. The pattern will be mostly the same: we will give introduction, elimination and computation rules.

Introduction Rule for Natural Numbers We always have \Gamma \vdash 0 : \mathbb{N}, i.e. zero is a natural number. Further, if \Gamma \vdash n : \mathbb{N}, then \Gamma \vdash \mathrm{succ}(n) : \mathbb{N}. The function \mathrm{succ} allows you to construct new natural numbers from old ones by adding one to them. So, given no assumptions, 0 : \mathbb{N}, and so \mathrm{succ}(0) : \mathbb{N} as well. The name humans give to \mathrm{succ}(0) is 1, and similarly \mathrm{succ}(\mathrm{succ}(0)) : \mathbb{N} gives us the number 2, and so on. We call 0 and \mathrm{succ} the constructors for \mathbb{N}. A constructor is a fundamental way of making an element of a type, and these constructors are part of the definition of the type.
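In Haskell this definition is a two-line datatype declaration, and the constructors of the datatype are exactly constructors in the sense above. A sketch:

```haskell
-- The type N, defined by its two constructors.
data Nat = Zero        -- 0 : N
         | Succ Nat    -- succ : N -> N

one, two :: Nat
one = Succ Zero         -- the name humans give to Succ Zero
two = Succ (Succ Zero)
```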

Elimination rule for Natural Numbers We want to make a function that takes a natural number as input. Let’s think about one of the simplest such functions, the factorial. The factorial is defined like this, assuming a proper definition of multiplication:

\begin{aligned} 0! &\equiv 1 \\ \mathrm{succ}(n)! &\equiv \mathrm{succ}(n) \cdot n! \end{aligned}

At first glance, it does seem kind of circular to define a function in terms of itself, like the second equation does. However, this definition makes sense and always produces a natural number n! : \mathbb{N} for any n : \mathbb{N}. Here is some intuition about how this works. Say we want to simplify 4!; we can use the definition like this:

\begin{aligned} 4! &\equiv 4 \cdot 3! \\    &\equiv 4 \cdot 3 \cdot 2! \\    &\equiv 4 \cdot 3 \cdot 2 \cdot 1! \\    &\equiv 4 \cdot 3 \cdot 2 \cdot 1 \cdot 0! \\    &\equiv 4 \cdot 3 \cdot 2 \cdot 1 \cdot 1 \end{aligned}

In the above calculation we use the definition 4 \equiv \mathrm{succ}(3) and apply the second equation above. We repeat this process until we get to 0, after which we may apply the first equation. So the reason such a “circular” definition works is that we reduce the number step by step from \mathrm{succ}(n) to n repeatedly until we get to 0, and we know that this will eventually reach 0 since 0 and \mathrm{succ} are the only ways of making a natural number given in the introduction rule. Computer scientists may \mathrm{rec}ognise this idea as recursion. So, we should define functions f : \mathbb{N} \to A by their values f(0) : A and f(\mathrm{succ}(n)) : A, and we should be able to define f(\mathrm{succ}(n)) in terms of n : \mathbb{N} and f(n) : A. This elimination rule is kind of intuitive given the introduction rules above. Now, some definitions may look valid but run forever, for example if we define:

\begin{aligned} f(0) &\equiv 0 \\ f(\mathrm{succ}(n)) &\equiv f(\mathrm{succ(n)}) \end{aligned}

Then we are simply defining the value of the function as itself, which doesn’t help one compute the value of the function. Let’s say I try to work out f(4): well, the second rule allows me to write this as f(4), and I can apply the second rule again and get f(4), and so on; we’re just going in circles. To prevent this, the elimination rule is carefully designed to make sure that when you define f(\mathrm{succ}(n)) you may only use the values of n and f(n), and in particular you can’t use, say, f(n+1). So here is the elimination rule:

Given the following data

  • e_0 : A
  • e_\mathrm{succ} : \mathbb{N} \to A \to A

We have:

  • \mathrm{rec}_\mathbb{N}(e_0, e_\mathrm{succ}) : \mathbb{N} \to A

Here I’m being lazy again and leaving out the assumptions \Gamma, since these rules apply under whatever assumptions you have. The first piece of data you need to make a function f : \mathbb{N} \to A is the value of f(0); this is given by e_0. The next piece of data you need is the value of f(\mathrm{succ}(n)) given n and the value of f(n); this is provided by the function e_\mathrm{succ}. The function e_\mathrm{succ} takes as input n and the value of f(n), and returns the value of f(\mathrm{succ}(n)). Phrasing the elimination rule this way makes sure that we can only define sensible functions by recursion. We call \mathrm{rec}_\mathbb{N} the recursor for the type \mathbb{N} of natural numbers.
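The recursor itself fits in a couple of lines of Haskell, using the Nat type from before; the name recNat is mine. Note how the second clause hands e_succ both n and the recursively computed value of f(n), and nothing else:

```haskell
-- rec_N, sketched: given e_0 : A and e_succ : N -> A -> A,
-- produce a function N -> A.
recNat :: a -> (Nat -> a -> a) -> Nat -> a
recNat e0 _     Zero     = e0
recNat e0 eSucc (Succ n) = eSucc n (recNat e0 eSucc n)
```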

Let us do an example. We will define + : \mathbb{N} \to \mathbb{N} \to \mathbb{N}. We will have to write +(x) :\equiv \cdots where \cdots is a function \mathbb{N} \to \mathbb{N} whose definition must depend on x. We will define this function by recursion, since we want +(x, 0) :\equiv x and +(x, \mathrm{succ}(y)) :\equiv \mathrm{succ}(+(x, y)), so here we go:

+(x, y) :\equiv \mathrm{rec}_\mathbb{N}(\mathbb{N}, x,                 \lambda \, y. \lambda \, \mathrm{x\_plus\_y}. \mathrm{succ}(\mathrm{x\_plus\_y}))(y)

Now, strictly speaking we have to write the output type of the recursor, but it turns out that there is an automatic algorithm for figuring out output types that even works in languages much more general than System T. So I’ll usually leave them out.
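In Haskell, where output types are inferred automatically in just this way, the same definition of addition reads:

```haskell
-- +(x) is defined by recursion on the second argument,
-- mirroring the rec_N definition above.
plus :: Nat -> Nat -> Nat
plus x = recNat x (\_y xPlusY -> Succ xPlusY)
```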

The only thing we need to check is that we really have +(x, 0) \equiv x and +(x, \mathrm{succ}(y)) \equiv \mathrm{succ}(+(x, y)), and for this we need a computation rule.

Computation rule for Natural Numbers Given e_0 : A, e_{\mathrm{succ}} : \mathbb{N} \to A \to A we have:

\begin{aligned} \mathrm{rec}_{\mathbb{N}}(A, e_0, e_{\mathrm{succ}})(0) &\equiv e_0 \\ \mathrm{rec}_{\mathbb{N}}(A, e_0, e_{\mathrm{succ}})(\mathrm{succ}(n)) &\equiv e_\mathrm{succ}(n, \mathrm{rec}_{\mathbb{N}}(A, e_0, e_{\mathrm{succ}})(n)) \end{aligned}

The equalities for + follow as a special case of this. Now it’s your turn:

Exercise Write the definitions for addition, multiplication and exponentiation using \mathrm{rec}_\mathbb{N}. After that, just so you can see how powerful \mathrm{rec}_\mathbb{N} can be when combined with the other constructs we’ve been talking about, define the Ackermann function, ack : \mathbb{N} \to \mathbb{N} \to \mathbb{N}, which satisfies the following equalities:

\begin{aligned} ack(0, n) &\equiv \mathrm{succ}(n) \\ ack(\mathrm{succ}(m), 0) &\equiv ack(m, 1) \\ ack(\mathrm{succ}(m), \mathrm{succ}(n)) &\equiv ack(m, ack(\mathrm{succ}(m), n)) \\ \end{aligned}

So that’s the definition of System T.

One of the things we want to add to System T is pairs, so that we can talk about multiple pieces of data at once. For this we define the product type A \times B of pairs (a, b), where a : A and b : B. The rules are intuitive enough that I would encourage you to figure them out yourself before reading on.

Introduction Rule Given a : A and b : B, we have (a, b) : A \times B

Elimination Rule Given f : A \to B \to X, we have \mathrm{rec}_{A \times B}(f) : (A \times B) \to X

Computation Rule Given f : A \to B \to X, a : A, and b : B we have:

\mathrm{rec}_{A \times B}(f)((a, b)) \equiv f(a, b)
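Again this is a few lines of Haskell, using its built-in pair type; recPair is my name for the recursor. The two projections make a nice usage example:

```haskell
-- rec_{AxB} : (A -> B -> X) -> (A x B) -> X
recPair :: (a -> b -> x) -> (a, b) -> x
recPair f (a, b) = f a b

-- The two projections, defined via the recursor.
first :: (a, b) -> a
first = recPair (\a _ -> a)

second :: (a, b) -> b
second = recPair (\_ b -> b)
```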

Finally, often in programming we have to talk about either having one kind of data or a different kind. For example, we might want to say “When I look up a value in a table, I either get a result or I get nothing.” For this we will introduce the disjoint union A + B which either contains a value of type A or a value of type B.

Introduction Rule Given a : A, we have \mathrm{inl}(a) : A + B, and similarly given b : B we have \mathrm{inr}(b) : A + B.

Elimination Rule Given e_\mathrm{inl} : A \to X and e_\mathrm{inr} : B \to X, we have \mathrm{rec}_{A + B}(e_\mathrm{inl}, e_\mathrm{inr}) : (A + B) \to X. In other words, if you can get an X from either an A or a B, then you can get an X from a value with the type A + B.

Computation Rule Given e_\mathrm{inl} : A \to X, e_\mathrm{inr} : B \to X, we have:

  • Assuming a : A, we have \mathrm{rec}_{A + B}(e_\mathrm{inl}, e_\mathrm{inr})(\mathrm{inl}(a)) \equiv e_\mathrm{inl}(a).
  • Assuming b : B, we have \mathrm{rec}_{A + B}(e_\mathrm{inl}, e_\mathrm{inr})(\mathrm{inr}(b)) \equiv e_\mathrm{inr}(b).
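In Haskell the disjoint union is the built-in Either type, with Left and Right playing the roles of \mathrm{inl} and \mathrm{inr}; the standard function either is exactly this recursor. A sketch, with a made-up table-lookup example in the spirit of the motivation above:

```haskell
-- rec_{A+B} : (A -> X) -> (B -> X) -> (A + B) -> X
recSum :: (a -> x) -> (b -> x) -> Either a b -> x
recSum eInl _    (Left  a) = eInl a
recSum _    eInr (Right b) = eInr b

-- "Either a result or nothing": default to 0 when the lookup
-- came back empty (the () side). Purely illustrative.
fromLookup :: Either Integer () -> Integer
fromLookup = recSum id (\_ -> 0)
```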

Now we’ll introduce some more abbreviations. It is quite cumbersome to write out recursors like \mathrm{rec}_{A + B} all the time, so I’ll use what’s called pattern matching notation. Here is an example: we can define + using pattern matching like this:

\begin{aligned} x + 0 &:\equiv x \\ x + \mathrm{succ}(n) &:\equiv \mathrm{succ}(x + n) \end{aligned}

So, first of all, I wrote x + 0, for example, instead of +(x, 0); also, I just wrote the definitional equalities I want satisfied instead of defining the function with the recursor. Of course, if one is too sloppy with pattern matching notation one can get functions that aren’t definable with the recursor, but once you’ve defined a few functions with the recursor you get a lot of intuition about how to translate between pattern matching and recursors. In fact, many programming languages based on type theory can automatically translate between recursors and pattern matching.
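Haskell is one such language: the addition we built from recNat earlier can equally be written by pattern matching, and the two definitions agree. A sketch:

```haskell
-- The same addition, written by pattern matching instead of recNat.
plus' :: Nat -> Nat -> Nat
plus' x Zero     = x
plus' x (Succ n) = Succ (plus' x n)
```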

Logic

Now you may be wondering, “How does this have anything to do with mathematics?” Well, with very little more machinery we can already do logic.

The main idea is that you can define any statement by describing what it would take to prove that statement. For example, you prove the statement “A and B are true” by giving two proofs: you first prove A and then you prove B. This constitutes a definition of the statement “A and B are true.” So we will model propositions as types, where a proposition is viewed as the same thing as the type of its proofs.

So let’s say you have a proposition A, and another one B. We will call the type whose elements are proofs of A by the same name, and we will do the same thing with B; since a statement’s meaning is given by the data one needs to prove it, there is no need to distinguish between the two.

So what is a proof of “A and B”? Well, it is just a pair of proofs, one for A and another for B, which is just an element of the product type A \times B. Similarly, what is a proof of “A or B”? It’s either a proof of A or a proof of B, in other words an element of the disjoint union A + B. In the same vein, the statement “A implies B” is proven by giving a process that transforms proofs of A into proofs of B, in other words a function A \to B. The only thing we are missing to do propositional logic is the propositions true (\top) and false (\bot). Let’s define them using propositions as types, starting with true.
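Before moving on, let me record the dictionary so far as Haskell type synonyms; the names And, Or and Implies are mine:

```haskell
type And a b     = (a, b)       -- a proof of "A and B" is a pair
type Or a b      = Either a b   -- a proof of "A or B" is inl or inr
type Implies a b = a -> b       -- a proof of "A implies B" is a function
```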

How do you prove the proposition “true”? Well, the proposition “true” is trivially true, and in particular you don’t need to do anything to prove it. So the corresponding type, which we call \mathbf{1}, has a constructor that is trivial in the sense that it doesn’t require any arguments. Let’s define \mathbf{1}, but again I would advise you to come up with its introduction, elimination and computation rules yourself.

Introduction Rule We always have * : \mathbf{1}.

Elimination Rule Given e_* : X, we have a function \mathrm{rec}_\mathbf{1}(e_*) : \mathbf{1} \to X.

Computation Rule Assuming e_* : X, we have \mathrm{rec}_\mathbf{1}(e_*)(*) \equiv e_*.

Okay, how about “false”? How do you prove “false”? The short answer is that you can’t; “false” is not trivially provable. If our logic is inconsistent, then maybe you can make a proof of false, but it won’t be a simple matter of invoking one of the constructors. However, one can prove in the standard foundation of mathematics that this logic is consistent, so it’s just as reliable as any result in mathematics. Now on to the (very simple) definition of the type \mathbf{0}, which corresponds to the proposition “false.”

There is no introduction rule, since there shouldn’t be a way of proving false.

Elimination Rule Many people find this elimination rule tricky, so let me phrase it this way. The elimination rules for A + B and \mathbb{N} have two cases, since these types have two constructors. The elimination rule for, say, \mathbf{1} has one case, since it has one constructor. Since \mathbf{0} has no constructors, its elimination rule should have no cases; we can always just claim \mathrm{rec}_\mathbf{0} : \mathbf{0} \to X since there are no constructors to deal with. Logicians call this principle ex falso quodlibet, which means “from the false, anything follows”: if you assume that false things are true (or equivalently that \mathbf{0} has an element), then you can prove whatever you want from that.

The last logical notion we need to define is the proposition “not A”. We will simply define this as the type A \to \mathbf{0}, since the statement “A is not true” means the same thing as “A implies a falsehood.” Thinking in terms of proofs, you prove the statement “not A” by assuming A to be true and deriving something that can’t be true. Such a proof is just a function A \to \mathbf{0}, which takes a proof of A as input and returns a proof of \mathbf{0}, a false statement, as output.
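In Haskell, \mathbf{1} and \mathbf{0} can be sketched as follows; the names are mine, though the standard library provides the same things as () and Data.Void. The EmptyCase extension lets us write the zero-case recursor:

```haskell
{-# LANGUAGE EmptyCase #-}

-- The type 1, with its single constructor.
data One = Star

-- The type 0, with no constructors at all.
data Zero

recOne :: x -> One -> x
recOne eStar Star = eStar

-- Ex falso quodlibet: a case expression with no cases to handle.
recZero :: Zero -> x
recZero z = case z of {}

-- "not A" is a function from A into the empty type.
type Not a = a -> Zero
```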

Let’s prove some logical theorems by writing them out as computer programs in System T. Let’s first try to prove ((A \to \mathbf{0}) + (B \to \mathbf{0})) \to (A \times B) \to \mathbf{0}, or in English, “(not A) or (not B) implies not (A and B).” I’ll write the proof using pattern matching notation; it’s quite a good exercise to translate this into recursors. So, there are two possible cases for the first input:

\begin{aligned} p &: ((A \to \mathbf{0}) + (B \to \mathbf{0})) \to (A \times B) \to \mathbf{0} \\ p(\mathrm{inl}(notA), \cdots) &:\equiv \cdots \\ p(\mathrm{inr}(notB), \cdots) &:\equiv \cdots \end{aligned}

After pattern matching on the pair, assuming it is of the form (a, b) we get this:

\begin{aligned} p &: ((A \to \mathbf{0}) + (B \to \mathbf{0})) \to (A \times B) \to \mathbf{0} \\ p(\mathrm{inl}(notA), (a, b)) &:\equiv \cdots \\ p(\mathrm{inr}(notB), (a, b)) &:\equiv \cdots \end{aligned}

Now each of the \cdots has to have the type \mathbf{0}. Seeing as we can’t explicitly construct an element of \mathbf{0}, we have to use some of our input data somehow. However, in the first case we have a : A and notA : A \to \mathbf{0}, so notA(a) : \mathbf{0}, and a similar move works in the second case. Here is the finished proof:

\begin{aligned} p &: ((A \to \mathbf{0}) + (B \to \mathbf{0})) \to (A \times B) \to \mathbf{0} \\ p(\mathrm{inl}(notA), (a, b)) &:\equiv notA(a) \\ p(\mathrm{inr}(notB), (a, b)) &:\equiv notB(b) \end{aligned}

Indeed, the proof reads almost naturally once you get used to type theory. The two lines of the type-theoretic proof are exactly the two cases of an informal proof, so you could read the proof like this: “Suppose (not A) or (not B); then there are two cases. Firstly, we may have (not A), in which case if we also have A and B, then we have A, and so (not A) and A are both true, which is a contradiction. Secondly, we may have (not B), and assuming (A and B) we have B, and in particular (not B) and B are both true, which is a contradiction.” This is one of the things that makes type theory an elegant foundation of mathematics: whereas traditionally you would have data encoded as sets and you would reason about that data separately using logic, type-theoretically speaking both of these mathematical activities are just programming, and the idea of a type unifies them both.
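For what it’s worth, this proof type-checks almost verbatim in Haskell, reusing Not and Zero from the sketches above:

```haskell
-- "(not A) or (not B)" implies "not (A and B)", as a program.
p :: Either (Not a) (Not b) -> (a, b) -> Zero
p (Left  notA) (a, _) = notA a
p (Right notB) (_, b) = notB b
```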

Anyways, that’s quite enough for today. Next time, we’ll generalise type theory by making every type a value, i.e. an element of another type. This will allow us to use propositions as types to reason about our programs, so that we can prove, for example, that a certain algorithm sorts a list, or more simply that addition is commutative. We will introduce new types which exploit this, leading to a logic with quantifiers and equality. But for now, here are some exercises.

Exercises

Prove the following statements using propositions as types:

  • If A, then (if B then A).
  • If A, then (not (not A)).
  • If A or B, and A implies C, and B implies C, then C.

Let’s say that instead of the recursor \mathrm{rec}_\mathbb{N}, I gave you the following function, which is basically the recursor except that it doesn’t keep track of the current number while recursing:

\begin{aligned} iter &: A \to (A \to A) \to \mathbb{N} \to A\\ iter(c_0, c_\mathrm{succ}, 0) &:\equiv c_0 \\ iter(c_0, c_\mathrm{succ}, \mathrm{succ}(n)) &:\equiv c_\mathrm{succ}(iter(c_0, c_\mathrm{succ}, n)) \end{aligned}

Can you define \mathrm{rec}_\mathbb{N} such that it satisfies the definitional equalities we listed before?
