Over the last week I have embarked on a new mission for webassemblyjs: Write a type checker1. Webassemblyjs is an open source JavaScript toolchain for Webassembly to which I have been contributing quite a bit2 3. Toolchain here simply means that we develop a collection of packages that help JavaScript code deal with Webassembly code. For a browser to run Webassembly code it needs to parse and validate it first. Since the language is statically typed, this validation step includes type checking. The specification on validation can look quite intimidating at first, but it turns out that Webassembly is designed in a way to make validation as easy (and fast) as possible.

While the discussions and explorations of type systems can become really abstract and sometimes seem to drift off into some esoteric, mathematical (yet fascinating) meta-worlds, my goal today is to understand things on a very hands-on level. I will not bother to understand things in their most general form, but in the context of the webassembly virtual machine.

What we expect from a type checker

So I want to write a type checker. This is a piece of code which ensures that a given program obeys the type rules defined by the language4. It computes a simple function (pseudocode):

typeChecker(program) -> [typeErrors]

The result is simply a list of type errors. If the list is empty, the program is well-typed. A simple example of a type error:

(module
  (func (result i32)
    (f32.const 0)
  )
)

The function has a return type of i32 but the actual result is of type f32. My type checker should detect this and report a type error. This is because Webassembly is a stack machine. The last value pushed in a function body becomes its return value. For a nice introduction to stack machines I suggest Sven’s article “Imagine the computer is a stack”5. While not all type errors are this simple, detecting them fortunately does not become much harder than this. The key thing to realize is for a function to be well-typed, two things need to happen:

  • Every instruction in the function body has to match with the current stack.
  • At the end of the function the stack needs to match its result type.

Instruction types

In order to check these things we need a description for each instruction saying what stacks it can be applied to and what the resulting stack after the instruction looks like. Guess what, this is exactly what the type of an instruction is. In the specification they are denoted as function types6:

Let’s make this more concrete by looking at types of some simple instructions:

These are all directly derived from the fact that Webassembly is a stack machine: Instructions pop their operands off the stack and push their results onto the stack7.

Why do we need “type rules” instead of just types?

Now you might wonder, why we need typing rules. Can we not just give the types for all instructions as above? If you are not sure, pause here and try to come up with the type of get_global. Since it pushes a global value onto the stack, its type should be \( [] \rightarrow [t] \) where \( t \) is the type of the global. Therefore its type depends on some property of the module. There are many such instructions, think of function calls for example: The type of call should be the type of the function it calls.

So the question we need to answer is not “What is the type of this instruction?”, but “What is the type of this instruction in this module at this point?”. A typing rule provides the answer to this question by saying something of the following form:

"If you know x about the module, then instruction y has type z."

Since a specification needs to be precise and natural language is not, usually some form of mathematical notation is used for type rules. The Webassembly specification has both, so it is a perfect place to practice or even learn reading the formal notation. All you need to know is that premises (what you know) is above and conclusions (the type of an instruction) are below the horizontal line8. \( C \) denotes the module context and the exact syntax for it is not important to get the gist.

"If the global x is of type t, then instruction get_global x results in type t."

"If the function x is of type t, then instruction call x is of the same type."

"The instruction t.const results in type t."

"The instruction t.add takes two values of type t and results in type t."

"The instruction t.eq takes two values of type t and results in an i32 value."

So these fancy typing rules turn out to be pretty simple in most cases. All of above can be found in the validation section9 of the specification. It also contains some more puzzling aspects like control flow, unreachable code and traps which I might cover in a future post.