While working on webassemblyjs we recently discussed our approach to new WebAssembly features and versions1. All major browsers now ship the MVP of the language, but the WebAssembly specification keeps getting developed and enhanced. The process is similar to the development of JavaScript through EcmaScript. Proposals for new features can be followed on GitHub and we realized that we can provide Babel-like2 tooling for wasm modules. Such tooling would enable the translation of new features into old ones with better browser support. Just like Babel enabled you to use let and const way before browser support was good enough, we could enable early experimenting with new WebAssembly features. Apart from its usefulness it is definitely a fun challenge!

Inspired by Colin Eberhardt’s work on transpiling multi return values3 I started looking into the sign extension operator proposal4. This proposal adds five new integer instructions that perform sign extension from short to longer integers:

  • i32.extend8_s
  • i32.extend16_s
  • i64.extend8_s
  • i64.extend16_s
  • i64.extend32_s

In the next paragraph I try to explain what these instructions do (they are pretty simple). If you are not interested in what they exactly do on the bit level you can skip the next paragraph and continue with the implementation of the transformation.

What sign extension does

The instruction T.extendN_s takes a value of type T, reads its lowest N bits and produces a new value of type T. The new value is constructed by sign extension: The lowest N - 1 bits are untouched and the N-th bit is replicated until the highest bit. This is hard to describe in words, but easy in examples:

i32.extend8_s: 0x1234_8600 -> 0x0000_0000
i32.extend16_s: 0x1234_8600 -> 0xFFFF_8600

In the first case we look at the lowest 8 bits, that is 0x00 and replicate its “sign bit” in the remaining bits. The sign bit is just the highest bit which is 0 here so the result is all zeroes. The second example looks at the lowest 16 bits which are 0x8000. Now the sign bit is 1 and therefore all other bytes become F which is just four ones.

Supporting new instructions in parsers

The first step is to support the five new instructions in our parsers. For implementation of the text format in wast-parser does not check instructions names at all, so no changes are required there. The current version is already able to produce abstract syntax trees that containing these new instructions. Binary format is almost as easy but not completely free. Instructions are identified by a certain byte for which we have a mapping5 that we use in our wasm-parser. So it only takes one new entry per new instruction in order to support the proposal in the binary format. Here is main part of the diff for this part:

WebAssembly transpiling sign extension operators diff 1

Implementing the transformation

WebAssembly is a stack-machine so instructions pop their argument of the stack and push the result back onto it. My idea was to express the new instruction’s logic with existing instructions and then wrap these in a functions. Then I could transform code with sign extension instructions by inserting my polyfill functions and then replacing each occurence of ̀T.extendN_s with a call to the corresponding function.

Writing the polyfills

The polyfill functions I use currently are a direct translation of my explanation of sign extension. First we check whether the relevant sign bit is set or not and set all higher bits accordingly. In C this can be written in a very direct way using bitwise operators. This can then be compiled to WebAssembly using Emscripten for example. Originally this is what I did. Then we moved the polyfills to TypeScript and compiled them to WebAssembly using AssemblyScript 6. However, after a discussion on GitHub I agreed to remove both and just keep a WebAssembly text format version of each polyfill as source. Here is the one for i32.extend8_s:

WebAssembly transpiling sign extension operators diff 2

This code is a WebAssembly version of the following C function:

#include "stdint.h"

int32_t i32_extend8_s(int32_t x) {
  return (x & 0x00000080) ? (x | 0xffffff80) : (x & 0x0000007f);
}

All other polyfills work very similar. The only things that change are the constants, argument and return types.

Transforming a given AST

Now that we have the polyfills, there are two things that need to be done:

  • Insert the necessary polyfill functions into the module
  • Replace all T.extendN_s instruction occurences with call instructions to the polyfill functions

I started writing code but then realized that there is a bit of a circular dependency:

  • The necessary polyfills are determined by the occuring T.extendN_s instructions
  • The index of the newly inserted call instructions depends on the number of necessary polyfills

This only appears to be a circular reasoning on the surface though. I use the following algorithm:

  1. Count the number of functions n in the original module. Ideally this would be given to us from parsing, but currently we have to do this with a separate traversal. This number n is exactly the index of the first new function we insert at the end of the module.

  2. Do a second traversal and visit all T.extendN_s instructions. Foreach T.extendN_s instruction, remember the index of their polyfill in the module which is initially -1 (not present). Whenever such an instruction is encountered for the first time, assign it the index n and then increment n by one.

  3. Now the all T.extendN_s instructions have been replaced with call m instruction where m is the corresponding polyfill’s index. So we should insert the polyfill function at this index now.

The result is that given the initial number of functions we need just one AST traversal to insert only the necessary polyfills.

In the tests for the transformation I use wast-parser and wast-printer to lift the AST transform to a text to text transformation. Here is an example test case:

Input7

(module
  (func (param i32) (result i32)
    (get_local 0)
    (i32_extend16_s)
    (i32_extend8_s)
  )
  (func (param i32) (result i32)
    (get_local 0)
  )
  (func (param i32) (result i32)
    (get_local 0)
    (i32_extend8_s)
  )
)

Output8

(module
  (func (param i32) (result i32)
    (get_local 0)
    (call 3)
    (call 4)
  )
  (func (param i32) (result i32)
    (get_local 0)
  )
  (func (param i32) (result i32)
    (get_local 0)
    (call 4)
  )
  (func $i32_extend16_s (param i32) (result i32)
    (get_local 0)
    (i32.const -32768)
    (i32.or)
    (get_local 0)
    (i32.const 32767)
    (i32.and)
    (get_local 0)
    (i32.const 32768)
    (i32.and)
    (select)
  )
  (func $i32_extend8_s (param i32) (result i32)
    (get_local 0)
    (i32.const -128)
    (i32.or)
    (get_local 0)
    (i32.const 127)
    (i32.and)
    (get_local 0)
    (i32.const 128)
    (i32.and)
    (select)
  )
)

What is next

There are a couple of things left to do:

  • Extend the typechecker to support the new instructions as well. This would be a great first contribution to webassemblyjs! Let me know if you are interested, I am happy to guide you.

  • Extend the API to support direct transformation of binary modules using our wasm-edit package. This is very little code to write, but I haven’t found a good way to test it yet. (I am not very eager to write binary wasm modules by hand).

  • Since WebAssembly has a full formal specification including big-step semantics, one could try to formulate and prove correctness of the transformation.