Polyfilling WebAssembly sign extension operators proposal
While working on webassemblyjs we recently discussed our approach to new WebAssembly features and versions^{1}.
All major browsers now ship the MVP of the language, but the WebAssembly specification keeps getting developed and enhanced.
The process is similar to the development of JavaScript through EcmaScript.
Proposals for new features can be followed on GitHub and we realized that we can provide Babellike^{2} tooling for wasm modules.
Such tooling would enable the translation of new features into old ones with better browser support.
Just like Babel enabled you to use let
and const
way before browser support was good enough,
we could enable early experimenting with new WebAssembly features.
Apart from its usefulness it is definitely a fun challenge!
Inspired by Colin Eberhardt’s work on transpiling multi return values^{3} I started looking into the sign extension operator proposal^{4}. This proposal adds five new integer instructions that perform sign extension from short to longer integers:
i32.extend8_s
i32.extend16_s
i64.extend8_s
i64.extend16_s
i64.extend32_s
In the next paragraph I try to explain what these instructions do (they are pretty simple). If you are not interested in what they exactly do on the bit level you can skip the next paragraph and continue with the implementation of the transformation.
What sign extension does
The instruction T.extendN_s
takes a value of type T
, reads its lowest N
bits and produces a new value of type T
.
The new value is constructed by sign extension: The lowest N  1
bits are untouched and the N
th bit is replicated
until the highest bit.
This is hard to describe in words, but easy in examples:
i32.extend8_s: 0x1234_8600 > 0x0000_0000
i32.extend16_s: 0x1234_8600 > 0xFFFF_8600
In the first case we look at the lowest 8
bits, that is 0x00
and replicate its “sign bit” in the remaining bits.
The sign bit is just the highest bit which is 0
here so the result is all zeroes.
The second example looks at the lowest 16
bits which are 0x8000
.
Now the sign bit is 1
and therefore all other bytes become F
which is just four ones.
Supporting new instructions in parsers
The first step is to support the five new instructions in our parsers.
For implementation of the text format in wastparser
does not check instructions names at all,
so no changes are required there.
The current version is already able to produce abstract syntax trees that containing these new instructions.
Binary format is almost as easy but not completely free.
Instructions are identified by a certain byte for which we have a mapping^{5} that we use in our wasmparser
.
So it only takes one new entry per new instruction in order to support the proposal in the binary format.
Here is main part of the diff for this part:
Implementing the transformation
WebAssembly is a stackmachine so instructions pop their argument of the stack and push the result back onto it.
My idea was to express the new instruction’s logic with existing instructions and then wrap these in a functions.
Then I could transform code with sign extension instructions by inserting my polyfill functions
and then replacing each occurence of ̀T.extendN_s
with a call to the corresponding function.
Writing the polyfills
The polyfill functions I use currently are a direct translation of my explanation of sign extension.
First we check whether the relevant sign bit is set or not and set all higher bits accordingly.
In C this can be written in a very direct way using bitwise operators.
This can then be compiled to WebAssembly using Emscripten for example.
Originally this is what I did.
Then we moved the polyfills to TypeScript and compiled them to WebAssembly using AssemblyScript ^{6}.
However, after a discussion on GitHub I agreed to remove both and just keep a WebAssembly text format version of each polyfill as source.
Here is the one for i32.extend8_s
:
This code is a WebAssembly version of the following C function:
#include "stdint.h"
int32_t i32_extend8_s(int32_t x) {
return (x & 0x00000080) ? (x  0xffffff80) : (x & 0x0000007f);
}
All other polyfills work very similar. The only things that change are the constants, argument and return types.
Transforming a given AST
Now that we have the polyfills, there are two things that need to be done:
 Insert the necessary polyfill functions into the module
 Replace all
T.extendN_s
instruction occurences with call instructions to the polyfill functions
I started writing code but then realized that there is a bit of a circular dependency:
 The necessary polyfills are determined by the occuring
T.extendN_s
instructions  The index of the newly inserted call instructions depends on the number of necessary polyfills
This only appears to be a circular reasoning on the surface though. I use the following algorithm:

Count the number of functions
n
in the original module. Ideally this would be given to us from parsing, but currently we have to do this with a separate traversal. This numbern
is exactly the index of the first new function we insert at the end of the module. 
Do a second traversal and visit all
T.extendN_s
instructions. ForeachT.extendN_s
instruction, remember the index of their polyfill in the module which is initially1
(not present). Whenever such an instruction is encountered for the first time, assign it the indexn
and then incrementn
by one. 
Now the all
T.extendN_s
instructions have been replaced withcall m
instruction wherem
is the corresponding polyfill’s index. So we should insert the polyfill function at this index now.
The result is that given the initial number of functions we need just one AST traversal to insert only the necessary polyfills.
In the tests for the transformation I use wastparser
and wastprinter
to lift the AST transform to a text to text transformation.
Here is an example test case:
Input^{7}
(module
(func (param i32) (result i32)
(get_local 0)
(i32_extend16_s)
(i32_extend8_s)
)
(func (param i32) (result i32)
(get_local 0)
)
(func (param i32) (result i32)
(get_local 0)
(i32_extend8_s)
)
)
Output^{8}
(module
(func (param i32) (result i32)
(get_local 0)
(call 3)
(call 4)
)
(func (param i32) (result i32)
(get_local 0)
)
(func (param i32) (result i32)
(get_local 0)
(call 4)
)
(func $i32_extend16_s (param i32) (result i32)
(get_local 0)
(i32.const 32768)
(i32.or)
(get_local 0)
(i32.const 32767)
(i32.and)
(get_local 0)
(i32.const 32768)
(i32.and)
(select)
)
(func $i32_extend8_s (param i32) (result i32)
(get_local 0)
(i32.const 128)
(i32.or)
(get_local 0)
(i32.const 127)
(i32.and)
(get_local 0)
(i32.const 128)
(i32.and)
(select)
)
)
What is next
There are a couple of things left to do:

Extend the typechecker to support the new instructions as well. This would be a great first contribution to webassemblyjs! Let me know if you are interested, I am happy to guide you.

Extend the API to support direct transformation of binary modules using our
wasmedit
package. This is very little code to write, but I haven’t found a good way to test it yet. (I am not very eager to write binary wasm modules by hand). 
Since WebAssembly has a full formal specification including bigstep semantics, one could try to formulate and prove correctness of the transformation.

https://blog.scottlogic.com/2018/05/29/transpilingwebassembly.html ↩

https://github.com/WebAssembly/signextensionops/blob/master/proposals/signextensionops/Overview.md ↩

https://github.com/xtuc/webassemblyjs/blob/master/packages/helperwasmbytecode/src/index.js#L105 ↩

https://github.com/xtuc/webassemblyjs/blob/2c07ba090cc0d8ed424fdfb4dae7edbe08d7ed48/packages/proposalsignextensionops/test/wast/mixed/input.wast ↩

https://github.com/xtuc/webassemblyjs/blob/2c07ba090cc0d8ed424fdfb4dae7edbe08d7ed48/packages/proposalsignextensionops/test/wast/mixed/output.wast ↩