Mauro Bringolf

Currently geeking out as WordPress developer at WebKinder and student of computer science at ETH.

Verifying syntactical correctness of a Babel AST transformation with ECMAScript grammar

November 21, 2017
, , , ,

We are currently learning about grammars of formal languages in a lecture on theoretical computer science. I enjoy studying this topic and came up with the following exercise: Use ECMAScript grammar1 to formally reconstruct a source code transformation performed by Babel to verify that it is syntactically valid. For one concrete transformation this is easy, simply perform the transformation and run the output code. If it throws no syntax errors it is valid syntax. But what I want to see is that a transformation is syntactically correct for any input code. I am not sure how involved this is going to be, so I take a simple transformation written by myself called babel-plugin-check-constants2.

It doesn’t really matter what exactly this plugin does, all that matters is that it inserts a bit of extra code at some places. The three possible node types where code is inserted are AssignmentExpression, ForXStatement and UpdateExpression. For more details on const-violations have a look at the post I wrote about that a few months ago3. For each one of those I will look at what kind of code insertion or replacement happens and then try to replicate this with corresponding ECMAScript grammar productions. Yay!


// Input code (Assume a is const)
a = 17;

// Transformed output code
a = (function () {
  throw new Error("\"a\" is read-only");
}(), 17);

We all understand that the right-hand side of an assignment is an expression whose value is to be assigned to the left-hand side of the assignment. So I expected this derivation to be something along the lines of:

AssignmentExpression &\rightarrow LeftHandSideExpression = Expression \\
&\rightarrow LeftHandSideExpression = SequenceExpression \\
&\rightarrow LeftHandSideExpression = (Expression, Expression) \\

This is how I justified the transformation in my head when writing the corresponding code: An assignment assigns to an expression, so I can replace that expression with a comma expression and it will still be valid. But then I started looking at the grammar for AssignmentExpression and was puzzled for a while. It does kind of work the way I showed above, but with a lot more intermediate steps:

AssignmentExpression &\rightarrow LeftHandSideExpression = AssignmentExpression \\
&\rightarrow LeftHandSideExpression = ConditionalExpression \\
&\rightarrow LeftHandSideExpression = LogicalORExpression \\
&\rightarrow LeftHandSideExpression = LogicalANDExpression \\
&\rightarrow LeftHandSideExpression = BitwiseORExpression \\
&\rightarrow LeftHandSideExpression = BitwiseXORExpression \\
&\rightarrow LeftHandSideExpression = BitwiseANDExpression \\
&\rightarrow LeftHandSideExpression = EqualityExpression \\
&\rightarrow LeftHandSideExpression = RelationalExpression \\
&\rightarrow LeftHandSideExpression = ShiftExpression \\
&\rightarrow LeftHandSideExpression = AdditiveExpression \\
&\rightarrow LeftHandSideExpression = MultiplicativeExpression \\
&\rightarrow LeftHandSideExpression = ExponentiationExpression \\
&\rightarrow LeftHandSideExpression = UnaryExpression \\
&\rightarrow LeftHandSideExpression = UpdateExpression \\
&\rightarrow LeftHandSideExpression = LeftHandSideExpression \\
&\rightarrow LeftHandSideExpression = NewExpression \\
&\rightarrow LeftHandSideExpression = MemberExpression \\
&\rightarrow LeftHandSideExpression = PrimaryExpression \\
&\rightarrow LeftHandSideExpression = CoverParenthesizedExpressionAndArrowParameterList \\
&\rightarrow LeftHandSideExpression = ( Expression ) \\
&\rightarrow LeftHandSideExpression = ( Expression, AssignmentExpression ) \\
&\rightarrow LeftHandSideExpression = ( AssignmentExpression, AssignmentExpression ) \\

That means it is syntactically allowed to replace x = z with x = (y, z) as long as y is an \(AssignmentExpression\). The node inserted in this Babel transform represents code of the form function(){...}() which is a \(CallExpression\). Above we have seen how to derive \(AssignmentExpression\) to \(LeftHandSideExpression\). And the production for \(LeftHandSideExpression\) gives \(CallExpression\) as one option, which concludes the derivation:

\dots &\rightarrow LeftHandSideExpression = ( LeftHandSideExpression, AssignmentExpression ) \\
&\rightarrow LeftHandSideExpression = ( CallExpression, AssignmentExpression )


// Input code (Assume a is const)

// Transformed output code
(function () {
  throw new Error("\"a\" is read-only");
}(), ++a);

This one is simple once we understand how to convert the various different expression types:

UpdateExpression &\rightarrow \ldots \\
&\rightarrow ( CallExpression, AssignmentExpression ) \\
&\rightarrow \ldots \\
&\rightarrow ( CallExpression, UpdateExpression )

The dots represent sequences of applied productions that we already used in the AssignmentExpression case and I don’t want to repeat all over the place.


// Input code (Assume a is const)
for(a in [1,2,3]) {}

// Transformed output code
for(a in [1,2,3]) {
  throw new Error("\"a\" is read-only");

Since the plugin calls ensureBlock() we can assume that the for-loop has a full block as body instead of just a single expression. It then shifts the list of statements within that block to the right and inserts a throwStatement . Since the block previously contained a valid StatementList, this insertion is syntactically valid and can be constructed by using the following production steps:

Block &\rightarrow \{ StatementList \} \\
&\rightarrow \{ Statement, StatementList \} \\
&\rightarrow \{ throwStatement, StatementList \}

Enough spec reading for now, let me know if you spot any mistakes!