Jekyll2018-02-17T22:09:58+00:00/Mauro BringolfCurrently studying computer science at ETH Zürich. Making websites at WebKinder. Sometimes hacking on open source.OSS deep dive: What does a JavaScript minifier do?2018-02-17T00:00:00+00:002018-02-17T00:00:00+00:00/2018/02/oss-deep-dive-what-does-a-javascript-minifier-do<p>Code minifiers have been a black box for me up to now.
I have seen them in two languages: CSS and JavaScript.
A minifier is a program which transforms source code into other code that is <em>smaller but behaves the same way</em>.
Since browsers download JavaScript and CSS as source code,
smaller code means better performance.
Although the immediate effect of a minifier is on size,
the target effect is on performance.
Today I will try to get a high-level overview of what Babel’s JavaScript minifier<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> does.</p>
<h2 id="its-all-babel-plugins">It’s all Babel plugins</h2>
<p>The Babel minifier is just a Babel preset that you put inside your <code class="highlighter-rouge">.babelrc</code> configuration file.
And since a preset is just a collection of Babel plugins,
all transformations performed by the minifier are already neatly split
across multiple packages inside <a href="https://github.com/babel/minify/blob/master/packages">packages</a>.
If you are unfamiliar with Babel’s architecture I suggest you take a look at the <em>Basics</em> section of the handbook<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.
This setup makes the actual minification steps very accessible,
so we can jump right into those by examining the different plugins.</p>
<h2 id="a-simple-minfication-boolean-literals">A simple minification: Boolean literals</h2>
<p>While minification as a whole might seem complex,
some steps are really simple, such as <strong>babel-plugin-transform-minify-booleans</strong><sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>.
This plugin performs minification of boolean literals, namely <code class="highlighter-rouge">true</code> and <code class="highlighter-rouge">false</code>.
These are excessively long and can be replaced by the shorter expressions <code class="highlighter-rouge">!0</code> and <code class="highlighter-rouge">!1</code>.
Why use negation and not <code class="highlighter-rouge">0</code> and <code class="highlighter-rouge">1</code> directly?</p>
<p>The reason is that <code class="highlighter-rouge">0</code> and <code class="highlighter-rouge">false</code> are not values of the same type.
More concretely, if you transform <code class="highlighter-rouge">typeof false</code> into <code class="highlighter-rouge">typeof 0</code> the code no longer behaves the same way.
Using the negation always works, because it results in a boolean value.
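I think technically it must also be guaranteed that the <code class="highlighter-rouge">!</code> operator always takes precedence
over whatever operator might use the value <code class="highlighter-rouge">!0</code>.
Looking at the operator precedence table on MDN<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup>,
that is given for the usual binary operators, but member access and calls bind more tightly and <code class="highlighter-rouge">**</code> does not accept an unparenthesized unary expression on its left,
so in those positions the output has to be wrapped in parentheses as in <code class="highlighter-rouge">(!0).toString()</code>.</p>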
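<p>The claims above can be checked directly in Node.js. This is only a sketch of the language behavior, not of what the plugin does internally; the names <code class="highlighter-rouge">minifiedTrue</code> and <code class="highlighter-rouge">minifiedFalse</code> are made up for illustration:</p>

```javascript
// The minified forms are strictly equal to the boolean literals.
const minifiedTrue = !0
const minifiedFalse = !1
console.log(minifiedTrue === true, minifiedFalse === false) // true true

// Replacing `false` by a bare `0` would change observable behavior.
console.log(typeof false, typeof 0, typeof !1) // boolean number boolean

// Binary operators bind less tightly than `!`, so no parentheses are needed.
const sumWithMinified = !0 + 1
const sumWithLiteral = true + 1
console.log(sumWithMinified === sumWithLiteral) // true

// Member access binds more tightly than `!`, so parentheses matter here.
const withParens = (!0).toString()     // 'true'
const withoutParens = !(0).toString()  // parsed as !((0).toString()), i.e. !'0'
console.log(withParens, withoutParens) // true false
```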
<h2 id="advanced-transformations">Advanced transformations</h2>
<p>Most transformations are not as simple as the one above.
If you want to see a full list of transformations I suggest you browse through
all the plugin readmes in the <code class="highlighter-rouge">packages</code> folder.
Instead of duplicating this content I will highlight some points I found interesting:</p>
<ul>
<li>
<p>Not all transformations reduce code size.
Some plugins produce code of the exact same size but try to make it better suited for compression algorithms like gzip<sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>.</p>
</li>
<li>
<p>Constant folding is a general optimization concept:
Perform pure computation on constants at compile time and use the result instead.
The plugin implementing this optimization does more than I expected though.
It not only evaluates arithmetic expressions,
but also function calls and member access on built-in things like <code class="highlighter-rouge">[].length</code> or <code class="highlighter-rouge">[].reverse()</code>.</p>
</li>
<li>
<p>Mangling variable names is the process of shortening all variable names down as much as possible.
By looking through the source code I realized that this is not as trivial as it sounds.
If you walk through all variables in one scope and simply start renaming at <code class="highlighter-rouge">a</code>,
you might end up with a less than optimal result.
It matters how many times a variable name occurs in the current scope.
For a scope with many variables,
some of them will have to take names longer than one character.
In order to minimize the output length,
the most frequently referenced variables need to get the shortest names.</p>
</li>
</ul>
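<p>The frequency idea behind mangling can be sketched in a few lines. This is a simplified illustration and not the code of the Babel plugin: the functions <code class="highlighter-rouge">shortNames</code> and <code class="highlighter-rouge">mangle</code> are hypothetical, there is only a single scope, and the input is assumed to be a plain object mapping variable names to reference counts.</p>

```javascript
// Names that must not be used as identifiers (keywords and reserved words).
const reserved = new Set(
  ('await break case catch class const continue debugger default delete do else ' +
   'enum export extends false finally for function if implements import in ' +
   'instanceof interface let new null package private protected public return ' +
   'static super switch this throw true try typeof var void while with yield').split(' ')
)

// Yields 'a', 'b', ..., 'z', 'aa', 'ab', ... in order of increasing length.
function* shortNames() {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz'
  let names = ['']
  while (true) {
    const next = []
    for (const prefix of names) {
      for (const letter of alphabet) next.push(prefix + letter)
    }
    for (const name of next) {
      if (!reserved.has(name)) yield name
    }
    names = next
  }
}

// Assigns the shortest names to the most frequently referenced variables.
function mangle(referenceCounts) {
  const byFrequency = Object.keys(referenceCounts)
    .sort((a, b) => referenceCounts[b] - referenceCounts[a])
  const names = shortNames()
  const renaming = {}
  for (const original of byFrequency) {
    renaming[original] = names.next().value
  }
  return renaming
}

const renaming = mangle({ counter: 2, index: 40, total: 7 })
console.log(renaming) // { index: 'a', total: 'b', counter: 'c' }
```

<p>With more than 26 variables the least referenced ones end up with two-character names, which is exactly why the order of assignment matters for the output length.</p>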
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p><a href="https://github.com/babel/minify">https://github.com/babel/minify</a> <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p><a href="https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-basics">https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/plugin-handbook.md#toc-basics</a> <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p><a href="https://github.com/babel/minify/tree/master/packages/babel-plugin-transform-minify-booleans">https://github.com/babel/minify/tree/master/packages/babel-plugin-transform-minify-booleans</a> <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Operator_Precedence">https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Operator_Precedence</a> <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p><a href="https://github.com/babel/minify/tree/master/packages/babel-plugin-minify-flip-comparisons">https://github.com/babel/minify/tree/master/packages/babel-plugin-minify-flip-comparisons</a> <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Small but useful programming insights2018-02-10T00:00:00+00:002018-02-10T00:00:00+00:00/2018/02/small-but-useful-programming-insights<p>Do you know these <em>Aha</em>-moments when you discover something really simple yet insightful in programming?
I think everyone does. Here are some random ones of mine that I collected over the last couple of weeks:</p>
<ul>
<li>
<p>Signed two’s complement integer division can overflow,
meaning that the result might require more bits than the inputs.
I believe there is only one case though: <code class="highlighter-rouge">INT_MIN / -1</code>.
The result is <code class="highlighter-rouge">INT_MAX + 1</code>, which cannot be represented with the same number of bits due to the asymmetry of the two’s complement range.
<em>Discovered in <a href="https://github.com/xtuc/js-webassembly-interpreter/blob/master/src/interpreter/runtime/values/i32.js#L43-L55">js-webassembly-interpreter</a>.</em></p>
</li>
<li>
<p>I will just quote this one: “More correctly, there’s no such a thing as a compiled or interpreted language
– it is a property of the implementation, not of the language”.
<em>Discovered in <a href="https://cs.stackexchange.com/questions/71979/why-are-some-programming-languages-faster-or-slower-than-others/71988#71988">cs.stackexchange.com</a>.</em></p>
</li>
<li>
<p>Vim tip: You can jump to the next occurrence of a character in the current line by typing <code class="highlighter-rouge">f</code> followed by the character.
Similarly <code class="highlighter-rouge">F</code> will take you to the previous one.
These can be combined with <code class="highlighter-rouge">d</code> or <code class="highlighter-rouge">y</code> to delete or copy from the cursor up to and including that character.
To copy everything from the cursor through the next dot, I type <code class="highlighter-rouge">yf.</code> for example.</p>
</li>
<li>
<p>In Git, you can use <code class="highlighter-rouge">stash</code> as a revision in diffs to see what the stash contains,
i.e. <code class="highlighter-rouge">git diff stash</code> to compare the stashed state to the current working state.
I find this very helpful because you do not have to remember what is in the <code class="highlighter-rouge">stash</code>.</p>
</li>
</ul>Some notes on babylons parsing algorithm and tokenizer2018-02-07T00:00:00+00:002018-02-07T00:00:00+00:00/2018/02/some-notes-on-babylons-parsing-algorithm-and-tokenizer<p>In the last OSS deep dive post<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> I looked at babylon’s source code
and learned that it does not do recursive descent parsing but some other strategy.
At the end of the post I was left with a few questions that I was able to answer in the meantime.
It turns out that the first two questions were answered in a lecture I watched the day I wrote the first post.
Here are some of my notes from the lecture and a closer look at tokens in babylon.</p>
<hr />
<p><em>What parsing algorithm is this? Is it possible to generalize it or is it specific to one grammar?</em></p>
<p>The parsing strategy of babylon is called <strong>predictive parsing</strong>.
Given the current token, the parser is able to predict the correct grammar rule to apply.
This is exactly what is happening in the <code class="highlighter-rouge">switch</code> statement over the token type we saw last time.
However, not all context free grammars allow this strategy.
The set of grammars where the next <script type="math/tex">k</script> tokens determine one production rule is called <script type="math/tex">LL(k)</script>.
Interestingly enough, this is not a property of the language but of the grammar.
Sometimes the grammar can be “fixed” to be in <script type="math/tex">LL(1)</script> and be parsed efficiently with a lookahead of <script type="math/tex">1</script> for example.
The details still escape my understanding at this point though.</p>
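<p>To make the idea of prediction concrete, here is a minimal sketch for a toy grammar of nested number lists such as <code class="highlighter-rouge">[1, [2, 3], []]</code>. It is not babylon’s code; the grammar and the helpers <code class="highlighter-rouge">tokenize</code>, <code class="highlighter-rouge">parse</code>, <code class="highlighter-rouge">parseValue</code> and <code class="highlighter-rouge">parseArray</code> are invented for this example. The point is that every decision is made by looking at the current token only:</p>

```javascript
// Lexing: numbers become 'num' tokens, any other non-space character
// becomes a token whose type is the character itself.
function tokenize(text) {
  const tokens = []
  for (const [match] of text.matchAll(/\d+|\S/g)) {
    tokens.push(/^\d+$/.test(match)
      ? { type: 'num', value: Number(match) }
      : { type: match })
  }
  tokens.push({ type: 'eof' })
  return tokens
}

function parse(text) {
  const tokens = tokenize(text)
  let position = 0
  const current = () => tokens[position]
  const expect = type => {
    if (current().type !== type) {
      throw new SyntaxError(`Expected ${type} but found ${current().type}`)
    }
    return tokens[position++]
  }

  // value := num | array
  // The current token alone predicts which rule applies.
  function parseValue() {
    switch (current().type) {
      case 'num':
        return expect('num').value
      case '[':
        return parseArray()
      default:
        throw new SyntaxError(`Unexpected token ${current().type}`)
    }
  }

  // array := '[' ']' | '[' value (',' value)* ']'
  function parseArray() {
    expect('[')
    const elements = []
    if (current().type !== ']') {
      elements.push(parseValue())
      while (current().type === ',') {
        expect(',')
        elements.push(parseValue())
      }
    }
    expect(']')
    return elements
  }

  const result = parseValue()
  expect('eof')
  return result
}

console.log(JSON.stringify(parse('[1, [2, 3], []]'))) // [1,[2,3],[]]
```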
<hr />
<p><em>How many tokens does the parser need to see in order to determine the applicable grammar rule?</em></p>
<p>Since the <code class="highlighter-rouge">lookahead</code> method returns one more token after the current one,
I believe this number to be <script type="math/tex">k=1</script> in babylon.
For now I am more interested in the last question.</p>
<p><em>Can babylon do lexing without parsing?</em></p>
<p>Lexing (or tokenizing) is the process of aggregating the smallest units of text in a program.
This is similar to dividing raw text (a sequence of characters) into a sequence of words and punctuation.
The code I looked through in the previous article already operated on a higher level of abstraction:
it deals with tokens and not raw text.
But I found out that you can ask babylon to give you not only the AST,
but also the sequence of tokens.
To do this, simply pass the option <code class="highlighter-rouge">tokens: true</code> to a <code class="highlighter-rouge">babylon.parse</code> call.</p>
<p>Here is an example:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">parse</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'babylon'</span><span class="p">)</span>
<span class="kd">const</span> <span class="nx">input</span> <span class="o">=</span> <span class="s1">'const x = 3;'</span>
<span class="kd">const</span> <span class="nx">output</span> <span class="o">=</span> <span class="nx">parse</span><span class="p">(</span><span class="nx">input</span><span class="p">,</span> <span class="p">{</span> <span class="na">tokens</span><span class="p">:</span> <span class="kc">true</span> <span class="p">})</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">output</span><span class="p">.</span><span class="nx">tokens</span><span class="p">)</span> <span class="c1">// Array of Token objects</span>
</code></pre></div></div>
<p>This way we get the raw result of the lexing process. The first token in the example above is <code class="highlighter-rouge">output.tokens[0]</code>:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">Token</span> <span class="p">{</span>
<span class="nl">type</span><span class="p">:</span>
<span class="nx">KeywordTokenType</span> <span class="p">{</span>
<span class="nl">label</span><span class="p">:</span> <span class="s1">'const'</span><span class="p">,</span>
<span class="nx">keyword</span><span class="p">:</span> <span class="s1">'const'</span><span class="p">,</span>
<span class="nx">beforeExpr</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="nx">startsExpr</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="nx">rightAssociative</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="nx">isLoop</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="nx">isAssign</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="nx">prefix</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="nx">postfix</span><span class="p">:</span> <span class="kc">false</span><span class="p">,</span>
<span class="nx">binop</span><span class="p">:</span> <span class="kc">null</span><span class="p">,</span>
<span class="nx">updateContext</span><span class="p">:</span> <span class="kc">null</span>
<span class="p">},</span>
<span class="nx">value</span><span class="p">:</span> <span class="s1">'const'</span><span class="p">,</span>
<span class="nx">start</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="nx">end</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
<span class="nx">loc</span><span class="p">:</span> <span class="nx">SourceLocation</span> <span class="p">{</span>
<span class="nl">start</span><span class="p">:</span> <span class="nx">Position</span> <span class="p">{</span>
<span class="nl">line</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="nx">column</span><span class="p">:</span> <span class="mi">0</span>
<span class="p">},</span>
<span class="nx">end</span><span class="p">:</span> <span class="nx">Position</span> <span class="p">{</span>
<span class="nl">line</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="nx">column</span><span class="p">:</span> <span class="mi">5</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>So yes, we can get the sequence of tokens from babylon. But it seems to be coupled with parsing and AST construction in this case.</p>
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p><a href="/2018/01/oss-deep-dive-babels-javascript-parsing-algorithm/">/2018/01/oss-deep-dive-babels-javascript-parsing-algorithm/</a> <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Pull requests in January 20182018-02-02T00:00:00+00:002018-02-02T00:00:00+00:00/2018/02/open-source-pull-requests-in-january-2018<p>This is supposed to be the kind of post I would have liked to read when I started contributing to open source last year.
I wondered what things people do and what their pull requests look like.
Of course you can go look at random pull requests on GitHub,
but I wanted simple quick summaries explaining the changes and motivation of a PR.
So I started writing down a few sentences for interesting pull requests this month.</p>
<p><em>You can get your own list by going to <strong>github.com/pulls</strong><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>
and searching for <code class="highlighter-rouge">created:>=2017-12-31 is:pr author:githubusername archived:false is:public</code>.</em></p>
<ul>
<li>
<p><strong>xtuc/js-webassembly-interpreter: Floating point hexadecimals</strong><sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>:
This is an implementation of hexadecimal literals for floating point numbers.
I have written about this format and project in a previous post<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>.
In fact, I made the parsing function for these literals a separate npm package<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup>
and then added it as a dependency of the project.
During that process I learned a bit about the MIT license,
whose text needs to be redistributed with every project that includes such a library.</p>
</li>
<li>
<p><strong>prettier/vim-prettier: Add default option values from Prettier for configuration</strong><sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>:
The description of vim-prettier says that their prettier settings are different from the defaults.
They also provide a list with their setting values but without reference to what the actual default value was.
I simply added a comment with the prettier default value to all options,
so it becomes clearer where the differences are.</p>
</li>
<li>
<p><strong>babel/babel: Remove check-constants plugin</strong><sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup>: This one is actually not included with the search query above
because it was not created in January, but on December 7th last year.
Moreover, the issue it resolves was opened on July 19th 2017<sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>.
This is a prime example of how patience can be key in open source.
The pull request simply moves code from one place to another without any functional changes.
It makes the code base more consistent and puts the <code class="highlighter-rouge">const</code> checks in a more appropriate place.
This is also the kind of work where a lot more time is spent reading than writing code.</p>
</li>
<li>
<p><strong>DmitrySoshnikov/hdl-js: Add two-input decoder example</strong><sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup>:
A repository with examples provides an easy entry point for contribution:
Come up with a new example.
This project implements a hardware description language and emulator for digital circuits.
Everything is written in JavaScript which makes it really accessible, a big plus in my opinion.
I took a circuit from a book on digital design and implemented it in this hardware description language
(This sounds way more complicated than it is).</p>
</li>
<li>
<p><strong>DmitrySoshnikov/hdl-js: Correct number range for random pin integer values</strong><sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup>:
After playing around with the project for a bit I noticed an odd behavior:
Sometimes the truth tables for my gates would print <code class="highlighter-rouge">10</code> as the value for a binary input which should only be <code class="highlighter-rouge">0</code> or <code class="highlighter-rouge">1</code>.
It turns out that the code producing this error runs only under special conditions,
but the bug itself was easy to fix.
It was just a simple off-by-one error,
where the upper bound for possible inputs was one more than it should have been.</p>
</li>
</ul>
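<p>For readers who have not run into this kind of bug, here is a generic sketch of an off-by-one error in a random integer helper. It is not the code from hdl-js; both functions and the injectable <code class="highlighter-rouge">random</code> parameter are hypothetical and only illustrate how an upper bound ends up one too high:</p>

```javascript
// Returns an integer in [min, max], both ends inclusive.
// `random` must return a float in [0, 1), like Math.random.
function randomInteger(min, max, random = Math.random) {
  return min + Math.floor(random() * (max - min + 1))
}

// Buggy variant for comparison: the range is one value too wide.
function randomIntegerOffByOne(min, max, random = Math.random) {
  return min + Math.floor(random() * (max - min + 2))
}

// A random source pinned close to 1 exposes the difference.
const almostOne = () => 0.999
console.log(randomInteger(0, 1, almostOne))         // 1
console.log(randomIntegerOffByOne(0, 1, almostOne)) // 2
```

<p>Bugs like this only show up for the rare random values near the top of the range, which fits the observation that the faulty output appeared only sometimes.</p>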
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p><a href="https://github.com/pulls">https://github.com/pulls</a> <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p><a href="https://github.com/xtuc/js-webassembly-interpreter/pull/54">https://github.com/xtuc/js-webassembly-interpreter/pull/54</a> <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p><a href="/2017/12/hexadecimal-floating-point-notation/">/2017/12/hexadecimal-floating-point-notation/</a> <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p><a href="https://github.com/maurobringolf/webassembly-floating-point-hex-parser">https://github.com/maurobringolf/webassembly-floating-point-hex-parser</a> <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p><a href="https://github.com/prettier/vim-prettier/pull/94">https://github.com/prettier/vim-prettier/pull/94</a> <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p><a href="https://github.com/babel/babel/pull/6987">https://github.com/babel/babel/pull/6987</a> <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p><a href="https://github.com/babel/babel/issues/5967">https://github.com/babel/babel/issues/5967</a> <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p><a href="https://github.com/DmitrySoshnikov/hdl-js/pull/18">https://github.com/DmitrySoshnikov/hdl-js/pull/18</a> <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p><a href="https://github.com/DmitrySoshnikov/hdl-js/pull/19">https://github.com/DmitrySoshnikov/hdl-js/pull/19</a> <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>JS Numbers: Bitwise operators2018-01-30T00:00:00+00:002018-01-30T00:00:00+00:00/2018/01/js-numbers-bitwise-operators<p><em>This article is part of a series on numbers in JavaScript.
The previous post<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> is about integers and you should understand the gist of it for this one to make sense.</em></p>
<h2 id="the-operators-themselves">The operators themselves</h2>
<p>The bitwise operators in JS are <code class="highlighter-rouge">&</code>, <code class="highlighter-rouge">|</code>, <code class="highlighter-rouge">^</code>, <code class="highlighter-rouge">~</code>, <code class="highlighter-rouge"><<</code>, <code class="highlighter-rouge">>></code> and <code class="highlighter-rouge">>>></code>.
They represent the standard bitwise operations present in various programming languages.
The corresponding MDN page is full of helpful examples on the topic.
The only thing I want to remark on the operators themselves is that in JS we have two distinct right shift operators:
sign-propagating right shift (<code class="highlighter-rouge">>></code>) and zero-fill right shift (<code class="highlighter-rouge">>>></code>).
So the bitwise operators work on integers the way they do in other languages as well.
However, there are no integers in JavaScript as we have seen in the previous post<sup id="fnref:1:1"><a href="#fn:1" class="footnote">1</a></sup>.
The MDN page is not super clear on how operands are converted to integers.
How is <code class="highlighter-rouge">0.1 & 0.5</code> computed for example?
We need to consider the language specification for that.</p>
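<p>Before turning to the specification, here is a small sketch of how the two right shifts differ on a negative number. The variable names are only for illustration:</p>

```javascript
const value = -8

// Sign-propagating right shift copies the sign bit in from the left.
const signPropagating = value >> 1
console.log(signPropagating) // -4

// Zero-fill right shift inserts zeros, so the result is a large positive number.
const zeroFill = value >>> 1
console.log(zeroFill) // 2147483644

// For non-negative 32 bit values both shifts agree.
console.log(8 >> 1, 8 >>> 1) // 4 4
```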
<h2 id="toint32">ToInt32</h2>
<p>Bitwise operators are defined in section <strong>12.12</strong> of the ECMAScript 2017 language specification.
How operands have to be evaluated is defined by the semantics section <strong>12.12.3</strong><sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.
It looks like both operands are converted to a 32 bit integer using the operation <code class="highlighter-rouge">ToInt32</code>.
By going to its specification, we find the following definition:</p>
<blockquote>
<ol>
<li>Let number be ? ToNumber(argument).</li>
<li>If number is <script type="math/tex">NaN</script>, <script type="math/tex">+0</script>, <script type="math/tex">-0</script>, <script type="math/tex">+\infty</script>, or <script type="math/tex">-\infty</script>, return <script type="math/tex">+0</script>.</li>
<li>Let int be the mathematical value that is the same sign as number and whose magnitude is floor(abs(number)).</li>
<li>Let int32bit be <script type="math/tex">int</script> modulo <script type="math/tex">2^{32}</script>.</li>
<li>If int32bit <script type="math/tex">\geq 2^{31}</script>, return int32bit <script type="math/tex">- 2^{32}</script>; otherwise return int32bit.</li>
</ol>
</blockquote>
<p>So we see that bitwise operators do not let us access bits of the underlying representation at all.
Instead operands are mapped to 32 bit integers whose bits are (conceptually) used for the operation.
Let’s verify that the algorithm above does not change values from the 32 bit two’s complement range,
so we get the expected behavior of bitwise operators.</p>
<p>The 32 bit integers run from <script type="math/tex">-2^{31}</script> to <script type="math/tex">2^{31}-1</script>.
Let <script type="math/tex">x</script> be an arbitrary one of those.
We already have an integer,
so steps 1 and 3 are doing nothing to it.
Step 2 will return <script type="math/tex">0</script> if <script type="math/tex">x = 0</script>, which satisfies our requirements.
So we consider only <script type="math/tex">x \neq 0</script> and steps 4 and 5.</p>
<p>If <script type="math/tex">x > 0</script> steps 4 and 5 will do nothing,
because the highest possible value for <script type="math/tex">x</script> is still smaller than <script type="math/tex">2^{31}</script> and <script type="math/tex">2^{32}</script>.
However, a negative <script type="math/tex">x</script> will be mapped to a positive one by the modulo operation in step 4.
But because negative values are mapped to large positive values, step 5 reverts that.
More precisely:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
-2^{31} \leq x < 0 &\Rightarrow int32bit = x \mod 2^{32} = 2^{32} + x \\
&\Rightarrow int32bit \geq 2^{32} + (-2^{31}) = 2^{31}
\end{align*} %]]></script>
<p>Therefore step 5 will return <script type="math/tex">int32bit - 2^{32} = x</script>.
So indeed, bitwise operations on 32 bit integer values will yield the same results as in languages that have 32 bit integers.
We can learn two things from this, one useful and one rather useless:</p>
<ul>
<li>
<p>Since the operation <code class="highlighter-rouge">ToNumber</code> maps anything (that I’ve tried so far) to a number,
we can use bitwise operators not only on numbers but also strings or objects.
This is the rather useless one, yup.</p>
</li>
<li>
<p>Although JavaScript defines only one floating point number type, these operators make it look like
integers are represented using 32 bit two’s complement.
But this is only a temporary representation; the result will always be a regular floating point number.</p>
</li>
</ul>
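<p>The five steps of the quoted algorithm can also be written out in JavaScript and compared against the built-in behavior. This is a sketch under one assumption: <code class="highlighter-rouge">Number(argument)</code> is used as a stand-in for the abstract operation <code class="highlighter-rouge">ToNumber</code>, which matches for the numbers, strings and plain objects tried here. The function name <code class="highlighter-rouge">toInt32</code> is my own:</p>

```javascript
function toInt32(argument) {
  // Step 1: convert to a number (Number() approximates ToNumber here).
  const number = Number(argument)
  // Step 2: NaN, zeros and infinities all map to +0.
  if (!Number.isFinite(number) || number === 0) return 0
  // Step 3: drop the fractional part, keeping the sign.
  const int = Math.sign(number) * Math.floor(Math.abs(number))
  // Step 4: mathematical modulo 2^32. The result must be non-negative,
  // while the % operator of JavaScript keeps the sign of the dividend.
  const TWO_32 = 2 ** 32
  const int32bit = ((int % TWO_32) + TWO_32) % TWO_32
  // Step 5: map the upper half of the range to negative values.
  return int32bit >= 2 ** 31 ? int32bit - TWO_32 : int32bit
}

// `x | 0` applies ToInt32 to x and leaves the bits unchanged.
const samples = [0.1, 0.5, -1, 2 ** 31, 2 ** 32 + 3, -2147483649, '5', {}, NaN, Infinity]
console.log(samples.every(x => toInt32(x) === (x | 0))) // true

// Both operands are truncated to 0 before the bitwise AND happens.
console.log(0.1 & 0.5) // 0
```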
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p><a href="https://maurobringolf.ch/2018/01/js-numbers-there-are-no-integers-but-how-many/" target="_blank">https://maurobringolf.ch/2018/01/js-numbers-there-are-no-integers-but-how-many/</a> <a href="#fnref:1" class="reversefootnote">↩</a> <a href="#fnref:1:1" class="reversefootnote">↩<sup>2</sup></a></p>
</li>
<li id="fn:2">
<p><a href="https://www.ecma-international.org/ecma-262/8.0/index.html#sec-binary-bitwise-operators-runtime-semantics-evaluation" target="_blank">https://www.ecma-international.org/ecma-262/8.0/index.html#sec-binary-bitwise-operators-runtime-semantics-evaluation</a> <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Product rule for differentiation of more than two factors2018-01-25T00:00:00+00:002018-01-25T00:00:00+00:00/2018/01/product-rule-for-differentiation-of-more-than-two-factors<p>In one dimensional calculus we have a simple formula to compute the derivative of a product of two functions.
In German we call it the <em>“Produktregel” (product rule)</em> and it looks like this:</p>
<script type="math/tex; mode=display">(f * g)'(x) = f'(x) * g(x) + f(x) * g'(x)</script>
<p>It can be proved by applying the definition of differentiation and some limit calculus.
However, sometimes you encounter a product of more than two functions.
If you have three factors, say <script type="math/tex">(f * g * h)'(x)</script> then one option is to apply the product rule twice to obtain the result.
It gets trickier when the number of factors is a parameter to your calculation:</p>
<script type="math/tex; mode=display">(\prod_{i=1}^n f_i )'(x)</script>
<p>We cannot apply the product rule <script type="math/tex">n</script> times, because <script type="math/tex">n</script> is a variable.
In the past I was simply given the formula and that was that.
This actually happened in multiple courses and nobody ever bothered giving a proof for the thing.
Well, the proof is essentially <em>apply the product rule <script type="math/tex">n</script> times</em> which needs to be done by induction.
I have a feeling that proofs by induction often get skipped or hand-waved.
I do not like skipping things and a little practice in proof writing (and <script type="math/tex">\LaTeX</script>) is always welcome, so let’s do it.</p>
<h2 id="the-formula">The formula</h2>
<p>It looks quite fancy:</p>
<script type="math/tex; mode=display">(\prod_{i=1}^n f_i )'(x)
=
\sum_{i=1}^n \left( \frac{d}{dx} f_i(x) \prod_{j=1, j \neq i}^n f_j(x) \right)</script>
<p>I won’t bother stating assumptions on differentiability and these things,
I just want to see that the formula works.</p>
<h2 id="sanity-check-n--2">Sanity check (n = 2)</h2>
<p>The first sanity check is to verify that for two factors we get the simple product rule back.
And indeed,
there are two terms in the sum and one factor in each product, which is simply the function we did not differentiate in this term.
Sorry if that sounds confusing, the point of a formula is to avoid weird sentences like this one.</p>
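<p>Written out for <script type="math/tex">n = 2</script>, with each product running over all indices <script type="math/tex">j</script> different from the summation index <script type="math/tex">i</script>, the check looks like this:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
(f_1 * f_2)'(x)
&= \frac{d}{dx} f_1(x) \prod_{j=1, j \neq 1}^{2} f_j(x) + \frac{d}{dx} f_2(x) \prod_{j=1, j \neq 2}^{2} f_j(x) \\
&= f_1'(x) * f_2(x) + f_2'(x) * f_1(x)
\end{align*} %]]></script>
<p>Each product collapses to the single remaining factor, which gives back the two-factor product rule.</p>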
<h2 id="induction-step-n---1---n">Induction step (n - 1 -> n)</h2>
<p>Hey, we already started the proof by induction: Our sanity check above is the base case for <script type="math/tex">n = 2</script> factors.
How does the induction step for <script type="math/tex">n > 2</script> work?
We split the product, apply the product rule for <script type="math/tex">2</script> factors and then the induction hypothesis for <script type="math/tex">n-1</script> factors:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
(\prod_{i=1}^n f_i )'(x) &=
(f_n * \prod_{i=1}^{n-1} f_i)'(x) \\
&\stackrel{\text{two factors}}{=} f_n'(x) * \prod_{i=1}^{n-1} f_i(x) + f_n(x) * (\prod_{i=1}^{n-1} f_i)'(x) \\
&\stackrel{\text{n-1 factors}}{=} f_n'(x) * \prod_{i=1}^{n-1} f_i(x) + f_n(x) *
\sum_{i=1}^{n-1} \left( \frac{d}{dx} f_i(x) \prod_{j=1, j \neq i}^{n-1} f_j(x) \right) \\
&\stackrel{(1)}{=} f_n'(x) * \prod_{i=1}^{n-1} f_i(x) +
\sum_{i=1}^{n-1} \left( \frac{d}{dx} f_i(x) \prod_{j=1, j \neq i}^{n} f_j(x) \right) \\
&\stackrel{(2)}{=} \sum_{i=1}^n \left( \frac{d}{dx} f_i(x) \prod_{j=1, j \neq i}^n f_j(x) \right)
\end{align*} %]]></script>
<ul>
<li>
<p>(1): By distributivity we can multiply each term of the sum instead of the whole sum by <script type="math/tex">f_n(x)</script>.</p>
</li>
<li>
<p>(2): The first term is exactly the summand for <script type="math/tex">i=n</script> in the sum at the end.
No computation happened here.</p>
</li>
</ul>
<h2 id="fun-facts-n--1-n--0">Fun facts (n = 1, n = 0)</h2>
<p>We started the induction at the base case <script type="math/tex">n=2</script>.
For some nice edge case thinking, let’s consider <script type="math/tex">n=1</script> and <script type="math/tex">n=0</script>.</p>
<ul>
<li>
<p><script type="math/tex">n=1</script>: In this case the function we want to differentiate is a product of one factor, which is just the function itself.
So the result we should get is just the derivative of this function.
And indeed, the sum contains only one term for <script type="math/tex">i=1</script>.
The product runs over all indices from <script type="math/tex">1</script> to <script type="math/tex">1</script> which are not <script type="math/tex">1</script>, or in other words <em>no indices at all</em>.
The empty product is usually defined as <script type="math/tex">1</script> exactly for cases like this one.
So we get the correct derivative in this case, because multiplying by one never hurts.</p>
</li>
<li>
<p><script type="math/tex">n=0</script>: This is the <em>weird</em> stuff, but I think it works.
The function we want to differentiate is a product of zero factors, which is again <script type="math/tex">1</script>.
The derivative of the constant function <script type="math/tex">1</script> is <script type="math/tex">0</script>.
And indeed, now both the sum and the product run over indices from <script type="math/tex">1</script> to <script type="math/tex">0</script> which makes them empty.
Since the sum is the outer construct, the total result is indeed <script type="math/tex">0</script>.</p>
</li>
</ul>OSS deep dive: Babel’s JavaScript parsing algorithm2018-01-22T00:00:00+00:002018-01-22T00:00:00+00:00/2018/01/oss-deep-dive-babels-javascript-parsing-algorithm<p>Most of the time when I read and try to understand other people’s code I end up learning something.
With this post I want to try a new writing format inspired by this fact:
Read open source code and try to explain how it works.
Or even better, try to find answers to my own questions deep in the details of source code.
I have done this only a couple of times in the past and realized that reading source code is a skill of its own.
You need to develop an intuition to find the key parts,
because there are a million rabbit holes to get lost in.
Today’s project is Babel<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>, more precisely its parser babylon<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.</p>
<h2 id="recursive-descent-parsers">Recursive descent parsers</h2>
<p>My knowledge of parsers is still quite limited, but I recently started looking at the theory behind them.
After learning about grammars, lexers and tokens I am currently looking at parsing algorithms.
A parsing algorithm should answer the question: How do we check a sequence of tokens against a grammar
and construct a corresponding abstract syntax tree (AST)?</p>
<p>One such algorithm or type of parser is called a <strong>recursive descent parser</strong><sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>.
This is the first one I learned about and I am curious if babylon implements it.
The main advantage of this strategy is that it is easy to reason about.
A parser is quite an involved piece of code to write and needs to cover all edge cases allowed by the grammar,
so simplicity is a plus in this situation.</p>
<p>The algorithm works in a top-down fashion.
It starts with the full task (parse a program) and splits it into smaller subtasks (parse a variable declaration).
The difficulty lies in the fact that this division step is not deterministic.
In JavaScript, a program can consist of any number of statements and each statement can be of any kind.
So how does a recursive descent parser split the large tasks into smaller ones?
It simply <strong>tries all possibilities</strong>, one after another and uses the first one that works.
To try a possibility means to check if the input code matches a grammar rule.
So we could parse a JavaScript statement like this:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Input code</span>
<span class="kd">const</span> <span class="nx">a</span> <span class="o">=</span> <span class="mi">3</span>
</code></pre></div></div>
<ul>
<li>Is this an <code class="highlighter-rouge">if</code>-statement? No.</li>
<li>Is this a <code class="highlighter-rouge">for</code>-loop? No.</li>
<li>Is this a function declaration?
<ul>
<li>Is this a named function declaration? No.</li>
<li>Is this an anonymous function declaration? No.</li>
</ul>
<p>–> No.</p>
</li>
</ul>
<p>.
.
.</p>
<ul>
<li>Is this a variable declaration? Yes.
<ul>
<li>Is this a <code class="highlighter-rouge">var</code> declaration? No.</li>
<li>Is this a <code class="highlighter-rouge">let</code> declaration? No.</li>
<li>Is this a <code class="highlighter-rouge">const</code> declaration? Yes.</li>
</ul>
</li>
</ul>
<p>In reality there would be more intermediate steps, because a variable declaration for example
allows multiple variables to be declared, e.g. <code class="highlighter-rouge">const a = 1, b = 2;</code>.
But I hope you can see the point:
Try any syntactically allowed structure (grammar rule) until one matches<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup>.
And as indicated by the name, this procedure is recursive even in the case when it does not match.
It might be obvious that <code class="highlighter-rouge">const a = 3</code> is not a function declaration,
but remember that the parser has to work correctly for any syntactically valid source code
which includes some <em>really</em> weird stuff<sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>.
An additional benefit is that since this algorithm does not use any specifics of the grammar,
there are tools that can generate the parser from the grammar without any human interaction.
Of course, the downside is performance.</p>
<p>The key point of the algorithm is that it tries to parse input according to a rule
and has to roll back if it does not match.
It does not decide what rule to use and therefore it can happen that parts of the input are processed multiple times.</p>
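<p>To make the try-and-roll-back idea concrete, here is a toy sketch of my own.
The rule functions, the token format (plain strings) and the AST shapes are all made up for illustration and are not taken from any real parser.
Each rule receives the current position and either returns a node together with the next position or <code class="highlighter-rouge">null</code>,
so rolling back simply means trying the next rule from the same position:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical rule: only recognizes the keyword, a real rule would
// go on to parse the condition and the body.
function parseIfStatement(tokens, pos) {
  if (tokens[pos] !== "if") return null;
  return { node: { type: "IfStatement" }, next: pos + 1 };
}

// Hypothetical rule: matches `kind name = value` and nothing else.
function parseVariableDeclaration(tokens, pos) {
  const kinds = ["var", "let", "const"];
  if (!kinds.includes(tokens[pos])) return null;
  if (tokens[pos + 2] !== "=") return null;
  return {
    node: {
      type: "VariableDeclaration",
      kind: tokens[pos],
      name: tokens[pos + 1],
      init: tokens[pos + 3],
    },
    next: pos + 4,
  };
}

// The order of this list is the order in which rules are tried.
const statementRules = [parseIfStatement, parseVariableDeclaration];

function parseStatement(tokens, pos) {
  for (const rule of statementRules) {
    const result = rule(tokens, pos);
    if (result !== null) return result;
    // The rule did not match: pos is unchanged, so the next rule
    // starts again from the same token (this is the rollback).
  }
  throw new SyntaxError("No rule matches at token " + pos);
}

const statement = parseStatement(["const", "a", "=", "3"], 0);
// statement.node.type is "VariableDeclaration"
</code></pre></div></div>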
<h2 id="is-babylon-a-recursive-descent-parser">Is Babylon a recursive descent parser?</h2>
<p><em>What follows is a polished version of my notes from digging through babylon’s source code.
The goal is to find out if the parser implements a recursive descent strategy or something else.
Every time I reference a class or method name for the first time,
I link it to open the corresponding source code in a new tab.
All of these links target the <code class="highlighter-rouge">v7.0.0-beta.38</code> version of babel,
so they will make sense even after future releases change the code.</em></p>
<p>Babylon exposes a <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/index.js#L20" target="_blank"><code class="highlighter-rouge">parse</code></a> method
that takes program text as input and returns a <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/types.js#L124" target="_blank"><code class="highlighter-rouge">File</code></a> object.
This type has a field called <code class="highlighter-rouge">File.program</code> where the root of the abstract syntax tree (AST) is stored.
Actually I just went to astexplorer.net and saw that <code class="highlighter-rouge">File</code> is the root node of the AST there.
Nonetheless, we are on the right track to find the core parsing algorithm.
The <code class="highlighter-rouge">parse</code> function contains some code to handle options and plugins.
As far as I know, Babel has no public API for parser plugins, but it looks like something similar is used internally.
I will ignore all of this though and dig deeper into the <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/parser/index.js#L12" target="_blank"><code class="highlighter-rouge">Parser</code></a> class which the input and options are passed to.</p>
<p>Here we can see the <code class="highlighter-rouge">Parser</code> class interacting with the given input for the first time.
If the source file contains a leading hashbang line like <code class="highlighter-rouge"><span class="err">#</span><span class="o">!</span><span class="sr">/usr/</span><span class="nx">bin</span><span class="o">/</span><span class="nx">env</span> <span class="nx">node</span></code>, <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/parser/index.js#L23-L31" target="_blank">this code</a> is responsible for skipping it.
It looks like <code class="highlighter-rouge">this.state.pos</code> points to the part of the input that still needs to be parsed.
Since the hashbang needs to be the first line of a file, the check here is <code class="highlighter-rouge">this.state.pos === 0</code>.
A successful check will call <code class="highlighter-rouge">this.skipLineComment</code>, which moves the position pointer on to the second line I assume.</p>
<p>The hashbang skip is part of the constructor and not the main <code class="highlighter-rouge">Parser.parse</code> method.
Maybe this is because the hashbang is not valid JavaScript syntax and it is easier to deal with this exception separately.
In any case, we are now at the first JavaScript character in the input and want to start parsing it.
This process seems to be kicked off with <code class="highlighter-rouge">this.nextToken</code> and <code class="highlighter-rouge">this.parseTopLevel</code>
which are both inherited from the parent class <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/parser/statement.js#L17" target="_blank"><code class="highlighter-rouge">StatementParser</code></a>.</p>
<p>Now we are deep into the code: 1697 lines of source code and most method names contain the word <code class="highlighter-rouge">parse</code>.
The header comment for <code class="highlighter-rouge">parseTopLevel</code> explains that a program is a sequence of statements and this method is responsible for parsing that.
Following the method calls <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/parser/statement.js#L624" target="_blank"><code class="highlighter-rouge">parseBlockBody</code></a>
and <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/parser/statement.js#L641" target="_blank"><code class="highlighter-rouge">parseBlockOrModuleBlockBody</code></a> we find the first lines of code
translating input text into AST nodes.
The field <code class="highlighter-rouge">Program.body</code> was passed all the way down here <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/parser/statement.js#L675">where we now</a> insert statements
constructed by <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/parser/statement.js#L69" target="_blank"><code class="highlighter-rouge">parseStatement</code></a> and <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/parser/statement.js#L76" target="_blank"><code class="highlighter-rouge">parseStatementContent</code></a>.
This array represents the sequence of statements that make up the program at the top level.</p>
<p>Here comes the point where we can check if babylon uses a recursive descent parsing strategy.
A statement can be many things, so we need to decide what grammar rule to try now.
As explained above, in a recursive descent parser we would simply try one after another and roll back
whenever a grammar rule does not work for the given input.</p>
<p>But that is not what’s happening here. As explained in a <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/parser/statement.js#L80-L82" target="_blank">source code comment</a>,
it is possible to recognize most statement types by the next few tokens.
And indeed, the switch statement in this method determines the type of statement by the type of the current token
stored in <code class="highlighter-rouge">this.state.type</code>.</p>
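<p>The shape of such a decision can be sketched like this.
This is a simplified illustration of dispatching on the current token type, not babylon’s actual code:
the function name <code class="highlighter-rouge">chooseStatementRule</code>, the string token types and the returned rule names are assumptions of mine.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Toy illustration only: picks a grammar rule from the current token
// type without trying any rule first.
function chooseStatementRule(currentTokenType) {
  switch (currentTokenType) {
    case "if":
      return "IfStatement";
    case "for":
      return "ForStatement";
    case "function":
      return "FunctionDeclaration";
    case "var":
    case "let":
    case "const":
      return "VariableDeclaration";
    default:
      // Everything else is treated as an expression statement in this toy.
      return "ExpressionStatement";
  }
}

const chosen = chooseStatementRule("const"); // "VariableDeclaration"
</code></pre></div></div>
<p>The difference to trying rules one after another is that exactly one rule is selected before any parsing work starts, so nothing has to be undone.</p>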
<p>In some cases it also checks the next token’s type with the <a href="https://github.com/babel/babel/blob/v7.0.0-beta.38/packages/babylon/src/tokenizer/index.js#L178" target="_blank"><code class="highlighter-rouge">lookahead</code></a> method.
How many tokens does it need to check at most to determine the next grammar rule?
I looked at the source of <code class="highlighter-rouge">lookahead</code> and found that it only returns one token after the current one.
There are other means to access more tokens though, so I am not sure how to find that number.
I guess the exact number also depends on the granularity of tokens, but I am not sure about that.</p>
<p>But I am pretty confident in the fact that babylon does not implement a recursive descent parsing algorithm,
because it makes decisions about what type of grammar rules to apply instead of trying them all.
It is still top-down, however: we start off by calling parse methods for <code class="highlighter-rouge">Program</code>,
then <code class="highlighter-rouge">Statement</code> and more and more specific grammar rules after that.
There is probably a name for this parsing algorithm which I will hopefully learn soon.</p>
<h2 id="conclusion-and-further-questions">Conclusion and further questions</h2>
<p>Babylon is <strong>not</strong> a recursive descent parser.
The JavaScript grammar seems simple enough to determine the grammar rules by only looking at the next few tokens.
This naturally leads me to the following questions:</p>
<ul>
<li><em>What parsing algorithm is this? Is it possible to generalize it or is it specific to one grammar?</em></li>
<li><em>How many tokens does the parser need to see in order to determine the applicable grammar rule?</em></li>
</ul>
<p>I will have to catch up with theory to answer these,
but I hope a future post will do so.
I discovered another thing when reading the source code:
The processes of lexing (tokenizing) and parsing seem to be coupled in babylon.
All theory I looked at presented it as two separate processes (functions):</p>
<ol>
<li><code class="highlighter-rouge">lexer(text) -> [...tokens...]</code></li>
<li><code class="highlighter-rouge">parser([...tokens...]) -> ast</code></li>
</ol>
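<p>As a toy version of this two-phase picture, assuming a made-up token format (plain strings) and a parser that only understands statements of the form <code class="highlighter-rouge">const name = integer</code>:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Phase 1: turn program text into a flat list of token strings.
// The regular expression only knows identifiers, integers and "=".
function lexer(text) {
  return text.match(/[A-Za-z_$][A-Za-z0-9_$]*|[0-9]+|=/g) || [];
}

// Phase 2: build a tiny AST from the finished token list.
function parser(tokens) {
  const [kind, name, equals, value] = tokens;
  if (kind !== "const" || equals !== "=") {
    throw new SyntaxError("Unsupported input");
  }
  return {
    type: "VariableDeclaration",
    kind: kind,
    name: name,
    init: Number(value),
  };
}

const ast = parser(lexer("const a = 3"));
// ast.name is "a" and ast.init is 3
</code></pre></div></div>
<p>Here the parser never sees the program text, it only works on the complete token list.</p>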
<p>From what I can tell, babylon calls <code class="highlighter-rouge">nextToken</code> while parsing so both things are happening at the same time.
So a third question I have is:</p>
<ul>
<li><em>Can babylon do lexing without parsing?</em></li>
</ul>
<p>If you know the answers, let me know!
I might do a follow-up post to address them.</p>
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p><a href="https://github.com/babel/babel">https://github.com/babel/babel</a> <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p><a href="https://github.com/babel/babel/tree/master/packages/babylon">https://github.com/babel/babel/tree/master/packages/babylon</a> <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p><a href="https://en.wikipedia.org/wiki/Recursive_descent_parser">https://en.wikipedia.org/wiki/Recursive_descent_parser</a> <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>The order in which we try the rules <em>is</em> important and needs to be defined by the grammar. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>I encountered some fun edge cases in Babel pull requests, for example <a href="https://github.com/babel/babel/pull/6855#discussion_r152961191">https://github.com/babel/babel/pull/6855#discussion_r152961191</a>. <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>2017 - First year of this blog2018-01-15T00:00:00+00:002018-01-15T00:00:00+00:00/2018/01/2017-first-year-of-this-blog<p>Seems like it has been already one full year since I started this blog.
I just revisited my first post<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> and am really happy that I stuck with it.
I learned and understood a lot of technical things that I would have long forgotten without writing about it.
And more importantly:
Deep diving into topics and writing about them helped me understand better what topics and problems <em>really</em> interest me.
This is not concrete yet,
but I feel like I am in a much better place to pick a field to work in because of writing all these posts.
That has been the biggest payoff from writing for me so far,
as I generally have a hard time telling what interests me.
So I want to continue to write in 2018 and hopefully be able to further chase down my interests.</p>
<p>I published 51 posts and tagged them with 42 different tags.
For some reason that I don’t remember I decided to use tags instead of categories.
The most used tag is <em>javascript</em> with 17 posts.
The reason for this is simple but insightful:
I found great interest in contributing to open source and generally being an active GitHub user.
The JavaScript ecosystem is welcoming and many projects encourage new contributors,
so I did almost all of my open source work there.
As a result I spent more time digging further into the language and discovered lots of interesting aspects
of it and its ecosystem and ended up writing about them.</p>
<p>I definitely want to continue my efforts in open source and I think future posts will reflect that.
My contributions to the Babel project were a big milestone for me,
even though most of them were minor and will not affect a lot of users.
I wrote a post when the pull request that I considered my first real code and creative contribution<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup> got merged into master.
I am super proud of it.</p>
<h2 id="insights-on-writing-itself">Insights on writing itself</h2>
<p>When starting out I decided to focus on writing posts instead of figuring out the best setup for doing so.
Writing takes a lot of time, so I want to use the little time I spend on this blog as efficiently as possible.
However, I think one year of writing 51 posts is a good data set to do some evaluation on.
Currently, this blog is a WordPress site with a custom theme I threw together.
I wrote a small plugin to make the footnotes<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup> and created all posts.
However, I have a couple of issues with this setup:</p>
<ul>
<li>
<p><strong>Text processing</strong>:
I wrote all posts in the WordPress editor in HTML.
I don’t mind typing “verbose” HTML tags at all.
My belief is that if speed of typing is the bottleneck in writing or programming
then the article or program is not interesting enough and not worth optimizing for.
But I have come to enjoy working with markdown a lot more.
It feels closer to the raw content of the text for me.</p>
</li>
<li>
<p><strong>Vim</strong>:
I started learning Vim and need any practice I can get.
Writing posts is a different kind of text editing than programming,
so I want to be able to write posts directly in Vim.</p>
</li>
<li>
<p><strong>Longevity of content</strong>:
The posts I am writing here are an insightful track record that I want to keep and extend
as long as possible in one form or another.
That means that the text should be stored in its rawest form possible without losing value.
If WordPress, the web, HTML and UTF-8 all die I want to be able to convert and recover my work into the next representation.</p>
</li>
<li>
<p><strong>Version control</strong>:
I love Git. I use it for every line of code that I keep for longer than a day,
all university assignments (code and non-code) and all lecture notes I take in digital form.
I <em>need</em> it for my writings too, period.
WordPress puts my post into a MySQL database which is not very compatible with Git.</p>
</li>
</ul>
<p>So these are issues I have with my current blog setup.
I played around with the static site generator Jekyll for a bit and I think it can fix most of these issues.
I hope to convert everything to a Jekyll blog this month,
so let’s see how that goes.</p>
<h2 id="other-cool-things-that-happened">Other cool things that happened</h2>
<ul>
<li>
<p>This blog got mentioned in a GitHub article on getting started in open source<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup>.
They linked my article on my Babel PR I mentioned above,
so this article was not only the most important for me but also the one that got the most traffic this year.</p>
</li>
<li>
<p>I got feedback on how to improve one of my bash scripts which was pretty cool<sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>.</p>
</li>
<li>
<p>I got an interesting response on my post on Euclid’s proof of infinitely many primes that ended in disagreement<sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup>.</p>
</li>
</ul>
<h2 id="2018-year-two">2018, year two!</h2>
<p>This is the paragraph that I will reflect against on January 15th of 2019.
I’d like to be more involved in open source and my writing to be a platform for things learned and experienced through this world.
Contribution to a non-JavaScript project with an associated post would be a cool achievement.
In general, writing posts about my open source activities and studies
should become a habit and not require much more time than taking notes along the way.
And as a last point, I want to make a clearer distinction between posts on personal experience
and technical articles explaining things.</p>
<p>So let’s 2018!</p>
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p><a href="https://maurobringolf.ch/2017/01/why-i-started-this-blog-and-how-i-built-my-first-website/">https://maurobringolf.ch/2017/01/why-i-started-this-blog-and-how-i-built-my-first-website/</a> <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p><a href="https://maurobringolf.ch/2017/07/open-source-9-steps-to-my-first-feature-contribution-in-babel/">https://maurobringolf.ch/2017/07/open-source-9-steps-to-my-first-feature-contribution-in-babel/</a> <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p><a href="https://wordpress.org/plugins/mnml-footnotes/">https://wordpress.org/plugins/mnml-footnotes/</a> <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p><a href="https://github.com/collections/choosing-projects">https://github.com/collections/choosing-projects</a> <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p><a href="https://dev.to/maurobringolf/bash-exercise-delete-all-local-git-branches-except-current-one-9a6">https://dev.to/maurobringolf/bash-exercise-delete-all-local-git-branches-except-current-one-9a6</a> <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p><a href="https://erismathsview.com/2017/06/29/response-to-a-developer-about-euclids-proof/">https://erismathsview.com/2017/06/29/response-to-a-developer-about-euclids-proof/</a> <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Learning sparse matrix storage formats by building a React app2018-01-09T00:00:00+00:002018-01-09T00:00:00+00:00/2018/01/learning-sparse-matrix-storage-formats-by-building-a-react-app<p>The title pretty much says it:
I learned about storage formats for sparse matrices by building a small app to play around with them.
You can enter numbers into a matrix and see the corresponding representation
as dense array, triplet format, compressed row storage and compressed column storage
(hopefully more in the future).
If you don’t know what these words mean, here is the place to find out:</p>
<p><strong>App:</strong> <a href="https://maurobringolf.github.io/sparse-matrix-storage-formats">maurobringolf.github.io/sparse-matrix-storage-formats</a></p>
<p><strong>Repository:</strong> <a href="https://github.com/maurobringolf/sparse-matrix-storage-formats">github.com/maurobringolf/sparse-matrix-storage-formats</a></p>
<p>It took me one day to build this mainly because of some awesome tools.
I really enjoyed the development process with create-react-app<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>
and testing with Jest.
A couple of remarks:</p>
<ul>
<li>
<p>I spent the majority of time writing, fixing and refactoring tests.
The code is easy to test because each format is essentially a mathematical function that needs to be implemented.
But I changed the underlying data representation of the output halfway through,
so I had to change all existing tests up to that point.
Defining all functions and data representations in the beginning could possibly have prevented this,
but it was not too bad after all.</p>
</li>
<li>
<p>I spent zero time thinking about Babel or Webpack.</p>
</li>
<li>
<p>I spent zero time formatting code because I used Prettier<sup id="fnref:3"><a href="#fn:3" class="footnote">2</a></sup> for all source files including CSS and Markdown.</p>
</li>
<li>
<p>Deployment to GitHub Pages literally took no longer than 2 minutes to set up.
Now I simply type <code class="highlighter-rouge">npm version patch|minor|major</code> and the code is versioned,
tested, built and deployed.
I simply followed <a href="https://github.com/facebookincubator/create-react-app/blob/master/packages/react-scripts/template/README.md#github-pages">this guide</a>.</p>
</li>
</ul>
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p><a href="https://github.com/facebookincubator/create-react-app">https://github.com/facebookincubator/create-react-app</a> <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p><a href="https://github.com/prettier/prettier">https://github.com/prettier/prettier</a> <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>JS Numbers: There are no integers, but how many?2018-01-03T00:00:00+00:002018-01-03T00:00:00+00:00/2018/01/js-numbers-there-are-no-integers-but-how-many<p>This is the first post of what I hope will become a series about numbers in JavaScript.
It is supposed to be a summary of my findings for the open source project <strong>xtuc/js-webassembly-interpreter</strong><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>.
JavaScript is dynamically typed which means variables are not bound to contain only values of one type.
This is in contrast to statically typed languages like C, C++ or Java.
One consequence is that JavaScript does not distinguish between different number types such as <code class="highlighter-rouge">int</code>, <code class="highlighter-rouge">double</code> or <code class="highlighter-rouge">long</code>.
Everything is abstracted as a <code class="highlighter-rouge">number</code>:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// So simple, I love it</span>
<span class="k">typeof</span> <span class="mi">7</span> <span class="c1">// "number"</span>
<span class="k">typeof</span> <span class="p">.</span><span class="mi">3</span><span class="nx">e</span><span class="o">-</span><span class="mi">10</span> <span class="c1">// "number"</span>
<span class="k">typeof</span> <span class="mh">0x234</span> <span class="c1">// "number"</span>
<span class="c1">// Okay I can live with this</span>
<span class="k">typeof</span> <span class="kc">Infinity</span> <span class="c1">// "number"</span>
<span class="c1">// Now wtf is this</span>
<span class="k">typeof</span> <span class="kc">NaN</span> <span class="c1">// "number"</span>
</code></pre></div></div>
<p>This post is not about the <code class="highlighter-rouge">typeof</code> operator though.
I think it makes perfect sense that “Not-a-Number” is a number and we might see why in a later post.
The takeaway here is that all these values are of the same primitive type.</p>
<p>To summarize:
<strong>In JavaScript, all numerical values are of the same type called “number”.</strong>
I am trying to explain quite a few things in this article from the ground up.
Depending on what you already know you might want to skip some parts:</p>
<ul>
<li><a href="#integers-and-floats-in-bits">Integers and floats in bits</a></li>
<li><a href="#js-integers-are-floats">JavaScript integers are floating point numbers</a></li>
<li><a href="#floating-point-representation">How are floating point numbers represented?</a></li>
<li><a href="#integers-in-floating-point">What integers exist in 64-bit floating point?</a></li>
<li><a href="#not-all-is-lost">Not all is lost</a></li>
</ul>
<h2 id="integers-and-floats-in-bits">Integers and floats in bits</h2>
<p>As we know, JavaScript and other programming languages are high-level abstractions for bits changing inside computers.
At some point in the chain of abstractions,
a number has to become a sequence of bits.
One method that you might know is the binary representation of natural numbers.
Here is how a sequence of bits can be interpreted as a natural number with that method:</p>
<script type="math/tex; mode=display">1011_2 = 1 * 8 + 0 * 4 + 1 * 2 + 1 * 1 = 8 + 2 + 1 = 11_{10}</script>
<p>I used subscripts to indicate that the result is to be read as the number eleven (base 10),
but the input as a binary string.
As you can see, each bit can add a power of two corresponding to its position in the bit string.
If you find this confusing, think about how we write down numbers and determine their values.
You will see that you do this every day but with powers of ten instead of two.</p>
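<p>As a minimal sketch of the interpretation above, the following snippet sums powers of two by hand and compares the result with JavaScript's built-ins.
The helper name <code class="highlighter-rouge">bitsToNatural</code> is my own and only serves as illustration:</p>

```javascript
// Interpret a string of "0"/"1" characters as a natural number.
// Reading left to right, each step doubles the running value
// (shifting all previous bits one position up) and adds the new bit.
function bitsToNatural(bits) {
  let value = 0;
  for (const bit of bits) {
    value = value * 2 + (bit === "1" ? 1 : 0);
  }
  return value;
}

const eleven = bitsToNatural("1011");       // 11
const sameViaBuiltin = parseInt("1011", 2); // 11, built-in base-2 parsing
const backToBits = (11).toString(2);        // "1011"
```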
<p>The representation above is simple and unique for all natural numbers.
That is great, but we might want negative numbers too.
The simplest solution would be to store one additional bit and let it decide about the sign of the value.
That is called sign-magnitude representation and can be done,
but it turns out that it makes hardware operating on those bit strings unnecessarily complicated.
That is why a different representation called two’s complement is used.
If you’re interested in how that works,
I have written about it before<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup> but its details are not necessary to understand integers in JS.</p>
<p>Assume we have a way to represent all integers, positive and negative.
That is still not good enough, because the real world deals with real numbers (not really, if you’re into math).
We definitely want to have values like <code class="highlighter-rouge">94.53471</code> to work with data in any meaningful way.
These numbers are called <em>floating point</em> numbers.
Two’s complement cannot handle these, but there is a widely accepted standard called IEEE 754
which describes how to represent floating point numbers with bit strings.</p>
<p>To summarize:
<strong>A sequence of bits is not a number.
Different ways of interpreting the same bit string can yield different values.
The rules of interpretation are defined by the type associated with the value.
Conversely, the same value can be represented by a different bit string depending on its type.</strong></p>
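<p>One way to see this in JavaScript itself is to let two typed arrays share the same four bytes.
This is only a sketch and uses 32-bit types to keep the bit strings short; the variable names are my own:</p>

```javascript
// Two views over the same 4 bytes of memory.
const buffer = new ArrayBuffer(4);
const asFloat = new Float32Array(buffer); // interprets the bits as a 32-bit float
const asUint = new Uint32Array(buffer);   // interprets the bits as an unsigned integer

// Same value 1, written as a float: its bit string is not 000...001.
asFloat[0] = 1;
const bitsOfFloatOne = asUint[0].toString(2).padStart(32, "0");
// "00111111100000000000000000000000"

// Same bit string 000...001, read as a float: not the value 1.
asUint[0] = 1;
const floatOfBitOne = asFloat[0]; // 2 ** -149, a tiny positive number
```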
<h2 id="js-integers-are-floats">JavaScript integers are floating point numbers</h2>
<p>If you put together the two main points from above you might see where this is going:
JavaScript uses the same representation to encode all numerical values.
Here is an excerpt from the ECMAScript specification<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>:</p>
<blockquote>
<p>“The Number type has exactly <script type="math/tex">18437736874454810627</script> (that is, <script type="math/tex">2^{64}-2^{53}+3</script>) values,
representing the double-precision 64-bit format IEEE 754-2008 values
as specified in the IEEE Standard for Binary Floating-Point Arithmetic.”</p>
</blockquote>
<p>Your reaction to this statement depends mainly on your background:
If you come from a place where bits and bytes are too close to the metal to be of interest,
then you might consider this fact perfectly reasonable.
Floating point can do everything we need,
so why not have a nice abstraction for it and never worry about tricky conversion rules between number types?
However, if you program in C or other languages that sit just above the assembly level
and deal more directly with memory, you probably think that this is just crazy.
I personally do not think that this design is particularly good or bad,
but I want to understand its consequences.</p>
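<p>The number in the quote can be checked with a bit of counting.
The sketch below assumes an engine that supports BigInt literals, and it relies on one fact about IEEE 754 that this post does not cover: a bit string whose exponent bits are all ones and whose mantissa bits are not all zero encodes NaN, and JavaScript treats all of those bit strings as one single NaN value.</p>

```javascript
// All possible 64-bit strings.
const bitPatterns = 2n ** 64n;

// NaN encodings (assumption stated above): exponent bits all ones,
// any non-zero 52-bit mantissa, either sign bit.
const nanPatterns = 2n * (2n ** 52n - 1n); // 2^53 - 2

// All NaN bit strings collapse into one value, so remove them and add one back.
const distinctValues = bitPatterns - nanPatterns + 1n;

const matchesSpec = distinctValues === 18437736874454810627n; // true
```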
<p>Of course, this does not mean that you cannot use integers in JavaScript.
It just means that they are represented using the floating point standard.
A natural question now is:</p>
<p><em>What range of integers can be encoded using 64-bit floating point representation?</em></p>
<p>This range is clearly not empty since there are integer values in JavaScript.
But it is also clear that there are fewer integers than with a 64-bit two’s complement representation
which is used for large integers in C for example.
One bit string can only represent one number after all.
So the fact that <code class="highlighter-rouge">0.5</code> exists in JS means that there is at least one 64-bit integer that is missing in JS.
Don’t think about this too hard,
we will count all quantities properly in the following sections.</p>
<h2 id="floating-point-representation">How are floating point numbers represented?</h2>
<p>In order to understand what integers exist in floating point,
we need a clear understanding of how floating point numbers are encoded.
Once we know the meaning of each of the 64 bits,
we can look at which of these combinations represent integers.
A floating point number is represented by three parts:</p>
<ul>
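<li>A <strong>sign</strong> bit <script type="math/tex">s</script> indicating whether the number is positive or negative.</li>
<li>A number <script type="math/tex">m</script> between <script type="math/tex">1</script> and <script type="math/tex">2</script> called <strong>mantissa</strong> or <strong>significand</strong>.</li>
<li>An integer <script type="math/tex">e</script> called the <strong>exponent</strong>.</li>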
</ul>
<p>The idea behind these numbers is easy to explain:
The mantissa <script type="math/tex">m</script> is some decimal number, e.g. <script type="math/tex">1.625</script>, and the exponent <script type="math/tex">e</script>
moves the decimal point to the left or right in that number.
The sign bit can be used to add a minus sign in front of it.
The advantage of this representation is that it allows us to go from very small to very large values without losing precision.
If you want to dig further into this,
I suggest you compare this representation to fixed point arithmetic<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup>.</p>
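<p>Put as a formula, the three parts combine to <script type="math/tex">(-1)^s \cdot m \cdot 2^e</script>.
Here is a minimal sketch of that, assuming the exponent scales by powers of two as it does in IEEE 754.
The function name <code class="highlighter-rouge">valueFromParts</code> is hypothetical:</p>

```javascript
// Combine sign bit s (0 or 1), mantissa m (1 <= m < 2) and integer exponent e
// into the value they describe: (-1)^s * m * 2^e.
function valueFromParts(s, m, e) {
  const sign = s === 1 ? -1 : 1;
  return sign * m * 2 ** e;
}

const thirteen = valueFromParts(0, 1.625, 3);       // 13
const minusThreeQuarters = valueFromParts(1, 1.5, -1); // -0.75
```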
<p>Now that we understand the idea it is time to look at it bit for bit.
Above I cheated by saying how we interpret the three numbers <script type="math/tex">s,m,e</script> but not how they are encoded in bits.
According to the standard,
one bit is used for <script type="math/tex">s</script>, <script type="math/tex">11</script> bits are for <script type="math/tex">e</script> and <script type="math/tex">52</script> bits for <script type="math/tex">m</script> which adds up to the total 64 bits we want.
The sign bit is the easiest, <script type="math/tex">0</script> means positive and <script type="math/tex">1</script> means negative.
The exponent is an integer so we can use 11-bit two’s complement to encode it as a bit string<sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup>.
The mantissa is between <script type="math/tex">1</script> and <script type="math/tex">2</script>,
so we use a simple trick to store it:
Ignore the leading <script type="math/tex">1.</script> part and just store the rest as a binary fraction.
A binary fraction works just like a decimal fraction:</p>
<script type="math/tex; mode=display">\begin{align*}
0.625_{10} = \frac{6}{10} + \frac{2}{100} + \frac{5}{1000} = \frac{625}{1000} = \frac{5}{8} \\
0.101_2 = \frac{1}{2} + \frac{0}{4} + \frac{1}{8} = \frac{5}{8}
\end{align*}</script>
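<p>The second line of that calculation can be sketched in code as follows.
The helper <code class="highlighter-rouge">binaryFractionToNumber</code> is my own name and takes only the digits after the point:</p>

```javascript
// Interpret the digits after the binary point: the i-th digit
// (counting from 1) contributes 1 / 2^i if it is set.
function binaryFractionToNumber(digits) {
  let value = 0;
  for (let i = 0; i < digits.length; i++) {
    if (digits[i] === "1") {
      value += 2 ** -(i + 1);
    }
  }
  return value;
}

const fiveEighths = binaryFractionToNumber("101"); // 1/2 + 0/4 + 1/8 = 0.625
```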
<p>Again, note how a sequence of bits is not a number but can be interpreted as different numbers.
The exponent moves the point in the binary fraction, not in the decimal one, and that makes a difference.
So if we have the mantissa as above and exponent <script type="math/tex">3</script>,
the represented value is <script type="math/tex">1101.0_2 = 13_{10}</script>.
Positive exponents shift the point to the right and negative to the left.
In case you are wondering how the value <script type="math/tex">0.625</script> relates to 13:
With the implicit leading <script type="math/tex">1</script> this value represents the mantissa <script type="math/tex">1 + 0.625 = \frac{13}{8}</script>.
Shifting the point 3 places to the right is the same as multiplying by <script type="math/tex">2^3</script>,
yielding a result of <script type="math/tex">\frac{13}{8} * 8 = 13</script>.</p>
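<p>To double-check the example, we can ask JavaScript for the actual 64 bits of a number and take them apart.
This is a sketch under a few assumptions: it uses the real biased exponent encoding from footnote 5 (a bias of 1023) rather than two’s complement, it only handles ordinary non-zero values and ignores special cases such as zero, infinity and NaN, and the name <code class="highlighter-rouge">decodeDouble</code> is hypothetical:</p>

```javascript
// Split a number into sign bit, exponent and mantissa by reading its raw bits.
function decodeDouble(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);          // DataView writes big-endian by default
  const hi = view.getUint32(0);   // sign, exponent and top 20 mantissa bits
  const lo = view.getUint32(4);   // lower 32 mantissa bits

  const sign = hi >>> 31;
  const biasedExponent = (hi >>> 20) & 0x7ff;
  // The 52 stored mantissa bits as an integer (fits exactly in a double).
  const fraction = (hi & 0xfffff) * 2 ** 32 + lo;

  return {
    sign,
    exponent: biasedExponent - 1023,    // undo the bias
    mantissa: 1 + fraction / 2 ** 52,   // restore the implicit leading 1
  };
}

const parts = decodeDouble(13); // { sign: 0, exponent: 3, mantissa: 1.625 }
```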
<h2 id="integers-in-floating-point">What integers exist in 64-bit floating point?</h2>
<p>So we found the representation of the integer <script type="math/tex">13</script>.
What is the condition for the result to become an integer?
Any binary digit after the fractional point has to be zero.
As we have seen, the binary representation of the mantissa always starts with <script type="math/tex">1.</script> followed by <script type="math/tex">52</script> digits.
So if the binary representation of a natural number <script type="math/tex">1b_2b_1b_0</script> is of length <script type="math/tex">4</script>
we can represent that number in floating point by setting the mantissa bits to <script type="math/tex">b_2b_1b_0 0 \dots 0</script>
and the exponent to be <script type="math/tex">3</script>.
This is exactly what we did for <script type="math/tex">13</script> above.
The exact same procedure works as long as we manage to shift all digits to the left of the binary point,
so this works for natural numbers whose binary representations are up to <script type="math/tex">53</script> digits long.</p>
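<p>The procedure can be sketched for any natural number of up to 53 binary digits.
The function below is a hypothetical helper that returns the exponent and the 52 stored mantissa bits as a string; it does not touch the real memory representation:</p>

```javascript
// For a natural number n with binary digits 1 b_k ... b_0,
// the exponent is k + 1 (the number of digits after the leading 1)
// and the stored mantissa bits are b_k ... b_0 padded with zeros.
function integerToFloatParts(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("expected a natural number >= 1");
  }
  const bits = n.toString(2);
  if (bits.length > 53) {
    throw new RangeError("more than 53 binary digits");
  }
  return {
    exponent: bits.length - 1,
    mantissaBits: bits.slice(1).padEnd(52, "0"), // drop the implicit leading 1
  };
}

const partsOfThirteen = integerToFloatParts(13);
// exponent 3, mantissaBits "101" followed by 49 zeros
```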
<p>The maximum integer in 53-bit binary is <script type="math/tex">2^{53} - 1</script>
so with the method above including the sign bit we can represent all integers <script type="math/tex">- (2^{53} - 1), \dots, -1, 0, 1, \dots, 2^{53} -1</script>.
<strong>This includes the range of 32-bit signed integers</strong> which covers <script type="math/tex">-2^{31}, \dots, -1, 0, 1, \dots, 2^{31} - 1</script>.
This is good news: the range is not smaller than working with <code class="highlighter-rouge">int</code>s in C for example.
If you have a feeling for exponentials or powers of two,
you can see that 64-bit floating point in fact has a lot more integers than 32-bit integers.
But then it also has a considerably smaller range of integers than 64-bit integer representation<sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup>.
At the time of writing, there exists a stage 3 proposal to add arbitrary-precision integers (BigInt) to JavaScript<sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup>.</p>
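<p>JavaScript exposes the boundary of this range directly, so the claims above are easy to check.
The snippet is a small sketch using the standard <code class="highlighter-rouge">Number.MAX_SAFE_INTEGER</code> and <code class="highlighter-rouge">Number.isSafeInteger</code>; the variable names are my own:</p>

```javascript
const maxSafe = Number.MAX_SAFE_INTEGER;

const isTwoTo53MinusOne = maxSafe === 2 ** 53 - 1;  // true
const binaryLength = maxSafe.toString(2).length;    // 53 digits, all ones

// The whole 32-bit signed range fits comfortably.
const int32Fits =
  Number.isSafeInteger(-(2 ** 31)) && Number.isSafeInteger(2 ** 31 - 1); // true

// Past the range there are holes: 2^53 + 1 has no representation of its own
// and ends up as the same value as 2^53.
const holeAbove = 2 ** 53 + 1 === 2 ** 53;          // true
const stillAnInteger = Number.isInteger(2 ** 100);  // true, see footnote 6
```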
<h2 id="not-all-is-lost">Not all is lost</h2>
<p>To finish the first article of this series,
I would like to mention that everything said so far is about what the language specification says.
JavaScript engines are free to use whatever representation of numbers they want to,
as long as they make it look like they use what is defined in the specification.
Actually, if you read the quote about the number type from the specification carefully
it never says what representation an engine should use.
It only says that its values and behavior must be as if they were IEEE 754-2008 64-bit floating point numbers.
To make the final statement as paradoxical as the title:
Even though JS does not have an integer type, an engine might still use integer types to represent numbers.</p>
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p><a href="https://github.com/xtuc/js-webassembly-interpreter">https://github.com/xtuc/js-webassembly-interpreter</a> <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p><a href="https://maurobringolf.ch/2017/05/a-formal-approach-to-twos-complement-binary-representation-of-integers/">https://maurobringolf.ch/2017/05/a-formal-approach-to-twos-complement-binary-representation-of-integers/</a> <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p><a href="https://www.ecma-international.org/ecma-262/8.0/index.html#sec-ecmascript-language-types-number-type">https://www.ecma-international.org/ecma-262/8.0/index.html#sec-ecmascript-language-types-number-type</a> <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p><a href="https://en.wikipedia.org/wiki/Fixed-point_arithmetic">https://en.wikipedia.org/wiki/Fixed-point_arithmetic</a> <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>In reality a mechanism called <a href="https://en.wikipedia.org/wiki/IEEE_754#Representation_and_encoding_in_memory">bias</a> is used instead of two’s complement. It does not matter for understanding the format though. <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>I have not shown all integers here. For example <script type="math/tex">2^{100}</script> can be represented by setting the mantissa to <script type="math/tex">1</script> and the exponent to <script type="math/tex">100</script>. But there are “holes” once you go past the range described, so these values are less useful to work with. <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p><a href="https://github.com/tc39/proposal-bigint">https://github.com/tc39/proposal-bigint</a> <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>