First impressions of peephole optimizations in LLVM
Compilers continue to fascinate me and I try to take every chance to get a glimpse inside their world. For a lot of things I still feel like I lack basic concepts and theory, but every now and then I read something and understand it well enough to be impressed.
Today I read about formally verified compiler optimizations^{1} and want to take notes on some things that I learned or impressed me. The authors have implemented a tool called Alive^{2} to formally check machine instruction level optimizations employed in LLVM^{3}. Broadly speaking, an optimization at this level replaces some instructions like additions and multiplications with more performant instructions that compute the same value. Their tool converts the replacement into a set of logical formulas that an existing tool can check. If an optimization is invalid the tool will directly produce a counter example which I think is pretty neat. Not only is it fascinating but this tool also helped find, fix and prevent various bugs in LLVM. It is really encouraging to see research results having such a direct practical impact.
In a sense, this post is the counterpart of an older one^{4} where I explored some things the compiler cannot do for you. In these cases, the compiler can perform optimizations for you. Here are some of my notes I took while reading the article:

(x ^ -1) + C == (C - 1) - x
: Why is that? It is not explained in the paper, but I explained it to myself in the following way. Since -1 is the bitvector of all 1s, the result of x ^ -1 is exactly the flipped version of x, call it y. The two's complement inverses of x and y are related through the basic result -x == y + 1^{5}. Therefore (x ^ -1) + C == y + C == (-x - 1) + C == (C - 1) - x. Okay, but what did we win? C - 1 is a compile-time value and via constant folding the result will be inlined. The resulting code is a single subtraction instruction, compared to the source code which consists of two instructions.
a & (a - 1) == 0
: When translating code transformations into formulas the tool also considers facts about inputs that some other part of LLVM computes. One of them is a boolean indicating whether an input is a power of two. The way they encode the fact that a is a power of two is a & (a - 1) == 0. The forward direction was easier to understand for me. Since a - 1 is a sequence of 1s in binary^{5} when a = 2^k, and a itself is a 1 followed by k zeros, their bitwise & is always zero. My rationale for the other direction is a bit less formal, but here is what I came up with: If the formula for a is true, then adding 1 to (a - 1) must turn all its ones into zeros. This is only possible if a - 1 is a sequence of ones without any zeros in it. Therefore a is a power of two.
Bit-level counterexamples seem to be most intuitive with values between 4 and 8 bits
: Alive produces counterexamples for incorrect transformations, which means that it prints bitvectors for inputs and results. From their experience, counterexamples of length 4 to 8 bits are the easiest to understand, even though shorter ones of length 1 or 2 might exist. I thought this was an interesting observation and something to consider when coming up with a bit-level counterexample next time.
Studying this article made me think about DSLs (domain-specific languages) a bit. First, their power really shows here: I know almost nothing about LLVM and definitely nothing about its code, but I am still able to understand the optimizations written in the Alive DSL. Second, this really encouraged me to study more theory, because results applied with a direct impact like this are amazing.

The article appeared in CACM and is a write-up of this paper: http://www.cs.utah.edu/~regehr/papers/pldi15.pdf. ↩

/2017/11/simple-examples-of-obvious-optimizations-a-compiler-cannot-perform/ ↩

/2017/05/a-formal-approach-to-twos-complement-binary-representation-of-integers/ ↩ ↩^{2}