Performance optimization is a topic I do not have a lot of experience with yet, but last week I encountered a couple of insightful examples that I would like to highlight. These examples show how an optimization can look obvious to the programmer yet be impossible for the compiler. When we write code in high-level languages (read: not assembly), the compiler or interpreter goes to great lengths to make our code run fast. However, sometimes “obvious” optimizations are left out. This is not because compilers are not smart enough, but because the optimizations are only obvious to the programmer, who makes additional assumptions about the program. And it turns out that these assumptions are neither visible to nor verifiable by the compiler.

Removing redundant function calls

#include <string.h>

void some_func(char *s) {
	int i;
	for (i = 0; i < strlen(s); i++) {
		// Do something
	}
}

This is a classic performance mistake. The function takes a string argument and loops over it. The function strlen returns the length of the string, giving us the correct bound for the loop. But the semantics of the for-loop require the bounds check to be evaluated before each iteration. Therefore this code asks for the function strlen to be called during every single one of these checks. You might think that the compiler is blind if it does not see and remove this redundancy by computing the length only once before the loop, like this:

void some_func(char *s) {
	int i;
	int length = strlen(s);
	for (i = 0; i < length; i++) {
		// Do something
	}
}

Here comes the twist: These two programs are not necessarily equivalent and it is impossible for the compiler to check if they are. Here are some reasonable assumptions that you might make when writing this code:

  1. strlen does not modify its argument.
  2. strlen will always return the same value for the same string.
  3. strlen does not have any side effects.
  4. The code inside one loop iteration does not change the length of the string s.

Assumptions 1-3 are definitely reasonable if not obvious, right? After all, the function simply counts the number of characters in the string and returns this number. And number four is even more obvious, since we are writing this code ourselves! But for the compiler these things are not obvious and, even worse, not verifiable. The compiler has to adhere to the semantics defined by the language specification in all possible cases. Since it is impossible to check the assumptions above through static source code analysis alone and the language specification allows violating any of them, the compiler cannot perform the optimization shown above. It cannot prove that the resulting program is equivalent to the original one, because there are cases where it is not, as the sketch below illustrates.
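
Here is a hypothetical loop body of my own invention (strip_comment is not part of the examples above) that violates assumption 4 by shortening the string while iterating over it:

#include <string.h>

// Truncate the string at the first '#', treating the rest as a comment.
void strip_comment(char *s) {
	int i;
	for (i = 0; i < strlen(s); i++) {
		if (s[i] == '#')
			s[i] = '\0';	// shortens s, so strlen(s) changes mid-loop
	}
}

With the bound re-evaluated on every iteration, the loop stops right after the truncation. With a hoisted length it would keep walking over the characters behind the new terminator, so the two versions really do behave differently here.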

But we as programmers know the loop code and understand that strlen does not violate any of the given assumptions. Therefore we can and must perform the optimization ourselves; we cannot rely on the compiler to do it for us, even though it looks obvious.

Pointer aliasing

Here is another example where you probably make assumptions that are not generally true:

void f( int *x, int *y ) {
	*x += *y;
	*x += *y;
}

It looks like this function takes two pointers to integers and adds twice the value at y to the value at x. We certainly do not need two operations to perform this calculation and can expect the compiler to optimize it, right? Maybe like this:

void f( int *x, int *y ) {
	*x += 2 * (*y);
}

Plot twist: These two implementations of f do not compute the same function! Consider the case where the two pointers reference the same memory location (x == y). The original implementation quadruples the value at x (each += doubles it), while the “optimized” implementation only triples it. Again, if we know that this function is only going to be called with distinct memory locations, we can and must perform the optimization ourselves.
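
C actually gives us a way to encode this assumption directly in the code. The following sketch uses the restrict qualifier from C99: it promises the compiler that x and y never point to the same object, which makes the optimization legal for the compiler to perform on its own.

// restrict tells the compiler that x and y do not alias, so it may
// combine the two additions itself.
void f( int * restrict x, int * restrict y ) {
	*x += *y;
	*x += *y;
}

Of course, if f is then ever called with x == y, the behavior is undefined, so the qualifier only belongs where the assumption really holds.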

jQuery selector caching

A few days after I encountered these examples in a lecture, I realized that something very similar happens in JavaScript code that uses jQuery. jQuery is a JavaScript library that provides an abstraction layer for interacting with the document object model (DOM) in browsers. Or in simpler terms: it provides simple functions to interact with HTML from a JS script. Let’s say we use a jQuery class selector to grab the value of some input field and then later write back to it:

// At some point in time
var value = jQuery('.my-input-field').val()

// At another point in time
jQuery('.my-input-field').val('new value')

Do you see the problem? The semantics of jQuery selectors require all matching elements to be considered. This set might have changed between the two calls, for example because another script added or removed elements with that class in the meantime, so jQuery has to search the DOM twice in order to be correct! Therefore it cannot cache the query result for us, but we can do it ourselves if we know the set of matching elements remains the same:

// At some point in time
var inputField = jQuery('.my-input-field')
var value = inputField.val()

// At another point in time
inputField.val('new value')

The fundamental problem I am trying to highlight is the same: assumptions that are obvious to the programmer might be difficult or impossible to verify automatically. In some cases we can lose performance if these assumptions are not properly encoded into our program.