Kompilatorer och interpretatorer: Lecture 13

Note: This is an outline of what I intend to say in the lecture. It does not define the course content, and it does not replace the textbook.

Today: Optimization.
ASU chapter 10. (KP chapter 6.)

10. Code Optimization

Optimization: faster, smaller. (Not "optimal", just "better".)

"Hand optimization", example:

  for (i = 0; i < n; ++i) {
    do_something(a[i]);
  }
Equivalent to:
  i = 0;
  while (i < n) {
    do_something(a[i]);
    ++i;
  }
Can be "optimized" to:
  p = &a[0];
  p_after = &a[n];
  while (p != p_after) {
    do_something(*p);
    ++p;
  }
Probably no effect, or even slower. Better handled by the compiler!

Two rules about optimization by hand:

  1. Don't do it. (Usually not needed. If it is needed, leave it to the compiler.)
  2. Don't do it yet. ("90/10 rule". Profile first!)

Types of optimization:

  1. Algorithms and data structures. (Ex: Change sorting algorithm, or replace a linked list with a hash table.) Best gains (years to seconds)! Hard for the compiler. But: SQL!
  2. "Low-level" optimization. (As above.) Better done by the compiler!
  3. Machine-dependent optimizations. (Register allocation, instruction choice, as in chapter 9. Instruction reordering to improve pipe-lining, etc.)

Peep-hole optimization (ASU 9.9): Simple transformations of the generated assembly (or machine) code. Ex:

MOV R0, a
MOV a, R0
can be changed to
MOV R0, a
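A peephole pass like this can be sketched as a single scan over the instruction list. The struct layout and function name below are my own toy representation, not ASU's; a real pass must also check that the second instruction carries no label (a jump target must not be deleted).

```c
#include <assert.h>
#include <string.h>

/* Toy instruction: MOV dst, src */
struct ins { char dst[8]; char src[8]; };

/* Deletes the second MOV of each inverse pair (MOV R0, a followed by
   MOV a, R0) in place; returns the new instruction count. */
int peephole(struct ins *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (out > 0 &&
            strcmp(code[out - 1].dst, code[i].src) == 0 &&
            strcmp(code[out - 1].src, code[i].dst) == 0)
            continue;               /* redundant: the value is already there */
        code[out++] = code[i];
    }
    return out;
}
```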

10.1 Introduction

A quicksort function (adapted from ASU fig 10.2):
void quicksort(int m, int n) {
  int i, j;
  int v, x;
  if (n <= m)
    return;
  i = m-1; j = n; v = a[n];
  while (1) {
    do
      i = i + 1;
    while (v > a[i]);
    do
      j = j - 1;
    while (a[j] > v);
    if (i >= j)
      break;
    x = a[i]; a[i] = a[j]; a[j] = x;
  }
  x = a[i]; a[i] = a[n]; a[n] = x;
  quicksort(m, j);
  quicksort(i+1, n);
}
Some optimizations are not possible on the source level.
Example in Pascal: a[i]
Three-address code: t1 = 4*i; t2 = a[t1];
A Pascal compiler (and a C programmer!) can replace some of the 4*i calculations.
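The factor 4 in t1 = 4*i is just the byte offset of a[i], which C's own pointer arithmetic computes implicitly. A small check (the helper name is mine; the factor is 4 only on targets where sizeof(int) == 4, which the assertion below avoids assuming):

```c
#include <assert.h>
#include <stdint.h>

int a[10];

/* Byte offset of a[i] from the start of the array. */
long offset_of(int i) {
    return (long)((uintptr_t)&a[i] - (uintptr_t)&a[0]);
}
```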

ASU fig. 10.4, three-address code for a part of the quicksort function:

30 three-address statements

Steps for the optimizer:

  1. Control flow analysis: basic blocks
  2. Data flow analysis.
  3. Transformations.

ASU fig. 10.5, basic blocks and flow graph for the quicksort function:

Six basic blocks in a flow graph

Three loops: B2 by itself, B3 by itself, and the outer loop B2-B5.

10.2 The principal sources of optimization

"Some of the most useful code-improving transformations".
Local transformation = inside a single basic block
Global transformation = several blocks (but inside a single procedure)

Function-preserving transformations (ASU p. 592)

From ASU fig. 10.6, eliminating common subexpressions inside a basic block:

B5:

t6 := 4*i
x := a[t6]
t7 := 4*i
t8 := 4*j
t9 := a[t8]
a[t7] := t9
t10 := 4*j
a[t10] := x
goto B2
can be changed to
t6 := 4*i
x := a[t6]
t8 := 4*j
t9 := a[t8]
a[t6] := t9
a[t8] := x
goto B2

(Remove repeat calculations, use t6 instead of t7, t8 instead of t10.)
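The local transformation above can be sketched as one pass over the quads of a block. The representation is my own, and the sketch assumes no variable is assigned twice in the block, so a computed expression stays available:

```c
#include <assert.h>
#include <string.h>

struct quad { char dst[8], op[8], a1[8], a2[8]; };   /* dst := a1 op a2 */

/* Toy local CSE: drops a quad whose expression was already computed and
   renames later uses of its dst. Returns the new number of quads. */
int cse(struct quad *q, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        int dup = -1;
        for (int k = 0; k < out; k++)
            if (!strcmp(q[k].op, q[i].op) &&
                !strcmp(q[k].a1, q[i].a1) &&
                !strcmp(q[k].a2, q[i].a2)) { dup = k; break; }
        if (dup >= 0) {
            /* drop this quad; later uses read the earlier result instead */
            for (int k = i + 1; k < n; k++) {
                if (!strcmp(q[k].a1, q[i].dst)) strcpy(q[k].a1, q[dup].dst);
                if (!strcmp(q[k].a2, q[i].dst)) strcpy(q[k].a2, q[dup].dst);
            }
        } else {
            q[out++] = q[i];
        }
    }
    return out;
}
```

On the B5 example this turns t7 := 4*i into a renaming of t7 to t6, exactly as above.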

Removing (non-local) common subexpressions (ASU p. 592-594)

Globally, an expression E is a common subexpression if E was previously computed, and the values of the variables in E haven't changed since then.

From fig 10.5/10.6:
(We can remove 4*i, 4*j, and 4*n completely from B6!)
Remove 4*i and 4*j completely from B5!

B5:

t6 := 4*i
x := a[t6]
t8 := 4*j
t9 := a[t8]
a[t6] := t9
a[t8] := x
goto B2
can be changed to
x := a[t2]
t9 := a[t4]
a[t2] := t9
a[t4] := x
goto B2

x := a[t2]
t9 := a[t4]
a[t2] := t9
a[t4] := x
goto B2
can then be changed to
x := t3
t9 := a[t4]
a[t2] := t9
a[t4] := x
goto B2

Note: a hasn't changed, so a[t4] is still available in t5 from B3.

x := t3
t9 := a[t4]
a[t2] := t9
a[t4] := x
goto B2
can then be changed to
x := t3
a[t2] := t5
a[t4] := x
goto B2

ASU fig. 10.7, after eliminating (global) common subexpressions:

Basic blocks B5 and B6 have shrunk

Copy propagation (ASU p. 594-595)

Instead of:
copy = original; ... copy ...;
Always try to use:
copy = original; ... original ...;
(We may be able to eliminate the variable copy altogether!)

B5:

x := t3
a[t2] := t5
a[t4] := x
goto B2
can be changed to
x := t3
a[t2] := t5
a[t4] := t3
goto B2
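Copy propagation can be sketched the same way (my own toy representation, again assuming single assignment so neither variable changes between the copy and its uses):

```c
#include <assert.h>
#include <string.h>

struct quad { char dst[8], op[8], a1[8], a2[8]; };

/* Toy copy propagation: after a quad with op "copy" (dst := a1), later
   uses of dst are replaced by a1. */
void copyprop(struct quad *q, int n) {
    for (int i = 0; i < n; i++) {
        if (strcmp(q[i].op, "copy") != 0) continue;
        for (int k = i + 1; k < n; k++) {
            if (!strcmp(q[k].a1, q[i].dst)) strcpy(q[k].a1, q[i].a1);
            if (!strcmp(q[k].a2, q[i].dst)) strcpy(q[k].a2, q[i].a1);
        }
    }
}
```

By itself this changes nothing; the point is that it may make the copy's target dead, as the next transformation shows.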

Elimination of dead code (ASU p. 595)

Dead variables will never be used again.
Dead code (or useless code) computes values that will never be used.
Dead code can also mean code that can never be reached:
debug = 0; ... if (debug) printf(...);

B5:

x := t3
a[t2] := t5
a[t4] := t3
goto B2
but x is dead, so:
a[t2] := t5
a[t4] := t3
goto B2
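Finding dead assignments in a block is a backward liveness scan. A minimal sketch (plain assignments only; a store through a[] would always have to be kept, and the representation and live-out list are my own simplifications, assuming single assignment):

```c
#include <assert.h>
#include <string.h>

struct quad { char dst[8], a1[8], a2[8]; };   /* dst := a1 op a2 */

static int member(char set[][8], int n, const char *s) {
    for (int i = 0; i < n; i++)
        if (strcmp(set[i], s) == 0) return 1;
    return 0;
}

/* Keeps a quad only if its dst is needed later in the block or is live on
   exit. Returns the new number of quads. */
int dce(struct quad *q, int n, const char *live_out[], int nlive) {
    char live[64][8];
    int nl = 0, keep[64], out = 0;
    for (int i = 0; i < nlive; i++) strcpy(live[nl++], live_out[i]);
    for (int i = n - 1; i >= 0; i--) {
        keep[i] = member(live, nl, q[i].dst);
        if (keep[i]) {                /* its operands become live in turn */
            if (!member(live, nl, q[i].a1)) strcpy(live[nl++], q[i].a1);
            if (!member(live, nl, q[i].a2)) strcpy(live[nl++], q[i].a2);
        }
    }
    for (int i = 0; i < n; i++)
        if (keep[i]) q[out++] = q[i];
    return out;
}
```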

Loop optimizations (ASU pp. 596-598)

Examples of code motion

while (i < limit - 2)
  a[i] = x + y;
The expressions limit - 2 and x + y are loop-invariant.
t1 = limit - 2;
t2 = x + y;
while (i < t1)
  a[i] = t2;
The next example is harder for the compiler, since strlen is just another function. (Or is it?)
for (i = 0; i < strlen(s1); ++i)
  s2[i] = s1[i];
n = strlen(s1);
for (i = 0; i < n; ++i)
  s2[i] = s1[i];
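Both versions of the strlen loop compute the same string; only the number of strlen calls differs. A runnable comparison (function names are mine; terminating '\0' added so the results can be compared as strings):

```c
#include <assert.h>
#include <string.h>

void copy_naive(char *s2, const char *s1) {
    size_t i;
    for (i = 0; i < strlen(s1); ++i)   /* strlen re-evaluated every iteration */
        s2[i] = s1[i];
    s2[i] = '\0';
}

void copy_hoisted(char *s2, const char *s1) {
    size_t n = strlen(s1);             /* loop-invariant call hoisted out */
    for (size_t i = 0; i < n; ++i)
        s2[i] = s1[i];
    s2[n] = '\0';
}
```

(The "Or is it?" above hints that a compiler may know strlen is a pure standard function and hoist it itself; whether it does depends on the compiler.)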

Example of elimination of induction variables

i = 0;
j = 1;
while (a1[i] != 0) {
  a2[j] = a1[i];
  ++i;
  ++j;
}
Both i and j are induction variables ("loop counters").
i = 0;
while (a1[i] != 0) {
  a2[i + 1] = a1[i];
  ++i;
}
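The two loops above fill a2 identically, since j was always i + 1. A runnable check (function names are mine; the input array is terminated by 0 as in the example):

```c
#include <assert.h>
#include <string.h>

void with_two_counters(int *a2, const int *a1) {
    int i = 0, j = 1;
    while (a1[i] != 0) {
        a2[j] = a1[i];
        ++i;
        ++j;
    }
}

void with_one_counter(int *a2, const int *a1) {
    int i = 0;
    while (a1[i] != 0) {
        a2[i + 1] = a1[i];   /* j was always i + 1 */
        ++i;
    }
}
```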

Example of both induction-variable elimination and reduction in strength

ASU fig 10.9, strength reduction of 4 * j in basic block B3.
We know that t4 == 4 * j. Maintain this relationship!

t4 := t4 - 4 instead of t4 := 4 * j

Then a similar strength reduction of 4 * i in basic block B2.
We know that t2 == 4 * i. Maintain this relationship!

Then, i and j are used only in the test in B4.
The test i >= j can be replaced by t2 >= t4 (since t2 == 4*i and t4 == 4*j, i >= j holds exactly when 4*i >= 4*j).
i and j become dead!
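The B3 transformation, written out in C (function names are mine): instead of recomputing 4*j after each j = j - 1, keep the invariant t4 == 4*j and update t4 with a cheap subtraction. Both functions record the same sequence of t4 values.

```c
#include <assert.h>
#include <string.h>

int with_multiply(int *out, int j) {
    int n = 0;
    while (j > 0) {
        j = j - 1;
        out[n++] = 4 * j;        /* t4 := 4 * j, one multiply per iteration */
    }
    return n;
}

int with_subtract(int *out, int j) {
    int t4 = 4 * j, n = 0;       /* invariant on entry: t4 == 4 * j */
    while (j > 0) {
        j = j - 1;
        t4 = t4 - 4;             /* maintains t4 == 4 * j, no multiply */
        out[n++] = t4;
    }
    return n;
}
```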

ASU fig 10.10, after eliminating induction variables i and j:

After eliminating i and j

Loop unrolling

for (i = 0; i < 20; ++i)
  for (j = 0; j < 2; ++j)
    a[i][j] = i + 2 * j;
may be transformed by unrolling the inner loop:
for (i = 0; i < 20; ++i) {
    a[i][0] = i;
    a[i][1] = i + 2;
}
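The unrolled version computes the same array; the inner loop's two iterations have simply been written out with j replaced by the constants 0 and 1. A runnable check (function names are mine):

```c
#include <assert.h>
#include <string.h>

void rolled(int a[20][2]) {
    for (int i = 0; i < 20; ++i)
        for (int j = 0; j < 2; ++j)
            a[i][j] = i + 2 * j;
}

void unrolled(int a[20][2]) {
    for (int i = 0; i < 20; ++i) {
        a[i][0] = i;             /* j == 0 */
        a[i][1] = i + 2;         /* j == 1 */
    }
}
```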

10.3 - 10.12

Skip.

10.13 Symbolic debugging of optimized code

int a;
...
void f(string a) {
  ...
  while (1) {
    int a;
    ...        <-- In the debugger: print a
  }
}
What is needed for symbolic debugging in general? Why debug optimized code? Why not debug unoptimized code, and only turn off the optimizer until the program finally works?

A program may work when unoptimized, but not when optimized. For example, the C standard specifies undefined behaviour in certain cases, such as:

char s[10];
...
s[14] = 'x';
The behaviour depends entirely on what happens to be stored at the memory location 5 bytes after the end of s: unused padding, a variable, or the return address in an activation record? This can be different with or without optimization, since optimization, for example, can eliminate variables.

Deducing values of variables in basic blocks

Problem: the user asks the debugger to show the current value of a variable, but the optimizer may have moved, delayed, or removed the assignments to it.
A solution (not on the exam):
Let the compiler generate enough information for the debugger to deduce ("find out") the value of a variable. (Doesn't always work.)

ASU fig 10.68. Assume that the source, intermediate and target representation are the same.

Source and optimized code

ASU fig 10.69. A DAG for the variables. The DAG shows how values depend on each other. Then, annotate it with life-time information.

Annotated DAG

Example 1 (just the unoptimized program):
c = a + b after step 1. (Life time: 2-3)
c = c - e after step 3. (Life time: 4-infinity)

Example 2 (the optimized program):
c = a after step 5'. (Life time: 6'-infinity)
c is undefined in 1'-5'! (But can be calculated, differently depending on when!)

Example 3:
An overflow occurs in the optimized code, in statement 2', t = b * e.
The first source statement that uses the node b * e is 5.
Therefore, tell the user the program crashed in source statement 5!

Then the user says:

print b -> show b (lifetime 1-5 and 1'-4')
Explanation: b still has its initial value, both at 2' and at 5.

print c -> can't show actual stored c (lifetime 6-infinity)
Instead, find the DAG node for c at time 5 (the - node).
(Optimized) a will contain this value after 4', but not yet!
Consider the children: d contains the value from the + node at 2'-infinity. e contains the value from the E0 node at 1'-infinity. So use d - e!


Thomas Padron-McCarthy (Thomas.Padron-McCarthy@tech.oru.se) March 2, 2003