Machine-independent optimization. Intermediate code. Three-address code.

The course Compilers and Interpreters | Lectures: 1 2 3 4 5 6 7 8 9 10 11 12

This is roughly what I plan to say in the lecture. Use it for preparation, review, and guidance. It does not define the course content, and it does not replace the textbook.

Today: Machine-independent optimization. Intermediate code. Three-address code.

ALSU-07 sections 6.1, 6.2, 6.4, 6.6
ALSU-07 sections 8.4, 8.5, 9.1
ALSU-07 sections 12.1-12.2 (overview only)
(ASU-86 sections 8.1-8.3, 9.4, chapter 10)
(KP chapters 5 and 6)

(We probably won't have time for all of this in a single lecture.)

Code Optimization

Optimization: a program, or part of a program, is rewritten so that it becomes smaller or faster (or both). Example: Replace 1 * x with x, or 2 * x + 1 * x with 3 * x.

We don't really mean "optimal" (the best possible), just "better".

It is called "optimization" in English and "optimering" in Swedish. Not "optimisering"!

Normally the optimization phase (or phases) of the compiler does the optimization, but there is also "hand optimization", where the programmer changes the program. An example: i = i * 2 can be changed to i = i << 1. A bit shift is a simpler operation than a multiplication, and can be faster on some processors.

Another example:

    for (i = 0; i < n; ++i) {
        do_something(a[i]);
    }

Equivalent to:

    i = 0;
    while (i < n) {
        do_something(a[i]);
        ++i;
    }

Can be "optimized" to step a pointer forward, and use it as the loop variable:

    p = &a[0];
    p_after = &a[n];
    while (p != p_after) {
        do_something(*p);
        ++p;
    }

But modern compilers know how to do this! It will probably not be faster, perhaps even slower, and the code becomes hard to read and fragile. This is better left to the automatic optimizer in the compiler!

Two rules about optimization by hand:

Don't do it. (Usually not needed. If it is needed, leave it to the compiler.)
Only for experts: Don't do it yet. ("90/10 rule". Profile first!)

Types of optimization:

Algorithms and data structures. (Ex: Change sorting algorithm, or replace a linked list with a hash table.) Best gains (from years to seconds)! Hard for the compiler, so the programmer must do it. But: SQL!
"Low-level" optimization. (As above.) Better done by the compiler! (Usually. Sometimes low-level hand optimization by the programmer is both effective and required.)
Machine-dependent optimizations. (Register allocation, instruction selection, as in ALSU-07 chapter 8. Instruction reordering to improve pipelining, etc.)

Machine-dependent optimization is sometimes done using peephole optimization (ALSU-07 8.7, ASU-86 9.9): Simple transformations of the generated assembly (or machine) code. Ex:

MOV R0, a
MOV a, R0

can be changed to

MOV R0, a
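
A minimal sketch of such a peephole pass, written in C. The instruction format (struct Instr, with the "MOV source, destination" operand order used above) and the function name peephole are assumptions made for this illustration. The sketch also assumes that the second instruction is not the target of a jump; a real peephole optimizer has to check that.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical instruction format for this sketch: "op src, dst". */
struct Instr {
    char op[8];
    char src[8];
    char dst[8];
};

/* Removes a MOV that immediately follows its mirror image
   (MOV x, y followed by MOV y, x). Assumes that the second MOV
   is not a jump target. Returns the new instruction count. */
int peephole(struct Instr *code, int n) {
    int out = 0;
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        if (out > 0
            && strcmp(code[i].op, "MOV") == 0
            && strcmp(code[out - 1].op, "MOV") == 0
            && strcmp(code[i].src, code[out - 1].dst) == 0
            && strcmp(code[i].dst, code[out - 1].src) == 0) {
            continue;   /* redundant: the value is already in place */
        }
        code[out++] = code[i];
    }
    return out;
}
```

The pass only looks at two adjacent instructions at a time, which is the small "peephole" that gives the technique its name.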

But today: Automatic machine-independent optimization, on intermediate code (which is three-address code).

Intermediate-Code Generation

(ALSU-07 chapter 6, ASU-86 chapter 8)

"Intermediate code" in English. "Mellankod" or sometimes "intermediärkod" in Swedish.

Why generate intermediate code? Why not do everything "in the Yacc grammar"? Some reasons:

Machine-independent optimizations on the intermediate code
Separating the phases (modularization is good!)
Separating front-end and back-end

Different kinds of intermediate code

(ASU-86 section 8.1, ALSU-07 sections 6.1-6.2)

Some ways to represent the program:

Trees (or DAGs)
Three-Address Code

Graphical representations: Trees (or DAGs)

As before. Just note two things more:

Postfix notation is a linearized representation of a syntax tree.
You don't need physical pointers to represent a tree!

But: Postfix code is hard to work with when optimizing.
Example of infix code: 1*(a+2)*b
In a syntax tree it is fairly easy to find, and optimize away, the multiplication by 1. (This is done in part B of lab 7.)
The postfix code: 1 a 2 + * b *
It is hard to find the multiplication by 1 in the postfix code, and hard to remove it!
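
To illustrate why the tree form is convenient, here is a small sketch in C that removes multiplications by 1 from a syntax tree. The node type struct Node and the function name remove_mul_by_one are made up for this example (they are not the data structures from the lab), and discarded nodes are not freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: op is '*' or '+' for operators,
   'n' for a number and 'v' for a variable. */
struct Node {
    char op;
    int value;                  /* used when op == 'n' */
    char name;                  /* used when op == 'v' */
    struct Node *left, *right;  /* used for operators */
};

static int is_one(const struct Node *p) {
    return p->op == 'n' && p->value == 1;
}

/* Returns a tree where every 1 * e and e * 1 has been replaced by e.
   (The removed nodes are not freed in this sketch.) */
struct Node *remove_mul_by_one(struct Node *p) {
    if (p == NULL || p->op == 'n' || p->op == 'v')
        return p;
    assert(p->left != NULL && p->right != NULL);
    p->left = remove_mul_by_one(p->left);
    p->right = remove_mul_by_one(p->right);
    if (p->op == '*' && is_one(p->left))
        return p->right;
    if (p->op == '*' && is_one(p->right))
        return p->left;
    return p;
}
```

Each node only has to look at its own two children, so the pattern "multiplication with a 1 below it" is found locally. In postfix code the two operands of an operator can be arbitrarily far apart.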

Three-Address Code

(ASU-86 section 8.2, ALSU-07 section 6.2)

Example: x + y * z

temp₁ = y * z
temp₂ = x + temp₁

Three-address code has (at most) three addresses in each instruction:
var₁ = var₂ operation var₃

Note:

Compiler-generated temporary variables (such as temp₁)
Similar to the processor's machine instructions. Few processors have a combined add-and-multiply instruction, so two machine instructions are needed: first the multiplication, then the addition.
Simple => Good for optimization.
Easier to rearrange (and thus, optimize) than postfix ("stack machine") code

Three-address code has at most three addresses in each instruction. More kinds:
var₁ = operation var₂
goto addr₁
if var₁ <= var₂ goto addr₃

Idea: Each internal node corresponds to a temp-variable!

Example: a = b * -c + b * -c (The tree in ASU-86 fig. 8.4a)

temp₁ = - c
temp₂ = b * temp₁
temp₃ = - c
temp₄ = b * temp₃
temp₅ = temp₂ + temp₄
a = temp₅

Types of three-address statements

Assignment: x = y op z (Ex: temp₅ = a + temp₄)
Assignment: x = op y (Ex: temp₃ = - temp₄)
Copy: x = y (Ex: a = temp₃)
Jump: goto L
Conditional jump: if x relop y goto L (Ex: if (temp₇ < temp₂) goto L7)
Procedure call: call p, n
Parameter for procedure call: param x
Return from procedure call: return y
Example of a procedure call with three parameters:
```
param temp₄
param a
param b
call f, 3
```
Indexed assignment: x = y[z]
Indexed assignment: x[y] = z (Ex: temp₅[temp₄] = temp₉)
Pointer and address operations: x = &y
x = *y
*x = y

Syntax-directed translation into three-address code

(ALSU-07 sections 6.4 and 6.6, ASU-86 sections 8.3-8.5)

Synthesized attributes:
E.addr = the name of the temporary variable (called place in ASU-86)
E.code = the sequence of three-address statements that calculates the value (or they could be written to a file instead of stored in the attribute)

Production	Semantic rule
Start -> id = Expr	Start.code = Expr.code + [ id.addr ":=" Expr.addr ]
Expr -> Expr₁ + Expr₂	Expr.addr = make_new_temp(); Expr.code = Expr₁.code + Expr₂.code + [ Expr.addr = Expr₁.addr "+" Expr₂.addr; ]
Expr -> Expr₁ * Expr₂	Expr.addr = make_new_temp(); Expr.code = Expr₁.code + Expr₂.code + [ Expr.addr = Expr₁.addr "*" Expr₂.addr; ]
Expr -> - Expr₁	Expr.addr = make_new_temp(); Expr.code = Expr₁.code + [ Expr.addr = "-" Expr₁.addr; ]
Expr -> ( Expr₁ )	Expr.addr = Expr₁.addr; // No new temp! Expr.code = Expr₁.code;
Expr -> id	Expr.addr = id.addr; // No temp! Expr.code = ' '; // No code!

(See the table in ALSU-07 fig. 6.19, or in ASU-86 fig. 8.15.)
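
The semantic rules for expressions can be sketched as a recursive C function over an expression tree. This is only an illustration under some assumptions: struct Expr, gen, and temp_counter are hypothetical names, identifiers are assumed to be shorter than 16 characters, and the caller is assumed to supply a code buffer that is large enough. The parameter addr plays the role of Expr.addr, and the text appended to code plays the role of Expr.code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ADDR_SIZE 16

/* Hypothetical expression tree for this sketch. */
struct Expr {
    char op;                    /* '+', '*', 'u' (unary minus) or 'i' (id) */
    const char *id;             /* the name, when op == 'i' */
    struct Expr *left, *right;  /* right is NULL for unary minus */
};

static int temp_counter = 0;    /* plays the role of make_new_temp() */

/* Appends the three-address code for e to code, and stores the name
   that holds the value of e in addr (at least ADDR_SIZE bytes). */
void gen(const struct Expr *e, char *code, char *addr) {
    char a1[ADDR_SIZE], a2[ADDR_SIZE], line[64];
    assert(e != NULL);
    if (e->op == 'i') {         /* Expr -> id: no temp, no code */
        snprintf(addr, ADDR_SIZE, "%s", e->id);
        return;
    }
    gen(e->left, code, a1);
    if (e->op == 'u') {         /* Expr -> - Expr1 */
        snprintf(addr, ADDR_SIZE, "temp%d", ++temp_counter);
        snprintf(line, sizeof line, "%s = - %s\n", addr, a1);
    } else {                    /* Expr -> Expr1 op Expr2 */
        gen(e->right, code, a2);
        snprintf(addr, ADDR_SIZE, "temp%d", ++temp_counter);
        snprintf(line, sizeof line, "%s = %s %c %s\n", addr, a1, e->op, a2);
    }
    strcat(code, line);
}
```

For the tree of x + y * z this produces the two statements temp1 = y * z and temp2 = x + temp1, with temp2 as the resulting address.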

While statement (very similar to generating stack machine code, see lecture 9):

Production	Semantic rule
Stmt -> while ( Expr ) Stmt₁	Stmt.before = make_new_label(); Stmt.after = make_new_label(); Stmt.code = [ "label" Stmt.before ] + Expr.code + [ "if" Expr.addr "==" "0" "goto" Stmt.after; ] + Stmt₁.code + [ "goto" Stmt.before ] + [ "label" Stmt.after ];

(See the table in ALSU-07 fig. 6.36, or in ASU-86 fig. 8.23.)

Quadruples

A way to represent three-address code.

	Op	Arg1	Arg2	result
0	uminus	c		temp₁
1	*	b	temp₁	temp₂
2	uminus	c		temp₃
3	*	b	temp₃	temp₄
4	+	temp₂	temp₄	temp₅
5	:=	temp₅		a

Skip: Also "triples" (but they are hard to optimize) and "indirect triples" (as easy to optimize as quadruples, but more complicated).
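
One possible way to store quadruples in C is an array of structs with four fields. The type and function names below (struct Quad, format_quad) are assumptions for this sketch, and the operands are kept as strings only to keep the example short; a real compiler would more likely use pointers into the symbol table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical quadruple representation. */
struct Quad {
    const char *op;      /* "uminus", "*", "+" or ":=" */
    const char *arg1;
    const char *arg2;    /* NULL when unused */
    const char *result;
};

/* The table above, for a = b * -c + b * -c. */
static const struct Quad example[] = {
    {"uminus", "c",     NULL,    "temp1"},
    {"*",      "b",     "temp1", "temp2"},
    {"uminus", "c",     NULL,    "temp3"},
    {"*",      "b",     "temp3", "temp4"},
    {"+",      "temp2", "temp4", "temp5"},
    {":=",     "temp5", NULL,    "a"},
};

/* Formats one quadruple as a three-address statement. */
void format_quad(const struct Quad *q, char *buf, size_t size) {
    assert(q != NULL && buf != NULL);
    if (strcmp(q->op, ":=") == 0)
        snprintf(buf, size, "%s = %s", q->result, q->arg1);
    else if (q->arg2 == NULL)
        snprintf(buf, size, "%s = %s %s", q->result, q->op, q->arg1);
    else
        snprintf(buf, size, "%s = %s %s %s", q->result, q->arg1, q->op, q->arg2);
}
```

Since every quadruple names its result explicitly, statements can be moved or deleted without renumbering anything, which is part of what makes quadruples easy to optimize.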

Basic Blocks and Flow Graphs

(ALSU-07 section 8.4, ASU-86 section 9.4)

Basic blocks are used for optimization.

A couple of terms:

Basic block
Flow graph

From KP page 119:

(Figure: basic blocks.)

More terms:

define = to set the value of a variable (compare: undefined value)
use = to fetch the value of a variable
live = will (perhaps) be used again (compare with garbage collection)

Transformations on basic blocks (briefly, more later in the example):

Common subexpression elimination

a = b + c;
b = a - d;
c = b + c;
d = a - d;

->

a = b + c;
b = a - d;
c = b + c;
d = b;

Dead-code elimination

x = y + z;

->

(the statement is removed) if x is dead!

Algebraic transformations

a = a + 0;
b = b * 1;
c = d ** 2;

->

c = d * d;

(The first two statements have no effect and are removed.)

Interchange of statements
Renaming temporary variables
Normal-form block: never re-use temporary variables

Even more terms:

Loop = it is "strongly connected" (every node in the loop can be reached from every node in the loop), and it has a single entry point
Inner loop = a loop that contains no other loops

The quicksort example

A quicksort function (ALSU-07 fig. 9.1, or ASU-86 fig. 10.2):

/* recursively sorts the array a, from a[m] to a[n] */
void quicksort(int m, int n) {
  int i, j;
  int v, x;
  if (n <= m)
    return;
  i = m - 1; j = n; v = a[n];
  while (1) {
    do
      i = i + 1;
    while (a[i] < v);
    do
      j = j - 1;
    while (a[j] > v);
    if (i >= j)
      break;
    x = a[i]; a[i] = a[j]; a[j] = x; /* swap */
  }
  x = a[i]; a[i] = a[n]; a[n] = x; /* swap */
  quicksort(m, j);
  quicksort(i + 1, n);
}

Some optimizations are not possible on the source level.
Example in Pascal: a[i]
Three-address code: t₁ = 4*i; t₂ = a[t₁];
A Pascal compiler (and a C programmer!) can replace some of the 4*i calculations.

Three-address code for the part of the quicksort function that was marked in bold, from i = m - 1 through the final swap (ALSU-07 fig. 9.2 or ASU-86 fig. 10.4):

  (1) i = m - 1                        (16) t7 = 4 * i
  (2) j = n                            (17) t8 = 4 * j
  (3) t1 = 4 * n                       (18) t9 = a[t8]
  (4) v = a[t1]                        (19) a[t7] = t9
  (5) i = i + 1                        (20) t10 = 4 * j
  (6) t2 = 4 * i                       (21) a[t10] = x
  (7) t3 = a[t2]                       (22) goto (5)
  (8) if t3 < v goto (5)               (23) t11= 4 * i
  (9) j = j - 1                        (24) x = a[t11]
 (10) t4 = 4 * j                       (25) t12 = 4 * i
 (11) t5 = a[t4]                       (26) t13 = 4 * n
 (12) if t5 > v goto (9)               (27) t14 = a[t13]
 (13) if i >= j goto (23)              (28) a[t12] = t14
 (14) t6 = 4 * i                       (29) t15 = 4 * n
 (15) x = a[t6]                        (30) a[t15] = x

Steps for the optimizer:

Control flow analysis: basic blocks
Data flow analysis.
Transformations.

Basic blocks and flow graph for the quicksort function (ALSU-07 fig. 9.3 or ASU-86 fig. 10.5):

(Figure: six basic blocks in a flow graph.)

Three loops:

B₂
B₃
B₂, B₃, B₄, and B₅
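
The division into basic blocks can be sketched in C with the usual leader rules (as in ALSU-07 section 8.4): the first instruction is a leader, every jump target is a leader, and every instruction that follows a jump is a leader. The encoding of the program as an array of jump targets, and the name count_basic_blocks, are assumptions for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSTR 64

/* Hypothetical encoding: target[i] is the jump target of instruction i
   (numbered from 1, as in the three-address listing), or 0 if
   instruction i is not a jump. target[0] is unused. */
int count_basic_blocks(const int target[], int n) {
    bool leader[MAX_INSTR + 2] = { false };
    int count = 0;
    assert(n >= 1 && n <= MAX_INSTR);
    leader[1] = true;                       /* the first instruction */
    for (int i = 1; i <= n; ++i) {
        if (target[i] != 0) {
            assert(target[i] >= 1 && target[i] <= n);
            leader[target[i]] = true;       /* the target of a jump */
            leader[i + 1] = true;           /* the instruction after a jump */
        }
    }
    for (int i = 1; i <= n; ++i)            /* each leader starts one block */
        if (leader[i])
            ++count;
    return count;
}
```

For the 30 quicksort instructions, the jumps are (8) to (5), (12) to (9), (13) to (23), and (22) to (5). That gives the leaders (1), (5), (9), (13), (14), and (23), which are the first instructions of B₁ to B₆.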

The principal sources of optimization

(ALSU-07 section 9.1, ASU-86 section 10.2)

"Some of the most useful code-improving transformations".
Local transformation = inside a single basic block
Global transformation = several blocks (but inside a single procedure)

Semantics-Preserving Transformations

(ALSU-07 section 9.1, ASU-86 sections 10.1 and 10.2)

Removing local common subexpressions
Removing non-local common subexpressions
Copy propagation
Dead-code elimination
...

Removing local common subexpressions (ALSU-07 page 588)

From ALSU-07 fig. 9.4 (ASU-86 fig. 10.6), eliminating common subexpressions inside a basic block:

B₅

t₆ := 4*i
x := a[t₆]
t₇ := 4*i
t₈ := 4*j
t₉ := a[t₈]
a[t₇] := t₉
t₁₀ := 4*j
a[t₁₀] := x
goto B₂

can be changed to

B₅

t₆ := 4*i
x := a[t₆]
t₈ := 4*j
t₉ := a[t₈]
a[t₆] := t₉
a[t₈] := x
goto B₂

(Remove repeat calculations, use t₆ instead of t₇, t₈ instead of t₁₀.)

Removing non-local common subexpressions (ALSU-07 9.1.4)

Globally, an expression E is a common subexpression if E was previously computed, and the values of the variables in E haven't changed since then.

From ALSU-07 fig 9.4, 9.5:
(We can remove 4*i, 4*j, and 4*n completely from B₆!)
Remove 4*i and 4*j completely from B₅!

B₅

t₆ := 4*i
x := a[t₆]
t₈ := 4*j
t₉ := a[t₈]
a[t₆] := t₉
a[t₈] := x
goto B₂

can be changed to

B₅

x := a[t₂]
t₉ := a[t₄]
a[t₂] := t₉
a[t₄] := x
goto B₂

B₅

x := a[t₂]
t₉ := a[t₄]
a[t₂] := t₉
a[t₄] := x
goto B₂

can then be changed to

B₅

x := t₃
t₉ := a[t₄]
a[t₂] := t₉
a[t₄] := x
goto B₂

Note: a hasn't changed, so the value of a[t₄] is still in t₅ from B₃.

B₅

x := t₃
t₉ := a[t₄]
a[t₂] := t₉
a[t₄] := x
goto B₂

can then be changed to

B₅

x := t₃
a[t₂] := t₅
a[t₄] := x
goto B₂

Blocks B₅ and B₆ after eliminating both local and global common subexpressions (ALSU-07 fig. 9.5 or ASU-86 fig. 10.7):

(Figure: basic blocks B₅ and B₆ have shrunk.)

Copy propagation (ALSU-07 9.1.5)

Instead of:
copy = original; ... copy ...;
Always try to use:
copy = original; ... original ...;
(We may be able to eliminate the variable copy altogether!)

B₅

x := t₃
a[t₂] := t₅
a[t₄] := x
goto B₂

can be changed to

B₅

x := t₃
a[t₂] := t₅
a[t₄] := t₃
goto B₂

Elimination of dead code (ALSU-07 9.1.6)

Dead variables will never be used again.
Dead code (or useless code) computes values that will never be used.
Dead code can also mean code that can never be reached:
debug = 0; ... if (debug) printf(...);

B₅

x := t₃
a[t₂] := t₅
a[t₄] := t₃
goto B₂

but x is dead, so:

B₅

a[t₂] := t₅
a[t₄] := t₃
goto B₂
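
A sketch in C of how dead assignments in a basic block can be found with a backward scan. The data structures (struct Stmt, struct LiveSet) and the function name mark_dead are hypothetical, and the set of variables that are live at the end of the block is assumed to come from a data-flow analysis that is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LIVE 32

/* Hypothetical statement format for this sketch. */
struct Stmt {
    const char *dst;    /* the variable assigned, or "" if none */
    const char *use1;   /* the variables read, or "" */
    const char *use2;
    bool is_store;      /* a[...] := ... changes memory: always kept */
};

struct LiveSet {
    const char *name[MAX_LIVE];
    int n;
};

static int find(const struct LiveSet *s, const char *v) {
    for (int i = 0; i < s->n; ++i)
        if (strcmp(s->name[i], v) == 0)
            return i;
    return -1;
}

static void add_live(struct LiveSet *s, const char *v) {
    if (v[0] != '\0' && find(s, v) < 0) {
        assert(s->n < MAX_LIVE);
        s->name[s->n++] = v;
    }
}

static void remove_live(struct LiveSet *s, const char *v) {
    int i = find(s, v);
    if (i >= 0)
        s->name[i] = s->name[--s->n];
}

/* Scans the block backward. On entry, live holds the variables that
   are live at the end of the block. Sets keep[i] for every statement
   and returns the number of dead statements. */
int mark_dead(const struct Stmt *block, int n, struct LiveSet *live,
              bool keep[]) {
    int dead = 0;
    for (int i = n - 1; i >= 0; --i) {
        const struct Stmt *s = &block[i];
        if (!s->is_store && s->dst[0] != '\0' && find(live, s->dst) < 0) {
            keep[i] = false;    /* the value is never used */
            ++dead;
            continue;
        }
        keep[i] = true;
        remove_live(live, s->dst);  /* defined here: dead above this point */
        add_live(live, s->use1);    /* used here: live above this point */
        add_live(live, s->use2);
    }
    return dead;
}
```

Applied to the three statements of B₅ with x not live at the end of the block, only x := t₃ is marked as dead, while the two array stores are kept.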

Loop optimizations (ALSU-07 9.1.7-9.1.8)

Move code outside the loop (so it is executed only once)
Eliminate induction variables, that is, extra "counter variables"
Reduce strength of operations, such as + instead of *

Examples of code motion

(Shown here as C code, but could be done by the compiler on three-address code.)

while (i < limit - 2)
    a[i++] = x + y;

The expressions limit - 2 and x + y are loop-invariant.

t1 = limit - 2;
t2 = x + y;
while (i < t1)
  a[i++] = t2;

The next example is harder for the compiler, since strlen is just another function. (Or is it?)

for (i = 0; i < strlen(s1); ++i)
  s2[i] = s1[i];

n = strlen(s1);
for (i = 0; i < n; ++i)
  s2[i] = s1[i];

Example of elimination of induction variables

i = 0;
j = 1;
while (a1[i] != 0) {
  a2[j] = a1[i];
  ++i;
  ++j;
}

Both i and j are induction variables ("loop counters").

i = 0;
while (a1[i] != 0) {
  a2[i + 1] = a1[i];
  ++i;
}

Example of both induction-variable elimination and reduction in strength

Block B₃ (which forms an inner loop) has j and t₄, which are always stepped together so that t₄ == 4 * j. Maintain that relationship!

B₃

j := j-1
t₄ := 4*j
t₅ := a[t₄]
if t₅ > v goto B₃

becomes

B₃

j := j-1
t₄ := t₄-4
t₅ := a[t₄]
if t₅ > v goto B₃

But that is wrong, because now t₄ never gets an initial value. We can fix that by inserting t₄ = 4*j in block B₁, which is executed only once (not in B₂, which is executed many, many times):

ALSU-07 fig. 9.8 (ASU-86 fig. 10.9):

(Figure: t₄ - 4 instead of 4 * j.)
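
The same transformation, sketched as two C functions that both correspond to block B₃. The function names and the helper load are made up for this example, sizeof(int) is used instead of the constant 4, and the array is assumed to contain some element that is not greater than v, so that the loop stops.

```c
#include <assert.h>
#include <stddef.h>

/* Reads the int stored at a given byte offset from the start of a,
   like a[t4] in the three-address code. */
static int load(const int *a, size_t offset) {
    return *(const int *)((const char *)a + offset);
}

/* B3 before: one multiplication in every iteration. Returns the final j. */
size_t scan_with_multiplication(const int *a, size_t j, int v) {
    size_t t4;
    int t5;
    assert(j >= 1);
    do {
        j = j - 1;
        t4 = sizeof(int) * j;
        t5 = load(a, t4);
    } while (t5 > v);
    return j;
}

/* B3 after strength reduction: t4 gets its initial value once
   (the statement added to B1), and is then decremented in step with j. */
size_t scan_with_subtraction(const int *a, size_t j, int v) {
    size_t t4 = sizeof(int) * j;    /* executed once, outside the loop */
    int t5;
    assert(j >= 1);
    do {
        j = j - 1;
        t4 = t4 - sizeof(int);      /* keeps t4 == sizeof(int) * j */
        t5 = load(a, t4);
    } while (t5 > v);
    return j;
}
```

Both functions visit the same elements and return the same j. The second one has replaced the multiplication in the loop with a subtraction.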

Then a similar strength reduction of 4 * i in basic block B₂.
We know that t₂ == 4 * i. Maintain this relationship!

Then, i and j are used only in the test in B₄.
Since t₂ == 4 * i and t₄ == 4 * j, the test i >= j can be changed to t₂ >= t₄ (4 * i >= 4 * j is equivalent to i >= j).
i and j become dead!

The flow graph after eliminating the induction variables i and j (ASU-86 fig. 10.10 or ALSU-07 fig. 9.9):

(Figure: the flow graph after eliminating i and j.)

Example of another loop optimization: Loop unrolling (ALSU-07 page 735)

(Shown here as C code, but could be done by the compiler on three-address code.)

for (i = 0; i < 20; ++i)
  for (j = 0; j < 2; ++j)
    a[i][j] = i + 2 * j;

may be transformed by unrolling the inner loop:

for (i = 0; i < 20; ++i) {
    a[i][0] = i;
    a[i][1] = i + 2;
}

or by unrolling the outer loop too:

    a[0][0] = 0;
    a[0][1] = 2;
    a[1][0] = 1;
    a[1][1] = 3;
    a[2][0] = 2;
    a[2][1] = 4;
    ....
    a[19][0] = 19;
    a[19][1] = 21;

You don't have to unroll all the iterations; you can unroll just some of them.

Warning: the cache! A loop that is too large may not fit in the processor's instruction cache.
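
A sketch in C of partial unrolling, where the loop body is repeated four times per iteration and a second loop takes care of the elements that are left over. The function sum_unrolled is a made-up example (shown as C code, but the same thing could be done by the compiler on three-address code).

```c
#include <assert.h>

/* Hypothetical example: sums the n elements of a,
   with the loop body unrolled four times. */
long sum_unrolled(const int a[], int n) {
    long sum = 0;
    int i = 0;
    assert(n >= 0);
    for (; i + 3 < n; i += 4) {     /* four elements per iteration */
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    for (; i < n; ++i)              /* the remaining 0 to 3 elements */
        sum += a[i];
    return sum;
}
```

The loop test and the increment are now executed about a quarter as many times, while the code only grows by a few statements.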

Example of another important optimization: Tail-recursion elimination

Described at the source level on page 73 in ALSU-07 section 2.5.4, but done automatically by modern compilers, for example gcc with the optimization flag -O2.

void f(struct Node *p) {
    if (p == NULL)
        return;
    p->value++;
    f(p->next);
}

can be changed to

void f(struct Node *p) {
    while (p != NULL) {
        p->value++;
        p = p->next;
    }
}

9.2-9.10

Skip.

Symbolic debugging of optimized code

(Doesn't seem to be in the book. Just skim it, and understand (a) why you want it, and (b) why it is sometimes complicated.)

int a;
...
void f(string a) {
  ...
  while (1) {
    int a;
    ...        <-- In the debugger: print a
  }
}

What is needed, for symbolic debugging in general?

(Parts of) the symbol table from compilation: the lexeme a, the type, the location in memory
Scope information. Which a do we mean?
Where are we, in the program? (So we can apply the scope information.)
Some sort of mapping between source-language statements and machine code, so we know which source statement we are executing.

Why debug optimized code? Why not debug unoptimized code, and only turn on the optimizer when the program finally works?

A program may work when unoptimized, but not when optimized. For example, the C standard specifies undefined behaviour in certain cases, such as:

char s[10];
...
s[14] = 'x';

The behaviour depends entirely on what happens to be stored at the memory location 5 bytes after the end of s: unused padding, a variable, or the return address in an activation record? This can be different with or without optimization, since optimization, for example, can eliminate variables.

An example program:

#include <stdlib.h>
#include <stdio.h>

int plus(int x, int y) {
    int a[2];
    int s;
    s = x - y;
    a[x] = x + y;
    return s;
}

int main(void) {
    int resultat;
    resultat = plus(-1, -2);
    printf("resultat = %d\n", resultat);
    return 0;
}

Running it without optimization, and then with optimization (the "-O" flag):

linux> gcc -Wall plus.c -o plus
plus.c: In function 'plus':
plus.c:5:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
     int a[2];
         ^
linux> ./plus
resultat = -3
linux> gcc -O -Wall plus.c -o plus
plus.c: In function 'plus':
plus.c:5:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
     int a[2];
         ^
linux> ./plus
resultat = 1
linux>

Exercise for the reader: What happened? Can you deduce something about how the compiler laid out the variables in the activation record for the function "plus"?

Trying to debug:

linux> gcc -g -O -Wall plus.c -o plus
plus.c: In function 'plus':
plus.c:5:9: warning: variable 'a' set but not used [-Wunused-but-set-variable]
     int a[2];
         ^
linux> ./plus
resultat = 1
linux> gdb plus

Some output from GDB removed

(gdb) break main
Breakpoint 1 at 0x400562: file plus.c, line 12.
(gdb) run 
Starting program: /home/padrone/tmp/14okt/plus 

Breakpoint 1, main () at plus.c:12
12      int main(void) {
(gdb) step
15          printf("resultat = %d\n", resultat);
(gdb) print resultat
$1 = <optimized out>
(gdb)

Deducing values of variables in basic blocks

When the user wants the debugger to show the current value of a:

The variable a may have been eliminated during optimization!
Code may have been moved by the optimizer, for example an assignment to a, so a's current value may be inconsistent with the source program!

ALSU-07 chapter 12: Interprocedural Analysis

Knowing the basics is enough:

The difference between intraprocedural and interprocedural
What can you do with interprocedural analysis that you cannot do with intraprocedural analysis? (Give examples.)
What is a call graph, and what is it needed for?
What is pointer aliasing?
What is meant by procedure inlining?


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) October 14, 2015