Compilers and Interpreters: Solutions to the exam 2024-10-31

Please note that these are suggested solutions. There can be other solutions that are also correct. Some of these solutions may be more comprehensive than required for full points on the exam, or they might only refer to where the answers can be found. In addition, it has occurred in the history of the world that teachers have made errors in the suggested solutions. I myself have, of course, never made any such errors, but others have. Does anyone really read introductions like this one? I don't know. It may be so. Rhubarb rhubarb rhubarb. I have heard that the extras on a movie set are told to say that to simulate background noise in a restaurant or similar place. Below are the suggested solutions.

Task 1 (6 p)

Here is a program segment written in a C-like language:

    a = b;
    if (c < d) {
        e = f * g / h + i + j;
    }
    else {
        if (k == l) {
            m = n;
        }
        else {
            o = p;
            q = r;
        }
        s = t / u / v;
        w = x;
    }

Translate the program segment to two of the following three types of representations.

a) an abstract syntax tree (by drawing it)

Answer:

An (abstract) syntax tree

The "K" nodes are connection nodes for lists of statements. Sometimes I have used ";" to mark these nodes.

b) postfix code for a stack machine

Answer:

        lvalue a
        rvalue b
        =
        rvalue c
        rvalue d
        <
        gofalse ELSE-START-1
        lvalue e
        rvalue f
        rvalue g
        *
        rvalue h
        /
        rvalue i
        +
        rvalue j
        +
        =
        goto AFTER-IF-1
label ELSE-START-1:
        rvalue k
        rvalue l
        ==
        gofalse ELSE-START-2
        lvalue m
        rvalue n
        =
        goto AFTER-IF-2
label ELSE-START-2:
        lvalue o
        rvalue p
        =
        lvalue q
        rvalue r
        =
label AFTER-IF-2:
        lvalue s
        rvalue t
        rvalue u
        /
        rvalue v
        /
        =
        lvalue w
        rvalue x
        =
label AFTER-IF-1:

c) three-address code

Answer:

        a = b
        if (c >= d) goto ELSE-START-1
        temp1 = f * g
        temp2 = temp1 / h
        temp3 = temp2 + i
        e = temp3 + j
        goto AFTER-IF-1
label ELSE-START-1:
        if (k != l) goto ELSE-START-2
        m = n
        goto AFTER-IF-2
label ELSE-START-2:
        o = p
        q = r
label AFTER-IF-2:
        temp4 = t / u
        s = temp4 / v
        w = x
label AFTER-IF-1:

Task 2 (3 p)

Here is a grammar. There are three terminals: a, b and c. There are three non-terminals: S, T and U. The start symbol is S.

S -> S T | ε
T -> a U | a b U
U -> c

There are two problems with this grammar that makes it unsuitable for use by an LL(1) parser. That is the type of predictive top-down parser with one-token lookahead that we have seen for example in the 2.9 program. Explain the problems!

Answer:

The first problem is that there is left recursion in the production S -> S T. When we enter the function S we will immediately call the same function S, without first consuming any tokens from the input. When we again enter the function S we will again immediately call the same function S, without first consuming any tokens from the input. And so on. We will have an infinite recursion in our parser.
The second problem is that there is a FIRST() conflict between the productions T -> a U and T -> a b U. When we inter the function T and see that the lookahead token is a, we can't decide which branch of the if statement to enter: the one that matches the right-hand-side a U, or the one that matches the right-hand-side a b U.

Task 3 (10 p)

Explain what the following things are, where they are used, and what they are used for:

a) activation record

Answer:

When a function is called during the execution of a program, we need somewhere to store that function's local variables, the place in the program to jump back to when the function returns, and some other data. This is stored in the activation record, which is an area in memory, usually on the call stack.

b) basic block

Answer:

A basic block is a sequence of instructions, for example in three-address code, where all the instructions will always be executed from start to finish. Thus there can be no jumps into the basic block, except to the first instruction, and no jumps out of the basic block, except from the last instruction. The basic blocks are used by the machine-independent code optimizer so it can re-use results that were calculated previously in the same block.

c) loop unrolling

Answer:

Loop unrolling is an optimization technique, used in the optimizer. It can sometimes replace a loop by several instances of the loop body, after one another. It makes the executable code longer, but since it doesn't need jumps and tests it can be faster.

d) reference counting

Answer:

Reference counting is a method to identify memory that is no longer in use, by keeping track of how many pointers point to each allocated memory area. Whenever we add, remove or change a pointer, we need to update the reference counters for the memory areas that are, or were, pointed to by those pointers. When the reference count for a memory area reaches zero, we know that there are no pointers to that area, so the area is unreachable and can be reclaimed.

Task 4 (4 p)

a) (2p) Which terminals are needed to write a grammar for the input language in the scenario?

Answer:

LEFTBRACE, RIGHTBRACE, MAP, AREA, CORNER, NORTH, EAST, SOUTH, WEST, NAME and COORDINATE

b) (2p) Out of these terminals, some will not have fixed lexemes. Write regular expressions for each such terminal.

Answer:

NAME: [A-Z][a-z]*
COORDINATE: [0-9]+\.[0-9]+

Task 5 (5 p)

Write a grammar for the input language. The start symbol should be input, which represents a complete input as described in the scenario above.

Answer:

input -> MAP NAME LEFTBRACE areas RIGHTBRACE
areas -> areas area | area
area -> AREA NAME LEFTBRACE corners RIGHTBRACE
corners -> corners corner | corner
corner -> CORNER COORDINATE direction COORDINATE direction
direction -> NORTH | EAST | SOUTH | WEST

(In reality the latitude can only have N and S, and the longitude can only have E and W, but the description in the scenario is not clear about that.)

Task 6 (4 p)

Given your grammar, draw a parse tree (also called a concrete syntax tree) for the following input:

map Kebnekaise {
    area Peak {
        corner 67.9187 N 18.5728 E
    }
}

Answer:

A parse tree

Task 7 (8 p)

Write a predictive recursive-descent parser for the input language, in a language that is at least similar to C, C++, C# or Java. You do not have to write exactly correct program code, but it should be clear which procedures exist, how they call each other, and what comparisons with token types are made. You can assume there is a function called scan, which returns the type of the next token, and a function called error, which you can call when something went wrong and which prints an error message and terminates the program.

Answer:

The grammar in task 5 above is left recursive. We need to eliminate the left recursion before we can write a top-down parser:

input -> map
map -> MAP NAME LEFTBRACE areas RIGHTBRACE
areas -> area moreareas
moreareas -> area moreareas | ε
area -> AREA NAME LEFTBRACE corners RIGHTBRACE
corners -> corner morecorners
morecorners -> corner morecorners | ε
corner -> CORNER COORDINATE direction COORDINATE direction
direction -> NORTH | EAST | SOUTH | WEST

For testing, you can download mapparser.c, plus a Lex scanner specification map.l, and a Bison version of the grammar map.y, and a Makefile.

int lookahead;

void match(int expected) {
    if (lookahead != expected)
        error();
    else
        lookahead = scan();
}

void input(void) {
    match(MAP); match(NAME); match(LEFTBRACE); areas(); match(RIGHTBRACE);
}

void areas(void) {
    area(); moreareas();
}

void moreareas(void) {
    if (lookahead == AREA) {
        area(); moreareas();
    }
    else {
        // Empty: Any lookahead matches, since you can always fit nothing anywhere
    }
}

void area(void) {
    match(AREA); match(NAME); match(LEFTBRACE); corners(); match(RIGHTBRACE);
}

void corners(void) {
    corner(); morecorners();
}

void morecorners(void) {
    if (lookahead == CORNER) {
        corner(); morecorners();
    }
    else {
        // Empty: Any lookahead matches, since you can always fit nothing anywhere
    }
}

void corner(void) {
    match(CORNER); match(COORDINATE); direction(); match(COORDINATE); direction();
}

void direction(void) {
    if (lookahead == NORTH) {
        match(NORTH);
    }
    else if (lookahead == EAST) {
        match(EAST);
    }
    else if (lookahead == SOUTH) {
        match(SOUTH);
    }
    else if (lookahead == WEST) {
        match(WEST);
    }
    else {
        error();
    }
}

void parse() {
    lookahead = scan();
    input();
    printf("Ok.\n");
}

int main() {
    printf("Write a complete input, ending with EOF:\n");
    parse();
}

Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) November 6, 2024