Compilers and Interpreters: Solutions to the exam 2023-10-24

Please note that these are suggested solutions. There can be other solutions that are also correct. Some of these solutions may be more comprehensive than required for full points on the exam, or they might only refer to where the answers can be found. In addition, it has occurred in the history of the world that teachers have made errors in the suggested solutions. I myself have, of course, never made any such errors, but others have. Does anyone really read introductions like this one? I don't know. It may be so. Rhubarb rhubarb rhubarb. I have heard that the extras on a movie set are told to say that to simulate background noise in a restaurant or similar place. Below are the suggested solutions.

Task 1 (6 p)

Here is a program segment written in a C-like language:
    a = b;
    c = d * e * f - g * (h + i);
    while (j < k + l + m) {
        if (n < o) {
            p = q + r * s;
        }
        else {
            t = u + v;
        }
        w = x;
    }
    y = z;

Translate the program segment to two of the following three types of representations. (Should you answer with all three, the one with the highest points will be discarded.)

a) an abstract syntax tree (by drawing it)

Answer:

An (abstract) syntax tree

b) postfix code for a stack machine

Answer:

        lvalue a
        rvalue b
        =
        lvalue c
        rvalue d
        rvalue e
        *
        rvalue f
        *
        rvalue g
        rvalue h
        rvalue i
        +
        *
        -
        =
label WHILE-START:
        rvalue j
        rvalue k
        rvalue l
        +
        rvalue m
        +
        <
        gofalse AFTER-WHILE
        rvalue n
        rvalue o
        <
        gofalse ELSE-START
        lvalue p
        rvalue q
        rvalue r
        rvalue s
        *
        +
        =
        goto AFTER-ELSE
label ELSE-START:
        lvalue t
        rvalue u
        rvalue v
        +
        =
label AFTER-ELSE:
        lvalue w
        rvalue x
        =
        goto WHILE-START
label AFTER-WHILE:
        lvalue y
        rvalue z
        =

c) three-address code

Answer:

        a = b
        t1 = d * e
        t2 = t1 * f
        t3 = h + i
        t4 = g * t3
        c = t2 - t4
label WHILE-START:
        t5 = k + l
        t6 = t5 + m
        if j ≥ t6 goto AFTER-WHILE
        if n ≥ o goto ELSE-START
        t7 = r * s
        p = q + t7
        goto AFTER-ELSE
label ELSE-START:
        t = u + v
label AFTER-ELSE:
        w = x
        goto WHILE-START
label AFTER-WHILE:
        y = z

Or, if we number the instructions and use those numbers instead of labels:

 (1)    a = b
 (2)    t1 = d * e
 (3)    t2 = t1 * f
 (4)    t3 = h + i
 (5)    t4 = g * t3
 (6)    c = t2 - t4
 (7)    t5 = k + l
 (8)    t6 = t5 + m
 (9)    if j ≥ t6 goto 17
(10)    if n ≥ o goto 14
(11)    t7 = r * s
(12)    p = q + t7
(13)    goto 15
(14)    t = u + v
(15)    w = x
(16)    goto 7
(17)    y = z

Task 2 (5 p)

We compile the program, in the C-like language, from the task above, but when we do that we get some error messages. Some of the errors are because we made mistakes when we typed in the program, such as mistakenly replacing "if" with "while", and some may have other causes.

In which phase of the compiler, or at which other times, are the following errors detected?

Program Error
    a = b;
    c = d * e * f - g * h + i);
    while (j < k + l + m) {
        while (n < o) {
            p = q + r $ s;
        }
        else {
            t = u + v;
        }
        w = x;
    }
    y = z;
variable "a" is set but not used
unexpected end parenthesis


unknown operator "$"

"else" without a previous "if"


variable "x" undeclared


Answer:

  1. machine-independent code optimizer, or semantic analysis
  2. parser
  3. scanner
  4. parser
  5. semantic analysis

Task 3 (5 p)

We want a language that can handle expressions with numbers, addition (using the operator "+"), multiplication (using the operator "*"), and exponentiation (using the operator "^"). For example, an expression can look like 12 + 13 * 14 ^ 15 * 16.

We try to create a grammar for the language, with the following result. The start symbol is S, which represents a complete expression.

S -> S * T | T
T -> T ^ U | U
U -> U + number | number

a) Given this grammar, what is the precedence of the three operators?

Answer:

+ has the highest precedence, followed by ^, and * has the lowest-

b) Given this grammar, what is the associativity of the three operators?

Answer:

They are all left associative. For example, the expression 2+3+4 will be grouped as ((2+3)+4).

c) Using the normal rules for precedence and associativity, the expression
12 + 13 * 14 ^ 15 * 16 would be interpreted as 12 + ((13 * (14 ^ 15)) * 16), or, as a syntax tree:

An (abstract) syntax tree

Draw the syntax tree for this expression when parsed using the grammar above!

Answer:

An (abstract) syntax tree

Written as an expression with parentheses: (((12 + 13) * (14 ^ 15)) * 16)

Scenario for task 4-7

We have a new puppy!

Unfortunately, for a while we have to wash our carpets a lot:

To keep track of our carpets, and when we have washed them, we design a program with a special input language. Here is an example of how this input language will look:

carpet 1 1.2;
carpet 2 2;
cleaned carpet 1 on 2023-10-23;
carpet 3 0.6;
cleaned carpet 2 on 2023-10-23;
cleaned carpet 1 on 2023-10-24;
carpet 4 1.2 * 2.0;
carpet 5 1.2 * 5;
done;

As we can se, there are three commands that can be given: carpet, which defines a carpet with a number and an area, cleaned, which says that we have cleaned a certain carpet on a certain day, and done, which signals the end of the input.

The area of a carpet can be given as a single number, with or without decimals, or as a simple expression using the length and width of the carpet, as shown in the example above.

It should be possible to write the input in free format, so the following input should be equivalent to the one above:

carpet 1 1.2 ; carpet 2 2 ; cleaned carpet 1 on 2023-10-23 ; carpet 3
0.6 ; cleaned carpet 2 on 2023-10-23 ; cleaned carpet 1 on 2023-10-24
; carpet 4 1.2 * 2.0 ; carpet 5 1.2 * 5 ; done ;

Task 4 (4 p)

a) (2p) Which terminals are needed to write a grammar for the input language in the scenario?

Answer:

carpet
cleaned
on
done
semicolon (";")
times ("*")
date
integer (a number without decimals)
noninteger (a number with decimals)

b) (2p) Out of these terminals, some will not have fixed lexemes. Write regular expressions for each such terminal.

Answer:

date: [12][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]
integer: [0-9]+
noninteger: [0-9]+\.[0-9]+

Task 5 (5 p)

Write a grammar for the input language. The start symbol should be input, which represents a complete input as described in the scenario above.

Answer:

input -> commands donecommand
donecommand -> done semicolon
commands -> command commands | ε
command -> carpet integer area semicolon | cleaned carpet integer on date semicolon
area -> number | number times number
number -> integer | noninteger

Task 6 (4 p)

Given your grammar, draw a parse tree (also called a concrete syntax tree) for the following input:

carpet 17 3.0;
carpet 18 3 * 4;
done;

Answer:

A parse tree

Task 7 (8 p)

Write a predictive recursive-descent parser for the input language, in a language that is at least similar to C, C++, C# or Java. You do not have to write exactly correct program code, but it should be clear which procedures exist, how they call each other, and what comparisons with token types are made. You can assume there is a function called scan, which returns the type of the next token, and a function called error, which you can call when something went wrong and which prints an error message and terminates the program.

Answer:

The grammar from the task above has a FIRST()-conflict, so we must first left factor the productions for the non-terminal area, which gives us this new grammar:

input -> commands donecommand
donecommand -> done semicolon
commands -> command commands | ε
command -> carpet integer area semicolon | cleaned carpet integer on date semicolon
area -> number area-rest
area-rest -> times number | ε
number -> integer | noninteger

For testing, you can download carpetparser.c, plus a Lex scanner specification carpets.l, and a Bison version of the grammar carpets.y, and a Makefile.

int lookahead;

void match(int expected) {
    if (lookahead != expected)
        error();
    else
        lookahead = scan();
}

void input(void) {
    commands(); donecommand();
}

void donecommand(void) {
    match(DONE); match(SEMICOLON);
}

void commands(void) {
    if (lookahead == CARPET || lookahead == CLEANED) {
        command(); commands();
    }
    else {
        /* Empty */
    }
}

void command(void) {
    if (lookahead == CARPET) {
        match(CARPET); match(INTEGER); area(); match(SEMICOLON);
    }
    else if (lookahead == CLEANED) {
        match(CLEANED); match(CARPET); match(INTEGER); match(ON); match(DATE); match(SEMICOLON);
    }
    else {
        error();
    }
}

void area(void) {
    number(); area_rest();
}

void area_rest(void) {
    if (lookahead == TIMES) {
        match(TIMES); number();
    }
    else {
        /* Empty */
    }
}

void number(void) {
    if (lookahead == INTEGER) {
        match(INTEGER);
    }
    else if (lookahead == NONINTEGER) {
        match(NONINTEGER);
    }
    else {
        error();
    }
}

void parse() {
    lookahead = scan();
    input();
    printf("Ok.\n");
}

int main() {
    printf("Write a complete input, ending with \"done\", a semicolon and EOF:\n");
    parse();
}


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) November 13, 2023