Örebro University
School of Science and Technology
Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se)






Exam

Compilers and Interpreters

for Dataingenjörsprogrammet, and others

Tuesday October 24, 2023

Exam for:

DT135G Kompilatorer och interpretatorer, provkod A001




Aids: No aids.
Score requirements: Maximum score is 37.
To pass, at least 22 points are required.
Results: Announced no later than 15 working days after the exam.
Return of the exams: Electronically through the student portal "Studenttjänster".
Examiner and teacher on call: Thomas Padron-McCarthy, phone 070-73 47 013.




GOOD LUCK!!

Formulas

1. Eliminating left recursion

A left-recursive grammar can be transformed to a grammar that is not left recursive. Assume that the grammar contains a rule (or, more correctly, two productions) like this:
A -> A x | y
A is a non-terminal, but x and y are any constructions consisting of terminals and non-terminals.

The rule is replaced by the following two rules (or, more correctly, three productions), that describe the same language, but are not left recursive:

A -> y R
R -> x R | empty

2. Left factorization

Assume that the grammar contains this rule (two productions):
A -> x y | x z
A is a non-terminal, but x, y and z are any constructions consisting of terminals and non-terminals.

Replace with these three productions:

A -> x R
R -> y | z

Task 1 (6 p)

Here is a program segment written in a C-like language:
    a = b;
    c = d * e * f - g * (h + i);
    while (j < k + l + m) {
        if (n < o) {
            p = q + r * s;
        }
        else {
            t = u + v;
        }
        w = x;
    }
    y = z;

Translate the program segment to two of the following three types of representations. (Should you answer with all three, the one with the highest points will be discarded.)

a) an abstract syntax tree (by drawing it)

b) postfix code for a stack machine

c) three-address code

Note: Two of the three types, not all three.

Task 2 (5 p)

We compile the program, in the C-like language, from the task above, but when we do that we get some error messages. Some of the errors are because we made mistakes when we typed in the program, such as mistakenly replacing "if" with "while", and some may have other causes.

In which phase of the compiler, or at which other times, are the following errors detected?

Program Error
    a = b;
    c = d * e * f - g * h + i);
    while (j < k + l + m) {
        while (n < o) {
            p = q + r $ s;
        }
        else {
            t = u + v;
        }
        w = x;
    }
    y = z;
variable "a" is set but not used
unexpected end parenthesis


unknown operator "$"

"else" without a previous "if"


variable "x" undeclared


Task 3 (5 p)

We want a language that can handle expressions with numbers, addition (using the operator "+"), multiplication (using the operator "*"), and exponentiation (using the operator "^"). For example, an expression can look like 12 + 13 * 14 ^ 15 * 16.

We try to create a grammar for the language, with the following result. The start symbol is S, which represents a complete expression.

S -> S * T | T
T -> T ^ U | U
U -> U + number | number

a) Given this grammar, what is the precedence of the three operators?

b) Given this grammar, what is the associativity of the three operators?

c) Using the normal rules for precedence and associativity, the expression
12 + 13 * 14 ^ 15 * 16 would be interpreted as 12 + ((13 * (14 ^ 15)) * 16), or, as a syntax tree:

Draw the syntax tree for this expression when parsed using the grammar above!

Scenario for task 4-7

We have a new puppy!

Unfortunately, for a while we have to wash our carpets a lot:

To keep track of our carpets, and when we have washed them, we design a program with a special input language. Here is an example of how this input language will look:

carpet 1 1.2;
carpet 2 2;
cleaned carpet 1 on 2023-10-23;
carpet 3 0.6;
cleaned carpet 2 on 2023-10-23;
cleaned carpet 1 on 2023-10-24;
carpet 4 1.2 * 2.0;
carpet 5 1.2 * 5;
done;

As we can se, there are three commands that can be given: carpet, which defines a carpet with a number and an area, cleaned, which says that we have cleaned a certain carpet on a certain day, and done, which signals the end of the input.

The area of a carpet can be given as a single number, with or without decimals, or as a simple expression using the length and width of the carpet, as shown in the example above.

It should be possible to write the input in free format, so the following input should be equivalent to the one above:

carpet 1 1.2 ; carpet 2 2 ; cleaned carpet 1 on 2023-10-23 ; carpet 3
0.6 ; cleaned carpet 2 on 2023-10-23 ; cleaned carpet 1 on 2023-10-24
; carpet 4 1.2 * 2.0 ; carpet 5 1.2 * 5 ; done ;

Task 4 (4 p)

a) (2p) Which terminals are needed to write a grammar for the input language in the scenario?

b) (2p) Out of these terminals, some will not have fixed lexemes. Write regular expressions for each such terminal.

Task 5 (5 p)

Write a grammar for the input language. The start symbol should be input, which represents a complete input as described in the scenario above.

Task 6 (4 p)

Given your grammar, draw a parse tree (also called a concrete syntax tree) for the following input:

carpet 17 3.0;
carpet 18 3 * 4;
done;

Task 7 (8 p)

Write a predictive recursive-descent parser for the input language, in a language that is at least similar to C, C++, C# or Java. You do not have to write exactly correct program code, but it should be clear which procedures exist, how they call each other, and what comparisons with token types are made. You can assume there is a function called scan, which returns the type of the next token, and a function called error, which you can call when something went wrong and which prints an error message and terminates the program.