Please note that these are suggested solutions.
There can be other solutions that are also correct.
Some of these solutions may be more comprehensive than required for full points on the exam,
or they might only refer to where the answers can be found.
In addition, it has occurred in the history of the world that teachers have made errors in the suggested solutions.
I myself have, of course, never made any such errors, but others have.
Does anyone really read introductions like this one?
I don't know. It may be so. Rhubarb rhubarb rhubarb.
I have heard that the extras on a movie set are told to say that
to simulate background noise in a restaurant or similar place.
Below are the suggested solutions.
Task 1 (10 p)
A compiler's work is usually divided into a number of phases.
Which are those phases?
Explain shortly what each phase does.
What is the input and the output of each phase?
- Source code in the form of text, i. e. a sequence of characters, is read by
- (phase 1) the lexical analyser ("scanner"), which divides the program's source code text into tokens, and generates
- A sequence of tokens, with pairs of the token type and the lexical value, passed on to
- (phase 2) the syntactic analyser ("parser"), which analyzes the program (in the form of tokens) according to the source language grammar, and generates
- A syntax tree (which does not need to be explicitly constructed), which is passed on to
- (phase 3) the semantic analyser, which analyzes the meaning of the program, mainly with regard to data types and type correctness (type control), and has as output
- The same syntax tree, perhaps decorated with (mostly) data types, which is then sent to
- (phase 4) the intermediate code generator, which of course generates
- Intermediate code, not as text but for example as three-address code in the form of quadruples, which is sent to
- (phase 5) the machine-independent code optimizer, which makes the program (in the form of intermediate code) smaller and faster (or one of these), and thus has as output
- The same type of intermediate code, but smaller and/or faster, which is passed on to
- (phase 6) the Code Generator, which generates
- Target code written in the target language (e. g. assembly code), sometimes as text, which is passed on to
- (phase 7) the machine-dependent code optimizer, which makes optimizations that can only be done at the target code level, for example merging assembly instructions, and thus has as output
- The same type of target code, but smaller and faster
See also the book or the
lecture notes,
especially
this figure,
with the addition of the machine-dependent optimizer.
Task 2 (4 p)
In the following program, there are eight errors and warnings that will be reported,
as shown by the comments in italics.
Some of these errors and warnings may be detected in one of the compiler's phases,
and some may be detected at other times.
When will each of the errors and warnings be detected?
#include <stdio.h>
int main(void) {
prontf("Starting...\n"); // 1: undefined reference to `prontf'
int a = 17zzz; // 2: invalid suffix on integer constant
int b = ; // 3: expected expression before ';' token
int c = 0;
if (c == 0) {
int d = a / c; // 4: math exception
int e, f, g;
int h // 5: expected '=', ',' or ';' before 'e'
e = 17; // 6: variable 'e' set but not used
f = g; // 7: 'g' may be used uninitialized
printf("d = %d, f = %d\n", d, f);
}
return "Done!"; // 8: return makes integer from pointer
}
- 1: The linker (but could reasonably also be in the semantic analysis phase)
- 2: The scanner
- 3: The parser
- 4: At run-time (but could reasonably also be in the machine-independent optimizer)
- 5: The parser
- 6: Most likely in the machine-independent optimizer
- 7: Most likely in the machine-independent optimizer
- 8: Semantic analysis
Task 3 (6 p)
This is a program segment written in a C-like language:
x = 0;
y = z * 2 - t - 2 * 2;
while (x < y) {
if (x < 5) {
x = x + 1;
}
else {
x = x + 2;
y = y + 1;
}
}
a)
Translate the program segment to
an abstract syntax tree (by drawing it).
b)
Translate the program segment to
either
postfix code for a stack machine
or
three-address code. (Not both!)
Postfix code for a stack machine:
lvalue x
push 0
=
lvalue y
rvalue z
push 2
*
rvalue t
-
push 2
push 2
*
-
=
label WHILE-START:
rvalue x
rvalue y
<
gofalse AFTER-WHILE
rvalue x
push 5
<
gofalse ELSE-START
lvalue x
rvalue x
push 1
+
=
goto AFTER-IF
label ELSE-START:
lvalue x
rvalue x
push 2
+
=
lvalue y
rvalue y
push 1
+
=
label AFTER-IF:
goto WHILE-START
label AFTER-WHILE:
Three-address code:
x = 0
temp1 = z * 2
temp2 = temp1 - t
temp3 = 2 * 2
y = temp2 - temp3
label WHILE-START:
if (x >= y) goto AFTER-WHILE
if (x >= 5) goto ELSE-START
x = x + 1
goto AFTER-IF
label ELSE-START:
x = x + 2
y = y + 1
label AFTER-IF:
goto WHILE-START
label AFTER-WHILE:
Or, if we number the instructions and use those numbers instead of labels:
(1) x = 0
(2) temp1 = z * 2
(3) temp2 = temp1 - t
(4) temp3 = 2 * 2
(5) y = temp2 - temp3
(6) if (x >= y) goto 13
(7) if (x >= 5) goto 10
(8) x = x + 1
(9) goto 12
(10) x = x + 2
(11) y = y + 1
(12) goto 6
(13)
Task 4 (4 p)
a) (2p)
Which terminals
are needed to write a grammar for the animal language?
Name (of a species), number (of observed animals),
and the keywords species, declarations, done, observation and observations.
b) (2p)
Out of the terminals, some will not have fixed lexemes.
Write regular expressions for each such terminal.
name: [a-z]+
number: [0-9]+
Task 5 (4 p)
Write a grammar for the animal language.
The start symbol should be input,
which represents a complete input according to the scenario above.
input -> manydeclarations declarations done manyobservations observations done
manydeclarations -> onedeclaration manydeclarations | ε
onedeclaration -> species name
manyobservations -> oneobservation manyobservations | ε
oneobservation -> observation name number
Task 6 (8 p)
Write a predictive recursive-descent parser for the animal language,
in a language that is at least similar to C, C++, C# or Java.
You do not have to write exactly correct program code,
but it should be clear which procedures exist,
how they call each other,
and what comparisons with token types are made.
You can assume there is a function called scan,
which returns the type of the next token,
and a function called error,
which you can call when something went wrong
and which prints an error message and terminates the program.
Download:
animalsparser.c,
plus a Lex-scannerspecifikation animals.l,
and a Bison version of the grammar animals.y,
and a Makefile.
Since the grammar from the solution to the task above
is already suitable for constructing a predictive recursive-descent parser,
we can use it directly.
Otherwise, we would have had to make some transformations.
int lookahead;
void match(int expected) {
if (lookahead != expected)
error();
else
lookahead = scan();
}
void input(void) {
manydeclarations(); match(DECLARATIONS); match(DONE); manyobservations(); match(OBSERVATIONS); match(DONE);
}
void manydeclarations(void) {
if (lookahead == SPECIES) {
onedeclaration(); manydeclarations();
}
else {
/* Empty */
}
}
void onedeclaration(void) {
match(SPECIES); match(NAME);
}
void manyobservations(void) {
if (lookahead == OBSERVATION) {
oneobservation(); manyobservations();
}
else {
/* Empty */
}
}
void oneobservation(void) {
match(OBSERVATION); match(NAME); match(NUMBER);
}
void parse() {
lookahead = scan();
input();
printf("Ok.\n");
}
Thomas Padron-McCarthy
(thomas.padron-mccarthy@oru.se)
November 2, 2019