KOI: Exercise 6: syntax-directed translations, the run-time environment

You can do this exercise in any size group you wish. Don't worry if you find some of the questions difficult to answer, since they may be intended to initiate a discussion rather than have a single correct answer.

Preparations

Before this exercise you should have:

Question 1

Here is a grammar for a simple language for working with sets of numbers. For example, the expression { 1 3 } union { 2 3 5 } union { } should evaluate to { 1 2 3 5 }. The start symbol is expression.

expression -> literal
expression -> expression union expression
literal -> { elementlist }
elementlist -> empty
elementlist -> element elementlist
element -> number

empty should be written using the "e" symbol for an empty string, but it is difficult to write in HTML in a way that is guaranteed to work in all web browsers.

a) Draw a parse tree for the input { 1 2 } union { 3 } union { }. A parse tree has a node for each production used to generate the expression from the start symbol, or, which is the same thing in reverse, used to reduce the expression to the start symbol. This grammar is ambiguous, so there can be several different parse trees for the same input.

b) Add an attribute result to each node in the parse tree, and the value it has at each node.

c) Write a syntax-directed definition for calculating those values. A syntax-directed definition associates a semantic rule with each production in the grammar, stating how to calculate the values of the attributes on the left-hand side of the production from the values of the attributes on the right-hand side of the production. You can use standard set operations such as union.

d) Verify that the calculated values, using the syntax-directed definition, for the parse tree in (a) above agree with what you write in (b).

Question 2

Here is a C program:

#include <stdlib.h>
#include <stdio.h>

int a;
int b;

void f(int c, int d) {
  int b;
  int* p = malloc(sizeof(int));
  a = 10;
  b = 11;
  *p = 12;
  printf("Here!\n");
}

void g(int h) {
  int i;
  int* p = malloc(sizeof(int));
  b = 13;
  i = 14;
  *p = 15;
  f(h, 16);
}

int main(void) {
  int k;
  int m;
  k = 17;
  a = 18;
  b = 19;
  g(k);
  return 0;
}

A program's memory can be divided into four parts: program code and constants, static data, heap and stack.

When the execution of this program prints "Here!", we stop the program and look at its memory. There will be a number of storage spaces for integers, containing some integers between 10 and 19. Draw a picture of what the static data, the stack and the heap contains, with those storage spaces and their contents. Explain what they are, and what they correspond to in the source code. Your explanation should include the terms "static data", "stack", "heap", "activation record" (also called "stack frame" or in Swedish "aktiveringspost"), "variable" and "parameter".

Question 3

C does not have automatic garbage collection. Memory allocated with malloc has to be explicitly freed using calls to free if that memory is to be re-used by the program. On a computer with a normal operating system, such as Linux or Windows, all the program's memory is reclaimed by the operating system after the end of the program's execution.

The program in the question above ends its execution after the call to g in main. If the program instead had continued running, doing something else, at the end of the main function, the memory allocated with malloc in f and g would be lost, until the program ends.

a) Explain why the memory is lost.

b) Explain how automatic garbage collection could have found, and reclaimed, the memory.

Question 4

Here is another C program:

#include <stdio.h>

void h(int a) {
    int b = a + 10;
    printf("Here!\n");
}

void g(int a) {
    int b;
    if (a < 6) {
        g(a + 1);
    }
    else {
        b = a + 2;
        h(b + 3);
        g(a + 1);
    }
}

void f(int a) {
    g(a + 1);
}

int main(void) {
    f(3);
    return 0;
}

When the execution of this program prints "Here!" for the first time, we stop the program and look at its memory, especially the stack. The stack will contain a number of activation records, with some places for storing integers.

a) Draw the stack, with activation records, storage places, and contents. Explain what they correspond to in the source code.

b) What will happen if the program continues running for a while? Why?

c) If you have time: Does the answer to question b above depend on your compiler settings, especially optimization settings?

Question 5

(If you have time!)

Some languages, such as C, Python and Java, only have call by value, and call by reference has to be simulated using pointers. C++ also has call by reference.

We sometimes use "pointer" and "reference" as synonyms, for example when talking about "reference counting". In some other contexts we need to differentiate between them. For example, C++ has both pointers (which are memory addresses of variables) and references (which are aliases for variables). C++ compilers can use pointers to implement references, but not always, and conceptually they are different things.

Here is a C++ program:

int g = 1;

void f(int i, int& r, int* p, int*& rp) {
    i = 2;
    r = 3;
    *p = 4;
    p = &g;
    *rp = 5;
    rp = &g;
    // Here!
}

int main(void) {
    int a = 6;
    int b = 7;
    int c = 8;
    int* d = &c;
    int* e = &c;
    int& h = c;

    f(a, b, d, e);
}

The parameter i is a normal call-by-value parameter, receiving an integer. r is a reference parameter, to an integer variable. p is call-by-value parameter, even though it receives a pointer. rp is a reference parameter, to a pointer variable.

In the main function, a, b and c are normal integer variables. d and e are pointer variables, and they contain pointers to c. f is a reference variable, which is a unusual feature that C++ has, and it is an alias for the variable c. f has no memory of its own.

We stop the program when execution reaches the comment "Here!". Draw a picture of the stack, showing activation records, variables, parameters, integers and pointers.

As some more or less helpful hints, here are the memory locations of the variables in main from when I ran the program:

a is at  0xf63c
b is at  0xf640
c is at  0xf644
d is at  0xf648
e is at  0xf650
h is at  0xf644
Here are the locations of the parameters in f:
i is at  0xf60c
r is at  0xf640
p is at  0xf600
rp is at 0xf650
Values before the call to f:
a = 6, b = 7, c = 8, d = 0xf644, *d = 8, e = 0xf644, *e = 8, h = 8
Values after the call to f:
a = 6, b = 3, c = 5, d = 0xf644, *d = 5, e = 0x8010, *e = 1, h = 5


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) October 6, 2021