KOI: Some solutions to exercise 4

Please note that these are suggested solutions. There can be other solutions that are also correct.

Question 1

a) Language:

a b a b a b c
a b a a a b c
b a a b a b c
b a a a a b c

b) The problem:

There is a FIRST() conflict between the two productions for non-terminal T:

T -> a b U
T -> a a U
When the parser expects to find input matching a T, and finds the terminal a, it can't decide which branch of the if statement to chose:
    if (lookahead == 'a') {
        match('a'); match('b'); U();
    }
    else if (lookahead == 'a') {
        match('a'); match('a'); U();
    }
    else {
        error();
    }

c) Transformation:

Left Factoring (in Swedish: vänsterfaktorisering)

d) The old grammar for the non-terminal T:

T -> a b U
T -> a a U
is transformed to:
T -> a Rest
Rest -> b U | a U

e) New parser:

Download

// Correct parser: goodparser-1.c

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int lookahead;

void error(void) {
    printf("Syntax error: unexpected token '%c'\n", lookahead);
    exit(EXIT_FAILURE);
}

// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
    int input;
    while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
        ;
    return input;
}

void match(int expected) {
    if (lookahead == expected)
        lookahead = scan();
    else
        error();
}

// These are the non-terminals, calling each other recursively
void S(void), T(void), U(void), Rest(void);

void S(void) {
    if (lookahead == 'a') {
        match('a'); match('b'); T();
    }
    else if (lookahead == 'b') {
        match('b'); match('a'); T();
    }
    else {
        error();
    }
}

void T(void) {
    match('a'); Rest();
}

void Rest(void) {
    if (lookahead == 'b') {
        match('b'); U();
    }
    else if (lookahead == 'a') {
        match('a'); U();
    }
    else {
        error();
    }
}

void U(void) {
    match('a'); match('b'); match('c');
}

int main(void) {
    printf("Badparser 1. Enter input. End with EOF (CTRL-D on Linux).\n");
    lookahead = scan(); // To get started, so we have something to compare to
    S();
    if (lookahead != EOF)
        error();
    printf("Done!\n");
    return EXIT_SUCCESS;
}

Question 2

a) Language:

a b b c
a b b a c
a b b a a c
a b b a a a c
...
Or, with a regular expression: abba*c

b) The problem:

There is left recursion (in Swedish: vänsterrekursion) in this production for the non-terminal T:

T -> T a
When the parser finds input matching a T, it will then try to parse the right side of the production (T a), which starts with a T, so it will call the function T, which will call itself, and so on. The program will be stuck in an infinite recursion.
void T(void) {
    if (lookahead == 'b') {
        T(); match('a');
    }
    ...

c) Transformation:

Elimination of left recursion

d) The old grammar for the non-terminal T:

T -> T a
T -> b
is transformed to:
T -> b Rest
Rest -> a Rest | empty

e) New parser:

Download

// Correct parser: goodparser-2.c

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int lookahead;

void error(void) {
    printf("Syntax error: unexpected token '%c'\n", lookahead);
    exit(EXIT_FAILURE);
}

// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
    int input;
    while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
        ;
    return input;
}

void match(int expected) {
    if (lookahead == expected)
        lookahead = scan();
    else
        error();
}

// These are the non-terminals, calling each other recursively
void S(void), T(void), Rest(void);

void S(void) {
    match('a'); match('b'); T(); match('c');
}

void T(void) {
    match('b'); Rest();
}

void Rest(void) {
    if (lookahead == 'a') {
        match('a'); Rest();
    }
    else {
        // Any input is OK, since we can fit an empty sequence anywhere
    }
}

int main(void) {
    printf("Badparser 2. Enter input. End with EOF (CTRL-D on Linux).\n");
    lookahead = scan(); // To get started, so we have something to compare to
    S();
    if (lookahead != EOF)
        error();
    printf("Done!\n");
    return EXIT_SUCCESS;
}

Question 3

a) Language:

a b a b c c

b) The problem:

The grammar is ambiguous (in Swedish: tvetydig). The input a b a b c c can be parsed in two different ways:

One parse tree Another parse tree

c) Yes, the parser parser below parses the language correctly. There is no problem with the language, which just consists of the only valid input a b a b c c. The problem is with how it is to be matched using the productions in the grammar. Is a b c a T, or is it a U that is, in turn, a T?

d) A new, non-ambiguous, grammar that describes the same language:

S -> a b T c
T -> a b c
Another non-ambiguous grammar that also describes the same language:
S -> a b T c
T -> U
U -> a b c
Yet another non-ambiguous grammar that describes the same language is this, which is the simplest grammar for the language:
S -> a b a b c c

Follow-up question: With these three different grammars, the same input a b a b c c can be parsed in three different ways. Doesn't that mean that it is ambiguous? Do we still have the same problem that we tried to solve?

e) A new parser that implements the first new grammar given above, but still the same language as before:

Download

// A simple parser: goodparser-3.c

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int lookahead;

void error(void) {
    printf("Syntax error: unexpected token '%c'\n", lookahead);
    exit(EXIT_FAILURE);
}

// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
    int input;
    while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
        ;
    return input;
}

void match(int expected) {
    if (lookahead == expected)
        lookahead = scan();
    else
        error();
}

// These are the non-terminals, calling each other recursively
void S(void), T(void);

void S(void) {
    match('a'); match('b'); T(); match('c');
}

void T(void) {
    match('a'); match('b'); match('c');
}

int main(void) {
    printf("Goodparser 3. Enter input. End with EOF (CTRL-D on Linux).\n");
    lookahead = scan(); // To get started, so we have something to compare to
    S();
    if (lookahead != EOF)
        error();
    printf("Done!\n");
    return EXIT_SUCCESS;
}

Question 4

...


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) September 14, 2023