KOI: Some solutions to exercise 4

Please note that these are suggested solutions. There can be other solutions that are also correct.

Question 1

a) Language:

a b a b a b c
a b a a a b c
b a a b a b c
b a a a a b c

b) The problem:

There is a FIRST() conflict between the two productions for non-terminal T:

T -> a b U
T -> a a U
When the parser expects to find input matching a T, and finds the terminal a, it can't decide which branch of the if statement to choose:
    if (lookahead == 'a') {
        match('a'); match('b'); U();
    }
    else if (lookahead == 'a') {
        match('a'); match('a'); U();
    }
    else {
        error();
    }

c) Transformation:

Left Factoring (in Swedish: vänsterfaktorisering)

d) The old grammar for the non-terminal T:

T -> a b U
T -> a a U
is transformed to:
T -> a Rest
Rest -> b U | a U

e) New parser:

Download

// Correct parser: goodparser-1.c

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int lookahead;

void error(void) {
    printf("Syntax error: unexpected token '%c'\n", lookahead);
    exit(EXIT_FAILURE);
}

// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
    int input;
    while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
        ;
    return input;
}

void match(int expected) {
    if (lookahead == expected)
        lookahead = scan();
    else
        error();
}

// These are the non-terminals, calling each other recursively
void S(void), T(void), U(void), Rest(void);

void S(void) {
    if (lookahead == 'a') {
        match('a'); match('b'); T();
    }
    else if (lookahead == 'b') {
        match('b'); match('a'); T();
    }
    else {
        error();
    }
}

void T(void) {
    match('a'); Rest();
}

void Rest(void) {
    if (lookahead == 'b') {
        match('b'); U();
    }
    else if (lookahead == 'a') {
        match('a'); U();
    }
    else {
        error();
    }
}

void U(void) {
    match('a'); match('b'); match('c');
}

int main(void) {
    printf("Badparser 1. Enter input. End with EOF (CTRL-D on Linux).\n");
    lookahead = scan(); // To get started, so we have something to compare to
    S();
    if (lookahead != EOF)
        error();
    printf("Done!\n");
    return EXIT_SUCCESS;
}

Question 2

a) Language:

a b b c
a b b a c
a b b a a c
a b b a a a c
...
Or, with a regular expression: abba*c

b) The problem:

There is left recursion (in Swedish: vänsterrekursion) in this production for the non-terminal T:

T -> T a
When the parser finds input matching a T, it will then try to parse the right side of the production (T a), which starts with a T, so it will call the function T, which will call itself, and so on. The program will be stuck in an infinite recursion.
void T(void) {
    if (lookahead == 'b') {
        T(); match('a');
    }
    ...

c) Transformation:

Elimination of left recursion

d) The old grammar for the non-terminal T:

T -> T a
T -> b
is transformed to:
T -> b Rest
Rest -> a Rest | empty

e) New parser:

Download

// Correct parser: goodparser-2.c

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int lookahead;

void error(void) {
    printf("Syntax error: unexpected token '%c'\n", lookahead);
    exit(EXIT_FAILURE);
}

// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
    int input;
    while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
        ;
    return input;
}

void match(int expected) {
    if (lookahead == expected)
        lookahead = scan();
    else
        error();
}

// These are the non-terminals, calling each other recursively
void S(void), T(void), Rest(void);

void S(void) {
    match('a'); match('b'); T(); match('c');
}

void T(void) {
    match('b'); Rest();
}

void Rest(void) {
    if (lookahead == 'a') {
        match('a'); Rest();
    }
    else {
        // Any input is OK, since we can fit an empty sequence anywhere
    }
}

int main(void) {
    printf("Badparser 2. Enter input. End with EOF (CTRL-D on Linux).\n");
    lookahead = scan(); // To get started, so we have something to compare to
    S();
    if (lookahead != EOF)
        error();
    printf("Done!\n");
    return EXIT_SUCCESS;
}

Question 3

a) Language:

a b a b c c

b) The problem:

The grammar is ambiguous (in Swedish: tvetydig). The input a b a b c c can be parsed in two different ways:

One parse tree Another parse tree

c) Yes, the parser below parses the language correctly. There is no problem with the language, which just consists of the only valid input a b a b c c. The problem is with how it is to be matched using the productions in the grammar. Is a b c a T, or is it a U that is, in turn, a T?

d) A new, non-ambiguous, grammar that describes the same language:

S -> a b T c
T -> a b c
Another non-ambiguous grammar that also describes the same language:
S -> a b T c
T -> U
U -> a b c
Yet another non-ambiguous grammar that describes the same language is this, which is the simplest grammar for the language:
S -> a b a b c c

Follow-up question: With these three different grammars, the same input a b a b c c can be parsed in three different ways. Doesn't that mean that it is ambiguous? Do we still have the same problem that we tried to solve?

e) A new parser that implements the first new grammar given above, but still the same language as before:

Download

// A simple parser: goodparser-3.c

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int lookahead;

void error(void) {
    printf("Syntax error: unexpected token '%c'\n", lookahead);
    exit(EXIT_FAILURE);
}

// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
    int input;
    while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
        ;
    return input;
}

void match(int expected) {
    if (lookahead == expected)
        lookahead = scan();
    else
        error();
}

// These are the non-terminals, calling each other recursively
void S(void), T(void);

void S(void) {
    match('a'); match('b'); T(); match('c');
}

void T(void) {
    match('a'); match('b'); match('c');
}

int main(void) {
    printf("Goodparser 3. Enter input. End with EOF (CTRL-D on Linux).\n");
    lookahead = scan(); // To get started, so we have something to compare to
    S();
    if (lookahead != EOF)
        error();
    printf("Done!\n");
    return EXIT_SUCCESS;
}

Question 4

It "describes the same language" in the sense that it accepts the same inputs (in this case just the string a b a b c c) and rejects the same inputs. But the parse trees are different. Compare this to a parser of mathematical expressions that has an ambiguous grammar, and that can parse the input string 3 + 4 * 5 either as 3 + (4 * 5), which when calculated is 23, or as (3 + 4) * 5, which is 35.


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) September 23, 2024