KOI: Exercise 3: The basics of parsing, simple top-down parsers

You can do this exercise in any size group you wish. Don't worry if you find some of the questions difficult to answer, since they may be intended to initiate a discussion rather than have a single correct answer.

Preparations

Before this exercise you should have:

Question 1

This is the program parser-1. It implements a simple parser, with some additional code to make it a complete program.

a) What is the grammar that this parser implements?

b) What is the language that it defines?

(Some helpful hints: It only understands the tokens, or terminals, a, b and c. Finish the input with EOF, which in Linux is entered using CTRL-D on a separate line.a All other input, such as d, is ignored. You can compile and run the program if you want, but you don't have to.)

Download

// A simple parser: parser-1.c

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int lookahead;

void error(void) {
    printf("Syntax error: unexpected token '%c'\n", lookahead);
    exit(EXIT_FAILURE);
}

// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
    int input;
    while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
        ;
    return input;
}

void match(int expected) {
    if (lookahead == expected)
        lookahead = scan();
    else
        error();
}

// These are the non-terminals, calling each other recursively
void S(void), T(void), U(void);

void S(void) {
    T();
}

void T(void) {
    match('a'); match('b'); U();
}

void U(void) {
    match('c');
}

int main(void) {
    printf("Parser 1. Enter input. End with EOF (CTRL-D on Linux).\n");
    lookahead = scan(); // To get started, so we have somthing to compare to
    S();
    if (lookahead != EOF)
        error();
    printf("Done!\n");
    return EXIT_SUCCESS;
}

Question 2

This is the program parser-2. It implements a simple parser, with some additional code to make it a complete program.

a) What is the grammar that this parser implements?

b) What is the language that it defines?

Download

// A simple parser: parser-2.c

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int lookahead;

void error(void) {
    printf("Syntax error: unexpected token '%c'\n", lookahead);
    exit(EXIT_FAILURE);
}

// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
    int input;
    while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
        ;
    return input;
}

void match(int expected) {
    if (lookahead == expected)
        lookahead = scan();
    else
        error();
}

// These are the non-terminals, calling each other recursively
void S(void), T(void), U(void);

void S(void) {
    T(); T(); U();
}

void T(void) {
    match('a'); U();
}

void U(void) {
    match('b'); match('c');
}

int main(void) {
    printf("Parser 2. Enter input. End with EOF (CTRL-D on Linux).\n");
    lookahead = scan(); // To get started, so we have somthing to compare to
    S();
    if (lookahead != EOF)
        error();
    printf("Done!\n");
    return EXIT_SUCCESS;
}

Question 3

This is the program parser-3. It implements a simple parser, with some additional code to make it a complete program.

a) What is the grammar that this parser implements?

b) What is the language that it defines?

Download

// A simple parser: parser-3.c

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int lookahead;

void error(void) {
    printf("Syntax error: unexpected token '%c'\n", lookahead);
    exit(EXIT_FAILURE);
}

// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
    int input;
    while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
        ;
    return input;
}

void match(int expected) {
    if (lookahead == expected)
        lookahead = scan();
    else
        error();
}

// These are the non-terminals, calling each other recursively
void S(void), T(void), U(void), V(void);

void S(void) {
    if (lookahead == 'a') {
        match('a'); T();
    }
    else if (lookahead == 'b') {
        match('b'); T();
    }
    else {
        error();
    }
}

void T(void) {
    if (lookahead == 'a') {
        U(); match('c');
    }
    else if (lookahead == 'b') {
        V(); match('c');
    }
    else {
        error();
    }
}

void U(void) {
    match('a'); match('b');
}

void V(void) {
    match('b'); match('b');
}

int main(void) {
    printf("Parser 3. Enter input. End with EOF (CTRL-D on Linux).\n");
    lookahead = scan(); // To get started, so we have somthing to compare to
    S();
    if (lookahead != EOF)
        error();
    printf("Done!\n");
    return EXIT_SUCCESS;
}

Question 4

This is the program parser-4. It implements a simple parser, with some additional code to make it a complete program.

a) What is the grammar that this parser implements?

b) What is the language that it defines?

Download

// A simple parser: parser-4.c

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int lookahead;

void error(void) {
    printf("Syntax error: unexpected token '%c'\n", lookahead);
    exit(EXIT_FAILURE);
}

// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int scan(void) {
    int input;
    while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
        ;
    return input;
}

void match(int expected) {
    if (lookahead == expected)
        lookahead = scan();
    else
        error();
}

// These are the non-terminals, calling each other recursively
void S(void), T(void);

void S(void) {
    if (lookahead == 'a') {
        match('a'); T(); S();
    }
    else if (lookahead == 'b') {
        match('b');
    }
    else {
        error();
    }
}

void T(void) {
    match('a'); match('b'); match('c');
}

int main(void) {
    printf("Parser 4. Enter input. End with EOF (CTRL-D on Linux).\n");
    lookahead = scan(); // To get started, so we have somthing to compare to
    S();
    if (lookahead != EOF)
        error();
    printf("Done!\n");
    return EXIT_SUCCESS;
}

Question 5

A language consists of zero or more instances of ab, followed by ba. Some examples of valid input are ba, abba, ababba and abababba. Some examples of invalid input are ab, aa and the empty input.

a) Write a grammar for this language.

b) Write a parser that implements this grammar.


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) September 19, 2021