KOI: Exercise 5: Bottom-up parsers, Bison

You can do this exercise in any size group you wish. Don't worry if you find some of the questions difficult to answer, since they may be intended to initiate a discussion rather than have a single correct answer.

Preparations

Before this exercise you should have:

Question 1

Here is a grammar. There are three terminals: a, b and c. There are three non-terminals: S, T and U. The start symbol is S.

S -> a T | a b T
T -> T a | c | U
U -> c

Below is the file bisongrammar-1.y, which is an input file for Bison (or Yacc). It is an attempt to use Bison to specify a parser that implements the grammar above.

You can run it through bison with the command bison bisongrammar-1.y, which creates the C file bisongrammar-1.tab.c. This file can then be compiled as a normal C program.

a) What is the language that this grammar defines?

b) This grammar has each of the three problems that we saw in exercise 4. Identify those problems in the grammar!

c) Instead of a hand-coded top-down parser, as those we saw in exercise 4, we are now using Bison to create a bottom-up parser, with the Bison specification file given below. Which of those problems are still problems, with the Bison-generated parser? Which problems are no longer an issue?

Download

// bisongrammar-1.y

%{
  #include <stdio.h>
  #include <ctype.h>
  #include <stdlib.h>
  extern int yylex(void);
  extern void yyerror(const char *message);
%}

%define parse.error verbose

%%

S : 'a' T  | 'a' 'b' T ;
T : T 'a' | 'c' | U ;
U : 'c' ;

%%

void yyerror(const char *message) {
    printf("The Bison-generated parser found an error: %s\n", message);
    exit(EXIT_FAILURE);
}

// A simple scanner.
// Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored.
int yylex(void) {
    int input;
    while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF)
        ;
    return input;
}

int main(void) {
    printf("Bisongrammar-1. Enter input. End with EOF (CTRL-D on Linux).\n");
    yyparse();
    printf("Done!\n");
    return EXIT_SUCCESS;
}

Question 2

Here is a grammar. There are five terminals: a, b, c, d and e. There are three non-terminals: S, T and U. The start symbol is S.

S -> T c U
T -> a b
U -> d e

The language that this grammar defines consists of a single valid input: a b c d e

a) Show how a top-down parser of the type we have seen previously parses the input a b c d e and constructs the parse tree for it. In which order are the three productions applied?

b) Show how a bottom-up parser with a stack parses the input a b c d e and constructs the parse tree for it. In which order are the three productions applied?

Question 3

a) What do we mean by a top-down parser?

b) What do we mean by a bottom-up parser?

c) What do we mean by a recursive-descent parser?

d) What do we mean by a predictive parser?

Question 4

This grammar is similar to the one in question 1, but with the ambiguity removed:

S -> a T | a b T
T -> c | d

Show how a bottom-up parser with a stack parses the input a b c. Explain how the FIRST() conflict is handled, or in other words how the parser decides which production for S to use.

Question 5

Another grammar:

S -> a T | b U
T -> c
U -> c

Show how a bottom-up parser with a stack parses the input b c. How can it decide whether to reduce c to T or U? (Don't go into details, just explain the general principle.)

Question 6

Here are some terms that are often used in more theoretical works about parsing. What do they mean?

a) LL
b) LL(k)
c) LL(1)
d) LR
e) LR(k)
f) LR(1)

There are some solutions to some of the questions, but try to solve them yourself first.


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) September 23, 2024