a b a b a b c a b a a a b c b a a b a b c b a a a a b c
b) The problem:
There is a FIRST() conflict between the two productions for non-terminal T:
T -> a b U T -> a a UWhen the parser expects to find input matching a T, and finds the terminal a, it can't decide which branch of the if statement to choose:
if (lookahead == 'a') { match('a'); match('b'); U(); } else if (lookahead == 'a') { match('a'); match('a'); U(); } else { error(); }
c) Transformation:
Left Factoring (in Swedish: vänsterfaktorisering)
d) The old grammar for the non-terminal T:
T -> a b U T -> a a Uis transformed to:
T -> a Rest Rest -> b U | a U
e) New parser:
// Correct parser: goodparser-1.c #include <stdio.h> #include <ctype.h> #include <stdlib.h> int lookahead; void error(void) { printf("Syntax error: unexpected token '%c'\n", lookahead); exit(EXIT_FAILURE); } // A simple scanner. // Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored. int scan(void) { int input; while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF) ; return input; } void match(int expected) { if (lookahead == expected) lookahead = scan(); else error(); } // These are the non-terminals, calling each other recursively void S(void), T(void), U(void), Rest(void); void S(void) { if (lookahead == 'a') { match('a'); match('b'); T(); } else if (lookahead == 'b') { match('b'); match('a'); T(); } else { error(); } } void T(void) { match('a'); Rest(); } void Rest(void) { if (lookahead == 'b') { match('b'); U(); } else if (lookahead == 'a') { match('a'); U(); } else { error(); } } void U(void) { match('a'); match('b'); match('c'); } int main(void) { printf("Badparser 1. Enter input. End with EOF (CTRL-D on Linux).\n"); lookahead = scan(); // To get started, so we have something to compare to S(); if (lookahead != EOF) error(); printf("Done!\n"); return EXIT_SUCCESS; }
a b b c a b b a c a b b a a c a b b a a a c ...Or, with a regular expression: abba*c
b) The problem:
There is left recursion (in Swedish: vänsterrekursion) in this production for the non-terminal T:
T -> T aWhen the parser finds input matching a T, it will then try to parse the right side of the production (T a), which starts with a T, so it will call the function T, which will call itself, and so on. The program will be stuck in an infinite recursion.
void T(void) { if (lookahead == 'b') { T(); match('a'); } ...
c) Transformation:
Elimination of left recursion
d) The old grammar for the non-terminal T:
T -> T a T -> bis transformed to:
T -> b Rest Rest -> a Rest | empty
e) New parser:
// Correct parser: goodparser-2.c #include <stdio.h> #include <ctype.h> #include <stdlib.h> int lookahead; void error(void) { printf("Syntax error: unexpected token '%c'\n", lookahead); exit(EXIT_FAILURE); } // A simple scanner. // Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored. int scan(void) { int input; while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF) ; return input; } void match(int expected) { if (lookahead == expected) lookahead = scan(); else error(); } // These are the non-terminals, calling each other recursively void S(void), T(void), Rest(void); void S(void) { match('a'); match('b'); T(); match('c'); } void T(void) { match('b'); Rest(); } void Rest(void) { if (lookahead == 'a') { match('a'); Rest(); } else { // Any input is OK, since we can fit an empty sequence anywhere } } int main(void) { printf("Badparser 2. Enter input. End with EOF (CTRL-D on Linux).\n"); lookahead = scan(); // To get started, so we have something to compare to S(); if (lookahead != EOF) error(); printf("Done!\n"); return EXIT_SUCCESS; }
a b a b c c
b) The problem:
The grammar is ambiguous (in Swedish: tvetydig). The input a b a b c c can be parsed in two different ways:
c) Yes, the parser below parses the language correctly. There is no problem with the language, which just consists of the only valid input a b a b c c. The problem is with how it is to be matched using the productions in the grammar. Is a b c a T, or is it a U that is, in turn, a T?
d) A new, non-ambiguous, grammar that describes the same language:
S -> a b T c T -> a b cAnother non-ambiguous grammar that also describes the same language:
S -> a b T c T -> U U -> a b cYet another non-ambiguous grammar that describes the same language is this, which is the simplest grammar for the language:
S -> a b a b c c
Follow-up question: With these three different grammars, the same input a b a b c c can be parsed in three different ways. Doesn't that mean that it is ambiguous? Do we still have the same problem that we tried to solve?
e) A new parser that implements the first new grammar given above, but still the same language as before:
// A simple parser: goodparser-3.c #include <stdio.h> #include <ctype.h> #include <stdlib.h> int lookahead; void error(void) { printf("Syntax error: unexpected token '%c'\n", lookahead); exit(EXIT_FAILURE); } // A simple scanner. // Returns one of the tokens 'a', 'b' or 'c', or EOF. Other input is ignored. int scan(void) { int input; while ((input = getchar()) != 'a' && input != 'b' && input != 'c' && input != EOF) ; return input; } void match(int expected) { if (lookahead == expected) lookahead = scan(); else error(); } // These are the non-terminals, calling each other recursively void S(void), T(void); void S(void) { match('a'); match('b'); T(); match('c'); } void T(void) { match('a'); match('b'); match('c'); } int main(void) { printf("Goodparser 3. Enter input. End with EOF (CTRL-D on Linux).\n"); lookahead = scan(); // To get started, so we have something to compare to S(); if (lookahead != EOF) error(); printf("Done!\n"); return EXIT_SUCCESS; }