Kompilatorer och interpretatorer: Lecture 4

Note: This is an outline of what I intend to say on the lecture. It is not a definition of the course content, and it does not replace the textbook.

Today: Creating a parser with Yacc.
Aho et al, section 4.9. KP p 77-84. Thomas Niemann: A Compact Guide to Lex & Yacc (only the Yacc parts).

Left factoring

(ASU 4.3, "Writing a grammar", p 178-179. Swedish: vänsterfaktorisering.)

"FIRST()-conflicts" in a recursive-descent parser:

statement -> assignment | expr
assignment -> id = expr     <-- Always starts with id
expr -> term moreterms      <-- Can start with id
...

Found	What is next?	That would mean "a" is the start of
a	= 5 ... + 5 ...	An assignment An expression

Can be handled by two-token lookahead instead of just one-token.

Found	What is next?	That would mean "a[3+4+z]" is the start of
a[3+4+z]	= 5 ... + 5 ...	An assignment An expression

We may need any number of tokens lookahead.
Instead: Try one of the rules. Backtrack if it (finally) fails, and try the other one.

Found	What is next?	That would mean "b[5+6]" is the start of
a[b[5+6]	= 5 ... + 5 ...	An assignment An expression

Now we have a tree of choices. Backtrack to the previous "choice-point".

Another example: if statement with optional else part

stmt ->   if ( expr ) stmt
        | if ( expr ) stmt else stmt

Remember the trick for removing left recursion? A similar one can be used here, called left factoring:

stmt ->   if ( expr ) stmt optional-else
optional-else ->   else stmt | empty

How to left-factor. Rewrite these two (yes, two) productions:

A -> x y | x z

into these three (yes, three) productions:

A -> x R
R -> y | z

To build a predictive recursive-descent parser, first rewrite your grammar:

Un-ambiguous
No left recursion
"FIRST()-disjointness"

Can be difficult, so for more complicated parsers, we use tools such as Yacc.

But for simple grammars, like a calculator, or a search field. (Could be difficult to plug in Yacc into your ASP page or your Perl script.)

Search phrase	SQL
cinema	select d.Nr from Document d, Occurs o, Word w where d.Nr = o.Document and o.Word = w.Number and w.Text = 'cinema'
cinema Örebro	select Document.Nr from Document d, Occurs o1, Occurs o2, Word w1, Word w2 where (d.Nr = o1.Document and o1.Word = w1.Number and w1.Text = 'cinema') and (d.Nr = o2.Document and o2.Word = w2.Number and w2.Text = Örebro')
(cinema or museum) and Örebro	select Document.Nr from Document d, Occurs o1, Occurs o2, Occurs o3, Word w1, Word w2, Word w3 where ((d.Nr = o1.Document and o1.Word = w1.Number and w1.Text = 'cinema') or (d.Nr = o2.Document and o2.Word = w2.Number and w2.Text = Örebro')) and (d.Nr = o3.Document and o3.Word = w3.Number and w3.Text = Örebro')

4.9 Parser generators

Yacc.

First: How to do bottom-up parsing.

Derivation, deriving = generating a token string from the grammar. Reducing = finding which rules to apply to reduce the token string to the start symbol.

This example adapted from Thomas Niemann:

Ambiguous grammar:	Could be transformed to:
E -> E + E (rule 1) E -> E * E (rule 2) E -> id (rule 3)	E -> E + T \| T T -> T * F \| F F -> id

(We use the ambiguous grammar.)

Generate, or derive, the string "x + y + z" from the start symbol E, by applying the rules 1-3:

Expression	And now use rule...
E	2
E * E	3
E * z	1
E + E * z	3
E + y * z	3
x + y * z

Note: (x + y) * z

Reduce the string "x + y + z" to the start symbol E: by applying the rules 1-3:

Stack	Input left	Action...	Conflicts
	x + y * z	shift
x	+ y * z	reduce with r3
E	+ y * z	shift
E +	y * z	shift
E + y	* z	reduce with r3
E + E	* z	shift	shift/reduce (r1)
E + E *	z	shift
E + E * z		reduce with r3
E + E * E		reduce with r2	reduce/reduce (r1 and r2)
E + E		reduce with r1
E		(done!)

Note: x + (y * z)

Shift-reduce conflict

stmt ->   if ( expr ) stmt
        | if ( expr ) stmt else stmt

if (x < 2) foo(); // no problem
if (x < 2) foo(); else fum(); // still no problem
if (x < 2) if (y > 3) zip(); else boom(); // shift/reduce

Yacc always shifts. (Yacc gives a warning when it compiles the grammar.)

Reduce-reduce conflict

Simple example:

E -> T
T -> id
E -> id

String: "x". x is a T is an E, or x is an E.

Real example (ASU page 252):

Rule	Example	Output
text -> text sup text	`x sup y`	x^y
text -> text sub text	`x sub 3`	x₃
text -> text sub text sup text	`x sub 3 sup y`	y x 3

Third rule just to make the combination of sub and sup look good.

Yacc reduces according to the first of the two (or more) conflicting productions in the file. (Yacc gives a warning when it compiles the grammar.)

Ok, now: Yacc

yacc
language.y
y.tab.c
cc
y.tab.o
y.tab.h
lex

Example Yacc input file (download):

%{
  #include "global.h"
  extern int tokenval;
  extern void yyerror(char*);
%}

%token DONE ID NUM DIV MOD

%%

start: list DONE

list: expr ';' list
        | /* empty */
        ;

expr: expr '+' term { printf("+"); }
       | term
       ;

term: term '*' factor { printf("*"); }
       | term MOD factor { printf("MOD"); }
       | factor
       ;

factor: '(' expr ')'
       | ID { printf("%s", symtable[tokenval].lexptr); }
       | NUM { printf("%d", tokenval); }
       ;

%%

void yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
}

int yylex(void) {
  return lexan();
}

void parse() {
  yyparse();
}

More Yacc:

yylval

factor : '(' expr ')' { $$ = $2; }
expr : expr + expr { $$ = $1 + $3; }

Later:

2.8 Abstract stack machines

Thomas Padron-McCarthy (Thomas.Padron-McCarthy@tech.oru.se) February 2, 2003