Today:
Creating a parser with Yacc.
Aho et al, section 4.9.
KP p 77-84.
Thomas Niemann:
A Compact Guide to Lex & Yacc
(only the Yacc parts).
"FIRST()-conflicts" in a recursive-descent parser:
statement -> assignment | expr assignment -> id = expr <-- Always starts with id expr -> term moreterms <-- Can start with id ...
Found | What is next? | That would mean "a" is the start of |
---|---|---|
a | = 5 ... + 5 ... | An assignment An expression |
Can be handled by two-token lookahead instead of just one-token.
Found | What is next? | That would mean "a[3+4+z]" is the start of |
---|---|---|
a[3+4+z] | = 5 ... + 5 ... | An assignment An expression |
We may need any number of tokens lookahead.
Instead: Try one of the rules. Backtrack if it (finally) fails, and try the other one.
Found | What is next? | That would mean "b[5+6]" is the start of |
---|---|---|
a[b[5+6] | = 5 ... + 5 ... | An assignment An expression |
Now we have a tree of choices. Backtrack to the previous "choice-point".
Another example: if statement with optional else part
Remember the trick for removing left recursion? A similar one can be used here, calledstmt -> if ( expr ) stmt | if ( expr ) stmt else stmt
How to left-factor. Rewrite these two (yes, two) productions:stmt -> if ( expr ) stmt optional-else optional-else -> else stmt | empty
into these three (yes, three) productions:A -> x y | x z
To build a predictive recursive-descent parser, first rewrite your grammar:A -> x R R -> y | z
But for simple grammars, like a calculator, or a search field. (Could be difficult to plug in Yacc into your ASP page or your Perl script.)
Search phrase | SQL |
---|---|
cinema |
select d.Nr
from Document d, Occurs o, Word w where d.Nr = o.Document and o.Word = w.Number and w.Text = 'cinema' |
cinema Örebro |
select Document.Nr
from Document d, Occurs o1, Occurs o2, Word w1, Word w2 where (d.Nr = o1.Document and o1.Word = w1.Number and w1.Text = 'cinema') and (d.Nr = o2.Document and o2.Word = w2.Number and w2.Text = Örebro') |
(cinema or museum) and Örebro |
select Document.Nr
from Document d, Occurs o1, Occurs o2, Occurs o3, Word w1, Word w2, Word w3 where ((d.Nr = o1.Document and o1.Word = w1.Number and w1.Text = 'cinema') or (d.Nr = o2.Document and o2.Word = w2.Number and w2.Text = Örebro')) and (d.Nr = o3.Document and o3.Word = w3.Number and w3.Text = Örebro') |
First: How to do bottom-up parsing.
Derivation, deriving = generating a token string from the grammar. Reducing = finding which rules to apply to reduce the token string to the start symbol.
This example adapted from Thomas Niemann:
Ambiguous grammar: | Could be transformed to: |
---|---|
E -> E + E (rule 1) E -> E * E (rule 2) E -> id (rule 3) |
E -> E + T | T T -> T * F | F F -> id |
(We use the ambiguous grammar.)
Generate, or derive, the string "x + y + z" from the start symbol E, by applying the rules 1-3:
Expression | And now use rule... |
---|---|
E | 2 |
E * E | 3 |
E * z | 1 |
E + E * z | 3 |
E + y * z | 3 |
x + y * z |
Note: (x + y) * z
Reduce the string "x + y + z" to the start symbol E: by applying the rules 1-3:
Stack | Input left | Action... | Conflicts |
---|---|---|---|
x + y * z | shift | ||
x | + y * z | reduce with r3 | |
E | + y * z | shift | |
E + | y * z | shift | |
E + y | * z | reduce with r3 | |
E + E | * z | shift | shift/reduce (r1) |
E + E * | z | shift | |
E + E * z | reduce with r3 | ||
E + E * E | reduce with r2 | reduce/reduce (r1 and r2) | |
E + E | reduce with r1 | ||
E | (done!) |
Note: x + (y * z)
stmt -> if ( expr ) stmt | if ( expr ) stmt else stmt
String: "x". x is a T is an E, or x is an E.E -> T T -> id E -> id
Real example (ASU page 252):
Rule | Example | Output |
---|---|---|
text -> text sup text | x sup y | xy |
text -> text sub text | x sub 3 | x3 |
text -> text sub text sup text | x sub 3 sup y | y x 3 |
Third rule just to make the combination of sub and sup look good.
Yacc reduces according to the first of the two (or more) conflicting productions in the file. (Yacc gives a warning when it compiles the grammar.)
Example Yacc input file (download):More Yacc:
%{ #include "global.h" extern int tokenval; extern void yyerror(char*); %} %token DONE ID NUM DIV MOD %% start: list DONE list: expr ';' list | /* empty */ ; expr: expr '+' term { printf("+"); } | term ; term: term '*' factor { printf("*"); } | term MOD factor { printf("MOD"); } | factor ; factor: '(' expr ')' | ID { printf("%s", symtable[tokenval].lexptr); } | NUM { printf("%d", tokenval); } ; %% void yyerror(char *s) { fprintf(stderr, "%s\n", s); } int yylex(void) { return lexan(); } void parse() { yyparse(); }
yylval
factor : '(' expr ')' { $$ = $2; } expr : expr + expr { $$ = $1 + $3; }
Later: