Today:
More about Yacc. Building parse trees.
Aho et al, section 4.9.
KP p 77-84.
Thomas Niemann:
A Compact Guide to Lex & Yacc
(only the Yacc parts).
Some repetition from last time:
    %{
    #include "global.h"
    extern int tokenval;
    extern void yyerror(char*);
    %}

    %token DONE ID NUM DIV MOD

    %%

    start: list DONE
         ;

    list: expr ';' list
        | /* empty */
        ;

    expr: expr '+' term { printf("+"); }
        | term
        ;

    term: term '*' factor { printf("*"); }
        | term MOD factor { printf("MOD"); }
        | factor
        ;

    factor: '(' expr ')'
          | ID { printf("%s", symtable[tokenval].lexptr); }
          | NUM { printf("%d", tokenval); }
          ;

    %%

    void yyerror(char *s) {
        fprintf(stderr, "%s\n", s);
    }

    int yylex(void) {
        return lexan();
    }

    void parse() {
        yyparse();
    }
A complete program. bison + cc. (Or: bison + g++)
    %{
    #include <stdlib.h> /* Required to compile with C++ */
    #include <stdio.h>
    #include <ctype.h>
    extern int yyparse(); /* Required to compile with C++ */
    extern void yyerror(char*); /* Required to compile with C++ */
    extern int yylex(void); /* Required to compile with C++ */
    %}

    %token DIGIT

    %%

    line: expr '\n' { printf("%d\n", $1); }
        ;

    expr: expr '+' term { $$ = $1 + $3; }
        | term
        ;

    term: term '*' factor { $$ = $1 * $3; }
        | factor
        ;

    factor: '(' expr ')' { $$ = $2; }
          | DIGIT
          ;

    %%

    int yylex(void) {
        int c;
        c = getchar();
        if (isdigit(c)) {
            yylval = c - '0';
            return DIGIT;
        }
        return c;
    }

    void yyerror(char *s) {
        fprintf(stderr, "%s\n", s);
    }

    int main() {
        yyparse();
        return 0;
    }
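yylval:
yylex stores the semantic value of the token in yylval. In the actions, the values of the symbols on the right-hand side are $1, $2, ..., and the value of the left-hand side is $$.
"{ $$ = $1; }" is the default action.

    factor : '(' expr ')' { $$ = $2; }
    expr : expr '+' expr { $$ = $1 + $3; }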
Declaring precedence and associativity:

    %token NUMBER
    %left '+' '-'
    %left '*' '/'
    %right UMINUS
    ...
    expr : expr '+' expr { $$ = $1 + $3; }
         | expr '-' expr { $$ = $1 - $3; }
         | expr '*' expr { $$ = $1 * $3; }
         | expr '/' expr { $$ = $1 / $3; }
         | '(' expr ')' { $$ = $2; }
         | '-' expr %prec UMINUS { $$ = -$2; }

Declarations further down bind tighter: '*' and '/' have higher precedence than '+' and '-', and UMINUS has the highest.
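To see what the precedence declarations above mean for grouping, here is a small hand-written sketch. It does not show how yacc works internally: it uses precedence climbing, a different technique, with the same table ('+' and '-' at level 1, '*' and '/' at level 2, unary minus at level 3, binary operators left-associative). The names prec, parse_expr, parse_primary and eval_string are made up for this example, and the input is assumed to contain no whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Input cursor (a global keeps the sketch short). */
const char *p;

#define UMINUS_PREC 3   /* higher than any binary operator */

/* Precedence of a binary operator; 0 means "not an operator". */
int prec(int op)
{
    switch (op) {
    case '+': case '-': return 1;   /* %left '+' '-' */
    case '*': case '/': return 2;   /* %left '*' '/' */
    default:            return 0;
    }
}

int parse_expr(int min_prec);

/* A number, a parenthesized expression, or a unary minus. */
int parse_primary(void)
{
    int v = 0;
    if (*p == '(') {
        p++;
        v = parse_expr(1);
        if (*p == ')')
            p++;
        return v;
    }
    if (*p == '-') {
        p++;
        /* No binary operator reaches level 3, so only the operand is negated. */
        return -parse_expr(UMINUS_PREC);
    }
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

/* Parse operators whose precedence is at least min_prec. */
int parse_expr(int min_prec)
{
    int lhs = parse_primary();
    while (prec(*p) >= min_prec) {
        int op = *p++;
        /* prec(op) + 1 makes the operator left-associative. */
        int rhs = parse_expr(prec(op) + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs = (rhs != 0) ? lhs / rhs : 0; break; /* sketch: 0 on division by zero */
        }
    }
    return lhs;
}

int eval_string(const char *s)
{
    assert(s != NULL);
    p = s;
    return parse_expr(1);
}
```

For example, eval_string("8-3-2") groups as (8-3)-2 because '-' is left-associative, and eval_string("-2*3") groups as (-2)*3 because the unary minus binds tighter than '*'.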
Some of the following examples are adapted from Thomas Niemann: A Compact Guide to Lex & Yacc.
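Left recursion:

    list: item
        | list ',' item
        ;

Right recursion:

    list: item
        | item ',' list
        ;

Both work in Yacc, but prefer left recursion. With right recursion, every item is pushed on the parser stack before the first reduction, so a long list can overflow the stack.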
The dangling else:

    stmt: IF expr stmt
        | IF expr stmt ELSE stmt
        | ...

Yacc does the right thing (it shifts, so the ELSE belongs to the nearest IF), but gives a warning about a shift/reduce conflict. Use precedence to avoid the warning:

    %nonassoc IFX
    %nonassoc ELSE

    stmt: IF expr stmt %prec IFX
        | IF expr stmt ELSE stmt
        | ...
Error messages and error recovery:

    void yyerror(char *s) {
        fprintf(stderr, "line %d: %s\n", yylineno, s);
    }

    stmt: ';'
        | expr ';'
        | PRINT expr ';'
        | VARIABLE '=' expr ';'
        | WHILE '(' expr ')' stmt
        | IF '(' expr ')' stmt %prec IFX
        | IF '(' expr ')' stmt ELSE stmt
        | '{' stmt_list '}'
        | error ';'
        | error '}'
        ;

If nothing else matches, the special token error matches, and the parser skips input until the first of (in this case) a semicolon or a right curly bracket. Parsing then continues with the next statement.
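The effect of the two error rules above is roughly what is usually called panic-mode recovery: throw away input up to a synchronizing token and continue from there. A minimal hand-written sketch, working on characters instead of tokens. The function skip_to_sync is hypothetical, it is not part of Yacc, and it ignores the parser stack that Yacc also has to unwind.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Skip input until just after the first ';' or '}' (the tokens that
 * follow "error" in the stmt rules), or until the end of the string.
 * Returns a pointer to where parsing would resume.
 */
const char *skip_to_sync(const char *s)
{
    assert(s != NULL);
    while (*s != '\0' && *s != ';' && *s != '}')
        s++;
    if (*s != '\0')
        s++;    /* the synchronizing token itself is consumed too */
    return s;
}
```

With the input "x = = 3; y = 4;" the broken first statement is dropped and the returned pointer points at " y = 4;", so the second statement can still be parsed.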
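Synthesized attributes (the value of a node is computed from the values of its children):

    expr: expr '+' expr { $$ = $1 + $3; };

Inherited attributes ($0 is the value of the symbol just before the current rule on the stack, here type):

    decl: type varlist;

    type: INT
        | FLOAT
        ;

    varlist: VAR { setType($1, $0); }
           | varlist ',' VAR { setType($3, $0); }
           ;

Embedded actions (an action in the middle of a rule counts as a symbol, so item2 is $3):

    list: item1 { do_item1($1); } item2 { do_item2($3); } item3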
Later:
Parse tree nodes.

C:

    #define MAX_ARGS 4

    enum TreeNodeType { IF, WHILE, PLUS, MINUS, TIMES };

    struct TreeNode {
        enum TreeNodeType type;
        struct TreeNode* args[MAX_ARGS];
    };

C++:

    class ParseTreeNode {
        // ...
    };

    class Stmt : public ParseTreeNode {
        // ...
    };

    class If : public Stmt {
        ParseTreeNode* condition;
        ParseTreeNode* then_part;
        ParseTreeNode* else_part;
    };
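As a preview, here is one possible way to build and walk a tree using the C TreeNode idea above. It is only a sketch under some assumptions: the NUM node type and the value field are additions so that leaves can hold a number, IF and WHILE are left out, and the helpers mknode, mkleaf, eval and free_tree are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ARGS 4

/* NUM is an addition for this sketch; IF and WHILE are omitted. */
enum TreeNodeType { NUM, PLUS, MINUS, TIMES };

struct TreeNode {
    enum TreeNodeType type;
    int value;                          /* used only by NUM leaves (assumption) */
    struct TreeNode *args[MAX_ARGS];    /* children, NULL when unused */
};

/* Allocate an inner node with up to two children. */
struct TreeNode *mknode(enum TreeNodeType type,
                        struct TreeNode *left, struct TreeNode *right)
{
    struct TreeNode *n = calloc(1, sizeof *n);  /* all args start as NULL */
    assert(n != NULL);
    n->type = type;
    n->args[0] = left;
    n->args[1] = right;
    return n;
}

/* Allocate a leaf holding a number. */
struct TreeNode *mkleaf(int value)
{
    struct TreeNode *n = mknode(NUM, NULL, NULL);
    n->value = value;
    return n;
}

/* Walk the tree bottom-up and compute the value of the expression. */
int eval(const struct TreeNode *n)
{
    switch (n->type) {
    case NUM:   return n->value;
    case PLUS:  return eval(n->args[0]) + eval(n->args[1]);
    case MINUS: return eval(n->args[0]) - eval(n->args[1]);
    case TIMES: return eval(n->args[0]) * eval(n->args[1]);
    }
    return 0;
}

/* Free a node and all its children. */
void free_tree(struct TreeNode *n)
{
    int i;
    if (n == NULL)
        return;
    for (i = 0; i < MAX_ARGS; i++)
        free_tree(n->args[i]);
    free(n);
}
```

In a Yacc grammar the calls would sit in the actions, for example { $$ = mknode(PLUS, $1, $3); } instead of { $$ = $1 + $3; }, assuming the semantic value type has been changed to struct TreeNode*. The tree for 1 + 2 * 3 is then mknode(PLUS, mkleaf(1), mknode(TIMES, mkleaf(2), mkleaf(3))).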