KOI: Lab Exercise 3
Creating a parser with Yacc.
Resources:
-
Thomas Niemann:
A Compact Guide to Lex & Yacc
is a good introduction to Yacc.
-
PC versions of Bison and Flex can be downloaded from the same site.
Click on Overview in the navigation bar to the left,
and then choose "my version of Lex and Yacc".
(Or use these local copies of the executables and the source:
gnu.zip,
gnusrc.zip.)
-
Lecture 5
in this course was about Yacc.
About this version of Yacc
Unpack the ZIP file gnu.zip. It contains versions of both Yacc and Lex.
The Yacc program is GNU Bison version 1.25,
slightly modified to compile on Windows.
Assuming that you unzip gnu.zip into a directory called M:\Yacc,
then the Bison executable will be called
M:\Yacc\bin\bison.exe
You can add the directory with the executable to your Windows PATH variable:
path=M:\Yacc\bin;%path%
When generating its C output file, Bison needs a file called
bison.simple
(or, for certain types of difficult grammars, bison.hairy),
which is located in the same directory as the executable.
You can give this file as a command-line argument to Bison,
as in
bison -S "M:\Yacc\bin\bison.simple"
or you can set the environment variable BISON_SIMPLE:
set BISON_SIMPLE=M:\Yacc\bin\bison.simple
The original version of Yacc created a C file called y.tab.c,
with compilable C code,
and a header file called y.tab.h, with definitions of token codes.
With this version of Bison,
you need to use the command-line argument "-d" to get a ".h" file.
Also, Bison doesn't use the fixed names y.tab.c and y.tab.h,
but instead uses the same base name as that of the input file,
with .tab.c and .tab.h appended.
For example, the command
bison -d language.y
will create the files language.tab.c and language.tab.h.
Some more things to do:
-
You must declare yyerror in the "definitions" part of
the Yacc input file: extern void yyerror(char*);
-
You must also define yyerror somewhere in the program,
and a good place to do that is in the "subroutines" part of the Yacc input file.
-
Yacc expects the scanner to be called yylex.
Define such a function (again, in the "subroutines" part of the Yacc input file),
and let it call the old scanner function, lexan.
-
The parser function generated by Yacc will be called yyparse,
but the program into which we will plug it
expects the parser function to be called parse.
One way to handle this is to define a parse function,
which just calls yyparse.
-
Yacc generates its own token codes for ID, NUM etc.
Obviously, the scanner must use the same token codes, and not the
(different) ones that we used before.
Therefore you should change the header file global.h,
and replace the old definitions of token codes
with an #include of the Yacc-generated header file language.tab.h.
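The steps above can be sketched as C code for the "subroutines" part of the Yacc input file. This is a minimal sketch, assuming that the old scanner is the lexan function from the 2.9 program, which returns DONE at end of input and leaves the value of a number (or the symbol-table index of an identifier) in tokenval. Yacc instead expects yylex to return 0 at end of input and to put the token's value in yylval, so the wrapper translates both. (If your grammar declares DONE as a token and uses it, skip that translation.) The stand-ins at the top (the token codes, tokenval, yylval, a scripted lexan and a trivial yyparse) are hypothetical and exist only so that the block compiles on its own; in the real program they come from language.tab.h, lexer.c and the Bison-generated language.tab.c.

```c
#include <stdio.h>

/* ---- Hypothetical stand-ins so this sketch compiles on its own ----
   In the lab program, NUM and DONE come from language.tab.h,
   tokenval and lexan() from lexer.c, and yylval and yyparse()
   from the Bison-generated language.tab.c. */
#define NUM  258
#define DONE 259
int tokenval;
int yylval;

static const int script_tokens[] = { NUM, '+', NUM, ';', DONE };
static const int script_values[] = { 3, 0, 4, 0, 0 };
static int script_pos = 0;

int lexan(void)                 /* fake scanner: replays a fixed input */
{
    int length = (int) (sizeof script_tokens / sizeof script_tokens[0]);
    if (script_pos >= length)
        return DONE;            /* stay at end of input */
    tokenval = script_values[script_pos];
    return script_tokens[script_pos++];
}

int yylex(void);

int yyparse(void)               /* fake parser: just reads until end of input */
{
    while (yylex() != 0)
        ;
    return 0;
}

/* ---- The glue code itself, for the "subroutines" part ---- */

void yyerror(char *message)     /* matches: extern void yyerror(char*); */
{
    fprintf(stderr, "%s\n", message);
}

int yylex(void)                 /* the name Yacc expects for the scanner */
{
    int token = lexan();
    if (token == DONE)
        return 0;               /* 0 tells yyparse that the input has ended */
    yylval = tokenval;          /* hand the token's value over to Yacc */
    return token;
}

void parse(void)                /* the name the rest of the program expects */
{
    yyparse();
}
```

As written, yyerror only prints the message; yyparse then returns a nonzero value if it cannot recover from the syntax error. Whether yyerror should also stop the program is up to you.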
The lecture notes for
lecture 5
contain a sample Yacc input file that (sort of) works with the 2.9 program.
Part A: The calculator
Replace your hand-coded parser from
lab exercise 2
with a Yacc-generated parser.
The program should still generate postfix output and calculate the result.
With a hand-coded parser, it was difficult to handle both assignments
and expressions. Let your Yacc grammar handle both,
and see how easy it is!
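On the calculator side, assignments need one new idea: the target of an assignment is a place to store into (an lvalue), not a value to read (an rvalue). A minimal sketch, assuming a stack-based calculator and a hypothetical postfix format in which the translator emits x 3 4 + = for x = 3 + 4 and marks that x as an LVALUE item. The parser knows which identifiers are assignment targets, so its actions can emit the right kind. The item format, the single-letter variables and the names evaluate and variables are all hypothetical; in your program the variables probably live in the symbol table.

```c
#include <stdio.h>

/* Hypothetical postfix item format: LVALUE for an assignment target,
   RVALUE for any other use of a variable. */
enum kind { NUMBER, RVALUE, LVALUE, OPERATOR };

struct item {
    enum kind kind;
    int value;   /* the number, the variable's index, or the operator character */
};

#define NVARS 26
#define STACKSIZE 100
int variables[NVARS];   /* hypothetical storage for the variables a..z */

/* Evaluates n postfix items and returns the value left on the stack.
   Assumes non-empty, well-formed postfix; no overflow checks in this sketch. */
int evaluate(const struct item code[], int n)
{
    int stack[STACKSIZE];
    int sp = 0;   /* number of values on the stack */
    int i;

    for (i = 0; i < n; i++) {
        int left, right;
        switch (code[i].kind) {
        case NUMBER:
        case LVALUE:   /* for an lvalue, push the variable's index, not its value */
            stack[sp++] = code[i].value;
            break;
        case RVALUE:   /* push the variable's current value */
            stack[sp++] = variables[code[i].value];
            break;
        case OPERATOR:
            right = stack[--sp];
            left = stack[--sp];
            switch (code[i].value) {
            case '+': stack[sp++] = left + right; break;
            case '-': stack[sp++] = left - right; break;
            case '*': stack[sp++] = left * right; break;
            case '=':                  /* left is a variable index */
                variables[left] = right;
                stack[sp++] = right;   /* an assignment also has a value, as in C */
                break;
            default:
                fprintf(stderr, "unknown operator '%c'\n", code[i].value);
                return 0;
            }
            break;
        }
    }
    return stack[sp - 1];
}
```

For example, evaluating the items for x = 3 + 4 stores 7 in x and returns 7, after which the items for x * 2 return 14.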
Part B: More operators
Implement the following operators from C and C++, in the grammar,
in the postfix translator, and in the calculator:
- % (a synonym for mod)
- & (bitwise and)
- | (bitwise or)
- ?: (as in expr1 ? expr2 : expr3)
- <
- >
Multi-character operators, such as == and ++,
would require changes to the scanner, so we will leave those for later.
In C and in C++, the ?: operator evaluates only one of the expressions
expr2 and expr3. Let your operator evaluate both
expr2 and expr3.
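Evaluating both branches falls out naturally from postfix. A minimal sketch of the calculator side of the new operators, assuming that the calculator evaluates the postfix output with a value stack, that each operator is represented by its character, and that expr1 ? expr2 : expr3 is written in postfix as expr1 expr2 expr3 followed by a single '?'. By the time the '?' is reached, expr2 and expr3 have both been computed and are on the stack, so the operator only has to pick one of them. The function names apply_binary, apply_conditional and execute are hypothetical. As in C, < and > give 1 for true and 0 for false.

```c
#include <stdio.h>

/* Applies a binary operator (the Part B ones, plus + - *). */
int apply_binary(int op, int left, int right)
{
    switch (op) {
    case '+': return left + right;
    case '-': return left - right;
    case '*': return left * right;
    case '%':                            /* same as mod */
        if (right == 0) {
            fprintf(stderr, "modulo by zero\n");
            return 0;
        }
        return left % right;
    case '&': return left & right;       /* bitwise and */
    case '|': return left | right;       /* bitwise or */
    case '<': return left < right;       /* 1 or 0, as in C */
    case '>': return left > right;
    default:
        fprintf(stderr, "unknown operator '%c'\n", op);
        return 0;
    }
}

/* Both branches were already evaluated; just choose one. */
int apply_conditional(int condition, int if_true, int if_false)
{
    return condition ? if_true : if_false;
}

/* Pops the operands of op from the value stack and pushes the result.
   '?' takes three operands, all other operators take two. */
void execute(int op, int stack[], int *sp)
{
    if (op == '?') {
        int if_false = stack[--*sp];
        int if_true = stack[--*sp];
        int condition = stack[--*sp];
        stack[(*sp)++] = apply_conditional(condition, if_true, if_false);
    } else {
        int right = stack[--*sp];
        int left = stack[--*sp];
        stack[(*sp)++] = apply_binary(op, left, right);
    }
}
```

In the grammar, remember that C gives ?: the lowest precedence of these operators and makes it right-associative, that | binds more loosely than &, and that both bind more loosely than < and >, which in turn bind more loosely than + and -.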
Part C: Yacc and C++
Compile the program, including the Yacc-generated parser,
as C++ instead of C.
The original Yacc was designed to accept C code in the semantic actions,
but later versions, such as Bison, also allow C++.
The output file from Bison, for example language.tab.c,
must then be renamed to have a C++ extension, for example language.tab.cpp.
(Some versions of Yacc, for example Bison version 1.28 for Linux,
automatically generate a file with a C++ extension
if the input file ends with .yy instead of .y,
but the PC Bison used in the labs doesn't.)
For everything to compile cleanly, you may have to add some declarations
in the definitions part of the Yacc input file:
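%{
#include <stdlib.h>          /* malloc, size_t */
#include "global.h"          /* shared declarations and token codes */
extern int tokenval;         /* value of the latest token, set by lexan */
extern void yyerror(char*);
extern int yylex();
/* bison.simple uses alloca() when it needs to enlarge the parser stacks;
   this replacement uses malloc() instead, and the memory is never freed */
static void *alloca(size_t size) { return malloc(size); }
%}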
To avoid linking problems when mixing C and C++ in Borland C++,
change all other C files that your program uses
(main.c, lexer.c etc)
to have C++ extensions (main.cpp, lexer.cpp etc).
(And yes, C and C++ are two different languages.
You have to be a bit careful if you mix them in the same project.)
Report
Show your results and discuss them with the teacher,
or,
send an
e-mail
with clear and full explanations of what you have done.
(Send your e-mail in plain text format, not as HTML or Word documents.)
Include the source code, with your changes clearly marked.
Thomas Padron-McCarthy
(Thomas.Padron-McCarthy@tech.oru.se)
January 12, 2004