KOI: Lab Exercise 3
Creating a parser with Yacc.
(Or actually with Bison, which is the version of Yacc that we will be using.
Yacc and Bison are very similar, but there are some differences.)
Resources:
-
Thomas Niemann:
A Compact Guide to Lex & Yacc
is a good introduction to Yacc.
-
Bison and Flex are available in Linux, so you don't need to install any programs.
However, if you ever want to use them in Windows,
Windows versions of Bison and Flex can be downloaded from Thomas Niemann's site.
Click on Overview in the navigation bar to the left,
and then choose "my version of Lex and Yacc".
(Or use these local copies of the executables and the source:
gnu.zip,
gnusrc.zip.)
The
old Windows version
of this lab has some more instructions about how to use Yacc in a Windows environment.
-
Lecture 5
in this course was about Yacc.
About Bison
The original version of Yacc created a C file called y.tab.c,
with compilable C code,
and a header file called y.tab.h, with definitions of token codes.
With Bison,
you need to use the command-line argument "-d" to get a ".h" file.
Also, Bison doesn't use the fixed names y.tab.c and y.tab.h,
but instead uses the same base name as that of the input file,
with .tab.c and .tab.h appended.
For example, the command
bison -d language.y
will create the files language.tab.c and language.tab.h.
Some more things to do:
-
You should declare yyerror in the "definitions" part of
the Yacc input file: extern void yyerror(char*);
-
We must also define yyerror somewhere in the program,
and a good place to do that is in the "subroutines" part of the Yacc input file.
-
Yacc expects the scanner to be called yylex.
Define such a function (again, in the "subroutines" part of the Yacc input file),
and let it call the old scanner function, lexan.
-
The parser function generated by Yacc will be called yyparse,
but the program into which we will plug it
expects the parser function to be called parse.
One way to handle this is to define a parse function,
which just calls yyparse.
-
Yacc generates its own token codes for ID, NUM etc.
Obviously, the scanner must use the same token codes, and not the
(different) ones that we used before.
Therefore you should change the header file global.h,
and replace the old definitions of token codes
with an #include of the Yacc-generated header file something.tab.h.
There is a sample Yacc input file that (sort of) works with the 2.9 program
in the lecture notes for
lecture 5.
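The glue functions listed above might look roughly like the following sketch. It is not the sample file from the lecture notes. To let it compile on its own, it uses stand-ins that are assumptions made for illustration only: a pretend lexan that returns a fixed token sequence, a pretend yyparse that just reads tokens until it sees 0 (the value Yacc treats as end of input), and the hypothetical names fake_input, fake_pos and tokens_seen. In the real program, lexan comes from your existing scanner, yyparse is generated by Bison, and only the last three functions go in the "subroutines" part of the Yacc input file.

```c
#include <assert.h>
#include <stdio.h>

/* --- Stand-ins so the sketch compiles alone (hypothetical, not lab code) --- */

/* Illustrative token codes: something like NUM '+' NUM, then 0 = end of input. */
static const int fake_input[] = { 258, '+', 258, 0 };
static int fake_pos = 0;
static int tokens_seen = 0;

/* Pretend scanner; in the lab this is the existing lexan from the scanner file. */
int lexan(void)
{
    assert(fake_pos < 4);   /* never read past the fake input */
    return fake_input[fake_pos++];
}

int yylex(void);

/* Pretend parser; in the lab Bison generates yyparse from the grammar. */
int yyparse(void)
{
    while (yylex() != 0)
        tokens_seen++;
    return 0;               /* yyparse returns 0 on success */
}

/* --- The glue itself, as it might appear in the "subroutines" part --- */

/* Called by the parser when it detects a syntax error. */
void yyerror(char *message)
{
    fprintf(stderr, "parse error: %s\n", message);
}

/* Yacc calls yylex; forward the call to the old scanner. */
int yylex(void)
{
    return lexan();
}

/* The rest of the program calls parse; forward the call to yyparse. */
void parse(void)
{
    yyparse();
}
```

Calling parse() here runs the pretend parser, which pulls three tokens through yylex and lexan before it reaches the end-of-input marker.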
Part A: The calculator
Replace your hand-coded parser from
lab exercise 2
with a Yacc-generated parser.
The program should still generate postfix output and calculate the result.
With a hand-coded parser, it was difficult to handle both assignments
and expressions. Let your Yacc grammar handle both,
and see how easy it is!
Part B: More operators
Implement the following operators from C and C++, in the grammar,
in the postfix translator, and in the calculator:
- % (a synonym for mod)
- & (bitwise and)
- | (bitwise or)
- ?: (as in expr1 ? expr2 : expr3)
- <
- >
Multi-character operators, such as == and ++,
would require changing the scanner, so we leave those for later.
In C and in C++, the ?: operator only evaluates one of the expressions
expr2 and expr3. Let your operator evaluate both
expr2 and expr3.
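As a rough illustration of the calculator side of Part B, the new operators could be evaluated along the lines of the sketch below. The function names apply_binary and apply_conditional are hypothetical and not part of the lab code, and the sketch assumes that the operand values have already been computed, for example in the semantic actions of the grammar. Because all three operands of ?: arrive as finished values, both expr2 and expr3 have been evaluated by the time one of them is chosen.

```c
#include <assert.h>

/* Apply one binary operator to two already-computed operand values.
   (Hypothetical helper, for illustration only.) */
int apply_binary(int op, int left, int right)
{
    switch (op) {
    case '%': return left % right;   /* same as mod; right must be nonzero */
    case '&': return left & right;   /* bitwise and */
    case '|': return left | right;   /* bitwise or */
    case '<': return left < right;   /* 1 if true, 0 if false, as in C */
    case '>': return left > right;   /* 1 if true, 0 if false, as in C */
    default:
        assert(0 && "unknown operator");
        return 0;
    }
}

/* The ?: operator when both branches were evaluated beforehand:
   pick one of the two finished values. */
int apply_conditional(int condition, int if_true, int if_false)
{
    return condition ? if_true : if_false;
}
```

For example, apply_binary('&', 6, 3) gives 2, and apply_conditional(0, 10, 20) gives 20.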
Part C: Yacc and C++
Compile the program, including the Yacc-generated parser,
as C++ instead of C.
The original Yacc was designed to accept C code in the semantic actions,
but later versions, such as Bison, also allow C++.
If the input file ends with .ypp instead of .y,
Bison will
automatically give its output file a C++ extension.
For example, if you call the input file language.ypp,
the command bison -d language.ypp
will generate the output files
language.tab.cpp and language.tab.hpp.
C++ is pickier about declarations than C is,
so you may have to add some declarations
in the definitions part of the Yacc input file.
We suggest something like this:
%{
#include <stdlib.h>
#include "global.h"
extern int tokenval;
extern void yyerror(char*);
extern int yylex();
%}
To avoid linking problems when mixing C and C++ in Borland C++,
change all other C files that your program uses
(main.c, lexer.c etc)
to have C++ extensions (main.cpp, lexer.cpp etc).
(And yes, C and C++ are two different languages.
You have to be a bit careful if you mix them in the same project.)
Report
Show your results and discuss them with the teacher,
or,
send an
e-mail
with clear and full explanations of what you have done.
(Send your e-mail in plain text format, not as HTML or Word documents.
Do not use attachments.)
Include the source code, with your changes clearly marked.
Even if you don't send a report by e-mail,
we advise that you write down your answers,
to facilitate communication and for your own later use.
Thomas Padron-McCarthy
(Thomas.Padron-McCarthy@tech.oru.se)
January 18, 2004