KOI: Lab Exercise 3
Creating a parser with Yacc.
(Or actually with Bison, which is the version of Yacc that we will be using.
Yacc and Bison are very similar, but there are some differences.)
Resources:
Yacc and Bison
The original version of Yacc created a C file called y.tab.c,
with compilable C code,
and a header file called y.tab.h, with definitions of token codes.
With Bison,
you need to use the command-line argument "-d" to get a ".h" file.
Also, Bison doesn't use the fixed names y.tab.c and y.tab.h,
but instead uses the same base name as that of the input file,
with .tab.c and .tab.h appended.
For example, the command
bison -d dumscript.y
will create the files dumscript.tab.c and dumscript.tab.h.
Yacc is very often used together with a scanner generator called Lex,
and Bison is very often used together with a scanner generator called Flex.
Flex is very similar to Lex.
Bison and Flex in various environments
Unix-like systems, such as Linux, usually come with these tools already installed.
You can run them using the commands bison and flex in a command shell.
On Windows machines, you have to download and install them separately.
If you are managing your own Windows machine,
you can download and then install a current version of
Bison for Windows.
At the same time, you probably want to download and install
Flex for Windows.
(Local copies from 2007-09-14:
Bison,
Flex)
In the computer room T120 at the university,
Bison and Flex have been installed and can be run directly,
using the commands bison and flex in a command window.
The versions are rather old, but they seem to be working.
You may have to set the environment variable TEMP to a directory
where you have write permissions,
such as C:/Temp/ (note the trailing slash).
|
To build a project that contains Bison or Flex specifications,
you can run the commands by hand,
and then compile and link the C files,
but it is strongly advised that you automate this process as much as you can.
A simple script ("batch file") may be enough, but more advanced methods are available.
In Unix-like systems,
you would probably use make to automate the build process.
You can do similar things in Visual Studio.
Here is an introduction on how to use Flex and Bison together with Visual Studio 2005:
Custom building and code generators in Visual Studio 2005.
Part A: Bison and Flex
Try out a simple example, which implements a calculator that understands
simple expressions.
Download the parser specification example.y and scanner specification example.lex
in this ZIP file:
example.zip
Unpack, study the files, make a project, compile, and run it.
(You can either look in the file Makefile for some commands to run,
ur you can set up rules in Visual Studio.)
- Which types of expressions does it understand?
- One common operator is missing. Which one?
- Add the missing operator!
Part B: The calculator
Replace your hand-coded parser
in the 2.9 program
from lab exercise 2
with a Yacc-generated parser.
The program should still generate postfix output and calculate the result.
With a hand-coded parser, it was difficult to handle both assignments and expressions.
Let your Yacc grammar handle both, and see how easy it is!
Some more things to do:
-
You can declare yyerror in the "definitions" part of
the Yacc input file: extern void yyerror(char*);
-
We must also define yyerror somewhere in the program,
and a good place to do that is in the "subroutines" part of the Yacc input file.
-
Yacc expects the scanner to be called yylex,
since this is the name of the function generated by Lex and Flex.
Define such a function (again, in the "subroutines" part of the Yacc input file),
and let it call the old scanner function, lexan.
-
The parser function generated by Yacc will be called yyparse,
but the program in which we will plug it in
expects the parser function to be called parse.
One way to handle this is to define a parse function,
which just calls yyparse.
-
Yacc generates its own token codes for ID, NUM etc.
Obviously, the scanner must use the same token codes, and not the
(different) ones that we used before.
Therefore you should change the header file global.h,
and replace the old definitions of token codes
with an #include of the Yacc-generated header file something.tab.h.
There is a sample Yacc input file that (sort of) works with the 2.9 program
in the lecture notes for
lecture 5.
Part C: More operators
Implement the following operators from C and C++, in the grammar,
in the postfix translator, and in the calculator:
- % (a synonym for mod)
- & (bitwise and)
- | (bitwise or)
- <
- >
- ?: (as in expr1 ? expr2 : expr3)
Multi-character operators, such as == and ++,
would require changing the scanner, so for now we wait with those.
In C and in C++, the ?: operator only evaluates one of the expressions
expr2 and expr3.
If you want, you can instead let your operator evaluate both expr2 and expr3.
Report
Show your results and discuss them with the teacher,
or,
send an
e-mail
with clear and full explanations of what you have done.
(Send your e-mail in plain text format, not as HTML or Word documents.
Do not use attachments.)
Include the source code, with your changes clearly marked.
Even if you don't send a report by e-mail,
we advise that you write down your answers,
to facilitate communication and for your own later use.
Thomas Padron-McCarthy
(Thomas.Padron-McCarthy@tech.oru.se)
October 9, 2007