KOI: Lab Exercise 3
Creating a parser with Yacc.
(Or actually with Bison, which is the version of Yacc that we will be using.
Yacc and Bison are very similar, but there are some differences.)
Resources:
Yacc and Bison
The original version of Yacc created a C file called y.tab.c,
with compilable C code,
and a header file called y.tab.h, with definitions of token codes.
With Bison,
you need to use the command-line argument "-d" to get a ".h" file.
Also, Bison doesn't use the fixed names y.tab.c and y.tab.h,
but instead uses the same base name as that of the input file,
with .tab.c and .tab.h appended.
For example, let's say we are building a parser
for a language called DumScript. The command
bison -d dumscript.y
will create the files dumscript.tab.c and dumscript.tab.h.
Yacc is very often used together with a scanner generator called Lex,
and Bison is very often used together with a scanner generator called Flex.
Flex is very similar to Lex.
(Note that there are several other computer-related things that are called Flex,
which have nothing to do with the scanner generator Flex.)
Bison and Flex in various environments
Some Unix and Unix-like systems, such as various Linux distributions, come with these tools already installed.
On Ubuntu 14.04, you need to install them, which can be done with the command sudo apt-get install bison flex.
You can then run them using the commands bison and flex in a command shell.
On Windows, you have to download and install them separately,
and it may be a bit difficult to get them to work well with Visual Studio.
There are some Windows-specific instructions in
the old Windows version of these assignments.
Bison should generate the C files
something.tab.c and something.tab.h,
and Flex should generate the C file
lex.yy.c.
You must re-run Bison and Flex whenever you have made changes to their input files,
so they generate new C source and header files.
To build a project that contains Bison or Flex specifications,
you can run the commands by hand,
and then compile and link the C files,
but you should automate this process as much as you can.
A simple script ("batch file") may be enough, but more advanced methods are available.
On Unix-like systems such as Linux you can use make to automate the build process.
(See below.)
Part A: Bison and Flex
Try out a simple example, which implements a calculator that understands
simple expressions.
Download the parser specification
example.y
and the scanner specification
example.l,
or both (and a
Makefile)
in
example.zip.
Unpack, study the files, make a project, compile, and run it.
(You can either just type make to build the program,
using the provided
Makefile,
or look in that file for commands to give.
- Study the Makefile,
Which commands used to build example are written explicitly in the file?
Are there some commands that make will deduce by itself?
- Which types of expressions does the calculator understand?
- One common operator is missing. Which one?
- Add the missing operator!
Part B: The calculator
Replace your hand-coded parser
in the 2.9 program
from lab exercise 2
with a Bison-generated parser.
For the user, it should work just like the program in lab 2,
handling a sequence of assignments,
including the exponential operator.
With a hand-coded parser, it was difficult to handle both assignments and expressions.
Let your Bison grammar handle both, and see how easy it is!
Some more things to do:
-
You can declare yyerror in the "definitions" part of
the Bison input file: extern void yyerror(char*);
-
We must also define yyerror somewhere in the program,
and a good place to do that is in the "subroutines" part of the Bison input file.
-
Bison expects the scanner to be called yylex,
since this is the name of the function generated by Lex and Flex.
Define such a function (again, in the "subroutines" part of the Bison input file),
and let it call the old scanner function, lexan.
-
The parser function generated by Bison will be called yyparse,
but the program in which we will plug it in
expects the parser function to be called parse.
One way to handle this is to define a parse function,
which just calls yyparse.
-
Bison generates its own token codes for ID, NUM etc.
Obviously, the scanner must use the same token codes, and not the
(different) ones that we used before.
Therefore you should change the header file global.h,
and replace the old definitions of token codes
with an #include of the Bison-generated header file something.tab.h.
There is a sample Bison input file that (sort of) works with the 2.9 program
in the lecture notes for
lecture 5.
Part C: More operators
Implement the following operators from C and C++, in the grammar,
in the postfix translator, and in the calculator:
- % (a synonym for mod)
- & (bitwise and)
- | (bitwise or)
- <
- >
- ?: (as in expr1 ? expr2 : expr3)
Multi-character operators, such as == and ++,
would require changing the scanner, so for now we wait with those.
The operators should have reasonable priorities and associativities.
For example, the expression 2 + 3 < 4 + 5
should be interpreted as (2 + 3) < (4 + 5) and result in 1,
or however you decide to represent a true result.
It should not be interpreted as 2 + (3 < 4) + 5 and result in 8,
or as ((2 + 3) < 4) + 5 and result in 5.
In C and in C++, the ?: operator only evaluates one of the expressions
expr2 and expr3.
If you want, you can instead let your operator evaluate both expr2 and expr3.
(And please don't call it "the ternary operator".
It is a ternary operator, which just means that it takes three arguments,
and it is the only ternary operator in C,
but its real name is the conditional operator.)
Part D: Bison and C++
Make a copy of your entire project, and then compile this copy program, including the Bison-generated parser,
as C++ instead of C.
The original Yacc was designed to accept C code in the semantic actions,
but later versions, such as Bison, also allows C++.
If the input file ends with .ypp instead of .y,
Bison will
automatically give its output file a C++ extension.
For example, if you call the input file language.ypp,
the command bison -d language.ypp
will generate the output files
language.tab.cpp and language.tab.hpp.
C++ is more picky with declarations than C is,
so you may have to add some declarations
in the definitions part of the Bison input file.
Something like this might work:
%{
#include <stdlib.h>
#include "global.h"
extern int tokenval;
extern void yyerror(char*);
extern int yylex();
%}
To avoid linking problems when mixing C and C++ source files,
change all other C files that your program uses
(main.c, lexer.c etc)
to have C++ extensions (main.cpp, lexer.cpp etc).
(And yes, C and C++ are two different languages.
You have to be a bit careful if you mix them in the same project.)
Report
Show your results and discuss them with the teacher,
or,
send an
e-mail
with clear and full explanations of what you have done,
and answers to all the questions on this page.
If you use attachments, send them as PDF and not Word.
Also send the source code, as a Zip file or equivalent.
Even if you don't send a report by e-mail,
we advise that you write down your answers,
to facilitate communication and for your own later use.
Thomas Padron-McCarthy
(thomas.padron-mccarthy@oru.se)
November 6, 2024