KOI: Lab Exercise 3

Creating a parser with Yacc. (Or actually with Bison, which is the version of Yacc that we will be using. Yacc and Bison are very similar, but there are some differences.)

Resources:

Thomas Niemann: A Compact Guide to Lex & Yacc is a good introduction to Yacc.
Lecture 5 in this course is about Yacc.
Bison for Windows
Flex for Windows

Yacc and Bison

The original version of Yacc created a C file called y.tab.c, with compilable C code, and a header file called y.tab.h, with definitions of token codes. With Bison, you need to use the command-line argument "-d" to get a ".h" file. Also, Bison doesn't use the fixed names y.tab.c and y.tab.h, but instead uses the same base name as that of the input file, with .tab.c and .tab.h appended. For example, let's say we are building a parser for a language called DumScript. The command

bison -d dumscript.y

will create the files dumscript.tab.c and dumscript.tab.h.

Yacc is very often used together with a scanner generator called Lex, and Bison is very often used together with a scanner generator called Flex. Flex is very similar to Lex. (Note that there are several other computer-related things that are called Flex, which have nothing to do with the scanner generator Flex.)

Bison and Flex in various environments

Some Unix and Unix-like systems, such as various Linux distributions, come with these tools already installed. On Ubuntu 14.04, you need to install them, which can be done with the command sudo apt-get install bison flex. You can then run them using the commands bison and flex in a command shell.

On Windows, you have to download and install them separately, and it may be a bit difficult to get them to work well with Visual Studio. There are some Windows-specific instructions in the old Windows version of these assignments.

Bison should generate the C files something.tab.c and something.tab.h, and Flex should generate the C file lex.yy.c.

You must re-run Bison and Flex whenever you have made changes to their input files, so they generate new C source and header files.

To build a project that contains Bison or Flex specifications, you can run the commands by hand, and then compile and link the C files, but you should automate this process as much as you can. A simple script ("batch file") may be enough, but more advanced methods are available. On Unix-like systems such as Linux you can use make to automate the build process. (See below.)

Part A: Bison and Flex

Try out a simple example, which implements a calculator that understands simple expressions. Download the parser specification example.y and the scanner specification example.l, or both (and a Makefile) in example.zip.

Unpack, study the files, make a project, compile, and run it. (You can either just type make to build the program, using the provided Makefile, or look in that file for commands to give.

Study the Makefile, Which commands used to build example are written explicitly in the file? Are there some commands that make will deduce by itself?
Which types of expressions does the calculator understand?
One common operator is missing. Which one?
Add the missing operator!

Part B: The calculator

Replace your hand-coded parser in the 2.9 program from lab exercise 2 with a Yacc-generated parser. The program should still generate postfix output and calculate the result.

With a hand-coded parser, it was difficult to handle both assignments and expressions. Let your Yacc grammar handle both, and see how easy it is!

Some more things to do:

You can declare yyerror in the "definitions" part of the Yacc input file: extern void yyerror(char*);
We must also define yyerror somewhere in the program, and a good place to do that is in the "subroutines" part of the Yacc input file.
Yacc expects the scanner to be called yylex, since this is the name of the function generated by Lex and Flex. Define such a function (again, in the "subroutines" part of the Yacc input file), and let it call the old scanner function, lexan.
The parser function generated by Yacc will be called yyparse, but the program in which we will plug it in expects the parser function to be called parse. One way to handle this is to define a parse function, which just calls yyparse.
Yacc generates its own token codes for ID, NUM etc. Obviously, the scanner must use the same token codes, and not the (different) ones that we used before. Therefore you should change the header file global.h, and replace the old definitions of token codes with an #include of the Yacc-generated header file something.tab.h.

There is a sample Yacc input file that (sort of) works with the 2.9 program in the lecture notes for lecture 5.

Part C: More operators

Implement the following operators from C and C++, in the grammar, in the postfix translator, and in the calculator:

% (a synonym for mod)
& (bitwise and)
| (bitwise or)
<
>
?: (as in expr1 ? expr2 : expr3)

Multi-character operators, such as == and ++, would require changing the scanner, so for now we wait with those.

In C and in C++, the ?: operator only evaluates one of the expressions expr2 and expr3. If you want, you can instead let your operator evaluate both expr2 and expr3.

Part D: Yacc and C++

Make a copy of your entire project, and then compile this copy program, including the Yacc-generated parser, as C++ instead of C.

The original Yacc was designed to accept C code in the semantic actions, but later versions, such as Bison, also allows C++. If the input file ends with .ypp instead of .y, Bison will automatically give its output file a C++ extension. For example, if you call the input file language.ypp, the command bison -d language.ypp will generate the output files language.tab.cpp and language.tab.hpp.

C++ is more picky with declarations than C is, so you may have to add some declarations in the definitions part of the Yacc input file. Something like this might work:

%{
  #include <stdlib.h>
  #include "global.h"
  extern int tokenval;
  extern void yyerror(char*);
  extern int yylex();
%}

To avoid linking problems when mixing C and C++ source files, change all other C files that your program uses (main.c, lexer.c etc) to have C++ extensions (main.cpp, lexer.cpp etc).

(And yes, C and C++ are two different languages. You have to be a bit careful if you mix them in the same project.)

Report

Show your results and discuss them with the teacher,
or,
send an e-mail with clear and full explanations of what you have done. (Send your e-mail in plain text format, not as HTML or Word documents. Do not use attachments.) Include the source code, with your changes clearly marked.

Even if you don't send a report by e-mail, we advise that you write down your answers, to facilitate communication and for your own later use.

Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) September 15, 2014