KOI: Lab Exercise 3

Creating a parser with Yacc. (Or actually with Bison, which is the version of Yacc that we will be using. Yacc and Bison are very similar, but there are some differences.)

Resources:

Yacc and Bison

The original version of Yacc created a C file called y.tab.c, with compilable C code, and a header file called y.tab.h, with definitions of token codes. With Bison, you need to use the command-line argument "-d" to get a ".h" file. Also, Bison doesn't use the fixed names y.tab.c and y.tab.h, but instead uses the same base name as that of the input file, with .tab.c and .tab.h appended. For example, let's say we are building a parser for a language called DumScript. The command
bison -d dumscript.y
will create the files dumscript.tab.c and dumscript.tab.h.

Yacc is very often used together with a scanner generator called Lex, and Bison is very often used together with a scanner generator called Flex. Flex is very similar to Lex. (Note that there are several other computer-related things that are called Flex, which have nothing to do with the scanner generator Flex.)

Bison and Flex in various environments

Some Unix and Unix-like systems, such as various Linux distributions, come with these tools already installed. On Ubuntu 14.04, you need to install them, which can be done with the command sudo apt-get install bison flex. You can then run them using the commands bison and flex in a command shell.

On Windows, you have to download and install them separately, and it may be a bit difficult to get them to work well with Visual Studio. There are some Windows-specific instructions in the old Windows version of these assignments.

Bison should generate the C files something.tab.c and something.tab.h, and Flex should generate the C file lex.yy.c.

You must re-run Bison and Flex whenever you have made changes to their input files, so they generate new C source and header files.

To build a project that contains Bison or Flex specifications, you can run the commands by hand, and then compile and link the C files, but you should automate this process as much as you can. A simple script ("batch file") may be enough, but more advanced methods are available. On Unix-like systems such as Linux you can use make to automate the build process. (See below.)

Part A: Bison and Flex

Try out a simple example, which implements a calculator that understands simple expressions. Download the parser specification example.y and the scanner specification example.l, or both (and a Makefile) in example.zip.

Unpack, study the files, make a project, compile, and run it. (You can either just type make to build the program, using the provided Makefile, or look in that file for commands to give.

Part B: The calculator

Replace your hand-coded parser in the 2.9 program from lab exercise 2 with a Bison-generated parser. For the user, it should work just like the program in lab 2, handling a sequence of assignments, including the exponential operator.

With a hand-coded parser, it was difficult to handle both assignments and expressions. Let your Bison grammar handle both, and see how easy it is!

Some more things to do:

There is a sample Bison input file that (sort of) works with the 2.9 program in the lecture notes for lecture 5.

Part C: More operators

Implement the following operators from C and C++, in the grammar, in the postfix translator, and in the calculator: Multi-character operators, such as == and ++, would require changing the scanner, so for now we wait with those.

The operators should have reasonable priorities and associativities. For example, the expression 2 + 3 < 4 + 5 should be interpreted as (2 + 3) < (4 + 5) and result in 1, or however you decide to represent a true result. It should not be interpreted as 2 + (3 < 4) + 5 and result in 8, or as ((2 + 3) < 4) + 5 and result in 5.

In C and in C++, the ?: operator only evaluates one of the expressions expr2 and expr3. If you want, you can instead let your operator evaluate both expr2 and expr3.

Part D: Bison and C++

Make a copy of your entire project, and then compile this copy program, including the Bison-generated parser, as C++ instead of C.

The original Yacc was designed to accept C code in the semantic actions, but later versions, such as Bison, also allows C++. If the input file ends with .ypp instead of .y, Bison will automatically give its output file a C++ extension. For example, if you call the input file language.ypp, the command bison -d language.ypp will generate the output files language.tab.cpp and language.tab.hpp.

C++ is more picky with declarations than C is, so you may have to add some declarations in the definitions part of the Bison input file. Something like this might work:

%{
  #include <stdlib.h>
  #include "global.h"
  extern int tokenval;
  extern void yyerror(char*);
  extern int yylex();
%}

To avoid linking problems when mixing C and C++ source files, change all other C files that your program uses (main.c, lexer.c etc) to have C++ extensions (main.cpp, lexer.cpp etc).

(And yes, C and C++ are two different languages. You have to be a bit careful if you mix them in the same project.)

Report

Show your results and discuss them with the teacher,
or,
send an e-mail with clear and full explanations of what you have done, and answers to all the questions on this page. If you use attachments, send them as PDF and not Word. Also send the source code, as a Zip file or equivalent.

Even if you don't send a report by e-mail, we advise that you write down your answers, to facilitate communication and for your own later use.


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) October 4, 2022