Custom building and code generators in Visual Studio 2005
I'm a fervent fan of using code generator tools wherever possible to make
your life easier. Although they come with issues related to effective
building, diagnostics, and debugging, the amount of value they add to
your application is immense: they can eliminate entire classes of
potential bugs, save you a great deal of effort and time, make your
module much easier to extend and maintain, and even yield runtime
performance gains. Among the most frequently used code generator tools
are the lexer and parser generators GNU Flex and Bison,
based on the classic generators lex and yacc. Although I'll get into
the details of how to use these tools effectively at a later time,
today what I want to show you is a practical example of how to use the
new custom build features of Visual Studio 2005 to effectively
incorporate a code generator into your automatic build process.
Here is a very simple Bison grammar for evaluating arithmetic expressions involving addition, subtraction, and multiplication:
/* example.y */
%{
#include <stdio.h>
#define YYSTYPE int
int yylex(void);                 /* provided by the Flex-generated lexer */
int yyerror(char const *msg);
%}
%token PLUS MINUS STAR LPAREN RPAREN NUMBER NEWLINE
%left PLUS MINUS
%left STAR
%%
line : /* empty */
     | line expr NEWLINE { printf("%d\n", $2); }
     ;
expr : LPAREN expr RPAREN { $$ = $2; }
     | expr PLUS expr { $$ = $1 + $3; }
     | expr MINUS expr { $$ = $1 - $3; }
     | expr STAR expr { $$ = $1 * $3; }
     | NUMBER { $$ = $1; }
     ;
%%
int yyerror(char const *msg) {
    printf("Error: %s\n", msg);
    return 0;
}
int main(void) {
    /* yyparse() returns 0 on success and nonzero on failure */
    printf("%d\n", yyparse());
    return 0;
}
The Flex lexer used by this parser looks like this:
/* example.lex */
%{
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include "example.parser.h"
%}
%option noyywrap
%%
[ \t]+  { /* ignore whitespace */ }
"("     { return LPAREN; }
")"     { return RPAREN; }
"+"     { return PLUS; }
"-"     { return MINUS; }
"*"     { return STAR; }
\n      { return NEWLINE; }
[0-9]+  { yylval = atoi(yytext); return NUMBER; }
.       { printf("Invalid character '%s'\n", yytext); }
%%
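To make concrete what the grammar and lexer compute together, and what the generators save you from writing, here is a rough hand-written equivalent in C. It is only a sketch: the names evaluate, parse_expr, parse_term, and parse_factor are hypothetical, it handles a single expression rather than a stream of lines, and it does no error reporting at all. The precedence that the grammar states with two %left lines has to be encoded here in the call structure.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical hand-written evaluator; nothing here is generated by Flex or Bison. */
static const char *cursor;            /* next unread character of the input */

static int parse_expr(void);

static void skip_blanks(void) {
    while (*cursor == ' ' || *cursor == '\t') cursor++;
}

/* factor : NUMBER | LPAREN expr RPAREN */
static int parse_factor(void) {
    int value = 0;
    skip_blanks();
    if (*cursor == '(') {
        cursor++;
        value = parse_expr();
        skip_blanks();
        if (*cursor == ')') cursor++;
        return value;
    }
    while (isdigit((unsigned char)*cursor)) value = value * 10 + (*cursor++ - '0');
    return value;
}

/* term : factor (STAR factor)*    -- STAR binds tighter than PLUS and MINUS */
static int parse_term(void) {
    int value = parse_factor();
    for (;;) {
        skip_blanks();
        if (*cursor != '*') return value;
        cursor++;
        value *= parse_factor();
    }
}

/* expr : term ((PLUS | MINUS) term)*    -- the loop makes both left-associative */
static int parse_expr(void) {
    int value = parse_term();
    for (;;) {
        skip_blanks();
        if (*cursor == '+') { cursor++; value += parse_term(); }
        else if (*cursor == '-') { cursor++; value -= parse_term(); }
        else return value;
    }
}

/* Evaluates one expression such as "1 + 2 * 3". */
int evaluate(const char *text) {
    assert(text != NULL);
    cursor = text;
    return parse_expr();
}
```

Calling evaluate("1 + 2 * 3") returns 7 and evaluate("(1 + 2) * 3") returns 9. Adding an operator or changing its precedence means restructuring these functions, whereas in the Bison version it is a one-line change.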
If we were writing this parser at a UNIX command line, we might
generate the source files and compile the result using this sequence of
commands:
bison -v -d example.y -o example.parser.c
flex -oexample.lexer.c example.lex
gcc -o example example.lexer.c example.parser.c
Now say you wanted to build the same application for Windows using Visual Studio 2005. The tools are available on Windows (see flex for Win32, bison for Win32),
and you could simply run the same first two commands at the
command-line and then build the resulting source files in Visual
Studio. However, this simple approach sacrifices many of the advantages
that Visual Studio provides for its built-in source file types: it
doesn't rebuild the generated source files as needed, it doesn't
allow you to jump to errors that occur during generation, and it
doesn't allow you to configure build options using a nice GUI. Let's
see how we can reclaim these advantages for Flex and Bison files.
Creating a simple custom build type
Our first goal is simply to be able to build Flex and Bison files.
First, use the Flex and Bison setup binaries from the links above to
install the tools. A bin directory will be created in the installation
directory. Add this to your system path. You should be able to execute
both flex and bison from a command prompt without specifying a path.
Next, we create a new C++ console application. Uncheck the option to
use precompiled headers - I'll explain how to use these with Flex and
Bison later. Remove the main source file created for you by the wizard.
Next, right-click the project in Solution Explorer and choose Custom
Build Rules. A dialog listing the available build rule files appears.
A build rule establishes how to build a file of a particular type. A group of related build rules is stored in a build rule file,
which can be saved, distributed, and reused in many projects. We'll
start by creating a new build rule file for our Flex and Bison rules:
- Click New Rule File.
- Enter "GNU Tools" for Display Name and File Name.
- Choose a suitable directory for the build rule file. If it
asks you if you want to add the directory to your search path, say yes.
Now we'll create a build rule for Bison files:
- Click Add Build Rule.
- Enter the following values:
- Name: Bison
- File Extensions: *.y
- Outputs: $(InputName).parser.c;$(InputName).parser.h
- Command Line: bison -d [inputs] -o $(InputName).parser.c
- Execution Description: Generating parser...
- Click OK twice, then check the box labelled "GNU Tools" and click OK.
- Add the example.y file above to your project. Right-click on the file and choose Compile. You should receive no errors.
- Create a new file folder under the project called "Generated Files". Add the existing file example.parser.c to this folder.
If you build now, you should receive only an error complaining that
yylex() is undefined. Now, go back to Custom Build Rules and click
Modify Rule File on GNU Tools. Create a rule for Flex:
- Click Add Build Rule.
- Enter the following values:
- Name: Flex
- File Extensions: *.lex
- Outputs: $(InputName).lexer.c
- Command Line: flex -o$(InputName).lexer.c [inputs]
- Execution Description: Generating lexer...
- Click OK three times.
- Add the example.lex file above to your project. Right-click on the file and choose Compile. You should receive no errors.
- Add the existing file example.lexer.c to your project.
If you build now, you should receive no errors and be able to run
the application successfully. Now in any project you can simply check
the "GNU Tools" box, add the .lex and .y files to your project, and
build. What happens if you modify example.y and build? It runs
Bison again and recompiles example.parser.c, because it was
regenerated, and example.lexer.c, because it includes a header file
that was regenerated. If we modify the .lex file, Flex is rerun and
example.lexer.c is recompiled, but example.parser.c is not rebuilt. If
you had a larger parser, you'd appreciate how much time this
incremental rebuilding saves you.
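Build systems commonly make this decision by comparing file timestamps: an output is rebuilt if it is missing or older than any of its inputs. The following C sketch illustrates that idea only. It assumes a timestamp-based check and is not Visual Studio's actual dependency tracking; the function name needs_rebuild is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Returns 1 if `output` is missing or older than any of its `inputs`,
   0 if it is up to date. A missing input also counts as out of date,
   so that the tool runs and reports the problem itself. */
int needs_rebuild(const char *output, const char *const inputs[], size_t count) {
    struct stat out_info, in_info;
    size_t i;
    assert(output != NULL && inputs != NULL);
    if (stat(output, &out_info) != 0) return 1;        /* no output yet */
    for (i = 0; i < count; i++) {
        if (stat(inputs[i], &in_info) != 0) return 1;  /* input missing */
        if (in_info.st_mtime > out_info.st_mtime) return 1;
    }
    return 0;
}
```

In the example project, the compiled form of example.lexer.c would list both example.lexer.c and example.parser.h as inputs, which is why editing example.y also recompiles the lexer, while editing example.lex leaves the parser alone.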
Improving diagnostic support
Delete one of the "%%" marks in the .y file and build.
Unsurprisingly, Bison fails. However, the Error List tells you no more
than this. It'd be more helpful if you could find out what errors the
tool produced. If you look at the output window, Bison did produce some
errors, but if you double click on them to visit the error location, it
just takes you to the top of the file. What gives?
The reason for this is that Visual Studio only recognizes one error format, that used by its own tools. Here's an example:
c:\myprojects\myproject\hello.cpp(10) : error C2065: 'i' : undeclared identifier
Bison doesn't output errors in this format, and so they aren't
parsed. Flex uses yet another format. What to do? The
simplest way to deal with this is to invoke a simple script on the
output of the tools as part of the build rule which parses the output
and converts it to the desired format. You can write this script in any
language; I wrote them in C# using the .NET Framework's regular
expressions. Here's what I wrote inside the Main() function for the
Bison converter tool (error checking and such omitted):
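// Assumes: using System; using System.IO; using System.Text.RegularExpressions;
string line;
while ((line = Console.In.ReadLine()) != null)
{
    // Bison reports errors as "file:line.column-range: message"
    Match match = Regex.Match(line, @"([^:]+):([0-9]+)\.[^:]*: (.*)");
    if (match.Success)   // Regex.Match never returns null, so test Success
    {
        Console.WriteLine("{0}({1}): error BISON: {2}",
            Path.GetFullPath(match.Groups[1].Value),
            match.Groups[2].Value, match.Groups[3].Value);
    }
    else
    {
        // Pass through anything that is not an error line
        Console.WriteLine(line);
    }
}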
I deploy the binary, say it's called BisonErrorFilter.exe, to
the same directory as bison.exe. I then change the Command Line of the
Bison build rule to the following (click the arrow at the right of
the field to access a multiline text box):
bison.exe -d [inputs] -o $(InputName).parser.c > bison.err 2>&1
BisonErrorFilter < bison.err
If you compile the .y file now, any errors should appear in the
error list, as desired, and you can double-click them to visit their
locations. I wrote a similar script for the lexer output. Be careful
when doing this, though: the filter is the last command in the rule, so
if it misses an error, Visual Studio may look at the filter's exit code
and interpret the build step as a success even though the tool failed.
A better approach is to wrap the tool in a script that passes its
arguments to the tool, collects the tool's output and exit code,
converts and prints the output, and then returns the tool's exit code
as its own.
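Such a wrapper might look roughly like the following C sketch. It is not the C# filter described above: the names convert_line and run_filtered are hypothetical, it uses popen (_popen on Windows) to run the tool, and unlike the C# version it leaves the file name as reported instead of expanding it to a full path. The command passed in is assumed to end in 2>&1 so that error output is captured as well.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L   /* exposes popen/pclose on POSIX systems */
#endif
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#define popen _popen
#define pclose _pclose
#define EXIT_CODE(status) (status)
#else
#include <sys/wait.h>
#define EXIT_CODE(status) (WIFEXITED(status) ? WEXITSTATUS(status) : 1)
#endif

/* Rewrites "file:LINE.COL...: message" as "file(LINE): error BISON: message".
   Returns 1 and fills `out` on a match, 0 if the line has another shape. */
int convert_line(const char *line, char *out, size_t out_size) {
    const char *colon;
    assert(line != NULL && out != NULL);
    for (colon = strchr(line, ':'); colon; colon = strchr(colon + 1, ':')) {
        const char *digits = colon + 1;
        const char *p = digits;
        const char *message;
        while (isdigit((unsigned char)*p)) p++;
        /* Skip colons not followed by "LINE.", such as a drive letter's */
        if (p == digits || *p != '.' || colon == line) continue;
        message = strstr(p, ": ");
        if (!message) continue;
        snprintf(out, out_size, "%.*s(%.*s): error BISON: %s",
                 (int)(colon - line), line, (int)(p - digits), digits, message + 2);
        return 1;
    }
    return 0;
}

/* Runs `command`, prints its converted output to `sink`, and returns the
   tool's own exit code (or -1 if the tool could not be started). */
int run_filtered(const char *command, FILE *sink) {
    char line[1024], converted[2048];
    int status;
    FILE *tool;
    assert(command != NULL && sink != NULL);
    tool = popen(command, "r");
    if (!tool) return -1;
    while (fgets(line, sizeof line, tool)) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        fprintf(sink, "%s\n",
                convert_line(line, converted, sizeof converted) ? converted : line);
    }
    status = pclose(tool);
    return status == -1 ? -1 : EXIT_CODE(status);
}
```

A main function built on this would assemble the tool's command line from its own arguments, call run_filtered, and return the result, so that Visual Studio sees the tool's real exit code no matter what the filter recognized.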
I haven't figured out how, but I believe it's possible to also
create custom help entries for each error message, then have the filter
tool produce the right error code for each one. This way, users can get
help for each error individually by just clicking on it and pressing F1.
Properties
Properties enable you to control how the command-line tool is
executed directly from the properties page for each individual file you
wish to build with it. Let's start with a simple example: a handy lexer
switch is -d, which prints out an informative message each time a token
is recognized. We don't want it on all the time, and certainly not in
release mode, but it'd be handy to be able to turn it on and off as
necessary.
To create a property for this, first return to the lexer build rule. Then follow these steps:
- Click Add Property.
- Choose Boolean for the User Property Type.
- Enter the following values:
- Name: debug
- Display Name: Print debug traces
- Switch: -d
- Description: Displays an informative message each time a token is recognized.
- Click OK. Then, add [debug] right after "flex" in the Command Line field, so that it reads: flex [debug] -o$(InputName).lexer.c [inputs]
- Click OK three times.
- Right-click on example.lex in Solution Explorer and choose Properties.
- In the left pane, click the plus next to Flex. Click General.
- You'll see your property. Click on it and its description will appear at the bottom. Set it to Yes.
- Click Command Line in the left pane. You'll see that the -d flag has been added.
- Click OK and build.
- Run the app and type an arithmetic expression. You'll see trace messages.
- View the project properties. You'll see that it now has a Flex
node also. Here you can set the default settings for all files of that
type in the project which don't have specific overriding settings set.
Adding more properties is just as simple. You can go through the man
page for the tool and add properties for each switch, using the
Category field to group them into categories. You can use the other
property types for switches accepting arguments. If you want, you can
create a detailed help file with additional explanation and examples
for each switch. When you're done, you have an impressive-looking
property sheet for your files, reminiscent of those for built-in types.
You can also set different settings for debug and release builds.
For example, for Flex, it's good to set table compression to slowest
and smallest (-Cem, Flex's default) for the Debug version, to speed up
compilation, and to set it to full tables with equivalence classes
(-Cfe) for the Release version, which the Flex manual describes as a
good compromise between table size and speed.
Finally, once you're done adding all the properties you like, you
can take the resulting .rules file and give it to everyone on your
team, or distribute it on a website, so that everyone can easily
integrate the tool into Visual Studio. Perhaps eventually tools like
Flex and Bison will ship with a .rules file.
Conclusion
In Visual Studio 2003 you would have had to write a plug-in to come
close to achieving this level of integration with a third-party tool.
Although they have limitations, I hope the problems solved by these new
features help encourage you to incorporate more tools and code
generation into your regular development. Now that you know how to use
Flex and Bison from the Visual Studio IDE, next time I'll talk about
how to use the tools themselves, going through some of the development
and debugging processes that a grammar developer goes through, and show
you some similar tools for other .NET languages. Thanks for reading,
everyone.