KOI: Lab Exercise 4
This exercise is about lexical analysis (scanning),
with a hand-crafted deterministic finite state machine,
and with a Lex-generated scanner.
Resources
-
Thomas Niemann:
A Compact Guide to Lex & Yacc
is a good introduction to Lex.
-
If you want to use Lex on Windows with Borland C++ Builder,
be advised that the version that can be downloaded from that site
has a bug, so the scanner it generates can't be compiled as C++.
A Windows version of Lex that works better can be found here.
Download the executable, and just run it.
-
Lecture 6
in this course was about scanners and Lex.
Learn about DFAs!
A finite state automaton, also called finite state machine,
(in Swedish: ändlig tillståndsmaskin, ändlig tillståndsautomat),
can be in one of several states.
For example, here are three states:
It is called finite (Swedish: "ändlig")
since it only has a finite (Swedish: ändligt)
number of states.
We add some transitions, or edges, that lets the machine
change state from one state to another:
The arrow pointing into state 1 is marked "Start",
which means that we always start in state 1.
From state 1 we can only move to state 2.
When we are in state 2, we can either go back to state 1,
or go on to state 3.
State 3 is the end state, marked with a double ring.
In a deterministic finite state automaton, or DFA
(Swedish: deterministisk tillståndsmaskin)
transitions are determined by some sort of input:
We move from state 1 to state 2 if we get an a as input.
From state 2, we move back to state 1 if we get a b as input,
and we move on to state 3 if we get an a.
To get from the start state to the end state,
we must get an input string that starts with an a,
and then has zero or more ba pairs.
As soon as we get two a in a row, we move to the end state.
These strings will work:
b
- aa
- abaa
- ababaa
- abababaa
- ...
If other input symbols than a and b can occur in the input,
we should handle them too. We add a fourth state,
which is also an end state, and were we
will arrive as soon as we find anything else than a and b
in the input:
Do!
In part A you will implement a finite state machine,
and in part B you will use Lex to build a scanner.
Part A
Implement the DFA described above as a program.
The program should read characters from standard input,
and report which state it is in.
There are several common ways to implement a state machine like this one.
- With goto and labels
- With a switch statement inside a while (1) loop
- With a table of states, input and transitions
Choose one of these ways, or try to invent a new one!
Part B
Replace the hand-coded scanner in the
2.9 program
(see the file
lexer.c)
with a Lex-generated one.
It should recognize all the token types expected by the parser.
Report!
Show your results and discuss them with the teacher,
or,
send an
e-mail
with clear and full explanations of what you have done.
Include well-chosen and commented test runs with input and output.
(Send your e-mail in plain text format,
not as HTML or Word documents
or with unnecessary attachments.)
Include relevant source code, with any changes clearly marked.
Thomas Padron-McCarthy
(Thomas.Padron-McCarthy@tech.oru.se)
January 18, 2004