KOI: Lab Exercise 4

This exercise is about lexical analysis (scanning), with a hand-crafted deterministic finite state machine, and with a Lex-generated scanner.

Resources

Learn about DFAs!

A finite state automaton, also called finite state machine, (in Swedish: ändlig tillståndsmaskin, ändlig tillståndsautomat), can be in one of several states. For example, here are three states:

Three states for a FSA

It is called finite (Swedish: "ändlig") since it only has a finite (Swedish: ändligt) number of states.

We add some transitions, or edges, that lets the machine change state from one state to another:

Three states, with transitions

The arrow pointing into state 1 is marked "Start", which means that we always start in state 1. From state 1 we can only move to state 2. When we are in state 2, we can either go back to state 1, or go on to state 3. State 3 is the end state, marked with a double ring.

In a deterministic finite state automaton, or DFA (Swedish: deterministisk tillståndsmaskin) transitions are determined by some sort of input:

DFA

We move from state 1 to state 2 if we get an a as input. From state 2, we move back to state 1 if we get a b as input, and we move on to state 3 if we get an a.

To get from the start state to the end state, we must get an input string that starts with an a, and then has zero or more ba pairs. As soon as we get two a in a row, we move to the end state. These strings will work: b

If other input symbols than a and b can occur in the input, we should handle them too. We add a fourth state, which is also an end state, and were we will arrive as soon as we find anything else than a and b in the input:

Another DFA

Do!

In part A you will implement a finite state machine, and in part B you will use Lex to build a scanner.

Part A

Implement the DFA described above as a program. The program should read characters from standard input, and report which state it is in.

There are several common ways to implement a state machine like this one.

Choose one of these ways, or try to invent a new one!

Part B

Replace the hand-coded scanner in the 2.9 program (see the file lexer.c) with a Lex-generated one. It should recognize all the token types expected by the parser.

Report!

Show your results and discuss them with the teacher,
or,
send an e-mail with clear and full explanations of what you have done. Include well-chosen and commented test runs with input and output. (Send your e-mail in plain text format, not as HTML or Word documents or with unnecessary attachments.) Include relevant source code, with any changes clearly marked.


Thomas Padron-McCarthy (Thomas.Padron-McCarthy@tech.oru.se) September 14, 2007