KOI: Lab Exercise 4

This exercise is about lexical analysis (scanning), with a hand-crafted deterministic finite state machine, and with a Lex-generated scanner. (But you will most likely be using Flex and not the original Lex.)

Resources

Learn about DFAs!

A finite state automaton, also called finite state machine, (in Swedish: ändlig tillståndsmaskin, ändlig tillståndsautomat), can be in one of several states. For example, here are three states:

Three states for a FSA

It is called finite (Swedish: "ändlig") since it only has a finite (Swedish: ändligt) number of states.

We add some transitions, or edges, that lets the machine change state from one state to another:

Three states, with transitions

The arrow pointing into state 1 is marked "Start", which means that we always start in state 1. From state 1 we can only move to state 2. When we are in state 2, we can either go back to state 1, or go on to state 3. State 3 is the end state, marked with a double ring.

In a deterministic finite state automaton, or DFA (Swedish: deterministisk tillståndsmaskin) transitions are determined by some sort of input:

DFA

We move from state 1 to state 2 if we get an a as input. From state 2, we move back to state 1 if we get a b as input, and we move on to state 3 if we get an a.

To get from the start state to the end state, we must get an input string that starts with an a, and then has zero or more ba pairs. As soon as we get two a in a row, we move to the end state. These strings will work: b

If other input symbols than a and b can occur in the input, we should handle them too. We add a fourth state, which is also an end state, and were we will arrive as soon as we find anything else than a and b in the input:

Another DFA

Do!

In part A you will implement a finite state machine, and in part B you will use Flex to build a scanner.

Part A

Implement the DFA described above as a program. The program should read characters from standard input, and report which state it is in.

There are several common ways to implement a state machine like this one.

Choose one of these ways, or try to invent a new one!

Part B

Continue building on the compiler from the previous assignment by replacing the hand-coded scanner from the 2.9 program (see the file lexer.c) with a Lex-generated one. (But you will most likely be using Flex and not the original Lex.) It should recognize all the token types expected by the parser.

Report

There are two ways to pass this lab. Ideally you should do this: But as an alternative, you can instead do this.

About sources and collaboration:
  • Each group (which normally consists of one or two students) must make its own solution to each assignment, and submit or present it, but it is not forbidden to collaborate or ask other students for help. However, in that case you must clearly state who you have collaborated with. Each solution must include the name of everyone who contributed to the work, Collaboration is thus perfectly fine, but must be clearly stated.
  • You must add an LLM statement to the lab report, where you state if and how you have used large language models, and some experiences from that. Was the LLM helpful, and in what way? Did it help your learning, or maybe in some way hinder it?
  • All sources (such as books, web pages, colleagues and LLMs) used in the solution of the assignment must be stated.


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) August 28, 2025