Lecture 4: More about syntactic analys. Bottom-up parsing. Parser-generators.

The course Compilers and interpreters | Lectures: 1 2 3 4 5 6 7 8 9 10 11 12


These lecture notes are my own notes that I made in order to use during the lecture, and it is approximately what I will be saying in the lecture. These notes may be brief, incomplete and hard to understand, not to mention in the wrong language, and they do not replace the lecture or the book, but there is no reason to keep them secret if someone wants to look at them.

Idag: Lect factoring ("vänsterfaktorisering"). How a bottom-up parser works. Parser-generators (such as Yacc or its close copy Bison).

You can also read the introductions to chapter 4, 4.1 and 4.2!

4.3.3 Lect factoring ("vänsterfaktorisering")

(ALSU-07 4.3, "Writing a grammar", p 214-215. Swedish: vänsterfaktorisering.)

"FIRST()-conflicts" in a recursive-descent parser:

statement -> assignment | expr
assignment -> id = expr     <-- Always starts with id
expr -> term moreterms      <-- Can start with id
...
Found What is next? That would mean "a" is the start of
a = 5 ...
+ 5 ...
An assignment
An expression

Can be handled by two-token lookahead instead of just one-token.

(From now on we use another, more complicated, C-like grammar.)

Found What is next? That would mean "a[3+4+z]" is the start of
a[3+4+z] = 5 ...
+ 5 ...
An assignment
An expression

We may need any number of tokens lookahead.
Instead: Try one of the rules. Backtrack if it (finally) fails, and try the other one.

Found What is next? That would mean "b[5+6]" is the start of
a[b[5+6] = 5 ...
+ 5 ...
An assignment
An expression

Now we have a tree of choices. Backtrack to the previous "choice-point".

Another example: if statement with optional else part

stmt ->   if ( expr ) stmt
        | if ( expr ) stmt else stmt
Remember the trick for removing left recursion? A similar one can be used here, called left factoring:
stmt ->   if ( expr ) stmt optional-else
optional-else ->   else stmt | empty
How to left-factor. Rewrite these two (yes, two) productions:
A -> x y | x z
into these three (yes, three) productions:
A -> x R
R -> y | z
To build a predictive recursive-descent parser, first rewrite your grammar: Can be difficult, so for more complicated parsers, we use tools such as Yacc.

But for simple grammars, like a calculator, or a search field. (Could be difficult to plug in Yacc into your ASP page or your Perl script.)

Search phrase SQL
cinema select d.Nr
from Document d, Occurs o, Word w
where d.Nr = o.Document
and o.Word = w.Number
and w.Text = 'cinema'
cinema Örebro select Document.Nr
from Document d, Occurs o1, Occurs o2, Word w1, Word w2
where (d.Nr = o1.Document
and o1.Word = w1.Number
and w1.Text = 'cinema')
and
(d.Nr = o2.Document
and o2.Word = w2.Number
and w2.Text = Örebro')
(cinema or museum) and Örebro select Document.Nr
from Document d, Occurs o1, Occurs o2, Occurs o3, Word w1, Word w2, Word w3
where ((d.Nr = o1.Document
and o1.Word = w1.Number
and w1.Text = 'cinema')
or
(d.Nr = o2.Document
and o2.Word = w2.Number
and w2.Text = Örebro'))
and
(d.Nr = o3.Document
and o3.Word = w3.Number
and w3.Text = Örebro')

4.9 Parser generators

The next lecture will look more into detail about how to use Yacc/Bison.

First: How to do bottom-up parsing.

This example adapted from Thomas Niemann:

Ambiguous grammar: Could be transformed to:
E -> E + E    (rule 1)
E -> E * E    (rule 2)
E -> id       (rule 3)
E -> E + T | T
T -> T * F | F
F -> id

(We use the ambiguous grammar.)

Generate, or derive, the string "x + y + z" from the start symbol E, by applying the rules 1-3:

Expression And now use rule...
E 2
E * E 3
E * z 1
E + E * z 3
E + y * z 3
x + y * z

Note: (x + y) * z

Reduce the string "x + y + z" to the start symbol E: by applying the rules 1-3:

Stack Input left Action... Conflicts
x + y * z shift
x + y * z reduce with r3
E + y * z shift
E + y * z shift
E + y * z reduce with r3
E + E * z shift shift/reduce (r1)
E + E * z shift
E + E * z reduce with r3
E + E * E reduce with r2 reduce/reduce (r1 and r2)
E + E reduce with r1
E (done!)

Note: x + (y * z)

Shift-reduce conflict

stmt ->   if ( expr ) stmt
        | if ( expr ) stmt else stmt
Yacc always shifts. (Yacc gives a warning when it compiles the grammar.)

Reduce-reduce conflict

Simple example:
E -> T
T -> id
E -> id
String: "x". x is a T is an E, or x is an E.

Real example (ASU-86 page 252):

Rule Example Output
text -> text sup text x sup y xy
text -> text sub text x sub 3 x3
text -> text sub text sup text x sub 3 sup y
 y
x
 3

Third rule just to make the combination of sub and sup look good.

Yacc reduces according to the first of the two (or more) conflicting productions in the file. (Yacc gives a warning when it compiles the grammar.)

The course Compilers and interpreters | Lectures: 1 2 3 4 5 6 7 8 9 10 11 12


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se), September 13, 2022