Föreläsning 4: Mer om syntaktisk analys. Bottom-up parsing. Parser-generatorer.

Kursen Kompilatorer och interpretatorer | Föreläsningar: 1 2 3 4 5 6 7 8 9 10 11 12


Det här är ungefär vad jag tänker säga på föreläsningen. Använd det för förberedelser, repetition och ledning. Det är inte en definition av kursinnehållet, och det ersätter inte kursboken.

Idag: Vänsterfaktorisering. Hur en bottom-up-parser fungerar. Parser-generatorer (som Yacc).

Läs gärna inledningen av kapitel 4, 4.1 och 4.2!

4.3.3 Vänsterfaktoriesering

(ALSU-07 4.3, "Writing a grammar", p 214-215. Swedish: vänsterfaktorisering.)

"FIRST()-conflicts" in a recursive-descent parser:

statement -> assignment | expr
assignment -> id = expr     <-- Always starts with id
expr -> term moreterms      <-- Can start with id
...
Found What is next? That would mean "a" is the start of
a = 5 ...
+ 5 ...
An assignment
An expression

Can be handled by two-token lookahead instead of just one-token.

(From now on we use another, more complicated, C-like grammar.)

Found What is next? That would mean "a[3+4+z]" is the start of
a[3+4+z] = 5 ...
+ 5 ...
An assignment
An expression

We may need any number of tokens lookahead.
Instead: Try one of the rules. Backtrack if it (finally) fails, and try the other one.

Found What is next? That would mean "b[5+6]" is the start of
a[b[5+6] = 5 ...
+ 5 ...
An assignment
An expression

Now we have a tree of choices. Backtrack to the previous "choice-point".

Another example: if statement with optional else part

stmt ->   if ( expr ) stmt
        | if ( expr ) stmt else stmt
Remember the trick for removing left recursion? A similar one can be used here, called left factoring:
stmt ->   if ( expr ) stmt optional-else
optional-else ->   else stmt | empty
How to left-factor. Rewrite these two (yes, two) productions:
A -> x y | x z
into these three (yes, three) productions:
A -> x R
R -> y | z
To build a predictive recursive-descent parser, first rewrite your grammar: Can be difficult, so for more complicated parsers, we use tools such as Yacc.

But for simple grammars, like a calculator, or a search field. (Could be difficult to plug in Yacc into your ASP page or your Perl script.)

Search phrase SQL
cinema select d.Nr
from Document d, Occurs o, Word w
where d.Nr = o.Document
and o.Word = w.Number
and w.Text = 'cinema'
cinema Örebro select Document.Nr
from Document d, Occurs o1, Occurs o2, Word w1, Word w2
where (d.Nr = o1.Document
and o1.Word = w1.Number
and w1.Text = 'cinema')
and
(d.Nr = o2.Document
and o2.Word = w2.Number
and w2.Text = Örebro')
(cinema or museum) and Örebro select Document.Nr
from Document d, Occurs o1, Occurs o2, Occurs o3, Word w1, Word w2, Word w3
where ((d.Nr = o1.Document
and o1.Word = w1.Number
and w1.Text = 'cinema')
or
(d.Nr = o2.Document
and o2.Word = w2.Number
and w2.Text = Örebro'))
and
(d.Nr = o3.Document
and o3.Word = w3.Number
and w3.Text = Örebro')

4.9 Parser generators

Yacc.

First: How to do bottom-up parsing.

Derivation, deriving = generating a token string from the grammar. Reducing = finding which rules to apply to reduce the token string to the start symbol.

This example adapted from Thomas Niemann:

Ambiguous grammar: Could be transformed to:
E -> E + E    (rule 1)
E -> E * E    (rule 2)
E -> id       (rule 3)
E -> E + T | T
T -> T * F | F
F -> id

(We use the ambiguous grammar.)

Generate, or derive, the string "x + y + z" from the start symbol E, by applying the rules 1-3:

Expression And now use rule...
E 2
E * E 3
E * z 1
E + E * z 3
E + y * z 3
x + y * z

Note: (x + y) * z

Reduce the string "x + y + z" to the start symbol E: by applying the rules 1-3:

Stack Input left Action... Conflicts
x + y * z shift
x + y * z reduce with r3
E + y * z shift
E + y * z shift
E + y * z reduce with r3
E + E * z shift shift/reduce (r1)
E + E * z shift
E + E * z reduce with r3
E + E * E reduce with r2 reduce/reduce (r1 and r2)
E + E reduce with r1
E (done!)

Note: x + (y * z)

Shift-reduce conflict

stmt ->   if ( expr ) stmt
        | if ( expr ) stmt else stmt
Yacc always shifts. (Yacc gives a warning when it compiles the grammar.)

Reduce-reduce conflict

Simple example:
E -> T
T -> id
E -> id
String: "x". x is a T is an E, or x is an E.

Real example (ASU-86 page 252):

Rule Example Output
text -> text sup text x sup y xy
text -> text sub text x sub 3 x3
text -> text sub text sup text x sub 3 sup y
 y
x
 3

Third rule just to make the combination of sub and sup look good.

Yacc reduces according to the first of the two (or more) conflicting productions in the file. (Yacc gives a warning when it compiles the grammar.)

Kursen Kompilatorer och interpretatorer | Föreläsningar: 1 2 3 4 5 6 7 8 9 10 11 12


Thomas Padron-McCarthy (Thomas.Padron-McCarthy@tech.oru.se) 15 september 2007