KOI: Some solutions to exercise 5

Please note that these are suggested solutions. There can be other solutions that are also correct.

Question 1

a) Language:

ac
abc  
aca
abca
acaa
abcaa
acaaa
abcaaa
...
As a regular expression: a c a* | a b c a*
As another regular expression: a b? c a*

b) The problem:

There is a FIRST() conflict between the two productions for the non-terminal S:

S -> a T
S -> a b T
There is left recursion in this production for the non-terminal T:
T -> T a
The grammar is ambiguous, since a T can either be expanded to a c directly, or to a U, which can then in turn be expanded to the same c.
T -> c | U
U -> c 

c) Bottom-up parsers, such as the ones generated by Bison, can handle grammars with FIRST() conflicts and with left recursion, so those are no longer a problem. The ambiguity, however, is still a problem. It doesn't matter how well a parser can handle ambiguity, since the same input can be interpreted in different ways, and the parser has no way to know which interpretation we intended to be correct.

Question 2

a)

Top-down

The productions are applied in this order:

  1. S -> T c U
  2. T -> a b
  3. U -> d e
This is called leftmost derivation: We create the nodes, or at least apply the productions, in the order S, T, U, that is, from the top and not from the bottom, but also from the left and not from the right.

b)

  1. The stack is empty, so we can't reduce, and must shift. Shift a to the stack. The stack now contains a.
  2. a on top of the stack can't be reduced (since there is no production with a as its right-hand side), so shift b to the stack. The stack now contains a b.
  3. a b on top of the stack can be reduced to T, so we reduce it. The stack now contains T.
  4. T on top of the stack can't be reduced, so shift c to the stack. The stack now contains T c.
  5. T c on top of the stack can't be reduced, so shift d to the stack. The stack now contains T c d.
  6. T c d on top of the stack can't be reduced, so shift e to the stack. The stack now contains T c d e.
  7. d e on top of the stack can be reduced to U, so we reduce it. The stack now contains T c U.
  8. T c U on top of the stack can be reduced to S, so we reduce it. The stack now contains S, and we are finished.

The productions are applied in this order:

  1. T -> a b
  2. U -> d e
  3. S -> T c U
This is called rightmost derivation in reverse: We create the nodes, or at least apply the productions, in the order S, T, U. The order S, U, T would be called rightmost derivation, that is, from the top and not from the bottom, but also from the right and not from the left. If we just reverse that order, we get T, U, S, or rightmost derivation in reverse.

Question 3

a) A parser that builds the parse tree from the top down, that is, starting from the root node that represents the start symbol of the grammar. It does not necessarily have to actually materialize the tree physically using pointers in memory, but it will start by using one of the productions for the start symbol. The parser in the 2.9 program is a top-down parser.

b) A parser that builds the parse tree bottom up, that is, ending from the root node that represents the start symbol of the grammar.

c) A parser that consists of subroutines, one for each non-terminal, that can call each other recursively. It starts with a call to the subroutine that represents the start symbol. The parser in the 2.9 program is a recursive-descent parser.

d) A parser that can always decide which of several productions for a non-terminal to use. It never has to guess or backtrack. The parser in the 2.9 program is a predictive parser.

Question 4

  1. The stack is empty, so we can't reduce, and must shift. Shift a to the stack. The stack now contains a.
  2. a on top of the stack can't be reduced (since there is no production with a as its right-hand side), so shift b to the stack. The stack now contains a b.
  3. a b on top of the stack can't be reduced (since there is no production with a b as its right-hand side), so shift c to the stack. The stack now contains a b c.
  4. c on top of the stack can be reduced to T, so we reduce it. The stack now contains a b T.
  5. a b T on top of the stack can be reduced to S, so we reduce it. The stack now contains S, and we are finished.
Instead of trying to decide as soon as it sees the a if it is the start of a T or of a b T, it puts ("shifts") the a on the stack, and defers the decision until later.

Question 5

  1. The stack is empty, so we can't reduce, and must shift. Shift b to the stack. The stack now contains b.
  2. b on top of the stack can't be reduced (since there is no production with b as its right-hand side), so shift c to the stack. The stack now contains b c.
  3. c on top of the stack can be reduced either to T or to U. This is what makes bottom-up parsing complicated. The parser must choose based on what is below c on the stack, and the only reduction that can work is to U. Reduce to U. The stack now contains b U.
  4. b U on top of the stack can be reduced to S, so we reduce it. The stack now contains S, and we are finished.

Question 6

These are classes of parsers.

The first L in all of the names just means that the tokens in the input are read from left to right.

The number within parentheses is how many tokens lookahead the parser uses.

The second letter L means leftmost derivation, and is what top-down parsers do. Let's say that the start symbol in a grammar is S, and there is a production S -> T U. An LL parser will apply that production to create a node for the S non-terminal, and then continue to parse a T. We can say that it expands S to T U, and it then starts to expand the leftmost part, T. Compare question 2a above.

The second letter R means rightmost derivation in reverse, and is what bottom-up parsers do. An LR parser will wait until it has parsed a complete T, and then parse a complete U, before finally creating a node for S. We can say that it reduces T U to S, which is why we say that it is a derivation in reverse: Instead of using the production S -> T U to expand S to T U, it uses it in reverse to reduce T U to S. It is not a rightmost derivation, since a rightmost derivation would start with U before T, but since it is rightmost in reverse, we do T before U. Compare question 2b above.


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) September 24, 2024