KOI: Some solutions to exercise 5

Please note that these are suggested solutions. There can be other solutions that are also correct.

Question 1

a) Language:

ac
abc  
aca
abca
acaa
abcaa
acaaa
abcaaa
...
As a regular expression: a c a* | a b c a*
As another regular expression: a b? c a*

b) The problem:

There is a FIRST() conflict between the two productions for the non-terminal S:

S -> a T
S -> a b T
There is left recursion in this production for the non-terminal T:
T -> T a
The grammar is ambiguous, since a T can either be expanded to a c directly, or to a U, which can then in turn be expanded to the same c.
T -> c | U
U -> c 

c) Bottom-up parsers, such as the ones generated by Bison, can handle grammars with FIRST() conflicts and the left recursion, so those are no longer a problem. The ambiguity, however, is still a problem. It doesn't matter how well a parser can handle ambiguity, since the same input can be interpreted in different ways, and the parser has no way to know which interpretation is the correct or intended one.

d) A reduce/reduce conflict, because of the ambiguity. When we expect a T in the input, and the input consists of a c, the c can be reduced to a U, which can then be reduced to a T. Alternatively, the same c can be reduced directly to a T.

Question 2

  1. The token a is shifted to the stack. The stack now contains a.
  2. There is no production to reduce a single a to anything, so we must shift the next token, b to the stack. The stack now contains a b (with b on top).
  3. There is no production to reduce a b to anything, so we must shift the next token, c to the stack. The stack now contains a b c.
  4. There are no more tokens to shift from the input, so we must reduce. There are two different productions that let us reduce c: T -> c and also U -> c. This is a reduce/reduce conflict, because of the ambiguous grammar. We can't know which one was intended by whoever constructed the grammar, so we arbitrarily chose to reduce c to T. The stack now contains a b T.
  5. Finally, we use the production S -> a b T to reduce the content of the stack to S. The stack now just contains the start symbol S, and we are finished.

The FIRST() conflict is handled by the parser shifting the token a to the stack, and then it waits to do something with this token until the (uppermost) content of the stack matches a production.

Question 3

  1. The token a is shifted to the stack. The stack now contains a.
  2. There is no production to reduce a single a to anything, so we must shift the next token, b to the stack. The stack now contains a b (with b on top).
  3. There is no production to reduce a b to anything, so we must shift the next token, c to the stack. The stack now contains a b c.
  4. Like in the previous question, we can reduce c to T. The stack now contains a b T.
  5. We could reduce a b T to S, like we did in the previous question, but there are still tokens in the input that we need to handle, so this would lead to an error. Instead, we shift the next token, a, to the stack. The stack now contains a b T a.
  6. The top of the stack contains T a, which we can reduce to T. The stack now contains a b T.
  7. Again, we could reduce a b T to S, but there are still tokens in the input that we need to handle, so this would lead to an error. Instead, we shift the next token, a, to the stack. The stack now contains a b T a.
  8. There are no more tokens to shift from the input, so we must reduce. The top of the stack contains T a, which we can reduce to T. The stack now contains a b T.
  9. Finally, we use the production S -> a b T to reduce the content of the stack to S. The stack now just contains the start symbol S, and we are finished.

Left recursion is not a problem. We shift tokens to the stack, and we reduce contents on the stack to non-terminals, until the top of the stack contains a T and an a. Then we reduce this T a to T, according to the production T -> T a. It doesn't matter that the production is left-recursive, because we don't do anything recursively, we just put things on the stack and then reduce them according to the productions.

Question 4

...

Question 5

These are classes of parsers.

The first L in all of the names means that the tokens in the input are read from left to right.

The number within parentheses is how many tokens lookahead the parser uses.

The second letter L means leftmost derivation, and is what top-down parsers do. Let's say that the start symbol in a grammar is S, and there is a production S -> T U. An LL parser will apply that production to create a node for the S non-terminal, and then continue to parse a T. We can say that it expands S to T U, and it then starts to expand the leftmost part, T.

The second letter R means rightmost derivation in reverse, and is what bottom-up parsers do. An LR parser will wait until it has parsed a complete T, and then parse a complete U, before finally creating a node for S. We can say that it reduces T U to S, which is why we say that it is a derivation in reverse. Instead of using the production S -> T U to expand S to T U, it uses it in reverse to reduce T U to S.

Question 6

...


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) November 7, 2023