KOI: Lab Exercise 2

A hand-coded parser. You will modify the parser in the very simple compiler in section 2.9 of the (old edition of the) course book. (Source code.)

Don't worry if this assignment takes longer to complete than the others. It could have been split into two or more assignments, but since it covers a single subject (parsing using a hand-coded predictive recursive-descent parser) it has been kept as one assignment.

Om den här uppgiften:

En sak som jag brukar lära ut i grundläggande programmeringskurser är att programmering är en form av kommunikation. Och då menar jag inte med datorn, utan med andra människor som någon gång i framtiden ska arbeta vidare med ens program. När man arbetar som programmerare är det nämligen ganska ovanligt att man sätter sig ner och skriver ett program från noll, utan det är mycket vanligare att det redan finns programkod, och den programkoden ska man sen modifiera, eller integrera i sitt program. Det är därför det är så viktigt att programkoden är begriplig, till exempel med informativa och rättvisande variabelnamn.

I den här uppgiften får vi ett litet program, som vi ska arbeta vidare med. Det är inte bara en laboration i programmering, utan det är också en övning i att sätta sig in i existerande programkod, så man förstår den tillräckligt bra för att kunna modifiera den.

The grammar

A syntax-directed translation scheme (ASU-86 Fig 2.35) for the infix-to-postfix translator:
start -> list eof
list -> expr ; list
       | empty
expr -> expr + term { print('+') }
       | expr - term { print('-') }
       | term
term -> term * factor { print('*') }
       | term / factor { print('/') }
       | term div factor { print('DIV') }
       | term mod factor { print('MOD') }
       | factor
factor -> ( expr )
       | id { print(id.lexeme) }
       | num { print(num.value) }
(empty is used as the symbol for the empty string.)

The grammar is left-recursive. We use the standard method (see the lecture notes from lecture 3) to eliminate left-recursion. We get this syntax-directed translation scheme (ASU-86 Fig 2.38):

start -> list eof
list -> expr ; list
       | empty
expr -> term moreterms
moreterms -> + term { print('+') } moreterms
       | - term { print('-') } moreterms
       | empty
term -> factor morefactors
morefactors -> * factor { print('*') } morefactors
       | / factor { print('/') } morefactors
       | div factor { print('DIV') } morefactors
       | mod factor { print('MOD') } morefactors
       | empty
factor -> ( expr )
       | id { print(id.lexeme) }
       | num { print(num.value) }

In the program in section 2.9 of the course book (ASU-86), the parser in parser.c has been hand-optimized to eliminate tail recursion, using the methods described on page 52-53 in ASU-86. For example, the non-terminals term and morefactors are handled by the function term, which contains a while loop. Modern C compilers will perform these optimizations automatically. Since they are unnecessary and make the program harder to understand, they have been removed in the version of the program available on these web pages.

Part A: Modifying the grammar

The grammars above describe a language that consists of expressions. Instead of expressions, one could change the language to consist of assignments. That is, instead of for example 10 * 3 + foo, it should understand input of the type fum = 10 * 3 + foo.

The postfix representation of an assignment variable = value is variable value =. For example, the postfix representation of x = 17 is x 17 =, and the postfix representation of fum = 10 * 3 + foo is fum 10 3 * foo + =.

To change the grammar from above to one that understands assignments, we first add a production for a new non-terminal called assignment. We also change the definition of list from a list of expressions to a list of assignments. The grammar now starts like this:

start -> list eof
list -> assignment ; list
       | empty
assignment -> id = expr
expr -> term moreterms
...
(And the rest is the same as above.)

Assume that we want to handle both assignments and expressions. Write a grammar for that.

Would it be possible to write a predictive recursive-descent parser for your grammar? (Hints: Is it ambiguous? Is it left-recursive? Are FIRST sets disjoint?)

Part B: Change the program

Modify the parser in the 2.9 program so that it understands the grammar with assignments that is given above in part A. (Not your own grammar from part A, which handles both expressions and assignments.) The program should generate the correct postfix output.

Part C: Calculate the values

Modify the program from part B so that it not only translates the input to postfix notation, but also calculates the result, and stores the value of the assigned variable. Modify the symbol table so you can store values in it, and retrieve those values. (Hint: Use a stack.)

Also print the assigned value after each assignment.

Just use integer operations in your calculations. The result of 10 / 3 should be 3, and the result of 1 / 3 should be 0.

Another hint: The infix expression x = x + 1 has the postfix representation x x 1 + =, but those two xes are different! The second x refers to the value of the variable, as usual in an expression, but the first x refers to the variable itself. So what should be pushed onto the stack for those xes?

Part D: The exponential function

Modify the grammar and the program to handle the exponential function, expressed with the operator ^. This operator should be left-associative (but is it, in reality?) and have the highest precedence of all operators.

Examples of input and output:

Input Postfix Value
2 ^ 3 2 3 ^ 8
2 ^ 3 * 4 2 3 ^ 4 * 32
2 * 3 ^ 4 2 3 4 ^ * 162
2 ^ 3 ^ 4 2 3 ^ 4 ^ 4096

For the report, include and explain any changes you made to the grammar and to the program.

Några tips:

Man kan tänka sig att upphöjt-operatorn bildar en ny nivå av prioriteter.

I den ursprungliga grammatiken kan man tänka sig att det finns uttryck som består av termer som i sin tur består av faktorer. Exempelvis uttrycket 2*3+4*(5+6) består av termerna 2*3 och 4*(5+6), och sen består termen 2*3 av faktorerna 2 och 3, och termen 4*(5+6) består av faktorerna 4 och (5*6).

Nu kan man tänka sig att uttryck består av termer som i sin tur består av faktorer, men sen består faktorerna av exponentargument (eller vad man nu ska kalla dem). Exempelvis uttrycket 2*3+4^5*6^(7+8) består av termerna 2*3 och 4^5*6^(7+8), och sen består termen 2*3 precis som förut av faktorerna 2 och 3, men den andra termen 4^5*6^(7+8) består av faktorerna 4^5 och 6^(7+8). Den första av de faktorerna, 4^5, består av exponentargumenten 4 och 5, och den andra faktorn, 6^(7+8), består av exponentargumenten 6 och (7+8).

Jag brukar tycka att det blir lättare att se om man ritar rutor runt uttrycken, termerna, faktorerna och exponentargumenten.

Report

Show your results and discuss them with the teacher,
or,
send an e-mail with clear and full explanations of what you have done, and answers to all the questions on this page. Include the source code, with your changes clearly marked. If you use attachments, send them as PDF and not Word.

Even if you don't send a report by e-mail, we advise that you write down your answers, to facilitate communication and for your own later use.


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) October 7, 2017