KOI: Lab Exercise 2

A hand-coded parser. You will modify the parser in the very simple compiler in section 2.9 of the (old edition of the) course book. (Source code.)

Don't worry if this assignment takes longer to complete than the others. It could have been split into two or more assignments, but since it covers a single subject (parsing using a hand-coded predictive recursive-descent parser) it has been kept as one assignment.

Om den här uppgiften:

En sak som jag brukar lära ut i grundläggande programmeringskurser är att programmering är en form av kommunikation. Och då menar jag inte med datorn, utan med andra människor som någon gång i framtiden ska arbeta vidare med ens program. När man arbetar som programmerare är det nämligen ganska ovanligt att man sätter sig ner och skriver ett program från noll, utan det är mycket vanligare att det redan finns programkod, och den programkoden ska man sen modifiera, eller integrera i sitt program. Det är därför det är så viktigt att programkoden är begriplig, till exempel med informativa och rättvisande variabelnamn.

I den här uppgiften får vi ett litet program, som vi ska arbeta vidare med. Det är inte bara en laboration i programmering, utan det är också en övning i att sätta sig in i existerande programkod, så man förstår den tillräckligt bra för att kunna modifiera den.

In English:

About this task:

One of the things I teach in basic programming courses is that programming is a form of communication. Not with the computer, but with other people who some time in the future will continue working with your programs. When working as a programmer, it is unusual to sit down and write a program from scratch. It is much more common that there is already some program code, and that program code should then be modified, or integrated into another program. That's why it's so important that the program code is understandable, for example with informative and correct variable names.

In this task we get a small program, which we will continue working with. It's not just a programming lab about compilers, but it's also an exercise about how to work with existing program code, so you understand it well enough to be able to modify it.

The grammar

A syntax-directed translation scheme (ASU-86 Fig 2.35) for the infix-to-postfix translator:
start -> list eof
list -> expr ; list
       | empty
expr -> expr + term { print('+') }
       | expr - term { print('-') }
       | term
term -> term * factor { print('*') }
       | term / factor { print('/') }
       | term div factor { print('DIV') }
       | term mod factor { print('MOD') }
       | factor
factor -> ( expr )
       | id { print(id.lexeme) }
       | num { print(num.value) }
(empty is used as the symbol for the empty string.)

The grammar is left-recursive. We use the standard method (see the lecture notes from lecture 3) to eliminate left-recursion. We get this syntax-directed translation scheme (ASU-86 Fig 2.38):

start -> list eof
list -> expr ; list
       | empty
expr -> term moreterms
moreterms -> + term { print('+') } moreterms
       | - term { print('-') } moreterms
       | empty
term -> factor morefactors
morefactors -> * factor { print('*') } morefactors
       | / factor { print('/') } morefactors
       | div factor { print('DIV') } morefactors
       | mod factor { print('MOD') } morefactors
       | empty
factor -> ( expr )
       | id { print(id.lexeme) }
       | num { print(num.value) }

In the program in section 2.9 of the course book (ASU-86), the parser in parser.c has been hand-optimized to eliminate tail recursion, using the methods described on page 52-53 in ASU-86. For example, the non-terminals term and morefactors are handled by the function term, which contains a while loop. Modern C compilers will perform these optimizations automatically. Since they are unnecessary and make the program harder to understand, they have been removed in the version of the program available on these web pages.

Part A: Modifying the grammar

The grammars above describe a language that consists of expressions. Instead of expressions, one could change the language to consist of assignments. That is, instead of for example 10 * 3 + foo, it should understand input of the type fum = 10 * 3 + foo.

The postfix representation of an assignment variable = value is variable value =. For example, the postfix representation of x = 17 is x 17 =, and the postfix representation of fum = 10 * 3 + foo is fum 10 3 * foo + =.

To change the grammar from above to one that understands assignments, we first add a production for a new non-terminal called assignment. We also change the definition of list from a list of expressions to a list of assignments. The grammar now looks like this, with changes marked in red:

start -> list eof
list -> assignment ; list
       | empty
assignment -> id { print(id.lexeme) } = expr { print('=') }
expr -> term moreterms
moreterms -> + term { print('+') } moreterms
       | - term { print('-') } moreterms
       | empty
term -> factor morefactors
morefactors -> * factor { print('*') } morefactors
       | / factor { print('/') } morefactors
       | div factor { print('DIV') } morefactors
       | mod factor { print('MOD') } morefactors
       | empty
factor -> ( expr )
       | id { print(id.lexeme) }
       | num { print(num.value) }

Assume that we want to handle both assignments and expressions. Write a grammar for that.

Would it be possible to write a predictive recursive-descent parser for your grammar? (Hints: Is it ambiguous? Is it left-recursive? Are FIRST sets disjoint?)

Part B: Change the program

Modify the parser in the 2.9 program so that it understands the grammar with assignments that is given above in part A. (Not your own grammar from part A, which handles both expressions and assignments.) The program should generate the correct postfix output.

Part C: Calculate the values

Modify the program from part B so that it not only translates the input to postfix notation, but also calculates the result, and stores the value of the assigned variable. (Hints: Use a stack to calculate the value of the expression. To store the values of variables, you can modify the symbol table so you can store variable values in it, and retrieve those values.)

Also print the assigned value after each assignment.

Just use integer operations in your calculations. The result of 10 / 3 should be 3, and the result of 1 / 3 should be 0.

Another hint: The infix expression x = x + 1 has the postfix representation x x 1 + =, but those two xes are different! The second x refers to the value of the variable, as usual in an expression, but the first x refers to the variable itself. So which different numbers should be pushed onto the stack for those different xes?

Part D: The exponential function

Modify the grammar and the program to handle the exponential function, expressed with the operator ^. This operator should be left-associative (but is it, in reality?) and have the highest precedence of all operators.

Examples of input and output:

Input Postfix Value
2 ^ 3 2 3 ^ 8
2 ^ 3 * 4 2 3 ^ 4 * 32
2 * 3 ^ 4 2 3 4 ^ * 162
2 ^ 3 ^ 4 2 3 ^ 4 ^ 4096
10 * 2 ^ 3 ^ 4 * 10 10 2 3 ^ 4 ^ * 10 * 409600

For the report, include and explain any changes you made to the grammar and to the program.

Några tips:

Man kan tänka sig att upphöjt-operatorn bildar en ny nivå av prioriteter.

I den ursprungliga grammatiken kan man tänka sig att det finns uttryck som består av termer som i sin tur består av faktorer. Exempelvis uttrycket 2*3+4*(5+6) består av termerna 2*3 och 4*(5+6), och sen består termen 2*3 av faktorerna 2 och 3, och termen 4*(5+6) består av faktorerna 4 och (5+6).

Nu kan man tänka sig att uttryck består av termer som i sin tur består av faktorer, men sen består faktorerna av exponentargument (eller vad man nu ska kalla dem). Exempelvis uttrycket 2*3+4^5*6^(7+8) består av termerna 2*3 och 4^5*6^(7+8), och sen består termen 2*3 precis som förut av faktorerna 2 och 3, men den andra termen 4^5*6^(7+8) består av faktorerna 4^5 och 6^(7+8). Den första av de faktorerna, 4^5, består av exponentargumenten 4 och 5, och den andra faktorn, 6^(7+8), består av exponentargumenten 6 och (7+8).

Jag brukar tycka att det blir lättare att se om man ritar rutor runt uttrycken, termerna, faktorerna och exponentargumenten.

En sak till: Kom ihåg att i C betyder ^-operatorn exklusivt eller på bitarna. Det finns ingen exponent-operator i C.

In English:

Some useful tips:

The exponential operator forms a new level of priority.

In the original grammar one can imagine that there are expressions, that consist of terms, which in turn consist of factors. For example, the expression 2 * 3 + 4 * (5 + 6) consists of the terms 2 * 3 and 4 * (5 + 6), and then the term 2 * 3 consists of the factors 2 and 3, and the term 4 * (5 + 6) consists of the factors 4 and (5 + 6).

With the exponential operator, you can imagine that expressions still consist of terms that in turn consist of factors, but then the factors consist of exponential arguments. For example, the expression 2 * 3 + 4 ^ 5 * 6 ^ (7 + 8) consists of the terms 2 * 3 and 4 ^ 5 * 6 ^ (7 + 8), and then the term 2 * 3 consists, exactly as before, of the factors 2 and 3, but the second term 4 ^ 5 * 6 ^ (7 + 8) consists of the factors 4 ^ 5 and 6 ^ (7 + 8). The first of the factors, 4 ^ 5, consists of the exponential arguments 4 and 5, and the other factor, 6 ^ (7 + 8), consists of the exponential arguments 6 and (7 + 8).

I usually find it easier to visualize if you draw squares around the expressions, terms, factors, and exponential arguments.

Also remember that in C, the ^ operator means bitwise exclusive or. There is no exponentiation operator in C.

Another thing that we learn in introductory programming courses, besides that programming is a form of communication, is that you need to test your programs. Programming can be hard, and it is easy to make mistakes. Therefore you need to invent and run test cases to try to find your mistakes. Don't forget this step. I have some test cases of my own that I will be testing your progam with. If they fail, you will get your program back, and you will need to fix it.

Report

Show your results and discuss them with the teacher,
or,
send an e-mail with clear and full explanations of what you have done, and answers to all the questions on this page. If you use attachments, send them as PDF and not Word. Also send the source code, as a Zip file or equivalent.

Even if you don't send a report by e-mail, we advise that you write down your answers, to facilitate communication and for your own later use.


Thomas Padron-McCarthy (thomas.padron-mccarthy@oru.se) December 26, 2021