Don't worry if this assignment takes longer to complete than the others. It could have been split into two or more assignments, but since it covers a single subject (parsing using a hand-coded predictive recursive-descent parser) it has been kept as one assignment.
Om den här uppgiften:
En sak som jag brukar lära ut i grundläggande programmeringskurser är att programmering är en form av kommunikation. Och då menar jag inte med datorn, utan med andra människor som någon gång i framtiden ska arbeta vidare med ens program. När man arbetar som programmerare är det nämligen ganska ovanligt att man sätter sig ner och skriver ett program från noll, utan det är mycket vanligare att det redan finns programkod, och den programkoden ska man sen modifiera, eller integrera i sitt program. Det är därför det är så viktigt att programkoden är begriplig, till exempel med informativa och rättvisande variabelnamn. I den här uppgiften får vi ett litet program, som vi ska arbeta vidare med. Det är inte bara en laboration i programmering, utan det är också en övning i att sätta sig in i existerande programkod, så man förstår den tillräckligt bra för att kunna modifiera den. In English: About this task: One of the things I teach in basic programming courses is that programming is a form of communication. Not with the computer, but with other people who some time in the future will continue working with your programs. When working as a programmer, it is unusual to sit down and write a program from scratch. It is much more common that there is already some program code, and that program code should then be modified, or integrated into another program. That's why it's so important that the program code is understandable, for example with informative and correct variable names. In this task we get a small program, which we will continue working with. It's not just a programming lab about compilers, but it's also an exercise about how to work with existing program code, so you understand it well enough to be able to modify it. |
(empty is used as the symbol for the empty string.)start -> list eof list -> expr ; list | empty expr -> expr + term { print('+') } | expr - term { print('-') } | term term -> term * factor { print('*') } | term / factor { print('/') } | term div factor { print('DIV') } | term mod factor { print('MOD') } | factor factor -> ( expr ) | id { print(id.lexeme) } | num { print(num.value) }
The grammar is left-recursive. We use the standard method (see the lecture notes from lecture 3) to eliminate left-recursion. We get this syntax-directed translation scheme (ASU-86 Fig 2.38):
start -> list eof list -> expr ; list | empty expr -> term moreterms moreterms -> + term { print('+') } moreterms | - term { print('-') } moreterms | empty term -> factor morefactors morefactors -> * factor { print('*') } morefactors | / factor { print('/') } morefactors | div factor { print('DIV') } morefactors | mod factor { print('MOD') } morefactors | empty factor -> ( expr ) | id { print(id.lexeme) } | num { print(num.value) }
In the program in section 2.9 of the course book (ASU-86), the parser in parser.c has been hand-optimized to eliminate tail recursion, using the methods described on page 52-53 in ASU-86. For example, the non-terminals term and morefactors are handled by the function term, which contains a while loop. Modern C compilers will perform these optimizations automatically. Since they are unnecessary and make the program harder to understand, they have been removed in the version of the program available on these web pages.
The postfix representation of an assignment variable = value is variable value =. For example, the postfix representation of x = 17 is x 17 =, and the postfix representation of fum = 10 * 3 + foo is fum 10 3 * foo + =.
To change the grammar from above to one that understands assignments, we first add a production for a new non-terminal called assignment. We also change the definition of list from a list of expressions to a list of assignments. The grammar now looks like this, with changes marked in red:
start -> list eof list -> assignment ; list | empty assignment -> id { print(id.lexeme) } = expr { print('=') } expr -> term moreterms moreterms -> + term { print('+') } moreterms | - term { print('-') } moreterms | empty term -> factor morefactors morefactors -> * factor { print('*') } morefactors | / factor { print('/') } morefactors | div factor { print('DIV') } morefactors | mod factor { print('MOD') } morefactors | empty factor -> ( expr ) | id { print(id.lexeme) } | num { print(num.value) }
Assume that we want to handle both assignments and expressions. Write a grammar for that.
Would it be possible to write a predictive recursive-descent parser for your grammar? (Hints: Is it ambiguous? Is it left-recursive? Are FIRST sets disjoint?)
Also print the assigned value after each assignment.
Just use integer operations in your calculations. The result of 10 / 3 should be 3, and the result of 1 / 3 should be 0.
Another hint: The infix expression x = x + 1 has the postfix representation x x 1 + =, but those two xes are different! The second x refers to the value of the variable, as usual in an expression, but the first x refers to the variable itself. So which different numbers should be pushed onto the stack for those different xes?
As an example test case (that I will run!), after this input:
kalle = 5 * 100 + 10 * 50; olle = kalle * kalle * 13 + kalle * (10 - 5); |
the variable olle should have been assigned the value 13005000.
Examples of input and output:
Input | Postfix | Value |
---|---|---|
2 ^ 3 | 2 3 ^ | 8 |
2 ^ 3 * 4 | 2 3 ^ 4 * | 32 |
2 * 3 ^ 4 | 2 3 4 ^ * | 162 |
2 ^ 3 ^ 4 | 2 3 ^ 4 ^ | 4096 |
10 * 2 ^ 3 ^ 4 * 10 | 10 2 3 ^ 4 ^ * 10 * | 409600 |
For the report, include and explain any changes you made to the grammar and to the program.
Några tips:
Man kan tänka sig att upphöjt-operatorn bildar en ny nivå av prioriteter. I den ursprungliga grammatiken kan man tänka sig att det finns uttryck som består av termer som i sin tur består av faktorer. Exempelvis uttrycket 2*3+4*(5+6) består av termerna 2*3 och 4*(5+6), och sen består termen 2*3 av faktorerna 2 och 3, och termen 4*(5+6) består av faktorerna 4 och (5+6). Nu kan man tänka sig att uttryck består av termer som i sin tur består av faktorer, men sen består faktorerna av exponentargument (eller vad man nu ska kalla dem). Exempelvis uttrycket 2*3+4^5*6^(7+8) består av termerna 2*3 och 4^5*6^(7+8), och sen består termen 2*3 precis som förut av faktorerna 2 och 3, men den andra termen 4^5*6^(7+8) består av faktorerna 4^5 och 6^(7+8). Den första av de faktorerna, 4^5, består av exponentargumenten 4 och 5, och den andra faktorn, 6^(7+8), består av exponentargumenten 6 och (7+8). Jag brukar tycka att det blir lättare att se om man ritar rutor runt uttrycken, termerna, faktorerna och exponentargumenten. En sak till: Kom ihåg att i C betyder ^-operatorn exklusivt eller på bitarna. Det finns ingen exponent-operator i C. In English: Some useful tips: The exponential operator forms a new level of priority. In the original grammar one can imagine that there are expressions, that consist of terms, which in turn consist of factors. For example, the expression 2 * 3 + 4 * (5 + 6) consists of the terms 2 * 3 and 4 * (5 + 6), and then the term 2 * 3 consists of the factors 2 and 3, and the term 4 * (5 + 6) consists of the factors 4 and (5 + 6). With the exponential operator, you can imagine that expressions still consist of terms that in turn consist of factors, but then the factors consist of exponential arguments. For example, the expression 2 * 3 + 4 ^ 5 * 6 ^ (7 + 8) consists of the terms 2 * 3 and 4 ^ 5 * 6 ^ (7 + 8), and then the term 2 * 3 consists, exactly as before, of the factors 2 and 3, but the second term 4 ^ 5 * 6 ^ (7 + 8) consists of the factors 4 ^ 5 and 6 ^ (7 + 8). The first of the factors, 4 ^ 5, consists of the exponential arguments 4 and 5, and the other factor, 6 ^ (7 + 8), consists of the exponential arguments 6 and (7 + 8). I usually find it easier to visualize if you draw squares around the expressions, terms, factors, and exponential arguments. Also remember that in C, the ^ operator means bitwise exclusive or. There is no exponentiation operator in C. |
Another thing that we learn in introductory programming courses, besides that programming is a form of communication, is that you need to test your programs. Programming can be hard, and it is easy to make mistakes. Therefore you need to invent and run test cases to try to find your mistakes. Don't forget this step. I have some test cases of my own that I will be testing your progam with. If they fail, you will get your program back, and you will need to fix it. |
Even if you don't send a report by e-mail, we advise that you write down your answers, to facilitate communication and for your own later use.