Note: This is an outline of what I intend to say in the lecture.
It is not a definition of the course content,
and it does not replace the textbook.
Today:
More about syntax analysis ("parsing"),
Aho et al, sections 2.1 - 2.4
But first: Left over from lecture 1:
1.5 The Grouping of Phases
- Front end (almost = analysis).
Independent of the target machine.
Connected to the source language.
- Back end (almost = synthesis).
Dependent on the target machine.
(Sort of) independent of the source language.
- Pass.
One reading (and writing) of the source program (in some form).
Usually contains several phases (scanning, parsing...).
ASU p 21, Reducing the number of passes:
- memory usage
- how to connect the parser and the scanner
- backpatching
More left over from lecture 1:
1.6 Compiler-Construction Tools
- Parser generators (ex: Yacc, Bison)
- Scanner generators (ex: Lex, Flex)
- Data flow analysis (Swedish: dataflödesanalys) - what is that?
For optimization. Ex: Slicing.
- Code-generator generators (Sw: kodgeneratorgeneratorer)
ASU p 22:
"Compiler-compiler" = a complete system for compiler building.
But! "Yacc" = "Yet Another Compiler-Compiler" is a parser generator.
2.1 Overview
A compiler that translates infix to postfix:
Tree | Infix notation | Postfix notation
(figure) | 2 + 3 | 2 3 +
(figure) | 2 + 3 * 4 | 2 3 4 * +
(figure) | 2 * 3 + 4 | 2 3 * 4 +
(figure) | 2 * (3 + 4) | 2 3 4 + *
Source and target as text.
Postfix: Stack machine. Easy to write an interpreter.
- Push numbers onto the top of the stack.
- +: Pop the two top numbers, add, and push the sum.
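The stack machine can be sketched in a few lines of Python. This is only an illustration, not the course's interpreter: the function name eval_postfix is made up, tokens are assumed to be separated by spaces, and "*" is handled the same way as "+" (an assumption, so that the examples in the table can be run).

```python
def eval_postfix(text):
    """Evaluate a postfix expression with space-separated tokens."""
    stack = []
    for token in text.split():
        if token.isdecimal():
            # A number: push it onto the top of the stack.
            stack.append(int(token))
        elif token in ("+", "*"):
            # An operator: pop the two top numbers, combine, push the result.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if token == "+" else left * right)
        else:
            raise ValueError(f"unknown token: {token!r}")
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(eval_postfix("2 3 4 * +"))
print(eval_postfix("2 3 4 + *"))
```

Note that no parentheses and no precedence rules are needed: the order of the tokens alone decides what is computed.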
The "2.5" program:
simple grammar (Sw: "grammatik") (only + and -),
simple parser, very simple scanner (one character = one token).
The "2.9" program:
more advanced grammar (identifiers, *, /, mod, div),
therefore a more complex parser, a "real" scanner.
2.2 Syntax definition
Example: the if statement in C. An instance:
if (a == b)
printf("Same!\n");
else
printf("Not same!\n");
This, as you know, is the syntax for the if statement:
if ( some expression ) some statement else some other statement
A rule that could be part of a context-free grammar
(Sw: kontextfri grammatik) for C:
statement -> if ( expression ) statement else statement
statement -> if ( expression ) statement
statement -> { statement-list } (forgot what?)
...
A context-free grammar contains:
- A set of terminals (Sw: terminaler) = terminal symbols = tokens
- A set of non-terminals (Sw: icke-terminaler) = non-terminal symbols (compound grammatical constructs)
- A set of productions (Sw: produktioner) = rules: non-terminal -> tokens/non-terminals. Each production is a production "for" the non-terminal on its left-hand side.
- A start symbol (Sw: startsymbol): one of the non-terminals
Other concepts:
- String (Sw: sträng) = a sequence of tokens
- ε (epsilon) = The empty string (Sw: tomma strängen)
- Language (Sw: språk) =
the set of all strings that can be derived from the start symbol
(using the productions in the grammar),
Sw: mängden av alla strängar som kan härledas från startsymbolen
(med hjälp av produktionerna i grammatiken).
Example 2.1 (p. 27)
7+3, 7+3-4+6, 3 (but not 17, -3 or 2*2)
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
list -> digit
list -> list + digit
list -> list - digit
or
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
list -> digit | list + digit | list - digit
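To make "the set of all strings that can be derived from the start symbol" concrete, here is a small sketch that stores a grammar as a Python dictionary and enumerates the derivable strings up to a length limit. The names GRAMMAR and language are made up for this illustration, and the digits are cut down to 0 and 1 (an assumption, only to keep the output small). The pruning by length works because this grammar has no ε-productions, so a derivation never gets shorter.

```python
from collections import deque

# Non-terminals are the dictionary keys; everything else is a terminal.
# Assumption: only the digits 0 and 1, to keep the language small.
GRAMMAR = {
    "list": [["digit"], ["list", "+", "digit"], ["list", "-", "digit"]],
    "digit": [["0"], ["1"]],
}


def language(grammar, start, max_len):
    """Return all strings of at most max_len tokens derivable from start."""
    seen = {(start,)}
    queue = deque([(start,)])
    strings = set()
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal, if there is one.
        index = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if index is None:
            strings.add("".join(form))  # only terminals left: a string
            continue
        # Replace the non-terminal with each of its right-hand sides.
        for rhs in grammar[form[index]]:
            new_form = form[:index] + tuple(rhs) + form[index + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return strings


print(sorted(language(GRAMMAR, "list", 3)))
```

The full language is infinite (7+3-4+6-... can go on forever), which is why the sketch needs a length limit.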
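Try 9-5+2, working from the tokens up towards the start symbol:
9 is a digit, and a digit is a list.
5 is a digit.
9-5 has the form list - digit, so it is a list.
2 is a digit.
9-5+2 has the form list + digit, so it is a list.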
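The same reasoning can be sketched as a program that reads one character at a time and replaces the right-hand side of a production by its left-hand side as soon as it sees one. This is a hand-written illustration for this particular grammar only, not a general parsing method, and the name is_list is made up.

```python
DIGITS = "0123456789"


def is_list(text):
    """Check whether text can be reduced to the start symbol 'list'."""
    stack = []
    for ch in text:
        if ch in DIGITS:
            stack.append("digit")
            if stack == ["digit"]:
                stack = ["list"]  # list -> digit
            elif (len(stack) == 3 and stack[0] == "list"
                  and stack[1] in ("+", "-")):
                stack = ["list"]  # list -> list + digit, list -> list - digit
            else:
                return False      # for example two digits in a row
        elif ch in ("+", "-"):
            if stack != ["list"]:
                return False      # an operator must follow a complete list
            stack.append(ch)
        else:
            return False          # not a token in this grammar
    return stack == ["list"]


for s in ["7+3", "7+3-4+6", "3", "17", "-3", "2*2"]:
    print(s, is_list(s))
```

The loop at the end tries the strings from Example 2.1, both the ones in the language and the ones outside it.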
ASU fig 2.2, the parse tree (= concrete syntax tree)
and the syntax tree (= abstract syntax tree):
- The start symbol in the root (Sw: rot).
- A token (or the empty string) as each leaf (Sw: löv).
- Non-terminals in the inner nodes (Sw: de inre noderna).
- The children of each inner node are the right-hand side of a production for that node's non-terminal!
Why list + digit etc? Asymmetrical and ugly?
Why not just list + list, like this:
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
string -> digit | string + string | string - string
ASU fig 2.3 (slide!): two different parse trees for 9-5+2, so this grammar is ambiguous.
Operator associativity (Sw: Operatorassociativitet)
ASU fig 2.4:
Use a grammar like above, that is,
list -> list + digit
for left-associative (Sw: vänsterassociativa) operators.
Use a grammar like
list -> digit + list
for right-associative (Sw: högerassociativa) operators, for example:
right -> letter = right
letter -> a | b | c | ... | z
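Why associativity matters can be sketched by grouping the same operands in both directions. This is an illustration with made-up names (group_left, group_right, value_of), and trees are written as nested tuples (left, operator, right), which is an assumption of the sketch and not the parse trees from the figure.

```python
from functools import reduce


def group_left(operands, op):
    """Group like list -> list op digit: ((a op b) op c)."""
    return reduce(lambda left, right: (left, op, right), operands)


def group_right(operands, op):
    """Group like right -> letter = right: (a op (b op c))."""
    return reduce(lambda right, left: (left, op, right), reversed(operands))


def value_of(tree):
    """Evaluate a tree built from integers, '+' and '-'."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    if op == "+":
        return value_of(left) + value_of(right)
    return value_of(left) - value_of(right)


print(group_left([9, 5, 2], "-"), value_of(group_left([9, 5, 2], "-")))
print(group_right([9, 5, 2], "-"), value_of(group_right([9, 5, 2], "-")))
print(group_right(["a", "b", "c"], "="))
```

With left grouping, 9-5-2 is (9-5)-2 = 2. With right grouping it would be 9-(5-2) = 6, which is not what we expect from subtraction. For assignment it is the other way around: a = b = c should mean a = (b = c).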
Operator precedence (Sw: operatorprioritet, operatorprecedens)
9 + 5 * 2 = 9 + (5 * 2), not (9 + 5) * 2.
"*" has higher precedence than "+".
Express this in the grammar:
factor -> digit | ( expr )
term -> term * factor | term / factor | factor
expr -> expr + term | expr - term | term
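A minimal sketch of how the three levels of this grammar give precedence, written as one Python function per non-terminal. Several things here are assumptions and not part of the grammar itself: the name evaluate, single-digit operands without spaces, "/" taken as integer division, and loops in place of the left-recursive productions (a function for expr cannot start by calling itself).

```python
def evaluate(text):
    """Evaluate text according to the expr / term / factor grammar."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        # factor -> digit | ( expr )
        ch = peek()
        if ch is not None and ch in "0123456789":
            advance()
            return int(ch)
        if ch == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        raise SyntaxError(f"unexpected {ch!r}")

    def term():
        # term -> term * factor | term / factor | factor, as a loop
        value = factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            right = factor()
            value = value * right if op == "*" else value // right
        return value

    def expr():
        # expr -> expr + term | expr - term | term, as a loop
        value = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            right = term()
            value = value + right if op == "+" else value - right
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result


print(evaluate("9+5*2"))
print(evaluate("(9+5)*2"))
```

Since expr can only combine complete terms, 5*2 is finished before the addition happens. The loops also keep the operators left-associative.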
2.3 Syntax-directed translation
- Syntax-directed definition = just rules
- (Syntax-directed) translation scheme = more procedural
Syntax-directed definitions
ASU fig 2.6:
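The idea of "just rules" can be sketched like this: each node gets an attribute (here a postfix string) that is computed only from the attributes of its children. The rule used is "the translation of left op right is the translation of left, then the translation of right, then op". The tree representation (nested tuples, which is closer to a syntax tree than a parse tree) and the name postfix are assumptions made for this illustration.

```python
def postfix(node):
    """Return the postfix translation (the attribute) of a tree node."""
    if isinstance(node, str):
        return node  # a digit translates to itself
    op, left, right = node
    # Rule: the node's attribute is built from its children's attributes.
    return postfix(left) + postfix(right) + op


# The tree for 9-5+2, grouped as (9-5)+2.
tree = ("+", ("-", "9", "5"), "2")
print(postfix(tree))
```

Nothing in the rule says in which order the nodes must be visited, only how the values depend on each other.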
(Syntax-directed) translation schemes
ASU fig 2.12:
ASU fig 2.14:
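The "more procedural" flavour can be sketched by putting output actions at fixed places in the parsing code, so that the translation is produced as a side effect while the input is read. This is an illustration under assumptions: the name translate is made up, the grammar is the list grammar with single digits, the left recursion is replaced by a loop, and the output is collected in a Python list (not printed) so that it can be inspected.

```python
def translate(text):
    """Translate a list expression such as 9-5+2 to postfix."""
    out = []
    pos = 0

    def digit():
        nonlocal pos
        if pos < len(text) and text[pos] in "0123456789":
            out.append(text[pos])  # action: emit the digit when it is seen
            pos += 1
        else:
            raise SyntaxError(f"digit expected at position {pos}")

    digit()
    while pos < len(text) and text[pos] in ("+", "-"):
        op = text[pos]
        pos += 1
        digit()
        out.append(op)  # action: emit the operator after its right operand
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r}")
    return "".join(out)


print(translate("9-5+2"))
```

The position of each action in the code decides where its output ends up, which is the difference from a definition that only states rules.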
2.4 Parsing
How does the parser build the parse tree?
Or rather:
how does it traverse the input in such a way that it could build a parse tree?
ASU fig 2.15:
ASU fig 2.16:
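One possible answer, sketched for the small right-recursive grammar right -> letter = right | letter: read the input from left to right with one token of lookahead, and let one function per non-terminal build the node for that non-terminal. All names here (parse_right, lookahead, match, leaves) are assumptions of this sketch. Since both alternatives for right start with a letter, the function reads the letter first and then uses the lookahead to choose.

```python
def parse_right(text):
    """Build a parse tree for: right -> letter = right | letter."""
    pos = 0

    def lookahead():
        return text[pos] if pos < len(text) else None

    def match(expected):
        nonlocal pos
        if lookahead() != expected:
            raise SyntaxError(f"expected {expected!r} at position {pos}")
        pos += 1
        return expected

    def letter():
        ch = lookahead()
        if ch is None or ch not in "abcdefghijklmnopqrstuvwxyz":
            raise SyntaxError(f"letter expected at position {pos}")
        return ("letter", match(ch))

    def right():
        children = [letter()]
        if lookahead() == "=":  # choose right -> letter = right
            children.append(match("="))
            children.append(right())
        return ("right", *children)

    tree = right()
    if lookahead() is not None:
        raise SyntaxError(f"unexpected {lookahead()!r}")
    return tree


def leaves(tree):
    """Return the leaves of the tree, from left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:
        result.extend(leaves(child))
    return result


tree = parse_right("a=b=c")
print(tree)
print(leaves(tree))
```

Each tuple is an inner node labelled with a non-terminal, its children are the right-hand side of the production that was chosen, and the leaves read from left to right give back the input.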
Thomas Padron-McCarthy
(Thomas.Padron-McCarthy@tech.oru.se)
January 22, 2003