Kompilatorer och interpretatorer: Lecture 2 (Preliminary!)

Note: This is an outline of what I intend to say in the lecture. It is not a definition of the course content, and it does not replace the textbook.

Today: More about syntax analysis ("parsing"),
Aho et al (abbreviated "ASU" below), sections 2.1 - 2.4

But first: What was left over from lecture 1:

1.5 The Grouping of Phases

ASU p 21, Reducing the number of passes:

More left over from lecture 1:

1.6 Compiler-Construction Tools

ASU p 22: "Compiler-compiler" = a complete system for compiler building. But! "Yacc" = "Yet Another Compiler-Compiler" is a parser generator.

2.1 Overview

A compiler that translates infix to postfix:

Tree (figure)                                 Infix notation   Postfix notation
Abstract syntax tree for 2 + 3                2 + 3            2 3 +
Abstract syntax tree for 2 + 3 * 4            2 + 3 * 4        2 3 4 * +
Abstract syntax tree for 2 * 3 + 4            2 * 3 + 4        2 3 * 4 +
Abstract syntax tree for 2 * (3 + 4)          2 * (3 + 4)      2 3 4 + *

Source and target as text.

Postfix: Stack machine. Easy to write an interpreter.
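
To make the "easy to write an interpreter" claim concrete, here is a minimal sketch of a stack machine for postfix expressions. It assumes single-digit operands and the four operators + - * /. The function name eval_postfix and the fixed stack size are my own choices for illustration, not something from the textbook.

```c
#include <ctype.h>

/* Evaluate a postfix expression with single-digit operands and the
   operators + - * /. Spaces between tokens are optional.
   Returns 0 and stores the value in *result on success,
   returns -1 on malformed input. (Hypothetical helper for illustration.) */
int eval_postfix(const char *s, int *result)
{
    int stack[64];
    int top = 0;                       /* number of values on the stack */

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s))
            continue;
        if (isdigit((unsigned char)*s)) {
            if (top == 64)
                return -1;             /* stack overflow */
            stack[top++] = *s - '0';   /* operand: push it */
        } else {
            int left, right;
            if (top < 2)
                return -1;             /* operator without two operands */
            right = stack[--top];      /* operator: pop two values ... */
            left = stack[--top];
            switch (*s) {              /* ... and push the result */
            case '+': stack[top++] = left + right; break;
            case '-': stack[top++] = left - right; break;
            case '*': stack[top++] = left * right; break;
            case '/':
                if (right == 0)
                    return -1;         /* division by zero */
                stack[top++] = left / right;
                break;
            default:
                return -1;             /* unknown character */
            }
        }
    }
    if (top != 1)
        return -1;                     /* leftover or missing operands */
    *result = stack[0];
    return 0;
}
```

Calling eval_postfix("2 3 4 * +", &v) leaves 14 in v. Note that no parentheses and no precedence rules are needed: the order of the tokens already says everything.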

The "2.5" program: simple grammar (Sw: "grammatik") (only + and -), simple parser, very simple scanner (one character = one token).
The "2.9" program: more advanced grammar (identifiers, *, /, mod, div), therefore a more complex parser, a "real" scanner.

2.2 Syntax definition

Example: the if statement in C. An instance:

if (a == b)
  printf("Same!\n");
else
  printf("Not same!\n");

This, as you know, is the syntax for the if statement:

if ( some expression ) some statement else some other statement

A rule that could be part of a context-free grammar (Sw: kontextfri grammatik) for C:

statement -> if ( expression ) statement else statement
statement -> if ( expression ) statement
statement -> { statement-list } (forgot what?)
...

A context-free grammar contains:

  1. A set of terminals (Sw: terminaler) = terminal symbols = tokens
  2. A set of non-terminals (Sw: icke-terminaler) = non-terminal symbols (compound grammatical constructs)
  3. A set of productions (Sw: produktioner) = rules, each of the form: non-terminal -> a sequence of tokens and/or non-terminals. A production is said to be "for" the non-terminal on its left side.
  4. A designation of one of the non-terminals as the start symbol (Sw: startsymbolen)
Other concepts:

Example 2.1 (p. 27)

7+3, 7+3-4+6, 3 (but not 17, -3 or 2*2)

digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
list -> digit
list -> list + digit
list -> list - digit

or

digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
list -> digit | list + digit | list - digit

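Try 9-5+2. (In the steps below the arrow means "is recognized as", that is, the productions are applied backwards.)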
9 -> digit -> list.
5 -> digit.
9-5 -> list - digit -> list
2 -> digit.
9-5+2 -> list + digit -> list
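
The same bottom-up reasoning can be sketched as a small recognizer for the grammar of Example 2.1. This is only an illustration under the assumption that one character is one token and that no spaces occur in the input. The name is_list is hypothetical.

```c
#include <ctype.h>

/* Return 1 if s can be derived from `list` in the grammar
     list -> list + digit | list - digit | digit
   and 0 otherwise. One character = one token, no whitespace. */
int is_list(const char *s)
{
    /* The first token must be a digit: digit is recognized as list. */
    if (!isdigit((unsigned char)*s))
        return 0;
    s++;

    /* Each following "+ digit" or "- digit" turns
       list + digit (or list - digit) into a list again. */
    while (*s == '+' || *s == '-') {
        if (!isdigit((unsigned char)s[1]))
            return 0;
        s += 2;
    }

    /* Anything left over means the string is not in the language. */
    return *s == '\0';
}
```

With this sketch, is_list accepts 7+3, 7+3-4+6, 3 and 9-5+2, and rejects 17, -3 and 2*2, matching the examples above.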

ASU fig 2.2, the parse tree (= concrete syntax tree) and the syntax tree (= abstract syntax tree):

Parse tree for 9-5+2

Why list + digit etc? Asymmetrical and ugly? Why not just list + list, like this:

digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
string -> digit | string + string | string - string

ASU fig 2.3 (slide!):

Two parse trees for 9-5+2
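
Why the ambiguity matters: the two trees do not mean the same thing. The sketch below builds both groupings of 9-5+2 by hand as small syntax trees and evaluates them. The struct layout and all names are assumptions made for this illustration only.

```c
#include <stddef.h>

/* A syntax tree node: a leaf holds a digit value,
   an interior node holds the operator '+' or '-'. */
struct node {
    char op;                          /* '+', '-', or 0 for a leaf */
    int value;                        /* used only in leaves */
    const struct node *left, *right;  /* NULL in leaves */
};

/* Evaluate a tree bottom-up. */
int eval(const struct node *n)
{
    if (n->op == 0)
        return n->value;
    if (n->op == '+')
        return eval(n->left) + eval(n->right);
    return eval(n->left) - eval(n->right);
}

/* Leaves shared by both trees. */
static const struct node nine = { 0, 9, NULL, NULL };
static const struct node five = { 0, 5, NULL, NULL };
static const struct node two  = { 0, 2, NULL, NULL };

/* Tree 1: (9-5)+2, the subtraction is grouped first. */
static const struct node minus_first = { '-', 0, &nine, &five };
const struct node tree_left = { '+', 0, &minus_first, &two };

/* Tree 2: 9-(5+2), the addition is grouped first. */
static const struct node plus_first = { '+', 0, &five, &two };
const struct node tree_right = { '-', 0, &nine, &plus_first };
```

Here eval(&tree_left) gives 6 and eval(&tree_right) gives 2, so a grammar that allows both trees does not pin down the meaning of the expression.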

Operator associativity (Sw: Operatorassociativitet)

ASU fig 2.4:

Parse trees for left- and right-associative operators

Use a grammar like above, that is,
list -> list + digit
for left-associative (Sw: vänsterassociativa) operators. Use a grammar like
list -> digit + list
for right-associative (Sw: högerassociativa) operators, for example:

right -> letter = right | letter
letter -> a | b | c | ... | z
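
A right-recursive production maps directly onto a recursive function, and the recursion is what makes the operator group to the right. The sketch below parses an assignment chain and writes it back fully parenthesized. It assumes the grammar has the base case right -> letter, lowercase letters only, and no spaces. The names parse_right and group_right, and the 128-byte buffer, are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* right -> letter = right | letter
   Writes a fully parenthesized copy of what was parsed to out.
   Returns a pointer just past the parsed text, or NULL on a
   syntax error or if a buffer is too small. */
static const char *parse_right(const char *s, char *out, size_t size)
{
    if (!islower((unsigned char)*s))
        return NULL;
    if (s[1] == '=') {
        char inner[128];
        const char *rest = parse_right(s + 2, inner, sizeof inner);
        int n;
        if (rest == NULL)
            return NULL;
        /* The whole right-hand side was parsed first, so it nests
           inside the current assignment. */
        n = snprintf(out, size, "(%c=%s)", *s, inner);
        if (n < 0 || (size_t)n >= size)
            return NULL;
        return rest;
    }
    if (size < 2)
        return NULL;
    out[0] = *s;                      /* right -> letter */
    out[1] = '\0';
    return s + 1;
}

/* Return 1 and fill out if all of s is a valid `right`, else 0. */
int group_right(const char *s, char *out, size_t size)
{
    const char *rest = parse_right(s, out, size);
    return rest != NULL && *rest == '\0';
}
```

For the input a=b=c this produces (a=(b=c)), the right-associative grouping. Compare with 9-5+2, which the left-recursive list grammar groups as (9-5)+2.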

Operator precedence (Sw: Operatorprioritet, operatorprecedens)

9 + 5 * 2 = 9 + (5 * 2), not (9 + 5) * 2.
"*" has higher precedence than "+".

Express this in the grammar:

factor -> digit | ( expr )
term -> term * factor | term / factor | factor
expr -> expr + term | expr - term | term
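
One way to see that this grammar really encodes precedence is to write one function per non-terminal and let them evaluate the expression. This is a sketch under some assumptions: single-digit operands, no spaces, and the left-recursive productions written as loops (a function that started by calling itself would never terminate). The name eval_infix and the global parser state are my own choices.

```c
#include <ctype.h>

static const char *p;   /* next unread character */
static int ok;          /* cleared on the first error */

static int expr(void);

/* factor -> digit | ( expr ) */
static int factor(void)
{
    if (isdigit((unsigned char)*p))
        return *p++ - '0';
    if (*p == '(') {
        int v;
        p++;
        v = expr();
        if (*p == ')')
            p++;
        else
            ok = 0;     /* missing right parenthesis */
        return v;
    }
    ok = 0;             /* neither a digit nor a parenthesis */
    return 0;
}

/* term -> term * factor | term / factor | factor */
static int term(void)
{
    int v = factor();
    while (ok && (*p == '*' || *p == '/')) {
        char op = *p++;
        int f = factor();
        if (op == '*')
            v *= f;
        else if (f != 0)
            v /= f;
        else
            ok = 0;     /* division by zero (or a failed factor) */
    }
    return v;
}

/* expr -> expr + term | expr - term | term */
static int expr(void)
{
    int v = term();
    while (ok && (*p == '+' || *p == '-')) {
        char op = *p++;
        int t = term();
        v = (op == '+') ? v + t : v - t;
    }
    return v;
}

/* Returns 0 and stores the value in *result, or -1 on an error. */
int eval_infix(const char *s, int *result)
{
    p = s;
    ok = 1;
    *result = expr();
    return (ok && *p == '\0') ? 0 : -1;
}
```

For 9+5*2 the call to term for the right operand of + consumes all of 5*2 before the addition is done, so the result is 19. With (9+5)*2 the parentheses go through factor, and the result is 28.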

2.3 Syntax-directed translation

Syntax-directed definitions

ASU fig 2.6:

Attribute values at nodes in a parse tree
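
In a syntax-directed definition, the attribute value at a node is computed from the attribute values of its children. The sketch below does this for a string attribute t that holds the postfix form. To keep it short it uses a simplified tree with one node per operator or digit instead of the full parse tree in the figure. The struct, the buffer size and the name compute_t are assumptions for this illustration.

```c
#include <stddef.h>
#include <string.h>

struct pnode {
    char symbol;                  /* a digit in a leaf, '+' or '-' otherwise */
    struct pnode *left, *right;   /* NULL in leaves */
    char t[32];                   /* synthesized attribute: postfix string */
};

/* Compute the attribute t for n and every node below it, children
   first. Returns 0 on success, -1 if a postfix string does not fit. */
int compute_t(struct pnode *n)
{
    size_t len;

    if (n->left == NULL) {        /* leaf: t is the digit itself */
        n->t[0] = n->symbol;
        n->t[1] = '\0';
        return 0;
    }
    if (compute_t(n->left) != 0 || compute_t(n->right) != 0)
        return -1;

    /* interior node: t = left.t || right.t || operator */
    len = strlen(n->left->t) + strlen(n->right->t) + 1;
    if (len >= sizeof n->t)
        return -1;
    strcpy(n->t, n->left->t);
    strcat(n->t, n->right->t);
    n->t[len - 1] = n->symbol;
    n->t[len] = '\0';
    return 0;
}
```

For the tree of 9-5+2, the node for - gets t = 95- and the root gets t = 95-2+. Nothing is output during the traversal: the translation is the attribute value at the root.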

(Syntax-directed) translation schemes

ASU fig 2.12:

An extra leaf is constructed for a semantic action

ASU fig 2.14:

Actions translating 9-5+2 into 95-2+
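
In a translation scheme the output is produced by actions that run while the input is being parsed, so no attribute strings have to be stored in the tree. The sketch below follows that idea for digits, + and -. It is an illustration under assumptions: one character is one token, the left recursion is written as a loop, and the actions append to a buffer instead of printing. The name infix_to_postfix is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Translate single digits separated by + and - into postfix.
   Returns 0 on success, -1 on a syntax error or if out is too
   small. On failure the contents of out are unspecified. */
int infix_to_postfix(const char *in, char *out, size_t size)
{
    size_t n = 0;
    char op;

    /* term -> digit { print(digit) } */
    if (!isdigit((unsigned char)*in) || n + 1 >= size)
        return -1;
    out[n++] = *in++;

    /* expr -> expr + term { print('+') }
       expr -> expr - term { print('-') } */
    while (*in == '+' || *in == '-') {
        op = *in++;
        if (!isdigit((unsigned char)*in) || n + 2 >= size)
            return -1;
        out[n++] = *in++;   /* action of term -> digit */
        out[n++] = op;      /* action at the end of the production */
    }
    if (*in != '\0')
        return -1;
    out[n] = '\0';
    return 0;
}
```

The operator is emitted only after its right operand has been handled, which is exactly where the action sits in the production. For the input 9-5+2 the buffer ends up holding 95-2+.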

2.4 Parsing

How does the parser build the parse tree? Or rather: how does it traverse the input in such a way that it could build a parse tree?

ASU fig 2.15:

Steps in top-down construction of a parse tree

ASU fig 2.16:

Top-down parsing while scanning the input from left to right


Thomas Padron-McCarthy (Thomas.Padron-McCarthy@tech.oru.se) January 22, 2003