Compilers and Language Processing Tools
Summer Term 2011
Prof. Dr. Arnd Poetzsch-Heffter
Software Technology Group, TU Kaiserslautern
© Prof. Dr. Arnd Poetzsch-Heffter

Content of Lecture
1. Introduction
2. Syntax and Type Analysis
   2.1 Lexical Analysis
   2.2 Context-Free Syntax Analysis
   2.3 Context-Dependent Syntax Analysis
3. Translation to Target Language
   3.1 Translation of Imperative Language Constructs
   3.2 Translation of Object-Oriented Language Constructs
4. Selected Aspects of Compilers
   4.1 Intermediate Languages
   4.2 Optimization
   4.3 Data Flow Analysis
   4.4 Register Allocation
   4.5 Code Generation
5. Garbage Collection
6. XML Processing (DOM, SAX, XSLT)

2.2 Context-Free Syntax Analysis

Section outline
1. Specification of parsers
2. Implementation of parsers
   2.1 Top-down syntax analysis
       - Recursive descent
       - LL(k) parsing theory
       - LL parser generation
   2.2 Bottom-up syntax analysis
       - Principles of LR parsing
       - LR parsing theory
       - SLR, LALR, LR(k) parsing
       - LALR parser generation
3. Error handling
4. Concrete and abstract syntax

Task of context-free syntax analysis
• Check if token stream (from scanner) matches context-free syntax of language
  - if erroneous: error handling
  - if correct: construct syntax tree
[Figure: Token Stream → Parser → Abstract / Concrete Syntax Tree]

Task of context-free syntax analysis (2)
Remarks:
• Parsing can be interleaved with other actions processing the program (e.g. attribution).
• Syntax tree controls translation.
We distinguish
• Concrete syntax tree: corresponds to the context-free grammar
• Abstract syntax tree: provides a more compact representation tailored to subsequent phases

2.2.1 Specification of Parsers

Specification of parsers
2 general specification techniques
• Syntax diagrams
• Context-free grammars (often in extended form)
Context-Free Grammars
Definition
Let
• N and T be two alphabets with N ∩ T = ∅
• Π a finite subset of N × (N ∪ T)∗
• S ∈ N
Then, Γ = (N, T, Π, S) is a context-free grammar (CFG) where
• N is the set of nonterminals
• T is the set of terminals
• Π is the set of production rules
• S is the start symbol (axiom)
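A minimal sketch of how the tuple Γ = (N, T, Π, S) might be represented as a data structure. The class name `Grammar`, its field names, the method `is_well_formed`, and the example grammar are illustrative assumptions and not part of the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    productions: tuple       # Π: pairs (A, rhs) with rhs a tuple over N ∪ T
    start: str               # S

    def is_well_formed(self) -> bool:
        """Check N ∩ T = ∅, S ∈ N, and Π ⊆ N × (N ∪ T)*."""
        symbols = self.nonterminals | self.terminals
        return (
            not (self.nonterminals & self.terminals)
            and self.start in self.nonterminals
            and all(
                lhs in self.nonterminals and all(x in symbols for x in rhs)
                for lhs, rhs in self.productions
            )
        )


# Hypothetical example grammar: S -> a S b | ε
example = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=(("S", ("a", "S", "b")), ("S", ())),
    start="S",
)
```

Representing right-hand sides as tuples keeps the empty word ε simply as the empty tuple.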
Derivation (2)
• A derivation φ0, ..., φn is a leftmost (rightmost) derivation if in every derivation step φi ⇒ φi+1 the leftmost (rightmost) nonterminal in φi is replaced.
• Leftmost and rightmost derivation steps are denoted by φ ⇒lm ψ and φ ⇒rm ψ, respectively.
• The tree representation of a derivation is a syntax tree.
• L(Γ) = {z ∈ T ∗ |S ⇒∗ z} is the language generated by Γ.
• x ∈ L(Γ) is a sentence of Γ (germ. Satz).
• φ ∈ (N ∪ T)∗ with S ⇒∗ φ is a sentential form of Γ (germ. Satzform).
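The notions of leftmost derivation step and sentential form can be illustrated with a small sketch. The function names `leftmost_step` and `derive`, and the example grammar S → aSb | ε, are assumptions chosen for illustration.

```python
def leftmost_step(form, nonterminals, production):
    """Apply `production` = (lhs, rhs) to the leftmost nonterminal of `form`."""
    lhs, rhs = production
    for i, symbol in enumerate(form):
        if symbol in nonterminals:
            if symbol != lhs:
                raise ValueError(f"leftmost nonterminal is {symbol}, not {lhs}")
            return form[:i] + tuple(rhs) + form[i + 1:]
    raise ValueError("sentential form contains no nonterminal")


def derive(start, nonterminals, steps):
    """Return the list of sentential forms of a leftmost derivation from `start`."""
    forms = [(start,)]
    for production in steps:
        forms.append(leftmost_step(forms[-1], nonterminals, production))
    return forms


# Hypothetical grammar S -> a S b | ε; derivation S => aSb => aaSbb => aabb
forms = derive(
    "S",
    {"S"},
    [("S", ("a", "S", "b")), ("S", ("a", "S", "b")), ("S", ())],
)
```

Every entry of `forms` is a sentential form; the last one contains only terminals and is therefore a sentence of the grammar.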
Derivation (3)
Remarks:
• Each derivation corresponds to exactly one syntax tree. Conversely, for each syntax tree, there can be several derivations.
• For “syntax tree”, the term “derivation tree” is also used.
• For each language, there can be several generating grammars, i.e., the mapping L: Grammar → Language is in general not injective.
Context-Free Syntax Analysis Implementation of Parsers
Context-free analysis with linear complexity
• Restrictions on grammar (not every CFG has a linear parser)
• Use of push-down automata or systems of recursive procedures
• Usage of look ahead to remaining input in order to select next
Syntax analysis methods and parser generators
• Basic knowledge of syntax analysis is essential for use of parser generators.
• Parser generators are not always applicable.
• Often, error handling has to be done manually.
• The methods underlying parser generation are a good example of a generic technique (and a highlight of computer science!).
Top-down syntax analysis
Learning objectives
• Understand the general principle of top-down syntax analysis
• Be able to implement recursive descent parsing (by example)
• Know expressiveness and limitations of top-down parsing
• Understand the basic concepts of LL(k) parsing
Recursive descent parsing
Basic idea
• Each nonterminal A is associated with a procedure. This procedure accepts a partial sentence derived from A.
• The procedure implements a finite automaton constructed from the productions with A as left-hand side. This automaton is called the item automaton of A.
• The recursiveness of the grammar is mapped to mutually recursive procedures such that the stack of higher programming languages is used for handling the recursion.
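The basic idea can be sketched as follows: one procedure per nonterminal, one token of look ahead, and the call stack of the host language handling the recursion. The grammar E → T { '+' T }, T → num | '(' E ')', the class `Parser`, and the helper `accepts` are hypothetical and only serve as an example; the end marker # follows the convention used in this lecture.

```python
class Parser:
    """Recognizer for the hypothetical grammar E -> T { '+' T }, T -> num | '(' E ')'."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["#"]  # input terminated by the end marker
        self.pos = 0
        self.curr_token = self.tokens[0]    # one token of look ahead

    def advance(self):
        self.pos += 1
        self.curr_token = self.tokens[self.pos]

    def expect(self, token):
        if self.curr_token != token:
            raise SyntaxError(f"expected {token!r}, found {self.curr_token!r}")
        self.advance()

    def parse_E(self):
        # E -> T { '+' T }
        self.parse_T()
        while self.curr_token == "+":
            self.advance()
            self.parse_T()

    def parse_T(self):
        # T -> num | '(' E ')'
        if self.curr_token == "num":
            self.advance()
        elif self.curr_token == "(":
            self.advance()
            self.parse_E()  # recursion of the grammar becomes recursion of procedures
            self.expect(")")
        else:
            raise SyntaxError(f"unexpected token {self.curr_token!r}")


def accepts(tokens):
    """Return True if the token sequence is a sentence of the example grammar."""
    parser = Parser(tokens)
    try:
        parser.parse_E()
    except SyntaxError:
        return False
    return parser.curr_token == "#"
```

For instance, `accepts(["num", "+", "(", "num", ")"])` succeeds, while an input ending in a dangling `+` is rejected.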
Recursive descent parsing procedures
• The recursive procedures are constructed from the item automata.
• The input is a token stream terminated by #.
• The variable currToken contains one token look ahead, i.e., the
Recursive descent parsing procedures (5)
Remarks:
• Recursive descent
  - is relatively easy to implement
  - can easily be used with other tasks (see following example)
  - is a typical example for syntax-directed methods (see also following example)
• Example uses one token look ahead.
• Error handling is not considered.
Recursive descent and evaluation (4)
• Extending the parser with actions/computations is easy to implement, but it mixes conceptually different phases/tasks and makes programs hard to maintain.
• Question: For which grammars does the recursive descent technique work? → LL(k) parsing theory
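A minimal sketch of a recursive descent parser extended with computations: each procedure returns the value of the phrase it has just recognized. The grammar (E → T { '+' T }, T → F { '*' F }, F → number | '(' E ')'), the token format (Python integers and operator strings), and the function name `evaluate` are assumptions made for illustration.

```python
def evaluate(tokens):
    """Parse and evaluate in one pass; raises SyntaxError on malformed input."""
    toks = list(tokens) + ["#"]  # end marker
    pos = 0

    def curr():
        return toks[pos]

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        # E -> T { '+' T }; the action adds the values of the terms
        value = term()
        while curr() == "+":
            advance()
            value += term()
        return value

    def term():
        # T -> F { '*' F }; the action multiplies the values of the factors
        value = factor()
        while curr() == "*":
            advance()
            value *= factor()
        return value

    def factor():
        # F -> number | '(' E ')'
        tok = curr()
        if isinstance(tok, int):
            advance()
            return tok
        if tok == "(":
            advance()
            value = expr()
            if curr() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if curr() != "#":
        raise SyntaxError("trailing input")
    return result
```

The arithmetic is woven directly into the parsing procedures, which is exactly the mixing of phases that the first remark warns about.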
Non LL(k) grammars (5)
Example 3:
For the following grammar Γ5, there is no k such that Γ5 is an LL(k) grammar.
• S → A | B
• A → aAb | 0
• B → aBbb | 1
Proof.
Let k be arbitrary, but fixed.
Choose two derivations according to the LL(k) definition and show that, despite equal prefixes of length k, β and γ are not equal:
S ⇒∗lm S ⇒lm A ⇒∗lm a^k 0 b^k
S ⇒∗lm S ⇒lm B ⇒∗lm a^k 1 b^{2k}
Then: prefix(k, a^k 0 b^k) = a^k = prefix(k, a^k 1 b^{2k}), but β = A ≠ B = γ.
Remark:
For L(Γ5), there exists no LL(k) grammar.
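The core observation of the proof can be checked mechanically for small k. The following sketch builds the two sentences used in the proof and compares their prefixes of length k; the helper names `prefix`, `sentence_via_A`, `sentence_via_B`, and `same_prefix` are hypothetical.

```python
def prefix(k, word):
    """The first k symbols of `word` (the whole word if it is shorter)."""
    return word[:k]


def sentence_via_A(k):
    """a^k 0 b^k, derived via S => A."""
    return "a" * k + "0" + "b" * k


def sentence_via_B(k):
    """a^k 1 b^(2k), derived via S => B."""
    return "a" * k + "1" + "b" * (2 * k)


def same_prefix(k):
    """True if both sentences start with the same k symbols, namely a^k."""
    return prefix(k, sentence_via_A(k)) == "a" * k == prefix(k, sentence_via_B(k))
```

Because `same_prefix(k)` holds for every k, no fixed look ahead of k tokens can tell at the start of the input whether S → A or S → B has to be selected.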
Proof of LL characterization lemma
• Direction from left to right:
  Γ is LL(1) implies FIRST-FOLLOW disjointness.
Proof by contradiction:
(“FIRST-FOLLOW intersection non-empty” implies “not LL(1)”)
Let A → β and A → γ be two distinct productions of Γ (β ≠ γ) such that the FIRST-FOLLOW intersection is non-empty.
Case distinction. We consider three cases:
Case 1: β ⇒∗ ε and γ ⇒∗ ε
In this case, the LL(1) property does not hold for A → β, A → γ.
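The FIRST-FOLLOW disjointness condition can be tested by computing FIRST1 and FOLLOW1 with a fixpoint iteration. The sketch below is one common formulation under stated assumptions: productions are given as (lhs, rhs-tuple) pairs, every symbol without a production is treated as a terminal, the grammar is assumed to be reduced, and the names `first_of`, `first_follow`, and `is_ll1` are hypothetical.

```python
EPS = ""   # stands for the empty word ε
END = "#"  # end marker following the start symbol


def first_of(seq, first, nonterminals):
    """FIRST_1 of a symbol sequence; contains EPS if the sequence can derive ε."""
    result = set()
    for sym in seq:
        sym_first = first[sym] if sym in nonterminals else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def first_follow(productions, start):
    """Compute FIRST_1 and FOLLOW_1 for all nonterminals by fixpoint iteration."""
    nonterminals = {lhs for lhs, _ in productions}
    first = {a: set() for a in nonterminals}
    follow = {a: set() for a in nonterminals}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            new_first = first_of(rhs, first, nonterminals)
            if not new_first <= first[lhs]:
                first[lhs] |= new_first
                changed = True
            for i, sym in enumerate(rhs):
                if sym in nonterminals:
                    rest = first_of(rhs[i + 1:], first, nonterminals)
                    new_follow = rest - {EPS}
                    if EPS in rest:
                        new_follow |= follow[lhs]
                    if not new_follow <= follow[sym]:
                        follow[sym] |= new_follow
                        changed = True
    return first, follow, nonterminals


def is_ll1(productions, start):
    """Check pairwise disjointness of FIRST_1(rhs FOLLOW_1(lhs)) per nonterminal."""
    first, follow, nonterminals = first_follow(productions, start)

    def director(lhs, rhs):
        f = first_of(rhs, first, nonterminals)
        return (f - {EPS}) | (follow[lhs] if EPS in f else set())

    for i, (a, beta) in enumerate(productions):
        for b, gamma in productions[i + 1:]:
            if a == b and director(a, beta) & director(b, gamma):
                return False
    return True
```

Applied to the grammar with S → A | B, A → aAb | 0, B → aBbb | 1, the check fails because both alternatives of S can start with a.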
Bottom-up syntax analysis
Learning objectives:
• General principles of bottom-up syntax analysis
• LR(k) analysis
• Resolving conflicts in parser generation
• Connection between CFGs and push-down automata
Basic ideas: bottom-up syntax analysis
• Bottom-up analysis is more powerful than top-down analysis, since the production is chosen at the end of the analysis while in top-down analysis the production is selected up front.
• LR: read input from left (L) and search for rightmost derivations (R)
Principles of LR parsing
1. Reduce from sentence to axiom according to productions of Γ
2. Reduction yields sentential forms αx with α ∈ (N ∪ T)∗ and x ∈ T∗ where x is the input rest
3. α has to be a prefix of a right sentential form of Γ. Such prefixes are called viable prefixes. This prefix property has to hold invariantly during LR parsing to avoid dead ends.
4. Reductions are always made at the leftmost possible position.
Regularity of viable prefixes
Theorem
The language of viable prefixes of a grammar Γ is regular.
Proof.
Cf. Wilhelm, Maurer Thm. 8.4.1 and Corollary 8.4.2.1 (pp. 361, 362).
Essential proof steps are illustrated in the following by the construction of the LR-DFA(Γ).
Construction of LR-DFA
Let Γ = (N, T, Π, S) be a CFG.
• For each nonterminal A ∈ N, construct the item automaton of A.
• Build the union of the item automata: the start state is the start state of the item automaton for S, the final states are the final states of the item automata.
• Add ε transitions from each state which contains the dot in front of a nonterminal A to the starting state of the item automaton of A.
Theorem
The automaton obtained from LR-DFA(Γ) by declaring all states to be final states accepts exactly the language of viable prefixes of Γ.
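The following sketch uses the textbook closure/goto formulation of LR(0) item sets, which is assumed here to correspond to determinising the union of item automata with its ε transitions; it is not taken from the lecture. Items are triples (lhs, rhs, dot), the grammar is augmented with a fresh start symbol `S'` (an assumption), and the names `closure`, `goto`, and `is_viable_prefix` are hypothetical.

```python
def closure(items, productions):
    """Add [B -> .γ] for every item that has the dot in front of nonterminal B."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs):
            for lhs2, rhs2 in productions:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, symbol, productions):
    """Successor state: move the dot over `symbol`, then close the item set."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in state
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved, productions)


def is_viable_prefix(word, productions, start):
    """Run the DFA on `word`; every non-empty state counts as final."""
    state = closure({("S'", (start,), 0)}, productions)  # augmented start item
    for symbol in word:
        state = goto(state, symbol, productions)
        if not state:  # the empty item set plays the role of the error state
            return False
    return True


# Hypothetical grammar S -> a S b | c
productions = [("S", ("a", "S", "b")), ("S", ("c",))]
```

For this grammar, `aaS` and `aac` are accepted as viable prefixes, whereas `acb` is not, because the handle `c` would already have to be reduced before `b` is read.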
LR pushdown automaton
Definition
Let Γ = (N, T, Π, S) be a CFG. The LR-DFA pushdown automaton for Γ contains:
• a finite set of states Q (the states of the LR-DFA(Γ))
• a set of actions Act = {shift, accept, error} ∪ red(Π), where red(Π) contains an action reduce(A → α) for each production A → α
• an action table at : Q → Act
• a successor table succ : P × (N ∪ T) → Q with
LR pushdown automaton (2)
Remarks:
• The LR-DFA pushdown automaton is a variant of pushdown automata particularly designed for LR parsing.
• States encode the read left context.
• If there are no conflicts, the action table can be directly constructed from the LR-DFA:
  - accept: final state of item automaton of start symbol
  - reduce: all other final states
  - error: error state
  - shift: all other states
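The rule for filling the action table can be sketched as a function on states. The sketch assumes that a state is a set of items (lhs, rhs, dot), that the empty set is the error state, and that the start symbol does not occur on any right-hand side (for example because the grammar was augmented); the function name `action` and the result format are hypothetical.

```python
def action(state, start):
    """Return the action for a conflict-free LR-DFA state, or raise on a conflict."""
    if not state:
        return ("error",)                       # error state
    complete = [(lhs, rhs) for lhs, rhs, dot in state if dot == len(rhs)]
    if not complete:
        return ("shift",)                       # no final item: shift
    if len(state) > 1:
        # a complete item next to any other item is a conflict without look ahead
        raise ValueError("conflict: state mixes a complete item with other items")
    lhs, rhs = complete[0]
    if lhs == start:
        return ("accept",)                      # final item of the start symbol
    return ("reduce", lhs, rhs)                 # any other final item
```

A state such as {[S → c.]} yields `("reduce", "S", ("c",))`, while a state that contains [E → T.] together with [T → T.*F] raises the conflict error and needs look ahead to be resolved.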
LR-DFA construction
Questions:
• Does LR-DFA construction work for all unambiguous grammars?
• For which grammars does the construction work?
• How can the construction be generalized / made more
LR parsing theory (2)
Remarks:
• While for LL grammars the selection of the production depends on the nonterminal to be derived, for LR grammars it depends on the complete left context.
• For LL grammars, the look ahead considers the language to be generated from the nonterminal. For LR grammars, the look ahead considers the language generated from not yet read nonterminals.
Proof of LR(0) characterization
• Left-to-right direction:
  LR(0) property implies that LR-DFA has no conflicts.
Let q be a state of LR-DFA(Γ) with two items [A → α.] and [B → β.γ].
We show that these items do not cause a conflict.
By Lemma 1(b), there are µ, ν with µα = νβ and with µ = ε or ν = ε.
Let ψ be a path leading to q, then according to Lemma 1(a), there exists ϕ with ψ = ϕµα = ϕνβ.
By Lemma 2, there are the following rightmost derivations
Proof of LR(0) characterization (2)
Case 1: Suppose the items [A → α.] and [B → β.γ] are different and cause a reduce/reduce conflict, i.e. γ = ε.
We show that then it has to hold that A = B and α = β, i.e. both items are identical, which contradicts the assumption.
Since γ = ε and ϕµα = ϕνβ, it holds that ϕµα = ϕνβγ.
By the LR(0) property, it holds that ϕµ = ϕν and A = B (and v = v).
From ϕν = ϕµ and ϕµα = ϕνβ, it follows that α = β, i.e. both items are identical.
Proof of LR(0) characterization (4)
If γ = cDρ, we can extend the above rightmost derivation
S ⇒∗rm ϕνBv ⇒rm ϕνβγv = ϕµαcDρv ⇒∗rm ϕµαcxEyv ⇒rm ϕµαcxwyv
Then we have the rightmost derivations
S ⇒∗rm ϕµAu ⇒rm ϕµαu
S ⇒∗rm ϕµαcxEyv ⇒rm ϕµαcxwyv
The LR(0) property yields in particular that ϕµαcx = ϕµ.
Thus, it has to hold that αcx = ε, which is not possible; thus, the assumption leads to a contradiction.
According to the LR(0) definition, we consider two rightmostderivations
S ⇒∗rm ψAu ⇒rm ψαu
S ⇒∗rm ϕBx ⇒rm ψαy
In derivation 2, we assume a production B → δ such that ϕδx = ψαy.
By Lemma 2, [A → α.] belongs to a state reached by ψα and [B → δ.] to a state reached by ϕδ.
Since ϕδx = ψαy, it holds that ϕδ is a prefix of ψα or vice versa.
Case distinction on the relationship between ϕδ and ψα:
Proof of LR(0) characterization (6)
1. ϕδ = ψα:
[A → α.] and [B → δ.] belong to the same state.
Since the LR-DFA has no conflicts, it holds that α = δ and A = B, thus also ϕ = ψ and x = y. This yields the LR(0) property.
2. ϕδ is a proper prefix of ψα:
Since ϕδx = ψαy, there exist c ∈ T and z ∈ T∗ such that x = czy and hence ϕδcz = ψα.
By Lemma 1(c), the state reached by ϕδ has to contain a transition marked with c and an item [C → µ.cν].
Furthermore, by Lemma 2, the state reached by ϕδ has to contain [B → δ.]. But the item [C → µ.cν] would then cause a conflict with [B → δ.], which contradicts the conflict-freeness of the LR-DFA, so this case cannot occur.
3. ψα is a proper prefix of ϕδ: analogous to case 2.
Resolving conflicts by look ahead
• Compute look ahead sets from (N ∪ T)^{≤k} for items. The look ahead set of an item approximates the set of prefixes of length k with which the input rest at this item can start.
• If the look ahead sets at an item are disjoint, then the action to be executed (shift, reduce) can be determined by k symbols look ahead.
• For an item, select the action whose look ahead set contains the prefix of the input rest. Action table has to be extended.
• For computation of look ahead sets, there are different methods.
SLR grammars
Definition (SLR(1) grammar)
Let Γ = (N, T, Π, S) be a CFG and LA([A → α.]) = FOLLOW1(A).
A state of LR-DFA(Γ) has an SLR(1) conflict if there exist
• two different reduce items with LA([A → α.]) ∩ LA([B → β.]) ≠ ∅ or
• two items [A → α.] and [B → α.aβ] with a ∈ LA([A → α.]).
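A minimal sketch of the SLR(1) conflict test for a single state, assuming items are triples (lhs, rhs, dot) and that the FOLLOW1 sets have already been computed and are passed in as a dictionary. The function name `has_slr1_conflict` and the example state from an expression grammar are hypothetical.

```python
def has_slr1_conflict(state, follow, terminals):
    """Check the two SLR(1) conflict conditions for one LR-DFA state."""
    reduce_items = [(lhs, rhs) for lhs, rhs, dot in state if dot == len(rhs)]
    # terminals that can be shifted in this state
    shift_symbols = {
        rhs[dot] for _, rhs, dot in state if dot < len(rhs) and rhs[dot] in terminals
    }
    for i, (a, _) in enumerate(reduce_items):
        if follow[a] & shift_symbols:           # shift/reduce conflict
            return True
        for b, _ in reduce_items[i + 1:]:
            if follow[a] & follow[b]:           # reduce/reduce conflict
                return True
    return False


# Hypothetical state {[E -> T.], [T -> T.*F]} of an expression grammar
state = {("E", ("T",), 1), ("T", ("T", "*", "F"), 1)}
follow = {"E": {"+", ")", "#"}, "T": {"+", "*", ")", "#"}}
terminals = {"+", "*", "(", ")", "id", "#"}
```

The example state has a conflict without look ahead, but since `*` is not in FOLLOW1(E), `has_slr1_conflict(state, follow, terminals)` reports no SLR(1) conflict.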
Error handling is required in all analysis phases and at runtime. One distinguishes
• lexical errors
• parse errors (in context-free analysis)
• errors in name and type analysis
• runtime errors (cannot be avoided in most cases)
• logical errors (behavioural errors)
Remarks:
1. The first 2 (3) kinds of errors are syntactic errors. In the following, we only consider error handling in context-free analysis.
2. What counts as an error is defined by the language specification.
1. Panic error handling
Mark synchronizing terminal symbols, e.g. “end” or “;”
If the parser reaches an error state, all symbols up to the next synchronizing symbol are skipped and the stack is corrected as if the production with the synchronizing symbol was read correctly.
  - Pros: easy to implement, termination guaranteed
  - Cons: large parts of the program can be skipped or misinterpreted
  - Example: Incorrect input a := b *** c;
    Read until “;”, correct the stack, and continue as if the statement has been accepted
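Panic error handling can be sketched in a recursive descent setting, where unwinding the call stack by an exception plays the role of correcting the parser stack. The grammar (Prog → { Stmt }, Stmt → id ':=' id { '*' id } ';'), the token format, the convention that identifiers are alphabetic tokens, and the names `SYNC` and `parse_program` are assumptions made for illustration.

```python
SYNC = {";"}  # synchronizing terminal symbols


def parse_program(tokens):
    """Parse a statement sequence and return the list of error messages."""
    toks = list(tokens) + ["#"]  # end marker
    pos = 0
    errors = []

    def take(pred, what):
        nonlocal pos
        if not pred(toks[pos]):
            raise SyntaxError(f"expected {what} at token {pos}")
        pos += 1

    def is_id(tok):
        return tok.isalpha()

    def statement():
        # Stmt -> id ':=' id { '*' id } ';'
        nonlocal pos
        take(is_id, "identifier")
        take(lambda t: t == ":=", "':='")
        take(is_id, "identifier")
        while toks[pos] == "*":
            pos += 1
            take(is_id, "identifier")
        take(lambda t: t == ";", "';'")

    while toks[pos] != "#":
        try:
            statement()
        except SyntaxError as err:
            errors.append(str(err))
            # panic mode: skip everything up to the next synchronizing symbol ...
            while toks[pos] not in SYNC and toks[pos] != "#":
                pos += 1
            # ... and continue as if the statement had been accepted
            if toks[pos] in SYNC:
                pos += 1
    return errors
```

For the input `a := b * * * c ; x := y ;` the parser reports one error at the second `*`, skips to the `;`, and then parses the second statement normally.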
2. Error productions
Extend grammar with productions describing typical error situations, so-called error productions.
Error messages can be directly associated with error productions.
  - Pros: easy to implement, termination guaranteed
  - Cons: extended grammar can belong to a more general grammar class, knowledge of typical error situations is necessary
  - Example: Typical error in PASCAL
    if ... then A := E; else ...
    Error production:
    Stmt → if Expr then Stmt∗ ; else Stmt∗
3. Production-local error correction
Goal is local correction of input such that analysis can be resumed.
Local means that the parser tries to correct the input for the current production.
  - Pros: flexible and powerful technique
  - Cons: problematic if errors occur earlier than they can be detected, operations for corrections can lead to nonterminating analysis
4. Global error correction
Attempt to get a correction that is as good as possible by altering the read input or the look ahead input.
Idea: Define distance or quality measure on inputs. For each incorrect input, look for a syntactically correct input that is best according to the used measure.
  - Pros: very powerful technique
  - Cons: analysis effort can be rather high, implementation is complex
5. Interactive error correction
In modern programming languages, syntactic analysis is often already supported by editors. In this case, the editor marks error positions.
  - Pros: quick feedback, possible error positions are shown directly, interaction with programmer possible
  - Cons: editing can be disturbed, analysis must be able to handle incomplete programs
The presented techniques can be combined.
Selection criteria depend on programming language syntax.
Error handling also depends on grammar class and implementation techniques used for parser.
• Procedure: Use correction window of n symbols before symbol at which error was detected. Check all possible variations of symbol sequence in correction window that can be obtained by insertion, exchange or modification of a symbol at any position.
• Quality measure: Choose variation that allows longest continuation of parsing procedure
• Implementation: Work with two stack automata, one represents the configuration at the beginning of the correction window, the other one the configuration at the end of the correction window. In an error case, the automaton running behind can be used to resume at the old position and to test the computed variations.
• Cons:
  - non-modular, no clear interfaces
  - not suitable for global aspects of translation
  - subsequent phases depend on parsing
  - cannot be used with every parser generator
Context-Free Syntax Analysis Concrete and Abstract Syntax
Abstract syntax vs. concrete syntax
Let PL be some (programming) language with CFG Γ and p ∈ PL be a program.
Definition (Concrete syntax)
The concrete syntax of PL determines the actual text representation of programs (incl. key words, separators, etc.).
The syntax tree of p according to Γ is the concrete syntax tree of p.
Definition (Abstract syntax)
The abstract syntax of PL describes the tree structure of programs in a form that is sufficient and suitable for further processing.
A tree for representing a program p according to the abstract syntax is called the abstract syntax tree of p.
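The difference between the two trees can be made concrete with a small sketch that maps a concrete syntax tree to an abstract one. The grammar (E → E + T | T, T → ( E ) | num), the tuple encoding `(label, children)` of concrete trees, and the names `Num`, `Add`, and `to_ast` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def to_ast(node):
    """Map a concrete syntax tree (label, children) to an abstract syntax tree."""
    label, children = node
    if label == "E" and len(children) == 3:   # E -> E + T
        return Add(to_ast(children[0]), to_ast(children[2]))
    if label == "T" and len(children) == 3:   # T -> ( E ): parentheses are dropped
        return to_ast(children[1])
    if label == "num":                        # leaf carrying the literal text
        return Num(int(children[0]))
    return to_ast(children[0])                # chain productions E -> T, T -> num


# Concrete syntax tree of the hypothetical input "(1)+2"
concrete = (
    "E",
    [
        ("E", [("T", ["(", ("E", [("T", [("num", ["1"])])]), ")"])]),
        "+",
        ("T", [("num", ["2"])]),
    ],
)
```

The concrete tree records every token and chain production, whereas `to_ast(concrete)` yields the much smaller `Add(Num(1), Num(2))`.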
Concrete syntax tree as interface
[Figure: Token Stream → Parser (with Tree Construction) → Concrete Syntax Tree → Further Language Processing]
• Resolves disadvantages of direct control by parser
• Advantages over abstract syntax
  - No additional specification of abstract syntax required
  - Tree construction does not have to be described.
  - Tree construction can be done automatically by parser generators.
Abstract syntax tree as interface
[Figure: Token Stream → Parser (with Transforming Tree Construction) → Abstract Syntax Tree → Further Language Processing]
• Advantages over concrete syntax
  - Simpler, more compact tree representation
  - Simplifies later phases
  - Often implemented by programming or specification language as
Order-sorted data types (2)
Definition (Order-sorted types - contd.)
Order-sorted terms are recursively defined as
• If t is a term of type Vi, then it is also of type V.
• If ti is a term of type Ti for each i, then T(t1, ..., tn) is of type T, T is also the constructor.
• If s1, ..., sk are terms of type S, then L(s1, ..., sk) is of type L, L is also the list constructor.
Additional operators
• the selectors selk : T → Tk return the k-th subterm
• the usual list operations (rest, append, conc, ...)
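One possible way to mirror order-sorted terms in a programming language is sketched below: the variant type becomes a base class, tuple types become dataclasses, and list types become tuples. The types `Exp`, `Const`, `Plus`, the alias `ExpList`, and the selector function `sel` are hypothetical and only approximate the definition above.

```python
from dataclasses import dataclass, fields


class Exp:
    """Variant type Exp = Const | Plus: every Const or Plus term is also an Exp."""


@dataclass(frozen=True)
class Const(Exp):
    """Tuple type Const(int); the class doubles as the constructor."""
    value: int


@dataclass(frozen=True)
class Plus(Exp):
    """Tuple type Plus(Exp, Exp)."""
    left: Exp
    right: Exp


def sel(k, term):
    """Selector sel_k: return the k-th subterm (counting from 1) of a tuple term."""
    return getattr(term, fields(term)[k - 1].name)


# List type ExpList = Exp*, represented by Python tuples with the usual operations
ExpList = tuple

term = Plus(Const(1), Plus(Const(2), Const(3)))
args = ExpList([Const(1), Const(2)])
```

Here `sel(2, term)` returns the subterm `Plus(Const(2), Const(3))`, and slicing such as `args[1:]` corresponds to the list operation rest.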