1
Contents
- Introduction
- A Simple Compiler
- Scanning – Theory and Practice
- Grammars and Parsing
- LL(1) Parsing
- LR Parsing
- Lex and yacc
- Semantic Processing
- Symbol Tables
- Run-time Storage Organization
- Code Generation and Local Code Optimization
- Global Optimization
2
Chapter 4
Grammars and Parsing
3
Outline
- Context-Free Grammars
- Errors in Context-Free Grammars
- Transforming Extended BNF Grammars
- Parsers and Recognizers
- Grammar Analysis Algorithms
4
Context-Free Grammars: Concepts and Notation
A context-free grammar G = (Vt, Vn, S, P) consists of:
- A finite terminal vocabulary Vt
  - The token set produced by the scanner
- A finite nonterminal vocabulary Vn
  - Intermediate symbols
- A start symbol S ∈ Vn that starts all derivations
  - Also called the goal symbol
- P, a finite set of productions (rewriting rules) of the form A → X1 X2 ... Xm
  - A ∈ Vn, Xi ∈ Vn ∪ Vt, 1 ≤ i ≤ m
  - A → λ is a valid production (λ denotes the empty string)
5
Context-Free Grammars: Concepts and Notation (Cont’d.)
Other notations
- The vocabulary V of a CFG is the set of terminal and nonterminal symbols: V = Vn ∪ Vt
- L(G), the set of terminal strings derivable from S, is the context-free language of grammar G
Notational conventions
- a, b, c, ... denote symbols in Vt
- A, B, C, ... denote symbols in Vn
- U, V, W, ... denote symbols in V
- α, β, γ, ... denote strings in V*
- u, v, w, ... denote strings in Vt*
6
Context-Free Grammars: Concepts and Notation (Cont’d.)
Derivation
- One-step derivation: if A → γ, then αAβ ⇒ αγβ
- One or more steps: ⇒+
- Zero or more steps: ⇒*
If S ⇒* β, then β is said to be a sentential form of the CFG.
- SF(G) is the set of sentential forms of grammar G.
- L(G) = {x ∈ Vt* | S ⇒* x}
- L(G) = SF(G) ∩ Vt*; that is, the language of G is simply those sentential forms of G that are terminal strings.
7
Context-Free Grammars: Concepts and Notation (Cont’d.)
Leftmost derivation (the order followed by top-down parsers)
- Always expands the leftmost nonterminal; written ⇒lm, ⇒lm+, ⇒lm*
- A sentential form produced via a leftmost derivation sequence is called a left sentential form.
E.g., leftmost derivation of F(V+V) in grammar G0 (λ denotes the empty string):
G0:
  E → Prefix ( E )
  E → V Tail
  Prefix → F
  Prefix → λ
  Tail → + E
  Tail → λ
Derivation:
  E ⇒lm Prefix(E)
    ⇒lm F(E)
    ⇒lm F(V Tail)
    ⇒lm F(V+E)
    ⇒lm F(V+V Tail)
    ⇒lm F(V+V)
8
Context-Free Grammars: Concepts and Notation (Cont’d.)
Rightmost derivation (the order associated with bottom-up parsers)
- Always expands the rightmost nonterminal; written ⇒rm, ⇒rm+, ⇒rm*
- A sentential form produced via a rightmost derivation sequence is called a right sentential form.
E.g., rightmost derivation of F(V+V) in grammar G0 (λ denotes the empty string):
G0:
  E → Prefix ( E )
  E → V Tail
  Prefix → F
  Prefix → λ
  Tail → + E
  Tail → λ
Derivation:
  E ⇒rm Prefix(E)
    ⇒rm Prefix(V Tail)
    ⇒rm Prefix(V+E)
    ⇒rm Prefix(V+V Tail)
    ⇒rm Prefix(V+V)
    ⇒rm F(V+V)
Same number of steps as the leftmost derivation, but in a different order.
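The two derivations of F(V+V) can be replayed mechanically. The following is a minimal sketch, assuming G0 is encoded as a Python dictionary in which an empty list stands for λ; the function name derive and the (nonterminal, alternative index) encoding of each step are illustrative choices, not part of the slides.

```python
# Grammar G0 from the slides; an empty right-hand side stands for λ.
# F, V, (, ) and + are terminals because they have no entry in the dictionary.
G0 = {
    "E": [["Prefix", "(", "E", ")"], ["V", "Tail"]],
    "Prefix": [["F"], []],
    "Tail": [["+", "E"], []],
}


def derive(start, choices, leftmost=True):
    """Apply productions to the leftmost (or rightmost) nonterminal.

    `choices` lists one (nonterminal, alternative index) pair per step.
    Returns every sentential form of the derivation, starting with [start].
    """
    form = [start]
    forms = [form]
    for lhs, alt in choices:
        # Positions of all nonterminals in the current sentential form.
        positions = [i for i, sym in enumerate(form) if sym in G0]
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != lhs:
            raise ValueError(f"expected to expand {form[pos]}, not {lhs}")
        form = form[:pos] + G0[lhs][alt] + form[pos + 1:]
        forms.append(form)
    return forms


left = derive(
    "E",
    [("E", 0), ("Prefix", 0), ("E", 1), ("Tail", 0), ("E", 1), ("Tail", 1)],
)
right = derive(
    "E",
    [("E", 0), ("E", 1), ("Tail", 0), ("E", 1), ("Tail", 1), ("Prefix", 0)],
    leftmost=False,
)

for label, forms in (("lm", left), ("rm", right)):
    print(label, " => ".join(" ".join(form) for form in forms))
```

Both calls use the same six productions and end in the same terminal string; only the order in which the productions are applied differs.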
9
Context-Free Grammars: Concepts and Notation (Cont’d.)
A parse tree
- Is rooted by the start symbol
- Its leaves are grammar symbols or λ (the empty string)
10
Context-Free Grammars: Concepts and Notation (Cont’d.)
- A phrase of a sentential form is a sequence of symbols descended from a single nonterminal in the parse tree.
- Simple or prime phrase
  - A simple phrase is a sequence of symbols directly derived from a nonterminal.
- The handle of a sentential form is the leftmost simple phrase.
11
Errors in Context-Free Grammars
CFGs are a definitional mechanism. They may have errors, just as programs may.
- Useless nonterminals
  - Unreachable from the start symbol
  - Derive no terminal string
Example (S is the start symbol):
  S → A | B
  A → a
  B → B b
  C → c
- Nonterminal C cannot be reached from S.
- Nonterminal B derives no terminal string.
Do exercise 7.
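Both kinds of useless nonterminal can be found with simple fixed-point computations. The sketch below assumes the example grammar is encoded as a dictionary from nonterminal to a list of right-hand sides; the function names reachable and productive are illustrative, not taken from the slides.

```python
# The example grammar from the slide: S -> A | B, A -> a, B -> B b, C -> c.
GRAMMAR = {
    "S": [["A"], ["B"]],
    "A": [["a"]],
    "B": [["B", "b"]],
    "C": [["c"]],
}


def reachable(grammar, start):
    """Nonterminals that occur in some sentential form derived from start."""
    seen, work = {start}, [start]
    while work:
        for rhs in grammar[work.pop()]:
            for sym in rhs:
                if sym in grammar and sym not in seen:
                    seen.add(sym)
                    work.append(sym)
    return seen


def productive(grammar):
    """Nonterminals that derive at least one terminal string."""
    good = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in good:
                continue
            # lhs is productive if some alternative uses only terminals
            # and nonterminals already known to be productive.
            if any(all(s not in grammar or s in good for s in rhs)
                   for rhs in alternatives):
                good.add(lhs)
                changed = True
    return good


unreachable = sorted(set(GRAMMAR) - reachable(GRAMMAR, "S"))
nonproductive = sorted(set(GRAMMAR) - productive(GRAMMAR))
print(unreachable)    # ['C']
print(nonproductive)  # ['B']
```

B is reachable but never productive because its only production, B → B b, needs B itself to be productive first.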
12
Errors in Context-Free Grammars (Cont’d.)
Ambiguous
- Grammars that allow different parse trees for the same terminal string
- No algorithm can decide, for an arbitrary CFG, whether it is ambiguous (the problem is undecidable)
13
Errors in Context-Free Grammars (Cont’d.)
- It is impossible to decide, in general, whether a given CFG is ambiguous
  - For certain grammar classes (such as LL(1) and LR(1)), we can prove that constituent grammars are unambiguous
Wrong language
- The grammar may define a language other than the intended one
- A general comparison algorithm applicable to all CFGs (one that tests whether two grammars define the same language) is known to be impossible
Transforming Extended BNF Grammars
Extended BNF allows
- Square brackets [ ]: an optional item
- Braces { }: an optional list (zero or more repetitions)
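One common way to transform these extensions into standard productions is to introduce a fresh nonterminal N for each bracketed part: [ α ] becomes N → α | λ, and { α } becomes N → α N | λ. The sketch below assumes a hypothetical encoding in which an optional part is the tuple ("opt", items) and a repeated part is ("rep", items); the sample grammar, the generated names N1, N2, ... and the function name to_bnf are illustrative only and are not taken from the slides.

```python
import itertools


def to_bnf(ebnf):
    """Rewrite ("opt", items) and ("rep", items) into plain productions.

    `ebnf` maps a nonterminal to a list of alternatives. Each alternative is
    a list whose elements are symbol strings or ("opt" | "rep", list) tuples.
    In the result, an empty right-hand side stands for λ.
    """
    bnf = {}
    counter = itertools.count(1)

    def flatten(items):
        out = []
        for item in items:
            if isinstance(item, str):
                out.append(item)
                continue
            kind, body = item
            # Fresh nonterminal; assumes no existing symbol is named N1, N2, ...
            new = f"N{next(counter)}"
            inner = flatten(body)
            if kind == "opt":       # [ body ]  becomes  N -> body | λ
                bnf[new] = [inner, []]
            elif kind == "rep":     # { body }  becomes  N -> body N | λ
                bnf[new] = [inner + [new], []]
            else:
                raise ValueError(f"unknown EBNF construct: {kind}")
            out.append(new)
        return out

    for lhs, alternatives in ebnf.items():
        bnf[lhs] = [flatten(alt) for alt in alternatives]
    return bnf


# Hypothetical sample: Stmts -> Stmt { ; Stmt }   and   Return -> return [ Expr ]
EBNF = {
    "Stmts": [["Stmt", ("rep", [";", "Stmt"])]],
    "Return": [["return", ("opt", ["Expr"])]],
}
BNF = to_bnf(EBNF)
for lhs, alternatives in BNF.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "λ" for rhs in alternatives))
```

The right-recursive form N → α N | λ is used here for the optional list; other equivalent formulations exist.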
15
Parsers and Recognizers
Recognizer
- An algorithm that performs a Boolean-valued test
  - Is this input syntactically valid?
Parser
- Answers more general questions
  - Is this input valid?
  - And, if it is, what is its structure (parse tree)?
16
Parsers and Recognizers (Cont’d.)
Two general approaches to parsing
Top-down parser
- Expands the parse tree (via predictions) in a depth-first manner
- Preorder traversal of the parse tree
- Predictive in nature
- Traces a leftmost derivation (⇒lm)
- LL parser, recursive descent
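A recursive-descent parser makes the top-down idea concrete: one function per nonterminal, each choosing a production by looking at the next token. The sketch below is written for grammar G0 from the earlier slides (E → Prefix ( E ) | V Tail, Prefix → F | λ, Tail → + E | λ). The lookahead tests were worked out by hand for this grammar, and the "$" end marker, the tuple-based tree shape, and the names parse and recognize are assumptions made for illustration.

```python
class ParseError(Exception):
    """Raised when the token sequence is not derivable from E."""


def parse(tokens):
    """Parser: return a parse tree (nested tuples) or raise ParseError."""
    tokens = list(tokens) + ["$"]  # "$" is an assumed end-of-input marker
    pos = 0

    def peek():
        return tokens[pos]

    def match(expected):
        nonlocal pos
        if tokens[pos] != expected:
            raise ParseError(
                f"expected {expected!r}, found {tokens[pos]!r} at position {pos}"
            )
        pos += 1
        return expected

    def E():
        if peek() in ("F", "("):      # predict E -> Prefix ( E )
            return ("E", Prefix(), match("("), E(), match(")"))
        if peek() == "V":             # predict E -> V Tail
            return ("E", match("V"), Tail())
        raise ParseError(f"unexpected {peek()!r} at position {pos}")

    def Prefix():
        if peek() == "F":             # predict Prefix -> F
            return ("Prefix", match("F"))
        return ("Prefix", "λ")        # predict Prefix -> λ

    def Tail():
        if peek() == "+":             # predict Tail -> + E
            return ("Tail", match("+"), E())
        return ("Tail", "λ")          # predict Tail -> λ

    tree = E()
    match("$")                        # all input must be consumed
    return tree


def recognize(tokens):
    """Recognizer: only answers whether the input is syntactically valid."""
    try:
        parse(tokens)
        return True
    except ParseError:
        return False


print(parse(["F", "(", "V", "+", "V", ")"]))
print(recognize(["F", "V"]))
```

Each tuple is built left to right, so tree nodes are created in preorder and the productions are chosen in the order of a leftmost derivation. The recognizer discards the tree and keeps only the yes/no answer.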
17
Parsers and Recognizers (Cont’d.)
Bottom-up parser
- Begins at the bottom of the parse tree (the leaves, which are terminal symbols) and determines the productions used to generate the leaves
- Postorder traversal of the parse tree
- Traces a rightmost derivation (⇒rm) in reverse
- LR parser, shift-reduce parser
18
Parsers and Recognizers (Cont’d.)
To parse:
  begin SimpleStmt; SimpleStmt; end $
21
Parsers and Recognizers (Cont’d.)
Naming of parsing techniques
- Top-down: LL
- Bottom-up: LR
- The first letter gives the way the token sequence is parsed (L: left to right); the second letter gives the derivation produced (L: leftmost, R: rightmost)
First(α)
- The set of all the terminal symbols that can begin a sentential form derivable from α
- If α is the right-hand side of a production, then First(α) contains the terminal symbols that begin strings derivable from α
- First(α) = {a ∈ Vt | α ⇒* aβ} ∪ (if α ⇒* λ then {λ} else ∅), where λ denotes the empty string
27
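By definition, the set FIRST(X) can be computed in the following three steps (T is the set of terminals, N the set of nonterminals, and λ the empty string):
1. If X ∈ T, then FIRST(X) = {X}.
2. If X ∈ N and X → λ, then add λ to FIRST(X).
3. If X ∈ N and X → Y1 Y2 ... Yn, then add all non-λ elements of FIRST(Y1) to FIRST(X); if λ ∈ FIRST(Y1), then add all non-λ elements of FIRST(Y2) to FIRST(X); ...; if λ ∈ FIRST(Yn), then add λ to FIRST(X).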
28
Grammar G is defined as follows (λ denotes the empty string):
  E  → T E'
  E' → + T E' | λ
  T  → F T'
  T' → * F T' | λ
  F  → ( E ) | id
Its FIRST sets are then:
  Nonterminal   FIRST
  E             (  id
  E'            +  λ
  T             (  id
  T'            *  λ
  F             (  id
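The three steps can be turned into a fixed-point computation: keep applying steps 2 and 3 to every production until no FIRST set grows. The following is a minimal sketch for grammar G, assuming a dictionary encoding in which an empty list stands for the empty right-hand side; the marker string "λ" and the names first_of_string and compute_first are illustrative choices.

```python
EMPTY = "λ"  # marker for the empty string (the choice of symbol is an assumption)

# Grammar G from the slide; an empty right-hand side stands for λ.
G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a symbol string, given the FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        # Step 1: a terminal's FIRST set is the terminal itself.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EMPTY}
        if EMPTY not in sym_first:
            return result      # this symbol cannot vanish, so stop here
    result.add(EMPTY)          # every symbol can vanish, or the string is empty
    return result


def compute_first(grammar):
    """Apply steps 2 and 3 repeatedly until no FIRST set grows."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


FIRST = compute_first(G)
for nt in G:
    print(nt, sorted(FIRST[nt]))
```

The same helper first_of_string also gives First(α) for an arbitrary symbol string α, for example the right-hand side of a production.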
29
Follow(A)
- A is any nonterminal
- Follow(A) is the set of terminals that may follow A