Functional Design and Programming Lecture 9: Lexical analysis and parsing

Dec 19, 2015

Page 1: Functional Design and Programming Lecture 9: Lexical analysis and parsing.

Functional Design and Programming

Lecture 9: Lexical analysis and parsing


Literature

Paulson, chap. 9:
- Lexical analysis (9.1)
- Functional parsing (9.2-9.4)


Exercises

- Paulson, chap. 9: 9.1-9.2, 9.3-9.6, 9.8
- Write a parser for XML elements (see home page).


Parsing/Unparsing

Purpose: Encoding structured data into flat (string) representations (unparsing) and decoding it back again (parsing).

Reasons:
- Data are read (and written) using operating system routines ("read 25 bytes from file XYZ"), which deal only in flat sequences of bytes.
- A universal format is needed for all kinds of data, e.g., to allow editing with a text editor.
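A minimal sketch in SML, the language used throughout these slides, assuming a made-up flat format "x,y" for a pair of integers; the names `unparsePoint` and `parsePoint` are hypothetical. Unparsing flattens the structured value into a string, parsing recovers it, and parsing an unparsed value should give the original back.

```sml
(* Unparse: structured value -> flat string, e.g. (3, ~4) -> "3,~4" *)
fun unparsePoint (x : int, y : int) : string =
  Int.toString x ^ "," ^ Int.toString y

(* Parse: flat string -> structured value; NONE if the string is malformed *)
fun parsePoint (s : string) : (int * int) option =
  case String.fields (fn c => c = #",") s of
      [a, b] =>
        (case (Int.fromString a, Int.fromString b) of
             (SOME x, SOME y) => SOME (x, y)
           | _ => NONE)
    | _ => NONE
```

Note that `Int.fromString` reads the longest numeric prefix of its argument, so this parser is lenient about trailing characters in each field.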


Language processor architecture

character stream       "<H1 > My title</ H1>"
   | scanner
   v
token stream           [LANGLE, ID "H1", RANGLE, ID " My title", LSLASH, ID " H1", RANGLE]
   | parser
   v
abstract syntax tree   element(stag "H1", contents " My title", etag "H1")
   | transformer(s)
   v
abstract syntax tree   ... "MY TITLE" ...
   | unparser
   v
character stream       "<H1> MY TITLE </H1>"
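The architecture is a chain of functions, so the whole processor can be sketched in SML as their composition. This is a toy illustration under stated assumptions: the hypothetical `pipeline` takes the four stages as parameters, and the instance `shout` uses whitespace-separated words as tokens, the word list itself as the "tree", and upper-casing as the transformation (no real XML parsing).

```sml
(* The four stages as parameters; the processor is their composition. *)
fun pipeline (scan : string -> 'tok list)
             (parse : 'tok list -> 'tree)
             (transform : 'tree -> 'tree)
             (unparse : 'tree -> string) : string -> string =
  unparse o transform o parse o scan

(* Toy instance: words as tokens, the word list as a flat "tree",
   upper-casing every word as the transformation. *)
val shout =
  pipeline (String.tokens Char.isSpace)
           (fn words => words)
           (map (String.map Char.toUpper))
           (String.concatWith " ")
```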


Lexical analysis (Scanning, lexing, tokenizing)

Purpose: Turning a character stream into a stream of tokens.

Reasons:
- Making parsing easier by taking care of 'low-level' concerns such as eliminating whitespace.
- Efficient preprocessing and compression of the input to the parser.
- Unbounded lookahead into the input stream (in contrast to most parsers).
- Well-founded theoretical basis and tool support (regular expressions and finite state machines).
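A minimal hand-written scanner for the expression language used in the following slides, sketched in SML. The token type (`Key`, `Id`, `Num`) and the names `span` and `scan` are assumptions for illustration; the sketch drops whitespace, groups a letter followed by letters/digits into an identifier, groups digit runs into integer constants, and treats every other character as a one-character symbol. It uses simple longest-match loops rather than a generated finite-state machine.

```sml
datatype token = Key of string   (* symbols such as "+" or "(" *)
               | Id of string    (* identifiers (Var), e.g. "x" *)
               | Num of int      (* integer constants (Const), e.g. 15 *)

(* Split a char list into its longest prefix satisfying p and the rest *)
fun span p [] = ([], [])
  | span p (c :: cs) =
      if p c then let val (xs, ys) = span p cs in (c :: xs, ys) end
      else ([], c :: cs)

fun scan (s : string) : token list =
  let
    fun go [] acc = rev acc
      | go (c :: cs) acc =
          if Char.isSpace c then go cs acc                      (* drop whitespace *)
          else if Char.isAlpha c then
            let val (name, rest) = span Char.isAlphaNum (c :: cs)
            in go rest (Id (String.implode name) :: acc) end
          else if Char.isDigit c then
            let val (ds, rest) = span Char.isDigit (c :: cs)
            in go rest (Num (valOf (Int.fromString (String.implode ds))) :: acc) end
          else go cs (Key (String.str c) :: acc)                (* one-char symbol *)
  in
    go (String.explode s) []
  end
```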


Context-free Grammars (CFGs)

A context-free grammar G describes a language (a set of strings):

G = (T, N, P, S), where
  T: set of terminal symbols
  N: set of nonterminal symbols
  P: set of productions
  S: start symbol (a particular nonterminal symbol)


CFGs: Example

T = { +, -, *, /, (, ), Var, Const }
N = { Exp, Term, Factor }
S = Exp

P:
Exp ::= Exp + Term | Exp - Term | Term
Term ::= Term * Factor | Term / Factor | Factor
Factor ::= Var | Const | ( Exp )
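One way to write this grammar down as data in SML, following the 4-tuple G = (T, N, P, S) directly. The types `symbol` and `grammar`, the value `expGrammar` and the check `wellFormed` are hypothetical names for this sketch; `wellFormed` only verifies that the start symbol and every symbol used in a production are declared.

```sml
datatype symbol = T of string   (* terminal *)
                | N of string   (* nonterminal *)

(* G = (T, N, P, S); a production is (left-hand side, right-hand side) *)
type grammar = { terminals : string list, nonterminals : string list,
                 productions : (string * symbol list) list, start : string }

val expGrammar : grammar =
  { terminals = ["+", "-", "*", "/", "(", ")", "Var", "Const"],
    nonterminals = ["Exp", "Term", "Factor"],
    start = "Exp",
    productions =
      [ ("Exp", [N "Exp", T "+", N "Term"]), ("Exp", [N "Exp", T "-", N "Term"]),
        ("Exp", [N "Term"]),
        ("Term", [N "Term", T "*", N "Factor"]), ("Term", [N "Term", T "/", N "Factor"]),
        ("Term", [N "Factor"]),
        ("Factor", [T "Var"]), ("Factor", [T "Const"]),
        ("Factor", [T "(", N "Exp", T ")"]) ] }

(* Every symbol used in P must be declared in T or N; S must be in N *)
fun wellFormed ({terminals, nonterminals, productions, start} : grammar) =
  let
    fun member x xs = List.exists (fn y => y = x) xs
    fun declared (T t) = member t terminals
      | declared (N n) = member n nonterminals
  in
    member start nonterminals andalso
    List.all (fn (lhs, rhs) => member lhs nonterminals andalso List.all declared rhs)
             productions
  end
```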


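CFGs: Example...

The input "x + y / 15 - x * x" is scanned into the token list

[Var, +, Var, /, Const, -, Var, *, Var]

which has the following parse tree (children indented under their parent):

Exp
    Exp
        Exp
            Term
                Factor        Var (x)
        +
        Term
            Term
                Factor        Var (y)
            /
            Factor            Const (15)
    -
    Term
        Term
            Factor            Var (x)
        *
        Factor                Var (x)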


Parsing

Purpose: Turning a stream of tokens into a tree structure described by a grammar.

Reasons:
- Checking that the input is well-formed (according to a given grammar).
- Producing a parse tree or abstract syntax tree that recovers the tree structure implicit in the input.
- Processing the parse tree according to the grammar.


Parsing combinators

Idea: For each terminal or nonterminal M there is a function

fM : token list -> T * token list   (= T phrase)

such that fM consumes tokens from the front of its argument until it has reduced them to an M, and then returns a value of type T for it together with the remaining tokens.
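A sketch of the phrase type in SML, assuming a token type with `Key`, `Id` and `Num` constructors and a `SyntaxErr` exception (both hypothetical here). The hand-written `assignment` parser, for a made-up phrase `Var '=' Const`, shows what the combinators will later automate: each step takes the tokens left over by the previous step.

```sml
datatype token = Key of string | Id of string | Num of int
exception SyntaxErr of string

(* A phrase parser for M consumes tokens from the front of the list and returns
   the value built for M together with the tokens it did not consume. *)
type 'a phrase = token list -> 'a * token list

(* Hypothetical single-token parsers for identifiers and numbers *)
fun ident (Id x :: toks) = (x, toks)
  | ident _ = raise SyntaxErr "identifier expected"

fun number (Num n :: toks) = (n, toks)
  | number _ = raise SyntaxErr "number expected"

(* Hand-written phrase for  Var '=' Const : every step hands on the
   tokens left over by the previous one *)
val assignment : (string * int) phrase = fn toks =>
  let
    val (x, toks1) = ident toks
    val toks2 = case toks1 of
                    Key "=" :: rest => rest
                  | _ => raise SyntaxErr "'=' expected"
    val (n, toks3) = number toks2
  in
    ((x, n), toks3)
  end
```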


Parsing primitives

Terminals:
Var: string phrase
Const: int phrase
$: string -> string phrase   (for keywords)
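The three primitives can be sketched in SML as follows, assuming the same hypothetical token type (`Key`, `Id`, `Num`) and a `SyntaxErr` exception that signals failure. Each one inspects only the first token; the final `val _` lines merely check the types against the slide.

```sml
datatype token = Key of string | Id of string | Num of int
exception SyntaxErr of string
type 'a phrase = token list -> 'a * token list

(* Var: an identifier token *)
fun Var (Id a :: toks) = (a, toks)
  | Var _ = raise SyntaxErr "variable expected"

(* Const: an integer constant token *)
fun Const (Num n :: toks) = (n, toks)
  | Const _ = raise SyntaxErr "constant expected"

(* $ a: exactly the keyword or symbol a *)
fun $ a (Key b :: toks) =
      if a = b then (a, toks) else raise SyntaxErr ("'" ^ a ^ "' expected")
  | $ a _ = raise SyntaxErr ("'" ^ a ^ "' expected")

(* type checks against the slide *)
val _ : string phrase = Var
val _ : int phrase = Const
val _ : string -> string phrase = $
```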


Parsing primitives...

Parsing combinators:
empty: ('a list) phrase
||: 'a phrase * 'a phrase -> 'a phrase
--: 'a phrase * 'b phrase -> ('a * 'b) phrase
>>: 'a phrase * ('a -> 'b) -> 'b phrase

Derived combinators:
repeat: 'a phrase -> 'a list phrase
$--: 'a phrase * 'b phrase -> 'b phrase
--$: 'a phrase * 'b phrase -> 'a phrase
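A sketch of the combinators with the types listed above and the precedences from the next slide. The token type, `SyntaxErr`, and the primitives `$` and `Var` are repeated so the block stands alone; they are assumptions, not necessarily the course's definitions. `||` backtracks by simply retrying the second parser on the original token list, which works because the token list is an immutable value.

```sml
datatype token = Key of string | Id of string | Num of int
exception SyntaxErr of string

fun $ a (Key b :: toks) = if a = b then (a, toks) else raise SyntaxErr a
  | $ a _ = raise SyntaxErr a
fun Var (Id a :: toks) = (a, toks)
  | Var _ = raise SyntaxErr "variable expected"

infix 6 $-- --$
infix 5 --
infix 3 >>
infix 0 ||

(* empty: succeeds without consuming any tokens *)
fun empty toks = ([], toks)

(* ph1 || ph2: try ph1; if it fails, try ph2 on the same tokens *)
fun (ph1 || ph2) toks = ph1 toks handle SyntaxErr _ => ph2 toks

(* ph1 -- ph2: ph1 followed by ph2; the results are paired *)
fun (ph1 -- ph2) toks =
  let val (x, toks1) = ph1 toks
      val (y, toks2) = ph2 toks1
  in ((x, y), toks2) end

(* ph >> f: apply f to the result of ph *)
fun (ph >> f) toks = let val (x, toks1) = ph toks in (f x, toks1) end

(* sequences that keep only the right / only the left result *)
fun (ph1 $-- ph2) = ph1 -- ph2 >> (fn (_, y) => y)
fun (ph1 --$ ph2) = ph1 -- ph2 >> (fn (x, _) => x)

(* repeat ph: zero or more occurrences of ph *)
fun repeat ph toks = (ph -- repeat ph >> (op ::) || empty) toks
```

For example, `Var -- repeat ($"," $-- Var)` parses a comma-separated list of variables.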


Parsing precedences

infix 6 $-- --$

infix 5 --

infix 3 >>

infix 0 ||


Problems with combinatory parsers

Left recursion:
Problem: Left-recursive grammars make parsers go into an infinite loop: for a production such as Exp ::= Exp + Term, the parser for Exp calls itself without consuming any input.
Remedy: Transform the grammar to eliminate left recursion.

Mutual recursion:
Problem (SML-specific!): Mutually recursive parsers cannot be defined using only val declarations and combinator applications.
Remedy: Use fun declarations for the mutually recursive parts of a grammar.


Parsing problems...

Example grammar is left-recursive:
Exp ::= Exp '+' Term | Exp '-' Term | Term
Term ::= Term '*' Factor | Term '/' Factor | Factor
Factor ::= Var | Const | '(' Exp ')'

Eliminate left recursion (X* means zero or more repetitions of X):
Binop1 ::= '+' | '-'
Binop2 ::= '*' | '/'
Factor ::= Var | Const | '(' Exp ')'
Term ::= Factor (Binop2 Factor)*
Exp ::= Term (Binop1 Term)*
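In the transformed grammar an Exp is a first Term followed by a flat list of (operator, Term) pairs, so the left-associative grouping of the original rules is no longer explicit in the tree. A small sketch with a hypothetical integer evaluator `evalChain` (not from the slides) shows how `foldl` recovers it: x - y - z is evaluated as (x - y) - z.

```sml
(* Apply one of the four operators of the example grammar *)
fun applyOp ("+", a, b) = a + b
  | applyOp ("-", a, b) = a - b
  | applyOp ("*", a, b) = a * b
  | applyOp ("/", a, b) = a div b
  | applyOp (opr, _, _) = raise Fail ("unknown operator " ^ opr)

(* first (op1 t1) (op2 t2) ...  is evaluated as  ((first op1 t1) op2 t2) ... *)
fun evalChain (first : int, rest : (string * int) list) : int =
  foldl (fn ((opr, b), acc) => applyOp (opr, acc, b)) first rest
```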


Data type for abstract syntax trees

type binop = string

datatype expAST =
    EXP of termAST * (binop * termAST) list
and termAST =
    TERM of factorAST * (binop * factorAST) list
and factorAST =
    VAR of string
  | CONST of int
  | PARENEXP of expAST
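Going the other way (unparsing, as in the architecture slide), a sketch of hypothetical functions `showExp`, `showTerm` and `showFactor` that turn a tree of these types back into a flat string without spaces. The datatype is repeated so that the block stands alone.

```sml
type binop = string

datatype expAST = EXP of termAST * (binop * termAST) list
     and termAST = TERM of factorAST * (binop * factorAST) list
     and factorAST = VAR of string
                   | CONST of int
                   | PARENEXP of expAST

(* Unparse: one function per category, mirroring the grammar *)
fun showExp (EXP (t, rest)) =
      showTerm t ^ String.concat (map (fn (opr, t') => opr ^ showTerm t') rest)
and showTerm (TERM (f, rest)) =
      showFactor f ^ String.concat (map (fn (opr, f') => opr ^ showFactor f') rest)
and showFactor (VAR x) = x
  | showFactor (CONST n) = Int.toString n
  | showFactor (PARENEXP e) = "(" ^ showExp e ^ ")"
```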


Parser: example (first try)

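val binop1 = $"+" || $"-"
val binop2 = $"*" || $"/"
val factor = Var >> VAR || Const >> CONST || $"(" $-- exp --$ $")" >> PARENEXP
val term = factor -- repeat (binop2 -- factor) >> TERM
val exp = term -- repeat (binop1 -- term) >> EXP

PROBLEM: Doesn't work! These definitions are intended to be mutually recursive, but they are not: a val declaration can only refer to names that are already bound, so factor cannot refer to exp, which is declared later. (val rec does not help either, since it only allows fn expressions on the right-hand side.)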


Parser: example (second try)

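val binop1 = $"+" || $"-"
val binop2 = $"*" || $"/"
fun factor toks = (Var >> VAR || Const >> CONST || $"(" $-- exp --$ $")" >> PARENEXP) toks
and term toks = (factor -- repeat (binop2 -- factor) >> TERM) toks
and exp toks = (term -- repeat (binop1 -- term) >> EXP) toks

Because each parser now takes the token list toks as an explicit argument, the three can be declared together with fun ... and ..., which permits mutual recursion.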
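Putting the pieces together: a self-contained SML sketch, assuming the hypothetical token type `Key`/`Id`/`Num`, a `SyntaxErr` exception, and straightforward definitions of the primitives and combinators with the types and precedences listed earlier (these are assumptions, not the course's own code). The hypothetical `parseExp` additionally rejects leftover tokens.

```sml
datatype token = Key of string | Id of string | Num of int
exception SyntaxErr of string

(* primitives *)
fun Var (Id a :: toks) = (a, toks)
  | Var _ = raise SyntaxErr "variable expected"
fun Const (Num n :: toks) = (n, toks)
  | Const _ = raise SyntaxErr "constant expected"
fun $ a (Key b :: toks) = if a = b then (a, toks) else raise SyntaxErr a
  | $ a _ = raise SyntaxErr a

(* combinators, with the precedences from the slides *)
infix 6 $-- --$
infix 5 --
infix 3 >>
infix 0 ||
fun empty toks = ([], toks)
fun (ph1 || ph2) toks = ph1 toks handle SyntaxErr _ => ph2 toks
fun (ph1 -- ph2) toks =
  let val (x, toks1) = ph1 toks
      val (y, toks2) = ph2 toks1
  in ((x, y), toks2) end
fun (ph >> f) toks = let val (x, toks1) = ph toks in (f x, toks1) end
fun (ph1 $-- ph2) = ph1 -- ph2 >> (fn (_, y) => y)
fun (ph1 --$ ph2) = ph1 -- ph2 >> (fn (x, _) => x)
fun repeat ph toks = (ph -- repeat ph >> (op ::) || empty) toks

(* abstract syntax *)
type binop = string
datatype expAST = EXP of termAST * (binop * termAST) list
     and termAST = TERM of factorAST * (binop * factorAST) list
     and factorAST = VAR of string | CONST of int | PARENEXP of expAST

(* the parser: fun ... and ... makes factor, term and exp mutually recursive *)
val binop1 = $"+" || $"-"
val binop2 = $"*" || $"/"

fun factor toks =
      (Var >> VAR || Const >> CONST || $"(" $-- exp --$ $")" >> PARENEXP) toks
and term toks = (factor -- repeat (binop2 -- factor) >> TERM) toks
and exp toks = (term -- repeat (binop1 -- term) >> EXP) toks

(* Parse a complete token list; leftover tokens are a syntax error *)
fun parseExp toks =
  case exp toks of
      (e, []) => e
    | _ => raise SyntaxErr "extra tokens after expression"
```

For the token list of "x + y / 15 - x * x", `parseExp` returns `EXP (TERM (VAR "x", []), [("+", TERM (VAR "y", [("/", CONST 15)])), ("-", TERM (VAR "x", [("*", VAR "x")]))])`.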


Operator precedence parsing (overview)

When processing operator expressions, a parser has to decide whether to reduce (stop the current phrase parser and return its result) or shift (continue the current phrase parser).

Operator precedence parsing: Associate a precedence (binding strength) with each operator, remember the precedence of the last operator processed, and decide whether to reduce or shift depending on the precedence of the next operator.

See Paulson, pp. 364-366
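A compact sketch of the idea in SML using precedence climbing, one common formulation of operator precedence parsing (not necessarily the version in Paulson). To keep it short it evaluates integer expressions directly instead of building a tree; the token type `tok` and the functions `prec`, `apply`, `climb` and `evalTokens` are hypothetical. The comparison `prec c < minPrec` is the reduce/shift decision described above.

```sml
datatype tok = N of int | Op of char

(* binding strength of each operator; higher binds tighter *)
fun prec #"+" = 1
  | prec #"-" = 1
  | prec #"*" = 2
  | prec #"/" = 2
  | prec _ = 0

fun apply (#"+", x, y) = x + y
  | apply (#"-", x, y) = x - y
  | apply (#"*", x, y) = x * y
  | apply (#"/", x, y) = x div y
  | apply (c, _, _) = raise Fail ("unknown operator " ^ String.str c)

(* climb (minPrec, lhs, toks): lhs is the value parsed so far. If the next
   operator binds less tightly than minPrec, reduce (return to the caller);
   otherwise shift it and parse its right operand, which absorbs every
   following operator that binds more tightly. *)
fun climb (minPrec, lhs, toks as (Op c :: N n :: rest)) =
      if prec c < minPrec then (lhs, toks)                      (* reduce *)
      else
        let val (rhs, rest') = climb (prec c + 1, n, rest)        (* shift *)
        in climb (minPrec, apply (c, lhs, rhs), rest') end
  | climb (_, lhs, toks) = (lhs, toks)

fun evalTokens (N n :: rest) = climb (1, n, rest)
  | evalTokens _ = raise Fail "number expected"
```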


Backtracking parsing (overview)

There may be more than one way of parsing an expression.

Backtracking parsing: Construct a lazy list of all possible parses of a token stream. Continue parsing with the first of these and try to find a complete parse of the whole token stream; if that fails, backtrack to the second in the list and repeat.

See Paulson, pp. 366-367
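A sketch of the "list of successes" style in SML. Assumption: it uses ordinary (eager) lists and parses characters rather than tokens, so it computes all parses up front instead of lazily; taking the first complete parse corresponds to the first successful backtracking path. The combinators `lit`, `alt`, `seq`, `using` and the ambiguous toy grammar are hypothetical.

```sml
(* A parser returns every possible (result, remaining input) pair;
   the empty list means failure. *)
type 'a parser = char list -> ('a * char list) list

fun lit (c : char) : char parser =
  fn (x :: xs) => (if x = c then [(c, xs)] else [])
   | [] => []

(* alternatives: keep the results of both *)
fun alt (p : 'a parser, q : 'a parser) : 'a parser = fn inp => p inp @ q inp

(* sequence: run q on every remainder that p can leave *)
fun seq (p : 'a parser, q : 'b parser) : ('a * 'b) parser =
  fn inp => List.concat (map (fn (x, rest) =>
                                map (fn (y, rest') => ((x, y), rest')) (q rest))
                             (p inp))

fun using (p : 'a parser, f : 'a -> 'b) : 'b parser =
  fn inp => map (fn (x, rest) => (f x, rest)) (p inp)

(* ambiguous toy grammar:  A ::= 'a' | 'a' 'a'   S ::= A A   (A yields its length) *)
val pA : int parser = alt (using (lit #"a", fn _ => 1),
                           using (seq (lit #"a", lit #"a"), fn _ => 2))
val pS : (int * int) parser = seq (pA, pA)

(* keep only the parses that consumed the whole input *)
fun completeParses s =
  map (fn (r, _) => r) (List.filter (fn (_, rest) => null rest) (pS (String.explode s)))
```

For "aaa" there are two complete parses, (1, 2) and (2, 1); an eager list makes the ambiguity visible, whereas a lazy list would only be forced as far as the first one.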


Recursive-descent parsing (overview)

- Write one parser for each grammatical category (as in combinatory parsing).
- Process the token stream as in combinatory parsers, except for alternatives.
- Process alternatives as follows: look at the next token (the first token of the remaining token stream) and choose the phrase parser on the basis of that token.
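A recursive-descent sketch in SML for the transformed expression grammar, assuming a hypothetical token type (`Key`, `Id`, `Num`). Instead of trying alternatives and backtracking, `factor` selects its alternative by pattern matching on the first token. To make the grouping visible, each parser returns a fully parenthesised string rather than a tree.

```sml
datatype token = Key of string | Id of string | Num of int
exception SyntaxErr of string

(* Factor ::= Var | Const | '(' Exp ')' : the next token picks the alternative *)
fun factor (Id x :: toks) = (x, toks)
  | factor (Num n :: toks) = (Int.toString n, toks)
  | factor (Key "(" :: toks) =
      (case exp toks of
           (e, Key ")" :: rest) => (e, rest)
         | _ => raise SyntaxErr "')' expected")
  | factor _ = raise SyntaxErr "factor expected"

(* Term ::= Factor (Binop2 Factor)* *)
and term toks = termRest (factor toks)
and termRest (acc, toks as (Key opr :: rest)) =
      if opr = "*" orelse opr = "/" then
        let val (f, rest') = factor rest
        in termRest ("(" ^ acc ^ opr ^ f ^ ")", rest') end
      else (acc, toks)
  | termRest (acc, toks) = (acc, toks)

(* Exp ::= Term (Binop1 Term)* *)
and exp toks = expRest (term toks)
and expRest (acc, toks as (Key opr :: rest)) =
      if opr = "+" orelse opr = "-" then
        let val (t, rest') = term rest
        in expRest ("(" ^ acc ^ opr ^ t ^ ")", rest') end
      else (acc, toks)
  | expRest (acc, toks) = (acc, toks)
```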


LL-parsing and LR-parsing (overview)

- Use tools to generate parsers from grammar specifications.
- The tool produces a table that guides a push-down automaton through parsing actions ("shift", "reduce").
- LL parsing: predictive (basically recursive-descent parsing in table-driven form).
- LR parsing (incl. SLR and LALR parsing): (virtual) parallel execution of phrase parsers.
- Problems: lookahead is bounded in practice; at times unwieldy.
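A tiny table-driven LL(1) sketch in SML, for the toy grammar S ::= '(' S ')' S | (empty) of balanced parentheses rather than the expression grammar. The table is written by hand here instead of being generated by a tool, and the names `sym`, `table`, `run` and `accepts` are hypothetical. The stack of the push-down automaton holds the symbols still to be matched; the table, indexed by nonterminal and one character of lookahead, predicts which production to expand (LL parsers expand and match, whereas shift and reduce are the LR actions).

```sml
datatype sym = T of char | NT of string

(* LL(1) table: (nonterminal, lookahead) -> production; NONE = end of input *)
fun table ("S", SOME #"(") = SOME [T #"(", NT "S", T #")", NT "S"]
  | table ("S", SOME #")") = SOME []
  | table ("S", NONE) = SOME []
  | table _ = NONE

(* push-down automaton over (stack, remaining input) *)
fun run ([], []) = true                                  (* both exhausted: accept *)
  | run ([], _ :: _) = false                             (* leftover input *)
  | run (T c :: stack, x :: input) = (c = x) andalso run (stack, input)  (* match *)
  | run (T _ :: _, []) = false
  | run (NT n :: stack, input) =
      let val lookahead = case input of [] => NONE | x :: _ => SOME x
      in case table (n, lookahead) of
             SOME rhs => run (rhs @ stack, input)        (* expand predicted production *)
           | NONE => false
      end

fun accepts s = run ([NT "S"], String.explode s)
```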