Approximating Context-Free Grammars for Parsing and Verification
Sylvain Schmitz
LORIA, INRIA Nancy - Grand Est
October 18, 2007

Sections: Motivation, Approximations, Shift-Resolve Parsing, Ambiguity Detection, Conclusion


A Syntax Issue

Standard ML [Milner et al., 1997]

datatype 'a option = NONE | SOME of 'a

fun filter pred l =
  let
    fun filterP (x::r, l) =
          case (pred x) of
            SOME y => filterP(r, y::l)
          | NONE => filterP(r, l)
      | filterP ([], l) = rev l
  in
    filterP (l, [])
  end

/**
 * smlfvalbind.y
 *
 * Standard ML function declarations.
 * See _The_definition_of_standard_ML_, Milner et al., 1997,
 * ISBN 0-262-63181-4.
 */

%token CASE "case"
%token FUN "fun"
%token MATCH "=>"
%token OF "of"
%token VID
%start dec

%%

dec       : "fun" fvalbind ;
fvalbind  : sfvalbind | fvalbind '|' sfvalbind ;
sfvalbind : VID atpats '=' exp ;
exp       : VID | "case" exp "of" match ;
match     : mrule | match '|' mrule ;
mrule     : pat "=>" exp ;
atpats    : atpat | atpats atpat ;
atpat     : VID ;
pat       : VID atpat ;

%%

[Figure: Program text → Compiler → Executable code; the Compiler follows the Language Specification.]

Compilers: MLton, Moscow ML, Poly/ML, SML/NJ

MLton:
  Error: match.sml 9.25.
  Syntax error: replacing EQUALOP with DARROW

Moscow ML:
  ! Toplevel input:
  ! | filterP ([], l) = rev l
  ! ^
  ! Syntax error.

Poly/ML:
  Error: => expected but = was found

SML/NJ:
  stdIn:7.24-7.29 Error: syntax error: deleting EQUALOP ID


A Syntax Issue: Parsers

[Figure: a Parser generator turns the Context-free grammar (smlfvalbind.y) into the Parser, which turns the Input tokens "| NONE => filterP(r, l) | filterP ([], l) = rev l . . ." into a Parse tree with nodes 〈fvalbind〉, 〈sfvalbind〉, 〈atpats〉, 〈exp〉, 〈match〉, 〈mrule〉 and 〈pat〉.]


Grammar Conflicts

LALR(1) Parser Generator
- GNU Bison:
    state 20

        6 exp: "case" exp "of" match .
        8 match: match . '|' mrule

        '|'  shift, and go to state 24
        '|'  [reduce using rule 6 (exp)]
- Restricted grammar class

[Figure: LALR(1) ⊂ CFG]
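The conflict in state 20 can be traced by hand through FOLLOW sets. The Python sketch below is our own illustration, not Bison's LALR(1) algorithm: it encodes the rules of smlfvalbind.y as a dictionary (the names `GRAMMAR`, `first_sets` and `follow_sets` are hypothetical) and computes FOLLOW by fixpoint iteration. It finds '|' in FOLLOW(exp), so in a state holding both `exp: "case" exp "of" match .` and `match: match . '|' mrule`, reducing on '|' competes with shifting it. This is an SLR-style argument; LALR(1) look-ahead sets can be smaller than FOLLOW sets, so the check explains, rather than proves, the conflict that Bison reports.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

# Rules of smlfvalbind.y; keys are nonterminals, every other symbol is a terminal.
GRAMMAR: Grammar = {
    "dec": [("fun", "fvalbind")],
    "fvalbind": [("sfvalbind",), ("fvalbind", "|", "sfvalbind")],
    "sfvalbind": [("VID", "atpats", "=", "exp")],
    "exp": [("VID",), ("case", "exp", "of", "match")],
    "match": [("mrule",), ("match", "|", "mrule")],
    "mrule": [("pat", "=>", "exp")],
    "atpats": [("atpat",), ("atpats", "atpat")],
    "atpat": [("VID",)],
    "pat": [("VID", "atpat")],
}


def first_sets(g: Grammar) -> Dict[str, Set[str]]:
    # No rule of this grammar derives the empty string, so FIRST of a
    # right-hand side is simply FIRST of its leftmost symbol.
    first: Dict[str, Set[str]] = {a: set() for a in g}
    changed = True
    while changed:
        changed = False
        for a, rhss in g.items():
            for rhs in rhss:
                x = rhs[0]
                new = first[x] if x in g else {x}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def follow_sets(g: Grammar, start: str) -> Dict[str, Set[str]]:
    first = first_sets(g)
    follow: Dict[str, Set[str]] = {a: set() for a in g}
    follow[start].add("$")  # end-of-input marker
    changed = True
    while changed:
        changed = False
        for a, rhss in g.items():
            for rhs in rhss:
                for i, x in enumerate(rhs):
                    if x not in g:
                        continue  # FOLLOW is only computed for nonterminals
                    if i + 1 < len(rhs):
                        y = rhs[i + 1]
                        new = first[y] if y in g else {y}
                    else:
                        new = follow[a]  # x ends the rule: inherit FOLLOW(a)
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR, "dec")
print(sorted(follow["exp"]))  # ['$', 'of', '|']
```

An exp can end an sfvalbind (followed by the '|' of the next function clause) or an mrule (followed by the '|' of the next match rule), which is exactly the choice behind the compiler errors shown earlier.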


Dealing with Conflicts: An Objective Measure [Malloy et al., 2002] on a C# Grammar

[Figure: plot of the number of LALR(1) conflicts (y axis, 0 to 700) across successive parser versions (x axis, 2 to 20).]

Dealing with Conflicts: A Subjective Measure

[Comic strip] Courtesy of http://www.phdcomics.com.


Solutions: State of the Art

- LR(k) [Knuth, 1965]
- LR-Regular [Culik and Cohen, 1973]
- Generalized LR [Tomita, 1986]
- Unambiguous CFGs [Cantor, 1962; Chomsky and Schützenberger, 1963]
- Horizontal and vertical unambiguity test [Brabrand et al., 2007]

[Figure: nested grammar classes LALR(1) ⊂ LR(k) ⊂ CFG.]


Solutions: Ambiguity

[Figure: a Parser generator turns the Context-free grammar (smlfvalbind.y) into a Parser that maps the Input tokens "case a of b => case b of c => c | d => d" to a Parse forest with two trees: in one, "d => d" is a second 〈mrule〉 in the 〈match〉 of the inner case; in the other, it is a second 〈mrule〉 in the 〈match〉 of the outer case.]
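The two trees can be confirmed by counting parses. The Python sketch below is a brute-force counter of our own (memoised over token spans), not how a parser generator works, and it runs on a cut-down grammar that assumes, as in the example input, that a pattern is a single VID (mrule : VID "=>" exp). The names `GRAMMAR`, `KEYWORDS` and `count_parses` are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Cut-down grammar (assumption: pat is a single VID, as in the example input).
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "exp": [("VID",), ("case", "exp", "of", "match")],
    "match": [("mrule",), ("match", "|", "mrule")],
    "mrule": [("VID", "=>", "exp")],
}
KEYWORDS = {"case", "of", "|", "=>"}


def count_parses(tokens: List[str], start: str = "exp") -> int:
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def sym(x: str, i: int, j: int) -> int:
        """Number of derivations of toks[i:j] from the grammar symbol x."""
        if x in GRAMMAR:
            return sum(seq(rhs, i, j) for rhs in GRAMMAR[x])
        if x == "VID":
            return int(j == i + 1 and toks[i] not in KEYWORDS)
        return int(j == i + 1 and toks[i] == x)  # keyword terminal

    @lru_cache(maxsize=None)
    def seq(rhs: Tuple[str, ...], i: int, j: int) -> int:
        """Number of derivations of toks[i:j] from the symbol sequence rhs."""
        if len(rhs) == 1:
            return sym(rhs[0], i, j)
        # No rule derives the empty string, so each symbol covers at least one
        # token; this also makes the left-recursive match rule terminate.
        return sum(sym(rhs[0], i, k) * seq(rhs[1:], k, j) for k in range(i + 1, j))

    return sym(start, 0, len(toks))


print(count_parses("case a of b => case b of c => c | d => d".split()))  # 2
```

The count of 2 corresponds to the two places where "| d => d" can attach; for an unambiguous grammar the count never exceeds 1.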



Solutions: State of the Art (continued)

[Figure: the grammar classes LALR(1), LR(k), LR-Regular, HVRU (the class accepted by the horizontal and vertical unambiguity test), UCFG (unambiguous CFGs) and CFG, with regions labelled Safe and Unsafe.]

Solutions: Contributions

- Noncanonical parsing methods [Szymanski and Williams, 1976; Tai, 1979]
  - Noncanonical LALR(1)
  - Shift-Resolve
- Noncanonical unambiguity test
- Framework for grammar approximations

[Figure: the contributions placed among the grammar classes LALR(1), LR(k), LR-Regular, HVRU, UCFG and CFG: NLALR(1) (noncanonical LALR(1)), ShRe (Shift-Resolve) and NU (noncanonical unambiguity test).]


Grammar Approximations: Bracketed Grammars

G = 〈N, T, P, S〉, V = N ∪ T

〈dec〉 1−→ fun 〈fvalbind〉
〈fvalbind〉 2−→ 〈sfvalbind〉
〈fvalbind〉 3−→ 〈fvalbind〉 '|' 〈sfvalbind〉
〈sfvalbind〉 4−→ vid 〈atpats〉 = 〈exp〉
〈exp〉 5−→ case 〈exp〉 of 〈match〉
〈match〉 6−→ 〈mrule〉
〈match〉 7−→ 〈match〉 '|' 〈mrule〉
〈mrule〉 8−→ 〈pat〉 => 〈exp〉
〈atpats〉 9−→ 〈atpat〉
〈atpats〉 10−→ 〈atpats〉 〈atpat〉
〈pat〉 11−→ vid 〈atpat〉
〈atpat〉 12−→ vid

Bracketed Grammars (continued)

G_b = 〈N, T_b, P_b, S〉, V_b = N ∪ T_b, where T_b = T ∪ {d_i, r_i | 1 ≤ i ≤ |P|}: the right-hand side of rule i is wrapped between an opening bracket d_i and a closing bracket r_i.

〈dec〉 1−→ d1 fun 〈fvalbind〉 r1

〈fvalbind〉 2−→ d2 〈sfvalbind〉 r2

〈fvalbind〉 3−→ d3 〈fvalbind〉 '|' 〈sfvalbind〉 r3

〈sfvalbind〉 4−→ d4 vid 〈atpats〉 = 〈exp〉 r4

〈exp〉 5−→ d5 case 〈exp〉 of 〈match〉 r5

〈match〉 6−→ d6 〈mrule〉 r6

〈match〉 7−→ d7 〈match〉 '|' 〈mrule〉 r7

〈mrule〉 8−→ d8 〈pat〉 => 〈exp〉 r8

〈atpats〉 9−→ d9 〈atpat〉 r9

〈atpats〉 10−→ d10 〈atpats〉 〈atpat〉 r10

〈pat〉 11−→ d11 vid 〈atpat〉 r11

〈atpat〉 12−→ d12 vid r12
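Turning a tree of G into its sentence of G_b is a simple traversal. This Python sketch assumes a tree encoding of our own (a leaf is a string, an inner node is a pair of rule number and children) and spells the brackets d<i>/r<i>; the function names are hypothetical. Since every subtree carries its own pair of brackets, the bracketed sentence determines the tree, so G_b is unambiguous; erasing the brackets gives back the sentence of G.

```python
from typing import List, Tuple, Union

# A leaf is a grammar symbol; an inner node is (rule number, children).
Tree = Union[str, Tuple[int, List["Tree"]]]


def bracketed_yield(tree: Tree) -> List[str]:
    """Sentence of G_b for a tree of G: the subtree of rule i sits between d<i> and r<i>."""
    if isinstance(tree, str):
        return [tree]
    rule, children = tree
    out = [f"d{rule}"]
    for child in children:
        out.extend(bracketed_yield(child))
    out.append(f"r{rule}")
    return out


def erase_brackets(symbols: List[str]) -> List[str]:
    """Drop every d<i>/r<i> symbol to get back the sentence of G."""
    return [s for s in symbols if not (s[:1] in ("d", "r") and s[1:].isdigit())]


# 〈pat〉 11−→ vid 〈atpat〉 with 〈atpat〉 12−→ vid
pat_tree: Tree = (11, ["vid", (12, ["vid"])])
print(" ".join(bracketed_yield(pat_tree)))        # d11 vid d12 vid r12 r11
print(erase_brackets(bracketed_yield(pat_tree)))  # ['vid', 'vid']
```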

Positions

[Figure: the parse tree 〈fvalbind〉 → 〈fvalbind〉 '|' 〈sfvalbind〉, whose left 〈fvalbind〉 derives a 〈sfvalbind〉 and whose right 〈sfvalbind〉 derives vid 〈atpats〉 = 〈exp〉.]

The position marked · in the bracketed form of this tree:
d3 d2 〈sfvalbind〉 r2 '|' · d4 vid 〈atpats〉 = 〈exp〉 r4 r3

Position Graph Γ: Left-to-right Walks in Trees

[Figure: the same tree, walked from left to right.]

Each line gives the transition taken and the position it reaches; the d4 and 〈sfvalbind〉 transitions both start at the position · d4, and the r3 transition continues from r4 ·.
- d4: d3 d2 〈sfvalbind〉 r2 '|' d4 · vid 〈atpats〉 = 〈exp〉 r4 r3
- 〈sfvalbind〉: d3 d2 〈sfvalbind〉 r2 '|' d4 vid 〈atpats〉 = 〈exp〉 r4 · r3
- r3: d3 d2 〈sfvalbind〉 r2 '|' d4 vid 〈atpats〉 = 〈exp〉 r4 r3 ·

[Figure: part of the position graph Γ around this tree, with d2/r2, d3/r3 and d4/r4 transitions; the graph continues beyond the part shown (. . .).]
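The walks above can be mimicked on a single tree. In this Python sketch (our own encoding; the function names and the d<i>/r<i> spelling are assumptions), positions are the gaps 0..n of the tree's bracketed form: every symbol gives an edge to the next gap, and every matched d_i ... r_i pair also gives an edge, labelled by the left-hand side of rule i, that jumps over the whole subtree. It covers one tree only; Γ itself joins the positions of all derivation trees.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, str, int]  # (from position, label, to position)


def is_bracket(tok: str, kind: str) -> bool:
    """True for bracket symbols spelled like d4 (kind 'd') or r4 (kind 'r')."""
    return tok.startswith(kind) and tok[1:].isdigit()


def position_edges(tokens: List[str], lhs_of_rule: Dict[int, str]) -> List[Edge]:
    """Edges between the positions 0..len(tokens) of one bracketed tree."""
    # Crossing any single symbol moves one position to the right.
    edges: List[Edge] = [(p, tok, p + 1) for p, tok in enumerate(tokens)]
    open_brackets: List[int] = []  # positions in front of still-open d_i symbols
    for p, tok in enumerate(tokens):
        if is_bracket(tok, "d"):
            open_brackets.append(p)
        elif is_bracket(tok, "r"):
            start = open_brackets.pop()
            rule = int(tok[1:])
            # Nonterminal edge: from in front of d_i to just after r_i.
            edges.append((start, lhs_of_rule[rule], p + 1))
    return edges


tree = ["d3", "d2", "<sfvalbind>", "r2", "|",
        "d4", "vid", "<atpats>", "=", "<exp>", "r4", "r3"]
lhs = {2: "<fvalbind>", 3: "<fvalbind>", 4: "<sfvalbind>"}
for edge in position_edges(tree, lhs):
    if edge[0] in (5, 11):
        print(edge)  # prints (5, 'd4', 6), (11, 'r3', 12) and (5, '<sfvalbind>', 11)
```

Position 5 is · d4 and position 11 is r4 · r3, so the printed edges are the three steps of the walk.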

Position Automaton Γ/≡

Definition: Γ/≡ is the quotient of Γ by an equivalence relation ≡ between positions.

Theorem (Language over-approximation): L(G_b) ⊆ L(Γ/≡) ∩ T_b^*

Example: item0 Equivalence

[Figure: the tree from the previous slides, where both 〈sfvalbind〉 subtrees, the one under rule 2 and the one under rule 3, are expanded by rule 4 into d4 vid 〈atpats〉 = 〈exp〉 r4; their two positions between 〈atpats〉 and = fall into one equivalence class.]

- equivalence class [〈sfvalbind〉 4−→ vid 〈atpats〉 · = 〈exp〉]
- LR(0) items
- Γ/item0: nondeterministic LR(0) automaton

Example: item0 Equivalence (continued)

Part of Γ/item0 (states are LR(0) items):
  [〈fvalbind〉2−→·〈sfvalbind〉] --d4--> [〈sfvalbind〉4−→·vid 〈atpats〉 = 〈exp〉]
  [〈fvalbind〉3−→〈fvalbind〉 '|' ·〈sfvalbind〉] --d4--> [〈sfvalbind〉4−→·vid 〈atpats〉 = 〈exp〉]
  [〈sfvalbind〉4−→·vid 〈atpats〉 = 〈exp〉] --vid--> [〈sfvalbind〉4−→vid·〈atpats〉 = 〈exp〉]
  [〈sfvalbind〉4−→vid·〈atpats〉 = 〈exp〉] --〈atpats〉--> [〈sfvalbind〉4−→vid 〈atpats〉· = 〈exp〉]
  [〈sfvalbind〉4−→vid 〈atpats〉· = 〈exp〉] --'='--> [〈sfvalbind〉4−→vid 〈atpats〉 = ·〈exp〉]
  [〈sfvalbind〉4−→vid 〈atpats〉 = ·〈exp〉] --〈exp〉--> [〈sfvalbind〉4−→vid 〈atpats〉 = 〈exp〉·]
  [〈sfvalbind〉4−→vid 〈atpats〉 = 〈exp〉·] --r4--> [〈fvalbind〉2−→〈sfvalbind〉·]
  [〈sfvalbind〉4−→vid 〈atpats〉 = 〈exp〉·] --r4--> [〈fvalbind〉3−→〈fvalbind〉 '|' 〈sfvalbind〉·]
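A minimal sketch of Γ/item0 as a nondeterministic automaton, on a toy grammar S → a S b | c rather than the SML grammar, to keep the example small. The assumptions are all ours: rules are numbered from 0 with an added start rule S′ → S, items are (rule, dot) pairs, bracket symbols are tuples ("d", i) and ("r", i), and the function names are hypothetical. An r_i move returns to every item whose dot precedes the left-hand side of rule i, because the quotient no longer records which item made the matching d_i move; this is where the over-approximation in the theorem comes from. The automaton accepts the bracketed sentence of a real tree, but also a word with one d1 and two r1, which G_b cannot generate.

```python
from typing import List, Set, Tuple, Union

Rule = Tuple[str, Tuple[str, ...]]      # (left-hand side, right-hand side)
Item = Tuple[int, int]                  # (rule index, dot position) = an LR(0) item
Symbol = Union[str, Tuple[str, int]]    # grammar symbol, or ("d", i) / ("r", i)


def step(rules: List[Rule], states: Set[Item], sym: Symbol) -> Set[Item]:
    """One move of Γ/item0 from a set of items."""
    nxt: Set[Item] = set()
    for i, dot in states:
        lhs, rhs = rules[i]
        if dot < len(rhs):
            if sym == rhs[dot]:
                nxt.add((i, dot + 1))           # shift over a terminal or nonterminal
            if (isinstance(sym, tuple) and sym[0] == "d"
                    and 0 <= sym[1] < len(rules) and rules[sym[1]][0] == rhs[dot]):
                nxt.add((sym[1], 0))            # d_j: enter rule j of the expected nonterminal
        elif sym == ("r", i):
            # r_i: leave rule i towards *every* item expecting its left-hand side;
            # the item that made the matching d_i move has been forgotten.
            for k, (_, rhs_k) in enumerate(rules):
                for p, x in enumerate(rhs_k):
                    if x == lhs:
                        nxt.add((k, p + 1))
    return nxt


def accepts(rules: List[Rule], word: List[Symbol]) -> bool:
    states: Set[Item] = {(0, 0)}               # [S' -> . S]
    for sym in word:
        states = step(rules, states, sym)
    return (0, 1) in states                    # [S' -> S .]


RULES: List[Rule] = [("S'", ("S",)), ("S", ("a", "S", "b")), ("S", ("c",))]
w: List[Symbol] = [("d", 1), "a", ("d", 2), "c", ("r", 2), "b", ("r", 1)]
print(accepts(RULES, w))                       # True: w is in L(G_b)
print(accepts(RULES, w + ["b", ("r", 1)]))     # True, yet this word is not in L(G_b)
```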

Summary

- general framework for approximations
- applications:
  - parser construction
  - ambiguity detection
  - XML validation [Segoufin and Vianu, 2002]?
  - symbolic supertagging [Boullier, 2003]?


Parsing Principles: Shift-Resolve Parsing

- noncanonical
- k = 1 reduced lookahead symbol
- resolve = reduce + pushback: emulates a bounded reduced lookahead without any preset bound


Parsing Example: Shift-Resolve Parse

[Figure, built up over six steps: shift-resolve parse of the input "| NONE => filterP(r, l) | filterP ([], l) = rev l . . .". The partial parse tree grows with the nodes 〈pat〉, 〈exp〉, 〈mrule〉, 〈match〉, 〈atpats〉 and 〈sfvalbind〉; in the final tree two 〈sfvalbind〉 nodes are combined through 〈fvalbind〉, so "filterP ([], l) = rev l" becomes a new function clause rather than a match rule.]


Parser Construction: Generating the Parser

1. position automaton

2. determinization by subset construction
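Step 2 can be illustrated with plain subset construction. The Python sketch below determinizes the fragment of Γ/item0 shown in the item0 Equivalence example (item names abbreviated by hand; the encoding and function name are our own assumptions). It does not reproduce the Shift-Resolve construction, whose items also carry a parsing action and a pushback length. Starting from [〈fvalbind〉 2−→ ·〈sfvalbind〉] alone, the r4 move still lands on a state holding both completed 〈fvalbind〉 items, because the quotient forgot which item performed the d4 move.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

NFA = Dict[Tuple[str, str], Set[str]]                    # (state, symbol) -> successors
DFA = Dict[Tuple[FrozenSet[str], str], FrozenSet[str]]


def determinize(nfa: NFA, start: Iterable[str],
                alphabet: List[str]) -> Tuple[Set[FrozenSet[str]], DFA]:
    """Plain subset construction; this NFA has no ε-moves, so no ε-closure is needed."""
    start_set = frozenset(start)
    seen: Set[FrozenSet[str]] = {start_set}
    delta: DFA = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            succ = frozenset(q for p in current for q in nfa.get((p, sym), set()))
            if not succ:
                continue  # no move on sym: implicit error state
            delta[(current, sym)] = succ
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen, delta


# Hand-abbreviated fragment of Γ/item0 (rule 4 and the two fvalbind rules).
PRED2, PRED3 = "fvalbind -> . sfvalbind", "fvalbind -> fvalbind | . sfvalbind"
SF = ["sfvalbind -> . vid atpats = exp", "sfvalbind -> vid . atpats = exp",
      "sfvalbind -> vid atpats . = exp", "sfvalbind -> vid atpats = . exp",
      "sfvalbind -> vid atpats = exp ."]
DONE2, DONE3 = "fvalbind -> sfvalbind .", "fvalbind -> fvalbind | sfvalbind ."
FRAGMENT: NFA = {
    (PRED2, "d4"): {SF[0]}, (PRED3, "d4"): {SF[0]},
    (SF[0], "vid"): {SF[1]}, (SF[1], "atpats"): {SF[2]},
    (SF[2], "="): {SF[3]}, (SF[3], "exp"): {SF[4]},
    (SF[4], "r4"): {DONE2, DONE3},
}

states, delta = determinize(FRAGMENT, [PRED2], ["d4", "vid", "atpats", "=", "exp", "r4"])
print(len(states))                                              # 7
print(delta[(frozenset({SF[4]}), "r4")] == frozenset({DONE2, DONE3}))  # True
```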

Subset Construction: Principle

- d_i transitions denote traditional item closures
- r_i transitions denote a phrase that should be reduced
- other transitions denote shifts
- items in the construction hold:
  1. a state of the position automaton
  2. a parsing action
  3. a pushback length


Page 56: Approximating Context-Free Grammars for Parsing and ...

Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction

Subset ConstructionExample

〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0

〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0

Page 57: Approximating Context-Free Grammars for Parsing and ...

Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction

Subset ConstructionExample

〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0

〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0

〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0

r5

Page 58: Approximating Context-Free Grammars for Parsing and ...

Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction

Subset ConstructionExample

〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0

〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0

〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈sfvalbind〉·, 5, 0

r4

Page 59: Approximating Context-Free Grammars for Parsing and ...

Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction

Subset ConstructionExample

〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0

〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0

〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0

〈dec〉−→fun 〈fvalbind〉·, 5, 0

S′−→〈dec〉·$, 5, 0

Page 60: Approximating Context-Free Grammars for Parsing and ...

Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction

Subset ConstructionExample

〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0

〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0

〈dec〉−→fun 〈fvalbind〉·, 5, 0

S′−→〈dec〉·$, 5, 0

〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0

Page 61: Approximating Context-Free Grammars for Parsing and ...

Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction

Subset ConstructionExample

〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0

〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0

〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0

〈dec〉−→fun 〈fvalbind〉·, 5, 0

S′−→〈dec〉·$, 5, 0

〈fvalbind〉−→〈fvalbind〉 ’|’ ·〈sfvalbind〉, 5, 1

〈match〉−→〈match〉 ’|’ ·〈mrule〉, 0, 0

’|’

Page 62: Approximating Context-Free Grammars for Parsing and ...

Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction

Subset ConstructionExample

〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0

〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0

〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0

〈dec〉−→fun 〈fvalbind〉·, 5, 0

S′−→〈dec〉·$, 5, 0

〈fvalbind〉−→〈fvalbind〉 ’|’ ·〈sfvalbind〉, 5, 1

〈match〉−→〈match〉 ’|’ ·〈mrule〉, 0, 0

’|’

〈mrule〉−→·〈pat〉 => 〈exp〉, 0, 0d8

Page 63: Approximating Context-Free Grammars for Parsing and ...

Motivation Approximations Shift-Resolve Parsing Ambiguity Detection ConclusionParser Construction

Subset ConstructionExample

〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0

〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0

〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0

〈dec〉−→fun 〈fvalbind〉·, 5, 0

S′−→〈dec〉·$, 5, 0

〈fvalbind〉−→〈fvalbind〉 ’|’ ·〈sfvalbind〉, 5, 1

〈match〉−→〈match〉 ’|’ ·〈mrule〉, 0, 0

’|’

〈mrule〉−→·〈pat〉 => 〈exp〉, 0, 0

d8

〈pat〉−→·vid 〈atpat〉, 0, 0

〈sfvalbind〉−→·vid 〈atpats〉 = 〈exp〉, 0, 0
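The items that appear after the ’|’ transition come from the classical closure step: every item whose dot stands before a nonterminal pulls in that nonterminal's rules. Below is a minimal Python sketch of plain LR(0) closure and goto on the smlfvalbind.y fragment, assuming items are (lhs, rhs, dot) triples. It deliberately ignores the two integer annotations carried by the items above, so it only reproduces their cores, not the shift-resolve construction itself.

```python
# Rules of the SML fragment (smlfvalbind.y) as (lhs, rhs) pairs.
GRAMMAR = [
    ("dec", ("fun", "fvalbind")),
    ("fvalbind", ("sfvalbind",)),
    ("fvalbind", ("fvalbind", "|", "sfvalbind")),
    ("sfvalbind", ("vid", "atpats", "=", "exp")),
    ("exp", ("vid",)),
    ("exp", ("case", "exp", "of", "match")),
    ("match", ("mrule",)),
    ("match", ("match", "|", "mrule")),
    ("mrule", ("pat", "=>", "exp")),
    ("atpats", ("atpat",)),
    ("atpats", ("atpats", "atpat")),
    ("atpat", ("vid",)),
    ("pat", ("vid", "atpat")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure: add B -> . gamma for each item whose dot precedes B."""
    result = set(items)
    todo = list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, body in GRAMMAR:
                if lhs == rhs[dot] and (lhs, body, 0) not in result:
                    result.add((lhs, body, 0))
                    todo.append((lhs, body, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


def show(item):
    """Render an item as 'lhs -> alpha . beta'."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


if __name__ == "__main__":
    # The two items of the slide whose dot stands before '|'.
    state = closure({("match", ("match", "|", "mrule"), 1),
                     ("fvalbind", ("fvalbind", "|", "sfvalbind"), 1)})
    for line in sorted(show(i) for i in goto(state, "|")):
        print(line)
```

Run as a script, it prints five items: fvalbind -> fvalbind | . sfvalbind, match -> match | . mrule, mrule -> . pat => exp, pat -> . vid atpat and sfvalbind -> . vid atpats = exp. These are the same cores as the last five items listed above.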

Parser Construction

Construction Failure

〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0

〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0

〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0

〈dec〉−→fun 〈fvalbind〉·, 5, 0

S′−→〈dec〉·$, 5, 0

〈mrule〉−→〈pat〉 => 〈exp〉·, 5, 0

r5

Parser Construction

Construction Failure

〈exp〉−→case 〈exp〉 of 〈match〉·, 0, 0

〈sfvalbind〉−→vid 〈atpats〉 = 〈exp〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉 ’|’ 〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈sfvalbind〉·, 5, 0

〈fvalbind〉−→〈fvalbind〉· ’|’ 〈sfvalbind〉, 5, 0

〈dec〉−→fun 〈fvalbind〉·, 5, 0

S′−→〈dec〉·$, 5, 0

〈mrule〉−→〈pat〉 => 〈exp〉·, 5, 0

〈match〉−→〈mrule〉·, 5, 0

〈match〉−→〈match〉· ’|’ 〈mrule〉, 5, 0

〈match〉−→〈match〉· ’|’ 〈mrule〉, 0, 0

Parser Construction

Complexity

- $|\Gamma/{\equiv}|$: size of the position automaton, with $|\Gamma/\mathrm{item}_0| = O(|G|)$

- $|A|$: size of the parser, $O(2^{|\Gamma/{\equiv}|}\,|P|)$

- parsing time complexity for input $w$: $O(|w|)$
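The exponential factor in the parser size is the usual cost of a subset construction, where each parser state stands for a set of states of the position automaton. The Python sketch below is plain NFA determinization on a textbook example, chosen as an assumption for illustration; it is not the shift-resolve construction itself. An NFA with n + 1 states recognizing "the n-th symbol from the end is a" reaches exactly 2^n subsets, which shows that the $2^{|\Gamma/{\equiv}|}$ bound can actually be met.

```python
from collections import deque


def determinize(start, delta, alphabet):
    """Subset construction: return the sets of NFA states reachable from {start}.

    delta maps (state, symbol) to an iterable of successor states
    (no epsilon moves). Each returned frozenset is one DFA state."""
    init = frozenset([start])
    seen = {init}
    queue = deque([init])
    while queue:
        subset = queue.popleft()
        for symbol in alphabet:
            target = frozenset(t for q in subset for t in delta.get((q, symbol), ()))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def nth_from_end_nfa(n):
    """NFA with states 0..n accepting words whose n-th symbol from the end is 'a'."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}  # state 0 guesses where the 'a' is
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


if __name__ == "__main__":
    for n in range(1, 7):
        print(n + 1, "NFA states ->", len(determinize(0, nth_from_end_nfa(n), "ab")), "subsets")
```

For n = 1 to 6 the number of reachable subsets is exactly 2**n, because every reachable subset contains state 0 plus an arbitrary subset of {1, ..., n}.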

Parser Construction

Limitations

− incomparable with classical parsing techniques

+ the subset construction can be mended

Parser Construction

Summary

- Shift-resolve parsers:
  1. a large class of grammars accepted
  2. guaranteed unambiguity
  3. linear-time parsing

- Two-step construction:
  1. simple
  2. flexible

Ambiguity Detection

Principles

- a bracketed sentence = a derivation tree (d_i and r_i open and close a node built with rule i)

- ambiguity = more than one tree with the same yield

d6 d8 d13 vid r13 => d5 case d14 vid r14 of d7 d6 d8 d13 vid r13 => d14 vid r14 r8 r6 ’|’ d8 d13 vid r13 => d14 vid r14 r8 r7 r5 r8 r6

d7 d6 d8 d13 vid r13 => d5 case d14 vid r14 of d6 d8 d13 vid r13 => d14 vid r14 r8 r6 r5 r8 r6 ’|’ d8 d13 vid r13 => d14 vid r14 r8 r7

- construct an FSA A such that L(G_b) ⊆ L(A), and look for bracketed sentences with the same yield
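Reading a bracketed sentence back as a tree only needs bracket matching, and its yield is obtained by erasing the brackets (the homomorphism h used in the definitions that follow). A small Python sketch, assuming sentences are whitespace-separated tokens in which d<i> and r<i> open and close a node for rule i: it checks that the two sentences of this slide are well bracketed, different, and have the same yield.

```python
import re

BRACKET = re.compile(r"^([dr])(\d+)$")


def well_bracketed(tokens):
    """True if every d<i> is closed by the matching r<i>, i.e. the tokens form a tree."""
    stack = []
    for tok in tokens:
        m = BRACKET.match(tok)
        if not m:
            continue  # terminal symbol
        kind, rule = m.groups()
        if kind == "d":
            stack.append(rule)
        elif not stack or stack.pop() != rule:
            return False
    return not stack


def yield_of(tokens):
    """Erase the brackets and keep the terminal symbols (the yield of the tree)."""
    return [tok for tok in tokens if not BRACKET.match(tok)]


S1 = ("d6 d8 d13 vid r13 => d5 case d14 vid r14 of d7 d6 d8 d13 vid r13 => "
      "d14 vid r14 r8 r6 | d8 d13 vid r13 => d14 vid r14 r8 r7 r5 r8 r6").split()
S2 = ("d7 d6 d8 d13 vid r13 => d5 case d14 vid r14 of d6 d8 d13 vid r13 => "
      "d14 vid r14 r8 r6 r5 r8 r6 | d8 d13 vid r13 => d14 vid r14 r8 r7").split()

if __name__ == "__main__":
    print(well_bracketed(S1), well_bracketed(S2), S1 != S2)
    print(" ".join(yield_of(S1)))
```

Both trees yield vid => case vid of vid => vid | vid => vid: the first attaches the last match rule to the inner case, the second to the outer match.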

Regular Unambiguity

RU(≡)

- G is regular unambiguous for ≡ of finite index if there do not exist two distinct bracketed sentences $w_b \neq w'_b$ in $L(\Gamma/{\equiv}) \cap T_b^*$ with $h(w_b) = h(w'_b)$, where $h$ erases the brackets

- $\mathrm{LR}(0) \not\subseteq \mathrm{RU}(\mathrm{item}_0)$

- regular approximations are too weak

Noncanonical Unambiguity

Nonterminal Transitions

- SF(G_b) ⊆ L(Γ/≡)

- look for two different bracketed sentential forms in L(Γ/≡)

d6 d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ’|’ 〈mrule〉 r7 r5 r8 r6

d7 d6 d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉 r5 r8 r6 ’|’ 〈mrule〉 r7

- a nonterminal transition represents exactly its derived context-free language

Noncanonical Unambiguity

Mutual Accessibility Relations

- between pairs of states of Γ/≡, (q1, q2)

- synchronized left-to-right walks from an initial pair (qs, qs)

d6 d8 d14 vid r14 => d5 case 〈exp〉 of d7 〈match〉 ’|’ 〈mrule〉 r7 r5 r8 r6

d7 d6 d8 d14 vid r14 => d5 case 〈exp〉 of 〈match〉 r5 r8 r6 ’|’ 〈mrule〉 r7

Successive steps of the walk on this pair: epsilon: mae, epsilon: mae, epsilon: mae, shift: mas, then nothing!

d6 d8 〈pat〉 => d5 case 〈exp〉 of d7 〈match〉 ’|’ 〈mrule〉 r7 r5 r8 r6

d7 d6 d8 〈pat〉 => d5 case 〈exp〉 of 〈match〉 r5 r8 r6 ’|’ 〈mrule〉 r7

Successive steps of the walk on this pair: shift: mas, conflict: mac, conflict: mac, conflict: mac, shift: mas, reduce: mar, conflict: mac

Noncanonical Unambiguity

NU(≡)

- $\mathrm{ma} = \mathrm{mas} \cup \mathrm{mae} \cup \mathrm{mac} \cup \mathrm{mar}$

- G is noncanonically unambiguous if there does not exist a relation $(q_s, q_s)\ \mathrm{ma}^*\ (q_f, q_f)$ that uses mac at some step

- computed in $O(|\Gamma/{\equiv}|^2)$ space
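A much simplified Python sketch of this test, for intuition only. It runs a breadth-first search over pairs of states plus one bit recording whether a mac step was used, which stays within the quadratic space bound above. The four relations are our own simplified readings, not the exact thesis definitions: mas moves both sides on the same terminal or nonterminal, mae moves one side alone on some d_i, mar moves both sides on the same r_i, and mac moves one side alone on an r_i that the other side cannot take from its current state. The automaton format (a dict from state to (label, target) pairs) and the two toy automata are hypothetical and hand-built, one path per derivation tree, rather than a quotient Γ/≡.

```python
from collections import deque


def is_d(label):
    return isinstance(label, tuple) and label[0] == "d"


def is_r(label):
    return isinstance(label, tuple) and label[0] == "r"


def potentially_ambiguous(edges, qs, qf):
    """True if (qs, qs) ma* (qf, qf) holds through at least one mac step.

    edges maps a state to a list of (label, target) pairs; a label is
    ("d", i), ("r", i), or a string naming a terminal or nonterminal."""
    def out(q):
        return edges.get(q, [])

    start = (qs, qs, False)
    seen = {start}
    queue = deque([start])
    while queue:
        q1, q2, used_mac = queue.popleft()
        if q1 == qf and q2 == qf and used_mac:
            return True
        labels1 = {a for a, _ in out(q1)}
        labels2 = {b for b, _ in out(q2)}
        succ = []
        for a, t1 in out(q1):
            if is_d(a):                          # mae: left side alone
                succ.append((t1, q2, used_mac))
            elif is_r(a) and a not in labels2:   # mac: left reduces alone
                succ.append((t1, q2, True))
            for b, t2 in out(q2):
                if a == b and not is_d(a):       # mas (same symbol) or mar (same r_i)
                    succ.append((t1, t2, used_mac))
        for b, t2 in out(q2):
            if is_d(b):                          # mae: right side alone
                succ.append((q1, t2, used_mac))
            elif is_r(b) and b not in labels1:   # mac: right reduces alone
                succ.append((q1, t2, True))
        for state in succ:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


# 1: S -> A   2: S -> B   3: A -> a   4: B -> a   (ambiguous: two trees for "a")
AMBIG = {
    "qs": [(("d", 1), "p1"), (("d", 2), "s1")],
    "p1": [(("d", 3), "p2")], "p2": [("a", "p3")],
    "p3": [(("r", 3), "p4")], "p4": [(("r", 1), "qf")],
    "s1": [(("d", 4), "s2")], "s2": [("a", "s3")],
    "s3": [(("r", 4), "s4")], "s4": [(("r", 2), "qf")],
}
# 1: S -> A b   2: S -> B c   3: A -> a   4: B -> a   (unambiguous)
UNAMBIG = {
    "qs": [(("d", 1), "p1"), (("d", 2), "s1")],
    "p1": [(("d", 3), "p2")], "p2": [("a", "p3")],
    "p3": [(("r", 3), "p4")], "p4": [("b", "p5")], "p5": [(("r", 1), "qf")],
    "s1": [(("d", 4), "s2")], "s2": [("a", "s3")],
    "s3": [(("r", 4), "s4")], "s4": [("c", "s5")], "s5": [(("r", 2), "qf")],
}
```

In the ambiguous toy, the final pair is reached only through mac steps. In the unambiguous one, the walks that went through a conflict get stuck at the shifts on b and c, so the search goes on past the conflict before deciding, which is the noncanonical part of the idea.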

Noncanonical Unambiguity

Comparisons

- Regular Unambiguity RU(≡)

- Bounded-length detection schemes

- LR(k) and LR-Regular (LR(Π))

- Horizontal and vertical ambiguity (HVRU(≡))

Noncanonical Unambiguity

Bounded-length detection [Gorn, 1963, Cheung and Uzgalis, 1995, Schroer, 2001, Jampana, 2005]

- generate sentences
- not conservative
- $\mathrm{prefix}_m$ prevents false positives in sentences of length < m

- need to generate $a^{2^n+1}$ to find $G_4^n$ ambiguous, but $G_4^n \notin \mathrm{NU}(\mathrm{item}_0)$

$$S \to A \mid B_n a,\quad A \to A a a \mid a,\quad B_1 \to a a,\quad B_2 \to B_1 B_1,\ \ldots,\ B_n \to B_{n-1} B_{n-1} \qquad (G_4^n)$$
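Why bounded-length generation needs such long sentences can be checked directly. A small Python sketch, relying on the fact that every sentence of $G_4^n$ is a string of a's, so one parse-tree count per length is enough: a dynamic program over lengths finds the shortest sentence with two trees. The `variant` parameter is our own convenience: value 4 builds $G_4^n$, value 3 drops the trailing a of $S \to B_n a$ (the grammar $G_3^n$ used in the LR(k) comparison).

```python
def parse_counts(n, variant, max_len):
    """Parse-tree counts of a^k for k = 0..max_len, in G_4^n (variant 4) or G_3^n (3)."""
    a_counts = [1 if k % 2 == 1 else 0 for k in range(max_len + 1)]  # A -> A a a | a
    b_counts = [1 if k == 2 else 0 for k in range(max_len + 1)]      # B_1 -> a a
    for _ in range(2, n + 1):                                         # B_i -> B_{i-1} B_{i-1}
        nxt = [0] * (max_len + 1)
        for i, ci in enumerate(b_counts):
            for j, cj in enumerate(b_counts):
                if ci and cj and i + j <= max_len:
                    nxt[i + j] += ci * cj
        b_counts = nxt
    tail = 1 if variant == 4 else 0  # G_4^n: S -> A | B_n a ; G_3^n: S -> A | B_n
    s_counts = a_counts[:]
    for k in range(max_len + 1 - tail):
        s_counts[k + tail] += b_counts[k]
    return s_counts


def shortest_ambiguous(n, variant, max_len):
    """Length of the shortest sentence with two or more parse trees, or None."""
    counts = parse_counts(n, variant, max_len)
    return next((k for k, c in enumerate(counts) if c > 1), None)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, shortest_ambiguous(n, 4, 40))
```

For n = 3 the first ambiguous sentence has length 9 = 2^3 + 1, and with variant 3 no length up to the bound has two trees, since A only derives odd lengths and B_n derives a^(2^n).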

Noncanonical Unambiguity

LR(k) and LR-Regular [Knuth, 1965, Hunt III et al., 1975, Culik and Cohen, 1973, Heilbrunner, 1983]

- conservative tests
- define $\mathrm{item}_\Pi$ such that $\mathrm{LR}(\Pi) \subset \mathrm{NU}(\mathrm{item}_\Pi)$

- need an $\mathrm{LR}(2^n)$ test to prove $G_3^n$ unambiguous, but $G_3^n \in \mathrm{NU}(\mathrm{item}_0)$

$$S \to A \mid B_n,\quad A \to A a a \mid a,\quad B_1 \to a a,\quad B_2 \to B_1 B_1,\ \ldots,\ B_n \to B_{n-1} B_{n-1} \qquad (G_3^n)$$
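Where the $2^n$ comes from can be seen on the parser's very first decision. A toy Python check, assuming the usual LR(k) convention that the lookahead window is the next k symbols of the remaining input followed by the end marker $. After the first a, a bottom-up parser for $G_3^n$ must reduce A → a if the input is a^m with m odd, or keep shifting towards B_1 → a a if the input is a^(2^n). The function returns the smallest k for which the two cases never share a window. It looks at this one decision only and is not a full LR(k) test.

```python
def first_decision_lookahead(n, max_extra=64):
    """Smallest k separating 'reduce A -> a' from 'shift' after the first a of G_3^n."""
    target = 2 ** n

    def window(rest_len, k):
        # The next k symbols of a remaining input a^rest_len, end marker included.
        return ("a" * rest_len + "$")[:k]

    # Odd inputs a^m leave m - 1 symbols; the B_n input a^(2^n) leaves 2^n - 1.
    reduce_rests = [m - 1 for m in range(1, target + max_extra, 2)]
    shift_rest = target - 1
    for k in range(1, target + max_extra):
        shift_window = window(shift_rest, k)
        if all(window(r, k) != shift_window for r in reduce_rests):
            return k
    return None


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, first_decision_lookahead(n))
```

With fewer than 2^n symbols, the window for a^(2^n) is a run of a's that some longer odd-length input also shows; at k = 2^n the end marker appears in a position that no odd-length input can match.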

Experimental Results

Implementation

- For the whole SML grammar:
  - conflicts in the LALR(1) parser: sml.y: conflicts: 223 shift/reduce, 35 reduce/reduce
  - our tool: 89 potential ambiguities detected with LR(1) precision

- For the SML grammar fragment: 2 potential ambiguities detected with LR(0) precision:

  (match -> mrule . , match -> match . ’|’ mrule )

  (match -> match . ’|’ mrule , match -> match ’|’ mrule . )

- $\mathrm{NU}(\mathrm{item}_1)$ correctly identifies 87% of our unambiguous grammars, and 73% of the non-LALR(1) ones

Experimental Results

Summary

- conservative ambiguity detection

- provably more precise than several other techniques

- also better experimentally

Closing Comments

Conclusion

- Main issues in parser development:
  - nondeterminism
  - ambiguity in particular

- Deterministic parsers for larger classes of grammars

- Ambiguity detection algorithm

Future Work

Directions for Future Work

- Linear-time parsing for NU(≡) grammars?

- Improved implementation

- Noncanonical languages

- Regular approximations

Thanks!

Our Issue

Shift/Reduce Conflict

GNU Bison

state 20

    6 exp: "case" exp "of" match .
    8 match: match . '|' mrule

    '|'  shift, and go to state 24

    '|'  [reduce using rule 6 (exp)]

Our Issue

Shift/Reduce Conflict

Which action to choose?

of SOME y => filterP(r, y::l) | NONE => filterP(r, l)

[Figure: the partial parse tree built so far, with 〈match〉, 〈mrule〉, 〈pat〉 and 〈exp〉 nodes inside an 〈fvalbind〉]

Our Issue

Shift/Reduce Conflict

Which action to choose? Reduce?

of SOME y => filterP(r, y::l) | NONE => filterP(r, l)

[Figure: reducing closes the case 〈exp〉 and its 〈sfvalbind〉, so the vid NONE must start a new 〈sfvalbind〉 of the 〈fvalbind〉: error!]

Our Issue

Shift/Reduce Conflict

Which action to choose? Shift?

of SOME y => filterP(r, y::l) | NONE => filterP(r, l)

[Figure: shifting extends the 〈match〉 with a second 〈mrule〉 (〈pat〉 => 〈exp〉) for NONE => filterP(r, l), inside the same 〈fvalbind〉]

Our Issue

Shift/Reduce Conflict

Which action to choose?

| NONE => filterP(r, l) | filterP ([], l) = rev l

[Figure: partial parse trees with 〈match〉, 〈mrule〉, 〈pat〉, 〈exp〉, 〈sfvalbind〉, 〈atpats〉 and 〈fvalbind〉 nodes]

Our Issue

Shift/Reduce Conflict

Which action to choose? Reduce?

| NONE => filterP(r, l) | filterP ([], l) = rev l

[Figure: reducing closes the case 〈exp〉 and its 〈sfvalbind〉; filterP ([], l) = rev l then becomes a new 〈sfvalbind〉 with its 〈atpats〉, inside the same 〈fvalbind〉]

Our Issue

Shift/Reduce Conflict

Which action to choose? Shift?

| NONE => filterP(r, l) | filterP ([], l) = rev l

[Figure: shifting makes filterP ([], l) the 〈pat〉 of a new 〈mrule〉 in the 〈match〉, but = follows where => is expected: error!]

Unbounded Lookahead

[Figure: after ’|’, the same vid vid ... tokens can begin either an 〈sfvb〉 (〈atpats〉 made of 〈atpat〉s, then =) or an 〈mrule〉 (a 〈pat〉, then =>); the deciding = or => can lie arbitrarily far ahead]
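A parser with a fixed lookahead cannot make this choice in general, but the decision is easy once it may look arbitrarily far: skip the vid tokens after ’|’ until = or => shows up. A hypothetical Python helper over a token list, assuming (as in the grammar fragment) that only vid tokens separate the ’|’ from the deciding symbol. In full Standard ML, both the argument patterns of a function clause and the pattern of a match rule can be arbitrarily long, which is why no fixed bound suffices. With patterns abstracted to vid tokens, the backup-slide inputs | NONE => ... and | filterP ([], l) = rev l correspond to the shift and reduce cases.

```python
def resolve_after_bar(tokens, i):
    """Hypothetical helper: choose shift or reduce at the '|' in tokens[i].

    'shift'  -> the '|' continues the match with another rule  pat => exp
    'reduce' -> the '|' starts a new clause                   vid atpats = exp
    Assumes only vid tokens lie between the '|' and the deciding '=' or '=>'."""
    if tokens[i] != "|":
        raise ValueError("tokens[i] must be '|'")
    j = i + 1
    while j < len(tokens) and tokens[j] == "vid":
        j += 1  # no fixed bound on how many vid tokens are skipped
    if j < len(tokens) and tokens[j] == "=>":
        return "shift"
    if j < len(tokens) and tokens[j] == "=":
        return "reduce"
    raise SyntaxError(f"no '=' or '=>' after the '|' at position {i}")


if __name__ == "__main__":
    print(resolve_after_bar(["|", "vid", "vid", "=>", "vid"], 0))
    print(resolve_after_bar(["|"] + ["vid"] * 20 + ["=", "vid"], 0))
```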

Limitations

Ambiguity Report

- grambiguity [Brabrand et al., 2007]:

      *** horizontal ambiguity at E[plus]: Exp <--> '+' Exp
      ambiguous string: "x+x+x"

- ANTLRWorks [Parr, 2007]
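The ambiguous string in such a report can be confirmed by counting parse trees. A small Python sketch, assuming the reported production belongs to a grammar of the form Exp → Exp '+' Exp | x; that grammar is an assumption for illustration, since it is not shown here. The count for n operands is the Catalan number C(n − 1), so "x+x+x" has two trees, one for each way of splitting at a '+'.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees of `tokens` for the assumed grammar Exp -> Exp '+' Exp | 'x'."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees deriving toks[i:j] from Exp (always j - i >= 1 here).
        if j - i == 1:
            return 1 if toks[i] == "x" else 0
        total = 0
        for k in range(i + 1, j - 1):  # k is the position of the top-level '+'
            if toks[k] == "+":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks)) if toks else 0


if __name__ == "__main__":
    for s in ("x", "x+x", "x+x+x", "x+x+x+x"):
        print(s, count_trees(s))
```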

Other Limitations

- memory requirements: a solution could be an NLALR test

- dynamic disambiguation: the inverse problem; some means of deciding equivalence is needed

References

H. J. S. Basten. Ambiguity detection methods for context-free grammars. Master’s thesis, Centrum voor Wiskunde en Informatica, Universiteit van Amsterdam, Aug. 2007.

P. Boullier. Supertagging: A non-statistical parsing-based approach. In IWPT’03, pages 55–65, 2003. URL ftp://ftp.inria.fr/INRIA/Projects/Atoll/Pierre.Boullier/supertaggeur final.pdf.

C. Brabrand, R. Giegerich, and A. Møller. Analyzing ambiguity of context-free grammars. In J. Holub and J. Žďárek, editors, CIAA’07, 2007. URL http://www.brics.dk/~brabrand/grambiguity/. To appear in Lecture Notes in Computer Science.

D. G. Cantor. On the ambiguity problem of Backus systems. J. ACM, 9(4):477–479, 1962. ISSN 0004-5411. doi: 10.1145/321138.321145.

B. S. N. Cheung and R. C. Uzgalis. Ambiguity in context-free grammars. In SAC’95, pages 272–276. ACM Press, 1995. ISBN 0-89791-658-1. doi: 10.1145/315891.315991.

N. Chomsky and M. P. Schützenberger. The algebraic theory of context-free languages. In P. Braffort and D. Hirschberg, editors, Computer Programming and Formal Systems, Studies in Logic, pages 118–161. North-Holland Publishing, 1963.

K. Culik and R. Cohen. LR-Regular grammars – an extension of LR(k) grammars. J. Comput. Syst. Sci., 7:66–96, 1973. ISSN 0022-0000.

C. Donnelly and R. Stallman. Bison version 2.3, Sept. 2006. URL http://www.gnu.org/software/bison/manual/.

S. Gorn. Detection of generative ambiguities in context-free mechanical languages. J. ACM, 10(2):196–208, 1963. ISSN 0004-5411. doi: 10.1145/321160.321168.

S. Heilbrunner. Tests for the LR-, LL-, and LC-Regular conditions. J. Comput. Syst. Sci., 27(1):1–13, 1983. ISSN 0022-0000. doi: 10.1016/0022-0000(83)90026-0.

H. B. Hunt III, T. G. Szymanski, and J. D. Ullman. On the complexity of LR(k) testing. Commun. ACM, 18(12):707–716, 1975. ISSN 0001-0782. doi: 10.1145/361227.361232.

S. Jampana. Exploring the problem of ambiguity in context-free grammars. Master’s thesis, Oklahoma State University, July 2005. URL http://e-archive.library.okstate.edu/dissertations/AAI1427836/.

P. Klint and E. Visser. Using filters for the disambiguation of context-free grammars. In G. Pighizzini and P. San Pietro, editors, ASMICS Workshop on Parsing Theory, Technical Report 126-1994, pages 89–100. Università di Milano, 1994. URL http://citeseer.ist.psu.edu/klint94using.html.

D. E. Knuth. On the translation of languages from left to right. Information and Control, 8(6):607–639, 1965. ISSN 0019-9958. doi: 10.1016/S0019-9958(65)90426-2.

B. A. Malloy, J. F. Power, and J. T. Waldron. Applying software engineering techniques to parser design: the development of a C# parser. In SAICSIT’02, pages 75–82. SAICSIT, 2002. ISBN 1-58113-596-3. URL http://www.cs.nuim.ie/~jpower/Research/Papers/2002/saicsit02.pdf.

S. McPeak and G. C. Necula. Elkhound: A fast, practical GLR parser generator. In E. Duesterwald, editor, CC’04, volume 2985 of Lecture Notes in Computer Science, pages 73–88. Springer, 2004. ISBN 3-540-21297-3. doi: 10.1007/b95956.

R. Milner, M. Tofte, R. Harper, and D. MacQueen. The Definition of Standard ML. MIT Press, revised edition, 1997. ISBN 0-262-63181-4.

T. J. Parr. The Definitive ANTLR Reference: Building Domain-Specific Languages. The Pragmatic Programmers, 2007. ISBN 0-9787392-5-6.

F. W. Schroer. AMBER, an ambiguity checker for context-free grammars. Technical report, compilertools.net, 2001. URL http://accent.compilertools.net/Amber.html.

L. Segoufin and V. Vianu. Validating streaming XML documents. In PODS’02, pages 53–64. ACM Press, 2002. ISBN 1-58113-507-6. doi: 10.1145/543613.543622.

T. G. Szymanski and J. H. Williams. Noncanonical extensions of bottom-up parsing techniques. SIAM J. Comput., 5(2):231–250, 1976. ISSN 0097-5397. doi: 10.1137/0205019.

K.-C. Tai. Noncanonical SLR(1) grammars. ACM Trans. Prog. Lang. Syst., 1(2):295–320, 1979. ISSN 0164-0925. doi: 10.1145/357073.357083.

M. Tomita. Efficient Parsing for Natural Language. Kluwer Academic Publishers, 1986. ISBN 0-89838-202-5.

M. van den Brand, J. Scheerder, J. J. Vinju, and E. Visser. Disambiguation filters for scannerless generalized LR parsers. In R. N. Horspool, editor, CC’02, volume 2304 of Lecture Notes in Computer Science, pages 143–158. Springer, 2002. ISBN 3-540-43369-4. URL http://www.springerlink.com/content/03359k0cerupftfh/.

E. Visser. Syntax Definition for Language Prototyping. PhD thesis, Sept. 1997. URL http://citeseer.ist.psu.edu/visser97syntax.html.