ESSLLI 2006
Treebank-Based Acquisition of LFG, HPSG and CCG Resources
Advanced Course: Treebank-Based Acquisition of LFG, HPSG and CCG Resources
Josef van Genabith, Dublin City University
Yusuke Miyao, University of Tokyo
Julia Hockenmaier, University of Pennsylvania and University of Edinburgh
ESSLLI 2006, 18th European Summer School for Language, Logic and Information, University of Malaga, July – August 2006
• Josef van Genabith, National Centre for Language Technology (NCLT), School of Computing, Dublin City University, Dublin 9, Ireland, [email protected]

What do grammars do?
• Grammars define languages as sets of strings
• Grammars define which strings are grammatical and which are not
• Grammars tell us about the syntactic structure associated with strings

“Shallow” vs. “deep” grammars:
• Shallow grammars do all of the above
• Deep grammars, in addition, relate text to information/meaning representations
• Information: predicate-argument-adjunct structure, deep dependency relations, logical forms, …
• In natural languages, linguistic material is not always interpreted locally where you encounter it: long-distance dependencies (LDDs)
• Resolution of LDDs is crucial to construct accurate and complete information/meaning representations
• Deep grammars := (text <-> meaning) + (LDD resolution)

• Traditionally, deep constraint-based grammars are hand-crafted: LFG ParGram, HPSG LinGO/ERG, Core Language Engine (CLE), Alvey Tools, RASP, ALPINO, …
• Wide-coverage, deep unification (constraint-based) grammar development is knowledge-intensive and expensive!
• Very hard to scale hand-crafted grammars to unrestricted text!
• English XLE (Riezler et al. 2002); German XLE (Forst and Rohrer 2006); Japanese XLE (Masuichi and Okuma 2003); RASP (Carroll and Briscoe 2002); ALPINO (Bouma, van Noord and Malouf, 2000)
Motivation
• Instance of the “knowledge acquisition bottleneck” familiar from classical “rationalist” rule/knowledge-based AI/NLP
• Alternative to classical “rationalist” rule/knowledge-based AI/NLP: the “empiricist” research paradigm (AI/NLP):
  – Corpora, treebanks, …, machine-learning-based and statistical approaches, …
  – Treebank-based grammar acquisition, probabilistic parsing
  – Advantage: grammars can be induced (learned) automatically
  – Very low development cost, wide-coverage, robust, but …
• Most treebank-based grammar induction/parsing technology produces “shallow” grammars
• Shallow grammars don’t resolve LDDs (but see (Johnson 2002); …) and do not map strings to information/meaning representations …
Motivation
• This poses a research question: can we address the knowledge acquisition bottleneck for deep grammar development by combining insights from the rationalist and empiricist research paradigms?
• Specifically:
  – Can we automatically acquire wide-coverage, “deep”, probabilistic, constraint-based grammars from treebanks?
  – How do we use them in parsing?
  – Can we use them for generation?
  – Can we acquire resources for different languages and treebank encodings?
  – How do these resources compare with hand-crafted resources?
  – …
Course Overview
Monday: Motivation, Course Overview, Introductions to TAG, LFG, CCG, HPSG and the Penn-II Treebank, TAG Resources
Tuesday: Penn-II-Based Acquisition of LFG Resources
Wednesday: Penn-II-Based Acquisition of CCG Resources
Thursday: Penn-II-Based Acquisition of HPSG Resources
Friday: Multilingual Resources, Formal Semantics, Comparing LFG, CCG, HPSG and TAG-Based Approaches, Demos, Current and Future Work, Discussion
Course Overview
Tuesday/Wednesday/Thursday
Penn-II-Based Acquisition of XXG Resources:
• Treebank Preprocessing/Clean-Up
• Treebank Annotation/Conversion
• Grammar and Lexicon Extraction
• Parsing (Architectures, Probability Models, Evaluation)
• Generation (Architectures, Probability Models, Evaluation)
• Other (Semantics, Domain Variation, …)
Grammar Formalisms
Grammar formalisms and linguistic theories
• Linguistics aims to explain natural language:
  – What is universal grammar?
  – What are language-specific constraints?
• Formalisms are mathematical theories:
  – They provide a language in which linguistic theories can be expressed (like calculus for physics)
  – They define elementary objects (trees, strings, feature structures) and recursive operations which generate complex objects from simple objects
  – They do impose linguistic constraints (e.g. on the kinds of dependencies they can capture)
Lexicalised Grammar Formalisms:
TAG, CCG, LFG and HPSG
Lexicalised formalisms (TAG, CCG, LFG and HPSG)
• The lexicon:
  – pairs words with elementary objects
  – specifies all language-specific information (number and location of arguments, control and binding theory)
• The grammatical operations:
  – are universal
  – define (and impose constraints on) recursion
TAG, CCG, LFG and HPSG
• They describe different kinds of linguistic objects:
  – TAG is a theory of trees
  – CCG is a theory of (syntactic and semantic) types
  – LFG is a multi-level theory based on a projection architecture relating different types of linguistic objects (trees, AVMs, linear logic–based semantics)
  – HPSG uses a single, uniform formalism (typed feature structures) to describe phonological, morphological, syntactic and semantic representations (signs)
• They differ in details: treatment of wh-movement, coordination, etc.
TAG, CCG, LFG and HPSG
• TAG and CCG are weakly equivalent.
• Both are mildly context-sensitive:
  – can capture Dutch crossing dependencies
  – but are still efficiently parseable (in polynomial time)
• LFG is context-sensitive
Tree-Adjoining Grammar (TAG)
(Lexicalized) Tree-Adjoining Grammar
• TAG is a tree-rewriting formalism:
  – TAG defines operations (substitution and adjunction) on trees
  – The elementary objects in TAG are trees (not strings)
• TAG is lexicalized:
  – Each elementary tree is anchored to a lexical item (word)
  – “Extended domain of locality”: the elementary tree contains all arguments of the anchor
  – TAG requires a linguistic theory which specifies the shape of these elementary trees
• TAG is mildly context-sensitive:
  – can capture Dutch crossing dependencies
  – but is still efficiently parseable

A.K. Joshi and Y. Schabes (1996) Tree Adjoining Grammars. In G. Rozenberg and A. Salomaa, Eds., Handbook of Formal Languages
TAG substitution (arguments)
[Figure: an initial tree rooted in Y is substituted at a Y substitution node of another tree; the derived tree and the derivation tree are shown]
TAG adjunction (modifiers)

[Figure: an auxiliary tree rooted in X, with foot node X*, is adjoined at an X node of the host tree; the derived tree and the derivation tree are shown]
A small TAG lexicon

eats:   (S NP↓ (VP (VBZ eats) NP↓))
John:   (NP John)
always: (VP (RB always) VP*)   [auxiliary tree]
tapas:  (NP tapas)
A TAG derivation
[Figure: the elementary trees for John, eats, tapas and always, with NP↓ substitution sites marked]
A TAG derivation
[Figure: John and tapas substituted at the NP↓ sites, yielding (S (NP John) (VP (VBZ eats) (NP tapas))); the auxiliary tree (VP (RB always) VP*) is adjoined at the VP node]
A TAG derivation
[Figure: the derived tree after adjunction: (S (NP John) (VP (RB always) (VP (VBZ eats) (NP tapas))))]
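The derivation above (two substitutions, one adjunction) can be sketched with a toy implementation. The tuple encoding and all helper names are illustrative choices of this write-up, not code from the course: a tree is (label, kids), where kids is a word string, "@" (a substitution site), "*" (a foot node), or a tuple of subtrees.

```python
def _subst(tree, site, arg):
    label, kids = tree
    if kids == "@" and label == site:
        return arg, True
    if isinstance(kids, tuple):
        out, done = [], False
        for k in kids:
            k2, done = (k, done) if done else _subst(k, site, arg)
            out.append(k2)
        return (label, tuple(out)), done
    return tree, False

def substitute(tree, site, arg):
    """Substitute initial tree `arg` at the leftmost `site` substitution node."""
    new, done = _subst(tree, site, arg)
    assert done, f"no substitution site {site}"
    return new

def adjoin(tree, aux):
    """Adjoin auxiliary tree `aux` at the leftmost internal node matching its root."""
    def plug_foot(t):
        label, kids = t
        if kids == "*":
            return node  # the foot node is replaced by the excised subtree
        if isinstance(kids, tuple):
            return (label, tuple(plug_foot(k) for k in kids))
        return t
    def walk(t):
        nonlocal node
        label, kids = t
        if label == aux[0] and isinstance(kids, tuple):
            node = t
            return plug_foot(aux), True
        if isinstance(kids, tuple):
            out, done = [], False
            for k in kids:
                k2, done = (k, done) if done else walk(k)
                out.append(k2)
            return (label, tuple(out)), done
        return t, False
    node = None
    new, done = walk(tree)
    assert done, "no adjunction site"
    return new

def leaves(t):
    label, kids = t
    if isinstance(kids, str):
        return [] if kids in ("@", "*") else [kids]
    return [w for k in kids for w in leaves(k)]

# The small lexicon from the slides:
eats   = ("S", (("NP", "@"), ("VP", (("VBZ", "eats"), ("NP", "@")))))
john   = ("NP", "John")
tapas  = ("NP", "tapas")
always = ("VP", (("RB", "always"), ("VP", "*")))  # auxiliary tree

derived = adjoin(substitute(substitute(eats, "NP", john), "NP", tapas), always)
print(" ".join(leaves(derived)))  # John always eats tapas
```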
Combinatory Categorial Grammar (CCG)
Combinatory Categorial Grammar
• CCG is a lexicalized grammar formalism (the “rules” of the grammar are completely general; all language-specific information is given in the lexicon)
• CCG is mildly context-sensitive (can capture Dutch crossing dependencies, but is still efficiently parseable)
• CCG has a flexible constituent structure
• CCG has a simple, unified treatment of extraction and coordination
• CCG has a transparent syntax-semantics interface (every syntactic category and operation has a semantic counterpart)
• CCG rules are monotonic (movement or traces don’t exist)
• CCG rules are type-driven, not structure-driven (this means e.g. that intransitive verbs and VPs are indistinguishable)
CCG: the machinery

• Categories: specify subcat lists of words/constituents.
• Combinatory rules: specify how constituents can combine.
• The lexicon: specifies which categories a word can have.
• Derivations: spell out the process of combining constituents.
CCG categories
• Simple categories: NP, S, PP
• Complex categories: functions which return a result when combined with an argument:
  – VP or intransitive verb: S\NP
  – Transitive verb: (S\NP)/NP
  – Adverb: (S\NP)\(S\NP)
  – PPs: ((S\NP)\(S\NP))/NP, (NP\NP)/NP
• Every category has a semantic interpretation
Function application
• Combines a function with its argument to yield a result:

  (S\NP)/NP  NP  ->  S\NP      (eats + tapas -> eats tapas)
  NP  S\NP   ->  S             (John + eats tapas -> John eats tapas)

• Used in all variants of categorial grammar
A (C)CG derivation
Type-raising and function composition
• Type-raising: turns an argument into a function. Corresponds to case.
• We will only be concerned with canonical “normal-form” derivations, which use function composition and type-raising only when syntactically necessary.
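The combinatory rules discussed so far can be sketched over a toy category encoding: an atomic category is a string, a complex category is a (result, slash, argument) tuple. The encoding and names are illustrative assumptions of this write-up, not the course's code.

```python
NP, S = "NP", "S"
TRANS = ((S, "\\", NP), "/", NP)   # (S\NP)/NP, a transitive verb like "eats"

def apply_fwd(f, a):
    # Forward application (>): X/Y  Y  =>  X
    return f[0] if isinstance(f, tuple) and f[1] == "/" and f[2] == a else None

def apply_bwd(a, f):
    # Backward application (<): Y  X\Y  =>  X
    return f[0] if isinstance(f, tuple) and f[1] == "\\" and f[2] == a else None

def compose_fwd(f, g):
    # Forward composition (>B): X/Y  Y/Z  =>  X/Z
    if (isinstance(f, tuple) and isinstance(g, tuple)
            and f[1] == "/" and g[1] == "/" and f[2] == g[0]):
        return (f[0], "/", g[2])
    return None

def type_raise(x, t):
    # Forward type-raising (>T): X  =>  T/(T\X)
    return (t, "/", (t, "\\", x))

# Canonical (normal-form) derivation of "John eats tapas":
vp = apply_fwd(TRANS, NP)        # eats + tapas        ->  S\NP
s1 = apply_bwd(NP, vp)           # John + [eats tapas] ->  S

# The same sentence via type-raising and composition (as used for extraction):
subj = type_raise(NP, S)         # John        ->  S/(S\NP)
s_np = compose_fwd(subj, TRANS)  # John + eats ->  S/NP
s2 = apply_fwd(s_np, NP)         # + tapas     ->  S
print(s1, s2)  # S S
```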
CCG: semantics
• Every syntactic category and rule has a semantic counterpart:
The CCG lexicon
• Pairs words with their syntactic categories (and semantic interpretation):

  eats ⊢ (S\NP)/NP : λx.λy.eats′xy
       ⊢ S\NP : λx.eats′x

• The main bottleneck for wide-coverage CCG parsing
Why use CCG for statistical parsing?
• CCG derivations are binary trees: we can use standard chart parsing techniques.
• CCG derivations represent long-range dependencies and complement-adjunct distinctions directly:
A comparison with Penn Treebank parsers
• Standard Treebank parsers do not recover the null elements and function tags that are necessary for semantic interpretation:
Lexical-Functional Grammar (LFG)
Lexical-Functional Grammar LFG
Lexical-Functional Grammar (LFG) (Bresnan & Kaplan 1981, Bresnan 2001, Dalrymple 2001) is a unification- (or constraint-) based theory of grammar.

Two (basic) levels of representation:
• C-structure: represents surface grammatical configurations such as word order; annotated CFG data structures
• F-structure: represents abstract syntactic functions such as SUBJ(ject), OBJ(ect), OBL(ique), PRED(icate), COMP(lement), ADJ(unct), …; AVMs (attribute-value matrices/structures)

F-structure approximates basic predicate-argument structure, dependency representation, logical form (van Genabith and Crouch, 1996; 1997)
LFG Grammar Rules and Lexical Entries
LFG Parse Tree (with Equations/Constraints)
LFG Constraint Resolution (1/3)
LFG Constraint Resolution (2/3)
LFG Constraint Resolution (3/3)
LFG Subcategorisation & Long Distance Dependencies
• Subcategorisation:
  – Semantic forms (subcat frames): sign<SUBJ, OBJ>
  – Completeness: all GFs in the semantic form are present at the local f-structure
  – Coherence: only the GFs in the semantic form are present at the local f-structure
• Long Distance Dependencies (LDDs): resolved at f-structure with functional uncertainty equations (regular expressions specifying paths in the f-structure).
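The completeness and coherence conditions above amount to set comparisons between the local governable grammatical functions and the semantic form. Below is a minimal sketch over a dict-encoded f-structure; the encoding, the GOVERNABLE set and the example frames are illustrative assumptions, not the course's code.

```python
GOVERNABLE = {"subj", "obj", "obj2", "obl", "comp", "xcomp"}

def check_subcat(fstr, frame):
    """`frame` is the set of GFs in the local PRED's semantic form."""
    local_gfs = {gf for gf in fstr if gf in GOVERNABLE}
    complete = frame <= local_gfs   # every subcategorised GF present locally
    coherent = local_gfs <= frame   # no governable GF outside the semantic form
    return complete, coherent

# sign<SUBJ, OBJ>: adjuncts are not governable, so they never hurt coherence.
f = {"pred": "sign", "subj": {"pred": "pro"}, "obj": {"pred": "treaty"},
     "adjunct": [{"pred": "today"}]}
print(check_subcat(f, {"subj", "obj"}))  # (True, True)
# Missing OBJ -> incomplete (but still coherent):
print(check_subcat({"pred": "sign", "subj": {}}, {"subj", "obj"}))  # (False, True)
```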
LFG LDDs: Complement Relative Clause
Head-Driven Phrase Structure Grammar (HPSG)
Head-Driven Phrase Structure Grammar HPSG
• HPSG (Pollard and Sag 1994, Sag et al. 2003) is a unification-/constraint-based theory of grammar
• HPSG is a lexicalized grammar formalism
• HPSG aims to explain generic regularities that underlie phrase structures, lexicons, and semantics, as well as language-specific/-independent constraints
• Syntactic/semantic constraints are uniformly denoted by signs, which are represented with feature structures
• Two components of HPSG:
  – Lexical entries represent word-specific constraints (corresponding to elementary objects)
  – Principles express generic grammatical regularities (corresponding to grammatical operations)
Sign

• A sign is a formal representation of combinations of phonological forms and syntactic and semantic constraints

The Penn Treebank

• Contains text from different domains:
  – Wall Street Journal (50,000 sentences, 1 million words)
  – Switchboard
  – Brown corpus
  – ATIS
• The annotation:
  – POS-tagged (Ratnaparkhi’s MXPOST)
  – Manually annotated with phrase-structure trees
  – Traces and other null elements used to represent non-local dependencies (movement, PRO, etc.)
  – Designed to facilitate extraction of predicate-argument structure
A Treebank tree
• Relatively flat structures:
  – There is no noun level
  – VP arguments and adjuncts appear at the same level
• Co-indexed null elements indicate long-range dependencies
• Function tags indicate the complement-adjunct distinction (?)
Penn-II Treebank
• Until Congress acts , the government hasn't any authority to issue new debt obligations of any kind , the Treasury said .

Evaluating Treebank parsers

• Standard evaluation metric for Treebank parsers; two components:
  – Precision: how many of the proposed NTs are correct?
  – Recall: how many of the correct NTs are proposed?
• Measures recovery of nonterminals (span + syntactic category)
• Ignores function tags and null elements
• Has biased research towards parsers that produce linguistically shallow output (Collins, Charniak)
Treebank-Based Acquisition of TAG Resources
Extracting a TAG from the Treebank
• Two different approaches:
  – F. Xia. Automatic Grammar Generation From Two Different Perspectives. PhD thesis, University of Pennsylvania, 2001.
  – J. Chen, S. Bangalore, K. Vijay-Shanker. Automated Extraction of Tree-Adjoining Grammars from Treebanks, Natural Language Engineering (forthcoming)
• This lecture: just the basic ideas!
Extracting a TAG from the Penn Treebank
• Input: a Treebank tree (= the TAG derived tree)
• Output: a set of elementary trees (= the TAG lexicon)
Extracting a TAG: the head
• Identify the head path (requires a head percolation table)
• Find the arguments of the head (requires an argument table)
• Ignore modifiers (requires an adjunct table)
• Merge unary productions (VP -> VP)

[Figure: a tree (S (NP-SBJ …) (VP (VBG making) (NP …))) with the head path S–VP–VBG highlighted]
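Head-path identification can be sketched with a miniature head percolation table in the spirit of Magerman-style rules. The table entries below are a tiny illustrative subset, and real implementations strip function tags such as -SBJ before matching; none of this is the course's actual code.

```python
HEAD_TABLE = {
    # parent label: (search direction, child labels in priority order)
    "S":  ("right", ["VP", "S"]),
    "VP": ("left",  ["VBZ", "VBD", "VBG", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
}

def head_child(label, children):
    """Pick the head child among (label, ...) tuples under a `label` node."""
    direction, priorities = HEAD_TABLE.get(label, ("left", []))
    kids = list(children) if direction == "left" else list(reversed(children))
    for want in priorities:
        for kid in kids:
            if kid[0] == want:
                return kid
    return kids[0]  # fallback: first child in the search direction

# (S (NP-SBJ payrolls) (VP (VBG making) (NP things)))
vp = ("VP", ("VBG", "making"), ("NP", "things"))
s_kids = (("NP-SBJ", "payrolls"), vp)
print(head_child("S", s_kids)[0])   # VP
print(head_child("VP", vp[1:])[1])  # making
```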
Extracting a TAG: the head
• This is the elementary tree for the head:
Extracting a TAG: arguments
• Arguments are combined via substitution
• Recurse on the arguments:
Extracting a TAG: adjuncts
• Adjuncts require auxiliary trees (combined with the head by adjunction)
• Auxiliary trees require a foot node (with the same label as the root)

[Figure: an auxiliary tree with foot node VP* extracted for the ADVP-MNR adjunct “officially”]
Special cases
• Coordination
• Null elements (e.g. traces for wh-movement): the trace has to be part of the elementary tree of the main verb
• Punctuation marks
Wh-movement: relative clauses
(NP (NP a charge)
    (SBAR (WHNP-2 (-NONE- 0))
          (S (NP-SBJ Mr. Coleman)
             (VP (VBZ denies)
                 (NP (-NONE- *T*-2))))))

[Figure: the corresponding tree, with the trace *T*-2 in object position co-indexed with WHNP-2]
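Bracketed examples like the one above can be processed mechanically. Below is a minimal sketch of reading a Penn bracketing and locating co-indexed traces; the helper names are this write-up's, not tooling from the course.

```python
import re

def parse_ptb(s):
    """Parse a Penn Treebank bracketing into nested (label, kid, ...) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    def read(i):
        assert tokens[i] == "("
        label, i = tokens[i + 1], i + 2
        kids = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                kid, i = read(i)
            else:
                kid, i = tokens[i], i + 1
            kids.append(kid)
        return (label, *kids), i + 1
    tree, _ = read(0)
    return tree

def traces(tree):
    """Find co-indexed null elements like *T*-2 and their parent labels."""
    found = []
    def walk(t):
        if isinstance(t, tuple):
            label, *kids = t
            for k in kids:
                if isinstance(k, tuple) and k[0] == "-NONE-" and k[1].startswith("*T*"):
                    found.append((k[1].rsplit("-", 1)[-1], label))
                walk(k)
    walk(tree)
    return found

tree = parse_ptb(
    "(NP (NP a charge) (SBAR (WHNP-2 (-NONE- 0)) "
    "(S (NP-SBJ Mr. Coleman) (VP (VBZ denies) (NP (-NONE- *T*-2))))))"
)
print(traces(tree))  # [('2', 'NP')]: the trace in object position, index 2
```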
Evaluating an extracted grammar/lexicon
• Grammar/lexicon size?
  – Depends on the head table, argument/adjunct distinction, treatment of null elements, mapping of Treebank labels/POS tags to categories in the extracted grammar, etc.
  – For TAGs: between 3,000-8,500 elementary tree types, and 100,000-130,000 lexical entries
• Lexical coverage?
  – For TAGs: around 92-93%
• Distribution of tree types?
• Convergence?
• Quality?
  – Inspection, comparison with a manual grammar
References: TAG extraction

TAG:
A.K. Joshi and Y. Schabes (1996) Tree Adjoining Grammars. In G. Rozenberg and A. Salomaa, Eds., Handbook of Formal Languages

TAG extraction:
F. Xia. Automatic Grammar Generation From Two Different Perspectives. PhD thesis, University of Pennsylvania, 2001.
J. Chen, S. Bangalore, K. Vijay-Shanker. Automated Extraction of Tree-Adjoining Grammars from Treebanks, Natural Language Engineering (forthcoming)
Also: L. Shen and A.K. Joshi, Building an LTAG Treebank, Technical Report MS-CIS-05-15, CIS Department, University of Pennsylvania, 2005

Parsing with extracted TAGs:
D. Chiang. Statistical parsing with an automatically extracted tree adjoining grammar. In Data Oriented Parsing, CSLI Publications, pages 299-316.
L. Shen and A.K. Joshi. Incremental LTAG parsing, HLT/EMNLP 2005
Penn-II-Based Acquisition of LFG Resources
Penn-II-Based Acquisition of LFG Resources
• Introduction
• Treebank Preprocessing/Clean-Up
• Treebank Annotation/Conversion
• Grammar and Lexicon Extraction
• Parsing (Architectures, Probability Models, Evaluation)
• Generation (Architectures, Probability Models, Evaluation)
• Other (Semantics, Domain Variation, …)
Introduction: Penn-II & LFG
• If we had an f-structure-annotated version of Penn-II, we could use (standard) machine learning methods to extract probabilistic, wide-coverage LFG resources
• Penn-II is a 2nd-generation treebank: it contains lots of annotation to support the derivation of deep meaning representations (trees, Penn-II “functional” tags, traces & coindexation); the f-structure annotation algorithm can exploit these
Introduction: Penn-II & LFG
• What is the task?
• Given a Penn-II tree, the f-structure annotation algorithm has to traverse the tree and associate all tree nodes with f-structure equations (including lexical equations at the leaves of the tree).
• A simple example
Introduction: Penn-II & LFG

Factory payrolls fell in September.

(S  (NP-SBJ[↑SUBJ=↓]  (NN[↓∈↑ADJUNCT] Factory)  (NNS[↑=↓] payrolls))
    (VP[↑=↓]  (VBD[↑=↓] fell)
              (PP-TMP[↓∈↑ADJUNCT]  (IN[↑=↓] in)
                                   (NP[↑OBJ=↓]  (NNP[↑=↓] September)))))
Introduction: Penn-II & LFG
subj:    pred: payroll
         num: pl
         pers: 3
         adjunct: { 2: [ pred: factory, num: sg, pers: 3 ] }
adjunct: { 1: [ pred: in
                obj: [ pred: september, num: sg, pers: 3 ] ] }
pred:    fall
tense:   past
Treebank Preprocessing/Clean-Up: Penn-II & LFG
• Penn-II treebank: often flat analyses (coordination, NPs, …), a certain amount of noise: inconsistent annotations, errors, …
• No treebank preprocessing or clean-up in the LFG approach
• Take the Penn-II treebank as is, but:
  – Remove all trees with FRAG- or X-labelled constituents
  – FRAG = fragments; X = not known how to annotate
• This leaves a total of 48,424 trees, used as they are.
Treebank Annotation: Penn-II & LFG
• Annotation-based (rather than conversion-based)
• Automatic annotation of nodes in Penn-II treebank trees with f-structure equations
• F-structure Annotation Algorithm
• The Annotation Algorithm exploits:
  – Head information
  – Categorial information
  – Configurational information
  – Penn-II functional tags
  – Trace information
Treebank Annotation: Penn-II & LFG
• Architecture of the modular algorithm assigning LFG f-structure equations to trees in the Penn-II treebank:

  Head-Lexicalisation [Magerman, 1994]
    → Left-Right Context Annotation Principles
    → Coordination Annotation Principles
    → Catch-All and Clean-Up   (yields proto-f-structures)
    → Traces                   (yields proper f-structures)
Treebank Annotation: Penn-II & LFG
• Head Lexicalisation: modified rules based on (Magerman, 1994)
Treebank Annotation: Penn-II & LFG
Left-Right Context Annotation Principles:
• Head of an NP is likely to be the rightmost noun, …
• Mother → Left Context, Head, Right Context

Evaluation:
• F-structure quality is evaluated against the DCU 105, a manually annotated dependency gold standard of 105 sentences randomly extracted from WSJ section 23
• Triples are extracted from the gold standard and from the automatically produced f-structures using the evaluation software from (Crouch et al. 2002) and (Riezler et al. 2002):

  relation(predicate~0, argument~1)

• Results are calculated in terms of Precision and Recall
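Triple-based precision and recall reduce to set comparisons. A sketch with invented triples follows; the triples and scores below are illustrative, not the published DCU 105 results.

```python
def prf(gold, test):
    """Precision, recall and f-score of `test` triples against `gold`."""
    gold, test = set(gold), set(test)
    correct = len(gold & test)
    p = correct / len(test) if test else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = {("subj", "fall", "payroll"),
        ("adjunct", "fall", "in"),
        ("obj", "in", "september")}
test = {("subj", "fall", "payroll"),
        ("obj", "in", "september"),
        ("obj", "fall", "payroll")}   # one wrong triple, one missed

p, r, f = prf(gold, test)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.67 0.67
```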
Treebank Annotation: Penn-II & LFG
• Precision and Recall for the DCU 105 Dependency Bank are calculated for All Annotations and for Preds-Only
• Following (Kaplan et al. 2004), Precision and Recall for the PARC 700 Dependency Bank are calculated for: all annotations, PARC features, preds-only
• A mapping is required (Burke 2006)

PARC 700, PARC features:
  Precision: 88.31%
  Recall:    86.38%
Grammar and Lexicon Extraction: Penn-II & LFG

Lexical Resources:
• Lexical information is extremely important in modern lexicalised grammar formalisms: LFG, HPSG, CCG, TAG, …
• Lexicon development is time-consuming and extremely expensive
• Rarely if ever complete
• The familiar knowledge acquisition bottleneck …
• Subcategorisation frame induction (LFG semantic forms) from the f-structure-annotated version of Penn-II and Penn-III
• Evaluation against COMLEX
Grammar and Lexicon Extraction: Penn-II & LFG
• Lexicon Construction: Manual vs. Automated
• Our Approach:
  – F-Structure Annotation of Penn-II and Penn-III
  – Frames not predefined
  – Functional and categorial information
  – Parameterised for prepositions and particles
  – Active and passive
  – Long distance dependencies
  – Conditional probabilities

For each level of embedding in F:
  Determine the local predicate PRED
  Collect all subcategorisable grammatical functions GF1, …, GFn
  Return: PRED<GF1, GF2, …, GFn>
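The extraction loop above can be sketched as a recursion over a dict-encoded f-structure. The encoding and the governable-GF list are illustrative assumptions, and in this sketch only predicates with at least one governable GF yield a frame.

```python
GOVERNABLE = ("subj", "obj", "obj2", "obl", "comp", "xcomp")

def extract_frames(f, frames=None):
    """Collect PRED<GF1,...,GFn> at every level of embedding in f."""
    if frames is None:
        frames = []
    if isinstance(f, dict):
        gfs = [gf for gf in GOVERNABLE if gf in f]
        if "pred" in f and gfs:
            frames.append(f"{f['pred']}<{','.join(gfs)}>")
        for value in f.values():
            extract_frames(value, frames)
    elif isinstance(f, list):
        for item in f:
            extract_frames(item, frames)
    return frames

# Simplified f-structure for "..., the Treasury said" with a COMP clause:
f = {"pred": "say", "tense": "past",
     "subj": {"pred": "treasury"},
     "comp": {"pred": "have",
              "subj": {"pred": "government"},
              "obj": {"pred": "authority"}}}
print(extract_frames(f))  # ['say<subj,comp>', 'have<subj,obj>']
```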
Grammar and Lexicon Extraction: Penn-II & LFG
subj:    spec: det: pred: the
         pred: inquiry
         num: sg
         pers: 3
adjunct: { 1: [ pred: soon ] }
pred:    focus
tense:   past
obl:     pform: on
         obj: spec: det: pred: the
              pred: judge
              num: sg
              pers: 3
“The inquiry soon focused on the judge” (wsj_0267_72)
Prepositions and OBLs:
focus([subj,obl:on])
on([obj])
Grammar and Lexicon Extraction: Penn-II & LFG
topic: index: [1]
       subj: spec: det: pred: the
             num: sing
             pred: government
             pers: 3
       …
       pred: have
       tense: pres
subj:  spec: det: pred: the
       pers: 3
       pred: treasury
       num: sing
comp:  index: [1]
       subj: spec: det: pred: the
             num: sing
             pred: government
             pers: 3
       …
       pred: have
       tense: pres
pred:  say
tense: past
LDDs:
say([subj,comp])
“Until Congress acts , the government hasn't any authority to issue new debt obligations of any kind, the Treasury said.” (wsj_0008_2)
Grammar and Lexicon Extraction: Penn-II & LFG
subj:    pred: pro
         pron_form: it
passive: +
to_inf:  +
pred:    be
xcomp:   subj: pred: pro
               pron_form: it
         passive: +
         pred: consider
         tense: past
         obl: pform: as
              obj: spec: det: pred: a
                   …
                   pred: risk
                   num: sg
                   pers: 3
Passive:
consider([subj,obl:as],p)
“… to be considered as an additional risk for the investor…”(wsj_0018_14)
Grammar and Lexicon Extraction: Penn-II & LFG
subj:    spec: det: pred: the, cat: dt
         pred: inquiry
         num: sg
         pers: 3
         cat: nn
adjunct: { 1: [ pred: soon, cat: rb ] }
pred:    focus
tense:   past
cat:     vbd
obl:     pform: on
         obj: spec: det: pred: the, cat: dt
              pred: judge
              num: sg
              pers: 3
              cat: nn

CFG categories:
focus(v,[subj,obl:on])
focus(v,[subj(n),obl:on])

“The inquiry soon focused on the judge.” (wsj_0267_72)

Lexicon extracted from Penn-II (O’Donovan et al. 2005):

                 Without Prep/Part   With Prep/Part
Lemmas           3,586               3,586
Semantic Forms   10,969              14,348
Frame Types      38                  577
Grammar and Lexicon Extraction: Penn-II & LFG
• Evaluation for all active verbs (2,992) extracted from Penn-II against COMLEX
• Largest evaluation for an English subcat frame extraction system:
  – Carroll and Rooth (1998): 200 verbs
  – Schulte im Walde (2000): over 3,000 German verbs
• Directional prepositions (about, across, along, around, behind, below, beneath, between, beyond, by, down, from, …) are included in COMLEX by “default” for verbs that have at least one p-dir …

• Systematic differences between our f-structures and the PARC 700 and CBS 500 dependency representations
• Automatic conversion of our f-structures to PARC 700 / CBS 500-like structures (Burke et al. 2004, Burke 2006, Cahill et al. under review)
• Best XLE and RASP resources, with better results than those reported in the literature to date
• (Crouch et al. 2002) and (Carroll and Briscoe 2002) evaluation software
• (Noreen 1989) Approximate Randomisation Test to test for statistical significance of results
Parsing: Penn-II and LFG
• Result dependency f-scores:
  – PARC 700, XLE vs. BKR-LFG: 80.55% XLE; 83.08% BKR-LFG (+2.53%)
  – CBS 500, RASP vs. BKR-LFG: 76.57% RASP; 80.23% BKR-LFG (+3.66%)
• Results statistically significant at the 95% level ((Noreen 1989) Approximate Randomisation Test)
• BKR-LFG = treebank-induced Lexical-Functional Grammar resources with the retrained Bikel parser (BKR) as the c-structure engine in a pipeline architecture
Parsing: Penn-II and LFG
PARC 700 Evaluation:
Probability Models: Penn-II & LFG
Probability Models:
• Our approach does not constitute a proper probability model (Abney, 1996)
• Why? The probability model leaks:
  – The highest-ranking parse tree may feature f-structure equations that cannot be resolved into an f-structure
  – The probability associated with that parse tree is lost
• This doesn’t happen often in practice (coverage >99.5% on unseen data)
• Research on appropriate discriminative, log-linear or maximum entropy models is important (Miyao and Tsujii, 2002; Riezler et al. 2002)
Generation: Penn-II & LFG
Cahill and van Genabith, 2006
Generation: the Good, the Bad and the Ugly
• Orig: Supporters of the legislation view the bill as an effort to add stability and certainty to the airline-acquisition process , and to preserve the safety and fitness of the industry .
• Gen: Supporters of the legislation view the bill as an effort to add stability and certainty to the airline-acquisition process , and to preserve the safety and fitness of the industry.
• Orig: The upshot of the downshoot is that the A 's go into San Francisco 's Candlestick Park tonight up two games to none in the best-of-seven fest .
• Gen: The upshot of the downshoot is that the A 's tonight go into San Francisco 's Candlestick Park up two games to none in the best-of-seven fest .
• Orig: By this time , it was 4:30 a.m. in New York , and Mr. Smith fielded a call from a New York customer wanting an opinion on the British stock market , which had been having troubles of its own even before Friday 's New York market break .
• Gen: Mr. Smith fielded a call from New a customer York wanting an opinion on the market British stock which had been having troubles of its own even before Friday 's New York market break by this time and in New York , it was 4:30 a.m. .
• Orig: Only half the usual lunchtime crowd gathered at the tony Corney & Barrow wine bar on Old Broad Street nearby .
• Gen: At wine tony Corney & Barrow the bar on Old Broad Street nearby gathered usual , lunchtime only half the crowd , .
Domain Variation, Multilingual LFG Resources, etc.
• Domain variation: ATIS (Judge et al 2005) and QuestionBank (Judge et al 2006)
• F-Str -> (Q)LF Quasi-Logical Forms (Cahill et al. 2003)
• Multilingual treebank-based LFG acquisition:
– German: TIGER treebank (Cahill et al 2003), (Cahill et al 2005)
– Chinese: Chinese Penn Treebank (Burke et al 2004)
– Spanish: Cast3LB (O’Donovan et al 2005), (Chrupala and van Genabith 2006)
• GramLab Project at DCU (2005-2008): Chinese, Japanese, Arabic, Spanish, French and German
A. Cahill and J. van Genabith, Robust PCFG-Based Generation using Automatically Acquired LFG-Approximations, COLING/ACL 2006, Sydney, Australia
J. Judge, A. Cahill and J. van Genabith, QuestionBank: Creating a Corpus of Parse-Annotated Questions, COLING/ACL 2006, Sydney, Australia
G. Chrupala and J. van Genabith, Using Machine-Learning to Assign Function Labels to Parser Output for Spanish, COLING/ACL 2006, Sydney, Australia
M. Burke, Automatic Treebank Annotation for the Acquisition of LFG Resources, Ph.D. Thesis, School of Computing, Dublin City University, Dublin 9, Ireland. 2005
R. O’Donovan, Automatic Extraction of Large-Scale Multilingual Lexical Resources, Ph.D. Thesis, School of Computing, Dublin City University, Dublin 9, Ireland. 2005
R. O'Donovan, M. Burke, A. Cahill, J. van Genabith and A. Way. Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks, Computational Linguistics, 2005
A. Cahill, M. Forst, M. Burke, M. McCarthy, R. O'Donovan, C. Rohrer, J. van Genabith and A. Way. Treebank-Based Acquisition of Multilingual Unification Grammar Resources; Journal of Research on Language and Computation; Special Issue on "Shared Representations in Multilingual Grammar Engineering", (eds.) E. Bender, D. Flickinger, F. Fouvry and M. Siegel, Kluwer Academic Press, 2005
Publications
R. O'Donovan, A. Cahill, J. van Genabith, and A. Way. Automatic Acquisition of Spanish LFG Resources from the CAST3LB Treebank; In Proceedings of the Tenth International Conference on LFG, Bergen, Norway, 2005
J. Judge, M. Burke, A. Cahill, R. O'Donovan, J. van Genabith, and A. Way. Strong Domain Variation and Treebank-Induced LFG Resources; In Proceedings of the Tenth International Conference on LFG, Bergen, Norway,2005
M. Burke, A. Cahill, J. van Genabith, and A. Way. Evaluating Automatically Acquired F-Structures against PropBank; In Proceedings of the Tenth International Conference on LFG, Bergen, Norway, 2005
M. Burke, A. Cahill, M. McCarthy, R.O'Donovan, J. van Genabith and A. Way. Evaluating Automatic F-Structure Annotation for the Penn-II Treebank; Journal of Language and Computation; Special Issue on "Treebanks and Linguistic Theories", (eds.) E. Hinrichs and K.Simov, Kluwer Academic Press. 2005. pages 523-547
A. Cahill. Parsing with Automatically Acquired, Wide-Coverage, Robust, Probabilistic LFG Approximations. Ph.D. Thesis. School of Computing, Dublin City University, Dublin 9, Ireland. 2004
M. Burke, O. Lam, A. Cahill, R. Chan, R. O'Donovan, A. Bodomo, J. van Genabith and A. Way; Treebank-Based Acquisition of a Chinese Lexical-Functional Grammar; Proceedings of the PACLIC-18 Conference, Waseda University, Tokyo, Japan, pages 161-172, 2004
Publications
M. Burke, A. Cahill, R. O'Donovan, J. van Genabith, and A. Way. The Evaluation of an Automatic Annotation Algorithm against the PARC 700 Dependency Bank, In Proceedings of the Ninth International Conference on LFG, Christchurch, New Zealand, pages 101-121, 2004
A. Cahill, M. Burke, R. O'Donovan, J. van Genabith, and A. Way. Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations, In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), July 21-26 2004, pages 320-327, Barcelona, Spain, 2004
R. O'Donovan, M. Burke, A. Cahill, J. van Genabith, and A. Way. Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II Treebank, In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), July 21-26 2004, pages 368-375, Barcelona, Spain, 2004
M. Burke, Cahill A., R. O' Donovan, J. van Genabith and A. Way. Treebank-Based Acquisition of Wide-Coverage, Probabilistic LFG Resources: Project Overview, Results and Evaluation, The First International Joint Conference on Natural Language Processing (IJCNLP-04), Workshop "Beyond shallow analyses - Formalisms and statistical modeling for deep analyses"; March 22-24, 2004 Sanya City, Hainan Island, China, 2004
Cahill A., M. Forst, M. McCarthy, R. O' Donovan, C. Rohrer, J. van Genabith and A. Way. Treebank-Based Multilingual Unification-Grammar Development. In the Proceedings of the Workshop on Ideas and Strategies for Multilingual Grammar Development, at the 15th European Summer School in Logic Language and Information, Vienna, Austria, 18th - 29th August 2003
Publications
Cahill A, M. McCarthy, J. van Genabith and A. Way. Quasi-Logical Forms for the Penn Treebank; In (eds.) Harry Bunt, Ielka van der Sluis and Roser Morante; Proceedings of the Fifth International Workshop on Computational Semantics, IWCS-05, January 15-17, 2003, Tilburg, The Netherlands, ISBN: 90-74029-24-8, pp.55-71, 2003
Cahill A, M. McCarthy, J. van Genabith and A. Way. Evaluating Automatic F-Structure Annotation for the Penn-II Treebank. TLT 2002, Treebanks and Linguistic Theories 2002, 20th and 21st September 2002, Sozopol, Bulgaria, (eds.) E. Hinrichs and K. Simov, Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT 2002), pp. 42-60, 2002
Cahill A, M. McCarthy, J. van Genabith and A. Way. Parsing with PCFGs and Automatic F-Structure Annotation, In M. Butt and T. Holloway-King (eds.): Proceedings of the Seventh International Conference on LFG CSLI Publications, Stanford, CA., pp.76--95. 2002
Cahill A, and J. van Genabith. TTS - A Treebank Tool; in LREC 2002, The Third International Conference on Language Resources and Evaluation, Las Palmas de Grand Canaria, Spain, May 27th--June 2nd, 2002, Proceedings of the Conference, Volume V, (eds.) M.G.Rodriguez and C.P. Suarez Arnajo, ISBN 2-9517408-0-8, pp. 1712-1717, 2002
Cahill A, M. McCarthy, J. van Genabith and A. Way. Automatic Annotation of the Penn-Treebank with LFG F-Structure Information; LREC 2002 workshop on Linguistic Knowledge Acquisition and Representation - Bootstrapping Annotated Language Data, LREC 2002, Third International Conference on Language Resources and Evaluation, post-conference workshop, June 1st, 2002, proceedings of the workshop, (eds.) A. Lenci, S. Montemagni and V. Pirelli, ELRA - European Language Resources Association, Paris France, pp. 8-15, 2002
Penn-II-Based Acquisition of CCG Resources
Combinatory Categorial Grammar
This lecture
• Recap: CCG
• Translating the Penn Treebank to CCG
  – The translation algorithm
  – CCGbank: the acquired grammar and lexicon
• Wide-coverage parsing with CCG
CCG: the machinery
• Categories: specify subcat lists of words/constituents.
• Combinatory rules: specify how constituents can combine.
• The lexicon: specifies which categories a word can have.
• Derivations: spell out the process of combining constituents.
CCG categories
• Simple categories: NP, S, PP
• Complex categories: functions which return a result when combined with an argument:
  VP or intransitive verb: S\NP
  Transitive verb: (S\NP)/NP
  Adverb: (S\NP)\(S\NP)
  PPs: ((S\NP)\(S\NP))/NP (modifying a VP), (NP\NP)/NP (modifying an NP)
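One way to make the category notation concrete is a tiny string parser (an assumed representation for illustration, not CCGbank's internal format):

```python
# A sketch of CCG categories as plain strings, with a small parser that
# splits a category at its outermost slash.

def split_category(cat):
    """Return (result, slash, argument) for a complex category, None for atoms."""
    depth = 0
    for i in range(len(cat) - 1, -1, -1):   # rightmost slash at bracket depth 0
        c = cat[i]
        if c == ")":
            depth += 1
        elif c == "(":
            depth -= 1
        elif c in "/\\" and depth == 0:
            return strip_parens(cat[:i]), c, strip_parens(cat[i + 1:])
    return None

def strip_parens(cat):
    """Drop one pair of outer brackets if they enclose the whole category."""
    if not (cat.startswith("(") and cat.endswith(")")):
        return cat
    depth = 0
    for i, c in enumerate(cat):
        depth += c == "("
        depth -= c == ")"
        if depth == 0:
            return cat[1:-1] if i == len(cat) - 1 else cat
    return cat

def arity(cat):
    """How many arguments a category consumes before reaching an atom."""
    n, parts = 0, split_category(cat)
    while parts:
        n, parts = n + 1, split_category(parts[0])
    return n

print(split_category(r"(S\NP)/NP"))   # transitive verb: result S\NP, slash /, argument NP
print(arity(r"((S\NP)\(S\NP))/NP"))   # 3
```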
The combinatory rules
• Function application: λx.f(x)  a  ⇒  f(a)
  Forward (>):  X/Y  Y  ⇒  X
  Backward (<): Y  X\Y  ⇒  X
• Canonical “normal-form” derivations (mostly function application)
• Alternative derivations
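The two application rules can be sketched over category strings (verbatim matching, no features — a toy, not the actual parser):

```python
# Forward (>) and backward (<) application over category strings.

def unparen(cat):
    """Drop one pair of outer brackets (naive: fine for these toy inputs)."""
    return cat[1:-1] if cat.startswith("(") and cat.endswith(")") else cat

def forward(left, right):
    """X/Y  Y  =>  X   (>)"""
    return unparen(left[: -(len(right) + 1)]) if left.endswith("/" + right) else None

def backward(left, right):
    """Y  X\\Y  =>  X   (<)"""
    return unparen(right[: -(len(left) + 1)]) if right.endswith("\\" + left) else None

# Deriving "he saw a girl" bottom-up:
np = forward("NP/N", "N")                  # a + girl  -> NP
vp = forward(r"(S\NP)/NP", np)             # saw + NP  -> S\NP
print(backward("NP", vp))                  # he + S\NP -> prints S
```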
Type-raising and Composition
• Wh-movement:
• Right-node raising:
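The rules behind these constructions — type-raising X ⇒ T/(T\X) and forward composition X/Y Y/Z ⇒ X/Z — can be sketched the same way (toy string matching; hypothetical helpers, not CCGbank code):

```python
# Toy sketches of type-raising and forward composition.

def type_raise_forward(x, t):
    """X  =>  T/(T\\X), e.g. a subject NP becomes S/(S\\NP)."""
    return "%s/(%s\\%s)" % (t, t, x)

def compose_forward(left, right):
    """X/Y  Y/Z  =>  X/Z   (forward composition, >B)."""
    if "/" not in left or "/" not in right:
        return None
    x, y1 = left.rsplit("/", 1)    # naive splits: assume the main slash is
    y2, z = right.split("/", 1)    # the one found here (true for these toys)
    return x + "/" + z if y1 == y2 else None

# A wh-dependency building block: a type-raised subject composed with a
# transitive verb yields S/NP, a sentence still missing its object.
tr = type_raise_forward("NP", "S")              # S/(S\NP)
print(compose_forward(tr, r"(S\NP)/NP"))        # prints S/NP
```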
CCG: semantics
• Every syntactic category and rule has a semantic counterpart:
From the Penn Treebank to CCG
• The basic translation algorithm
• Dealing with null elements
• Type-changing rules in the grammar
• Preprocessing
• CCGbank: the extracted lexicon/grammar
Input: Penn Treebank tree
• Flat phrase-structure tree
• Traces/null elements and indices
Lexicon coverage
• How well does our lexicon cover unseen data?
  “Training” data: sections 02-21
  Test data: section 00
• The lexicon contains the correct entries for 94.0% of the tokens in section 00.
• 3.8% of the tokens in section 00 do not appear in sections 02-21.
  35% of the unknown tokens are N
  29% of the unknown tokens are N/N
Statistical Parsing with CCG
• The data: CCGbank
• The algorithms: standard CKY chart parsing (and a supertagger)
• The models:
  – Generative: Hockenmaier and Steedman (2002)
  – Conditional: Clark and Curran (2004)
Parsing algorithms for CCG
• CCG derivations are binary trees.
• Standard chart parsing algorithms (e.g. CKY) can be used.
• Complexity: O(n^6) (or O(n^3) if the category set is fixed)
• Recovery of “deep” dependencies requires feature structures.
• Supertagging: assign the most likely categories to words before parsing. Significantly speeds up parsing!
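A minimal CKY recognizer over such binary derivations might look like this (function application only — composition and type-raising omitted; the toy lexicon and verbatim category matching are assumptions of the sketch):

```python
# Minimal CKY sketch for binary CCG derivations, application rules only.

def parse(words, lexicon):
    n = len(words)
    # chart[i][j]: set of categories spanning words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexicon[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for l in chart[i][k]:
                    for r in chart[k][j]:
                        for res in (apply_fwd(l, r), apply_bwd(l, r)):
                            if res:
                                chart[i][j].add(res)
    return chart[0][n]

def apply_fwd(x, y):
    """X/Y  Y  =>  X   (>)"""
    return strip(x[: -(len(y) + 1)]) if x.endswith("/" + y) else None

def apply_bwd(y, x):
    """Y  X\\Y  =>  X   (<)"""
    return strip(x[: -(len(y) + 1)]) if x.endswith("\\" + y) else None

def strip(c):
    return c[1:-1] if c.startswith("(") and c.endswith(")") else c

lexicon = {"he": ["NP"], "saw": [r"(S\NP)/NP"], "a": ["NP/N"], "girl": ["N"]}
print(parse("he saw a girl".split(), lexicon))   # {'S'}: the sentence derives S
```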
Parsing models
• Generative models: P(s, d)
  Model the process which generates the derivation d for sentence s
  – Advantage: easy to guarantee consistency
  – Disadvantage: requires good smoothing techniques; difficult to include complex features
  Good baseline
• Conditional models: P(d | s)
  Given a sentence s, predict the most likely derivation d
  – Advantage: more natural for parsing
  – Disadvantage: large model size, difficult to estimate
Evaluation: recovery of dependency structures

              Labelled   Unlabelled
Generative      83.3       90.3      (Hockenmaier and Steedman, 2002)
Conditional     84.6       91.2      (Clark and Curran, 2004)

This includes long-range dependencies.
ccg2sem: from CCG to DRT
• A Prolog package which translates CCGbank derivations into Discourse Representation Theory structures (Bos, 2005)
CCGbanks for other languages
• German (Hockenmaier, 2006):
  – Translation of the German TIGER corpus into CCG.
  – Many crossing dependencies, etc.: context-free approximations are inappropriate
  – Current coverage: 92.4% of all graphs (excluding headlines, fragments etc.)
• Turkish (Cakici, 2005):
  – Extracts a CCG lexicon from the METU Sabanci Treebank.
A few references

General CCG references:
M. Steedman (2000). The Syntactic Process, MIT Press.
M. Steedman (1996). Surface Structure and Interpretation, MIT Press.

CCGbank(s) and wide-coverage CCG parsing:
J. Hockenmaier and M. Steedman (2005). CCGbank: User’s Manual, MS-CIS-05-09, Dept. of Computer and Information Science, University of Pennsylvania.
J. Hockenmaier and M. Steedman (2002). Acquiring Compact Lexicalized Grammars from a Cleaner Treebank, LREC, Las Palmas, Spain.
J. Hockenmaier (2003). Data and Models for Statistical Parsing with Combinatory Categorial Grammar. PhD thesis, Informatics, University of Edinburgh.
J. Hockenmaier and M. Steedman (2002). Generative Models for Statistical Parsing with Combinatory Categorial Grammar, ACL ’02, Philadelphia, PA, USA.
S. Clark and J. R. Curran (2004). Parsing the WSJ using CCG and Log-Linear Models, ACL ’04, Barcelona, Spain.
S. Clark and J. R. Curran (2004). The Importance of Supertagging for Wide-Coverage CCG Parsing, Coling ’04, Geneva, Switzerland.
J. Bos (2005). Towards Wide-Coverage Semantic Interpretation, IWCS-6.
R. Cakici (2005). Automatic Induction of a CCG Grammar for Turkish, ACL Student Research Workshop, Ann Arbor, MI, USA.
J. Hockenmaier (2006). Creating a CCGbank and a wide-coverage CCG lexicon for German, ACL/COLING ’06, Sydney, Australia.
More references
• The CCG website: http://groups.inf.ed.ac.uk/ccg
  with lots of general references about CCG (as well as CCGbank, CCG parsing, etc.)
• CCGbank is available from the Linguistic Data Consortium (LDC) at the University of Pennsylvania.
• RULE: name of applied rule
• DIST: distance between head words
• COMMA: whether the phrase includes commas
• SPAN: number of words the phrase dominates
• SYM: nonterminal symbol (e.g. S, VP, …)
• WORD: head word
• POS: part-of-speech
• LE: lexical entry
• ARG: argument label (ARG1, ARG2, ...)
Example: syntactic features
• Feature for the Head-Modifier construction for “saw a girl” and “with a telescope”
[Figure: HPSG derivation of “he saw a girl with a telescope”; the feature combines RULE, DIST and COMMA with the LE, POS, WORD, SYM and SPAN values of the head (saw: transitive, VBD, VP, span 3) and of the modifier (with: prep-mod-vp, IN, PP, span 3)]
Example: semantic features
• Feature for the predicate argument relation between “he” and “saw”
[Figure: predicate-argument structure of “he saw a girl” (saw: ARG1 = he, ARG2 = girl); the feature combines ARG and DIST with the WORD, POS and LE values of the predicate (saw, VBD, transitive) and of the argument (he, PRP, pronoun)]
Feature generation
• Features are generated by abstracting descriptions of probabilistic events
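A hypothetical illustration of such abstraction: from one fully specified event description, emit variants with some fields masked, so sparse events back off to more general features (the field names follow the templates above, but the masking scheme here is invented for illustration):

```python
# Sketch of feature generation by abstraction: each subset of the maskable
# fields yields one less-specific feature.
from itertools import combinations

def abstract_features(event, fields_to_mask=("WORD", "LE", "POS")):
    """event: dict like {'RULE': ..., 'WORD': ...} -> list of feature tuples."""
    feats = []
    for r in range(len(fields_to_mask) + 1):
        for masked in combinations(fields_to_mask, r):
            feats.append(tuple(sorted(
                (k, "*" if k in masked else v) for k, v in event.items())))
    return feats

event = {"RULE": "head-mod", "DIST": 3, "WORD": "saw", "POS": "VBD", "LE": "transitive"}
print(len(abstract_features(event)))   # 8: one feature per subset of the 3 maskable fields
```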
Evaluation
• Evaluation of the lexical entries extracted from the Penn Treebank
  – Investigation of obtained lexical entries
  – Coverage
• Evaluation of the disambiguation model
  – Parsing accuracy
Experimental settings
• Training data: Sections 2-21 of Penn Treebank II (39,832 sentences)
• Test data:
  – Development set: Section 22 (1,700 sentences)
  – Final test set: Section 23 (2,416 sentences)
Number of tree conversion rules

Target of conversion                  Number
Penn-II errors                           102
Category mapping                          85
Head annotation and binarization          63
Difference of phrase structures           15
Predicate argument structures             13
Long distance dependencies                13
Others                                    52
Total                                    343
Result of treebank conversion & lexicon extraction
• Treebank conversion and HPSG annotation succeeded for 37,886 sentences
• Extracted lexicon:

  # words                            34,765
  # lexical entries                   1,942
  Average # lexical entries/word       1.43
Sources of treebank conversion failures
• Classification of failures of treebank conversion in Section 02 (67 failures / 1,989 sentences)

  Shortcomings of tree conversion rules    18
  Errors in Penn Treebank                  16
  Constructions currently unsupported      20
  Constructions unsupported by HPSG        13
Breakdown of extracted lexical entries

              # words   # lexical entries   Avg. # lex. entries
noun           21,925          186                 1.14
verb            4,094          945                 1.94
adjective       8,078           62                 1.28
adverb          1,295           72                 2.75
preposition       159          193                 9.17
particle           58           10                 1.69
determiner         36           33                 3.86
conjunction        94          321                 9.46
punctuation        15          120                22.00
Total          34,765        1,942                 1.43
Example lexical entries

Common noun (e.g. review/NN), appeared 140,805 times:
  HEAD noun, MOD <>, VAL [SPR <HEAD det>, SUBJ <>, COMPS <>]

Transitive verb, appeared 12,244 times:
  HEAD verb, MOD <>, VFORM base, VAL [SPR <>, SUBJ <HEAD noun>, COMPS <HEAD noun>]

Pre-head adjective, appeared 55,049 times:
  HEAD adj, MOD <HEAD noun>, POSTHEAD -, VAL [SPR <>, SUBJ <>, COMPS <>]
Evaluation of coverage
• The ratio of lexical entries in the test data covered by the grammar is measured
• A sentence is covered when all of the lexical entries in the sentence are covered (strong coverage)

                              Lexical entry   Sentence
  w/o unknown word handling      96.52%         54.7%
  w/ unknown word handling       99.15%         84.8%
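The two figures can be computed as follows (a sketch of the definitions with toy data, not the evaluation code actually used):

```python
# Token-level lexical coverage vs. "strong" sentence coverage
# (every token in the sentence covered).

def coverage(sentences, lexicon):
    """sentences: list of lists of (word, lexical_entry) pairs."""
    tokens = [tok for sent in sentences for tok in sent]
    covered = lambda w, le: le in lexicon.get(w, ())
    lex_cov = sum(covered(w, le) for w, le in tokens) / len(tokens)
    sent_cov = sum(all(covered(w, le) for w, le in s) for s in sentences) / len(sentences)
    return lex_cov, sent_cov

lexicon = {"he": {"pronoun"}, "saw": {"transitive"}, "girl": {"noun"}}
sents = [[("he", "pronoun"), ("saw", "transitive")],
         [("saw", "intransitive"), ("girl", "noun")]]
print(coverage(sents, lexicon))   # (0.75, 0.5): 3 of 4 tokens, 1 of 2 sentences
```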
Treebank size vs. coverage
Sentence length vs. coverage
Error analysis
• Classification of randomly selected uncovered lexical entries

  Errors of Penn Treebank                  10
  Errors of treebank conversion            48
  Lack of lexical entries                  23
  Constructions currently unsupported       9
  Idioms                                    6
  Non-linguistic expressions (ex. list)     4
Examples of uncovered lexical entries
• Lack of mappings from words to lexical entries because of data sparseness
  – Post-noun adjectives (younger, crucial)
  – Coordination conjunctions of NP and S’
  – Verbs taking a present participle as a complement
• Incorrect lexical entries obtained because of idiomatic expressions
  – (ADVP in part) because …
Evaluation of parsing accuracy
• Empirical evaluation of the probabilistic models
  – Overall accuracy
  – Treebank size vs. accuracy
  – Sentence length vs. accuracy
  – Contribution of features
  – Coverage and accuracy
  – Error analysis
• Measure: precision/recall of <predicate word, argument position, argument word, predicate type> tuples
  – e.g. <saw, ARG1, he, transitive> for “he saw a girl” (saw: ARG1 = he, ARG2 = girl)
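The measure itself is straightforward set precision/recall over such tuples (a sketch with toy data):

```python
# Precision/recall over sets of
# <predicate word, argument position, argument word, predicate type> tuples.

def prec_rec(gold, predicted):
    correct = len(gold & predicted)
    return correct / len(predicted), correct / len(gold)

gold = {("saw", "ARG1", "he", "transitive"), ("saw", "ARG2", "girl", "transitive")}
pred = {("saw", "ARG1", "he", "transitive"), ("saw", "ARG2", "telescope", "transitive")}
print(prec_rec(gold, pred))   # (0.5, 0.5)
```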
Effect of feature forest models
• Accuracy for Section 23 (< 40 words)

                           Precision   Recall
  baseline                   78.10     77.39
  with syntactic features    86.92     86.28
  with semantic features     84.29     83.74
  with all features          86.54     86.02
Treebank size vs. accuracy
[Figure: precision/recall (%) against training-set size (0–40,000 sentences)]
Sentence length vs. accuracy
[Figure: per-sentence results (%) against sentence length (0–60 words)]
Contribution of features (1/2)

            precision   recall   # features
  All         87.12     85.45     623,173
  - RULE      86.98     85.37     620,511
  - DIST      86.74     85.09     603,748
  - COMMA     86.55     84.77     608,117
  - SPAN      86.53     84.98     583,638
  - SYM       86.90     85.47     614,975
  - WORD      86.67     84.98     116,044
  - POS       86.36     84.71     430,876
  - LE        87.03     85.37     412,290
  None        78.22     76.46      24,847
Contribution of features (2/2)

                           precision   recall   # features
  All                        87.12     85.45     623,173
  - DIST,SPAN                85.54     84.02     294,971
  - DIST,SPAN,COMMA          83.94     82.44     286,489
  - RULE,DIST,SPAN,COMMA     83.61     81.98     283,897
  - WORD,LE                  86.48     84.91      50,258
  - WORD,POS                 85.56     83.94      64,915
  - WORD,POS,LE              84.89     83.43      33,740
  - SYM,WORD,POS,LE          82.81     81.48      26,761
  None                       78.22     76.46      24,847
Coverage and accuracy
• Accuracies for strongly covered/uncovered sentences
• We can expect accuracy improvements by improving grammar coverage

                         Precision   Recall   # sentences
  Covered sentences        89.36     88.96       1,825
  Uncovered sentences      75.57     74.04         319
Error analysis
• Classification of errors in 100 randomly selected sentences

  PP-attachment ambiguity                  76
  Distinction of arguments/modifiers       49
  Ambiguity of lexical entries             44
  Errors in test data                      22
  Ambiguity of commas                      32
  Others                                   75
Examples of errors (1/2)
• Antecedent of a relative clause
  – It’s made only in years when the grapes ripen perfectly (the last was 1979) and comes from a single acre of [NP grapes [S' that yielded a mere 75 cases in 1987]].
• Argument/modifier distinction of to-phrases
  – More than a few CEOs say the red-carpet treatment tempts them [VP-modifier to return to a heartland city for future meetings].
Examples of errors (2/2)
• Preposition or verb phrase?
  – Mitsui Mining & Smelting Co. posted a 62% rise in pretax profit to 5.276 billion yen ($36.9 million) in its fiscal first half ended Sept. 30 [VP compared with 3.253 billion yen a year earlier].
• Corpus-oriented development of HPSG
  – Y. Miyao, T. Ninomiya, and J. Tsujii. (2003). Lexicalized Grammar Acquisition. In Proc. 10th EACL Companion Volume.
  – Y. Miyao, T. Ninomiya, and J. Tsujii. (2004). Corpus-oriented grammar development for acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank. In Proc. IJCNLP 2004.
  – H. Nakanishi, Y. Miyao, and J. Tsujii. (2004). Using Inverse Lexical Rules to Acquire a Wide-coverage Lexicalized Grammar. In the IJCNLP 2004 Workshop on “Beyond Shallow Analyses.”
  – H. Nakanishi, Y. Miyao and J. Tsujii. (2004). An Empirical Investigation of the Effect of Lexical Rules on Parsing with a Treebank Grammar. In Proc. TLT 2004.
  – K. Yoshida. (2005). Corpus-Oriented Development of Japanese HPSG Parsers. In 43rd ACL Student Research Workshop.
Publications
• Feature forest model
  – Y. Miyao and J. Tsujii. (2002). Maximum entropy estimation for feature forests. In Proc. HLT 2002.
• Probabilistic models for HPSG
  – Y. Miyao and J. Tsujii. (2003). A model of syntactic disambiguation based on lexicalized grammars. In Proc. 7th CoNLL.
  – Y. Miyao, T. Ninomiya and J. Tsujii. (2003). Probabilistic modeling of argument structures including non-local dependencies. In Proc. RANLP 2003.
  – Y. Miyao and J. Tsujii. (2005). Probabilistic disambiguation models for wide-coverage HPSG parsing. In Proc. ACL 2005.
  – T. Ninomiya, T. Matsuzaki, Y. Tsuruoka, Y. Miyao, and J. Tsujii. (2006). Extremely Lexicalized Models for Accurate and Fast HPSG Parsing. In Proc. EMNLP 2006.
Publications
• Parsing strategies for probabilistic HPSG
  – Y. Tsuruoka, Y. Miyao and J. Tsujii. (2004). Towards efficient probabilistic HPSG parsing: integrating semantic and syntactic preference to guide the parsing. In the IJCNLP-04 Workshop on “Beyond shallow analyses.”
  – T. Ninomiya, Y. Tsuruoka, Y. Miyao, and J. Tsujii. (2005). Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing. In Proc. IWPT 2005.
  – T. Ninomiya, Y. Tsuruoka, Y. Miyao, K. Taura, and J. Tsujii. (2006). Fast and Scalable HPSG Parsing. Traitement automatique des langues (TAL). 46(2).
• Domain adaptation
  – T. Hara, Y. Miyao, and J. Tsujii. (2005). Adapting a probabilistic disambiguation model of an HPSG parser to a new domain. In Proc. IJCNLP 2005.
Publications
• Generation
  – H. Nakanishi, Y. Miyao, and J. Tsujii. (2005). Probabilistic models for disambiguation of an HPSG-based chart generator. In Proc. IWPT 2005.
• Semantics construction
  – M. Sato, D. Bekki, Y. Miyao, and J. Tsujii. (2006). Translating HPSG-style Outputs of a Robust Parser into Typed Dynamic Logic. In Proc. COLING-ACL 2006 Poster Session.
• Applications
  – Y. Miyao, T. Ohta, K. Masuda, Y. Tsuruoka, K. Yoshida, T. Ninomiya, and J. Tsujii. (2006). Semantic Retrieval for the Accurate Identification of Relational Concepts. In Proc. COLING-ACL 2006.
  – A. Yakushiji, Y. Miyao, T. Ohta, Y. Tateisi, and J. Tsujii. (2006). Automatic Construction of Predicate-Argument Structure Patterns for Biomedical Information Extraction. In EMNLP 2006 Poster Session.