A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts
Kenji Sagae, Language Technologies Institute, Carnegie Mellon University
Thesis Committee: Alon Lavie (co-chair), Brian MacWhinney (co-chair), Lori Levin, Jaime Carbonell, John Carroll (University of Sussex)


Jan 12, 2016



Transcript
Page 1: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts

Kenji Sagae
Language Technologies Institute

Carnegie Mellon University

Thesis Committee:

Alon Lavie, co-chair

Brian MacWhinney, co-chair

Lori Levin

Jaime Carbonell

John Carroll, University of Sussex

Page 2

Natural Language Parsing: Sentence → Syntactic Structure

• One of the core problems in NLP

Input: The boy ate the cheese sandwich

Output:

(S (NP (Det The) (N boy))

(VP (V ate) (NP (Det the) (N cheese) (N sandwich))))

(ROOT (predicate eat) (surface ate) (tense past) (category V)
  (SUBJ (category N) (agreement 3s) (surface boy)
    (DET (surface the) (category Det)))
  (OBJ (category N) (definite +)
    (DET (surface the) (category Det))
    (predicate sandwich) (surface sandwich)
    (MOD (category N) (surface cheese) (predicate cheese))))

((1 2 The DET) (2 3 boy SUBJ) (3 0 ate ROOT) (4 6 the DET) (5 6 cheese MOD) (6 3 sandwich OBJ))

Grammatical Relations (GRs)
• Subject, object, adjunct, etc.

Page 3

Using Natural Language Processing in Child Language Research

• CHILDES Database (MacWhinney, 2000)
– 200 megabytes of child-parent dialog transcripts
– Part-of-speech and morphology analysis

• Tools available
• Not enough for many research questions
– No syntactic analysis

• Can we use NLP to analyze CHILDES transcripts?
– Parsing
– Many decisions: representation, approach, etc.

Page 4

Parsing CHILDES: Specific and General Motivation

• Specific task: automatic analysis of syntax in CHILDES corpora
– Theoretical importance (study of child language development)
– Practical importance (measurement of syntactic competence)

• In general: develop techniques for syntactic analysis, advance parsing technologies
– Can we develop new techniques that perform better than current approaches?
• Rule-based
• Data-driven

Page 5

Research Objectives

• Identify a suitable syntactic representation for CHILDES transcripts
– Must address the needs of child language research

• Develop a high-accuracy approach for syntactic analysis of spoken language transcripts
– Parents and children at different stages of language acquisition

• The plan: a multi-strategy approach
– ML: ensemble methods
– Parsing: several approaches possible, but combination is an underdeveloped area

Page 6

Research Objectives

• Develop methods for combining analyses from different parsers and obtain improved accuracy
– Combining rule-based and data-driven approaches

• Evaluate the accuracy of developed systems

• Validate the utility of the resulting systems to the child language community
– Task-based evaluation: automatic measurement of grammatical complexity in child language

Page 7

Overview of the Multi-Strategy Approach for Syntactic Analysis

Diagram: Transcripts → Parser A / Parser B / Parser C / Parser D / Parser E → Parser Combination → SYNTACTIC STRUCTURES

Page 8

Thesis Statement

• The development of a novel multi-strategy approach for syntactic parsing allows for identification of Grammatical Relations in transcripts of parent-child dialogs at a higher level of accuracy than previously possible

• Through the combination of different NLP techniques (rule-based or data-driven), the multi-strategy approach can outperform each strategy in isolation, and produce significantly improved accuracy

• The resulting syntactic analyses are at a level of accuracy that makes them useful to child language research

Page 9

Outline

• The CHILDES GR scheme

• GR Parsing of CHILDES transcripts

• Combining different strategies

• Automated measurement of syntactic development in child language

• Related work

• Conclusion

Page 10

CHILDES GR Scheme (Sagae, MacWhinney and Lavie, 2004)

• Grammatical Relations (GRs)
– Subject, object, adjunct, etc.
– Labeled dependencies

• Addresses needs of child language researchers
– Informative and intuitive; basis for DSS and IPSyn

Diagram: a labeled dependency arc points from the Dependent to its Head and carries a Dependency Label.

Page 11

CHILDES GR Scheme Includes Important GRs for Child Language Study

Page 12

Outline

• The CHILDES GR scheme

• GR Parsing of CHILDES transcripts
– Evaluation
– Data
– Rule-based GR parsing
– Data-driven GR parsing

• Combining different strategies

• Automated measurement of syntactic development in child language

• Related work

• Conclusion

Page 13

The Task: Sentence → GRs

• Input: We eat the cheese sandwich

• Output:

Page 14

Evaluation of GR Parsing

• Dependency accuracy

• Precision/Recall of GRs

Page 15

Evaluation: Calculating Dependency Accuracy

Sentence: We eat the cheese sandwich (word positions 1-5)

1 2 We SUBJ
2 0 eat ROOT
3 5 the DET
4 5 cheese MOD
5 2 sandwich OBJ

Page 16

Evaluation: Calculating Dependency Accuracy

GOLD:
1 2 We SUBJ
2 0 eat ROOT
3 5 the DET
4 5 cheese MOD
5 2 sandwich OBJ

PARSED:
1 2 We SUBJ
2 0 eat ROOT
3 4 the DET
4 2 cheese OBJ
5 2 sandwich PRED

Accuracy = (number of correct dependencies) / (total number of dependencies) = 2 / 5 = 40%
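The accuracy computation above can be sketched in a few lines; this is an illustrative Python snippet (not the thesis code), using the gold and parsed dependencies from this example.

```python
# Illustrative sketch: labeled dependency accuracy.
# A dependency is (dependent_index, head_index, word, label); a parsed
# dependency counts as correct only if it matches gold exactly.

def dependency_accuracy(gold, parsed):
    correct = sum(1 for g, p in zip(gold, parsed) if g == p)
    return correct / len(gold)

GOLD = [(1, 2, "We", "SUBJ"), (2, 0, "eat", "ROOT"), (3, 5, "the", "DET"),
        (4, 5, "cheese", "MOD"), (5, 2, "sandwich", "OBJ")]
PARSED = [(1, 2, "We", "SUBJ"), (2, 0, "eat", "ROOT"), (3, 4, "the", "DET"),
          (4, 2, "cheese", "OBJ"), (5, 2, "sandwich", "PRED")]
print(dependency_accuracy(GOLD, PARSED))  # 0.4
```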

Page 17

Evaluation: Precision and Recall of GRs

• Precision and recall are calculated separately for each GR type

• Calculated on aggregate counts over the entire test corpus

• Example: SUBJ

Precision = (# SUBJ matches between PARSED and GOLD) / (total # of SUBJs in PARSED)

Recall = (# SUBJ matches between PARSED and GOLD) / (total # of SUBJs in GOLD)

F-score = 2 × (Precision × Recall) / (Precision + Recall)

Page 18

Evaluation: Precision and Recall of GRs

GOLD:
1 2 We SUBJ
2 0 eat ROOT
3 5 the DET
4 5 cheese MOD
5 2 sandwich OBJ

PARSED:
1 2 We SUBJ
2 0 eat ROOT
3 4 the DET
4 2 cheese OBJ
5 2 sandwich SUBJ

Precision = (# SUBJ matches between PARSED and GOLD) / (total # of SUBJs in PARSED) = 1 / 2 = 50%

Recall = (# SUBJ matches between PARSED and GOLD) / (total # of SUBJs in GOLD) = 1 / 1 = 100%

F-score = 2 × (Precision × Recall) / (Precision + Recall) = 2(50 × 100) / (50 + 100) = 66.67
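A minimal sketch of the per-GR precision/recall/F-score computation, using the SUBJ example from this slide; the function and variable names are mine, not from the thesis.

```python
# Illustrative sketch of per-GR precision, recall and F-score.
# A dependency is (dependent, head, word, label); in the thesis these counts
# are aggregated over the whole test corpus.

def gr_prf(gold, parsed, gr):
    matches = sum(1 for g, p in zip(gold, parsed) if g == p and g[3] == gr)
    in_parsed = sum(1 for p in parsed if p[3] == gr)
    in_gold = sum(1 for g in gold if g[3] == gr)
    precision = matches / in_parsed if in_parsed else 0.0
    recall = matches / in_gold if in_gold else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, fscore

GOLD = [(1, 2, "We", "SUBJ"), (2, 0, "eat", "ROOT"), (3, 5, "the", "DET"),
        (4, 5, "cheese", "MOD"), (5, 2, "sandwich", "OBJ")]
PARSED = [(1, 2, "We", "SUBJ"), (2, 0, "eat", "ROOT"), (3, 4, "the", "DET"),
          (4, 2, "cheese", "OBJ"), (5, 2, "sandwich", "SUBJ")]
print(gr_prf(GOLD, PARSED, "SUBJ"))  # (0.5, 1.0, 0.666...)
```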

Page 19

Outline

• The CHILDES GR scheme

• GR Parsing of CHILDES transcripts
– Evaluation
– Data
– Rule-based GR parsing
– Data-driven GR parsing

• Combining different strategies

• Automated measurement of syntactic development in child language

Page 20

CHILDES Data: the Eve Corpus (Brown, 1973)

• A corpus from CHILDES
– Manually annotated with GRs

• Training: ~5,000 words (adult)

• Development: ~1,000 words
– 600 adult, 400 child

• Test: ~2,000 words
– 1,200 adult, 800 child

Page 21

Not All Child Utterances Have GRs

• Utterances in training and test sets are well-formed

I need tapioca in the bowl.

That’s a hat.

In a minute.

• What about

* Warm puppy happiness a blanket.

* There briefcase.

? I drinking milk.

? I want Fraser hat.

• Separate Eve-child test set (700 words)

Page 22

The WSJ Corpus (Penn Treebank)

• 1 million words
• Widely used
– Sections 02-21: training
– Section 22: development
– Section 23: evaluation

• Large corpus with syntactic annotation
– Out-of-domain

• Constituent structures
– Convert to unlabeled dependencies using head-percolation table
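The constituent-to-dependency conversion can be sketched as below. The head table and tree encoding here are simplified, hypothetical stand-ins for the actual head-percolation rules used with the Penn Treebank; the idea is only to show how a head table turns a constituent tree into unlabeled dependencies.

```python
# Sketch only: head-percolation conversion from constituents to unlabeled
# dependencies. HEAD_TABLE and the tree encoding are simplified stand-ins.
# A leaf is (POS, (word, index)); an internal node is (label, [children]).

HEAD_TABLE = {"S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NN", "NNS"]}

def attach(node, deps):
    """Return the lexical head (word, index) of node, recording dependencies."""
    label, children = node
    if isinstance(children, tuple):          # leaf node
        return children
    heads = [attach(child, deps) for child in children]
    head = heads[0]                          # default: leftmost child
    for cat in HEAD_TABLE.get(label, []):    # head percolation by priority
        matches = [h for c, h in zip(children, heads) if c[0] == cat]
        if matches:
            head = matches[0]
            break
    for h in heads:                          # non-head children depend on head
        if h is not head:
            deps.append((h[1], head[1]))     # (dependent index, head index)
    return head

def to_dependencies(tree):
    deps = []
    root = attach(tree, deps)
    deps.append((root[1], 0))                # root attaches to virtual node 0
    return deps

TREE = ("S", [("NP", [("DT", ("The", 1)), ("NN", ("boy", 2))]),
              ("VP", [("VBD", ("ate", 3)),
                      ("NP", [("DT", ("the", 4)), ("NN", ("sandwich", 5))])])])
print(sorted(to_dependencies(TREE)))  # [(1, 2), (2, 3), (3, 0), (4, 5), (5, 3)]
```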

Page 23

Outline

• The CHILDES GR scheme

• GR Parsing of CHILDES transcripts
– Evaluation
– Data
– Rule-based GR parsing
– Data-driven GR parsing

• Combining different strategies

• Automated measurement of syntactic development in child language

Page 24

Rule-Based Parsing

• The parser's knowledge is encoded in manually written rules
– Grammar, lexicon, etc.

• Only analyses that fit the rules are possible

• Accurate in specific domains; difficult to achieve wide coverage in the open domain
– Coverage, ambiguity, domain knowledge

Page 25

Rule-Based Parsing of CHILDES Data (Sagae, Lavie & MacWhinney, 2001, 2004)

• LCFlex (Rosé and Lavie, 2001)

• Rules: CFG backbone augmented with unification constraints
– Manually written, 153 rules

• Robustness
– Limited insertions: [Do] [you] want to go outside?
– Limited skipping: No um maybe later.

• PCFG disambiguation model
– Trained on 2,000 words

Page 26

High Precision from a Small Grammar

• Eve test corpus
– 2,000 words

• 31% of the words can be parsed
• Accuracy (over all 2,000 words): 29%
• Precision: 94%
• High precision, low recall

• Improve recall using the parser's robustness
– Insertions, skipping
– Multi-pass approach

Page 27

Robustness and Multi-Pass Parsing

• No insertions, no skipping: 31% parsed, 29% recall, 94% precision

• Insertion of NP and/or auxiliary: 38% parsed, 35% recall, 92% precision

• Skipping of 1 word: 52% parsed, 47% recall, 90% precision

• Skipping of 1 word, insertion of NP, aux: 63% parsed, 55% recall, 88% precision

Page 28

Use Robustness to Improve Recall

Chart: precision, recall, and f-score (0-100%) for each robustness setting: none, insert NP/aux, skip 1 word, insert/skip.

Page 29

Outline

• The CHILDES GR scheme

• GR Parsing of CHILDES transcripts
– Evaluation
– Data
– Rule-based GR parsing
– Data-driven GR parsing

• Combining different strategies

• Automated measurement of syntactic development in child language

Page 30

Data-Driven Parsing

• Parser learns from a corpus of annotated examples

• Data-driven parsers are robust

• Two approaches
– Existing statistical parser
– Classifier-based parsing

Page 31

Accurate GR Parsing with Existing Resources (Mostly)

• Large training corpus: Penn Treebank (Marcus et al., 1993)
– Head table converts constituents into dependencies

• Use an existing parser (trained on the Penn Treebank)
– Charniak (2000)

• Convert output to unlabeled dependencies

• Use a classifier for dependency labeling

Page 32

Unlabeled Dependency Identification

Diagram: unlabeled dependency arcs for "We eat the cheese sandwich" (e.g., We → eat, sandwich → eat).

Page 33

Domain Issues

• Parser training data is in a very different domain
– WSJ vs. parent-child dialogs

• Domain-specific training data would likely be better

• Performance is acceptable
– Shorter, simpler sentences
– Unlabeled dependency accuracy:
• WSJ test data: 92%
• Eve test data: 90%

Page 34

Dependency Labeling

• Training data is required
– Eve training set (5,000 words)

• Labeling dependencies is easier than finding unlabeled dependencies

• Use a classifier
– TiMBL (Daelemans et al., 2004)
– Extract features from unlabeled dependency structure
– GR labels are target classes

Page 35

Dependency Labeling

Page 36

Features Used for GR Labeling

• Head and dependent words
– Also their POS tags

• Whether the dependent comes before or after the head

• How far the dependent is from the head

• The label of the lowest node in the constituent tree that includes both the head and dependent

Page 37

Features Used for GR Labeling

Consider the words “we” and “eat”

Features: we, pro, eat, v, before, 1, S

Class: SUBJ
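A toy sketch of memory-based labeling in the spirit of TiMBL (nearest-neighbor classification with a feature-overlap metric); the stored training instances below are invented for illustration, not drawn from the Eve corpus.

```python
# Hedged sketch of memory-based dependency labeling: store training
# (features, label) pairs and classify a new dependency by the label of the
# stored instance with the greatest feature overlap (1-NN, overlap metric).
# These instances are invented examples, not real training data.

TRAIN = [
    (("we", "pro", "eat", "v", "before", 1, "S"), "SUBJ"),
    (("cake", "n", "eat", "v", "after", 1, "VP"), "OBJ"),
    (("the", "det", "cake", "n", "before", 1, "NP"), "DET"),
]

def overlap(a, b):
    # number of feature positions on which the two instances agree
    return sum(1 for x, y in zip(a, b) if x == y)

def classify(features):
    return max(TRAIN, key=lambda inst: overlap(inst[0], features))[1]

print(classify(("they", "pro", "see", "v", "before", 1, "S")))  # SUBJ
```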

Page 38

Good GR Labeling Results with Small Training Set

• Eve training set
– 5,000 words for training

• Eve test set
– 2,000 words for testing

• Accuracy of dependency labeling (on perfect dependencies): 91.4%

• Overall accuracy (Charniak parser + dependency labeling): 86.9%

Page 39

Some GRs Are Easier Than Others

• Overall accuracy: 86.9%

• Easily identifiable GRs
– DET, POBJ, INF, NEG: precision and recall above 98%

• Difficult GRs
– COMP, XCOMP: below 65%
• I think that Mary saw a movie (COMP)
• She tried to see a movie (XCOMP)

Page 40

Precision and Recall of Specific GRs

GR Precision Recall F-score

SUBJ 0.94 0.93 0.93

OBJ 0.83 0.91 0.87

COORD 0.68 0.85 0.75

JCT 0.91 0.82 0.86

MOD 0.79 0.92 0.85

PRED 0.80 0.83 0.81

ROOT 0.91 0.92 0.91

COMP 0.60 0.50 0.54

XCOMP 0.58 0.64 0.61

Page 41

Parsing with Domain-Specific Data

• Good results with a system based on the Charniak parser

• Why domain-specific data?
– No Penn Treebank
– Handle dependencies natively
– Multi-strategy approach

Page 42

Classifier-Based Parsing (Sagae & Lavie, 2005)

• Deterministic parsing
– Single path, no backtracking
– Greedy
– Linear run-time

• Simple shift-reduce algorithm
– Single pass over the input string

• Variety: left-to-right, right-to-left (order matters)

• Classifier makes parser decisions
– Classifier not tied to parsing algorithm

• Variety: different types of classifiers can be used

Page 43

A Simple, Fast and Accurate Approach

• Classifier-based parsing with constituents
– Trained and evaluated on WSJ data: 87.5%
– Very fast, competitive accuracy

• Simple adaptation to labeled dependency parsing
– Similar to Malt parser (Nivre, 2004)
– Handles CHILDES GRs directly

Page 44

GR Analysis with Classifier-Based Parsing

• Stack S
– Items may be POS-tagged words or dependency trees
– Initialization: empty

• Queue W
– Items are POS-tagged words
– Initialization: insert each word of the input sentence in order (first word in front)

Page 45

Shift and Reduce Actions

• Shift
– Remove (shift) the word in front of queue W
– Insert the shifted item on top of stack S

• Reduce
– Pop the two topmost items from stack S
– Push a new item onto stack S
• The new item forms a new dependency
• Choose LEFT or RIGHT
• Choose a dependency label
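The shift-reduce mechanics above can be sketched as a runnable loop. In the actual system a trained classifier chooses each action; here a scripted action sequence stands in for the classifier, and the LEFT/RIGHT convention used (LEFT makes the left of the two topmost stack items the dependent of the right one) is one plausible choice, not necessarily the thesis's.

```python
# Sketch of shift-reduce dependency parsing. A real classifier-based parser
# asks a classifier (e.g. an SVM) for each action; ACTIONS is a scripted
# stand-in. Convention here: LEFT = left item depends on right item; RIGHT =
# right item depends on left item. The head stays on the stack.

def shift_reduce(words, actions):
    stack, queue, deps = [], list(words), []
    for action in actions:
        if action == "SHIFT":
            stack.append(queue.pop(0))
        else:
            direction, label = action
            right, left = stack.pop(), stack.pop()
            head, dep = (right, left) if direction == "LEFT" else (left, right)
            deps.append((dep, label, head))
            stack.append(head)
    return stack, deps

ACTIONS = ["SHIFT", "SHIFT", ("LEFT", "SUBJ"), "SHIFT", "SHIFT", "SHIFT",
           ("LEFT", "MOD"), ("LEFT", "DET"), ("RIGHT", "OBJ")]
stack, deps = shift_reduce(["We", "eat", "the", "cheese", "sandwich"], ACTIONS)
print(deps)  # each triple is (dependent, label, head); "eat" remains as root
```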

Page 46

Parser Decisions

• Shift vs. Reduce

• If Reduce
– RIGHT or LEFT
– Dependency label

• We use a classifier to make these decisions

Page 47

Classes and Features

• Classes
– SHIFT
– LEFT-SUBJ
– LEFT-JCT
– RIGHT-OBJ
– RIGHT-JCT
– …

• Features: derived from parser configuration
– Crucially: two topmost items in S, first item in W
– Additionally: other features that describe the current configuration (look-ahead, etc.)

Page 48

Parsing CHILDES with a Classifier-Based Parser

• Parser uses SVM
• Trained on Eve training set (5,000 words)
• Tested on Eve test set (2,000 words)

• Labeled dependency accuracy: 87.3%
– Uses only domain-specific data
– Same level of accuracy as the GR system based on the Charniak parser

Page 49

Precision and Recall of Specific GRs

GR Precision Recall F-score

SUBJ 0.97 0.98 0.98

OBJ 0.89 0.94 0.92

COORD 0.71 0.76 0.74

JCT 0.78 0.88 0.83

MOD 0.94 0.87 0.91

PRED 0.80 0.83 0.82

ROOT 0.95 0.94 0.94

COMP 0.70 0.78 0.74

XCOMP 0.93 0.82 0.87

Page 50

Precision and Recall of Specific GRs

GR Precision Recall F-score F-score (Charniak-based system)

SUBJ 0.97 0.98 0.98 0.93

OBJ 0.89 0.94 0.92 0.87

COORD 0.71 0.76 0.74 0.75

JCT 0.78 0.88 0.83 0.86

MOD 0.94 0.87 0.91 0.85

PRED 0.80 0.83 0.82 0.81

ROOT 0.95 0.94 0.94 0.91

COMP 0.70 0.78 0.74 0.54

XCOMP 0.93 0.82 0.87 0.61

Page 51

Outline

• The CHILDES GR scheme

• GR Parsing of CHILDES transcripts

• Combining different strategies
– Weighted voting
– Combination as parsing
– Handling young child utterances

• Automated measurement of syntactic development in child language

• Related Work

• Conclusion

Page 52

Combine Different Parsers to Get More Accurate Results

• Rule-based

• Statistical parsing + dependency labeling

• Classifier-based parsing
– Obtain even more variety
• SVM vs. MBL
• Left-to-right vs. right-to-left

Page 53

Simple (Unweighted) Voting

• Each parser votes for each dependency

• Word-by-word

• Every vote has the same weight

Page 54

Simple (Unweighted) Voting

He eats cake

Parser A: 1 2 He SUBJ | 2 0 eats CMOD | 3 1 cake OBJ
Parser B: 1 2 He SUBJ | 2 0 eats ROOT | 3 1 cake OBJ
Parser C: 1 3 He SUBJ | 2 0 eats ROOT | 3 2 cake OBJ

GOLD: 1 2 He SUBJ | 2 0 eats ROOT | 3 2 cake OBJ

Page 55

Simple (Unweighted) Voting

He eats cake

Parser A: 1 2 He SUBJ | 2 0 eats CMOD | 3 1 cake OBJ
Parser B: 1 2 He SUBJ | 2 0 eats ROOT | 3 1 cake OBJ
Parser C: 1 3 He SUBJ | 2 0 eats ROOT | 3 2 cake OBJ

GOLD: 1 2 He SUBJ | 2 0 eats ROOT | 3 2 cake OBJ

VOTED: 1 2 He SUBJ | 2 0 eats ROOT | 3 1 cake OBJ

Page 56

Simple (Unweighted) Voting

He eats cake

Parser A: 1 2 He SUBJ | 2 0 eats CMOD | 3 1 cake OBJ
Parser B: 1 2 He SUBJ | 2 0 eats ROOT | 3 1 cake OBJ
Parser C: 1 3 He SUBJ | 2 0 eats ROOT | 3 2 cake OBJ

GOLD: 1 2 He SUBJ | 2 0 eats ROOT | 3 2 cake OBJ

VOTED: 1 2 He SUBJ | 2 0 eats ROOT | 3 1 cake OBJ
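Word-by-word unweighted voting can be sketched as follows, using the three parser outputs from this example; the data structures (dicts mapping word index to a head/label pair) are mine, not the thesis's.

```python
from collections import Counter

# Sketch of word-by-word unweighted voting: for each word, the (head, label)
# pair proposed by the most parsers wins.

def vote(parser_outputs):
    # parser_outputs: list of parses; each maps word_index -> (head, label)
    voted = {}
    for idx in parser_outputs[0]:
        counts = Counter(parse[idx] for parse in parser_outputs)
        voted[idx] = counts.most_common(1)[0][0]
    return voted

A = {1: (2, "SUBJ"), 2: (0, "CMOD"), 3: (1, "OBJ")}
B = {1: (2, "SUBJ"), 2: (0, "ROOT"), 3: (1, "OBJ")}
C = {1: (3, "SUBJ"), 2: (0, "ROOT"), 3: (2, "OBJ")}
print(vote([A, B, C]))  # {1: (2, 'SUBJ'), 2: (0, 'ROOT'), 3: (1, 'OBJ')}
```

Note that the voted head for "cake" (word 3) is 1, which disagrees with gold: majority voting can still be wrong when most parsers make the same mistake.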

Page 57

Weighted Voting

• Each parser has a weight
– Reflects confidence in the parser's GR identification

• Instead of adding the number of votes, add the weight of votes

• Takes into account that some parsers are better than others

Page 58

Weighted Voting

He eats cake

Parser A (0.4): 1 2 He SUBJ | 2 0 eats CMOD | 3 1 cake OBJ
Parser B (0.3): 1 2 He SUBJ | 2 0 eats ROOT | 3 1 cake OBJ
Parser C (0.8): 1 3 He SUBJ | 2 0 eats ROOT | 3 2 cake OBJ

GOLD: 1 2 He SUBJ | 2 0 eats ROOT | 3 2 cake OBJ

VOTED: 1 3 He SUBJ | 2 0 eats ROOT | 3 2 cake OBJ

Page 59

Label-Weighted Voting

• Not just one weight per parser, but one weight for each GR for each parser

• Takes into account specific strengths of each parser

Page 60

Label-Weighted Voting

He eats cake

Parser A: 1 2 He SUBJ (0.7) | 2 0 eats CMOD (0.3) | 3 1 cake OBJ (0.5)
Parser B: 1 2 He SUBJ (0.8) | 2 0 eats ROOT (0.9) | 3 1 cake OBJ (0.3)
Parser C: 1 3 He SUBJ (0.6) | 2 0 eats ROOT (0.7) | 3 2 cake OBJ (0.9)

GOLD: 1 2 He SUBJ | 2 0 eats ROOT | 3 2 cake OBJ

VOTED: 1 2 He SUBJ | 2 0 eats ROOT | 3 2 cake OBJ
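A sketch of label-weighted voting on the example above; the proposal format is an assumption, not the thesis's data structure. Plain weighted voting is the special case where all of a parser's labels share a single weight.

```python
from collections import defaultdict

# Sketch of label-weighted voting: each parser's vote for a word is weighted
# by that parser's confidence for the specific GR label it proposes; the
# (head, label) pair with the largest total weight wins.

def label_weighted_vote(proposals):
    # proposals: word_index -> list of ((head, label), weight) votes
    voted = {}
    for idx, votes in proposals.items():
        totals = defaultdict(float)
        for (head, label), w in votes:
            totals[(head, label)] += w
        voted[idx] = max(totals, key=totals.get)
    return voted

proposals = {
    1: [((2, "SUBJ"), 0.7), ((2, "SUBJ"), 0.8), ((3, "SUBJ"), 0.6)],
    2: [((0, "CMOD"), 0.3), ((0, "ROOT"), 0.9), ((0, "ROOT"), 0.7)],
    3: [((1, "OBJ"), 0.5), ((1, "OBJ"), 0.3), ((2, "OBJ"), 0.9)],
}
print(label_weighted_vote(proposals))
# {1: (2, 'SUBJ'), 2: (0, 'ROOT'), 3: (2, 'OBJ')}
```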

Page 61

Voting Produces Very Accurate Results

• Parsers
– Rule-based
– Statistical, based on the Charniak parser
– Classifier-based
• Left-to-right SVM
• Right-to-left SVM
• Left-to-right MBL

• Simple voting: 88.0%
• Weighted voting: 89.1%
• Label-weighted voting: 92.1%

Page 62

Precision and Recall of Specific GRs

GR Precision Recall F-score

SUBJ 0.98 0.98 0.98

OBJ 0.94 0.94 0.94

COORD 0.94 0.91 0.92

JCT 0.87 0.90 0.88

MOD 0.97 0.91 0.94

PRED 0.86 0.89 0.87

ROOT 0.97 0.96 0.96

COMP 0.75 0.67 0.71

XCOMP 0.90 0.88 0.89

Page 63

Outline

• The CHILDES GR scheme

• GR Parsing of CHILDES transcripts

• Combining different strategies
– Weighted voting
– Combination as parsing
– Handling young child utterances

• Automated measurement of syntactic development in child language

Page 64

Voting May Not Produce a Well-Formed Dependency Tree

• Voting on a word-by-word basis

• No guarantee of well-formedness

• Resulting set of dependencies may form a graph with cycles, or may not even be fully connected
– Technically not fully compliant with the CHILDES GR annotation scheme

Page 65

Parser Combination as Reparsing

• Once several parsers have analyzed a sentence, use their output to guide the process of reparsing the sentence

• Two reparsing approaches
– Maximum spanning tree
– CYK (dynamic programming)

Page 66

Dependency Parsing as Search for Maximum Spanning Tree

• First, build a graph
– Each word in the input sentence is a node
– Each dependency proposed by any of the parsers is a weighted edge
– If multiple parsers propose the same dependency, add their weights into a single edge

• Then, simply find the MST
– Maximizes the votes
– Structure guaranteed to be a dependency tree
– May have crossing branches

Page 67

Parser Combination with the CYK Algorithm

• The CYK algorithm uses dynamic programming to find all parses for a sentence given a CFG
– Probabilistic version finds most probable parse

• Build a graph, as with MST
• Parse the sentence using CYK
– Instead of a grammar, consult the graph to determine how to fill new cells in the CYK table
– Instead of probabilities, we use the weights from the graph
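A minimal sketch of this idea (not the thesis implementation): a CYK-style chart where each cell stores, per candidate head, the best-scoring analysis of that span, and adjacent spans are combined by attaching one span's head to the other's using the vote-weight graph. Unlike the MST, the result is guaranteed projective:

```python
def cyk_reparse(n, w):
    """n: number of words (indexed 1..n, 0 is the root).
    w: dict (head, dep) -> combined vote weight.
    Returns (score, head_map) for the best projective tree, assuming
    the vote graph supports at least one full analysis."""
    # chart[(i, j)]: head -> (score, deps) for the best analysis of span i..j
    chart = {(i, i): {i: (0.0, {})} for i in range(1, n + 1)}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = {}
            for k in range(i, j):
                for h1, (s1, d1) in chart[(i, k)].items():
                    for h2, (s2, d2) in chart[(k + 1, j)].items():
                        # Attach one span's head to the other's, if voted for.
                        for head, dep in ((h1, h2), (h2, h1)):
                            if (head, dep) not in w:
                                continue
                            s = s1 + s2 + w[(head, dep)]
                            if head not in cell or s > cell[head][0]:
                                deps = dict(d1)
                                deps.update(d2)
                                deps[dep] = head
                                cell[head] = (s, deps)
            chart[(i, j)] = cell
    # Attach the best whole-sentence head to the root (word 0).
    best = None
    for h, (s, deps) in chart[(1, n)].items():
        s += w.get((0, h), float('-inf'))
        if best is None or s > best[0]:
            d = dict(deps)
            d[h] = 0
            best = (s, d)
    return best
```

On the appendix voting example, where per-word voting picks a non-projective (and wrong) head for "He", this chart-based reparsing recovers the gold tree, since the crossing attachment is never built.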

Page 68: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

68

Precision and Recall of Specific GRs

GR      Precision  Recall  F-score
SUBJ    0.98       0.98    0.98
OBJ     0.94       0.94    0.94
COORD   0.94       0.91    0.92
JCT     0.87       0.90    0.88
MOD     0.97       0.91    0.94
PRED    0.86       0.89    0.87
ROOT    0.97       0.97    0.97
COMP    0.73       0.89    0.80
XCOMP   0.88       0.88    0.88

Page 69: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

69

Outline

• The CHILDES GR scheme

• GR Parsing of CHILDES transcripts

• Combining different strategies

• Automated measurement of syntactic development in child language

• Weighted voting
• Combination as parsing
• Handling young child utterances

Page 70: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

70

Handling Young Child Utterances with Rule-Based and Data-Driven Parsing

• Eve-child test set:
I need tapioca in the bowl.

That’s a hat.

In a minute.

* Warm puppy happiness a blanket.

* There briefcase.

? I drinking milk.

? I want Fraser hat.

Page 71: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

71

Three Types of Sentences in One Corpus

• No problem
– High accuracy
• No GRs
– But data-driven systems will output GRs
• Missing words, agreement errors, etc.
– GRs are fine, but a challenge for data-driven systems trained on fully grammatical utterances

Page 72: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

72

To Analyze or Not To Analyze: Ask the Rule-Based Parser

• Utterances with no GRs are annotated in test corpus as such

• Rule-based parser set to high precision
– Same grammar as before

• If a sentence cannot be parsed with the rule-based system, output No GR.
– 88% Precision, 89% Recall
– Sentences are fairly simple

Page 73: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

73

The Rule-Based Parser also Identifies Missing Words

• If the sentence can be analyzed with the rule-based system, check if any insertions were necessary
– If be or the possessive marker ’s was inserted, insert the appropriate lexical item in the sentence
• Parse the sentence with data-driven systems, run combination
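The two-stage decision procedure on these slides can be sketched as a small pipeline. Everything here is illustrative, not the thesis code: `rule_parse`, the data-driven parsers, and `combine` are hypothetical stand-ins, and the insertion encoding (position, word) is an assumption:

```python
def analyze(utterance, rule_parse, data_parsers, combine):
    """Two-stage pipeline sketch.
    rule_parse(words) -> None if the high-precision grammar cannot parse,
    else a list of (position, word) insertions it needed (possibly empty)."""
    insertions = rule_parse(utterance)
    if insertions is None:
        return 'No GR'                      # pre-syntactic utterance: no relations
    words = list(utterance)
    for pos, word in sorted(insertions, reverse=True):
        words.insert(pos, word)             # e.g. a dropped copula or possessive 's
    analyses = [parse(words) for parse in data_parsers]
    return combine(analyses)
```

The point of the sketch: the rule-based parser acts as a gate (and repair step), so the data-driven systems only ever see utterances that look like the grammatical text they were trained on.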

Page 74: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

74

High Accuracy Analysis of Challenging Utterances

• Eve-child test
– No rule-based first pass: 62.9% accuracy
  • Many errors caused by GR analysis of words with no GRs
– With rule-based pass: 88.0% accuracy
• 700 words from Naomi corpus
– No rule-based: 67.4%
– Rule-based, then combo: 86.8%

Page 75: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

75

Outline

• The CHILDES GR scheme

• GR Parsing of CHILDES transcripts

• Combining different strategies

• Automated measurement of syntactic development in child language

• Related work

• Conclusion

Page 76: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

76

Index of Productive Syntax (IPSyn) (Scarborough, 1990)

• A measure of child language development

• Assigns a numerical score for grammatical complexity (from 0 to 112 points)

• Used in hundreds of studies

Page 77: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

77

IPSyn Measures Syntactic Development

• IPSyn: Designed for investigating differences in language acquisition
– Differences in groups (for example: bilingual children)
– Individual differences (for example: delayed language development)
– Focus on syntax
• Addresses weaknesses of Mean Length of Utterance (MLU)
– MLU surprisingly useful until age 3, then reaches ceiling (or becomes unreliable)

• IPSyn is very time-consuming to compute

Page 78: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

78

Computing IPSyn (manually)

• Corpus of 100 transcribed utterances
– Consecutive, no repetitions
• Identify 56 specific language structures (IPSyn Items)
– Examples:
  • Presence of auxiliaries or modals
  • Inverted auxiliary in a wh-question
  • Conjoined clauses
  • Fronted or center-embedded subordinate clauses
– Count occurrences (zero, one, two or more)
• Add counts
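Once item occurrences are counted, the final score is simple arithmetic: each of the 56 items contributes 0, 1, or 2 points (capped at two occurrences), which is what bounds the total at 112. A sketch, with illustrative item names:

```python
def ipsyn_score(item_counts):
    """item_counts: dict mapping each IPSyn item to its occurrence count.
    Each item earns 0, 1, or 2 points (two or more occurrences cap at 2);
    with 56 items the maximum total is 112."""
    return sum(min(count, 2) for count in item_counts.values())
```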

Page 79: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

79

Automating IPSyn

• Existing state of manual computation
– Spreadsheets
– Search each sentence for language structures
– Use part-of-speech tagging to narrow down the number of sentences for certain structures
  • For example: Verb + Noun, Determiner + Adjective + Noun
• Automatic computation is possible with accurate GR analysis
– Use GRs to search for IPSyn items

Page 80: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

80

Some IPSyn Items Require Syntactic Analysis for Reliable Recognition (and some don’t)

• Determiner + Adjective + Noun
• Auxiliary verb
• Adverb modifying adjective or nominal
• Subject + Verb + Object
• Sentence with 3 clauses
• Conjoined sentences
• Wh-question with inverted auxiliary/modal/copula
• Relative clauses
• Propositional complements
• Fronted subordinate clauses
• Center-embedded clauses

Page 81: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

81

Automating IPSyn with Grammatical Relation Analyses

• Search for language structures using patterns that involve POS tags and GRs (labeled dependencies)

• Examples

– Wh-embedded clauses: search for wh-words whose head (or transitive head) is a dependent in a GR of types [XC]SUBJ, [XC]PRED, [XC]JCT, [XC]MOD, COMP or XCOMP

– Relative clauses: search for a CMOD where the dependent is to the right of the head
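The relative-clause pattern, for instance, reduces to a one-line test over labeled dependencies. A sketch, with an assumed tuple encoding (index, head index, word, GR label) and a made-up example analysis:

```python
def count_relative_clauses(parse):
    """parse: list of (index, head_index, word, gr_label) dependencies.
    A relative clause is a CMOD whose dependent is to the right of its head."""
    return sum(1 for idx, head, word, gr in parse
               if gr == 'CMOD' and idx > head)
```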

Page 82: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

82

Evaluation Data

• Two sets of transcripts with IPSyn scoring from two different child language research groups

• Set A
– Scored fully manually
– 20 transcripts
– Ages: about 3 yrs.
• Set B
– Scored with CP (Computerized Profiling) first, then manually corrected
– 25 transcripts
– Ages: about 8 yrs.

(Two transcripts in each set were held out for development and debugging)

Page 83: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

83

Evaluation Metrics: Point Difference

• Point difference

– The absolute point difference between the scores provided by our system, and the scores computed manually
– Simple, and shows how close the automatic scores are to the manual scores
– Acceptable range
  • Smaller for older children

Page 84: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

84

Evaluation Metrics: Point-to-Point Accuracy

• Point-to-point accuracy

– Reflects overall reliability over each scoring decision made in the computation of IPSyn scores

– Scoring decisions: presence or absence of language structures in the transcript

Point-to-Point Acc = C(Correct Decisions) / C(Total Decisions)

– Commonly used for assessing inter-rater reliability among human scorers (for IPSyn, about 94%).
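Both metrics are straightforward to compute once the decisions are aligned; a minimal sketch:

```python
def point_difference(auto_score, manual_score):
    """Absolute difference between automatic and manual IPSyn totals."""
    return abs(auto_score - manual_score)

def point_to_point(auto_decisions, manual_decisions):
    """Agreement rate over parallel presence/absence scoring decisions."""
    agree = sum(a == m for a, m in zip(auto_decisions, manual_decisions))
    return agree / len(manual_decisions)
```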

Page 85: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

85

Results

• IPSyn scores from

– Our GR-based system (GR)

– Manual scoring (HUMAN)

– Computerized Profiling (CP)
  • Long, Fey and Channell, 2004

Page 86: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

86

GR-based IPSyn Is Quite Accurate

System       Avg. Point Difference to HUMAN   Point-to-point Reliability (%)
GR (total)   3.3                              92.8
CP (total)   8.3                              85.4
GR (set A)   3.7                              92.5
CP (set A)   6.2                              86.2
GR (set B)   2.9                              93.0
CP (set B)   10.2                             84.8

Page 87: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

87

GR-Based IPSyn Close to Human Scoring

• Automatic scores very reliable

• Validates usefulness of
– GR annotation scheme
– Automatic GR analysis

• Validates analysis over a large set of children of different ages

Page 88: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

88

Outline

• The CHILDES GR scheme

• GR Parsing of CHILDES transcripts

• Combining different strategies

• Automated measurement of syntactic development in child language

• Related work

• Conclusion

Page 89: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

89

Related Work

• GR schemes, GR evaluation:
– Carroll, Briscoe & Sanfilippo, 1998
– Lin, 1998
– Yeh, 2000
– Preiss, 2003

• Rule-based robust parsing
– Heeman & Allen, 2001
– Lavie, 1996
– Rosé & Lavie, 2001

• Parsing
– Carroll & Briscoe, 2002
– Briscoe & Carroll, 2002
– Buchholz, 2002
– Tomita, 1987
– Magerman, 1995
– Ratnaparkhi, 1997
– Collins, 1997
– Charniak, 2000

• Deterministic parsing
– Yamada & Matsumoto, 2003
– Nivre & Scholz, 2004

• Parser combination
– Henderson & Brill, 1999
– Brill & Wu, 1998
– Yeh, 2000
– Sarkar, 2001

• Automatic measurement of grammatical complexity
– Long, Fey & Channell, 2004

Page 90: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

90

Outline

• The CHILDES GR scheme

• GR Parsing of CHILDES transcripts

• Combining different strategies

• Automated measurement of syntactic development in child language

• Related work

• Conclusion

Page 91: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

91

Major Contributions

• An annotation scheme based on GRs for syntactic structure in CHILDES transcripts

• A linear-time classifier-based parser for constituent structures

• The development of rule-based and data-driven approaches to GR analysis
– Precision/recall trade-off using insertions and skipping
– Data-driven GR analysis using existing resources
  • Charniak parser, Penn Treebank
– Parser variety in classifier-based dependency parsing

Page 92: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

92

Major Contributions (2)

• The use of different voting schemes for combining dependency analyses
– Surpasses state-of-the-art in WSJ dependency parsing
– Vastly outperforms individual parsing approaches
• A novel reparsing combination scheme
– Maximum spanning trees, CYK
• An accurate automated tool for measurement of syntactic development in child language
– Validates annotation scheme and quality of GR analyses

Page 93: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

93

Possible Future Directions

• Classifier-based parsing
– Beam search keeping linear time
– Tree classification (Kudo & Matsumoto, 2004)
• Parser combination
– Parser variety, reparsing combination with constituent trees
• Automated measurement of grammatical complexity
– Take precision/recall into account
– A data-driven approach to replace search rules
• Other languages

Page 94: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

94


More on Dependency Voting

• On WSJ data: 93.9% unlabeled accuracy
• On Eve data
– No RB: 91.1%
  • COMP: 50%
– No charn, No RB: 89.1%
  • COMP: 50%, COORD: 84%, ROOT: 95%
– No charn: 90.5%
  • COMP: 67%
– No RL, no MBL: 91.8%

Page 99: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

99

Full GR Results

• XJCT ( 2 / 2): 1.00 1.00 1.00
• OBJ ( 90 / 91): 0.95 0.96 0.95
• NEG ( 26 / 25): 1.00 0.96 0.98
• SUBJ ( 180 / 181): 0.98 0.98 0.98
• INF ( 19 / 19): 1.00 1.00 1.00
• POBJ ( 48 / 51): 0.92 0.98 0.95
• XCOMP ( 23 / 23): 0.88 0.88 0.88
• QUANT ( 4 / 4): 1.00 1.00 1.00
• VOC ( 2 / 2): 1.00 1.00 1.00
• TAG ( 1 / 1): 1.00 1.00 1.00
• CPZR ( 10 / 9): 1.00 0.90 0.95
• PTL ( 6 / 6): 0.83 0.83 0.83
• COORD ( 33 / 33): 0.91 0.91 0.91
• COMP ( 18 / 18): 0.71 0.89 0.80
• AUX ( 74 / 78): 0.94 0.99 0.96
• CJCT ( 6 / 5): 1.00 0.83 0.91
• PRED ( 54 / 55): 0.87 0.89 0.88
• DET ( 45 / 47): 0.96 1.00 0.98
• MOD ( 94 / 89): 0.97 0.91 0.94
• ROOT ( 239 / 238): 0.97 0.96 0.96
• PUNCT ( 286 / 286): 1.00 1.00 1.00
• COM ( 45 / 44): 0.93 0.91 0.92
• ESUBJ ( 2 / 2): 1.00 1.00 1.00
• CMOD ( 3 / 3): 1.00 1.00 1.00
• JCT ( 78 / 84): 0.85 0.91 0.88

Page 100: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

100

Weighted Voting

He eats cake

Parser A (0.4)      Parser B (0.3)      Parser C (0.8)
1 2 He SUBJ         1 2 He SUBJ         1 3 He SUBJ
2 0 eats CMOD       2 0 eats ROOT       2 0 eats ROOT
3 1 cake OBJ        3 1 cake OBJ        3 2 cake OBJ

GOLD                VOTED
1 2 He SUBJ         1 3 He SUBJ   (head 3 wins: 0.8 vs. 0.4 + 0.3 = 0.7)
2 0 eats ROOT       2 0 eats ROOT
3 2 cake OBJ        3 2 cake OBJ
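The voting computation on this slide can be sketched directly (the parse encoding here is illustrative): each parser's output maps a dependent to a proposed (head, label) arc, and each word independently keeps the arc with the most accumulated weight.

```python
def weighted_vote(weighted_parses):
    """weighted_parses: list of (parser_weight, parse), where parse maps
    each dependent index to its proposed (head, label) arc.
    Returns the arc with the highest total weight for each word."""
    tally = {}
    for weight, parse in weighted_parses:
        for dep, arc in parse.items():
            votes = tally.setdefault(dep, {})
            votes[arc] = votes.get(arc, 0.0) + weight
    return {dep: max(votes, key=votes.get) for dep, votes in tally.items()}
```

With per-dependency confidences (as on the next slide), the same computation applies; each arc simply carries its own weight instead of one global weight per parser.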

Page 101: A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts Kenji Sagae Language Technologies Institute Carnegie Mellon.

101

Weighted Voting

He eats cake

Parser A              Parser B              Parser C
1 2 He SUBJ (0.7)     1 2 He SUBJ (0.8)     1 3 He SUBJ (0.6)
2 0 eats CMOD (0.3)   2 0 eats ROOT (0.9)   2 0 eats ROOT (0.7)
3 1 cake OBJ (0.5)    3 1 cake OBJ (0.3)    3 2 cake OBJ (0.9)

GOLD                  VOTED
1 2 He SUBJ           1 2 He SUBJ   (head 2 wins: 0.7 + 0.8 = 1.5 vs. 0.6)
2 0 eats ROOT         2 0 eats ROOT
3 2 cake OBJ          3 2 cake OBJ