EMNLP’01 19/11/2001
Dec 27, 2015
• ML: Classical methods from AI
– Decision-Tree induction
– Exemplar-based Learning
– Rule Induction
– Transformation-Based Error-Driven Learning
TBEDL: Transformation-Based Error-Driven Learning (Brill 92,93,95)
• The learning algorithm is a mistake-driven greedy procedure that iteratively acquires a set of transformation rules
• Firstly, unannotated text is passed through an initial-state annotator
• Then, at each step the algorithm adds the transformation rule that best repairs the current errors
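The greedy, mistake-driven loop described in these bullets can be sketched roughly as follows. This is a toy illustration and not Brill's implementation: it assumes a single hypothetical rule shape (change tag A to tag B when the previous tag is C), and the names `apply_rule`, `errors`, `learn` and `max_rules` are invented for the example.

```python
def apply_rule(rule, tags):
    """Change from_tag to to_tag wherever the previous tag equals prev_tag."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Conditions are tested on the tags as they were before this rule fired.
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def errors(tags, truth):
    """Number of positions where the current annotation disagrees with the truth."""
    return sum(t != g for t, g in zip(tags, truth))


def learn(initial_tags, truth, max_rules=10):
    """Greedily add the rule that removes the most errors, until none helps."""
    current = list(initial_tags)
    learned = []
    for _ in range(max_rules):
        # Mistake-driven: candidates are proposed only at currently wrong positions.
        candidates = {
            (current[i], truth[i], current[i - 1])
            for i in range(1, len(current))
            if current[i] != truth[i]
        }
        best, best_gain = None, 0
        for rule in sorted(candidates):
            gain = errors(current, truth) - errors(apply_rule(rule, current), truth)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no candidate reduces the error count
            break
        learned.append(best)
        current = apply_rule(best, current)
    return learned, current
```

For example, starting from the initial tags DT NN TO NN against the truth DT NN TO VB, the loop learns the single rule "NN becomes VB after TO" and then stops, because no errors remain.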
• Concrete rules are acquired by instantiation of a predefined set of template rules:
conjunction_of_conditions ⇒ transformation
• When annotating a new text, all the transformation rules are applied in order of generation
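A minimal sketch of how a template might be instantiated into a concrete rule, and how a learned rule list is then applied in order of generation. The `Rule` class, the `TEMPLATES` list and the tag sequences are hypothetical choices for illustration only; the slides do not specify a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    conditions: tuple  # ((offset, tag), ...): a conjunction, all must hold
    from_tag: str
    to_tag: str


# Hypothetical templates: each lists the relative positions whose tags are tested.
TEMPLATES = [(-1,), (1,), (-1, 1)]


def instantiate(template, tags, i, to_tag):
    """Fill a template with the concrete tags found around position i."""
    if any(not 0 <= i + off < len(tags) for off in template):
        return None  # the template reaches outside the sentence
    conds = tuple((off, tags[i + off]) for off in template)
    return Rule(conds, tags[i], to_tag)


def matches(rule, tags, i):
    """True if the rule's source tag and every condition hold at position i."""
    if tags[i] != rule.from_tag:
        return False
    return all(
        0 <= i + off < len(tags) and tags[i + off] == tag
        for off, tag in rule.conditions
    )


def annotate(tags, rules):
    """Apply every rule to the whole sequence, in order of generation."""
    for rule in rules:
        tags = [rule.to_tag if matches(rule, tags, i) else t
                for i, t in enumerate(tags)]
    return tags
```

Because each rule sees the output of the rules before it, the order matters: a later rule can be triggered only by a tag that an earlier rule introduced.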
[Figure: TBEDL training. Unannotated text is passed through the initial-state annotator to produce annotated text; the learner compares the annotated text with the “truth” and outputs rules.]
TB(ED)L Applied to POS Tagging (Brill 92,93,94,95)
• Initial_State_Annotator = Most_Frequent Label
• Three types of templates
– Non-lexicalized conditions
– Lexicalized patterns
– Morphological conditions for dealing with unknown words
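A most-frequent-label initial-state annotator can be sketched as below. The function names are hypothetical, and the fallback for unknown words (the overall most frequent tag) is an assumption made only to keep the example self-contained; in the approach above, unknown words are handled by the morphological templates.

```python
from collections import Counter, defaultdict


def train_most_frequent(tagged_corpus):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    # Assumption: unseen words receive the overall most frequent tag.
    default = Counter(tag for _, tag in tagged_corpus).most_common(1)[0][0]
    return lexicon, default


def initial_state(words, lexicon, default):
    """Annotate each word with its most frequent tag, ignoring context."""
    return [lexicon.get(w, default) for w in words]
```

The output of `initial_state` is what the transformation rules then correct, so its mistakes (for instance a verb reading of a word that is usually a noun) are exactly the errors that drive learning.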
• Non-lexicalized conditions:
(First implementation)
• Non-lexicalized conditions: best rules acquired
• Lexicalized patterns:
– as/IN tall/JJ as/IN
– We do n’t eat / We did n’t usually drink
• Morphological conditions for dealing with unknown words:
• Unknown words: best rules acquired
• Tested on 600 Kw of the Wall Street Journal annotated corpus
– Number of transformation rules: <500
– Accuracy:
• 97.0% - 97.2% (with no unknown words)
• The accuracy of an HMM trigram tagger is achieved using only 86 transformation rules
• 96.6% considering unknown words (82.2% on unknown words)
TB(ED)L and NLP
• POS Tagging (Brill 92,94a,95; Roche & Schabes 95; Aone & Hausman 96)
• PP-attachment disambiguation (Brill & Resnik, 1994)
• Grammar induction and Parsing (Brill, 1993)
• Context-sensitive Spelling Correction (Mangu & Brill, 1996)
• Word Sense Disambiguation (Dini et al., 1998)
• Dialogue Act Tagging (Samuel et al., 1998a,1998b)
• Semantic Role Labeling (Higgins, 2004; Williams et al., 2004; CoNLL-2004)
TB(ED)L: Main Drawback
• Computational cost
– Memory & Time (especially in training)
• Some proposals
– Ramshaw & Marcus (1994)
– LazyTBL (Samuel 98)
– μ-TBL (Lager 99)
– ICA (Hepple 00)
– FastTBL (Ngai & Florian, 01)
Extensions: LazyTBEDL (Samuel 98)
• Uses Brill’s TB(ED)L algorithm
• Applies a Monte Carlo strategy that randomly samples from the space of rules, rather than exhaustively analyzing all possible rules
• The memory and time costs of the TB(ED)L algorithm are drastically reduced without compromising accuracy on unseen data
• Application to Dialogue Act Tagging – Accuracy results: 75.5% over state-of-the-art systems
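The sampling idea can be illustrated with a small sketch: instead of scoring every candidate rule, only a random subset of at most `k` candidates is scored. This is a simplified illustration of the Monte Carlo idea and not the exact procedure of Samuel (98); `sample_candidates`, `best_rule`, `k`, `score` and `seed` are hypothetical names, and the candidate set is assumed to be non-empty.

```python
import random


def sample_candidates(candidates, k, rng):
    """Return at most k candidate rules, drawn uniformly without replacement."""
    pool = sorted(candidates)  # fixed order so a seeded run is reproducible
    if len(pool) <= k:
        return pool
    return rng.sample(pool, k)


def best_rule(candidates, score, k, seed=0):
    """Score only a random sample of the candidates and keep the best one."""
    rng = random.Random(seed)
    sampled = sample_candidates(candidates, k, rng)
    return max(sampled, key=score)
```

The cost of each learning step then depends on `k` rather than on the size of the full rule space. The sampled search may miss the single best rule at a given step, and later steps can compensate by picking up rules that were missed earlier.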
Extensions: FastTBEDL (Ngai & Florian 01)
• Software available at:
http://nlp.cs.jhu.edu/rflorian/fntbl
TB(ED)L: Summary
• Advantages
– General, simple and understandable modeling
– Provides a very compact set of interpretable transformation rules
– High accuracy in many NLP applications
• Drawbacks
– Computational cost: high memory and time requirements. However, some efficient variants of TBL have been proposed (e.g. FastTBL)
– Sequential application of rules
• Others
– A transformation list is a processor and not a classifier
– A comparison between Decision Trees and Transformation lists can be found in (Brill, 1995)