EMNLP’01 19/11/2001 ML: Classical methods from AI – Decision-Tree induction – Exemplar-based Learning – Rule Induction – Transformation-Based Error-Driven Learning


•ML: Classical methods from AI

–Decision-Tree induction

–Exemplar-based Learning

–Rule Induction

–Transformation-Based Error-Driven Learning


TBEDL

Transformation-Based Error-Driven Learning (Brill 92,93,95)


• The learning algorithm is a mistake-driven greedy procedure that iteratively acquires a set of transformation rules

• Firstly, unannotated text is passed through an initial-state annotator

• Then, at each step the algorithm adds the transformation rule that best repairs the current errors
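The greedy loop described above can be sketched in Python. This is a minimal toy, not Brill's implementation: the single rule template (change tag A to B when the previous tag is C), the function names `apply_rule`, `score` and `learn_tbl`, and the tiny example are all assumptions made for illustration.

```python
def apply_rule(rule, tags):
    """Return a new tag list with the rule applied at every matching position."""
    prev_tag, from_tag, to_tag = rule
    return [
        to_tag if i > 0 and tags[i - 1] == prev_tag and tag == from_tag else tag
        for i, tag in enumerate(tags)
    ]


def score(rule, tags, truth):
    """Net number of errors the rule repairs (fixed errors minus new errors)."""
    new_tags = apply_rule(rule, tags)
    errors_before = sum(t != g for t, g in zip(tags, truth))
    errors_after = sum(t != g for t, g in zip(new_tags, truth))
    return errors_before - errors_after


def learn_tbl(truth, initial_tags, max_rules=10):
    """Greedy, mistake-driven acquisition of an ordered rule list."""
    tags = list(initial_tags)
    rules = []
    for _ in range(max_rules):
        # Instantiate the template only at positions that are currently wrong.
        candidates = {
            (tags[i - 1], tags[i], truth[i])
            for i in range(1, len(tags))
            if tags[i] != truth[i]
        }
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):  # sorted for reproducible tie-breaking
            gain = score(rule, tags, truth)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no candidate reduces the error count
            break
        tags = apply_rule(best_rule, tags)
        rules.append(best_rule)
    return rules


# Toy data: the output of a hypothetical initial-state annotator vs. the truth.
truth = ["DT", "NN", "TO", "VB", "TO", "VB", "DT", "NN"]
initial = ["DT", "NN", "TO", "NN", "TO", "VB", "DT", "VB"]
rules = learn_tbl(truth, initial)
```

Each iteration rescans the whole corpus for every candidate rule, which is where the training cost of the method comes from.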


• Concrete rules are acquired by instantiation of a predefined set of template rules:

conjunction_of_conditions ⇒ transformation

• When annotating a new text, all the transformation rules are applied in order of generation
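One possible way to represent a template, its instantiation, and the in-order application of rules is sketched below. The `Rule` class, the encoding of a template as a tuple of position offsets, and the tag `RB` in the second rule are hypothetical choices for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    conditions: tuple  # ((offset, tag), ...); all must hold (a conjunction)
    from_tag: str
    to_tag: str

    def matches(self, tags, i):
        if tags[i] != self.from_tag:
            return False
        for offset, tag in self.conditions:
            j = i + offset
            if not (0 <= j < len(tags)) or tags[j] != tag:
                return False
        return True


def instantiate(template_offsets, tags, truth, i):
    """Fill a template with the tags observed around the error at position i."""
    if any(not 0 <= i + off < len(tags) for off in template_offsets):
        return None  # the template does not fit at this position
    conditions = tuple((off, tags[i + off]) for off in template_offsets)
    return Rule(conditions, tags[i], truth[i])


def annotate(tags, rules):
    """Apply every rule to the whole text, in the order the rules were learned."""
    tags = list(tags)
    for rule in rules:
        hits = [i for i in range(len(tags)) if rule.matches(tags, i)]
        for i in hits:
            tags[i] = rule.to_tag
    return tags


# Template "previous tag is _" instantiated at an error: NN should be VB after TO.
rule_a = instantiate((-1,), ["TO", "NN"], ["TO", "VB"], 1)
rule_b = Rule(((-1, "VB"),), "NN", "RB")  # hypothetical second rule

in_order = annotate(["TO", "NN", "NN"], [rule_a, rule_b])
swapped = annotate(["TO", "NN", "NN"], [rule_b, rule_a])
```

Here `in_order` and `swapped` differ because `rule_b` can only fire after `rule_a` has produced a `VB`, which is why the order of generation has to be preserved at annotation time.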


[Figure: TBEDL training. Components: Unannotated Text, Initial State, Annotated Text, “Truth”, Learner, Rules]




TB(ED)L Applied to POS Tagging (Brill 92,93,94,95)

• Initial_State_Annotator = Most_Frequent Label

• Three types of templates

– Non lexicalized conditions

– Lexicalized patterns

– Morphological conditions for dealing with unknown words
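A most-frequent-label initial annotator can be sketched as follows. The function names and the choice of `NN` as the default tag for words missing from the lexicon are assumptions for this example.

```python
from collections import Counter, defaultdict


def train_most_frequent(tagged_corpus):
    """Map each word to the tag it carries most often in the training corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def initial_state(words, lexicon, default_tag="NN"):
    """Tag each word with its most frequent tag; unknown words get default_tag."""
    return [lexicon.get(word, default_tag) for word in words]


corpus = [("the", "DT"), ("race", "NN"), ("race", "NN"),
          ("race", "VB"), ("to", "TO")]
lexicon = train_most_frequent(corpus)
tags = initial_state(["to", "race", "blorp"], lexicon)
```

The context-free output of this step (for instance `race` tagged `NN` after `to`) is what the learned transformations then repair.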



• Non-lexicalized conditions:

[Slide label: First implementation]


• Non-lexicalized conditions: best rules acquired


• Lexicalized patterns:

– as/IN tall/JJ as/IN

– We do n't eat / We did n't usually drink
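A lexicalized condition tests words rather than tags. The sketch below shows how two such rules could handle the examples above. The rules themselves, the helper names, and the initial tags are assumptions chosen to fit the two examples, not rules quoted from the slides.

```python
def word_at(offset, word):
    """Condition: the word at the given relative position equals `word`."""
    def cond(words, i):
        j = i + offset
        return 0 <= j < len(words) and words[j] == word
    return cond


def word_in_previous(n, word):
    """Condition: `word` occurs among the n words immediately to the left."""
    def cond(words, i):
        return word in words[max(0, i - n):i]
    return cond


def apply_lexicalized(words, tags, rules):
    """Apply (from_tag, to_tag, condition) rules in order over the sentence."""
    tags = list(tags)
    for from_tag, to_tag, cond in rules:
        hits = [i for i in range(len(tags))
                if tags[i] == from_tag and cond(words, i)]
        for i in hits:
            tags[i] = to_tag
    return tags


lexical_rules = [
    ("IN", "RB", word_at(2, "as")),             # first "as" in "as tall as"
    ("VBP", "VB", word_in_previous(2, "n't")),  # verb shortly after "n't"
]

as_tags = apply_lexicalized(["as", "tall", "as"], ["IN", "JJ", "IN"],
                            lexical_rules)
drink_tags = apply_lexicalized(["We", "did", "n't", "usually", "drink"],
                               ["PRP", "VBD", "RB", "RB", "VBP"],
                               lexical_rules)
```

A tag-only condition could not express either rule, since the trigger is a specific word (`as`, `n't`) rather than its part of speech.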



• Morphological conditions for dealing with unknown words:
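As a rough illustration of what a morphological condition looks like, the sketch below guesses a tag for an unknown word from its surface form. The three rules, the default tag `NN`, and the helper names are hypothetical; they only indicate the kind of test (suffix, character) that such templates allow.

```python
def has_suffix(suffix):
    """Condition: the word ends in `suffix` and is longer than the suffix."""
    return lambda word: len(word) > len(suffix) and word.endswith(suffix)


def contains_char(char):
    """Condition: the character appears somewhere in the word."""
    return lambda word: char in word


# Hypothetical ordered rules: (from_tag, to_tag, condition on the word form).
unknown_word_rules = [
    ("NN", "NNS", has_suffix("s")),
    ("NN", "VBG", has_suffix("ing")),
    ("NN", "JJ", contains_char("-")),
]


def tag_unknown(word, rules=unknown_word_rules, default_tag="NN"):
    """Start from the default tag and apply the morphological rules in order."""
    tag = default_tag
    for from_tag, to_tag, cond in rules:
        if tag == from_tag and cond(word):
            tag = to_tag
    return tag


guesses = {w: tag_unknown(w)
           for w in ["widgets", "blorping", "state-of-the-art", "gizmo"]}
```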



• Unknown words: best rules acquired

• Tested on 600 Kw of the annotated Wall Street Journal corpus

– Number of transformation rules: <500

– Accuracy:

  • 97.0% - 97.2% (with no unknown words)

  • The accuracy of an HMM trigram tagger is reached using only 86 transformation rules

  • 96.6% when unknown words are included (82.2% on the unknown words themselves)


TB(ED)L and NLP

• POS Tagging (Brill 92,94a,95; Roche & Schabes 95; Aone & Hausman 96)

• PP-attachment disambiguation (Brill & Resnik, 1994)

• Grammar induction and Parsing (Brill, 1993)

• Context-sensitive Spelling Correction (Mangu & Brill, 1996)

• Word Sense Disambiguation (Dini et al., 1998)

• Dialogue Act Tagging (Samuel et al., 1998a,1998b)

• Semantic Role Labeling (Higgins, 2004; Williams et al., 2004; CoNLL-2004)


TB(ED)L: Main Drawback

• Computational cost: memory and time (especially during training)

• Some proposals

– Ramshaw & Marcus (1994)

– LazyTBL (Samuel 98)

– μ-TBL (Lager 99)

– ICA (Hepple 00)

– FastTBL (Ngai & Florian, 01)


Extensions: LazyTBEDL (Samuel 98)

• Uses Brill’s TB(ED)L algorithm

• Applies a Monte Carlo strategy: rules are randomly sampled from the rule space rather than all possible rules being analyzed exhaustively

• The memory and time costs of the TB(ED)L algorithm are drastically reduced without compromising accuracy on unseen data

• Application to Dialogue Act Tagging – Accuracy results: 75.5% over state-of-the-art systems
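The sampling idea can be illustrated with a simplified rule-selection step. This is only a sketch of the general principle, not Samuel's algorithm: the function name `pick_rule_sampled`, the flat candidate pool, and the `sample_size` parameter are assumptions.

```python
import random


def pick_rule_sampled(candidates, score, sample_size, rng):
    """Score at most `sample_size` randomly drawn candidates; return the best.

    Returns None when no sampled candidate has a positive score.
    """
    pool = list(candidates)
    if len(pool) > sample_size:
        pool = rng.sample(pool, sample_size)  # Monte Carlo step
    scored = [(score(rule), rule) for rule in pool]
    if not scored:
        return None
    best_score, best_rule = max(scored, key=lambda pair: pair[0])
    return best_rule if best_score > 0 else None


# Hypothetical candidate rules with the net error reduction each would give.
gains = {"r1": 3, "r2": 0, "r3": 5, "r4": 1}
calls = []


def toy_score(rule):
    calls.append(rule)  # record how many candidates were actually evaluated
    return gains[rule]


exhaustive_choice = pick_rule_sampled(gains, toy_score, 10, random.Random(0))
```

With a sample size smaller than the pool, the number of score evaluations per iteration is bounded by `sample_size` regardless of how many rules the templates can generate, at the price of possibly missing the single best rule in a given iteration.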


Extensions: FastTBEDL (Ngai & Florian 01)

• Software available at:

http://nlp.cs.jhu.edu/rflorian/fntbl


TB(ED)L: Summary

• Advantages

– General, simple and understandable modeling

– Provides a very compact set of interpretable transformation rules

– High accuracy in many NLP applications


• Drawbacks

– Computational cost: high memory and time requirements, although efficient variants of TBL have been proposed (e.g. FastTBL)

– Sequential application of rules


• Others

– A transformation list is a processor and not a classifier

– A comparison between Decision Trees and Transformation lists can be found in (Brill, 1995)
