Page 1

Transformation-based error-driven learning (TBL)

LING 572

Fei Xia

1/19/06

Page 2

Outline

• Basic concepts and properties

• Relation between DT, DL, and TBL

• Case study

Page 3

Basic concepts and properties

Page 4

TBL overview

• Introduced by Eric Brill (1992)

• Intuition:
– Start with some simple solution to the problem
– Then apply a sequence of transformations

• Applications:
– Classification problems
– Other kinds of problems: e.g., parsing

Page 5

TBL flowchart

Page 6

Transformations

• A transformation has two components:
– A trigger environment: e.g., the previous tag is DT
– A rewrite rule: change the current tag from MD to N

If (prev_tag == DT) then MD → N

• Similar to a rule in a decision tree, but the rewrite rule can be complicated (e.g., changing a parse tree)

A transformation list is therefore a processor, and not (just) a classifier.
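A minimal sketch of this idea in Python (not Brill's implementation; the class and function names are invented for illustration): a transformation pairs a rewrite rule with a trigger predicate over the context, and applying it rewrites a tag sequence.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Transformation:
    from_tag: str                              # rewrite rule: change this tag ...
    to_tag: str                                # ... to this one
    trigger: Callable[[List[str], int], bool]  # trigger environment over the context

def apply(t: Transformation, tags: List[str]) -> List[str]:
    """Apply one transformation left to right, returning a new tag sequence."""
    out = list(tags)
    for i in range(len(out)):
        if out[i] == t.from_tag and t.trigger(out, i):
            out[i] = t.to_tag
    return out

# The example above: if the previous tag is DT, change MD to N.
rule = Transformation("MD", "N", lambda tags, i: i > 0 and tags[i - 1] == "DT")
print(apply(rule, ["DT", "MD", "VB"]))   # ['DT', 'N', 'VB']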

Page 7

Learning transformations

1. Initialize: label each example in the training data with an initial classifier (the initial state-annotator).

2. Consider all the possible transformations, and choose the one with the highest score.

3. Append it to the transformation list and apply it to the training corpus to obtain a “new” corpus.

4. Repeat steps 2-3.

Steps 2-3 can be expensive; there are various methods that try to address this.
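A hedged sketch of the greedy loop in steps 1-4, with naive rescoring of every candidate at each iteration (the function names and the toy example are invented; real systems use the faster methods alluded to above):

def errors(predicted, truth):
    """Number of labels that disagree with the truth."""
    return sum(p != t for p, t in zip(predicted, truth))

def learn_tbl(initial_labels, truth, candidates, max_rules=10):
    """candidates: functions mapping a label list to a rewritten label list."""
    current = list(initial_labels)            # step 1: initial annotation
    learned = []
    for _ in range(max_rules):
        base = errors(current, truth)
        best, best_gain = None, 0
        for t in candidates:                  # step 2: score every candidate
            gain = base - errors(t(current), truth)
            if gain > best_gain:
                best, best_gain = t, gain
        if best is None:                      # stop when nothing reduces error
            break
        learned.append(best)                  # step 3: append to the list ...
        current = best(current)               # ... and re-annotate the corpus
    return learned

# Toy usage: one candidate that relabels even positions as B.
truth   = ["B", "A", "B", "A", "B", "A"]
initial = ["A"] * 6
flip_even = lambda labels: ["B" if i % 2 == 0 else x for i, x in enumerate(labels)]
print(len(learn_tbl(initial, truth, [flip_even])))   # 1 transformation learned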

Page 8

Using TBL

• The initial state-annotator

• The space of allowable transformations:
– Rewrite rules
– Triggering environments

• The objective function: minimize error rate directly
– for comparing the corpus to the truth
– for choosing a transformation

Page 9

Using TBL (cont)

• Two more parameters:
– Whether the effect of a transformation is visible to following transformations
– If so, what's the order in which transformations are applied to a corpus?
• left-to-right
• right-to-left

Page 10

The order matters

• Transformation: If prev_label = A, then change the cur_label from A to B.

• Input: A A A A A A

• Output:
– Not immediate results: A B B B B B
– Immediate results, left-to-right: A B A B A B
– Immediate results, right-to-left: A B B B B B
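The three outputs above can be reproduced with a small sketch (function names are illustrative); the only difference between the variants is whether a change at one position is visible when the trigger is evaluated at later positions, and in which order positions are visited.

def apply_not_immediate(labels):
    """Triggers are evaluated on the original input only."""
    return ["B" if i > 0 and labels[i - 1] == "A" and x == "A" else x
            for i, x in enumerate(labels)]

def apply_immediate(labels, order):
    """Changes are visible to later positions; order is 'ltr' or 'rtl'."""
    out = list(labels)
    positions = range(len(out)) if order == "ltr" else range(len(out) - 1, -1, -1)
    for i in positions:
        if i > 0 and out[i - 1] == "A" and out[i] == "A":
            out[i] = "B"
    return out

seq = ["A"] * 6
print(apply_not_immediate(seq))      # A B B B B B
print(apply_immediate(seq, "ltr"))   # A B A B A B
print(apply_immediate(seq, "rtl"))   # A B B B B B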

Page 11

Relation between DT, DL, and TBL

Page 12

DT and TBL

DT is a subset of TBL

(Proof)

When depth(DT) = 1, the tree "if X then A else B" corresponds to the transformation list:

1. Label with S
2. If X then S → A
3. S → B
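A quick check of this base case (a sketch; the function names are invented), showing that the three-step list assigns the same label as the depth-1 tree for both outcomes of the test X:

def depth1_tree(x_holds):
    return "A" if x_holds else "B"

def tbl_equivalent(x_holds):
    label = "S"                     # 1. Label with S
    if x_holds and label == "S":    # 2. If X then S -> A
        label = "A"
    if label == "S":                # 3. S -> B
        label = "B"
    return label

assert all(depth1_tree(x) == tbl_equivalent(x) for x in (True, False))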

Page 13

DT is a subset of TBL

Induction step. For depth n, suppose subtree T1 has an equivalent transformation list L1 ("Label with S'" followed by the remaining rules L1'), and subtree T2 has an equivalent list L2 ("Label with S''" followed by L2').

For depth n+1, with root test X and subtrees T1 and T2, an equivalent list is:

Label with S
If X then S → S'
S → S''
L1'
L2'

Page 14

DT is a subset of TBL

Label with S
If X then S → S'
S → S''
L1' (renaming X with X')
L2' (renaming X with X'')
X' → X
X'' → X

(The renaming keeps the labels assigned by L1' from being rewritten by L2', and vice versa; the last two transformations map the renamed labels back.)

Page 15

DT is a proper subset of TBL

• There exists a problem that can be solved by TBL but not by a DT, for a fixed set of primitive queries.

• Ex: Given a sequence of characters
– Classify a char based on its position
• If pos % 4 == 0 then "yes" else "no"
– Input attributes available: previous two chars

Page 16

• Transformation list:
– Label with S: A/S A/S A/S A/S A/S A/S A/S
– If there is no previous character, then S → F: A/F A/S A/S A/S A/S A/S A/S
– If the char two to the left is labeled with F, then S → F: A/F A/S A/F A/S A/F A/S A/F
– If the char two to the left is labeled with F, then F → S: A/F A/S A/S A/S A/F A/S A/S
– F → yes
– S → no
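A sketch of the list above (the function name is invented), applied with immediate, left-to-right results so that "two to the left" inspects labels that have already been rewritten; positions where pos % 4 == 0 come out labeled "yes":

def label_positions(n):
    labels = ["S"] * n                                 # Label with S
    for i in range(n):                                 # no previous char: S -> F
        if i == 0 and labels[i] == "S":
            labels[i] = "F"
    for i in range(n):                                 # two to the left is F: S -> F
        if i >= 2 and labels[i - 2] == "F" and labels[i] == "S":
            labels[i] = "F"
    for i in range(n):                                 # two to the left is F: F -> S
        if i >= 2 and labels[i - 2] == "F" and labels[i] == "F":
            labels[i] = "S"
    return ["yes" if x == "F" else "no" for x in labels]

print(label_positions(7))   # ['yes', 'no', 'no', 'no', 'yes', 'no', 'no']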

Page 17

DT and TBL

• TBL is more powerful than DT

• The extra power of TBL comes from:
– Transformations are applied in sequence
– Results of previous transformations are visible to following transformations

Page 18

DL and TBL

• DL is a proper subset of TBL.

• In two-class TBL:
– (if q then y' → y) is equivalent to (if q then y)
– If multiple transformations apply to an example, only the last one matters

Page 19

Two-class TBL = DL?

• Two-class TBL → DL:
– Replace "if q then y' → y" with "if q then y"
– Reverse the rule order

• DL → two-class TBL:
– Replace "if q then y" with "if q then y' → y"
– Reverse the rule order

• The equivalence does not hold for "dynamic" problems:
– Dynamic problem: the answers to questions are not static
– Ex: in POS tagging, when the tag of a word is changed, it changes the answers to questions for nearby words
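For the static case, a small sketch of the two-class TBL → DL direction (the helper names and toy questions are invented): rewrite each "if q then y' → y" as "if q then y", reverse the order, and use the initial label as the decision list's default. The last transformation that fires in the TBL list corresponds to the first rule that fires in the reversed decision list.

def tbl_label(example, initial, transformations):
    """Apply every matching transformation in order; the last one wins."""
    label = initial
    for q, src, dst in transformations:
        if q(example) and label == src:
            label = dst
    return label

def dl_label(example, default, rules):
    """Return the class of the first rule whose question is true."""
    for q, y in rules:
        if q(example):
            return y
    return default

q1 = lambda x: x["f1"]
q2 = lambda x: x["f2"]

tbl = [(q1, "no", "yes"), (q2, "yes", "no")]   # two-class TBL, initial label "no"
dl  = [(q2, "no"), (q1, "yes")]                # same rules, reversed, as a decision list

for ex in ({"f1": a, "f2": b} for a in (0, 1) for b in (0, 1)):
    assert tbl_label(ex, "no", tbl) == dl_label(ex, "no", dl)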

Page 20

DT, DL, and TBL (summary)

• k-DT is a proper subset of k-DL.
• DL is a proper subset of TBL.

• The extra power of TBL comes from:
– Transformations are applied in sequence
– Results of previous transformations are visible to following transformations

• TBL transforms training data. It does not split training data.

• TBL is a processor, not just a classifier

Page 21

Case study

Page 22

TBL for POS tagging

• The initial state-annotator: most common tag for a word.

• The space of allowable transformations:
– Rewrite rules: change cur_tag from X to Y
– Triggering environments (feature types): unlexicalized or lexicalized

Page 23

Unlexicalized features

• t-1 is z

• t-1 or t-2 is z

• t-1 or t-2 or t-3 is z

• t-1 is z and t+1 is w

• …

Page 24

Lexicalized features

• w0 is w.

• w-1 is w

• w-1 or w-2 is w

• t-1 is z and w0 is w.

• …
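Each template above is a trigger that gets instantiated for concrete tags z and words w; a hedged sketch (the helper names, the words, and the Penn Treebank-style tags are only for illustration):

def unlexicalized_trigger(z):
    # "t-1 is z"
    return lambda words, tags, i: i > 0 and tags[i - 1] == z

def lexicalized_trigger(z, w):
    # "t-1 is z and w0 is w"
    return lambda words, tags, i: i > 0 and tags[i - 1] == z and words[i] == w

words = ["The", "can", "rusted"]
tags  = ["DT", "MD", "VBD"]        # initial annotation: most common tag per word

trig = lexicalized_trigger("DT", "can")
if trig(words, tags, 1):           # rewrite rule: change the current tag from MD to NN
    tags[1] = "NN"
print(tags)                        # ['DT', 'NN', 'VBD']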

Page 25

TBL for POS tagging (cont)

• The objective function: tagging accuracy
– for comparing the corpus to the truth
– for choosing a transformation: choose the one that results in the greatest error reduction

• The order of applying transformations: left-to-right.

• The results of applying transformations are not visible to other transformations.

Page 26

Learned transformations

Page 27

Experiments

Page 28

Uncovered issues

• Efficient learning algorithms

• Probabilistic TBL:
– Top-N hypothesis
– Confidence measure

Page 29

TBL Summary

• TBL is more powerful than DL and DT:
– Transformations are applied in sequence
– It can handle dynamic problems well

• TBL is more than a classifier:
– Classification problems: POS tagging
– Other problems: e.g., parsing

• TBL performs well because it minimizes errors directly.

• Learning can be expensive → various methods address this