Page 1

Transformation-based error-driven learning (TBL)

LING 572

Fei Xia

1/19/06

Page 2

Outline

• Basic concepts and properties

• Relation between DT, DL, and TBL

• Case study

Page 3

Basic concepts and properties

Page 4

TBL overview

• Introduced by Eric Brill (1992)

• Intuition:
– Start with some simple solution to the problem
– Then apply a sequence of transformations

• Applications:
– Classification problems
– Other kinds of problems: e.g., parsing

Page 5

TBL flowchart

Page 6

Transformations

• A transformation has two components:
– A trigger environment: e.g., the previous tag is DT
– A rewrite rule: change the current tag from MD to N

If (prev_tag == DT) then MD → N

• Similar to a rule in a decision tree, but the rewrite rule can be complicated (e.g., changing a parse tree)

A transformation list is therefore a processor, and not (just) a classifier.
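A minimal sketch of this idea in Python (not Brill's implementation; the class and function names are invented for illustration): a transformation pairs a rewrite rule with a trigger predicate over the context, and applying it rewrites a tag sequence.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Transformation:
    from_tag: str                              # rewrite rule: change this tag ...
    to_tag: str                                # ... to this one
    trigger: Callable[[List[str], int], bool]  # trigger environment over the context

def apply(t: Transformation, tags: List[str]) -> List[str]:
    """Apply one transformation left to right, returning a new tag sequence."""
    out = list(tags)
    for i in range(len(out)):
        if out[i] == t.from_tag and t.trigger(out, i):
            out[i] = t.to_tag
    return out

# The example above: if the previous tag is DT, change MD to N.
rule = Transformation("MD", "N", lambda tags, i: i > 0 and tags[i - 1] == "DT")
print(apply(rule, ["DT", "MD", "VB"]))   # ['DT', 'N', 'VB']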

Page 7

Learning transformations

1. Initialize: label each example in the training data with an initial classifier (the initial state-annotator).

2. Consider all the possible transformations, and choose the one with the highest score.

3. Append it to the transformation list and apply it to the training corpus to obtain a “new” corpus.

4. Repeat steps 2-3.

Steps 2-3 can be expensive; there are various methods that try to address this.
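A hedged sketch of the greedy loop in steps 1-4, with naive rescoring of every candidate at each iteration (the function names and the toy example are invented; real systems use the faster methods alluded to above):

def errors(predicted, truth):
    """Number of labels that disagree with the truth."""
    return sum(p != t for p, t in zip(predicted, truth))

def learn_tbl(initial_labels, truth, candidates, max_rules=10):
    """candidates: functions mapping a label list to a rewritten label list."""
    current = list(initial_labels)            # step 1: initial annotation
    learned = []
    for _ in range(max_rules):
        base = errors(current, truth)
        best, best_gain = None, 0
        for t in candidates:                  # step 2: score every candidate
            gain = base - errors(t(current), truth)
            if gain > best_gain:
                best, best_gain = t, gain
        if best is None:                      # stop when nothing reduces error
            break
        learned.append(best)                  # step 3: append to the list ...
        current = best(current)               # ... and re-annotate the corpus
    return learned

# Toy usage: one candidate that relabels even positions as B.
truth   = ["B", "A", "B", "A", "B", "A"]
initial = ["A"] * 6
flip_even = lambda labels: ["B" if i % 2 == 0 else x for i, x in enumerate(labels)]
print(len(learn_tbl(initial, truth, [flip_even])))   # 1 transformation learned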

Page 8

Using TBL

• The initial state-annotator

• The space of allowable transformations:
– Rewrite rules
– Triggering environments

• The objective function: minimize error rate directly
– for comparing the corpus to the truth
– for choosing a transformation

Page 9

Using TBL (cont)

• Two more parameters:
– Whether the effect of a transformation is visible to following transformations
– If so, what's the order in which transformations are applied to a corpus?
• left-to-right
• right-to-left

Page 10

The order matters

• Transformation: If prev_label = A, then change the cur_label from A to B.

• Input: A A A A A A

• Output:
– Not immediate results: A B B B B B
– Immediate results, left-to-right: A B A B A B
– Immediate results, right-to-left: A B B B B B
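The three outputs above can be reproduced with a small sketch (function names are illustrative); the only difference between the variants is whether a change at one position is visible when the trigger is evaluated at later positions, and in which order positions are visited.

def apply_not_immediate(labels):
    """Triggers are evaluated on the original input only."""
    return ["B" if i > 0 and labels[i - 1] == "A" and x == "A" else x
            for i, x in enumerate(labels)]

def apply_immediate(labels, order):
    """Changes are visible to later positions; order is 'ltr' or 'rtl'."""
    out = list(labels)
    positions = range(len(out)) if order == "ltr" else range(len(out) - 1, -1, -1)
    for i in positions:
        if i > 0 and out[i - 1] == "A" and out[i] == "A":
            out[i] = "B"
    return out

seq = ["A"] * 6
print(apply_not_immediate(seq))      # A B B B B B
print(apply_immediate(seq, "ltr"))   # A B A B A B
print(apply_immediate(seq, "rtl"))   # A B B B B B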

Page 11

Relation between DT, DL, and TBL

Page 12

DT and TBL

DT is a subset of TBL

(Proof)

When depth(DT) = 1, the tree "if X then A else B" corresponds to the transformation list:

1. Label with S
2. If X then S → A
3. S → B
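A quick check of this base case (a sketch; the function names are invented), showing that the three-step list assigns the same label as the depth-1 tree for both outcomes of the test X:

def depth1_tree(x_holds):
    return "A" if x_holds else "B"

def tbl_equivalent(x_holds):
    label = "S"                     # 1. Label with S
    if x_holds and label == "S":    # 2. If X then S -> A
        label = "A"
    if label == "S":                # 3. S -> B
        label = "B"
    return label

assert all(depth1_tree(x) == tbl_equivalent(x) for x in (True, False))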

Page 13

DT is a subset of TBL

Induction step. For depth n, suppose subtree T1 has an equivalent transformation list L1 ("Label with S'" followed by the remaining rules L1'), and subtree T2 has an equivalent list L2 ("Label with S''" followed by L2').

For depth n+1, with root test X and subtrees T1 and T2, an equivalent list is:

Label with S
If X then S → S'
S → S''
L1'
L2'

Page 14

DT is a subset of TBL

Label with S
If X then S → S'
S → S''
L1' (renaming X with X')
L2' (renaming X with X'')
X' → X
X'' → X

(The renaming keeps the labels assigned by L1' from being rewritten by L2', and vice versa; the last two transformations map the renamed labels back.)

Page 15

DT is a proper subset of TBL

• There exists a problem that can be solved by TBL but not by a DT, for a fixed set of primitive queries.

• Ex: Given a sequence of characters
– Classify a char based on its position
• If pos % 4 == 0 then "yes" else "no"
– Input attributes available: previous two chars

Page 16

• Transformation list:
– Label with S: A/S A/S A/S A/S A/S A/S A/S
– If there is no previous character, then S → F: A/F A/S A/S A/S A/S A/S A/S
– If the char two to the left is labeled with F, then S → F: A/F A/S A/F A/S A/F A/S A/F
– If the char two to the left is labeled with F, then F → S: A/F A/S A/S A/S A/F A/S A/S
– F → yes
– S → no
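A sketch of the list above (the function name is invented), applied with immediate, left-to-right results so that "two to the left" inspects labels that have already been rewritten; positions where pos % 4 == 0 come out labeled "yes":

def label_positions(n):
    labels = ["S"] * n                                 # Label with S
    for i in range(n):                                 # no previous char: S -> F
        if i == 0 and labels[i] == "S":
            labels[i] = "F"
    for i in range(n):                                 # two to the left is F: S -> F
        if i >= 2 and labels[i - 2] == "F" and labels[i] == "S":
            labels[i] = "F"
    for i in range(n):                                 # two to the left is F: F -> S
        if i >= 2 and labels[i - 2] == "F" and labels[i] == "F":
            labels[i] = "S"
    return ["yes" if x == "F" else "no" for x in labels]

print(label_positions(7))   # ['yes', 'no', 'no', 'no', 'yes', 'no', 'no']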

Page 17

DT and TBL

• TBL is more powerful than DT

• The extra power of TBL comes from:
– Transformations are applied in sequence
– Results of previous transformations are visible to following transformations

Page 18

DL and TBL

• DL is a proper subset of TBL.

• In two-class TBL:
– (if q then y' → y) is equivalent to (if q then y)
– If multiple transformations apply to an example, only the last one matters

Page 19

Two-class TBL = DL?

• Two-class TBL → DL:
– Replace "if q then y' → y" with "if q then y"
– Reverse the rule order

• DL → two-class TBL:
– Replace "if q then y" with "if q then y' → y"
– Reverse the rule order

• The equivalence does not hold for "dynamic" problems:
– Dynamic problem: the answers to questions are not static
– Ex: in POS tagging, when the tag of a word is changed, it changes the answers to questions for nearby words
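For the static case, a small sketch of the two-class TBL → DL direction (the helper names and toy questions are invented): rewrite each "if q then y' → y" as "if q then y", reverse the order, and use the initial label as the decision list's default. The last transformation that fires in the TBL list corresponds to the first rule that fires in the reversed decision list.

def tbl_label(example, initial, transformations):
    """Apply every matching transformation in order; the last one wins."""
    label = initial
    for q, src, dst in transformations:
        if q(example) and label == src:
            label = dst
    return label

def dl_label(example, default, rules):
    """Return the class of the first rule whose question is true."""
    for q, y in rules:
        if q(example):
            return y
    return default

q1 = lambda x: x["f1"]
q2 = lambda x: x["f2"]

tbl = [(q1, "no", "yes"), (q2, "yes", "no")]   # two-class TBL, initial label "no"
dl  = [(q2, "no"), (q1, "yes")]                # same rules, reversed, as a decision list

for ex in ({"f1": a, "f2": b} for a in (0, 1) for b in (0, 1)):
    assert tbl_label(ex, "no", tbl) == dl_label(ex, "no", dl)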

Page 20

DT, DL, and TBL (summary)

• k-DT is a proper subset of k-DL.
• DL is a proper subset of TBL.

• The extra power of TBL comes from:
– Transformations are applied in sequence
– Results of previous transformations are visible to following transformations

• TBL transforms training data. It does not split training data.

• TBL is a processor, not just a classifier

Page 21

Case study

Page 22

TBL for POS tagging

• The initial state-annotator: most common tag for a word.

• The space of allowable transformations:
– Rewrite rules: change cur_tag from X to Y
– Triggering environments (feature types): unlexicalized or lexicalized

Page 23

Unlexicalized features

• t-1 is z

• t-1 or t-2 is z

• t-1 or t-2 or t-3 is z

• t-1 is z and t+1 is w

• …

Page 24

Lexicalized features

• w0 is w.

• w-1 is w

• w-1 or w-2 is w

• t-1 is z and w0 is w.

• …
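Each template above is a trigger that gets instantiated for concrete tags z and words w; a hedged sketch (the helper names, the words, and the Penn Treebank-style tags are only for illustration):

def unlexicalized_trigger(z):
    # "t-1 is z"
    return lambda words, tags, i: i > 0 and tags[i - 1] == z

def lexicalized_trigger(z, w):
    # "t-1 is z and w0 is w"
    return lambda words, tags, i: i > 0 and tags[i - 1] == z and words[i] == w

words = ["The", "can", "rusted"]
tags  = ["DT", "MD", "VBD"]        # initial annotation: most common tag per word

trig = lexicalized_trigger("DT", "can")
if trig(words, tags, 1):           # rewrite rule: change the current tag from MD to NN
    tags[1] = "NN"
print(tags)                        # ['DT', 'NN', 'VBD']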

Page 25

TBL for POS tagging (cont)

• The objective function: tagging accuracy
– for comparing the corpus to the truth
– for choosing a transformation: choose the one that results in the greatest error reduction

• The order of applying transformations: left-to-right.

• The results of applying transformations are not visible to other transformations.

Page 26

Learned transformations

Page 27

Experiments

Page 28

Uncovered issues

• Efficient learning algorithms

• Probabilistic TBL:
– Top-N hypothesis
– Confidence measure

Page 29

TBL Summary

• TBL is more powerful than DL and DT:
– Transformations are applied in sequence
– It can handle dynamic problems well

• TBL is more than a classifier:
– Classification problems: POS tagging
– Other problems: e.g., parsing

• TBL performs well because it minimizes errors directly.

• Learning can be expensive → various methods address this