Page 1
Multiclass Classification in NLP

Named Entity Recognition
Label people, locations, and organizations in a sentence:
[PER Sam Houston], [born in] [LOC Virginia], [was a member of the] [ORG US Congress].

Decompose into sub-problems:
Sam Houston, born in Virginia... (PER,LOC,ORG,?) PER (1)
Sam Houston, born in Virginia... (PER,LOC,ORG,?) None (0)
Sam Houston, born in Virginia... (PER,LOC,ORG,?) LOC (2)

Many problems in NLP are decomposed this way.
Disambiguation tasks: POS tagging, word-sense disambiguation, verb classification, semantic-role labeling
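The decomposition above can be sketched in a few lines of Python. This is only an illustration: the chunking of the sentence is taken as given, the label ids for None, PER, and LOC follow the slide, and the id 3 for ORG is an assumption, as are the function and field names.

```python
# Label ids for None/PER/LOC follow the slide; ORG = 3 is an assumption.
LABEL_IDS = {"None": 0, "PER": 1, "LOC": 2, "ORG": 3}

def decompose(chunks):
    """Turn one labeled sentence into one multiclass example per chunk.

    `chunks` is a list of (text, label) pairs. Each example keeps the whole
    sentence as context and records which chunk is being classified.
    """
    sentence = " ".join(text for text, _ in chunks)
    examples = []
    for position, (text, label) in enumerate(chunks):
        examples.append({
            "sentence": sentence,      # shared context for every sub-problem
            "target": text,            # the chunk this sub-problem labels
            "position": position,
            "label": LABEL_IDS[label],
        })
    return examples

chunks = [
    ("Sam Houston", "PER"),
    ("born in", "None"),
    ("Virginia", "LOC"),
    ("was a member of the", "None"),
    ("US Congress", "ORG"),
]
examples = decompose(chunks)
for ex in examples:
    print(ex["target"], "->", ex["label"])
```

Each resulting example is an ordinary multiclass instance, so any multiclass learner can be trained on it.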
Complex relationships: x is more red than blue, but not green
Millions of classes: sequence labeling (e.g. POS tagging). LATER
SNoW has an implementation of Constraint Classification for the multi-class case. Try to compare with 1-vs-all.
Experimental issues: when is this version of multi-class better?
Several easy improvements are possible via modifying the loss function.
Page 38
Multi-class Experiments
The picture isn’t so clear for very high-dimensional problems. Why?
Page 39
Summary

OvA
  Learning: independent, f_i(x) > 0 iff y = i
  Evaluation: global, h(x) = argmax_i f_i(x)

Constraint Classification
  Learning: global, find {f_i(x)} s.t. y = argmax_i f_i(x)
  Evaluation: global, h(x) = argmax_i f_i(x)

Learn + Inference
  Learning: independent, f_i(x) > 0 iff “i is a part of y”
  Evaluation: global inference, h(x) = argmax_{y ∈ C(Y)} Σ_{i ∈ y} f_i(x)

Inference Based Training
  Learning: global, find {f_i(x)} s.t. y = argmax_i f_i(x)
  Evaluation: global, h(x) = argmax_i f_i(x)
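To make the contrast between independent (OvA) and global (Constraint Classification) learning concrete, here is a minimal sketch that assumes perceptron-style updates. The slides do not fix a particular learner at this point, so the update rules, function names, and toy data are all illustrative assumptions.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(weights, x):
    """Global evaluation used by both methods: h(x) = argmax_i f_i(x)."""
    scores = [dot(w, x) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def train_ova(data, num_classes, dim, epochs=20):
    """Independent learning: classifier i is trained so f_i(x) > 0 iff y = i."""
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in data:
            for i in range(num_classes):
                target = 1 if y == i else -1
                if target * dot(weights[i], x) <= 0:  # binary mistake of f_i
                    weights[i] = [wi + target * xi
                                  for wi, xi in zip(weights[i], x)]
    return weights

def train_constraint(data, num_classes, dim, epochs=20):
    """Global learning: update only when argmax_i f_i(x) differs from y."""
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in data:
            pred = predict(weights, x)
            if pred != y:
                # Promote the correct class, demote the wrongly predicted one.
                weights[y] = [wi + xi for wi, xi in zip(weights[y], x)]
                weights[pred] = [wi - xi for wi, xi in zip(weights[pred], x)]
    return weights

# Toy, made-up data: three classes, one indicator feature each.
data = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
ova = train_ova(data, num_classes=3, dim=3)
cc = train_constraint(data, num_classes=3, dim=3)
```

Both learners share `predict`; they differ only in what counts as a mistake during training, which is the distinction the summary draws.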
Page 40
Structured Output Learning
Abstract View: Decomposition versus Constraint Classification
More details: Inference with Classifiers
Page 41
Structured Output Learning: Semantic Role Labeling

I left my pearls to my child
A0 : leaver
A1 : thing left
A2 : benefactor

For each verb in a sentence
1. Identify all constituents that fill a semantic role
2. Determine their roles
• Core Arguments, e.g., Agent, Patient or Instrument
• Their adjuncts, e.g., Locative, Temporal or Manner

Y : All possible ways to label the tree
C(Y): All valid ways to label the tree
argmax_{y ∈ C(Y)} g(x,y)
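The difference between Y, C(Y), and the constrained argmax can be illustrated by brute-force enumeration. The sketch below assumes three candidate constituents, made-up local scores, and a single constraint (no core argument label used twice); none of these specifics come from the slides.

```python
from itertools import product

LABELS = ["A0", "A1", "A2", "NONE"]
# Hypothetical candidate constituents for "I left my pearls to my child".
candidates = ["I", "my pearls", "to my child"]

# Hypothetical local scores, indexed as local_score[label][candidate index].
local_score = {
    "A0":   [2.0, 0.1, 0.3],
    "A1":   [0.2, 1.5, 1.6],
    "A2":   [0.1, 0.4, 1.2],
    "NONE": [0.0, 0.0, 0.0],
}

def g(y):
    """Score of a full labeling: sum of the local scores it selects."""
    return sum(local_score[label][n] for n, label in enumerate(y))

def is_valid(y):
    """Assumed constraint: each core argument label appears at most once."""
    core = [label for label in y if label != "NONE"]
    return len(core) == len(set(core))

Y = list(product(LABELS, repeat=len(candidates)))  # all labelings
C_Y = [y for y in Y if is_valid(y)]                # valid labelings only

best_unconstrained = max(Y, key=g)
best_valid = max(C_Y, key=g)
print(len(Y), len(C_Y), best_unconstrained, best_valid)
```

With these scores the unconstrained maximizer repeats A1, so restricting the argmax to C(Y) changes the answer. Enumeration is exponential in the number of candidates and is used here only to keep the sketch short.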
Page 42
Components of Structured Output Learning
Input: X
Output: A collection of variables
  Y = (y_1,...,y_L) ∈ {1,...,K}^L
  Length is example dependent
Constraints on the Output C(Y)
  e.g. non-overlapping, no repeated values...
  partition output to valid and invalid assignments
Representation
  scoring function: g(x,y)
  e.g. linear: g(x,y) = w · Φ(x,y)
Inference
  h(x) = argmax_{valid y} g(x,y)
[Figure: output variables y1, y2, y3 (Y) for the input sentence "I left my pearls to my child" (X)]
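The components listed above (output variables, constraints, a linear scoring function, and inference) can be put together in a small sketch. The feature map Φ chosen here (counts of token/label pairs), the value K = 3, the weights, and the "no repeated values" constraint are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

K = 3  # number of output values; assumed for illustration

def phi(x, y):
    """Joint feature map Φ(x, y): counts of (token, label) pairs (assumed)."""
    return Counter(zip(x, y))

def g(w, x, y):
    """Linear score g(x, y) = w · Φ(x, y), with w stored as a sparse dict."""
    return sum(w.get(feat, 0.0) * count for feat, count in phi(x, y).items())

def no_repeats(y):
    """Example constraint from the slide: no repeated values in the output."""
    return len(set(y)) == len(y)

def h(w, x, valid=no_repeats):
    """Inference: argmax of g over valid assignments in {1..K}^L, L = len(x)."""
    outputs = product(range(1, K + 1), repeat=len(x))
    return max((y for y in outputs if valid(y)), key=lambda y: g(w, x, y))

# Made-up weights over (token, label) features.
w = {("left", 1): 1.0, ("pearls", 1): 0.8, ("pearls", 2): 0.5, ("child", 3): 0.7}

print(h(w, ["left", "pearls", "child"]))  # output length 3
print(h(w, ["left", "pearls"]))           # output length 2: L depends on x
```

Because the output length follows the input, the same `h` handles both calls. In this sketch `h` raises `ValueError` when no valid assignment exists, for example when the input is longer than K under the no-repeats constraint.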
Page 43
Decomposition-based Learning
Many choices for decomposition
  Depends on problem, learning model, computation resources, etc...
Value-based decomposition
  A function for each output value: f_k(x,l), k ∈ {1,...,K}
  e.g. SRL tagging: f_A0(x,node), f_A1(x,node),...
OvA learning
  f_k(x,node) > 0 iff k = y
Page 44
Learning Discriminant Functions: The General Setting
g(x,y) > g(x,y’) ∀ y’ ∈ Y \ y
w · Φ(x,y) > w · Φ(x,y’) ∀ y’ ∈ Y \ y
w · Φ(x,y,y’) = w · (Φ(x,y) - Φ(x,y’)) > 0
P(x,y) = {Φ(x,y,y’)}_{y’ ∈ Y \ y}
P(S) = {P(x,y)}_{(x,y) ∈ S}
Learn unary classifier over P(S) (binary) (+P(S), -P(S))
Used in many works [C02,WW00,CS01,CM03,TGK03]
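The construction of P(S) can be sketched for the plain multiclass case. The block-structured Φ(x, y) below (copy x into the block for class y) and the perceptron used to find w are assumptions chosen for illustration; the slide itself only requires some learner that makes w · z > 0 for every difference vector z in P(S).

```python
def phi(x, y, num_classes):
    """Φ(x, y): copy x into the block for class y, zeros elsewhere (assumed)."""
    out = [0.0] * (len(x) * num_classes)
    out[y * len(x):(y + 1) * len(x)] = x
    return out

def P(x, y, num_classes):
    """P(x, y) = {Φ(x, y) - Φ(x, y') : y' in Y \\ y}."""
    pos = phi(x, y, num_classes)
    return [
        [a - b for a, b in zip(pos, phi(x, y_alt, num_classes))]
        for y_alt in range(num_classes) if y_alt != y
    ]

def train(S, num_classes, epochs=20):
    """Perceptron over P(S): every difference vector z should get w · z > 0."""
    w = [0.0] * (len(S[0][0]) * num_classes)
    for _ in range(epochs):
        for x, y in S:
            for z in P(x, y, num_classes):
                if sum(wi * zi for wi, zi in zip(w, z)) <= 0:
                    w = [wi + zi for wi, zi in zip(w, z)]
    return w

def predict(w, x, num_classes):
    """h(x) = argmax_y w · Φ(x, y)."""
    scores = [sum(wi * fi for wi, fi in zip(w, phi(x, k, num_classes)))
              for k in range(num_classes)]
    return max(range(num_classes), key=scores.__getitem__)

# Toy, made-up training set with three classes in two dimensions.
S = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([-1.0, -1.0], 2)]
w = train(S, num_classes=3)
```

Every element of P(S) is treated as a positive example (its negation as a negative one), so a single binary learner over difference vectors recovers the multiclass decision rule.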
Page 45
Learn a collection of “scoring” functions
  w_A0 · Φ_A0(x,y,n), w_A1 · Φ_A1(x,y,n),...
  score_v(x,y,n) = w_v · Φ_v(x,y,n)
Global score
  g(x,y) = Σ_n score_{y_n}(x,y,n) = Σ_n w_{y_n} · Φ_{y_n}(x,y,n)
Learn locally (LO, L+I)
  for each label variable (node) n, e.g. for label A0:
  g_A0(x,y,n) = w_A0 · Φ_A0(x,y,n) > 0 iff y_n = A0
Discriminant model dictates:
  g(x,y) > g(x,y’), ∀ y’ ∈ C(Y) \ y
  argmax_{y ∈ C(Y)} g(x,y)
Learn Globally (IBT)
  g(x,y) = w · Φ(x,y)
[Figure: Structured Output Learning: Semantic Role Labeling. The sentence "I left my pearls to my child" with local scores such as scoreNONE(3) and scoreA2(13)]
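The global score and the "learn globally" option on this slide can be sketched as a structured-perceptron-style update. Several simplifications are assumed: each node n is represented by a fixed feature vector (so Φ_v does not depend on the rest of y), only three labels are used, the constraint is "no repeated core label", and inference is brute force. The update rule is one common choice, not necessarily the one used in the lecture.

```python
from itertools import product

LABELS = ["NONE", "A0", "A1"]

def score(w, label, feats):
    """score_v(x, y, n) = w_v · Φ_v(x, y, n); feats stands in for Φ at node n."""
    return sum(wi * fi for wi, fi in zip(w[label], feats))

def global_score(w, x, y):
    """g(x, y) = Σ_n score_{y_n}(x, y, n)."""
    return sum(score(w, label, feats) for feats, label in zip(x, y))

def valid(y):
    """Assumed constraint C(Y): no core label is used twice."""
    core = [label for label in y if label != "NONE"]
    return len(core) == len(set(core))

def infer(w, x):
    """argmax over y in C(Y) of g(x, y), by brute-force enumeration."""
    return max((y for y in product(LABELS, repeat=len(x)) if valid(y)),
               key=lambda y: global_score(w, x, y))

def ibt_update(w, x, y_gold):
    """Inference Based Training step: update only if global inference errs."""
    y_pred = infer(w, x)
    if y_pred != tuple(y_gold):
        for feats, gold, pred in zip(x, y_gold, y_pred):
            if gold != pred:
                w[gold] = [wi + fi for wi, fi in zip(w[gold], feats)]
                w[pred] = [wi - fi for wi, fi in zip(w[pred], feats)]
    return w

# Two nodes with made-up feature vectors and their gold labels.
x = [[1.0, 0.0], [0.0, 1.0]]
y_gold = ("A0", "A1")
w = {label: [0.0, 0.0] for label in LABELS}
w = ibt_update(w, x, y_gold)
print(infer(w, x))
```

Local training (LO, L+I) would instead update each `w[label]` from its own binary mistakes at each node and apply `infer` only at evaluation time.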
Page 46
Summary

OvA
  Learning: independent, f_i(x) > 0 iff y = i
  Evaluation: global, h(x) = argmax_i f_i(x)

Constraint Classification
  Learning: global, find {f_i(x)} s.t. y = argmax_i f_i(x)
  Evaluation: global, h(x) = argmax_i f_i(x)

Learn + Inference
  Learning: independent, f_i(x) > 0 iff “i is a part of y”
  Evaluation: global inference, h(x) = Inference {f_i(x)}

Efficient Learning

Inference Based Training
  Learning: global, find {f_i(x)} s.t. Y = Inference {f_i(x)}
  Evaluation: global inference, h(x) = Inference {f_i(x)}