Importance of Semantic Representation:
Dataless Classification
Ming-Wei Chang Lev Ratinov Dan Roth Vivek Srikumar
University of Illinois, Urbana-Champaign
Slide 2
Text Categorization
Classify the following sentence:
Syd Millar was the chairman of the International Rugby Board in 2003.
Pick a label:
Class1 vs. Class2
Traditionally, we need annotated data to train a classifier
Slide 3
Text Categorization
Humans don’t seem to need labeled data
Syd Millar was the chairman of the International Rugby Board in 2003.
Pick a label:
Sports vs. Finance
Label names carry a lot of information!
Slide 4
Text Categorization
Do we really always need labeled data?
Slide 5
Contributions
We can often go quite far without annotated data … if we “know” the meaning of text
This works for text categorization … and is consistent across different domains
Slide 6
Outline
Semantic Representation
On-the-fly Classification
Datasets
Exploiting unlabeled data
Robustness to different domains
Slide 7
Outline
Semantic Representation
On-the-fly Classification
Datasets
Exploiting unlabeled data
Robustness to different domains
Slide 8
Semantic Representation
One common representation is Bag of Words
Text is a vector in the space of words
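A minimal sketch of this mapping, assuming a fixed toy vocabulary (the function name, vocabulary, and example values below are illustrative, not from the slides):

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Represent text as a count vector over a fixed word vocabulary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocab = ["rugby", "chairman", "board", "bank", "stock"]   # toy vocabulary
doc = "Syd Millar was the chairman of the International Rugby Board in 2003."
print(bag_of_words(doc, vocab))                           # -> [1, 1, 1, 0, 0]
```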
Slide 9
Semantic Representation
Explicit Semantic Analysis [Gabrilovich & Markovitch, 2006, 2007]
Text is a vector in the space of concepts
Concepts are defined by Wikipedia articles
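A minimal sketch of mapping text into the concept space, assuming a precomputed word-to-concept weight index built from Wikipedia; the tiny `concept_weights` dictionary and its numbers are illustrative stand-ins:

```python
from collections import defaultdict

# Illustrative stand-in for the Wikipedia-derived index:
# word -> {Wikipedia article title: weight of the word in that article}
concept_weights = {
    "monetary": {"Monetary policy": 2.0, "Central bank": 1.5},
    "policy":   {"Monetary policy": 1.5, "Foreign policy": 1.0},
}

def esa_vector(text):
    """Represent text as a weighted vector over Wikipedia concepts."""
    vector = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in concept_weights.get(word, {}).items():
            vector[concept] += weight
    return dict(vector)

print(esa_vector("Monetary Policy"))
# -> {'Monetary policy': 3.5, 'Central bank': 1.5, 'Foreign policy': 1.0}
```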
Slide 10
Explicit Semantic Analysis: Example
Input text: "Monetary Policy"
ESA representation (Wikipedia article titles): International Monetary Fund, Monetary policy, Economic and Monetary Union, Hong Kong Monetary Authority, Monetarism, Central bank
Input text: "Apple iPod"
ESA representation (Wikipedia article titles): iPod mini, iPod photo, iPod nano, Apple Computer, iPod shuffle, iTunes
Slide 11
Semantic Representation
Two semantic representations
Bag of words
ESA
Slide 12
Outline
Semantic Representation
On-the-fly Classification
Datasets
Exploiting unlabeled data
Robustness to different domains
Slide 13
Traditional Text Categorization
Diagram: a labeled corpus of Sports and Finance documents is mapped into a semantic space, and a classifier is trained.
Slide 14
Dataless Classification
Diagram: instead of a labeled corpus, we have only the label names Sports and Finance.
What can we do using just the labels?
Slide 15
But labels are text too!
Slide 16
Dataless Classification
Diagram: the label names (Sports, Finance) and a new unlabeled document are mapped into the same semantic space.
Slide 17
What is Dataless Classification?
Humans don’t need training for classification
Annotated training data not always needed
Look for the meaning of words
Slide 19
On-the-fly Classification
Diagram: the label names (Sports, Finance) and a new unlabeled document in the semantic space; the document is assigned the closest label.
Slide 20
On-the-fly Classification
No training data needed
We know the meaning of label names
Pick the label that is closest in meaning to the document (nearest neighbor)
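A minimal sketch of this nearest-neighbor decision, assuming a `represent` function (e.g., the ESA or bag-of-words sketches above) that maps text to a sparse dictionary of dimension weights; the cosine-similarity scoring and names are illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[dim] for dim, weight in u.items() if dim in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def on_the_fly_classify(document, label_names, represent):
    """Assign the label whose name is closest in meaning to the document."""
    doc_vec = represent(document)
    return max(label_names, key=lambda name: cosine(doc_vec, represent(name)))

# e.g. on_the_fly_classify("Syd Millar chaired the International Rugby Board.",
#                          ["Sports", "Finance"], esa_vector)
```

With bag of words, a label name such as "Sports" rarely shares literal words with a document, which is why the concept-based ESA representation matters for this comparison.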
Slide 21
On-the-fly Classification
Diagram: new label names (Hockey, Baseball) and a new unlabeled document mapped into the semantic space.
Slide 22
On-the-fly Classification
No need to even know the labels beforehand
Compare with traditional classification: annotated training data is needed for each label
Slide 23
Outline
Semantic Representation
On-the-fly Classification
Datasets
Exploiting unlabeled data
Robustness to different domains
Slide 24
Dataset 1: Twenty Newsgroups
Posts to newsgroups
Newsgroups have descriptive names
sci.electronics = Science, Electronics
rec.motorcycles = Motorcycles
Slide 25
Dataset 2: Yahoo Answers
Posts to Yahoo! Answers
Posts are categorized into a two-level hierarchy: 20 top-level categories, 280 categories in total at the second level
e.g., Arts and Humanities → Theater & Acting; Sports → Rugby League
Slide 26
Experiments
20 Newsgroups: 10 binary problems (from [Raina et al., '06])
Religion vs. Politics.guns
Motorcycles vs. MS Windows
Yahoo! Answers: 20 binary problems
Health → Diet & Fitness vs. Health → Allergies
Consumer Electronics → DVRs vs. Pets → Rodents
Slide 27
Results: On-the-fly classification
Dataset      Supervised Baseline   Bag of Words   ESA
Newsgroups   71.7                  65.7           85.3
Yahoo!       84.3                  66.8           88.6
Supervised baseline: Naïve Bayes classifier (uses annotated data, ignores label names)
Bag of Words and ESA: nearest neighbor to the label name (uses labels, no annotated data)
Slide 28
Outline
Semantic Representation
On-the-fly Classification
Datasets
Exploiting unlabeled data
Robustness to different domains
Slide 29
Using Unlabeled Data
Knowing the data collection helps
We can learn specific biases of the dataset
Potential for semi-supervised learning
Slide 30
Bootstrapping
Each label name is a "labeled" document: one "example" in word or concept space
Train an initial classifier (same as the on-the-fly classifier)
Loop: classify all documents with the current classifier; retrain the classifier on the highly confident predictions
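A minimal sketch of this bootstrapping loop, assuming generic `train` and `predict_with_confidence` callables; the interface, threshold, and round count are illustrative, not the authors' exact procedure:

```python
def bootstrap(label_names, unlabeled_docs, represent, train,
              predict_with_confidence, threshold=0.9, rounds=5):
    """Self-train starting from the label names as the only 'labeled' examples."""
    # Each label name is one "labeled" document in word or concept space.
    labeled = [(represent(name), name) for name in label_names]
    classifier = train(labeled)               # same as the on-the-fly classifier
    for _ in range(rounds):
        confident = []
        for doc in unlabeled_docs:
            label, confidence = predict_with_confidence(classifier, represent(doc))
            if confidence >= threshold:
                confident.append((represent(doc), label))
        # Retrain on the label names plus the highly confident predictions.
        classifier = train(labeled + confident)
    return classifier
```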
Slide 31
Co-training
Words and concepts are two independent "views"
Each view is a teacher for the other
[Blum & Mitchell ‘98]
Slide 32
Co-training
Train initial classifiers in the word space and the concept space
Loop: classify documents with the current classifiers; retrain with the highly confident predictions of both classifiers
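A minimal sketch of the co-training loop over the two views, assuming the same classifier interface as in the bootstrapping sketch; names and thresholds are illustrative:

```python
def co_train(label_names, unlabeled_docs, bow_repr, esa_repr, train,
             predict_with_confidence, threshold=0.9, rounds=5):
    """Words and concepts are two views; each view teaches the other."""
    # Initial "training data" in each view: just the label names themselves.
    word_data = [(bow_repr(name), name) for name in label_names]
    concept_data = [(esa_repr(name), name) for name in label_names]
    word_clf, concept_clf = train(word_data), train(concept_data)
    pool = list(unlabeled_docs)
    for _ in range(rounds):
        still_unlabeled = []
        for doc in pool:
            # Ask both views; accept whichever prediction is confident enough.
            for clf, view in ((word_clf, bow_repr), (concept_clf, esa_repr)):
                label, conf = predict_with_confidence(clf, view(doc))
                if conf >= threshold:
                    # A confident view labels the document for BOTH views.
                    word_data.append((bow_repr(doc), label))
                    concept_data.append((esa_repr(doc), label))
                    break
            else:
                still_unlabeled.append(doc)
        pool = still_unlabeled
        word_clf, concept_clf = train(word_data), train(concept_data)
    return word_clf, concept_clf
```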
Slide 33
Using unlabeled data
Three approaches
Bootstrapping with labels using Bag of Words
Bootstrapping with labels using ESA
Co-training
Slide 34
More Results
No annotated data
Co-training using just labels does as well as supervision with 100 examples
Slide 35
Outline
Semantic Representation
On-the-fly Classification
Datasets
Exploiting unlabeled data
Robustness to different domains
Slide 36
Domain Adaptation
Classifiers trained on one domain and tested on another
Performance usually decreases across domains
Slide 37
But the label names are the same
Label names don't depend on the domain
Label names are robust across domains
On-the-fly classifiers are domain independent
Slide 38
Example: Baseball vs. Hockey
Slide 39
Conclusion
Sometimes, label names tell us more about a class than annotated examples
The standard learning practice of treating labels as unique identifiers loses information
The right semantic representation helps. What is the right one?