Coupling Semi-Supervised Learning of Categories and Relations
Andrew Carlson, Justin Betteridge, Estevam R. Hruschka Jr., and Tom M. Mitchell
Carnegie Mellon University
9/18/2012
CS 652, Peter Lindes
Feb 23, 2016
The Problem
“We present an approach to semi-supervised learning that yields more accurate results by coupling the training of many information extractors.”
• Predefined Categories
  – Unary predicates (instances are noun phrases)
  – Mutually exclusive relationships
  – Some subset relationships
  – Flag: proper nouns, common nouns, or both
  – 10-20 seed instances
  – 5 seed patterns (automatically derived - Hearst, 1992)
• Predefined Relations
  – Binary predicates (an instance is a pair of noun phrases)
  – Mutually exclusive relationships
  – 10-20 seed instances
  – No seed patterns
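
The predicate inputs listed above could be represented roughly as follows. This is a minimal sketch, not the authors' data format: the class names, field names, and the example predicates, seeds, and pattern are all hypothetical, and the seed lists are shortened (the slides call for 10-20 seeds per predicate).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set, Tuple


@dataclass
class Category:
    """A unary predicate; its instances are noun phrases."""
    name: str
    seed_instances: Set[str]
    seed_patterns: Set[str]                 # Hearst-style textual patterns
    noun_type: str = "both"                 # "proper", "common", or "both"
    mutually_exclusive_with: FrozenSet[str] = frozenset()
    subset_of: Optional[str] = None         # optional parent category


@dataclass
class Relation:
    """A binary predicate; an instance is a pair of noun phrases."""
    name: str
    seed_instances: Set[Tuple[str, str]]
    mutually_exclusive_with: FrozenSet[str] = frozenset()
    # Relations start with seed instances only, no seed patterns.


# Hypothetical example predicates (illustrative values only).
city = Category(
    name="city",
    seed_instances={"Pittsburgh", "Boston"},
    seed_patterns={"cities such as X"},
    noun_type="proper",
    mutually_exclusive_with=frozenset({"company"}),
)

headquartered_in = Relation(
    name="headquartered_in",
    seed_instances={("Google", "Mountain View")},
)
```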
The Predicates
• Taken from “a 200-million page web crawl”
• Filtered for English “using a stop word ratio threshold”
• Filtered out web spam and adult content “using a ‘bad word’ list”
• Segmented, tokenized, and tagged
• Noisy sentences filtered out
• 514-million sentences used for experiment
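
The English filter mentioned above can be illustrated with a small sketch. The slides do not give the stop word list or the threshold, so `STOP_WORDS`, the `0.2` threshold, and the function names below are assumptions chosen only to show the idea: text with too few common English function words is treated as non-English.

```python
# Assumed, tiny stop word list; a real filter would use a much larger one.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "that", "it", "for"}


def stop_word_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that are stop words."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in STOP_WORDS for token in tokens) / len(tokens)


def looks_english(text: str, threshold: float = 0.2) -> bool:
    """Keep text whose stop word ratio meets the (assumed) threshold."""
    return stop_word_ratio(text) >= threshold


print(looks_english("the cat is in the house"))  # 4 of 6 tokens are stop words
print(looks_english("el gato está en la casa"))  # no tokens are stop words
```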
Evaluation
• 3 Questions:
  – “Can CBL iterate many times and still achieve high precision?”
  – “How helpful are the types of coupling that we employ?”
  – “Can we extend existing semantic resources?”
• 3 Configurations
  – Full
  – NS: no sharing of promoted items, seeds shared
  – NCR: no type checking
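
One way to picture the difference between the Full and NS configurations is a mutual-exclusion filter that either sees promoted instances from other predicates or sees only their seeds. The sketch below is a simplification under that assumption, not the authors' implementation; the function names, the dictionary layout, and the example data are hypothetical.

```python
from typing import Dict, Set


def known_instances(seeds: Dict[str, Set[str]],
                    promoted: Dict[str, Set[str]],
                    share_promoted: bool) -> Dict[str, Set[str]]:
    """Instances visible to other predicates: seeds, plus promoted items if shared."""
    known = {pred: set(items) for pred, items in seeds.items()}
    if share_promoted:
        for pred, items in promoted.items():
            known.setdefault(pred, set()).update(items)
    return known


def filter_mutually_exclusive(candidates: Set[str],
                              predicate: str,
                              known: Dict[str, Set[str]],
                              exclusive: Dict[str, Set[str]]) -> Set[str]:
    """Drop candidates already known for a predicate exclusive with `predicate`."""
    blocked: Set[str] = set()
    for other in exclusive.get(predicate, set()):
        blocked |= known.get(other, set())
    return {c for c in candidates if c not in blocked}


# Hypothetical data: "city" and "company" are declared mutually exclusive.
seeds = {"city": {"Boston"}, "company": {"Google"}}
promoted = {"company": {"Amazon"}}
exclusive = {"city": {"company"}}
candidates = {"Amazon", "Denver", "Google"}

full = filter_mutually_exclusive(
    candidates, "city", known_instances(seeds, promoted, True), exclusive)
ns = filter_mutually_exclusive(
    candidates, "city", known_instances(seeds, promoted, False), exclusive)
```

With promoted items shared, "Amazon" is blocked as a city candidate; with only seeds shared it slips through, which is the kind of error the coupling is meant to prevent.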
Results - Precision
Categories (precision, %)
Iterations  Full  NS  NCR
5           92    84  89
10          82    70  84
15          83    63  79

Relations (precision, %)
Iterations  Full  NS  NCR
5           92    86  74
10          83    76  68
15          84    64  62
Precision estimated by human judging of correctness for 30 samples of each predicate.
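
The sampling-based estimate described above can be sketched as follows. The `judge` callable stands in for a human judgment, and the function name, the fixed random seed, and the handling of small predicates are assumptions made for illustration.

```python
import random
from typing import Callable, Sequence


def estimate_precision(instances: Sequence[str],
                       judge: Callable[[str], bool],
                       sample_size: int = 30,
                       seed: int = 0) -> float:
    """Estimate precision from a random sample of promoted instances.

    `judge` returns True when an instance is judged correct.
    """
    if not instances:
        raise ValueError("no instances to sample")
    rng = random.Random(seed)
    # If a predicate has fewer than `sample_size` instances, judge them all.
    sample = rng.sample(list(instances), min(sample_size, len(instances)))
    correct = sum(1 for item in sample if judge(item))
    return correct / len(sample)
```

For example, if 25 of the 30 sampled instances of a predicate are judged correct, the estimate is 25/30, or about 83%.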
Results - Recall
Promoted categories and relations – 15 iterations
“At this stage of development, obtaining high recall is not a priority … it is our hope that high recall will come with time.”
Example Extracted Facts
“We have presented a method of coupling the semi-supervised learning of categories and relations and demonstrated empirically that the coupling forestalls the problem of semantic drift associated with bootstrap learning methods.”
Comparison to Freebase
“… our methods can contribute new facts to existing resources.”