Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction
Saurabh Verma, Banaras Hindu University, India
Estevam R. Hruschka Jr., Federal University of São Carlos, Brazil
ECML/PKDD 2012, Bristol, UK, September 26th, 2012
http://rtw.ml.cmu.edu
NELL: Never-Ending Language Learner
Inputs:
• initial ontology
• handful of examples of each predicate in ontology
• the web
• occasional interaction with human trainers
The task:
• run 24x7, forever
• each day:
  1. extract more facts from the web to populate the initial ontology
  2. learn to read (perform #1) better than yesterday
Today...
Running 24x7 since January 2010
Input:
• ontology defining ~800 categories and relations
• 10-20 seed examples of each
• 1 billion web pages (ClueWeb – Jamie Callan)
Result:
• continuously growing KB with more than 1,300,000 extracted beliefs
Bayesian Sets (BS)
Given $D = \{x\}$ and $D_c \subset D$, rank the elements of $D$ by how well they would “fit into” a set which includes $D_c$.
Define a score for each $x \in D$:
$$\mathrm{score}(x) = \frac{p(x \mid D_c)}{p(x)}$$
From Bayes rule, the score can be re-written as:
$$\mathrm{score}(x) = \frac{p(x, D_c)}{p(x)\, p(D_c)}$$
Ghahramani & Heller; NIPS 2005
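As a minimal sketch of how this score can be computed, assume binary feature vectors with independent Bernoulli likelihoods and Beta priors, the case worked out in the Bayesian Sets paper. Under that assumption the log score is linear in the features. The function name, the toy items and the hyperparameter values below are illustrative assumptions, not taken from the slides.

```python
import math

def bayesian_sets_log_scores(items, query_idx, alpha, beta):
    """Log BS score of every item, for binary features with Beta(alpha, beta) priors."""
    n = len(query_idx)                      # size of the query set D_c
    n_feat = len(alpha)
    # Number of ones per feature among the query items.
    sums = [sum(items[i][j] for i in query_idx) for j in range(n_feat)]
    c = 0.0                                 # constant term shared by all items
    q = []                                  # per-feature weights
    for j in range(n_feat):
        a_post = alpha[j] + sums[j]         # posterior Beta parameters given D_c
        b_post = beta[j] + n - sums[j]
        c += (math.log(alpha[j] + beta[j]) - math.log(alpha[j] + beta[j] + n)
              + math.log(b_post) - math.log(beta[j]))
        q.append(math.log(a_post) - math.log(alpha[j])
                 - math.log(b_post) + math.log(beta[j]))
    return [c + sum(qj * xj for qj, xj in zip(q, x)) for x in items]

# Toy data (made up): rows are items, columns are binary features.
toy_items = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
toy_scores = bayesian_sets_log_scores(
    toy_items, query_idx=[0, 1], alpha=[1.0, 1.0, 1.0], beta=[1.0, 1.0, 1.0]
)
# Indices of the items, best fit to the query set first.
ranking = sorted(range(len(toy_items)), key=lambda i: -toy_scores[i])
```

Because the log score is a constant plus a weighted sum of the features, ranking a large sparse collection reduces to a single matrix-vector product.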
Bayesian Sets (BS)
Intuitively, the score compares the probability that x and Dc were generated by the same model with the same unknown parameters θ, to the probability that x and Dc came from models with different parameters θ and θ’.
Ghahramani & Heller; NIPS 2005
$$\mathrm{score}(x) = \frac{p(x, D_c)}{p(x)\, p(D_c)}$$
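To make the comparison concrete, the three quantities in the score can be written as marginal likelihoods. This is a sketch under the assumption that items are i.i.d. given parameters θ with prior p(θ), which is the setting of Ghahramani & Heller:

```latex
\begin{align*}
p(x)      &= \int p(x \mid \theta)\, p(\theta)\, d\theta \\
p(D_c)    &= \int \prod_{x_i \in D_c} p(x_i \mid \theta)\, p(\theta)\, d\theta \\
p(x, D_c) &= \int p(x \mid \theta) \prod_{x_i \in D_c} p(x_i \mid \theta)\, p(\theta)\, d\theta
\end{align*}
```

The numerator integrates over a single shared θ. The denominator integrates x and D_c separately, which corresponds to the independent parameters θ and θ’.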
BS using NELL’s Ontology
Initial ontology:
Everything
• Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama
• Company: Apple, Microsoft, Google, IBM, Yahoo
• City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
• Sport: Basketball, Football, Swimming, Tennis, Golf
BS using NELL’s Ontology
Given a huge web corpus, run BS once:
Everything
• Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, Dalai Lama, Freud, Tom Mitchell, Aristotle, Alan Turing, Alexander Fleming, …
• Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco, Facebook, DELL, …
• City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, New York, London, Sao Paulo, Brisbane, Beijing, Cairo, …
• Sport: Basketball, Football, Swimming, Tennis, Golf, Soccer, Volleyball, Jogging, Marathon, Baseball, Badminton, …
Iterative BS using NELL’s Ontology (Zhang & Liu, 2011)
Given a huge web corpus, iteratively run BS. The successive builds of the slide show each category growing by a few instances per run:
Everything
• Person: seeds Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama; then Dalai Lama, Freud; then Tom Mitchell, Aristotle; then Alan Turing, Alexander Fleming, …
• Company: seeds Apple, Microsoft, Google, IBM, Yahoo; then AT&T, Boeing; then Brazil Telecom, Texaco; then Facebook, DELL, …
• City: seeds Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town; then New York, London; then Sao Paulo, Brisbane; then Beijing, Cairo, …
• Sport: seeds Basketball, Football, Swimming, Tennis, Golf; then Soccer, Volleyball; then Jogging, Marathon; then Baseball, Badminton, …
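The iterative scheme can be sketched as a bootstrapping loop: rank the remaining candidates against the current set, promote the best few, and repeat with the enlarged set. In the sketch below, `overlap_score` is a deliberately simple stand-in for the Bayesian Sets score, and the contexts, function names and parameters are hypothetical, chosen only to keep the example self-contained.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical toy data: textual contexts in which each noun phrase was seen.
CONTEXTS: Dict[str, Set[str]] = {
    "Apple": {"shares of _", "CEO of _", "_ announced"},
    "Google": {"shares of _", "CEO of _", "_ announced"},
    "AT&T": {"shares of _", "CEO of _"},
    "Boeing": {"_ announced"},
    "Tennis": {"played _"},
}

def overlap_score(candidate: str, members: Set[str]) -> float:
    """Stand-in for the BS score: mean number of contexts shared with the members."""
    return sum(len(CONTEXTS[candidate] & CONTEXTS[m]) for m in members) / len(members)

def iterative_expand(
    seeds: Iterable[str],
    candidates: Iterable[str],
    score_fn: Callable[[str, Set[str]], float] = overlap_score,
    per_round: int = 1,
    rounds: int = 3,
) -> Set[str]:
    """Promote the top-scoring candidates round by round, re-scoring each time."""
    members = set(seeds)
    pool = set(candidates) - members
    for _ in range(rounds):
        # Sort by descending score; the name breaks ties deterministically.
        ranked = sorted(pool, key=lambda c: (-score_fn(c, members), c))
        promoted = [c for c in ranked[:per_round] if score_fn(c, members) > 0]
        if not promoted:
            break  # nothing left that resembles the current set
        members.update(promoted)
        pool.difference_update(promoted)
    return members

expanded = iterative_expand({"Apple", "Google"}, ["AT&T", "Boeing", "Tennis"])
```

Promoted instances influence later rounds, so one early mistake can pull in further unrelated instances.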
NELL: Coupled semi-supervised training of many functions
Coupled Training Type 2: Structured Outputs, Multitask, Posterior Regularization, Multilabel
Learn functions with the same input, different outputs, where we know some constraint
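A small sketch of what such a constraint buys: if two functions score the same input and their outputs are declared mutually exclusive, any input on which both fire must contain at least one error, even without labels. The function name, the threshold and the scores below are hypothetical.

```python
from typing import Dict, List

def mutex_violations(
    preds_a: Dict[str, float], preds_b: Dict[str, float], threshold: float = 0.5
) -> List[str]:
    """Inputs on which two mutually exclusive functions both fire."""
    shared = preds_a.keys() & preds_b.keys()
    return sorted(
        x for x in shared if preds_a[x] >= threshold and preds_b[x] >= threshold
    )

# Made-up confidence scores from two category classifiers over the same noun phrases.
is_company = {"Apple": 0.9, "Bristol": 0.2, "Boeing": 0.8}
is_city = {"Apple": 0.1, "Bristol": 0.95, "Boeing": 0.6}

violations = mutex_violations(is_company, is_city)
```

Here `violations` flags the noun phrase that both classifiers accept, which is the kind of disagreement that coupled training can exploit on unlabeled data.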
Coupled Bayesian Sets (CBS)
CBS using NELL’s Ontology
Given a huge web corpus and mutual exclusiveness constraints, iteratively run BS.
Initial ontology:
Everything
• Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama
• Company: Apple, Microsoft, Google, IBM, Yahoo
• City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town
• Sport: Basketball, Football, Swimming, Tennis, Golf
Constraints:
MutuallyExclusive(Company,Person); MutuallyExclusive(Company,Sport); MutuallyExclusive(Company,City); MutuallyExclusive(Person,Sport); …
First iteration:
• Company gains AT&T, Boeing
• AT&T and Boeing are then shown, in turn, under Person, Sport and City (the categories mutually exclusive with Company), presumably as negative examples for those categories
Second iteration:
• Company gains Brazil Telecom, Texaco
• Brazil Telecom and Texaco are then shown under Person, Sport and City in the same way
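One way to picture the coupling is sketched below: every instance accepted by a category is blocked from all categories declared mutually exclusive with it. This illustrates the idea only and is not the scoring rule of the CBS paper. `overlap_score` again stands in for the BS score, and all names and toy contexts are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: textual contexts in which each noun phrase was seen.
CONTEXTS: Dict[str, Set[str]] = {
    "Apple": {"shares of _", "CEO of _", "_ announced"},
    "Google": {"shares of _", "CEO of _", "_ announced"},
    "Bristol": {"flights to _", "mayor of _"},
    "Tokyo": {"flights to _", "mayor of _", "based in _"},
    "AT&T": {"shares of _", "CEO of _", "based in _"},
    "Boeing": {"_ announced", "based in _"},
    "London": {"flights to _", "mayor of _"},
}

def overlap_score(candidate: str, members: Set[str]) -> float:
    """Stand-in for the BS score: mean number of contexts shared with the members."""
    return sum(len(CONTEXTS[candidate] & CONTEXTS[m]) for m in members) / len(members)

def coupled_expand(
    seeds: Dict[str, Set[str]],
    candidates: List[str],
    mutex: List[Tuple[str, str]],
    per_round: int = 1,
    rounds: int = 2,
) -> Dict[str, Set[str]]:
    """Expand all categories together; members of one category are blocked
    from every category declared mutually exclusive with it."""
    members = {cat: set(s) for cat, s in seeds.items()}
    blocked: Dict[str, Set[str]] = {cat: set() for cat in seeds}
    for a, b in mutex:  # seeds already act as negatives across each constraint
        blocked[a] |= members[b]
        blocked[b] |= members[a]
    for _ in range(rounds):
        for cat in members:  # categories are visited in the order given in seeds
            pool = [c for c in candidates
                    if c not in members[cat] and c not in blocked[cat]]
            ranked = sorted(pool, key=lambda c: (-overlap_score(c, members[cat]), c))
            promoted = [c for c in ranked[:per_round]
                        if overlap_score(c, members[cat]) > 0]
            members[cat].update(promoted)
            for a, b in mutex:  # propagate the new members as negatives
                if a == cat:
                    blocked[b].update(promoted)
                elif b == cat:
                    blocked[a].update(promoted)
    return members

seeds = {"Company": {"Apple", "Google"}, "City": {"Bristol", "Tokyo"}}
candidates = ["AT&T", "Boeing", "London"]
coupled = coupled_expand(seeds, candidates, mutex=[("Company", "City")])
uncoupled = coupled_expand(seeds, candidates, mutex=[])
```

On this toy data the uncoupled run eventually places AT&T in City through the shared context "based in _", while the coupled run keeps the two categories disjoint.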
CBS using NELL’s Ontology
What if we do not have the mutual exclusiveness constraints?
Everything
• Person: Peter Flach, Bill Clinton, Jeremy Lin, Adele, Barack Obama, Dalai Lama, Freud, Tom Mitchell, Aristotle, …
• Company: Apple, Microsoft, Google, IBM, Yahoo, AT&T, Boeing, Brazil Telecom, Texaco, …, Great Britain, Keyboard, Pencil
• City: Bristol, Pittsburgh, Rio de Janeiro, Tokyo, Cape Town, New York, London, Sao Paulo, Brisbane, …
• Sport: Basketball, Football, Swimming, Tennis, Golf, Soccer, Volleyball, Jogging, Marathon, …
The instances at the tail of the Company list (Great Britain, Keyboard, Pencil) are placed in a new category:
• Not Company: Great Britain, Keyboard, Pencil
A new constraint is then generated:
MutuallyExclusive(Company,NotCompany);
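The constraint-generation step can be sketched as follows. The slides do not say how the Not Company instances are selected, so the threshold rule here is purely an assumption, as are the function name and the made-up scores.

```python
from typing import Dict, Set, Tuple

def derive_negative_category(
    category: str, scored_members: Dict[str, float], threshold: float
) -> Tuple[str, Set[str], str]:
    """Collect low-scoring members into Not<category> and emit the matching constraint.

    Assumption: an instance belongs to the negative category when its score
    falls below the threshold. The slides do not specify the selection rule.
    """
    negatives = {e for e, s in scored_members.items() if s < threshold}
    negative_name = f"Not{category}"
    constraint = f"MutuallyExclusive({category},{negative_name});"
    return negative_name, negatives, constraint

# Made-up scores for instances currently listed under Company.
company_scores = {
    "Apple": 3.1, "Boeing": 2.4, "Texaco": 1.9,
    "Great Britain": -0.7, "Keyboard": -1.8, "Pencil": -2.2,
}
name, negatives, constraint = derive_negative_category("Company", company_scores, 0.0)
```

The derived category can then take part in coupled expansion like any other mutually exclusive category.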
CBS using NELL’s Ontology
What about Semantic Relations?
Conclusions
Coupled Bayesian Sets:
• semi-supervised learning approach to extract category instances (e.g. country(USA), city(New York)) from web pages;
• based on the original Bayesian Sets;
• can outperform algorithms such as the original Bayesian Sets, the Naive Bayes classifier, the Bas-all and the coupled semi-supervised logistic regression algorithm (CPL);
• can be used to automatically generate new constraints for the set expansion task even when no mutual exclusiveness relationship is previously defined.
Acknowledgements
Thanks to:
• ECML/PKDD2012 audience!
Also thanks to:
• Department of Science and Technology, Government of India under Indo-Brazil cooperation programme;
• Brazilian research agencies CAPES and CNPq;
• Zoubin Ghahramani and K.A. Heller;
• All the Read The Web group;
contact: [email protected] http://rtw.ml.cmu.edu