Bayesian Sets. Zoubin Ghahramani and Katherine A. Heller, NIPS 2005. Presented by Qi An, Mar. 17th, 2006
Dec 30, 2015
Outline
• Introduction
• Bayesian Sets
• Implementation
  – Binary data
  – Exponential families
• Experimental results
• Conclusions
Introduction
• Inspired by “Google™ Sets”
• What do Jesus and Darwin have in common?
  – Two different views on the origin of man
  – There are colleges at Cambridge University named after them
• The objective is to retrieve items from a concept or cluster, given a query consisting of a few items from that cluster
Introduction
• Consider a universe of items $\mathcal{D}$, which can be a set of web pages, movies, people or any other objects, depending on the application.
• Make a query consisting of a small subset of items $\mathcal{D}_c \subset \mathcal{D}$, which are assumed to be examples of some cluster in the data.
• The algorithm provides a completion to the query set, $\mathcal{D}'_c \subset \mathcal{D}$. It presumably includes all the elements in $\mathcal{D}_c$ and other elements in $\mathcal{D}$ that are also in this cluster.
Introduction
• View the problem from two perspectives:
  – Clustering on demand
    • Unlike completely unsupervised clustering algorithms, here the query provides supervised hints or constraints as to the membership of a particular cluster.
  – Information retrieval
    • Retrieve the information that is relevant to the query and rank the output by relevance to the query.
Bayesian Sets
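• Very simple algorithm
• Given $\mathcal{D} = \{\mathbf{x}\}$ and $\mathcal{D}_c \subset \mathcal{D}$, we aim to rank the elements of $\mathcal{D}$ by how well they would “fit into” a set which includes $\mathcal{D}_c$
• Define a score for each $\mathbf{x} \in \mathcal{D}$:
$$\mathrm{score}(\mathbf{x}) = \frac{p(\mathbf{x} \mid \mathcal{D}_c)}{p(\mathbf{x})}$$
• From Bayes rule, the score can be rewritten as:
$$\mathrm{score}(\mathbf{x}) = \frac{p(\mathbf{x}, \mathcal{D}_c)}{p(\mathbf{x})\, p(\mathcal{D}_c)}$$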
Bayesian Sets
• Intuitively, the score compares the probability that $\mathbf{x}$ and $\mathcal{D}_c$ were generated by the same model with the same unknown parameters $\theta$ to the probability that $\mathbf{x}$ and $\mathcal{D}_c$ came from models with different parameters $\theta$ and $\theta'$.
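The comparison can be made explicit by writing each term of the score as a marginal likelihood. The block below is a sketch of that expansion, assuming a parameterized model $p(\mathbf{x} \mid \theta)$ with prior $p(\theta)$ and query items $\mathbf{x}_i \in \mathcal{D}_c$ that are independent given $\theta$:

```latex
\begin{align*}
p(\mathbf{x}) &= \int p(\mathbf{x} \mid \theta)\, p(\theta)\, d\theta \\
p(\mathcal{D}_c) &= \int \prod_{\mathbf{x}_i \in \mathcal{D}_c} p(\mathbf{x}_i \mid \theta)\, p(\theta)\, d\theta \\
p(\mathbf{x}, \mathcal{D}_c) &= \int p(\mathbf{x} \mid \theta) \prod_{\mathbf{x}_i \in \mathcal{D}_c} p(\mathbf{x}_i \mid \theta)\, p(\theta)\, d\theta \\
\mathrm{score}(\mathbf{x}) &= \frac{p(\mathbf{x}, \mathcal{D}_c)}{p(\mathbf{x})\, p(\mathcal{D}_c)}
\end{align*}
```

The numerator shares one $\theta$ between $\mathbf{x}$ and the query, while the denominator integrates over separate parameters for each, so a score above 1 favors the shared-model explanation.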
Bayesian Sets
Sparse Binary Data
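• Assume each item $\mathbf{x}_i \in \mathcal{D}_c$ is a binary vector $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iJ})'$, where each component is a binary variable from an independent Bernoulli distribution:
$$p(\mathbf{x}_i \mid \theta) = \prod_{j=1}^{J} \theta_j^{x_{ij}} (1-\theta_j)^{1-x_{ij}}$$
• The conjugate prior for a Bernoulli distribution is a Beta distribution:
$$p(\theta \mid \alpha, \beta) = \prod_{j=1}^{J} \frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, \theta_j^{\alpha_j-1} (1-\theta_j)^{\beta_j-1}$$
• For a query $\mathcal{D}_c = \{\mathbf{x}_i\}$ consisting of $N$ items:
$$p(\mathcal{D}_c \mid \alpha, \beta) = \prod_{j=1}^{J} \frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, \frac{\Gamma(\tilde{\alpha}_j)\,\Gamma(\tilde{\beta}_j)}{\Gamma(\tilde{\alpha}_j+\tilde{\beta}_j)}$$
where $\tilde{\alpha}_j = \alpha_j + \sum_{i=1}^{N} x_{ij}$ and $\tilde{\beta}_j = \beta_j + N - \sum_{i=1}^{N} x_{ij}$.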
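As a minimal sketch (not the authors' code), the Beta-Bernoulli marginal likelihood of a set of binary vectors can be evaluated in log space, and the score then follows from three such marginals. The function names `log_marginal` and `log_score` and the toy hyperparameters are illustrative assumptions.

```python
from math import lgamma, exp

def log_marginal(items, alpha, beta):
    """log p(items | alpha, beta) under independent Beta-Bernoulli features.

    Per feature j: Gamma(a+b)/(Gamma(a)Gamma(b)) * Gamma(a')Gamma(b')/Gamma(a'+b'),
    where a' = a + (#ones in column j) and b' = b + (#zeros in column j).
    """
    n = len(items)
    total = 0.0
    for j, (a, b) in enumerate(zip(alpha, beta)):
        ones = sum(x[j] for x in items)           # items with feature j switched on
        a_post, b_post = a + ones, b + n - ones   # updated Beta hyperparameters
        total += (lgamma(a + b) - lgamma(a) - lgamma(b)
                  + lgamma(a_post) + lgamma(b_post) - lgamma(a_post + b_post))
    return total

def log_score(x, query, alpha, beta):
    """log of p(x, D_c) / (p(x) p(D_c)), built from three marginal likelihoods."""
    return (log_marginal(query + [x], alpha, beta)
            - log_marginal([x], alpha, beta)
            - log_marginal(query, alpha, beta))

# Toy example with hypothetical uniform Beta(1, 1) priors on two features.
query = [[1, 0], [1, 0]]
alpha, beta = [1.0, 1.0], [1.0, 1.0]
similar = exp(log_score([1, 0], query, alpha, beta))     # matches the query pattern
dissimilar = exp(log_score([0, 1], query, alpha, beta))  # opposite pattern
```

An item that matches the query pattern receives a score above 1 and the opposite pattern a score below 1, which is the ranking behavior the score is designed to give.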
Sparse Binary Data
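• The score can be computed as:
$$\mathrm{score}(\mathbf{x}) = \prod_{j=1}^{J} \frac{\alpha_j+\beta_j}{\alpha_j+\beta_j+N} \left(\frac{\tilde{\alpha}_j}{\alpha_j}\right)^{x_j} \left(\frac{\tilde{\beta}_j}{\beta_j}\right)^{1-x_j}$$
• If we take the log of the score and put the entire data set into one large matrix $X$ with $J$ columns, we can compute a vector $s$ of log scores for all points using a single matrix-vector multiplication:
$$s = c + Xq$$
where
$$c = \sum_{j=1}^{J} \left[ \log(\alpha_j+\beta_j) - \log(\alpha_j+\beta_j+N) + \log\tilde{\beta}_j - \log\beta_j \right]$$
and
$$q_j = \log\tilde{\alpha}_j - \log\alpha_j - \log\tilde{\beta}_j + \log\beta_j$$
Here $N$ is the number of query items, and $\tilde{\alpha}_j$, $\tilde{\beta}_j$ are the Beta hyperparameters updated with the query counts for feature $j$.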
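The matrix form of the log score can be sketched in NumPy as follows. This is an illustrative implementation under the Beta-Bernoulli model, not the authors' code; the function name `bayesian_sets_log_scores`, the toy matrix, and the Beta(1, 1) hyperparameters are assumptions made for the example.

```python
import numpy as np

def bayesian_sets_log_scores(X, query_idx, alpha, beta):
    """Return s = c + X q, the log score of every row of the binary matrix X.

    X         : (n_items, J) binary data matrix
    query_idx : row indices of the query items D_c
    alpha,beta: length-J Beta hyperparameters (chosen by the caller)
    """
    X = np.asarray(X, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    N = len(query_idx)

    col_sums = X[query_idx].sum(axis=0)   # per-feature counts of ones in the query
    alpha_t = alpha + col_sums            # updated hyperparameters
    beta_t = beta + N - col_sums

    # Constant term c and per-feature weights q of the log score.
    c = np.sum(np.log(alpha + beta) - np.log(alpha + beta + N)
               + np.log(beta_t) - np.log(beta))
    q = np.log(alpha_t) - np.log(alpha) - np.log(beta_t) + np.log(beta)
    return c + X @ q                      # one matrix-vector product for all items

# Toy data: 4 items, 3 binary features; rows 0 and 1 form the query.
X = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [1, 1, 1]])
scores = bayesian_sets_log_scores(X, [0, 1], alpha=np.ones(3), beta=np.ones(3))
ranking = np.argsort(-scores)             # item indices, best match first
```

Because `q` depends only on the query, scoring the whole data set costs a single product `X @ q`; with a sparse `X` the same product touches only the nonzero entries.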
Exponential Families
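• If the distribution for the model is not Bernoulli but belongs to an exponential family:
$$p(\mathbf{x} \mid \theta) = f(\mathbf{x})\, g(\theta) \exp\{\theta^{\top} u(\mathbf{x})\}$$
we can use the conjugate prior:
$$p(\theta \mid \eta, \nu) = h(\eta, \nu)\, g(\theta)^{\eta} \exp\{\theta^{\top} \nu\}$$
so that the score is:
$$\mathrm{score}(\mathbf{x}) = \frac{h(\eta+1,\, \nu+u(\mathbf{x}))\; h(\eta+N,\, \nu+\sum_i u(\mathbf{x}_i))}{h(\eta, \nu)\; h(\eta+N+1,\, \nu+u(\mathbf{x})+\sum_i u(\mathbf{x}_i))}$$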
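One way to sanity-check the exponential-family score is to instantiate it for a single Bernoulli feature and compare against the Beta-Bernoulli closed form. The sketch below assumes the Bernoulli is written in natural-parameter form with sufficient statistic u(x) = x; under that assumption the conjugate prior normalizer is h(η, ν) = Γ(η) / (Γ(ν) Γ(η − ν)), which corresponds to a Beta(α, β) prior with α = ν and β = η − ν. The function names are hypothetical.

```python
from math import lgamma, exp

def log_h(eta, nu):
    """log h(eta, nu) for one Bernoulli feature (assumed natural-parameter form).

    Equivalent to the log normalizer of a Beta(nu, eta - nu) density.
    """
    return lgamma(eta) - lgamma(nu) - lgamma(eta - nu)

def exp_family_log_score(u_x, u_query, eta, nu):
    """log score = log h(eta+1, nu+u(x)) + log h(eta+N, nu+S)
                 - log h(eta, nu)       - log h(eta+N+1, nu+u(x)+S),
    where S is the sum of the query's sufficient statistics."""
    n = len(u_query)
    s = sum(u_query)
    return (log_h(eta + 1, nu + u_x) + log_h(eta + n, nu + s)
            - log_h(eta, nu) - log_h(eta + n + 1, nu + u_x + s))

def beta_bernoulli_score(x, query, a, b):
    """Closed-form single-feature score under a Beta(a, b) prior, for comparison."""
    n, s = len(query), sum(query)
    a_t, b_t = a + s, b + n - s
    return (a + b) / (a + b + n) * (a_t / a) ** x * (b_t / b) ** (1 - x)

# Hypothetical prior Beta(1, 1), i.e. nu = 1 and eta = 2; the query is two ones.
a, b = 1.0, 1.0
query = [1, 1]
general_on = exp(exp_family_log_score(1, query, eta=a + b, nu=a))
general_off = exp(exp_family_log_score(0, query, eta=a + b, nu=a))
```

Both routes give the same numbers, which suggests the binary-data score is the special case of the general formula obtained with this choice of h.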
Experimental results
• The experiments are performed on three different datasets: the Grolier Encyclopedia dataset, the EachMovie dataset and the NIPS authors dataset.
• The algorithm runs very fast on all three datasets.
Conclusions
• A simple algorithm which takes a query consisting of a small set of items and returns additional items from $\mathcal{D}$ belonging to this set.
• The score is computed w.r.t. a statistical model, and the unknown model parameters are all marginalized out.
• With conjugate priors, the score can be computed exactly and efficiently.
• The method does well when compared to Google Sets in terms of set completions.
• The algorithm is very flexible in that it can be combined with a wide variety of data types and probabilistic models.