Bayesian Sets. Zoubin Ghahramani and Katherine A. Heller, NIPS 2005. Presented by Qi An, Mar. 17th, 2006.

Outline

• Introduction

• Bayesian Sets

• Implementation

  – Binary data

  – Exponential families

• Experimental results

• Conclusions

Introduction

• Inspired by “Google™ Sets”

• What do Jesus and Darwin have in common?

  – Two different views on the origin of man

  – There are colleges at Cambridge University named after them

• The objective is to retrieve items from a concept or cluster, given a query consisting of a few items from that cluster

Introduction

• Consider a universe of items $\mathcal{D}$, which can be a set of web pages, movies, people or any other subjects depending on the application

• Make a query with a small subset of items $\mathcal{D}_c \subset \mathcal{D}$, which are assumed to be examples of some cluster in the data.

• The algorithm provides a completion of the query set, $\mathcal{D}'_c$. It presumably includes all the elements of $\mathcal{D}_c$ and other elements of $\mathcal{D}$ that are also in this cluster.

Introduction

• View the problem from two perspectives:

  – Clustering on demand

    • Unlike other completely unsupervised clustering algorithms, here the query provides supervised hints or constraints as to the membership of a particular cluster.

  – Information retrieval

    • Retrieve the information that is relevant to the query and rank the output by relevance to the query

Bayesian Sets

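• Very simple algorithm

• Given $\mathcal{D} = \{x\}$ and $\mathcal{D}_c \subset \mathcal{D}$, we aim to rank the elements of $\mathcal{D}$ by how well they would “fit into” a set which includes $\mathcal{D}_c$

• Define a score for each $x \in \mathcal{D}$:

$$\mathrm{score}(x) = \frac{p(x \mid \mathcal{D}_c)}{p(x)}$$

• From Bayes rule, the score can be re-written as:

$$\mathrm{score}(x) = \frac{p(x, \mathcal{D}_c)}{p(x)\, p(\mathcal{D}_c)}$$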

Bayesian Sets

• Intuitively, the score compares the probability that $x$ and $\mathcal{D}_c$ were generated by the same model with the same unknown parameters θ, to the probability that $x$ and $\mathcal{D}_c$ came from models with different parameters θ and θ’.
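The three probabilities in the score can be written out as marginals over the model parameters. The block below is a sketch in assumed notation: $p(\theta)$ denotes the prior over parameters and $p(\theta \mid \mathcal{D}_c)$ the posterior given the query, neither of which is named explicitly on the slides.

```latex
\begin{align}
p(x) &= \int p(x \mid \theta)\, p(\theta)\, d\theta \\
p(\mathcal{D}_c) &= \int \prod_{x_i \in \mathcal{D}_c} p(x_i \mid \theta)\, p(\theta)\, d\theta \\
p(x \mid \mathcal{D}_c) &= \int p(x \mid \theta)\, p(\theta \mid \mathcal{D}_c)\, d\theta
\end{align}
```

Under this reading, a score above 1 means that $x$ is better explained by sharing parameters with the query than by being generated independently of it.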

Bayesian Sets

Sparse Binary Data

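• Assume each item $x_i \in \mathcal{D}_c$ is a binary vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{iJ})'$ where each component is a binary variable from an independent Bernoulli distribution:

$$p(x_i \mid \theta) = \prod_{j=1}^{J} \theta_j^{x_{ij}} (1 - \theta_j)^{1 - x_{ij}}$$

• The conjugate prior for a Bernoulli distribution is a Beta distribution:

$$p(\theta \mid \alpha, \beta) = \prod_{j=1}^{J} \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, \theta_j^{\alpha_j - 1} (1 - \theta_j)^{\beta_j - 1}$$

• For a query $\mathcal{D}_c = \{x_i\}$ consisting of $N$ vectors:

$$p(\mathcal{D}_c \mid \alpha, \beta) = \prod_{j=1}^{J} \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, \frac{\Gamma(\tilde{\alpha}_j)\,\Gamma(\tilde{\beta}_j)}{\Gamma(\tilde{\alpha}_j + \tilde{\beta}_j)}$$

where $\tilde{\alpha}_j = \alpha_j + \sum_{i=1}^{N} x_{ij}$ and $\tilde{\beta}_j = \beta_j + N - \sum_{i=1}^{N} x_{ij}$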
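As a rough illustration of the Beta-Bernoulli marginal likelihood of a query, the sketch below evaluates $\log p(\mathcal{D}_c \mid \alpha, \beta)$ using the updated counts $\tilde{\alpha}_j = \alpha_j + \sum_i x_{ij}$ and $\tilde{\beta}_j = \beta_j + N - \sum_i x_{ij}$. The function name, the toy query, and the hyperparameter values are assumptions made for illustration only; they do not come from the slides.

```python
import math

def log_marginal_likelihood(query, alpha, beta):
    """Return log p(D_c | alpha, beta) for a list of binary row vectors.

    `query`, `alpha` and `beta` are hypothetical inputs: one row per query
    item, one Beta hyperparameter pair per binary feature.
    """
    n = len(query)
    total = 0.0
    for j, (a, b) in enumerate(zip(alpha, beta)):
        ones = sum(row[j] for row in query)   # number of query items with feature j on
        a_tilde = a + ones                    # updated Beta parameters
        b_tilde = b + n - ones
        total += (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + math.lgamma(a_tilde) + math.lgamma(b_tilde)
                  - math.lgamma(a_tilde + b_tilde))
    return total

# Toy query: two items, two binary features, uniform Beta(1, 1) priors.
toy_query = [[1, 0], [1, 1]]
print(log_marginal_likelihood(toy_query, alpha=[1.0, 1.0], beta=[1.0, 1.0]))
```

Working in log space with `math.lgamma` avoids overflow in the Gamma functions when the query or the hyperparameters are large.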

Sparse Binary Data

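• The score can be computed as:

$$\mathrm{score}(x) = \prod_{j=1}^{J} \frac{\alpha_j + \beta_j}{\alpha_j + \beta_j + N} \left( \frac{\tilde{\alpha}_j}{\alpha_j} \right)^{x_j} \left( \frac{\tilde{\beta}_j}{\beta_j} \right)^{1 - x_j}$$

with $N$ the number of query items and $\tilde{\alpha}_j$, $\tilde{\beta}_j$ the Beta parameters updated by the query counts.

• If we take the log of the score and put the entire data set into one large matrix $X$ with $J$ columns, we can compute a vector $s$ of log scores for all points using a single matrix-vector multiplication:

$$s = c + X q$$

where

$$c = \sum_{j=1}^{J} \left[ \log(\alpha_j + \beta_j) - \log(\alpha_j + \beta_j + N) + \log \tilde{\beta}_j - \log \beta_j \right]$$

and

$$q_j = \log \tilde{\alpha}_j - \log \alpha_j - \log \tilde{\beta}_j + \log \beta_j$$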
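The matrix form of the log score can be sketched in a few lines of NumPy. The version below is a minimal sketch under stated assumptions: the function name, the toy data matrix, and the uniform hyperparameters are all hypothetical, and the data are stored densely, whereas a sparse matrix would be the natural choice for large sparse binary data.

```python
import numpy as np

def bayesian_sets_log_scores(X, query_idx, alpha, beta):
    """Log scores s = c + X q for every row of the binary matrix X.

    `query_idx` lists the rows of X that form the query D_c; `alpha` and
    `beta` are per-feature Beta hyperparameters (illustrative values only).
    """
    X = np.asarray(X, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    N = len(query_idx)

    counts = X[query_idx].sum(axis=0)      # per-feature sum over the query items
    alpha_t = alpha + counts               # updated Beta parameters
    beta_t = beta + N - counts

    # Constant term shared by all items.
    c = np.sum(np.log(alpha + beta) - np.log(alpha + beta + N)
               + np.log(beta_t) - np.log(beta))
    # Per-feature weights applied to each item's binary features.
    q = np.log(alpha_t) - np.log(alpha) - np.log(beta_t) + np.log(beta)
    return c + X @ q                       # one matrix-vector product

# Toy universe: 4 items, 3 binary features; items 0 and 1 form the query.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
scores = bayesian_sets_log_scores(X, query_idx=[0, 1],
                                  alpha=np.ones(3), beta=np.ones(3))
ranking = np.argsort(-scores)              # highest log score first
print(ranking, scores)
```

Because `c` and `q` depend only on the query, they are computed once and reused for every candidate item.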

Exponential Families

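• If the distribution for the model is not a Bernoulli distribution, but a member of the exponential family:

$$p(x \mid \theta) = f(x)\, g(\theta) \exp\{\theta^{\top} u(x)\}$$

we can use the conjugate prior:

$$p(\theta \mid \eta, \nu) = h(\eta, \nu)\, g(\theta)^{\eta} \exp\{\theta^{\top} \nu\}$$

so that the score is:

$$\mathrm{score}(x) = \frac{h(\eta + 1,\, \nu + u(x))\; h\!\left(\eta + N,\, \nu + \sum_{i} u(x_i)\right)}{h(\eta, \nu)\; h\!\left(\eta + N + 1,\, \nu + \sum_{i} u(x_i) + u(x)\right)}$$

where $N$ is the number of items $x_i$ in the query $\mathcal{D}_c$.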
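The closed form of the score follows from the marginal likelihood of a set of $N$ items under a conjugate prior. The derivation below is a sketch in assumed notation: the likelihood is written as $f(x)\, g(\theta) \exp\{\theta^{\top} u(x)\}$ with sufficient statistics $u(x)$, and the prior as $h(\eta, \nu)\, g(\theta)^{\eta} \exp\{\theta^{\top} \nu\}$ with normalizer $h$.

```latex
\begin{align}
p(\mathcal{D}_c)
  &= \int \prod_{i=1}^{N} f(x_i)\, g(\theta) \exp\{\theta^{\top} u(x_i)\}\;
     h(\eta, \nu)\, g(\theta)^{\eta} \exp\{\theta^{\top} \nu\}\, d\theta \\
  &= h(\eta, \nu) \prod_{i=1}^{N} f(x_i)
     \int g(\theta)^{\eta + N}
     \exp\Big\{\theta^{\top} \Big(\nu + \sum_{i=1}^{N} u(x_i)\Big)\Big\}\, d\theta \\
  &= \frac{h(\eta, \nu)}{h\big(\eta + N,\; \nu + \sum_{i=1}^{N} u(x_i)\big)}
     \prod_{i=1}^{N} f(x_i)
\end{align}
```

Applying the same result to $p(x)$ (a set of one item) and to $p(x, \mathcal{D}_c)$ (a set of $N + 1$ items), the $f(\cdot)$ factors cancel in the ratio $p(x, \mathcal{D}_c) / \big(p(x)\, p(\mathcal{D}_c)\big)$, leaving only the four $h$ terms.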

Experimental results

• The experiments are performed on three different datasets: the Grolier Encyclopedia dataset, the EachMovie dataset and the NIPS authors dataset.

• The algorithm runs very fast on all three datasets.



Conclusions

• A simple algorithm which takes a query of a small set of items and returns additional items from $\mathcal{D}$ belonging to this set.

• The score is computed w.r.t. a statistical model, and the unknown model parameters are all marginalized out.

• With conjugate priors, the score can be computed exactly and efficiently.

• The method does well when compared to Google Sets in terms of set completions.

• The algorithm is very flexible in that it can be combined with a wide variety of types of data and probabilistic models.