Leveraging User Libraries to Bootstrap Collaborative Filtering Laurent Charlin, Columbia University Richard Zemel, University of Toronto Hugo Larochelle, Université de Sherbrooke KDD'14 August 2014
Motivation
● Difficult to keep up with new information
  – Researcher:
    ● Hundreds of papers are published each year at top conferences
    ● ArXiv.org proposes several new papers in our field every day
  – How can you efficiently find all interesting papers?
Solution: Recommendations
● Document recommendation
  – Scientific articles
    ● Recommending papers to reviewers
    ● Recommending papers to conference attendees
  – Books, music
● Novelty: Leverage the libraries of users
  – Articles: researchers' previously published papers
  – Books & music: purchased items
Desiderata
● Want a model which quickly gives good recommendations
● Model which performs well for all users
  – Both new and frequent users
[Figure: number of ratings per user]
Preference Prediction
● Collaborative filtering:
  – Intuition: Users with similar past preferences are likely to have similar future preferences.
  – Uses only user preferences
● Shortcoming:
  – Cannot deal with new users (cold-start regime)
[Salakhutdinov & Mnih'08]
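The cold-start shortcoming can be illustrated with a minimal sketch of matrix-factorization collaborative filtering. This is not the PMF implementation of the cited work: the SGD update, the hyperparameters, the function names (`factorize`, `predict`) and the toy ratings are all assumptions chosen for illustration. A user with no ratings never receives an update, so their factors stay at their random initial values and carry no preference information.

```python
import random

def factorize(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.02,
              epochs=500, seed=0):
    """Fit user/item factors by SGD on observed (user, item, rating) triples.

    Illustrative sketch only; hyperparameters are assumptions.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for k in range(rank):
                uk, vk = U[u][k], V[i][k]
                # Gradient step on squared error with L2 regularization.
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V

def predict(U, V, u, i):
    """Predicted rating is the dot product of user and item factors."""
    return sum(a * b for a, b in zip(U[u], V[i]))

# Toy data (hypothetical): users 0 and 1 have ratings, user 2 has none.
toy_ratings = [(0, 0, 1), (0, 3, 3), (1, 1, 0), (1, 2, 2), (1, 3, 2)]
U, V = factorize(toy_ratings, n_users=3, n_items=4)
```

In this sketch, `U[2]` is identical to its initialization after training, which is why predictions for a new user are uninformative without side information.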
Preference Prediction with Side Information
● Side information:
  – Any information about users and items other than preferences.
  – E.g., user demographics, item content
  – Advantages:
    ● Better predictions in cold-start regimes
    ● Other available information may be indicative of preferences (content information about items)
Collaborative Score Topic Model (CSTM)
[Figure: a partially observed ratings matrix (rows such as "1 ? ? 3 ..." and "? 0 2 2 ...") shown with two word-count matrices, each labeled "Words"]
Collaborative Score Topic Model (CSTM)
● Twin topic models
  – Topics are shared
  – Topic representations then live in the same space
Collaborative Score Topic Model (CSTM)
● Match the representation of documents to users' representations
● Useful for cold-start
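A minimal sketch of the matching idea, under loud assumptions: the shared topics below are hand-written rather than learned, `topic_representation` is a crude stand-in for topic-model inference, and the dot-product score is one plausible way to compare representations, not necessarily the scoring function of CSTM. The point is only that a user's library and a document can be mapped into the same topic space and compared there, even when the user has no ratings.

```python
# Hypothetical shared topics: each row is a distribution over a 4-word vocabulary.
TOPICS = [
    [0.45, 0.45, 0.05, 0.05],  # topic 0 favors words 0 and 1
    [0.05, 0.05, 0.45, 0.45],  # topic 1 favors words 2 and 3
]

def topic_representation(word_counts, topics=TOPICS):
    """Crude stand-in for topic inference (an assumption, not the CSTM
    inference procedure): weight each topic by how much of the word counts
    it explains, then normalize to proportions."""
    weights = [sum(c * p for c, p in zip(word_counts, topic)) for topic in topics]
    total = sum(weights)
    if total == 0:
        return [1.0 / len(topics)] * len(topics)
    return [w / total for w in weights]

def match_score(user_library_counts, document_counts):
    """Compare a user library and a document in the shared topic space."""
    user_rep = topic_representation(user_library_counts)
    doc_rep = topic_representation(document_counts)
    return sum(u * d for u, d in zip(user_rep, doc_rep))

# A new user whose library is mostly about topic 0 (no ratings needed).
library = [4, 1, 0, 0]
on_topic_doc = [3, 2, 0, 0]
off_topic_doc = [0, 0, 2, 4]
scores = (match_score(library, on_topic_doc), match_score(library, off_topic_doc))
```

Because both word-count vectors pass through the same topics, the on-topic document scores higher than the off-topic one for this user.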
Collaborative Score Topic Model (CSTM)
● Per-user regression on document features
● Useful for frequent users
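As a sketch of what a per-user regression on document features could look like, the code below fits one weight vector for a single user from that user's rated documents. The plain least-squares objective, the per-example gradient updates, the learning rate, and the names `fit_user_regression` and `predict_rating` are assumptions for illustration; the slides do not specify the exact form used in CSTM.

```python
def fit_user_regression(doc_features, ratings, lr=0.1, epochs=2000):
    """Fit one user's weights so that weights . features approximates ratings.

    Illustrative least-squares fit by per-example gradient steps.
    """
    weights = [0.0] * len(doc_features[0])
    for _ in range(epochs):
        for x, r in zip(doc_features, ratings):
            err = r - sum(w * xi for w, xi in zip(weights, x))
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
    return weights

def predict_rating(weights, x):
    """Predicted rating for a document with feature vector x."""
    return sum(w * xi for w, xi in zip(weights, x))

# Hypothetical frequent user: document features are 2-topic proportions,
# and this user only cares about the first topic.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
user_ratings = [3.0, 0.0, 1.5]
user_weights = fit_user_regression(features, user_ratings)
```

The more ratings a user has, the better such user-specific weights can be estimated, which is consistent with this component being most useful for frequent users.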
Collaborative Score Topic Model (CSTM)
● A graphical model of user-item preferences and textual side information:
  ● User libraries
  ● Item content
CSTM
● Relationship to other models
  – Degenerate cases of CSTM correspond to other useful models (language models and collaborative filtering models)
● Model is learned using EM
  – Variational inference
    ● Non-conjugate model
    ● Mean-field for topic realizations
    ● Dirac delta posterior (MAP) for other parameters
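The learning scheme above can be written as a generic variational EM objective. The notation is assumed, not taken from the slides: x stands for the observed ratings and words, z for the topic realizations, and θ for all other parameters. With a mean-field q(z) and a point (Dirac delta) estimate for θ, the bound and the two alternating steps would take roughly this form:

```latex
% Assumed notation (not from the slides):
%   x = observed ratings and words, z = topic realizations,
%   \theta = all other parameters, \hat\theta = their MAP point estimate.
\begin{align}
\mathcal{L}(q, \hat\theta)
  &= \mathbb{E}_{q(z)}\big[\log p(x, z \mid \hat\theta)\big]
     + \log p(\hat\theta) + \mathbb{H}[q(z)],
  \qquad q(z) = \prod_n q(z_n) \\
\text{E-step:}\quad
q(z_n) &\propto \exp\Big(\mathbb{E}_{q(z_{-n})}\big[\log p(x, z \mid \hat\theta)\big]\Big) \\
\text{M-step:}\quad
\hat\theta &\leftarrow \arg\max_{\theta}\;
  \mathbb{E}_{q(z)}\big[\log p(x, z \mid \theta)\big] + \log p(\theta)
\end{align}
```

Because the model is non-conjugate, the M-step generally has no closed form and would typically be solved with a numerical optimizer; that detail is an assumption here rather than something stated on the slide.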
Related Work
● Combining item content with collaborative filtering
  – fLDA [Agarwal & Chen'10]
  – Collaborative Topic Regression [Wang & Blei'11]
● Using (user) side information with collaborative filtering
  – Relational learning via collective matrix factorization [Singh & Gordon'08]
  – Regression-based Latent Factor Models [Agarwal & Chen'09]
Datasets
● Conference datasets
  – Users are reviewers
    ● User libraries are reviewers' published papers
  – NIPS'10
    ● 48 users, 1251 items
  – ICML'12
    ● 433 users, 861 items
  – NIPS'13
    ● 1042 users, 1305 papers
● Book dataset
  – Users are book readers
    ● User libraries are users' purchased books
  – Kobo
    ● 316 users, 2601 items
[Figure labels: Deep Learning, RL/Planning, Bayesian Nonparametrics, Graphical Models, Neuroscience, Optimization, Large Margin]
Preference prediction results (ICML'12)
[Figure: RMSE of Constant, Language Models (SI), PMF (CF), LR (SI), CTR (CF+SI), and CSTM (CF+SI)]
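For reference, the RMSE metric used in this comparison can be computed as in the sketch below. The `constant_baseline` helper assumes that the "Constant" method predicts the mean training rating for every user-item pair; the slides do not say which constant is used, so treat that as an assumption. The ratings in the usage lines are made up.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and held-out ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def constant_baseline(train_ratings):
    """Assumed form of the 'Constant' baseline: the mean training rating."""
    return sum(train_ratings) / len(train_ratings)

# Hypothetical ratings, for illustration only.
train = [1, 3, 2, 2]
test = [0, 2]
constant = constant_baseline(train)
baseline_rmse = rmse([constant] * len(test), test)
```

Lower RMSE is better, so any method that uses preferences or side information should aim to fall below the constant baseline.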
Book recommendation results
● CSTM outperforms others in completely cold-start regimes
● Bag of words is limiting
● Reading interest cannot be represented as a mean book
[Figure panels: NIPS-10, ICML-12, Books]
Preference Prediction with Textual Side Information
[Figure: test performance versus quantity of available user data; annotation: online learning conditioned on previous users]
Conclusion & Future Work
● Take away
  – Good performance in both cold-start and warm-start regimes
  – User side information -> quickly provide good recommendations
    ● Online recommendations
● Future work
  – Computational
    ● Faster inference
  – Domains
    ● Legislative, images
  – How do you model different sources of side information in general?
    ● Active elicitation