Big & Personal: the data and the models behind Netflix recommendations
Jan 15, 2015
Outline
1. The Netflix Prize & the Recommendation Problem
2. Anatomy of Netflix Personalization
3. Data & Models
4. More data or better models?
What we were interested in:
■ High-quality recommendations
Proxy question:
■ Accuracy in predicted rating
■ Improve by 10% = $1 million!
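For concreteness, a minimal sketch of the accuracy metric behind that proxy question: the Prize measured root mean squared error (RMSE) between predicted and actual ratings. The numbers below are illustrative, not Prize data.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error, the accuracy metric used in the Netflix Prize."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Toy example: predictions vs. actual 1-5 star ratings.
print(rmse([3.5, 4.0, 2.1], [4, 4, 2]))  # ~0.29
```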
Results
● Top 2 algorithms, SVD and RBM, are still in production
What about the final prize ensembles?
■ Our offline studies showed they were too computationally intensive to scale
■ Expected improvement not worth the engineering effort
■ Plus… focus had already shifted to other issues that had more impact than rating prediction
Change of focus
[Figure: timeline of the change in focus, 2006 → 2013]
Anatomy of Netflix Personalization
Everything is a Recommendation
Everything is personalized
Note: Recommendations are per household, not per individual user
Ranking
Top 10
Personalization awareness
Diversity
[Figure: the same Top 10 row personalized for different household members – Dad, Mom, Son, Daughter, and combinations]
Support for Recommendations
Social Support
Social Recommendations
Genre rows
■ Personalized genre rows focus on user interest
■ Also provide context and “evidence”
■ Important for member satisfaction – moving personalized rows to the top on devices increased retention
■ How are they generated? (see the sketch below)
  ■ Implicit: based on user’s recent plays, ratings, & other interactions
  ■ Explicit taste preferences
  ■ Hybrid: combine the above
■ Also take into account:
  ■ Freshness – has this been shown before?
  ■ Diversity – avoid repeating tags and genres, limit the number of TV genres, etc.
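As a rough illustration of how such rows might be assembled, here is a hedged sketch that ranks candidate genre rows by implicit interest and then applies freshness and diversity filters. All names (candidate_rows, recent_plays, shown_before, and the row attributes) are hypothetical stand-ins, not Netflix's actual API.

```python
from collections import Counter

def select_genre_rows(candidate_rows, recent_plays, shown_before,
                      max_tv_rows=2, n_rows=10):
    """Illustrative sketch: rank candidate genre rows by overlap with the
    user's recent plays, then filter for freshness and genre diversity."""
    # Implicit interest: how often each genre appears in recent plays.
    interest = Counter(play.genre for play in recent_plays)
    ranked = sorted(candidate_rows, key=lambda row: interest[row.genre],
                    reverse=True)

    selected, seen_genres, tv_count = [], set(), 0
    for row in ranked:
        if row.id in shown_before:                  # freshness: shown before?
            continue
        if row.genre in seen_genres:                # diversity: repeated genre
            continue
        if row.is_tv and tv_count >= max_tv_rows:   # limit TV genre rows
            continue
        selected.append(row)
        seen_genres.add(row.genre)
        tv_count += row.is_tv
        if len(selected) == n_rows:
            break
    return selected
```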
Genres - personalization
■ Displayed in many different contexts
■ In response to user actions/context (search, queue add, …)
■ “More like…” rows
Similars
Data & Models
Big Data @Netflix
■ Almost 40M subscribers
■ Ratings: 4M/day
■ Searches: 3M/day
■ Plays: 30M/day
■ 2B hours streamed in Q4 2011
■ 1B hours in June 2012
■ > 4B hours in Q1 2013
■ Member behavior
■ Geo-information
■ Time
■ Impressions
■ Device info
■ Metadata
■ Social
Smart Models
■ Logistic/linear regression
■ Elastic nets
■ SVD and other MF models
■ Factorization Machines
■ Restricted Boltzmann Machines
■ Markov Chains
■ Different clustering approaches
■ LDA
■ Association Rules
■ Gradient Boosted Decision Trees / Random Forests
■ …
SVD
X[m × n] = U[m × r] S[r × r] (V[n × r])^T
■ X: m × n matrix (e.g., m users, n videos)
■ U: m × r matrix (m users, r factors)
■ S: r × r diagonal matrix (strength of each ‘factor’; r = rank of the matrix)
■ V: n × r matrix (n videos, r factors), transposed in the product
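A minimal numpy sketch of this decomposition on a toy ratings matrix. The data is made up; a real recommender would factorize only the observed entries, as in the next slides.

```python
import numpy as np

# Toy m x n ratings matrix: m = 4 users, n = 5 videos (made-up values).
X = np.array([[5, 4, 0, 1, 0],
              [4, 5, 1, 0, 0],
              [0, 1, 5, 4, 5],
              [1, 0, 4, 5, 4]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

r = 2                                            # keep the r strongest factors
X_hat = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]    # rank-r approximation of X
```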
SVD for Rating Prediction
■ User factor vectors p_u and item factor vectors q_i
■ Baseline (bias): b_ui = μ + b_u + b_i (user & item deviation from the average)
■ Predict rating as: r̂_ui = b_ui + q_i^T p_u
■ SVD++ (Koren et al.): asymmetric variation with implicit feedback
  r̂_ui = b_ui + q_i^T ( |R(u)|^(-1/2) Σ_{j∈R(u)} (r_uj − b_uj) x_j + |N(u)|^(-1/2) Σ_{j∈N(u)} y_j )
■ Where q_i, x_j, y_j are three item factor vectors
■ Users are not parametrized, but rather represented by:
  ■ R(u): items rated by user u
  ■ N(u): items for which the user has given implicit preference (e.g., rated vs. not rated)
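A hedged sketch of these two predictors in code, assuming the biases and factor vectors have already been learned; the argument names and container layout are illustrative, not any production interface.

```python
import numpy as np

def predict(mu, b_u, b_i, q_i, p_u):
    """Plain MF prediction: r_hat = mu + b_u + b_i + q_i . p_u"""
    return mu + b_u + b_i + q_i @ p_u

def predict_asymmetric(mu, b_u, b_i, q_i, rated, implicit):
    """Asymmetric variant: the user is represented by the items they
    interacted with instead of a learned vector p_u.
    `rated` maps item j in R(u) to (residual r_uj - b_uj, vector x_j);
    `implicit` maps item j in N(u) to vector y_j."""
    p_u = sum(res * x_j for res, x_j in rated.values()) / np.sqrt(len(rated))
    p_u = p_u + sum(implicit.values()) / np.sqrt(len(implicit))
    return mu + b_u + b_i + q_i @ p_u
```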
Simon Funk’s SVD
■ One of the most interesting findings during the Netflix Prize came out of a blog post
■ Incremental, iterative, and approximate way to compute the SVD using gradient descent
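A minimal sketch of the idea from that blog post: stochastic gradient descent over only the observed ratings, with L2 regularization. The hyperparameters are illustrative, not Funk's exact values.

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, r=20, lr=0.005, reg=0.02, epochs=30):
    """Learn user and item factor matrices by stochastic gradient descent
    over the observed ratings only (an approximate, incremental SVD).
    `ratings` is a list of (user_index, item_index, rating) triples."""
    rng = np.random.default_rng(0)
    P = rng.normal(0, 0.1, (n_users, r))    # user factors
    Q = rng.normal(0, 0.1, (n_items, r))    # item factors
    for _ in range(epochs):
        for u, i, r_ui in ratings:
            pu, qi = P[u].copy(), Q[i].copy()
            err = r_ui - pu @ qi            # prediction error on this rating
            P[u] += lr * (err * qi - reg * pu)
            Q[i] += lr * (err * pu - reg * qi)
    return P, Q
```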
Restricted Boltzmann Machines
■ Restrict the connectivity in an ANN to make learning easier
■ Only one layer of hidden units
  ■ Although multiple layers are possible
■ No connections between hidden units
  ■ Hidden units are therefore independent given the visible states
■ RBMs can be stacked to form Deep Belief Networks (DBN) – the 4th generation of ANNs
[Figure: bipartite RBM graph – visible units i connected to hidden units j, with no intra-layer connections]
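That conditional independence is what makes inference cheap: P(h|v) and P(v|h) both factorize over units, so a full Gibbs step is two vectorized operations. A minimal sketch for binary units, where W, b_h, and b_v are an assumed weight matrix and bias vectors:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_h, b_v, rng):
    """One Gibbs step in a binary RBM. With no hidden-hidden connections,
    P(h | v) factorizes over units, so all hidden units can be sampled in
    parallel; the same holds for P(v | h)."""
    p_h = sigmoid(v @ W + b_h)                    # P(h_j = 1 | v) for all j
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_v)                  # P(v_i = 1 | h) for all i
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h
```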
RBM for the Netflix Prize
Ranking
Key algorithm: sorts titles in most contexts
■ Ranking = Scoring + Sorting + Filtering bags of movies for presentation to a user (a sketch of this decomposition follows the list)
■ Goal: find the best possible ordering of a set of videos for a user, within a specific context, in real time
■ Objective: maximize consumption
■ Aspiration: played & “enjoyed” titles get the best scores
  ■ Akin to CTR forecasting for ads/search results
■ Factors
  ■ Accuracy
  ■ Novelty
  ■ Diversity
  ■ Freshness
  ■ Scalability
  ■ …
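A toy sketch of the Scoring + Sorting + Filtering decomposition referenced above; score_fn, the title objects, and already_seen are hypothetical stand-ins, not a real system interface.

```python
def rank_titles(candidates, score_fn, user, context, already_seen):
    """Ranking = Scoring + Sorting + Filtering, as a toy pipeline."""
    scored = [(score_fn(t, user, context), t) for t in candidates]  # scoring
    scored.sort(key=lambda pair: pair[0], reverse=True)             # sorting
    return [t for score, t in scored if t.id not in already_seen]   # filtering
```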
Example: two features, linear model
[Figure series: ranking scores from a linear model over two features, then extended along further ranking dimensions – Novelty, Diversity, Freshness, Accuracy, Scalability]
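A minimal version of such a two-feature linear ranking function. The choice of features (predicted rating and popularity) and the weights are purely illustrative; in practice the weights would be learned, as the learning-to-rank slides below describe.

```python
def linear_rank_score(predicted_rating, popularity, w1=0.7, w2=0.3, bias=0.0):
    """Linear ranking function over two illustrative features."""
    return bias + w1 * predicted_rating + w2 * popularity
```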
Learning to rank
■ Machine learning problem: the goal is to construct a ranking model from training data
■ Training data can have a partial order or binary judgments (relevant/not relevant)
■ The resulting order of the items is typically induced from a numerical score
■ Learning to rank is a key element for personalization
■ You can treat the problem as a standard supervised classification problem
Learning to Rank Approaches
1. Pointwise
  ■ The ranking function minimizes a loss defined on individual relevance judgments
  ■ Ranking score based on regression or classification
  ■ Ordinal regression, logistic regression, SVM, GBDT, …
2. Pairwise
  ■ The loss function is defined on pairwise preferences
  ■ Goal: minimize the number of inversions in the ranking
  ■ The ranking problem is thus transformed into a binary classification problem (see the sketch below)
  ■ RankSVM, RankBoost, RankNet, FRank, …
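A sketch of the pairwise idea: each (preferred, other) item pair becomes one binary example, and a RankNet-style logistic loss penalizes inversions. This is a generic textbook formulation, not any specific production model.

```python
import numpy as np

def pairwise_logistic_loss(score_preferred, score_other):
    """RankNet-style loss: large when the preferred item is scored below
    the other item (an inversion), near zero otherwise."""
    return np.log1p(np.exp(-(score_preferred - score_other)))

print(pairwise_logistic_loss(2.0, 1.0))  # ~0.31: correct order, small loss
print(pairwise_logistic_loss(1.0, 2.0))  # ~1.31: inversion, large loss
```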
Learning to rank - metrics
■ Quality of a ranking is measured using metrics such as:
  ■ Normalized Discounted Cumulative Gain (NDCG)
  ■ Mean Reciprocal Rank (MRR)
  ■ Fraction of Concordant Pairs (FCP)
  ■ Others…
■ But it is hard to optimize machine-learned models directly on these measures (they are not differentiable)
■ Recent research targets models that directly optimize ranking measures
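For reference, straightforward implementations of the first two metrics, following their standard textbook definitions rather than any particular production variant:

```python
import numpy as np

def ndcg(ranked_relevances, k=10):
    """Normalized Discounted Cumulative Gain at rank k for one ranked list;
    `ranked_relevances` are graded relevance labels in ranking order."""
    rel = np.asarray(ranked_relevances, dtype=float)
    top = rel[:k]
    discounts = 1.0 / np.log2(np.arange(2, top.size + 2))  # 1/log2(rank+1)
    dcg = np.sum(top * discounts)
    ideal = np.sort(rel)[::-1][:k]                         # best possible order
    idcg = np.sum(ideal * discounts[:ideal.size])
    return dcg / idcg if idcg > 0 else 0.0

def mrr(first_relevant_ranks):
    """Mean Reciprocal Rank, given the 1-based rank of the first relevant
    item for each query."""
    return float(np.mean([1.0 / r for r in first_relevant_ranks]))
```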
Learning to Rank Approaches
3. Listwise
  a. Indirect loss function
    ■ RankCosine: similarity between the ranking list and the ground truth as the loss function
    ■ ListNet: KL-divergence as the loss function, by defining a probability distribution over rankings
    ■ Problem: optimizing a listwise loss function may not optimize IR metrics
  b. Directly optimizing IR measures (difficult, since they are not differentiable)
    ■ Optimize IR measures through Genetic Programming or Simulated Annealing
    ■ Gradient descent on a smoothed version of the objective function (e.g., CLiMF at RecSys 2012 or TFMAP at SIGIR 2012)
    ■ SVM-MAP relaxes the MAP metric by adding it to the SVM constraints
    ■ AdaRank uses boosting to optimize NDCG
Other research questions we are interested in
● Row selection
  ○ How to select and rank lists of “related” items while imposing inter-group diversity and avoiding duplicates
● Diversity
  ○ Can we increase diversity while preserving relevance, in a way that optimizes user response?
● Similarity
  ○ How to compute optimal and personalized similarity between items using data ranging from play histories to item metadata
● Context-aware recommendations
● Mood and session intent inference
● …
More data or better models?
Really?
Anand Rajaraman: Stanford & Senior VP at Walmart Global eCommerce (formerly Kosmix)
Sometimes, it’s not about more data
More data or better models?
[Figure: results from Banko and Brill, 2001 – accuracy keeps improving with more training data across learning algorithms]
Norvig: “Google does not have better Algorithms, only more Data”
Many features / low-bias models
More data or better models?
Data without a sound approach = noise
Conclusions
The Personalization Problem
■ The Netflix Prize simplified the recommendation problem to predicting ratings
■ But…
  ■ User ratings are only one of the many data inputs we have
  ■ Rating predictions are only part of our solution
  ■ Other algorithms such as ranking or similarity are very important
■ We can reformulate the recommendation problem
  ■ Function to optimize: the probability that a user chooses something and enjoys it enough to come back to the service
More data + Better models + More accurate metrics + Better approaches & architectures
= Lots of room for improvement!