Linear Submodular Bandits and their Application to Diversified Retrieval
Yisong Yue (CMU) & Carlos Guestrin (CMU)

Optimizing Recommender Systems
• Every day, users come to a news portal.
• For each user, the news portal recommends L articles to cover that user's interests.
• Users provide feedback (clicks, ratings, "likes").
• The system integrates the feedback for future use.

We address two challenges:
• Diversified recommendations
• Exploration for personalization

Challenge 1: Making Diversified Recommendations
• The system should recommend optimally diversified sets of articles. Which of the following two sets should it recommend?
• A redundant set, where every headline covers the same story:
  • "Israel implements unilateral Gaza cease-fire :: WRAL.com"
  • "Israel unilaterally halts fire, rockets persist"
  • "Gaza truce, Israeli pullout begin | Latest News"
  • "Hamas announces ceasefire after Israel declares truce - …"
  • "Hamas fighters seek to restore order in Gaza Strip - World - Wire …"
• Or a diversified set, where the headlines cover distinct topics:
  • "Israel implements unilateral Gaza cease-fire :: WRAL.com"
  • "Obama vows to fight for middle class"
  • "Citigroup plans to cut 4500 jobs"
  • "Google Android market tops 10 billion downloads"
  • "UC astronomers discover two largest black holes ever found"

Challenge 2: Personalization
• Different users have different interests.
• We can only learn a user's interests by recommending articles and receiving feedback.
• This is an exploration versus exploitation dilemma.
• We model it as a bandit problem!

Modeling Diversity via Submodular Utility Functions
• We assume a set of D concepts or topics.
• Users are modeled by how interested they are in each topic.
• Let F_i(A) denote how well the set of articles A covers topic i (the "topic coverage function").
• We model user utility as F(A|w) = w^T [F_1(A), …, F_D(A)].
• Goal: recommend a set of articles that optimally covers the topics that interest the user.
• Each topic coverage function F_i(A) is monotone submodular! A function F is submodular if, for all sets A ⊆ B and any article a,
    F(A ∪ {a}) − F(A) ≥ F(B ∪ {a}) − F(B),
  i.e., the benefit of recommending a second (redundant) article is smaller than that of adding the first.

Example: Probabilistic Coverage [El-Arini et al., '09]
• Each article a has probability P(i|a) of covering topic i.
• Define the topic coverage function for a set A as
    F_i(A) = 1 − ∏_{a ∈ A} (1 − P(i|a)).
• It is straightforward to show that each F_i is monotone submodular.

Properties of Submodular Functions
• Sums of submodular functions are submodular, so F(A|w) is submodular.
• Exact inference is NP-hard!
• The greedy algorithm yields a (1 − 1/e) approximation bound.
• Incremental gains are locally linear:
    F(A ∪ {a} | w) − F(A|w) = w^T Δ(a|A), where
    Δ(a|A) = [F_1(A ∪ {a}) − F_1(A), …, F_D(A ∪ {a}) − F_D(A)].
• Both properties are exploited by our online learning algorithm (a code sketch of the greedy step follows this section).
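To make the probabilistic coverage example and the greedy step concrete, here is a minimal Python sketch. It is not from the poster: the function names, the matrix P of coverage probabilities P(i|a), and the assumption that L does not exceed the number of candidate articles are all illustrative choices.

```python
import numpy as np

def probabilistic_coverage(P, selected):
    """F_i(A) = 1 - prod_{a in A} (1 - P(i|a)) for every topic i.
    P: (n_articles, D) matrix with P[a, i] = P(i|a); selected: indices in A."""
    if not selected:
        return np.zeros(P.shape[1])
    return 1.0 - np.prod(1.0 - P[selected], axis=0)

def greedy_select(P, w, L):
    """Greedily build a set of L articles maximizing F(A|w) = w^T [F_1(A),...,F_D(A)].
    Because F(.|w) is monotone submodular, greedy achieves a (1 - 1/e) approximation."""
    selected = []
    for _ in range(L):  # assumes L <= number of candidate articles
        cov = probabilistic_coverage(P, selected)
        best, best_gain = None, -np.inf
        for a in range(P.shape[0]):
            if a in selected:
                continue
            # weighted marginal gain w^T Delta(a|A), which is locally linear in w
            gain = w @ (probabilistic_coverage(P, selected + [a]) - cov)
            if gain > best_gain:
                best, best_gain = a, gain
        selected.append(best)
    return selected
```

Note that the marginal gain computed in the inner loop is exactly the feature vector Δ(a|A) weighted by w; this local linearity is what lets the bandit algorithm below treat each greedy step as a linear prediction problem.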
Linear Submodular Bandits Problem
At each iteration t:
• A set of available articles arrives, each article represented using the D submodular basis functions.
• The algorithm selects a set A_t of L articles, recommends A_t to the user, and receives feedback on each article.
Assumptions:
• Pr(like | a, A) = w^T Δ(a|A) (conditional submodular independence).
• Regret is measured against the offline greedy benchmark: (1 − 1/e)·OPT minus the sum of rewards collected.

LSBGreedy
• Maintain, for each topic, a mean estimate of the user's interest and the uncertainty of that estimate (a confidence interval around w).
• Greedily recommend the articles with the highest upper confidence utility.
• In the poster's running example, this rule chooses an article about the economy.
• Theorem: with probability 1 − δ, the average regret shrinks as O(L √(D log T / T)).
(A code sketch of the selection rule follows the user study below.)

News Recommender User Study
• 10 days, 10 articles per day.
• Compared against:
  • Multiplicative Weighting (no exploration) [El-Arini et al., '09]
  • Ranked Bandits + LinUCB (a reduction approach that does not directly model diversity) [Radlinski et al., '08; Li et al., '10]
  • a static baseline with fixed weights

Comparison                  Win / Tie / Loss   Gain per Day   % Likes
LSBGreedy vs Static         24 / 0 / 0         1.07           67%
LSBGreedy vs MW             24 / 1 / 1         0.54           63%
LSBGreedy vs RankLinUCB     21 / 2 / 4         0.58           61%

Comparing learned weights for two sessions (LSBGreedy vs MW):
• In the 1st session, MW overfits to the "world" topic.
• In the 2nd session, the user liked few articles, and MW did not learn anything.
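The following sketch shows one way the LSBGreedy selection rule described above could be realized. It is an illustrative reconstruction, not the poster's code: it assumes a LinUCB-style ridge regression update for the mean estimate of w, reuses probabilistic_coverage from the earlier sketch, and the names alpha (exploration width) and reg (regularizer) are assumptions.

```python
import numpy as np

class LSBGreedy:
    """Upper-confidence greedy selection over submodular basis functions,
    in the spirit of LSBGreedy (Yue & Guestrin '11); details are assumed."""

    def __init__(self, D, alpha=1.0, reg=1.0):
        self.M = reg * np.eye(D)   # regularized covariance of observed gain vectors
        self.b = np.zeros(D)       # accumulated feedback-weighted gain vectors
        self.alpha = alpha

    def recommend(self, P, L):
        w_hat = np.linalg.solve(self.M, self.b)   # mean estimate of user interests
        selected, gains = [], []
        for _ in range(L):
            cov = probabilistic_coverage(P, selected)
            best, best_ucb, best_delta = None, -np.inf, None
            for a in range(P.shape[0]):
                if a in selected:
                    continue
                delta = probabilistic_coverage(P, selected + [a]) - cov
                # mean utility plus confidence width = upper confidence bound
                width = np.sqrt(delta @ np.linalg.solve(self.M, delta))
                ucb = w_hat @ delta + self.alpha * width
                if ucb > best_ucb:
                    best, best_ucb, best_delta = a, ucb, delta
            selected.append(best)
            gains.append(best_delta)
        return selected, gains

    def update(self, gains, likes):
        # likes[k] is 1 if the user liked the k-th recommended article, else 0
        for delta, y in zip(gains, likes):
            self.M += np.outer(delta, delta)
            self.b += y * delta
```

A session then alternates recommend(), which exploits the locally linear gains Δ(a|A) to score each candidate article, with update() on the observed likes, so topics with uncertain interest estimates are explored and topics with confidently high estimates are exploited.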