SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products
Post on 19-Jan-2017
Jure Leskovec (@jure) Including joint work with J. McAuley, R. Pandey, L. Riedel
Jure Leskovec, Stanford University & Pinterest
30+ Billion Pins categorized by people into more than
750 Million Boards
50% of pins have been created in the last 6 months
Object Graph
Can we understand how pins fit together
into a giant network?
Object Graph: Products
Pins & product catalogs: 10s of millions of products, 100s of millions of product reviews
How do we build the product graph?
Three components:
• Link prediction
• Topic models
• Product hierarchies
Product Graph: Relations
Substitutes: Purchase instead
Complements: Purchase in addition
Product Graph: Description
Example review phrases attached to the products (images not reproduced):
• cleaner; quieter
• cheaper; high power
• well made, easy to install
• fits perfectly, great value
Product Graph: What does it do?
1. Understand the notions of substitute and complement goods
is substitutable for
complements
Product Graph: What does it do?
2. Generate explanations of why certain products are
preferred
People prefer this because:
“Good quality, soft, light weight, the colors are beautiful and exactly like the picture!”
Product Graph: What does it do?
3. Recommend baskets of related items
Query: Suggested outfit:
Product Graph: Overview
Building networks of products
Modeling: Can we use product data to model product relationships?
Understanding: Can we explain why people prefer certain products
over others?
Problem Setting
Binary prediction task: Given a pair of products, x and y, predict
whether they are related (substitute/complementary)
Goal: Build a probabilistic model that encodes p(x is related to y)
Problem Setting
How to learn the model from data?
Train by maximum likelihood:
X Complementary
Not Complementary
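The likelihood formula from this slide did not survive in the transcript. As a minimal sketch of what "train by maximum likelihood" means for a binary link model, the code below assumes a plain logistic model over hypothetical pair features; the names `link_probability`, `log_likelihood`, `fit` and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def link_probability(weights, bias, pair_features):
    # p(x is related to y) for one product pair (assumed logistic form).
    score = bias + sum(w * f for w, f in zip(weights, pair_features))
    return sigmoid(score)

def log_likelihood(weights, bias, examples):
    # examples: list of (pair_features, label); label 1 = related, 0 = not related.
    total = 0.0
    for pair_features, label in examples:
        p = link_probability(weights, bias, pair_features)
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total

def fit(examples, dim, steps=500, lr=0.5):
    # Gradient ascent on the average log-likelihood.
    weights, bias = [0.0] * dim, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for pair_features, label in examples:
            err = label - link_probability(weights, bias, pair_features)
            for k in range(dim):
                grad_w[k] += err * pair_features[k]
            grad_b += err
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / n
    return weights, bias

# Hypothetical toy data: two related pairs, two unrelated pairs.
toy_examples = [
    ([1.0, 0.0], 1),
    ([0.9, 0.1], 1),
    ([0.0, 1.0], 0),
    ([0.1, 0.8], 0),
]
weights, bias = fit(toy_examples, dim=2)
print(log_likelihood(weights, bias, toy_examples))
```

Related pairs contribute log p and unrelated pairs contribute log(1 - p), so maximizing the sum pushes p up on observed links and down on non-links.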
Attempt 1: Big bags of features
Features of product i: [0,0,0,0,0,0,0,1,0,5,0,0,0, … ,0,1,0,0,0,0,0,1,2]
Features of product j: [0,0,0,1,0,0,0,0,0,0,0,1,0, … ,0,0,0,0,0,0,0,1,0]
(one dimension per vocabulary word, from “aardvark” to “zoetrope”)
Parameterized probability measure (essentially weighted-nearest-neighbor)
• High-dimensional
• Prone to overfitting
• Too fine-grained
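The slide's formula for the "parameterized probability measure" is missing from the transcript. One plausible reading of "weighted nearest neighbor" is a logistic function of a weighted squared distance between the two bag-of-words vectors; the sketch below assumes that form, and the function name, weights, and offset are hypothetical.

```python
import math

def weighted_nn_probability(x_i, x_j, weights, offset):
    # Assumed form: sigmoid(offset - sum_k w_k * (x_ik - x_jk)^2).
    # One learned weight per vocabulary dimension, which is why the model is
    # high-dimensional and prone to overfitting.
    distance = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j))
    return 1.0 / (1.0 + math.exp(distance - offset))

# Hypothetical 4-word vocabulary.
x_i = [0, 1, 0, 5]
x_j = [0, 1, 0, 4]
x_far = [3, 0, 2, 0]
w = [1.0, 1.0, 1.0, 1.0]
print(weighted_nn_probability(x_i, x_j, w, offset=1.0))
print(weighted_nn_probability(x_i, x_far, w, offset=1.0))
```

Pairs with similar word counts get a higher probability than pairs with dissimilar counts, and the per-word weights decide which words matter.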
Attempt 2: Features from Topics
Use any kind of product-related features: brand, price, reviews, product descriptions, …
Topic models: LDA; Blei & McAuliffe (2007)
Product topics (examples from the slide): Fashion, Shoes, Female
Attempt 2: Features from Topics
Features of product i: [0.1, 0.4, 0.2, 0.1, 0.2]
Features of product j: [0.3, 0.1, 0.3, 0.2, 0.1]
(each dimension is a topic weight, e.g. “Shoes”, “Female”)
On the right track, but are the topics we are discovering relevant to link prediction?
Attempt 3: Learn “good” topics
Learn to discover topics that explain the graph structure
Link Prediction
Product “topics”
Idea: Learn both simultaneously
Discover topics that “explain” product relations
Conceptually, we want to learn to project products into topic space such that related products are nearby
The SCEPTRE Model
Combining topic models with link prediction
Topic model with topic distribution θ
But the topics should be “good” as features for link prediction
Why do people who view X eventually buy Y?
There is a link between the two products because people use similar words to describe them
But in what direction does the link flow?
Issue 1: Relationships we want to learn are not symmetric
Why do people who view X eventually buy Y?
Solution: We solve this issue by learning “relatedness” in addition to “directedness”
Relationships: Explained by product “properties”, e.g. “baby, pajamas, pants, colorful”
Directedness: Subjective/qualitative language, e.g. “true size, fits well, items are the same color as on the picture”
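To make the split between "relatedness" and "directedness" concrete, here is a minimal sketch. It assumes (this is not stated on the slides) that relatedness is a logistic function of the elementwise product of two topic vectors, which is symmetric, and that direction is a logistic function of their difference, which is antisymmetric. The parameter names `beta` and `eta` are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def relatedness(theta_i, theta_j, beta):
    # Symmetric term: swapping i and j leaves the score unchanged.
    score = beta[0] + sum(b * (a * c) for b, a, c in zip(beta[1:], theta_i, theta_j))
    return sigmoid(score)

def direction(theta_i, theta_j, eta):
    # Asymmetric term: swapping i and j flips the sign of every difference.
    score = eta[0] + sum(e * (c - a) for e, a, c in zip(eta[1:], theta_i, theta_j))
    return sigmoid(score)

def directed_link_probability(theta_i, theta_j, beta, eta):
    # Assumed combination: the pair is related AND the link points from i to j.
    return relatedness(theta_i, theta_j, beta) * direction(theta_i, theta_j, eta)

# Topic vectors taken from the earlier slide; the weights are made up.
theta_i = [0.1, 0.4, 0.2, 0.1, 0.2]
theta_j = [0.3, 0.1, 0.3, 0.2, 0.1]
beta = [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0]
eta = [0.0, 1.5, -0.5, 0.0, 1.0, -1.0]
print(directed_link_probability(theta_i, theta_j, beta, eta))
print(directed_link_probability(theta_j, theta_i, beta, eta))
```

The two printed values differ only through the direction term, which is the point of Issue 1: similar words make two products related, but a separate signal is needed to say which way the link flows.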
Learning Multiple Graphs
browsed together
bought together
Issue 2: We want to learn multiple relationships simultaneously
We could fit two independent models, but learning both at once:
1) Gives us more data on which to train the complete model
2) Helps with interpretability, since both relationships are explained in terms of the same topics
Solution: We fix this by learning multiple regressors simultaneously (one for each graph),
that operate on a single set of topics
One regressor per graph
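A minimal sketch of "one regressor per graph over a single set of topics": the topic vectors are shared, and each graph only adds its own weight vector. The graph names come from the slide; the feature construction and the weights are assumptions for illustration.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def predict_all_graphs(theta_i, theta_j, regressors):
    # Shared pair features built once from the shared topic vectors.
    pair_features = [1.0] + [a * b for a, b in zip(theta_i, theta_j)]
    predictions = {}
    for graph_name, weights in regressors.items():
        score = sum(w * f for w, f in zip(weights, pair_features))
        predictions[graph_name] = sigmoid(score)
    return predictions

# Hypothetical weights: one vector per graph, same topics underneath.
regressors = {
    "browsed together": [-0.5, 3.0, 3.0, 1.0],
    "bought together": [-1.0, 0.5, 4.0, 2.0],
}
print(predict_all_graphs([0.2, 0.5, 0.3], [0.1, 0.6, 0.3], regressors))
```

Because every regressor reads the same topic features, a topic that matters for both graphs is learned from both sets of edges, and each graph's weights can be compared topic by topic.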
SCEPTRE Is Not Tractable
Issue 3: The model has too many parameters
Thousands of topics multiplied by millions of products
Including Hierarchy
Idea: use the category hierarchy to sparsify the model
Solution: Product hierarchy
Associate each node in the category tree with a small number of topics.
Now we can fit models with thousands of topics, but only 10-20 are active per product.
“Car audio” topics (for example) have probability zero of being selected for this product.
Topics at the top of the hierarchy are common to all electronics products, and will contain generic (though electronics-specific) language.
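The sparsification idea can be sketched as a topic mask: a product may only use the topics owned by the nodes on its own path through the category tree. The tree, the topic ids, and the function names below are hypothetical.

```python
# Hypothetical category tree: each node (identified by its path) owns a few topic ids.
TOPICS_PER_NODE = {
    ("Electronics",): [0, 1],
    ("Electronics", "Cameras"): [2, 3],
    ("Electronics", "Car audio"): [4, 5],
}

def active_topics(category_path):
    # Collect the topics of every node from the root down to the product's category.
    active = []
    for depth in range(1, len(category_path) + 1):
        active.extend(TOPICS_PER_NODE.get(tuple(category_path[:depth]), []))
    return active

def masked_topic_distribution(raw_weights, category_path):
    # Topics outside the path get probability zero; the rest are renormalized.
    active = active_topics(category_path)
    total = sum(raw_weights[k] for k in active)
    return {k: raw_weights[k] / total for k in active}

raw = [0.5, 0.5, 1.0, 2.0, 3.0, 3.0]
print(masked_topic_distribution(raw, ["Electronics", "Cameras"]))
```

For a camera, the "Car audio" topics (ids 4 and 5 here) never appear, while the root topics (ids 0 and 1) are shared by every electronics product.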
Training the model: EM
E-step (topic assignments)
M-step (link prediction)
Other topic/regression parameters (word distribution φ and topic assignments z)
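The update equations are not in the transcript. As a sketch of the "topic assignments" step only, the code below assumes the standard topic-model rule that a word's topic is chosen with probability proportional to θ (the document's topic weight) times φ (the topic's word probability). The function name and toy numbers are hypothetical.

```python
def topic_posterior(theta_d, phi, word):
    # p(z = k | word) proportional to theta_d[k] * phi[k][word] (assumed standard form).
    unnormalized = [theta_d[k] * phi[k].get(word, 0.0) for k in range(len(theta_d))]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("word has zero probability under every topic")
    return [u / total for u in unnormalized]

# Two hypothetical topics with their word distributions.
theta_d = [0.5, 0.5]
phi = [
    {"soft": 0.3, "loud": 0.1},
    {"soft": 0.1, "loud": 0.3},
]
print(topic_posterior(theta_d, phi, "soft"))
```

With the topic assignments held fixed, the other step would then refit θ, φ, and the link-prediction weights, and the two steps alternate until the likelihood stops improving.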
Building the Product Graph
Now we can generate the product graph by identifying the most probable links
For every product, rank all other products according to p(x is related to y)
But this is slow! Quadratic number of comparisons!
Solution: Use product hierarchy and a matching engine
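The slides do not describe the matching engine, so the following is only a sketch of the general idea: use the category hierarchy to shortlist candidates, then rank the shortlist instead of every product. The index layout, the function names, and the toy scores are assumptions.

```python
import heapq
from collections import defaultdict

def build_category_index(categories):
    # categories: dict mapping product id -> category name.
    index = defaultdict(list)
    for product_id, category in categories.items():
        index[category].append(product_id)
    return index

def top_links(product_id, categories, index, score, k=2):
    # Only score candidates that share the query product's category.
    candidates = [c for c in index[categories[product_id]] if c != product_id]
    return heapq.nlargest(k, candidates, key=lambda c: score(product_id, c))

categories = {
    "camera": "Cameras",
    "lens": "Cameras",
    "tripod": "Cameras",
    "speaker": "Car audio",
}
# Stand-in for p(x is related to y); a trained model would go here.
toy_scores = {("camera", "lens"): 0.9, ("camera", "tripod"): 0.6}

def score(x, y):
    return toy_scores.get((x, y), 0.0)

index = build_category_index(categories)
print(top_links("camera", categories, index, score, k=2))
```

This sketch only looks inside one category, which suits substitutes; complements often sit in other categories, so a real matcher would need to add related categories to the shortlist.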
Experiments
Just for fun, let’s use the Amazon product catalog:
Ranking Performance
Manual examination shows great performance (false positives are actually very relevant)
Explaining user preferences
Explain recommendations by identifying words that “best explain” the link:
• Topic model: we assign a topic to each word
• Logistic regressor: uses the words to make predictions
• Identify phrases that maximize the likelihood of the link in order to explain it
Use the “directedness” model to generate explanations as it selects more subjective language (i.e., how do the products differ, and why was one product “preferable” over another).
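A minimal sketch of picking an explanation: turn each candidate review sentence into a topic vector using its word-to-topic assignments, score it with the direction weights, and keep the best one. The word-to-topic table, the weights `eta`, and the function names are hypothetical, and the scoring rule is an assumption rather than the method from the talk.

```python
def sentence_topic_vector(words, word_topic, num_topics):
    # Fraction of the sentence's assigned words that fall in each topic.
    counts = [0.0] * num_topics
    for word in words:
        if word in word_topic:
            counts[word_topic[word]] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def best_explanations(sentences, word_topic, eta, top_n=1):
    # Rank sentences by how strongly their topics push the direction score up.
    def score(sentence):
        vec = sentence_topic_vector(sentence.lower().split(), word_topic, len(eta))
        return sum(e * v for e, v in zip(eta, vec))
    return sorted(sentences, key=score, reverse=True)[:top_n]

# Topic 0: subjective/qualitative language; topic 1: product properties.
word_topic = {"fits": 0, "well": 0, "size": 0, "baby": 1, "pajamas": 1}
eta = [2.0, -1.0]
sentences = ["fits well true size", "baby pajamas"]
print(best_explanations(sentences, word_topic, eta))
```

Sentences made of subjective words such as "fits well" score higher under these weights than sentences that only name the product, which matches the slide's point that the directedness model selects more subjective language.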
Pinterest Graph - Example
User:
● likes classic art
● just viewed a pin about things to do in SF
Graph nodes shown: San Francisco, Tourist Attractions, Food, Sporting Venues, Art Galleries, Artists
We are hiring!
jure@pinterest.com
Inferring Networks of Substitutable and Complementary Products, by J. McAuley, R. Pandey, J. Leskovec. ACM SIGKDD 2015.