
SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Jan 19, 2017

Transcript
Page 1: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Jure Leskovec (@jure) Including joint work with J. McAuley, R. Pandey, L. Riedel

Jure Leskovec, Stanford University & Pinterest

Page 2: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Connecting People & Objects


Page 3: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Pinterest: Discovery Engine

Visual Discovery Engine

[Diagram labels: Internet, Offsite, Save, Do, On Pinterest]

Page 4: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Pins: Rich Objects


Page 5: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Boards: Collections


Page 6: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Boards: Collections

Page 7: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

From Pins to the Object Graph

[Diagram labels: Pinners, Boards, Pins, Web Pages; Object Graph, Hyperlink Graph]

Page 8: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

30+ billion Pins, categorized by people into more than 750 million Boards

50% of Pins have been created in the last 6 months

Page 9: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

How do we uncover relationships between pins?

Page 10: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Object Graph

Can we understand how pins fit together into a giant network?

Page 11: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Object Graph: Products

Pins & product catalogs:

• 10s of millions of products

• 100s of millions of product reviews

How do we build the product graph? Three components:

• Link prediction

• Topic models

• Product hierarchies

Page 12: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Product Graph: Relations

Substitutes: Purchase instead

Complements: Purchase in addition

Page 13: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Product Graph: Description

Descriptions attached to the product images (images not preserved in this transcript):

• cleaner; quieter

• cheaper; high power

• well made, easy to install

• fits perfectly, great value

Page 14: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Product Graph: Overview

[Figure labels: Substitute, Complement]

Page 15: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Product Graph: What does it do?

1. Understand the notions of substitute and complement goods

[product image] is substitutable for [product image]

[product image] complements [product image]

Page 16: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Product Graph: What does it do?

2. Generate explanations of why certain products are preferred

People prefer this because:

“Good quality, soft, light weight, the colors are beautiful and exactly like the picture!”

Page 17: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Product Graph: What does it do?

3. Recommend baskets of related items

[Two examples, each pairing a query with a suggested outfit; images not preserved in this transcript]

Page 18: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Product Graph: Overview

Building networks of products

Modeling: Can we use product data to model product relationships?

Understanding: Can we explain why people prefer certain products over others?

Page 19: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Problem Setting

Binary prediction task: Given a pair of products, x and y, predict whether they are related (substitute/complementary)

Goal: Build a probabilistic model that encodes p(x is related to y)

Page 20: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Problem Setting

How to learn the model from data?

Train by maximum likelihood: [equation not preserved in this transcript]

[Figure labels: Complementary, Not Complementary]
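A minimal sketch of a maximum-likelihood objective for this binary task, assuming the standard Bernoulli log-likelihood over labeled pairs (the slide's own equation is not available here, so this form is an assumption). The function name log_likelihood, the lookup table toy_probs, and the example products are hypothetical.

```python
import math


def log_likelihood(labeled_pairs, prob):
    """Log-likelihood of binary link labels under a pairwise model.

    labeled_pairs: iterable of (x, y, related), with related True or False
    prob: function (x, y) -> probability that x and y are related
    """
    total = 0.0
    for x, y, related in labeled_pairs:
        p = prob(x, y)
        # Related pairs contribute log p; unrelated pairs contribute log(1 - p).
        total += math.log(p) if related else math.log(1.0 - p)
    return total


# Toy model (hypothetical): a fixed lookup table of probabilities.
toy_probs = {("tent", "sleeping bag"): 0.9, ("tent", "toaster"): 0.2}
pairs = [("tent", "sleeping bag", True), ("tent", "toaster", False)]
ll = log_likelihood(pairs, lambda x, y: toy_probs[(x, y)])
print(ll)
```

Training then means choosing the model parameters that make this quantity as large as possible on the observed pairs.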

Page 21: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Attempt 1: Big bags of features

Features of product i: [0,0,0,0,0,0,0,1,0,5,0,0,0, … ,0,1,0,0,0,0,0,1,2]

Features of product j: [0,0,0,1,0,0,0,0,0,0,0,1,0, … ,0,0,0,0,0,0,0,1,0]

(vector positions run over the vocabulary, from “aardvark” to “zoetrope”)

Page 22: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Attempt 1: Big bags of features

Parameterized probability measure (essentially weighted nearest neighbor)
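One way to read "weighted nearest neighbor" as a parameterized probability, sketched under an assumption: the link probability is a logistic function of a bias minus a per-feature weighted squared distance. The slide does not give the formula, so the functional form, the names link_probability, weights, and bias, and the toy vectors are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def link_probability(f_i, f_j, weights, bias):
    """p(i related to j) = sigmoid(bias - weighted squared distance)."""
    distance = sum(w * (a - b) ** 2 for w, a, b in zip(weights, f_i, f_j))
    return sigmoid(bias - distance)


# Toy bag-of-words vectors over a 4-word vocabulary (hypothetical values).
f_i = [0, 1, 0, 5]
f_j = [0, 1, 0, 4]  # differs from f_i in one count
f_k = [3, 0, 2, 0]  # shares no words with f_i
weights = [0.5, 0.5, 0.5, 0.5]  # one learned weight per vocabulary word

p_close = link_probability(f_i, f_j, weights, bias=1.0)
p_far = link_probability(f_i, f_k, weights, bias=1.0)
print(p_close, p_far)
```

Note that this sketch needs one weight per vocabulary word, so the parameter count grows with the vocabulary.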

Page 23: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Attempt 1: Big bags of features

• High-dimensional

• Prone to overfitting

• Too fine-grained

Page 24: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Attempt 2: Features from Topics

Use any kind of product-related features: brand, price, reviews, product descriptions, …

Topic models: LDA, Blei & McAuliffe (2007)

Product topics (example labels): Shoes, Female, Fashion

Page 25: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Attempt 2: Features from Topics

Features of product i: [0.1, 0.4, 0.2, 0.1, 0.2]

Features of product j: [0.3, 0.1, 0.3, 0.2, 0.1]

(example topic labels: Shoes, Female)
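A small sketch of how two topic vectors might be turned into features for a pair of products, using the two vectors shown on the slide. Combining them by elementwise product is an assumption made for illustration; the slide does not say how the pair is featurized, and pair_features is a hypothetical name.

```python
# Topic distributions from the slide (5 topics each, entries sum to 1).
theta_i = [0.1, 0.4, 0.2, 0.1, 0.2]
theta_j = [0.3, 0.1, 0.3, 0.2, 0.1]


def pair_features(theta_a, theta_b):
    """One feature per topic: large only when both products load on that topic."""
    return [a * b for a, b in zip(theta_a, theta_b)]


features = pair_features(theta_i, theta_j)
print(features)
```

Compared with bag-of-words vectors, a link predictor on these features has one weight per topic rather than one per vocabulary word.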

Page 26: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Attempt 2: Features from Topics

On the right track, but are the topics we are discovering relevant to link prediction?

Page 27: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Attempt 3: Learn “good” topics

Learn to discover topics that explain the graph structure


Page 28: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Attempt 3: Learn “good” topics

Link Prediction

Product “topics”

Idea: Learn both simultaneously

Discover topics that “explain” product relations


Page 29: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Attempt 3: Learn “good” topics

Conceptually, we want to learn to project products into topic space such that related products are nearby

Page 30: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

The SCEPTRE Model

Combining topic models with link prediction

Topic model with topic distribution 𝜽

But the topics should be “good” as features for link prediction

Page 31: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

The SCEPTRE Model: Details

Topic membership

Page 32: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

The SCEPTRE Model

Why do people who view X eventually buy Y?

There is a link between the two products because people use similar words to describe them

But in what direction does the link flow?

Issue 1: Relationships we want to learn are not symmetric

Page 33: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

The SCEPTRE Model

Why do people who view X eventually buy Y?

Solution: We solve this issue by learning “directedness” in addition to “relatedness”

Relatedness: Explained by product “properties”, e.g. “baby, pajamas, pants, colorful”

Directedness: Subjective/qualitative language, e.g. “true size, fits well, items are the same color as on the picture”
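A minimal sketch of how a relatedness term and a directedness term could be combined into a single directed link probability. It assumes a symmetric logistic term on elementwise topic products and an asymmetric logistic term on topic differences; the slide does not state the formula, so this form and the names beta, eta, and directed_link_probability are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def directed_link_probability(theta_a, theta_b, beta, eta):
    """p(a -> b) = relatedness(a, b) * directedness(a, b)."""
    # Relatedness: elementwise products, unchanged when a and b are swapped.
    products = [a * b for a, b in zip(theta_a, theta_b)]
    related = sigmoid(beta[0] + dot(beta[1:], products))
    # Directedness: differences, which flip sign when a and b are swapped.
    differences = [b - a for a, b in zip(theta_a, theta_b)]
    direction = sigmoid(eta[0] + dot(eta[1:], differences))
    return related * direction


# Toy 2-topic example (hypothetical values).
theta_a = [0.7, 0.3]
theta_b = [0.2, 0.8]
beta = [0.0, 1.0, 1.0]   # relatedness weights, bias first
eta = [0.0, -2.0, 2.0]   # directedness weights, bias first

p_ab = directed_link_probability(theta_a, theta_b, beta, eta)
p_ba = directed_link_probability(theta_b, theta_a, beta, eta)
print(p_ab, p_ba)
```

In this toy setup the relatedness term is identical in both directions, and only the directedness term makes a -> b more likely than b -> a.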

Page 34: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Learning Multiple Graphs

Graphs: browsed together, bought together

Issue 2: We want to learn multiple relationships simultaneously

We could fit two independent models, but learning both at once:

1) Gives us more data on which to train the complete model

2) Helps with interpretability, since both relationships are explained in terms of the same topics

Page 35: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Learning Multiple Graphs

Solution: We fix this by learning multiple regressors simultaneously (one for each graph) that operate on a single set of topics

One regressor per graph
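A sketch of "one regressor per graph" over a single set of topics: the topic vectors are shared, and each graph only has its own weight vector. The logistic form, the elementwise-product features, the name predict_all_graphs, and all numeric values are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_all_graphs(theta_a, theta_b, regressors):
    """Score one product pair under every graph, reusing the same topic features."""
    # Shared features: a constant bias term plus one product per topic.
    features = [1.0] + [a * b for a, b in zip(theta_a, theta_b)]
    return {
        graph: sigmoid(sum(w * f for w, f in zip(weights, features)))
        for graph, weights in regressors.items()
    }


# One weight vector per graph (hypothetical values), same topics for both.
regressors = {
    "browsed together": [-1.0, 8.0, 2.0],
    "bought together": [-1.0, 1.0, 9.0],
}
theta_a = [0.6, 0.4]
theta_b = [0.5, 0.5]

probs = predict_all_graphs(theta_a, theta_b, regressors)
print(probs)
```

Because both regressors read the same topics, a difference between the two graphs shows up as a difference in per-topic weights, which is what makes the comparison interpretable.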

Page 36: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

SCEPTRE is not tractable

Issue 3: The model has too many parameters

Thousands of topics multiplied by millions of products

Page 37: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Including Hierarchy

Idea: Use the category hierarchy to sparsify the model

Solution: Product hierarchy

Page 38: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Including Hierarchy

Associate each node in the category tree with a small number of topics

Now we can fit models with thousands of topics, but only 10-20 are active per product

“Car audio” topics (for example) have probability zero of being selected for this product

Topics at the top of the hierarchy are common to all electronics products, and will contain generic (though electronics-specific) language
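A toy sketch of the sparsification idea, assuming a product's active topics are those attached to the nodes on the path from the root of the category tree to the product's own category. The tree, the topic ids, and the name active_topics are hypothetical.

```python
# Hypothetical category tree: child -> parent (None marks the root).
parent = {
    "Electronics": None,
    "Car audio": "Electronics",
    "Cameras": "Electronics",
    "Lenses": "Cameras",
}

# Hypothetical allocation of a few topic ids to each category node.
node_topics = {
    "Electronics": [0, 1, 2, 3],
    "Car audio": [4, 5, 6],
    "Cameras": [7, 8, 9],
    "Lenses": [10, 11],
}


def active_topics(category):
    """Topics on the path from the root down to the given category."""
    path = []
    while category is not None:
        path.append(category)
        category = parent[category]
    topics = []
    for node in reversed(path):  # root first, most specific node last
        topics.extend(node_topics[node])
    return topics


print(active_topics("Lenses"))
```

A product filed under "Lenses" shares the generic "Electronics" topics with every other electronics product, while the "Car audio" topics never appear in its active set.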

Page 39: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Training the model: EM

E-step (topic assignments)

M-step (link prediction)

Other topic/regression parameters (word distribution 𝜙 and topic assignments z)

Page 40: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Building the Product Graph

Now we can generate the product graph by identifying the most probable links

For every product, rank all other products according to p(x is related to y)

But this is slow! Quadratic number of comparisons!

Solution: Use product hierarchy and a matching engine
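The slide does not say how the hierarchy and matching engine are used. One plausible reading, sketched here as an assumption, is to score each product only against candidates from a short list of categories instead of against the whole catalog, then keep the top k. The names build_graph, candidate_categories, and toy_prob, and all data, are hypothetical.

```python
import heapq
from collections import defaultdict


def build_graph(products, category_of, candidate_categories, prob, k):
    """For each product, keep the k most probable links among category-restricted candidates."""
    by_category = defaultdict(list)
    for product in products:
        by_category[category_of[product]].append(product)

    graph = {}
    for x in products:
        # Only products in the allowed categories are scored, not the full catalog.
        candidates = [
            y
            for category in candidate_categories[category_of[x]]
            for y in by_category[category]
            if y != x
        ]
        graph[x] = heapq.nlargest(k, candidates, key=lambda y: prob(x, y))
    return graph


# Toy catalog: a 1-D "position" stands in for a trained model's view of each product.
position = {"a1": 0.1, "a2": 0.2, "a3": 0.9, "b1": 0.5}
category_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B"}
candidate_categories = {"A": ["A"], "B": ["B", "A"]}


def toy_prob(x, y):
    return 1.0 - abs(position[x] - position[y])


graph = build_graph(list(position), category_of, candidate_categories, toy_prob, k=1)
print(graph)
```

The number of comparisons per product now depends on the size of its candidate categories rather than on the size of the catalog.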

Page 41: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Experiments

Just for fun, let’s use the Amazon product catalog

Page 42: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Edge Prediction Accuracy


Page 43: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Ranking Performance

Manual examination shows great performance (false positives are actually very relevant)


Page 44: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Results: Micro-Categories


Page 45: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Results: Micro-Categories


Page 46: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Explaining user preferences

Explain recommendations by identifying words that “best explain” the link:

• Topic model: we assign a topic to each word

• Logistic regressor uses the words to make predictions

• Identify phrases that maximize the likelihood of the link in order to explain it

Use the “directedness” model to generate explanations, as it selects more subjective language (i.e., how do the products differ, and why was one product “preferable” over another).
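A rough sketch of the phrase-selection step, under a simplifying assumption: each word carries a topic assignment, each topic carries a weight from the directedness regressor, and a phrase is scored by summing the weights of its words' topics. The scoring rule, the name best_explanation, and all data are hypothetical stand-ins, not the procedure from the talk.

```python
def best_explanation(phrases, word_topic, topic_weight):
    """Return the phrase whose words carry the most total regressor weight."""

    def score(phrase):
        words = phrase.lower().split()
        # Words without a topic assignment contribute nothing.
        return sum(topic_weight[word_topic[w]] for w in words if w in word_topic)

    return max(phrases, key=score)


# Hypothetical topic assignments and directedness weights.
word_topic = {"soft": 0, "light": 0, "beautiful": 0, "shipping": 1, "box": 1}
topic_weight = {0: 1.5, 1: -0.5}  # topic 0: subjective quality language

phrases = ["soft and light", "arrived in a box", "beautiful colors"]
best = best_explanation(phrases, word_topic, topic_weight)
print(best)
```

Here the phrase with the most subjective, quality-related words wins, which matches the intent of using the directedness model for explanations.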

Page 47: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Example: Product Graph


Page 48: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Example: Product Graph


Page 49: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Pinterest as a graph of objects


Page 50: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Connecting People & Objects


Page 51: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Pinterest Graph - Example

User:

● likes classic art

● just viewed a pin about things to do in SF

[Graph node labels: San Francisco, Tourist Attractions, Food, Sporting Venues, Art Galleries, Artists]

Page 52: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

From Pins to the Object Graph

[Diagram labels: Pinners, Boards, Images, Web Pages; Object Graph, Hyperlink Graph]

Page 53: SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

We are hiring!


[email protected]

J. McAuley, R. Pandey, J. Leskovec. Inferring Networks of Substitutable and Complementary Products. ACM SIGKDD 2015.