Using AWS to Build a Graph-Based Product Recommendation System (BDT303) | AWS re:Invent 2013

Post on 08-Sep-2014

DESCRIPTION

Magazine Luiza, one of the largest retail chains in Brazil, developed an in-house product recommendation system built on top of a large knowledge graph. AWS services including Amazon EC2, Amazon SQS, and Amazon ElastiCache made it possible for them to scale from a very small dataset to a huge Cassandra cluster. By improving the big data processing algorithms in their in-house solution built on AWS, they improved their conversion rates on revenue by more than 25 percent compared to market solutions they had used in the past.

Transcript

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Andre Fatala & Renato Pedigoni

November 14, 2013

Using AWS to Build a Graph-based Product Recommendation System

Friday, November 15, 13

About Magazine Luiza

Magazine Luiza is one of the largest household appliance retail chains in Brazil, focused on providing durable goods for Brazil's middle and lower-to-middle income classes.

• 731 stores
• 8 distribution centers
• More than 23,000 workers
• 22.8 million customers
• Multi-channel strategy

Recommendation systems

Graphs

Graph Stack

Distributed graph database
• Used for OLTP queries
• Native integration with TinkerPop

Distributed database management system
• Continuously available with no single point of failure
• Elastic scalability
• Caching layer
• Built-in replication

Storing user data

[Architecture diagram. Components shown: Elastic Load Balancing, API instances (EC2, m2.xlarge, Auto Scaling), Cassandra cluster (m2.xlarge nodes).]

In graph words…

• Vertices: person, session, channel, item
• Edges: created, visited
• Edges annotated with a number on the slide: viewed (+1), add_to_cart (+13), bought (+21)
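One way to read the +1, +13 and +21 annotations is as weights on the person-to-item edges, so that stronger signals count for more. The sketch below assumes that reading. The `WEIGHTS` table copies the numbers from the slide; the function name, the additive scoring and the sample events are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Edge weights as shown on the slide; treating them as additive scores is an assumption.
WEIGHTS = {"viewed": 1, "add_to_cart": 13, "bought": 21}


def score_interactions(events):
    """Sum the weight of every (person, action, item) event per (person, item) pair."""
    scores = defaultdict(int)
    for person, action, item in events:
        scores[(person, item)] += WEIGHTS[action]
    return dict(scores)


# Hypothetical events, for illustration only.
events = [
    ("renato", "viewed", "led-tv-42"),
    ("renato", "add_to_cart", "led-tv-42"),
    ("renato", "bought", "led-tv-42"),
    ("fatala", "viewed", "led-tv-42"),
]
scores = score_interactions(events)
```

With weights this far apart, a single purchase outweighs many views, which is one plausible reason for choosing them.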

Base recommendations

• Who viewed this item also viewed
• Who bought this item also bought
• Bought after viewing this item
• Upselling

How to query the graph for recs?

Gremlin Graph Language

• Groovy DSL for graph traversals
• Easy to learn
• Great community
• Part of the TinkerPop stack
• Works with any Blueprints-enabled graph database

People who viewed a product

g.v(4).in('viewed')

[Graph diagram: people (Renato, Fatala) connected by six "viewed" edges to products (LED TV 42", LED TV 40", LCD TV 42", LED 50").]
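The `in('viewed')` step follows `viewed` edges backwards, from a product to the people who viewed it. A rough plain-Python equivalent is sketched below over a tiny in-memory edge list. Which person viewed which product cannot be recovered from the slide, so the edges here are made up for illustration.

```python
# Hypothetical (person, product) "viewed" edges; the real edges are not in the transcript.
VIEWED = [
    ("Renato", 'LED TV 42"'),
    ("Renato", 'LED TV 40"'),
    ("Renato", 'LCD TV 42"'),
    ("Fatala", 'LED TV 42"'),
    ("Fatala", 'LCD TV 42"'),
    ("Fatala", 'LED 50"'),
]


def viewers_of(product, edges=VIEWED):
    """Mimic in('viewed'): people with a 'viewed' edge pointing at the product."""
    return [person for person, viewed in edges if viewed == product]


viewers = viewers_of('LED TV 42"')
```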

Who viewed this product also viewed

g.v(4).in('viewed').out('viewed')

[Graph diagram: people (Renato, Fatala) connected by six "viewed" edges to products (LED TV 42", LED TV 40", LCD TV 42", LED 50").]
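`in('viewed').out('viewed')` walks from the product to its viewers and then out to everything those viewers looked at, so a product reached along several paths shows up several times. Below is a minimal Python sketch of turning that into a ranked list, over made-up edges. Excluding the starting product and ranking by path count are assumptions about how the raw traversal might be post-processed, not something the slides state.

```python
from collections import Counter

# Hypothetical (person, product) "viewed" edges, for illustration only.
VIEWED = [
    ("Renato", 'LED TV 42"'),
    ("Renato", 'LED TV 40"'),
    ("Renato", 'LCD TV 42"'),
    ("Fatala", 'LED TV 42"'),
    ("Fatala", 'LCD TV 42"'),
    ("Fatala", 'LED 50"'),
]


def also_viewed(product, edges=VIEWED):
    """Mimic in('viewed').out('viewed'), then count and rank the products reached."""
    # Step 1, in('viewed'): everyone who viewed the starting product.
    viewers = {person for person, viewed in edges if viewed == product}
    # Step 2, out('viewed'): everything those people viewed, minus the start.
    counts = Counter(
        viewed
        for person, viewed in edges
        if person in viewers and viewed != product
    )
    return counts.most_common()


ranking = also_viewed('LED TV 42"')
```

In this toy graph both viewers of the 42" LED TV also looked at the 42" LCD TV, so it ranks first.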

Processing data with Spot Instances

• Bob dispatches a task to Amazon SQS containing the product id
• Spot Instances (EC2, m1.large) consume the Amazon SQS tasks
• Each instance processes the W*A* ("who ... also ...") recommendations
• Logs are synced to Amazon S3
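The dispatch-and-consume flow can be sketched with Python's standard `queue.Queue` standing in for Amazon SQS and threads standing in for Spot Instances. Everything named here (`dispatch`, `worker`, `compute_recommendations`) is hypothetical. A real deployment would talk to SQS through an AWS SDK and would have to tolerate a Spot Instance disappearing in the middle of a task.

```python
import queue
import threading

tasks = queue.Queue()   # stand-in for the Amazon SQS queue
results = {}            # stand-in for wherever recommendations are stored
results_lock = threading.Lock()


def compute_recommendations(product_id):
    """Placeholder for the W*A* computation; returns a dummy value."""
    return f"recs-for-{product_id}"


def dispatch(product_id):
    """Producer side: enqueue a task containing only the product id."""
    tasks.put(product_id)


def worker():
    """Consumer side: pull product ids until a None sentinel arrives."""
    while True:
        product_id = tasks.get()
        if product_id is None:
            tasks.task_done()
            break
        recs = compute_recommendations(product_id)
        with results_lock:
            results[product_id] = recs
        tasks.task_done()


workers = [threading.Thread(target=worker) for _ in range(3)]
for t in workers:
    t.start()
for pid in (101, 102, 103, 104):
    dispatch(pid)
for _ in workers:
    tasks.put(None)   # one stop signal per worker
tasks.join()
for t in workers:
    t.join()
```

Keeping the message down to an id keeps tasks small and easy to retry: with SQS, a message that a consumer never deletes becomes visible again after its visibility timeout, so another instance can pick it up.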

Personalized e-mails

Users receive e-mails when:
• A product has a price drop
• They abandon a product in the cart
• They visit many similar products

Bobby Mailer
• The Bob API notifies the Mailer Manager (m1.large) of a user interaction
• The Mailer Manager dispatches a task to Amazon SQS containing the customer id
• Spot Instances (EC2, m1.large) consume the Amazon SQS tasks and find the best recommendation for that user
• The e-mail is sent through Amazon SES
• Logs are synced to Amazon S3
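The three e-mail triggers can be read as simple predicates over a customer's recent activity. The sketch below is illustrative only: the `Activity` fields, the threshold of three similar products and the priority order are all assumptions, since the slides do not say how the Mailer Manager decides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Activity:
    """Hypothetical summary of one customer's recent behaviour."""
    price_dropped_items: List[str] = field(default_factory=list)
    abandoned_cart_items: List[str] = field(default_factory=list)
    similar_items_viewed: int = 0


def pick_email(activity: Activity, similar_threshold: int = 3) -> Optional[str]:
    """Return the e-mail type to send, or None if no trigger fires."""
    # Assumed priority: cart abandonment first, then price drops, then browsing.
    if activity.abandoned_cart_items:
        return "abandoned_cart"
    if activity.price_dropped_items:
        return "price_drop"
    if activity.similar_items_viewed >= similar_threshold:
        return "similar_products"
    return None


choice = pick_email(Activity(price_dropped_items=["led-tv-42"]))
```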

Analytics with Faunus

Graph analytics engine
• Provides graph input/output formats and a traversal language for graphs

Distributed computing
• Distributed processing of large data sets across clusters
• Designed to scale
• Detects and handles failures at the application layer

Amazon EMR

Analytics in Graphs with AWS

> g.V.has('element_type', 'person').age.mean()
34.683232

Amazon EMR
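On a cluster, a mean like this is typically computed by having each worker emit a partial (sum, count) pair and combining the pairs at the end, because an average of per-partition averages is wrong when partitions differ in size. A small Python sketch of that idea follows. The partitioning and the sample vertices are invented, and this is not Faunus code.

```python
from functools import reduce

# Hypothetical vertex partitions; each dict mimics a vertex with properties.
partitions = [
    [{"element_type": "person", "age": 30}, {"element_type": "item"}],
    [{"element_type": "person", "age": 40}, {"element_type": "person", "age": 35}],
]


def partial_sum_count(vertices):
    """Map step: (sum of ages, number of people) for one partition."""
    ages = [v["age"] for v in vertices if v.get("element_type") == "person"]
    return sum(ages), len(ages)


def combine(a, b):
    """Reduce step: add partial sums and counts component-wise."""
    return a[0] + b[0], a[1] + b[1]


total, count = reduce(combine, map(partial_sum_count, partitions), (0, 0))
mean_age = total / count
```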

Backup process

[Diagram: nodetool script to Amazon S3.]
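The slide only names the pieces, so the following is a guess at what such a script might do: take a Cassandra snapshot with `nodetool snapshot`, then copy the snapshot files to a bucket. The keyspace, tag, data directory and bucket names are placeholders, and using the AWS CLI's `aws s3 sync` for the upload is an assumption. The function only builds the commands; it does not run them.

```python
import shlex


def backup_commands(keyspace, tag, data_dir, bucket):
    """Build (not run) the two shell commands for a snapshot-and-upload backup."""
    # nodetool snapshot -t <tag> <keyspace> writes hard links under
    # <data_dir>/<keyspace>/<table>/snapshots/<tag>/.
    snapshot = ["nodetool", "snapshot", "-t", tag, keyspace]
    # Upload only the files belonging to this snapshot tag.
    upload = [
        "aws", "s3", "sync",
        f"{data_dir}/{keyspace}",
        f"s3://{bucket}/{keyspace}/{tag}",
        "--exclude", "*",
        "--include", f"*/snapshots/{tag}/*",
    ]
    return [snapshot, upload]


# Placeholder values, for illustration only.
commands = backup_commands(
    "graph", "2013-11-14", "/var/lib/cassandra/data", "example-backups"
)
for cmd in commands:
    print(shlex.join(cmd))
```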

Infrastructure

[Architecture diagram. Components shown: Amazon Route 53, Internet gateway, Elastic Load Balancing, API instances (EC2, m2.xlarge), Auto Scaling, Amazon ElastiCache (cache), Cassandra cluster (m2.xlarge nodes), Amazon S3 (backups and logs), Amazon SQS (queues), Spot Instances (EC2, m2.xlarge), Amazon EMR, Amazon SES.]

Metrics

• 4.3 million identified Magazine Luiza customers
• 50,000 product nodes
• 90 million total nodes
• 350 million total edges
• 700 GB of data
• Peaks of 20,000 reads/sec on the Cassandra cluster
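A quick back-of-the-envelope check on these figures gives a feel for the shape of the graph. The inputs are the numbers from the slide; the derived ratios are not from the talk.

```python
total_nodes = 90_000_000
total_edges = 350_000_000
product_nodes = 50_000

# Average number of edges per node across the whole graph (roughly 3.9).
edges_per_node = total_edges / total_nodes

# Fraction of all nodes that are products (well under 0.1%).
product_share = product_nodes / total_nodes
```

Products are a tiny fraction of the vertices, so presumably most of the graph is made up of people, sessions and similar activity nodes.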

Results matter…

10x faster 60%

[Chart, January 2013 to September 2013. Annotations: Solution A alone; First Bob tests; Bob out for 2 weeks; Bob alone; 190%.]

Next steps

• Use Faunus to pre-process all W*A* recommendations
• Algorithms to identify communities in the graph
• Cassandra replication between regions

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

BDT303 Thank You
