Magazine Luiza, one of the largest retail chains in Brazil, developed an in-house product recommendation system built on top of a large knowledge graph. AWS resources such as Amazon EC2, Amazon SQS, Amazon ElastiCache and others made it possible for them to scale from a very small dataset to a huge Cassandra cluster. By improving the big data processing algorithms in their in-house solution built on AWS, they improved their conversion rates on revenue by more than 25 percent compared with the market solutions they had used in the past.
Using AWS to Build a Graph-based Product Recommendation System
Friday, November 15, 13
About Magazine Luiza
Magazine Luiza is one of the largest household appliance retail chains in Brazil, focused on providing durable goods for Brazil's middle and lower-to-middle income classes.
• 731 stores
• 8 distribution centers
• More than 23,000 workers
• 22.8 million customers
• Multi-channel strategy
Recommendation systems
Graphs
Graph Stack
Distributed graph database
• Used for OLTP queries
• Native integration with Tinkerpop

Distributed database management system
• Continuously available with no single point of failure
• Elastic scalability
• Caching layer
• Built-in replication
Storing users data
[Architecture diagram: Elastic Load Balancing, API instances (EC2 instances, Auto Scaling), and a Cassandra cluster of six m2.xlarge nodes]
In graph words…
Vertices: person, session, channel, item
Edges: created, visited, viewed (+1), add_to_cart (+13), bought (+21)
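The weighted edges above can be sketched as a small in-memory structure. This is only an illustration in plain Python, not the production graph database. The class and method names are hypothetical, and treating +1, +13 and +21 as scores that add up on a session-to-item pair is an assumption read off the slide.

```python
from collections import defaultdict

# Weights shown on the slide next to each edge label. Treating them as
# additive scores per interaction is an assumption of this sketch.
EDGE_WEIGHTS = {"viewed": 1, "add_to_cart": 13, "bought": 21}


class InteractionGraph:
    """Toy in-memory stand-in for the session-to-item part of the graph."""

    def __init__(self):
        # (session_id, item_id) -> accumulated weight
        self.scores = defaultdict(int)
        # (session_id, item_id) -> list of edge labels seen so far
        self.labels = defaultdict(list)

    def record(self, session_id, item_id, label):
        """Add one labelled edge and accumulate its weight."""
        if label not in EDGE_WEIGHTS:
            raise ValueError(f"unknown edge label: {label}")
        self.labels[(session_id, item_id)].append(label)
        self.scores[(session_id, item_id)] += EDGE_WEIGHTS[label]

    def score(self, session_id, item_id):
        """Return the accumulated weight, or 0 if there is no edge."""
        return self.scores.get((session_id, item_id), 0)


graph = InteractionGraph()
for label in ("viewed", "add_to_cart", "bought"):
    graph.record("session-1", "item-42", label)

print(graph.score("session-1", "item-42"))  # 1 + 13 + 21 = 35
```

Under this reading, a purchase counts far more than a view when ranking items for a session.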
Base recommendations
• Who viewed this item also viewed
• Who bought this item also bought
• Bought after viewing this item
• Upselling
How to query the graph for recs?
Gremlin Graph Language
• Groovy DSL for graph traversals
• Easy to learn
• Great community
• Part of the Tinkerpop stack
• Works with any Blueprints-enabled graph database
People who viewed a product
g.v(4).in('viewed')
[Graph diagram: people (Renato, Fatala) connected by "viewed" edges to products (LED TV 42", LED TV 40", LCD TV 42", LED 50")]
Who viewed this product also viewed
g.v(4).in('viewed').out('viewed')
[Graph diagram: people (Renato, Fatala) connected by "viewed" edges to products (LED TV 42", LED TV 40", LCD TV 42", LED 50")]
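To make the two-step traversal concrete, here is a minimal plain-Python sketch of what in('viewed') followed by out('viewed') computes. It is not Gremlin and not the production code. The edge list is hypothetical: the slide shows six "viewed" edges, but which person viewed which product is assumed here. Dropping the start product and ranking by path count are also additions of this sketch.

```python
from collections import Counter

# Hypothetical "viewed" edges (person -> product). The slide shows six
# such edges, but which person viewed which product is assumed here.
VIEWED = [
    ("Renato", 'LED TV 42"'),
    ("Renato", 'LED TV 40"'),
    ("Renato", 'LCD TV 42"'),
    ("Fatala", 'LED TV 42"'),
    ("Fatala", 'LCD TV 42"'),
    ("Fatala", 'LED 50"'),
]


def viewers_of(product):
    """Equivalent of .in('viewed'): people with a viewed edge into product."""
    return [person for person, item in VIEWED if item == product]


def also_viewed(product):
    """Equivalent of .in('viewed').out('viewed'), then count and rank.

    The start product is dropped and the rest are ranked by how many
    paths reach them; the ranking step is an addition of this sketch.
    """
    counts = Counter()
    for person in viewers_of(product):
        for other_person, item in VIEWED:
            if other_person == person and item != product:
                counts[item] += 1
    return counts.most_common()


print(viewers_of('LED TV 42"'))
print(also_viewed('LED TV 42"'))
```

Note that the raw traversal on the slide would also return the start product itself, once per viewer, so a real query would typically filter it out before ranking.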
Processing data with Spot Instances
[Diagram: Bob dispatches a task containing the product id to Amazon Simple Queue Service (Amazon SQS). Spot Instances (EC2 m1.large) consume the Amazon SQS tasks and process the W*A* recommendations. Logs are synced to Amazon Simple Storage Service (Amazon S3).]
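The dispatch-and-consume flow can be sketched as a producer and a pool of workers sharing a queue. The sketch below uses Python's in-process queue.Queue as a stand-in for Amazon SQS and threads as a stand-in for the Spot Instances, so it runs without any AWS account. Every function name is hypothetical, and compute_recommendations is a placeholder rather than the real processing.

```python
import queue
import threading

# In-process stand-in for Amazon SQS; a real deployment would use the
# SQS API instead. All names below are hypothetical.
task_queue = queue.Queue()
results = {}
results_lock = threading.Lock()


def dispatch(product_id):
    """Producer side: enqueue a task containing the product id."""
    task_queue.put({"product_id": product_id})


def compute_recommendations(product_id):
    """Placeholder for the real recommendation processing."""
    return [f"related-to-{product_id}"]


def worker():
    """Consumer side: what each worker instance would loop over."""
    while True:
        task = task_queue.get()
        if task is None:          # sentinel: no more work
            task_queue.task_done()
            break
        recs = compute_recommendations(task["product_id"])
        with results_lock:
            results[task["product_id"]] = recs
        task_queue.task_done()


def run(product_ids, num_workers=3):
    """Start the workers, dispatch all tasks, then wait for completion."""
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for pid in product_ids:
        dispatch(pid)
    for _ in threads:
        task_queue.put(None)      # one sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run([101, 102, 103]))
```

A queue suits interruptible workers because the producer never needs to know which worker is alive. With Amazon SQS, a message that a consumer receives but does not delete becomes visible again after its visibility timeout, so a task held by a reclaimed Spot Instance can be picked up by another one.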
Personalized e-mails
Abandoned cart Price dropped
Users receive e-mails when:
• A product has a price drop
• They abandon a product in the cart
• They visit many similar products
Analytics with Faunus
Graph analytics engine
• Provides graph input/output formats and a traversal language for graphs

Distributed computing
• Distributed processing of large data sets across clusters
• Designed to scale
• Detects and handles failures at the application layer

Amazon EMR
Metrics
• 4.3 million identified Magazine Luiza customers
• 50,000 “product” nodes
• 90 million total nodes
• 350 million total edges
• 700 GB of data
• Peaks of 20,000 reads/sec on the Cassandra cluster
Results matter…
10x faster 60%
[Chart: timeline from January 2013 to September 2013 (ticks at January, March, May, July and September) with annotated periods "Solution A alone", "First Bob tests", "Bob out for 2 weeks" and "Bob alone". The final build highlights 190%.]
Next steps
• Use Faunus to pre-process all W*A* recommendations
• Algorithms to identify communities in the graph
• Cassandra replication between regions