Introducing VenmoPlus.com - Explore your Venmo network! Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow
Qingpeng zhang 0713

Apr 07, 2017

Page 1: Qingpeng zhang 0713

Introducing VenmoPlus.com - Explore your Venmo network!

Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow

Page 2: Qingpeng zhang 0713
Page 3: Qingpeng zhang 0713

Features - VenmoPlus.com

● Fuzzy search on user names, with friend lists shown to help tell apart users with the same name

● Labeling the relationship between the payer and receiver

● Friend recommendation

● Searching transactions in the friend circle

● Listing the friends of a user

Page 5: Qingpeng zhang 0713

Demo: VenmoPlus.com

Page 6: Qingpeng zhang 0713

Challenge

● Find the distance between nodes in a dynamic graph in real time

Page 7: Qingpeng zhang 0713

Solutions

● Two databases
  ○ Redis and Elasticsearch

● Algorithm design
  ○ BFS -> Bidirectional Search
  ○ Query relationship of a past transaction

● Query/search optimizations

Page 9: Qingpeng zhang 0713

A Tale of Two Databases

[Architecture diagram; labels: Historical transactions, Real time transactions, API]

Page 10: Qingpeng zhang 0713

Redis for graph structure

420890 Graham Hadley

1630476 Leon Tang

810029 Harminder Toor

1371353 Ephraim Park

562884 Paul Min

420890 set(14935158, 562884)

1630476 set(1371353)

810029 set(190230,14935158)

1371353 set(810029,971156)

562884 set(196371,1371353)

35 million edges

6 million nodes
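The layout on this slide (one set of friend IDs per user ID) can be sketched in plain Python. This is a minimal in-memory stand-in, not the author's code: a `dict` of `set`s plays the role of Redis, and the helper names `list_friends` and `common_friends` are made up for illustration. With the redis-py client, the same lookups would presumably map onto `smembers` and `sinter`; the key naming is an assumption.

```python
# In-memory stand-in for the Redis layout shown on the slide:
# user ID -> set of friend IDs (values copied from the slide).
friends = {
    420890: {14935158, 562884},
    1630476: {1371353},
    810029: {190230, 14935158},
    1371353: {810029, 971156},
    562884: {196371, 1371353},
}

# User ID -> display name (also from the slide).
names = {
    420890: "Graham Hadley",
    1630476: "Leon Tang",
    810029: "Harminder Toor",
    1371353: "Ephraim Park",
    562884: "Paul Min",
}


def list_friends(user_id):
    """Return the friend set of a user (empty set if the user is unknown)."""
    return friends.get(user_id, set())


def common_friends(a, b):
    """Intersect two friend sets, the same operation Redis SINTER performs."""
    return list_friends(a) & list_friends(b)


if __name__ == "__main__":
    shared = common_friends(420890, 810029)
    print(names[420890], "and", names[810029], "share", shared)
```

Set membership and intersection are the only operations the later slides rely on, which is likely why a set-per-user layout was chosen.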

Page 11: Qingpeng zhang 0713

Elasticsearch for everything

Page 13: Qingpeng zhang 0713

Redis

Elasticsearch

Page 14: Qingpeng zhang 0713

Redis + Elasticsearch => search transactions in friend circle
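One way to read this slide: fetch the friend IDs from the graph store, then use them as a filter in a full-text query. The sketch below only builds the request body, using the standard Elasticsearch Query DSL (`bool`, `match`, `terms`). The field names `message` and `actor_id` and the function name are assumptions; the slides do not show the index mapping.

```python
def friend_circle_query(friend_ids, text, size=10):
    """Build an Elasticsearch request body that matches `text` in the
    transaction message, restricted to transactions made by `friend_ids`.

    `message` and `actor_id` are hypothetical field names.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match on the transaction note.
                "must": [{"match": {"message": text}}],
                # Non-scoring filter: keep only the friend circle.
                "filter": [{"terms": {"actor_id": sorted(friend_ids)}}],
            }
        },
    }


if __name__ == "__main__":
    # In the real service the friend set would come from Redis;
    # here it is hard-coded with IDs from the earlier slide.
    circle = {14935158, 562884}
    print(friend_circle_query(circle, "pizza"))
```

Putting the ID restriction in `filter` rather than `must` keeps it out of relevance scoring, so results are ranked by the text match alone.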

Page 15: Qingpeng zhang 0713

Breadth First Search -> Bidirectional Search

Shortest distance -> intersection of sets (friend lists)

● A’s 1st degree friends ∩ B’s 1st degree friends

● A’s 2nd degree friends ∩ B’s 1st degree friends

O(N^2) -> O(2*N)

O(N^3) -> O(N + N^2)
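The set-intersection idea above can be sketched as follows. This is an illustrative reading of the slide, not the author's implementation. It assumes an undirected graph stored as symmetric friend sets, and it caps the search at distance 3, returning `None` for anything farther. The complexity lines match this sketch if N is read as the typical number of friends per user (the slide does not define N): distance 2 needs two friend lists instead of a full two-level expansion from one side.

```python
def build_graph(edges):
    """Build symmetric friend sets from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def friends_of_all(graph, users):
    """Union of the friend sets of every user in `users`."""
    result = set()
    for u in users:
        result |= graph.get(u, set())
    return result


def distance(graph, a, b):
    """Return 0, 1, 2 or 3, or None when the distance is greater than 3."""
    if a == b:
        return 0
    a1 = graph.get(a, set())
    b1 = graph.get(b, set())
    if b in a1:
        return 1
    # A's 1st degree friends ∩ B's 1st degree friends
    if a1 & b1:
        return 2
    # A's 2nd degree friends ∩ B's 1st degree friends
    if friends_of_all(graph, a1) & b1:
        return 3
    return None


if __name__ == "__main__":
    chain = build_graph([(1, 2), (2, 3), (3, 4), (4, 5)])
    for target in (2, 3, 4, 5):
        print(1, "->", target, ":", distance(chain, 1, target))
```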

Page 16: Qingpeng zhang 0713

VenmoPlus.com

[Deployment diagram; AWS instance labels as shown: m4.xlarge, m4.large, m4.xlarge, m4.large, t2.micro]

$29.11/day

Page 17: Qingpeng zhang 0713

More optimization

● Only store necessary info in Elasticsearch

● Labeling the distance of historical transactions can be done in a batch job, reducing the number of real-time queries

● Adjust AWS instances to reduce cost

Page 18: Qingpeng zhang 0713

Qingpeng “Q.P.” Zhang

● Postdoc
  ○ Lawrence Berkeley National Lab

● PhD in Computer Science
  ○ Michigan State University

What I learned from Insight:

● Thinking as a data engineer

● Open source tools
  ○ Redis, Elasticsearch, Kafka, Spark Streaming, Flask, AngularJS, etc.

Page 20: Qingpeng zhang 0713

Query relationship of a past transaction

Page 21: Qingpeng zhang 0713

Query relationship of a past transaction

Query the distance between vertices at a historic moment in a constantly changing graph (because we don’t pre-calculate the distance…)

● If there are transactions before that one: distance = 1

● If the transaction is new: distance > 1
  ○ Remove the influence of that specific transaction temporarily
  ○ Check distance from graph (2, 3, or >3)
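The two cases above can be sketched as a single function. This is a hedged illustration under several assumptions: the graph is a `dict` of symmetric friend sets, the caller already knows whether the pair transacted earlier (passed in as `had_earlier_transaction`), and `None` stands for ">3". All function names are hypothetical.

```python
def _friends_of_all(graph, users):
    """Union of the friend sets of every user in `users`."""
    out = set()
    for u in users:
        out |= graph.get(u, set())
    return out


def distance_without_direct_edge(graph, a, b):
    """Return 2 or 3, or None for > 3, assuming a and b are not adjacent."""
    a1, b1 = graph.get(a, set()), graph.get(b, set())
    if a1 & b1:
        return 2
    if _friends_of_all(graph, a1) & b1:
        return 3
    return None


def relationship_at_transaction(graph, payer, receiver, had_earlier_transaction):
    """Distance between payer and receiver as it was just before a transaction."""
    if had_earlier_transaction:
        # The pair was already connected by an older transaction.
        return 1
    # Temporarily remove the edge that this transaction created.
    removed = []
    for u, v in ((payer, receiver), (receiver, payer)):
        if v in graph.get(u, set()):
            graph[u].discard(v)
            removed.append((u, v))
    try:
        return distance_without_direct_edge(graph, payer, receiver)
    finally:
        # Restore the graph exactly as it was.
        for u, v in removed:
            graph[u].add(v)


if __name__ == "__main__":
    g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
    print(relationship_at_transaction(g, "A", "B", False))
```

Mutating a shared graph, even briefly, is only safe if no other query reads it at the same time; against a live store, a read-side alternative would be to subtract the pair from the fetched sets instead of deleting the edge.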

Page 22: Qingpeng zhang 0713
Page 23: Qingpeng zhang 0713
Page 24: Qingpeng zhang 0713

Pipeline, raw data, in distributed way

Page 25: Qingpeng zhang 0713

Query/Search Optimizations

1. Remove aggregation for better performance… (trade-off)

2. Friend recommender:
  a. Use Counter to keep only the 5 users with the most common friends

3. Search messages in the friend circle
  a. Combine the Elasticsearch and Redis queries
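The friend recommender in item 2 can be sketched with `collections.Counter`, as the slide suggests. The graph representation (a `dict` of friend sets) and the function name are assumptions; the counting and the `most_common(5)` cut-off follow the slide.

```python
from collections import Counter


def recommend_friends(graph, user, k=5):
    """Return up to k (candidate, common_friend_count) pairs for `user`.

    Candidates are friends of friends who are not already direct friends.
    """
    direct = graph.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in direct:
                # One more friend shared between `user` and `candidate`.
                counts[candidate] += 1
    return counts.most_common(k)


if __name__ == "__main__":
    graph = {
        1: {2, 3, 4},
        2: {1, 5, 6},
        3: {1, 5},
        4: {1, 5, 6},
    }
    print(recommend_friends(graph, 1))
```

`most_common(k)` returns only the top entries, so the result stays small no matter how many candidates were counted.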

Page 26: Qingpeng zhang 0713

Pipeline

[Pipeline diagram; labels: Historical transactions, Real time transactions, API]