Streaming Processing in Uber Marketplace for Kafka Summit 2016

Apr 05, 2017

Danny Yuan
Transcript
Page 1: Streaming Processing in Uber Marketplace for Kafka Summit 2016

STREAM PROCESSING IN UBER MARKETPLACE

Page 2: Streaming Processing in Uber Marketplace for Kafka Summit 2016

~68 countries / 350+ cities

Transportation as reliable as running water, everywhere, for everyone

Page 3: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Agenda: What’s on the menu?

• Use Cases

• Problem Space

• Overall Architecture

• Choices & Tradeoffs

• Q & A

Page 4: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Use Case: Realtime OLAP

Page 5: Streaming Processing in Uber Marketplace for Kafka Summit 2016

There is always need for quick exploration

Page 6: Streaming Processing in Uber Marketplace for Kafka Summit 2016

How many open cars in the world, NOW?

Page 7: Streaming Processing in Uber Marketplace for Kafka Summit 2016
Page 8: Streaming Processing in Uber Marketplace for Kafka Summit 2016

How many UberXs were driving clients in SF in the past 10 minutes by hexagons?

Page 9: Streaming Processing in Uber Marketplace for Kafka Summit 2016

How many UberXs were driving clients in SF in the past 10 minutes by hexagons?
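
A minimal sketch of how a question like this could be answered over a stream of events that already carry a hexagon identifier. The field names (`ts`, `city`, `vehicle_type`, `status`, `driver_id`, `hexagon_id`) and the status value `driving_client` are hypothetical and not taken from the talk.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverEvent:
    # All field names are illustrative assumptions, not Uber's schema.
    ts: int            # event time, seconds since epoch
    city: str
    vehicle_type: str
    status: str
    driver_id: str
    hexagon_id: str


def count_by_hexagon(events, now, city, vehicle_type, status, window_s=600):
    """Count distinct drivers per hexagon within the last `window_s` seconds."""
    drivers_per_hex = {}
    for e in events:
        in_window = now - window_s < e.ts <= now
        matches = (e.city, e.vehicle_type, e.status) == (city, vehicle_type, status)
        if in_window and matches:
            drivers_per_hex.setdefault(e.hexagon_id, set()).add(e.driver_id)
    return Counter({h: len(d) for h, d in drivers_per_hex.items()})


events = [
    DriverEvent(1000, "SF", "UberX", "driving_client", "d1", "hex_a"),
    DriverEvent(1030, "SF", "UberX", "driving_client", "d1", "hex_a"),  # same driver again
    DriverEvent(1100, "SF", "UberX", "driving_client", "d2", "hex_a"),
    DriverEvent(1200, "SF", "UberX", "driving_client", "d3", "hex_b"),
    DriverEvent(1200, "SF", "UberBlack", "driving_client", "d4", "hex_b"),  # other vehicle type
    DriverEvent(300, "SF", "UberX", "driving_client", "d5", "hex_a"),  # outside the window
]
counts = count_by_hexagon(
    events, now=1200, city="SF", vehicle_type="UberX", status="driving_client"
)
```

Counting distinct drivers rather than raw events is a design choice made for this sketch, so that a driver who reports several location updates inside one hexagon is counted once.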

Page 10: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Driving time and other metrics over time by hexagonal area

Page 11: Streaming Processing in Uber Marketplace for Kafka Summit 2016
Page 12: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Use Case: Complex Event Processing

Page 13: Streaming Processing in Uber Marketplace for Kafka Summit 2016

There are patterns in event streams

Page 14: Streaming Processing in Uber Marketplace for Kafka Summit 2016

How many drivers cancel requests more than 3 times in a row within a 10-minute window?
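
A minimal sketch of this pattern, assuming events arrive in time order as `(timestamp_s, driver_id, action)` tuples. The tuple layout and the action names are hypothetical, and "more than 3 times in a row" is read here as at least 4 consecutive cancels.

```python
from collections import defaultdict, deque


def drivers_with_cancel_streak(events, min_streak=4, window_s=600):
    """Return drivers with at least `min_streak` consecutive cancels within `window_s` seconds."""
    streaks = defaultdict(deque)  # driver_id -> timestamps of the current run of cancels
    flagged = set()
    for ts, driver_id, action in events:
        run = streaks[driver_id]
        if action != "cancel":
            run.clear()  # any other action breaks the run
            continue
        run.append(ts)
        # Drop cancels that no longer fit in the window ending at `ts`.
        while run and ts - run[0] > window_s:
            run.popleft()
        if len(run) >= min_streak:
            flagged.add(driver_id)
    return flagged


events = sorted([
    (0, "d1", "cancel"), (100, "d1", "cancel"), (200, "d1", "cancel"), (300, "d1", "cancel"),
    (0, "d2", "cancel"), (100, "d2", "cancel"), (150, "d2", "accept"),
    (200, "d2", "cancel"), (300, "d2", "cancel"),
    (0, "d3", "cancel"), (400, "d3", "cancel"), (800, "d3", "cancel"), (1200, "d3", "cancel"),
])
flagged = drivers_with_cancel_streak(events)
```

In the sample data only `d1` is flagged: `d2` breaks the run with an accept, and `d3` cancels four times in a row but spread over more than 10 minutes.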

Page 15: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Report riders requesting pickups 100 miles apart within a half-hour window
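
A minimal sketch of this rule, assuming pickup requests arrive in time order as `(timestamp_s, rider_id, lat, lon)` tuples. The tuple layout is hypothetical, and the great-circle (haversine) distance is used as a stand-in for whatever distance measure the production system applies.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def riders_with_distant_requests(requests, min_miles=100.0, window_s=1800):
    """Flag riders with two requests at least `min_miles` apart within `window_s` seconds."""
    recent = {}  # rider_id -> [(ts, lat, lon), ...] still inside the window
    flagged = set()
    for ts, rider_id, lat, lon in requests:
        history = [r for r in recent.get(rider_id, []) if ts - r[0] <= window_s]
        if any(haversine_miles(plat, plon, lat, lon) >= min_miles for _, plat, plon in history):
            flagged.add(rider_id)
        history.append((ts, lat, lon))
        recent[rider_id] = history
    return flagged


sf = (37.7749, -122.4194)
la = (34.0522, -118.2437)
oakland = (37.8044, -122.2712)
requests = sorted([
    (0, "r1", *sf), (900, "r1", *la),        # far apart, 15 minutes
    (0, "r2", *sf), (600, "r2", *oakland),   # close together
    (0, "r3", *sf), (4000, "r3", *la),       # far apart, but outside the window
])
flagged = riders_with_distant_requests(requests)
```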

Page 16: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Complex Event Processing

IF this -> THEN that

● Sigma is similar - but for offline/batch applications

Page 17: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Use Case: Supply Positioning

Page 18: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Clusters Of Supply & Demand

Page 19: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Predicted Health Metrics

Actual Health Metrics

Monitor Marketplace Health

Page 20: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Challenges

Page 21: Streaming Processing in Uber Marketplace for Kafka Summit 2016

OLAP of Geo-spatial Temporal Data

Reasonably Large Scale

Near Real Time

Page 22: Streaming Processing in Uber Marketplace for Kafka Summit 2016

• Indexing, Lookup, Rendering

• Symmetric Neighbors

• Convex & Compact Regions

• Equal Areas

• Equal Shape

Hexagons

Page 23: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Scale

Geo Space × Vehicle Types × Time × Status

Page 24: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Granular Geo Areas

Page 25: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Granular Geo Areas

Over 10,000 hexagons in a city

Page 26: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Multiple Vehicle Types

7 vehicle types

Page 27: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Minute-level Time Buckets

1440 minutes in a day

Page 28: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Many Driver States

13 driver states

Page 29: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Many Cities

300 cities

Page 30: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Granular Data

1 day of data: 300 x 10,000 x 7 x 1440 x 13 = 393 billion possible combinations
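
The arithmetic on this slide can be checked directly. The variable names below are only labels for the numbers quoted on the preceding slides.

```python
cities = 300
hexagons_per_city = 10_000
vehicle_types = 7
minutes_per_day = 24 * 60  # 1440 minute-level buckets
driver_states = 13

combinations_per_day = (
    cities * hexagons_per_city * vehicle_types * minutes_per_day * driver_states
)
print(f"{combinations_per_day:,}")  # prints 393,120,000,000
```

The exact product is 393.12 billion, which the slide rounds to 393 billion.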

Page 31: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Unknown Query Patterns

Any combination of dimensions

Page 32: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Variety of Aggregations

- Heatmap

- Top N

- Histogram

- count(), avg(), sum(), percent(), geo
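
As an illustration of three of the aggregation types listed above (Top N, histogram, avg), here is a small sketch over hypothetical `(hexagon_id, driving_minutes)` rows. The data and function names are invented for the example and do not reflect the query API of the system described in the talk.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows: (hexagon_id, driving_minutes)
rows = [
    ("hex_a", 12), ("hex_a", 7), ("hex_b", 31),
    ("hex_c", 18), ("hex_a", 22), ("hex_b", 9),
]


def top_n(rows, n):
    """Hexagons with the most rows, most frequent first."""
    return Counter(h for h, _ in rows).most_common(n)


def histogram(rows, bucket_width):
    """Row counts per bucket of driving minutes, keyed by bucket lower bound."""
    buckets = Counter((m // bucket_width) * bucket_width for _, m in rows)
    return dict(sorted(buckets.items()))


def avg_by_hexagon(rows):
    """Average driving minutes per hexagon."""
    groups = {}
    for h, m in rows:
        groups.setdefault(h, []).append(m)
    return {h: mean(v) for h, v in groups.items()}


top2 = top_n(rows, 2)
hist = histogram(rows, bucket_width=10)
avgs = avg_by_hexagon(rows)
```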

Page 33: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Large Data Volume

• Hundreds of thousands of events per second

• At least dozens of fields in each event

Page 34: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Multiple Topics

Rider States, Driver States

Page 35: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Let’s build a stream processing pipeline

Page 36: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Accurate Statistics

• E.g., can’t overcount
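
The talk does not say how overcounting is avoided. One common approach, sketched here purely as an assumption, is to make counting idempotent by remembering an identifier for each event, so that an event delivered twice is counted once. The `event_id` field is hypothetical.

```python
def count_unique(events):
    """Count events, ignoring redeliveries that repeat an already-seen event id."""
    seen = set()
    count = 0
    for event_id, _payload in events:
        if event_id in seen:
            continue  # duplicate delivery: do not count again
        seen.add(event_id)
        count += 1
    return count


# The second copy of "e2" simulates a redelivered event.
events = [("e1", "open"), ("e2", "open"), ("e2", "open"), ("e3", "on_trip")]
unique_count = count_unique(events)
```

In a long-running job the `seen` set would need to be bounded, for example by expiring identifiers older than the redelivery horizon.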

Page 37: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Pipeline Template

Page 38: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Event Collection

Page 39: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Multiple Event Types with Different Volume

Page 40: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Hundreds of Thousands of Events Per Second

Page 41: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Events Should Be Available Under a Second

Page 42: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Events Should Rarely Get Lost

Page 43: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Multiple Consumers

Page 44: Streaming Processing in Uber Marketplace for Kafka Summit 2016
Page 45: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Natural Choice: Apache Kafka

- Low latency and high throughput

- Persistent events

- Distributes a topic by partitions

- Groups consumers by consumer groups
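
The last two points can be illustrated with a small simulation that uses no Kafka client at all. The hash function is a stand-in (Kafka's default partitioner for keyed records uses murmur2, not CRC32), the round-robin assignment is only one of several possible strategies, and the group and consumer names are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a message key to a partition so the same key always lands together."""
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, consumers):
    """Spread partitions round-robin over the members of one consumer group."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# Two hypothetical consumer groups read the same 6-partition topic independently.
olap_group = assign_partitions(6, ["olap-1", "olap-2"])
cep_group = assign_partitions(6, ["cep-1", "cep-2", "cep-3"])
```

Within a group each partition is owned by exactly one consumer, while separate groups each receive the full topic. That is what allows several downstream applications to consume the same events.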

Page 46: Streaming Processing in Uber Marketplace for Kafka Summit 2016
Page 47: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Event Processing

Page 48: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Transformation

Page 49: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Event Transformation Example

(Lat, Long) -> (zipcode, hexagon, S2)

Page 50: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Pre-aggregation
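
A minimal sketch of pre-aggregation, assuming events have already been enriched with a hexagon identifier. Events are rolled up into counts per (hexagon, vehicle type, status, minute bucket), mirroring the dimensions listed earlier in the talk. The dictionary keys are hypothetical.

```python
from collections import Counter


def pre_aggregate(events):
    """Roll raw events up into per-minute counts keyed by their dimensions."""
    counts = Counter()
    for e in events:
        minute_bucket = e["ts"] // 60 * 60  # start of the minute, in epoch seconds
        key = (e["hexagon_id"], e["vehicle_type"], e["status"], minute_bucket)
        counts[key] += 1
    return counts


events = [
    {"ts": 120, "hexagon_id": "hex_a", "vehicle_type": "UberX", "status": "open"},
    {"ts": 150, "hexagon_id": "hex_a", "vehicle_type": "UberX", "status": "open"},
    {"ts": 185, "hexagon_id": "hex_a", "vehicle_type": "UberX", "status": "open"},
    {"ts": 130, "hexagon_id": "hex_b", "vehicle_type": "UberX", "status": "on_trip"},
]
rollup = pre_aggregate(events)
```

Four raw events collapse into three rows here. At hundreds of thousands of events per second, this kind of reduction is what keeps the downstream store manageable.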

Page 51: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Joining Multiple Streams

Page 52: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Sessionization
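
The slide does not define how sessions are delimited. A common definition, assumed for this sketch, is that a session for a given key ends when no event arrives for longer than an inactivity gap. The `(timestamp_s, key)` tuple layout and the 5-minute gap are hypothetical.

```python
def sessionize(events, gap_s=300):
    """Group time-ordered (ts, key) events into per-key sessions split on inactivity."""
    sessions = {}  # key -> list of sessions, each a list of timestamps
    for ts, key in events:
        key_sessions = sessions.setdefault(key, [])
        if key_sessions and ts - key_sessions[-1][-1] <= gap_s:
            key_sessions[-1].append(ts)  # continues the current session
        else:
            key_sessions.append([ts])    # gap exceeded, or first event for this key
    return sessions


events = sorted([(0, "r1"), (60, "r1"), (1000, "r1"), (30, "r2")])
sessions = sessionize(events)
```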

Page 53: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Multi-Staged Processing

Page 54: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Minimum Requirements

- State Management

- Checkpointing

- Automatic Resource Management

- Multi-staged processing

Page 55: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Apache Samza

Page 56: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Why Apache Samza?

- DAG on Kafka

- Excellent integration with Kafka

- Built-in checkpointing

- Built-in state management

- Excellent support from our data team

Page 57: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Samza Is Conceptually Simple

Page 58: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Complex Event Processing

IF this -> THEN that

Page 59: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Complex Event Processing

Page 60: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Complex Event Processing

Page 61: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Complex Event Processing

Page 62: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Complex Event Processing

Page 63: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Slightly Expanded Version

Page 64: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Slightly Expanded Version

Page 65: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Slightly Expanded Version

Page 66: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Slightly Expanded Version

Page 67: Streaming Processing in Uber Marketplace for Kafka Summit 2016
Page 68: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Applications

Page 69: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Dashboard of Realtime Business Metrics

Page 70: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Ad-Hoc Queries

Page 71: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Visualization with Streaming

Page 72: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Visualization with Streaming

LocationUpdate where city=X

LocationUpdate where city=Y and vehicle='UberX'

(Diagram labels: 100%, 100%, 100%, 10%, 5%)

Page 73: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Visualization with Streaming

Page 74: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Visualization with Streaming

Page 75: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Visualization with Streaming

Page 76: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Visualization with Streaming

Page 77: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Visualization with Streaming

LocationUpdate where city='SF'

LocationUpdate where city='LA' and vehicle

(Diagram labels: 10%, 5%, 100%, 100%)

Page 78: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Ad-hoc Exploration

Page 79: Streaming Processing in Uber Marketplace for Kafka Summit 2016

A Few Trade-Offs

Page 80: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Lambda vs Kappa

Page 81: Streaming Processing in Uber Marketplace for Kafka Summit 2016

We Use Lambda

- Spark + HDFS/S3 for batch processing

- Yes, it is painful, but

- We may need to go way back due to changes in business requirements

- Batch processes can run faster; they scale differently

- It was not easy to start a new stream processing instance

Page 82: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Processing by Event Time Is Not Always Easy

Page 83: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Leverage The Storage Layer

Page 84: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Dealing with Limitations of Samza

- No broadcasting. We have to override SystemStreamPartitionGrouper

- No dynamic topology. Can’t have an arbitrary number of nested CEP queries

- Tedious configuration and deployment of jobs. In-house code-gen and deployment solution

Page 85: Streaming Processing in Uber Marketplace for Kafka Summit 2016

Thank You