BikeAlert A real time bike sharing station monitor system Kuan-Lin Chen
Page 1: Kuan lin chen-week5_demo

BikeAlert A real time bike sharing station monitor system

Kuan-Lin Chen

Page 2: Kuan lin chen-week5_demo

Motivation

Page 4: Kuan lin chen-week5_demo

Example

Page 7: Kuan lin chen-week5_demo

Station ID: 10
Top 3 full stations:
1: 20 (100%)
4: 19 (95%)
12: 23 (92%)
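A ranking like the one in this example could be produced by sorting stations by how full they are. The sketch below is only an illustration: the function name `top_full_stations`, the dock capacities (20, 20, 25, inferred from the percentages shown), and station 7 are assumptions, not part of the original slides.

```python
from typing import Dict, List, Tuple


def top_full_stations(
    bikes: Dict[int, int], docks: Dict[int, int], k: int = 3
) -> List[Tuple[int, int, int]]:
    """Return up to k (station_id, bike_count, percent_full) rows, fullest first."""
    rows = []
    for station_id, count in bikes.items():
        ratio = count / docks[station_id]
        rows.append((ratio, station_id, count))
    # Highest fill ratio first; ties are broken by the lower station ID.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return [(sid, count, round(100 * ratio)) for ratio, sid, count in rows[:k]]


# Bike counts come from the example; dock capacities are assumed values
# chosen so the percentages match, and station 7 is a hypothetical extra.
bikes = {1: 20, 4: 19, 12: 23, 7: 3}
docks = {1: 20, 4: 20, 12: 25, 7: 15}

for station_id, count, percent in top_full_stations(bikes, docks):
    print(f"{station_id}: {count} ({percent}%)")
```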

Page 10: Kuan lin chen-week5_demo

How to solve it?

• Need to know the number of bikes at each station.

• First attempt: report the number of bikes every minute

Page 12: Kuan lin chen-week5_demo

Current Approach

• Report the number of bikes every minute

• NOT fault-tolerant

Station ID   Count
1            5
2            10
3            15
4            16
5            8

Page 18: Kuan lin chen-week5_demo

My Approach

• Compute the number of bikes at each station from the history of the trip logs

Station ID   Event           Timestamp
1            Add 1 bike      2015/06/22 10:07:00
2            Add 1 bike      2015/06/22 10:08:00
1            Remove 1 bike   2015/06/22 10:20:00
3            Add 1 bike      2015/06/22 10:21:00
2            Remove 1 bike   2015/06/22 10:40:00

• Raw

• Immutable

• Perpetual
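As a rough illustration of computing counts from the event history, the sketch below replays the five events from the table. The function name `counts_as_of` and the +1/-1 encoding of "Add"/"Remove" are assumptions made for this example. The results are net changes from an unknown starting count, since the slides do not give initial station inventories.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional, Tuple

TIME_FORMAT = "%Y/%m/%d %H:%M:%S"

# (station_id, delta, timestamp): +1 for "Add 1 bike", -1 for "Remove 1 bike".
EVENTS: List[Tuple[int, int, str]] = [
    (1, +1, "2015/06/22 10:07:00"),
    (2, +1, "2015/06/22 10:08:00"),
    (1, -1, "2015/06/22 10:20:00"),
    (3, +1, "2015/06/22 10:21:00"),
    (2, -1, "2015/06/22 10:40:00"),
]


def counts_as_of(
    events: List[Tuple[int, int, str]], as_of: Optional[str] = None
) -> Dict[int, int]:
    """Replay the log and return the net bike change per station.

    Events with a timestamp after `as_of` are ignored; None means "use all".
    """
    cutoff = datetime.strptime(as_of, TIME_FORMAT) if as_of else None
    counts: Dict[int, int] = defaultdict(int)
    for station_id, delta, timestamp in events:
        if cutoff is not None and datetime.strptime(timestamp, TIME_FORMAT) > cutoff:
            continue
        counts[station_id] += delta
    return dict(counts)


print(counts_as_of(EVENTS))
print(counts_as_of(EVENTS, "2015/06/22 10:30:00"))
```

Because the log is never modified, the same function can answer "what did the stations look like at 10:30?" or rebuild the current state after a failure, which a per-minute snapshot alone cannot do.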

Page 25: Kuan lin chen-week5_demo

Data

• The actual log data from Bay Area Bike Share has many fields
  – Trip ID, Duration, Start Date, Start Station, Start Terminal, End Date, End Station, End Terminal, Bike #, Subscription Type, Zip Code

• For my project, I only need the start/end station ID and the start/end date

• So I generated all my data
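To show how the four needed fields relate to the add/remove events used earlier, here is a minimal sketch that turns each trip into two events. It assumes that `Start Terminal` and `End Terminal` hold the station IDs, that a trip start removes a bike and a trip end adds one, and that dates use the `YYYY/MM/DD HH:MM:SS` form seen in the event table. The two sample rows are made up for illustration.

```python
import csv
import io
from typing import Iterator, List, Tuple

# Hypothetical sample restricted to the four columns the project needs.
SAMPLE = """Start Date,Start Terminal,End Date,End Terminal
2015/06/22 10:07:00,1,2015/06/22 10:20:00,2
2015/06/22 10:21:00,3,2015/06/22 10:40:00,1
"""


def trip_events(csv_text: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (station_id, event, timestamp) tuples, two per trip row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A trip start takes a bike out of the start station ...
        yield int(row["Start Terminal"]), "Remove 1 bike", row["Start Date"]
        # ... and the trip end returns it to the end station.
        yield int(row["End Terminal"]), "Add 1 bike", row["End Date"]


# Zero-padded timestamps in this format sort correctly as plain strings.
events: List[Tuple[int, str, str]] = sorted(trip_events(SAMPLE), key=lambda e: e[2])
for station_id, event, timestamp in events:
    print(station_id, event, timestamp)
```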

Page 26: Kuan lin chen-week5_demo

Data Pipeline

Components: Kafka, Spark Streaming, HDFS, Spark, Cassandra, front end service (Flask)

Stage labels: Ingestion, Real-time streaming
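The real-time path of a pipeline like this can be pictured as small batches of events being folded into a table of current counts. The sketch below uses plain Python only: the list stands in for a Kafka topic, the dict stands in for a Cassandra table, and `micro_batches` and `apply_batch` are hypothetical helpers. It does not use or imitate the actual Spark Streaming API.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[int, int]  # (station_id, delta), delta is +1 or -1


def micro_batches(stream: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Group an event stream into consecutive batches of at most `size` events."""
    batch: List[Event] = []
    for event in stream:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def apply_batch(store: Dict[int, int], batch: List[Event]) -> None:
    """Sum the deltas per station within a batch, then update the store once per station."""
    deltas: Counter = Counter()
    for station_id, delta in batch:
        deltas[station_id] += delta
    for station_id, delta in deltas.items():
        store[station_id] = store.get(station_id, 0) + delta


store: Dict[int, int] = {}  # stand-in for the serving table
stream = [(1, +1), (2, +1), (1, -1), (3, +1), (2, -1)]  # stand-in for the topic
for batch in micro_batches(stream, size=2):
    apply_batch(store, batch)
print(store)
```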

Page 30: Kuan lin chen-week5_demo

Demo

• insight-bikealert.com

Page 32: Kuan lin chen-week5_demo

About me

• Kuan-Lin Chen

• Master of Engineering in Computer Science, Cornell University, class of 2015

• Bachelor of Science in Computer Science & Math, University of Wisconsin-Madison, class of 2013

• I served in the military police during 2013-2014.

Page 33: Kuan lin chen-week5_demo

Bay Area Bike Share Overview

• Launched on August 29, 2013

– ~70 stations

– ~700 bikes

– Dock count per station: 11 to 27, average 17.7

• Looking to expand to 7000 bikes by 2017

– Potential big data problem

Page 34: Kuan lin chen-week5_demo

How big could the data be?

• California is divided into 58 counties and contains 482 municipalities (cities or towns).

• Assume each city has 40 stations and each station has 30 docks, half of which hold bikes (600 bikes per city)

• Each bike is used 72 times / day (20 min / trip)

• Each simple log entry is 30 bytes

• 30 * 72 * 2 * 600 * 482 ≈ 1.25 GB / day
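The estimate can be checked step by step. In the sketch below every constant comes from the bullets above. The factor of 2 is interpreted here as two log entries per trip (one at the start station, one at the end station), which is an assumption, since the slide does not say what it stands for.

```python
LOG_ENTRY_BYTES = 30
MINUTES_PER_TRIP = 20
TRIPS_PER_BIKE_PER_DAY = 24 * 60 // MINUTES_PER_TRIP  # 72 back-to-back trips
ENTRIES_PER_TRIP = 2  # assumed: one entry at trip start, one at trip end
STATIONS_PER_CITY = 40
DOCKS_PER_STATION = 30
BIKES_PER_CITY = STATIONS_PER_CITY * DOCKS_PER_STATION // 2  # half the docks hold bikes
CITIES = 482

bytes_per_day = (
    LOG_ENTRY_BYTES
    * TRIPS_PER_BIKE_PER_DAY
    * ENTRIES_PER_TRIP
    * BIKES_PER_CITY
    * CITIES
)
print(f"{bytes_per_day:,} bytes/day = {bytes_per_day / 1e9:.2f} GB/day")
```

This is an upper bound under the stated assumptions: it treats every bike as being in use around the clock.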