TIP MAX Hsiang-Hsuan Hung

Hsiang hung

Apr 16, 2017


Data & Analytics

Transcript
Page 1: Hsiang hung

TIPMAX

Hsiang-Hsuan Hung

Page 2

Motivation: Helping taxi drivers maximize their income

Page 3

Web app: TipMax, http://www.tipmaxnyc.xyz

Page 4

Data source

http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

Page 5

Pipeline

Flask

Batch process

Page 6

Problem: the raw data are not ordered by time, and they total 220 GB with 13 billion events.
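Because the records are not time-ordered, replaying them as a stream first requires sorting them by pickup time, and 220 GB will not fit in memory. One standard technique, shown here as a minimal sketch and not as the method used in TipMax, is an external merge sort: sort fixed-size chunks in memory, spill each sorted run to a temporary file, then k-way merge the runs with `heapq.merge`. The column name `pickup_datetime`, its timestamp format, and the CSV layout are assumptions for illustration.

```python
import csv
import heapq
import os
import tempfile
from datetime import datetime
from typing import Iterable, Iterator, List

# Assumed column name and format; the real TLC headers have varied over the years.
TIME_COL = "pickup_datetime"
TIME_FMT = "%Y-%m-%d %H:%M:%S"


def _key(row: dict) -> datetime:
    """Parse the pickup timestamp used as the sort key."""
    return datetime.strptime(row[TIME_COL], TIME_FMT)


def _write_run(rows: List[dict], fieldnames: List[str]) -> str:
    """Sort one in-memory chunk and spill it to a temporary CSV file."""
    rows.sort(key=_key)
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def _read_run(path: str) -> Iterator[dict]:
    """Stream one sorted run back from disk."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def external_sort(rows: Iterable[dict], fieldnames: List[str],
                  chunk_size: int = 1_000_000) -> Iterator[dict]:
    """Yield rows in pickup-time order, holding at most one chunk in memory."""
    run_paths: List[str] = []
    chunk: List[dict] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) >= chunk_size:
            run_paths.append(_write_run(chunk, fieldnames))
            chunk = []
    if chunk:
        run_paths.append(_write_run(chunk, fieldnames))
    try:
        # k-way merge: heapq.merge keeps only the current head row of each run in memory.
        yield from heapq.merge(*(_read_run(p) for p in run_paths), key=_key)
    finally:
        for p in run_paths:
            os.remove(p)
```

The chunk size trades memory for the number of runs to merge; at this data volume a distributed sort (for example in Spark) would be the more practical route, but the merge logic is the same idea.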

Page 7

Real-Time Pipeline

Flask

Page 8

Batch process

Engineer real-time streaming
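The slides do not say how the real-time stream was engineered from the historical data. A common way to simulate one is to replay time-sorted records, pausing between events in proportion to the gap between their timestamps. The sketch below assumes events already arrive sorted by timestamp and uses a hypothetical `speedup` factor; in a real pipeline each yielded event would be published to a message queue instead of being consumed directly.

```python
import time
from datetime import datetime
from typing import Callable, Iterable, Iterator, Tuple

# An event is (timestamp, payload); the payload layout is an assumption.
Event = Tuple[datetime, dict]


def replay(events: Iterable[Event], speedup: float = 60.0,
           sleep: Callable[[float], None] = time.sleep) -> Iterator[Event]:
    """Yield time-ordered events, pacing them by their original timestamps.

    With speedup=60, one hour of recorded trips is replayed in one minute.
    The sleep function is injectable so the pacing can be tested without waiting.
    """
    prev_ts = None
    for ts, payload in events:
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap < 0:
                raise ValueError("events must be sorted by timestamp")
            sleep(gap / speedup)
        prev_ts = ts
        yield ts, payload
```

Pacing by timestamp gaps, rather than emitting at a fixed rate, preserves the bursts of real traffic such as rush hours.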

Page 9

Challenges

•  Connector between Cassandra and Spark

•  Design primary keys for data queries

•  Cleaning data
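The last two challenges can be illustrated together. The sketch below uses hypothetical field names (`pickup_datetime`, `pickup_lat`, `pickup_lon`, `fare_amount`, `tip_amount`) and example cleaning rules (drop unparseable rows, coordinates outside an approximate NYC bounding box, non-positive fares, negative tips); these are assumptions, not the project's actual rules. The key function shows one Cassandra-style design: a partition key of (grid cell, date, hour), so that a query for one area and hour reads a single partition, with pickup time as the clustering column that orders rows within it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

# Approximate NYC bounding box and grid size (assumed for illustration).
LAT_MIN, LAT_MAX = 40.49, 40.92
LON_MIN, LON_MAX = -74.27, -73.68
CELL_DEG = 0.01  # hypothetical grid resolution, roughly 1 km


@dataclass
class Trip:
    pickup_datetime: datetime
    pickup_lat: float
    pickup_lon: float
    fare_amount: float
    tip_amount: float


def clean(raw: dict) -> Optional[Trip]:
    """Parse one raw record; return None if it is malformed or implausible."""
    try:
        trip = Trip(
            pickup_datetime=datetime.strptime(raw["pickup_datetime"], "%Y-%m-%d %H:%M:%S"),
            pickup_lat=float(raw["pickup_lat"]),
            pickup_lon=float(raw["pickup_lon"]),
            fare_amount=float(raw["fare_amount"]),
            tip_amount=float(raw["tip_amount"]),
        )
    except (KeyError, ValueError):
        return None  # missing field or unparseable value
    if not (LAT_MIN <= trip.pickup_lat <= LAT_MAX and LON_MIN <= trip.pickup_lon <= LON_MAX):
        return None  # outside NYC, e.g. (0, 0) placeholder coordinates
    if trip.fare_amount <= 0 or trip.tip_amount < 0:
        return None
    return trip


def primary_key(trip: Trip) -> Tuple[Tuple[int, int, str, int], datetime]:
    """Return (partition key, clustering key) for a table keyed by area and hour."""
    cell_row = int((trip.pickup_lat - LAT_MIN) // CELL_DEG)
    cell_col = int((trip.pickup_lon - LON_MIN) // CELL_DEG)
    partition = (cell_row, cell_col, trip.pickup_datetime.date().isoformat(),
                 trip.pickup_datetime.hour)
    return partition, trip.pickup_datetime
```

Bucketing by hour keeps partitions bounded in size; partitioning by cell alone would let a busy cell grow without limit.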

Page 10

Challenges

•  Time series forecast?
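The slide leaves time-series forecasting as an open question. A reasonable first baseline, offered only as a sketch and not as the project's method, is a seasonal-naive forecast: predict each hour's value (for example, the average tip) from the same hour one week earlier, then judge any more elaborate model by whether it beats this baseline. The hourly input series and the 168-hour weekly season are assumptions.

```python
from typing import List, Sequence

WEEK_HOURS = 24 * 7  # assumed weekly seasonality for hourly data


def seasonal_naive(history: Sequence[float], horizon: int,
                   season: int = WEEK_HOURS) -> List[float]:
    """Forecast the next `horizon` values by repeating the last full season."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute error, for comparing a model against the baseline."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```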

Page 11

About Me

•  UCSD, Physics PhD 2011

•  U Illinois, ECE 2011-2012

•  U Texas Austin, Physics 2012-2015

•  Computational materials science: HPC, e.g. quantum Monte Carlo…

•  Programming, travel, fitness…

Page 16

More complicated queries

•  Will passengers give higher tips during rush hours?

•  Will tips vary by payment type, year, weather, or number of passengers?

•  …
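Questions like these reduce to group-by aggregations over cleaned trips. The sketch below computes the mean tip percentage (tip divided by fare) per group. The field names (`pickup_datetime`, `fare_amount`, `tip_amount`, `payment_type`) and the rush-hour definition (weekdays, 7 to 10 a.m. and 4 to 7 p.m.) are assumptions; at 220 GB the same aggregation would more realistically run as a distributed batch job than over an in-memory list.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, Hashable, Iterable

# Assumed rush-hour windows as (start hour, end hour exclusive), weekdays only.
RUSH_WINDOWS = [(7, 10), (16, 19)]


def is_rush_hour(ts: datetime) -> bool:
    """True on weekdays (Mon=0 .. Fri=4) inside one of the assumed windows."""
    return ts.weekday() < 5 and any(start <= ts.hour < end for start, end in RUSH_WINDOWS)


def mean_tip_pct_by(trips: Iterable[dict],
                    group: Callable[[dict], Hashable]) -> Dict[Hashable, float]:
    """Mean of 100 * tip_amount / fare_amount per group; trips with zero fare are skipped."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for t in trips:
        if t["fare_amount"] <= 0:
            continue
        key = group(t)
        totals[key] += 100.0 * t["tip_amount"] / t["fare_amount"]
        counts[key] += 1
    return {k: totals[k] / counts[k] for k in totals}
```

For example, `mean_tip_pct_by(trips, lambda t: is_rush_hour(t["pickup_datetime"]))` compares rush and non-rush hours, and `mean_tip_pct_by(trips, lambda t: t["payment_type"])` compares payment types. Using a percentage rather than the raw tip keeps long and short trips comparable.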