Page 1
TIPMAX
Hsiang-Hsuan Hung
Page 2
Motivation
Helping taxi drivers maximize their income
Page 3
Web App: TipMax
http://www.tipmaxnyc.xyz
Page 4
Data Source
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
Page 5
Pipeline
Flask
Batch process
Page 6
Pipeline
Flask
Batch process
Problem: the raw data is not ordered by time and totals 220 GB with 13 billion events
Page 7
Real-Time Pipeline
Flask
Page 8
Real-Time Pipeline
Flask
Batch process
…
Engineer real-time streaming
Page 9
Challenges
• Connector between Cassandra and Spark
• Design primary keys for data query
• Cleaning data
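
The primary-key challenge can be illustrated with a small in-memory sketch. The deck does not state the actual schema, so everything here is an assumption: the hypothetical `TripTable` class groups trips by a partition key of (zone, day) and keeps each partition sorted by pickup time, which mimics how a partition key plus a clustering column lets a time-range query read one ordered slice instead of scanning unordered raw data.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict
from datetime import date, datetime


class TripTable:
    """Toy stand-in for a table keyed by ((zone, day), pickup_time).

    Hypothetical design for illustration only; not the schema used by TipMax.
    """

    def __init__(self):
        # partition key (zone, day) -> list of (pickup_time, tip), kept sorted
        self._partitions = defaultdict(list)

    def insert(self, zone, pickup, tip):
        # Events may arrive in any order; insort keeps the partition
        # ordered by pickup time (the "clustering column").
        insort(self._partitions[(zone, pickup.date())], (pickup, tip))

    def query(self, zone, day, start, end):
        # Only one partition is touched, and the range is found by
        # binary search over the already-sorted pickup times.
        rows = self._partitions.get((zone, day), [])
        times = [pickup for pickup, _ in rows]
        return rows[bisect_left(times, start):bisect_right(times, end)]


table = TripTable()
table.insert("midtown", datetime(2015, 6, 1, 9, 30), 3.0)
table.insert("midtown", datetime(2015, 6, 1, 8, 15), 2.0)   # arrives out of order
table.insert("midtown", datetime(2015, 6, 1, 18, 0), 5.0)
table.insert("harlem", datetime(2015, 6, 1, 8, 30), 1.0)

morning = table.query(
    "midtown", date(2015, 6, 1),
    datetime(2015, 6, 1, 8, 0), datetime(2015, 6, 1, 10, 0),
)
```

The zone names and tip values above are made-up sample inputs. The design choice being sketched is that the query pattern (one zone, one day, a time window) determines the key, rather than the order in which the raw files happen to list events.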
Page 10
Challenges
• Time series forecast?
Page 11
About Me
• UCSD, Physics PhD 2011
• U Illinois, ECE 2011-2012
• U Texas Austin, Physics 2012-2015
• Computational material science: HPC, e.g. quantum Monte Carlo…
• Programming, travel, fitness….
Page 16
More complicated queries
• Will passengers give higher tips during rush hours?
• Will tips vary by payment type, year, weather, or number of passengers?
• ….....
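
The rush-hour question above could be approached along these lines. This is a minimal sketch under stated assumptions, not the query TipMax runs: the rush-hour window (`RUSH_HOURS`, weekdays only), the function names, and the tip-to-fare ratio as the metric are all hypothetical choices, and the sample trips are made-up inputs rather than results from the TLC data.

```python
from datetime import datetime
from statistics import mean

# Assumed definition of rush hours: weekday mornings and evenings.
RUSH_HOURS = {7, 8, 9, 16, 17, 18, 19}


def is_rush_hour(pickup):
    """True for weekday pickups (Mon=0 .. Fri=4) inside the assumed window."""
    return pickup.weekday() < 5 and pickup.hour in RUSH_HOURS


def tip_rate_by_period(trips):
    """Average tip/fare ratio for rush-hour and off-peak trips.

    trips: iterable of (pickup datetime, fare amount, tip amount).
    Trips with a non-positive fare are skipped as dirty records.
    """
    buckets = {"rush": [], "off_peak": []}
    for pickup, fare, tip in trips:
        if fare <= 0:
            continue
        key = "rush" if is_rush_hour(pickup) else "off_peak"
        buckets[key].append(tip / fare)
    return {key: (mean(rates) if rates else None) for key, rates in buckets.items()}


# Made-up sample trips, for illustration only.
sample_trips = [
    (datetime(2015, 6, 1, 8, 0), 10.0, 2.0),    # Monday morning: rush
    (datetime(2015, 6, 1, 13, 0), 20.0, 2.0),   # Monday midday: off-peak
    (datetime(2015, 6, 6, 8, 0), 10.0, 3.0),    # Saturday morning: off-peak
    (datetime(2015, 6, 1, 17, 30), 10.0, 4.0),  # Monday evening: rush
    (datetime(2015, 6, 1, 18, 0), 0.0, 1.0),    # zero fare: skipped
]
rates = tip_rate_by_period(sample_trips)
```

The same grouping idea extends to the other questions by swapping the bucketing key, for example payment type or passenger count instead of the rush-hour flag.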