January 11, 2012 Scaling the Mobile Millennium System in the Cloud Timothy Hunter, Teodor Moldovan, Matei Zaharia, Samy Merzgui, Justin Ma, Michael J. Franklin, Pieter Abbeel, Alexandre M. Bayen UC Berkeley
January 11, 2012
Scaling the Mobile Millennium System in the Cloud
Timothy Hunter,Teodor Moldovan, Matei Zaharia, Samy Merzgui, Justin Ma,Michael J. Franklin, Pieter Abbeel, Alexandre M. Bayen
UC Berkeley
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 2/26
Machine learning at scale
● One goal of the AMPLab software: enable ML researchers to work at scale
● Issues with more traditional frameworks:● Algorithms often iterative in nature● Not taken into account in traditional Map-Reduce● A number of alternatives: Pregel, Twister, HaLoop, Spark
● We report lessons applying one of these frameworks (Spark) to a real-world application (car traffic estimation)
● We identified 3 challenges not widely studied before:● Framework-level memory management● Sharing large parameters● Access to storage system
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 3/26
Plan
● Need for traffic estimation● Overview of Mobile Millennium● Presentation of the algorithm● Programming with the Spark framework● Challenges, current solutions
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 4/26
Need for good traffic estimation
● Traffic congestion affects everyone● Up-to-date estimation is critical● Well-studied in the case of highways● More complex for urban streets (arterial roads)● Most promising source of data: cellphone GPS
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 5/26
Real-time processing of fleet data
● Input: sampled position of taxicabs
● Observed every minute
● Covers the whole SF Bay
● 0.5 Million points / day(60M / day total)
● 0.1 Million road links
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 6/26
Estimating the travel times
● Input: sampled position of taxicabs
● Observed every minute
● Covers the whole SF Bay
● 0.5 Million points / day(60M / day total)
● 0.1 Million road links
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 7/26
Filtering of fleet data
● Trajectories need to be recovered
● Done using a Conditional Random Field
● Output: segments of most likely trajectories between GPS points
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 8/26
Mobile Millennium
● A cyberphysical system for participatory sensing
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 9/26
Mobile Millennium
● A cyberphysical system for participatory sensing
Today's talk:Batch ML jobsoutsourced a cluster
Today's talk:Batch ML jobsoutsourced a cluster
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 10/26
Estimation of arterial traffic
● Input:● Description of road network● Observations: start time, end time, route followed
● Output: probability distributions of travel time● For each link● At different time intervals● parameter vector θ (for example: mean and variance of link
travel time)©
Go
ogle
, In
c.
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 11/26
Estimation of traffic: Graphical Model
Links
Observations
ill
Link states(50k multidimensional variables)
Partial travel time distributions for each link(about 400k-200M variables)
Travel time observations: pairs of travel time and path(about 100k-50M observations)
Hidden random variable
Observed random variable
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 12/26
System workflow
Database
Worker nodes
Master node
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 13/26
System workflow
Start link parameters(on master node)
Observations(distributed, persisted across nodes)
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 14/26
System workflow
Network parameters(distributed over the nodes)
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 15/26
System workflow
Travel time samplesFor each observation link
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 16/26
System workflow
Travel time samples aggregatedon a link basis
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 17/26
System workflow
New parameters are generatedThe maximize sampled travel timesfor each link.
The master collects the vector ofnew parameters.
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 18/26
Using the Spark programming model
● Spark: Open-source cluster computing system● Can persist datasets in memory across cluster● In a fault-tolerant manner● Written in Scala (emphasizes functional programming)
observations = spark.textFile(“hdfs:...”) .map(parseObservation _).cache()params = // Initialize parameterswhile (!converged) { samples = observations.map( obs => generateSamples(obs, params)) params = samples.groupByKey().map( (linkId, vals) => mostLikelyParam(linkId, vals) ).collect()}
Main loop of the program:
Step 1 (E step)
Step 2 (M step)
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 19/26
Challenges
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 20/26
Efficient utilization of memory
● The observation data is stored in memory:● Be careful with the memory footprint● Diagnose when the cluster runs out of memory
● We cache pointer-based structures ● Significant overhead in the JVM
● Solution: keep serialized data in memory● Lesson: Need for more tools understanding memory
bottlenecks● Lesson: Provide more memory-efficient primitives
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 21/26
Broadcast of large parameters
● Need to broadcast data to all workers:● At the start of the job (network description)● Between iterations (updated parameters θ)● Common problem to many ML problems
● Network description larger than 40MB● Solution: “broadcast variables” before starting computations:
● Using Cornet (BitTorrent-like protocol)
network = // load networkbv = spark.broadcast(network)observations = spark.textFile(“...”) .map(parseObservation(_, bv.get()))
network = // load networkobservations = spark.textFile(“...”) .map(parseObservation(_, network))
No broadcast BT broadcast0
500
1000
1500
2000
2500
Data loading time
Load
ing
time
(sec
)
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 22/26
Access to storage system
● Mobile Millennium uses PostgreSQL● Reliable, previous experience, PostGIS extensions
● We ran the DB to the cloud:● Still 75% running time spent on data loading● Bursty access pattern● Small dataset overall (1GB)
● Solution: export data to HDFS● Stale snapshot● Distributed, much faster
● Ideal solution: same storage system for on-site and cloud applications
On-site DB Cloud DB HDFS1
10
100
1000
10000
100000
1000000
Data loading throughput
Ave
rag
e th
rou
gh
pu
t (re
cord
s /
se
c)
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 23/26
Conclusion
● We presented a first test of Spark to a real-world ML problem
● This test exposed some strengths and weaknesses:
● Improved memory management
● Distributing common large parameters
● Difficulty of integration with common storage solutions
● Implementation now faster than real time
● Used to learn more sophisticated travel time distributions
● Evaluating the quality of the output
cores
runtime
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 24/26
Thank you
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 25/26
System workflow
Observations(1M-100M)Persisted in memory
Travel time samples1k per observation
Link distributions(100k)
Link distribution parameters
January 11, 2012 MM on the cloud - AMPLab retreat winter 2012 26/26
Estimation of arterial traffic
● We lack travel times on individual links:● One way around it: Expectation Maximization
● Randomly partition the total travel time amongst links● Weight each partition by likelihood according to model● Aggregate weighted samples for each link● For each link, update parameters to maximize
likelihood of link samples