Scaling the Mobile Millennium - UC Berkeley AMP Campampcamp.berkeley.edu/.../06/tim-hunter-amp-camp-2012-mobile-mill… · Scaling the Mobile Millennium System in the Cloud Timothy

January 11, 2012

Scaling the Mobile Millennium System in the Cloud

Timothy Hunter, Teodor Moldovan, Matei Zaharia, Samy Merzgui, Justin Ma, Michael J. Franklin, Pieter Abbeel, Alexandre M. Bayen

UC Berkeley

January 11, 2012 MM on the cloud - AMPLab retreat winter 2012

Machine learning at scale

● One goal of the AMPLab software: enable ML researchers to work at scale

● Issues with more traditional frameworks:
  ● Algorithms are often iterative in nature
  ● This is not taken into account in traditional Map-Reduce
  ● A number of alternatives: Pregel, Twister, HaLoop, Spark

● We report lessons from applying one of these frameworks (Spark) to a real-world application (car traffic estimation)

● We identified 3 challenges not widely studied before:
  ● Framework-level memory management
  ● Sharing large parameters
  ● Access to storage system


Plan

● Need for traffic estimation
● Overview of Mobile Millennium
● Presentation of the algorithm
● Programming with the Spark framework
● Challenges, current solutions


Need for good traffic estimation

● Traffic congestion affects everyone
● Up-to-date estimation is critical
● Well-studied in the case of highways
● More complex for urban streets (arterial roads)
● Most promising source of data: cellphone GPS


Real-time processing of fleet data

● Input: sampled position of taxicabs

● Observed every minute

● Covers the whole SF Bay

● 0.5 million points / day (60M / day total)

● 0.1 Million road links


Estimating the travel times

● Input: sampled position of taxicabs

● Observed every minute

● Covers the whole SF Bay

● 0.5 million points / day (60M / day total)

● 0.1 Million road links


Filtering of fleet data

● Trajectories need to be recovered

● Done using a Conditional Random Field

● Output: segments of most likely trajectories between GPS points


Mobile Millennium

● A cyberphysical system for participatory sensing


Today's talk: batch ML jobs outsourced to a cluster


Estimation of arterial traffic

● Input:
  ● Description of road network
  ● Observations: start time, end time, route followed

● Output: probability distributions of travel time
  ● For each link
  ● At different time intervals
  ● Parameter vector θ (for example: mean and variance of link travel time)

[Image © Google, Inc.]
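To make the input and output above concrete, here is a minimal Scala sketch. The type and field names (`Observation`, `LinkParams`, `fit`) are assumptions made for illustration and do not come from the Mobile Millennium code. It takes θ to be just the mean and variance of a link's travel time, as in the example on the slide, and fits them by the usual maximum-likelihood estimate for a Gaussian.

```scala
// Hypothetical types; names and fields are assumptions for illustration only.
final case class Observation(startTime: Double, endTime: Double, route: Seq[Int]) {
  // Total travel time along the whole route, in the same unit as the timestamps.
  def travelTime: Double = endTime - startTime
}

// theta for one link: here simply the mean and variance of its travel time.
final case class LinkParams(mean: Double, variance: Double)

object LinkParams {
  // Maximum-likelihood Gaussian fit: sample mean and (biased) sample variance.
  def fit(samples: Seq[Double]): LinkParams = {
    require(samples.nonEmpty, "need at least one travel time sample")
    val n = samples.size.toDouble
    val mean = samples.sum / n
    val variance = samples.map(t => (t - mean) * (t - mean)).sum / n
    LinkParams(mean, variance)
  }
}
```

The talk mentions learning more sophisticated travel time distributions, so a two-number θ is only the simplest case.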


Estimation of traffic: Graphical Model

[Figure: graphical model with a layer of links and a layer of observations. Legend: hidden random variable, observed random variable.]

● Link states (50k multidimensional variables)

● Partial travel time distributions for each link (about 400k-200M variables)

● Travel time observations: pairs of travel time and path (about 100k-50M observations)


System workflow

Components: database, worker nodes, master node.

● Start link parameters (on master node)

● Observations (distributed, persisted across nodes)

● Network parameters (distributed over the nodes)

● Travel time samples (for each observation link)

● Travel time samples aggregated on a link basis

● New parameters are generated: they maximize the likelihood of the sampled travel times for each link

● The master collects the vector of new parameters


Using the Spark programming model

● Spark: open-source cluster computing system
  ● Can persist datasets in memory across the cluster
  ● In a fault-tolerant manner
  ● Written in Scala (emphasizes functional programming)

Main loop of the program:

    val observations = spark.textFile("hdfs:...")
      .map(parseObservation _)
      .cache()
    var params = // Initialize parameters
    while (!converged) {
      // Step 1 (E step): generate travel time samples for each observation
      val samples = observations.map(
        obs => generateSamples(obs, params))
      // Step 2 (M step): per link, pick the most likely parameters
      params = samples.groupByKey().map {
        case (linkId, vals) => mostLikelyParam(linkId, vals)
      }.collect()
    }
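The data flow of this loop can be tried without a cluster. The sketch below is an assumption-laden stand-in, not the talk's implementation: it replaces the Spark dataset with a plain Scala `Seq`, uses `flatMap` plus `groupBy` where the slide uses `map` plus `groupByKey`, and gives `generateSamples` and `mostLikelyParam` toy bodies (a deterministic proportional split and a plain average). It illustrates only the map, group-by-link, update shape of one iteration, not the statistical model.

```scala
object LocalLoop {
  type LinkId = Int

  // An observation: total travel time (seconds) along a route of links.
  final case class Obs(total: Double, route: Seq[LinkId])

  // Toy stand-in for generateSamples: split the total travel time across the
  // route in proportion to the current per-link estimates (no randomness).
  def generateSamples(obs: Obs, params: Map[LinkId, Double]): Seq[(LinkId, Double)] = {
    val weights = obs.route.map(params)
    val norm = weights.sum
    obs.route.zip(weights).map { case (link, w) => link -> obs.total * w / norm }
  }

  // Toy stand-in for mostLikelyParam: the average of the samples for one link.
  def mostLikelyParam(link: LinkId, vals: Seq[Double]): (LinkId, Double) =
    link -> vals.sum / vals.size

  def run(observations: Seq[Obs], init: Map[LinkId, Double], iterations: Int): Map[LinkId, Double] = {
    var params = init
    for (_ <- 1 to iterations) {
      // Step 1: one (link, time) sample per link of each observation.
      val samples = observations.flatMap(obs => generateSamples(obs, params))
      // Step 2: group the samples by link, then recompute each link's parameter.
      params = samples.groupBy(_._1).map { case (link, kvs) =>
        mostLikelyParam(link, kvs.map(_._2))
      }
    }
    params
  }
}
```

For example, `LocalLoop.run(Seq(LocalLoop.Obs(30.0, Seq(1, 2)), LocalLoop.Obs(10.0, Seq(1))), Map(1 -> 1.0, 2 -> 1.0), 1)` runs a single iteration over two observations that share link 1.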


Challenges


Efficient utilization of memory

● The observation data is stored in memory:
  ● Be careful with the memory footprint
  ● Diagnose when the cluster runs out of memory

● We cache pointer-based structures
  ● Significant overhead in the JVM

● Solution: keep serialized data in memory
● Lesson: need for more tools for understanding memory bottlenecks
● Lesson: provide more memory-efficient primitives
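As an illustration of keeping serialized data in memory instead of pointer-based structures, the following sketch packs a batch of records into one flat byte array and rebuilds the objects only on access. The record layout (`Obs`) and the hand-written binary format are assumptions for this example; the slides do not say which serializer was used.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object PackedObservations {
  // Hypothetical record layout, for illustration only.
  final case class Obs(startTime: Long, endTime: Long, route: Vector[Int])

  // Pack a batch into a single byte array: no per-object headers or pointers.
  def pack(batch: Seq[Obs]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(batch.size)
    batch.foreach { o =>
      out.writeLong(o.startTime)
      out.writeLong(o.endTime)
      out.writeInt(o.route.size)
      o.route.foreach(link => out.writeInt(link))
    }
    out.flush()
    bytes.toByteArray
  }

  // Rebuild the objects only when a task actually needs them.
  def unpack(data: Array[Byte]): Vector[Obs] = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    Vector.fill(in.readInt()) {
      val start = in.readLong()
      val end = in.readLong()
      val route = Vector.fill(in.readInt())(in.readInt())
      Obs(start, end, route)
    }
  }
}
```

The trade-off is CPU time spent decoding on every pass over the data in exchange for a smaller, more predictable memory footprint.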


Broadcast of large parameters

● Need to broadcast data to all workers:
  ● At the start of the job (network description)
  ● Between iterations (updated parameters θ)
  ● Common to many ML problems

● Network description larger than 40MB
● Solution: “broadcast variables” before starting computations:
  ● Using Cornet (BitTorrent-like protocol)

Without broadcast:

    network = // load network
    observations = spark.textFile("...")
      .map(parseObservation(_, network))

With a broadcast variable:

    network = // load network
    bv = spark.broadcast(network)
    observations = spark.textFile("...")
      .map(parseObservation(_, bv.get()))

[Figure: Data loading time. Bar chart of loading time (sec), axis from 0 to 2500, comparing "No broadcast" and "BT broadcast".]


Access to storage system

● Mobile Millennium uses PostgreSQL
  ● Reliable, previous experience, PostGIS extensions

● We moved the DB to the cloud:
  ● Still 75% of running time spent on data loading
  ● Bursty access pattern
  ● Small dataset overall (1GB)

● Solution: export data to HDFS
  ● Stale snapshot
  ● Distributed, much faster

● Ideal solution: same storage system for on-site and cloud applications
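One way to picture the export step is a flat, line-oriented snapshot that each worker can parse independently. The sketch below is hypothetical: the record layout, the tab-separated format, and the `toLine` helper are assumptions, and only the name `parseObservation` echoes the code shown in the talk.

```scala
object ObservationExport {
  // Hypothetical record layout; the real schema is not given in the slides.
  final case class Obs(startTime: Long, endTime: Long, route: Vector[Int])

  // One record per line: start time, end time, then the link ids, tab-separated.
  def toLine(o: Obs): String =
    (Seq(o.startTime.toString, o.endTime.toString) ++ o.route.map(_.toString)).mkString("\t")

  // Inverse of toLine; returns None for malformed lines instead of throwing.
  def parseObservation(line: String): Option[Obs] = {
    val fields = line.split('\t')
    if (fields.length < 2) None
    else scala.util.Try {
      Obs(fields(0).toLong, fields(1).toLong, fields.drop(2).map(_.toInt).toVector)
    }.toOption
  }
}
```

Because every line is self-describing, such a file can be split across workers, at the cost of the snapshot going stale as noted above.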

[Figure: Data loading throughput. Average throughput (records / sec), axis ticks from 1 to 1000000 in powers of ten, comparing "On-site DB", "Cloud DB" and "HDFS".]


Conclusion

● We presented a first test of Spark on a real-world ML problem

● This test exposed some strengths and weaknesses:

● Improved memory management

● Distributing common large parameters

● Difficulty of integration with common storage solutions

● Implementation now faster than real time

● Used to learn more sophisticated travel time distributions

● Evaluating the quality of the output

[Figure: runtime versus number of cores.]


Thank you


System workflow

Observations (1M-100M), persisted in memory

Travel time samples: 1k per observation

Link distributions (100k)

Link distribution parameters


Estimation of arterial traffic

● We lack travel times on individual links:
  ● One way around it: Expectation Maximization

● Randomly partition the total travel time amongst links
● Weight each partition by likelihood according to model
● Aggregate weighted samples for each link
● For each link, update parameters to maximize likelihood of link samples
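The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each link's travel time is modeled as a Gaussian, random partitions are drawn uniformly over the simplex by normalizing exponential draws, and all names (`EmSketch`, `Params`, `Sample`, `partition`) are hypothetical.

```scala
import scala.util.Random

object EmSketch {
  type LinkId = Int
  final case class Params(mean: Double, variance: Double)
  final case class Obs(total: Double, route: Vector[LinkId])
  final case class Sample(link: LinkId, time: Double, weight: Double)

  // Gaussian density: the per-link travel time model assumed in this sketch.
  def density(p: Params, t: Double): Double =
    math.exp(-(t - p.mean) * (t - p.mean) / (2 * p.variance)) /
      math.sqrt(2 * math.Pi * p.variance)

  // Randomly partition `total` among k links (normalized exponential draws).
  def partition(total: Double, k: Int, rng: Random): Vector[Double] = {
    val cuts = Vector.fill(k)(-math.log(1.0 - rng.nextDouble()))
    val sum = cuts.sum
    cuts.map(_ * total / sum)
  }

  // E step for one observation: n random partitions, each weighted by its
  // likelihood under the current parameters; weights are normalized to sum to 1.
  // Assumes at least one partition has a non-zero likelihood.
  def generateSamples(obs: Obs, params: Map[LinkId, Params], n: Int, rng: Random): Vector[Sample] = {
    val draws = Vector.fill(n) {
      val times = partition(obs.total, obs.route.size, rng)
      val likelihood = obs.route.zip(times).map { case (l, t) => density(params(l), t) }.product
      (times, likelihood)
    }
    val norm = draws.map(_._2).sum
    draws.flatMap { case (times, likelihood) =>
      obs.route.zip(times).map { case (link, t) => Sample(link, t, likelihood / norm) }
    }
  }

  // M step for one link: weighted mean and variance of its aggregated samples.
  def mostLikelyParam(samples: Seq[Sample]): Params = {
    val w = samples.map(_.weight).sum
    val mean = samples.map(s => s.weight * s.time).sum / w
    val variance = samples.map(s => s.weight * (s.time - mean) * (s.time - mean)).sum / w
    Params(mean, variance)
  }
}
```

Aggregation "for each link" corresponds to grouping the `Sample` values by `link` across all observations before calling `mostLikelyParam`.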
