HaLoop: Efficient Iterative Data Processing On Large Scale Clusters
by Yingyi Bu, Bill Howe, Magdalena Balazinska, & Michael D. Ernst
Presentation by Carl Erhard & Zahid Mian
Feb 22, 2016
1
Citations
Many of the slides in this presentation were taken from the authors' website, which can be found here: http://www.ics.uci.edu/~yingyib/
3
Agenda
• Introduction / Motivation
• Example Algorithms Used
• Architecture of HaLoop
• Task Scheduling
• Caching and Indexing
• Experiments & Results
• Conclusion / Discussion
4
Motivation
• MapReduce can't express recursion/iteration
• Lots of interesting programs need loops
  – graph algorithms
  – clustering
  – machine learning
  – recursive queries
• Dominant solution: use a driver program outside of MapReduce
• Hypothesis: making MapReduce loop-aware affords optimization
  – …and lays a foundation for scalable implementations of recursive languages
Bill Howe, UW
5
Thesis – Make a Loop Framework
• Observation: MapReduce has proven successful as a common runtime for non-recursive declarative languages
  – HIVE (SQL)
  – Pig (RA with nested types)
• Observation: Many people roll their own loops
  – Graphs, clustering, mining, recursive queries
  – Iteration managed by external script
• Thesis: With minimal extensions, we can provide an efficient common runtime for recursive languages
  – Map, Reduce, Fixpoint
Bill Howe, UW
6
Example 1: PageRank
Rank Table R0

url        rank
www.a.com  1.0
www.b.com  1.0
www.c.com  1.0
www.d.com  1.0
www.e.com  1.0

Linkage Table L (edges reconstructed from the worked example on later slides)

url_src    url_dest
www.a.com  www.b.com
www.a.com  www.c.com
www.a.com  www.d.com
www.c.com  www.a.com
www.c.com  www.e.com
www.d.com  www.b.com
www.e.com  www.c.com
www.e.com  www.d.com

Rank Table R3

url        rank
www.a.com  2.13
www.b.com  3.89
www.c.com  2.60
www.d.com  2.60
www.e.com  2.13

Loop body (per iteration): R_{i+1} = π(url_dest, γ_url_dest SUM(rank)) of (R_i ⋈ L on R_i.url = L.url_src), where each tuple's rank is first divided by its out-degree: R_i.rank / γ_url COUNT(url_dest).
Bill Howe, UW
7
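The loop body above can be sketched in plain Python (a toy in-memory version, not HaLoop code; the link structure is the one used in the worked example later in the deck):

```python
# Toy sketch of one PageRank iteration: join R_i with the linkage table L,
# split each page's rank evenly across its out-links
# (R_i.rank / COUNT(url_dest)), then SUM the shares grouped by url_dest.
def pagerank_step(ranks, links):
    # ranks: {url: rank}; links: {url_src: [url_dest, ...]}
    new_ranks = {url: 0.0 for url in ranks}
    for src, rank in ranks.items():
        dests = links.get(src, [])
        if not dests:
            continue                      # dangling page: no out-links
        share = rank / len(dests)
        for dest in dests:
            new_ranks[dest] += share
    return new_ranks

links = {"a.com": ["b.com", "c.com", "d.com"],
         "c.com": ["a.com", "e.com"],
         "d.com": ["b.com"],
         "e.com": ["d.com", "c.com"]}
ranks = {u: 1.0 for u in ["a.com", "b.com", "c.com", "d.com", "e.com"]}
ranks = pagerank_step(ranks, links)
```

This is only the bare join-and-aggregate loop body; real PageRank also applies a damping factor, which the slides omit.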
PageRank in MapReduce
[Dataflow diagram: map tasks read R_i and the loop-invariant L-split0/L-split1; one map/reduce pair joins and computes ranks, a second runs the aggregate fixpoint evaluation; the client checks "Converged?", sets i = i+1, and loops or reports done.]
Bill Howe, UW
8
PageRank in MapReduce
What's the Problem?
L is loop invariant, but:
1. L is loaded on each iteration
2. L is shuffled on each iteration
3. Fixpoint evaluated as a separate MapReduce job per iteration
[Same join-and-rank / aggregate dataflow as the previous slide, with the three problems marked on it.]
Bill Howe, UW
9
Example 2: Descendant Query
Find all friends within two hops of Eric, using relation Friend:
R0 = {(Eric, Eric)}
R1 = {(Eric, Elisa)}
R2 = {(Eric, Tom), (Eric, Harry)}
R3 = {}
Bill Howe, UW
10
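The same computation can be sketched as an iterated join plus duplicate elimination (a toy Python version; the Friend pairs below are the symmetric edges implied by the slides):

```python
# Toy sketch of the descendant query: repeatedly join the current frontier
# with the Friend relation, then eliminate people already reached.
def friends_within(start, friend_pairs, max_hops):
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        # join: follow one Friend edge from everyone in the frontier
        nxt = {b for (a, b) in friend_pairs if a in frontier}
        # dupe-elim: keep only people not already seen
        frontier = nxt - seen
        seen |= frontier
        if not frontier:          # fixpoint: nothing new this iteration
            break
    seen.discard(start)
    return seen

F = {("Eric", "Elisa"), ("Elisa", "Eric"),
     ("Elisa", "Tom"), ("Tom", "Elisa"),
     ("Elisa", "Harry"), ("Harry", "Elisa")}
within_two = friends_within("Eric", F, 2)   # {'Elisa', 'Tom', 'Harry'}
```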
Descendant Query in MapReduce
[Dataflow diagram: map tasks read S_i and the loop-invariant Friend0/Friend1 splits; a join step computes the next generation of friends, a dupe-elim step removes the ones we've already seen; the client asks "Anything new?", sets i = i+1, and loops or reports done.]
Bill Howe, UW
11
Descendant Query in MapReduce
What's the Problem?
Friend is loop invariant, but:
1. Friend is loaded on each iteration
2. Friend is shuffled on each iteration
[Same join / dupe-elim dataflow as the previous slide, with the two problems marked on it.]
Bill Howe, UW
12
HaLoop – The Solution
HaLoop offers the following solutions to these problems:
1. A new programming model & architecture for iterative programs
2. Loop-aware task scheduling
3. Caching for loop-invariant data
4. Caching for fixpoint evaluation
13
HaLoop Architecture
14
HaLoop Architecture
Note: The loop control (i.e. determining when execution has finished) is pushed from the application into the infrastructure.
15
• HaLoop works whenever the loop body can be written in the form R_{i+1} = R_0 ∪ (R_i ⋈ L).
• In other words, the next result is a join of the previous result and loop-invariant data L.
HaLoop Architecture
16
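A minimal sketch of this loop form, instantiated for transitive closure (illustrative Python, not HaLoop code; `step` computes R_{i+1} = R_0 ∪ (R_i ⋈ L) and the driver iterates to a fixpoint):

```python
# R_{i+1} = R_0 ∪ (R_i ⋈ L): extend every known reachable pair by one
# edge from the loop-invariant edge table L, then union in R_0.
def step(R_i, R_0, L):
    joined = {(a, c) for (a, b) in R_i for (b2, c) in L if b == b2}
    return R_0 | joined

L_edges = {(1, 2), (2, 3), (3, 4)}
R = set(L_edges)                  # R_0: paths of length 1
while True:
    R_next = step(R, set(L_edges), L_edges)
    if R_next == R:               # fixpoint: no new pairs
        break
    R = R_next
# R now holds all 6 reachable pairs, including (1, 4)
```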
• AddMap and AddReduce
  Add a map/reduce step to the loop body
• SetFixedPointThreshold
  Set a bound on the distance between iterations
• ResultDistance
  A function that returns the distance between iterations
• SetMaxNumOfIterations
  Set the maximum number of iterations the loop may take
HaLoop Programming Interface
17
• SetIterationInput
  A function that returns the input for a given iteration
• AddStepInput
  A function that allows injection of additional data between the map and reduce
• AddInvariantTable
  Add a table that is loop invariant
18
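The control flow these calls configure can be mocked up as a driver loop (a Python sketch whose method names merely mirror the API above; HaLoop itself is a Java extension of Hadoop, so none of this is its real code):

```python
# Mock of HaLoop's loop control: run the registered steps, then stop when
# ResultDistance(previous, current) falls under the fixpoint threshold or
# the iteration cap is hit. Names mirror the slides, not HaLoop's real API.
class LoopJob:
    def __init__(self):
        self.steps = []
        self.threshold = 0.0
        self.max_iters = 10
        self.distance = lambda prev, cur: float(prev != cur)

    def add_step(self, fn):                  # stands in for AddMap/AddReduce
        self.steps.append(fn)

    def set_fixed_point_threshold(self, t):  # SetFixedPointThreshold
        self.threshold = t

    def set_result_distance(self, fn):       # ResultDistance
        self.distance = fn

    def set_max_num_of_iterations(self, n):  # SetMaxNumOfIterations
        self.max_iters = n

    def run(self, data):
        for _ in range(self.max_iters):
            result = data
            for fn in self.steps:
                result = fn(result)
            if self.distance(data, result) <= self.threshold:
                return result                # fixpoint reached
            data = result
        return data

job = LoopJob()
job.add_step(lambda x: x / 2)                # toy "loop body"
job.set_result_distance(lambda a, b: abs(a - b))
job.set_fixed_point_threshold(0.01)
job.set_max_num_of_iterations(100)
out = job.run(1.0)   # halves until successive values differ by <= 0.01
```

The point of the design: the convergence test lives in the framework, not in an external driver script, which is what lets HaLoop cache across iterations.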
APIs in Action – PageRank in HaLoop
19
PageRank In HaLoop
MapRank
ReduceRank
MapAggregate
ReduceAggregate
R0
L
20
PageRank In HaLoop
MapRank
ReduceRank
MapAggregate
ReduceAggregate
R0
L
Source URL   Dest/Rank             Source File
a.com        1.0                   #2
a.com        b.com, c.com, d.com   #1
b.com        1.0                   #2
b.com        (none)                #1
c.com        1.0                   #2
c.com        a.com, e.com          #1
d.com        1.0                   #2
d.com        b.com                 #1
e.com        1.0                   #2
e.com        d.com, c.com          #1
R0 U L
Only these values are given to the reducer
21
PageRank In HaLoop
MapRank
ReduceRank
MapAggregate
ReduceAggregate
R0
L
R0 U L
Destination New Rank
b.com 1.5
c.com 1.5
d.com 1.5
a.com 1.5
e.com 1.5
b.com 1.5
d.com 1.5
c.com 1.5
Calculate New Rank
22
PageRank In HaLoop
MapRank
ReduceRank
MapAggregate
ReduceAggregate
R0
L
R0 U L
Destination New Rank
b.com 1.5
c.com 1.5
d.com 1.5
a.com 1.5
e.com 1.5
b.com 1.5
d.com 1.5
c.com 1.5
Calculate New Rank
Identity
23
PageRank In HaLoop
MapRank
ReduceRank
MapAggregate
ReduceAggregate
R0
L
R0 U L
Destination New Rank
a.com 1.5
b.com 3.0
c.com 3.0
d.com 3.0
e.com 1.5
Calculate New Rank
Identity Aggregate
R1
24
PageRank In HaLoop
MapRank
ReduceRank
MapAggregate
ReduceAggregate
R0
L
R0 U L
Destination New Rank
a.com 1.5
b.com 3.0
c.com 3.0
d.com 3.0
e.com 1.5
Calculate New Rank
Identity Aggregate
R1
Compare R0 and R1. If the distance is not under the threshold, repeat.
Destination New Rank
a.com 1.0
b.com 1.0
c.com 1.0
d.com 1.0
e.com 1.0
25
PageRank In HaLoop
MapRank
ReduceRank
MapAggregate
ReduceAggregate
L
R1 U L Calculate New Rank
Identity Aggregate R1
Source URL
Dest/Rank Source File
a.com 1.5 #2
a.com b.com,c.com,d.com
#1
b.com 1.5 #2
b.com #1
c.com 1.5 #2
c.com a.com, e.com #1
d.com 1.5 #2
d.com b.com #1
e.com 1.5 #2
e.com d.com,c.com #1
26
• One goal of HaLoop is to schedule map/reduce tasks on the same machine as the data.
  – Scheduling the first iteration is no different from Hadoop's.
  – Subsequent iterations put tasks that access the same data on the same physical node.
HaLoop Inter-Iteration Locality
27
• The master node keeps a map of node ID → filesystem partition.
• When a node becomes free, the master tries to assign it a task related to data cached on that node.
• If a task is required on a node that already has a full load, a nearby node is used instead.
HaLoop Scheduling
28
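The assignment rule can be sketched as follows (illustrative Python; `schedule`, `last_owner`, and `capacity` are invented names, not HaLoop internals):

```python
# Sketch of inter-iteration locality: prefer the node that cached a
# partition in the previous iteration; if that node is at full load,
# spill to the least-loaded node with spare capacity (a "nearby" node).
def schedule(partitions, last_owner, capacity):
    load = {node: 0 for node in capacity}
    assignment = {}
    for pid in partitions:
        node = last_owner.get(pid)
        if node is None or load[node] >= capacity[node]:
            # preferred node unknown or full: pick the least-loaded node
            node = min((n for n in capacity if load[n] < capacity[n]),
                       key=lambda n: load[n])
        assignment[pid] = node
        load[node] += 1
    return assignment

# n1 cached partitions 0 and 1 last iteration but can run only one task,
# so partition 1 spills to n2, which also keeps its own partition 2.
plan = schedule([0, 1, 2],
                {0: "n1", 1: "n1", 2: "n2"},
                {"n1": 1, "n2": 2})          # {0: 'n1', 1: 'n2', 2: 'n2'}
```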
• Mapper Input Cache
• Reducer Input Cache
• Reducer Output Cache
• Why is there no Mapper Output Cache?
• HaLoop indexes cached data
  – Keys and values are stored in separate local files
  – Reduces I/O seek time (forward-only access)
Caching And Indexing
29
Approach: Inter-iteration caching
Mapper input cache (MI)
Mapper output cache (MO)
Reducer input cache (RI)
Reducer output cache (RO)
[Loop-body diagram marking each cache point: MI at mapper input, MO at mapper output, RI at reducer input, RO at reducer output.]
Bill Howe, UW
30
RI: Reducer Input Cache
Bill Howe, UW
• Provides:
  – Access to loop-invariant data without map/shuffle
• Used by:
  – Reducer function
• Assumes:
  1. Mapper output for a given table is constant across iterations
  2. Static partitioning (implies: no new nodes)
• PageRank
  – Avoids shuffling the network at every step
• Transitive Closure
  – Avoids shuffling the graph at every step
• K-means
  – No help
31
RO: Reducer Output Cache
Bill Howe, UW
• Provides:
  – Distributed access to the output of previous iterations
• Used by:
  – Fixpoint evaluation
• Assumes:
  1. Partitioning constant across iterations
  2. Reducer output key functionally determines reducer input key
• PageRank
  – Allows distributed fixpoint evaluation
  – Obviates the extra MapReduce job
• Transitive Closure
  – No help
• K-means
  – No help
32
MI: Mapper Input Cache
Bill Howe, UW
• Provides:
  – Access to non-local mapper input on later iterations
• Used:
  – During scheduling of map tasks
• Assumes:
  1. Mapper input does not change
• PageRank
  – Subsumed by use of the Reducer Input Cache
• Transitive Closure
  – Subsumed by use of the Reducer Input Cache
• K-means
  – Avoids non-local data reads on iterations > 0
33
• The cache must be reconstructed when:
  – the hosting node fails, or
  – the hosting node has a full load (the M/R task must be scheduled on a different, substitute node)
• The process is transparent to the user
Cache Rebuilding
34
Results: Page Rank
– Run for only 10 iterations
– Join and aggregate in every iteration
– Overhead in the first step for caching input
– Catches up soon and outperforms Hadoop
– Low shuffling time: the time between RPC invocation by the reducer and sorting of keys
35
Results: Descendant Query
– Join and duplicate elimination in every iteration.
– Less striking performance on LiveJournal: a social network with high fan-out, where excessive duplicate generation dominates the join cost and makes reducer input caching less useful.
36
Reducer Input Cache Benefit
Transitive Closure
Billion Triples Dataset (120GB)
90 small instances on EC2
Overall run time
Bill Howe, UW
37
Reducer Input Cache Benefit
Transitive Closure
Billion Triples Dataset (120GB)
90 small instances on EC2
Overall run time
Livejournal, 12GB
Bill Howe, UW
38
Reducer Input Cache Benefit
Bill Howe, UW
Transitive Closure
Billion Triples Dataset (120GB)
90 small instances on EC2
Reduce and Shuffle of Join Step
Livejournal, 12GB
39
Reducer Input Cache Benefit
[The earlier PageRank dataflow again: join & compute rank, then aggregate fixpoint evaluation; the reducer input cache removes the need to re-map and re-shuffle the L-splits into the join on every iteration.]
40
Reducer Output Cache Benefit
Bill Howe, UW
[Charts: fixpoint evaluation time (s) vs. iteration #, one per dataset.]
Livejournal dataset
50 EC2 small instances
Freebase dataset
90 EC2 small instances
41
Mapper Input Cache Benefit
Bill Howe, UW
5% non-local data reads; ~5% improvement
42
Conclusions
• Relatively simple changes to MapReduce/Hadoop can support arbitrary recursive programs
  – TaskTracker (cache management)
  – Scheduler (cache awareness)
  – Programming model (multi-step loop bodies, cache control)
• Optimizations
  – Caching loop-invariant data realizes the largest gain
  – Good to eliminate the extra MapReduce step for termination checks
  – Mapper input cache benefit inconclusive; need a busier cluster
• Future work
  – Analyze expressiveness of Map, Reduce, Fixpoint
  – Consider a model of Map, (Reduce+), Fixpoint
43
The Good …
• HaLoop extends MapReduce:
  – Easier programming of iterative algorithms
  – Efficiency improvements due to loop awareness and caching
  – Lets users reuse major building blocks from existing Hadoop application implementations
  – Fully backward compatible with Hadoop
44
The Questionable …
• Only useful for algorithms that can be expressed as R_{i+1} = R_0 ∪ (R_i ⋈ L).
• Imposes constraints: a fixed partition function for each iteration.
• Does not improve asymptotic running time: still O(M+R) scheduling decisions and O(M×R) state kept in memory, plus added overhead.
• Not completely novel: iMapReduce and Twister are similar efforts.
• People still do iteration using traditional MapReduce: Google, Nutch, Mahout…
45
BACKUP
46
47
APIs in Action – Descendant Query
48
Descendant Query in HaLoop
MapJoin
ReduceJoin
MapDistinct
ReduceDistinct
∆S0
F
49
Descendant Query in HaLoop
MapJoin
ReduceJoin
MapDistinct
ReduceDistinct
∆S0
F
Eric Eric #2
Eric Elisa #1
Elisa Tom #1
Elisa Harry #1
50
Descendant Query in HaLoop
MapJoin
ReduceJoin
MapDistinct
ReduceDistinct
∆S0
F
Eric Eric 1
Eric Elisa 1
Elisa Tom 1
Elisa Harry 1
Eric Tom 1
Eric Harry 1
Elisa Eric 1
51
Descendant Query in HaLoop
MapJoin
ReduceJoin
MapDistinct
ReduceDistinct
∆S0
F
Eric Eric
Eric Elisa
Elisa Tom
Elisa Harry
Eric Tom
Eric Harry
Elisa Eric
∆S1
52
Descendant Query in HaLoop
MapJoin
ReduceJoin
MapDistinct
ReduceDistinct
∆S1
F
Eric Eric #2
Elisa Eric #2
Tom Eric #2
Harry Eric #2
Tom Elisa #2
Harry Elisa #2
53
Descendant Query in HaLoop
MapJoin
ReduceJoin
MapDistinct
ReduceDistinct
∆S1
F
Eric Eric 2
Elisa Eric 2
Tom Eric 2
Harry Eric 2
Tom Elisa 2
Harry Elisa 2
54
Descendant Query in HaLoop
MapJoin
ReduceJoin
MapDistinct
ReduceDistinct
∆S1
F
Eric Eric 2
Elisa Eric 2
Tom Eric 2
Harry Eric 2
Tom Elisa 2
Harry Elisa 2
Step Input
Eric Eric 1
Eric Elisa 1
Elisa Tom 1
Elisa Harry 1
Eric Tom 1
Eric Harry 1
Elisa Eric 1
Eric Eric 0
Eric Eric
Tom Eric
Harry Eric
Tom Elisa
Harry Elisa
Eric Elisa
Eric Tom
Eric Harry
Elisa Tom
Elisa Harry
Step Input allows the union of all iterations to be input
55
Fixpoint Algorithm Example