Page 1
DISTRIBUTED RESEARCH ON EMERGING APPLICATIONS & MACHINES | dream-lab.in | Indian Institute of Science, Bangalore
DREAM:Lab
©DREAM:Lab, 2014. This work is licensed under a Creative Commons Attribution 4.0 International License.
SE252: Lecture 13-14, Feb 24/25. ILO3: Algorithms and Programming
Patterns for Cloud Applications (Hadoop)
Page 2
ILO 3
Page 3
Patterns & Technologies
Page 4
MapReduce
Page 5
MapReduce Design Pattern
Page 6
MapReduce: Data-parallel Programming Model
map(ki, vi) → List<km, vm>[]
reduce(km, List<vm>[]) → List<kr, vr>[]
Page 7
MR Borrows from Functional Programming
Page 8
MapReduce & MPI Scatter-Gather
Page 9
MapReduce: Programming Model
Input splits:
  "How now Brown cow"
  "How does It work now"

Map output:
  <How,1> <now,1> <brown,1> <cow,1>
  <How,1> <does,1> <it,1> <work,1> <now,1>

After shuffle & sort:
  <How,[1 1]> <now,[1 1]> <brown,[1]> <cow,[1]> <does,[1]> <it,[1]> <work,[1]>

Reduce output:
  brown 1, cow 1, does 1, How 2, it 1, now 2, work 1

Map(k1, v1) → list(k2, v2)
Reduce(k2, list(v2)) → list(v2)
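The whole dataflow above can be simulated in plain Python (a sketch, not the Hadoop API; all names are illustrative, and words are lowercased so that "Brown" and "brown" group together, as in the slide's output):

```python
from collections import defaultdict

def map_fn(key, value):
    # key: line number, value: line of text
    for word in value.lower().split():
        yield (word, 1)

def reduce_fn(key, values):
    yield (key, sum(values))

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Map phase: apply map_fn to every input record
    intermediate = []
    for k, v in inputs:
        intermediate.extend(map_fn(k, v))
    # Shuffle & sort: group intermediate values by key
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)
    # Reduce phase: one reduce call per distinct key
    out = []
    for k in sorted(groups):
        out.extend(reduce_fn(k, groups[k]))
    return dict(out)

counts = run_mapreduce(
    [(0, "How now Brown cow"), (1, "How does It work now")],
    map_fn, reduce_fn)
# -> {"brown": 1, "cow": 1, "does": 1, "how": 2, "it": 1, "now": 2, "work": 1}
```

The grouping step stands in for Hadoop's shuffle & sort; in a real cluster it runs across machines, partitioned by key.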
Page 10
Map
Page 11
Map
map(String input_key, String input_value):
// input_key: line number
// input_value: line of text
for each Word w in input_value.tokenize()
EmitIntermediate(w, "1");
(0, “How now brown cow”) →
[(“How”, 1), (“now”, 1), (“brown”, 1), (“cow”, 1)]
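A direct Python transcription of this pseudocode (a sketch; `tokenize()` is taken to mean whitespace splitting):

```python
def word_count_map(input_key, input_value):
    # input_key: line number (unused), input_value: line of text
    pairs = []
    for w in input_value.split():
        pairs.append((w, 1))
    return pairs

# (0, "How now brown cow") -> [("How", 1), ("now", 1), ("brown", 1), ("cow", 1)]
```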
Page 12
let map(k, v) = emit(k.toUpper(), v.toUpper())
(“foo”, “bar”) → (“FOO”, “BAR”)
(“Foo”, “other”) → (“FOO”, “OTHER”)
(“key2”, “data”) → (“KEY2”, “DATA”)
let map(k, v) =
if (isPrime(v)) then emit(k, v)
(“foo”, 7) → (“foo”, 7)
(“test”, 10) → (nothing)
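These two examples illustrate that a map function may transform every pair, or emit nothing and act as a filter. A plain-Python sketch of both (function names are illustrative):

```python
def upper_map(k, v):
    # Transform: exactly one output pair per input pair
    return [(k.upper(), v.upper())]

def prime_filter_map(k, v):
    # Filter: emit the pair only when the value is prime
    def is_prime(n):
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))
    return [(k, v)] if is_prime(v) else []

# upper_map("foo", "bar")      -> [("FOO", "BAR")]
# prime_filter_map("foo", 7)   -> [("foo", 7)]
# prime_filter_map("test", 10) -> []  (nothing emitted)
```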
Page 13
Reduce
Page 14
Reduce
reduce(String output_key, Iterator intermediate_values):
  // output_key: a word
  // intermediate_values: a list of counts
  int sum = 0;
  for each v in intermediate_values
    sum += ParseInt(v);
  Emit(output_key, AsString(sum));
(“A”, [1, 1, 1]) → (“A”, 3)
(“B”, [1, 1]) → (“B”, 2)
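The same reducer in plain Python (a sketch; as in the pseudocode, counts are parsed with `int()` because they may arrive as strings):

```python
def word_count_reduce(output_key, intermediate_values):
    # output_key: a word; intermediate_values: its list of counts
    total = 0
    for v in intermediate_values:
        total += int(v)  # counts may be strings, as emitted by the mapper
    return (output_key, total)

# ("A", [1, 1, 1]) -> ("A", 3)
# ("B", [1, 1])    -> ("B", 2)
```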
Page 15
Map: for each w in value do emit(w, 1)
Reduce: sum = sum + value; emit(key, sum)

Input splits:
  "How now Brown cow"
  "How does It work now"

Map output:
  <How,1> <now,1> <brown,1> <cow,1>
  <How,1> <does,1> <it,1> <work,1> <now,1>

Shuffle & sort (grouped per reducer):
  <How,[1 1]> <now,[1 1]>
  <brown,[1]> <cow,[1]>
  <does,[1]> <it,[1]> <work,[1]>

Reduce output:
  How 2, now 2
  brown 1, cow 1
  does 1, it 1, work 1
Page 16
Anagram Example
public class AnagramMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private Text sortedText = new Text();
  private Text originalText = new Text();

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> outputCollector, Reporter reporter) {
    String word = value.toString();
    char[] wordChars = word.toCharArray();
    Arrays.sort(wordChars);
    String sortedWord = new String(wordChars);
    sortedText.set(sortedWord);
    originalText.set(word);
    // Sort the word's characters and emit <sorted word, word>
    outputCollector.collect(sortedText, originalText);
  }
}
Page 17
Anagram Example…

public void reduce(Text anagramKey, Iterator<Text> anagramValues,
    OutputCollector<Text, Text> results, Reporter reporter) {
  String output = "";
  while (anagramValues.hasNext()) {
    Text anagram = anagramValues.next();
    output = output + anagram.toString() + "~";
  }
  StringTokenizer outputTokenizer = new StringTokenizer(output, "~");
  // If the values contain more than one word, we have spotted an anagram.
  if (outputTokenizer.countTokens() >= 2) {
    output = output.replace("~", ",");
    outputKey.set(anagramKey.toString());
    outputValue.set(output);
    results.collect(outputKey, outputValue);
  }
}
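The same anagram logic, sketched in plain Python outside the Hadoop API (names and the sample word list are illustrative): the mapper keys each word by its sorted characters so anagrams collide on the same key, and the reducer keeps only groups with at least two words.

```python
from collections import defaultdict

def anagram_map(word):
    # Key by sorted characters: all anagrams of a word share this key
    return ("".join(sorted(word)), word)

def anagram_reduce(key, words):
    # A group with >= 2 words is an anagram set; otherwise emit nothing
    return (key, ",".join(words)) if len(words) >= 2 else None

groups = defaultdict(list)
for w in ["listen", "silent", "enlist", "hadoop"]:
    k, v = anagram_map(w)
    groups[k].append(v)

anagrams = []
for k, words in groups.items():
    result = anagram_reduce(k, words)
    if result is not None:
        anagrams.append(result)
# -> [("eilnst", "listen,silent,enlist")]
```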
Page 18
5-min Assignment
Page 19
MapReduce for Histogram
int bucketWidth = 4

Map(k, v) {
  emit(v / bucketWidth, 1)
}

Reduce(k, v[]) {
  sum = 0;
  foreach (w in v[]) sum++;
  emit(k, sum)
}
Input (one row per map task):
  7 2 9 6 0 2 5
  2 1 10 3 5 4 0
  11 11 6 2 1 8 1
  2 4 6 8 10 11 0

Map output (<bucket,1> per value):
  <1,1> <0,1> <2,1> <1,1> <0,1> <0,1> <1,1>
  <0,1> <0,1> <2,1> <0,1> <1,1> <1,1> <0,1>
  <2,1> <2,1> <1,1> <0,1> <0,1> <2,1> <0,1>
  <0,1> <1,1> <1,1> <2,1> <2,1> <2,1> <0,1>

After shuffle & sort:
  <2,[1 1 1 1 1 1 1 1]>
  <1,[1 1 1 1 1 1 1 1]>
  <0,[1 1 1 1 1 1 1 1 1 1 1 1]>

Reduce output: <2,8> <1,8> <0,12>
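The histogram job can be simulated in plain Python (a sketch, not the Hadoop API; names are illustrative). Note that without a combiner every intermediate value is a literal 1, so the reducer's `sum++` (counting records) gives the right answer:

```python
from collections import defaultdict

BUCKET_WIDTH = 4  # same bucket width as the slide

def hist_map(_, value):
    # Emit (bucket index, 1) for each input value
    return (value // BUCKET_WIDTH, 1)

def hist_reduce(bucket, ones):
    # Every value is 1, so counting records equals summing them
    return (bucket, len(ones))

data = [7, 2, 9, 6, 0, 2, 5,
        2, 1, 10, 3, 5, 4, 0,
        11, 11, 6, 2, 1, 8, 1,
        2, 4, 6, 8, 10, 11, 0]

groups = defaultdict(list)
for v in data:
    b, one = hist_map(None, v)
    groups[b].append(one)

hist = dict(hist_reduce(b, ones) for b, ones in groups.items())
# -> {0: 12, 1: 8, 2: 8}
```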
Page 20
MapReduce for Histogram, with Combiner

int bucketWidth = 4

Map(k, v) {
  emit(v / bucketWidth, 1)
}

Combine(k, v[]) {
  // same code as Reduce
}

Reduce(k, v[]) {
  sum = 0;
  foreach (w in v[]) sum += w;
  emit(k, sum)
}
Input:
  Mapper 1: 8 2 9 6 0 2 5 / 2 1 10 3 5 4 0
  Mapper 2: 11 11 6 2 1 8 1 / 2 4 6 8 10 11 0

Map output (mapper 1):
  <2,1> <0,1> <2,1> <1,1> <0,1> <0,1> <1,1>
  <0,1> <0,1> <2,1> <0,1> <1,1> <1,1> <0,1>
Combine (mapper 1): <2,3> <1,4> <0,7>

Map output (mapper 2):
  <2,1> <2,1> <1,1> <0,1> <0,1> <2,1> <0,1>
  <0,1> <1,1> <1,1> <2,1> <2,1> <2,1> <0,1>
Combine (mapper 2): <2,6> <1,3> <0,5>

Reduce input: <2,[3 6]> <1,[4 3]> <0,[7 5]>
Reduce output: <2,9> <1,7> <0,12>
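The combiner's effect can be sketched by pre-aggregating each mapper's output locally before the shuffle (plain-Python sketch; the split of the input across two mappers follows the slide). Because intermediate values are now partial counts rather than literal 1s, the reducer must sum values (`sum += w`), not count records:

```python
from collections import Counter

BUCKET_WIDTH = 4

def hist_map(values):
    return [(v // BUCKET_WIDTH, 1) for v in values]

def combine(pairs):
    # Combiner = reducer logic run locally on one mapper's output;
    # it must SUM the counts so it also works on already-combined pairs
    agg = Counter()
    for bucket, count in pairs:
        agg[bucket] += count
    return list(agg.items())

mapper1 = [8, 2, 9, 6, 0, 2, 5, 2, 1, 10, 3, 5, 4, 0]
mapper2 = [11, 11, 6, 2, 1, 8, 1, 2, 4, 6, 8, 10, 11, 0]

# Each mapper combines locally: only one pair per bucket crosses the network
shuffled = combine(hist_map(mapper1)) + combine(hist_map(mapper2))
hist = dict(combine(shuffled))
# -> {2: 9, 1: 7, 0: 12}
```

The shuffle volume drops from 28 pairs to 6, which is the point of the combiner.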
Page 21
Hadoop Execution Model
Page 22
Hadoop MapReduce & HDFS
Page 23
HDFS Read/Write
Page 24
Scheduling a MR Job
Page 25
MapReduce w/ 1 & N Reducers
Page 26
Map only job
Page 27
Pipelining during Shuffle & Sort
Page 28
Sorting using MapReduce (Map Only)
Page 29
Sorting using MapReduce
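The slides do not spell the algorithm out, but the standard MapReduce sort (used, for instance, by Hadoop's TeraSort) is range partitioning: route each key to a reducer responsible for one key range, let each reducer sort its range, and concatenate the reducer outputs. A plain-Python sketch, with partition boundaries chosen by hand here rather than derived by sampling as TeraSort does:

```python
import bisect

# Hypothetical boundaries: reducer 0 gets < 100, reducer 1 gets 100-199,
# reducer 2 gets >= 200. A real job samples the input to pick these.
BOUNDARIES = [100, 200]

def sort_map(value):
    # Emit (reducer index, value); ranges map to reducers in key order,
    # so reducer outputs concatenate into a globally sorted sequence
    return (bisect.bisect_right(BOUNDARIES, value), value)

def sort_reduce(partition, values):
    # Each reducer sorts only its own range (Hadoop's shuffle sort does this)
    return sorted(values)

data = [250, 17, 142, 3, 199, 301, 88]
parts = {0: [], 1: [], 2: []}
for v in data:
    p, val = sort_map(v)
    parts[p].append(val)

out = [v for p in sorted(parts) for v in sort_reduce(p, parts[p])]
# -> [3, 17, 88, 142, 199, 250, 301]
```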
Page 30
MapReduce Execution Overview
Page 31
MapReduce Execution Overview
Page 32
MapReduce Execution Overview
Page 33
MapReduce Resources
Page 34
MapReduce Resources
Page 35
MapReduce Execution Overview
Page 36
MapReduce Execution Overview
Page 37
MapReduce Execution Overview
Page 38
MapReduce Execution Overview
Page 39
Locality
Page 40
Fault Tolerance
Page 41
Optimizations
Page 42
Reminder