SIDDHARTH MEHTA PURSUING MASTERS IN COMPUTER SCIENCE (FALL 2008) INTERESTS: SYSTEMS, WEB
Transcript
Page 1:

SIDDHARTH MEHTA
PURSUING MASTERS IN COMPUTER SCIENCE (FALL 2008)
INTERESTS: SYSTEMS, WEB

Page 2:

A programming model and an associated implementation (library) for processing and generating large data sets on large clusters.

A new abstraction that lets programmers express simple computations while hiding the messy details of parallelization, fault tolerance, data distribution, and load balancing in a library.

Page 3:

Large-Scale Data Processing
◦ Want to use 1000s of CPUs
◦ But don't want the hassle of managing things

MapReduce provides
◦ Automatic parallelization & distribution
◦ Fault tolerance
◦ I/O scheduling
◦ Monitoring & status updates

Page 4:

The MapReduce programming model has been successfully used at Google for many different purposes.

◦ First, the model is easy to use, even for programmers without experience with parallel and distributed systems, since it hides the details of parallelization, fault tolerance, locality optimization, and load balancing.

◦ Second, a large variety of problems are easily expressible as MapReduce computations. For example, MapReduce is used to generate data for Google's production web search service, for sorting, for data mining, for machine learning, and for many other systems.

◦ Third, the authors developed an implementation of MapReduce that scales to clusters comprising thousands of machines. The implementation makes efficient use of these machine resources and is therefore suitable for many of the large computational problems encountered at Google.

Page 5:
Page 6:

map(key=url, val=contents):
  For each word w in contents, emit (w, "1")

reduce(key=word, values=uniq_counts):
  Sum all "1"s in values list
  Emit result "(word, sum)"

Input:
  see bob throw
  see spot run

Intermediate:
  see 1
  bob 1
  run 1
  see 1
  spot 1
  throw 1

Output:
  bob 1
  run 1
  see 2
  spot 1
  throw 1
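The word-count example above can be simulated in a few lines of Python. This is a single-machine sketch, not Google's implementation: the `defaultdict` grouping step stands in for the framework's shuffle phase.

```python
from collections import defaultdict

def map_fn(url, contents):
    # map(key=url, val=contents): for each word w, emit (w, 1)
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # reduce(key=word, values=counts): sum all counts for the word
    return (word, sum(counts))

def map_reduce(inputs):
    # "Shuffle": group intermediate values by key before reducing
    groups = defaultdict(list)
    for url, contents in inputs.items():
        for key, value in map_fn(url, contents):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

docs = {"doc1": "see bob throw", "doc2": "see spot run"}
print(map_reduce(docs))
# {'bob': 1, 'run': 1, 'see': 2, 'spot': 1, 'throw': 1}
```

The output matches the slide's example: "see" appears twice across the two inputs, every other word once.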

Page 7:

Distributed grep:
◦ Map: (key, whole doc / a line) → (the matched line, key)
◦ Reduce: identity function

Count of URL Access Frequency:
◦ Map: logs of web page requests → (URL, 1)
◦ Reduce: (URL, total count)

Reverse Web-Link Graph:
◦ Map: (source, target) → (target, source)
◦ Reduce: (target, list(source)) → (target, list(source))

Inverted Index:
◦ Map: (docID, document) → (word, docID)
◦ Reduce: (word, list(docID)) → (word, sorted list(docID))
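As an illustration of the last entry, the inverted-index computation can be sketched in plain Python (again collapsing map, shuffle, and reduce onto one machine):

```python
from collections import defaultdict

def inverted_index(docs):
    # Map: (docID, document) -> (word, docID)
    # Reduce: (word, list(docID)) -> (word, sorted list(docID))
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}

docs = {1: "see bob throw", 2: "see spot run"}
index = inverted_index(docs)
print(index["see"])  # [1, 2]
```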

Page 8:
Page 9:
Page 10:

Google clusters are composed of top-of-the-line PCs:
◦ Intel Xeon, 2 × 2 MB, HyperThreading
◦ 2-4 GB memory
◦ 100 Mbps - 1 Gbps network
◦ Local IDE disks + Google File System
◦ Jobs are submitted to a scheduling system

Page 11:

[Figure: execution overview — the input is split across M map tasks, whose intermediate output is partitioned across R reduce tasks]
Page 12:

Fault Tolerance - in a word: redo
◦ The master pings workers and re-schedules failed tasks.
◦ Note: Completed map tasks are re-executed on failure because their output is stored on the local disk.
◦ Master failure: redo
◦ Semantics in the presence of failures:

Deterministic map/reduce functions produce the same output as would have been produced by a non-faulting sequential execution of the entire program.

The implementation relies on atomic commits of map and reduce task outputs to achieve this property.

Page 13:

◦ Partitioning
◦ Ordering guarantees
◦ Combiner function
◦ Side effects
◦ Skipping bad records
◦ Local execution
◦ Status information
◦ Counters

Page 14:

Straggler: a machine that takes an unusually long time to complete one of the last few map or reduce tasks in the computation.

Cause: bad disk, …
Resolution: schedule backup executions of the remaining in-progress tasks near the end of the MapReduce operation.
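The effect of backup tasks can be shown with a toy model (all task timings here are hypothetical): near the end of the job, each remaining in-progress task gets a duplicate, and the task finishes as soon as either copy does.

```python
def job_completion(primary, backup, remaining_for_backup):
    # primary[i]: time task i takes on its original worker
    # backup[i]:  time the same task would take on a backup worker
    # The last `remaining_for_backup` tasks get a backup copy;
    # each such task finishes at min(primary, backup) time.
    times = list(primary)
    n = len(times)
    for i in range(n - remaining_for_backup, n):
        times[i] = min(times[i], backup[i])
    # The job finishes when its slowest task finishes.
    return max(times)

# One straggler (e.g. a bad disk) inflates task 3 from ~10s to 300s.
primary = [10, 11, 12, 300]
backup  = [13, 13, 13, 13]
print(job_completion(primary, backup, remaining_for_backup=1))  # 13
print(job_completion(primary, backup, remaining_for_backup=0))  # 300
```

Duplicating just the final in-progress task cuts the toy job from 300s to 13s, which is the intuition behind the paper's observation that backup tasks significantly reduce completion time at a small cost in extra work.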

Page 15:

Partition the output of a map task into R pieces
◦ Default: hash(key) mod R
◦ User provided, e.g. hash(Hostname(url)) mod R

[Figure: M map tasks each produce R output pieces; one partition collects the piece with the same index from every map task]

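Both partitioning schemes from the slide can be sketched as follows. A stable hash is used instead of Python's built-in `hash()` (which is randomized per process and so unsuitable for partitioning across machines):

```python
import hashlib
from urllib.parse import urlparse

def default_partition(key, R):
    # Default partitioner: hash(key) mod R, using a stable hash.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % R

def host_partition(url, R):
    # User-provided partitioner: hash(Hostname(url)) mod R, so that
    # all URLs from the same host end up in the same output file.
    return default_partition(urlparse(url).hostname, R)

R = 4
a = host_partition("http://example.com/a", R)
b = host_partition("http://example.com/b", R)
print(a == b)  # True: same host -> same partition
```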
Page 16:

Guarantee: within a given partition, the intermediate key/value pairs are processed in increasing key order.

MapReduce implementation of distributed sort:
◦ Map: (key, value) → (key for sort, value)
◦ Reduce: emit unchanged.
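This sort can be sketched in Python. The key point is that range partitioning (rather than the default hash partitioning) plus the per-partition ordering guarantee makes the concatenation of partitions 0..R-1 globally sorted; the partition boundaries here are hand-picked, where a real job would choose them from a sample of the key distribution.

```python
def range_partition(key, boundaries):
    # Assign a key to the first range whose upper boundary exceeds it.
    for i, bound in enumerate(boundaries):
        if key < bound:
            return i
    return len(boundaries)

def distributed_sort(pairs, boundaries):
    R = len(boundaries) + 1
    partitions = [[] for _ in range(R)]
    for key, value in pairs:  # "map": emit (sort key, value)
        partitions[range_partition(key, boundaries)].append((key, value))
    # Within each partition, the framework delivers keys in increasing
    # order; the reduce function emits them unchanged.
    output = []
    for part in partitions:
        output.extend(sorted(part))
    return output

data = [("delta", 4), ("alpha", 1), ("charlie", 3), ("bravo", 2)]
print(distributed_sort(data, boundaries=["c"]))
# [('alpha', 1), ('bravo', 2), ('charlie', 3), ('delta', 4)]
```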

Page 17:

Combiner function
◦ E.g. word count emits many <the, 1> pairs
◦ Combine once before the reduce task to save network bandwidth
◦ Executed on the machine performing the map task
◦ Typically the same code as the reduce function
◦ Output is written to an intermediate file
◦ Example: counting words
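A minimal sketch of the combiner's effect on word count, using `Counter` to play the role of the combine step (which for word count is the same aggregation the reducer performs):

```python
from collections import Counter

def map_with_combiner(doc):
    # The map function emits (word, 1) pairs; the combiner pre-sums
    # them on the map worker, so e.g. many <the, 1> pairs become a
    # single <the, N> record before anything crosses the network.
    return Counter(doc.split())

emitted = map_with_combiner("the cat sat on the mat the end")
print(emitted["the"])  # 3 -> one record <the, 3> instead of three <the, 1>
```

Without the combiner this map task would ship 8 intermediate records; with it, one record per distinct word.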

Page 18:

Skipping Bad Records
◦ Certain records deterministically make tasks crash; it is sometimes acceptable to ignore them
◦ An optional mode of execution
◦ Install a signal handler to catch segmentation violations and bus errors.

Page 19:

Status Information
◦ The master runs an internal HTTP server and exports a set of status pages.
◦ These monitor the progress of the computation: how many tasks have been completed, how many are in progress, bytes of input, bytes of intermediate data, bytes of output, processing rates, etc. The pages also contain links to the standard error and standard output files generated by each task.
◦ In addition, the top-level status page shows which workers have failed, and which map and reduce tasks they were processing when they failed.

Page 20:
Page 21:

Tests on grep and sort

Cluster characteristics:
◦ 1800 machines (!)
◦ Intel Xeon, 2 × 2 MB, HyperThreading
◦ 2-4 GB memory
◦ 100 Mbps - 1 Gbps network
◦ Local IDE disks + Google File System

Page 22:

◦ 1 terabyte: 10^10 100-byte records
◦ Rare three-character pattern (~10^5 occurrences)
◦ Input split into 64 MB pieces, M = 15000
◦ R = 1 (output is small)
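The benchmark numbers check out with a quick back-of-the-envelope calculation:

```python
# Sanity-check the grep benchmark setup (values from the slide).
records = 10**10          # 10^10 records
record_bytes = 100        # 100 bytes each
total_bytes = records * record_bytes
print(total_bytes)        # 10^12 bytes ~ 1 TB

split_bytes = 64 * 2**20             # 64 MB input splits
M = -(-total_bytes // split_bytes)   # ceiling division
print(M)                  # 14902, which the slide rounds to M = 15000
```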

Page 23:
Page 24:

Input rate peaks at 30 GB/s (1764 workers)

~1 minute startup time:
◦ Propagation of the program to workers
◦ GFS: opening 1000 input files
◦ Locality optimization

Completed in under 1.5 minutes

Page 25:

◦ 1 terabyte: 10^10 100-byte records
◦ Extract a 10-byte sorting key
◦ Map: emit <key, value> = <10-byte key, 100-byte record>
◦ Reduce: identity
◦ 2-way replication of output (for redundancy, typical in GFS)
◦ M = 15000, R = 4000
◦ May need a pre-pass MapReduce to compute the distribution of keys

Page 26:

◦ Input rate is lower than for grep
◦ Two humps: 2 × 1700 ≈ 4000
◦ Final output is delayed because of sorting
◦ Rates: input > shuffle, output (locality!)
◦ Rates: shuffle > output (writing 2 copies)
◦ Effect of backup tasks
◦ Effect of machine failures

Page 27:
Page 28:

◦ Restricting the programming model makes it easy to parallelize and distribute computations and to make such computations fault-tolerant.

◦ Network bandwidth is a scarce resource. A number of optimizations in the system are therefore targeted at reducing the amount of data sent across the network: the locality optimization lets the system read data from local disks, and writing a single copy of the intermediate data to local disk saves network bandwidth.

◦ Redundant execution can be used to reduce the impact of slow machines, and to handle machine failures and data loss.