CPSC 426/526 MapReduce & BigTable Ennan Zhai Computer Science Department Yale University

Jun 04, 2018


dangnhu

CPSC 426/526: MapReduce & BigTable

Ennan Zhai
Computer Science Department
Yale University


Lecture Roadmap

• Cloud Computing Overview
• Challenges in the Clouds
• Distributed File Systems: GFS
• Data Process & Analysis: MapReduce
• Database: BigTable


Last Lecture

Google File System (GFS) - Lec 8

MapReduce - Lec 9 BigTable - Lec 9

Google Applications, e.g., Gmail and Google Map


BigTable - Lec 9

Today’s Lecture

Google File System (GFS) - Lec 8

MapReduce - Lec 9

Google Applications, e.g., Gmail and Google Map


Recall: How does GFS work?


GFS

Google File System [SOSP’03]

[Figure: the master holds metadata indexing chunks 1, 2, and 3; Chunkserver1 to Chunkserver5 store the data]

The design insights:
- Metadata is used for indexing chunks
- Huge files -> 64 MB for each chunk -> fewer chunks
- Reduce client-master interaction and metadata size
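To make the chunk-size insight concrete, here is a minimal sketch assuming fixed 64 MB chunks as stated on the slide. The function names `chunk_index` and `chunk_count` are hypothetical helpers for illustration, not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per chunk, as stated on the slide


def chunk_index(byte_offset: int) -> int:
    """Index of the chunk that contains the given byte offset of a file."""
    return byte_offset // CHUNK_SIZE


def chunk_count(file_size: int) -> int:
    """Number of chunks needed to hold file_size bytes (ceiling division)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


one_gb = 1024 ** 3
print(chunk_count(one_gb))            # a 1 GB file needs 16 chunks
print(chunk_index(200 * 1024 ** 2))   # byte offset 200 MB falls in chunk 3
```

With large chunks, a 1 GB file needs only 16 metadata entries, which is why the master can keep the index small and clients rarely need to contact it.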


GFS

Google File System [SOSP’03]

[Figure: the master holds the metadata; replicas of chunks 1, 2, and 3 (the data) are spread across Chunkserver1 to Chunkserver5]

The design insights:
- Replicas are used to ensure availability
- Master can choose the nearest replicas for the client
- Read and append-only makes it easy to manage replicas


GFS

Google File System [SOSP’03]

[Figure: the master holds the metadata; replicas of chunks 1, 2, and 3 are spread across Chunkserver1 to Chunkserver5]

read
- IP address for each chunk
- the ID for each chunk

<Chunkserver1’s IP, Chunk1>
<Chunkserver1’s IP, Chunk2>
<Chunkserver2’s IP, Chunk3>
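The read path on these slides can be sketched roughly as follows. This is a toy in-memory model, not the real GFS protocol: the dictionaries `MASTER_METADATA` and `CHUNKSERVERS`, the file name, the IP addresses, and the chunk contents are all assumptions made for illustration. Only the shape of the master's reply (one chunkserver IP and one chunk ID per chunk) comes from the slide.

```python
# Hypothetical stand-in for the master: file name -> ordered (server IP, chunk ID) pairs.
MASTER_METADATA = {
    "/logs/app": [
        ("10.0.0.1", "Chunk1"),
        ("10.0.0.1", "Chunk2"),
        ("10.0.0.2", "Chunk3"),
    ],
}

# Hypothetical stand-in for the chunkservers: server IP -> {chunk ID: bytes}.
CHUNKSERVERS = {
    "10.0.0.1": {"Chunk1": b"the cat ", "Chunk2": b"sat on "},
    "10.0.0.2": {"Chunk3": b"the mat"},
}


def master_lookup(filename):
    """Step 1: the client asks the master where each chunk of the file lives."""
    return MASTER_METADATA[filename]


def read_file(filename):
    """Step 2: the client fetches each chunk directly from its chunkserver."""
    data = b""
    for server_ip, chunk_id in master_lookup(filename):
        data += CHUNKSERVERS[server_ip][chunk_id]
    return data


print(read_file("/logs/app"))  # b'the cat sat on the mat'
```

The master only hands out locations; the bulk data flows between the client and the chunkservers, which keeps the master off the data path.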


Why does GFS try to avoid random writes?


Put GFS in a Datacenter


Each rack has a master

After GFS, what do we need to do next?

[Figure: GFS master with metadata; replicas of chunks 1, 2, and 3 spread across Chunkserver1 to Chunkserver5]

Processing and Analyzing Data

Very Important!



Processing and Analyzing Data

• A toy problem: word count
- We have 10 billion documents
- Average document size is 20 KB => 10 billion docs = 200 TB

• Our solution:

for each document d {
  for each word w in d {
    word_count[w]++;
  }
}

Approximately one month.
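A back-of-the-envelope check of the "approximately one month" figure, as a sketch. The document count and size are from the slide; the read throughput of about 80 MB/s for a single machine is an assumption chosen for illustration, and the estimate ignores everything except reading the input once.

```python
NUM_DOCS = 10_000_000_000          # 10 billion documents (from the slide)
AVG_DOC_BYTES = 20 * 1000          # 20 KB per document (from the slide)
READ_BYTES_PER_SEC = 80 * 10**6    # ASSUMPTION: ~80 MB/s sequential read on one machine

total_bytes = NUM_DOCS * AVG_DOC_BYTES
seconds = total_bytes / READ_BYTES_PER_SEC
days = seconds / 86_400

print(total_bytes / 10**12, "TB")   # 200.0 TB
print(round(days, 1), "days")       # 28.9 days
```

Under that assumed throughput, simply scanning 200 TB once on one machine takes roughly four weeks, which is consistent with the slide's estimate.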


MapReduce Programming Model

• Inspired by the map and reduce operations commonly used in functional programming languages like LISP

• Users implement an interface of two primary methods:
- 1. Map: <key1, value1> -> <key2, value2>
- 2. Reduce: <key2, value2[ ]> -> <value3>

Input -> Map -> Reduce -> Output
Map(k, v) --> (k', v')
Group (k', v')s by k'
Reduce(k', v'[]) --> v''

• Map, a pure function written by the user, takes an input key/value pair and produces a set of intermediate key/value pairs, e.g., <doc, id> <doc, content>


• After the map phase, all the intermediate values for a given output key are combined together into a list and given to a reducer, which aggregates/merges them into the result.
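The three steps of the model (map, group by key, reduce) can be sketched as a tiny single-process driver. This is an illustrative sketch only: `run_mapreduce` is a hypothetical name, everything runs sequentially in memory, and the sample input (line lengths per file) is made up.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, group-by-key, and reduce sequentially in one process."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in map_fn(key, value):   # Map(k, v) -> (k', v')
            groups[k2].append(v2)           # group (k', v')s by k'
    # Reduce(k', v'[]) -> v'' for every intermediate key, in sorted key order
    return {k2: reduce_fn(k2, values) for k2, values in sorted(groups.items())}


# Hypothetical example: longest line length per file.
records = [("a.txt", "hello"), ("b.txt", "hi"), ("a.txt", "hello world")]
result = run_mapreduce(
    records,
    map_fn=lambda name, line: [(name, len(line))],
    reduce_fn=lambda name, lengths: max(lengths),
)
print(result)  # {'a.txt': 11, 'b.txt': 2}
```

The user supplies only `map_fn` and `reduce_fn`; the grouping in the middle is what the framework provides.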

MapReduce [OSDI’04]

[Figure: GFS master and Chunkserver1 to Chunkserver5]

• GFS is responsible for storing data for MapReduce
- Data is split into chunks and distributed across nodes
- Each chunk is replicated
- Offers redundant storage for massive amounts of data


MapReduce [OSDI’04]

[Figure: MapReduce and GFS (master and Chunkserver1 to Chunkserver5)]

Heard of Hadoop? HDFS + Hadoop MapReduce


MapReduce [OSDI’04]

[Figure: MapReduce (JobTracker and five TaskTrackers) and GFS (master and Chunkserver1 to Chunkserver5)]

• Two core components
- JobTracker: assigning tasks to different workers
- TaskTracker: executing map and reduce programs
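As a rough illustration of the JobTracker's role, the sketch below hands tasks to workers in round-robin order. The policy, the function name `assign_tasks`, and the task labels are assumptions for illustration; the slides do not specify a scheduling policy, and real schedulers also take factors such as data locality into account.

```python
from collections import defaultdict


def assign_tasks(tasks, workers):
    """Assign tasks to workers round-robin (ASSUMPTION: a deliberately simple policy)."""
    assignment = defaultdict(list)
    for i, task in enumerate(tasks):
        assignment[workers[i % len(workers)]].append(task)
    return dict(assignment)


# Five TaskTrackers as in the figure; the seven task names are hypothetical.
workers = [f"TaskTracker{i}" for i in range(1, 6)]
tasks = [f"map-{i}" for i in range(5)] + ["reduce-0", "reduce-1"]
for worker, assigned in assign_tasks(tasks, workers).items():
    print(worker, assigned)
```

Each TaskTracker then runs the user's map or reduce program on the tasks it was given.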


MapReduce [OSDI’04]

[Figure: MapReduce (JobTracker and five TaskTrackers) and GFS (master and Chunkserver1 to Chunkserver5); input documents Document A1, Document A2, and Document A3 are shown next to a TaskTracker]

Word Count Example

Why we care about word count

• Word count is challenging over massive amounts of data
• The fundamentals of statistics are often aggregate functions
• Most aggregation functions are distributive in nature
• MapReduce breaks complex tasks into smaller pieces which can be executed in parallel
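The "distributive" property can be illustrated with a short sketch: counting two pieces of the input separately and merging the partial counts gives the same answer as counting everything at once. The two splits below are made-up sample text, and `collections.Counter` stands in for work that would run on separate machines.

```python
from collections import Counter

# Two hypothetical input splits, each counted independently (e.g. on different machines).
split_a = "the cat sat on the mat".split()
split_b = "the aardvark sat on the sofa".split()

partial_a = Counter(split_a)
partial_b = Counter(split_b)

# Counting is distributive: merging partial counts equals counting everything at once.
merged = partial_a + partial_b
print(merged == Counter(split_a + split_b))  # True
print(merged["the"])                         # 4
```

This is the property that lets the work be broken into smaller pieces and executed in parallel.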


Map Phase (On a Worker)

Count the # of occurrences of each word in a large amount of input data

Map(input_key, input_value) {
  foreach word w in input_value:
    emit(w, 1);
}

• Input to the Mapper

(3414, 'the cat sat on the mat')
(3437, 'the aardvark sat on the sofa')

• Output from the Mapper

('the', 1), ('cat', 1), ('sat', 1), ('on', 1), ('the', 1), ('mat', 1),
('the', 1), ('aardvark', 1), ('sat', 1), ('on', 1), ('the', 1), ('sofa', 1)


• After the Map, all the intermediate values for a given intermediate key are combined together into a list

Reducer (On a Worker)


Grouping + Reducer

Add up all the values associated with each intermediate key:

Reduce(output_key, intermediate_vals) {
  set count = 0;
  foreach v in intermediate_vals:
    count += v;
  emit(output_key, count);
}

• Input of the grouping

('the', 1), ('cat', 1), ('sat', 1), ('on', 1), ('the', 1), ('mat', 1),
('the', 1), ('aardvark', 1), ('sat', 1), ('on', 1), ('the', 1), ('sofa', 1)

• Output from the Reducer

('the', 4), ('sat', 2), ('on', 2), ('sofa', 1), ('mat', 1), ('cat', 1), ('aardvark', 1)


Grouping/Shuffling

• After the Map, all the intermediate values for a given intermediate key are combined together into a list

Mapper Output

('the', 1), ('cat', 1), ('sat', 1), ('on', 1), ('the', 1), ('mat', 1),
('the', 1), ('aardvark', 1), ('sat', 1), ('on', 1), ('the', 1), ('sofa', 1)

Reducer Input

aardvark, 1
cat, 1
mat, 1
on [1, 1]
sat [1, 1]
sofa, 1
the [1, 1, 1, 1]
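The grouping step between the mapper output and the reducer input can be sketched as below. `group_by_key` is a hypothetical helper name, and sorting the keys is done here only so the result lines up with the alphabetical order shown on the slide.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect all values emitted for the same key into one list, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


# Mapper output for the two example sentences from the slides.
text = "the cat sat on the mat the aardvark sat on the sofa"
mapper_output = [(word, 1) for word in text.split()]

reducer_input = group_by_key(mapper_output)
print(list(reducer_input))     # ['aardvark', 'cat', 'mat', 'on', 'sat', 'sofa', 'the']
print(reducer_input["the"])    # [1, 1, 1, 1]
```

Each key and its list of values then becomes one call to the reducer.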


Map + Reduce

Mapper Input:
the cat sat on the mat
the aardvark sat on the sofa

Mapping ->
('the', 1), ('cat', 1), ('sat', 1), ('on', 1), ('the', 1), ('mat', 1),
('the', 1), ('aardvark', 1), ('sat', 1), ('on', 1), ('the', 1), ('sofa', 1)

Grouping ->
aardvark, 1
cat, 1
mat, 1
on [1, 1]
sat [1, 1]
sofa, 1
the [1, 1, 1, 1]

Reducing ->
aardvark, 1
cat, 1
mat, 1
on, 2
sat, 2
sofa, 1
the, 4


High-Level Picture for MR

Let’s use MapReduce to help Google Map

India

We want to compute the average temperature for each state

MP: 75
CG: 72
OR: 72
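A minimal sketch of how the per-state average could be expressed as a map and a reduce function. The individual readings below are invented so that the averages come out to the values on the slide (MP: 75, CG: 72, OR: 72); the slides do not give the raw data, and the names `map_fn` and `reduce_fn` are illustrative.

```python
from collections import defaultdict

# HYPOTHETICAL readings (state code, temperature); only the resulting
# averages MP: 75, CG: 72, OR: 72 appear on the slide.
readings = [("MP", 70), ("CG", 71), ("MP", 80), ("OR", 72), ("CG", 73), ("OR", 72)]


def map_fn(state, temperature):
    """Emit the reading keyed by its state."""
    yield (state, temperature)


def reduce_fn(state, temperatures):
    """Average all temperatures collected for one state."""
    return sum(temperatures) / len(temperatures)


groups = defaultdict(list)
for state, temp in readings:
    for key, value in map_fn(state, temp):
        groups[key].append(value)

averages = {state: reduce_fn(state, temps) for state, temps in groups.items()}
print(averages)  # {'MP': 75.0, 'CG': 72.0, 'OR': 72.0}
```

Unlike a count, an average of averages is not generally the overall average, so a version that pre-aggregates on each worker would need to carry (sum, count) pairs rather than partial averages.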



Motivation for BigTable

• Lots of (semi-)structured data at Google
- URLs: content, crawl metadata, links, anchors [Search Engine]
- Per-user data: user preference settings, queries [Hangout]
- Geographic locations: physical entities and satellite image data [Google Maps and Google Earth]

• Scale is large:
- Billions of URLs, many versions/page (~20K/version)
- Hundreds of millions of users, thousands of queries/sec
- 100TB+ of satellite image data

Why not just use a commercial DB?

• Scale is too large for most commercial databases
• Even if it weren’t, cost would be very high
- Building internally means the system can be applied across many projects for low incremental cost
• Low-level storage optimizations help performance significantly

Fun and challenging to build large-scale DB systems :)


BigTable [OSDI’06]

• Distributed multi-level map:
- With an interesting data model
• Fault-tolerant, persistent
• Scalable:
- Thousands of servers
- Terabytes of in-memory data
- Petabytes of disk-based data
- Millions of reads/writes per second, efficient scans


BigTable Status in 2006

• Design/initial implementation started at the beginning of 2004
• Currently ~100 BigTable cells
• Production use or active development for many projects
- Google Print
- My Search History
- Crawling/indexing pipeline
- Google Maps/Google Earth
• Largest BigTable cell manages ~200TB of data spread over several thousand machines (larger cells planned)

Building Blocks for BigTable

• BigTable uses the following building blocks:
- Google File System (GFS): stores persistent state and data
- Scheduler: schedules jobs involved in BigTable serving
- Lock service: master election, location bootstrapping
- MapReduce: often used to process BigTable data

Remember: what is the difference between a database and a file system?

1. What is the data model?
2. How to implement it?


BigTable’s Data Model Design

• BigTable is NOT a relational database
• BigTable appears as a large table
- A BigTable is a sparse, distributed, persistent multidimensional sorted map

Webtable example (rows are sorted by row name):

row name       “language”   “content”
com.aaa        EN           <!DOCTYPE html PUBLIC ...
com.cnn.www    EN           <!DOCTYPE html PUBLIC ...
com.weather    EN           <!DOCTYPE html PUBLIC ...
...            ...          <!DOCTYPE html PUBLIC ...

BigTable’s Data Model Design

• A (row, column, timestamp) triple identifies the cell content

[Figure: Webtable example. Rows com.aaa, com.cnn.www, com.weather, ... are sorted by row name; the columns are “language” and “content”; each “content” cell holds one or more timestamped versions of a page (timestamps shown include t2, t3, t4, t6, and t11)]
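The data model can be sketched as a plain dictionary keyed by (row, column, timestamp). This is a toy in-memory model, not BigTable's API: `put` and `get_latest` are hypothetical helper names, and the timestamps and page contents are placeholders.

```python
# Sketch of the data model: (row, column, timestamp) -> cell content.
table = {}


def put(row, column, timestamp, value):
    """Store one version of a cell; writing to a new row creates it implicitly."""
    table[(row, column, timestamp)] = value


def get_latest(row, column):
    """Return the value with the highest timestamp for (row, column), or None."""
    versions = [(ts, v) for (r, c, ts), v in table.items() if r == row and c == column]
    if not versions:
        return None
    return max(versions, key=lambda pair: pair[0])[1]


# Placeholder contents; the timestamps 2 and 4 stand in for t2 and t4.
put("com.cnn.www", "content", 2, "<!DOCTYPE html ... v2")
put("com.cnn.www", "content", 4, "<!DOCTYPE html ... v4")
put("com.cnn.www", "language", 2, "EN")

print(get_latest("com.cnn.www", "content"))   # <!DOCTYPE html ... v4
print(get_latest("com.aaa", "content"))       # None
```

Because only cells that were written exist as keys, the map is sparse: rows need not have a value in every column.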


Rows

[Figure: Webtable example with the row keys (com.aaa, com.cnn.www, com.weather, ...) highlighted; rows are sorted by key.]

• Row name is an arbitrary string and is used as key
  - Access to data in a row is atomic
  - Row creation is implicit upon storing data
• Rows ordered lexicographically
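A small sketch of why lexicographic row order is useful, assuming row keys are hostnames with their components reversed (as the keys com.cnn.www and com.aaa suggest). The helper names `row_key` and `prefix_scan` and the host money.cnn.com are hypothetical, chosen only for illustration.

```python
import bisect
from typing import List


def row_key(hostname: str) -> str:
    """Reverse the hostname components: 'www.cnn.com' -> 'com.cnn.www'."""
    return ".".join(reversed(hostname.split(".")))


def prefix_scan(sorted_rows: List[str], prefix: str) -> List[str]:
    """Return all row keys starting with prefix, using the sort order."""
    start = bisect.bisect_left(sorted_rows, prefix)
    matches = []
    for key in sorted_rows[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        matches.append(key)
    return matches


hosts = ["weather.com", "www.cnn.com", "aaa.com", "money.cnn.com"]
rows = sorted(row_key(h) for h in hosts)
```

Because the keys are sorted, pages from the same domain end up next to each other, so `prefix_scan(rows, "com.cnn.")` reads one contiguous range instead of searching the whole table.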


Column

Webtable example:

Row key        “language”   “content”             “anchor:cnnsi.com”   “anchor:mylook.ca”
com.aaa        EN           <!DOCTYPE html ....
com.cnn.www    EN           <!DOCTYPE html ....   CNN                  CNN.com
com.weather    EN           <!DOCTYPE html ....
...            ...          <!DOCTYPE html ....

• Columns have two-level name structure
  - family:optional_qualifier
  - In “anchor:cnnsi.com”, “anchor” is the family and “cnnsi.com” is the qualifier


• Column family
  - Unit of access control
  - Has associated type information
• Qualifier gives unbounded columns
  - Additional level of indexing, if desired
• Column name format: family:optional_qualifier (e.g., “anchor:cnnsi.com”, “anchor:mylook.ca”)


- www.cnn.com is referenced by Sports Illustrated (cnnsi.com) and mylook (mylook.ca)
- The value of (“com.cnn.www”, “anchor:cnnsi.com”) is “CNN”, the reference text from cnnsi.com
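The family:optional_qualifier naming can be illustrated with a few lines of Python. This is a sketch under the assumption that a column key without a colon has an empty qualifier; `split_column`, `group_by_family`, and the `row` dictionary are hypothetical names, with values taken from the Webtable example.

```python
from collections import defaultdict
from typing import Dict, Tuple


def split_column(column_key: str) -> Tuple[str, str]:
    """Split 'family:qualifier' into its parts; the qualifier may be empty."""
    family, _, qualifier = column_key.partition(":")
    return family, qualifier


def group_by_family(row: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Group one row's cells by column family."""
    families: Dict[str, Dict[str, str]] = defaultdict(dict)
    for column_key, value in row.items():
        family, qualifier = split_column(column_key)
        families[family][qualifier] = value
    return dict(families)


# Cells of row "com.cnn.www" from the Webtable example.
row = {
    "language": "EN",
    "anchor:cnnsi.com": "CNN",
    "anchor:mylook.ca": "CNN.com",
}
```

Grouping by family shows why the qualifier gives unbounded columns: every new site linking to www.cnn.com adds one more qualifier under the same "anchor" family, with no schema change to the family itself.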


Tablet and Table

• A table starts as one tablet
• As it grows, it is split into multiple tablets
  - Approximate size: 100-200 MB per tablet by default

[Figure: the Webtable rows (com.aaa, com.cnn.www, com.weather, ...) with columns “language” and “content”, all inside a single tablet.]


[Figure: after the table grows, it is split into two tablets. Tablet 1 holds rows com.aaa, com.cnn.www, com.weather; Tablet 2 holds rows com.tech, com.wikipedia, com.zoom. Each row has “language” EN and “content” <!DOCTYPE html PUBLIC ...]
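Splitting a table into tablets amounts to cutting the sorted row space into contiguous ranges. The sketch below is a simplification: it uses a row count (`max_rows`) as a stand-in for the 100-200 MB size threshold, and the function name `split_into_tablets` is hypothetical.

```python
from typing import List


def split_into_tablets(sorted_rows: List[str], max_rows: int) -> List[List[str]]:
    """Cut sorted row keys into contiguous ranges of at most max_rows keys."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    return [sorted_rows[i:i + max_rows]
            for i in range(0, len(sorted_rows), max_rows)]


rows = sorted(["com.aaa", "com.cnn.www", "com.weather",
               "com.tech", "com.wikipedia", "com.zoom"])
tablets = split_into_tablets(rows, max_rows=3)
```

Each tablet covers one contiguous key range, so every key in the first tablet sorts before every key in the second. Under strict lexicographic order com.tech sorts before com.weather, so the grouping produced here differs slightly from the one drawn in the figure.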


Building Blocks for BigTable

• BigTable is built on several building blocks:
  - Google File System (GFS): stores persistent state and data
  - Scheduler: schedules jobs involved in BigTable serving
  - Lock service: master election, location bootstrapping
  - MapReduce: often used to process BigTable data

1. What is the data model?
2. How to implement it?


BigTable Architecture

[Figure: BigTable architecture. Components: BigTable client library; BigTable master; three tablet servers, each serving multiple tablets; underlying services: Chubby lock service, Google File System, cluster scheduling.]


The First Thing: Locating Tablets

• Since tablets move around from server to server, given a row, how do clients find the right machine?
  - We need to find the tablet whose row range covers the target row
• One solution: could use the BigTable master
  - A central server would almost certainly be a bottleneck in a large system
• Instead: store special tables containing tablet location information in the BigTable cell itself


The First Thing: Locating Tablets

• 3-level hierarchical lookup scheme for tablets
  - Location is IP:port of relevant server
  - 1st level: bootstrapped from lock service, points to owner of META0
  - 2nd level: uses META0 data to find owner of appropriate META1 tablet
  - 3rd level: META1 table holds locations of tablets of all other tables

[Figure: searching for row key “Hi”. Column labels: META0 table, META1 table, Actual tablet in table T.
  - Chubby holds the pointer to the META0 location
  - META0 table entries: Key: A (1.1.1.1), Key: F (1.1.1.2), Key: I (1.1.1.3), Key: S (1.1.1.4), ...
  - The search follows 1.1.1.2 to the entries Key: F (1.1.1.7), Key: G (1.1.1.8), Key: H (1.1.1.9)
  - The search follows 1.1.1.9 to the entries Key: Ha (1.1.1.100), Key: Hc (1.1.1.101), Key: Hi (1.1.1.104), ...]
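The hierarchical lookup can be sketched with sorted lists and binary search. This is one reading of the figure, not the real implementation: it assumes each entry's key is the first row of the range it covers, it uses the hypothetical label "meta0" in place of the address stored in Chubby, and the names `index_servers`, `find_entry`, and `locate` are invented for the sketch.

```python
import bisect
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (first row key of the range, server location)

# Hypothetical contents of the index servers, using the keys from the figure.
index_servers: Dict[str, List[Entry]] = {
    "meta0": [("A", "1.1.1.1"), ("F", "1.1.1.2"),
              ("I", "1.1.1.3"), ("S", "1.1.1.4")],
    "1.1.1.2": [("F", "1.1.1.7"), ("G", "1.1.1.8"), ("H", "1.1.1.9")],
    "1.1.1.9": [("Ha", "1.1.1.100"), ("Hc", "1.1.1.101"), ("Hi", "1.1.1.104")],
}


def find_entry(entries: List[Entry], row_key: str) -> str:
    """Return the location of the last entry whose start key is <= row_key."""
    start_keys = [key for key, _ in entries]
    index = bisect.bisect_right(start_keys, row_key) - 1
    if index < 0:
        raise KeyError(f"no range covers {row_key!r}")
    return entries[index][1]


def locate(row_key: str, levels: int = 3) -> List[str]:
    """Follow the index levels; return every location visited, in order."""
    path = []
    current = "meta0"  # in the slides this pointer is bootstrapped from Chubby
    for _ in range(levels):
        current = find_entry(index_servers[current], row_key)
        path.append(current)
    return path
```

For the figure's example, `locate("Hi")` visits 1.1.1.2, then 1.1.1.9, then the entry for "Hi". Only the first pointer comes from the lock service, so the BigTable master is not on the lookup path.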


[Figure: BigTable architecture (client library, BigTable master, tablet servers with their tablets, Chubby lock service, Google File System, cluster scheduling), annotated with the metadata operations: create/delete tables, create/delete column families, change metadata.]


BigTable’s APIs

• Metadata operations
  - Create/delete tables, column families, change metadata
• Writes: Single-row, atomic
  - Set(): write cells in a row
  - DeleteCells(): delete cells in a row
  - DeleteRow(): delete all cells in a row
• Reads: Scanner abstraction
  - Read arbitrary cells in a BigTable table
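The shape of these operations can be sketched with a small in-memory class. This is not the BigTable client API: `TinyTable` and its method signatures are hypothetical, timestamps are ignored, and a single lock stands in for per-row atomicity.

```python
import threading
from typing import Dict, Iterable, Iterator, Optional, Tuple


class TinyTable:
    """Hypothetical single-process stand-in for a BigTable table."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}
        # One lock for the whole table keeps the sketch short; it makes
        # every single-row mutation below atomic.
        self._lock = threading.Lock()

    def set(self, row: str, cells: Dict[str, str]) -> None:
        """Write cells in a row; the row is created implicitly."""
        with self._lock:
            self._rows.setdefault(row, {}).update(cells)

    def delete_cells(self, row: str, columns: Iterable[str]) -> None:
        """Delete the named cells in a row, if present."""
        with self._lock:
            stored = self._rows.get(row, {})
            for column in columns:
                stored.pop(column, None)

    def delete_row(self, row: str) -> None:
        """Delete all cells in a row."""
        with self._lock:
            self._rows.pop(row, None)

    def scan(self, start_row: str = "",
             end_row: Optional[str] = None) -> Iterator[Tuple[str, Dict[str, str]]]:
        """Yield (row key, cells) in sorted order for start_row <= key < end_row."""
        with self._lock:
            snapshot = {row: dict(cells) for row, cells in self._rows.items()}
        for row in sorted(snapshot):
            if row < start_row:
                continue
            if end_row is not None and row >= end_row:
                break
            yield row, snapshot[row]
```

A short usage example: after `t = TinyTable()` and `t.set("com.cnn.www", {"language": "EN"})`, iterating `t.scan("com.", "com.z")` yields that row, and `t.delete_row("com.cnn.www")` removes it again.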


BigTable’s Write Path

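[Figure: write path. Components shown: Client, Tablet Server (with its log and memstore), File System.
  1. Client sends a Put/Delete to the tablet server
  2. Write to log (the record is appended to the file system)
  3. Write to memstore]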
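The log-then-memstore order can be sketched as follows. This is a toy model under stated assumptions: a local file stands in for the log kept in the file system, a dictionary stands in for the memstore, and `TinyTabletServer`, `apply`, and `recover` are hypothetical names. The recovery step is not on the slide; it is included only to show why the log is written first.

```python
import json
import os
import tempfile
from typing import Dict, Optional, Tuple


class TinyTabletServer:
    """Toy write path: append to a log file first, then update the memstore."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.memstore: Dict[Tuple[str, str], str] = {}

    def apply(self, op: str, row: str, column: str,
              value: Optional[str] = None) -> None:
        if op not in ("put", "delete"):
            raise ValueError(f"unknown operation: {op}")
        record = {"op": op, "row": row, "column": column, "value": value}
        # Step 1: append the mutation to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # Step 2: only then apply it to the in-memory store.
        self._apply_to_memstore(record)

    def _apply_to_memstore(self, record: dict) -> None:
        key = (record["row"], record["column"])
        if record["op"] == "put":
            self.memstore[key] = record["value"]
        else:
            self.memstore.pop(key, None)

    def recover(self) -> None:
        """Rebuild the memstore by replaying the log from the beginning."""
        self.memstore.clear()
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                self._apply_to_memstore(json.loads(line))
```

Because every mutation reaches the log before the memstore, a server that loses its memory can call `recover()` and arrive at the same state by replaying the log.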


Next Lecture

• In lec-10, I will cover:
  - Transactions in distributed systems
  - Consistency models
  - Two-phase commit
  - Consensus protocol: Paxos