Large-scale Incremental Processing Using Distributed Transactions and Notifications Daniel Peng and Frank Dabek Google, Inc. OSDI 2010 15 Feb 2012 Presentation @ IDB Lab. Seminar Presented by Jee-bum Park

Large-scale Incremental Processing Using Distributed Transactions and Notifications

Daniel Peng and Frank Dabek
Google, Inc.
OSDI 2010

15 Feb 2012
Presentation @ IDB Lab. Seminar

Presented by Jee-bum Park

Outline

Introduction
Design
  – Bigtable overview
  – Transactions
  – Notifications
Evaluation
Conclusion
Good and Not So Good Things

Introduction

How can Google find the documents on the web so fast?

Introduction

Google uses an index, built by the indexing system, that can be used to answer search queries

Introduction

What does the indexing system do?
  – Crawling every page on the web
  – Parsing the documents
  – Extracting links
  – Clustering duplicates
  – Inverting links
  – Computing PageRank
  – ...

Introduction

Compute PageRank using MapReduce
  – Job 1: compute R(1)
  – Job 2: compute R(2)
  – Job 3: compute R(3)
  – ...

R(t) =
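The formula for R(t) is cut off above. As a hedged illustration only, the sketch below uses a commonly cited damped PageRank update, R(t+1)[p] = (1 - d)/N + d * (sum over pages q linking to p of R(t)[q] / outdegree(q)), with each iteration standing in for one MapReduce job. The damping factor d = 0.85, the `links` graph, and the function names are assumptions made for this example, not taken from the slides.

```python
from collections import defaultdict

def pagerank_step(links, ranks, d=0.85):
    """One 'job': compute R(t+1) from R(t). `links` maps page -> list of outlinks."""
    n = len(links)
    # "Map": each page emits an equal share of its rank to every page it links to.
    contributions = defaultdict(float)
    for page, outlinks in links.items():
        for target in outlinks:
            contributions[target] += ranks[page] / len(outlinks)
    # "Reduce": sum the shares per page and apply damping.
    return {page: (1 - d) / n + d * contributions[page] for page in links}

def pagerank(links, iterations=20):
    n = len(links)
    ranks = {page: 1.0 / n for page in links}   # R(0): uniform start
    for _ in range(iterations):                 # Job 1, Job 2, Job 3, ...
        ranks = pagerank_step(links, ranks)
    return ranks

# Hypothetical three-page web in which every page has at least one outlink.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```

Every step reads the ranks of all pages, which is why a change to a few pages still forces each job to run over the whole graph.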

Introduction

Now, consider how to update that index after recrawling some small portion of the web

Is it okay to run the MapReduces over just the new pages?

Nope, there are links between the new pages and the rest of the web

Well, how about this?

MapReduces must be run again over the entire repository

Introduction

Google’s web search index was produced in this way
  – Running over the entire set of pages

It was not a critical issue
  – Because, given enough computing resources, MapReduce’s scalability makes this approach feasible

However, reprocessing the entire web
  – Discards the work done in earlier runs
  – Makes latency proportional to the size of the repository, rather than the size of an update

Introduction

An ideal data processing system for the task of maintaining the web search index would be optimized for incremental processing

Incremental processing system: Percolator

Design

Percolator is built on top of the Bigtable distributed storage system

A Percolator system consists of three binaries that run on every machine in the cluster
  – A Percolator worker
  – A Bigtable tablet server
  – A GFS chunkserver

All observers (user applications) are linked into the Percolator worker

Design

Dependencies (listed top to bottom)
  – Observers
  – Percolator worker
  – Bigtable tablet server
  – GFS chunkserver

Design

System architecture
  – Node 1, Node 2, Node ...: each runs Observers, a Percolator worker, a Bigtable tablet server, and a GFS chunkserver
  – Timestamp oracle service
  – Lightweight lock service

Design

The Percolator worker
  – Scans the Bigtable for changed columns
  – Invokes the corresponding observers as a function call in the worker process

The observers
  – Perform transactions by sending read/write RPCs to Bigtable tablet servers

[Figure: the Observers / Percolator worker / Bigtable tablet server / GFS chunkserver stack, annotated with steps 1: scan, 2: invoke, 3: RPC]

Design

The timestamp oracle service
  – Provides strictly increasing timestamps
      – A property required for correct operation of the snapshot isolation protocol

The lightweight lock service
  – Workers use it to make the search for dirty notifications more efficient
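A minimal sketch of what "strictly increasing timestamps" means for callers, assuming a single in-process oracle guarded by a mutex. The class and method names are hypothetical; the real service is a separate server, and its implementation is not described in these slides.

```python
import threading

class TimestampOracle:
    """Hands out strictly increasing integer timestamps (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def get_timestamp(self):
        # The lock makes read-increment-return atomic, so no two callers
        # receive the same value and values never go backwards.
        with self._lock:
            self._last += 1
            return self._last

oracle = TimestampOracle()
start_ts = oracle.get_timestamp()    # e.g. taken when a transaction starts
commit_ts = oracle.get_timestamp()   # e.g. taken when it commits
```

Because every timestamp is unique and ordered, any two transactions can be compared unambiguously by their start and commit timestamps.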

Design

Percolator provides two main abstractions
  – Transactions
      – Cross-row, cross-table with ACID snapshot-isolation semantics
  – Observers
      – Similar to database triggers or events

Design – Bigtable overview

Percolator is built on top of the Bigtable distributed storage system

Bigtable presents a multi-dimensional sorted map to users
  – Keys are (row, column, timestamp) tuples

Bigtable provides lookup, update operations, and transactions on individual rows

Bigtable does not provide multi-row transactions
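The data model can be pictured as a map keyed by (row, column, timestamp). The sketch below is a toy in-memory stand-in and not Bigtable's API: `ToyTable`, its method names, and the example row and column names are all hypothetical, chosen only to show versioned lookups and a sorted scan.

```python
class ToyTable:
    """Toy stand-in for a (row, column, timestamp) -> value map."""

    def __init__(self):
        self._cells = {}   # (row, column) -> {timestamp: value}

    def write(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def read(self, row, column, max_timestamp=None):
        """Return (timestamp, value) of the newest version at or below
        max_timestamp, or None if there is no such version."""
        versions = self._cells.get((row, column), {})
        eligible = [ts for ts in versions
                    if max_timestamp is None or ts <= max_timestamp]
        if not eligible:
            return None
        newest = max(eligible)
        return newest, versions[newest]

    def scan(self):
        """Yield ((row, column, timestamp), value) in sorted key order."""
        for (row, column), versions in sorted(self._cells.items()):
            for ts in sorted(versions):
                yield (row, column, ts), versions[ts]

table = ToyTable()
table.write("com.example/index.html", "contents", 5, "<html>v1</html>")
table.write("com.example/index.html", "contents", 9, "<html>v2</html>")
latest = table.read("com.example/index.html", "contents")
```

Nothing in this toy coordinates updates across rows, which mirrors the limitation noted above: multi-row transactions have to be layered on top.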

Design – Transactions

Percolator provides cross-row, cross-table transactions with ACID snapshot-isolation semantics

Design – Transactions

Percolator stores multiple versions of each data item using Bigtable’s timestamp dimension
  – Multiple versions are required to provide snapshot isolation

[Figure: snapshot isolation, with transactions labeled 1, 2, and 3]
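To make the role of multiple versions concrete, here is a small single-process sketch of snapshot isolation: reads see only versions committed before the transaction's start timestamp, and a commit aborts if another transaction committed the same key in the meantime. `SnapshotStore` and its methods are hypothetical names, and this is not Percolator's distributed protocol.

```python
import itertools

class SnapshotStore:
    """Toy multi-version store with snapshot-isolation checks (illustrative)."""

    def __init__(self):
        self._versions = {}                 # key -> list of (commit_ts, value)
        self._clock = itertools.count(1)    # stands in for the timestamp oracle

    def begin(self):
        return {"start_ts": next(self._clock), "writes": {}}

    def read(self, txn, key):
        # A transaction sees its own buffered writes first, then the newest
        # version committed before its start timestamp.
        if key in txn["writes"]:
            return txn["writes"][key]
        visible = [(ts, value) for ts, value in self._versions.get(key, [])
                   if ts < txn["start_ts"]]
        return max(visible, key=lambda tv: tv[0])[1] if visible else None

    def write(self, txn, key, value):
        txn["writes"][key] = value          # buffered until commit

    def commit(self, txn):
        # Write-write conflict: a version of one of our keys was committed
        # after this transaction started, so this transaction must abort.
        for key in txn["writes"]:
            if any(ts > txn["start_ts"] for ts, _ in self._versions.get(key, [])):
                return False
        commit_ts = next(self._clock)
        for key, value in txn["writes"].items():
            self._versions.setdefault(key, []).append((commit_ts, value))
        return True

store = SnapshotStore()
setup = store.begin()
store.write(setup, "x", 10)
store.commit(setup)

t1 = store.begin()
t2 = store.begin()
store.write(t2, "x", 20)
t2_ok = store.commit(t2)          # succeeds
t1_sees = store.read(t1, "x")     # t1 still reads from its own snapshot
store.write(t1, "x", 30)
t1_ok = store.commit(t1)          # aborts: t2 committed "x" after t1 started
```

Keeping the old version of "x" around is what lets t1 keep reading a consistent snapshot after t2 has committed.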

Design – Transactions

Case 1: use exclusive locks

[Figure sequence: transactions 1 and 2]

Design – Transactions

Case 2: do not use any locks

[Figure sequence: transactions 1 and 2]

Design – Transactions

Case 3: use multiple versions and timestamps

[Figure sequence: transactions 1 and 2]

Design – Transactions

Percolator stores its locks in special in-memory columns in the same Bigtable
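One way to picture locks living next to the data is a row that carries "data", "lock", and "write" entries, loosely following the column layout described in the Percolator paper. The slides only state that locks are kept in columns of the same Bigtable, so the dictionary layout, the function names, and the simplified two-phase flow below are assumptions; failure handling and lock cleanup are left out.

```python
class LockConflict(Exception):
    pass

def prewrite(table, row, start_ts, value, primary):
    """Phase 1: lock the row and stage the value (one single-row operation)."""
    cell = table.setdefault(row, {"data": {}, "lock": None, "write": {}})
    if cell["lock"] is not None:
        raise LockConflict(f"{row} is locked by another transaction")
    if any(commit_ts > start_ts for commit_ts in cell["write"]):
        raise LockConflict(f"{row} was committed after start_ts={start_ts}")
    cell["data"][start_ts] = value
    cell["lock"] = {"start_ts": start_ts, "primary": primary}

def commit(table, rows, start_ts, commit_ts):
    """Phase 2: replace each lock with a write record that points at the data."""
    for row in rows:
        cell = table[row]
        cell["write"][commit_ts] = start_ts   # the staged version becomes visible
        cell["lock"] = None

# Hypothetical two-row transaction with start timestamp 7 and commit timestamp 8.
table = {}
prewrite(table, "Bob", start_ts=7, value="$3", primary="Bob")
prewrite(table, "Joe", start_ts=7, value="$9", primary="Bob")
commit(table, ["Bob", "Joe"], start_ts=7, commit_ts=8)
```

Because the lock sits in the same row as the data, checking and setting it can rely on the single-row atomicity that Bigtable already provides.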

Design – Transactions

Percolator transaction demo

Design – Notifications

In Percolator, the user writes code (“observers”) to be triggered by changes to the table

Each observer registers a function and a set of columns

Percolator invokes the function after data is written to one of those columns in any row
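The registration model can be sketched as a mapping from column names to callbacks. `ObserverRegistry`, the `document_processor` observer, and the column names are hypothetical and only illustrate the shape of the idea; this is not Percolator's actual interface.

```python
from collections import defaultdict

class ObserverRegistry:
    """Maps each observed column to the functions registered for it."""

    def __init__(self):
        self._by_column = defaultdict(list)   # column -> [function, ...]

    def register(self, columns, function):
        for column in columns:
            self._by_column[column].append(function)

    def notify(self, row, column, value):
        """Invoke every observer registered for `column`, whatever the row."""
        for function in self._by_column.get(column, []):
            function(row, column, value)

parsed = []

def document_processor(row, column, value):
    # Hypothetical observer: "parses" a raw document by upper-casing it.
    parsed.append((row, value.upper()))

registry = ObserverRegistry()
registry.register(["raw_document"], document_processor)
registry.notify("com.example/a", "raw_document", "hello")
registry.notify("com.example/a", "unrelated", "ignored")   # no observer runs
```

Only the write to the registered column triggers the observer; the write to the other column is ignored.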

Design – Notifications

Percolator applications are structured as a series of observers
  – Each observer completes a task and creates more work for “downstream” observers by writing to the table

[Figure: a Percolator application, drawn as a graph of Observers 1 through 6]

Design – Notifications

Google’s new indexing system
  – Document Processor (parse, extract links, etc.)
  – Clustering
  – Exporter

Design – Notifications

To implement notifications, Percolator needs to efficiently find dirty cells with observers that need to be run

To identify dirty cells, Percolator maintains a special “notify” Bigtable column, containing an entry for each dirty cell
  – When a transaction writes an observed cell, it also sets the corresponding notify cell
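The following sketch shows the dirty-cell idea in miniature: writing an observed column also records a notify entry, and a loop drains those entries by running observers, which may in turn write further observed columns for downstream observers. `NotifyingTable`, the column names, and the two observers are hypothetical. For simplicity the notify entry is cleared before the observer runs, which need not match how Percolator orders these steps.

```python
class NotifyingTable:
    """Toy table that records dirty cells in a separate 'notify' set."""

    def __init__(self, observed_columns):
        self.data = {}                       # (row, column) -> value
        self.notify = set()                  # dirty (row, column) entries
        self.observed = set(observed_columns)

    def write(self, row, column, value):
        self.data[(row, column)] = value
        if column in self.observed:          # only observed cells become dirty
            self.notify.add((row, column))

    def run_observers(self, observers):
        """Run observers until no dirty cells remain; return the run count."""
        runs = 0
        while self.notify:
            row, column = self.notify.pop()
            observers[column](self, row, self.data[(row, column)])
            runs += 1
        return runs

def parse(table, row, raw):
    table.write(row, "links", raw.split())       # creates work downstream

def count_links(table, row, links):
    table.write(row, "link_count", len(links))   # not observed: chain ends here

table = NotifyingTable(observed_columns=["raw", "links"])
table.write("page1", "raw", "a.com b.com")
runs = table.run_observers({"raw": parse, "links": count_links})
```

Finding work then means looking only at the notify entries rather than at every cell in the table.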

Design – Notifications

Each Percolator worker chooses a portion of the table to scan by picking a region of the table randomly
  – To avoid running observers on the same row concurrently, each worker acquires a lock from a lightweight lock service before scanning the row
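A rough, single-threaded sketch of the scanning strategy: pick one region of the row space at random, and take a per-row lock before working on a dirty row so that a second worker would skip it. `RowLockService`, `scan_random_region`, the fixed region size, and the seeded random generator are all assumptions made for illustration.

```python
import random
import threading

class RowLockService:
    """Toy stand-in for the lightweight lock service (non-blocking try-lock)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()

    def try_acquire(self, row):
        with self._guard:
            if row in self._held:
                return False
            self._held.add(row)
            return True

    def release(self, row):
        with self._guard:
            self._held.discard(row)

def scan_random_region(rows, region_size, dirty, locks, process, rng):
    """Pick one contiguous region at random and process its dirty rows."""
    start = rng.randrange(0, len(rows), region_size)
    for row in rows[start:start + region_size]:
        if row not in dirty or not locks.try_acquire(row):
            continue                      # clean, or another worker holds it
        try:
            process(row)
            dirty.discard(row)
        finally:
            locks.release(row)

rows = [f"row{i:02d}" for i in range(10)]
dirty = {"row03", "row07"}
processed = []
locks = RowLockService()
rng = random.Random(0)
while dirty:                              # keep picking regions until clean
    scan_random_region(rows, 5, dirty, locks, processed.append, rng)
```

Random region selection spreads workers across the table without central coordination, while the row lock keeps two workers from running observers on the same row at once.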

Evaluation

Experiences with converting a MapReduce-based indexing pipeline to use Percolator

Latency
  – 100x faster than the previous system

Simplification
  – The number of observers in the new system: 10
  – The number of MapReduces in the previous system: 100

Easier to operate
  – Far fewer moving parts: tablet servers, Percolator workers, chunkservers
  – In the old system, each of a hundred different MapReduces needed to be individually configured and could independently fail

Evaluation

Crawl rate benchmark on 240 machines

Evaluation

Versus Bigtable

Evaluation

Fault-tolerance

Conclusion

Percolator provides two main abstractions
  – Transactions
      – Cross-row, cross-table with ACID snapshot-isolation semantics
  – Observers
      – Similar to database triggers or events

Good and Not So Good Things

Good things
  – Simple and neat design
  – Purpose of use is clear
  – Detailed description based on a real example: Google’s indexing system

Not so good things
  – Lack of observer examples (Google’s indexing system in particular)

Thank You!

Any Questions or Comments?