PNUTS: Yahoo!’s Hosted Data Serving Platform Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni Yahoo! Research With some additions by S. Sudarshan

Dec 18, 2015


PNUTS: Yahoo!’s Hosted Data Serving Platform

Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni

Yahoo! Research
With some additions by S. Sudarshan


How do I build a cool new web app?

Option 1: Code it up!
  - Make it live!
  - Scale it later
  - It gets posted to Slashdot
  - Scale it now!
  - Flickr, Twitter, MySpace, Facebook, …


How do I build a cool new web app?

Option 2: Make it industrial strength!
  - Evaluate scalable database backends
  - Evaluate scalable indexing systems
  - Evaluate scalable caching systems
  - Architect data partitioning schemes
  - Architect data replication schemes
  - Architect monitoring and reporting infrastructure
  - Write application
  - Go live
  - Realize it doesn’t scale as well as you hoped
  - Rearchitect around bottlenecks
  - 1 year later – ready to go!


Example: social network updates

[Figure: Brian and his friends Sonja, Jimi, Brandon and Kurt. Brian asks "What are my friends up to?" and sees entries for Sonja and Brandon.]


Example: social network updates

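[Figure: table of update records; each value is truncated on the slide.]

  Key   User      Value
  16    Mike      <ph..
  6     Jimi      <ph..
  8     Mary      <re..
  12    Sonja     <ph..
  15    Brandon   <po..
  17    Bob       <re..

Example of a full value:

  <photo><title>Flower</title><url>www.flickr.com</url></photo>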


What do we need from our DBMS?

Web applications need:
  - Scalability
      - And the ability to scale linearly
  - Geographic scope
  - High availability

Web applications typically have:
  - Simplified query needs
      - No joins, aggregations
  - Relaxed consistency needs
      - Applications can tolerate stale or reordered data


What is PNUTS?


[Figure: a Parts table, shown partitioned and replicated.]

  ID   StockNumber   Status
  A    42342         E
  B    42521         W
  C    66354         W
  D    12352         E
  E    75656         C
  F    15677         E

  CREATE TABLE Parts (
    ID VARCHAR,
    StockNumber INT,
    Status VARCHAR
    …
  )

  - Parallel database
  - Geographic replication
  - Indexes and views
  - Structured, flexible schema
  - Hosted, managed infrastructure


Query model

  - Per-record operations
      - Get
      - Set
      - Delete
  - Multi-record operations
      - Multiget
      - Scan
      - Getrange
  - Web service (RESTful) API
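The six operations listed above can be illustrated with a small in-memory stand-in. This is only a sketch: the class name `ToyTable`, the method signatures, and the half-open range semantics of `getrange` are assumptions made for illustration and are not the PNUTS client API. The sample rows reuse the Parts table from the earlier slide.

```python
from bisect import bisect_left
from typing import Dict, Iterator, List, Optional, Tuple


class ToyTable:
    """In-memory stand-in for one table (hypothetical, not the PNUTS API)."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    # Per-record operations
    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)

    def set(self, key: str, value: dict) -> None:
        self._rows[key] = value

    def delete(self, key: str) -> None:
        self._rows.pop(key, None)

    # Multi-record operations
    def multiget(self, keys: List[str]) -> Dict[str, dict]:
        return {k: self._rows[k] for k in keys if k in self._rows}

    def scan(self) -> Iterator[Tuple[str, dict]]:
        for k in sorted(self._rows):
            yield k, self._rows[k]

    def getrange(self, low: str, high: str) -> List[Tuple[str, dict]]:
        """Rows with low <= key < high, in key order (assumed semantics)."""
        keys = sorted(self._rows)
        start = bisect_left(keys, low)
        end = bisect_left(keys, high)
        return [(k, self._rows[k]) for k in keys[start:end]]


parts = ToyTable()
parts.set("A", {"StockNumber": 42342, "Status": "E"})
parts.set("B", {"StockNumber": 42521, "Status": "W"})
parts.set("C", {"StockNumber": 66354, "Status": "W"})
parts.delete("B")
```

Note that none of these calls joins or aggregates across records, which matches the simplified query needs stated earlier.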


Detailed architecture

Data-path components: clients, REST API, routers, tablet controller, storage units, message broker.


Detailed architecture

[Figure labels: clients, REST API, routers, tablet controller, storage units; local region, remote regions; YMB.]


Tablet splitting and balancing

  - Each storage unit has many tablets (horizontal partitions of the table)
  - Tablets may grow over time
      - Overfull tablets split
  - Storage unit may become a hotspot
      - Shed load by moving tablets to other servers

[Figure legend: storage unit, tablet.]
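A minimal sketch of the two mechanisms on this slide, splitting an overfull tablet and shedding load from a hot storage unit, might look as follows. The threshold `MAX_TABLET_ROWS`, the function names, and the use of row count as a proxy for load are all assumptions of this sketch; the slide does not say how PNUTS measures load or picks the tablet to move.

```python
from typing import Dict, List

MAX_TABLET_ROWS = 4  # hypothetical split threshold


def split_if_overfull(tablet: List[str]) -> List[List[str]]:
    """Split a sorted tablet of keys in half once it exceeds the threshold."""
    if len(tablet) <= MAX_TABLET_ROWS:
        return [tablet]
    mid = len(tablet) // 2
    return [tablet[:mid], tablet[mid:]]


def shed_load(units: Dict[str, List[List[str]]]) -> None:
    """Move one tablet from the busiest storage unit to the least busy one.

    Load is approximated by total row count (an assumption of this sketch).
    """
    def load(name: str) -> int:
        return sum(len(t) for t in units[name])

    hot = max(units, key=load)
    cold = min(units, key=load)
    if hot == cold or len(units[hot]) < 2:
        return  # nothing useful to move
    tablet = min(units[hot], key=len)  # move the smallest tablet
    units[hot].remove(tablet)
    units[cold].append(tablet)


# One storage unit holds an overfull tablet; the other is empty.
units = {"SU1": split_if_overfull(["a", "b", "c", "d", "e", "f"]), "SU2": []}
shed_load(units)
```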


Query processing


Range queries

Router interval map:

  Key range          Storage unit
  MIN-Canteloupe     SU1
  Canteloupe-Lime    SU3
  Lime-Strawberry    SU2
  Strawberry-MAX     SU1

Tablet contents:

  MIN-Canteloupe:    Apple, Avocado, Banana, Blueberry
  Canteloupe-Lime:   Canteloupe, Grape, Kiwi, Lemon
  Lime-Strawberry:   Lime, Mango, Orange
  Strawberry-MAX:    Strawberry, Tomato, Watermelon

Example: the router splits the range query "Grapefruit…Pear?" into "Grapefruit…Lime?" and "Lime…Pear?".
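The router's interval lookup can be sketched with a sorted list of lower bounds and a binary search. The boundaries and storage-unit assignments below come from the slide; the function names `route` and `split_range`, the use of the empty string for MIN, and the half-open interval convention are assumptions of this sketch.

```python
from bisect import bisect_right
from typing import List, Tuple

# Interval map from the slide: tablet i covers [BOUNDARIES[i], BOUNDARIES[i + 1]).
# "" stands in for MIN; the last tablet runs to MAX.
BOUNDARIES = ["", "Canteloupe", "Lime", "Strawberry"]
UNITS = ["SU1", "SU3", "SU2", "SU1"]


def route(key: str) -> str:
    """Storage unit holding the tablet that contains key."""
    return UNITS[bisect_right(BOUNDARIES, key) - 1]


def split_range(low: str, high: str) -> List[Tuple[str, str, str]]:
    """Break [low, high) into per-tablet pieces (sub_low, sub_high, unit)."""
    pieces = []
    i = bisect_right(BOUNDARIES, low) - 1
    while low < high:
        upper = BOUNDARIES[i + 1] if i + 1 < len(BOUNDARIES) else None
        sub_high = high if upper is None or high <= upper else upper
        pieces.append((low, sub_high, UNITS[i]))
        low = sub_high
        i += 1
    return pieces


pieces = split_range("Grapefruit", "Pear")
```

With the slide's map, `pieces` holds one sub-range ending at Lime for SU3 and one starting at Lime for SU2, matching the split shown in the figure.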


Updates

[Figure: update path through routers, message brokers and three storage units (SU). Numbered arrows:]

  1. Write key k
  2. Write key k
  3. Write key k
  4. (unlabeled)
  5. SUCCESS
  6. Write key k
  7. Sequence # for key k
  8. Sequence # for key k


Yahoo Message Bus

  - Distributed publish-subscribe service
  - Guarantees delivery once a message is published
      - Logging at site where message is published, and at other sites when received
  - Guarantees messages published to a particular cluster will be delivered in same order at all other clusters
  - Record updates are published to YMB by master copy
      - All replicas subscribe to the updates, and get them in same order for a particular record
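The ordering guarantee can be illustrated with a toy model in which the master copy assigns a per-key sequence number and publishes to a single ordered log. Everything here is an assumption for illustration: the class names `ToyBroker`, `Replica` and `Master` are hypothetical, and delivery is synchronous, whereas the slides describe asynchronous replication.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Update = Tuple[str, int, str]  # (key, sequence number, value)


class Replica:
    """A subscriber that applies updates in the order it receives them."""

    def __init__(self) -> None:
        self.data: Dict[str, Tuple[int, str]] = {}

    def apply(self, update: Update) -> None:
        key, seq, value = update
        current_seq = self.data.get(key, (0, ""))[0]
        if seq > current_seq:  # skip duplicates and older updates
            self.data[key] = (seq, value)


class ToyBroker:
    """Single ordered log standing in for the message bus (hypothetical)."""

    def __init__(self) -> None:
        self.log: List[Update] = []
        self.subscribers: List[Replica] = []

    def publish(self, update: Update) -> None:
        self.log.append(update)       # logged before delivery
        for replica in self.subscribers:
            replica.apply(update)     # same order for every subscriber


class Master:
    """Master copy: assigns per-key sequence numbers, then publishes."""

    def __init__(self, broker: ToyBroker) -> None:
        self.broker = broker
        self.next_seq: Dict[str, int] = defaultdict(int)

    def write(self, key: str, value: str) -> int:
        self.next_seq[key] += 1
        seq = self.next_seq[key]
        self.broker.publish((key, seq, value))
        return seq


broker = ToyBroker()
west, east = Replica(), Replica()
broker.subscribers += [west, east]
master = Master(broker)
master.write("k", "a")
master.write("k", "b")
```

Because every replica consumes the same log, both end with the same final value and sequence number for key k.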


Asynchronous replication and consistency


Asynchronous replication


Consistency model

Goal: make it easier for applications to reason about updates and cope with asynchrony.

What happens to a record with primary key “Brian”?

[Timeline figure: record inserted, followed by updates and a delete; versions v.1 through v.8 are labeled, with generation 1 marked.]

Consistency model

[Timeline figure, repeated on each of the following slides: versions v.1 through v.8 in generation 1, with markers for the current version and for stale versions.]

Operations illustrated against the timeline:
  - Read: may return a stale version
  - Read up-to-date: returns the current version
  - Read-critical(required version): "Read ≥ v.6"
  - Write: applied at the current version
  - Test-and-set-write(required version): "Write if = v.7" is shown returning ERROR

Mechanism: per record mastership
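The read and write variants shown on the timeline slides can be sketched against a single record that has a master history and one lagging replica. This is an illustrative model only: the class `VersionedRecord`, the method names `read_any` and `read_latest`, and the `replicate` helper are assumptions, not the PNUTS API. The setup mirrors the slides, with eight versions and a replica that has only seen v.5.

```python
from typing import List, Optional, Tuple


class VersionError(Exception):
    """Raised when a required version cannot be satisfied."""


class VersionedRecord:
    """Master history plus one lagging replica for a single record."""

    def __init__(self, value: str) -> None:
        self.versions: List[str] = [value]  # index i holds version i + 1
        self.replica_version = 1            # newest version the replica has

    def current_version(self) -> int:
        return len(self.versions)

    def replicate(self, up_to: Optional[int] = None) -> None:
        """Advance the replica, standing in for asynchronous propagation."""
        target = self.current_version() if up_to is None else up_to
        self.replica_version = max(self.replica_version, target)

    def read_any(self) -> Tuple[int, str]:
        """Plain read: served by the replica, possibly stale."""
        v = self.replica_version
        return v, self.versions[v - 1]

    def read_latest(self) -> Tuple[int, str]:
        """Read up-to-date: served from the master history."""
        v = self.current_version()
        return v, self.versions[v - 1]

    def read_critical(self, required_version: int) -> Tuple[int, str]:
        """Return some version >= required_version."""
        if required_version > self.current_version():
            raise VersionError("required version does not exist yet")
        if self.replica_version >= required_version:
            return self.read_any()
        return self.read_latest()

    def write(self, value: str) -> int:
        self.versions.append(value)
        return self.current_version()

    def test_and_set_write(self, required_version: int, value: str) -> int:
        """Write only if the current version equals required_version."""
        if self.current_version() != required_version:
            raise VersionError("current version is not the required version")
        return self.write(value)


rec = VersionedRecord("v1")
for n in range(2, 9):
    rec.write(f"v{n}")       # master now holds v.1 through v.8
rec.replicate(up_to=5)       # the replica lags at v.5
```

In this state a plain read returns v.5, a read-critical for v.6 falls through to the master, and a test-and-set-write requiring v.7 raises `VersionError`, which corresponds to the ERROR on the slide.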


Record and Tablet Mastership

  - Data in PNUTS is replicated across sites
  - Hidden field in each record stores which copy is the master copy
      - Updates can be submitted to any copy
      - Forwarded to master, applied in order received by master
  - Record also contains origin of last few updates
      - Mastership can be changed by current master, based on this information
      - Mastership change is simply a record update
  - Tablet mastership
      - Required to ensure primary key consistency
      - Can be different from record mastership
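One way to picture mastership migration is a record that remembers the origin of its last few updates and hands mastership to another region once all remembered updates came from there. The slide does not state the actual rule, so the history length `HISTORY`, the "all from one region" condition, and the names `Record` and `apply_update` are assumptions of this sketch.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque

HISTORY = 3  # hypothetical: how many recent update origins are remembered


@dataclass
class Record:
    value: str
    master: str  # stands in for the hidden mastership field
    origins: Deque[str] = field(default_factory=lambda: deque(maxlen=HISTORY))


def apply_update(record: Record, value: str, origin: str) -> None:
    """Runs at the master copy, after the update was forwarded there."""
    record.value = value
    record.origins.append(origin)
    # Assumed policy: move mastership when every remembered update
    # originated in the same region and that region is not the master.
    if (len(record.origins) == HISTORY
            and len(set(record.origins)) == 1
            and origin != record.master):
        record.master = origin  # the change is itself just a record update


r = Record("x", master="west")
apply_update(r, "a", "east")
apply_update(r, "b", "east")
```

After two updates from the east region the master is unchanged; under the assumed policy a third consecutive update from the east would move mastership there.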


Other Features

  - Per record transactions
  - Copying a tablet (e.g., on failure)
      - Request copy
      - Publish checkpoint message
      - Get copy of tablet as of when checkpoint is received
      - Apply later updates
  - Tablet split
      - Has to be coordinated across all copies
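The tablet-copy steps above can be read as a snapshot taken at a marker in the update stream, followed by replay of everything after the marker. The sketch below models that reading; the message format, the function name `copy_tablet`, and the single in-process stream are assumptions, not the actual protocol.

```python
from typing import Dict, List, Tuple

# ("update", key, value) or ("checkpoint", "", "")
Message = Tuple[str, str, str]


def copy_tablet(source: Dict[str, str], stream: List[Message]) -> Dict[str, str]:
    """Rebuild a tablet copy from a source replica and its update stream.

    `source` is the source replica's state before any message in `stream`.
    The copy is taken when the checkpoint message arrives, and updates
    published after the checkpoint are then applied to the copy.
    """
    live = dict(source)  # the source keeps applying updates as usual
    copy: Dict[str, str] = {}
    seen_checkpoint = False
    for kind, key, value in stream:
        if kind == "checkpoint":
            copy = dict(live)  # snapshot as of the checkpoint
            seen_checkpoint = True
            continue
        live[key] = value
        if seen_checkpoint:
            copy[key] = value  # later updates reach the new copy too
    return copy


stream = [("update", "a", "2"), ("checkpoint", "", ""), ("update", "b", "3")]
new_copy = copy_tablet({"a": "1"}, stream)
```

The checkpoint marks a point in the ordered stream, so the new copy misses no update and applies none twice.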


Query Processing

  - Range scan can span tablets
      - Only one tablet scanned at a time
      - Client may not need all results at once
      - Continuation object returned to client to indicate where range scan should continue
  - Notification
      - One pub-sub topic per tablet
      - Client knows about tables, does not know about tablets
      - Automatically subscribed to all tablets, even as tablets are added/removed
      - Usual problem with pub-sub: undelivered notifications, handled in usual way
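The range-scan behavior described above, one tablet per call with a continuation telling the client where to resume, can be sketched as follows. The tablet contents reuse the keys from the range-query slide; representing the continuation as a plain tablet index and the name `scan` are assumptions of this sketch.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Tablets as sorted key lists, in key order.
TABLETS: List[List[str]] = [
    ["Apple", "Avocado", "Banana", "Blueberry"],
    ["Canteloupe", "Grape", "Kiwi", "Lemon"],
    ["Lime", "Mango", "Orange"],
    ["Strawberry", "Tomato", "Watermelon"],
]


def scan(low: str, high: str, continuation: Optional[int] = None
         ) -> Tuple[List[str], Optional[int]]:
    """Scan exactly one tablet for keys in [low, high).

    Returns the matching keys and a continuation (here just the index of
    the next tablet), or None when the scan is complete.
    """
    if continuation is None:
        # First tablet that can contain keys >= low.
        i = next((n for n, t in enumerate(TABLETS) if t[-1] >= low),
                 len(TABLETS))
    else:
        i = continuation
    if i >= len(TABLETS):
        return [], None
    tablet = TABLETS[i]
    hits = tablet[bisect_left(tablet, low):bisect_left(tablet, high)]
    nxt = i + 1
    more = nxt < len(TABLETS) and TABLETS[nxt][0] < high
    return hits, (nxt if more else None)


# Client loop: keep calling until no continuation comes back.
results: List[str] = []
cont: Optional[int] = None
while True:
    hits, cont = scan("Grapefruit", "Pear", cont)
    results += hits
    if cont is None:
        break
```

A client that needs only the first batch can stop after one call and discard the continuation.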


Experiments


Experimental setup

  - Production PNUTS code
      - Enhanced with ordered table type
  - Three PNUTS regions
      - 2 west coast, 1 east coast
      - 5 storage units, 2 message brokers, 1 router
      - West: Dual 2.8 GHz Xeon, 4GB RAM, 6 disk RAID 5 array
      - East: Quad 2.13 GHz Xeon, 4GB RAM, 1 SATA disk
  - Workload
      - 1200-3600 requests/second
      - 0-50% writes
      - 80% locality


Inserts

Inserts required:
  - 75.6 ms per insert in West 1 (tablet master)
  - 131.5 ms per insert into the non-master West 2
  - 315.5 ms per insert into the non-master East


10% writes by default


Scalability

[Figure: average latency (ms) versus number of storage units (1 to 6), for hash table and ordered table.]


Request skew

[Figure: average latency (ms) versus Zipf parameter (0 to 1), for hash table and ordered table.]


Size of range scans

[Figure: average latency (ms) versus fraction of table scanned (0 to 0.12), for 30 clients and 300 clients.]


Related work

  - Distributed and parallel databases
      - Especially query processing and transactions
      - BigTable, Dynamo, S3, SimpleDB, SQL Server Data Services, Cassandra
  - Distributed filesystems
      - Ceph, Boxwood, Sinfonia
  - Distributed (P2P) hash tables
      - Chord, Pastry, …
  - Database replication
      - Master-slave, epidemic/gossip, synchronous…


Conclusions and ongoing work

  - PNUTS is an interesting research product
      - Research: consistency, performance, fault tolerance, rich functionality
      - Product: make it work, keep it (relatively) simple, learn from experience and real applications
  - Ongoing work
      - Indexes and materialized views
      - Bundled updates
      - Batch query processing


Thanks!

research.yahoo.com