Pond: the OceanStore Prototype. CS 6464, Cornell University. Presented by Yeounoh Chung.

Jan 19, 2016


Page 1: Pond: the OceanStore Prototype

Pond: the OceanStore Prototype

CS 6464, Cornell University

Presented by Yeounoh Chung

Page 2: Pond: the OceanStore Prototype

Motivation / Introduction

• “OceanStore is an Internet-scale, persistent data store”

• “for the first time, one can imagine providing truly durable, self maintaining storage to every computer user.”

• A “vision” of a highly available, reliable, and persistent data store offered under a utility model
  - cf. Amazon S3?

Page 3: Pond: the OceanStore Prototype

Motivation / Introduction

OceanStore (1999)

• Universal availability

• High durability

• Incremental scalability

• Self-maintaining

• Self-organizing

• Virtualization

• Transparency

• Monthly access fee

• Untrusted infrastructure

• Two-tier system

Amazon S3 (2009)

• High availability

• High durability

• High scalability

• Self-maintaining

• Self-organizing

• Virtualization

• Transparency

• Pay-per-use fee

• Trusted infrastructure

• Single-tier system

Page 4: Pond: the OceanStore Prototype

Outline

• Motivation / Introduction

• System Overview

• Consistency

• Persistency

• Failure Tolerance

• Implementation

• Performance

• Conclusion

• Related Work

Page 5: Pond: the OceanStore Prototype

System Overview

[Figure: the OceanStore cloud, with labels Hakim Weatherspoon, Dennis Geels, Sean Rhea, Patrick Eaton, and Ben Zhao]

Page 6: Pond: the OceanStore Prototype

System Overview (Data Object)

Page 7: Pond: the OceanStore Prototype

System Overview (Update Model)

• Updates are applied atomically

• An array of actions guarded by a predicate

• No explicit locks

• Application-specific consistency
  - e.g. database, mailbox
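The update model on this slide can be illustrated with a minimal sketch. It assumes an update is a predicate plus a list of actions, both plain Python callables over the object's bytes; the names `Update` and `apply_update` are hypothetical and are not Pond's API. Each action returns a new value, so a rejected update leaves the old version untouched.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Update:
    """Hypothetical update: a predicate guarding an array of actions."""
    predicate: Callable[[bytes], bool]
    actions: List[Callable[[bytes], bytes]]


def apply_update(data: bytes, update: Update) -> Tuple[bytes, bool]:
    """Apply every action or none of them, without explicit locks."""
    if not update.predicate(data):
        return data, False  # predicate failed: the update aborts
    new_data = data
    for action in update.actions:
        new_data = action(new_data)  # each action builds a new value
    return new_data, True


# Mailbox-style example: append only if the mailbox still ends with msg1.
mailbox = b"msg1\n"
append = Update(
    predicate=lambda d: d.endswith(b"msg1\n"),
    actions=[lambda d: d + b"msg2\n"],
)
new_version, committed = apply_update(mailbox, append)
retry_version, retry_committed = apply_update(new_version, append)
```

Replaying the same update against the new version fails its predicate, which is how an application can express its own consistency rule without taking a lock.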

Page 8: Pond: the OceanStore Prototype

System Overview (Tapestry)

• Scalable overlay network, built on TCP/IP

• Performs decentralized object location and routing (DOLR) based on GUID
  - Virtualization
  - Location independence

• Locality aware

• Self-organizing

• Self-maintaining

[Figure labels: HotOS Attendee, Paul Hogan]
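As a rough illustration of routing by GUID, the sketch below resolves one more matching prefix digit per hop. It is a simplification under stated assumptions: the four-digit hex node IDs are made up, and every hop sees the full node list, whereas a real overlay node only knows its own routing table. The helper names are hypothetical.

```python
from typing import List


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(start: str, target: str, nodes: List[str]) -> List[str]:
    """Hop toward `target`, improving the matched prefix at each step."""
    path = [start]
    current = start
    while True:
        have = shared_prefix_len(current, target)
        better = [n for n in nodes if shared_prefix_len(n, target) > have]
        if not better:
            return path  # no node is closer: current acts as the root
        # Take the smallest improvement to mimic digit-by-digit routing.
        current = min(better, key=lambda n: (shared_prefix_len(n, target), n))
        path.append(current)


# Hypothetical node IDs; the message is addressed to a GUID, not a host.
nodes = ["4227", "42a2", "42ad", "1d76", "42a9"]
path = route("1d76", "42ad", nodes)
```

The caller names only the GUID, which is the location independence the slide refers to: the path is discovered by the overlay rather than configured by the client.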

Page 9: Pond: the OceanStore Prototype

System Overview (Primary Rep.)

• Each data object is assigned an inner-ring

• Apply updates and create new versions

• Byzantine fault-tolerant

• Ability to change inner-ring servers at any time
  - public key cryptography, proactive threshold signature, Tapestry

• Responsible party
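The slides do not state how large an inner ring must be. As a worked example, the sketch below uses the standard bound for Byzantine agreement of the Castro-Liskov kind, where tolerating f faulty servers takes n = 3f + 1 servers and a quorum of 2f + 1. The function names are hypothetical.

```python
def ring_size(f: int) -> int:
    """Servers needed to tolerate f Byzantine faults (n = 3f + 1)."""
    return 3 * f + 1


def quorum(f: int) -> int:
    """Matching servers needed before a decision is safe (2f + 1)."""
    return 2 * f + 1


def max_faults(n: int) -> int:
    """Largest f that a ring of n servers can tolerate."""
    return (n - 1) // 3


for f in (1, 2, 3):
    print(f"f={f}: ring of {ring_size(f)}, quorum of {quorum(f)}")
```

Under this bound, a ring of four servers survives one arbitrary failure, and growing the ring from four to six adds no tolerance until a seventh server joins.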

Page 10: Pond: the OceanStore Prototype

System Overview (Archival Storage)

• Erasure codes are more space-efficient than replication

• Fragments are distributed uniformly among archival storage servers
  - placement depends on the BGUID and the fragment number (n_frag)

• Pond uses a Cauchy Reed-Solomon code

[Figure: erasure-coding diagram with labels W, X, Y, Z, f, and f^-1]
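To make the space argument concrete, here is a toy erasure code. It is not the Cauchy Reed-Solomon code Pond uses: it is a single XOR parity fragment, swapped in because it fits in a few lines. A block is split into two data fragments plus one parity fragment, and any two of the three rebuild the block. The function names are hypothetical.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block: bytes) -> List[bytes]:
    """Encode into 3 fragments: two halves and their XOR parity."""
    if len(block) % 2:
        raise ValueError("this toy code needs an even-length block")
    half = len(block) // 2
    a, b = block[:half], block[half:]
    return [a, b, xor_bytes(a, b)]


def decode(fragments: List[Optional[bytes]]) -> bytes:
    """Rebuild the block when at most one fragment is missing (None)."""
    a, b, parity = fragments
    if a is None:
        a = xor_bytes(b, parity)
    elif b is None:
        b = xor_bytes(a, parity)
    return a + b


block = b"OceanStore"
frags = encode(block)
recovered = decode([None, frags[1], frags[2]])  # first fragment lost
overhead = sum(len(f) for f in frags) / len(block)
```

This toy code survives one lost fragment at 1.5x the block size, whereas surviving one loss with whole-block replication takes two copies, or 2x.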

Page 11: Pond: the OceanStore Prototype

System Overview (Caching)

• Promiscuous caching

• Whole-block caching

• A host caches each block it reads and publishes its possession of the block in Tapestry

• Pond uses LRU

• Use Heartbeat to get the most recent copy
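A minimal sketch of whole-block caching with LRU eviction follows. It assumes blocks are keyed by a BGUID string; the class name `BlockCache` and its methods are hypothetical, and the step of publishing possession in Tapestry is left out.

```python
from collections import OrderedDict
from typing import Optional


class BlockCache:
    """Hypothetical whole-block cache with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, bguid: str) -> Optional[bytes]:
        """Return the cached block, marking it most recently used."""
        if bguid not in self.blocks:
            return None
        self.blocks.move_to_end(bguid)
        return self.blocks[bguid]

    def put(self, bguid: str, data: bytes) -> None:
        """Insert a block, evicting the least recently used if full."""
        self.blocks[bguid] = data
        self.blocks.move_to_end(bguid)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)


cache = BlockCache(capacity=2)
cache.put("bguid-a", b"A")
cache.put("bguid-b", b"B")
cache.get("bguid-a")        # touch A so that B becomes the eviction victim
cache.put("bguid-c", b"C")  # evicts B
```

Because blocks are read-only and named by their content, a cached block never goes stale; only the mapping from an object to its latest version needs refreshing.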

Page 12: Pond: the OceanStore Prototype

System Overview (Diss. Tree)

Page 13: Pond: the OceanStore Prototype

Outline

• Motivation / Introduction

• System Overview

• Consistency

• Persistency

• Failure Tolerance

• Implementation

• Performance

• Conclusion

• Related Work

Page 14: Pond: the OceanStore Prototype

Consistency (Primary Replica)

• Read-only blocks

• Application-specific consistency

• Primary-copy replication
  - heartbeat <AGUID, VGUID, t, v_seq>
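The heartbeat tuple can be sketched as a small record. The field meanings are assumptions read off the slide: `t` is taken to be a timestamp and `v_seq` a version sequence number. Signature checking by the inner ring is omitted, and the names `Heartbeat` and `latest` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical heartbeat binding an object to its latest version."""
    aguid: str        # names the object across all versions
    vguid: str        # names one specific read-only version
    timestamp: float  # assumed meaning of `t`
    v_seq: int        # assumed meaning: version sequence number


def latest(cached: Heartbeat, received: Heartbeat) -> Heartbeat:
    """Keep whichever heartbeat describes the newer version."""
    if received.aguid != cached.aguid:
        raise ValueError("heartbeats describe different objects")
    return received if received.v_seq > cached.v_seq else cached


old = Heartbeat("aguid-1", "vguid-7", 100.0, 7)
new = Heartbeat("aguid-1", "vguid-8", 105.0, 8)
current = latest(old, new)
```

A reader holding `old` learns from `new` that version 8 exists and can then fetch the blocks under the new VGUID.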

Page 15: Pond: the OceanStore Prototype

Persistency (Archival Storage)

• Archival storage

• Aggressive replication

• Monitoring
  - Introspection

• Replacement
  - Tapestry

Page 16: Pond: the OceanStore Prototype

Failure Tolerance (Everybody)

• All newly created blocks are encoded and stored in Archival servers

• Aggressive replication

• Byzantine agreement protocol for inner-ring

• Responsible party
  - single point of failure?
  - scalability?

Page 17: Pond: the OceanStore Prototype

Outline

• Motivation / Introduction

• System Overview

• Consistency

• Persistency

• Failure Tolerance

• Implementation

• Performance

• Conclusion

• Related Work

Page 18: Pond: the OceanStore Prototype

Implementation

• Built in Java, atop SEDA

• Major subsystems are functional
  - self-organizing Tapestry
  - primary replica with Byzantine agreement
  - self-organizing dissemination tree
  - erasure-coding archive
  - application interface: NFS, IMAP/SMTP, HTTP

Page 19: Pond: the OceanStore Prototype

Implementation

Page 20: Pond: the OceanStore Prototype

Outline

• Motivation / Introduction

• System Overview

• Consistency

• Persistency

• Failure Tolerance

• Implementation

• Performance

• Conclusion

• Related Work

Page 21: Pond: the OceanStore Prototype

Performance

• Update performance

• Dissemination tree performance

• Archival retrieval performance

• The Andrew Benchmark

Page 22: Pond: the OceanStore Prototype

Performance (test beds)

• Local cluster
  - 42 machines at Berkeley
  - 2x 1.0 GHz CPU, 1.5 GB SDRAM, 2x 36 GB hard drives
  - gigabit Ethernet adaptor and switch

• PlanetLab
  - 101 nodes across 43 sites
  - 1.2 GHz, 1 GB memory

Page 23: Pond: the OceanStore Prototype

Performance (update)

Page 24: Pond: the OceanStore Prototype

Performance (update)

Page 25: Pond: the OceanStore Prototype

Performance (archival)

Page 26: Pond: the OceanStore Prototype

Performance (dissemination tree)

Page 27: Pond: the OceanStore Prototype

Performance (Andrew benchmark)

Page 28: Pond: the OceanStore Prototype

Conclusion

• Pond is a working subset of the vision

• Promising in WAN

• Threshold signatures and erasure-coded archival are expensive

• Pond is a fault-tolerant system, but it has not been tested with any failed nodes

• Any thoughts?

Page 29: Pond: the OceanStore Prototype

Related Work

• FarSite

• ITTC, COCA

• PAST, CFS, IVY

• Pangaea