Project Voldemort Distributed Key-Value Storage Roshan Sumbaly

Project Voldemort - Stanford University · cs.stanford.edu/people/rsumbaly/files/voldemort.pdf · Amazon Dynamo ... Start store builder job on input_data_path Trigger Fetch + Swap / Rollback

Apr 08, 2018

Transcript
Page 1:

Project Voldemort - Distributed Key-Value Storage

Roshan Sumbaly

Page 2:

What has changed?

 No joins

 Making data access APIs cacheable

 Frequent schema changes

 Rise of huge datasets - storing relationships

  Batch computed offline, served in near real time – People You May Know (#in), Who to Follow (#twitter)

Page 3:

So, what should our system do?

 Growing dataset – Horizontal Scalability

  Partition the data

  Make it transparent to the application

 High availability and durability

  Replicate the data

 Fast per-node performance

 Simple API with predictable performance

 No single point of failure

Page 4:

What inspired you?

 Amazon Dynamo

  Highly Available, Horizontally Scalable system

  Key/Value model

  Replication

  All nodes are peers

  Commodity Hardware

  Simple to build

 Things to remember

  Replication gives high availability but causes inconsistencies

  Failures are fairly common in distributed systems

  User must be isolated from these problems

Page 5:

Start with the design

  Single interface for all components

  get, put, getAll, delete

  Easy to test
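One way to picture the single interface: every layer exposes the same four operations, so layers can be stacked, and any of them can be swapped for an in-memory store in tests. The sketch below is in Python for brevity; the class names `Store`, `InMemoryStore` and `CountingStore` are hypothetical and are not Voldemort's actual Java types.

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Hypothetical common interface shared by every layer."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def delete(self, key): ...

    def get_all(self, keys):
        # Default implementation: one get per key, skipping missing keys.
        found = {}
        for key in keys:
            value = self.get(key)
            if value is not None:
                found[key] = value
        return found


class InMemoryStore(Store):
    """Dictionary-backed store, handy as a test double."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        existed = key in self._data
        self._data.pop(key, None)
        return existed


class CountingStore(Store):
    """Wraps any Store and counts calls; stands in for a stats or routing layer."""

    def __init__(self, inner):
        self.inner = inner
        self.calls = 0

    def get(self, key):
        self.calls += 1
        return self.inner.get(key)

    def put(self, key, value):
        self.calls += 1
        self.inner.put(key, value)

    def delete(self, key):
        self.calls += 1
        return self.inner.delete(key)
```

Because `CountingStore` only depends on the interface, it can wrap an `InMemoryStore` in a unit test and a networked store in production without changes.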

Page 6:

How do I talk to Voldemort?

 DB Tables ~ Stores

 Key unique to a Store

 Operations

  GET

  PUT

  GETALL

  DELETE

  APPLYUPDATE – Optimistic Locking
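The optimistic-locking idea behind APPLYUPDATE can be sketched as a read-modify-write loop that retries when another writer got in first. This is a simplified, single-node illustration: `VersionedStore`, `ObsoleteVersionError` and `apply_update` are names made up for the sketch, and an integer version stands in for the real versioning scheme.

```python
class ObsoleteVersionError(Exception):
    """Raised when a put carries a version older than the stored one."""


class VersionedStore:
    """Toy single-node store keeping (version, value) per key."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Unknown keys start at version 0 with no value.
        return self._data.get(key, (0, None))

    def put(self, key, expected_version, value):
        current_version, _ = self.get(key)
        if expected_version != current_version:
            raise ObsoleteVersionError(key)
        self._data[key] = (current_version + 1, value)


def apply_update(store, key, update_fn, max_tries=3):
    """Read, modify, write; start over from the read if the write is rejected."""
    for _ in range(max_tries):
        version, value = store.get(key)
        try:
            store.put(key, version, update_fn(value))
            return True
        except ObsoleteVersionError:
            continue
    return False
```

No lock is held between the read and the write; a conflict only costs a retry, which suits workloads where concurrent updates to the same key are rare.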

Page 7:

Where does my data go?

 Client or Server side

 Convert single GET, PUT, DELETE ops to multiple parallel ops

  Pluggable Routing strategy

  Consistent Hashing

  Zone aware Routing
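A rough sketch of how a routing layer might turn one GET into parallel requests, assuming each replica is any object with a `get(key)` method. The function name `parallel_get` and the error handling are illustrative choices, not Voldemort's client code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parallel_get(replicas, key, r):
    """Send the get to every replica at once; succeed once r have answered."""
    responses = []
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replica.get, key) for replica in replicas]
        for future in as_completed(futures):
            try:
                responses.append(future.result())
            except Exception:
                continue  # a failed replica does not count toward r
            if len(responses) >= r:
                break
        # Leaving the with-block waits for the remaining workers;
        # a production client would return without waiting for stragglers.
    if len(responses) < r:
        raise RuntimeError(f"only {len(responses)} of {r} required reads succeeded")
    return responses
```

Plain dictionaries work as replicas here because `dict.get` has the right shape, which keeps the sketch easy to try out.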

Page 8:

What is Consistent hashing?

  Split hash ring into partitions

 Assign partitions to nodes

 Replicas = Next partition on different node
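The three steps above can be sketched as follows. The ring is a list whose index is the partition id and whose value is the owning node; the hash function (SHA-256 here) and the function names are assumptions made for the illustration.

```python
import hashlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition on the ring (any stable hash would do)."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def replicas_from(start: int, partition_to_node: list, n: int) -> list:
    """Walk the ring clockwise from partition `start`, collecting n distinct nodes."""
    nodes = []
    total = len(partition_to_node)
    for step in range(total):
        node = partition_to_node[(start + step) % total]
        if node not in nodes:  # skip partitions owned by a node we already picked
            nodes.append(node)
        if len(nodes) == n:
            break
    return nodes


def replica_nodes(key: bytes, partition_to_node: list, n: int) -> list:
    """Nodes that should hold `key`: the owner of its partition plus n - 1 successors."""
    start = partition_for(key, len(partition_to_node))
    return replicas_from(start, partition_to_node, n)
```

With the ring `[0, 0, 1, 1, 2, 2]`, a key landing on partition 0 is stored on nodes 0 and 1 when n is 2: partition 1 is skipped because it belongs to the same node as partition 0.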

Page 9:

What about routing parameters?

 Per store routing parameters

  N - The replication factor (how many copies of each key-value pair we store)

  R - The number of reads required

  W - The number of writes we block for

  If R+W > N then we get to read our own writes
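Why R+W > N works: if the read set and the write set together are larger than the number of replicas, they must share at least one node, so every read reaches a replica that saw the latest acknowledged write. The brute-force check below (the function name is made up for the sketch) confirms this for small N.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every set of r replicas shares a node with every set of w replicas."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )
```

For example, with N=3 the setting R=2, W=2 always overlaps, while R=1, W=2 does not: a read from node 0 can miss a write acknowledged only by nodes 1 and 2.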

Page 10:

And zone routing?

 Map nodes to zones (zone ~ datacenter, rack)

 A proximity list of zones is also provided

  In addition to N, R, W:

  ZR, ZW - Zones to block for on reads and writes
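A small sketch of the two zone-related ideas. Everything here is an assumption for illustration: the function names, the node-to-zone map, and in particular the reading of ZW as "acknowledgements must come from at least this many distinct zones"; the slides do not spell out the exact rule.

```python
def order_by_proximity(nodes, node_zone, proximity):
    """Sort replica nodes so that nodes in nearer zones are contacted first."""
    rank = {zone: position for position, zone in enumerate(proximity)}
    return sorted(nodes, key=lambda node: rank[node_zone[node]])


def write_satisfied(acked_nodes, node_zone, w, zw):
    """Assumed rule: at least w acks overall, spread over at least zw distinct zones."""
    zones = {node_zone[node] for node in acked_nodes}
    return len(acked_nodes) >= w and len(zones) >= zw
```

Under this reading, two acknowledgements from the same datacenter satisfy W=2 but not ZW=2, which is the kind of guarantee zone routing adds on top of N, R and W.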

Page 11:

Versioning the data

 Vector Clocks – Map<Node, Integer>

 Version every key/value

 What about concurrent writes?

  Store all conflicting versions during writes

  Client resolves them during reads

  Pluggable resolver
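A vector clock of the form Map<Node, Integer> can be compared entry by entry: one version supersedes another only if it is at least as large on every node. The sketch below uses plain dictionaries; `compare`, `increment` and `add_version` are illustrative names, not Voldemort's API.

```python
def compare(a: dict, b: dict) -> str:
    """Return 'before', 'after', 'equal' or 'concurrent' for clock a relative to b."""
    nodes = set(a) | set(b)
    a_bigger = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_bigger = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_bigger and b_bigger:
        return "concurrent"
    if a_bigger:
        return "after"
    if b_bigger:
        return "before"
    return "equal"


def increment(clock: dict, node: int) -> dict:
    """Return a copy of the clock with this node's counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def add_version(existing: list, clock: dict, value) -> list:
    """Store a new version, dropping versions it supersedes and keeping conflicts."""
    kept = []
    for old_clock, old_value in existing:
        relation = compare(clock, old_clock)
        if relation in ("before", "equal"):
            raise ValueError("obsolete version")
        if relation == "concurrent":
            kept.append((old_clock, old_value))
    kept.append((clock, value))
    return kept
```

Two writes that both started from `{1: 1}` and went through different nodes yield `{1: 2}` and `{1: 1, 2: 1}`; neither dominates, so both are kept until a resolver picks or merges them at read time.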

Page 12:

Versioning the data

Page 13:

How do we repair conflicting versions?

 Read Repair

  Find inconsistent versions at read time

  Asynchronously send back correct version

  Max R network roundtrips
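Read repair can be sketched as: read from R replicas, pick the newest version, and write it back to any replica that returned something older. In this simplified illustration each replica is a dictionary, integer versions stand in for vector clocks, the write-back is done inline rather than asynchronously, and `r` is assumed to be at least 1.

```python
def read_with_repair(replicas: list, key, r: int):
    """Read key from the first r replicas, return the newest value, repair stale ones.

    Each replica maps key -> (version, value); a missing key counts as version 0.
    """
    responses = [(node, node.get(key, (0, None))) for node in replicas[:r]]
    newest = max((response for _, response in responses), key=lambda resp: resp[0])
    for node, (version, _) in responses:
        if version < newest[0]:
            node[key] = newest  # a real system would send this write-back asynchronously
    return newest[1]
```

Only replicas that took part in the read are repaired, so keys that are read often converge quickly while rarely read keys may stay stale for longer.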

Page 14:

Serialization & Storage

  Pluggable Serialization

  Custom Binary JSON, Thrift, Protocol Buffers, Avro, Java serialization

  Pluggable Storage Engine

  ConcurrentHashMap (great for testing), MySQL, BDB JE, Krati, Read-only

Page 15:

Next problem, batch computed data

  Protect the live system

 Ability to rollback

  Failure tolerance

  Scalable – no bottleneck

Page 16:

Read-only stores

 Build the Index offline

  Index structure

  Single Hadoop job

  Input – any InputFormat

  Output - Multiple “chunks” per partition (chunk ~ data + index file)

 Reads are fast

  Cache warmness – Fetch the index files last

  Memory map the .index files

  Search – Binary / Interpolation
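To illustrate the read path, here is a sketch of a binary search over a memory-mapped index file. The entry layout (an 8-byte key hash followed by a 4-byte offset into the data file, sorted by hash) is an assumption chosen for the example and is not claimed to be Voldemort's on-disk format; the function names are likewise hypothetical.

```python
import mmap
import struct

# Assumed layout: 8-byte key hash, 4-byte offset into the data file, big-endian.
ENTRY = struct.Struct(">QI")


def build_index(path, entries):
    """Write (key_hash, offset) pairs to path, sorted by key_hash."""
    with open(path, "wb") as f:
        for key_hash, offset in sorted(entries):
            f.write(ENTRY.pack(key_hash, offset))


def open_index(path):
    """Memory-map a non-empty index file read-only."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def lookup(index, key_hash):
    """Binary-search the mapped index; return the data offset or None."""
    lo, hi = 0, len(index) // ENTRY.size
    while lo < hi:
        mid = (lo + hi) // 2
        found_hash, offset = ENTRY.unpack_from(index, mid * ENTRY.size)
        if found_hash == key_hash:
            return offset
        if found_hash < key_hash:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Fixed-width sorted entries are what make both binary and interpolation search possible, and memory mapping leaves caching of hot index pages to the operating system.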

Page 17:

Read-only stores

 No performance hit on the running DB

  Store N different versions of data

   store_name/
    version-0/
     0_0.data
     0_0.index
     <partition>_<chunk>.<data|index>
    version-1/
     ...
    version-2/
    latest->version-2/

  Atomic Swap

  Throttling

 Rollback - very quick!
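With versioned directories and a `latest` link, both swap and rollback reduce to repointing one symlink. The sketch below assumes a POSIX file system, where renaming over an existing symlink is atomic; `swap` and `current_version` are illustrative helpers, not Voldemort's implementation.

```python
import os


def swap(store_dir: str, version: int) -> None:
    """Point store_dir/latest at version-<version> using an atomic rename."""
    target = f"version-{version}"
    if not os.path.isdir(os.path.join(store_dir, target)):
        raise FileNotFoundError(target)
    tmp_link = os.path.join(store_dir, "latest.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clean up a link left behind by an interrupted swap
    os.symlink(target, tmp_link)
    # The rename replaces the old link in one step, so readers see either
    # the old version or the new one, never a missing link.
    os.replace(tmp_link, os.path.join(store_dir, "latest"))


def current_version(store_dir: str) -> str:
    """Return the directory name that latest currently points to."""
    return os.readlink(os.path.join(store_dir, "latest"))
```

Rollback is the same call with an older version number, for example `swap(store_dir, 1)`, which is why it is quick: no data is copied or deleted.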

Page 18:

How does LinkedIn use this?

 Data dump to HDFS using Hadoop / Pig jobs

  Binary JSON based OutputFormat

  Custom Pig UDF which uses the above OutputFormat

 Azkaban Job

  Start store builder job on input_data_path

  Trigger Fetch + Swap / Rollback on voldemort_cluster_url

  Optional: Voldemort Sanity check (Sample gets)

Page 19:

What else does Voldemort do?

 Monitoring stats via JMX

 Admin services

  Allows adding, deleting stores without down-time

  Retrieving, deleting, updating partitions

 Run Map Reduce on your data - ETL

  EC2 testing framework

  Server side transforms *

  get(key, <function to run on server>)

 Rebalancing

  Move a partition from one node to another

  Add new nodes

Page 20:

Future of Voldemort

  Publish Subscribe

 Other Repair mechanisms

  Incremental Pushes for Read-Only stores

 GUI

Page 21:

?

 http://project-voldemort.com

 http://github.com/voldemort/voldemort

 http://sna-projects.com