
Databases Architectures & Hypertable Doug Judd CEO, Hypertable, Inc.

Mar 26, 2015

Transcript
Page 1

Databases Architectures & Hypertable

Doug Judd

CEO, Hypertable, Inc.

Page 2

Database Terminology

Page 3

www.hypertable.org

Structured, Semi-Structured, and Unstructured Data

Structured is what an RDBMS stores
  Data is broken into discrete components
  Types associated with each component: integer, floating point, date, string
Unstructured is free-form text
Semi-structured is a combination of structured and unstructured

Page 4

Document-Oriented

Semi-structured documents
Accepts documents in a format such as JSON, XML, YAML
Often schema-less
Auto-index fields
Examples: CouchDB, MongoDB
Best fit: XML or Web documents

Page 5

Graph Databases

Database designed to represent graphs
APIs for performing graph operations
  Traversal (depth-first, breadth-first)
  Shortest/Cheapest path
  Partitioning
Some allow hypergraphs
Examples: Neo4j, HyperGraphDB, InfoGrid, AllegroGraph, Sones, DEX, FlockDB, OrientDB, VertexDB, InfiniteGraph, Filament
More info: sones graphdb landscape

Page 6

Column-Oriented

Data physically stored by column
RDBMS typically row-oriented
Improved performance for column operations
Better data compression
Examples: Hypertable, HBase, Cassandra, Vertica
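
As a rough illustration of why storing data by column helps column operations and compression, here is a minimal Python sketch. The sample rows, the `to_columns` helper, and the run-length encoder are all made up for illustration and are not how any of the listed systems lay out data on disk.

```python
# Hypothetical sample data: three rows of a small table.
rows = [
    {"id": 1, "city": "Oslo", "temp": 4},
    {"id": 2, "city": "Oslo", "temp": 6},
    {"id": 3, "city": "Lima", "temp": 19},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column.

    Assumes every row has the same set of fields.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)

# A column operation touches only the one list it needs.
average_temp = sum(columns["temp"]) / len(columns["temp"])

# Values of the same type sit next to each other, so repeats compress well.
city_runs = run_length_encode(columns["city"])
```

In the row-oriented form, computing the average would require reading every field of every row; in the pivoted form it reads a single list.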

Page 7

In-Memory

Data set stored in RAM
Extremely fast access
Limited capacity
Examples: Memcached, Redis, MonetDB, VoltDB

Page 8

Horizontal Scalability

Scale out
Increase capacity by adding machines
Opposite of vertical scalability (scale up)
Commodity hardware

Page 9

Distributed Hash Table (DHT)

Horizontally scalable
Decentralized
Fast access
Restricted API: GET, SET, DELETE
Peer-to-peer file sharing systems: BitTorrent, Napster, Gnutella, Freenet
Examples: Dynamo, Cassandra, Riak, Project Voldemort, SimpleDB, S3, Redis, Scalaris, Membase
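
The restricted GET/SET/DELETE API can be sketched in a few lines of Python. This is a single-process toy, assuming each "node" is just a dictionary and that keys are placed by hashing modulo the node count; the class name `TinyDHT` and the node names are hypothetical, and real DHTs add networking, replication, and membership handling.

```python
import hashlib


class TinyDHT:
    """Toy hash table spread over several in-process 'nodes' (plain dicts)."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self.names = sorted(self.nodes)

    def _node_for(self, key):
        # Hash the key and map it to one node; every client computes the same answer.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest, "big") % len(self.names)
        return self.nodes[self.names[index]]

    def set(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)

    def delete(self, key):
        self._node_for(key).pop(key, None)


dht = TinyDHT(["node-a", "node-b", "node-c"])
dht.set("user:42", "alice")
found = dht.get("user:42")
dht.delete("user:42")
missing = dht.get("user:42")
```

Note that there is no way to ask for a range of keys: hashing scatters adjacent keys across nodes, which is why the API stays restricted to single-key operations.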

Page 10

Scalable Database Architectures

Page 11

Auto-Sharding

Splits table data into horizontal “shards”
Shards managed by traditional RDBMS (e.g. MySQL, Postgres)
Automated “glue” code to handle sharding and request routing
Examples: MongoDB, AsterData, Greenplum
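
The request-routing part of the "glue" code might look roughly like the following Python sketch. It assumes range-based sharding on a string key; the `ShardRouter` class, the split points, and the shard names are invented for illustration and do not reflect the routing logic of any product named above.

```python
import bisect


class ShardRouter:
    """Route a key to the shard whose key range contains it.

    split_points must be sorted. Shard 0 holds keys below split_points[0],
    shard i holds keys in [split_points[i-1], split_points[i]), and the
    last shard holds everything from the final split point upward.
    """

    def __init__(self, split_points, shard_names):
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = list(split_points)
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        index = bisect.bisect_right(self.split_points, key)
        return self.shard_names[index]


# Hypothetical layout: three backing databases split at "h" and "q".
router = ShardRouter(["h", "q"], ["db0", "db1", "db2"])
targets = [router.shard_for(k) for k in ("apple", "hello", "zebra")]
```

An application would call `shard_for` before each query and send the statement to the returned database instance.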

Page 12

MongoDB

Page 13

Dynamo

Developed by Amazon.com for their Shopping Cart
Designed for high write availability
Eventually consistent DHT
Implementations: Cassandra, Project Voldemort, Riak, Dynomite

Page 14

Eventual Consistency

Database update semantics in a distributed system with data replication
Strong Consistency - after an update completes, all processes see the updated value
Eventual Consistency - eventually, all processes will see the updated value
The best-known eventually consistent system is DNS
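
The difference between the two semantics can be seen in a small Python simulation. This is a sketch under simplifying assumptions: writes land on replica 0 first and reach the other replicas only when `propagate` is called, and the class and method names are hypothetical.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy replicated store: writes reach other replicas only on propagate()."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        # Updates accepted but not yet delivered: (replica_index, key, value).
        self.pending = deque()

    def write(self, key, value):
        # The write completes as soon as one replica has it.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Deliver queued updates in order; afterwards all replicas agree.
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value


store = EventuallyConsistentStore(replica_count=3)
store.write("cart", "book")
fresh = store.read(0, "cart")        # replica that took the write
stale = store.read(2, "cart")        # replica that has not heard yet
store.propagate()
converged = store.read(2, "cart")    # same replica after propagation
```

A strongly consistent store would not report the write as complete until the `propagate` step had finished, so no reader could observe the stale value.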

Page 15

Eventual Consistency

Page 16

Consistent Hashing
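
The slide's diagram did not survive in this transcript, so the following Python sketch shows the general idea of consistent hashing as commonly described, not necessarily what the slide depicted. Nodes and keys are hashed onto the same ring, and a key belongs to the first node at or after its position. The `HashRing` class and node names are hypothetical, and virtual nodes are omitted for brevity.

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a position on the ring (first 8 bytes of SHA-256)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions of nodes
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        point = ring_position(node)
        bisect.insort(self._points, point)
        self._owner[point] = node

    def remove(self, node):
        point = ring_position(node)
        self._points.remove(point)
        del self._owner[point]

    def node_for(self, key):
        # First node clockwise from the key, wrapping around at the end.
        index = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[index % len(self._points)]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = ["key-%d" % i for i in range(200)]
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the design is visible in `moved`: adding a node relocates only the keys that now fall to the new node, whereas hashing modulo the node count would reshuffle most keys.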

Page 17

Amazon AWS

S3
  Online storage web service
  Designed for larger amounts of data
  Cost $0.15/GB per month
SimpleDB
  Designed for smaller amounts of data
  Provides indexing and richer query capability
  Cost $027/GB per month + machine utilization fee
RDS
  Managed MySQL instances

Page 18

Order Preserving Partitioner (Cassandra)

www.recipezaar.com 1091721999…629750272
+ www.ribbonprinters.com 1091721999…965293103
/ 2 = www.rgb????i?pQdp?.??? 1091721999…297521687
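
The figure appears to show two row keys being treated as large integers, added, and halved to find a split point between them. A minimal Python sketch of that arithmetic follows, assuming keys are zero-padded on the right to equal length and read as big-endian integers; that padding rule and the `midpoint_key` helper are assumptions for illustration, not a statement of how Cassandra implements it.

```python
def key_to_int(key, width):
    """Interpret a byte-string key as a big-endian integer, zero-padded on the right."""
    return int.from_bytes(key.ljust(width, b"\x00"), "big")


def midpoint_key(low, high):
    """Return a key that sorts between low and high: (low + high) / 2 as integers."""
    width = max(len(low), len(high))
    mid = (key_to_int(low, width) + key_to_int(high, width)) // 2
    return mid.to_bytes(width, "big")


low = b"www.recipezaar.com"
high = b"www.ribbonprinters.com"
mid = midpoint_key(low, high)
```

The result keeps the shared prefix and begins `www.rgb`, as on the slide, but the remaining bytes are arbitrary and mostly non-printable, which is why the slide's midpoint key is shown with placeholder characters.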

Page 19

Order Preserving Partitioner
Balance Problem

Page 20

Bigtable: the infrastructure that Google is built on

Bigtable underpins 100+ Google services, including: YouTube, Blogger, Google Earth, Google Maps, Orkut, Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database…
Implementations
  Hypertable
  HBase

Page 21

Google Stack

GFS - Replicates data inter-machine
MapReduce - Efficiently process data in GFS
Bigtable - Indexed table structure

Page 22

Google File System

Page 23

Google File System

Page 24

System Overview

Page 25

Data Model

Sparse, two-dimensional table with cell versions
Cells are identified by a 4-part key
  Row (string)
  Column Family (byte)
  Column Qualifier (string)
  Timestamp (long integer)
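
The 4-part key can be modeled in a few lines of Python. This is an in-memory sketch only: the `CellKey` and `SparseTable` names, the sample rows, and the use of a plain dictionary are assumptions made for illustration and say nothing about Hypertable's actual storage format or client API.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    row: str
    column_family: int      # one byte, so 0..255
    column_qualifier: str
    timestamp: int


class SparseTable:
    """Only cells that were written take up space; absent cells cost nothing."""

    def __init__(self):
        self.cells = {}

    def put(self, row, family, qualifier, timestamp, value):
        if not 0 <= family <= 255:
            raise ValueError("column family must fit in one byte")
        self.cells[CellKey(row, family, qualifier, timestamp)] = value

    def get_versions(self, row, family, qualifier):
        """All stored versions of one cell, oldest first, as (timestamp, value)."""
        matches = [
            (key.timestamp, value)
            for key, value in self.cells.items()
            if key[:3] == (row, family, qualifier)
        ]
        return sorted(matches, key=lambda pair: pair[0])

    def get_latest(self, row, family, qualifier):
        versions = self.get_versions(row, family, qualifier)
        return versions[-1][1] if versions else None


table = SparseTable()
table.put("com.example.www", 1, "title", 1000, "Old title")
table.put("com.example.www", 1, "title", 2000, "New title")
table.put("com.example.www", 2, "lang", 1500, "en")
latest_title = table.get_latest("com.example.www", 1, "title")
title_history = table.get_versions("com.example.www", 1, "title")
```

Because the timestamp is part of the key, writing a cell twice adds a version rather than overwriting the earlier value.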

Page 26

Table: Visual Representation

Page 27

Table: Actual Representation

Page 28

Scaling (part I)

Page 29

Scaling (part II)

Page 30

Scaling (part III)

Page 31

Request Routing

Page 32

Hypertable

Page 33

Hypertable Overview

Massively scalable database
Modeled after Google’s Bigtable
High performance implementation (C++)
Thrift interface for all popular high-level languages: Java, Ruby, Python, PHP, etc.
Open source (GPL license)
Project started March 2007 @ Zvents

Page 34

Hypertable In Use Today

Page 35

Hypertable vs. HBase

Page 36

Hypertable vs. HBase

Test                               Hypertable Advantage Relative to HBase (%)
Random Read Zipfian 80 GB          925
Random Read Zipfian 20 GB          777
Random Read Zipfian 2.5 GB         100
Random Write 10KB values           51
Random Write 1KB values            102
Random Write 100 byte values       427
Random Write 10 byte values        931
Sequential Read 10KB values        1060
Sequential Read 1KB values         68
Sequential Read 100 byte values    129
Scan 10KB values                   2
Scan 1KB values                    58
Scan 100 byte values               75
Scan 10 byte values                220

Page 37

Annual EC2 Cost Savings

Assuming 200% improvement
Extra large reserved instances

Page 38

Resources

Project Site: www.hypertable.org
Twitter: hypertable
Commercial Support: www.hypertable.com
Performance Evaluation Write-up: blog.hypertable.com/?p=14

Page 39

Q&A