AWS Webcast - Build high-scale applications with Amazon DynamoDB
Post on 26-Jan-2015
Transcript
Chris Munns Solutions Architect
Amazon Web Services
Build High-Scale Applications with
Amazon DynamoDB
Traditional Database Architecture
App/Web Tier
Client Tier
Database Tier
• key-value access
• complex queries
• transactions
• analytics
One Database for All Workloads
App/Web Tier
Client Tier
RDBMS
Cloud Data Tier Architecture
App/Web Tier
Client Tier
Data Tier
Search Cache Blob Store
RDBMS NoSQL Data Warehouse
Workload Driven Data Store Selection
Data Tier
Search Cache Blob Store
RDBMS NoSQL Data Warehouse
logging analytics
key/value simple query
rich search hot reads complex queries and transactions
AWS Services for the Data Tier
Data Tier
Amazon DynamoDB
Amazon RDS
Amazon ElastiCache
Amazon S3
Amazon Redshift
Amazon CloudSearch
logging analytics
key/value simple query
rich search hot reads complex queries and transactions
RDBMS = Default Choice
• Amazon.com pages are composed of responses from 1000’s of independent services
• Query patterns differ from service to service:
  Catalog service is heavily key-value
  Ordering service is very write-intensive (key-value)
  Catalog search has a different query pattern altogether
Relational Era @ Amazon.com
RDBMS
Poor Availability Limited Scalability High Cost
Dynamo = NoSQL Technology
• Replicated DHT with consistency management
• Consistent hashing
• Optimistic replication
• “Sloppy quorum”
• Anti-entropy mechanisms
• Object versioning
Distributed Era @ Amazon.com
lack of strong consistency
every engineer needs to learn distributed systems
operational complexity
DynamoDB = NoSQL Cloud Service
Cloud Era @ Amazon.com
Non-Relational
Fast & Predictable Performance
Seamless Scalability
Easy Administration
DynamoDB Fundamentals
database service: automated operations, predictable performance, fast development, always durable, low latency, cost effective

table = partitions 1 .. N
• DynamoDB automatically partitions data by the hash key
  Hash key spreads data (and workload) across partitions
• Auto-partitioning occurs with:
  Data set size growth
  Provisioned capacity increases
Massive and Seamless Scale
large number of unique hash keys
+ uniform distribution of workload across hash keys
= apps ready to scale
Making life easier for developers…
• Developers are freed from:
  Performance tuning (latency)
  Automatic 3-way multi-AZ replication
  Scalability (and scaling operations)
  Security inspections, patches, upgrades
  Software upgrades, patches
  Automatic hardware failover
  Improving the underlying hardware
  …and lots of other stuff
Automated Operations
Provisioned Throughput
• Request-based capacity provisioning model
• Throughput is declared and updated via the API or the console
  CreateTable (foo, reads/sec = 100, writes/sec = 150)
  UpdateTable (foo, reads/sec = 10000, writes/sec = 4500)
• DynamoDB handles the rest
  Capacity is reserved and available when needed
  Scaling up triggers repartitioning and reallocation
  No impact to performance or availability
Predictable Performance
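The capacity-unit arithmetic behind those CreateTable/UpdateTable calls can be sketched in a few lines. The helper functions below are invented for illustration; the unit sizes (1 KB per write unit, 4 KB per strongly consistent read unit) come from the pricing slide later in this deck, and halving the cost of eventually consistent reads reflects DynamoDB's documented behavior.

```python
import math

def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    # 1 write capacity unit = one 1 KB write per second;
    # larger items consume multiple units
    return math.ceil(item_kb / 1.0) * writes_per_sec

def read_capacity_units(item_kb: float, reads_per_sec: int,
                        eventually_consistent: bool = False) -> int:
    # 1 read capacity unit = one strongly consistent 4 KB read per second
    units = math.ceil(item_kb / 4.0) * reads_per_sec
    # Eventually consistent reads cost half as much
    return math.ceil(units / 2) if eventually_consistent else units

# 3 KB items written 100 times per second -> 300 write units
print(write_capacity_units(3, 100))
# 6 KB items read 50 times per second, strongly consistent -> 100 read units
print(read_capacity_units(6, 50))
```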
WRITES
  Continuously replicated to 3 AZs
  Quorum acknowledgment
  Persisted to disk (custom SSD)
READS
  Strongly or eventually consistent
  No trade-off in latency
Durable At Scale
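The quorum acknowledgment mentioned above can be sketched with classic quorum arithmetic. The replica counts below (N=3, W=2, R=2) are illustrative assumptions, not published DynamoDB internals; the deck only states that writes are replicated to 3 AZs and quorum-acknowledged.

```python
# Hypothetical quorum parameters: N replicas, W write acks, R read acks
N, W, R = 3, 2, 2

def write_is_durable(acks: int) -> bool:
    # A write is acknowledged once a quorum of replicas persist it
    return acks >= W

# W + R > N guarantees every read quorum overlaps the latest write quorum,
# which is the classical condition for a strongly consistent read
assert W + R > N
print(write_is_durable(2))  # durable: two of three replicas acknowledged
```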
Low Latency At Scale
DynamoDB Customers
“DynamoDB has scaled effortlessly to match our company's explosive growth, doesn't burden our operations staff, and integrates beautifully with our other AWS assets.”
“I love how DynamoDB enables us to provision our desired throughput, and achieve low latency and seamless scale, even with our constantly growing workloads.”
Weatherbug mobile app
Lightning detection & alerting for 40M users/month
Developed and tested in weeks, at “1/20th of the cost of the traditional DB approach”
Super Bowl promotion
Millions of interactions over a relatively short period of time
Built the app in 3 days, from design to production-ready
Fast Development
Cost Effective
“Our previous NoSQL database required almost a full-time administrator to run. Now AWS takes care of it.”
“Being optimized at AdRoll means we spend more every month on snacks than we do on DynamoDB – and almost nothing on an ops team”
Save Money Reduce Effort
DynamoDB Primitives
DynamoDB Concepts
table → items → attributes
schema-less: the schema is defined per attribute
scalar data types: number, string, and binary
multi-valued types: string set, number set, and binary set
hash keys
• mandatory for all items in a table
• key-value access pattern
PutItem UpdateItem DeleteItem BatchWriteItem
GetItem BatchGetItem
Hash = Distribution Key
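The key-value operations listed above (PutItem, GetItem, DeleteItem) can be mimicked with a toy in-memory table keyed by its hash key. The class below is purely illustrative, not the real SDK; method names echo the API operations.

```python
class Table:
    """Toy in-memory stand-in for a DynamoDB table keyed by a hash key."""
    def __init__(self, hash_key: str):
        self.hash_key = hash_key
        self.items = {}

    def put_item(self, item: dict):
        # Every item must carry the table's hash key attribute
        self.items[item[self.hash_key]] = item

    def get_item(self, key):
        # Key-value access: one item per hash key value
        return self.items.get(key)

    def delete_item(self, key):
        self.items.pop(key, None)

catalog = Table(hash_key="Id")
catalog.put_item({"Id": 120, "Title": "Book 120 Title", "Price": 20})
print(catalog.get_item(120)["Title"])  # Book 120 Title
```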
partition 1 .. N
hash keys
• mandatory for all items in a table
• key-value access pattern
• determine data distribution
Hash = Distribution Key
large number of unique hash keys
+ uniform distribution of workload across hash keys
= optimal schema design
Range = Query
range keys (hash + range = composite primary key)
• model 1:N relationships
• enable rich query capabilities
Queries can retrieve all items for a hash key and filter on the range key with ==, <, >, >=, <=, “begins with”, and “between”; results come back sorted, with counts, top / bottom N values, and paged responses.
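A toy model of the composite-key query pattern: items live in sorted collections per hash key, so range-key conditions and sorted results fall out naturally. This is an illustrative sketch, not the DynamoDB API.

```python
from bisect import insort

class HashRangeTable:
    """Toy model of a hash + range composite primary key."""
    def __init__(self):
        self.buckets = {}  # hash key -> sorted list of (range key, item)

    def put(self, h, r, item):
        # Keep each hash bucket sorted by range key on insert
        insort(self.buckets.setdefault(h, []), (r, item))

    def query(self, h, condition=lambda r: True):
        # All items for a hash key, filtered on the range key, in sorted order
        return [item for r, item in self.buckets.get(h, []) if condition(r)]

t = HashRangeTable()
for day in ("2014-01-03", "2014-01-01", "2014-01-02"):
    t.put("user#1", day, {"date": day})

# "between" on the range key; results come back sorted
hits = t.query("user#1", lambda r: "2014-01-01" <= r <= "2014-01-02")
print([i["date"] for i in hits])  # ['2014-01-01', '2014-01-02']
```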
Index Options
local secondary indexes (LSI)
• alternate range key + same hash key
• index and table data are co-located (same partition)
Projected Attributes
KEYS_ONLY INCLUDE ALL
Index Options
global secondary indexes (GSI)
• any attribute can be indexed as a new hash or range key
• same projected attribute options (KEYS_ONLY, INCLUDE, ALL)
• Currently 13 operations in total
Simple API
Manage Tables
• CreateTable
• UpdateTable
• DeleteTable
• DescribeTable
• ListTables
Read and Write Items
• PutItem
• GetItem
• UpdateItem
• DeleteItem
Read and Write Multiple Items
• BatchGetItem
• BatchWriteItem
• Query
• Scan
• Scalar data types
  String (S): Unicode with UTF-8 binary encoding
  Number (N): up to 38 digits of precision, in the range 10^-128 to 10^+126
  Variable-width encoding; a number can occupy up to 21 bytes
• Multi-valued types
  String Set (SS), Number Set (NS), Binary Set (BS)
  Not ordered
Data types
• Data is indexed by the primary key
  Single hash key: targeted towards object persistence
  Hash + range composite key: a sorted collection within a hash bucket; can store a series of events for a given entity
• Automatic partitioning
  The leading hash key spreads data and workload across partitions
• Traffic is scaled out and parallelized
Indexing & Partitioning
• Consistent reads
  Inventory, shopping cart applications
• Atomic counters
  Increment and return the new value in the same operation
• Conditional writes
  Check an expected value before the write; fails on mismatch
  “State machine” use cases
• Sparse indexes
  Ideal for sorted lists; fast access to a subset of items
  Popular: identify recently updated items; top lists; leaderboards
Other Features
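The atomic counter and conditional write semantics above can be sketched against a toy item. The class and method names are invented for illustration; in the real API these map to UpdateItem with an ADD action and to Expected/ConditionalCheck semantics.

```python
class ConditionalWriteError(Exception):
    pass

class Item:
    """Toy item supporting atomic counters and conditional writes."""
    def __init__(self, **attrs):
        self.attrs = attrs

    def atomic_increment(self, name, by=1):
        # Increment and return the new value in the same operation
        self.attrs[name] = self.attrs.get(name, 0) + by
        return self.attrs[name]

    def conditional_update(self, name, expected, new):
        # The write succeeds only if the current value matches the
        # expectation; otherwise it fails, enabling "state machine" flows
        if self.attrs.get(name) != expected:
            raise ConditionalWriteError(f"{name} != {expected!r}")
        self.attrs[name] = new

order = Item(state="placed", views=0)
print(order.atomic_increment("views"))                  # 1
order.conditional_update("state", "placed", "shipped")  # valid transition
try:
    order.conditional_update("state", "placed", "cancelled")
except ConditionalWriteError:
    print("rejected")  # second transition fails: state is now "shipped"
```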
• Use the API/SDK/CLI or Management Console to create tables
• Use the AWS SDK to interact with DynamoDB
  PutItem, UpdateItem, DeleteItem, Query, Scan, etc.
How to use DynamoDB?
// AWS SDK for PHP (v2): write an item to the ProductCatalog table
$client = $aws->get("dynamodb");
$tableName = "ProductCatalog";

$response = $client->putItem(array(
    "TableName" => $tableName,
    "Item" => $client->formatAttributes(array(
        "Id"            => 120,
        "Title"         => "Book 120 Title",
        "ISBN"          => "120-1111111111",
        "Authors"       => array("Author12", "Author22"),
        "Price"         => 20,
        "Category"      => "Book",
        "Dimensions"    => "8.5x11.0x.75",
        "InPublication" => 0,
    )),
    "ReturnConsumedCapacity" => 'TOTAL'
));
Interaction: libraries and SDKs, web console, command line
Figure: Writing an item to a table via the PHP SDK
• Higher-level programming interfaces
  Object Persistence Model for .NET and Java
  Helper classes for .NET
  Transaction Library for Java
• Local DynamoDB available for development and testing
• Dynamic DynamoDB for auto-scaling
• Many community-contributed tools/frameworks
How to use DynamoDB?
[DynamoDBTable("ProductCatalog")]
public class Book
{
[DynamoDBHashKey]
public int Id { get; set; }
public string Title { get; set; }
public string ISBN { get; set; }
[DynamoDBProperty("Authors")]
public List<string> BookAuthors { get; set; }
[DynamoDBIgnore]
public string CoverPage { get; set; }
}
Figure: .NET class using object persistence model
Use Libraries and Tools
Transactions
• Atomic transactions across multiple items and tables
• Tracks the status of ongoing transactions via two tables:
  1. Transactions
  2. Pre-transaction snapshots of modified items
Geolocation: add location awareness to mobile applications
Find Yourself – sample app
https://github.com/awslabs
• Third-party library for automating scaling decisions
• Scale up for service levels, scale down for cost
• CloudFormation template for fast deployment
Autoscaling with Dynamic DynamoDB
• Disconnected development with full API support
  No network, no usage costs
Develop and Test Locally – DynamoDB Local
Note! DynamoDB Local does not have a durability or availability SLA. Rather than standing up your own database server (say, on an m2.4xlarge), use DynamoDB Local for development and testing: do this instead!
Some minor differences from Amazon DynamoDB:
• DynamoDB Local ignores your provisioned throughput settings; the values that you specify when you call CreateTable and UpdateTable have no effect
• DynamoDB Local does not throttle read or write activity
• The values that you supply for the AWS access key and the Region are only used to name the database file
• Your AWS secret key is ignored but must be specified; a dummy string of characters is recommended
Develop and Test Locally – DynamoDB Local
• Reports CloudWatch metrics: latency, consumed throughput, errors, throttling
• Alarms can be used to dynamically size throughput
Monitoring
CloudWatch
• DynamoDB can be used for large data ingest • Redshift can directly load data from DynamoDB (COPY) • EMR can directly read from DynamoDB by using Hive
Analytics
CREATE EXTERNAL TABLE pc_dynamodb (
    [attributes]
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ([properties]);
Amazon S3
Redshift
EMR
External Hive table
External Hive table
Hive DynamoDB
CREATE EXTERNAL TABLE pc_s3 (
[attributes]
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://myawsbucket1/catalog/';
• Provisioned throughput
  $0.0065 per hour for every 10 units of write capacity (1 write per second for 1 KB items)
  $0.0065 per hour for every 50 units of read capacity (1 consistent read per second for 4 KB items)
• Storage
  $0.25 per GB-month of storage
• Free tier!
  100 MB storage + 50 writes/sec + 10 reads/sec each month
Pricing
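The slide's figures make monthly cost estimation a one-liner per component. Note these are the 2014-era prices quoted above; current DynamoDB pricing differs. The function name and 720-hour month are assumptions for illustration.

```python
def monthly_cost(wcu, rcu, storage_gb, hours=720):
    """Estimate a monthly bill from the prices on the slide (2014-era)."""
    write_cost = 0.0065 * (wcu / 10) * hours    # $0.0065/hr per 10 write units
    read_cost = 0.0065 * (rcu / 50) * hours     # $0.0065/hr per 50 read units
    storage_cost = 0.25 * storage_gb            # $0.25 per GB-month
    return round(write_cost + read_cost + storage_cost, 2)

# 150 writes/sec, 100 reads/sec, 20 GB stored
print(monthly_cost(wcu=150, rcu=100, storage_gb=20))
```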
Best Practices
• Method
  1. Describe the overall use case; maintain context
  2. Identify the individual access patterns of the use case
  3. Model each access pattern to its own discrete data set
  4. Consolidate data sets into tables and indexes
• Benefits
  Single table fetch for each query
  Payloads are minimal for each access
Access Pattern Modeling
• Design for uniform data access across items
  Partition distribution is based on the hash key
  The hash key should be well distributed
  Access frequency should be distributed across different hash keys
• Time-series pattern
  Logging; focus only on recent data
Table Best Practices
Hash key value                                                   Efficiency
User ID, where the application has many users.                   Good
Status code, where there are only a few possible status codes.   Bad
Device ID, where even if there are a lot of devices being
tracked, one is by far more popular than all the others.         Bad
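Why these hash key choices are good or bad can be simulated directly: hash each key and count how many partitions the workload touches. md5 and the partition count here are illustrative stand-ins; DynamoDB's internal hash function and partition map are not published.

```python
from collections import Counter
from hashlib import md5

def partition_for(key, num_partitions=8):
    # The hash of the key determines the partition (md5 is a stand-in;
    # DynamoDB's actual internal hash function is not published)
    return int(md5(str(key).encode()).hexdigest(), 16) % num_partitions

# Good: many distinct user IDs spread the workload across all partitions
good = Counter(partition_for(f"user-{i}") for i in range(10_000))
# Bad: a handful of status codes can only ever touch a few partitions
bad = Counter(partition_for(s) for s in ["OK", "ERROR", "PENDING"] * 3_000)

print(len(good), len(bad))  # good touches every partition; bad at most 3
```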
• Use one-to-many tables instead of large set attributes
  Break items up into multiple tables
• Use multiple tables to support varied access patterns
  If you frequently access large items but do not use all attributes, store the smaller, frequently accessed attributes in separate tables
• Compress large attributes
  Reduces the cost of storage and throughput
• Store large attributes in S3
Item Best Practices
• Avoid sudden bursts of read activity
  Reduce the page size of scans
  Isolate scan operations; create separate tables and write to both:
  • Mission-critical table
  • Shadow table
• Take advantage of parallel scans
  Sequential scans take longer
Query and Scan Best Practices
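The parallel scan idea can be sketched with threads over an in-memory table. The Segment / TotalSegments names mirror the real Scan API's parameters; the modulo partitioning and the data itself are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

ITEMS = [{"Id": i} for i in range(1000)]  # stand-in for a table's items

def scan_segment(segment, total_segments):
    # Each worker scans a disjoint slice of the table, as with the
    # Scan API's Segment / TotalSegments parameters
    return [item for i, item in enumerate(ITEMS)
            if i % total_segments == segment]

TOTAL = 4
with ThreadPoolExecutor(max_workers=TOTAL) as pool:
    parts = list(pool.map(lambda s: scan_segment(s, TOTAL), range(TOTAL)))

scanned = [item for part in parts for item in part]
print(len(scanned))  # every item is scanned exactly once across segments
```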
Quick Poll + Questions?
Thanks for joining!