Under the Covers of DynamoDB Philip Fitzsimons Manager, Solution Architecture

Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Jan 27, 2015


In this presentation you'll learn about the decisions that went into designing and building Amazon DynamoDB, and how it allows you to stay focused on your application while enjoying single digit latencies at any scale. We'll dive deep on how to model data, maintain maximum throughput, and drive analytics against your data, while profiling real world use cases, tips and tricks from customers running on Amazon DynamoDB today.

Phil Fitzsimons, Solution Architect, AWS
Rob Greig, CTO, Royal Opera House
Transcript
Page 1: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Under the Covers of DynamoDB

Philip Fitzsimons

Manager, Solution Architecture

Page 2: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Overview

1. Getting started

2. Customer story: Royal Opera House

3. Data modeling

4. Partitioning

5. Reporting & Analytics

Page 3: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Getting started

Page 4: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

DynamoDB is a managed NoSQL database service.

Store and retrieve any amount of data.

Serve any level of request traffic.

Page 5: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Without the operational burden.

Page 6: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Consistent, predictable performance.

Single digit millisecond latency.

Backed by solid-state drives.

Page 7: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Flexible data model.

Key/attribute pairs. No schema required.

Easy to create. Easy to adjust.

Page 8: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Seamless scalability.

No table size limits. Unlimited storage.

No downtime.

Page 9: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Durable.

Consistent, disk-only writes.

Replication across data centers and availability zones.

Page 10: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Without the operational burden.

Page 11: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Focus on your app.

Page 12: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Two decisions + three clicks

= ready for use

Page 13: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Two decisions + three clicks

= ready for use

Primary keys

Level of throughput

Page 15: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Provisioned throughput.

Reserve IOPS for reads and writes.

Scale up or down at any time.

Page 16: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Pay per capacity unit.

Priced per hour of provisioned throughput.

Page 17: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Write throughput.

Size of item x writes per second

$0.0065 per hour for 10 write units
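As a rough illustration of the formula on this slide, the sketch below assumes that one write unit covers one write per second of an item up to 1 KB. The unit size is not stated on the slide, so treat it as an assumption, and the price is read as an hourly one. The function names are made up for this example.

```python
import math

WRITE_UNIT_KB = 1            # assumption: 1 unit = one write/sec of an item up to 1 KB
PRICE_PER_10_UNITS = 0.0065  # from the slide, read here as a price per hour

def write_units(item_size_kb, writes_per_second):
    """Provisioned write units = item size (rounded up to whole units) x writes per second."""
    return math.ceil(item_size_kb / WRITE_UNIT_KB) * writes_per_second

def hourly_write_cost(units):
    """Scale the slide's price for 10 units linearly to the requested number of units."""
    return units / 10 * PRICE_PER_10_UNITS

units = write_units(item_size_kb=2.5, writes_per_second=100)
print(units, round(hourly_write_cost(units), 4))
```

Rounding the item size up before multiplying is the point to notice: a 2.5 KB item is sized as 3 units per write under this assumption, not 2.5.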

Page 18: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Consistent writes.

Atomic increment and decrement.

Optimistic concurrency control: conditional writes.

Page 19: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Transactions.

Item level transactions only.

Puts, updates and deletes are ACID.

Page 20: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Read throughput.

Strong or eventual consistency

Page 21: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Read throughput.

Strong or eventual consistency

Provisioned units = size of item x reads per second

$0.0065 per hour for 50 units

Page 22: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Read throughput.

Strong or eventual consistency

Provisioned units = size of item x reads per second

$0.0065 per hour for 100 units (2x with eventual consistency)

Page 23: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Read throughput.

Strong or eventual consistency

Same latency expectations.

Mix and match at ‘read time’.
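The read-side arithmetic can be sketched the same way. This example assumes that one read unit covers one strongly consistent read per second of an item up to 4 KB, and that an eventually consistent read consumes half as much, which matches the 50 versus 100 units shown for the same price. The unit size and the function name are assumptions, not taken from the slides.

```python
import math

READ_UNIT_KB = 4  # assumption: 1 unit = one strongly consistent read/sec of up to 4 KB

def read_units(item_size_kb, reads_per_second, consistent=True):
    """Provisioned read units = item size (rounded up to whole units) x reads per second.

    Eventually consistent reads are modelled as costing half as much.
    """
    units = math.ceil(item_size_kb / READ_UNIT_KB) * reads_per_second
    return units if consistent else math.ceil(units / 2)

print(read_units(6, 100, consistent=True))
print(read_units(6, 100, consistent=False))
```

Because consistency is chosen per request, an application can pay the strong-consistency price only on the reads that need it.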

Page 24: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Customer story

Royal Opera House

Page 25: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Royal Opera House

Rob Greig

CTO @rob_greig

Page 26: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Strategy

Innovation using open data and collaborative working

Web 3.0?

Page 27: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

>600 pages

Inconsistent structure

…and this doesn’t include microsites

Page 28: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Manual interlinking

Poor discoverability

Page 29: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Coherent and consistent linking throughout the complete production lifecycle

Page 30: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

iPad

Page 31: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Open Data

Page 32: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Page 33: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Page 34: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Customer Service

Page 35: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Page 36: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

DynamoDB

Page 37: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Page 38: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Page 39: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Page 40: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB
Page 41: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Provisioned throughput is managed by DynamoDB.

Page 42: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Data is partitioned and managed by DynamoDB.

Page 43: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Reserved capacity.

Save up to 53% with a 1-year reservation.

Save up to 76% with a 3-year reservation.

Page 44: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Authentication.

Session based to minimize latency.

Uses the Amazon Security Token Service.

Handled by AWS SDKs.

Integrates with IAM.

Page 45: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Monitoring.

CloudWatch metrics: latency, consumed read and write throughput, errors and throttling.

Page 46: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Libraries, mappers and mocks.

ColdFusion, Django, Erlang, Java, .Net, Node.js, Perl, PHP, Python, Ruby

http://j.mp/dynamodb-libs

Page 47: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Data Modeling

Page 48: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

id = 100 date = 2012-05-16-09-00-10 total = 25.00

id = 101 date = 2012-05-15-15-00-11 total = 35.00

id = 101 date = 2012-05-16-12-00-10 total = 100.00

id = 102 date = 2012-03-20-18-23-10 total = 20.00

id = 102 date = 2012-03-20-18-23-10 total = 120.00

Page 49: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

id = 100 date = 2012-05-16-09-00-10 total = 25.00

id = 101 date = 2012-05-15-15-00-11 total = 35.00

id = 101 date = 2012-05-16-12-00-10 total = 100.00

id = 102 date = 2012-03-20-18-23-10 total = 20.00

id = 102 date = 2012-03-20-18-23-10 total = 120.00

Table

Page 50: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

id = 100 date = 2012-05-16-09-00-10 total = 25.00

id = 101 date = 2012-05-15-15-00-11 total = 35.00

id = 101 date = 2012-05-16-12-00-10 total = 100.00

id = 102 date = 2012-03-20-18-23-10 total = 20.00

id = 102 date = 2012-03-20-18-23-10 total = 120.00

Item

Page 51: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

id = 100 date = 2012-05-16-09-00-10 total = 25.00

id = 101 date = 2012-05-15-15-00-11 total = 35.00

id = 101 date = 2012-05-16-12-00-10 total = 100.00

id = 102 date = 2012-03-20-18-23-10 total = 20.00

id = 102 date = 2012-03-20-18-23-10 total = 120.00

Attribute

Page 52: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Where is the schema?

Tables do not require a formal schema.

Items are an arbitrarily sized hash.

Page 53: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Indexing.

Items are indexed by primary and secondary keys.

Primary keys can be composite.

Secondary keys index on other attributes.

Page 54: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

ID Date Total

id = 100 date = 2012-05-16-09-00-10 total = 25.00

id = 101 date = 2012-05-15-15-00-11 total = 35.00

id = 101 date = 2012-05-16-12-00-10 total = 100.00

id = 102 date = 2012-03-20-18-23-10 total = 20.00

id = 102 date = 2012-03-20-18-23-10 total = 120.00

Page 55: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

ID Date Total

id = 100 date = 2012-05-16-09-00-10 total = 25.00

id = 101 date = 2012-05-15-15-00-11 total = 35.00

id = 101 date = 2012-05-16-12-00-10 total = 100.00

id = 102 date = 2012-03-20-18-23-10 total = 20.00

id = 102 date = 2012-03-20-18-23-10 total = 120.00

Hash key

Page 56: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

ID Date Total

id = 100 date = 2012-05-16-09-00-10 total = 25.00

id = 101 date = 2012-05-15-15-00-11 total = 35.00

id = 101 date = 2012-05-16-12-00-10 total = 100.00

id = 102 date = 2012-03-20-18-23-10 total = 20.00

id = 102 date = 2012-03-20-18-23-10 total = 120.00

Hash key Range key

Composite primary key

Page 57: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

ID Date Total

id = 100 date = 2012-05-16-09-00-10 total = 25.00

id = 101 date = 2012-05-15-15-00-11 total = 35.00

id = 101 date = 2012-05-16-12-00-10 total = 100.00

id = 102 date = 2012-03-20-18-23-10 total = 20.00

id = 102 date = 2012-03-20-18-23-10 total = 120.00

Hash key Range key Secondary range key
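The key layout on these slides can be written down as a CreateTable request. The sketch below only builds the request as a plain dictionary, in the shape the low-level AWS SDK clients accept. The table name `orders`, the index name `by_total` and the throughput values are hypothetical, and using `total` as the secondary range key is an assumption read from the slide's column order.

```python
def orders_table_definition(read_units=10, write_units=5):
    """Build a CreateTable request: hash key `id`, range key `date`,
    plus a local secondary index that re-sorts each id's items by `total`."""
    return {
        "TableName": "orders",  # hypothetical name
        "AttributeDefinitions": [
            {"AttributeName": "id", "AttributeType": "N"},
            {"AttributeName": "date", "AttributeType": "S"},
            {"AttributeName": "total", "AttributeType": "N"},
        ],
        # Composite primary key: hash key first, then range key.
        "KeySchema": [
            {"AttributeName": "id", "KeyType": "HASH"},
            {"AttributeName": "date", "KeyType": "RANGE"},
        ],
        # Secondary range key: same hash key, different sort attribute.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by_total",  # hypothetical name
                "KeySchema": [
                    {"AttributeName": "id", "KeyType": "HASH"},
                    {"AttributeName": "total", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # The second of the "two decisions": level of throughput.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }

definition = orders_table_definition()
print([k["KeyType"] for k in definition["KeySchema"]])
```

With an SDK such as boto3 installed and credentials configured, the dictionary would be passed along the lines of `client.create_table(**definition)`. Only key and index attributes are declared, since other attributes need no schema.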

Page 58: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Programming DynamoDB.

Small but perfectly formed API.

Page 59: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

CreateTable

UpdateTable

DeleteTable

DescribeTable

ListTables

Query

Scan

PutItem

GetItem

UpdateItem

DeleteItem

BatchGetItem

BatchWriteItem

Page 62: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Conditional updates.

PutItem, UpdateItem and DeleteItem accept an optional condition that must hold for the operation to proceed.

UpdateItem performs atomic increments.
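A conditional update combined with an atomic increment gives optimistic concurrency control. The sketch below builds the keyword arguments for an UpdateItem call using the expression parameters of the current low-level SDK clients. The `version` attribute and the function name are assumptions introduced for illustration; nothing on the slides prescribes them.

```python
def optimistic_update_params(table, key, new_total, expected_version):
    """Build an UpdateItem request that succeeds only if the stored
    version still matches the one read earlier."""
    return {
        "TableName": table,
        "Key": key,
        # SET the new value and atomically increment the version counter.
        "UpdateExpression": "SET #t = :total ADD #v :one",
        # Conditional write: rejected if another writer changed the item first.
        "ConditionExpression": "#v = :expected",
        # Placeholders avoid clashes with reserved words in attribute names.
        "ExpressionAttributeNames": {"#t": "total", "#v": "version"},
        "ExpressionAttributeValues": {
            ":total": {"N": str(new_total)},
            ":one": {"N": "1"},
            ":expected": {"N": str(expected_version)},
        },
    }

params = optimistic_update_params(
    "orders",  # hypothetical table name
    {"id": {"N": "101"}, "date": {"S": "2012-05-16-12-00-10"}},
    new_total=110,
    expected_version=3,
)
print(params["ConditionExpression"])
```

With boto3 this would be sent as `client.update_item(**params)`. When the condition fails, the service rejects the write with a conditional-check error and the caller re-reads the item and retries.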

Page 63: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

One API call, multiple items

BatchGet returns multiple items by key.

Throughput is measured by IO, not API calls.

BatchWrite performs up to 25 put or delete operations.
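Since a single BatchWrite call is capped at 25 put or delete operations, a larger set of writes has to be chunked on the client side. A minimal sketch follows; the helper name is made up for this example.

```python
def batches(items, size=25):
    """Yield consecutive slices of at most `size` items (25 is the BatchWrite cap)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

writes = [{"id": n} for n in range(60)]
print([len(batch) for batch in batches(writes)])
```

Batching reduces the number of API calls, not the throughput consumed: each item in a batch still counts against provisioned write capacity.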

Page 65: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Query vs Scan

Query for lookups by composite key.

Scan for full table scans, exports.

Both support pages and limits.

Maximum response is 1 MB in size.
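Because each response is capped in size, a Query or Scan that matches more data comes back in pages. The sketch below shows the usual loop, assuming the response shape of the current SDK clients: `Items` for the data, `LastEvaluatedKey` when more pages remain, and `ExclusiveStartKey` to resume. The `query_page` argument stands in for a real client call, and the two canned pages are illustrative only.

```python
def fetch_all(query_page, **params):
    """Collect items from every page of a Query or Scan.

    `query_page` is any callable that accepts the request parameters and
    returns one response page as a dictionary.
    """
    items = []
    while True:
        page = query_page(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the next request where the previous page stopped.
        params["ExclusiveStartKey"] = last_key

# Stand-in for a real client: two canned pages (illustrative only).
_pages = iter([
    {"Items": [1, 2], "LastEvaluatedKey": {"id": 2}},
    {"Items": [3]},
])
print(fetch_all(lambda **kwargs: next(_pages), TableName="orders"))
```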

Page 66: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Query patterns

Retrieve all items by hash key.

Range key conditions:

==, <, >, >=, <=, begins with, between.

Counts. Top and bottom n values.

Paged responses.
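These patterns map onto a handful of Query parameters. The sketch below builds a request for the most recent items under one hash key whose range key starts with a given prefix, using the expression syntax of the current low-level SDK clients. The table name `orders` and the function name are hypothetical.

```python
def latest_orders_params(order_id, date_prefix, limit=10):
    """Build a Query request: one hash key, a begins-with condition on the
    range key, newest first, at most `limit` items."""
    return {
        "TableName": "orders",  # hypothetical name
        "KeyConditionExpression": "#id = :id AND begins_with(#d, :prefix)",
        # `date` is a reserved word, so it is referenced through a placeholder.
        "ExpressionAttributeNames": {"#id": "id", "#d": "date"},
        "ExpressionAttributeValues": {
            ":id": {"N": str(order_id)},
            ":prefix": {"S": date_prefix},
        },
        "ScanIndexForward": False,  # descending range key order: "top n"
        "Limit": limit,
    }

print(latest_orders_params(101, "2012-05")["KeyConditionExpression"])
```

Flipping `ScanIndexForward` to `True` returns the bottom n values instead, and adding `"Select": "COUNT"` asks for a count rather than the items themselves.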

Page 67: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Mapping relationships.

EXAMPLE 1:

Page 68: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Players

user_id = mza location = Cambridge joined = 2011-07-04

user_id = jeffbarr location = Seattle joined = 2012-01-20

user_id = werner location = Worldwide joined = 2011-05-15

Page 69: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Players

user_id = mza location = Cambridge joined = 2011-07-04

user_id = jeffbarr location = Seattle joined = 2012-01-20

user_id = werner location = Worldwide joined = 2011-05-15

Scores

user_id = mza game = angry-birds score = 11,000

user_id = mza game = tetris score = 1,223,000

user_id = werner game = bejewelled score = 55,000

Page 70: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Players

user_id = mza location = Cambridge joined = 2011-07-04

user_id = jeffbarr location = Seattle joined = 2012-01-20

user_id = werner location = Worldwide joined = 2011-05-15

Scores

user_id = mza game = angry-birds score = 11,000

user_id = mza game = tetris score = 1,223,000

user_id = werner game = bejewelled score = 55,000

Leader boards

game = angry-birds score = 11,000 user_id = mza

game = tetris score = 1,223,000 user_id = mza

game = tetris score = 9,000,000 user_id = jeffbarr

Page 71: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Players

user_id = mza location = Cambridge joined = 2011-07-04

user_id = jeffbarr location = Seattle joined = 2012-01-20

user_id = werner location = Worldwide joined = 2011-05-15

Scores

user_id = mza game = angry-birds score = 11,000

user_id = mza game = tetris score = 1,223,000

user_id = werner game = bejewelled score = 55,000

Leader boards

game = angry-birds score = 11,000 user_id = mza

game = tetris score = 1,223,000 user_id = mza

game = tetris score = 9,000,000 user_id = jeffbarr

Query for scores by user

Page 72: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Players

user_id = mza location = Cambridge joined = 2011-07-04

user_id = jeffbarr location = Seattle joined = 2012-01-20

user_id = werner location = Worldwide joined = 2011-05-15

Scores

user_id = mza game = angry-birds score = 11,000

user_id = mza game = tetris score = 1,223,000

user_id = werner game = bejewelled score = 55,000

Leader boards

game = angry-birds score = 11,000 user_id = mza

game = tetris score = 1,223,000 user_id = mza

game = tetris score = 9,000,000 user_id = jeffbarr

High scores by game
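The idea behind keeping both a Scores and a Leader boards table is that each one is keyed for a single access pattern. The in-memory sketch below mimics that layout with plain dictionaries: one structure keyed by user, one keyed by game and sorted by score. It is an illustration of the modeling choice, not DynamoDB code, and all names are made up.

```python
from collections import defaultdict

scores = defaultdict(dict)       # hash key user_id, range key game -> score
leaderboard = defaultdict(list)  # hash key game, range key score -> user_id

def record_score(user_id, game, score):
    """Write the same fact to both structures, one per access pattern."""
    scores[user_id][game] = score
    leaderboard[game].append((score, user_id))

def scores_by_user(user_id):
    """'Query for scores by user': a lookup on the Scores hash key."""
    return dict(scores[user_id])

def high_scores(game, n=10):
    """'High scores by game': a lookup on the game hash key, highest score first."""
    return sorted(leaderboard[game], reverse=True)[:n]

record_score("mza", "angry-birds", 11000)
record_score("mza", "tetris", 1223000)
record_score("jeffbarr", "tetris", 9000000)
record_score("werner", "bejewelled", 55000)

print(scores_by_user("mza"))
print(high_scores("tetris", n=1))
```

The cost of this design is a second write per score; the benefit is that neither question requires scanning the other table.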

Page 73: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Storing large items.

EXAMPLE 2:

Page 74: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Unlimited storage.

Unlimited attributes per item.

Unlimited items per table.

Maximum of 64 KB per item.

Page 75: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

message_id = 1 part = 1 message = <first 64k>

message_id = 1 part = 2 message = <second 64k>

message_id = 1 part = 3 message = <third 64k>

Split across items.
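Splitting a large message across items with `message_id` as hash key and `part` as range key can be sketched as follows. The 64 KB figure is the per-item limit quoted on the slides; the helper names are hypothetical, and a real implementation would pick a smaller part size, because attribute names and keys also count toward the item size.

```python
MAX_PART_BYTES = 64 * 1024  # per-item limit from the slides; leave headroom in practice

def split_message(message_id, body, part_size=MAX_PART_BYTES):
    """Return one item per chunk, numbered from 1 in the `part` range key."""
    return [
        {"message_id": message_id, "part": n + 1, "message": body[start:start + part_size]}
        for n, start in enumerate(range(0, len(body), part_size))
    ]

def join_message(parts):
    """Reassemble the body by reading the parts in range key order."""
    return b"".join(p["message"] for p in sorted(parts, key=lambda p: p["part"]))

body = b"x" * 150_000
parts = split_message(1, body)
print(len(parts), join_message(parts) == body)
```

Since all parts share the same hash key, a single Query on `message_id` returns them already sorted by `part`.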

Page 76: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

message_id = 1 message = http://s3.amazonaws.com...

message_id = 2 message = http://s3.amazonaws.com...

message_id = 3 message = http://s3.amazonaws.com...

Store a pointer to S3.

Page 77: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Time series data

EXAMPLE 3:

Page 78: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Hot and cold tables.

April

event_id = 1000 timestamp = 2013-04-16-09-59-01 key = value

event_id = 1001 timestamp = 2013-04-16-09-59-02 key = value

event_id = 1002 timestamp = 2013-04-16-09-59-02 key = value

March

event_id = 1000 timestamp = 2013-03-01-09-59-01 key = value

event_id = 1001 timestamp = 2013-03-01-09-59-02 key = value

event_id = 1002 timestamp = 2013-03-01-09-59-02 key = value

Page 79: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

April March February January December
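With one table per month, the application has to route each event to the right table. A minimal sketch, assuming the timestamp format shown on the slides; the `events_YYYY_MM` naming scheme is hypothetical.

```python
from datetime import datetime

def table_for(timestamp, prefix="events"):
    """Map a slide-style timestamp (YYYY-MM-DD-HH-MM-SS) to its monthly table name."""
    when = datetime.strptime(timestamp, "%Y-%m-%d-%H-%M-%S")
    return f"{prefix}_{when:%Y_%m}"

print(table_for("2013-04-16-09-59-01"))
print(table_for("2013-03-01-09-59-02"))
```

The current month's table takes most of the traffic and gets high provisioned throughput, while older tables can be dialed down or archived.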

Page 80: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Archive data.

Move old data to S3: lower cost.

Still available for analytics.

Run queries across hot and cold data with Elastic MapReduce.

Page 81: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Partitioning

Page 82: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Uniform workload.

Data stored across multiple partitions.

Data is primarily distributed by primary key.

Provisioned throughput is divided evenly across partitions.

Page 83: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

To achieve and maintain full provisioned throughput, spread workload evenly across hash keys.

Page 84: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Non-Uniform workload.

Might be throttled, even at high levels of throughput.
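A small calculation shows why a skewed workload can be throttled even when total demand is within the provisioned level. This is a deliberately simplified model that assumes throughput is split exactly evenly across partitions and ignores any bursting; the function name and the numbers are made up.

```python
def throttled_requests(provisioned_units, partitions, demand_by_partition):
    """Return, per partition, how much demand exceeds that partition's even share."""
    per_partition = provisioned_units / partitions
    return {
        partition: max(0, demand - per_partition)
        for partition, demand in demand_by_partition.items()
    }

# Total demand (1000) equals the provisioned level, but 70% hits one partition.
demand = {0: 700, 1: 100, 2: 100, 3: 100}
print(throttled_requests(1000, 4, demand))
```

Each partition only gets 250 units here, so the hot partition is over its share by 450 while the other three sit mostly idle.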

Page 85: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Distinct values for hash keys.

BEST PRACTICE 1:

Hash key elements should have a high number of distinct values.

Page 86: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

user_id = mza first_name = Matt last_name = Wood

user_id = jeffbarr first_name = Jeff last_name = Barr

user_id = werner first_name = Werner last_name = Vogels

user_id = simone first_name = Simone last_name = Brunozzi

... ... ...

Lots of users with unique user_id.

Workload well distributed across hash key.

Page 87: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Avoid limited hash key values.

BEST PRACTICE 2:

Hash key elements should have a high number of distinct values.

Page 88: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

status = 200 date = 2012-04-01-00-00-01

status = 404 date = 2012-04-01-00-00-01

status = 404 date = 2012-04-01-00-00-01

status = 404 date = 2012-04-01-00-00-01

Small number of status codes.

Uneven, non-uniform workload.

Page 89: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Model for even distribution.

BEST PRACTICE 3:

Access by hash key value should be evenly distributed across the dataset.

Page 90: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

mobile_id = 100 access_date = 2012-04-01-00-00-01

mobile_id = 100 access_date = 2012-04-01-00-00-02

mobile_id = 100 access_date = 2012-04-01-00-00-03

mobile_id = 100 access_date = 2012-04-01-00-00-04

... ...

Large number of devices.

A small number are much more popular than the others.

Workload unevenly distributed.

Page 91: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

mobile_id = 100.1 access_date = 2012-04-01-00-00-01

mobile_id = 100.2 access_date = 2012-04-01-00-00-02

mobile_id = 100.3 access_date = 2012-04-01-00-00-03

mobile_id = 100.4 access_date = 2012-04-01-00-00-04

... ...

Sample access pattern.

Workload randomized by hash key.
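The `100.1`, `100.2`, ... keys above suggest appending a suffix to a popular hash key so that its writes spread over several key values. A minimal sketch of that idea follows; the shard count of 10 and the helper names are assumptions, not taken from the slides.

```python
import random

def sharded_key(mobile_id, shards=10, rng=random):
    """Pick one of `shards` suffixed keys at random for a write."""
    return f"{mobile_id}.{rng.randint(1, shards)}"

def all_shard_keys(mobile_id, shards=10):
    """List every suffixed key, for reads that need the device's full history."""
    return [f"{mobile_id}.{n}" for n in range(1, shards + 1)]

print(sharded_key(100) in all_shard_keys(100))
print(all_shard_keys(100, shards=4))
```

The trade-off is on the read side: fetching everything for one device now means querying each suffixed key and merging the results.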

Page 92: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Reporting & Analytics

Page 93: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Seamless scale.

Scalable methods for data processing.

Scalable methods for backup/restore.

Page 94: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Amazon Elastic MapReduce.

Managed Hadoop service for data-intensive workflows.

aws.amazon.com/emr

Page 95: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

create external table items_db
  (id string, votes bigint, views bigint)
stored by 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
tblproperties (
  "dynamodb.table.name" = "items",
  "dynamodb.column.mapping" = "id:id,votes:votes,views:views");

Page 96: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

select id, votes, views
from items_db
order by views desc;

Page 97: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Summary

1. Getting started

2. Customer story: Royal Opera House

3. Data modeling

4. Partitioning

5. Reporting & Analytics

Page 98: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

Free tier.

Page 99: Data & Analytics - Session 3 - Under the Covers with Amazon DynamoDB

aws.amazon.com/dynamodb