Getting Started with Amazon DynamoDB
Jim Scharf, General Manager, DynamoDB
Time: 10:10 – 10:50
AGENDA
• Brief history of data processing
• Relational (SQL) vs. Non-relational (NoSQL)
• DynamoDB tables & indexes
• Scaling
• Integration and Search Capabilities
• Pricing and Free Tier
• Customer Use Cases
Timeline of Database Technology
Data Volume Since 2010
• 90% of stored data was generated in the last 2 years
• 1 terabyte of data in 2010 equates to 6.5 petabytes today
• Linear correlation between data pressure and technical innovation
• No reason these trends will not continue over time
Technology Adoption and the Hype Curve
Relational (SQL) vs. Non-relational (NoSQL)
Amazon’s Path to DynamoDB
RDBMS → DynamoDB
Relational vs. Non-relational Databases
Traditional SQL: a primary and a secondary database that scale up. NoSQL: many databases that scale out.
Why NoSQL?
SQL                        NoSQL
Optimized for storage      Optimized for compute
Normalized/relational      Denormalized/hierarchical
Ad hoc queries             Instantiated views
Scale vertically           Scale horizontally
Good for OLAP              Built for OLTP at scale
SQL vs. NoSQL Schema Design
NoSQL design optimizes for compute instead of storage.
NoSQL Opportunity
Evolution of Databases
Amazon DynamoDB
Fully Managed
Low Cost
Predictable Performance
Massively Scalable
Highly Available
Consistently Low Latency At Scale
PREDICTABLE PERFORMANCE!
High Availability and Durability
WRITES: replicated continuously to 3 AZs; persisted to disk (custom SSD)
READS: strongly or eventually consistent, with no latency trade-off
Designed to support 99.99% availability
Built for high durability
How DynamoDB Scales
A table is automatically split across partitions 1..N. DynamoDB automatically partitions data:
• The partition key spreads data (and workload) across partitions
• DynamoDB automatically adds partitions as data grows and throughput needs increase
A large number of unique hash keys + uniform distribution of workload across hash keys = high-scale apps
Flexibility and Low Cost
Reads per second and writes per second are configured per table:
• Customers can configure a table for just a few RPS or for hundreds of thousands of RPS
• Customers only pay for how much they provision
• Provides maximum flexibility to adjust expenditure based on the workload (see the sketch below)
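As a minimal sketch of that flexibility, adjusting provisioned throughput is a single UpdateTable call. This assumes boto3 and a hypothetical table named GameScores; the capacity numbers are illustrative:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Dial a (hypothetical) table up for a traffic spike, then back down later;
# you pay only for the capacity you have provisioned at any given time.
dynamodb.update_table(
    TableName="GameScores",
    ProvisionedThroughput={
        "ReadCapacityUnits": 10000,
        "WriteCapacityUnits": 2000,
    },
)
```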
Fully managed service = automated operations
• DB hosted on premises vs. DB hosted on Amazon EC2
• DB hosted on premises vs. DynamoDB
DynamoDB Tables & Indexes
DynamoDB Table Structure
A table contains items; items contain attributes.
Partition key (mandatory):
• Key-value access pattern
• Determines data distribution
Sort key (optional):
• Models 1:N relationships
• Enables rich query capabilities: all items for a key; ==, <, >, >=, <=; "begins with"; "between"; "contains"; "in"; sorted results; counts; top/bottom N values
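To make the structure concrete, here is a minimal boto3 sketch that creates a table with a partition key and a sort key. The table name "Orders" is hypothetical; the attribute names mirror the order example on the next slide:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# "Customer#" is the partition key (HASH), "Order#" the sort key (RANGE).
dynamodb.create_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "Customer#", "AttributeType": "N"},
        {"AttributeName": "Order#", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "Customer#", "KeyType": "HASH"},   # partition key
        {"AttributeName": "Order#", "KeyType": "RANGE"},     # sort key
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```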
Partition Keys
The partition key uniquely identifies an item
The partition key is used for building an unordered hash index
Allows the table to be partitioned for scale
Example: each item's key is hashed into the 00–FF key space:
• Id = 1, Name = Jim → Hash(1) = 7B
• Id = 2, Name = Andy, Dept = Eng → Hash(2) = 48
• Id = 3, Name = Kim, Dept = Ops → Hash(3) = CD
Partition:Sort Key
A partition:sort key uses two attributes together to uniquely identify an item
Within the unordered hash index, data is arranged by the sort key
No limit on the number of items (∞) per partition key, except if you have local secondary indexes
Example: orders hashed by Customer# and sorted by Order# within each partition:
• Partition 1, Hash(1) = 7B: (Customer# = 1, Order# = 10, Item = Toy), (Customer# = 1, Order# = 11, Item = Boots)
• Partition 2, Hash(2) = 48: (Customer# = 2, Order# = 10, Item = Pen), (Customer# = 2, Order# = 11, Item = Shoes)
• Partition 3, Hash(3) = CD: (Customer# = 3, Order# = 10, Item = Book), (Customer# = 3, Order# = 11, Item = Paper)
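A sketch of how those orders could be queried, assuming the hypothetical "Orders" table from the earlier sketch. The partition key condition selects one customer; the sort key enables the rich conditions listed above:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Orders")

# All orders for customer 2, returned in sort-key (Order#) order.
resp = table.query(KeyConditionExpression=Key("Customer#").eq(2))
for order in resp["Items"]:
    print(order["Order#"], order["Item"])

# Sort-key conditions enable range queries, e.g. orders 10 through 11 only.
resp = table.query(
    KeyConditionExpression=Key("Customer#").eq(2) & Key("Order#").between(10, 11)
)
```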
Partitions are three-way replicated
Each partition (1..N) is continuously copied to three replicas; every replica holds the same items, for example:
• Id = 1, Name = Jim
• Id = 2, Name = Andy, Dept = Eng
• Id = 3, Name = Kim, Dept = Ops
Local secondary index (LSI)
An LSI is an alternate sort key attribute
The index is local to a partition key
Table key schema: A1 (partition), A2 (sort), with attributes A3, A4, A5
LSI projection options:
• KEYS_ONLY: A1 (partition), A3 (sort), A2 (item key)
• INCLUDE A3: A1 (partition), A4 (sort), A2 (item key), A3 (projected)
• ALL: A1 (partition), A5 (sort), A2 (item key), A3 (projected), A4 (projected)
10 GB max per partition key, i.e., LSIs limit the # of range keys! (A creation-time sketch follows.)
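LSIs can only be defined when the table is created. A minimal sketch using the slide's abstract A1..A3 schema (table and index names are hypothetical):

```python
import boto3

boto3.client("dynamodb").create_table(
    TableName="ExampleTable",
    AttributeDefinitions=[
        {"AttributeName": "A1", "AttributeType": "S"},
        {"AttributeName": "A2", "AttributeType": "S"},
        {"AttributeName": "A3", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "A1", "KeyType": "HASH"},
        {"AttributeName": "A2", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[{
        "IndexName": "A3-index",
        "KeySchema": [
            {"AttributeName": "A1", "KeyType": "HASH"},   # same partition key
            {"AttributeName": "A3", "KeyType": "RANGE"},  # alternate sort key
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```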
Global secondary index (GSI)
A GSI is an alternate partition and/or sort key
The index spans all partition keys
Table key schema: A1 (partition), with attributes A2, A3, A4, A5
GSI projection options:
• KEYS_ONLY: A2 (partition), A1 (item key)
• INCLUDE A3: A5 (partition), A4 (sort), A1 (item key), A3 (projected)
• ALL: A4 (partition), A5 (sort), A1 (item key), A2 (projected), A3 (projected)
RCUs/WCUs are provisioned separately for GSIs
Online indexing: GSIs can be added to (or removed from) a live table, as sketched below
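A minimal sketch of online indexing: adding a GSI to an existing table with its own provisioned capacity. Names again follow the slide's abstract schema; the table name is hypothetical:

```python
import boto3

# Add a GSI to a live table; the GSI gets its own RCUs/WCUs.
boto3.client("dynamodb").update_table(
    TableName="ExampleTable",
    AttributeDefinitions=[
        {"AttributeName": "A5", "AttributeType": "S"},
        {"AttributeName": "A4", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "A5-A4-index",
            "KeySchema": [
                {"AttributeName": "A5", "KeyType": "HASH"},   # alternate partition key
                {"AttributeName": "A4", "KeyType": "RANGE"},  # alternate sort key
            ],
            "Projection": {"ProjectionType": "INCLUDE",
                           "NonKeyAttributes": ["A3"]},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5,
                                      "WriteCapacityUnits": 5},
        }
    }],
)
```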
How do GSI updates work?
1. The client writes an item to the primary table.
2. DynamoDB asynchronously propagates the update to the global secondary index.
If GSIs don't have enough write capacity, table writes will be throttled!
LSI or GSI?
An LSI can be modeled as a GSI
If the data size in an item collection is > 10 GB, use a GSI
If eventual consistency is okay for your scenario, use a GSI!
Scaling
Throughput
• Provision any amount of throughput to a table
Size
• Add any number of items to a table
• Max item size is 400 KB
• LSIs limit the number of range keys due to the 10 GB limit
Scaling is achieved through partitioning
Throughput
Provisioned at the table level:
• Write capacity units (WCUs) are measured in 1 KB per second
• Read capacity units (RCUs) are measured in 4 KB per second
• RCUs measure strictly consistent reads; eventually consistent reads cost 1/2 of strictly consistent reads
Read and write throughput limits are independent (see the worked example below)
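A worked example of the capacity-unit arithmetic, as a small Python sketch. The item sizes and request rates are hypothetical:

```python
import math

def rcus_needed(item_size_kb: float, reads_per_sec: float,
                eventually_consistent: bool = False) -> int:
    """One RCU = one strongly consistent 4 KB read per second."""
    units = math.ceil(item_size_kb / 4) * reads_per_sec
    if eventually_consistent:
        units /= 2  # eventually consistent reads cost half
    return math.ceil(units)

def wcus_needed(item_size_kb: float, writes_per_sec: float) -> int:
    """One WCU = one 1 KB write per second."""
    return math.ceil(math.ceil(item_size_kb / 1) * writes_per_sec)

# 100 strongly consistent reads/sec of 7 KB items: ceil(7/4) = 2 RCUs each.
print(rcus_needed(7, 100))        # 200
print(rcus_needed(7, 100, True))  # 100 (eventually consistent)
# 10 writes/sec of 2.5 KB items: ceil(2.5/1) = 3 WCUs each.
print(wcus_needed(2.5, 10))       # 30
```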
Partitioning math
In the future, these details might change…
Number of partitions:
• By capacity: (Total RCU / 3000) + (Total WCU / 1000)
• By size: Total size / 10 GB
• Total partitions: CEILING(MAX(capacity, size))
Partitioning example: table size = 8 GB, RCUs = 5,000, WCUs = 500
• By capacity: (5000 / 3000) + (500 / 1000) = 2.17
• By size: 8 / 10 = 0.8
• Total partitions: CEILING(MAX(2.17, 0.8)) = 3
RCUs and WCUs are uniformly spread across partitions:
• RCUs per partition = 5000/3 = 1666.67
• WCUs per partition = 500/3 = 166.67
• Data per partition = 8/3 ≈ 2.67 GB
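The same formula as a runnable sketch (remembering the slide's caveat that these internal details might change):

```python
import math

def total_partitions(table_size_gb: float, rcus: int, wcus: int) -> int:
    # Partition count is the max of the capacity-driven and size-driven needs.
    by_capacity = rcus / 3000 + wcus / 1000
    by_size = table_size_gb / 10
    return math.ceil(max(by_capacity, by_size))

print(total_partitions(8, 5000, 500))  # CEILING(MAX(2.17, 0.8)) = 3
```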
To learn more, please attend:
Deep Dive on DynamoDB, Room E450a, 11:45 AM – 12:45 PM
Rick Houlihan, Principal Solutions Architect
Integration Capabilities
DynamoDB Triggers
• Implemented as AWS Lambda functions
• Your code scales automatically
• Java, Node.js, and Python
DynamoDB Streams
• Stream of table updates
• Asynchronous
• Exactly once
• Strictly ordered
• 24-hr lifetime per item
A minimal trigger sketch follows.
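A minimal sketch of a DynamoDB trigger: a Python Lambda handler invoked with a batch of stream records. The attribute names are hypothetical, reusing the earlier order example:

```python
def lambda_handler(event, context):
    # Each invocation receives a strictly ordered batch of stream records.
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            new_image = record["dynamodb"]["NewImage"]
            # Stream images use DynamoDB's typed JSON format, e.g. {"N": "2"}.
            customer = new_image["Customer#"]["N"]
            print(f"New order for customer {customer}")
    return {"processed": len(event["Records"])}
```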
Integration Capabilities (cont’d)
• Elasticsearch integration with full-text queries:
  - Add search to mobile apps
  - Monitor IoT sensor status codes
  - Discover patterns in app telemetry using regular expressions
• Fine-grained access control via AWS IAM:
  - Table-, item-, and attribute-level access control (a policy sketch follows)
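The slide doesn't name the exact mechanism, but item-level control is commonly expressed with the IAM dynamodb:LeadingKeys condition key, which restricts a caller to items whose partition key matches their identity. A hedged sketch; the account ID, table name, and policy name are hypothetical:

```python
import json
import boto3

# Allow access only to items whose partition key equals the caller's
# Cognito identity (item-level access control).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/UserData",
        "Condition": {
            "ForAllValues:StringEquals": {
                "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
            }
        }
    }]
}

boto3.client("iam").create_policy(
    PolicyName="DynamoDBItemLevelAccess",
    PolicyDocument=json.dumps(policy),
)
```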
Connect to other AWS Data Stores
Customer Use Cases
• Over 200 million users; over 4 billion items stored
• Millions of ads per month; cross-device ad solutions
• 130+ million new users in 1 year; 150+ million messages per month
• Processes requests in milliseconds; high-performance ads
• Statcast uses burst scalability for many games on a single day
• Flexibility for fast growth; web clickstream insights
• Specialty online & retail stores
• Over 5 billion items processed daily
• About 200 million messages processed daily
• Cognitive training
• Job-matching platform; 5+ million registered users
• Mobile game analytics; 10M global users
• Home security
• Wearable and IoT solutions
• 170,000 concurrent players
The Climate Corporation (TCC) Scales with Amazon DynamoDB
The Climate Corporation is a San Francisco-based company that examines weather data to help farmers optimize their decision-making.
"The elasticity of DynamoDB read/write ops made DynamoDB the fastest and most efficient solution to achieve our high ingest rate."
Mohamed Ahmed, Director of Engineering, Site Reliability Engineering & Data Analytics, The Climate Corporation
• Climate is digitizing agriculture, helping farmers increase their yields and productivity using scientific and mathematical models on top of massive amounts of data
• Weather and satellite imagery are one large source of data used in TCC's calculations
• TCC uses DynamoDB to ingest bursts of data and satellite images retrieved from third parties before processing them
• TCC goes from a few read/write ops to thousands each day to keep up with the bursts of data written to and read from its main DynamoDB tables
Thank you!