Page 1
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Amazon Redshift Overview & What’s Next
Rahul Pathak, Redshift PM
Anurag Gupta, Redshift GM
November 13, 2013
Page 2
Amazon Redshift
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
Page 3
Amazon Redshift dramatically reduces I/O
• Data compression
• Zone maps
• Direct-attached storage
• With row storage you do unnecessary I/O
• To get total amount, you have to read everything
ID   Age  State  Amount
123  20   CA     500
345  25   WA     250
678  40   FL     125
957  37   WA     375
Page 4
• With column storage, you only read the data you need
ID   Age  State  Amount
123  20   CA     500
345  25   WA     250
678  40   FL     125
957  37   WA     375
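The difference between the two layouts can be sketched with a toy model of the sample table. This is only an illustration: the fixed byte widths per column are assumptions made for the arithmetic, and the functions are hypothetical helpers, not Redshift's storage engine.

```python
# Toy model of the sample table; byte widths are assumed for illustration only.
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]
COLUMNS = ("ID", "Age", "State", "Amount")
WIDTH_BYTES = {"ID": 4, "Age": 4, "State": 2, "Amount": 4}  # assumed fixed widths


def row_store_scan(rows, wanted):
    """Row layout: every field of every row is read to reach one column."""
    idx = COLUMNS.index(wanted)
    bytes_read = len(rows) * sum(WIDTH_BYTES.values())
    return sum(r[idx] for r in rows), bytes_read


def column_store_scan(rows, wanted):
    """Column layout: only the requested column's values are read."""
    idx = COLUMNS.index(wanted)
    column = [r[idx] for r in rows]  # stands in for a contiguous column on disk
    bytes_read = len(column) * WIDTH_BYTES[wanted]
    return sum(column), bytes_read


row_total, row_bytes = row_store_scan(ROWS, "Amount")
col_total, col_bytes = column_store_scan(ROWS, "Amount")
print("row store:   ", row_total, "total,", row_bytes, "bytes read")
print("column store:", col_total, "total,", col_bytes, "bytes read")
```

Both scans return the same total; under these assumed widths the column scan touches only the Amount bytes, while the row scan touches every field.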
Amazon Redshift dramatically reduces I/O
Page 5
analyze compression listing;

 Table   | Column         | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
Slides not intended for redistribution.
Amazon Redshift dramatically reduces I/O
• COPY compresses automatically on load
• You can analyze and override
• More performance, less cost
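To give a feel for why an encoding such as delta suits a near-sequential column like listid, here is a minimal sketch of the general delta-encoding idea. It is not Redshift's on-disk format; the function names and the sample IDs are made up for illustration.

```python
def delta_encode(values):
    """Keep the first value, then store each value as a difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original values with a running sum."""
    out, total = [], 0
    for d in encoded:
        total += d
        out.append(total)
    return out


list_ids = [100001, 100002, 100003, 100005, 100008]  # made-up, near-sequential IDs
encoded = delta_encode(list_ids)
print(encoded)
print(delta_decode(encoded) == list_ids)
```

After the first entry, the encoded values are small differences that fit in far fewer bits than the original IDs, which is where the space saving comes from.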
Page 6
Amazon Redshift dramatically reduces I/O
Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324   (min 10, max 324)
Block 2: 375 | 393 | 417 | … | 512 | 549 | 623   (min 375, max 623)
Block 3: 637 | 712 | 809 | … | 834 | 921 | 959   (min 637, max 959)
• Track the minimum and maximum value for each block
• Skip over blocks that don’t contain relevant data
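The block-skipping idea can be sketched as follows. The block contents reuse the values visible on the slide (elided values are simply left out), and `scan_range` is a hypothetical helper written for this illustration, not a Redshift interface.

```python
# Blocks of sorted values; visible numbers come from the slide, elided values are omitted.
BLOCKS = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# The zone map records the minimum and maximum value of each block.
ZONE_MAP = [(min(b), max(b)) for b in BLOCKS]


def scan_range(lo, hi):
    """Return the values in [lo, hi] and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for (bmin, bmax), block in zip(ZONE_MAP, BLOCKS):
        if bmax < lo or bmin > hi:
            continue  # the block cannot contain relevant data, so skip it
        blocks_read += 1
        matches.extend(v for v in block if lo <= v <= hi)
    return matches, blocks_read


print(ZONE_MAP)
print(scan_range(400, 600))
```

For a predicate such as "between 400 and 600", only the second block overlaps the range, so the other two are never read.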
Page 7
Amazon Redshift dramatically reduces I/O
DW.HS1.8XL:
• > 2 GB/s scan rate
• Optimized for data processing
• High disk density
DW.HS1.XL:
Page 8
Amazon Redshift architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB
• Single node version available
[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore]
Page 9
Amazon Redshift parallelizes and distributes everything
• Load
• Backup/Restore
• Resize
Page 10
• Load in parallel from Amazon S3 or Amazon DynamoDB
• Data automatically distributed and sorted according to DDL
• Scales linearly with number of nodes
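As a rough sketch of what "distributed and sorted according to DDL" can mean, the example below hashes a distribution key to pick a node and then orders each node's rows by a sort key. The hash function (CRC32), the node count, and the `distribute` helper are assumptions for illustration; the slides do not describe how Redshift assigns rows internally.

```python
import zlib
from collections import defaultdict


def distribute(rows, dist_key, sort_key, num_nodes):
    """Assign each row to a node by hashing dist_key, then sort each node's rows by sort_key."""
    nodes = defaultdict(list)
    for row in rows:
        # CRC32 is an assumed stand-in for whatever hash the service really uses.
        h = zlib.crc32(str(row[dist_key]).encode("utf-8"))
        nodes[h % num_nodes].append(row)
    return {n: sorted(rs, key=lambda r: r[sort_key]) for n, rs in nodes.items()}


rows = [
    {"id": 123, "state": "CA", "amount": 500},
    {"id": 345, "state": "WA", "amount": 250},
    {"id": 678, "state": "FL", "amount": 125},
    {"id": 957, "state": "WA", "amount": 375},
]

placement = distribute(rows, dist_key="state", sort_key="amount", num_nodes=2)
for node, node_rows in sorted(placement.items()):
    print(node, [r["id"] for r in node_rows])
```

Rows that share a distribution key value always land on the same node, and because each node holds only its own share of the rows, nodes can load and sort in parallel.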
Amazon Redshift parallelizes and distributes everything
Page 11
• Backups to Amazon S3 are automatic, continuous and incremental
• Configurable system snapshot retention period
• Take user snapshots on-demand
• Streaming restores enable you to resume querying faster
Amazon Redshift parallelizes and distributes everything
Page 12
• Resize while remaining online
• Provision a new cluster in the background
• Copy data in parallel from node to node
• Only charged for source cluster
Amazon Redshift parallelizes and distributes everything
Page 13
• Automatic SQL endpoint switchover via DNS
• Decommission the source cluster
• Simple operation via Console or API
Amazon Redshift parallelizes and distributes everything
Page 14
Amazon Redshift lets you start small and grow big
Extra Large Node (DW.HS1.XL)
• 3 spindles, 2 TB, 16 GB RAM, 2 cores
• Single Node (2 TB)
• Cluster 2-32 Nodes (4 TB – 64 TB)
Eight Extra Large Node (DW.HS1.8XL)
• 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
• Cluster 2-100 Nodes (32 TB – 1.6 PB)
Note: Nodes not to scale
Page 15
Amazon Redshift is priced to let you analyze all your data
                     Price per hour          Effective hourly   Effective annual
                     (HS1.XL single node)    price per TB       price per TB
On-Demand            $0.850                  $0.425             $3,723
1 Year Reservation   $0.500                  $0.250             $2,190
3 Year Reservation   $0.228                  $0.114             $999
Simple Pricing
• Number of Nodes x Cost per Hour
• No charge for Leader Node
• No upfront costs
• Pay as you go
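The per-TB figures follow from the hourly node price, the 2 TB capacity of a DW.HS1.XL node, and 8,760 hours in a year. The sketch below reproduces that arithmetic; the function names are hypothetical, and the tabulated hourly rates are taken at face value.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760
NODE_TB = 2  # DW.HS1.XL capacity from the slides

# Hourly price per HS1.XL node, as tabulated above.
HOURLY_PRICE = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_prices(price_per_node_hour, tb_per_node=NODE_TB):
    """Return (effective hourly price per TB, effective annual price per TB)."""
    per_tb_hour = price_per_node_hour / tb_per_node
    per_tb_year = per_tb_hour * HOURS_PER_YEAR
    return per_tb_hour, per_tb_year


def cluster_cost_per_hour(num_nodes, price_per_node_hour):
    """Number of nodes x cost per hour; the leader node is not charged."""
    return num_nodes * price_per_node_hour


for plan, price in HOURLY_PRICE.items():
    per_tb_hour, per_tb_year = effective_prices(price)
    print(f"{plan}: ${per_tb_hour:.3f}/TB/hour, ${per_tb_year:,.0f}/TB/year")
```

The 3-year figure works out to $998.64 per TB per year before rounding, which is the basis for the "less than $1,000/TB/Year" claim.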
Page 16
Amazon Redshift has security built in
• SSL to secure data in transit
• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disk and in Amazon S3 encrypted
• No direct access to compute nodes
• Amazon VPC support
[Diagram labels: JDBC/ODBC; Customer VPC; Internal VPC; 10 GigE (HPC); Ingestion, Backup, Restore]
Page 17
Amazon Redshift automatically manages data replication and hardware failures
• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times
• Backups to Amazon S3 are continuous, automatic, and incremental
– Designed for eleven nines of durability
• Continuous monitoring and automated recovery from failures of drives and nodes
• Able to restore snapshots to any Availability Zone within a region
Page 18
Growing ecosystem
Page 19
AWS Marketplace
• Find software to use with Amazon Redshift
• One-click deployments
• Flexible pricing options
http://aws.amazon.com/marketplace
Page 20
Over 40 new features since launch on Feb 14
• Regions
– N. Virginia, Oregon, Dublin, Tokyo, Singapore, Sydney
• Certifications
– PCI, SOC 1/2/3
• Security
– Load/unload encrypted files, Resource-level IAM, Temporary credentials
• Manageability
– Snapshot sharing, backup/restore progress indicators
• Query
– Regex, Cursors, MD5, SHA1, Time zone, workload queue timeout
• Ingestion
– S3 Manifest, LZOP/LZO, JSON built-ins, UTF-8 4byte, invalid character substitution, CSV, auto datetime format detection, epoch
Page 21
Amazon Redshift – What’s Next
Page 22
Security, visibility and control
• Audit logging
• SNS Alerts
Redshift
Page 23
Visibility and control
Amazon S3
Amazon Redshift
Database Activity: Logins, Login failures, Queries, Loads
System Activity: Creates, Changes, Deletes, Resizes
AWS CloudTrail
Page 24
Visibility and control
Amazon Redshift
SNS Topic
Monitoring, Security, Maintenance, Errors
Page 25
Batch operations
• Cluster Creation
• Faster Resize
Amazon Redshift
Amazon S3
Amazon EMR
Amazon EC2
Corporate Data Center
Page 27
Batch operations
• Cluster Creation
• Faster Resize
15-20 min
3 min
Page 28
Batch operations
• Cluster Creation
• Faster Resize
29 hours
7 hours
Page 29
Performance & Concurrency
Page 30
Performance & Concurrency
692.8s
34.9s
< 2%
Page 31
Performance & Concurrency
5,951.7s
2,151.9s
Page 32
Performance & Concurrency
15
50
Page 33
Service Launch (2/14)
PDX (4/2)
Temp Credentials (4/11)
Unload Encrypted Files
DUB (4/25)
NRT (6/5)
JDBC Fetch Size (6/27)
Unload logs (7/5)
4 byte UTF-8 (7/18)
Statement Timeout (7/22)
SHA1 Builtin (7/15)
Timezone, Epoch, Autoformat (7/25)
WLM Timeout/Wildcards (8/1)
CRC32 Builtin, CSV, Restore Progress (8/9)
UTF-8 Substitution (8/29)
JSON, Regex, Cursors (9/10)
Split_part, Audit tables (10/3)
SIN/SYD (10/8)
HSM Support (11/11)
EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts, WLM Memory Management (11/13)
SOC1/2/3 (5/8)
Sharing snapshots (7/18)
Resource Level IAM (8/9)
PCI (8/22)
Feature Delivery
6 weeks left
Page 34
Redshift Customers at re:Invent
BDT 101: Big Data ‘State of the Union’
Earlier today
DAT 305: Getting Maximum Performance from Amazon Redshift
Wednesday 11/13: 3pm in Murano 3303
Page 35
Redshift Customers at re:Invent
DAT 306: How Amazon.com is Leveraging Amazon Redshift
Thursday 11/14: 3pm in Murano 3303
DAT 205: Amazon Redshift in Action: Enterprise, Big Data, SaaS
Friday 11/15: 9am in Lido 3006
Page 36
Please give us your feedback on this presentation
As a thank you, we will select prize winners daily for completed surveys!
DAT 103