• Skip over blocks that don’t contain relevant data
[Figure: three sorted blocks with their zone map (min/max) values: block 1 = 10 to 324, block 2 = 375 to 623, block 3 = 637 to 959]
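The block-skipping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the min/max pairs are the ones shown on the slide, while `zone_map` and `blocks_to_scan` are hypothetical names, not part of any Redshift API.

```python
# Zone map from the slide: one (min, max) pair per block.
zone_map = [(10, 324), (375, 623), (637, 959)]


def blocks_to_scan(zone_map, lo, hi):
    """Return the indexes of blocks whose [min, max] range overlaps [lo, hi].

    Blocks that cannot contain a matching value are skipped without being read.
    """
    return [
        index
        for index, (block_min, block_max) in enumerate(zone_map)
        if block_max >= lo and block_min <= hi
    ]


# A filter such as "value BETWEEN 400 AND 500" only needs the second block.
print(blocks_to_scan(zone_map, 400, 500))
```

A range that falls outside every block, for example 1000 to 2000, yields an empty list, so no block would be read at all.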
Amazon Redshift
Column storage
Data compression
Zone maps
Direct-attached storage
• Local storage for performance
• High scan rates
• Automatic replication
• Continuous backup and streaming restores to/from Amazon S3
• User snapshots on demand
• Cross region backups for disaster recovery
Amazon Redshift online resize
Continue querying during resize
New cluster deployed in the background at no extra cost
Data copied in parallel from node to node
Automatic SQL endpoint switchover via DNS
Amazon Redshift works with existing data models
• Star
• Snowflake
Amazon Redshift data distribution
• Key: same key to same location
• All: all data on every node
• Even: round robin distribution
[Figure: each distribution style shown on two nodes with two slices each (Node 1: Slice 1, Slice 2; Node 2: Slice 3, Slice 4), with example keys key1 through key4]
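The three distribution styles can be simulated with a short Python sketch. It assumes the two-node, four-slice layout from the figure; the CRC32 hash, the function names, and the sample rows are illustrative assumptions, since the slides do not say which hash function Redshift uses.

```python
from itertools import cycle
from zlib import crc32

# Layout from the figure: two nodes with two slices each.
NODES = {"Node 1": ["Slice 1", "Slice 2"], "Node 2": ["Slice 3", "Slice 4"]}
SLICES = [s for node_slices in NODES.values() for s in node_slices]


def distribute_key(rows, key):
    """KEY: hash the distribution key so equal keys land on the same slice."""
    placement = {s: [] for s in SLICES}
    for row in rows:
        # crc32 is a stand-in hash chosen for this sketch only.
        index = crc32(str(row[key]).encode("utf-8")) % len(SLICES)
        placement[SLICES[index]].append(row)
    return placement


def distribute_even(rows):
    """EVEN: hand rows to slices in round-robin order."""
    placement = {s: [] for s in SLICES}
    for row, slice_name in zip(rows, cycle(SLICES)):
        placement[slice_name].append(row)
    return placement


def distribute_all(rows):
    """ALL: keep a full copy of the table on every node."""
    return {node: list(rows) for node in NODES}


# Hypothetical sample rows using the key names from the figure.
rows = [{"k": "key%d" % (i % 4 + 1), "v": i} for i in range(8)]

for slice_name, slice_rows in distribute_key(rows, "k").items():
    print(slice_name, sorted({r["k"] for r in slice_rows}))
```

With KEY distribution, rows that share a key always sit on the same slice, so a join on that key needs no data movement between nodes. EVEN balances row counts regardless of content, and ALL trades storage for local access to the whole table.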
Sorting data in Amazon Redshift
In the slices (on disk), the data is sorted by a sort key
Choose a sort key that is frequently used in your queries
Data in columns is marked with a min/max value so Redshift can skip blocks not relevant to the query
A good sort key also prevents reading entire blocks
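To see why sorting matters for block skipping, the sketch below builds min/max ranges over the same values stored sorted and stored in random order, then counts how many blocks a range filter has to read. The block size of 100, the value range, and the function names are assumptions made for illustration.

```python
import random


def build_zone_map(values, block_size):
    """Split values into fixed-size blocks and record (min, max) per block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block)) for block in blocks]


def count_blocks_read(zone_map, lo, hi):
    """Count blocks whose min/max range overlaps the filter range [lo, hi]."""
    return sum(1 for block_min, block_max in zone_map if block_max >= lo and block_min <= hi)


random.seed(0)
sorted_values = list(range(1000))
shuffled_values = sorted_values[:]
random.shuffle(shuffled_values)

# Same filter, same data, different physical order.
sorted_reads = count_blocks_read(build_zone_map(sorted_values, 100), 200, 299)
unsorted_reads = count_blocks_read(build_zone_map(shuffled_values, 100), 200, 299)

print("blocks read when sorted:", sorted_reads)
print("blocks read when unsorted:", unsorted_reads)
```

When the data is sorted on the filtered column, each block covers a narrow range and most blocks are skipped. In random order, nearly every block's range overlaps the filter, so the min/max values rule out very little.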
User Defined Functions
Python 2.7
PostgreSQL UDF syntax
System and network calls within UDFs are prohibited
Pandas, NumPy, and SciPy pre-installed
Import your own libraries
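A scalar Python UDF might look like the sketch below. The function name `f_py_greater` and its arguments are illustrative; the `plpythonu` language name comes from Redshift's documented `CREATE FUNCTION` syntax rather than from these slides. The plain Python function mirrors the UDF body so the logic can be checked locally before it is registered in a cluster.

```python
# Illustrative DDL: the function name and signature are assumptions for this sketch.
CREATE_UDF_SQL = """
CREATE FUNCTION f_py_greater (a float, b float)
  RETURNS float
STABLE
AS $$
  if a > b:
      return a
  return b
$$ LANGUAGE plpythonu;
"""


def f_py_greater(a, b):
    """Local stand-in for the UDF body: return the larger of two numbers."""
    if a > b:
        return a
    return b


# Exercise the logic locally; inside the cluster the UDF would be called from SQL.
print(f_py_greater(3.5, 2.0))
```

Because the body is ordinary Python with no system or network access, it can be unit tested on a workstation in the same way.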
Interleaved Multi Column Sort
Currently support Compound Sort Keys
• Optimized for applications that filter data by one leading column
Adding support for Interleaved Sort Keys
• Optimized for filtering data by up to eight columns
• No storage overhead unlike an index
• Lower maintenance penalty compared to indexes
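The difference between the two sort styles can be illustrated with a toy model. The slides do not describe how Redshift orders rows for an interleaved key, so this sketch substitutes a Z-order (bit-interleaving) curve as a stand-in technique; the 16 x 16 grid, block size, and function names are all assumptions.

```python
def z_value(a, b, bits=4):
    """Interleave the bits of a and b so both columns get equal weight."""
    z = 0
    for i in range(bits):
        z |= ((a >> i) & 1) << (2 * i + 1)
        z |= ((b >> i) & 1) << (2 * i)
    return z


def blocks_read(rows, column, value, block_size=16):
    """Count blocks whose min/max range on `column` could contain `value`."""
    count = 0
    for start in range(0, len(rows), block_size):
        column_values = [row[column] for row in rows[start:start + block_size]]
        if min(column_values) <= value <= max(column_values):
            count += 1
    return count


# Every (a, b) pair on a 16 x 16 grid: 256 rows, 16 blocks of 16 rows.
rows = [(a, b) for a in range(16) for b in range(16)]
compound = sorted(rows)                                   # ordered by a, then b
interleaved = sorted(rows, key=lambda r: z_value(r[0], r[1]))

for name, table in (("compound", compound), ("interleaved", interleaved)):
    print(name,
          "filter on a:", blocks_read(table, 0, 5),
          "filter on b:", blocks_read(table, 1, 5))
```

In this model the compound order reads a single block when filtering on the leading column but every block when filtering on the second column, while the interleaved order reads the same moderate number of blocks for either column. That matches the trade-off stated on the slide: compound keys favor one leading column, interleaved keys spread the benefit across columns.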
Amazon Redshift works with your existing analysis tools
JDBC/ODBC
Amazon Redshift
Questions?
AWS Summit – Chicago: An exciting, free cloud conference designed to educate and inform new customers about the AWS platform, best practices and new cloud services.
Details
• July 1, 2015
• Chicago, Illinois
• @ McCormick Place
Featuring
• New product launches
• 36+ sessions, labs, and bootcamps
• Executive and partner networking
Registration is now open
• Come and see what AWS and the cloud can do for you.
• Click here to register: http://amzn.to/1RooPPL