Top Banner
21

Accelerating performance with

Jan 15, 2022

Download

Documents

dariahiddleston
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Accelerating performance with
Page 2: Accelerating performance with

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Accelerating performance with materialized views

A N T 4 1 7

Vuk Ercegovac

Principal Engineer

Amazon Web Services

Yannis Papakonstantinou

Senior Principal Scientist

Amazon Web Services

Page 3: Accelerating performance with

Amazon Redshift Materialized Views: Preview

1. Speed up queries by orders of magnitude

• For predictable workloads

• Save work with precomputed, materialized views

Page 4: Accelerating performance with

Speedup: Original Query

item_key store_key cust_key price

i1 s1 c1 12.00

i2 s2 c1 3.00

i3 s2 c2 7.00

store_key owner loc

s1 Joe SF

s2 Ann Chicago

s3 Lisa SF

Loc total_sales

SF 12.00

sales store_info

Join-Aggregate

Page 5: Accelerating performance with

Speedup: Materialized Views Precomputed Results

loc total_sales

SF 12.00

Chicago 10.00

loc_sales

item_key store_key cust_key price

i1 s1 c1 12.00

i2 s2 c1 3.00

i3 s2 c2 7.00

store_key owner loc

s1 Joe SF

s2 Ann Chicago

s3 Lisa SF

sales store_info

Page 6: Accelerating performance with

Materialized Views Speedup Query

loc total_sales

SF 12.00

loc total_sales

SF 12.00

Chicago 10.00

loc_sales

Page 7: Accelerating performance with

Amazon Redshift Materialized Views: Preview

1. Speed up

2. Simplify and accelerate maintenance of precomputed results

• Fast refresh: Efficient, incremental

• Example: ETL/BI pipelines

Page 8: Accelerating performance with

Fast Refresh: Amazon Redshift Incrementally Maintains

item_key store_key cust_key price

i1 s1 c1 12.00

i2 s2 c1 3.00

i3 s2 c2 7.00

store_key owner loc

s1 Joe SF

s2 Ann Chicago

s3 Lisa SF

loc total_sales

SF 12.00

Chicago 10.00 sales

store_info

loc_sales

i1 s3 c3 5.00

i2 s2 c4 8.00

Page 9: Accelerating performance with

Fast Refresh: Amazon Redshift Incrementally Maintains

item_key store_key cust_key price

i1 s1 c1 12.00

i2 s2 c1 3.00

i3 s2 c2 7.00

store_key owner loc

s1 Joe SF

s2 Ann Chicago

s3 Lisa SF

loc total_sales

SF 12.00

Chicago 10.00 sales

store_info

loc_sales

i1 s3 c3 5.00

i2 s2 c4 8.00

Page 10: Accelerating performance with

Fast Refresh: Amazon Redshift Incrementally Maintains

item_key store_key cust_key price

i1 s1 c1 12.00

i2 s2 c1 3.00

i3 s2 c2 7.00store_key owner loc

s1 Joe SF

s2 Ann Chicago

s3 Lisa SF

loc total_sales

SF 12.00+5.00

Chicago 10.00+8.00

sales

store_info

loc_sales

i1 s3 c3 5.00

i2 s2 c4 8.00

Page 11: Accelerating performance with

Fast Refresh: Amazon Redshift Incrementally Maintains

item_key store_key cust_key price

i1 s1 c1 12.00

i2 s2 c1 3.00

i3 s2 c2 7.00store_key owner loc

s1 Joe SF

s2 Ann Chicago

s3 Lisa SF

loc total_sales

SF 17.00

Chicago 18.00

sales

store_info

loc_sales

i1 s3 c3 5.00

i2 s2 c4 8.00

Page 12: Accelerating performance with

Example: Customer ETL Pipeline

1. Regular ingest of 3.8 M records per day

2. Strategy 1: CTAS (CREATE TABLE AS)

3. Strategy 2: Materialized Views

➢ Faster ingest times!

➢ Time to ingest new records is constant

➢ Smaller ingest batch → faster ingest

CTAS MV Speedup

Steady-State

Average371s 49s 7.9x

Page 13: Accelerating performance with

Materialized View Refresh Results

0

50

100

150

200

250

300

350

400

450

500

350 450 550 650 750 850 950 1,050

Qu

ery

Tim

e (

seco

nd

s)

Base Table Row Count (millions)

CTAS vs. MV Refresh Time (lower is better)

CTAS

RMV 16M

RMV 3.8M

Linear (CTAS)

Linear (RMV 16M)

Linear (RMV 3.8M)

Page 14: Accelerating performance with

Amazon Redshift Materialized Views: Preview

1. Speed up queries by orders of magnitude

3. Easier and faster migration to Amazon Redshift

Page 15: Accelerating performance with

Amazon Redshift Materialized Views: Functionality

➢ User creates materialized views that use one or more tables✓ CREATE MATERIALIZED VIEW loc_sales AS (

SELECT si.loc, SUM(s.price) FROM sales s, sales_info si WHERE s.store_key = si.store_keyGROUP BY si.loc);

➢ Speed up queries by accessing materialized views✓ SELECT * FROM loc_sales WHERE loc = ‘SF’;

➢ REFRESH command incrementally maintains materialized views✓ REFRESH MATERIALIZED VIEW loc_sales;

Page 16: Accelerating performance with

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Learn big data with AWS Training and Certification

Visit aws.amazon.com/training/paths-specialty/

New free digital course, Data Analytics Fundamentals, introduces

Amazon S3, Amazon Kinesis, Amazon EMR,

AWS Glue, and Amazon Redshift

Validate expertise with the AWS Certified Big Data - Specialty exam or the new AWS Certified Data Analytics - Specialty beta exam

Resources created by the experts at AWS to help you build and validate data analytics skills

Classroom offerings, including Big Data on AWS, feature AWS expert instructors and hands-on labs

Page 17: Accelerating performance with

Thank you!

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Yannis Papakonstantinou

[email protected]

Vuk Ercegovac

[email protected]

Page 18: Accelerating performance with

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Page 19: Accelerating performance with

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Page 20: Accelerating performance with

Additional Topics

➢How to use materialized views for physical

design tuning?

➢What part of SQL allows for incremental maintenance?

✓ INNER JOINs

✓ WHERE, HAVING, GROUP BY

✓ Associative Aggregate functions

✓ All other SQL is refreshed by recomputing the materialized view

Page 21: Accelerating performance with

Additional Topics: Coming Soon

➢Expand incremental maintenance to more SQL

➢More features coming soon. Stay tuned!