Azure SQL Data Warehouse

Jan 21, 2018

Transcript
Page 1: Azure SQL Data Warehouse
Page 2: Azure SQL Data Warehouse

Azure SQL Data Warehouse

Page 3: Azure SQL Data Warehouse

1982 I started working with computers

1988 I started my professional career in the computer industry

1996 I started working with SQL Server 6.0

1998 I earned my first Microsoft certification, Microsoft Certified Solution Developer (3rd in Greece)

1999 I started my career as a Microsoft Certified Trainer (MCT), with more than 30,000 hours of training until now!

2010 I became a Microsoft MVP on Data Platform for the first time
     I created SQL School Greece, www.sqlschool.gr

2012 I became MCT Regional Lead by the Microsoft Learning Program

2013 I was certified as MCSE: Data Platform
     I was certified as MCSE: Business Intelligence

2016 I was certified as MCSE: Data Management & Analytics

Antonios Chatzipavlis

SQL Server Expert and Evangelist
Data Platform MVP

MCT, MCSE, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA, ITIL-F

Page 4: Azure SQL Data Warehouse

A source of information about Microsoft SQL Server for Greek IT Professionals, DBAs, Developers, Information Workers, and also hobbyists who simply like SQL Server.

Help line : [email protected]

What we are doing here

• Articles about SQL Server
• SQL Server News
• SQL Nights
• Webcasts
• Downloads
• Resources

Follow us on social media

fb/sqlschoolgr
fb/groups/sqlschool
@antoniosch
@sqlschool
yt/c/SqlschoolGr
SQL School Greece group

SELECT KNOWLEDGE FROM SQL SERVER

Page 5: Azure SQL Data Warehouse

Azure SQL Data Warehouse SQLschool.gr GWAB Athens 2017

Presentation Content

• First Look on Azure SQL DW

• Designing for Azure SQL DW

• Loading Data on Azure SQL DW

• Querying and Tuning Azure SQL DW

Page 6: Azure SQL Data Warehouse

First Look on Azure SQL Data Warehouse

Page 7: Azure SQL Data Warehouse

What is Azure SQL Data Warehouse?

• A service in Microsoft Azure
• It's a PaaS offering
• It's a Massively Parallel Processing (MPP) system
• Distributed storage
• Distributed compute

Page 8: Azure SQL Data Warehouse

SMP vs MPP

SMP: Symmetric Multiprocessing
MPP: Massively Parallel Processing

Page 9: Azure SQL Data Warehouse

Data Warehousing Unit

A Data Warehousing Unit (DWU) is a measure of the underlying compute power of the database

Page 10: Azure SQL Data Warehouse

Data Warehousing Unit

For example:

                 100 DWU      500 DWU
Load 3 tables    15 minutes   3 minutes
Run a report     20 minutes   4 minutes

Page 11: Azure SQL Data Warehouse

Why Choose Cloud Over On-Premises DW?

• Doesn’t need large CAPEX to get started

• Doesn’t need large OPEX

• We can scale storage and compute up or down on demand

Page 12: Azure SQL Data Warehouse

What and How Do You Pay for This Service?

• Storage
  – Storage is billed by GB
  – Standard or Premium Geo Redundant
  – No cost for storage transactions
  – Outbound data transfer is billed
• Compute Power
  – Compute is billed by DWUs
  – Can go from 100 to 2000
  – Billed per hour

When not in use, the compute power of the DW can be completely paused for maximum savings

Page 13: Azure SQL Data Warehouse

Provisioning Azure SQL Data Warehouse

Select a Region

Select or Create a Server

Pick origin of the data

Pick DWU level

Page 14: Azure SQL Data Warehouse

Methods of Provisioning

• Azure Portal
  – Select New > Data + Storage
• PowerShell
  – New-AzureRmSqlDatabase cmdlet
• T-SQL
  – CREATE DATABASE command
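
As a rough sketch of the T-SQL route, a statement along these lines would be run while connected to the master database of the logical server. The database name MyDW, the DW100 level and the maximum size are placeholders chosen for illustration, not values from the talk.

```sql
-- Run while connected to the master database of the logical server.
-- MyDW, DW100 and the MAXSIZE value are illustrative placeholders.
CREATE DATABASE MyDW
(
    EDITION = 'datawarehouse',      -- marks the database as a SQL Data Warehouse
    SERVICE_OBJECTIVE = 'DW100',    -- the DWU level to provision
    MAXSIZE = 1024 GB
);
```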

Page 15: Azure SQL Data Warehouse

Provision a Data Warehouse

DEMO

Page 16: Azure SQL Data Warehouse

Designing for Azure SQL Data Warehouse

Page 17: Azure SQL Data Warehouse

SQL Server != Azure SQL DW

An Azure SQL DW database requires design decisions that are different from those for SQL Server

Page 18: Azure SQL Data Warehouse

Distribution Key

The distribution key determines how Azure SQL Data Warehouse spreads the data across multiple nodes

Azure SQL Data Warehouse uses up to 60 distributions when loading data into the system

Page 19: Azure SQL Data Warehouse

Hash Distribution

RecordNo  CustomerID  InvoiceDate
1         1000        2017-04-21
2         1000        2017-04-22
3         2000        2017-04-22
4         3000        2017-04-22
5         4000        2017-04-22

Hashing by CustomerID
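
A minimal sketch of how a table like this could be declared as hash distributed on CustomerID. The table name dbo.Invoice and the column data types are assumptions made for illustration; only the column names come from the slide.

```sql
-- Hypothetical table using the column names from the sample rows.
CREATE TABLE dbo.Invoice
(
    RecordNo    int  NOT NULL,
    CustomerID  int  NOT NULL,
    InvoiceDate date NOT NULL
)
WITH
(
    -- Rows with the same CustomerID hash to the same distribution,
    -- so records 1 and 2 (CustomerID 1000) are stored together.
    DISTRIBUTION = HASH(CustomerID),
    CLUSTERED COLUMNSTORE INDEX
);
```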

Page 20: Azure SQL Data Warehouse

Round-Robin Distribution

RecordNo  CustomerID  InvoiceDate
1         1000        2017-04-21
2         1000        2017-04-22
3         2000        2017-04-22
4         3000        2017-04-22
5         4000        2017-04-22

Rows distributed to all nodes
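
For comparison, a round-robin table could be declared roughly as follows. The name dbo.Invoice_stage and the data types are hypothetical; a heap is used here only because round-robin heaps are a common choice for staging data.

```sql
-- Hypothetical staging table: rows are spread over the distributions
-- without looking at any column value.
CREATE TABLE dbo.Invoice_stage
(
    RecordNo    int  NOT NULL,
    CustomerID  int  NOT NULL,
    InvoiceDate date NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);
```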

Page 21: Azure SQL Data Warehouse

Data Distribution Best Practice

Even Distribution vs. Odd Distribution

Page 22: Azure SQL Data Warehouse

Good Hash Key

• Distributes evenly
• Used for grouping
• Used as a join condition
• Is not updated
• Has more than 60 distinct values

Round-Robin will always provide a uniform distribution but not necessarily the best performance
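
One way to check whether a chosen hash key distributes evenly is to look at the row count per distribution. The sketch below uses dbo.FactInternetSales simply because that table appears elsewhere in this deck; any distributed table name would do.

```sql
-- Returns one row per distribution with its row count and space usage.
-- Large differences in the ROWS column between distributions indicate skew.
DBCC PDW_SHOWSPACEUSED('dbo.FactInternetSales');
```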

Page 23: Azure SQL Data Warehouse

Data Types

• Use the smallest data type that will support your data
• Avoid defining all character columns to a large default length
• Define columns as VARCHAR instead of NVARCHAR if you don't need Unicode

The goal is to not only save space but also move data as efficiently as possible

Some complex data types (xml, geography, etc) are not supported yet

Page 24: Azure SQL Data Warehouse

Table Types

• Clustered Columnstore
  – Default table type
  – High compression ratio
  – Ideally segments of 1M rows
  – No secondary indexes
• Heap
  – No index on the data
  – Fast load
  – No compression
  – Allows secondary indexes
• Clustered B-Tree
  – Sorted index on the data
  – Fast singleton lookup
  – No compression
  – Allows secondary indexes
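
As an illustration of the clustered B-Tree option with a secondary index, a small dimension might be declared like this. The table, columns and index name are hypothetical.

```sql
-- Hypothetical small dimension stored as a clustered (B-Tree) index.
CREATE TABLE dbo.DimCustomer_sketch
(
    CustomerKey  int          NOT NULL,
    CustomerName varchar(100) NOT NULL,
    City         varchar(50)  NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED INDEX (CustomerKey)   -- sorted on the lookup key
);

-- Secondary index for an alternate lookup or join column.
CREATE INDEX IX_DimCustomer_City ON dbo.DimCustomer_sketch (City);
```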

Page 25: Azure SQL Data Warehouse

Table Partitioning

Partitioning is very common in SQL Server data warehouses for three reasons:

1. Ease of loading and removal of data from a partitioned table
2. Targeting specific partitions in table maintenance operations
3. Performance improvements due to partition elimination

A highly granular partitioning scheme can work in SQL Server but hurt performance in Azure SQL DW:

60 distributions × 365 partitions = 21,900 data buckets
21,900 data buckets × ideal segment size (1M rows) = 21,900,000,000 rows

Lower granularity (week, month) can perform better, depending on how much data you have
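
A sketch of a coarser, monthly partitioning scheme. The table name, columns and boundary values are assumptions for illustration. By the same arithmetic, twelve monthly partitions would give 60 × 12 = 720 data buckets, so roughly 720 million rows per year would be needed to reach the ideal segment size.

```sql
-- Hypothetical fact table partitioned by month on an integer date key.
CREATE TABLE dbo.FactSales_sketch
(
    OrderDateKey int   NOT NULL,
    CustomerKey  int   NOT NULL,
    SalesAmount  money NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX,
    -- Four boundary values create five partitions.
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES
               (20170101, 20170201, 20170301, 20170401))
);
```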

Page 26: Azure SQL Data Warehouse

How do we apply these principles to a Dimensional Model?

• Fact Tables
  – Large ones are better as Columnstores
  – Distributed through a Hash key as much as possible, as long as it is even
  – Partitioned only if the table is large enough to fill up each segment
• Dimension Tables
  – Can be Hash distributed, or Round-Robin if there is no clear candidate join key
  – Columnstore for large dimensions
  – Heap or Clustered Index for small dimensions
  – Add secondary indexes for alternate join columns
  – Partitioning not recommended

Page 27: Azure SQL Data Warehouse

Analyzing distribution and data types for DW tables

DEMO

Page 28: Azure SQL Data Warehouse

Loading Data on Azure SQL Data Warehouse

Page 29: Azure SQL Data Warehouse

Loading an MPP System

The main principle of loading data into Azure SQL DW is to do as much work in parallel as possible

Page 30: Azure SQL Data Warehouse

Data Warehouse Readers

• Azure SQL Data Warehouse introduces the concept of Data Warehouse Readers.
• These are threads that read data in parallel and then pass it off to Writer threads.

DWU      100  200  300  400  500  600  1000  1200  1500  2000
Readers  8    16   24   32   40   48   60    60    60    60
Writers  60   60   60   60   60   60   60    60    60    60

Your DWUs have a direct impact on how fast you can load data in parallel

Page 31: Azure SQL Data Warehouse

Optimize Insert Batch Size

• Avoid the trickle insert pattern
  – Ideal batch size is 1 million rows or more, direct or in a file
• Avoid ordered data
  – Data ordered by distribution key can introduce hot spots that slow down the load operation
• Use temporary tables
  – Stage and transform on a temp Heap table before moving to permanent storage
• Use the CREATE TABLE AS SELECT statement
  – Fully parallel operation
  – It's minimally logged
  – It can change: distribution, table type, partitioning

Page 32: Azure SQL Data Warehouse

CREATE TABLE #fact_tmp
WITH
(
    DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT *
FROM dbo.FactInternetSales;

Page 33: Azure SQL Data Warehouse

User Resource Class

User resource classes are database roles that govern how many resources are given to a query

Class    Smallrc  Mediumrc     Largerc      Xlargerc
Default  8        16           24           32
Memory   100 MB   100-1600 MB  200-3200 MB  400-6400 MB

The lower end of each range corresponds to DWU100, the upper end to DWU2000

For fast and high-quality loads, create a user just for loading that uses a medium or large resource class
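
A dedicated loading user could be set up roughly as follows. The login name, user name, password placeholder and database name MyDW are all hypothetical.

```sql
-- Run in the master database: create a login for loading.
CREATE LOGIN LoadLogin WITH PASSWORD = '<strong password here>';

-- Run in the data warehouse database: create the user,
-- give it permission to load, and assign a larger resource class.
CREATE USER loaduser FOR LOGIN LoadLogin;
GRANT CONTROL ON DATABASE::[MyDW] TO loaduser;
EXEC sp_addrolemember 'largerc', 'loaduser';
```

Queries run by this user then request the memory of the largerc class, at the cost of consuming more concurrency slots.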

Page 34: Azure SQL Data Warehouse

Loading Methods

• Single-client loading methods
  – SSIS
  – Azure Data Factory
  – BCP
  – Can add some parallel capabilities but are bottlenecked at the Control node
• Parallel readers loading methods
  – PolyBase
  – Reads from Azure Blob Storage and loads the content into Azure SQL DW
  – Bypasses the Control node and loads directly into the Compute nodes

Page 35: Azure SQL Data Warehouse

Control Node

The Control Node receives connections and orchestrates the queries

The Compute Nodes do processing on the data and scale with the DWUs

Page 36: Azure SQL Data Warehouse

Loading with SSIS

[Diagram: SSIS loads data through the Control Node]

Page 37: Azure SQL Data Warehouse

Loading data with SSIS

DEMO

Page 38: Azure SQL Data Warehouse

Loading with PolyBase

[Diagram: Control Node and Azure Blob Storage]

PolyBase can load data from UTF-8 delimited text files and popular Hadoop file formats (RC file, ORC and Parquet)
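
The following is a sketch of a PolyBase load from pipe-delimited text files in Blob Storage, under stated assumptions: every object name, the container and account placeholders, the folder path and the column list are hypothetical and would need to match the real files.

```sql
-- A master key is needed once per database to protect the credential.
CREATE MASTER KEY;

-- The SECRET is the storage account access key; the IDENTITY value is arbitrary.
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'user', SECRET = '<storage account access key>';

CREATE EXTERNAL DATA SOURCE AzureBlob
WITH
(
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = BlobCredential
);

CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);

-- The external table only describes the files; no data is moved yet.
CREATE EXTERNAL TABLE dbo.Invoice_ext
(
    RecordNo    int  NOT NULL,
    CustomerID  int  NOT NULL,
    InvoiceDate date NOT NULL
)
WITH
(
    LOCATION = '/invoices/',
    DATA_SOURCE = AzureBlob,
    FILE_FORMAT = PipeDelimitedText
);

-- CTAS performs the actual parallel load into a distributed table.
CREATE TABLE dbo.Invoice_loaded
WITH
(
    DISTRIBUTION = HASH(CustomerID),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT *
FROM dbo.Invoice_ext;
```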

Page 39: Azure SQL Data Warehouse

Loading data with PolyBase

DEMO

Page 40: Azure SQL Data Warehouse

Migration Utility

• Supports SQL Server 2012+ and Azure SQL Database

• Provides a migration report pointing out possible issues

• Assists with schema migration

• Assists with data migration

Page 41: Azure SQL Data Warehouse

Using the Azure SQL DW migration utility

DEMO

Page 42: Azure SQL Data Warehouse

Querying and Tuning Azure SQL Data Warehouse

Page 43: Azure SQL Data Warehouse

Workload Management Principles

• User Resource Class
• Concurrency Model
• Transaction Size

Two Maximum Limits

• 1024 Connections
• 32 Concurrent Queries

Page 44: Azure SQL Data Warehouse

Concurrent Queries and Concurrency Slots

DWU    100  200  300  400  500  600  1000  1200  1500  2000
Slots  4    8    12   16   20   24   40    48    60    80

Examples

        Queries Executing  Queries Incoming  Queries Queued
DW200   7                  2                 1
DW1000  32                 2                 2

The above examples assume that each query consumes 1 concurrency slot

Page 45: Azure SQL Data Warehouse

Resource Class and Concurrency Slots

Class  Smallrc   Mediumrc  Largerc   Xlargerc
DWUs   100-2000  100-2000  100-2000  100-2000
Slots  1         1-6       2-32      4-64

SELECT queries against system views, stats and other management commands do not use concurrency slots

Page 46: Azure SQL Data Warehouse

Transaction Size Limits

DWU                100  200  300   400  500   600  1000  1200  1500   2000
GB / Distribution  1    1.5  2.25  3    3.75  4.5  7.5   9     11.25  15

A DW200 transaction doing equal work per distribution could consume 60 × 1.5 GB = 90 GB of space

Page 47: Azure SQL Data Warehouse

Maintaining Statistics

• The service does not create or maintain stats automatically
• Creating new stats
  – Sampled single-column stats are a good start
  – Multi-column stats for joins involving multiple columns
  – Focus on columns used in JOIN, GROUP BY, HAVING and WHERE clauses
  – Increase the sample if necessary
• Updating existing stats
  – If new dates or dimension categories are added
  – If new data loads have completed
  – If an UPDATE or DELETE changes the distribution of data
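
A sketch of what creating and refreshing statistics can look like. The statistics names and the columns CustomerKey, OrderDateKey and ProductKey are assumed for illustration and may not match the real schema of dbo.FactInternetSales.

```sql
-- Single-column statistics with the default sample.
CREATE STATISTICS stats_FactInternetSales_CustomerKey
ON dbo.FactInternetSales (CustomerKey);

-- Single-column statistics with a larger sample.
CREATE STATISTICS stats_FactInternetSales_OrderDateKey
ON dbo.FactInternetSales (OrderDateKey) WITH SAMPLE 40 PERCENT;

-- Multi-column statistics for a join on two columns.
CREATE STATISTICS stats_FactInternetSales_Product_Customer
ON dbo.FactInternetSales (ProductKey, CustomerKey);

-- After a load completes, refresh all statistics on the table.
UPDATE STATISTICS dbo.FactInternetSales;
```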

Page 48: Azure SQL Data Warehouse

Index Defrag

• Heap
  – Does not have a defrag option
• B-Tree Index
  – Useful for removing low levels of fragmentation
• Columnstore
  – Proactively compresses CLOSED rowgroups
• On a large table with heavy fragmentation, it is often faster to recreate the table with CREATE TABLE AS SELECT and swap it with the old one
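
The recreate-and-swap approach might look like the sketch below. The _new and _old table names and the choice of CustomerKey as the hash column are assumptions.

```sql
-- Build a fresh, fully compressed copy of the table in parallel.
CREATE TABLE dbo.FactInternetSales_new
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT *
FROM dbo.FactInternetSales;

-- Swap the tables by renaming them.
RENAME OBJECT dbo.FactInternetSales TO FactInternetSales_old;
RENAME OBJECT dbo.FactInternetSales_new TO FactInternetSales;

-- Once the new table has been verified, the old copy can be dropped:
-- DROP TABLE dbo.FactInternetSales_old;
```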

Page 49: Azure SQL Data Warehouse

Index Rebuild

• Heap
  – Can be rebuilt to remove forward pointers
• B-Tree Index
  – Will remove high levels of fragmentation
• Columnstore
  – Can increase the density of segments
• Rebuilding an index is an OFFLINE operation in Azure SQL DW
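
For reference, a rebuild can be issued per table or per partition. The statements below are a sketch; the partition number is arbitrary and assumes the table is partitioned.

```sql
-- Rebuild every index on the table (the table is offline while this runs).
ALTER INDEX ALL ON dbo.FactInternetSales REBUILD;

-- Rebuild a single partition to limit the amount of work.
ALTER INDEX ALL ON dbo.FactInternetSales REBUILD PARTITION = 5;
```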

Page 50: Azure SQL Data Warehouse

Scaling Performance

• Increase the User Resource Class
  – EXEC sp_addrolemember 'largerc', 'loaduser';
  – Higher Resource Class – more memory and CPU
  – More concurrency slots – fewer concurrent queries
  – The highest role assigned takes precedence
• Increase the Data Warehouse Units
  – ALTER DATABASE AWDW MODIFY (SERVICE_OBJECTIVE = 'DW1000');
  – It is an OFFLINE operation
  – Make sure there are no loads or transactions in progress
  – Can also be done through the Azure Portal

Page 51: Azure SQL Data Warehouse

Tracking Queries with Labels

User Query

SELECT SUM(Qty)
FROM dbo.FactInternetSales
OPTION (LABEL = 'mylabel');

Admin Query

SELECT *
FROM sys.dm_pdw_exec_requests
WHERE label = 'mylabel';
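
Going one step further, the labeled request can be joined to its execution steps to see where the time was spent. This is a sketch that reuses the label value 'mylabel' from the queries on this slide.

```sql
-- List the distributed execution steps of every request with the label.
SELECT r.request_id,
       r.status,
       r.total_elapsed_time,
       s.step_index,
       s.operation_type,
       s.row_count,
       s.total_elapsed_time AS step_elapsed_time
FROM sys.dm_pdw_exec_requests AS r
JOIN sys.dm_pdw_request_steps AS s
    ON s.request_id = r.request_id
WHERE r.[label] = 'mylabel'
ORDER BY r.submit_time DESC, s.step_index;
```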

Page 52: Azure SQL Data Warehouse

Labeling a query and tracking its execution

DEMO

Page 53: Azure SQL Data Warehouse

https://aka.ms/cc9cf1

Page 54: Azure SQL Data Warehouse

Thank You

SELECT KNOWLEDGE FROM SQL SERVER