Who Moved My Tuple— Columnstore Indexes in SQL Server 2014 Joe D’Antoni Philadelphia SQL Server Users Group 25 March 2014

In memory columnstore indexes--make your data warehouse

Jun 14, 2015


Jdanton

Presentation on SQL Server 2012 and 2014 Columnstore Indexing feature presented to Philadelphia SQL BI Usergroup on November 19, 2013
Transcript
Page 1: In memory columnstore indexes--make your data warehouse

Who Moved My Tuple—Columnstore Indexes in SQL Server 2014

Joe D’Antoni, Philadelphia SQL Server Users Group, 25 March 2014

Page 2: In memory columnstore indexes--make your data warehouse

Joe D’Antoni

Joe has over 15 years of experience with a wide variety of data platforms, in both Fortune 50 companies and smaller organizations

He is a frequent speaker on database administration, big data, and career management

He is the co-president of the Philadelphia SQL Server User’s Group

He wants you to make sure you can restore your data

Joedantoni.wordpress.com – Blog, Slides

http://bit.ly/SQLColumnstore -- Slides, Resources

Page 3: In memory columnstore indexes--make your data warehouse

Agenda

Indexes: a basic overview

Columnstore: an introduction

Query Performance: Demo

2012 and 2014: What’s Changing?

2014: Demo

Questions

Page 4: In memory columnstore indexes--make your data warehouse

Indexes

• Data structure that allows us to speed data retrieval by maintaining an extra copy of data

• Can be filtered

• Can be function based, or ordered

• Penalty is that writes become more expensive

• More storage required
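
As a rough illustration of the trade-off in the bullets above, the Python sketch below keeps a sorted copy of one column next to a list of rows. The names (`IndexedTable`, `rows`, `index`, `find`) are invented for this example and say nothing about how SQL Server stores indexes internally.

```python
import bisect


class IndexedTable:
    """Toy table with one secondary index (hypothetical, for illustration only)."""

    def __init__(self):
        self.rows = []    # the table itself, in arrival order
        self.index = []   # extra copy: sorted (last_name, row_position) pairs

    def insert(self, last_name, first_name):
        self.rows.append((last_name, first_name))
        # The extra write: every insert must also maintain the index.
        bisect.insort(self.index, (last_name, len(self.rows) - 1))

    def find(self, last_name):
        # Binary search for the first entry with this key, then walk forward.
        start = bisect.bisect_left(self.index, (last_name, -1))
        matches = []
        for key, pos in self.index[start:]:
            if key != last_name:
                break
            matches.append(self.rows[pos])
        return matches


table = IndexedTable()
for last, first in [("Smith", "John"), ("Gates", "Bill"), ("Zuckerberg", "Mark")]:
    table.insert(last, first)

print(table.find("Gates"))
```

The lookup avoids scanning every row, but each insert now does two writes and the index takes extra space, which matches the penalties listed on the slide.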

Page 5: In memory columnstore indexes--make your data warehouse

Indexes in SQL Server

• Clustered vs. Nonclustered

• Clustered Index—Index Organized Table

• Non-clustered index “just an index”

Page 6: In memory columnstore indexes--make your data warehouse

Clustered Index

• Data is ordered as it is inserted into pages

• Data in a clustered index is only stored on disk once (it’s the data from the table)

• Table without a clustered index is called a heap—no order at all

Page 7: In memory columnstore indexes--make your data warehouse

Clustered Index Layout

Existing rows:

LastName    FirstName  Address          PhoneNumber
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999

New record to be inserted:

Ellison     Larry      1 Oracle Way     (650)-555-1245

After the insert:

LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 Oracle Way     (650)-555-1245
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999

Page 8: In memory columnstore indexes--make your data warehouse

Non-Clustered Index

• Duplicate copy of the data in the table

• Provides a pointer from the index to the table data

• No specific order of data in index

Page 9: In memory columnstore indexes--make your data warehouse

Non-Clustered Index Layout

Existing rows:

LastName    FirstName  Address          PhoneNumber
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999

New record to be inserted:

Ellison     Larry      1 Oracle Way     (650)-555-1245

After the insert:

LastName    FirstName  Address          PhoneNumber
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999
Ellison     Larry      1 Oracle Way     (650)-555-1245
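
The difference between the two layout slides comes down to where the new Ellison row lands. A minimal Python sketch of that difference follows; it only uses the names from the slides, and the list variables (`ordered`, `unordered`) are made up for the illustration rather than taken from any SQL Server structure.

```python
import bisect

existing = [
    ("Gates", "Bill"),
    ("Smith", "John"),
    ("Smith", "John"),
    ("Zuckerberg", "Mark"),
]
new_row = ("Ellison", "Larry")

# Ordered layout: the new row goes to the position its key dictates.
ordered = list(existing)
bisect.insort(ordered, new_row)

# Unordered layout: the new row is simply added at the end.
unordered = list(existing)
unordered.append(new_row)

print([last for last, _ in ordered])
print([last for last, _ in unordered])
```

In the ordered case the insert has to find (and make room at) the right position; in the unordered case it is a cheap append, but nothing can be assumed about where a given name sits.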

Page 10: In memory columnstore indexes--make your data warehouse

So Why All This Talk About Indexes?

Page 11: In memory columnstore indexes--make your data warehouse

Data Warehouse Queries

• Data warehouses have a lot of data

• Querying lots of data can take a really long time

• Processing data row by row may not be the most efficient way to perform aggregations

Page 12: In memory columnstore indexes--make your data warehouse

Traditional Approaches To Improving Performance

• Partitioned Tables

• Indexed Views

• Data Compression

Page 13: In memory columnstore indexes--make your data warehouse

Compression in SQL Server

Uncompressed Table

LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 Oracle Way     (650)-555-1245
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999

Row Compressed Table

LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 Oracle Way     (650)-555-1245
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999

Page Compressed Table

LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 ***c** W**     (650)-555-*245
G*t**       B***       *0* M**** **     *2***********
S***h       J***       *** ******** **  *************
*****       ****       *8* Up**** ***   *************
Z********   ****       * ******* ***    *************

Page 14: In memory columnstore indexes--make your data warehouse

Introducing Columnstore Indexes (SQL 2012)

• Data is stored in columns, as opposed to rows

• This allows a much higher rate of compression

• Columns not used in a query are simply not scanned, nor returned

• Recommended practice is to add most columns in a table to the index

Page 15: In memory columnstore indexes--make your data warehouse

Original Table

Fn     Ln      AreaCode  Phone     StNum  StName    StType  City         State
A      Disney  661       872-4547  111    Wilson    Dr      Bakersfield  CA
Al     Disney  530       778-3737  222    Main      St      Lewiston     CA
Amy    Disney  209       577-5824  410    Park      Av      Santa Rosa   CA
Anita  Disney  559       642-4472  89     Ahwahnee  St      San Diego    CA
Anita  Disney  209       966-4472  781    Mariposa  Dr      Napa         CA
Ann    Disney  949       830-1883  3      Amato     Ct      Yountville   CA

Split in Columns

Fn:        A, Al, Amy, Anita, Anita, Ann
Ln:        Disney, Disney, Disney, Disney, Disney, Disney
AreaCode:  661, 530, 209, 559, 209, 949
Phone:     872-4547, 778-3737, 577-5824, 642-4472, 966-4472, 830-1883
StNum:     111, 222, 410, 89, 781, 3
StName:    Wilson, Main, Park, Ahwahnee, Mariposa, Amato
StType:    Dr, St, Av, St, Dr, Ct
City:      Bakersfield, Lewiston, Santa Rosa, San Diego, Napa, Yountville
State:     CA, CA, CA, CA, CA, CA

Columnstore Compressed

Fn:        A, *l, *my, *nita, *****, ***
Ln:        Disney, ******, ******, ******, ******, ******
AreaCode:  661, 530, 2*9, ***, ***, *4*
Phone:     872-4547, ***-3*3*, ***-****, 6**-****, 9**-****, **0-1***
StNum:     111, 222, 4*0, 89, 7**, 3
StName:    Wilson, Ma**, P*rk, *hw***e*, ***i****, ***t*
StType:    Dr, St, Av, **, **, C*
City:      Bakersfield, L*wi*ton, S**** ****, *** Diego, Napa, Yountville
State:     CA, **, **, **, **, **
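
To make the "split in columns, then compress" idea concrete, here is a small Python sketch using three of the columns from the slide. It pivots rows into columns and applies run-length encoding, which is only one simple technique chosen for illustration; it is not a description of the actual encoding SQL Server uses, and `run_length_encode` is a name invented here.

```python
from itertools import groupby

rows = [
    ("A", "Disney", "CA"),
    ("Al", "Disney", "CA"),
    ("Amy", "Disney", "CA"),
    ("Anita", "Disney", "CA"),
    ("Anita", "Disney", "CA"),
    ("Ann", "Disney", "CA"),
]
names = ("Fn", "Ln", "State")

# Pivot: one list per column instead of one tuple per row.
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


encoded = {name: run_length_encode(values) for name, values in columns.items()}
for name in names:
    print(name, encoded[name])
```

The `Ln` and `State` columns each collapse to a single pair, which is why storing similar values together compresses so well. A query that only needs `State` would also read just `columns["State"]` and never touch the other lists.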

Page 16: In memory columnstore indexes--make your data warehouse

Columnar Data Storage

From Microsoft SIGMOD Paper

Page 17: In memory columnstore indexes--make your data warehouse

So How are Columnstores So Much Faster?

• Very good compression ratio for column-oriented data

• Better use of memory

• Segment elimination skips large chunks of data

• Batch mode

  • Processes data in “batches” of 1000 rows rather than row by row

  • 7-40x CPU savings with batch mode

“The key to getting the best performance is to make sure your queries process the large majority of data in batch mode.”
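
Segment elimination and chunk-at-a-time processing can be sketched in a few lines of Python. This is a toy model under stated assumptions: the segment size of 1000 is arbitrary, the min/max metadata per segment is the only thing used to skip data, and the function names (`build_segments`, `sum_in_range`) are hypothetical.

```python
def build_segments(values, segment_size):
    """Split a column into segments and record min/max for each."""
    segments = []
    for start in range(0, len(values), segment_size):
        chunk = values[start:start + segment_size]
        segments.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return segments


def sum_in_range(segments, low, high):
    """Sum values in [low, high], skipping segments that cannot match."""
    total = 0
    scanned = 0
    for seg in segments:
        if seg["max"] < low or seg["min"] > high:
            continue  # segment elimination: metadata rules this chunk out
        scanned += 1
        # Handling the whole chunk in one call stands in for batch mode here.
        total += sum(v for v in seg["values"] if low <= v <= high)
    return total, scanned


order_ids = list(range(4000))
segments = build_segments(order_ids, segment_size=1000)
total, scanned = sum_in_range(segments, 3000, 3999)
print(total, scanned, len(segments))
```

With this data only one of the four segments is scanned; the other three are ruled out from their min/max values alone. The benefit depends on the data being laid out so that segments cover narrow value ranges.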

Page 18: In memory columnstore indexes--make your data warehouse

Columnstore All The Things?

• Awesome performance, so what’s the negative?

• Can’t update/insert in 2012

• Can only be a nonclustered index, so we are storing more data on disk

• Data types are somewhat limited

• One index per table

• Can’t be a sorted index

Page 19: In memory columnstore indexes--make your data warehouse

Update Process (2012)

[Diagram: the data to be loaded goes into a staging table, a columnstore index is built on the staging table, and a partition switch moves the data from staging into the fact table as Partition 4, next to Partitions 1, 2, and 3.]

Page 20: In memory columnstore indexes--make your data warehouse

So Where To Use Columnstore Indexes?

• Only on large tables: fact tables and dimension tables > 3 million rows

• Include every column

• Structure queries as star joins with grouping and aggregation

More details here

Page 21: In memory columnstore indexes--make your data warehouse

Columnstore 2014

Page 22: In memory columnstore indexes--make your data warehouse

Columnstore in 2014

• Fewer Data Type Limitations

• Updateable

• Can be Clustered Index

• New Archival Compression Mode

• Batch Mode Improvements

Page 23: In memory columnstore indexes--make your data warehouse

Columnstore Trickle Updates (2014)

Updates to index: collected until they reach 1,048,576 (2^20) rows

Tuple mover: moves them into the index

This is the process when loading 102,399 rows or fewer
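
The trickle-update flow on this slide (collect rows, then let a tuple mover push them into the index) can be modeled roughly as below. This is a hypothetical Python model, not SQL Server's implementation: `ToyColumnstore` and its attributes are invented names, and the closing threshold is a parameter that the demo sets to a deliberately tiny value so the behavior is easy to follow.

```python
class ToyColumnstore:
    """Hypothetical model of trickle inserts; not SQL Server's implementation."""

    def __init__(self, close_threshold):
        self.close_threshold = close_threshold  # rows that fill one buffer
        self.open_rows = []       # row-wise buffer receiving trickle inserts
        self.closed_groups = []   # full buffers waiting for the tuple mover
        self.compressed = []      # column-wise groups inside the index

    def insert(self, row):
        self.open_rows.append(row)
        if len(self.open_rows) == self.close_threshold:
            self.closed_groups.append(self.open_rows)
            self.open_rows = []

    def run_tuple_mover(self):
        """Convert every closed buffer to columnar form and add it to the index."""
        moved = len(self.closed_groups)
        for group in self.closed_groups:
            self.compressed.append([list(col) for col in zip(*group)])
        self.closed_groups = []
        return moved

    def scan(self):
        """Return all rows: index contents plus rows not yet moved."""
        rows = []
        for cols in self.compressed:
            rows.extend(zip(*cols))
        for group in self.closed_groups:
            rows.extend(group)
        rows.extend(self.open_rows)
        return rows


store = ToyColumnstore(close_threshold=4)  # tiny threshold, demo only
for i in range(10):
    store.insert((i, f"name{i}"))
moved = store.run_tuple_mover()
print(moved, len(store.compressed), len(store.open_rows), len(store.scan()))
```

Queries in this model have to read both the columnar groups and the rows still sitting in the buffer, which is why a scan combines the two.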

Page 24: In memory columnstore indexes--make your data warehouse

Columnstore Bulk Insert

Page 25: In memory columnstore indexes--make your data warehouse

Columnstore Updates (2014)

• Bulk inserts go through a special API

• Updates are processed as inserts and deletes, so they are an expensive operation
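
The "update equals delete plus insert" point can be sketched as follows. This is an assumption-laden toy in Python: compressed rows are treated as read-only, deletions are tracked in a set of positions, and new versions are appended to a separate list. The class and attribute names are invented for the illustration.

```python
class ToyIndex:
    """Hypothetical index whose compressed rows cannot be changed in place."""

    def __init__(self, rows):
        self.compressed_rows = list(rows)  # treated as read-only
        self.deleted = set()               # positions flagged as deleted
        self.new_rows = []                 # rows inserted after compression

    def update(self, key, new_value):
        found = False
        # Step 1: flag the old version as deleted (a logical delete).
        for pos, (k, _) in enumerate(self.compressed_rows):
            if k == key and pos not in self.deleted:
                self.deleted.add(pos)
                found = True
        # An earlier update may have left the current version in new_rows.
        kept = [r for r in self.new_rows if r[0] != key]
        found = found or len(kept) != len(self.new_rows)
        self.new_rows = kept
        # Step 2: insert the new version as a brand-new row.
        if found:
            self.new_rows.append((key, new_value))
        return found

    def scan(self):
        live = [r for pos, r in enumerate(self.compressed_rows)
                if pos not in self.deleted]
        return live + self.new_rows


idx = ToyIndex([(1, "a"), (2, "b"), (3, "c")])
idx.update(2, "B")
print(idx.scan(), idx.deleted)
```

The old row still occupies space after the update and every scan has to filter it out, which gives a feel for why the slide calls updates expensive.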

Page 26: In memory columnstore indexes--make your data warehouse

Columnstore Compression Effect

[Chart: Columnstore Compression. Series: No CS, Clustered CS, Archival CS. Horizontal axis: 1 to 7. Vertical axis: 0 to 300.]

[Chart: Columnstore Archival Compression. Series: Clustered CS, Archival CS. Horizontal axis: 1 to 7. Vertical axis: 0 to 80.]

• Average space savings of columnstore versus no compression: 69%

• Average space savings of columnstore archival versus regular columnstore: 29%
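
As a back-of-the-envelope check, the two averages can be combined to estimate the saving of archival columnstore against no compression. The short Python calculation below assumes both averages apply to the same table, which is a simplification: averages taken across several tables do not compose exactly.

```python
cs_vs_none = 0.69        # average saving reported on the slide
archival_vs_cs = 0.29    # average saving reported on the slide

# Fraction of the original size left after each step, multiplied together.
remaining = (1 - cs_vs_none) * (1 - archival_vs_cs)
combined_saving = 1 - remaining
print(f"{combined_saving:.0%}")
```

Under that assumption, archival columnstore would leave roughly 22% of the uncompressed size, a combined saving of about 78%.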

Page 27: In memory columnstore indexes--make your data warehouse

Columnstore 2014 Demo

Page 28: In memory columnstore indexes--make your data warehouse

What Do We Do Differently in 2014

• Best practices are mostly the same

• Batch mode gets enhanced and gains more query types

• No need to worry about dropping and rebuilding indexes; just append data

• Still focus on large tables where data is not frequently updated

• Archival Compression Good for old unused data

Page 29: In memory columnstore indexes--make your data warehouse

Questions

Page 30: In memory columnstore indexes--make your data warehouse

Contact [email protected]

Joedantoni.wordpress.com

@jdanton

http://bit.ly/SQLColumnstore -- Slides, Resources