DB2 LUW with BLU Acceleration : Internals Concepts & Best Practices (Updated for V11.1)
Presenter : Vincent Kulandaisamy, Senior Software Engineer, IBM, [email protected], https://www.linkedin.com/in/vincentkulandaisamy
Contributors : Matthew Emmerton, Matt Huras, David Kalmuk, Peter Kokosielis, Michael Kwok, Sam Lightstone, Jessica Rockwood, Berni Schiefer, IBM
Transcript
#IDUG
Safe Harbour Statement
IBM’s statements regarding its plans, directions, and intent, including the statements made in and during this presentation, are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding future products or features is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding future products or features is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about future products or features may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance and compression data is based on measurements and projections using IBM benchmarks in a controlled environment. The actual throughput, performance or compression that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Agenda
• BLU Concepts : Recap
  • A recap of the key design points behind BLU acceleration
• BLU Best Practices, including selected internals information to help you understand “why”
  • Workload Selection
  • Operating System and Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
What is DB2 with BLU Acceleration?
• Innovative technology for analytic queries
• Columnar storage
• New run-time engine with vector (aka SIMD) processing, deep multi-core optimizations and cache-aware memory management
How Big Are The Storage Savings?
• Median compression rate is better than 10x
• Add a further 20-40% benefit from the reduced need for indexes and MQTs
[Chart: compression ratios ("how many times smaller is the data?", scale 0-20x) for a food manufacturer, a consultant, a machine manufacturer, a European ISV, an investment bank, and a transportation company]
BLU Acceleration Illustration : 10TB query in seconds or less
• The System: 32 cores, 1TB memory, 10TB table with 100 columns and 10 years of data
• The Query: How many “sales” did we have in 2010? SELECT COUNT(*) FROM MYTABLE WHERE YEAR = ‘2010’
• The Result: in seconds or less, as each CPU core examines the equivalent of just 8MB of data
[Diagram: 10TB of data; actionable compression reduces it to 1TB, in-memory; column processing reduces it to 10GB; data skipping reduces it to 1GB; parallel processing gives a 32MB linear scan on each core; SIMD and CPU-cache optimized algorithms scan as fast as 8MB; result in seconds or less]
“One of the things I like about these BLU Acceleration column-organized tables is that there's nothing for me to do with them. I just load them up, DB2 does its magic, and I'm done.”
- Andrew Juarez, Lead SAP Basis and DBA
How Fast is BLU Acceleration?
Workload speed-up for several workloads. These tests are on identical hardware versus traditional technology. There are examples of queries running over 1000x faster. [Sources: user feedback and internal tests.]
[Chart: speed-up (0-80x) for finance, food, telecom, BI ISV, investment banking, insurance, consulting, medical, travel & entertainment, transportation, and technology workloads. Average: 37x]
“We've tested DB2 10.5 with BLU Acceleration and found that it can be up to 43X faster with an analytic workload than our existing multi-server partitioned database environment.”- Randy Wilson, Lead DB2 for LUW Database Administrator, BlueCross BlueShield of Tennessee
“While expanding our initial DB2 tests with BLU Acceleration, we continued to see exceptional compression rates – our tables compressed at over 92%. But, our greatest thrill wasn’t the compression rates (though we really like it), rather the improvement we found in query speed which was more than 50X faster than with row-organized tables.”- Xu Chang, Chief DBA Support - DB2 and Oracle Databases
“Wow…unbelievable speedup in query run times! We saw a speedup of 273x in our Vehicle Tracking report, taking a query from 10 minutes to 2.2 seconds. That adds value to our business; our end users are going to be ecstatic!”- Ruel Gonzalez - Information Services
“Compared to our current production system, DB2 10.5 with BLU Acceleration is running 106x faster for our Admissions and Enrollment workloads. We had one query that we would often cancel if it didn’t finish in 30 minutes. Now it runs in 56 seconds every time. 32x faster, predictable response time, no tuning…what more could we ask for?” - Brenda Boshoff, Sr. DBA
“My largest row-organized, adaptive compressed table gave me 3.2x storage savings. However, converting this row-organized uncompressed table to a column-organized table in DB2 10.5 delivered a massive 15.4x savings!”- Iqbal Goralwalla, Head of DB2 Managed Services, Triton
31.5x storage savings (97% less storage required); 13.5x faster load time
“We were very impressed with the performance and simplicity of BLU. We found that some queries achieved an almost 100x speed up with literally no tuning!”- Lennart Henäng, IT Architect, Handelsbanken
“DB2 BLU Acceleration is all it says it is. Simplicity at its best, the “Load and Go!” tagline is all true. We didn’t have to change any of our SQL, it was very simple to setup, and extremely easy to use. Not only did we get amazing performance gains and storage savings, but this was achieved without extra effort on our part.” - Ruel Gonzalez - Information Services
Agenda
• BLU Concepts Recap
  • The key design points behind BLU acceleration
• BLU Best Practices, including selected internals information to help you understand “why”
  • Workload Selection
  • Operating System and Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
Review : Row Organized Query Processing (SELECT C1 FROM T1 WHERE C2=‘5’)
[Diagram: clients connect over the network to a DB2 server with many CPUs; a coordinator agent and parallel subagents work with the package cache, lock list, log buffers, buffer pools, logger, prefetchers and page cleaners; the tablespace holds extents of row data]
1. SQL statement sent over network to coordinator agent
2. SQL statement compiled and optimized
3. Resulting access plan stored in shared access plan cache
4. Access plan execution begins; subagents kicked off to perform parallel scan
5. Periodic async prefetch requests sent to prefetchers (aka ‘I/O servers’)
6. Prefetchers asynchronously drive parallel I/O against tablespace containers to bring in extents from disk into separate pages in the buffer pool
7. Entire rows read out of the buffer pool and decompressed. C2 values compared to ‘5’. Matching C1 values added to the result set.
8. Result set sent back to the client.
Column Organized Query Processing (SELECT C1 FROM T1 WHERE C2=‘5’)
[Diagram: the same server components as the row-organized case, but with columnar storage; each extent contains values for 1 column (C1, C2, C3, C4)]
… Initial steps skipped …
1. Access plan execution begins; subagents kicked off to perform parallel scan of column C2
2. Periodic prefetch requests sent to prefetchers (aka ‘I/O servers’)
3. Prefetchers asynchronously drive parallel I/O against tablespace containers to bring requested pages containing only C2 values from disk into separate pages in the buffer pool (*)
4. Batch of C2 values is read out of the buffer pool and compared to ‘5’ (the data remains compressed: “active” compression), forming a batch of qualifying tuple sequence numbers (TSNs)
5. The C1 values corresponding to the batch of qualifying TSNs are prefetched
6. Qualifying C1 values added to the result set, …
7. … and sent to the client
(*) Synopsis filtering not shown here; more on this later.
Review : Row Organized INSERT Processing (INSERT INTO T1 (…))
[Diagram: the same server components as before; the agent “dirties” pages (D) in the buffer pool]
1. SQL statement sent over network to coordinator agent
2. SQL statement compiled and optimized
3. Resulting access plan stored in shared access plan cache
4. Access plan execution begins
5. Agent searches for a page in the table large enough for the row
6. Page found, and read into the buffer pool
7. Agent acquires X lock on the row
8. Agent writes a log record to the log buffer in memory (describes how to redo and undo the upcoming insert)
9. Agent inserts the record onto the page in the buffer pool (“dirties” the page)
10. Success sent to the client
Review : Column Organized INSERT Processing (INSERT INTO T1 (…))
[Diagram: columnar storage; each extent contains values for 1 column (C1, C2, C3, C4). (*) Synopsis maintenance not shown here; more on this later.]
1. SQL statement sent over network to coordinator agent
2. SQL statement compiled and optimized
3. Resulting access plan stored in shared access plan cache
4. Access plan execution begins
5. For each column, the agent finds a page large enough for the column value (BLU uses an append approach for this)
6. Pages found, and read into the buffer pool
7. Agent acquires X lock on the logical row (aka TSN)
8. For each column, the agent writes log records to the log buffer in memory (describes how to redo and undo per column)
9. Agent inserts column tuples onto pages in the buffer pool (“dirties” pages)
10. Success sent to the client
Will my workload benefit from BLU? Target analytical or warehousing workloads.
Probably:
• Queries touch a subset of the columns in a table
• Grouping, aggregation, range scans, joins
• Star or snowflake schema
• Analytic workloads, data warehouses, data marts
Probably Not:
• Queries touch most or all columns in a table
• Very few rows inserted, updated or deleted per transaction
• OLTP workloads
Use Data Server Manager’s Table Organization Advisor to Help Identify Which Tables Should Be Columnar
Also Helpful : DB2’s Built-in Monitoring
• The MON_GET_TABLE function includes new metrics to help assess the columns accessed per query:

select section_exec_with_col_references as num_queries,
       (num_columns_referenced / nullif(section_exec_with_col_references, 0)) as avg_cols_ref_per_query
from table(mon_get_table('MYSCHEMA', 'MYTABLE', -1))

• Sample result: 655 queries, 2 columns referenced per query on average. If few columns are accessed relative to the full set in the table, the table may be a good candidate to convert to a columnar table.
Agenda
• BLU Concepts Recap
  • The key design points behind BLU acceleration
• BLU Best Practices, including selected internals information to help you understand “why”
  • Workload Selection
  • Operating System and Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
Use a Recommended Operating System and Processor (V11.1)

Operating System supporting BLU (*) | Minimum Version Requirements | Recommended OS Versions | Processor Recommendations
AIX                                 | AIX 7.1 TL3 SP5 or AIX 7.2   | AIX 7.2 or higher       | POWER8
Use (Virtual) Machines with 16 Cores for every 3TB of Raw (uncompressed) Data, and 16 GB of Main Memory per Core
• BLU loves multi-core machines!
  • BLU’s deep multi-core parallelism and SIMD optimizations are designed to scale virtually linearly as the number of cores grows
• General purpose best practices for #cores and memory:
  • To support general warehousing workload demands, we recommend a best practice of 16 cores for every 3TB of raw (i.e. uncompressed) data
  • To ensure a balance between cores and memory, we recommend 16GB of main memory for every core
• If you have detailed knowledge of your workload and access patterns, you can fine-tune your sizing
  • See later charts for an example, and/or contact your IBM technical sales representative
These numbers apply to single node and DPF environments. On DPF, they apply to each Database (DB) Partition, based on the amount of raw data in the DB Partition.
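As a quick check of the rule of thumb, the sizing arithmetic for a hypothetical 6TB of raw data can be expressed directly in SQL (this is just the arithmetic, not a sizing tool):

```sql
-- 16 cores per 3TB of raw data, 16GB of memory per core:
-- 6TB of raw data works out to 32 cores and 512 GB of memory
VALUES (CEILING(6.0 / 3.0 * 16),         -- cores
        CEILING(6.0 / 3.0 * 16) * 16)    -- GB of memory
```

The 6TB / 32 cores / 512 GB figures match the “Some Examples” slide later in the deck.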
BLU on DPF : Data Distribution
Just as with row organized tables …
§ Rows are distributed across DB partitions via a distribution key
§ A distribution key is 1 or more columns in the table
§ Each table defines its own distribution key
§ Similar distributed join strategies apply, e.g.:
  § Collocated. Joined tables have matching distribution keys and are joined on those columns. Executed locally on each DB partition without sending data.
  § Directed. Join column(s) of one table are hashed and sent to the corresponding DB partition of the other table.
  § Broadcast. All rows of one table are sent to all DB partitions of the other table.
[Diagram: rows pass through a hash function on the distribution key and land in one of several DB partitions, each with its own columnar storage]
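To make the distribution key explicit at table creation, the DDL looks like the following hedged sketch (the table and column names are hypothetical; giving joined tables matching DISTRIBUTE BY keys enables collocated joins):

```sql
CREATE TABLE sales_fact (
    cust_id  BIGINT  NOT NULL,
    prod_id  INTEGER NOT NULL,
    qty      INTEGER NOT NULL
)
ORGANIZE BY COLUMN            -- column-organized (BLU) table
DISTRIBUTE BY HASH (cust_id)  -- distribution key across DB partitions
```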
>20TB? Use multiple database (DB) partitions (i.e. DPF). <10TB? Use a single DB partition (i.e. non-DPF). In doubt? Use the following considerations.
• Favour multiple database partitions (DPF) if …
  • Utility parallelism is important
    • Utilities such as BACKUP, RESTORE and REORG can run in parallel across database partitions
    • This can significantly reduce the time windows needed for maintenance/recovery operations
  • ETL/ELT parallelism is important
    • INSERT, UPDATE, DELETE, INGEST and LOAD operations can run in parallel across database partitions
    • This can significantly reduce the time windows needed for ETL/ELT and other batch jobs
  • You are using an OS image that spans multiple nodes in a NUMA machine
    • NUMA machine = a machine that exhibits significant Non-Uniform Memory Access characteristics
• Favour a single database partition (non-DPF) if …
  • The overall simplicity of a non-clustered environment is desired
  • DB2’s HADR physical replication is desired
Some Examples
[Diagram of example configurations: 6TB of raw data on 32 cores / 512 GB; 18TB, with challenging backup & ETL windows, on 96 cores / 1.5 TB across 6 logical DB partitions; 600TB / 100x / 32 cores / 512 GB]
On DPF, use 1 Database Partition per Socket (using Multiple Logical DB Partitions as necessary)
Example: there are 4 NUMA nodes on the system. Update the db2nodes.cfg file to bind each DB partition to a NUMA node on each system. We assume a homogeneous cluster here.
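A db2nodes.cfg along those lines might look like the following sketch. The host names and the one-partition-per-NUMA-node mapping are hypothetical; the field layout is: partition number, hostname, logical port.

```
0 host1 0
1 host1 1
2 host1 2
3 host1 3
4 host2 0
5 host2 1
6 host2 2
7 host2 3
```

Each logical partition on a host gets its own logical port; the actual binding of each partition’s processes to a NUMA node is then done with OS facilities such as resource sets, as the slide describes.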
Agenda
• BLU Concepts Recap
  • The key design points behind BLU acceleration
• BLU Best Practices, including selected internals information to help you understand “why”
  • Workload Selection
  • Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
The DB2_WORKLOAD=ANALYTICS Setting
• A single registry variable setting that automatically optimizes DB2 for BLU Acceleration and analytical workloads
• Takes effect when the database is created (by default), or when autoconfigure is run on a database
• The following table highlights how the DB2_WORKLOAD=ANALYTICS setting changes autoconfigure’s actions
  • autoconfigure will set many other parameters beyond those listed here (e.g. bufferpool and other memory heaps)
  • autoconfigure is ‘DPF aware’. It will optimize for DPF, including cases where multiple logical DB partitions are on the same machine. It calculates memory settings on the machine with the most dense logical partition configuration and propagates those values.

Database Memory Configuration: CATALOGCACHE_SZ, SORTHEAP, and SHEAPTHRES_SHR are set to values that are higher than the default and optimized for the hardware that you are using. UTILITY_HEAP = AUTO, 1000000.
Intraquery Parallelism: Enabled for any workload (including SYSDEFAULTUSERWORKLOAD) that specifies MAXIMUM DEGREE DEFAULT, even if the database manager configuration parameter INTRA_PARALLEL = OFF.
Automatic Workload Management: A default concurrency threshold on the SYSDEFAULTMANAGEDSUBCLASS service subclass is enabled to ensure maximum efficiency and utilization of the server. More on this later.
Automatic Space Reclamation: Sets AUTO_MAINT = ON and AUTO_REORG = ON; space reclamation is performed for column-organized tables by default. More on this later.
Scenario 1 : New Analytic Database. Set DB2_WORKLOAD=ANALYTICS before creating your database.
• Step 1: Set the DB2_WORKLOAD registry variable for optimal configuration defaults
    db2set DB2_WORKLOAD=ANALYTICS
    db2start
• Step 2: Create your database
    db2 create db mydb autoconfigure using mem_percent 80 apply db and dbm
In past releases, create database ran autoconfigure with a default of 25% of the (virtual) machine’s memory devoted to the database. Use mem_percent to override this.
Here we assume the database created is the only database in the DB2 instance. This is the best practice for production databases. (Use ‘apply db only’ if there are other databases in the instance.)
Scenario 2 : Existing Analytic Database
• Step 1: Set the DB2_WORKLOAD registry variable for optimal configuration defaults
    db2set DB2_WORKLOAD=ANALYTICS
    db2start
• Step 2: Run autoconfigure
    db2 connect to mydb
    db2 autoconfigure using mem_percent 50 apply db only
In this example we assume there are other databases in the instance, so we used a smaller memory setting and applied autoconfigure’s actions to this database only.
In a mixed analytical/transactional workload, consider selectively applying autoconfigure’s recommendations by using the ‘apply none’ option, reviewing autoconfigure’s recommendations, and then individually applying those that make sense in your environment.
Agenda
• BLU Concepts Recap
  • The key design points behind BLU acceleration
• BLU Best Practices including, where pertinent, internals information to help you understand “why”
  • Workload Selection
  • Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
A Deeper Look at Internals : Column Storage
§ With traditional tables, each page contains entire rows
§ With BLU, each page and extent contains values for a single column
§ TSN = Tuple Sequence Number, a logical row ID (0, 1, 2, 3, …)
§ TSNs are used to stitch together column values that belong in the same row during query processing
  § e.g. SELECT zipcode FROM t WHERE name=“Mike Hernandez”
  § an internal ‘page map index’ allows DB2 to quickly find the page containing the zipcode for TSN 4
§ Typically, column-organized tables use significantly less space than row-organized tables
  § Unusual case: column-organized tables with many columns and very few rows can be larger than row-organized tables, as each column requires at least 1 extent
[Diagram: a sample table with name, customer number, street address, state, zip code and city columns stored column by column; each page holds values for one column, and pages are grouped into extents (assume extentsize=2)]
Table and Index Definition & Creation : Row vs BLU
BLU tables are created in tablespaces, just like other tables:
    CREATE STOGROUP SG1 ON ‘path1’, ‘path2’, ‘path3’
    CREATE TABLESPACE TS1 USING STOGROUP SG1
    CREATE TABLE SALES (SALESKEY BIGINT not null, SALESPERSONKEY INT not null, PRODUCTKEY INT not null, PERIODKEY BIGINT not null, …)
    CREATE INDEX i1 …  CREATE INDEX i2 …  CREATE INDEX i3 …
Row organized table: the tablespace contains a table (dat) object, where each extent contains pages of rows, and an index (inx) object holding i1, i2 and i3.
Column organized table:
    CREATE TABLE t1 (c1 int, c2 int)
    ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2)
The tablespace contains:
• a table (dat) object (meta data & compression dictionary)
• a column (cde) object, where each extent contains pages of data for 1 column (c1 data, c1 nullability, c2 data, c2 nullability)
• an index (inx) object (uc1)
• a synopsis table (records the range of column values existing in different TSN ranges of the table)
Define columns as NOT NULL if possible
• Removes the need to store the extra nullability information for each column
• Reduces storage and memory consumption
• Improves query processing efficiency
With nullable columns:
    CREATE TABLE t1 (c1 int, c2 int)
    ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2)
    Result: the column (cde) object stores c1 data, c1 nullability, c2 data and c2 nullability, plus the uc1 index
With NOT NULL columns:
    CREATE TABLE t1 (c1 int not null, c2 int not null)
    ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2)
    Result: the column (cde) object stores only c1 data and c2 data, plus the uc1 index
Use Informational Unique Constraints (instead of Enforced) if Uniqueness is Enforced Outside DB2
• If you have rigorous ETL or other procedures that ensure uniqueness, define your unique constraints as NOT ENFORCED (aka “informational”)
  • Avoids creation of an internal B-tree to enforce uniqueness
  • Reduced storage consumption
  • More efficient table maintenance
• Informational constraints help DB2’s query optimizer produce optimal plans
Enforced:
    ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2)
    Result: an internal uc1 index is created alongside the column data
Informational:
    ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2) NOT ENFORCED
    Result: only the column data is stored; no internal index
Place Performance Critical Tables (and their Indexes) in Dedicated Tablespaces
• Allows you to optionally use dedicated bufferpools and storage groups (and therefore storage, e.g. SSD vs SATA) for critical fact table data and any associated indexes needed to maintain enforced unique constraints
• Not always necessary, but can come in handy if needed for fine tuning later
  • Can use the same storage group and bufferpools to begin with
• Allows the table to be recovered independently from other tables
    CREATE TABLE SALES (c1 BIGINT not null, c2 INT not null, c3 BIGINT not null, …) IN TS1 INDEXES IN TS2
[Diagram: SALES in tablespace TS1 (storage group SG1 on SSD, bufferpool BP1); SALES indexes in TS2 (SG2 on SSD, BP2); other tables in TSDEF (SGDEF on SATA, BPDEF)]
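End to end, the layout above can be sketched as follows. This is a hedged sketch, not from the deck: all object names are hypothetical and the storage paths are placeholders.

```sql
CREATE BUFFERPOOL bp1 SIZE AUTOMATIC PAGESIZE 32K;
CREATE BUFFERPOOL bp2 SIZE AUTOMATIC PAGESIZE 32K;
CREATE STOGROUP sg1 ON '/ssd/path1';
CREATE STOGROUP sg2 ON '/ssd/path2';
CREATE TABLESPACE ts1 PAGESIZE 32K USING STOGROUP sg1 BUFFERPOOL bp1;
CREATE TABLESPACE ts2 PAGESIZE 32K USING STOGROUP sg2 BUFFERPOOL bp2;
-- Fact table data in ts1, any enforced-constraint indexes in ts2
CREATE TABLE sales (
    c1 BIGINT NOT NULL,
    c2 INT    NOT NULL,
    c3 BIGINT NOT NULL
)
IN ts1 INDEXES IN ts2
ORGANIZE BY COLUMN;
```

Starting with shared storage groups and bufferpools, as the slide suggests, and splitting them out later is equally valid.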
Agenda
• BLU Concepts Recap
  • The key design points behind BLU acceleration
• BLU Best Practices including, where pertinent, internals information to help you understand “why”
  • Workload Selection
  • Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
BLU Compression Dictionaries
• Column-level dictionaries: one per column. Created during:
  • LOAD REPLACE, or LOAD INSERT into an empty table, via the new LOAD analyze phase
  • SQL INSERT or INGEST, via Automatic Dictionary Creation (ADC)
• Page-level dictionaries may also be created:
  • If the space savings outweigh the cost of storing the page-level dictionaries
  • They exploit local data clustering at the page level to further compress data
[Diagram: the “dat” object holds the column-level compression dictionaries (column 1 … column N); the “cde” object holds the data pages, each page carrying one column’s data plus an optional page dictionary]
BLU on DPF Architecture : Common Compression Encoding
• Just as in single node BLU …
  • Multiple compression techniques are used, e.g. dictionary, Huffman, prefix, offset
  • Each column uses its own compression dictionary
• New with BLU on DPF …
  • Each column uses the same compression encoding on all database partitions; a given column X will always have the same encoding in all partitions
  • This allows data communication in compressed form with no additional processing costs
[Diagram: DB2 BLU with 4 data partitions; MyTable partitions 1-4 each hold columns A, B, C and D]
BLU on DPF : Automatic Global Dictionary Creation (1/2)
• Initial data (arriving via INGEST, INSERT or IMPORT) is stored in the table in uncompressed form
• The initial data is used to create a common table dictionary:
  • Each data member creates a local histogram
  • A “build” member (automatically selected) collects the local histograms, creates the global table dictionary and sends it to the other data nodes
  • A “Z” lock is no longer required; the dictionary build no longer requires committed data
• (LOAD also creates a global dictionary, but does not use ADC)
[Diagram: rows arrive at the coordinator DB partition; each data member builds a histogram; the build member builds the dictionary, and the other members copy it]
BLU on DPF : Automatic Global Dictionary Creation (2/2)
• The remainder of the data (via INGEST, INSERT or IMPORT) is stored in the table in compressed form
• Online queries start using the dictionary as soon as it is available on all members
• Any uncompressed data read is converted to compressed form during query processing
[Diagram: the dictionary is now present on the coordinator and every data partition]
For Best Compression, Initially Populate Tables using LOAD and Representative Data
• Options for table population include LOAD, INGEST, INSERT, and IMPORT
• The initial table population creates the column-level compression dictionaries
• LOAD’s analyze phase typically produces the highest quality column dictionaries
  • When the column-level dictionaries are created with automatic dictionary creation (ADC) via INSERT (or INGEST, IMPORT), the amount of data that can be practically analyzed may be limited
• Performing the initial LOAD with a representative set of data allows LOAD’s analyze phase to create high quality column dictionaries that maintain effective compression ratios as additional data is loaded
• Ensuring the best compression not only saves storage and memory, but also leads to significant query processing efficiency improvements
Prior to 10.5 Cancun (aka FP4), LOAD’s analyze phase considered all input data, which could result in a lengthy analyze phase. In Cancun, LOAD automatically samples the input data to limit the analyze phase duration. If you are running on FP3 or earlier, consider:
- first running LOAD … RESETDICTIONARYONLY with a representative data sample
- then running LOAD … INSERT with the remaining data, followed by RUNSTATS
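The two-step approach for FP3 and earlier might look like the following sketch (the file and table names are hypothetical):

```sql
-- Step 1: build high-quality dictionaries from a representative sample
LOAD FROM sample.del OF DEL REPLACE RESETDICTIONARYONLY INTO myschema.sales;
-- Step 2: load the remaining data using those dictionaries
LOAD FROM rest.del OF DEL INSERT INTO myschema.sales;
-- Refresh statistics afterwards
RUNSTATS ON TABLE myschema.sales WITH DISTRIBUTION;
```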
Ensure at least 1,000,000 pages are available in the Utility Heap for LOAD’s Analyze Phase; Use the new AUTOMATIC setting
• LOAD’s analyze phase uses memory from the utility heap to build the histograms and other data structures it needs to create the column-level compression dictionaries
• If the analyze phase runs short on memory, compression ratios can suffer
• The new AUTOMATIC setting allows the utility heap to grow automatically if it becomes exhausted and there is unused memory available elsewhere
The 1,000,000-page minimum mentioned above is a rule of thumb. It’s possible that, for some data, a larger utility heap would be helpful. See subsequent pages for information on evaluating the effectiveness of the column compression dictionaries.
These utility heap settings occur automatically if you tune DB2 with DB2_WORKLOAD=ANALYTICS as described earlier!
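If you need to set it manually rather than via DB2_WORKLOAD=ANALYTICS, a hedged CLP sketch (the database name is hypothetical; verify the exact option spelling against your DB2 level):

```sql
-- 1,000,000 pages as the starting value, with automatic growth
UPDATE DB CFG FOR mydb USING util_heap_sz 1000000 AUTOMATIC
```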
Synopsis Table

User table SALES_COL:

S_DATE      QTY  ...
2005-03-01  176  ...
2005-03-02   85  ...
2005-03-02  267  ...
2005-03-04  231  ...
...
2006-10-17  476  ...

Synopsis table SYN130330165216275152_SALES_COL (TSN = Tuple Sequence Number):

TSNMIN  TSNMAX  S_DATEMIN   S_DATEMAX   ...
0       1023    2005-03-01  2006-10-17  ...
1024    2047    2006-08-25  2007-09-15  ...
...

§ Meta-data that describes which ranges of values exist in which parts of the user table
§ On DPF, distributed into DB Partitions like the base table
§ Enables DB2 to skip portions of a table when scanning data to answer a query
§ Benefits from data clustering, loading pre-sorted data
§ Tiny : typically ~ 0.2% the size of the base table
§ Transparently and automatically maintained by DB2
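Given the SYN<timestamp>_<table> naming shown above, one way to locate a table’s synopsis table is a catalog query like this sketch (SALES_COL is the slide’s example table; synopsis tables live in the SYSIBM schema):

```sql
SELECT SUBSTR(TABSCHEMA, 1, 10) AS SCHEMA,
       SUBSTR(TABNAME, 1, 40)  AS SYNOPSIS_TABLE
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'SYSIBM'
  AND TABNAME LIKE 'SYN%SALES_COL';
```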
55
#IDUG
Consider Sorting Data on Selected Columns Prior to Table Population
• If your ELT/ETL procedures allow, consider presorting fact table data on the columns that appear frequently in predicates that filter the fact table, or columns that are used to join with dimension tables
• This type of presorting can
• Improve the effectiveness of data skipping via the synopsis tables
• Improve compression ratios
• Optionally, preserving the natural ordering by date/time can also help data skipping
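A minimal sketch of one way to presort during ETL, assuming a staging table STAGE_SALES and the fact table SALES_COL from the earlier example:

```sql
-- Export the staged data ordered on a frequent filter/join column
EXPORT TO sorted_sales.del OF DEL
  SELECT * FROM stage_sales ORDER BY s_date;

-- Populate the column-organized fact table from the presorted file
LOAD FROM sorted_sales.del OF DEL
  REPLACE INTO sales_col;
```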
56
#IDUG
Use Catalog Statistics to Evaluate Effectiveness of Column Compression Dictionaries
§ The PCTENCODED statistic indicates the percentage of the column values that have been encoded (compressed) by the column-level dictionary
§ Measures the percentage of values compressed (NOT compression ratio)
§ On DPF, reflects a single DB Partition (similarly to RUNSTATS)
§ If you have many low values, review previous best practices regarding initial population
§ Consider data characteristics when evaluating, e.g.:
§ Completely random data may not compress well (e.g. random string data)
§ Note : Unique keys are not necessarily random ! BLU has effective compression techniques for unique data. A low PCTENCODED for a unique key should not be ignored.
SYSCAT.COLUMNS
The existing PCTPAGESAVED statistic from SYSCAT.TABLES also applies to columnar tables. It estimates the percentage of pages saved by BLU’s compression. RUNSTATS collects PCTPAGESAVED by estimating the number of pages needed to store the table in uncompressed row organization.
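Both statistics can be read directly from the catalog; a sketch for the slide’s example table:

```sql
-- Per-column: percentage of values encoded by the column dictionary
SELECT SUBSTR(COLNAME, 1, 20) AS COLNAME, PCTENCODED
FROM SYSCAT.COLUMNS
WHERE TABNAME = 'SALES_COL'
ORDER BY PCTENCODED;

-- Per-table: estimated percentage of pages saved by compression
SELECT PCTPAGESAVED
FROM SYSCAT.TABLES
WHERE TABNAME = 'SALES_COL';
```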
57
#IDUG
A Deeper Look at INSERT Processing
1. Fix the ‘append data page’ for each column into the buffer pool. (If there is no free space for a column, first pre-allocate an extent’s worth of pages.) Unfix pages.
2. As rows are inserted they are split into column values, compressed, and placed in temporary buffers (outside the bufferpool).
3. When a temporary buffer fills, or on COMMIT: refix target pages, write 1 log record for an entire buffer of values for a specific column, and …
4. Copy values from temporary buffers to the buffer pool pages. Unfix pages.
When INSERTing, Commit in Batches
Use a batch size as large as ( 1000 * #DB Partitions ) rows if possible
• Results in• Significant reduction in log space consumption• Smaller and more efficient synopsis tables
• Updates to the synopsis occur on commit (*)
• Improved performance
• Applies to any INSERT-based population method
• INSERT, INGEST, IMPORT
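A sketch of batched INSERTs (table and values are illustrative). With autocommit off, multi-row VALUES plus an explicit COMMIT every ~1000 rows per DB partition keeps logging and synopsis maintenance efficient:

```sql
-- Accumulate rows and commit in large batches
INSERT INTO sales_col (s_date, qty) VALUES
  ('2005-03-01', 176),
  ('2005-03-02',  85),
  ('2005-03-02', 267);
-- ... repeat until ~1000 * (number of DB partitions) rows are pending ...
COMMIT;
```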
59
#IDUG
Agenda
• BLU Concepts Recap
• The key design points behind BLU acceleration
• BLU Best Practices including, where pertinent, internals information to help you understand “why”
  • Workload Selection
  • Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
60
#IDUG
• Traditional ‘reorg’ not needed with BLU tables !
• No concept of ‘clustering’
• BLU uses a ‘circular append’ approach for inserts, which means consecutive inserts use the same pages
• Deleted space can be easily reclaimed via REORG TABLE t1 RECLAIM EXTENTS
• Freed extents can exist anywhere in the column object (uses efficient “sparse table” technique used with MDC and ITC tables)
• The storage can be subsequently reused by any table in the tablespace
• Done online while work continues
• Done automatically via DB2’s automatic table maintenance when DB2_WORKLOAD=ANALYTICS
• Space is freed online while work continues
• Regular space management can result in increased performance of RUNSTATS and some queries
BLU Automatic Space Reclamation

[Figure: after DELETE FROM MyTable WHERE Year = 2012, the storage extents of Column1, Column2, and Column3 that hold only deleted 2012 data can be freed, while extents holding 2013 data remain in use.]
61
#IDUG
0) DB2_WORKLOAD=ANALYTICS
1) CREATE db, tablespaces
2) CREATE TABLE t1 … PRIMARY KEY (c1) ORGANIZE BY COLUMN IN TS1 INDEX IN TS2
3) LOAD FROM myfile INTO t1 …
4) SELECT c2,c3 … FROM t1 WHERE …
5) INSERT INTO t1 …
6) DELETE FROM t1 WHERE …
7) Automatic table maintenancereturns space to tablespace
An Illustrative Scenario

[Figure: table t1 spans two tablespaces. TS1 holds the table (‘dat’) object (meta data & compression dictionary), the column (‘col’) object (each extent contains pages of data for 1 column), and the synopsis table (records the range of column values existing in different regions of the table). TS2 holds the index (‘inx’) object. LOAD automatically maintains the synopsis & collects table & index statistics; INSERT automatically maintains the synopsis.]
62
#IDUG
Take Advantage of DB2’s Automatic Space Reclamation & Automatic Statistics Collection
Occurs by default with DB2_WORKLOAD=ANALYTICS
• Simply relying on DB2’s automatic table maintenance is typically all that’s needed !
• On occasion, explicit maintenance may be useful, e.g. “I’m running low on space, and need to reclaim some space now”
• Check for tables with high reclaimable space:
SELECT SUBSTR(TABNAME,1,20) AS TABNAME, DBPARTITIONNUM, RECLAIMABLE_SPACE
FROM SYSIBMADM.ADMINTABINFO
WHERE TABNAME LIKE 'mytable%'
ORDER BY TABNAME WITH UR
• Run reorg on selected tablesREORG TABLE t1 RECLAIM EXTENTS ON DBPARTITIONNUM N
• Done online while work continues
• “I’ve just added significant data to a table, and am not using automatic statement statistics”
• Run RUNSTATS … WITH DISTRIBUTION AND SAMPLED DETAILED INDEXES ALL
All other maintenance activities that involve column-organized tables, such as backup and restore, are identical to those that involve row-organized tables.
Automatic maintenance (AUTO_MAINT) = ONAutomatic table maintenance (AUTO_TBL_MAINT) = ON
Automatic Concurrency Management
• Every additional query naturally consumes more memory, locks, CPU & memory bandwidth
• In some databases, more queries can lead to contention and performance degradation
• DB2 10.5 avoids this by automatically optimizing the level of concurrency
• DB2 10.5 allows an arbitrarily high number of concurrent queries to be submitted, but limits the number that consume resources at any point in time
• Lightweight queries that need instant response bypass this control
• Enabled automatically when DB2_WORKLOAD=ANALYTICS
[Figure: applications & users submit up to tens of thousands of SQL queries at once to the DB2 DBMS kernel, but only a moderate number of queries consume resources at any point in time, automatically determined based on available machine CPU resources. On MPP, the coordinator partition applies the limit across the data DB partitions.]
64
#IDUG
Automatic Concurrency Management : Details

[Figure: work enters via SYSDEFAULTUSERWORKLOAD into SYSDEFAULTUSERCLASS. The SYSDEFAULTUSERWAS work class set asks “query cost > X?”; if so, the query maps to SYSDEFAULTMANAGEDSUBCLASS, where the SYSDEFAULTCONCURRENT threshold limits concurrency to N and queues the excess; otherwise the query runs in SYSDEFAULTSUBCLASS. New WLM objects shown in green.]
• New objects created in all 10.5 and 11 databases:
• SYSDEFAULTMANAGEDSUBCLASS - default subclass for managed queries
• SYSDEFAULTUSERWAS - default work class set to map expensive queries (cost > X) to the above subclass
• SYSDEFAULTCONCURRENT - default concurrency threshold to limit concurrently executing managed queries to N
• Default concurrency threshold enabled on database creation when DB2_WORKLOAD=ANALYTICS
• X and N are determined automatically by DB2
• Based on available CPU resources
• On DPF, this is a global system concurrency limit managed at the catalog DB partition
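You can inspect the automatically created threshold in the catalog; a sketch (columns assumed available in SYSCAT.THRESHOLDS at your level):

```sql
-- Current limit (N) and state of the default concurrency threshold
SELECT SUBSTR(THRESHOLDNAME, 1, 30) AS THRESHOLDNAME,
       MAXVALUE, ENABLED
FROM SYSCAT.THRESHOLDS
WHERE THRESHOLDNAME = 'SYSDEFAULTCONCURRENT';
```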
65
#IDUG
Take Advantage of DB2’s Automatic Workload Concurrency Management
Occurs by default with DB2_WORKLOAD=ANALYTICS
• Simply relying on DB2’s automatic workload concurrency management is typically sufficient
• On occasion, explicit maintenance may be useful, e.g. “I’ve just added more memory and CPUs to my machine, so the concurrency limit DB2 automatically calculated when I created the database may no longer be optimal”
1. Run autoconfigure without applying recommendations:
   AUTOCONFIGURE APPLY NONE
2. Review the recommended value for “Threshold SYSDEFAULTCONCURRENT Maxvalue”:
   DESCRIPTION                              CURRENT  RECOMMENDED
   ---------------------------------------  -------  -----------
   THRESHOLD SYSDEFAULTCONCURRENT MAXVALUE  16       18
3. Alter the threshold:
   ALTER THRESHOLD SYSDEFAULTCONCURRENT
     WHEN CONCURRENTDBCOORDACTIVITIES > 18
     AND QUEUEDACTIVITIES UNBOUNDED
     STOP EXECUTION
If necessary, you can define additional DB2 workload management controls to meet the service-level agreement (SLA) requirements of the workload.
66
#IDUG
Monitoring and Fine Tuning
• If you followed the previous best practices, you’re off to a great start !
• To keep your installation running smoothly, here are some important hints & tips to be aware of :
• Sort memory. The Sortheap is used in BLU not only for sorting, but also for important run-time operations such as hash join and groupby
• You can access several monitoring elements to understand if your Sortheap configuration is healthy. Here are some of the key elements:
Group by : ACTIVE_HASH_GRPBYS, TOTAL_HASH_GRPBYS, HASH_GRPBY_OVERFLOWS, POST_THRESHOLD_HASH_GRPBYS
1/2
67
#IDUG
Monitoring and Fine Tuning
• Sort Memory (continued)
• An “Overflow” indicates that a query performed a run-time operation such as a sort, hash join, or group by and required more memory than was available in its sort heap
• “Post threshold” operations are operations that did not receive all requested sortheap memory, due to consumption by other concurrent operations
• Take away : If you see significant overflows or post-threshold operations, consider :
• Increasing SORTHEAP
• Increasing the ratio of SHEAPTHRES_SHR to SORTHEAP
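A sketch of checking these counters at the database level via the MON_GET_DATABASE table function (element names assumed available at your fix pack level; -2 = all members):

```sql
SELECT TOTAL_HASH_GRPBYS, HASH_GRPBY_OVERFLOWS,
       POST_THRESHOLD_HASH_GRPBYS,
       TOTAL_HASH_JOINS, HASH_JOIN_OVERFLOWS,
       SORT_OVERFLOWS, POST_THRESHOLD_SORTS
FROM TABLE(MON_GET_DATABASE(-2));
```

Overflows that grow steadily relative to the totals are the signal to revisit SORTHEAP and SHEAPTHRES_SHR as described above.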
• Other relevant monitoring elements added for columnar tables include :
2/2
Category: Bufferpool Health
New versions of existing monitor elements (for example, COL_HIT_RATIO_PERCENT and COL_PHYSICAL_READS) were added to enable you to monitor buffer pool usage by column-organized tables.

Category: Prefetcher Activity
New versions of existing monitor elements (for example, POOL_QUEUED_ASYNC_COL_PAGES and SKIPPED_PREFETCH_COL_P_READS) were added to enable you to monitor prefetching for column-organized tables.
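For example, a sketch of checking columnar logical vs. physical reads per buffer pool with MON_GET_BUFFERPOOL ('' and -2 = all buffer pools, all members):

```sql
SELECT SUBSTR(BP_NAME, 1, 20) AS BP_NAME,
       POOL_COL_L_READS,   -- columnar logical reads
       POOL_COL_P_READS    -- columnar physical reads
FROM TABLE(MON_GET_BUFFERPOOL('', -2));
```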
68
#IDUG
Summary
• Target Analytical or Warehousing Workloads
• Use Data Server Manager to Help Identify Which Tables Should be Columnar
• Use a Recommended Operating System and Processor
• Use a (Virtual) Machine with at Least 8-16 Cores and …
  • 16 Cores for every 3TB Raw (uncompressed) Data
  • 16 GB of Main Memory per Core
• Set DB2_WORKLOAD=ANALYTICS Before Creating Your Database
• Define Columns as NOT NULL if Possible
• Use Informational Unique Constraints (instead of Enforced) if Uniqueness is Enforced Outside DB2
• Place Performance Critical Tables (and their Indexes) in Dedicated Tablespaces
• For Best Compression, Initially Populate Tables using LOAD and Representative Data
• Ensure at Least 1,000,000 Pages are Available in the Utility Heap for LOAD’s Analyze Phase; Use the New AUTOMATIC Setting
• Consider Sorting Data on Selected Columns Prior to Table Population
• Use Catalog Statistics to Evaluate Effectiveness of Column Compression Dictionaries
• When INSERTing, Commit in Batches of >1000 Rows if Possible
• Take Advantage of DB2’s Automatic Workload Concurrency Management
• Take Advantage of DB2’s Automatic Space Reclamation & Automatic Statistics Collection
• >20TB ? Use Multiple Database (DB) Partitions (i.e. DPF); <10TB ? Use a Single DB Partition (i.e. non-DPF); In Doubt ? Use the Following Considerations
• On DPF, Use 1 Database Partition per Socket (using Multiple Logical DB Partitions as Necessary)
• On Non-Uniform-Memory-Access (NUMA) Machines, Bind Logical DB Partitions to NUMA Nodes