DB2 LUW with BLU Acceleration : Internals Concepts & Best Practices (Updated for V11.1)
Presenter : Vincent Kulandaisamy, Senior Software Engineer, IBM, [email protected], https://www.linkedin.com/in/vincentkulandaisamy
Contributors : Matthew Emmerton, Matt Huras, David Kalmuk, Peter Kokosielis, Michael Kwok, Sam Lightstone, Jessica Rockwood, Berni Schiefer, IBM
Transcript
#IDUG
Safe Harbour Statement
IBM’s statements regarding its plans, directions, and intent, including the statements made in and during this presentation, are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding future products or features is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding future products or features is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about future products or features may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance and compression data is based on measurements and projections using IBM benchmarks in a controlled environment. The actual throughput, performance or compression that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Agenda
• BLU Concepts : Recap
  • A recap of the key design points behind BLU acceleration
• BLU Best Practices, including selected internals information to help you understand “why”
  • Workload Selection
  • Operating System and Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
What is DB2 with BLU Acceleration?
• Innovative technology for analytic queries
• Columnar storage
• New run-time engine with vector (aka SIMD) processing, deep multi-core optimizations and cache-aware memory management
How Big Are The Storage Savings?
• Median compression rate is better than 10x
• Add a further 20-40% benefit from the reduced need for indexes and MQTs
[Chart: compression ratios ("how many times smaller is the data?", scale 0-20x) for a food manufacturer, a consultant, a machine manufacturer, a European ISV, an investment bank, and a transportation company]
BLU Acceleration Illustration : 10TB query in seconds or less
• The System: 32 cores, 1TB memory, 10TB table with 100 columns and 10 years of data
• The Query: How many “sales” did we have in 2010? SELECT COUNT(*) FROM MYTABLE WHERE YEAR = ‘2010’
• The Result: in seconds or less, as each CPU core examines the equivalent of just 8MB of data
[Diagram: 10TB of data; actionable compression reduces it to 1TB, in-memory; column processing reduces it to 10GB; data skipping reduces it to 1GB; parallel processing gives a 32MB linear scan on each core; SIMD and CPU-cache optimized algorithms scan as fast as 8MB; result in seconds or less]
“One of the things I like about these BLU Acceleration column-organized tables is that there's nothing for me to do with them. I just load them up, DB2 does its magic, and I'm done.”
- Andrew Juarez, Lead SAP Basis and DBA
How Fast is BLU Acceleration?
Workload speed-up for several workloads. These tests are on identical hardware versus traditional technology. There are examples of queries running over 1000x faster. [Sources: user feedback and internal tests.]
[Chart: speed-up (0-80x) for finance, food, telecom, BI ISV, investment banking, insurance, consulting, medical, travel & entertainment, transportation, and technology workloads. Average: 37x]
“We've tested DB2 10.5 with BLU Acceleration and found that it can be up to 43X faster with an analytic workload than our existing multi-server partitioned database environment.”- Randy Wilson, Lead DB2 for LUW Database Administrator, BlueCross BlueShield of Tennessee
“While expanding our initial DB2 tests with BLU Acceleration, we continued to see exceptional compression rates – our tables compressed at over 92%. But, our greatest thrill wasn’t the compression rates (though we really like it), rather the improvement we found in query speed which was more than 50X faster than with row-organized tables.”- Xu Chang, Chief DBA Support - DB2 and Oracle Databases
“Wow…unbelievable speedup in query run times! We saw a speedup of 273x in our Vehicle Tracking report, taking a query from 10 minutes to 2.2 seconds. That adds value to our business; our end users are going to be ecstatic!”- Ruel Gonzalez - Information Services
“Compared to our current production system, DB2 10.5 with BLU Acceleration is running 106x faster for our Admissions and Enrollment workloads. We had one query that we would often cancel if it didn’t finish in 30 minutes. Now it runs in 56 seconds every time. 32x faster, predictable response time, no tuning…what more could we ask for?” - Brenda Boshoff, Sr. DBA
“My largest row-organized, adaptive compressed table gave me 3.2x storage savings. However, converting this row-organized uncompressed table to a column-organized table in DB2 10.5 delivered a massive 15.4x savings!”- Iqbal Goralwalla, Head of DB2 Managed Services, Triton
31.5x storage savings (97% less storage required); 13.5x faster load time
“We were very impressed with the performance and simplicity of BLU. We found that some queries achieved an almost 100x speed up with literally no tuning!”- Lennart Henäng, IT Architect, Handelsbanken
“DB2 BLU Acceleration is all it says it is. Simplicity at its best, the “Load and Go!” tagline is all true. We didn’t have to change any of our SQL, it was very simple to setup, and extremely easy to use. Not only did we get amazing performance gains and storage savings, but this was achieved without extra effort on our part.” - Ruel Gonzalez - Information Services
Agenda
• BLU Concepts Recap
  • The key design points behind BLU acceleration
• BLU Best Practices, including selected internals information to help you understand “why”
  • Workload Selection
  • Operating System and Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
Review : Row Organized Query Processing (SELECT C1 FROM T1 WHERE C2=‘5’)
[Diagram: clients connect over the network to a DB2 server with many CPUs; a coordinator agent and parallel subagents work with the package cache, lock list, log buffers, buffer pools, logger, prefetchers and page cleaners; the tablespace holds extents of row data]
1. SQL statement sent over network to coordinator agent
2. SQL statement compiled and optimized
3. Resulting access plan stored in shared access plan cache
4. Access plan execution begins; subagents kicked off to perform parallel scan
5. Periodic async prefetch requests sent to prefetchers (aka ‘I/O servers’)
6. Prefetchers asynchronously drive parallel I/O against tablespace containers to bring in extents from disk into separate pages in the buffer pool
7. Entire rows read out of the buffer pool and decompressed. C2 values compared to ‘5’. Matching C1 values added to the result set.
8. Result set sent back to the client.
Column Organized Query Processing (SELECT C1 FROM T1 WHERE C2=‘5’)
[Diagram: the same server components as the row-organized case, but with columnar storage; each extent contains values for 1 column (C1, C2, C3, C4)]
… Initial steps skipped …
1. Access plan execution begins; subagents kicked off to perform parallel scan of column C2
2. Periodic prefetch requests sent to prefetchers (aka ‘I/O servers’)
3. Prefetchers asynchronously drive parallel I/O against tablespace containers to bring requested pages containing only C2 values from disk into separate pages in the buffer pool (*)
4. Batch of C2 values is read out of the buffer pool and compared to ‘5’ (the data remains compressed: “active” compression), forming a batch of qualifying tuple sequence numbers (TSNs)
5. The C1 values corresponding to the batch of qualifying TSNs are prefetched
6. Qualifying C1 values added to the result set, …
7. … and sent to the client
(*) Synopsis filtering not shown here; more on this later.
Review : Row Organized INSERT Processing (INSERT INTO T1 (…))
[Diagram: the same server components as before; the agent “dirties” pages (D) in the buffer pool]
1. SQL statement sent over network to coordinator agent
2. SQL statement compiled and optimized
3. Resulting access plan stored in shared access plan cache
4. Access plan execution begins
5. Agent searches for a page in the table large enough for the row
6. Page found, and read into the buffer pool
7. Agent acquires X lock on the row
8. Agent writes a log record to the log buffer in memory (describes how to redo and undo the upcoming insert)
9. Agent inserts the record onto the page in the buffer pool (“dirties” the page)
10. Success sent to the client
Review : Column Organized INSERT Processing (INSERT INTO T1 (…))
[Diagram: columnar storage; each extent contains values for 1 column (C1, C2, C3, C4). (*) Synopsis maintenance not shown here; more on this later.]
1. SQL statement sent over network to coordinator agent
2. SQL statement compiled and optimized
3. Resulting access plan stored in shared access plan cache
4. Access plan execution begins
5. For each column, the agent finds a page large enough for the column value (BLU uses an append approach for this)
6. Pages found, and read into the buffer pool
7. Agent acquires X lock on the logical row (aka TSN)
8. For each column, the agent writes log records to the log buffer in memory (describes how to redo and undo per column)
9. Agent inserts column tuples onto pages in the buffer pool (“dirties” pages)
10. Success sent to the client
Will my workload benefit from BLU? Target analytical or warehousing workloads.
Probably:
• Queries touch a subset of the columns in a table
• Grouping, aggregation, range scans, joins
• Star or snowflake schema
• Analytic workloads, data warehouses, data marts
Probably Not:
• Queries touch most or all columns in a table
• Very few rows inserted, updated or deleted per transaction
• OLTP workloads
Use Data Server Manager’s Table Organization Advisor to Help Identify Which Tables Should Be Columnar
Also Helpful : DB2’s Built-in Monitoring
• The MON_GET_TABLE function includes new metrics to help assess the columns accessed per query:

select section_exec_with_col_references as num_queries,
       (num_columns_referenced / nullif(section_exec_with_col_references, 0)) as avg_cols_ref_per_query
from table(mon_get_table('MYSCHEMA', 'MYTABLE', -1))

• Sample result: 655 queries, 2 columns referenced per query on average. If few columns are accessed relative to the full set in the table, the table may be a good candidate to convert to a columnar table.
Agenda
• BLU Concepts Recap
  • The key design points behind BLU acceleration
• BLU Best Practices, including selected internals information to help you understand “why”
  • Workload Selection
  • Operating System and Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
Use a Recommended Operating System and Processor (V11.1)

Operating System supporting BLU (*) | Minimum Version Requirements | Recommended OS Versions | Processor Recommendations
AIX                                 | AIX 7.1 TL3 SP5 or AIX 7.2   | AIX 7.2 or higher       | POWER8
Use (Virtual) Machines with 16 Cores for every 3TB of Raw (uncompressed) Data, and 16 GB of Main Memory per Core
• BLU loves multi-core machines!
  • BLU’s deep multi-core parallelism and SIMD optimizations are designed to scale virtually linearly as the number of cores grows
• General purpose best practices for #cores and memory:
  • To support general warehousing workload demands, we recommend a best practice of 16 cores for every 3TB of raw (i.e. uncompressed) data
  • To ensure a balance between cores and memory, we recommend 16GB of main memory for every core
• If you have detailed knowledge of your workload and access patterns, you can fine-tune your sizing
  • See later charts for an example, and/or contact your IBM technical sales representative
These numbers apply to single node and DPF environments. On DPF, they apply to each Database (DB) Partition, based on the amount of raw data in the DB Partition.
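As a quick check of the rule of thumb, the sizing arithmetic for a hypothetical 6TB of raw data can be expressed directly in SQL (this is just the arithmetic, not a sizing tool):

```sql
-- 16 cores per 3TB of raw data, 16GB of memory per core:
-- 6TB of raw data works out to 32 cores and 512 GB of memory
VALUES (CEILING(6.0 / 3.0 * 16),         -- cores
        CEILING(6.0 / 3.0 * 16) * 16)    -- GB of memory
```

The 6TB / 32 cores / 512 GB figures match the “Some Examples” slide later in the deck.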
BLU on DPF : Data Distribution
Just as with row organized tables …
§ Rows are distributed across DB partitions via a distribution key
§ A distribution key is 1 or more columns in the table
§ Each table defines its own distribution key
§ Similar distributed join strategies apply, e.g.:
  § Collocated. Joined tables have matching distribution keys and are joined on those columns. Executed locally on each DB partition without sending data.
  § Directed. Join column(s) of one table are hashed and sent to the corresponding DB partition of the other table.
  § Broadcast. All rows of one table are sent to all DB partitions of the other table.
[Diagram: rows pass through a hash function on the distribution key and land in one of several DB partitions, each with its own columnar storage]
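To make the distribution key explicit at table creation, the DDL looks like the following hedged sketch (the table and column names are hypothetical; giving joined tables matching DISTRIBUTE BY keys enables collocated joins):

```sql
CREATE TABLE sales_fact (
    cust_id  BIGINT  NOT NULL,
    prod_id  INTEGER NOT NULL,
    qty      INTEGER NOT NULL
)
ORGANIZE BY COLUMN            -- column-organized (BLU) table
DISTRIBUTE BY HASH (cust_id)  -- distribution key across DB partitions
```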
>20TB? Use multiple database (DB) partitions (i.e. DPF). <10TB? Use a single DB partition (i.e. non-DPF). In doubt? Use the following considerations.
• Favour multiple database partitions (DPF) if …
  • Utility parallelism is important
    • Utilities such as BACKUP, RESTORE and REORG can run in parallel across database partitions
    • This can significantly reduce the time windows needed for maintenance/recovery operations
  • ETL/ELT parallelism is important
    • INSERT, UPDATE, DELETE, INGEST and LOAD operations can run in parallel across database partitions
    • This can significantly reduce the time windows needed for ETL/ELT and other batch jobs
  • You are using an OS image that spans multiple nodes in a NUMA machine
    • NUMA machine = a machine that exhibits significant Non-Uniform Memory Access characteristics
• Favour a single database partition (non-DPF) if …
  • The overall simplicity of a non-clustered environment is desired
  • DB2’s HADR physical replication is desired
Some Examples
[Diagram of example configurations: 6TB of raw data on 32 cores / 512 GB; 18TB, with challenging backup & ETL windows, on 96 cores / 1.5 TB across 6 logical DB partitions; 600TB / 100x / 32 cores / 512 GB]
On DPF, use 1 Database Partition per Socket (using Multiple Logical DB Partitions as necessary)
Example: there are 4 NUMA nodes on the system. Update the db2nodes.cfg file to bind each DB partition to a NUMA node on each system. We assume a homogeneous cluster here.
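A db2nodes.cfg along those lines might look like the following sketch. The host names and the one-partition-per-NUMA-node mapping are hypothetical; the field layout is: partition number, hostname, logical port.

```
0 host1 0
1 host1 1
2 host1 2
3 host1 3
4 host2 0
5 host2 1
6 host2 2
7 host2 3
```

Each logical partition on a host gets its own logical port; the actual binding of each partition’s processes to a NUMA node is then done with OS facilities such as resource sets, as the slide describes.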
Agenda
• BLU Concepts Recap
  • The key design points behind BLU acceleration
• BLU Best Practices, including selected internals information to help you understand “why”
  • Workload Selection
  • Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
The DB2_WORKLOAD=ANALYTICS Setting
• A single registry variable setting that automatically optimizes DB2 for BLU Acceleration and analytical workloads
• Takes effect when the database is created (by default), or when autoconfigure is run on a database
• The following table highlights how the DB2_WORKLOAD=ANALYTICS setting changes autoconfigure’s actions
  • autoconfigure will set many other parameters beyond those listed here (e.g. bufferpool and other memory heaps)
  • autoconfigure is ‘DPF aware’. It will optimize for DPF, including cases where multiple logical DB partitions are on the same machine. It calculates memory settings on the machine with the most dense logical partition configuration and propagates those values.

Database Memory Configuration: CATALOGCACHE_SZ, SORTHEAP, and SHEAPTHRES_SHR are set to values that are higher than the default and optimized for the hardware that you are using. UTILITY_HEAP = AUTO, 1000000.
Intraquery Parallelism: Enabled for any workload (including SYSDEFAULTUSERWORKLOAD) that specifies MAXIMUM DEGREE DEFAULT, even if the database manager configuration parameter INTRA_PARALLEL = OFF.
Automatic Workload Management: A default concurrency threshold on the SYSDEFAULTMANAGEDSUBCLASS service subclass is enabled to ensure maximum efficiency and utilization of the server. More on this later.
Automatic Space Reclamation: Sets AUTO_MAINT = ON and AUTO_REORG = ON; space reclamation is performed for column-organized tables by default. More on this later.
Scenario 1 : New Analytic Database. Set DB2_WORKLOAD=ANALYTICS before creating your database.
• Step 1: Set the DB2_WORKLOAD registry variable for optimal configuration defaults
    db2set DB2_WORKLOAD=ANALYTICS
    db2start
• Step 2: Create your database
    db2 create db mydb autoconfigure using mem_percent 80 apply db and dbm
In past releases, create database ran autoconfigure with a default of 25% of the (virtual) machine’s memory devoted to the database. Use mem_percent to override this.
Here we assume the database created is the only database in the DB2 instance. This is the best practice for production databases. (Use ‘apply db only’ if there are other databases in the instance.)
Scenario 2 : Existing Analytic Database
• Step 1: Set the DB2_WORKLOAD registry variable for optimal configuration defaults
    db2set DB2_WORKLOAD=ANALYTICS
    db2start
• Step 2: Run autoconfigure
    db2 connect to mydb
    db2 autoconfigure using mem_percent 50 apply db only
In this example we assume there are other databases in the instance, so we used a smaller memory setting and applied autoconfigure’s actions to this database only.
In a mixed analytical/transactional workload, consider selectively applying autoconfigure’s recommendations by using the ‘apply none’ option, reviewing autoconfigure’s recommendations, and then individually applying those that make sense in your environment.
Agenda
• BLU Concepts Recap
  • The key design points behind BLU acceleration
• BLU Best Practices including, where pertinent, internals information to help you understand “why”
  • Workload Selection
  • Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
A Deeper Look at Internals : Column Storage
§ With traditional tables, each page contains entire rows
§ With BLU, each page and extent contains values for a single column
§ TSN = Tuple Sequence Number, a logical row ID (0, 1, 2, 3, …)
§ TSNs are used to stitch together column values that belong in the same row during query processing
  § e.g. SELECT zipcode FROM t WHERE name=“Mike Hernandez”
  § an internal ‘page map index’ allows DB2 to quickly find the page containing the zipcode for TSN 4
§ Typically, column-organized tables use significantly less space than row-organized tables
  § Unusual case: column-organized tables with many columns and very few rows can be larger than row-organized tables, as each column requires at least 1 extent
[Diagram: a sample table with name, customer number, street address, state, zip code and city columns stored column by column; each page holds values for one column, and pages are grouped into extents (assume extentsize=2)]
Table and Index Definition & Creation : Row vs BLU
BLU tables are created in tablespaces, just like other tables:
    CREATE STOGROUP SG1 ON ‘path1’, ‘path2’, ‘path3’
    CREATE TABLESPACE TS1 USING STOGROUP SG1
    CREATE TABLE SALES (SALESKEY BIGINT not null, SALESPERSONKEY INT not null, PRODUCTKEY INT not null, PERIODKEY BIGINT not null, …)
    CREATE INDEX i1 …  CREATE INDEX i2 …  CREATE INDEX i3 …
Row organized table: the tablespace contains a table (dat) object, where each extent contains pages of rows, and an index (inx) object holding i1, i2 and i3.
Column organized table:
    CREATE TABLE t1 (c1 int, c2 int)
    ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2)
The tablespace contains:
• a table (dat) object (meta data & compression dictionary)
• a column (cde) object, where each extent contains pages of data for 1 column (c1 data, c1 nullability, c2 data, c2 nullability)
• an index (inx) object (uc1)
• a synopsis table (records the range of column values existing in different TSN ranges of the table)
Define columns as NOT NULL if possible
• Removes the need to store the extra nullability information for each column
• Reduces storage and memory consumption
• Improves query processing efficiency
With nullable columns:
    CREATE TABLE t1 (c1 int, c2 int)
    ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2)
    Result: the column (cde) object stores c1 data, c1 nullability, c2 data and c2 nullability, plus the uc1 index
With NOT NULL columns:
    CREATE TABLE t1 (c1 int not null, c2 int not null)
    ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2)
    Result: the column (cde) object stores only c1 data and c2 data, plus the uc1 index
Use Informational Unique Constraints (instead of Enforced) if Uniqueness is Enforced Outside DB2
• If you have rigorous ETL or other procedures that ensure uniqueness, define your unique constraints as NOT ENFORCED (aka “informational”)
  • Avoids creation of an internal B-tree to enforce uniqueness
  • Reduced storage consumption
  • More efficient table maintenance
• Informational constraints help DB2’s query optimizer produce optimal plans
Enforced:
    ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2)
    Result: an internal uc1 index is created alongside the column data
Informational:
    ALTER TABLE … ADD CONSTRAINT uc1 UNIQUE (c2) NOT ENFORCED
    Result: only the column data is stored; no internal index
Place Performance Critical Tables (and their Indexes) in Dedicated Tablespaces
• Allows you to optionally use dedicated bufferpools and storage groups (and therefore storage, e.g. SSD vs SATA) for critical fact table data and any associated indexes needed to maintain enforced unique constraints
• Not always necessary, but can come in handy if needed for fine tuning later
  • Can use the same storage group and bufferpools to begin with
• Allows the table to be recovered independently from other tables
    CREATE TABLE SALES (c1 BIGINT not null, c2 INT not null, c3 BIGINT not null, …) IN TS1 INDEXES IN TS2
[Diagram: SALES in tablespace TS1 (storage group SG1 on SSD, bufferpool BP1); SALES indexes in TS2 (SG2 on SSD, BP2); other tables in TSDEF (SGDEF on SATA, BPDEF)]
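End to end, the layout above can be sketched as follows. This is a hedged sketch, not from the deck: all object names are hypothetical and the storage paths are placeholders.

```sql
CREATE BUFFERPOOL bp1 SIZE AUTOMATIC PAGESIZE 32K;
CREATE BUFFERPOOL bp2 SIZE AUTOMATIC PAGESIZE 32K;
CREATE STOGROUP sg1 ON '/ssd/path1';
CREATE STOGROUP sg2 ON '/ssd/path2';
CREATE TABLESPACE ts1 PAGESIZE 32K USING STOGROUP sg1 BUFFERPOOL bp1;
CREATE TABLESPACE ts2 PAGESIZE 32K USING STOGROUP sg2 BUFFERPOOL bp2;
-- Fact table data in ts1, any enforced-constraint indexes in ts2
CREATE TABLE sales (
    c1 BIGINT NOT NULL,
    c2 INT    NOT NULL,
    c3 BIGINT NOT NULL
)
IN ts1 INDEXES IN ts2
ORGANIZE BY COLUMN;
```

Starting with shared storage groups and bufferpools, as the slide suggests, and splitting them out later is equally valid.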
Agenda
• BLU Concepts Recap
  • The key design points behind BLU acceleration
• BLU Best Practices including, where pertinent, internals information to help you understand “why”
  • Workload Selection
  • Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
BLU Compression Dictionaries
• Column-level dictionaries: one per column. Created during:
  • LOAD REPLACE, or LOAD INSERT into an empty table, via the new LOAD analyze phase
  • SQL INSERT or INGEST, via Automatic Dictionary Creation (ADC)
• Page-level dictionaries may also be created:
  • If the space savings outweigh the cost of storing the page-level dictionaries
  • They exploit local data clustering at the page level to further compress data
[Diagram: the “dat” object holds the column-level compression dictionaries (column 1 … column N); the “cde” object holds the data pages, each page carrying one column’s data plus an optional page dictionary]
BLU on DPF Architecture : Common Compression Encoding
• Just as in single node BLU …
  • Multiple compression techniques are used, e.g. dictionary, Huffman, prefix, offset
  • Each column uses its own compression dictionary
• New with BLU on DPF …
  • Each column uses the same compression encoding on all database partitions; a given column X will always have the same encoding in all partitions
  • This allows data communication in compressed form with no additional processing costs
[Diagram: DB2 BLU with 4 data partitions; MyTable partitions 1-4 each hold columns A, B, C and D]
BLU on DPF : Automatic Global Dictionary Creation (1/2)
• Initial data (arriving via INGEST, INSERT or IMPORT) is stored in the table in uncompressed form
• The initial data is used to create a common table dictionary:
  • Each data member creates a local histogram
  • A “build” member (automatically selected) collects the local histograms, creates the global table dictionary and sends it to the other data nodes
  • A “Z” lock is no longer required; the dictionary build no longer requires committed data
• (LOAD also creates a global dictionary, but does not use ADC)
[Diagram: rows arrive at the coordinator DB partition; each data member builds a histogram; the build member builds the dictionary, and the other members copy it]
BLU on DPF : Automatic Global Dictionary Creation (2/2)
• The remainder of the data (via INGEST, INSERT or IMPORT) is stored in the table in compressed form
• Online queries start using the dictionary as soon as it is available on all members
• Any uncompressed data read is converted to compressed form during query processing
[Diagram: the dictionary is now present on the coordinator and every data partition]
For Best Compression, Initially Populate Tables using LOAD and Representative Data
• Options for table population include LOAD, INGEST, INSERT, and IMPORT
• The initial table population creates the column-level compression dictionaries
• LOAD’s analyze phase typically produces the highest quality column dictionaries
  • When the column-level dictionaries are created with automatic dictionary creation (ADC) via INSERT (or INGEST, IMPORT), the amount of data that can be practically analyzed may be limited
• Performing the initial LOAD with a representative set of data allows LOAD’s analyze phase to create high quality column dictionaries that maintain effective compression ratios as additional data is loaded
• Ensuring the best compression not only saves storage and memory, but also leads to significant query processing efficiency improvements
Prior to 10.5 Cancun (aka FP4), LOAD’s analyze phase considered all input data, which could result in a lengthy analyze phase. In Cancun, LOAD automatically samples the input data to limit the analyze phase duration. If you are running on FP3 or earlier, consider:
- first running LOAD … RESETDICTIONARYONLY with a representative data sample
- then running LOAD … INSERT with the remaining data, followed by RUNSTATS
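The two-step approach for FP3 and earlier might look like the following sketch (the file and table names are hypothetical):

```sql
-- Step 1: build high-quality dictionaries from a representative sample
LOAD FROM sample.del OF DEL REPLACE RESETDICTIONARYONLY INTO myschema.sales;
-- Step 2: load the remaining data using those dictionaries
LOAD FROM rest.del OF DEL INSERT INTO myschema.sales;
-- Refresh statistics afterwards
RUNSTATS ON TABLE myschema.sales WITH DISTRIBUTION;
```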
Ensure at least 1,000,000 pages are available in the Utility Heap for LOAD’s Analyze Phase; Use the new AUTOMATIC setting
• LOAD’s analyze phase uses memory from the utility heap to build the histograms and other data structures it needs to create the column-level compression dictionaries
• If the analyze phase runs short on memory, compression ratios can suffer
• The new AUTOMATIC setting allows the utility heap to grow automatically if it becomes exhausted and there is unused memory available elsewhere
The 1,000,000-page minimum mentioned above is a rule of thumb. It’s possible that, for some data, a larger utility heap would be helpful. See subsequent pages for information on evaluating the effectiveness of the column compression dictionaries.
These utility heap settings occur automatically if you tune DB2 with DB2_WORKLOAD=ANALYTICS as described earlier!
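If you need to set it manually rather than via DB2_WORKLOAD=ANALYTICS, a hedged CLP sketch (the database name is hypothetical; verify the exact option spelling against your DB2 level):

```sql
-- 1,000,000 pages as the starting value, with automatic growth
UPDATE DB CFG FOR mydb USING util_heap_sz 1000000 AUTOMATIC
```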
Synopsis Table

User table SALES_COL:

S_DATE      QTY  ...
2005-03-01  176  ...
2005-03-02   85  ...
2005-03-02  267  ...
2005-03-04  231  ...
...
2006-10-17  476  ...

Synopsis table SYN130330165216275152_SALES_COL (TSN = Tuple Sequence Number):

TSNMIN  TSNMAX  S_DATEMIN   S_DATEMAX   ...
0       1023    2005-03-01  2006-10-17  ...
1024    2047    2006-08-25  2007-09-15  ...
...

§ Meta-data that describes which ranges of values exist in which parts of the user table
§ On DPF, distributed into DB Partitions like the base table
§ Enables DB2 to skip portions of a table when scanning data to answer a query
§ Benefits from data clustering, loading pre-sorted data
§ Tiny : typically ~ 0.2% the size of the base table
§ Transparently and automatically maintained by DB2
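Given the SYN<timestamp>_<table> naming shown above, one way to locate a table’s synopsis table is a catalog query like this sketch (SALES_COL is the slide’s example table; synopsis tables live in the SYSIBM schema):

```sql
SELECT SUBSTR(TABSCHEMA, 1, 10) AS SCHEMA,
       SUBSTR(TABNAME, 1, 40)  AS SYNOPSIS_TABLE
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'SYSIBM'
  AND TABNAME LIKE 'SYN%SALES_COL';
```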
55
#IDUG
Consider Sorting Data on Selected Columns Prior to Table Population
• If your ELT/ETL procedures allow, consider presorting fact table data on the columns that appear frequently in predicates that filter the fact table, or columns that are used to join with dimension tables
• This type of presorting can
• Improve the effectiveness of data skipping via the synopsis tables
• Improve compression ratios
• Optionally, preserving the natural ordering by date/time can also help data skipping
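A minimal sketch of one way to presort during ETL, assuming a staging table STAGE_SALES and the fact table SALES_COL from the earlier example:

```sql
-- Export the staged data ordered on a frequent filter/join column
EXPORT TO sorted_sales.del OF DEL
  SELECT * FROM stage_sales ORDER BY s_date;

-- Populate the column-organized fact table from the presorted file
LOAD FROM sorted_sales.del OF DEL
  REPLACE INTO sales_col;
```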
56
#IDUG
Use Catalog Statistics to Evaluate Effectiveness of Column Compression Dictionaries
§ The PCTENCODED statistic indicates the percentage of the column values that have been encoded (compressed) by the column-level dictionary
§ Measures the percentage of values compressed (NOT compression ratio)
§ On DPF, reflects a single DB Partition (similarly to RUNSTATS)
§ If you have many low values, review previous best practices regarding initial population
§ Consider data characteristics when evaluating, e.g.:
§ Completely random data may not compress well (e.g. random string data)
§ Note : Unique keys are not necessarily random ! BLU has effective compression techniques for unique data. A low PCTENCODED for a unique key should not be ignored.
SYSCAT.COLUMNS
The existing PCTPAGESAVED statistic from SYSCAT.TABLES also applies to columnar tables. It estimates the percentage of pages saved by BLU’s compression. RUNSTATS collects PCTPAGESAVED by estimating the number of pages needed to store the table in uncompressed row organization.
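Both statistics can be read directly from the catalog; a sketch for the slide’s example table:

```sql
-- Per-column: percentage of values encoded by the column dictionary
SELECT SUBSTR(COLNAME, 1, 20) AS COLNAME, PCTENCODED
FROM SYSCAT.COLUMNS
WHERE TABNAME = 'SALES_COL'
ORDER BY PCTENCODED;

-- Per-table: estimated percentage of pages saved by compression
SELECT PCTPAGESAVED
FROM SYSCAT.TABLES
WHERE TABNAME = 'SALES_COL';
```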
57
#IDUG
A Deeper Look at INSERT Processing
1. Fix the ‘append data page’ for each column into the buffer pool. (If there is no free space for a column, first pre-allocate an extent’s worth of pages.) Unfix pages.
2. As rows are inserted they are split into column values, compressed, and placed in temporary buffers (outside the bufferpool).
3. When a temporary buffer fills, or on COMMIT: refix target pages, write 1 log record for an entire buffer of values for a specific column, and …
4. Copy values from temporary buffers to the buffer pool pages. Unfix pages.
When INSERTing, Commit in Batches
Use a batch size as large as ( 1000 * #DB Partitions ) rows if possible
• Results in• Significant reduction in log space consumption• Smaller and more efficient synopsis tables
• Updates to the synopsis occur on commit (*)
• Improved performance
• Applies to any INSERT-based population method
• INSERT, INGEST, IMPORT
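A sketch of batched INSERTs (table and values are illustrative). With autocommit off, multi-row VALUES plus an explicit COMMIT every ~1000 rows per DB partition keeps logging and synopsis maintenance efficient:

```sql
-- Accumulate rows and commit in large batches
INSERT INTO sales_col (s_date, qty) VALUES
  ('2005-03-01', 176),
  ('2005-03-02',  85),
  ('2005-03-02', 267);
-- ... repeat until ~1000 * (number of DB partitions) rows are pending ...
COMMIT;
```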
59
#IDUG
Agenda
• BLU Concepts Recap
• The key design points behind BLU acceleration
• BLU Best Practices including, where pertinent, internals information to help you understand “why”
  • Workload Selection
  • Hardware Configuration
  • Software (DB2) Configuration
  • Table Creation
  • Table Population
  • Running your Workload & Managing your System
• Summary
Best practices are denoted with this icon throughout the presentation
60
#IDUG
• Traditional ‘reorg’ not needed with BLU tables !
• No concept of ‘clustering’
• BLU uses a ‘circular append’ approach for inserts, which means consecutive inserts use the same pages
• Deleted space can be easily reclaimed via REORG TABLE t1 RECLAIM EXTENTS
• Freed extents can exist anywhere in the column object (uses efficient “sparse table” technique used with MDC and ITC tables)
• The storage can be subsequently reused by any table in the tablespace
• Done online while work continues
• Done automatically via DB2’s automatic table maintenance when DB2_WORKLOAD=ANALYTICS
• Space is freed online while work continues
• Regular space management can result in increased performance of RUNSTATS and some queries
BLU Automatic Space Reclamation

[Figure: after DELETE FROM MyTable WHERE Year = 2012, the storage extents of Column1, Column2, and Column3 that hold only deleted 2012 data can be freed, while extents holding 2013 data remain in use.]
61
#IDUG
0) DB2_WORKLOAD=ANALYTICS
1) CREATE db, tablespaces
2) CREATE TABLE t1 … PRIMARY KEY (c1) ORGANIZE BY COLUMN IN TS1 INDEX IN TS2
3) LOAD FROM myfile INTO t1 …
4) SELECT c2,c3 … FROM t1 WHERE …
5) INSERT INTO t1 …
6) DELETE FROM t1 WHERE …
7) Automatic table maintenancereturns space to tablespace
An Illustrative Scenario

[Figure: table t1 spans two tablespaces. TS1 holds the table (‘dat’) object (meta data & compression dictionary), the column (‘col’) object (each extent contains pages of data for 1 column), and the synopsis table (records the range of column values existing in different regions of the table). TS2 holds the index (‘inx’) object. LOAD automatically maintains the synopsis & collects table & index statistics; INSERT automatically maintains the synopsis.]
62
#IDUG
Take Advantage of DB2’s Automatic Space Reclamation & Automatic Statistics Collection
Occurs by default with DB2_WORKLOAD=ANALYTICS
• Simply relying on DB2’s automatic table maintenance is typically all that’s needed !
• On occasion, explicit maintenance may be useful, e.g. “I’m running low on space, and need to reclaim some space now”
• Check for tables with high reclaimable space:
SELECT SUBSTR(TABNAME,1,20) AS TABNAME, DBPARTITIONNUM, RECLAIMABLE_SPACE
FROM SYSIBMADM.ADMINTABINFO
WHERE TABNAME LIKE 'mytable%'
ORDER BY TABNAME WITH UR
• Run reorg on selected tablesREORG TABLE t1 RECLAIM EXTENTS ON DBPARTITIONNUM N
• Done online while work continues
• “I’ve just added significant data to a table, and am not using automatic statement statistics”
• Run RUNSTATS … WITH DISTRIBUTION AND SAMPLED DETAILED INDEXES ALL
All other maintenance activities that involve column-organized tables, such as backup and restore, are identical to those that involve row-organized tables.
Automatic maintenance (AUTO_MAINT) = ONAutomatic table maintenance (AUTO_TBL_MAINT) = ON
Automatic Concurrency Management
• Every additional query naturally consumes more memory, locks, CPU & memory bandwidth
• In some databases, more queries can lead to contention and performance degradation
• DB2 10.5 avoids this by automatically optimizing the level of concurrency
• DB2 10.5 allows an arbitrarily high number of concurrent queries to be submitted, but limits the number that consume resources at any point in time
• Lightweight queries that need instant response bypass this control
• Enabled automatically when DB2_WORKLOAD=ANALYTICS
[Figure: applications & users submit up to tens of thousands of SQL queries at once to the DB2 DBMS kernel, but only a moderate number of queries consume resources at any point in time, automatically determined based on available machine CPU resources. On MPP, the coordinator partition applies the limit across the data DB partitions.]
64
#IDUG
Automatic Concurrency Management : Details

[Figure: work enters via SYSDEFAULTUSERWORKLOAD into SYSDEFAULTUSERCLASS. The SYSDEFAULTUSERWAS work class set asks “query cost > X?”; if so, the query maps to SYSDEFAULTMANAGEDSUBCLASS, where the SYSDEFAULTCONCURRENT threshold limits concurrency to N and queues the excess; otherwise the query runs in SYSDEFAULTSUBCLASS. New WLM objects shown in green.]
• New objects created in all 10.5 and 11 databases:
• SYSDEFAULTMANAGEDSUBCLASS - default subclass for managed queries
• SYSDEFAULTUSERWAS - default work class set to map expensive queries (cost > X) to the above subclass
• SYSDEFAULTCONCURRENT - default concurrency threshold to limit concurrently executing managed queries to N
• Default concurrency threshold enabled on database creation when DB2_WORKLOAD=ANALYTICS
• X and N are determined automatically by DB2
• Based on available CPU resources
• On DPF, this is a global system concurrency limit managed at the catalog DB partition
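You can inspect the automatically created threshold in the catalog; a sketch (columns assumed available in SYSCAT.THRESHOLDS at your level):

```sql
-- Current limit (N) and state of the default concurrency threshold
SELECT SUBSTR(THRESHOLDNAME, 1, 30) AS THRESHOLDNAME,
       MAXVALUE, ENABLED
FROM SYSCAT.THRESHOLDS
WHERE THRESHOLDNAME = 'SYSDEFAULTCONCURRENT';
```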
65
#IDUG
Take Advantage of DB2’s Automatic Workload Concurrency Management
Occurs by default with DB2_WORKLOAD=ANALYTICS
• Simply relying on DB2’s automatic workload concurrency management is typically sufficient
• On occasion, explicit maintenance may be useful, e.g. “I’ve just added more memory and CPUs to my machine, so the concurrency limit DB2 automatically calculated when I created the database may no longer be optimal”
1. Run autoconfigure without applying recommendations:
   AUTOCONFIGURE APPLY NONE
2. Review the recommended value for “Threshold SYSDEFAULTCONCURRENT Maxvalue”:
   DESCRIPTION                              CURRENT  RECOMMENDED
   ---------------------------------------  -------  -----------
   THRESHOLD SYSDEFAULTCONCURRENT MAXVALUE  16       18
3. Alter the threshold:
   ALTER THRESHOLD SYSDEFAULTCONCURRENT
     WHEN CONCURRENTDBCOORDACTIVITIES > 18
     AND QUEUEDACTIVITIES UNBOUNDED
     STOP EXECUTION
If necessary, you can define additional DB2 workload management controls to meet the service-level agreement (SLA) requirements of the workload.
66
#IDUG
Monitoring and Fine Tuning
• If you followed the previous best practices, you’re off to a great start !
• To keep your installation running smoothly, here are some important hints & tips to be aware of :
• Sort memory. The Sortheap is used in BLU not only for sorting, but also for important run-time operations such as hash join and groupby
• You can access several monitoring elements to understand if your Sortheap configuration is healthy. Here are some of the key elements:
Group by : ACTIVE_HASH_GRPBYS, TOTAL_HASH_GRPBYS, HASH_GRPBY_OVERFLOWS, POST_THRESHOLD_HASH_GRPBYS
1/2
67
#IDUG
Monitoring and Fine Tuning
• Sort Memory (continued)
• An “Overflow” indicates that a query performed a run-time operation such as a sort, hash join, or group by and required more memory than was available in its sort heap
• “Post threshold” operations are operations that did not receive all requested sortheap memory, due to consumption by other concurrent operations
• Take away : If you see significant overflows or post-threshold operations, consider :
• Increasing SORTHEAP
• Increasing the ratio of SHEAPTHRES_SHR to SORTHEAP
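A sketch of checking these counters at the database level via the MON_GET_DATABASE table function (element names assumed available at your fix pack level; -2 = all members):

```sql
SELECT TOTAL_HASH_GRPBYS, HASH_GRPBY_OVERFLOWS,
       POST_THRESHOLD_HASH_GRPBYS,
       TOTAL_HASH_JOINS, HASH_JOIN_OVERFLOWS,
       SORT_OVERFLOWS, POST_THRESHOLD_SORTS
FROM TABLE(MON_GET_DATABASE(-2));
```

Overflows that grow steadily relative to the totals are the signal to revisit SORTHEAP and SHEAPTHRES_SHR as described above.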
• Other relevant monitoring elements added for columnar tables include :
2/2
Category: Bufferpool Health
New versions of existing monitor elements (for example, COL_HIT_RATIO_PERCENT and COL_PHYSICAL_READS) were added to enable you to monitor buffer pool usage by column-organized tables.

Category: Prefetcher Activity
New versions of existing monitor elements (for example, POOL_QUEUED_ASYNC_COL_PAGES and SKIPPED_PREFETCH_COL_P_READS) were added to enable you to monitor prefetching for column-organized tables.
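For example, a sketch of checking columnar logical vs. physical reads per buffer pool with MON_GET_BUFFERPOOL ('' and -2 = all buffer pools, all members):

```sql
SELECT SUBSTR(BP_NAME, 1, 20) AS BP_NAME,
       POOL_COL_L_READS,   -- columnar logical reads
       POOL_COL_P_READS    -- columnar physical reads
FROM TABLE(MON_GET_BUFFERPOOL('', -2));
```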
68
#IDUG
Summary
• Target Analytical or Warehousing Workloads
• Use Data Server Manager to Help Identify Which Tables Should be Columnar
• Use a Recommended Operating System and Processor
• Use a (Virtual) Machine with at Least 8-16 Cores and …
  • 16 Cores for every 3TB Raw (uncompressed) Data
  • 16 GB of Main Memory per Core
• Set DB2_WORKLOAD=ANALYTICS Before Creating Your Database
• Define Columns as NOT NULL if Possible
• Use Informational Unique Constraints (instead of Enforced) if Uniqueness is Enforced Outside DB2
• Place Performance Critical Tables (and their Indexes) in Dedicated Tablespaces
• For Best Compression, Initially Populate Tables using LOAD and Representative Data
• Ensure at Least 1,000,000 Pages are Available in the Utility Heap for LOAD’s Analyze Phase; Use the New AUTOMATIC Setting
• Consider Sorting Data on Selected Columns Prior to Table Population
• Use Catalog Statistics to Evaluate Effectiveness of Column Compression Dictionaries
• When INSERTing, Commit in Batches of >1000 Rows if Possible
• Take Advantage of DB2’s Automatic Workload Concurrency Management
• Take Advantage of DB2’s Automatic Space Reclamation & Automatic Statistics Collection
• >20TB ? Use Multiple Database (DB) Partitions (i.e. DPF); <10TB ? Use a Single DB Partition (i.e. non-DPF); In Doubt ? Use the Following Considerations
• On DPF, Use 1 Database Partition per Socket (using Multiple Logical DB Partitions as Necessary)
• On Non-Uniform-Memory-Access (NUMA) Machines, Bind Logical DB Partitions to NUMA Nodes