Conferences of the Groupe des Utilisateurs SQL Server (SQL Server User Group), June 2013: SQL Server In-Memory. Alexandre Chemla (Masao) and Frédéric Pichaut (Microsoft).


Frédéric Pichaut, Senior Escalation Engineer, Microsoft France

SQL Server 14 In-Memory: “Hekaton”

24 June 2013

Agenda
• What is “Hekaton”
• Integrated in SQL Server
• Memory considerations
• Storage
• New DMVs
• AMR tool
• Bonus…

What is Hekaton

Project “Hekaton” adds in-memory technology to boost the performance of OLTP workloads in SQL Server “14”:
• Memory-optimized table and index structures
• No buffer pool
• Native compilation of business logic in stored procedures
• Latch- and lock-free data structures
• “Hekaton” is fully integrated into SQL Server

Hekaton Integration and Application Migration

[Figure: architecture of SQL Server.exe. Key: Hekaton component vs. existing SQL component. Components shown: client app; TDS handler and session management; parser, catalog, algebrizer, optimizer; Hekaton compiler and the generated .dll; natively compiled SPs and schema; interpreter for T-SQL, query plans and expressions; query interop; proc/plan cache for ad-hoc T-SQL and SPs; access methods; buffer pool for tables and indexes; Hekaton engine for memory-optimized tables and indexes (including non-durable tables); checkpoint and recovery; transaction log; memory-optimized table filegroup; data filegroup.]

Performance Gains

[Figure: the same architecture diagram, annotated with the expected gains.]
• 10-30x more efficient
• Reduced log bandwidth and contention; log latency remains
• Checkpoints are background sequential IO
• No improvements in the communication stack, parameter passing or result set generation

Hekaton Performance

Factor faster (slower) than regular SQL:

Operation | Interop | Native | Comments
Select count(*) (1) | (2.5) | = | No clustered index scan in Hekaton
Hash join (1) | (1.3) | N/A | Uses index scan
Nested-loop join (1) | 4.0 | N/A | Probes into hash index
Single-row selects (1) | 1.3 | 40 | SP doing selects in a loop
Single-row selects (1) | 1.2 | 17 | Natively compiled SP calls SQL’s rand()
Single-row updates (1) | N/A | 10 | SP doing updates in a loop
Bwin session state: 6 (version M4)

(1) 1 million rows accessed in a single query or SP.

• Expectation for OLTP workloads
• Advantage of pushing work to SPs
• Interop targets app migration, not performance


Integrated Experience
• Backup and restore: full and log backup and restore are supported; piecemeal restore is supported
• Failover clustering: failover time depends on the size of durable memory-optimized tables
• AlwaysOn: the secondary keeps memory-optimized tables in memory, so failover time does not depend on the size of durable memory-optimized tables
• DMVs, catalog views, Perfmon counters, XEvents: monitoring memory, GC activity and transaction details
• SSMS: creating, managing and monitoring tables, databases and servers
• Query optimization: same SQL Server optimizer

Hekaton Syntax

• Storage

ALTER DATABASE ContosoOLTP
    ADD FILEGROUP [ContosoOLTP_hk_fs_fg] CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE ContosoOLTP
    ADD FILE (NAME = [ContosoOLTP_fs_dir], FILENAME = 'H:\MOUNTHEAD\DATA\CONTOSOOLTP_FS_DIR')
    TO FILEGROUP [ContosoOLTP_hk_fs_fg];

• Table

-- Character columns used as hash index keys need a BIN2 collation in SQL Server 2014.
CREATE TABLE dbo.Customers (
    CustomerID   nchar(5)     COLLATE Latin1_General_100_BIN2 NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    CompanyName  nvarchar(40) COLLATE Latin1_General_100_BIN2 NOT NULL
        INDEX IX_CompanyName HASH WITH (BUCKET_COUNT = 65536),
    ContactName  nvarchar(30) NOT NULL,
    ContactTitle nvarchar(30) NOT NULL,
    Address      nvarchar(60) NOT NULL,
    City         nvarchar(15) COLLATE Latin1_General_100_BIN2 NOT NULL
        INDEX IX_City HASH WITH (BUCKET_COUNT = 1024),
    Region       nvarchar(15) COLLATE Latin1_General_100_BIN2 NOT NULL
        INDEX IX_Region HASH WITH (BUCKET_COUNT = 1024),
    PostalCode   nvarchar(10) COLLATE Latin1_General_100_BIN2 NOT NULL
        INDEX IX_PostalCode HASH WITH (BUCKET_COUNT = 100000),
    Country      nvarchar(15) NOT NULL,
    Phone        nvarchar(24) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

• Native procedure

CREATE PROCEDURE dbo.InsertCustomers (
    @CustomerID nchar(5), @CompanyName nvarchar(40), @ContactName nvarchar(30),
    @ContactTitle nvarchar(30), @Address nvarchar(60), @City nvarchar(15),
    @Region nvarchar(15), @PostalCode nvarchar(10), @Country nvarchar(15),
    @Phone nvarchar(24))
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- One value per column of dbo.Customers (the table has no Fax column).
    INSERT INTO dbo.Customers
        (CustomerID, CompanyName, ContactName, ContactTitle, Address,
         City, Region, PostalCode, Country, Phone)
    VALUES (@CustomerID, @CompanyName, @ContactName, @ContactTitle, @Address,
            @City, @Region, @PostalCode, @Country, @Phone);
END;

Table Creation

CREATE TABLE DDL -> table code generated -> compiler invoked -> table DLL produced -> table DLL loaded

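Memory Optimized Tables and Indexes

[Figure: row versions linked into a hash index on Name (buckets J and S) and a hash index on City (buckets B and P). Each row carries begin/end timestamps, the Name, chain pointers and the City.]

Timestamps (begin, end) | Name | City
100, 200 | John | Paris
200, ∞ | John | Beijing
50, ∞ | Jane | Prague
90, 150 | Susan | Bogota

Garbage collection removes unused rows: row versions that are no longer visible to any active transaction.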
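The timestamps in the figure can be read as a validity interval. A version created at its begin timestamp is visible to any reader whose logical read time falls before its end timestamp. The Python sketch below is a simplified model, not SQL Server's implementation. It assumes half-open [begin, end) intervals, uses the first letter of the key as the hash function (mirroring the figure's J/S and B/P buckets), and treats garbage collection as dropping versions whose end timestamp is at or before the oldest active read time. All names are hypothetical.

```python
# Simplified MVCC model for illustration only; not SQL Server's actual code.
from dataclasses import dataclass
from typing import Callable, Dict, List

INFINITY = float("inf")  # "∞" in the figure: the version is still current


@dataclass
class RowVersion:
    begin: float  # commit timestamp of the transaction that created this version
    end: float    # commit timestamp of the transaction that replaced or deleted it
    name: str
    city: str


def visible(row: RowVersion, read_time: float) -> bool:
    """A version is visible when the reader's time falls in [begin, end)."""
    return row.begin <= read_time < row.end


def build_hash_index(rows: List[RowVersion],
                     key: Callable[[RowVersion], str]) -> Dict[str, List[RowVersion]]:
    """Toy hash index: the bucket is the key's first letter; each bucket holds a chain."""
    buckets: Dict[str, List[RowVersion]] = {}
    for row in rows:
        buckets.setdefault(key(row)[0], []).append(row)
    return buckets


def lookup(index: Dict[str, List[RowVersion]], name: str, read_time: float) -> List[str]:
    """Walk one bucket chain, keep matching keys, then apply the visibility check."""
    chain = index.get(name[0], [])
    return [r.city for r in chain if r.name == name and visible(r, read_time)]


def garbage_collect(rows: List[RowVersion], oldest_active_read_time: float) -> List[RowVersion]:
    """Drop versions that ended at or before the oldest active read time: nobody can see them."""
    return [r for r in rows if r.end > oldest_active_read_time]


rows = [
    RowVersion(100, 200, "John", "Paris"),
    RowVersion(200, INFINITY, "John", "Beijing"),
    RowVersion(50, INFINITY, "Jane", "Prague"),
    RowVersion(90, 150, "Susan", "Bogota"),
]
name_index = build_hash_index(rows, key=lambda r: r.name)

print(lookup(name_index, "John", 150))               # ['Paris']
print(lookup(name_index, "John", 250))               # ['Beijing']
print([r.city for r in garbage_collect(rows, 210)])  # ['Beijing', 'Prague']
```

A reader at time 150 still sees John in Paris, and a reader at 250 sees Beijing. Once no active reader is older than 210, the Paris and Bogota versions can be reclaimed.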

Hekaton Checkpoint Data Flows

[Figure: changes flow from Hekaton memory through the transaction log into a checkpoint (data) file and a delta file.]

• Hekaton memory (current rows): 235 | 002 | Fred | CHI and 237 | 001 | George | SEA
• Transaction log: transaction 234 adds George | LAX; 235 adds Fred | CHI; 237 deletes the version of row 001 created by 234 and adds George | SEA
• Checkpoint (data) file, columns XID | RowID | Name (PK) | Airport:
  234 | 001 | George | LAX
  235 | 002 | Fred | CHI
  237 | 001 | George | SEA
• Delta file, columns create XID | RowID | delete XID:
  234 | 001 | 237
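The data flow above implies a simple rule for rebuilding a table from one checkpoint file pair: keep every row in the data file unless the delta file records it as deleted. This Python sketch applies that rule to the slide's George/Fred rows. The tuple layout and the function name are illustrative assumptions, not the on-disk format.

```python
# Illustrative model of replaying a data file against its delta file; not the real file format.
from typing import List, Tuple

DataRow = Tuple[int, str, str, str]  # (create XID, RowID, Name (PK), Airport)
DeltaRow = Tuple[int, str, int]      # (create XID, RowID, delete XID)


def live_rows(data_file: List[DataRow], delta_file: List[DeltaRow]) -> List[DataRow]:
    """Replay a data file, skipping rows whose (create XID, RowID) appears in the delta file."""
    deleted = {(create_xid, row_id) for create_xid, row_id, _delete_xid in delta_file}
    return [row for row in data_file if (row[0], row[1]) not in deleted]


data_file = [
    (234, "001", "George", "LAX"),
    (235, "002", "Fred", "CHI"),
    (237, "001", "George", "SEA"),  # transaction 237 replaced George's LAX row
]
delta_file = [(234, "001", 237)]    # the version created by 234 was deleted by 237

for row in live_rows(data_file, delta_file):
    print(row)
# (235, '002', 'Fred', 'CHI')
# (237, '001', 'George', 'SEA')
```

The result matches the rows the figure shows in Hekaton memory.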

Memory Optimized Tables: Limitations
• Optimized for high-throughput OLTP
  • No DML triggers
  • No XML and no CLR data types
• Optimized for in-memory
  • Rows are at most 8060 bytes; no off-row data
  • No large object (LOB) types like varchar(max)
• Scoping limitations
  • No FOREIGN KEY and no CHECK constraints
  • No schema changes (ALTER TABLE): the table must be dropped and recreated
  • No adding or removing indexes: the table must be dropped and recreated

Accessing Memory Optimized Tables

Natively Compiled Stored Procedures
• Access only memory-optimized tables
• Maximum performance
• Limited T-SQL surface area
• When to use: OLTP-style operations; optimizing performance-critical business logic

Interpreted T-SQL Access
• Access both memory-optimized and disk-based tables
• Less performant
• Virtually full T-SQL surface area
• When to use: ad hoc queries; reporting-style queries; speeding up app migration

T-SQL Compiled to Machine Code
• T-SQL is compiled to machine code via a C code generator and the Visual C++ compiler
• Invoking a procedure is just a call to a DLL entry point
• Aggressive optimizations at compile time

Hardware trends: stalling CPU clock rate

Natively Compiled Stored Procedures: Design Considerations

[Figure: customer benefits, Hekaton tech pillars and drivers; the customer benefit shown is efficient business-logic processing.]

Aspect | Natively compiled stored procedures | Non-native compilation
Performance | High: significantly fewer instructions to go through | No different than T-SQL calls in SQL Server today
Migration strategy | Application changes; development overhead | Easier app migration, as it can still access memory-optimized (MO) tables
Access to objects | Can only interact with memory-optimized tables | All objects; transactions can span MO and b-tree tables
Support for T-SQL constructs | Limited | T-SQL surface area (limits on MO interaction)
Optimization, statistics and query plan | Statistics are used at CREATE, i.e. compile, time | Statistics updates can be used to modify the plan at runtime
Flexibility | Limited (e.g., no ALTER PROCEDURE, compile-time isolation level) | Ad-hoc query patterns

Statistics on Memory-Optimized Tables
• Statistics on the index key columns are created when the table is empty.
• They need to be updated after data is loaded into the table.
• For natively compiled stored procedures, execution plans for the queries in the procedure are optimized when the procedure is compiled, that is, when the procedure is created and when the server restarts, not when statistics are updated.
• The tables need to contain a representative set of data, and statistics need to be up to date, before the procedures are created. (Natively compiled stored procedures are recompiled if the database is taken offline and brought back online.)

Hekaton Concurrency Control

Multi-version
• Multi-version data store
• Snapshot-based transaction isolation
• No TempDB

Optimistic
• Conflict detection to ensure isolation
• No deadlocks

• No locks, no latches, minimal context switches
• No blocking

Supported Isolation Levels
• SNAPSHOT: reads are consistent as of the start of the transaction; writes are always consistent
• REPEATABLE READ: read operations yield the same row versions if repeated at commit time
• SERIALIZABLE: the transaction is executed as if there were no concurrent transactions; all actions happen at a single serialization point (commit time)

Example: Write Conflict

Time | Transaction T1 (SNAPSHOT) | Transaction T2 (SNAPSHOT)
1 | BEGIN |
2 | | BEGIN
3 | UPDATE t SET c1 = 'bla' WHERE c2 = 123 |
4 | | UPDATE t SET c1 = 'bla' WHERE c2 = 123 (write conflict)

First writer wins.
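The timeline can be replayed with a toy optimistic store. In this Python model, which is an assumption rather than Hekaton's actual algorithm, an update fails immediately in two cases: another transaction holds an uncommitted update on the same row, or the row was committed by someone else after the updating transaction began. This is the first-writer-wins rule. `Store`, `Txn` and `WriteConflict` are hypothetical names.

```python
# Toy optimistic concurrency model for illustration; not Hekaton's implementation.
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


class WriteConflict(Exception):
    """Raised to the second writer of a row (first writer wins)."""


@dataclass
class Txn:
    name: str
    start: int
    writes: Dict[Any, Any] = field(default_factory=dict)


class Store:
    def __init__(self) -> None:
        self.clock = 0
        self.committed: Dict[Any, Tuple[Any, int]] = {}  # key -> (value, commit timestamp)
        self.pending: Dict[Any, str] = {}                # key -> name of the uncommitted writer

    def begin(self, name: str) -> Txn:
        self.clock += 1
        return Txn(name, self.clock)

    def update(self, txn: Txn, key: Any, value: Any) -> None:
        owner: Optional[str] = self.pending.get(key)
        _, commit_ts = self.committed.get(key, (None, 0))
        # No lock is taken and nobody waits: a conflicting writer fails at once.
        if (owner is not None and owner != txn.name) or commit_ts > txn.start:
            raise WriteConflict(f"{txn.name}: write conflict on key {key!r}")
        self.pending[key] = txn.name
        txn.writes[key] = value

    def commit(self, txn: Txn) -> None:
        self.clock += 1
        for key, value in txn.writes.items():
            self.committed[key] = (value, self.clock)
            del self.pending[key]


store = Store()
t1 = store.begin("T1")            # time 1
t2 = store.begin("T2")            # time 2
store.update(t1, 123, "bla")      # time 3: T1 is the first writer
try:
    store.update(t2, 123, "bla")  # time 4: T2 gets a write conflict
except WriteConflict as err:
    print(err)                    # T2: write conflict on key 123
```

T2 would still conflict after T1 commits, because the committed version is newer than T2's snapshot. A transaction that begins after the commit can update the row.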

Guidelines for Usage
1. Declare the isolation level; do not use locking hints
2. Use retry logic to handle conflicts and validation failures
3. Avoid long-running transactions
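Guideline 2 can be sketched as a client-side retry loop with exponential backoff. The Python below is a generic pattern, not a SQL Server API. `TransientConflict` is a hypothetical stand-in for the errors the engine raises on write conflicts and validation failures. In SQL Server 2014 these include error 41302 for write conflicts and errors 41305 and 41325 for repeatable-read and serializable validation failures. The attempt count and delays are arbitrary assumptions. The whole transaction is re-run, because a conflict aborts it.

```python
# Generic retry-with-backoff sketch; TransientConflict is a hypothetical placeholder exception.
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientConflict(Exception):
    """Hypothetical stand-in for a write-conflict or validation-failure error."""


def run_with_retry(work: Callable[[], T], max_attempts: int = 5,
                   base_delay: float = 0.01,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `work` (one whole transaction); on a conflict, back off and re-run it from the start."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return work()
        except TransientConflict:
            if attempt >= max_attempts:
                raise  # give up and surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 0.01 s, 0.02 s, 0.04 s, ...
            attempt += 1


calls = []


def flaky_transaction() -> str:
    """Hypothetical transaction that hits a conflict on its first two attempts."""
    calls.append(1)
    if len(calls) < 3:
        raise TransientConflict("write conflict")
    return "committed"


print(run_with_retry(flaky_transaction, sleep=lambda seconds: None), len(calls))  # committed 3
```

Keeping each transaction short, as guideline 3 recommends, also keeps the cost of each retry low.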


“Hekaton” Memory

Considerations
• Table data: rule of thumb, 2 x data_size
• Index data: bucket_count x pointer_size

Management
• Monitoring: use DMVs, SMO, SSMS
• Configuration: use Resource Governor

Memory Size Estimation

CREATE TABLE dbo.Orders (
    OrderID int NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    CustomerID int NOT NULL INDEX IX_CustomerID HASH WITH (BUCKET_COUNT = 1000000),
    OrderDate datetime NOT NULL,
    OrderDescription nvarchar(1000)
) WITH (MEMORY_OPTIMIZED = ON);

• Assume the Orders table has 1M rows and the average length of OrderDescription is 78 characters.
• Index size:
  • The bucket_count is 1000000, which is rounded up to the nearest power of 2: 1048576.
  • 8 * 1048576 + 8 * 1048576 = 16777216 bytes
• Table data size:
  • [row size] * [row count] = [row size] * 1000000
  • [row size] = [row header size] + [actual row body size]
  • [row header size] = 24 + 8 * [number of indices] = 24 + 8 * 2 = 40 bytes
  • [actual row body size]:
    • SUM([size of shallow types]) = 4 [int] + 4 [int] + 8 [datetime] = 16
    • 2 + 2 * [number of deep type columns] = 2 + 2 * 1 = 4
    • NULL array = 1, plus NULL array padding = 1
    • Size so far is 16 + 4 + 1 + 1 = 22; padded to the nearest multiple of 8, this is 24.
    • [actual row body size] = 24 + 2 * 78 = 180 bytes
  • So [row size] = 40 + 180 = 220 bytes
• [table size] = 16777216 + 220 * 1000000 = 236777216 bytes ≈ 237 MB (about 226 MiB)
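The estimate can be recomputed mechanically. This Python sketch follows the slide's formula step by step: 8-byte bucket pointers, a 24-byte row header plus 8 bytes per index, 2-byte offsets for variable-length columns, a 1-byte NULL array plus 1 byte of padding, 8-byte alignment, and 2 bytes per nvarchar character. The function and parameter names are hypothetical, and the constants come from the slide, not from an independent specification.

```python
# Reproduces the slide's memory estimate; constants are taken from the slide.
from typing import List


def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (how BUCKET_COUNT is rounded)."""
    return 1 << (n - 1).bit_length()


def round_up(value: int, multiple: int) -> int:
    return (value + multiple - 1) // multiple * multiple


def estimate_bytes(row_count: int, bucket_counts: List[int], shallow_bytes: int,
                   deep_columns: int, avg_deep_chars: int) -> int:
    # Index size: one 8-byte pointer per bucket, per hash index.
    index_bytes = sum(8 * next_power_of_two(b) for b in bucket_counts)
    # Row header: 24 bytes plus one 8-byte chain pointer per index.
    header = 24 + 8 * len(bucket_counts)
    # Fixed part of the body: shallow types, deep-column offsets, NULL array + padding.
    fixed = shallow_bytes + (2 + 2 * deep_columns) + 1 + 1
    # Pad to 8 bytes, then add the variable part: 2 bytes per nvarchar character.
    body = round_up(fixed, 8) + 2 * avg_deep_chars
    return index_bytes + (header + body) * row_count


total = estimate_bytes(1_000_000, [1_000_000, 1_000_000],
                       shallow_bytes=4 + 4 + 8, deep_columns=1, avg_deep_chars=78)
print(total)                 # 236777216
print(round(total / 2**20))  # 226 (MiB)
```

Changing `avg_deep_chars` or the bucket counts shows how quickly the variable-length data outweighs the index arrays.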

Determining Bucket Count
• Set the bucket count to about two times the maximum expected number of distinct values in the index key.
• Balance the amount of memory allocated to the hash table against the number of distinct values in the index key.
• The higher the bucket_count value, the more empty buckets there will be in the index.
• The lower the bucket count, the more values are assigned to a single bucket. This decreases performance for point lookups and inserts, because SQL Server may need to traverse several values in a single bucket to find the value specified by the search predicate.
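The trade-off can be explored with a quick simulation. This Python sketch assumes an idealised hash function, in which each distinct key lands in a uniformly random bucket. It applies the slide's guideline of about twice the number of distinct keys, rounded up to a power of two. The function names are hypothetical. The figures it prints (empty buckets, average and maximum chain length) are the kind of information sys.dm_db_xtp_hash_index_stats is meant to report for tuning.

```python
# Bucket-count simulation under an idealised (uniform random) hash; illustration only.
import random
from typing import Dict


def next_power_of_two(n: int) -> int:
    return 1 << (n - 1).bit_length()


def recommended_bucket_count(max_distinct_keys: int) -> int:
    """About 2x the distinct key count, rounded up the way BUCKET_COUNT is."""
    return next_power_of_two(2 * max_distinct_keys)


def chain_stats(distinct_keys: int, bucket_count: int, seed: int = 0) -> Dict[str, float]:
    """Hash distinct keys into buckets (uniformly at random) and summarise the chains."""
    rng = random.Random(seed)
    chains = [0] * bucket_count
    for _ in range(distinct_keys):
        chains[rng.randrange(bucket_count)] += 1
    used = [c for c in chains if c > 0]
    return {
        "empty_buckets": bucket_count - len(used),
        "avg_chain_length": sum(used) / len(used),
        "max_chain_length": max(used),
    }


keys = 10_000
for buckets in (1_024, recommended_bucket_count(keys)):  # too few vs. the 2x guideline
    print(buckets, chain_stats(keys, buckets))
```

With 1,024 buckets almost none are empty, but each lookup walks a chain of about ten entries. With the recommended 32,768 buckets most buckets are empty and chains are close to one entry long, which costs memory instead of lookup time.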

Memory Considerations

Scenario: symptom
• Inserting more rows than can fit in memory: transactions start failing
• Recovering a database that does not fit in memory: the database does not come online
• Memory pressure from the “Hekaton” workload on other workloads: operations in other workloads start failing

Diagnosis
• Read the error log
• Identify via DMVs or SSMS whether “Hekaton” is using most of the memory

Solution
• Free up memory
• Add memory
• Identify and stop long-running transactions


Durability Options
• SCHEMA_ONLY (non-durable table)
  • When SQL Server is restarted, the non-durable table is recreated but starts with no data.
  • Avoids both transaction logging and checkpoint, which can significantly reduce IO operations.
• SCHEMA_AND_DATA (durable table)
  • The data is persisted in the memory-optimized filegroup (a FILESTREAM filegroup).
  • The filegroup can hold multiple containers.

Containers
• Root file: contains metadata (the description of the other files)
• Data file:
  • Rows are appended in transaction log order.
  • A given data file contains transactions that occurred within a range of transaction end timestamps.
• Delta file:
  • Records the data rows that were deleted. For each deleted row, it inserts a minimal {inserting_tx_id, row_id, deleting_tx_id} entry.
  • Each data file has a corresponding delta file.

Populating Data and Delta Files
• Data and delta files are populated by a background thread called the offline checkpoint.

[Figure: the offline checkpoint thread reads log records and writes to the memory-optimized data filegroup, which holds data/delta file pairs for timestamp ranges 100-199, 200-299, 300-399, 400-499 and an open range starting at 500. The insert goes to the data file of range 500-. The deletions, with timestamps 150, 250 and 420, go to the delta files of ranges 100-199, 200-299 and 400-499.]

A transaction with a commit timestamp of 600 inserts one new row and deletes rows inserted by transactions with commit timestamps of 150, 250 and 420.
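The routing in the figure can be modelled as a lookup by timestamp range. This Python sketch illustrates the idea under stated assumptions and is not the engine's code. Each data/delta pair covers a half-open range of commit timestamps starting at 100, 200, 300, 400 and 500, and the last range is open-ended. Inserts go to the data file of the range containing their commit timestamp. Deletes go to the delta file of the range that holds the original insert, as an {inserting timestamp, row id, deleting timestamp} entry. The record layout and all names are hypothetical.

```python
# Hypothetical model of the offline checkpoint routing log records to file pairs.
from bisect import bisect_right
from typing import Dict, List, Tuple

RANGE_STARTS = [100, 200, 300, 400, 500]  # ranges 100-199 ... 400-499, then 500- (still open)


def pair_for(ts: int) -> int:
    """Index of the data/delta file pair whose timestamp range contains ts."""
    i = bisect_right(RANGE_STARTS, ts) - 1
    if i < 0:
        raise ValueError(f"no file pair covers timestamp {ts}")
    return i


def offline_checkpoint(log_records: List[Dict], data_files: List[List[str]],
                       delta_files: List[List[Tuple[int, str, int]]]) -> None:
    """Apply committed log records to the data and delta files."""
    for rec in log_records:
        if rec["op"] == "insert":
            data_files[pair_for(rec["commit_ts"])].append(rec["row_id"])
        else:  # "delete": recorded with the file pair that holds the row's insert
            delta_files[pair_for(rec["insert_ts"])].append(
                (rec["insert_ts"], rec["row_id"], rec["commit_ts"]))


# The slide's scenario: transaction 600 inserts one row and deletes rows from 150, 250 and 420.
log = [{"op": "insert", "commit_ts": 600, "row_id": "new"}] + [
    {"op": "delete", "commit_ts": 600, "insert_ts": ts, "row_id": f"row@{ts}"}
    for ts in (150, 250, 420)
]
data_files: List[List[str]] = [[] for _ in RANGE_STARTS]
delta_files: List[List[Tuple[int, str, int]]] = [[] for _ in RANGE_STARTS]
offline_checkpoint(log, data_files, delta_files)
print(data_files)   # [[], [], [], [], ['new']]
print(delta_files)
```

Only the open range receives new rows. Older, closed file pairs grow only through their delta files.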

Merge Operation

[Figure: key: data file with rows generated in timestamp range a-b; delta file with IDs of deleted rows.]

• Files as of time 500: data/delta pairs for ranges 100-199, 200-299, 300-399 and 400-499.
• Merge 200-399: the pairs for ranges 200-299 and 300-399 are merged into a single pair covering 200-399.
• Files as of time 600: pairs for ranges 100-199, 200-399, 400-499 and 500-599; the files for ranges 200-299 and 300-399 become deleted files.
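The merge step can be sketched as folding adjacent file pairs into one. In this simplified Python model, which is an assumption and not the engine's algorithm, each pair is represented by its low and high timestamps, the row IDs in its data file, and the row IDs in its delta file. Merging keeps only rows that no delta file marks as deleted and gives the merged pair an empty delta file, after which the source files can be dropped. `FilePair` and `merge` are hypothetical names.

```python
# Simplified merge of adjacent data/delta file pairs; illustration only.
from typing import List, NamedTuple, Set


class FilePair(NamedTuple):
    low: int
    high: int
    data: List[str]   # row IDs written to the data file
    deleted: Set[str]  # row IDs recorded in the matching delta file


def merge(pairs: List[FilePair]) -> FilePair:
    """Merge adjacent pairs into one covering their whole timestamp range, dropping deleted rows."""
    deleted = set().union(*(p.deleted for p in pairs))
    live = [row for p in pairs for row in p.data if row not in deleted]
    return FilePair(min(p.low for p in pairs), max(p.high for p in pairs), live, set())


range_200 = FilePair(200, 299, ["a", "b"], {"a"})
range_300 = FilePair(300, 399, ["c", "d"], {"d"})
merged = merge([range_200, range_300])
print(merged)  # FilePair(low=200, high=399, data=['b', 'c'], deleted=set())
```

The merged file is smaller than the two source files combined, which is how merging keeps storage and recovery time bounded as rows are deleted.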

“Hekaton” Storage Considerations

Data
• Capacity needed is 2-3x the size of durable memory-optimized tables
• Use sequential bandwidth sufficient to meet the RTO
• Spinning media

Log
• Latency is important
• SSDs
• Per-transaction log consumption is lower than for disk-based tables
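Two of the data-side guidelines reduce to arithmetic. This Python sketch sizes checkpoint storage at 2-3x the durable table size, as the slide states. It also estimates the sequential read bandwidth needed to reload the tables within a given RTO. As a simplification, it assumes recovery time is dominated by reading the checkpoint files, taken here as twice the table size, and it ignores log replay and CPU time. The function names and the 2x default are hypothetical choices.

```python
# Back-of-the-envelope storage sizing; the files-to-table ratio is an assumption.
from typing import Tuple


def checkpoint_capacity_gb(durable_table_gb: float) -> Tuple[float, float]:
    """Disk capacity range for checkpoint files: 2x to 3x the durable table size."""
    return 2 * durable_table_gb, 3 * durable_table_gb


def required_bandwidth_mb_s(durable_table_gb: float, rto_seconds: float,
                            files_to_table_ratio: float = 2.0) -> float:
    """Sequential read bandwidth needed to read the checkpoint files within the RTO."""
    megabytes_to_read = durable_table_gb * 1024 * files_to_table_ratio
    return megabytes_to_read / rto_seconds


print(checkpoint_capacity_gb(100))               # (200, 300)
print(round(required_bandwidth_mb_s(100, 600)))  # 341 MB/s for 100 GB and a 10-minute RTO
```

Because this load is sequential, spinning media can often supply it, while the log benefits more from low-latency SSDs.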


New DMVs for In-Memory OLTP
• sys.dm_db_xtp_checkpoint: returns the database that has one or more in-memory (IM) objects
• sys.dm_db_xtp_checkpoint_files: displays information about checkpoint files
• sys.dm_db_xtp_hash_index_stats: useful for understanding and tuning bucket counts and duplicate index keys
• sys.dm_db_xtp_index_stats: reports statistics about scans on an index
• sys.dm_db_xtp_memory_consumers: reports the database-level memory consumers in the IM database engine
• sys.dm_db_xtp_object_stats: reports statistics about operations on a memory-optimized object
• sys.dm_db_xtp_table_memory_stats: returns memory usage statistics for each IM table (user and system) in the current database
• sys.dm_db_xtp_transactions: reports the active transactions in the IM database engine
• sys.dm_xtp_consumer_memory_usage: reports memory usage for all memory consumers, at both database and system level
• sys.dm_xtp_gc_stats: reports information about the current behavior of the IM garbage-collection process
• sys.dm_xtp_system_memory_consumers: reports information about memory usage
• sys.dm_xtp_threads: reports the threads that the IM database engine has started internally
• sys.dm_xtp_transaction_stats: reports statistics about transactions that have run since the server started

New and Updated Properties

New or updated property, system view, stored procedure, or DMV | Change
OBJECTPROPERTYEX | New property: TableIsMemoryOptimized
SERVERPROPERTY | New property: IsXTPSupported
sys.data_spaces | The type and type_desc columns display additional values
sys.indexes | The type and type_desc columns display additional values
sys.parameters | New column: is_nullable
sys.all_sql_modules | New column: uses_native_compilation
sys.sql_modules | New column: uses_native_compilation
sys.table_types | New column: is_memory_optimized
sys.tables | New columns: durability, durability_desc and is_memory_optimized
sys.hash_indexes | New: shows the current hash indexes and their properties
sp_xtp_merge_checkpoint_files | New stored procedure: merges all data and delta files in the specified transaction range

There are also 13 new wait types in sys.dm_os_wait_stats, and new Extended Events under the xtp category.

AMR
• Analysis, Migrate and Report tool
• Configures the Management Data Warehouse and data collection, and runs AMR reports to identify performance hotspots
• Included in CTP1

AMR Data Collection
1. BEGIN: is the MDW set up? If not, configure the Management Data Warehouse.
2. Configure data collection.
3. Establish a system performance baseline: run the workload.
4. Run AMR reports.
5. Migrate.
6. Run the workload and collect performance metrics.
7. Compare to the baseline and set the result as the new baseline.
COMPLETE

AMR Report
• Table analysis: usage analysis, contention analysis
• Stored procedure analysis: usage analysis
• Table analysis: expected gain


Clustered Column Store Index
• Fast execution for data warehouse queries: speedups of 10x and more
• No need for a separate base table: saves space
• Data can be inserted, updated or deleted
• Eliminates the need for other indexes
• More data types supported: removes many limitations of nonclustered columnstores in SQL Server 2012

[Figure: space used in GB for a 101 million row table (table + index space).]

Structure of a CCI Partition
• CREATE CLUSTERED COLUMNSTORE: organizes and compresses data into the column store (CS)
• BULK INSERT: creates new CS row groups
• INSERT: rows are placed in the row store (RS, a heap)
  • When the RS is big enough, a new CS row group is created
• DELETE: rows are marked in the deleted bitmap
• UPDATE: delete plus insert

Most data is in CS format. Not intended for OLTP applications, but great for read-mostly data warehouses!

[Figure: a partition made of a column store (CS), a deleted bitmap and a row store (RS).]

Commands
• CREATE TABLE <table> ( … )
• CREATE CLUSTERED COLUMNSTORE INDEX <name> ON <table>: converts the entire table to CS format; take care of the memory needed and of parallelism (MAXDOP 1)
• BULK INSERT, SELECT INTO <name> ON <table>: creates new CS row groups
• INSERT/UPDATE: stored in the row store

Tuple Mover
• When the RS reaches 1M rows, it is converted to a CS row group
• Runs every 5 minutes by default
• Started explicitly by ALTER INDEX <name> ON <table> REORGANIZE
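The partition structure and the tuple mover can be sketched together. This Python toy model relies on several assumptions and is not the storage engine. Compressed row groups are plain dictionaries. The row-group threshold is a parameter: the slide's value is 1M rows, and the example uses a tiny one. The tuple mover is called explicitly rather than every 5 minutes. `CciPartition` and its methods are hypothetical names.

```python
# Toy model of one clustered columnstore partition; illustration only.
from typing import Dict, List, Set


class CciPartition:
    """CS row groups, a deleted bitmap and a row store (RS), as in the slide."""

    def __init__(self, rowgroup_threshold: int = 1_000_000) -> None:
        self.threshold = rowgroup_threshold
        self.column_store: List[Dict[int, str]] = []  # compressed row groups: row_id -> row
        self.deleted_bitmap: Set[int] = set()         # row_ids deleted from CS row groups
        self.row_store: Dict[int, str] = {}           # uncompressed rows (heap)
        self._next_id = 0

    def _new_id(self) -> int:
        self._next_id += 1
        return self._next_id

    def bulk_insert(self, rows: List[str]) -> List[int]:
        """BULK INSERT: the rows form a new CS row group directly."""
        group = {self._new_id(): row for row in rows}
        self.column_store.append(group)
        return list(group)

    def insert(self, row: str) -> int:
        """INSERT: the row is placed in the row store."""
        row_id = self._new_id()
        self.row_store[row_id] = row
        return row_id

    def delete(self, row_id: int) -> None:
        """DELETE: rows still in the RS are removed; compressed rows are only marked."""
        if row_id in self.row_store:
            del self.row_store[row_id]
        else:
            self.deleted_bitmap.add(row_id)

    def update(self, row_id: int, new_row: str) -> int:
        """UPDATE: delete plus insert."""
        self.delete(row_id)
        return self.insert(new_row)

    def tuple_mover(self) -> None:
        """When the RS is big enough, convert it into a new CS row group."""
        if len(self.row_store) >= self.threshold:
            self.column_store.append(dict(self.row_store))
            self.row_store.clear()

    def scan(self) -> List[str]:
        """Visible rows: CS rows not in the deleted bitmap, plus everything in the RS."""
        compressed = [row for group in self.column_store
                      for row_id, row in group.items() if row_id not in self.deleted_bitmap]
        return compressed + list(self.row_store.values())


p = CciPartition(rowgroup_threshold=3)
ids = p.bulk_insert(["a", "b"])  # one CS row group
p.update(ids[0], "a2")           # marks "a" in the deleted bitmap, puts "a2" in the RS
p.insert("c")
p.insert("d")
p.tuple_mover()                  # RS holds 3 rows: it becomes a second CS row group
print(len(p.column_store), p.deleted_bitmap, sorted(p.scan()))  # 2 {1} ['a2', 'b', 'c', 'd']
```

Updates never rewrite compressed data in place. This is why the slide positions the index for read-mostly warehouses rather than OLTP.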

Partitioning works on clustered columnstores
• Just like any other table
• The motivation is manageability more than performance

Batch Mode Improvements
• SQL Server 2012: several engine limitations can cause queries to run in row mode instead of batch mode
• SQL Server 14:
  • Support for all flavors of JOIN: OUTER JOIN; semi-join: IN, NOT IN
  • UNION ALL
  • Scalar aggregates
  • Mixed-mode plans
  • Improvements in bitmaps, spill support, …

Links
• SQL Server 2014 Hastens Transaction Processing: http://www.cio.com/article/734462/SQL_Server_2014_Hastens_Transaction_Processing
• Hekaton Breaks Through: http://research.microsoft.com/en-us/news/features/hekaton-122012.aspx
• Hekaton: SQL Server’s Memory-Optimized OLTP Engine: http://research.microsoft.com/apps/pubs/default.aspx?id=193594


© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.


GUSS.fr

