How SQL Server Indexes Work Sharon F. Dooley [email protected].

How SQL Server Indexes Work

Sharon F. [email protected]

SQL Server Indexes• SQL Server indexes are based on B-trees

– Special records called nodes that allow keyed access to data– Two kinds of nodes are special

• Root• Leaf

Intermediate node

Leafnode

Datapages

Root node A O

O T

T W

E IGCA T

A C E G I K M N O Q

A I

SQL Server B-Tree Rules• Root and intermediate nodes point only to other nodes• Only leaf nodes point to data• The number of nodes between the root and any leaf is the

same for all leaves• A node always contains between K and K/2 branches,

where K is the branching factor– Branching factor is the number of keys in the node

• B-trees are always sorted• The tree will be maintained during insertion, deletion, and

updating so that these rules are met– When records are inserted or updated, nodes may split

– When records are deleted, nodes may be collapsed

What Is a Node?• A page that contains key and pointer pairs

Key Pointer

Key Pointer

Key Pointer

Key Pointer

Key Pointer

Key Pointer

Key Pointer

Key Pointer

Splitting a B-Tree Node

Root (Level 0)

Node (Level 1)

Leaf (Level 2)

Abby Bob Carol Dave

Abby Ada Andy Ann

Ada Alan Amanda Amy

Bob Alan Amanda Carol Amy Dave Ada DB

Let’s Add Alice• Step 1: Split the leaf node

Bob Alan Amanda Carol Amy Dave Ada Alice

Ada Alan Alice Amanda Amy

Adding Alice• Step 2: Split the next level up

DB

Leaf

Abby Ada Amanda Andy Ann


Ada Alan Alice Amanda Amy

Adding Alice(continued)

• Split the root

DB

LeafAda Alan Alice


Amanda Amy

Andy Ann

Carol DaveAbby Andy Bob

Abby Ada Amanda

Adding Alice(continued)

• When the root splits, the tree grows another level

Root (Level 0)

Node(Level 1)

Node(Level 2)

Leaf(Level 3)

DB

Abby Carol

Amanda Amy


Ada Alan Alice

Abby Andy Bob

Abby Ada Amanda

Carol Dave

Andy Ann

Page splits cause fragmentation• Two types of fragmentation

– Data pages in a clustered table– Index pages in all indexes

• Fragmentation happens because these pages must be kept in order

• Data page fragmentation happens when a new record must be added to a page that is full

– Consider an Employee table with a clustered index on LastName, FirstName

– A new employee, Peter Dent, is hired

ExtentAdams, CarolAlly, KentBaccus, Mary

David, SueDulles, KellyEdom, Mike

Farly, LeeFrank, Joe

Ollen, CarolOppus, Larry

...

Data Page Fragmentation

Extent

ExtentDulles, KellyEdom, Mike ...

Adams, CarolAlly, KentBaccus, Mary

David, SueDent, Peter

Farly, LeeFrank, Joe

Ollen, CarolOppus, Larry

...

Index Fragmentation• Index page fragmentation occurs when a new key-pointer pair must be

added to an index page that is full– Consider an Employee table with a nonclustered index on Social

Security Number

• Employee 048-12-9875 is added

036-11-9987, pointer036-33-9874, pointer038-87-8373, pointer


116-11-9987, pointer116-33-9874, pointer

...124-11-9987, pointer124-33-9874, pointer125-87-8373, pointer

Extent

Index Fragmentation (continued)

Extent

Extent


046-11-9987, pointer048-12-9875, pointer 116-11-9987, pointer

116-33-9874, pointer

...124-11-9987, pointer124-33-9874, pointer125-87-8373, pointer

048-33-9874, pointer052-87-8373, pointer

...

Studying Fragmentation in SQL Server 2000• To determine if there is fragmentation

– In a clustered table or a nonclustered indexDBCC SHOWCONTIG [([table_id | table_name | view_id |

view_name [, index_id | index_name])]

DBCC SHOWCONTIG scanning 'Employees' table...Table: 'Employees' (1977058079); index ID: 1, database ID: 7TABLE level scan performed.- Pages Scanned................................: 707- Extents Scanned..............................: 90- Extent Switches..............................: 572- Avg. Pages per Extent........................: 7.9- Scan Density [Best Count:Actual Count].......: 15.53% [89:573]- Logical Scan Fragmentation ..................: 39.18%- Extent Scan Fragmentation ...................: 58.89%- Avg. Bytes Free per Page.....................: 4338.9- Avg. Page Density (full).....................: 46.39%DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Studying Fragmentation in SQL Server 2000(continued)

• Unless the table or index spans multiple files– Extent Switches and Extents Scanned should be approximately

equal• Scan Density should be close to 100 percent• Avg. Page Density should be high and Avg. Bytes Free Per Page

should be low• Logical Scan Fragmentation and Extent Scan Fragmentation should be

as close to 0 as possible• Clearly the Employees table is terribly fragmented!

Studying Fragmentation in SQL Server 2005 and 2008

SELECT object_name(s.object_id), name, avg_fragmentation_in_percentFROM sys.dm_db_index_physical_stats

(db_id('databasename'), object_id('tablename'), {indexid | NULL}, {partitionnumber | NULL}, {'LIMITED' | 'SAMPLED' | 'DETAILED' | NULL}) as

sINNER JOIN sys.indexes as iON s.object_id = i.object_idAND s.index_id = i.index_id

• If NULL is supplied for the last argument, LIMITED is assumed• The avg_fragmentation_in_percent should be as close to 0 as possible

Studying Fragmentation in SQL Server 2005 and 2008 (continued)

SELECT object_name(s.object_id), name, avg_fragmentation_in_percent

FROM sys.dm_db_index_physical_stats (db_id('bigwind'), object_id('Employees'), null, null, null) as s

INNER JOIN sys.indexes as iON s.object_id = i.object_idAND s.index_id = i.index_idWHERE alloc_unit_type_desc = 'IN_ROW_DATA'

• Results

Employees LastName_IDx 0.685

Employees PK_Employees 3.0303

Employees City_IDX 0

Employees Region_IDX 3.922

Repairing Fragmentation• Repair index fragmentation by rebuilding index

• Rebuilding clustered index repairs table fragmentation

• DBCC DBREINDEXDBCC DBREINDEX (tablename [, indexname [, fillfactor]])– Can reorganize indexes that implement primary key and unique constraints

• CREATE INDEX

– DROP_EXISTING causes SQL Server to create and drop the index in a single step

• Faster than dropping with the DROP INDEX command and then re-creating

• ALTER TABLE … ADD CONSTRAINT … PRIMARY KEY or UNIQUE

Repairing Fragmentation(continued)

• DBCC INDEXDEFRAG does not– Lock the index– Do as thorough job as the other methods– Allow specification of a fill factor

• Uses the fill factor from the last CREATE INDEX for this indexDBCC INDEXDEFRAG ( { database_name | database_id | 0 } , { table_name | table_id} , { index_name | index_id } )

• ALTER INDEX index_name ON table_name REORGANIZE– Same as DBCC INDEXDEFRAG

• ALTER INDEX index_name ON table_name REBUILD– Allows concurrent access if you add WITH (ONLINE = ON) to the ALTER

INDEX command– Uses the version store in tempdb– Same as DBCC DBREINDEX

2005 2008

SQL Server Indexes• SQL Server indexes come in two flavors

– Clustered indexes• Database rows are in order on the index key• The data pages are the leaf nodes of the index

– Nonclustered indexes• Leaf level is in index order but the data is not• Leaf nodes contain pointers to rows

• One clustered index per table– Choose wisely– Should always have a clustered index

• Allows reorganization of the data pages

• 249 nonclustered indexes per table

Clustered Index

Database and leaf node

Root Abby Bob Carol Dave

Abby Ada Andy Ann

Ada Alan Amanda Amy

Nonclustered Index

Database


Amy Ada Amanda Alan

Leaf node

Abby Ada Andy Ann

Ada Alan Amanda Amy

Clustered and Nonclustered Indexes Interact

• Clustered indexes are always unique– If you don’t specify unique when creating them, SQL Server may add a

“uniqueifier” to the index key• Only used when there actually is a duplicate• Adds 4 bytes to the key

• The clustering key is used in nonclustered indexes– This allows SQL Server to go directly to the record from the nonclustered

index– If there is no clustered index, a record identifier will be used instead

1 Jones John

2 Smith Mary

3 Adams Mark

4 Douglas Susan

Adams 3

Douglas 4

Jones 1

Smith 2

Leaf node of a clustered index on EmployeeID

Leaf node of a nonclustered index on LastName

Clustered and Nonclustered Indexes Interact(continued)

• Another reason to keep the clustering key small!• Consider the following query:

SELECT LastName, FirstNameFROM EmployeeWHERE LastName = 'Douglas'

• When SQL Server uses the nonclustered index, it– Traverses the nonclustered index until it finds the

desired key– Picks up the associated clustering key– Traverses the clustered index to find the data

Heaps and Chains• When you place a clustered index on a table, the pages are chained together in

a doubly linked list

• SQL Server can follow the pointers to move from page to page• When there is no clustered index, the table is called a heap

– Data is located• Through nonclustered indexes• By scanning all the pages in the table

Indexes and Inserts• When there is no clustered index on a table, SQL Server uses the Page Free

Space page to find a page with space for the new record• Inserts into tables with a clustered index can cause page splits or hotspots• When a particular part of the database is particularly popular, it is called a

hotspot• Hotspots create contention problems• Clustered indexes on keys that arrive in random order cause page splits• Clustered indexes on keys that arrive in index order create hotspots

– All inserts are again at the end of the table• Identities• Dates

– However, there will be no page splits or collapses

Indexes and Updates• When data is modified, indexes may have to be modified as well• Changing data in a table with no indexes

– Data will be changed in place unless the update means that the row will no longer fit on the page

– If the row won’t fit, it will simply be moved to a new page• Changing a clustering key column

– The row will be deleted from its original location– It will be inserted into the new location– All nonclustered indexes must be maintained– Exception:

• A change that doesn’t affect the index order

• Changing “Thompson” to “Thompsen” will be done in place

• Avoid clustered indexes on columns that frequently change

Thompson TyneThomas

Indexes and Updates(continued)

• Changing any non-key column in a table with only nonclustered indexes– Data will be changed in place unless the row will no longer fit– If the row will no longer fit

• It will be moved to another page• A forwarding pointer will be left on the original page that points to the

new location• Index pointers don’t need to be updated• An additional I/O will now be required• There will never be more than one forwarding pointer, no matter how

many times a row moves• To see whether updates have produced forwarding pointers, use

DBCC SHOWCONTIG('tablename') WITH TABLERESULTS

SELECT forwarded_record_count FROM sys.dm_db_index_physical_stats (db_id('databasename'), object_id('tablename'), NULL, NULL, NULL)

2000

2005 2008

Indexes and Deletes• Deleting from a heap

– Row is physically deleted– Remaining rows on page below the deleted record are not moved up at the

time the delete happens– Page will be compressed when space is needed for another row on the

page• Deleting from a leaf node

– Data pages of clustered index– Leaf nodes of nonclustered index– Records may not be physically deleted at the time the delete is issued– Records may be marked as ghost records

• Used by lock manager• Not retrievable by users

– Special SQL Server process cleans up the ghost records• Won’t clean up records that are part of an active transaction• Doesn’t compress page

Indexes and Deletes(continued)

• Deleting from a nonleaf node– No ghost records– Page is not compressed

• When rows are deleted, both nonclustered and clustered indexes must be maintained

• When the last row is deleted from a page (index or data), the page is deallocated and returned to the free space pool– Unless it is the only page in the table– A table always has at least one page, even if it is empty

What Are Index Statistics?• Metric used by the optimizer in determining

whether or not an index is useful for a particular query

• Stored in– An image column named statblob in the sysindexes table

– An internal and invisible table• Essentially a histogram• Statistics are kept for 200 steps

2000

2005 2008

Selectivity• The statistics allow the optimizer to determine the selectivity of an index

– A unique, single-column index always has a selectivity of 1• One index entry points to exactly one row

• Another term for this is density– Density is the inverse of selectivity

• Density values range from 0 to 1• A selective index has a density of 0.10 or less• A unique, single-column index always has a density of 0.0

• When the index is composite, it becomes a little more complicated– SQL Server maintains detailed statistics only on the leftmost column– It does compute density for each column

• Assume there is an index on (col1, col2, col3)• Density is computed for

– Col1– Col1 + Col2– Col1 + Col2 + Col3

Exploring Statistics• To see the index statistics, use

– DBCC SHOW_STATISTICS ('tablename', {'indexname' | 'statisticsname'})

– DBCC SHOW_STATISTICS ('Employees', 'EmployeeName_Idx')• Interpreting the step output

RANGE_HI_KEY Upper-bound value of a histogram step

RANGE_ROWS Number of rows from the sample that fall within a histogram step, excluding the upper bound

EQ_ROWS Number of rows from the sample that are equal in value to the upper bound of the histogram step

DISTINCT_RANGE_ROWS Number of distinct values within a histogram step, excluding the upper bound

AVG_RANGE_ROWS Average number of duplicate values within a histogram step, excluding the upper bound

Exploring Index Statistics(continued)

Statistics for INDEX 'EmployeeName_IDX'.

Updated Rows Rows Sampled Steps Density Avg key length -------------------- ------- ------------- ------ ------------ ------------

Jan 27 2002 7:00PM 10009 10009 200 1.3958309E-4 28.475372

All density Average Length Columns------------ -------------- -------- 1.4271443E-4 13.252672 LastName9.9940036E-5 24.475372 LastName, FirstName9.9910081E-5 28.475372 LastName, FirstName, EmployeeID RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS ------------ ---------- ------- -------------------- -------------- Aaby 0.0 2.0 0 0.0Abrahamson 59.0 2.0 43 1.3409091. . .Zuran 13.0 1.0 10 1.3Zvonek 0.0 1.0 0 0.0

Distribution steps

The report for SQL Server 2005 and later is slightly different at the

beginning

Statistics Maintenance• By default, SQL Server will automatically maintain the statistics• Index statistics are computed (or recomputed) when the index is

created or rebuilt• SQL Server keeps track of the updates to a table

– Each INSERT, UPDATE, or DELETE statement updates a counter in sysindexes named rowmodctr

• Note that TRUNCATE TABLE does not modify this counter

– Whenever the statistics are recomputed, the counter is set back to zero

• When you issue a query, the optimizer checks rowmodctr to see whether the statistics are up to date– If they are not, the statistics will be updated

Statistics Maintenance(continued)

• Note that this may not always happen at the best time in a production system– Can turn off automatic update– Can manually update– In SQL Server 2005 and later, can set the AUTO_UPDATE_STATISTICS_ASYNC

database option

• Example:– Assume that a table has 1,000 rows– The threshold would be 500 + (.20 * 1000)– You would expect to see the statistics automatically updated

after about 700 modifications

Table type Empty condition

Threshold when empty Threshold when not empty

Permanent < 500 rows Number of changes >= 500 Number of changes >= 500 + (Number of rows * 20%)

Temporary < 6 rows Number of changes >= 6 Number of changes >= 500

Estimating Page Accesses• No index

– Number of data pages in the table• Equality query using a unique index

– Nonclustered index• Number of index levels + 1 (if there is no clustered index)• Number of nonclustered index levels + the number of levels in the clustered

index– Clustered index

• Number of index levels

Estimating Page Accesses for a Clustered Index

• Number of levels + number of data pages– Number of data pages = number of qualifying rows /

rows per page

Database and leaf node


Abby Ada Andy Ann

Ada Alan Amanda Amy

Estimating Page Accesses for a Nonclustered Index

• Number of levels + number of qualifying leaf pages + number of rows ornumber of levels + number of qualifying leaf pages + (number of rows * number of clustered index levels)– Number of qualifying leaf pages

= number of qualifying rows / rows per page

– Assumes every row is on a different page

Database

RootAbby Bob Carol Dave

Amy Ada Amanda Alan

Leaf node

Abby Ada Andy Ann

Ada Alan Amanda Amy

Covering Indexes• When a nonclustered index includes all the data requested in a query (both the items

in the SELECT list and the WHERE clause), it is called a covering index• With a covering index, there is no need to access the actual data pages

– Only the leaf nodes of the nonclustered index are accessed

• Because the leaf node of a clustered index is the data itself, a clustered index covers all queries

Leaf node of a nonclustered index on LastName, FirstName, Birthdate

Adams Mark 1/14/1956 3

Douglas Susan 12/12/1947 4

Jones John 4/15/1967 1

Smith Mary 7/14/1970 2

The last column is EmployeeID.

Remember that the clustering key is always included in a nonclustered

index.

Covering Indexes(continued)

• This index covers the following queries:SELECT EmployeeID, LastName, FirstName, BirthdateFROM EmployeesWHERE Birthdate >= '1/12/1941'SELECT LastName FROM Employees WHERE EmployeeID = 7SELECT LastName, FirstNameFROM EmployeesWHERE LastName BETWEEN 'A' AND 'C'

• Remember that the number of accesses for a nonclustered index is the number of levels + the number of qualifying leaf pages + either the number of rows or (the number of rows * the number of levels in the clustered index)– A covering index eliminates the “number of rows” term from the equation– The optimizer is highly likely to use a nonclustered index because of this

Non-Key Index Columns• SQL Server 2005 and later allow you to include columns in a non-clustered

index that are not part of the key– Allows the index to cover more queries– Included columns only appear in the leaf level of the index– Up to 1,023 additional columns– Can include data types that cannot be key columns

• Except text, ntext, and image data types• Syntax

CREATE [ UNIQUE ] NONCLUSTERED INDEX index_name ON <object> ( column [ ASC | DESC ] [ ,...n ] ) [ INCLUDE ( column_name [ ,...n ] ) ]

• ExampleCREATE NONCLUSTERED INDEX NameRegion_IDXON Employees(LastName)INCLUDE (Region)

How SQL Server Indexes Work Sharon F. Dooley [email protected].

Documents

pointer extent slide

pointer pairs key pointer

id index

index id

indexes fragmentation

extent scan fragmentation

studying fragmentation

node level