Dynamic hashing is good for a database that grows and shrinks in size: it allows the hash function to be modified dynamically. Extendable hashing – one form of dynamic hashing.
This hashing scheme takes advantage of the fact that the result of applying a hash function is a non-negative integer, which can be represented as a binary number, i.e., a string of bits.
A directory, i.e., an array of 2^d bucket addresses, is maintained, where d is called the global depth of the directory.
A local depth d', stored with each bucket, specifies the number of hash-value bits on which that bucket's contents are based.
The value of d grows and shrinks as the size of the database grows and shrinks. Thus, the actual number of buckets is at most 2^d (several directory entries may point to the same bucket).
The number of buckets changes dynamically due to coalescing and splitting of buckets.
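The directory, global depth d, local depth d', and bucket splitting described above can be sketched as follows. This is a minimal illustration, not a production structure: the bucket capacity, class names, and the use of Python's built-in `hash` (taking the low-order d bits) are all assumptions made for the sketch, and coalescing on deletion is omitted.

```python
class Bucket:
    def __init__(self, local_depth, capacity=2):
        self.local_depth = local_depth   # d': bits this bucket's contents share
        self.capacity = capacity
        self.items = {}

class ExtendableHash:
    def __init__(self):
        self.global_depth = 0            # d: number of hash bits currently used
        self.directory = [Bucket(0)]     # 2^d bucket addresses

    def _index(self, key):
        # Use the low-order global_depth bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        bucket = self.directory[self._index(key)]
        if key in bucket.items or len(bucket.items) < bucket.capacity:
            bucket.items[key] = value
            return
        # Bucket is full: split it, doubling the directory only if d' == d.
        if bucket.local_depth == self.global_depth:
            self.directory += self.directory   # paired entries share buckets
            self.global_depth += 1
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth, bucket.capacity)
        # Directory entries whose newly significant bit is 1 get the new bucket.
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and (i & high_bit):
                self.directory[i] = new_bucket
        # Redistribute the old bucket's items between the two buckets.
        old_items = bucket.items
        bucket.items = {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v
        self.put(key, value)               # retry; may trigger another split

h = ExtendableHash()
for i in range(20):
    h.put(f"key{i}", i)
```

Note how a split doubles the directory only when the overflowing bucket's local depth already equals the global depth; otherwise the existing directory entries are simply redirected, which is what keeps the number of buckets below the number of directory entries.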
Benefits of extendable hashing: Hash performance does not degrade with growth of file
Minimal space overhead
Disadvantages of extendable hashing Extra level of indirection to find desired record
Bucket address table may itself become very big (larger than memory); a tree structure is then needed to locate the desired record in the structure!
Changing size of bucket address table is an expensive operation
Linear hashing is an alternative mechanism which avoids these disadvantages at the possible cost of more bucket overflows; that is, no directory is needed.
If the primary index does not fit in memory, access becomes expensive.
To reduce the number of disk accesses to index records, treat the primary index kept on disk as a sequential file and construct a sparse index on it. outer index – a sparse index of the primary index
inner index – the primary index file
If even outer index is too large to fit in main memory, yet another level of index can be created, and so on.
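A two-level lookup of this kind can be sketched as below. The in-memory representation is an assumption made for illustration: the inner (primary) index is modeled as a list of blocks of (search_key, data_block_no) entries, and the outer index holds one (first_key, inner_block_no) entry per inner-index block; function names are likewise illustrative.

```python
import bisect

def last_leq(entries, key):
    """Index of the last entry whose key is <= key (entries sorted by key)."""
    keys = [k for k, _ in entries]
    return max(bisect.bisect_right(keys, key) - 1, 0)

def lookup(outer, inner_blocks, data_blocks, key):
    # Outer index (in memory) narrows the search to one inner-index block.
    inner_no = outer[last_leq(outer, key)][1]
    inner = inner_blocks[inner_no]          # one disk read for the index
    data_no = inner[last_leq(inner, key)][1]
    block = data_blocks[data_no]            # one disk read for the record
    return next((r for r in block if r[0] == key), None)

# Toy file of sorted data blocks, sparse inner index, sparse outer index.
data_blocks = [[(1, 'a'), (3, 'b')], [(5, 'c'), (7, 'd')], [(9, 'e'), (11, 'f')]]
inner_blocks = [[(1, 0), (5, 1)], [(9, 2)]]
outer = [(1, 0), (9, 1)]
```

With the outer index resident in memory, a lookup touches one inner-index block and one data block, i.e., two disk reads instead of a scan of the whole primary index.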
Indices at all levels must be updated on insertion or deletion from the file.
Single-level index insertion: perform a lookup using the search-key value appearing in the record to be inserted.
Dense indices – if the search-key value does not appear in the index, insert it.
Sparse indices – if index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created. In this case, the first search-key value appearing in the new block is inserted into the index.
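The two insertion cases above can be sketched as follows, assuming an index is kept as a sorted list of (search_key, pointer) tuples; the function names and the `new_block_created` flag are illustrative.

```python
import bisect

def dense_insert(index, key, ptr):
    # Dense index: insert the search key only if it does not already appear.
    i = bisect.bisect_left(index, (key,))
    if i == len(index) or index[i][0] != key:
        index.insert(i, (key, ptr))

def sparse_insert(index, first_key, block_ptr, new_block_created):
    # Sparse index: unchanged unless the insertion created a new block,
    # in which case the new block's first search-key value is indexed.
    if new_block_created:
        bisect.insort(index, (first_key, block_ptr))
```

Note the asymmetry: a dense index may change on every insertion, while a sparse index changes only when the file gains a block.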
Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms
If the deleted record was the only record in the file with its particular search-key value, the search key is deleted from the index also.
Single-level index deletion: Dense indices – deletion of the search key is similar to file record deletion.
Sparse indices – if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.
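The sparse-index deletion rule above can be sketched as below. The representation is an assumption for illustration: the index is a sorted list of (search_key, block_ptr) tuples, and `file_records` is the sorted file *after* the record has been removed.

```python
import bisect

def sparse_delete(index, file_records, deleted_key):
    i = bisect.bisect_left(index, (deleted_key,))
    if i == len(index) or index[i][0] != deleted_key:
        return                              # no index entry for this key
    # Next search-key value in the file, in search-key order.
    file_keys = [r[0] for r in file_records]
    j = bisect.bisect_right(file_keys, deleted_key)
    next_key = file_records[j][0] if j < len(file_records) else None
    indexed_keys = {k for k, _ in index}
    if next_key is None or next_key in indexed_keys:
        del index[i]                        # next key already has an entry
    else:
        index[i] = (next_key, index[i][1])  # replace with next search key
```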
Frequently, one wants to find all the records whose values in a certain field (which is not the search key of the primary index) satisfy some condition. Example 1: in the account database stored sequentially by account number, we may want to find all accounts in a particular branch.
Example 2: as above, but where we want to find all accounts with a specified balance or range of balances
We can have a secondary index with an index record for each search-key value; index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
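A secondary index of this bucketed form can be sketched as follows, using Example 1's branch field as the search key. The class name, the in-memory bucket representation, and the sample account data are all illustrative.

```python
from collections import defaultdict

class SecondaryIndex:
    def __init__(self):
        # One bucket per search-key value; each bucket holds pointers
        # to all records with that value.
        self.buckets = defaultdict(list)

    def add(self, key, record_ptr):
        self.buckets[key].append(record_ptr)

    def find_all(self, key):
        # Follow the index entry to its bucket, then each pointer to a record.
        return list(self.buckets[key])

# File stored sequentially by account number; branch is the secondary key.
accounts = [("A-101", "Downtown", 500),
            ("A-215", "Mianus", 700),
            ("A-102", "Downtown", 400)]
by_branch = SecondaryIndex()
for ptr, (_, branch, _) in enumerate(accounts):
    by_branch.add(branch, ptr)
```

The extra bucket level is what lets a secondary index cope with a search key that is not unique: one index entry fans out to every matching record, at the cost of an additional level of indirection per lookup.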