Index Sen Zhang
Dec 20, 2015
Index
Sen Zhang
INDEX
• When a table contains a lot of records, it can take a long time for the search engine of oracle (or other RDBMS) to look through the table to locate specific records.
Index Structures
• Secondary access structure used to speed up the retrieval of records in response to certain search conditions.
• Indexes are useful because they help you to locate specific target within a large amount of data without having to look through every object.
A data table
First name Last name Resid state
Year of born
Year of entering
John Edward VM 1989 2001
Kathy Alex NY 1987 2002
Joseph bush NJ 1983 2000
George Clinton CA 1983 1999
Alex Jordan VA 1991 2000
Bill Herbert AZ 1984 1999
James Perl SC 1986 1998
Narian Geller NC 1992 1999
Frank Thomason FL 1994 2005
Index table vs. data table
First name Last name Resid state
Year of born
Year of entering
John Edward VM 1989 2001
Kathy Alex NY 1987 2002
Joseph bush NJ 1983 2000
George Clinton CA 1983 1999
Alex Jordan VA 1991 2000
Bill Herbert AZ 1984 1999
James Perl SC 1986 1998
Narian Geller NC 1992 1999
Frank Thomason FL 1994 2005
First Index pointer
Alex 5
Bill 6
Frank 9
George 4
James 7
John 1
joseph 3
kathy 2
Narian 8
• Logically speaking at a conceptual level, the index is simply the row number.
• But physically, the index are pointers to the precise position on external storage , i.e. disks or memory (when loaded into memory.)
• They are pointers to the offsets of records in the physical file.
• The file system actually will map these offsets to a specific offsets in a specific sector in a specific track on a specific disk.
Which attributes should be indexed?
• A table could be associated with multiple indexes.
• Which attribute(s) should be indexed? The choice depends on user requirement, what do you want from those indexes, and depend on your applications, depend upon each individual database designer or database programmer’s understanding toward the application.
How index will work in a dynamic environment.
• Once you created an index on a table, oracle automatically keeps the index synchronized with that table.
• That means, when you insert a new record to the data table, oracle will insert a pointer to the index table at the right position at the same time. So every insertion will take a little bit more time.
• Similarly, when delete, update are involved, the index tables will also be involved, thus take a little bit more time in this sense, but it is tricky.
• As a tradeoff, index will expedite (speed up) other data manipulation operations such as delete, update and select.
• Insertion to tables with indexes will take longer time!
• So, it is not always a good idea to enforce indexes to tables, especially for OLTP.
OLTP
• OLTP (On-Line Transaction Processing) – supports a business? day-to-day activities – high insertion rate, simple queries – MB or GB
OLAP
• OLAP (On-Line Analytical Processing) system or Data Warehouse – analyzes operational data – low update rate, complex queries – GB or TB – a strategic business decision: high rewards,
but low chance of success
DDL statement to create index
• Create index index_name on table_name(column_name_list);
• An index is an auxiliary way to organize your data file based on the characteristics of values contained in your data file.
Why index works
• Proper data structure• Data manipulation related operations such
as inserting, deleting, updating, and searching algorithm can achieve better time complexities.
• For example, searching in an unsorted list could be O(n); but searching in a sorted list could be O(lgn), binary search for example.
Binary search tree
• A binary search tree is a binary tree where every node has a value, every node's left subtree has values less than the node's value, and every right subtree has values greater. A new node is added as a leaf.
Are these BSTs?50
7525
12 45 66 90
50
7525
12 55 73 90
1 a BST?
2 a BST?
• Note that the worst case of this build_binary_tree routine is O(n2) - if you feed it a sorted list of values, it chains them into a linked list with no left subtrees.
• For example, build_binary_tree([1, 2, 3, 4, 5]) yields the tree (None, 1, (None, 2, (None, 3, (None, 4, (None, 5, None))))).
• There are a variety of balanced schemes for overcoming this flaw with simple binary trees.
• Because databases cannot typically be maintained entirely in memory (512M main memory is good ), b-trees or b* trees are often used to index the data and to provide fast access.
• Theoretically speaking, searching an unindexed and unsorted database containing n key values will have a worst case running time of O(n); if the same data is indexed with a b-tree, the same search operation will run in O(log n).
• For example, to perform a search for a single key on a set of one million keys (1,000,000), a linear search will require at most 1,000,000 comparisons at the worst case. If the same data is indexed with a b-tree of minimum degree 10, 114 comparisons will be required in the worst case.
B*-tree
• It is a B-tree in which nodes are kept 2/3 full by redistributing keys to fill two child nodes, then splitting them into three nodes.
B-tree
Definition: A balanced search tree in which every node has between m/2 and m children, where m>1 is a fixed integer. m is the order. The root may have as few as 2 children. This is a good structure if much of the tree is in slow memory (disk), since the height, and hence the number of accesses, can be kept small, say one or two, by picking a large m.
• Also known as balanced multiway tree. • Generalization :balanced tree, search tree.
• Clearly, indexing large amounts of data can significantly improve search performance. Although other balanced tree structures can be used, a b-tree also optimizes costly disk accesses that are of concern when dealing with large data sets.
view