Designing Concurrent Search Structure Algorithms Dennis Shasha

Dec 21, 2015

Designing Concurrent Search Structure Algorithms

Dennis Shasha

What is a Search Structure?

• A data structure (typically a B-tree, hash structure, R-tree, etc.) that implements a dictionary.

• Operations: insert a key-value pair, delete a key-value pair, and search for a key-value pair.

How to make a search structure algorithm concurrent

• Naïve approach: use two-phase locking (but then every operation holds at least a read lock on the root until it finishes, so lock conflicts are frequent).

• Semi-naïve algorithm: use hierarchical tree locking: lock the root; afterwards, lock node n only if you hold a lock on the parent of n. (This still tends to hold locks high in the tree.)
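
A minimal Python sketch of the semi-naïve tree-locking walk (often called lock coupling), assuming a plain binary search tree with one threading.Lock per node. The Node class and search function are hypothetical names for illustration. A search holds at most two locks at once, yet every search still starts by locking the root.

```python
import threading
from typing import Optional


class Node:
    """Binary search tree node guarded by its own lock (hypothetical layout)."""

    def __init__(self, key: int, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None):
        self.key = key
        self.left = left
        self.right = right
        self.lock = threading.Lock()


def search(root: Node, key: int) -> bool:
    """Tree-locking search: lock a node only while holding its parent's lock."""
    node = root
    node.lock.acquire()              # every search starts by locking the root
    try:
        while True:
            if key == node.key:
                return True
            child = node.left if key < node.key else node.right
            if child is None:
                return False
            child.lock.acquire()     # legal: we hold the parent's lock
            node.lock.release()      # the parent can be released early
            node = child
    finally:
        node.lock.release()          # release the one lock still held


# Example tree: 50 with children 10 and 70; 35 is the right child of 10.
root = Node(50, Node(10, right=Node(35)), Node(70))
print(search(root, 35), search(root, 36))  # True False
```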

How can we do better: fundamental insight

• In a search structure algorithm, all that we really care about is that we implement the dictionary operations correctly.

• Operations on the structure need not even be serializable, provided they maintain certain constraints.

Train Your Intuition: Parable of the Library

• Imagine a library with books.

• It’s a little old-fashioned, so there are still card catalogues that identify the shelf where each book is held.

• Bob wants to get a book B.

• Alice is working on reorganizing the library by moving books from shelf to shelf and then changing the card catalogue.

Parable of the library: interleaving of ops

• Bob 1: Look up book B in the catalogue.

• Bob 2: Read “go to shelf S”.

• Bob 3: Start walking, but stop to talk with a friend.

• Alice 1: Move several books, including B, from S to S’, leaving a note on S that points to S’.

• Alice 2: Change the catalogue so that B maps to S’.

• Bob 4: Go to S and follow the note to S’.
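
To make the interleaving concrete, here is a small sequential Python model of the parable. The catalogue and shelf dictionaries, the extra book "C", and the bob_find helper are illustrative assumptions, not part of the original. Alice's two steps run while Bob is delayed between his steps 3 and 4.

```python
from typing import Dict, Optional

# Hypothetical model: the catalogue maps a book to a shelf; each shelf
# holds a set of books and an optional note naming another shelf.
catalogue: Dict[str, str] = {"B": "S"}
shelves = {
    "S": {"books": {"B", "C"}, "note": None},
    "S'": {"books": set(), "note": None},
}


def bob_find(shelf: Optional[str], book: str) -> Optional[str]:
    """Walk from shelf to shelf, following notes, until the book turns up."""
    while shelf is not None:
        if book in shelves[shelf]["books"]:
            return shelf
        shelf = shelves[shelf]["note"]
    return None


# Bob 1-2: look up B and read "go to shelf S".
bob_shelf = catalogue["B"]
# Bob 3: Bob is delayed; Alice runs both of her steps.
# Alice 1: move the books from S to S' and leave a note on S.
shelves["S'"]["books"] |= shelves["S"]["books"]
shelves["S"]["books"] = set()
shelves["S"]["note"] = "S'"
# Alice 2: change the catalogue so B maps to S'.
catalogue["B"] = "S'"
# Bob 4: go to the shelf he read earlier and follow the note.
print(bob_find(bob_shelf, "B"))  # S'
```

Bob acts on a stale catalogue entry, yet the note is enough for him to reach the book.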

Parable of the library: observations

• Not conflict-preserving serializable: the catalog conflict orders Bob before Alice (Bob reads the catalog, then Alice changes it), while the shelf conflict orders Alice before Bob (Alice modifies S before Bob reads it).

• Indeed in no serial execution would Bob go to two shelves.

• Yet execution is completely ok!
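
One way to check the first observation is to build the conflict (precedence) graph of the interleaving: an edge T1 → T2 for each pair of conflicting operations in which T1's operation comes first. A cycle means the execution is not conflict-serializable. Encoding each step as a read or write of Cat, S, or S’ is an assumption made for this sketch.

```python
from itertools import combinations

# The interleaving from the parable as (transaction, operation, item).
schedule = [
    ("Bob", "r", "Cat"),    # Bob 1-2: read the catalogue
    ("Alice", "w", "S"),    # Alice 1: remove books from S, leave a note
    ("Alice", "w", "S'"),   # Alice 1: place books on S'
    ("Alice", "w", "Cat"),  # Alice 2: update the catalogue
    ("Bob", "r", "S"),      # Bob 4: read shelf S and its note
    ("Bob", "r", "S'"),     # Bob 4: read shelf S'
]


def precedence_edges(sched):
    """Edge T1 -> T2 when an op of T1 precedes a conflicting op of T2."""
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(sched, 2):
        if t1 != t2 and x1 == x2 and "w" in (op1, op2):
            edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Detect a cycle with depth-first search over the precedence graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()

    def dfs(u):
        visiting.add(u)
        for v in graph.get(u, ()):
            if v in visiting or (v not in done and dfs(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False

    return any(dfs(u) for u in list(graph) if u not in done)


edges = precedence_edges(schedule)
print(sorted(edges))     # [('Alice', 'Bob'), ('Bob', 'Alice')]
print(has_cycle(edges))  # True: not conflict-serializable
```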

Parable of the library: what’s going on?

• All we care about is that (1) the structure is correct after Alice finishes, and (2) Bob gets his book if it is there.

• We want to find a general theory for this.

• Ref: Vossen and Weikum book and “Concurrent Search Structure Algorithms,” D. Shasha and N. Goodman, ACM Transactions on Database Systems, vol. 13, no. 1, pp. 53-90, March 1988.

Good Structure for any Dictionary Data Structure

• A dictionary holds a set of key-value pairs. Values don’t matter for our theory, so consider just the set of keys that could be present, called the keyspace. Example: all natural numbers.

• From the root (in general, from any root), a search must be able to navigate to a node n such that either n holds the key being sought or no node holds that key.

Example: binary search tree

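Figure: a binary search tree with root 50, children 10 (left) and 70 (right), and 35 as the right child of 10.

• Node 50: Inset = Keyspace

• Node 70: Inset = {x | x > 50}

• Node 10: Inset = {x | x < 50}

• Node 35: Inset = {x | x < 50 and x > 10}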

Inset, Outset, Keyset

• Inset(n) is the set of keys in the keyspace that are either in n or could be reachable from n (according to the rules of the structure).

• Edgeset(n, n’) is the subset of the keyspace that n directs to its successor n’. The union of all edgesets with source n is Outset(n).

• Keyset(n) = Inset(n) – Outset(n): the set of keys that are in node n or nowhere.

Notes

• Inset(n) = ∪ over all edges (m, n) of (Inset(m) ∩ Edgeset(m, n)); the inset of a root is the whole keyspace.

• Note that Edgeset(n,n’) need not always be a subset of Inset(n). You’ll see why this is good later.

Example: binary search tree (keyspace is all integers)

Figure: root 50 with children 10 (left) and 70 (right); 35 is the right child of 10.

• Node 50: Inset = Keyspace; Keyset = {50}; Outset = {x | x != 50}

• Node 70: Inset = {x | x > 50} = Edgeset(node 50, node 70); Keyset = Inset

• Node 10: Inset = {x | x < 50}; Edgeset(node 10, node 35) = {x | x > 10}; Keyset = Inset – {x | x > 10} = {x | x <= 10}

• Node 35: Inset = {x | x < 50 and x > 10}; Keyset = Inset
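
The annotations above can be recomputed mechanically from the edgesets with the inset formula. This Python sketch assumes a finite window of integers, -100 to 100, in place of the infinite keyspace, and writes each edgeset as a predicate; the helper names are hypothetical.

```python
# Assumption: a finite window of integers stands in for the infinite keyspace.
KEYSPACE = set(range(-100, 101))
ROOT = 50

# Edgesets of the example tree, written as predicates on keys.
EDGESETS = {
    (50, 10): lambda x: x < 50,
    (50, 70): lambda x: x > 50,
    (10, 35): lambda x: x > 10,
}


def inset(n):
    """Inset(n) = union over edges (m, n) of Inset(m) ∩ Edgeset(m, n)."""
    if n == ROOT:
        return set(KEYSPACE)          # the root's inset is the whole keyspace
    result = set()
    for (m, child), in_edgeset in EDGESETS.items():
        if child == n:
            result |= {x for x in inset(m) if in_edgeset(x)}
    return result


def outset(n):
    """Union of the edgesets leaving n, within the keyspace window."""
    leaving = [p for (m, _), p in EDGESETS.items() if m == n]
    return {x for x in KEYSPACE if any(p(x) for p in leaving)}


def keyset(n):
    return inset(n) - outset(n)


print(keyset(50))                         # {50}
print(min(keyset(10)), max(keyset(10)))   # -100 10
print(min(inset(35)), max(inset(35)))     # 11 49
```

Note that Edgeset(node 10, node 35) = {x | x > 10} is not a subset of Inset(10); intersecting with Inset(10) still yields the correct inset for node 35.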

Structure Goodness Conditions

• The keysets of the nodes partition the keyspace. So ∪ {Keyset(n) | n is a node} = Keyspace, and if n != n’ then Keyset(n) is disjoint from Keyset(n’).

• Edgesets leaving node n are disjoint.

• Let Existkeys(n) be the keys actually present at node n. Existkeys(n) is a subset of Keyset(n).
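
A direct check of the three conditions on finite sets might look like the following sketch. The dictionary layout (node to keyset, node to outgoing edgesets, node to stored keys) and the 0 to 99 keyspace are assumptions for illustration.

```python
from itertools import combinations


def structure_is_good(keyspace, keysets, out_edgesets, existkeys):
    """Check the structure goodness conditions on finite sets of keys.

    keysets:      node -> Keyset(node)
    out_edgesets: node -> list of the edgesets leaving that node
    existkeys:    node -> keys actually stored at that node
    """
    # 1. Keysets cover the keyspace and are pairwise disjoint (a partition).
    covers = set().union(*keysets.values()) == keyspace
    disjoint = all(keysets[a].isdisjoint(keysets[b])
                   for a, b in combinations(keysets, 2))
    # 2. Edgesets leaving the same node are pairwise disjoint.
    edges_ok = all(e1.isdisjoint(e2)
                   for sets in out_edgesets.values()
                   for e1, e2 in combinations(sets, 2))
    # 3. Existkeys(n) is a subset of Keyset(n) for every node.
    exist_ok = all(keys <= keysets[n] for n, keys in existkeys.items())
    return covers and disjoint and edges_ok and exist_ok


# The binary search tree example over an assumed keyspace of 0..99.
keyspace = set(range(100))
keysets = {50: {50}, 10: set(range(0, 11)),
           35: set(range(11, 50)), 70: set(range(51, 100))}
out_edgesets = {50: [set(range(0, 50)), set(range(51, 100))],
                10: [set(range(11, 100))]}
existkeys = {50: {50}, 10: {10}, 35: {35}, 70: {70}}
print(structure_is_good(keyspace, keysets, out_edgesets, existkeys))  # True

# Storing 20 at node 70 breaks the Existkeys condition.
bad = {**existkeys, 70: {20, 70}}
print(structure_is_good(keyspace, keysets, out_edgesets, bad))  # False
```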

Structure Goodness Conditions (apply to each root)

• In the library, suppose that initially Inset(shelf S) = {books | author names begin with “S”}. After reshelving, Outset(S) = {books | author names begin with “Sh” or later}.

• At the end, Keyset(S) = books whose author names begin with “Sa” through “Sg”, and Inset(S’) = books whose author names begin with “Sh” through “Sz”.

Example: library at beginning

Figure: the catalog (Cat) points to shelf S, shelf A, and other shelves.

• Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {}

• S: Inset = {x | x begins with “S”} = Edgeset(Cat, S); Keyset = Inset

• A: Inset = {x | x begins with “A”} = Edgeset(Cat, A) …

Example: library after reshelving

Figure: the catalog (Cat) still points to S and A; shelf S now carries a note pointing to the new shelf S’.

• Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {}

• S: Inset = {x | x begins with “S”} = Edgeset(Cat, S); Outset = {x | x begins with “Sh” or greater}

• S’: Inset = {x | x begins with “Sh” .. “Sz”}; Keyset = Inset

• A: Inset = {x | x begins with “A”}

Example: library after reshelving and catalog change

Figure: the catalog (Cat) now points to S, S’, and A; the note from S to S’ remains.

• Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {}

• S: Inset = {x | x begins with “S” through “Sg”} = Edgeset(Cat, S); Outset = {x | x begins with “Sh” or greater}

• S’: Inset = {x | x begins with “Sh” .. “Sz”} = Edgeset(Cat, S’); Keyset = Inset

• A: Inset = {x | x begins with “A”}

Observe

• Without the note from S to S’, in the state after reshelving but before the catalog change, S’ would hold keys yet have an empty inset and hence an empty keyset.

• This violates the Existkeys part of the structural condition.

• Note also that we can’t eliminate the note from S to S’ even after the catalog is updated. Why?
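
The first observation can be reproduced by propagating insets from the catalog in the state after reshelving but before the catalog change. In this sketch a handful of hypothetical author names stand in for the books, and the insets function is an illustrative fixpoint computation of the inset formula. Since S’ has no outgoing edges, its keyset equals its inset.

```python
# Hypothetical author names stand in for the books in the keyspace.
KEYSPACE = {"Adams", "Austen", "Sagan", "Scott", "Shaw", "Smith", "Swift"}


def insets(edges, roots):
    """Fixpoint of Inset(n) = union over (m, n) of Inset(m) ∩ Edgeset(m, n).

    edges maps (m, n) to a predicate for Edgeset(m, n); roots get the keyspace.
    """
    result = {r: set(KEYSPACE) for r in roots}
    changed = True
    while changed:
        changed = False
        for (m, n), in_edgeset in edges.items():
            flow = {k for k in result.get(m, set()) if in_edgeset(k)}
            if not flow <= result.get(n, set()):
                result[n] = result.get(n, set()) | flow
                changed = True
    return result


cat_edges = {
    ("Cat", "A"): lambda k: k.startswith("A"),
    ("Cat", "S"): lambda k: k.startswith("S"),   # catalog not yet changed
}
note = {("S", "S'"): lambda k: k >= "Sh"}        # the note left on shelf S

with_note = insets({**cat_edges, **note}, ["Cat"])
without_note = insets(cat_edges, ["Cat"])
on_s_prime = {"Shaw", "Smith", "Swift"}          # Existkeys(S') after reshelving

print(sorted(with_note["S'"]))                   # ['Shaw', 'Smith', 'Swift']
print(without_note.get("S'", set()))             # set()
# S' has no outgoing edges, so Keyset(S') = Inset(S').
print(on_s_prime <= without_note.get("S'", set()))  # False: Existkeys violated
```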

Execution Goodness

• For a search for an item B beginning at node m, the following invariant holds:

• After any operation of any process, if the search for item B is at node x, then B is in keyset(x) or there is a path from x to node y such that B is in keyset(y) and every edge E along that path has B in its edgeset.
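
The invariant is a reachability condition, so it can be checked with a graph search that only follows edges whose edgeset contains B. The sketch below assumes the final library state (after the catalog change), with the hypothetical author name "Shaw" standing for book B, and with edgesets and keysets written as predicates.

```python
def invariant_holds(x, key, edges, keysets):
    """Check the execution-goodness invariant for a search for `key` at node x.

    edges:   (m, n) -> predicate for Edgeset(m, n)
    keysets: node -> predicate for Keyset(node)
    True if key is in Keyset(x), or a path from x whose every edgeset
    contains key leads to a node y with key in Keyset(y).
    """
    stack, seen = [x], {x}
    while stack:
        n = stack.pop()
        if keysets[n](key):
            return True
        for (m, succ), in_edgeset in edges.items():
            if m == n and succ not in seen and in_edgeset(key):
                seen.add(succ)
                stack.append(succ)
    return False


# Final library state; "Shaw" plays the role of book B (an assumption).
keysets = {
    "Cat": lambda k: False,                               # Keyset(Cat) = {}
    "A": lambda k: k.startswith("A"),
    "S": lambda k: k.startswith("S") and k < "Sh",
    "S'": lambda k: k.startswith("S") and k >= "Sh",
}
edges = {
    ("Cat", "A"): lambda k: k.startswith("A"),
    ("Cat", "S"): lambda k: k.startswith("S") and k < "Sh",
    ("Cat", "S'"): lambda k: k.startswith("S") and k >= "Sh",
    ("S", "S'"): lambda k: k >= "Sh",                     # the note
}
no_note = {e: p for e, p in edges.items() if e != ("S", "S'")}

print(invariant_holds("S", "Shaw", edges, keysets))    # True: follow the note
print(invariant_holds("S", "Shaw", no_note, keysets))  # False: Bob is stranded
```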

Execution Goodness Proof Sketch

• Provided the search reaches the node having B in its keyset, the search will find B there or will find it nowhere.

• The invariant ensures that the search will not end its search anywhere else.

Execution Goodness Proof

• Why is it that Bob is fine in spite of the fact that the Bob and Alice concurrent execution could never execute serially?

• Because even when Bob is at shelf S, the book Bob is looking for is in edgeset(S,S’) and B is in keyset(S’).

Practical Applications

• Most sophisticated database management systems use some version of the library parable in their B-trees, hash structures, etc.

• Reason: locks need not be held as long and can be held lower in the tree.

• B-trees, for example, have links at the leaf level, so a split looks like this:

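B-tree, simplified (two values per node)

Figure: root 50 with two leaves, (1, 7) on the left and 70 on the right.

• Root: Inset = {x | 0 <= x <= 90}; Keyset = {}; Outset = Inset

• Leaf 70: Inset = {x | x > 50 and x <= 90} = Edgeset(node 50, node 70); Keyset = Inset

• Leaf (1, 7): Inset = {x | x < 50}; Keyset = Inset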

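B-tree insert(32): split the left leaf at 15. Only the (1, 7) node needs to be locked.

Figure: root 50; the left leaf (1, 7) now links to a new leaf (32); leaf 70 is unchanged.

• Root: Inset = {x | 0 <= x <= 90}; Keyset = {}; Outset = Inset

• Leaf 70: Inset = {x | x > 50 and x <= 90} = Edgeset(node 50, node 70); Keyset = Inset

• Leaf (1, 7): Inset = {x | x < 50}; Keyset = Inset – {x | x > 15} = {x | x <= 15}; Edgeset(leaf (1, 7), leaf (32)) = {x | x > 15}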

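Readjust the parent (so lock it briefly)

Figure: the root now holds 15, 50 and points to leaves (1, 7), (32), and 70; the link from (1, 7) to (32) remains.

• Root: Inset = {x | 0 <= x <= 90}; Keyset = {}; Outset = Inset

• Leaf 70: Inset = {x | x > 50 and x <= 90} = Edgeset(node 50, node 70); Keyset = Inset

• Leaf (1, 7): Inset = {x | x < 50}; Keyset = Inset – {x | x > 15} = {x | x <= 15}; Edgeset(leaf (1, 7), leaf (32)) = {x | x > 15}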
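
A search can follow such a split before the parent is readjusted by checking each node's upper bound and moving along the sibling link when the key lies beyond it. The Python sketch below assumes each node stores a high key (the largest key it is responsible for) and a right link, and that a key equal to a separator goes left. It models only the search path, with no locking, and the BNode and blink_search names are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BNode:
    """Hypothetical B-link node: sorted keys, a high key, and a right link."""
    keys: List[int]
    high: float                                   # largest key this node covers
    children: List["BNode"] = field(default_factory=list)  # empty for a leaf
    right: Optional["BNode"] = None               # link to the right sibling


def blink_search(root: BNode, key: int) -> bool:
    node = root
    while True:
        # A key beyond this node's high key has moved right because of a split
        # the parent may not know about yet: follow the sibling link.
        while key > node.high and node.right is not None:
            node = node.right
        if not node.children:
            return key in node.keys
        # Child i covers keys <= keys[i]; the last child covers the rest.
        node = node.children[bisect_left(node.keys, key)]


# State after insert(32) split the left leaf, before the parent is readjusted.
leaf70 = BNode([70], high=90)
leaf32 = BNode([32], high=50, right=leaf70)
leaf1_7 = BNode([1, 7], high=15, right=leaf32)
root = BNode([50], high=90, children=[leaf1_7, leaf70])
print(blink_search(root, 32), blink_search(root, 7), blink_search(root, 20))
# True True False

# Readjusting the parent only shortens future searches.
root.keys, root.children = [15, 50], [leaf1_7, leaf32, leaf70]
print(blink_search(root, 32))  # True
```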

Can Generalize Using Model

• The algorithm above is due to Lehman and Yao and is called the B-link algorithm. Presenting and proving it took a long journal article.

• With the model, we can generalize to any structure: ensure that the structure satisfies the goodness conditions and that the invariant holds during execution.

• It is also possible to invent new algorithms by making direct use of the model.

High Concurrency Without Links: Give-up Algorithm

• Explicitly record a description of each node’s inset in the node itself.

• Search(B) descends from the root. If B is ever not in the inset of the current node, give up and start over from the root.

• Giving up happens rarely enough that search performance is as good as with B-link. Deletions need less work.

• Proof is immediate.
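
A minimal sketch of the give-up rule, assuming each node records its inset as an interval low < key <= high and that a key equal to a separator goes left. A delayed search is modeled by passing the stale node it is standing on (at=...); the split in the example updates the leaf and its parent together, which is an assumption of this sketch. GNode and giveup_search are hypothetical names.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

INF = float("inf")


@dataclass
class GNode:
    """Hypothetical give-up node that records its own inset: low < key <= high."""
    low: float
    high: float
    keys: List[int]
    children: List["GNode"] = field(default_factory=list)  # empty for a leaf


def giveup_search(root: GNode, key: int,
                  at: Optional[GNode] = None) -> Tuple[bool, int]:
    """Search with the give-up rule; `at` is where a delayed search stands.

    Returns (found, restarts). Assumes the root's inset is the whole keyspace.
    """
    node = root if at is None else at
    restarts = 0
    while True:
        if not node.low < key <= node.high:
            restarts += 1                 # key left this node's inset: start over
            node = root
            continue
        if not node.children:
            return key in node.keys, restarts
        # Child i covers keys <= keys[i]; the last child covers the rest.
        node = node.children[bisect_left(node.keys, key)]


# Before the split: root 50 over leaves (1, 7) and (70).
leaf1_7 = GNode(-INF, 50, [1, 7])
leaf70 = GNode(50, INF, [70])
root = GNode(-INF, INF, [50], [leaf1_7, leaf70])
stale = root.children[0]          # a search for 32 reads this pointer, then stalls

# insert(32) splits the leaf at 15 and readjusts the parent (together, here).
leaf32 = GNode(15, 50, [32])
leaf1_7.high = 15
root.keys, root.children = [15, 50], [leaf1_7, leaf32, leaf70]

print(giveup_search(root, 32, at=stale))  # (True, 1)
```

No sibling links are needed: the recorded inset alone tells the stalled search that it is in the wrong place.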

Conclusion

• A simple framework for all search structures, built on a handful of concepts: keyspace, inset, edgeset, outset, keyset.

• Can be a guide to coding.

Exercise

• When can Alice remove the note directing those seeking certain books to go from S to S’?

• Try to design a merge algorithm for a B-tree in the give-up setting. Lock as little and as low as possible.