Ideas on Treaps
Maverick Woo
[Figure: example treap with (key, priority) nodes U,2 P,8 K,7 W,6 S,12 M,14 E,9 T,17 H,20 N,33 Z,4]
Dec 26, 2015
May 2, 2001
Disclaimer
Articles of interest
  Raimund Seidel and Cecilia R. Aragon. Randomized search trees. Algorithmica 16 (1996), 464-497.
  Guy E. Blelloch and Margaret Reid-Miller. Fast Set Operations Using Treaps. In Proc. 10th Annual ACM SPAA, 1998.
Of course, this is joint work with Guy.
  Hopefully Daniel will also show up.
Background
Very high-level talk
  No analysis
To make this a technical talk
  Some background
    Splay Trees (zig, zig zig, zig zig zig…)
    Treaps, if you still remember…
Agenda
Data structure research overviewData structure research overview
Treaps refresherTreaps refresher
Some current issues on TreapsSome current issues on Treaps
Data Structure Research
I am not qualified to say yet, but I do have some “feelings” about it.
  Not that many high-level problems
    Representing a set/ordering
    Supporting some operations
Some say it’s all about applications.
  Applications don’t have to be very specific.
  But they need to be specific enough---we can make assumptions.
What Operations?
Basic
  Insert, Membership
Intermediate
  Delete (e.g. Binomial vs. Fibonacci Heaps)
  Disjoint-Union (e.g. Union-Find)
Higher Level
  Union, Intersection, Difference
  Finger Search
Behavior Restrictions
Persistence
  “Functional”
  More later…
Architecture Independence
  Relatively new, a.k.a. “cache-oblivious”
  Runs efficiently on hierarchical memory
  Avoids memory-specific parameterization
    Forget data block size, cache line width, etc.
  Not my theme today
Why Persistence?
Many reasons for persistence
  It’s practical with good garbage collectors.
  Functional programming makes everyone’s life easier.
For the theoretician
  You don’t need to worry about side effects.
  Better analysis is possible: NESL
For the programmer
  You don’t need to worry about side effects.
  Fewer memory leaks, fewer dangling pointers
Real-life example 1
You have operations working on multiple instances.
  You index the web.
  You build your indices with your cool data structures.
  A conjunctive query (AND) is an intersection.
  You do the intersection on two indices.
  Now one of the indices can get corrupted.
Real-life example 2
You are rich.
  Once upon a time, in a dot-com far away…
You run a multi-processor machine.
You learned that Splay Trees are cool.
You even learned how to write multi-threaded programs.
  Thread1 searches for x on SplayInstance42.
  Thread2 searches for y on SplayInstance42.
Real-world situation: search engines
Data Structure vs. Hacking
Examples
  To learn more about Splay Trees
    Dial (412)-HACKERS.
    Ask for Danny Sleator…
OK, a real example
  (Persistent) FIFO Queues
    Operations: IsEmpty(Q), Enqueue(Q,x), Dequeue(Q)
  Need to grow, so let’s use a Linked List…
FIFO Queues
A Linked List is “bad” though
  Traversing to the tail takes linear time.
  Either Enqueue or Dequeue is going to be linear time.
How about doubly-ended queues (deques)?
  With that much extra space, it may be faster with a tree.
If one is not good enough, use two.
  Suppose the queue is $x_1 x_2 \ldots x_i y_{i+1} y_{i+2} \ldots y_n$.
  Represent it as $[x_1, x_2, \ldots, x_i]$, $[y_n, y_{n-1}, \ldots, y_{i+1}]$.
  You can figure out the details yourself.
  In the end, isn’t this just a hack?
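Here is a minimal OCaml sketch of the two-list trick, written in the spirit of the talk’s SML. The names queue, front and back are illustrative and do not come from the talk. front holds x_1 … x_i in order, and back holds y_n … y_(i+1) in reverse. Enqueue conses onto back. Dequeue reverses back only when front runs dry.

```ocaml
(* Sketch of the two-list FIFO queue (names are illustrative).
   front: oldest elements in order; back: newest elements, reversed. *)
type 'a queue = { front : 'a list; back : 'a list }

let empty = { front = []; back = [] }

let is_empty = function
  | { front = []; back = [] } -> true
  | _ -> false

(* Enqueue conses onto the reversed back list: O(1). *)
let enqueue q x = { q with back = x :: q.back }

(* Dequeue pops the front list; when it is empty, the back list is
   reversed once to become the new front. *)
let dequeue q =
  match q.front with
  | x :: rest -> Some (x, { q with front = rest })
  | [] ->
    (match List.rev q.back with
     | [] -> None
     | x :: rest -> Some (x, { front = rest; back = [] }))
```

The amortized O(1) bound holds only if each version is used once. Dequeuing repeatedly from the same old version can redo the same List.rev, which is part of why this feels like a hack once persistence matters.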
Agenda
Data structure research overviewData structure research overview
Treaps refresherTreaps refresher
Some current issues on TreapsSome current issues on Treaps
Treaps Refresher
A Treap is a recursive data structure.
  datatype 'a Treap = E | T of priority * 'a Treap * 'a * 'a Treap
Each node has a key and a priority.
  Assume all are unique.
Arrange keys in in-order and priorities in heap-order.
Priority is chosen uniformly at random.
  8-way independence suffices for the analysis.
  Can be computed with hash functions
    Don’t need to store the priority
    A key’s priority can be made consistent across runs
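To make “don’t need to store the priority” concrete, here is a hypothetical OCaml variant of the datatype with the priority field dropped. Each key’s priority is recomputed with the standard library’s Hashtbl.hash. That hash does not depend on a random seed, so a key gets the same priority in every run of the program. It is only a stand-in for the 8-way independent family the analysis assumes. Because the resulting priority order is a fixed total order on keys, the treap depends only on the set of keys and not on the insertion order.

```ocaml
(* Hypothetical variant of the slide's datatype with no stored priority. *)
type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

(* Assumed priority order: compare Hashtbl.hash values, breaking rare
   ties by the key itself so that distinct keys never tie.
   Hashtbl.hash is NOT a proven 8-way independent hash family. *)
let higher a b = compare (Hashtbl.hash a, a) (Hashtbl.hash b, b) > 0

(* Insert as a BST leaf, then rotate up while a child outranks its parent. *)
let rec insert t x =
  match t with
  | Leaf -> Node (Leaf, x, Leaf)
  | Node (l, k, r) when x < k ->
    (match insert l x with
     | Node (ll, lk, lr) when higher lk k -> Node (ll, lk, Node (lr, k, r)) (* rotate right *)
     | l' -> Node (l', k, r))
  | Node (l, k, r) when x > k ->
    (match insert r x with
     | Node (rl, rk, rr) when higher rk k -> Node (Node (l, k, rl), rk, rr) (* rotate left *)
     | r' -> Node (l, k, r'))
  | _ -> t (* key already present *)

let rec to_list = function
  | Leaf -> []
  | Node (l, k, r) -> to_list l @ (k :: to_list r)

(* Heap order: every node outranks its children. *)
let rec heap_ordered = function
  | Leaf -> true
  | Node (l, k, r) ->
    let below = function Leaf -> true | Node (_, c, _) -> higher k c in
    below l && below r && heap_ordered l && heap_ordered r
```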
Treap Operations
Membership
  As in binary search trees
Insert
  Add as a leaf by key (in-order)
  Rotate up by priority (heap-order)
Delete
  Reverse what insert does
Find-min, etc.
  Walk down the left spine, etc.
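The following OCaml sketch implements these operations over a transliteration of the slide’s datatype, T (priority, left, key, right). It assumes larger priorities sit nearer the root, which is the convention the Join and Union pseudocode use later. Delete literally reverses insert. It rotates the doomed node down past its higher-priority child until one side is empty, and then splices it out.

```ocaml
(* Transliteration of the slide's datatype: T (priority, left, key, right).
   Assumption: larger priorities sit nearer the root. *)
type 'a treap = E | T of int * 'a treap * 'a * 'a treap

(* Membership: plain BST search; priorities play no role. *)
let rec member t x =
  match t with
  | E -> false
  | T (_, l, k, r) ->
    if x = k then true else if x < k then member l x else member r x

(* Find-min: walk down the left spine. *)
let rec find_min = function
  | E -> None
  | T (_, E, k, _) -> Some k
  | T (_, l, _, _) -> find_min l

(* Delete: reverse of insert. Rotate the node holding x down, lifting
   its higher-priority child each time, until one side is empty. *)
let rec delete t x =
  match t with
  | E -> E
  | T (p, l, k, r) when x < k -> T (p, delete l x, k, r)
  | T (p, l, k, r) when x > k -> T (p, l, k, delete r x)
  | T (_, E, _, r) -> r
  | T (_, l, _, E) -> l
  | T (p, (T (lp, ll, lk, lr) as l), k, (T (rp, rl, rk, rr) as r)) ->
    if lp > rp then
      (* rotate right: lk rises, k moves into the right subtree *)
      T (lp, ll, lk, delete (T (p, lr, k, r)) x)
    else
      (* rotate left: rk rises, k moves into the left subtree *)
      T (rp, delete (T (p, l, k, rl)) x, rk, rr)
```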
Treap Split
Want top-down split (it’s faster)
(less, x, gtr) = Split(root, k)
  If root is empty   // nothing to split
    (E, null, E)
  If (root.k > k)   // want to split left subtree
    Let (l1, m, r1) = Split(root.left, k)
    (l1, m, T(root.p, r1, root.k, root.right))
  If (root.k < k)   // want to split right subtree
    Let (l1, m, r1) = Split(root.right, k)
    (T(root.p, root.left, root.k, l1), m, r1)
  Else
    (root.left, root.k, root.right)
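Here is the same Split in OCaml. It assumes the maximum priority is at the root and uses None for the slide’s null. The example value rebuilds the (key, priority) treap from the figures. Its shape is inferred from the in-order and heap-order rules, not copied from the drawing.

```ocaml
type 'a treap = E | T of int * 'a treap * 'a * 'a treap

(* split t k = (less, x, gtr): keys < k, Some k if present, keys > k.
   Top-down, following the pseudocode; only search-path nodes are rebuilt. *)
let rec split t k =
  match t with
  | E -> (E, None, E)
  | T (p, l, key, r) when k < key ->
    let (l1, m, r1) = split l k in
    (l1, m, T (p, r1, key, r))
  | T (p, l, key, r) when k > key ->
    let (l1, m, r1) = split r k in
    (T (p, l, key, l1), m, r1)
  | T (_, l, key, r) -> (l, Some key, r)

let rec to_list = function
  | E -> []
  | T (_, l, k, r) -> to_list l @ (k :: to_list r)

(* The example treap from the figures (shape inferred, max-priority root). *)
let leaf p k = T (p, E, k, E)
let example =
  T (33,
     T (20, leaf 9 'E', 'H', T (14, leaf 7 'K', 'M', E)),
     'N',
     T (17, T (12, leaf 8 'P', 'S', E), 'T', T (6, leaf 2 'U', 'W', leaf 4 'Z')))
```

Here split example 'V' returns less with keys E through U, no middle key, and gtr with W and Z.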
Treap Split Example
[Figure: Before and After of Split(Tr, “V”) on the example treap (nodes U,2 P,8 K,7 W,6 S,12 M,14 E,9 T,17 H,20 N,33 Z,4); After shows the two results, less and gtr]
Treap Split Persistence
These figures are deceptive.
  Only 4 new nodes are created
  All on the search path to “V”
[Figure: the same split, before and after, with results less and gtr]
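One way to check the “only 4 new nodes” claim is to count the result nodes that are not physically (==) nodes of the input. This OCaml sketch repeats the Split from the pseudocode. fresh_nodes is a hypothetical helper. It stops at the first shared node, because a shared node implies that its whole subtree is shared.

```ocaml
type 'a treap = E | T of int * 'a treap * 'a * 'a treap

(* Split as in the pseudocode (max-priority at the root). *)
let rec split t k =
  match t with
  | E -> (E, None, E)
  | T (p, l, key, r) when k < key ->
    let (l1, m, r1) = split l k in
    (l1, m, T (p, r1, key, r))
  | T (p, l, key, r) when k > key ->
    let (l1, m, r1) = split r k in
    (T (p, l, key, l1), m, r1)
  | T (_, l, key, r) -> (l, Some key, r)

(* Hypothetical helper: count nodes of t that are not physically (==)
   nodes of orig. A shared node means its whole subtree is shared. *)
let fresh_nodes orig t =
  let rec nodes acc = function
    | E -> acc
    | T (_, l, _, r) as n -> nodes (nodes (n :: acc) l) r
  in
  let old = nodes [] orig in
  let rec count = function
    | E -> 0
    | n when List.exists (fun o -> o == n) old -> 0
    | T (_, l, _, r) -> 1 + count l + count r
  in
  count t

(* The example treap from the figures (shape inferred from the rules). *)
let leaf p k = T (p, E, k, E)
let example =
  T (33, T (20, leaf 9 'E', 'H', T (14, leaf 7 'K', 'M', E)), 'N',
     T (17, T (12, leaf 8 'P', 'S', E), 'T', T (6, leaf 2 'U', 'W', leaf 4 'Z')))
```

For split example 'V', fresh_nodes reports 3 new nodes in less (copies of N, T and U) and 1 in gtr (a copy of W), for 4 in total. The subtree rooted at H is handed back untouched.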
Treap Join
Join(less, gtr)   // every key in less < every key in gtr
  If less is empty, return gtr; if gtr is empty, return less
  If (less.p > gtr.p)
    T(less.p, less.left, less.k, Join(less.right, gtr))
  Else
    T(gtr.p, Join(less, gtr.left), gtr.k, gtr.right)
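Here is Join in OCaml, under the same assumptions: the maximum priority is at the root, and every key of less is below every key of gtr. A treap is determined by its keys and priorities. So joining the two halves of a split on a missing key should give back exactly the original tree, and the test checks this.

```ocaml
type 'a treap = E | T of int * 'a treap * 'a * 'a treap

let rec split t k =
  match t with
  | E -> (E, None, E)
  | T (p, l, key, r) when k < key ->
    let (l1, m, r1) = split l k in
    (l1, m, T (p, r1, key, r))
  | T (p, l, key, r) when k > key ->
    let (l1, m, r1) = split r k in
    (T (p, l, key, l1), m, r1)
  | T (_, l, key, r) -> (l, Some key, r)

(* Join: the higher-priority root wins; recurse down the seam. *)
let rec join less gtr =
  match less, gtr with
  | E, t | t, E -> t
  | T (p1, l1, k1, r1), T (p2, l2, k2, r2) ->
    if p1 > p2 then T (p1, l1, k1, join r1 gtr)
    else T (p2, join less l2, k2, r2)

let rec to_list = function
  | E -> []
  | T (_, l, k, r) -> to_list l @ (k :: to_list r)

(* The example treap from the figures (shape inferred from the rules). *)
let leaf p k = T (p, E, k, E)
let example =
  T (33, T (20, leaf 9 'E', 'H', T (14, leaf 7 'K', 'M', E)), 'N',
     T (17, T (12, leaf 8 'P', 'S', E), 'T', T (6, leaf 2 'U', 'W', leaf 4 'Z')))
```

Splitting on a key that is present and then joining the halves removes that key. This is one way to write Delete without rotations.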
Treap Join Example
[Figure: Before and After of Join(less, gtr); Before shows the two treaps less and gtr, After shows the joined treap with nodes U,2 P,8 K,7 W,6 S,12 M,14 E,9 T,17 H,20 N,33 Z,4]
Treap Running Time
All expected O(lg n)
Also of note is Finger Search
  Given a finger in a treap, find the key that is d away in sorted order
  Expected O(lg d) time
  Requires parent pointers
    Evil… wastes so much space
  See Seidel and Aragon for details.
Treap Union
Treaps really shine in set operations.
Union(a, b)
  If a or b is empty, return the other
  Suppose the roots are (k1, p1) and (k2, p2)
  WLOG assume p1 > p2
  Let (less, x, gtr) = Split(b, k1)
  T(p1, Union(a.left, less), k1, Union(a.right, gtr))
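Here is Union as an OCaml sketch. It relies on consistent priorities, meaning the same key always gets the same priority, so the hypothetical of_list builder takes priorities from Hashtbl.hash as an assumption. Its insert walks down top-down and uses a split at the insertion point instead of rotations. The two recursive union calls are independent of each other, which is where the parallelism comes from.

```ocaml
type 'a treap = E | T of int * 'a treap * 'a * 'a treap

let rec split t k =
  match t with
  | E -> (E, None, E)
  | T (p, l, key, r) when k < key ->
    let (l1, m, r1) = split l k in
    (l1, m, T (p, r1, key, r))
  | T (p, l, key, r) when k > key ->
    let (l1, m, r1) = split r k in
    (T (p, l, key, l1), m, r1)
  | T (_, l, key, r) -> (l, Some key, r)

(* Top-down insert: descend while the node outranks p, then split the
   remaining subtree around k and put k on top. *)
let rec insert t k p =
  match t with
  | T (q, l, key, r) when q > p ->
    if k < key then T (q, insert l k p, key, r)
    else if k > key then T (q, l, key, insert r k p)
    else t
  | _ ->
    let (l, _, r) = split t k in
    T (p, l, k, r)

(* Hypothetical builder: priority = Hashtbl.hash key (consistent per key). *)
let of_list keys = List.fold_left (fun t k -> insert t k (Hashtbl.hash k)) E keys

let rec to_list = function
  | E -> []
  | T (_, l, k, r) -> to_list l @ (k :: to_list r)

(* Union: the higher-priority root wins; split the other treap on its key. *)
let rec union a b =
  match a, b with
  | E, t | t, E -> t
  | T (p1, _, _, _), T (p2, _, _, _) when p1 < p2 -> union b a
  | T (p1, l1, k1, r1), _ ->
    let (less, _, gtr) = split b k1 in
    T (p1, union l1 less, k1, union r1 gtr)
```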
Treap Intersection
Inter(a, b)
  If a or b is empty, return empty
  Suppose the roots are (k1, p1) and (k2, p2); WLOG p1 > p2
  Let (less, x, gtr) = Split(b, k1)
  If x is null   // k1 is not in b, sorry dude
    Join(Inter(a.left, less), Inter(a.right, gtr))
  Else
    T(p1, Inter(a.left, less), k1, Inter(a.right, gtr))
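Here is Intersection in OCaml, with the same helpers and assumptions as the Union sketch, including consistent priorities from Hashtbl.hash. When k1 is missing from b, the two recursive results are glued together with Join.

```ocaml
type 'a treap = E | T of int * 'a treap * 'a * 'a treap

let rec split t k =
  match t with
  | E -> (E, None, E)
  | T (p, l, key, r) when k < key ->
    let (l1, m, r1) = split l k in
    (l1, m, T (p, r1, key, r))
  | T (p, l, key, r) when k > key ->
    let (l1, m, r1) = split r k in
    (T (p, l, key, l1), m, r1)
  | T (_, l, key, r) -> (l, Some key, r)

let rec join less gtr =
  match less, gtr with
  | E, t | t, E -> t
  | T (p1, l1, k1, r1), T (p2, l2, k2, r2) ->
    if p1 > p2 then T (p1, l1, k1, join r1 gtr)
    else T (p2, join less l2, k2, r2)

let rec insert t k p =
  match t with
  | T (q, l, key, r) when q > p ->
    if k < key then T (q, insert l k p, key, r)
    else if k > key then T (q, l, key, insert r k p)
    else t
  | _ ->
    let (l, _, r) = split t k in
    T (p, l, k, r)

(* Hypothetical builder: priority = Hashtbl.hash key (consistent per key). *)
let of_list keys = List.fold_left (fun t k -> insert t k (Hashtbl.hash k)) E keys

let rec to_list = function
  | E -> []
  | T (_, l, k, r) -> to_list l @ (k :: to_list r)

let rec inter a b =
  match a, b with
  | E, _ | _, E -> E
  | T (p1, _, _, _), T (p2, _, _, _) when p1 < p2 -> inter b a
  | T (p1, l1, k1, r1), _ ->
    let (less, x, gtr) = split b k1 in
    let l = inter l1 less and r = inter r1 gtr in
    (match x with
     | None -> join l r            (* k1 is not in b *)
     | Some _ -> T (p1, l, k1, r)) (* k1 is in both *)
```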
Treap Difference
Similar to intersection
  Change the logic a bit
  Messier because it is not symmetric
Left as an exercise for the reader.
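Here is one possible attempt at the exercise, offered as a sketch rather than the authors’ version. Difference(a, b) means the keys in a that are not in b, and it uses the same helpers and assumptions as the Union and Intersection sketches. The asymmetry shows up as two cases. When a’s root wins, its key survives unless Split finds it in b. When b’s root wins, that key can never be in the result, so a is split on it and the two recursive results are joined.

```ocaml
type 'a treap = E | T of int * 'a treap * 'a * 'a treap

let rec split t k =
  match t with
  | E -> (E, None, E)
  | T (p, l, key, r) when k < key ->
    let (l1, m, r1) = split l k in
    (l1, m, T (p, r1, key, r))
  | T (p, l, key, r) when k > key ->
    let (l1, m, r1) = split r k in
    (T (p, l, key, l1), m, r1)
  | T (_, l, key, r) -> (l, Some key, r)

let rec join less gtr =
  match less, gtr with
  | E, t | t, E -> t
  | T (p1, l1, k1, r1), T (p2, l2, k2, r2) ->
    if p1 > p2 then T (p1, l1, k1, join r1 gtr)
    else T (p2, join less l2, k2, r2)

let rec insert t k p =
  match t with
  | T (q, l, key, r) when q > p ->
    if k < key then T (q, insert l k p, key, r)
    else if k > key then T (q, l, key, insert r k p)
    else t
  | _ ->
    let (l, _, r) = split t k in
    T (p, l, k, r)

(* Hypothetical builder: priority = Hashtbl.hash key (consistent per key). *)
let of_list keys = List.fold_left (fun t k -> insert t k (Hashtbl.hash k)) E keys

let rec to_list = function
  | E -> []
  | T (_, l, k, r) -> to_list l @ (k :: to_list r)

(* diff a b: keys of a that are not in b. *)
let rec diff a b =
  match a, b with
  | E, _ -> E
  | _, E -> a
  | T (p1, l1, k1, r1), T (p2, _, _, _) when p1 > p2 ->
    (* a's root wins: keep k1 unless b contains it *)
    let (less, x, gtr) = split b k1 in
    let l = diff l1 less and r = diff r1 gtr in
    (match x with
     | None -> T (p1, l, k1, r)
     | Some _ -> join l r)
  | _, T (_, l2, k2, r2) ->
    (* b's root wins: k2 is never in the result *)
    let (less, _, gtr) = split a k2 in
    join (diff less l2) (diff gtr r2)
```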
Points of Note
Persistence
  Did you see a side effect? (assignments?)
Parallelization
  Parallelizing without persistence is a pain.
  Very natural divide-and-conquer
    Run the two recursive calls on different CPUs
Running times…
Set Operation Running Time
For two sets of sizes m and n (m ≤ n), optimal is Θ(m lg(n/m))
What’s known before this work
  With AVL Trees, O(m lg(n/m))
    Rather complicated algorithms
    For the sake of your smooth digestion…
  Compare this to O(m+n) or O(m lg n)
With Treaps
  Can use Finger Search if we have parent pointers
  Does not parallelize---multiple fingers???
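The “optimal” claim can be sanity-checked with the standard counting argument, which the talk does not include. Any comparison-based algorithm must tell apart every way of interleaving the two sets. The middle form, m lg((n+m)/m), is worth keeping in mind. At m = n it equals n, in line with O(m+n). At m = 1 it equals lg(n+1), the cost of a single search.

```latex
% Comparison-model lower bound for merging sets of sizes m <= n:
% the algorithm must distinguish all \binom{n+m}{m} interleavings.
\lg \binom{n+m}{m}
  \;\ge\; \lg \left(\frac{n+m}{m}\right)^{m}
  \;=\; m \lg \frac{n+m}{m}
  \;\ge\; m \lg \frac{n}{m}
% using \binom{N}{m} = \prod_{i=0}^{m-1} \frac{N-i}{m-i} \ge \left(\frac{N}{m}\right)^{m}
% for N \ge m, since each factor (N-i)/(m-i) is at least N/m.
```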
Set Operation Running Time
What’s known after this work
  No parent pointers
  Parallelizes naturally
  Optimal expected running time: O(m lg(n/m))
    Analysis available in Blelloch and Reid-Miller
  Relatively simple algorithm
  Experimental results
    6.3-6.8 speedup on an 8-processor SGI machine
    4.1-4.4 speedup on a 5-processor Sun machine
Agenda
Data structure research overviewData structure research overview
Treaps refresherTreaps refresher
Some current issues on TreapsSome current issues on Treaps
A Word on Splay Trees
Splay Trees are slow in practice!
  Even a single simple search would require O(lg n) pointer updates!
Skip Lists are way simpler and faster.
Let’s switch all Splay Trees to Skip Lists.
Danny???
Bruce said…
First find Danny.
  Ditch Splay Trees---say they are slow.
  Then praise Skip Lists.
Danny will
  refute by quoting experimental studies:
    Splay Trees are not much slower than Skip Lists in practice.
  ask who my advisor is.
I wondered if that would work. So I tried.
Current Issues on Treaps
Treaps are simpler than Splay Trees
  No famous conjecture for my back pocket
    Neat idea from Adam Kalai
  Not self-adjusting
    Access introduces more explicit changes
Adding data compression to Treaps
Finger search on Treaps
  Work by Guy + Daniel Blandford
Adding Compression to Treaps
Search engines
  Infrequent offline updates (once a month)
  Frequent online queries and set operations
  Keys are unique.
  Keys can be huge and occur sparsely.
Let’s compress the keys!
Assume they are 64-bit integers.
We’ve got a problem!
I don’t know how to deploy data compression to general data structures.
  Begin with the simplest---Array
  The naïve approach
    Compress the whole array
    When we need to access an element
      decompress the whole array
      do the access
      compress the whole array again
Isn’t that dumb?
Any suggestions?
Use chunking
  Divide the array into blocks of size C.
  Compress each block individually.
  Now we are back to “constant” time!
Shh!!! That could be a trade secret.
  Of course they use something better than a vanilla array.
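Here is a minimal sketch of chunking in OCaml. It assumes sorted, non-negative keys that fit in OCaml’s 63-bit int, as a stand-in for the 64-bit keys above. The compression scheme writes the gaps between consecutive keys as 7-bit varints. That scheme is chosen here only for illustration and is not what the talk, or any search engine, is known to use. The names chunked, of_sorted_array and get are hypothetical. get decompresses only the block that holds index i, so an access costs O(C) rather than O(n).

```ocaml
(* Hypothetical chunked, compressed array of sorted non-negative ints.
   Each block of (up to) c keys is stored as delta-encoded varints. *)
type chunked = { c : int; len : int; blocks : Bytes.t array }

(* Write n >= 0 as a varint: 7 bits per byte, high bit set on every
   byte except the last. *)
let rec put_varint buf n =
  if n < 0x80 then Buffer.add_char buf (Char.chr n)
  else begin
    Buffer.add_char buf (Char.chr ((n land 0x7f) lor 0x80));
    put_varint buf (n lsr 7)
  end

(* Read the varint starting at pos; return (value, next position). *)
let get_varint b pos =
  let rec go pos shift acc =
    let byte = Char.code (Bytes.get b pos) in
    let acc = acc lor ((byte land 0x7f) lsl shift) in
    if byte < 0x80 then (acc, pos + 1) else go (pos + 1) (shift + 7) acc
  in
  go pos 0 0

(* Compress one block: the first key relative to 0, then the gaps. *)
let compress_block keys =
  let buf = Buffer.create 16 in
  let prev = ref 0 in
  Array.iter (fun k -> put_varint buf (k - !prev); prev := k) keys;
  Buffer.to_bytes buf

let of_sorted_array c a =
  let len = Array.length a in
  let nblocks = (len + c - 1) / c in
  let blocks =
    Array.init nblocks (fun i ->
      let lo = i * c in
      compress_block (Array.sub a lo (min c (len - lo))))
  in
  { c; len; blocks }

(* Access: decompress only block i / c, decoding up to offset i mod c. *)
let get t i =
  if i < 0 || i >= t.len then invalid_arg "get";
  let b = t.blocks.(i / t.c) in
  let rec walk j pos acc =
    let (gap, pos') = get_varint b pos in
    let v = acc + gap in
    if j = 0 then v else walk (j - 1) pos' v
  in
  walk (i mod t.c) 0 0
```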
Chunking a Treap
A sub-tree is a chunk.
  Desire consistent chunk size
  But Treaps are usually not full.
    Need better chunking rules
Chunks
  Can’t be too big---hurts running time
  Can’t be too small---hurts compression (space)
Vocab
Internal node and Leaf block
More precisely
  datatype tblock =
      Packed of int * key * key * key vector
    | UnPacked of int * int * key vector
  datatype trearray =
      TE
    | TB of tblock
    | TN of trearray * key * trearray
All running times are in the expected case.
Idea 1 – Thresholds
Priority is in the range 1 to maxP
Invent a threshold $P_{th}$
  e.g. maxP - log(maxP)
For n = (p, k)
  If p > $P_{th}$, then n is an internal node.
  Otherwise, n is in some leaf block.
Trick done when a key is inserted.
  Also maintained by various operations.
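Here is a sketch of the threshold rule in OCaml. The trearray type below is a simplified, hypothetical stand-in for the one on the Vocab slide: a block is just a sorted array, with no Packed/UnPacked distinction. The rule is cheap to apply because of heap order. Once a node’s priority is at or below the threshold, so is every priority beneath it, so that whole subtree becomes one leaf block. The threshold value used in the usage note is arbitrary.

```ocaml
type 'a treap = E | T of int * 'a treap * 'a * 'a treap

(* Simplified, hypothetical stand-in for the Vocab slide's trearray:
   a leaf block is a plain sorted array (no Packed/UnPacked). *)
type 'a trearray = TE | TB of 'a array | TN of 'a trearray * 'a * 'a trearray

let rec to_list = function
  | E -> []
  | T (_, l, k, r) -> to_list l @ (k :: to_list r)

(* Threshold rule: priority > pth means internal node. By heap order,
   a node at or below pth has only such nodes beneath it, so its
   whole subtree becomes one leaf block. *)
let rec chunk pth t =
  match t with
  | E -> TE
  | T (p, l, k, r) when p > pth -> TN (chunk pth l, k, chunk pth r)
  | _ -> TB (Array.of_list (to_list t))

(* Lookup: BST descent through internal nodes, binary search in a block. *)
let rec mem x = function
  | TE -> false
  | TN (l, k, r) -> if x = k then true else if x < k then mem x l else mem x r
  | TB a ->
    (* search a.(lo) .. a.(hi - 1) *)
    let rec bs lo hi =
      if lo >= hi then false
      else
        let mid = (lo + hi) / 2 in
        if a.(mid) = x then true
        else if a.(mid) < x then bs (mid + 1) hi
        else bs lo mid
    in
    bs 0 (Array.length a)

(* The example treap from the figures (shape inferred from the rules). *)
let leaf p k = T (p, E, k, E)
let example =
  T (33, T (20, leaf 9 'E', 'H', T (14, leaf 7 'K', 'M', E)), 'N',
     T (17, T (12, leaf 8 'P', 'S', E), 'T', T (6, leaf 2 'U', 'W', leaf 4 'Z')))
```

With pth = 15 on the example treap, N, H and T stay internal. The remaining nodes fall into four blocks: [E], [K; M], [P; S] and [U; W; Z].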
Idea 1 – Features
On average, a constant ratio between internal keys and “keys in blocks”.
With $P_{th}$ = maxP - log(maxP) and N keys
  Expect log N internal nodes
  Height is log log N.
  O(log N) “bottom” nodes, each with a block
  Expect (N - log N) / O(log N) keys per block
  Binary search in a block takes O(log N)
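Here is a back-of-the-envelope version of these counts. It rests on assumptions the slide leaves implicit: priorities are uniform on {1, …, maxP}, lg maxP is an integer, and maxP is on the order of N. Without the last assumption, the expected number of internal nodes is N lg(maxP)/maxP rather than lg N.

```latex
% Assumptions (not stated in the talk): priorities uniform on {1, ..., maxP},
% lg maxP an integer, P_th = maxP - lg maxP, and maxP = N.
\Pr[p > P_{th}] = \frac{\lg maxP}{maxP},
\qquad
\mathbb{E}[\#\text{internal nodes}] = N \cdot \frac{\lg maxP}{maxP} = \lg N
\quad \text{when } maxP = N.
% The internal nodes form a treap on about lg N keys: expected height O(lg lg N).
% I internal nodes leave I + 1 empty child slots, so at most lg N + 1 blocks,
% holding (N - lg N)/(lg N + 1), roughly N / lg N, keys each on average.
\lg\!\left(\frac{N}{\lg N}\right) \le \lg N
\quad\Longrightarrow\quad
\text{binary search in one block is } O(\lg N).
```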
Idea 1 – Running Time
Query is still O(log n).
Insert is also O(log n).
Join and Split both take O(log n).
Set operations rely on Join and Split’s O(log n) running time.
Looking good…
Idea 1 – Problems
Asymptotic bound
  Need to work out the constants
  Exact analysis in progress
    I now think of Knuth even higher…
SML implementation
  Make the idea as concrete as code
  Can now do more experiments
Idea 1 – Questions
Do we really need to maintain consistent priorities across runs?
  Dropping that would make things simpler
  But Union looks suspicious
What compression algorithm to use?
  No general data compression
  Take advantage of the index distribution
Idea 2 – Small Blocks
Want a more-or-less constant block size
Small blocks are more realistic
  Say 20
  Processor specific---fit the cache line size
  How well can we compress 20 integers?
Leave for second-stage investigation
Perhaps I can share
Writing down an algorithm as code helps
  Pseudocode is good for short algorithms
  Real code is more concrete.
    Good for sloppy people like me.
  Actual SML code
    You can find out that you missed some cases.
    Now if only SML had a debugger…
Space-time tradeoff is very real
Treap Finger Search
Daniel is working on it.
No parent pointers needed
Can mimic parent pointers by reversing the root-to-(last accessed leaf) path
Should probably leave this to himShould probably leave this to him
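To give a flavor of how a reversed path can stand in for parent pointers, here is a zipper-style OCaml sketch. It uses Huet’s zipper, a standard functional technique, and is not necessarily what Daniel is doing. A finger is the focused subtree plus the reversed root-to-focus path. Moving up rebuilds one node from a path frame. All names here are hypothetical.

```ocaml
type 'a treap = E | T of int * 'a treap * 'a * 'a treap

(* One frame of the reversed root-to-focus path (hypothetical names).
   WentLeft (p, k, r): focus is the left child of node (p, k) with right
   subtree r. WentRight (p, l, k) is the mirror image. *)
type 'a frame =
  | WentLeft of int * 'a * 'a treap
  | WentRight of int * 'a treap * 'a

(* A finger: the focused subtree plus the reversed path to the root. *)
type 'a finger = { focus : 'a treap; path : 'a frame list }

(* Move the focus to its parent, rebuilding that one node. *)
let up f =
  match f.path with
  | [] -> None
  | WentLeft (p, k, r) :: rest -> Some { focus = T (p, f.focus, k, r); path = rest }
  | WentRight (p, l, k) :: rest -> Some { focus = T (p, l, k, f.focus); path = rest }

(* Descend from the focus toward y, pushing one frame per step. *)
let rec down y f =
  match f.focus with
  | E -> f
  | T (p, l, k, r) ->
    if y = k then f
    else if y < k then down y { focus = l; path = WentLeft (p, k, r) :: f.path }
    else down y { focus = r; path = WentRight (p, l, k) :: f.path }

(* The focus holds exactly the treap's keys strictly between its nearest
   WentRight ancestor (lower bound) and nearest WentLeft ancestor (upper). *)
let in_range y path =
  let rec upper = function
    | [] -> true
    | WentLeft (_, k, _) :: _ -> y < k
    | WentRight _ :: rest -> upper rest
  in
  let rec lower = function
    | [] -> true
    | WentRight (_, _, k) :: _ -> k < y
    | WentLeft _ :: rest -> lower rest
  in
  upper path && lower path

(* Finger search: climb until y is inside the focus's range, then descend. *)
let rec search y f =
  if in_range y f.path then down y f
  else match up f with
    | Some parent -> search y parent
    | None -> down y f

let key_of f = match f.focus with T (_, _, k, _) -> Some k | E -> None

(* Zip all the way back up, recovering the whole treap. *)
let rec to_root f = match up f with Some parent -> to_root parent | None -> f.focus
```

In the example treap from the figures, start with a finger at E. Then search 'K' climbs only as far as H before it descends, and to_root rebuilds the original tree.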
Q&A / Suggestions
Work in progress, suggestions welcome
Danny, don’t kick me too hard…