
1 Hashing, randomness and dictionaries Rasmus Pagh PhD defense October 11, 2002.

Dec 17, 2015

Transcript
Page 1: 1 Hashing, randomness and dictionaries Rasmus Pagh PhD defense October 11, 2002.

Hashing, randomness and dictionaries

Rasmus Pagh

PhD defense

October 11, 2002

Page 2: Overview of presentation

1. Introduction to searching a computer’s memory using ”hashing”.

2. New ideas behind some of the results in the thesis.

3. Overview of results in the thesis.

Page 3: PART I

Introduction to searching a computer’s memory using ”hashing”.

Page 4: Searching a computer’s memory

• A basic task in computer science is to search for a certain piece of information (a ”key”) in the memory of a computer.

• This is often called the dictionary problem.

• For example, search for information related to ”060275-4287”.

Page 5: The sorting approach

• Keep a sorted list of all information.

• Number of search steps increases when the amount of information grows.

0601739822 Pia
0601758473 Mikael
0602728741 Benno
0602738432 Alice
0602753211 Bengt
0602754287 Holger
0602754811 Børge
0602760666 Petra
0602761985 Jens

Where to find 0602754287?

[Animation: the entries 0602760666 Petra, 0602753211 Bengt and 0602754287 Holger are highlighted during the search.]

Got it!
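The sorted-list search above can be sketched as a binary search. This is a minimal illustration, not code from the thesis: the function name `sorted_lookup` and the use of Python's `bisect` module are assumptions, and keys are kept as equal-length digit strings so that string order matches numeric order.

```python
from bisect import bisect_left

# Sorted (key, name) records copied from the slide.
RECORDS = [
    ("0601739822", "Pia"),
    ("0601758473", "Mikael"),
    ("0602728741", "Benno"),
    ("0602738432", "Alice"),
    ("0602753211", "Bengt"),
    ("0602754287", "Holger"),
    ("0602754811", "Børge"),
    ("0602760666", "Petra"),
    ("0602761985", "Jens"),
]
KEYS = [key for key, _ in RECORDS]


def sorted_lookup(key):
    """Return the name stored under key, or None if the key is absent."""
    # Binary search: the number of comparisons grows like log2(n),
    # so it increases as the list gets longer.
    i = bisect_left(KEYS, key)
    if i < len(KEYS) and KEYS[i] == key:
        return RECORDS[i][1]
    return None
```

Calling `sorted_lookup("0602754287")` walks the list by halving, which is why the number of steps depends on the amount of information stored.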

Page 6: The need for speed

• Many applications perform millions, billions, or even trillions of searches.

• Users do not want to wait for answers.

• The amount of data is rapidly increasing – solutions that remain fast for large data sets are needed.

Page 7: The hashing approach

• Idea: Store information in random locations.

• Use a ”hash function” to generate and remember random locations.

[Figure: the records below, scattered over the memory.]

0601739822 Pia
0601758473 Mikael
0602728741 Benno
0602738432 Alice
0602753211 Bengt
0602754287 Holger
0602754811 Børge
0602760666 Petra
0602761985 Jens

Page 8: The hashing approach

[Same slide as Page 7, now with a query.]

Where to find 0602754287?

Got it!
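The idea of letting a hash function pick the location can be sketched as a small table. This is only an illustration under stated assumptions: the class name `HashTable` is hypothetical, Python's built-in `hash()` stands in for the hash function, and collisions are handled by keeping a short list per location, a detail the slides do not specify.

```python
class HashTable:
    """Toy hash table: the hash function decides where each record lives."""

    def __init__(self, size=16):
        # One (usually very short) list of records per location.
        self.buckets = [[] for _ in range(size)]

    def _location(self, key):
        # Assumption: Python's built-in hash() plays the role of the
        # "hash function" that generates and remembers the location.
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._location(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing record
                return
        bucket.append((key, value))

    def lookup(self, key):
        # Only the single location given by the hash function is inspected.
        for existing_key, value in self.buckets[self._location(key)]:
            if existing_key == key:
                return value
        return None
```

A lookup inspects one location only, so its cost depends on how many records happen to share that location rather than on the total amount of information.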

Page 9: Hashing in the real world

• Search time varies, but on average it is great – no matter how much information!

• Udi Manber (chief scientist, ): ”The most important techniques behind Yahoo! are: hashing, hashing, and hashing”.

• Lots of other critical applications in databases, search engines, algorithms, etc.

Page 10: PART II

New ideas behind some of the results in the thesis.

Page 11: Search time guarantee

• The time for searching is often more critical than the time for updating.

• Sometimes worst case bounds are important.

• Pioneering work on dictionaries with worst case search time by Fredman et al. (1982) and Dietzfelbinger et al. (1988).

A PROBLEM

Page 12: Cuckoo hashing

• Idea: The hash function provides two possible locations.

[Figure: the records below, stored in the hash tables.]

0602754811 Børge
0601758473 Mikael
0602760666 Petra
0602728741 Benno
0602754287 Holger
0602753211 Bengt
0601739822 Pia
0602761985 Jens
0602738432 Alice

Where to find 0602754287?

Not here

Got it!

A NEW SOLUTION

Page 13: Cuckoo insertion

• New information is inserted by, if necessary, kicking out old information.

[Figure: the hash tables from the previous slide, holding the records of Børge, Mikael, Petra, Benno, Holger, Jens, Pia, Bengt and Alice.]

Insert ”Harry, 0603751133”

Insert ”Børge, 0602754811”

Insert ”Pia, 0601739822”
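The lookup and the kick-out insertion shown on the last two slides can be sketched as follows. This is a simplified illustration, not the implementation from the thesis: the class name `CuckooTable`, the `max_kicks` limit, the use of two tables, the salted built-in `hash()` standing in for the two hash functions, and the rebuild-with-larger-tables fallback are all assumptions made for the example.

```python
import random


class CuckooTable:
    """Toy cuckoo hash table: every key has one possible slot in each of two tables."""

    def __init__(self, size=16, seed=1):
        self.size = size
        self.rng = random.Random(seed)
        self.tables = [[None] * size, [None] * size]
        self._new_hash_functions()

    def _new_hash_functions(self):
        # Assumption: two random salts turn hash() into two hash functions.
        self.salts = [self.rng.getrandbits(64), self.rng.getrandbits(64)]

    def _location(self, which, key):
        return hash((self.salts[which], key)) % self.size

    def lookup(self, key):
        # At most two memory lookups, one per table.
        for which in (0, 1):
            entry = self.tables[which][self._location(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value, max_kicks=32):
        # Overwrite in place if the key is already stored.
        for which in (0, 1):
            pos = self._location(which, key)
            entry = self.tables[which][pos]
            if entry is not None and entry[0] == key:
                self.tables[which][pos] = (key, value)
                return
        item, which = (key, value), 0
        for _ in range(max_kicks):
            pos = self._location(which, item[0])
            old = self.tables[which][pos]
            self.tables[which][pos] = item  # take the slot, kicking out any occupant
            if old is None:
                return
            item, which = old, 1 - which    # the evicted record tries its other slot
        self._rebuild(item)

    def _rebuild(self, pending):
        # Fallback (an assumption of this sketch): pick new hash functions,
        # double the table size, and insert everything again.
        items = [e for table in self.tables for e in table if e is not None]
        items.append(pending)
        self.size *= 2
        self.tables = [[None] * self.size, [None] * self.size]
        self._new_hash_functions()
        for k, v in items:
            self.insert(k, v)
```

The point of the design is visible in `lookup`: however many records were moved around during insertions, a search never inspects more than two locations.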

Page 14: Perfect hashing

• ”Perfect hashing” is hashing without collisions.

• Some memory to store such a function is necessary.

• Function description may be stored in fast, expensive memory and the table in cheaper and slower memory.

A PROBLEM

Page 15: Another view of cuckoo hashing

0602754287

Page 16: Hash and displace

Idea: Displacements (Tarjan and Yao)

Where to find 0602754287?

[Figure: displacement table with entries +2, -7, -1]

A NEW SOLUTION

Page 17: ”Simulating” random functions

• Quest for practical constructions of hash functions that ”behave sufficiently similar to random functions” initiated by Carter and Wegman in 1977.

• Many constructions suggested, and shown to work well in certain situations.

A PROBLEM
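One well-known construction from the line of work started by Carter and Wegman is the family of hash functions x → ((a·x + b) mod p) mod m, with a and b chosen at random. The sketch below illustrates that family only; the names `P` and `make_universal_hash` are hypothetical, and the choice of the prime 2^61 - 1 is an assumption that requires all keys to be integers smaller than it.

```python
import random

# Assumption: a prime larger than every key that will be hashed.
P = 2**61 - 1


def make_universal_hash(m, rng=random):
    """Pick a random function from the family x -> ((a*x + b) mod P) mod m."""
    a = rng.randrange(1, P)  # a is nonzero
    b = rng.randrange(P)

    def h(x):
        return ((a * x + b) % P) % m

    return h
```

For any two distinct keys below `P`, a function drawn this way maps them to the same value with probability at most about 1/m, which is the sense in which it behaves like a random function for pairs of keys.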

Page 18: Uniform hashing

Fill hash & displace table with random numbers. Twice!

[Figure: tables filled with random numbers: -3, +1, 0, +5, +1, -2, -2, +3, +2, +1, +4, -3, 0, +2, 0, -1]

A NEW SOLUTION

Page 19: PART III

Overview of results in the thesis.

Page 20: Contributions of the thesis

New simple and efficient hashing schemes (cuckoo hashing, hash & displace).

New kinds of hash functions (uniform hashing, dispersing hash functions), with applications.

More efficient deterministic hashing algorithms (static and dynamic).

Detailed investigation of hashing and dictionaries in the ”cell probe” model (upper/lower bounds).

A dictionary using almost minimal space.

Mainly theoretical results of mathematical nature, but also some results that may be practically useful.

Page 21: Cuckoo hashing

• Utilizes nearly half of the hash tables.

• Searching uses two memory lookups.

• Insertion takes expected constant time.

• … when using powerful hash functions.

• However, very efficient in practice using weaker hash functions.

• Considerably simpler than other hashing schemes with worst case search time.

Joint work with Flemming Friche Rodler

Page 22: Hash and displace

• For a set of n keys, the analysis needs a table of $(2+\epsilon)n$ integers.

• Table with suitable displacements can be computed in expected O(n) time.

• The table containing information has size n, i.e., it is completely full.

• ”Universal” hash functions suffice – fast.

• Perhaps the most practical such scheme that is theoretically understood.

[Figure: displacement table with entries +2, -7, -1]
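The scheme summarized on this slide can be sketched as a function x → (f(x) + d[g(x)]) mod n, where d is the displacement table. The code below is a simplified illustration under several assumptions, not the algorithm analyzed in the thesis: the names `build_hash_and_displace` and `location` are hypothetical, displacements are non-negative and found by first-fit search, the displacement table has 2n+1 entries, keys are distinct integers, and the construction simply retries with new hash functions if it gets stuck.

```python
import random

P = 2**61 - 1  # assumption: a prime larger than every key


def _random_hash(m, rng):
    # A "universal" style hash function into range(m).
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m


def build_hash_and_displace(keys, seed=0):
    """Return (f, g, d) so that (f(x) + d[g(x)]) % n is collision-free on keys."""
    keys = list(keys)
    n = len(keys)
    rng = random.Random(seed)
    while True:  # retry with fresh hash functions if a bucket cannot be placed
        f = _random_hash(n, rng)
        g = _random_hash(2 * n + 1, rng)
        buckets = {}
        for x in keys:
            buckets.setdefault(g(x), []).append(x)
        d = [0] * (2 * n + 1)
        used = [False] * n
        placed_all = True
        # Place the largest buckets first, while the table is still mostly empty.
        for gv, bucket in sorted(buckets.items(), key=lambda kv: -len(kv[1])):
            for disp in range(n):
                slots = {(f(x) + disp) % n for x in bucket}
                if len(slots) == len(bucket) and not any(used[s] for s in slots):
                    d[gv] = disp
                    for s in slots:
                        used[s] = True
                    break
            else:
                placed_all = False
                break
        if placed_all:
            return f, g, d


def location(x, f, g, d, n):
    """One lookup in the displacement table, then one location in the data table."""
    return (f(x) + d[g(x)]) % n
```

For the nine keys used in the slides (taken as integers), `location` sends each key to a different slot of a table of size nine, so the data table is completely full.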

Page 23: Uniform hashing

• When hashing a set of n keys, one needs tables of O(n) integers.

• With probability $1 - O(n^{-c})$ the new hash function computes independent and uniform values on the set.

• … when based on powerful hash functions.

• Previous results required $O(n^{1+\epsilon})$ integers.

• Gives theoretical justification for the widespread ”uniform hashing” assumption.

[Figure: tables of random numbers with entries +2, +1, +3, 0, +2, -1, -3]

Joint work with Anna Östlin

Page 24: Dispersing hash functions

• Goal: Small use of random bits when hashing.

• Uniform hashing uses O(n log n) random bits.

• Universal hashing uses O(log n + log log u) random bits.

• Dispersing hash functions, introduced in the thesis, may use only O(log log u) random bits.

• Suffice for, e.g., relational join and element distinctness in expected linear time.

• No explicit construction – shown to be as hard as finding good explicit extractors.

Page 25: Deterministic dictionary construction

Joint paper with Torben Hagerup and Peter Bro Miltersen

• What if one allows no random bits?

• We want O(1) query time and linear space.

• Miltersen (’98) + Hagerup (’99): Reduce in time O(n log n) the general problem to that in a universe of size $n^2$. Yields time $O(n^{1+\epsilon})$.

• New: An O(n log n) algorithm for computing ”good” displacements for the Tarjan-Yao scheme, which handles universe size $n^2$.

[Figure: displacement table with entries +2, -7, -1]

Page 26: Deterministic dynamic dictionary

• Known deterministic dynamic dictionaries exhibit a trade-off between update and query time.

• New trade-off added using the mentioned techniques of Miltersen and Hagerup.

Page 27: Impossibility results

• There are limits to the efficiency of algorithms.

• Useful to know if this limit has been reached.

• Some results from the thesis:

1 adaptive memory probe (as in hash & displace) is optimal for perfect hashing.

2 random access memory probes (as in cuckoo hashing) is worst-case optimal.

Page 28: One-probe search

Joint work with Anna Östlin

[Figure: a table in which each of the records 0601758473 Mikael, 0602728741 Benno, 0602754287 Holger and 0602761985 Jens is stored in several locations.]

Where to find 0602754287?

• ”Hash function” deterministically produces O(log u) possible locations.

• Probing a random location finds the information with probability $1-\epsilon$.

• Table size n log u.

• ”Hash function” not explicit.

Page 29: A succinct dictionary

• Simple information theory provides a lower bound B on the space usage of a dictionary (say, with no information besides keys).

• New: A static dictionary with O(1) lookup time using B+o(n)+O(log log u) bits.

• Improves the lower-order term compared to previous results.
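The information-theoretic bound mentioned in the first bullet can be made concrete. Assuming the dictionary stores a set of n keys from a universe of size u and nothing else, there are C(u, n) possible sets, so at least ⌈log2 C(u, n)⌉ bits are needed to tell them apart. The function name below is hypothetical and the sketch assumes Python 3.8 or later for `math.comb`.

```python
from math import comb


def dictionary_lower_bound_bits(u, n):
    """Bits needed to distinguish all n-element subsets of a universe of size u."""
    # (m - 1).bit_length() equals ceil(log2(m)) for every integer m >= 1.
    return (comb(u, n) - 1).bit_length()
```

For example, `dictionary_lower_bound_bits(4, 2)` is 3, because there are six possible two-element subsets of a four-element universe and two bits can only distinguish four of them.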

Page 30: Chronology of papers

• Low Redundancy in Static Dictionaries with Constant Query Time. ICALP 1999 and SIAM Journal on Computing, 2001.

• Hash and Displace: Efficient Evaluation of Minimal Perfect Hash Functions. WADS 1999.

• Deterministic Dictionaries. With Torben Hagerup and Peter Bro Miltersen. SODA 1998-2000 and Journal of Algorithms, 2001.

• A Trade-Off for Worst-Case Efficient Dictionaries. SWAT 2000 and Nordic Journal of Computing, 2000.

• Dispersing Hash Functions. RANDOM 2000.

• On the Cell Probe Complexity of Membership and Perfect Hashing. STOC 2001.

• Cuckoo Hashing. With Flemming Friche Rodler. ESA 2001.

• One-Probe Search. With Anna Östlin. ICALP 2002.

• Simulating Uniform Hashing in Constant Time and Optimal Space. With Anna Östlin. Unpublished manuscript, 2002.