Apr 24, 2018

CityHash: Fast Hash Functions for Strings

Geoff Pike (joint work with Jyrki Alakuijala)

Google

http://code.google.com/p/cityhash/

Introduction

- Who?
- What?
- When?
- Where?
- Why?

Outline

Introduction

A Biased Review of String Hashing

Murmur or Something New?

Interlude: Testing

CityHash

Conclusion

Recent activity

- SHA-3 winner was announced last month
- Spooky version 2 was released last month
- MurmurHash3 was finalized last year
- CityHash version 1.1 will be released this month

In my backup slides you can find ...

- My notation
- Discussion of cyclic redundancy checks
    - What is a CRC?
    - What does the crc32q instruction do?

Traditional String Hashing

- Hash function loops over the input
- While looping, the internal state is kept in registers
- In each iteration, consume a fixed amount of input
- Sample loop for a traditional byte-at-a-time hash:

    for (int i = 0; i < N; i++) {
        state = Combine(state, B_i)
        state = Mix(state)
    }

Two more concrete old examples (loop only)

    for (int i = 0; i < N; i++)
        state = ρ_{-5}(state) ⊕ B_i

    for (int i = 0; i < N; i++)
        state = 33 · state + B_i
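
A minimal Python sketch of the two loops above, in case a runnable version helps. The 32-bit state width, the zero initial state, and the function names are assumptions made for illustration; the slide gives only the loop bodies.

```python
MASK32 = 0xFFFFFFFF  # assumption: the state is a 32-bit unsigned register


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (the slide's rho_{-n})."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotate_xor_hash(data: bytes, state: int = 0) -> int:
    """First loop: state = rho_{-5}(state) XOR B_i for every input byte."""
    for byte in data:
        state = rotl32(state, 5) ^ byte
    return state


def times33_hash(data: bytes, state: int = 0) -> int:
    """Second loop: state = 33 * state + B_i for every input byte, mod 2^32."""
    for byte in data:
        state = (33 * state + byte) & MASK32
    return state
```

Both loops do very little mixing per byte. For example, `times33_hash(b"ab")` is just 33 * 97 + 98 = 3299.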

A complete byte-at-a-time example

    // Bob Jenkins circa 1996
    int state = 0
    for (int i = 0; i < N; i++) {
        state = state + B_i
        state = state + σ_{-10}(state)
        state = state ⊕ σ_6(state)
    }
    state = state + σ_{-3}(state)
    state = state ⊕ σ_{11}(state)
    state = state + σ_{-15}(state)
    return state

What’s better about this? What’s worse?
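
The pseudocode above matches what is commonly known as Jenkins' one-at-a-time hash. Here is a direct Python transcription, assuming a 32-bit unsigned state (so every addition is reduced mod 2^32); the function name is chosen for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit unsigned state


def one_at_a_time(data: bytes) -> int:
    """Byte-at-a-time hash following the slide's pseudocode."""
    state = 0
    for byte in data:
        state = (state + byte) & MASK32            # state + B_i
        state = (state + (state << 10)) & MASK32   # state + sigma_{-10}(state)
        state ^= state >> 6                        # state XOR sigma_6(state)
    # Final mixing after the loop.
    state = (state + (state << 3)) & MASK32
    state ^= state >> 11
    state = (state + (state << 15)) & MASK32
    return state
```

Compared with the two older loops, each byte goes through an add-shift and an xor-shift before the next byte arrives, and the tail adds three more mixing steps. The cost is a longer dependency chain per byte.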

What Came Next—Hardware Trends

- CPUs generally got better
    - Unaligned loads work well: read words, not bytes
    - More registers
    - SIMD instructions
    - CRC instructions
- Parallelism became more important
    - Pipelines
    - Instruction-level parallelism (ILP)
    - Thread-level parallelism

What Came Next—Hash Function Trends

- People got pickier about hash functions
    - Collisions may be more costly
    - Hash functions in libraries should be “decent”
    - More acceptance of complexity
    - More emphasis on diffusion

Jenkins’ mix

Also around 1996, Bob Jenkins published a hash function with a 96-bit input and a 96-bit output. Pseudocode with 32-bit registers:

    a = a - b;  a = a - c;  a = a ⊕ σ_{13}(c)
    b = b - c;  b = b - a;  b = b ⊕ σ_{-8}(a)
    c = c - a;  c = c - b;  c = c ⊕ σ_{13}(b)
    a = a - b;  a = a - c;  a = a ⊕ σ_{12}(c)
    b = b - c;  b = b - a;  b = b ⊕ σ_{-16}(a)
    c = c - a;  c = c - b;  c = c ⊕ σ_5(b)
    a = a - b;  a = a - c;  a = a ⊕ σ_3(c)
    b = b - c;  b = b - a;  b = b ⊕ σ_{-10}(a)
    c = c - a;  c = c - b;  c = c ⊕ σ_{15}(b)

Thorough, but pretty fast!
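
As a sketch, the nine lines of mix can be written as three rounds that differ only in their shift amounts. The Python below assumes 32-bit unsigned registers (subtraction wraps mod 2^32); the function and variable names are illustrative.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit unsigned registers


def mix(a: int, b: int, c: int):
    """Jenkins-style mix of three 32-bit values, following the slide.

    Each round uses (right shift for a, left shift for b, right shift for c).
    """
    rounds = ((13, 8, 13), (12, 16, 5), (3, 10, 15))
    for shift_a, shift_b, shift_c in rounds:
        a = (a - b - c) & MASK32
        a ^= c >> shift_a
        b = (b - c - a) & MASK32
        b ^= (a << shift_b) & MASK32
        c = (c - a - b) & MASK32
        c ^= b >> shift_c
    return a, b, c
```

Every step is a subtraction or an xor with a shifted copy of a different register, so each step can be undone and the whole function is a bijection on 96 bits.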

Jenkins’ mix-based string hash

Given mix(a, b, c) as defined on the previous slide, pseudocode for string hash:

    uint32 a = ...
    uint32 b = ...
    uint32 c = ...
    int iters = ⌊N/12⌋
    for (int i = 0; i < iters; i++) {
        a = a + W_{3i}
        b = b + W_{3i+1}
        c = c + W_{3i+2}
        mix(a, b, c)
    }
    etc.

Modernizing Google’s string hashing practices

- Until recently, most string hashing at Google used Jenkins’ techniques
    - Some in the “32-bit” style
    - Some in the “64-bit” style, whose mix is 4/3 times as long
- We saw Austin Appleby’s 64-bit Murmur2 was faster and considered switching
- Launched education campaign around 2009
    - Explain the options; give recommendations
    - Encourage labelling: “may change” or “won’t”

Quality targets for string hashing

There are roughly four levels of quality one might seek:

- quick and dirty
- suitable for a library
- suitable for fingerprinting
- secure

Is Murmur2 good for a library? for fingerprinting? both?

Murmur2 preliminaries

First define two subroutines:

ShiftMix(a) = a ⊕ σ_{47}(a)

and

TailBytes(N) = Σ_{i=1}^{N mod 8} 256^{(N mod 8) − i} · B_{N−i}
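
A small Python sketch of the two subroutines, assuming 64-bit values. The sum in TailBytes places the last input byte in the most significant position, which is the same as reading the final N mod 8 bytes as one little-endian integer; the function names are illustrative.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit unsigned arithmetic


def shift_mix(a: int) -> int:
    """ShiftMix(a) = a XOR (a >> 47)."""
    return (a ^ (a >> 47)) & MASK64


def tail_bytes(data: bytes) -> int:
    """TailBytes(N): weighted sum over the last N mod 8 bytes of the input."""
    n = len(data)
    r = n % 8
    total = 0
    for i in range(1, r + 1):
        total += 256 ** (r - i) * data[n - i]
    return total
```

For an 11-byte input, `tail_bytes` returns the same value as `int.from_bytes(data[8:], "little")`; for inputs whose length is a multiple of 8 it returns 0.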

Murmur2

    uint64 k = 14313749767032793493
    int iters = ⌊N/8⌋
    uint64 hash = seed ⊕ (N · k)
    for (int i = 0; i < iters; i++)
        hash = (hash ⊕ (ShiftMix(W_i · k) · k)) · k
    if (N mod 8 > 0)
        hash = (hash ⊕ TailBytes(N)) · k
    return ShiftMix(ShiftMix(hash) · k)
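
The pseudocode translates almost line for line into Python. This sketch assumes little-endian 64-bit words and reduces every product mod 2^64; it follows the slide and has not been checked against a reference Murmur2 implementation. The function names are illustrative.

```python
MASK64 = (1 << 64) - 1
K = 14313749767032793493  # multiplier k from the slide


def shift_mix(a: int) -> int:
    """ShiftMix(a) = a XOR (a >> 47)."""
    return a ^ (a >> 47)


def murmur2_64(data: bytes, seed: int = 0) -> int:
    """64-bit Murmur2-style hash following the slide's pseudocode."""
    n = len(data)
    h = (seed ^ (n * K)) & MASK64
    full = n - n % 8  # number of bytes covered by whole 8-byte words
    for off in range(0, full, 8):
        # Assumption: words are read in little-endian byte order.
        w = int.from_bytes(data[off:off + 8], "little")
        f = (shift_mix((w * K) & MASK64) * K) & MASK64  # Mul-ShiftMix-Mul
        h = ((h ^ f) * K) & MASK64
    if n % 8 > 0:
        tail = int.from_bytes(data[full:], "little")  # TailBytes(N)
        h = ((h ^ tail) * K) & MASK64
    return shift_mix((shift_mix(h) * K) & MASK64)
```

Note that the per-word function f depends only on the input word, not on `hash`, so several applications of f can be in flight at once while the serial chain is only one xor and one multiply per word.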

Murmur2 Strong Points

- Simple
- Fast (assuming multiplication is fairly cheap)
- Quality is quite good

Questions about Murmur2 (or any other choice)

- Could its speed be better?
- Could its quality be better?

Murmur2 Analysis

Inner loop is:

    for (int i = 0; i < iters; i++)
        hash = (hash ⊕ f(W_i)) · k

where f is “Mul-ShiftMix-Mul”

Murmur2 Speed

- ILP comes mostly from parallel application of f
- Cost of TailBytes(N) can be painful for N < 60 or so

Murmur2 Quality

- f is invertible
- During the loop, diffusion isn’t perfect

Testing

Common tests include:

- Hash a bunch of words or phrases
- Hash other real-world data sets
- Hash all strings with edit distance <= d from some string
- Hash other synthetic data sets
    - E.g., 100-word strings where each word is “cat” or “hat”
    - E.g., any of the above with e x t r a s p a c e
- We use our own plus SMHasher
- avalanche

Avalanche (by example)

Suppose we have a function that inputs and outputs 32 bits. Find M random input values. Hash each input value with and without its j-th bit flipped. How often do the results differ in their k-th output bit?

Ideally we want “coin flip” behavior, so the relevant distribution (the number of differences out of M trials) has mean M/2 and variance M/4.
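
The experiment can be sketched in a few lines of Python. The function name, the default sample count, and the fixed random seed are assumptions for illustration; `counts[j][k]` is the number of samples in which flipping input bit j changed output bit k.

```python
import random


def avalanche_counts(f, bits=32, samples=1000, seed=0):
    """Count, for every (input bit j, output bit k), how often flipping
    bit j of a random input changes bit k of f's output."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    counts = [[0] * bits for _ in range(bits)]
    for _ in range(samples):
        x = rng.getrandbits(bits)
        hx = f(x)
        for j in range(bits):
            diff = hx ^ f(x ^ (1 << j))
            for k in range(bits):
                counts[j][k] += (diff >> k) & 1
    return counts
```

For the identity function the matrix is `samples` on the diagonal and 0 elsewhere. For multiplication by an odd constant mod 2^32, flipping input bit j always flips output bit j and never touches the output bits below j, so half of the matrix stays at 0. A well-mixed function should have every entry near `samples / 2`.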

64x64 avalanche diagram: f(x) = x

64x64 avalanche diagram: f(x) = kx

64x64 avalanche diagram: ShiftMix

64x64 avalanche diagram: ShiftMix(x) · k

64x64 avalanche diagram: ShiftMix(kx) · k

64x64 avalanche diagram: f(x) = CRC(kx)

The CityHash Project

Goals:

- Speed (on Google datacenter hardware or similar)
- Quality
    - Excellent diffusion
    - Excellent behavior on all contributed test data
    - Excellent behavior on basic synthetic test data
    - Good internal state diffusion, but not too good; cf. Rogaway’s Bucket Hashing

Portability

For speed without total loss of portability, assume:

- 64-bit registers
- pipelined and superscalar
- fairly cheap multiplication
- cheap +, −, ⊕, σ, ρ, β
- cheap register-to-register moves
- a + b may be cheaper than a ⊕ b
- a + cb + 1 may be fairly cheap for c ∈ {0, 1, 2, 4, 8}

Branches are expensive

Is there a better way to handle the “tails” of short strings?

How many dynamic branches are reasonable for hashing a 12-byte input?

How many arithmetic operations?

CityHash64 initial design (2010)

- Focus on short strings
- Perhaps just use Murmur2 on long strings
- Use overlapping unaligned reads
- Write the minimum number of loops: 1
- Focus on speed first; fix quality later

The CityHash64 function: overall structure

    if (N <= 32)
        if (N <= 16)
            if (N <= 8)
                ...
            else
                ...
        else
            ...
    else if (N <= 64) {
        // Handle 33 <= N <= 64
        ...
    } else {
        // Handle N > 64
        int iters = ⌊N/64⌋
        ...
    }

The CityHash64 function (2012): preliminaries

Define α(u, v, m):

    let a    = u ⊕ v
        a'   = ShiftMix(a · m)
        a''  = a' ⊕ v
        a''' = ShiftMix(a'' · m)
    in
        a''' · m

Also, k_0, k_1, and k_2 are primes near 2^{64}, and K is k_2 + 2N.
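
A short Python sketch of α, assuming 64-bit arithmetic with products reduced mod 2^64. The function names are illustrative, and the multiplier is left as a parameter because the slide does not list the actual constants.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit unsigned arithmetic


def shift_mix(a: int) -> int:
    """ShiftMix(a) = a XOR (a >> 47)."""
    return a ^ (a >> 47)


def alpha(u: int, v: int, m: int) -> int:
    """Combine two 64-bit values u and v using multiplier m."""
    a = u ^ v
    a1 = shift_mix((a * m) & MASK64)   # a'
    a2 = a1 ^ v                        # a''
    a3 = shift_mix((a2 * m) & MASK64)  # a'''
    return (a3 * m) & MASK64
```

When m is odd, every step is invertible for a fixed v, so two different values of u can never collide under the same v and m.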

CityHash64: 1 <= N <= 3

    let a = B_0
        b = B_{⌊N/2⌋}
        c = B_{N−1}
        y = a + 256b
        z = N + 4c
    in
        ShiftMix((y · k_2) ⊕ (z · k_0))
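
To see why this case needs no branch on the exact length, here is a Python sketch that follows the slide. The three byte loads at positions 0, ⌊N/2⌋ and N−1 cover every byte of a 1-, 2- or 3-byte input. The constants are passed in as parameters because the slide does not give their values; anything supplied for them in an example is a placeholder, not CityHash's real constant.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit unsigned arithmetic


def shift_mix(a: int) -> int:
    """ShiftMix(a) = a XOR (a >> 47)."""
    return a ^ (a >> 47)


def hash_len_1_to_3(data: bytes, k0: int, k2: int) -> int:
    """Hash a 1..3 byte input as on the slide (k0, k2 are caller-supplied)."""
    n = len(data)
    if not 1 <= n <= 3:
        raise ValueError("this sketch handles only 1 <= N <= 3")
    a = data[0]
    b = data[n // 2]
    c = data[n - 1]
    y = a + 256 * b
    z = n + 4 * c
    return shift_mix(((y * k2) ^ (z * k0)) & MASK64)
```

For N = 1 all three loads read the same byte; for N = 2 the second and third coincide; for N = 3 they are all distinct. The length is folded into z, so inputs of different lengths that repeat one byte still feed different values into the mix.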

CityHash64: 4 <= N <= 8

    α(N + 4 · W^{32}_0, W^{32}_{−1}, K)
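
This case is a compact illustration of overlapping unaligned reads: the first and the last 32-bit word together cover every byte of a 4- to 8-byte input, overlapping when N < 8. The Python sketch below follows the slide, assumes little-endian words, and takes k_2 as a parameter because its value is not given here; the helper names are illustrative.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit unsigned arithmetic


def shift_mix(a: int) -> int:
    return a ^ (a >> 47)


def alpha(u: int, v: int, m: int) -> int:
    """The slide's alpha(u, v, m)."""
    a1 = shift_mix(((u ^ v) * m) & MASK64)
    a3 = shift_mix(((a1 ^ v) * m) & MASK64)
    return (a3 * m) & MASK64


def fetch32(data: bytes, offset: int) -> int:
    """Read a 32-bit word at any byte offset (assumed little-endian)."""
    return int.from_bytes(data[offset:offset + 4], "little")


def hash_len_4_to_8(data: bytes, k2: int) -> int:
    """Hash a 4..8 byte input as on the slide (k2 is caller-supplied)."""
    n = len(data)
    if not 4 <= n <= 8:
        raise ValueError("this sketch handles only 4 <= N <= 8")
    K = (k2 + 2 * n) & MASK64
    first = fetch32(data, 0)     # W^{32}_0
    last = fetch32(data, n - 4)  # W^{32}_{-1}; overlaps `first` when N < 8
    return alpha((n + 4 * first) & MASK64, last, K)
```

There is no tail loop and no per-byte branch: two loads handle all five lengths.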

CityHash64: 9 <= N <= 16

    let a = W_0 + k_2
        b = W_{−1}
        c = ρ_{37}(b) · K + a
        d = (ρ_{25}(a) + b) · K
    in
        α(c, d, K)

CityHash64: 17 <= N <= 32

    let a = W_0 · k_1
        b = W_1
        c = W_{−1} · K
        d = W_{−2} · k_2
    in
        α(ρ_{43}(a + b) + ρ_{30}(c) + d, a + ρ_{18}(b + k_2) + c, K)

CityHash64: 33 <= N <= 64

    let a = W_0 · k_2
        e = W_2 · k_2
        f = W_3 · 9
        h = W_{−2} · K
        u = ρ_{43}(a + W_{−1}) + 9 · (ρ_{30}(W_1) + W_{−3})
        v = a + W_{−1} + f + 1
        w = h + β((u + v) · K)
        x = ρ_{42}(e + f) + W_{−3} + β(W_{−4})
        y = (β((v + w) · K) + W_{−1}) · K
        z = e + f + W_{−3}
        r = β((x + z) · K + y) + W_1
        t = ShiftMix((r + z) · K + W_{−4} + h)
    in
        t · K + x

Evaluation for N <= 64

- CityHash64 is about 1.5x faster than Murmur2 for N <= 64
- Quality meets targets (bug reports are welcome)
- Simplifying it would be nice
- Key lesson: Don’t loop over bytes
- Key lesson: Understand the basics of machine architecture
- Key lesson: Know when to stop

Next steps

Arguably we should have written CityHash32 next. That’s still not done.

Instead, we worked on 64-bit hashes for N > 64, and 128-bit hashes.

CityHash64 for N > 64

The one loop in CityHash64:

- 56 bytes of state
- 64 bytes consumed per iteration
- 7 rotates, 4 multiplies, 1 xor, about 36 adds (??)
- influenced by mix and Murmur2

128-bit CityHash variants

- CityHash128
    - same loop body, manually unrolled
    - slightly faster for large N
- CityHashCrc128
    - totally different function
    - uses CRC instruction, but isn’t a CRC
    - faster still for large N

Evaluation for N > 64

- CityHash64 is about 1.3 to 1.6x faster than Murmur2
- For long strings, the fastest CityHash variant is about 2x faster than the fastest Murmur variant
- Quality meets targets (bug reports are welcome)
- Jenkins’ Spooky is a strong competitor

My recommendations

For hash tables or fingerprints:

           Nehalem, Westmere,    similar               other
           Sandy Bridge, etc.    CPUs                  CPUs

small N    CityHash              CityHash              TBD
large N    CityHash              Spooky or CityHash    TBD

For quick-and-dirty hashing: Start with the above

Future work

- CityHash32
- Big Endian
- SIMD

The End

Backup Slides

Notation

N          = the length of the input (bytes)
a ⊕ b      = bitwise exclusive-or
a + b      = sum (usually mod 2^{64})
a · b      = product (usually mod 2^{64})
σ_n(a)     = right shift a by n bits
σ_{−n}(a)  = left shift a by n bits
ρ_n(a)     = right rotate a by n bits
ρ_{−n}(a)  = left rotate a by n bits
β(a)       = byteswap a
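
For readers who prefer code to symbols, the shift, rotate and byteswap operators can be sketched in Python as follows. The 64-bit width and the function names `sigma`, `rho` and `beta` are assumptions for illustration; a negative n selects the left-going variant, as in the notation above.

```python
MASK64 = (1 << 64) - 1  # assumption: all values are 64-bit unsigned


def sigma(n: int, a: int) -> int:
    """sigma_n(a): right shift by n if n >= 0, left shift by -n otherwise."""
    if n >= 0:
        return a >> n
    return (a << -n) & MASK64


def rho(n: int, a: int) -> int:
    """rho_n(a): right rotate by n if n >= 0, left rotate by -n otherwise."""
    n %= 64  # a left rotate by r equals a right rotate by 64 - r
    return ((a >> n) | (a << (64 - n))) & MASK64


def beta(a: int) -> int:
    """beta(a): reverse the order of the 8 bytes of a."""
    return int.from_bytes(a.to_bytes(8, "little"), "big")
```

For example, `rho(1, 1)` moves the low bit to the top and gives `1 << 63`, and `beta(0x0102030405060708)` gives `0x0807060504030201`.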

More Notation

B_i = the i-th byte of the input (counts from 0)

W^b_i = the i-th b-bit word of the input

W^b_{−1} = the last b-bit word of the input

W^b_{−2} = the second-to-last b-bit word of the input
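
A small Python sketch of the word notation may make the negative indices concrete. It assumes little-endian byte order and that negative indices count whole words back from the end of the input, so W_{−1} can overlap earlier words when N is not a multiple of the word size. The function name and signature are illustrative.

```python
def word(data: bytes, i: int, bits: int = 64) -> int:
    """Return W^bits_i: the i-th word from the start (i >= 0) or the
    |i|-th word counted back from the end of the input (i < 0)."""
    size = bits // 8
    n = len(data)
    start = i * size if i >= 0 else n + i * size
    if start < 0 or start + size > n:
        raise IndexError("word lies outside the input")
    # Assumption: words are read in little-endian byte order.
    return int.from_bytes(data[start:start + size], "little")
```

With a 20-byte input, `word(data, -1)` reads bytes 12..19 and `word(data, 1)` reads bytes 8..15, so the two reads overlap by four bytes.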

Cyclic Redundancy Check (CRC)

The commonest explanation of a CRC is in terms of polynomials whose coefficients are elements of GF(2).

In GF(2):

- 0 is the additive identity,
- 1 is the multiplicative identity, and
- 1 + 1 = 0 + 0 = 0.

CRC, part 2

Sample polynomial:

p = x^{32} + x^{27} + 1

CRC, part 3

We can use p to define an equivalence relation: We’ll say q and r are equivalent iff they differ by a polynomial times p.

CRC, part 4

Theorem: The equivalence relation has 2^{Degree(p)} equivalence classes.

Lemma: if Degree(p) = Degree(q) > 0 then Degree(p + q) < Degree(p) and, if not, Degree(p + q) = max(Degree(p), Degree(q))

Observation: There are 2^{Degree(p)} polynomials with degree less than Degree(p), no two of them equivalent.

CRC, part 5

Observation: Any polynomial with degree >= Degree(p) is equivalent to a lower degree polynomial.

Example: What is a degree <= 31 polynomial equivalent to x^{50}?

Degree(x^{50}) − Degree(p) = 18; therefore x^{50} − x^{18} · p has degree less than 50.

    x^{50} − x^{18} · p = x^{50} − x^{18} · (x^{32} + x^{27} + 1)
                        = x^{50} − (x^{50} + x^{45} + x^{18})
                        = x^{45} + x^{18}

CRC, part 6

Applying the same idea repeatedly will lead us to the lowest degree polynomial that is equivalent to x^50.

The result:

x^50 ≡ x^30 + x^18 + x^13 + x^8 + x^3
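
The repeated reduction can be written as a short loop. The sketch below assumes the same integer encoding of GF(2) polynomials (bit i is the coefficient of x^i); `reduce_mod_p` and `exponents` are hypothetical helper names, and the list of intermediate polynomials is kept only to show each step.

```python
P = (1 << 32) | (1 << 27) | 1  # p = x^32 + x^27 + 1


def exponents(a: int) -> list:
    """Exponents with a nonzero coefficient, highest first."""
    return [i for i in range(a.bit_length() - 1, -1, -1) if (a >> i) & 1]


def reduce_mod_p(a: int, p: int = P):
    """Return the lowest-degree equivalent of a, plus every intermediate polynomial."""
    deg_p = p.bit_length() - 1
    steps = []
    while a.bit_length() - 1 >= deg_p:
        # Cancel the leading term by subtracting (XORing) a shifted copy of p.
        a ^= p << (a.bit_length() - 1 - deg_p)
        steps.append(a)
    return a, steps


result, steps = reduce_mod_p(1 << 50)
for s in steps:
    print(exponents(s))   # one line per reduction step
print(exponents(result))  # [30, 18, 13, 8, 3]
```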

CRC, part 7

More samples:

x^50 ≡ x^30 + x^18 + x^13 + x^8 + x^3

x^50 + 1 ≡ x^30 + x^18 + x^13 + x^8 + x^3 + 1

x^51 ≡ x^31 + x^19 + x^14 + x^9 + x^4

x^51 + x^50 ≡ x^31 + x^30 + x^19 + x^18 + x^14 + x^13 + x^9 + x^8 + x^4 + x^3

x^51 + x^31 ≡ x^19 + x^14 + x^9 + x^4
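
These samples follow from the fact that reduction is linear: the reduction of a sum is the sum (XOR) of the reductions. A small sketch that checks this on the samples, assuming the integer encoding of GF(2) polynomials (bit i is the coefficient of x^i) and the hypothetical helpers `poly` and `reduce_mod_p`:

```python
P = (1 << 32) | (1 << 27) | 1  # p = x^32 + x^27 + 1


def poly(*exps: int) -> int:
    """Build a GF(2) polynomial from the exponents of its nonzero terms."""
    value = 0
    for e in exps:
        value ^= 1 << e
    return value


def reduce_mod_p(a: int, p: int = P) -> int:
    """Lowest-degree polynomial equivalent to a modulo p."""
    deg_p = p.bit_length() - 1
    while a.bit_length() - 1 >= deg_p:
        a ^= p << (a.bit_length() - 1 - deg_p)
    return a


r50 = reduce_mod_p(poly(50))
r51 = reduce_mod_p(poly(51))

# Reducing a sum gives the XOR of the individual reductions.
print(reduce_mod_p(poly(51, 50)) == r51 ^ r50)       # True
print(reduce_mod_p(poly(51, 31)) == r51 ^ poly(31))  # True
```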

CRC in Practice

- There are thousands of CRC implementations
- We’ll focus on those that use _mm_crc32_u64() or crc32q
  - The inputs are a 32-bit number and a 64-bit number
  - The output is a 32-bit number

What is crc32q?

crc32q for inputs u and v returns C(u xor v) = F(E(D(u xor v))).

D(0) = 0, D(1) = x^95, D(2) = x^94, D(3) = x^95 + x^94, D(4) = x^93, ...

E maps a polynomial to its equivalent of lowest degree.

F(0) = 0, F(x^31) = 1, F(x^30) = 2, F(x^31 + x^30) = 3, F(x^29) = 4, ...
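
The three maps can be emulated in software and compared against the usual shift-and-XOR formulation of a CRC. The sketch below rests on assumptions that go beyond the slide: the hardware instruction is taken to use the CRC-32C (Castagnoli) polynomial, written 0x11EDC6F41 with its x^32 term or 0x82F63B78 in bit-reversed form, rather than the toy p of the earlier examples, and to apply no initial or final inversion. The function names mirror the slide's D, E, F and C; `crc32q_bitwise` is a hypothetical name for the reference loop.

```python
CASTAGNOLI = 0x11EDC6F41           # CRC-32C polynomial including the x^32 term
CASTAGNOLI_REFLECTED = 0x82F63B78  # same polynomial, bit-reversed, x^32 term dropped
MASK64 = (1 << 64) - 1


def D(w: int) -> int:
    """Map a 64-bit number to a polynomial: bit i becomes the term x^(95 - i)."""
    return sum(1 << (95 - i) for i in range(64) if (w >> i) & 1)


def E(a: int, p: int = CASTAGNOLI) -> int:
    """Map a polynomial to its lowest-degree equivalent modulo p."""
    deg_p = p.bit_length() - 1
    while a.bit_length() - 1 >= deg_p:
        a ^= p << (a.bit_length() - 1 - deg_p)
    return a


def F(a: int) -> int:
    """Map a polynomial of degree <= 31 to a number: the term x^(31 - i) becomes bit i."""
    return sum(1 << i for i in range(32) if (a >> (31 - i)) & 1)


def C(w: int) -> int:
    return F(E(D(w)))


def crc32q_bitwise(u: int, v: int) -> int:
    """Reference loop: shift one bit at a time, XOR in the reflected polynomial."""
    x = (u ^ v) & MASK64
    for _ in range(64):
        x = (x >> 1) ^ (CASTAGNOLI_REFLECTED if x & 1 else 0)
    return x


u, v = 0x12345678, 0x0123456789ABCDEF
print(C(u ^ v) == crc32q_bitwise(u, v))  # True
```

The bit reversal in D and F is what lets the hardware-style loop shift right while the polynomial view divides from the highest degree down.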

How is crc32q used?

C operates on 64 bits of input, so:

For a 64-bit input, use C(seed,u0).

For a 128-bit input, use C(C(seed,u0),u1).

For a 192-bit input, use C(C(C(seed,u0),u1),u2).
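
The chaining pattern is a left fold of C over the 64-bit words of the input. This is a minimal sketch, not the CityHash code: `crc32q` is a software stand-in for the instruction (assuming the CRC-32C polynomial in reflected form and no inversion), `crc_chain` is a hypothetical name, and reading the words as little-endian is an assumption that matches an x86 load.

```python
import struct

POLY = 0x82F63B78  # CRC-32C polynomial, bit-reversed form (assumption)


def crc32q(state: int, word: int) -> int:
    """Software stand-in for C(state, word): fold one 64-bit word into a 32-bit state."""
    x = state ^ word
    for _ in range(64):
        x = (x >> 1) ^ (POLY if x & 1 else 0)
    return x


def crc_chain(seed: int, data: bytes) -> int:
    """Apply C(C(C(seed, u0), u1), ...) over consecutive 8-byte little-endian words."""
    if len(data) % 8:
        raise ValueError("this sketch only handles inputs that are a multiple of 8 bytes")
    state = seed
    for (word,) in struct.iter_unpack("<Q", data):
        state = crc32q(state, word)
    return state


data = bytes(range(24))            # a 192-bit input
print(hex(crc_chain(0, data)))
```

Each call depends on the result of the previous one, so the words are consumed strictly in order.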

C as matrix-vector multiplication

A 32 × 64 matrix times a 64 × 1 vector yields a 32 × 1 result.

The matrix and vectors contain elements of GF(2):
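
One way to make the matrix concrete is to build it column by column: column j is C applied to the input with only bit j set. The sketch below is an illustration under assumptions not in the slides: C is emulated in software with the CRC-32C polynomial in reflected form and no inversion, each column is stored as a 32-bit integer, and `C_matrix` is a hypothetical name.

```python
POLY = 0x82F63B78  # CRC-32C polynomial, bit-reversed form (assumption)


def C(w: int) -> int:
    """Software stand-in for C on a 64-bit input, producing a 32-bit output."""
    for _ in range(64):
        w = (w >> 1) ^ (POLY if w & 1 else 0)
    return w


# The 32 x 64 matrix over GF(2), one 32-bit integer per column.
COLUMNS = [C(1 << j) for j in range(64)]


def C_matrix(w: int) -> int:
    """Matrix-vector product over GF(2): XOR the columns selected by the set bits of w."""
    out = 0
    for j in range(64):
        if (w >> j) & 1:
            out ^= COLUMNS[j]
    return out


w = 0x0123456789ABCDEF
print(C_matrix(w) == C(w))  # True
```

Multiplication over GF(2) is AND and addition is XOR, so the product reduces to XORing together the columns picked out by the input bits.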
