Parallel White Noise Generation on a GPU via Cryptographic Hash Stanley Tzeng Li-Yi Wei Microsoft Research Asia

Dec 19, 2015

Page 1: Parallel White Noise Generation on a GPU via Cryptographic Hash

Stanley Tzeng Li-Yi Wei

Microsoft Research Asia

Page 2: What is White Noise?

Spatial domain: uniform random number

Frequency domain: white noise

[Figure: the same noise shown in the spatial domain and in the frequency domain]

Page 3: Importance

Mother of all random numbers

Commonly used, e.g. rand() in C/C++

Major algorithms sequential

e.g. x_n = (a x_{n-1} + b) mod c

Processors are becoming parallel

GPU, multi-core CPU, Cell

sequential algorithms cannot leverage that
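To make the sequential dependency concrete, here is a minimal Python sketch of a generator of the form above. The constants A, B, C are illustrative (the ones commonly quoted from Numerical Recipes) and the function names are hypothetical; none of them come from the slides.

```python
# Illustrative constants for x_n = (A * x_{n-1} + B) mod C; not from the slides.
A, B, C = 1664525, 1013904223, 2**32

def lcg_next(x):
    """One step of the recurrence."""
    return (A * x + B) % C

def lcg_sample(seed, n):
    """Return x_n. Each x_k needs x_{k-1}, so reaching x_n takes n sequential steps."""
    x = seed
    for _ in range(n):
        x = lcg_next(x)
    return x

print(lcg_sample(1, 3))
```

Because every sample depends on the previous one, the loop in `lcg_sample` cannot be split across independent threads without extra work.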

Page 4: Contribution

Parallel algorithm for white noise

independent evaluation for every sample

easy implementation as a GPU pixel shader

speed faster than sequential algorithms

quality same or better

usage similar to texture mapping

Page 5: PRNG (Pseudo Random Number Generator)

The main source of randomness in programs

Desirable properties

white noise statistics

repeatable

fast computation

low memory usage

Page 6: Core Idea

1. input trivially prepared in parallel, e.g. linear ramp

2. feed input value into hash, independently and in parallel

3. output white noise

key idea:

borrow cryptographic hash!

[Diagram: input → hash → output]
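The three steps can be sketched on the CPU as follows. This is only an illustration: Python's hashlib.md5 stands in for the hash (MD5 is the hash the slides pick later), and the helper names noise_words and noise_unit are hypothetical.

```python
import hashlib
import struct

def noise_words(index):
    """Hash one 32-bit input value into four 32-bit pseudo-random words."""
    key = struct.pack("<I", index)        # step 1: trivially prepared input
    digest = hashlib.md5(key).digest()    # step 2: hash, independent of other samples
    return struct.unpack("<4I", digest)   # step 3: 128 bits of output

def noise_unit(index):
    """Map the first output word to a float in [0, 1)."""
    return noise_words(index)[0] / 2**32

ramp = range(8)                           # linear ramp 0, 1, 2, ...
samples = [noise_unit(i) for i in ramp]
print(samples)
```

No sample reads any other sample, so on a GPU each pixel could evaluate its own hash independently.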

Page 7: Hash

(however nice) input → (unrecognizable) mess

Page 8: Cryptographic Hash

A subclass of hash

Commonly used for security applications

e.g. password, digital signature

Properties

irreversible – cannot find input from hash output

decorrelating – similar inputs, dissimilar outputs

uniform probability – all outputs likely to occur

Page 9: Cryptographic Hash - Example

irreversible, decorrelating, uniform probability

CHash ("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6

CHash ("The quick brown fox jumps over the lazy eog") = ffd93f16876049265fbaef4da268dd0e
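The example can be reproduced with a few lines of Python, assuming CHash here is MD5 (the first digest above is MD5's well-known output for that sentence). The helper names chash and differing_bits are hypothetical.

```python
import hashlib

def chash(text):
    """MD5 digest of an ASCII string, as a hex string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()

def differing_bits(hex_a, hex_b):
    """Number of bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

a = chash("The quick brown fox jumps over the lazy dog")
b = chash("The quick brown fox jumps over the lazy eog")
print(a)
print(b)
print(differing_bits(a, b), "of 128 bits differ")
```

Changing a single input character flips a large share of the output bits, which is the decorrelating property the slide refers to.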

Page 10: Cryptographic Hash as a PRNG

White noise statistics

CHash is cryptographically secure

Repeatable

CHash is invariant with same input

Fast computation

CHash is parallel + constant cost

Low memory usage

CHash maintains no state

Order-independent, i.e. randomly accessible

important for parallel GPU applications

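A small sketch of what "no state" and "order-independent" mean in practice. It assumes the sample is keyed by pixel coordinates and uses hashlib.md5 as a CPU stand-in; the function name sample and the packing of (x, y) are illustrative choices, not the authors' layout.

```python
import hashlib
import random
import struct

def sample(x, y):
    """Stateless noise value for pixel (x, y): depends only on the coordinates."""
    digest = hashlib.md5(struct.pack("<II", x, y)).digest()
    return struct.unpack("<I", digest[:4])[0]

coords = [(x, y) for y in range(4) for x in range(4)]
in_order = {c: sample(*c) for c in coords}

shuffled = coords[:]
random.shuffle(shuffled)                  # evaluate in an arbitrary order
out_of_order = {c: sample(*c) for c in shuffled}

print(in_order == out_of_order)
```

Since nothing is carried from one call to the next, a far-away coordinate costs the same as a nearby one.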

Page 11: Which Cryptographic Hash?

Many options

MD5, SHA, RIPEMD, Tiger, block cipher, etc

Desirable properties

white noise quality

fast computation

power-of-2 aligned (output & operations)

pure pixel shader, no state maintenance

Page 12: Our Hash of Choice: MD5 [Rivest 1992]

128-bit outputs and 32-bit operation

Small number of constants fit entirely in shader

Fastest among those satisfying quality criteria

Not 100% secure [Wang and Yu 2005]

but good enough for our goal

Page 13: MD5 Algorithm Overview

[Diagram: Input → Scrambling (bit op, table, arithmetic) → Output, with a shift table, a sin table, and 64 rounds]

Page 14: Performance Bottlenecks for Pixel Shader

[Diagram: Input → Scrambling (bit op, table, arithmetic) → Output, with a shift table, a sin table, and 64 rounds]

Page 15: Our Optimization

[Diagram: Input → Scrambling (bit op, table, arithmetic) → Output, with a shift table, a sin table, and 64 rounds]

sin function

reduced shift table

loop unrolling
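One plausible reading of "sin function" and "reduced shift table", sketched in Python. Standard MD5 defines its 64 additive constants as floor(2^32 * |sin(i + 1)|) and uses shift amounts that repeat every four steps within each group of 16. How the authors evaluate sin or lay out the table in the shader is not stated, so the names and layout below are assumptions.

```python
import math

def sin_constant(i):
    """MD5 additive constant for step i, computed instead of read from a 64-entry table."""
    return int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF

# The 64 shift amounts repeat with period 4 inside each group of 16 steps,
# so 16 stored values are enough.
REDUCED_SHIFTS = (
    (7, 12, 17, 22),
    (5, 9, 14, 20),
    (4, 11, 16, 23),
    (6, 10, 15, 21),
)

def shift_amount(i):
    """Shift for step i, looked up in the 4 x 4 table."""
    return REDUCED_SHIFTS[i // 16][i % 4]

print(hex(sin_constant(0)), shift_amount(0), shift_amount(63))
```

Loop unrolling has no direct Python analogue; in a shader it would mean writing out the 64 steps so that `i // 16` and `i % 4` become compile-time constants.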

Page 16: Previous PRNG

GPU

BBS [Blum et al. 1986, Olano 2005]

O extremely fast

X not good quality

CEICG [Entacher et al. 1998, Sussman et al. 2006]

O decent quality

X processing time varies

AES [NIST 2001, Yamanouchi 2007]

O invertible (not hash)

X not good quality

CPU

rand

O commonly used

X not good quality

drand48

O better quality

X slower

Mersenne Twister [Matsumoto and Nishimura 1998]

O high quality and fast

X not random accessible

Page 17: Assessing Quality: DIEHARD [Marsaglia 1995]

De facto standard on measuring PRNG quality

Runs 15 different tests on the bits generated

Outputs a p-value for each test. A test fails if p == 0 or p == 1.

BIRTHDAY SPACINGS TEST, M= 512 N=2**24 LAMBDA= 2.0000
Results for aes.bin
For a sample of size 500, aes.bin using bits 1 to 24: mean 2.036

duplicate spacings    number observed    number expected
0                     66.                67.668
1                     130.               135.335
2                     148.               135.335
3                     80.                90.224
4                     44.                45.112
5                     20.                18.045
6 to INF              12.                8.282

Chisquare with 6 d.o.f. = 4.50 p-value= .391147
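The numbers in this excerpt can be checked by hand. The sketch below assumes the expected counts are Poisson(LAMBDA) probabilities times the sample size and that the reported p-value is the chi-square CDF at the statistic; both assumptions reproduce the figures shown. Variable and function names are illustrative.

```python
import math

observed = [66, 130, 148, 80, 44, 20, 12]   # counts for 0..5 and "6 to INF"
sample_size = 500
lam = 2.0

# Expected counts: Poisson(lam) probabilities times the sample size;
# the last bin collects everything from 6 upward.
poisson = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(6)]
expected = [sample_size * p for p in poisson]
expected.append(sample_size - sum(expected))

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_cdf_even_dof(x, dof):
    """Chi-square CDF; this closed form is valid only for even degrees of freedom."""
    half = x / 2.0
    terms = sum(half**j / math.factorial(j) for j in range(dof // 2))
    return 1.0 - math.exp(-half) * terms

p_value = chi2_cdf_even_dof(chi_square, 6)
print(chi_square, p_value)
```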

Page 18: Cumulative Distribution Function

Shows how data is distributed within set

Given x in data, what % of data values are ≤ x

[Figure: CDF plots for a normal distribution and a uniform distribution; horizontal axis X from 0 to 1, vertical axis from 0 % to 100 %]

Page 19: Kolmogorov-Smirnov Test

Determines how two sets of data are alike

Looks at max difference D between distribution functions

[Figure: two CDF plots (X from 0 to 1, 0 % to 100 %), one labeled "not alike" and one labeled "alike", each with the maximum difference D marked]
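A minimal sketch of the D statistic for the one-sample case, comparing a set of values in [0, 1] against the uniform distribution. The function name ks_d_uniform is hypothetical, and this is not DIEHARD's own code.

```python
def ks_d_uniform(samples):
    """Max distance between the empirical CDF of samples and the uniform CDF on [0, 1]."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical CDF jumps from (i - 1) / n to i / n at x; the uniform CDF there is x.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d

print(ks_d_uniform([0.25, 0.5, 0.75]))      # evenly spread values
print(ks_d_uniform([0.0, 0.0, 0.0, 0.0]))   # all mass at one point
```

Evenly spread values give a small D, while values piled up at one point give the largest possible D.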

Page 20: Assessing Quality: DIEHARD

Run the p-values output by the DIEHARD tests through a KS test and look at the D-value.

[Figure: cumulative distribution function plot (0 to 100) showing the p-value curve against the uniform distribution curve, with the D-value marked]

Smaller D is better quality!

Page 21: Assessing Quality: Power Spectrum

Radial mean: should be uniform

Radial variance: should be low & uniform

[Figure panels: power spectrum density, radial mean, radial variance (anisotropy)]
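A rough sketch of how radial mean and radial variance could be computed from a small image. It uses a naive DFT from the standard library so it stays self-contained. The binning by integer radius, the lack of normalization, and all function names are assumptions, not the authors' procedure.

```python
import cmath
import math

def dft(values):
    """Naive O(n^2) discrete Fourier transform, fine for tiny inputs."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * u * k / n) for k, v in enumerate(values))
            for u in range(n)]

def power_spectrum(image):
    """Squared magnitude of the 2D DFT of a square image (list of rows)."""
    n = len(image)
    rows = [dft(row) for row in image]
    cols = [dft([rows[y][x] for y in range(n)]) for x in range(n)]
    return [[abs(cols[x][v]) ** 2 for x in range(n)] for v in range(n)]

def radial_stats(power):
    """Mean and variance of the power over rings of equal integer frequency radius."""
    n = len(power)
    rings = {}
    for v in range(n):
        for u in range(n):
            fu = u if u <= n // 2 else u - n   # signed frequency
            fv = v if v <= n // 2 else v - n
            r = round(math.hypot(fu, fv))
            if 0 < r <= n // 2:                # skip DC and the corners
                rings.setdefault(r, []).append(power[v][u])
    stats = {}
    for r, vals in sorted(rings.items()):
        mean = sum(vals) / len(vals)
        var = sum((p - mean) ** 2 for p in vals) / len(vals)
        stats[r] = (mean, var)
    return stats

# A single impulse has a perfectly flat spectrum, which makes a handy sanity check.
impulse = [[1.0 if (x, y) == (0, 0) else 0.0 for x in range(8)] for y in range(8)]
stats = radial_stats(power_spectrum(impulse))
for radius, (mean, var) in stats.items():
    print(radius, mean, var)
```

For a noise image, a good generator should give a radial mean that stays roughly constant across radii and a radial variance that stays low.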

Page 22: Assessing Speed: Batch Rendering

Clock time to generate random bits

n^2 x 128 bits image, n = 512, 1024, 2048 and 4096

[Figure: the rendered image, with two labels reading n^2]

Page 23: Assessing Speed: Texture Subset (for random accessibility)

A huge virtual texture

clock time for accessing subsets A and B

measure difference

(smaller is better)

[Figure: a 2^20 x 2^20 virtual texture with two subsets marked A and B]

Page 24: Test Results: DIEHARD Results

[Figure: two bar charts over MD5GPU, GPU CEICG, GPU BBS, GPU AES, rand, drand48 and M. Twister. One shows DIEHARD tests passed (axis 0 to 15, the higher the better); the other shows the DIEHARD test D-value (axis 0 to 1, the lower the better).]

Page 25: Test Results: Power Spectrum Tests

[Figure: power spectrum results for MD5, M. Twister and GPU BBS]

Page 26: Test Results: Batch Render Speed

[Figure: bar chart of fps (axis 0 to 60) for MD5CPU, MD5GPUref, MD5GPUopt, GPU CEICG, GPU BBS, GPU AES, rand, drand48 and M. Twister, with one series each for 512, 1024, 2048 and 4096]

Page 27: Test Results: Texture Subset Speed

[Figure: bar chart "Texture Subset Difference" in ms on a logarithmic axis (1 to 1000000) for MD5CPU, MD5GPUref, MD5GPUopt, GPU CEICG, GPU BBS, GPU AES, rand, drand48 and M. Twister. Data labels as extracted: 3.1, 0, 0, 4.8, 0, 0, 362776257001.9 (the last label appears to be several values run together).]

Page 28: Trading Quality for Speed

Reducing # of rounds

O faster speed

X lower quality

Rounds    Time (ms)    DIEHARD tests passed    KS D-Val
64        6.3          15/15                   0.2029
48        4.7          14/15                   0.2042
32        3.1          13/15                   0.2295
16        1.6          13/15                   0.253
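A sketch of what a round-count knob might look like, written as a single-block MD5 in Python. How the authors shorten the computation is not stated; this version simply stops the 64-step loop early, which is an assumption, and the name md5_single_block is hypothetical. With rounds=64 it matches hashlib's MD5 for short keys.

```python
import hashlib
import math
import struct

SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
SINES = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
MASK = 0xFFFFFFFF

def rotl(x, s):
    """Rotate a 32-bit value left by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK

def md5_single_block(key, rounds=64):
    """MD5 of a key of at most 55 bytes, stopping after `rounds` of the 64 steps."""
    assert len(key) <= 55 and 0 <= rounds <= 64
    # Standard MD5 padding for a message that fits in one 64-byte block.
    block = key + b"\x80" + b"\x00" * (55 - len(key)) + struct.pack("<Q", 8 * len(key))
    m = struct.unpack("<16I", block)
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    a, b, c, d = a0, b0, c0, d0
    for i in range(rounds):
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | (~d & MASK)), (7 * i) % 16
        f = (f + a + SINES[i] + m[g]) & MASK
        a, d, c, b = d, c, b, (b + rotl(f, SHIFTS[i])) & MASK
    words = [(s + t) & MASK for s, t in zip((a0, b0, c0, d0), (a, b, c, d))]
    return struct.pack("<4I", *words)

# Cross-check the full 64-step version against the library implementation.
print(md5_single_block(b"abc") == hashlib.md5(b"abc").digest())
print(md5_single_block(b"abc", rounds=16).hex())
```

The cost of the loop scales with `rounds`, which is consistent with the roughly proportional times in the table; the quality figures there come from the authors' GPU tests and are not reproduced by this sketch.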

Page 29: Applications

Fractal terrain

(vertex shader)

Texture tiling

(fragment shader)

Page 30: Future Work

Implement our method in hardware

very similar to texture unit but much smaller

(no need for cache)

Alternative hashes

ride with advances in cryptographic hash

Page 31: Thank You!