CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected] http://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474 CSE474: Simulation and Modeling Generating Random Numbers Mushfiqur Rouf [email protected]
Feb 25, 2016
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
CSE474: Simulation and Modeling
Generating Random Numbers
Mushfiqur [email protected]
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Initial trials• Throwing dice• Dealing out cards• Drawing numbered balls• Counting gamma rays• Using digits in an expansion of p• Picking numbers randomly from a phonebook• Using randomly pulsating vacuum tubes
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Importance of U(0,1)• Easy generation• Generation of variates from all other
distributions• Realization of random processes
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Arithmetic Generators• Generated series is not random
– Given the seed, the whole series is known.• Call it ‘pseudorandom’• Benefits
– Has low memory requirement– Fast
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Good generator• Produce numbers that appear to be
– Distributed uniformly– Uncorrelated
• Fast and memory efficient• Reproduce
– Debugging– Precise comparison of different model simulation
results• Produce separate streams of numbers
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Midsquare Method• Start with a four digit
number• Square and take the
middle four digits• Apparently scrambles
well– But Degenerates to zero
and stays there• Doing something strange
may not be useful
Zi Ui Zi 20 7182 - 515811241 5811 0.5811 337677212 7677 0.7677 589363293 9363 0.9363 876657694 6657 0.6657 443156495 3156 0.3156 99603366 9603 0.9603 922176097 2176 0.2176 47349768 7349 0.7349 540078019 78 0.0078 6084
10 60 0.0060 360011 36 0.0036 129612 12 0.0012 14413 1 0.0001 114 0 0.0000 0
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Linear Congruantial Generators
• Most common, most understood• Recursive formula
Zi = (aZi-1 + c) (mod m)
Ui = Zi/m
Z0 is the seedmultiplierincrementmodulus
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Objection 1: Pseudorandomness
maacZaZ
ii
i mod11
0
• All Zis are completely determined
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Objection 2: Limited Values• Can take only the rational values
– 0, 1/m, 2/m, … , (m-1)/m
• P(Ui<1/m) = 0 – but should be > 0– Solution: Use large m.
• Might take only a fraction of these values.– Depends on a-c-m combination.
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Objection 3: Looping behaviour
• If Z100 = Z50 then Z101 = Z51
– So a cycle starts.• Cycle length is called ‘period’• There are only m possible values.
– Max possible period is m.– If an LCG with “full period” has period = m
• Period depends on a-c-m
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
LCGFull period
– Iff all of the following hold:• m, c are relatively prime• If prime p divides m, then p divides a-1• If 4 divides m, then 4 divides a-1
• Multiple Streams Support– If streams of size 100000 needed,
• Use seeds Z0, Z100000, Z200000, …
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Mixed LCG• c > 0• To avoid explicit division, use m = 2b and
utilize integer overflow• Optimized ‘a’
a = 2l + 1– Then the multiplication is just shift and add:
Zi = 2l x Zi–1 + Zi–1
– But has poor statistical properties
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Multiplicative LCG• c = 0• Cannot have full period since m, c cannot be
relatively prime.– Period m – 1 possible
• To avoid explicit division: m = 2b
– Period is at most 2b–2
– Then we don’t know how uniform the numbers are– NOT a good idea
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Multiplicative LCG• “Prime modulus multiplicative LCG”
PMMLCG– Use largest prime < 2b
• Example: 231–1
– a is ‘primitive element modulo m’• am–1 – 1 is divisible by m• No smaller power of a has this property
– Period = m – 1
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Generalization of LCGZi = g(Zi-1, Zi-2, …) (mod m)
• Quadratic Congruential Generatorg = a’Zi-1
2 + aZi-12 + c
• a’ = a = 1, c = 0, m is a mercenary• Multiple Recursive Generator
g(Zi-1, Zi-2, …) = a1Zi-1 + a2Zi-2 + … + aqZi-q – Huge periods ~ mq – 1
• Fibonacci GeneratorZi = (Zi-1 + Zi-2) (mod m)
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Composite Generators• Combine two or more separate generators• Longer period, better statistical behavior• Shuffling
– Use a second LCG to shuffle output – Shuffling bad LCG by another bad LCG
may produce better results. – Little to gain by shuffling a good LCG
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Empirical Tests• Uniformity Test
fj = Number of Ui s in interval j• For large n it will have approximately a chi-sq
distribution with k–1 degrees of freedom.• Reject null hypothesis “Uis are U(0,1)” if
k
jj k
nfnk
1
22
21,1
2 k
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Empirical Tests• Serial Test• Generalization of chi-square test to higher dimensions• Non overlapping d-tuples
U1 = (U1, U2, …, Ud), U2 = (Ud+1, Ud+2, …, U2d), …should be distributed uniformly in d-dimensional hipercube [0,1]d.
• χ2 will have an approximate chi-square distribution with kd – 1 degrees of freedom.
2
1 1 1,,,
2
1 2
21
k
j
k
j
k
jdjjj
d
d
d knf
nkd
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Empirical Tests• Runs-up test
– Run-up: a “maximal” monotonically increasing subsequence
• ri = number of runs up of length i <= 5• r6 = number of runs up of length >= 6• For large n, R will have approximate chi-square
distribution with 6 df.
6
1
6
1
1i j
jjiiij nbrnbran
R
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Tests• Empirical Tests
– Only local test; for only that segment of the cycle.
– Examines the actual random numbers to be used in simulation
• Theoretical Tests– Global test; for the whole cycle.– Cannot examine a particular segment
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Crystalline structure• Random numbers mainly fall in planes• Overlapping d-tuples
(U1, U2, …, Ud), (U2, U3, …, Ud+1), … fall in a small number of d – 1 dimensional hyper planes passing through the d-dimensional hypercube
• If two such planes are far apart the generator will perform poorly
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Inverse Transform• How to generate X ~ F?• Generate U ~ U(0, 1)1. Return X = F–1(U)
• P(X<=x) = P(F–1(U)<=x) = P(U<=F(x)) = F(x)
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Inverse Transform• Intuitive explanation
F(x)
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Inverse Transform• Discrete Case:1. Generate U ~ U(0, 1)2. Determine the smallest integer I such
that U<=F(xI). Return X = xI
• P(X = xi) = P[F(xi-1) < U <= F(xi)]=F(xi) – F(xi–1) = p(xi)
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Inverse Transform• Can handle mixed
distributions with – Discrete,– Continuous and– Flat components
• X = min{x: F(x) >= U}
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Inverse Transform• Disadvantages
– F–1 might not be found in a closed form• Use numerical methods
– May not be the fastest way• Advantages
– Facilitates variance reduction techniques– Easily generates truncated distributions– Useful for generating i-th order statistics (using
beta distribution)
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Composition• Target distribution is a
‘convex’ combination of other distributions
1. Generate a positive random integer J such that P(J = j) = pj
2. Return X with distribution Fj
• Geometric interpretation1. Select an area2. Select from that distribution
1j
jj xFpxF
xF
pxF
jJPjJxXP
xXP
jjj
j
1
1
|
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Convolution• X = Y1 + Y2 + … + Ym
• Don’t mix up with ‘Composition’
1. Generate Y1, Y2, …, Ym IID 2. Return X = Y1 + Y2 + … + Ym
• P(X<=x) = P(Y1 + Y2 + … + Ym <= x) = F(x)
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Acceptance – Rejection • A less direct approach• Requires a function t
that ‘majorizes’ f, ie,t(x) >= f(x) for all x
• Define r(x)=t(x)/c, a density function.
1
dxxfdxxtc
f(x)
t(x)
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Acceptance – Rejection 1. Generate Y having
density r2. Generate U~U(0, 1),
independent of Y3. If U <= f(Y)/t(Y),
return X = YElse goto step 1
f(x)
t(x)
Intuitively, how does it work?
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Acceptance – Rejection • Choice of t
1. Calculations must be very efficient
2. Probability of acceptance in step 3 is 1/c. So We need t with small c.
• Conflicting– Tighter functions are
more complicated.
f(x)
t(x)
CSE 474 Simulation Modeling | MUSHFIQUR ROUF [email protected]://groups.google.com/group/cse474spring07/ http://faculty.bu.ac.bd/~rouf/cse474
Acceptance – Rejection
f(x)
t(x)
f(x)
t(x)