Page 1: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

Photo-realistic Rendering and Global Illumination in Computer Graphics

Spring 2012

Monte Carlo Method

K. H. Ko

School of Mechatronics, Gwangju Institute of Science and Technology

Page 2: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

2

Brief History Comte de Buffon in 1677

He conducted an experiment in which a needle of length L was thrown at random on a horizontal plane with lines drawn at a distance d apart (d > L).

He repeated the experiment many times to estimate the probability P that the needle would intersect one of these lines.

Laplace suggested that this technique of repeated experimentation could be used to compute an estimated value of pi.

The term “Monte Carlo” was coined in the 1940s, at the advent of electronic computing, to describe mathematical techniques that use statistical sampling to simulate phenomena or evaluate values of functions. These techniques were originally devised to simulate neutron transport by a group of scientists working on nuclear weapons.
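As a hedged illustration of Laplace's suggestion (the needle length, line spacing, and throw count below are arbitrary choices, not values from the slides), a Buffon-style simulation might look like this in Python:

```python
import math
import random

def estimate_pi_buffon(num_throws, needle_len=1.0, line_spacing=2.0):
    """Estimate pi with Buffon's needle (needle length L < line spacing d).

    A throw is described by the distance of the needle's center to the nearest
    line and by the needle's angle; the needle crosses a line when the projected
    half-length reaches that distance. The crossing probability is
    P = 2L / (pi * d), so pi can be estimated as 2L / (P * d).
    """
    crossings = 0
    for _ in range(num_throws):
        center = random.uniform(0.0, line_spacing / 2.0)   # distance to nearest line
        angle = random.uniform(0.0, math.pi / 2.0)
        if center <= (needle_len / 2.0) * math.sin(angle):
            crossings += 1
    p = crossings / num_throws
    return 2.0 * needle_len / (p * line_spacing)

print(estimate_pi_buffon(1_000_000))  # roughly 3.14
```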

Page 3: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

3

Why are Monte Carlo Techniques Useful? Overall steps of Monte Carlo Techniques

Consider the problem of computing the integral of a function with respect to an appropriately defined measure over a domain.

The Monte Carlo approach would be to define a random variable such that the expected value of that random variable would be the solution to the problem.

Samples of this random variable are then drawn and averaged to compute an estimate of the expected value of the random variable.

This estimated expected value is an approximation to the solution of the given problem.

Page 4: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

4

Why are Monte Carlo Techniques Useful? Advantages

Conceptual simplicity: given an appropriate random variable, the computation consists of sampling the random variable and averaging the estimates obtained from the samples.

Can be applied to a wide range of problems:

Problems that are stochastic in nature, such as transport problems in nuclear physics.

Problems that require higher-dimensional integration of complicated functions; often Monte Carlo techniques are the only feasible solution.

Page 5: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

5

Why are Monte Carlo Techniques Useful? Disadvantage

Relatively slow convergence rate: the error decreases as 1/sqrt(N), where N is the number of samples.

Several variance reduction techniques to accelerate the convergence have been proposed.

However, Monte Carlo techniques are typically not used unless there are no viable alternatives.

BUT!!! There are problems for which Monte Carlo methods are the only feasible solution technique: higher-dimensional integrals and integrals with nonsmooth integrands.

Page 6: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

6

Review of Probability Theory

A Monte Carlo process is a sequence of random events. A numerical outcome can be associated with each possible event. When a fair die is thrown, the outcome could be any value from 1 to 6.

A random variable describes the possible outcomes of an experiment.

Page 7: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

7

Review of Probability Theory Discrete Random Variables

When a random variable can take a finite number of possible values, it is called a discrete random variable.

A probability pi can be associated with any event with outcome xi.

A random variable x_die might be said to have a value of 1 to 6 associated with each of the possible outcomes of the throw of the die.

The probability pi associated with each outcome for a fair die is 1/6.

Page 8: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

8

Review of Probability Theory Discrete Random Variables

Some properties of the probabilities p_i are:

The probability of an event lies between 0 and 1: 0 ≤ p_i ≤ 1. If an outcome never occurs, its probability is 0; if an event always occurs, its probability is 1.

The probability that either of two events occurs satisfies Pr(Event1 or Event2) ≤ Pr(Event1) + Pr(Event2).

Two events are mutually exclusive if and only if the occurrence of one of the events implies the other event cannot possibly occur. For mutually exclusive events, Pr(Event1 or Event2) = Pr(Event1) + Pr(Event2).

A set of all the possible events/outcomes of an experiment such that the events are mutually exclusive and collectively exhaustive satisfies the normalization property: $\sum_i p_i = 1$.

Page 9: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

9

Review of Probability Theory Expected Value

For a discrete random variable with n possible outcomes, the expected value, or mean of the random variable is

$$E[x] = \sum_{i=1}^{n} p_i x_i$$
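For instance, for the fair die discussed earlier, each p_i = 1/6, so

$$E[x_{\text{die}}] = \sum_{i=1}^{6} \frac{1}{6}\, i = \frac{21}{6} = 3.5$$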

Page 10: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

10

Review of Probability Theory Variance and Standard Deviation

The variance is a measure of the deviation of the outcomes from the expected value of the random variable.

The standard deviation is the square root of the variance.

$$\sigma^2 = E\big[(x - E[x])^2\big] = \sum_i p_i \big(x_i - E[x]\big)^2$$

or, equivalently,

$$\sigma^2 = E[x^2] - (E[x])^2 = \sum_i p_i x_i^2 - \Big(\sum_i p_i x_i\Big)^2$$

Page 11: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

11

Review of Probability Theory Functions of Random Variables

Consider a function f(x), where x takes values xi with probabilities pi.

x is a random variable. f(x) is also a random variable whose expected value or mean is defined as

$$E[f(x)] = \sum_{i=1}^{n} p_i f(x_i)$$

Page 12: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

12

Review of Probability Theory Functions of Random Variables

The variance of the function f(x) is defined similarly as

$$\sigma^2 = E\big[(f(x) - E[f(x)])^2\big]$$

Page 13: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

13

Review of Probability Theory Continuous Random Variables

Probability Distribution Function: For a real-valued (continuous) random variable x, a probability density function (PDF) p(x) is defined such that the probability that the variable takes a value in the interval [x, x+dx] equals p(x)dx.

Cumulative Distribution Function (CDF): It provides a more intuitive definition of probabilities for continuous variables.

$$P(y) = \Pr(x \le y) = \int_{-\infty}^{y} p(x)\, dx$$

Page 14: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

14

Review of Probability Theory Continuous Random Variables

The CDF gives the probability with which an event occurs with an outcome whose value is less than or equal to the value y.

The CDF P(y) is a nondecreasing function. The CDF P(y) is non-negative over the domain

of the random variable.

Page 15: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

15

Review of Probability Theory The PDF p(x) has the following properties:
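Presumably these are the usual conditions that a density must satisfy, namely non-negativity, normalization, and the interval probability rule:

$$p(x) \ge 0, \qquad \int p(x)\, dx = 1, \qquad \Pr(a \le x \le b) = \int_a^b p(x)\, dx$$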

Page 16: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

16

Review of Probability Theory Expected Value

Similar to the discrete-valued case, the expected value of a continuous random variable x is given as:

$$E[x] = \int x\, p(x)\, dx$$

Consider some function f(x), where p(x) is the probability distribution function of the random variable x. Since f(x) is also a random variable, its expected value is:

$$E[f(x)] = \int f(x)\, p(x)\, dx$$

Page 17: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

17

Review of Probability Theory Variance and Standard Deviation

$$\sigma^2 = E\big[(x - E[x])^2\big] = \int (x - E[x])^2\, p(x)\, dx$$

$$\sigma^2 = E[x^2] - (E[x])^2 = \int x^2\, p(x)\, dx - \Big(\int x\, p(x)\, dx\Big)^2$$

Page 18: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

18

Review of Probability Theory Conditional and Marginal Probabilities

Consider a pair of random variables x and y. For discrete random variables, p_ij specifies the probability that x takes a value of x_i and y takes a value of y_j.

Similarly, a joint probability distribution function p(x, y) is defined for continuous random variables.

Page 19: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

19

Review of Probability Theory Conditional and Marginal Probabilities

The marginal density function of x is defined as

$$p(x) = \int p(x, y)\, dy$$

The conditional density function p(y|x) is the probability density of y given some x:

$$p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x, y)}{\int p(x, y)\, dy}$$

Page 20: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

20

Review of Probability Theory Conditional and Marginal Probabilities

The conditional expectation of a random function g(x, y) is computed as:

$$E[g \mid x] = \int g(x, y)\, p(y \mid x)\, dy = \frac{\int g(x, y)\, p(x, y)\, dy}{\int p(x, y)\, dy}$$

Page 21: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

21

Monte Carlo Integration

Assume that we have some function f(x) defined over the domain x ∈ [a, b]. We would like to evaluate the integral

$$I = \int_a^b f(x)\, dx$$

For one-dimensional integration, Monte Carlo is typically not used.

Page 22: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

22

Monte Carlo Integration Weighted Sum of Random Variables

Consider a function G that is the weighted sum of N random variables g(x_1), …, g(x_N). Each of the x_i has the same probability distribution function p(x); the x_i are independent identically distributed variables. Let g_j denote the function g(x_j):

$$G = \sum_{j=1}^{N} w_j g_j$$

Page 23: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

23

Monte Carlo Integration Weighted Sum of Random Variables

The linearity property holds:

$$E[G] = \sum_{j=1}^{N} w_j\, E[g(x_j)]$$

Consider the case where the weights w_j are the same and all add up to 1. When N functions are added together, w_j = 1/N.

Page 24: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

24

Monte Carlo Integration Weighted Sum of Random Variables

The expected value of G is

$$E[G] = \sum_{j=1}^{N} \frac{1}{N}\, E[g(x_j)] = E[g(x)]$$

The expected value of G is the same as the expected value of g(x).

G can be used to estimate the expected value of g(x).

G is called an estimator of the expected value of the function g(x).

Page 25: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

25

Monte Carlo Integration Weighted Sum of Random Variables

The variance of G is

$$\sigma^2[G] = \sigma^2\!\left[\sum_{i=1}^{N} \frac{g(x_i)}{N}\right]$$

Variance, in general, satisfies the following equation, with the covariance Cov[x, y] given as

$$\sigma^2[x + y] = \sigma^2[x] + \sigma^2[y] + 2\,\mathrm{Cov}[x, y]$$

$$\mathrm{Cov}[x, y] = E[xy] - E[x]\,E[y]$$

For independent random variables, Cov[x, y] = 0.

Page 26: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

26

Monte Carlo Integration Weighted Sum of Random Variables

The following property holds for any constant a:

$$\sigma^2[a x] = a^2\, \sigma^2[x]$$

Using the fact that the x_i in G are independent identically distributed variables, the variance of G is

$$\sigma^2[G] = \sum_{i=1}^{N} \sigma^2\!\left[\frac{g(x_i)}{N}\right]$$

Page 27: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

27

Monte Carlo Integration Weighted Sum of Random Variables

So,

$$\sigma^2[G] = \frac{1}{N}\, \sigma^2[g(x)]$$

Page 28: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

28

Monte Carlo Integration Weighted Sum of Random Variables

As N increases, the variance of G decreases with N, making G an increasingly good estimator of E[g(x)].

The standard deviation σ decreases as 1/sqrt(N).

Page 29: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

29

Monte Carlo Integration Estimator

The Monte Carlo approach to computing the integral is to consider N samples to estimate the value of the integral.

The samples are selected randomly over the domain of the integral with probability distribution function p(x).

The estimator is denoted as <I> and is

$$\langle I \rangle = \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}$$

Page 30: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

30

Monte Carlo Integration Estimator

The expected value of the estimator is computed as follows:

$$E[\langle I \rangle] = \frac{1}{N} \sum_{i=1}^{N} E\!\left[\frac{f(x_i)}{p(x_i)}\right] = \frac{1}{N} \sum_{i=1}^{N} \int \frac{f(x)}{p(x)}\, p(x)\, dx = \int f(x)\, dx = I$$

Page 31: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

31

Monte Carlo Integration Estimator

The variance of this estimator is

$$\sigma^2 = \frac{1}{N} \int \left(\frac{f(x)}{p(x)} - I\right)^2 p(x)\, dx$$

As N increases, the variance decreases linearly with N. The error in the estimator is proportional to the standard deviation σ; the standard deviation decreases as 1/sqrt(N).

One problem with Monte Carlo is the slow convergence of the estimator to the right solution: four times more samples are required to decrease the error of the Monte Carlo computation by half.

Page 32: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

32

Monte Carlo Integration
Example of Simple Monte Carlo Integration
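As a minimal sketch of such an example (the integrand sin x over [0, π] and the sample count are illustrative choices, not taken from the slides), uniform sampling gives p(x) = 1/(b - a):

```python
import math
import random

def mc_integrate(f, a, b, n):
    """Estimate the integral of f over [a, b] with n uniform samples.

    Uniform sampling means p(x) = 1/(b - a), so each sample contributes
    f(x_i) / p(x_i) = (b - a) * f(x_i) to the average.
    """
    total = 0.0
    for _ in range(n):
        x = random.uniform(a, b)
        total += f(x)
    return (b - a) * total / n

estimate = mc_integrate(math.sin, 0.0, math.pi, 10_000)
print(estimate)  # should be close to 2, with error on the order of 1/sqrt(N)
```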

Page 33: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

33

Monte Carlo Integration Bias

When the expected value of the estimator is exactly the value of the integral I, the estimator is said to be unbiased.

An estimator that does not satisfy this property is said to be biased.

The difference between the expected value of the estimator and the actual value of the integral is called bias: B[<I>] = E[<I>] – I.

The total error on the estimate is typically represented as the sum of the standard deviation and the bias.

Page 34: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

34

Monte Carlo Integration Bias

A biased estimator is called consistent if the bias vanishes as the number of samples increases. limN->∞ B[<I>] = 0.

Page 35: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

35

Monte Carlo Integration Accuracy

There exist two theorems which explain how the error of the Monte Carlo estimator reduces as the number of samples increases.

These error bounds are probabilistic in nature. Chebyshev’s Inequality

The probability that a sample deviates from the solution by a value greater than sqrt(σ²/δ), where δ is an arbitrary positive number, is smaller than δ.
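One common way to write this (here ⟨I⟩ is the estimator, E[⟨I⟩] its mean, and σ² its variance) is:

$$\Pr\!\left[\,\big|\langle I \rangle - E[\langle I \rangle]\big| \ge \sqrt{\frac{\sigma^2}{\delta}}\,\right] \le \delta$$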

Page 36: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

36

Monte Carlo Integration Accuracy

Assuming an estimator that averages N samples of a random variable with well-defined variance σ², the variance of the estimator is σ²/N.

Page 37: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

37

Monte Carlo Integration Accuracy

The Central Limit Theorem gives an even stronger statement about the accuracy of the estimator.

As N->∞, the Central Limit Theorem states that the values of the estimator have a normal distribution.

Therefore, as N->∞, the computed estimate lies in a narrower region around the expected value of the integral with higher probability.

It only applies when N is large enough. How large N should be is not clear.

Page 38: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

38

Monte Carlo Integration Estimating the Variance

The variance for the Monte Carlo estimator is

Page 39: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

39

Monte Carlo Integration Deterministic Quadrature versus Monte Carlo

A deterministic quadrature rule to compute a one-dimensional integral could be to compute the sum of the area of regions over the domain.

Extending these deterministic quadrature rules to a d-dimensional integral would require N^d samples.

Page 40: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

40

Monte Carlo Integration
Multidimensional Monte Carlo Integration

The Monte Carlo integration technique can be extended to multiple dimensions in a straightforward manner: the estimator keeps the same form, ⟨I⟩ = (1/N) Σ f(x_i)/p(x_i), with the samples x_i now drawn from the multidimensional domain according to a multidimensional PDF p.

Page 41: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

41

Monte Carlo Integration Multidimensional Monte Carlo Integration

One of the main strengths of Monte Carlo integration is that it can be extended seamlessly to multiple dimensions.

Monte Carlo techniques permit an arbitrary choice of N, as opposed to the N^d samples required by deterministic quadrature techniques.

Example. Integration over a Hemisphere.
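A small sketch of such a hemisphere integration in Python (the cosine integrand, whose exact hemispherical integral is π, and uniform directional sampling with p(ω) = 1/(2π) are illustrative assumptions, not details from the slides):

```python
import math
import random

def sample_hemisphere_uniform():
    """Uniformly sample a direction on the unit hemisphere around the z-axis.

    With uniform sampling, the PDF with respect to solid angle is p = 1 / (2*pi).
    """
    u1, u2 = random.random(), random.random()
    z = u1                                   # cos(theta) is uniform in [0, 1)
    r = math.sqrt(max(0.0, 1.0 - z * z))
    phi = 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), z)

def integrate_cos_theta(n):
    """Monte Carlo estimate of the integral of cos(theta) over the hemisphere (exact value: pi)."""
    pdf = 1.0 / (2.0 * math.pi)
    total = 0.0
    for _ in range(n):
        _, _, z = sample_hemisphere_uniform()   # z is cos(theta)
        total += z / pdf
    return total / n

print(integrate_cos_theta(100_000))  # should be close to pi
```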

Page 42: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

42

Monte Carlo Integration Sampling Random Variables

The Monte Carlo technique is about computing samples from a probability distribution p(x).

Samples should be found such that the distribution of the samples matches p(x).

Inverse Cumulative Distribution Function
Rejection Sampling
Look-Up Table

Page 43: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

43

Monte Carlo Integration Inverse Cumulative Distribution Function

Discrete Random Variables

Given a set of probabilities p_i, we want to pick x_i with probability p_i. The discrete cumulative distribution function (CDF) corresponding to the p_i is:

$$F_i = \sum_{j=1}^{i} p_j$$

Page 44: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

44

Monte Carlo Integration Inverse Cumulative Distribution Function

Discrete Random Variables The selection of samples is done as follows:

Compute a sample u that is uniformly distributed over the domain [0,1).

Output k that satisfies the property F_{k-1} ≤ u < F_k (with F_0 = 0).

Page 45: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

45

Monte Carlo Integration Inverse Cumulative Distribution Function

Discrete Random Variables

For a uniform PDF, Pr(a ≤ u ≤ b) = b - a. The probability that the value of u lies between F_{k-1} and F_k is F_k - F_{k-1} = p_k. But this is the probability that k is selected. Therefore, k is selected with probability p_k.
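A compact Python sketch of this discrete inverse-CDF selection (the loaded-die probabilities are an arbitrary illustrative choice):

```python
import bisect
import random
from itertools import accumulate

def make_discrete_sampler(probabilities):
    """Return a sampler that picks index k with probability p_k via the discrete CDF."""
    cdf = list(accumulate(probabilities))        # F_1, F_2, ..., F_n
    def sample():
        u = random.random()                      # uniform in [0, 1)
        k = bisect.bisect_right(cdf, u)          # first k with F_k > u, i.e. F_{k-1} <= u < F_k
        return min(k, len(cdf) - 1)              # guard against rounding at the top end
    return sample

# Example: a loaded die that shows face 5 (index 4) about half of the time.
sample = make_discrete_sampler([0.1, 0.1, 0.1, 0.1, 0.5, 0.1])
counts = [0] * 6
for _ in range(10_000):
    counts[sample()] += 1
print(counts)  # index 4 should receive roughly half of the samples
```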

Page 46: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

46

Monte Carlo Integration Inverse Cumulative Distribution Function

Continuous Random Variables

A sample can be generated according to a given distribution p(x) by applying the inverse cumulative distribution function of p(x) to a uniformly generated random variable u over the interval [0, 1). The resulting sample is F^{-1}(u).

This method of sampling requires the ability to compute and analytically invert the cumulative distribution function.
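For a concrete continuous case (the density p(x) = 2x on [0, 1) is purely an illustrative assumption), the CDF is F(x) = x², so F^{-1}(u) = sqrt(u):

```python
import math
import random

def sample_linear_pdf():
    """Sample x in [0, 1) with density p(x) = 2x by inverting its CDF F(x) = x^2."""
    u = random.random()
    return math.sqrt(u)          # F^{-1}(u)

# Sanity check: the mean of the density p(x) = 2x on [0, 1) is 2/3.
n = 100_000
print(sum(sample_linear_pdf() for _ in range(n)) / n)
```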

Page 47: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

47

Monte Carlo Integration Rejection Sampling

It is often not possible to derive an analytical formula for the inverse of the cumulative distribution function.

Rejection sampling is an alternative.

In rejection sampling, samples are tentatively proposed and tested to determine acceptance or rejection of the sample.

This method raises the dimension of the function being sampled by one and then uniformly samples the bounding box that includes the entire PDF.

This sampling technique yields samples with the appropriate distribution.

Page 48: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

48

Monte Carlo Integration Rejection Sampling

For a one-dimensional PDF case:

The maximum value of the PDF over the domain [a, b] to be sampled is M.

Rejection sampling raises the dimension of the function by one and creates a two-dimensional function over [a, b] × [0, M]. This function is then sampled uniformly to compute samples (x, y).

Rejection sampling rejects all samples (x, y) such that p(x) < y. All other samples are accepted.

The distribution of the accepted samples is exactly the PDF p(x) we want to sample.
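A minimal Python sketch of this box-based rejection procedure (the example density p(x) = 2x with bound M = 2 is an illustrative assumption):

```python
import random

def rejection_sample(pdf, a, b, m):
    """Draw one sample from pdf on [a, b] by rejection.

    m must be an upper bound on pdf over [a, b]. Candidate points (x, y) are
    drawn uniformly from the box [a, b] x [0, m]; points under the curve are accepted.
    """
    while True:
        x = random.uniform(a, b)
        y = random.uniform(0.0, m)
        if y <= pdf(x):
            return x

# Example: p(x) = 2x on [0, 1], whose maximum is M = 2.
samples = [rejection_sample(lambda x: 2.0 * x, 0.0, 1.0, 2.0) for _ in range(10_000)]
print(sum(samples) / len(samples))  # mean should be close to 2/3
```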

Page 49: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

49

Monte Carlo Integration Rejection Sampling

For a one-dimensional PDF case.

Page 50: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

50

Monte Carlo Integration Rejection Sampling

For a one-dimensional PDF case:

One criticism of rejection sampling is that rejecting samples could be inefficient.

The efficiency of this technique is proportional to the probability of accepting a proposed sample. This probability is proportional to the ratio of the area under the function to the area of the box. If this ratio is small, a lot of samples are rejected.

Page 51: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

51

Monte Carlo Integration Look-Up Table

It approximates the PDF to be sampled using piecewise linear approximations.

It is not commonly used, though it is very useful when the sampled PDF is obtained from measured data.

Page 52: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

52

Monte Carlo Integration Variance Reduction

Monte Carlo integration techniques can be roughly subdivided into two categories:

Blind Monte Carlo: those that have no information about the function to be integrated

Informed Monte Carlo: those that do have some kind of information

Informed Monte Carlo methods are able to produce more accurate results as compared to blind Monte Carlo methods.

Designing efficient estimators is a major area of research in Monte Carlo literature.

Reduction of variance is a critical one.

Page 53: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

53

Monte Carlo Integration Importance Sampling

It is a technique that uses a nonuniform probability distribution function to generate samples.

The variance of the computation can be reduced by choosing the probability distribution wisely based on information about the function to be integrated.

Given a PDF p(x) defined over the integration domain D, and samples xi, generated according to the PDF, the value of the integral I can be estimated by generating N sample points and computing the weighted mean:

$$\langle I \rangle = \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}$$

Page 54: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

54

Monte Carlo Integration Importance Sampling

The expected value of the estimator

$$\langle I \rangle = \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}$$

is I, so the estimator is unbiased.

Page 55: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

55

Monte Carlo Integration Importance Sampling

To determine if the variance of this estimator is better than an estimator using uniform sampling, the variance is estimated.

Clearly, the choice of p(x) affects the value of the variance.

The difficulty of importance sampling is to choose a p(x) such that the variance is minimized.

A perfect estimator would have the variance be zero.

Page 56: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

56

Monte Carlo Integration Importance Sampling

The optimal p(x) for the perfect estimator can be found by minimizing the equation of the variance using variational techniques and Lagrange multipliers.

We have to find a scalar λ for which the expression L, a function of p(x), reaches a minimum:

$$L(p) = \int \frac{f^2(x)}{p(x)}\, dx + \lambda \int p(x)\, dx$$

Page 57: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

57

Monte Carlo Integration Importance Sampling

The boundary condition is that the integral of p(x) over the integration domain equals 1.

This kind of minimization problem can be solved using the Euler-Lagrange differential equation.

Page 58: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

58

Monte Carlo Integration Importance Sampling

To minimize the function, differentiate L(p) with respect to p(x) and solve for the value of p(x) that makes this quantity zero.

Page 59: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

59

Monte Carlo Integration Importance Sampling

The constant is a scaling factor, such that p(x) can fulfill the boundary condition.

The optimal p(x) is then given by:

$$p(x) = \frac{f(x)}{\int f(x)\, dx}$$

Page 60: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

60

Monte Carlo Integration Importance Sampling

If we use this p(x), the variance will be exactly 0 (assuming f(x) does not change sign).

This optimal p(x) requires us to know the value of the integral of f(x).

This value is what we want to compute!!! Clearly, finding the optimal p(x) is not possible.

A good importance sampling function matches the shape of the original function as closely as possible.
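The following hedged sketch compares uniform sampling with an importance-sampling density that roughly matches the integrand's shape (f(x) = x² on [0, 1] and p(x) = 2x are illustrative choices; the exact integral is 1/3):

```python
import math
import random

def estimate(f, sampler, pdf, n):
    """Importance-sampled estimator <I> = (1/N) * sum f(x_i) / p(x_i)."""
    return sum(f(x) / pdf(x) for x in (sampler() for _ in range(n))) / n

f = lambda x: x * x                             # exact integral over [0, 1] is 1/3

# Uniform sampling: p(x) = 1 on [0, 1).
uniform, uniform_pdf = (lambda: random.random()), (lambda x: 1.0)

# Importance sampling with p(x) = 2x, which follows the shape of f more closely.
shaped, shaped_pdf = (lambda: math.sqrt(random.random())), (lambda x: 2.0 * x)

n = 10_000
print(estimate(f, uniform, uniform_pdf, n))     # ~1/3, higher variance
print(estimate(f, shaped, shaped_pdf, n))       # ~1/3, noticeably lower variance
```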

Page 61: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

61

Monte Carlo Integration Stratified Sampling

One problem with the sampling techniques discussed so far: samples can be badly distributed over the domain of integration, resulting in a poor approximation of the integral.

This clumping of samples can happen irrespective of the PDF used, because the PDF only tells us something about the expected number of samples in parts of the domain.

Increasing the number of samples collected will eventually address this problem of uneven sample distribution.

Stratified sampling is an alternative to increasing the number of samples to avoid the clumping of samples.

Page 62: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

62

Monte Carlo Integration Stratified Sampling

The basic idea is to split the integration domain into m disjoint subdomains (called strata).

Then evaluate the integral in each of the subdomains separately with one or more samples.
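A minimal sketch with equal-size strata and one uniform sample per stratum (the integrand x² over [0, 1] is an illustrative choice):

```python
import random

def stratified_estimate(f, a, b, n):
    """Estimate the integral of f over [a, b] with one jittered sample in each of n equal strata."""
    width = (b - a) / n
    total = 0.0
    for j in range(n):
        x = a + (j + random.random()) * width    # uniform sample inside stratum j
        total += f(x)
    return width * total

print(stratified_estimate(lambda x: x * x, 0.0, 1.0, 1_000))  # close to 1/3
```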

Page 63: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

63

Monte Carlo Integration Stratified Sampling

This method often leads to a smaller variance as compared to a blind Monte Carlo integration method.

The variance of a stratified sampling method, where each stratum receives a number of samples nj, which are in turn distributed uniformly over their respective intervals, is equal to

Page 64: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

64

Monte Carlo Integration Stratified Sampling

If all the strata are of equal size (αj – αj-1 = 1/m) and each stratum contains one uniformly generated sample (nj = 1; N = m), the equation can be given by:

Page 65: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

65

Monte Carlo Integration Stratified Sampling

The variance obtained using stratified sampling is always smaller than the variance obtained by a pure Monte Carlo sampling scheme.

As a consequence, there is no advantage in generating more than one sample within a single stratum, since a simple equal subdivision of the stratum such that each sample is attributed to a single stratum always yields a better result.

This does not mean that the above sampling scheme always gives us the smallest possible variance.

We did not take into account the size of the strata relative to each other and the number of samples per stratum.

It is not an easy problem to determine how these degrees of freedom can be chosen optimally.

Page 66: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

66

Monte Carlo Integration Stratified Sampling

It can be proved that the optimal number of samples in one subdomain is proportional to the variance of the function values relative to the average function value in that subdomain.

Applied to the principle of one sample per stratum, this implies that the size of the strata should be chosen such that the function variance is equal in all strata.

Such a sampling strategy assumes prior knowledge of the function in question, which is often not available.

However, such a sampling strategy might be used in an adaptive sampling algorithm.

Page 67: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

67

Monte Carlo Integration Stratified Sampling

This works well when the number of samples required is known in advance and the dimensionality of the problem is relatively low.

Typically less than 20.

For a d-dimensional function, the number of samples required is N^d. The number of strata required does not scale well with an increase in the number of dimensions.

Page 68: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

68

Monte Carlo Integration N-Rooks or Latin Hypercube Algorithm

The N-Rooks algorithm can keep the number of samples fixed (irrespective of dimensionality).

Consider a two-dimensional function. Stratification of both dimensions would require N² strata with one sample per stratum.

The N-rooks algorithm addresses this by distributing N samples evenly among the strata. Each dimension is still subdivided into N subintervals. However, only N samples are needed!

These samples are distributed such that one sample lies in each subinterval.

Page 69: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

69

Monte Carlo Integration N-Rooks or Latin Hypercube Algorithm

Such a distribution is achieved by computing d independent permutations q_1, …, q_d of 1..N and letting the ith d-dimensional sample take its jth coordinate from the q_j(i)th subinterval.

In two dimensions, this means that no row or column has more than one sample.
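A small sketch of the N-rooks / Latin hypercube construction (the sample count and dimension below are arbitrary illustrative values):

```python
import random

def latin_hypercube(n, d):
    """Generate n d-dimensional samples with one sample in each of the n subintervals per dimension."""
    perms = [random.sample(range(n), n) for _ in range(d)]   # one independent permutation per dimension
    return [
        tuple((perms[j][i] + random.random()) / n for j in range(d))
        for i in range(n)
    ]

for p in latin_hypercube(4, 2):
    print(p)   # no row or column of the 4x4 grid contains more than one sample
```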

Page 70: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

70

Monte Carlo Integration
Combining Stratified Sampling and Importance Sampling

These two methods can easily be integrated: the samples computed from a uniform probability distribution can be stratified, and then these stratified samples are transformed using the inverse cumulative distribution function.

This strategy avoids the clumping of samples while at the same time distributing the samples according to the appropriate probability distribution function.
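A hedged sketch of this combination: stratify the uniform variable u, then push each stratified u through the inverse CDF (the target density p(x) = 2x, with F^{-1}(u) = sqrt(u), is an illustrative assumption):

```python
import math
import random

def stratified_importance_samples(n):
    """Stratified uniform samples mapped through the inverse CDF of p(x) = 2x."""
    samples = []
    for j in range(n):
        u = (j + random.random()) / n      # stratified uniform sample in [0, 1)
        samples.append(math.sqrt(u))       # inverse CDF transform, F^{-1}(u) = sqrt(u)
    return samples

xs = stratified_importance_samples(1_000)
print(sum(xs) / len(xs))   # the mean of p(x) = 2x is 2/3
```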

Page 71: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

71

Monte Carlo Integration
Combining Estimators of Different Distributions

It is useful to combine different sampling techniques so as to obtain robust solutions that have low variance over a wide range of parameter settings.

The rendering equation consists of the BRDF, the geometry term, the incoming radiance, etc. Each one of these different terms could be used for importance sampling.

However, depending on the material properties or the distribution of objects in a scene, one of these techniques could be more effective than the other.

Page 72: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

72

Monte Carlo Integration
Combining Estimators of Different Distributions Using Variance

Consider combining two estimators, <I1> and <I2>, to compute an integral I.

Any linear combination w_1<I_1> + w_2<I_2> with constant weights w_1 + w_2 = 1 will be an estimator for I.

The variance of the linear combination, however, depends on the weights:

$$\sigma^2\big[w_1\langle I_1\rangle + w_2\langle I_2\rangle\big] = w_1^2\,\sigma^2[\langle I_1\rangle] + w_2^2\,\sigma^2[\langle I_2\rangle] + 2 w_1 w_2\,\mathrm{Cov}[\langle I_1\rangle, \langle I_2\rangle]$$

Page 73: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

73

Monte Carlo Integration
Combining Estimators of Different Distributions Using Variance

If <I1> and <I2> are independent, the covariance is zero.

Minimization of the variance expression above allows us to fix the optimal combination weights:

$$w_1 = \frac{\sigma^2[\langle I_2\rangle]}{\sigma^2[\langle I_1\rangle] + \sigma^2[\langle I_2\rangle]}, \qquad w_2 = \frac{\sigma^2[\langle I_1\rangle]}{\sigma^2[\langle I_1\rangle] + \sigma^2[\langle I_2\rangle]}$$

Page 74: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

74

Monte Carlo Integration
Combining Estimators of Different Distributions Using Variance

The weights can be calculated in two different ways

Using analytical expressions for the variance of the involved estimators.

Using a posteriori estimates for the variances based on the samples in an experiment themselves.

By doing so, a slight bias is introduced. As the number of samples is increased, the bias vanishes; the combination is asymptotically unbiased, or consistent.

Page 75: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

75

Monte Carlo Integration Combining Estimators of Different Distributions

Multiple Importance Sampling

Combine different estimators using potentially different weights for each individual sample, even for samples from the same estimator.

Samples from one estimator could have different weights assigned to them, unlike the approach where the weight depends only on the variance.

The balance heuristic is used to determine the weights that combine these samples from different PDFs provided the weights sum to 1.

The balance heuristic results in an unbiased estimator that provably has variance that differs from the variance of the optimal estimator by an additive error term.
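A hedged Python sketch of the balance heuristic, where a sample drawn from technique i receives weight w_i(x) = n_i p_i(x) / Σ_k n_k p_k(x) (the integrand and the two sampling techniques in the example are illustrative assumptions):

```python
import math
import random

def balance_heuristic_weight(i, x, pdfs, counts):
    """Balance-heuristic weight for a sample x drawn from technique i."""
    den = sum(n_k * p_k(x) for n_k, p_k in zip(counts, pdfs))
    return counts[i] * pdfs[i](x) / den

def mis_estimate(f, samplers, pdfs, counts):
    """Multiple importance sampling estimator combining several techniques via the balance heuristic."""
    total = 0.0
    for i, (sampler, n_i) in enumerate(zip(samplers, counts)):
        for _ in range(n_i):
            x = sampler()
            w = balance_heuristic_weight(i, x, pdfs, counts)
            total += w * f(x) / (n_i * pdfs[i](x))
    return total

# Illustrative use: integrate x^2 over [0, 1] (exact value 1/3) by combining
# a uniform technique with a p(x) = 2x technique.
f = lambda x: x * x
samplers = [lambda: random.random(), lambda: math.sqrt(random.random())]
pdfs = [lambda x: 1.0, lambda x: 2.0 * x]
print(mis_estimate(f, samplers, pdfs, [500, 500]))   # close to 1/3
```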

Page 76: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

76

Monte Carlo Integration
Combining Estimators of Different Distributions: Multiple Importance Sampling

Page 77: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

77

Monte Carlo Integration Control Variates

Another technique to reduce variance uses control variates.

Variance could be reduced by computing a function g that can be integrated analytically and subtracted from the original function to be integrated.

Page 78: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

78

Monte Carlo Integration Control Variates

Since the integral ∫g(x)dx has been computed analytically, the original integral is estimated by computing an estimator for ∫(f(x) - g(x))dx and adding the known value of ∫g(x)dx.

If f(x)-g(x) is almost constant, this technique is very effective at decreasing variance.

If f/g is nearly constant, g should be used for importance sampling.
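A minimal control-variate sketch (the integrand exp(x) and the control variate g(x) = 1 + x, whose integral over [0, 1] is 3/2, are illustrative assumptions):

```python
import math
import random

def control_variate_estimate(f, g, g_integral, a, b, n):
    """Estimate the integral of f over [a, b] using g as a control variate.

    g must have a known analytic integral over [a, b]; Monte Carlo is applied
    only to the residual f - g, which should be nearly constant to pay off.
    """
    width = b - a
    residual = sum(f(x) - g(x) for x in (random.uniform(a, b) for _ in range(n))) / n
    return g_integral + width * residual

f = math.exp
g = lambda x: 1.0 + x
print(control_variate_estimate(f, g, 1.5, 0.0, 1.0, 10_000))  # close to e - 1 = 1.71828
```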

Page 79: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

79

Monte Carlo Integration Quasi-Monte Carlo

These techniques decrease the effects of clumping in samples by eliminating randomness completely.

Samples are deterministically distributed as uniformly as possible.

They try to minimize clumping with respect to a measure called the discrepancy.

The most commonly used measure of discrepancy is the star discrepancy measure.

Page 80: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

80

Monte Carlo Integration Quasi-Monte Carlo

Consider a set of N points P. Consider each possible axis-aligned box B with one corner at the origin. Given a box of size B_size, the ideal distribution of points would place N · B_size points inside that box.

The star discrepancy measure computes how much the point distribution P deviates from this ideal situation:

$$D_N^*(P) = \sup_{B} \left| \frac{\mathrm{NumPoints}(P \cap B)}{N} - B_{\mathrm{size}} \right|$$

NumPoints(P ∩ B) is the number of points from the set P that lie in box B.

Page 81: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

81

Monte Carlo Integration Quasi-Monte Carlo

The star discrepancy is significant: it is closely related to the error bounds for quasi-Monte Carlo integration.

The Koksma-Hlawka inequality states that the difference between the estimator and the integral to be computed satisfies the condition:

$$\big| \langle I \rangle - I \big| \le V_{HK}(f)\, D_N^*$$

Here the V_HK term is the variation of the function f(x) in the sense of Hardy and Krause; V_HK measures how fast the function can change.

Page 82: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

82

Monte Carlo Integration Quasi-Monte Carlo

The important point to take from this inequality is that the error in the MC estimate is directly proportional to the discrepancy of the sample set.

Therefore, much effort has been expended in designing sequences that have low discrepancy.

These sequences are called low-discrepancy sequences (LDS).

Examples of low-discrepancy sequences: Hammersley, Halton, Sobol, etc.
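A short sketch of generating low-discrepancy points via radical inverses (base 2 gives the van der Corput sequence; using a different prime base per dimension gives Halton points):

```python
def radical_inverse(i, base):
    """Radical inverse of integer i in the given base (van der Corput sequence for base 2)."""
    inv, digit_weight = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * digit_weight
        i //= base
        digit_weight /= base
    return inv

def halton(i, dims, bases=(2, 3, 5, 7, 11, 13)):
    """The ith point of the Halton sequence in `dims` dimensions (one prime base per dimension)."""
    return tuple(radical_inverse(i, bases[d]) for d in range(dims))

# The first few 2D Halton points cover the unit square far more evenly
# than pseudorandom points typically would.
for i in range(1, 9):
    print(halton(i, 2))
```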

Page 83: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

83

Monte Carlo Integration Why Quasi-Monte Carlo?

The error bound for low-discrepancy sequences when applied to MC integration is O((log N)^d / N) or O((log N)^{d-1} / N) for large N and dimension d.

This bound could have a substantial potential benefit compared to the 1/sqrt(N) error bounds for pure Monte Carlo techniques.

Low-discrepancy sequences work best for low dimensions (about 10-20).

At higher dimensions, their performance is similar to pseudorandom sampling.

However, as compared to pseudorandom sampling, low-discrepancy sequences are highly correlated.

The difference between successive samples in the van der Corput sequence (a base-2 Halton sequence) is 0.5 half of the time.

The upshot is that low-discrepancy sampling gives up randomness in return for uniformity in the sample distribution.

Page 84: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

84

Monte Carlo Integration Comparison of Quasi-MC and MC

QMC is based on low-discrepancy sequences; MC is based on sequences of pseudorandom numbers.

The accuracy of the quasi-MC method increases faster than that of the MC method.

The advantage of QMC is greater if the integrand is smooth and the number of dimensions of the integral is small.

Page 85: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

85

Monte Carlo Integration Note on Low discrepancy sequence.

A low-discrepancy sequence is a sequence with the property that for all values of N, its first N elements have a low discrepancy.

For a numerical integration,

If the points are chosen as x_i = i/N: rectangle rule.
If the points are chosen to be randomly (or pseudorandomly) distributed: Monte Carlo (MC).
If the points are chosen as elements of a low-discrepancy sequence: quasi-Monte Carlo (QMC).

Page 86: K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology

86

Monte Carlo Integration Note on Low discrepancy sequence.