Page 1: Uncertainty and Probability

Uncertainty and Probability

MLAI: Week 1

Neil D. Lawrence

Department of Computer Science, Sheffield University

29th September 2015

Page 2: Uncertainty and Probability

Outline

Course Text

Review: Basic Probability

Page 3: Uncertainty and Probability

Rogers and Girolami

Page 4: Uncertainty and Probability

Bishop

Page 9: Uncertainty and Probability

What is Machine Learning?

data + model = prediction

- data: observations, could be actively or passively acquired (meta-data).

- model: assumptions, based on previous experience (other data! transfer learning etc), or beliefs about the regularities of the universe. Inductive bias.

- prediction: an action to be taken or a categorization or a quality score.

Page 10: Uncertainty and Probability

y = mx + c

Page 11: Uncertainty and Probability

Figure: the straight line y = mx + c plotted on x and y axes running from 0 to 5, with the gradient m and the intercept c annotated on the graph.

Page 18: Uncertainty and Probability

y = mx + c

point 1: x = 1, y = 3

3 = m + c

point 2: x = 3, y = 1

1 = 3m + c

point 3: x = 2, y = 2.5

2.5 = 2m + c
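
The three equations are inconsistent: the first two pin down m = -1 and c = 4, which predicts y = 2 at x = 2 rather than the observed 2.5. A minimal sketch making this concrete (numpy assumed; the code is illustrative, not part of the original slides):

```python
# Solve points 1 and 2 exactly, then test the result on point 3.
import numpy as np

A = np.array([[1.0, 1.0],   # 3 = 1*m + c  (point 1)
              [3.0, 1.0]])  # 1 = 3*m + c  (point 2)
b = np.array([3.0, 1.0])

m, c = np.linalg.solve(A, b)
print(m, c)        # m = -1.0, c = 4.0
print(2 * m + c)   # 2.0, but point 3 requires 2.5
```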

Page 22: Uncertainty and Probability

From Laplace's A Philosophical Essay on Probabilities (Laplace, 1814):

… height: "The day will come when, by study pursued through several ages, the things now concealed will appear with evidence; and posterity will be astonished that truths so clear had escaped us." Clairaut then undertook to submit to analysis the perturbations which the comet had experienced by the action of the two great planets, Jupiter and Saturn; after immense calculations he fixed its next passage at the perihelion toward the beginning of April, 1759, which was actually verified by observation. The regularity which astronomy shows us in the movements of the comets doubtless exists also in all phenomena.

The curve described by a simple molecule of air or vapor is regulated in a manner just as certain as the planetary orbits; the only difference between them is that which comes from our ignorance.

Probability is relative, in part to this ignorance, in part to our knowledge. We know that of three or a greater number of events a single one ought to occur; but nothing induces us to believe that one of them will occur rather than the others. In this state of indecision it is impossible for us to announce their occurrence with certainty. It is, however, probable that one of these events, chosen at will, will not occur because we see several cases equally possible which exclude its occurrence, while only a single one favors it.

The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of …

Page 23: Uncertainty and Probability

y = mx + c + ε

point 1: x = 1, y = 3

3 = m + c + ε1

point 2: x = 3, y = 1

1 = 3m + c + ε2

point 3: x = 2, y = 2.5

2.5 = 2m + c + ε3
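
One natural way to reconcile the three noisy equations is to pick m and c to minimise the squared noise terms. A sketch (numpy assumed; least squares is one standard choice, not necessarily the treatment the lecture has in mind here):

```python
# Fit m and c by least squares over the three noisy observations.
import numpy as np

x = np.array([1.0, 3.0, 2.0])
y = np.array([3.0, 1.0, 2.5])

# Each row of the design matrix is [x_i, 1], so A @ [m, c] gives m*x_i + c.
A = np.stack([x, np.ones_like(x)], axis=1)
(m, c), *_ = np.linalg.lstsq(A, y, rcond=None)
print(m, c)   # m = -1.0, c ≈ 4.167
```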

Page 24: Uncertainty and Probability

Outline

Course Text

Review: Basic Probability

Page 25: Uncertainty and Probability

Probability Review I

- We are interested in trials which result in two random variables, X and Y, each of which has an 'outcome' denoted by x or y.

- We summarise the notation and terminology for these distributions in the following table.

  Terminology               Notation           Description
  Joint Probability         P(X = x, Y = y)    'The probability that X = x and Y = y'
  Marginal Probability      P(X = x)           'The probability that X = x regardless of Y'
  Conditional Probability   P(X = x|Y = y)     'The probability that X = x given that Y = y'

Table: The different basic probability distributions.

Page 26: Uncertainty and Probability

A Pictorial Definition of Probability

Figure: Representation of joint and conditional probabilities. N crosses in total are plotted on a grid of X (1 to 6) against Y (1 to 4), with the counts n_{Y=4}, n_{X=5} and n_{X=3,Y=4} marked.

Page 27: Uncertainty and Probability

Different Distributions

  Terminology               Definition                          Notation
  Joint Probability         lim_{N→∞} n_{X=3,Y=4} / N           P(X = 3, Y = 4)
  Marginal Probability      lim_{N→∞} n_{X=5} / N               P(X = 5)
  Conditional Probability   lim_{N→∞} n_{X=3,Y=4} / n_{Y=4}     P(X = 3|Y = 4)

Table: Definition of probability distributions.

Page 28: Uncertainty and Probability

Notational Details

- Typically we should write out P(X = x, Y = y).

- In practice, we often use P(x, y).

- This looks very much like we might write a multivariate function, e.g. f(x, y) = x/y.

- For a multivariate function though, f(x, y) ≠ f(y, x).

- However P(x, y) = P(y, x) because

  P(X = x, Y = y) = P(Y = y, X = x).

- We now quickly review the 'rules of probability'.

Page 29: Uncertainty and Probability

Normalization

All distributions are normalized. This is clear from the fact that ∑_x n_x = N, which gives

  ∑_x P(x) = ∑_x n_x / N = N / N = 1.

A similar result can be derived for the marginal and conditional distributions.

Page 30: Uncertainty and Probability

The Sum Rule

Ignoring the limit in our definitions:

- The marginal probability P(y) is n_y / N.

- The joint distribution P(x, y) is n_{x,y} / N.

- n_y = ∑_x n_{x,y}, so

  n_y / N = ∑_x n_{x,y} / N,

  in other words

  P(y) = ∑_x P(x, y).

This is known as the sum rule of probability.

Page 31: Uncertainty and Probability

The Product Rule

- P(x|y) is n_{x,y} / n_y.

- P(x, y) is

  n_{x,y} / N = (n_{x,y} / n_y) × (n_y / N),

  or in other words

  P(x, y) = P(x|y) P(y).

This is known as the product rule of probability.
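
Both the sum rule and the product rule are easy to verify numerically from a table of counts. A sketch (numpy assumed; the 3×2 count table below is invented for illustration, not taken from the lecture):

```python
import numpy as np

# n[i, j] = number of trials with X = i and Y = j.
n = np.array([[3.0, 1.0],
              [2.0, 4.0],
              [5.0, 5.0]])
N = n.sum()

P_xy = n / N                     # joint: P(x, y) = n_{x,y} / N
P_y = P_xy.sum(axis=0)           # sum rule: P(y) = sum_x P(x, y)
P_x_given_y = n / n.sum(axis=0)  # conditional: P(x|y) = n_{x,y} / n_y

# Product rule: P(x, y) = P(x|y) P(y), checked column by column.
assert np.allclose(P_xy, P_x_given_y * P_y)
```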

Page 32: Uncertainty and Probability

Bayes’ Rule

- From the product rule,

  P(y, x) = P(x, y) = P(x|y) P(y),

  so

  P(y|x) P(x) = P(x|y) P(y),

  which leads to Bayes' rule,

  P(y|x) = P(x|y) P(y) / P(x).

Page 33: Uncertainty and Probability

Bayes’ Theorem Example

- There are two barrels in front of you. Barrel One contains 20 apples and 4 oranges. Barrel Two contains 4 apples and 8 oranges. You choose a barrel at random and select a fruit. It is an apple. What is the probability that the barrel was Barrel One?

Page 34: Uncertainty and Probability

Bayes’ Theorem Example: Answer I

- We are given that:

  P(F = A|B = 1) = 20/24
  P(F = A|B = 2) = 4/12
  P(B = 1) = 0.5
  P(B = 2) = 0.5

Page 35: Uncertainty and Probability

Bayes’ Theorem Example: Answer II

- We use the sum rule to compute:

  P(F = A) = P(F = A|B = 1) P(B = 1) + P(F = A|B = 2) P(B = 2)
           = 20/24 × 0.5 + 4/12 × 0.5 = 7/12

- And Bayes' theorem tells us that:

  P(B = 1|F = A) = P(F = A|B = 1) P(B = 1) / P(F = A)
                 = (20/24 × 0.5) / (7/12) = 5/7
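
The whole calculation can be checked with exact arithmetic; a small sketch using Python's fractions module (illustrative, not from the slides):

```python
from fractions import Fraction

p_a_b1 = Fraction(20, 24)     # P(F=A | B=1)
p_a_b2 = Fraction(4, 12)      # P(F=A | B=2)
p_b1 = p_b2 = Fraction(1, 2)  # each barrel equally likely

p_a = p_a_b1 * p_b1 + p_a_b2 * p_b2  # sum rule: P(F=A)
print(p_a)                           # 7/12

print(p_a_b1 * p_b1 / p_a)           # Bayes' rule: P(B=1|F=A) = 5/7
```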


Page 37: Uncertainty and Probability

Reading & Exercises

Before Friday, review the example on Bayes' theorem!

- Read and understand Bishop on probability distributions: pages 12–17 (Section 1.2).

- Complete Exercise 1.3 in Bishop.

Page 38: Uncertainty and Probability

Distribution Representation

- We can represent probabilities as tables:

  y     0    1    2
  P(y)  0.2  0.5  0.3

Page 39: Uncertainty and Probability

Figure: Histogram representation of the simple distribution.

Page 40: Uncertainty and Probability

Expectations of Distributions

- Writing down the entire distribution is tedious.

- Can summarise through expectations:

  ⟨f(y)⟩_{P(y)} = ∑_y f(y) P(y)

- Consider:

  y     0    1    2
  P(y)  0.2  0.5  0.3

- We have ⟨y⟩_{P(y)} = 0.2 × 0 + 0.5 × 1 + 0.3 × 2 = 1.1.

- This is the first moment, or mean, of the distribution.

Page 41: Uncertainty and Probability

Figure: Histogram representation of the simple distribution including the expectation of y (red line), the mean of the distribution.

Page 42: Uncertainty and Probability

Variance and Standard Deviation

- Mean gives us the centre of the distribution.

- Consider:

  y     0    1    2
  y²    0    1    4
  P(y)  0.2  0.5  0.3

- Second moment is ⟨y²⟩_{P(y)} = 0.2 × 0 + 0.5 × 1 + 0.3 × 4 = 1.7

- Variance is ⟨y²⟩ − ⟨y⟩² = 1.7 − 1.1 × 1.1 = 0.49

- Standard deviation is the square root of the variance.

- Standard deviation gives us the "width" of the distribution.
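
The same moments can be computed directly; a minimal sketch (numpy assumed, not part of the slides):

```python
import numpy as np

y = np.array([0, 1, 2])
P = np.array([0.2, 0.5, 0.3])

mean = np.sum(y * P)           # first moment: 1.1
second = np.sum(y**2 * P)      # second moment: 1.7
variance = second - mean**2    # 0.49
print(mean, variance, np.sqrt(variance))  # std = 0.7
```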

Page 43: Uncertainty and Probability

Figure: Histogram representation of the simple distribution including lines at one standard deviation from the mean of the distribution (magenta lines).

Page 44: Uncertainty and Probability

Expectation Computation Example

- Consider the following distribution:

  y     1    2    3    4
  P(y)  0.3  0.2  0.1  0.4

- What is the mean of the distribution?

- What is the standard deviation of the distribution?

- Are the mean and standard deviation representative of the distribution form?

- What is the expected value of − log P(y)?


Page 48: Uncertainty and Probability

Expectations Example: Answer

- We are given that:

  y            1      2      3      4
  P(y)         0.3    0.2    0.1    0.4
  y²           1      4      9      16
  − log P(y)   1.204  1.609  2.302  0.916

- Mean: 1 × 0.3 + 2 × 0.2 + 3 × 0.1 + 4 × 0.4 = 2.6

- Second moment: 1 × 0.3 + 4 × 0.2 + 9 × 0.1 + 16 × 0.4 = 8.4

- Variance: 8.4 − 2.6 × 2.6 = 1.64

- Standard deviation: √1.64 = 1.2806

- Expectation of − log P(y): 0.3 × 1.204 + 0.2 × 1.609 + 0.1 × 2.302 + 0.4 × 0.916 = 1.280
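
The last quantity is worth flagging: ⟨− log P(y)⟩ with the natural logarithm is the entropy of the distribution. A one-line check (numpy assumed, not from the slides):

```python
import numpy as np

P = np.array([0.3, 0.2, 0.1, 0.4])
print(np.sum(-np.log(P) * P))  # ≈ 1.280 nats
```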

Page 49: Uncertainty and Probability

Sample Based Approximation Example

- You are given the following samples of students' heights:

  i     1     2     3     4     5     6
  y_i   1.76  1.73  1.79  1.81  1.85  1.80

- What is the sample mean?

- What is the sample variance?

- Can you compute a sample-based approximation to the expected value of − log P(y)?

- Actually these "data" were sampled from a Gaussian with mean 1.7 and standard deviation 0.15. Are your estimates close to the real values? If not, why not?


Page 53: Uncertainty and Probability

Sample Based Approximation Example: Answer

- We can compute:

  i      1       2       3       4       5       6
  y_i    1.76    1.73    1.79    1.81    1.85    1.80
  y_i²   3.0976  2.9929  3.2041  3.2761  3.4225  3.2400

- Mean: (1.76 + 1.73 + 1.79 + 1.81 + 1.85 + 1.80) / 6 = 1.79

- Second moment: (3.0976 + 2.9929 + 3.2041 + 3.2761 + 3.4225 + 3.2400) / 6 = 3.2055

- Variance: 3.2055 − 1.79 × 1.79 = 1.43 × 10⁻³

- Standard deviation: 0.0379

- No, you can't compute the expectation of − log P(y): you don't have access to P(y) directly.
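
A sketch of the same computation (numpy assumed, not part of the slides):

```python
import numpy as np

y = np.array([1.76, 1.73, 1.79, 1.81, 1.85, 1.80])

mean = y.mean()                       # 1.79
variance = (y**2).mean() - mean**2    # ≈ 1.43e-3, same as np.var(y)
print(mean, variance, np.sqrt(variance))  # std ≈ 0.0379
```

With only six samples, the estimates can sit a long way from the true parameters (mean 1.7, standard deviation 0.15): sample-based approximations only converge on the true values as the number of samples grows.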

Page 54: Uncertainty and Probability

Reading

- See probability review at end of slides for reminders.

- Read and understand Rogers and Girolami on:
  1. Section 2.2 (pg 41–53).
  2. Section 2.4 (pg 55–58).
  3. Section 2.5.1 (pg 58–60).
  4. Section 2.5.3 (pg 61–62).

- For other material in Bishop read:
  1. Probability densities: Section 1.2.1 (pages 17–19).
  2. Expectations and covariances: Section 1.2.2 (pages 19–20).
  3. The Gaussian density: Section 1.2.4 (pages 24–28) (don't worry about material on bias).
  4. For material on information theory and KL divergence try Sections 1.6 & 1.6.1 of Bishop (pg 48 onwards).

- If you are unfamiliar with probabilities you should complete the following exercises:
  1. Bishop Exercise 1.7
  2. Bishop Exercise 1.8
  3. Bishop Exercise 1.9

Page 55: Uncertainty and Probability

References I

C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag, 2006.

P. S. Laplace. Essai philosophique sur les probabilités. Courcier, Paris, 2nd edition, 1814. Sixth edition of 1840 translated and reprinted (1951) as A Philosophical Essay on Probabilities, New York: Dover; fifth edition of 1825 reprinted 1986 with notes by Bernard Bru, Paris: Christian Bourgois Éditeur; translated by Andrew Dale (1995) as Philosophical Essay on Probabilities, New York: Springer-Verlag.

S. Rogers and M. Girolami. A First Course in Machine Learning. CRC Press, 2011.