Lecture 3 Probability - Part 2 Luigi Freda ALCOR Lab DIAG University of Rome ”La Sapienza” January 26, 2018 Luigi Freda (”La Sapienza” University) Lecture 3 January 26, 2018 1 / 46

Lecture 3 - Probability - Part 2 · 2018-01-27 · 2 Joint Probability Distributions - Multivariate Joint Probability Distributions Joint CDF and PDF Marginal PDF Conditional PDF


Lecture 3: Probability - Part 2

Luigi Freda

ALCOR Lab, DIAG

University of Rome ”La Sapienza”

January 26, 2018


Outline

1 Common Continuous Distributions - Univariate
    Gaussian Distribution
    Degenerate PDFs
    Student’s t Distribution
    Laplace Distribution
    Gamma Distribution
    Beta Distribution
    Pareto Distribution

2 Joint Probability Distributions - Multivariate
    Joint Probability Distributions
    Joint CDF and PDF
    Marginal PDF
    Conditional PDF and Independence
    Covariance and Correlation
    Correlation and Independence
    Common Multivariate Distributions


Gaussian (Normal) Distribution

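X is a continuous RV with values $x \in \mathbb{R}$

$X \sim \mathcal{N}(\mu, \sigma^2)$, i.e. X has a Gaussian distribution or normal distribution

$$\mathcal{N}(x \mid \mu, \sigma^2) \triangleq \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}(x-\mu)^2} \quad (= p_X(x))$$

mean $E[X] = \mu$

mode $\mu$

variance $\mathrm{var}[X] = \sigma^2$

precision $\lambda = 1/\sigma^2$

$(\mu - 2\sigma, \mu + 2\sigma)$ is the approx. 95% interval

$(\mu - 3\sigma, \mu + 3\sigma)$ is the approx. 99.7% interval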
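The interval percentages can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper names `gaussian_pdf` and `prob_within` are illustrative and not part of the lecture. It relies on the identity $P(|X - \mu| < k\sigma) = \mathrm{erf}(k/\sqrt{2})$, which holds for any Gaussian X.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma2=1.0):
    """Density N(x | mu, sigma2) with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a Gaussian X (independent of mu, sigma)."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
```

The values for k = 2 and k = 3 are why the two intervals are quoted as approximate: they are close to, but not exactly, 95% and 99.7%.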


Gaussian (Normal) Distribution

[Figure: normal, bell-shaped curve. Horizontal axis: standard deviations from -4σ to +4σ. Percentage of cases in 8 portions of the curve: 0.13%, 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%, 0.13%. Cumulative percentages: 0.1%, 2.3%, 15.9%, 50%, 84.1%, 97.7%, 99.9%. Additional scales: percentiles, Z scores (-4.0 to +4.0), T scores (20 to 80), Standard Nine (stanines 1 to 9, holding 4%, 7%, 12%, 17%, 20%, 17%, 12%, 7%, 4% of cases).]


Gaussian (Normal) Distribution

why is the Gaussian distribution the most widely used in statistics?

1 easy to interpret: just two parameters $\theta = (\mu, \sigma^2)$

2 central limit theorem: the sum of many independent random variables has an approximately Gaussian distribution

3 least number of assumptions (maximum entropy) subject to the constraints of having mean = $\mu$ and variance = $\sigma^2$

4 simple mathematical form, easy to manipulate and implement

homework: show that

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}(x-\mu)^2} \, dx = 1$$
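The central limit theorem claim can be illustrated by simulation. This is a rough sketch, not a proof: it assumes sums of n = 12 independent U(0, 1) variables (an arbitrary choice, giving mean 6 and variance 1) and checks how much of the probability mass falls within two standard deviations of the mean. The function names and the seed are illustrative.

```python
import random

def sum_of_uniforms(n, rng):
    """Sum of n independent U(0, 1) draws; exact mean n/2 and variance n/12."""
    return sum(rng.random() for _ in range(n))

def clt_experiment(n=12, samples=20000, seed=0):
    rng = random.Random(seed)
    mu = n / 2.0
    sigma = (n / 12.0) ** 0.5
    draws = [sum_of_uniforms(n, rng) for _ in range(samples)]
    mean = sum(draws) / samples
    var = sum((d - mean) ** 2 for d in draws) / samples
    # Fraction of sums inside (mu - 2*sigma, mu + 2*sigma)
    frac = sum(1 for d in draws if abs(d - mu) < 2.0 * sigma) / samples
    return frac, mean, var

frac, mean, var = clt_experiment()
print(f"fraction within 2 sigma: {frac:.3f} (a Gaussian would give about 0.954)")
```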


Degenerate PDFs

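X is a continuous RV with values $x \in \mathbb{R}$

consider a Gaussian distribution with $\sigma^2 \to 0$

$$\delta(x - \mu) \triangleq \lim_{\sigma^2 \to 0} \mathcal{N}(x \mid \mu, \sigma^2)$$

$\delta(x)$ is the Dirac delta function, with

$$\delta(x) = \begin{cases} \infty & \text{if } x = 0 \\ 0 & \text{if } x \neq 0 \end{cases}$$

one has $\int_{-\infty}^{\infty} \delta(x)\,dx = 1$ (since $\lim_{\sigma^2 \to 0} \int_{-\infty}^{\infty} \mathcal{N}(x \mid \mu, \sigma^2)\,dx = 1$)

sifting property

$$\int_{-\infty}^{\infty} f(x)\,\delta(x - \mu)\,dx = f(\mu)$$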
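The sifting property can be made concrete by replacing the delta with a narrow Gaussian and shrinking its width. The sketch below assumes f = cos and µ = 0.5 (arbitrary choices) and uses a plain trapezoidal rule; `smoothed_sift` is a hypothetical helper name.

```python
import math

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def smoothed_sift(f, mu, sigma, half_width=10.0, steps=2000):
    """Trapezoidal estimate of the integral of f(x) * N(x | mu, sigma^2)
    over [mu - half_width*sigma, mu + half_width*sigma]."""
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # endpoints count half
        total += weight * f(x) * gaussian_pdf(x, mu, sigma)
    return total * h

for sigma in (1.0, 0.1, 0.001):
    print(f"sigma = {sigma}: {smoothed_sift(math.cos, 0.5, sigma):.6f}")
print(f"f(mu) = {math.cos(0.5):.6f}")
```

As sigma decreases, the integral approaches f(µ), which is the behavior the limit definition of $\delta(x - \mu)$ describes.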


Student’s t Distribution

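X is a continuous RV with values $x \in \mathbb{R}$

$X \sim \mathcal{T}(\mu, \sigma^2, \nu)$, i.e. X has a Student’s t distribution

$$\mathcal{T}(x \mid \mu, \sigma^2, \nu) \propto \left[ 1 + \frac{1}{\nu} \left( \frac{x - \mu}{\sigma} \right)^2 \right]^{-\frac{\nu + 1}{2}}$$

where $\mathcal{T}(x \mid \mu, \sigma^2, \nu) = p_X(x)$

scale parameter $\sigma^2 > 0$

degrees of freedom $\nu > 0$

mean $E[X] = \mu$, defined if $\nu > 1$

mode $\mu$

variance $\mathrm{var}[X] = \frac{\nu \sigma^2}{\nu - 2}$, defined if $\nu > 2$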
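The slide gives the density only up to proportionality. The sketch below fills in the standard normalizing constant $\Gamma(\frac{\nu+1}{2}) / (\Gamma(\frac{\nu}{2}) \sqrt{\nu\pi}\,\sigma)$, which is an addition not stated on the slide, and compares the tails with a Gaussian. Function names are illustrative.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def student_t_pdf(x, mu=0.0, sigma=1.0, nu=1.0):
    """Normalized Student's t density with location mu, scale sigma, nu degrees of freedom."""
    z = (x - mu) / sigma
    # log of the normalizing constant, computed with lgamma for numerical stability
    log_norm = (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return math.exp(log_norm - 0.5 * (nu + 1.0) * math.log1p(z * z / nu))

for x in (0.0, 2.0, 5.0):
    print(f"x = {x}: Gaussian {gaussian_pdf(x):.2e}, Student t (nu = 1) {student_t_pdf(x):.2e}")
```

Far from the mean the t density is orders of magnitude larger than the Gaussian one, which is what "heavy tails" means in practice.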


Student’s t Distribution

a comparison of $\mathcal{N}(0, 1)$, $\mathcal{T}(0, 1, 1)$ and $\mathrm{Lap}(0, 1/\sqrt{2})$

[Figure: two panels, "PDF" and "log(PDF)"]


Student’s t Distribution: Pros

why should we use the Student distribution?

it is less sensitive to outliers than the Gaussian distribution

[Figure: two panels, "without outliers" and "with outliers"]


Laplace Distribution

X is a continuous RV with values $x \in \mathbb{R}$

$X \sim \mathrm{Lap}(\mu, b)$, i.e. X has a Laplace distribution

$$\mathrm{Lap}(x \mid \mu, b) \triangleq \frac{1}{2b} \exp\left( -\frac{|x - \mu|}{b} \right) \quad (= p_X(x))$$

scale parameter $b > 0$

mean $E[X] = \mu$

mode $\mu$

variance $\mathrm{var}[X] = 2b^2$

compared to the Gaussian distribution, the Laplace distribution

is more robust to outliers (see above)

puts more probability density at $\mu$ (see above)
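One way to see the robustness claim: the maximum-likelihood estimate of µ is the sample mean under a Gaussian model and the sample median under a Laplace model (standard results, not derived in these slides). The sketch below uses made-up measurements with one gross outlier; the function names are illustrative.

```python
import statistics

clean = [9.8, 10.1, 10.0, 9.9, 10.2]   # made-up measurements
with_outlier = clean + [100.0]          # same data plus one gross outlier

def gaussian_mle_mu(data):
    """MLE of mu under a Gaussian model: the sample mean."""
    return statistics.mean(data)

def laplace_mle_mu(data):
    """MLE of mu under a Laplace model: the sample median."""
    return statistics.median(data)

for name, data in (("clean", clean), ("with outlier", with_outlier)):
    print(f"{name}: Gaussian estimate {gaussian_mle_mu(data):.2f}, "
          f"Laplace estimate {laplace_mle_mu(data):.2f}")
```

The single outlier drags the Gaussian estimate far from the bulk of the data, while the Laplace estimate barely moves.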


Gamma Distribution

X is a continuous RV with values x ∈ R+ (x > 0)

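$X \sim \mathrm{Ga}(a, b)$, i.e. X has a gamma distribution

$$\mathrm{Ga}(x \mid a, b) \triangleq \frac{b^a}{\Gamma(a)} \, x^{a-1} e^{-xb} \quad (= p_X(x))$$

shape $a > 0$

rate $b > 0$

the gamma function is

$$\Gamma(x) \triangleq \int_{0}^{\infty} u^{x-1} e^{-u} \, du$$

where $\Gamma(x) = (x - 1)!$ for $x \in \mathbb{N}$ and $\Gamma(1) = 1$

mean $E[X] = \frac{a}{b}$

mode $\frac{a - 1}{b}$ (for $a \geq 1$)

variance $\mathrm{var}[X] = \frac{a}{b^2}$

N.B.: there are several distributions which are just special cases of the Gamma (e.g. exponential, Erlang, Chi-squared)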
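The moments and the exponential special case can be checked with the standard library. This is a sketch under stated assumptions: shape a = 3 and rate b = 2 are arbitrary, and note that `random.gammavariate` takes a scale parameter, so the rate b enters as 1/b. The helper names are illustrative.

```python
import math
import random

def gamma_pdf(x, a, b):
    """Ga(x | a, b) with shape a and rate b, for x > 0."""
    return math.exp(a * math.log(b) - math.lgamma(a) + (a - 1.0) * math.log(x) - b * x)

def exponential_pdf(x, lam):
    """Exponential density with rate lam; equals Ga(x | 1, lam)."""
    return lam * math.exp(-lam * x)

a, b = 3.0, 2.0
rng = random.Random(0)
# gammavariate(shape, scale): the scale is 1/b for rate b
draws = [rng.gammavariate(a, 1.0 / b) for _ in range(50000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)

print(f"sample mean {sample_mean:.3f} vs a/b = {a / b}")
print(f"sample variance {sample_var:.3f} vs a/b^2 = {a / b ** 2}")
print(f"Gamma(5) = {math.gamma(5)} (expected 4! = 24)")
```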


Gamma Distribution

[Figure: some Ga(a, b = 1) distributions; right panel: an empirical PDF of some rainfall data]


Inverse Gamma Distribution

X is a continuous RV with values x ∈ R+ (x > 0)

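if $X \sim \mathrm{Ga}(a, b)$, then $\frac{1}{X} \sim \mathrm{IG}(a, b)$

$\mathrm{IG}(a, b)$ is the inverse gamma distribution

$$\mathrm{IG}(x \mid a, b) \triangleq \frac{b^a}{\Gamma(a)} \, x^{-(a+1)} e^{-b/x} \quad (= p_X(x))$$

mean $E[X] = \frac{b}{a - 1}$ (defined if $a > 1$)

mode $\frac{b}{a + 1}$

variance $\mathrm{var}[X] = \frac{b^2}{(a - 1)^2 (a - 2)}$ (defined if $a > 2$)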
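The relation between the gamma and inverse gamma distributions suggests a simple check: draw gamma samples, invert them, and compare the sample mean with b/(a − 1). The sketch assumes a = 5 and b = 2 (arbitrary values with a > 2 so that mean and variance exist); names are illustrative.

```python
import math
import random

def inverse_gamma_pdf(x, a, b):
    """IG(x | a, b) for x > 0."""
    return math.exp(a * math.log(b) - math.lgamma(a) - (a + 1.0) * math.log(x) - b / x)

a, b = 5.0, 2.0
rng = random.Random(0)
# X ~ Ga(a, b) with rate b corresponds to gammavariate(shape=a, scale=1/b)
gamma_draws = [rng.gammavariate(a, 1.0 / b) for _ in range(50000)]
inv_draws = [1.0 / g for g in gamma_draws]   # 1/X ~ IG(a, b)
sample_mean = sum(inv_draws) / len(inv_draws)

mode = b / (a + 1.0)
print(f"sample mean {sample_mean:.3f} vs b/(a-1) = {b / (a - 1.0)}")
print(f"density at the mode {mode:.3f}: {inverse_gamma_pdf(mode, a, b):.3f}")
```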


Beta Distribution

X is a continuous RV with values x ∈ [0, 1]

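$X \sim \mathrm{Beta}(a, b)$, i.e. X has a beta distribution

$$\mathrm{Beta}(x \mid a, b) = \frac{1}{B(a, b)} \, x^{a-1} (1 - x)^{b-1} \quad (= p_X(x))$$

requirements: $a > 0$ and $b > 0$

the beta function is

$$B(a, b) \triangleq \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}$$

mean $E[X] = \frac{a}{a + b}$

mode $\frac{a - 1}{a + b - 2}$ (for $a > 1$ and $b > 1$)

variance $\mathrm{var}[X] = \frac{ab}{(a + b)^2 (a + b + 1)}$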
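The normalization and the moment formulas can be verified by numerical integration. A minimal sketch, assuming a = 2 and b = 5 (arbitrary) and a midpoint rule on (0, 1); the helper names are illustrative.

```python
import math

def beta_pdf(x, a, b):
    """Beta(x | a, b) for 0 < x < 1, with B(a, b) computed from log-gamma values."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x) - log_B)

def midpoint_integral(g, steps=20000):
    """Midpoint-rule estimate of the integral of g over (0, 1)."""
    h = 1.0 / steps
    return h * sum(g((i + 0.5) * h) for i in range(steps))

a, b = 2.0, 5.0
total = midpoint_integral(lambda x: beta_pdf(x, a, b))
mean = midpoint_integral(lambda x: x * beta_pdf(x, a, b))
var = midpoint_integral(lambda x: (x - mean) ** 2 * beta_pdf(x, a, b))

print(f"integral {total:.6f}, mean {mean:.6f}, variance {var:.6f}")
print(f"formulas: mean {a / (a + b):.6f}, variance {a * b / ((a + b) ** 2 * (a + b + 1)):.6f}")
```

With a = b = 1 the same function returns 1 for every x in (0, 1), i.e. the uniform density.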


Beta Distribution

beta distribution

$$\mathrm{Beta}(x \mid a, b) = \frac{1}{B(a, b)} \, x^{a-1} (1 - x)^{b-1} \quad (= p_X(x))$$

requirements: $a > 0$ and $b > 0$

if $a = b = 1$ then $\mathrm{Beta}(x \mid 1, 1) = \mathrm{Unif}(x \mid 0, 1)$, the uniform distribution on the interval $[0, 1] \subset \mathbb{R}$

this distribution can be used to represent a prior on a probability value to be estimated


Pareto Distribution

X is a continuous RV with values x ∈ R+ (x > 0)

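$X \sim \mathrm{Pareto}(k, m)$, i.e. X has a Pareto distribution

$$\mathrm{Pareto}(x \mid k, m) = k \, m^k \, x^{-(k+1)} \, \mathbb{I}(x \geq m) \quad (= p_X(x))$$

as $k \to \infty$ the distribution approaches $\delta(x - m)$

mean $E[X] = \frac{km}{k - 1}$ (defined for $k > 1$)

mode $m$

variance $\mathrm{var}[X] = \frac{m^2 k}{(k - 1)^2 (k - 2)}$ (defined for $k > 2$)

this distribution is particularly useful for its long tail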
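Pareto samples can be drawn by inverting the CDF. Integrating the density gives $F(x) = 1 - (m/x)^k$ for $x \geq m$, so $x = m\,u^{-1/k}$ with u uniform on (0, 1]. The sketch below assumes k = 5 and m = 1 (arbitrary values with k > 2); `pareto_sample` is a hypothetical helper name.

```python
import random

def pareto_sample(k, m, rng):
    """Draw from Pareto(k, m) by inverse-CDF sampling."""
    u = 1.0 - rng.random()      # uniform on (0, 1], avoids u = 0
    return m * u ** (-1.0 / k)

k, m = 5.0, 1.0
rng = random.Random(0)
draws = [pareto_sample(k, m, rng) for _ in range(50000)]
sample_mean = sum(draws) / len(draws)

print(f"smallest draw {min(draws):.4f} (support starts at m = {m})")
print(f"sample mean {sample_mean:.3f} vs km/(k-1) = {k * m / (k - 1.0)}")
print(f"largest draw {max(draws):.2f} (long tail)")
```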


Joint Probability Distributions

consider an ensemble of RVs $X_1, ..., X_D$

we can define a new RV $\mathbf{X} \triangleq (X_1, ..., X_D)^T$

we are now interested in modeling the stochastic relationship between $X_1, ..., X_D$

in this case $\mathbf{x} = (x_1, ..., x_D)^T \in \mathbb{R}^D$ denotes a particular value (instance) of $\mathbf{X}$


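Joint Cumulative Distribution Function: Definition

given a continuous RV $\mathbf{X}$ with values $\mathbf{x} \in \mathbb{R}^D$

Cumulative Distribution Function (CDF)

$$F(\mathbf{x}) = F(x_1, ..., x_D) \triangleq P_{\mathbf{X}}(\mathbf{X} \leq \mathbf{x}) = P_{\mathbf{X}}(X_1 \leq x_1, ..., X_D \leq x_D)$$

properties:

$0 \leq F(\mathbf{x}) \leq 1$

$F(x_1, ..., x_j, ..., x_D) \leq F(x_1, ..., x_j + \Delta x_j, ..., x_D)$ with $\Delta x_j > 0$

$\lim_{\Delta x_j \to 0^+} F(x_1, ..., x_j + \Delta x_j, ..., x_D) = F(x_1, ..., x_j, ..., x_D)$ (right-continuity)

$F(-\infty, ..., -\infty) = 0$

$F(+\infty, ..., +\infty) = 1$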


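Joint Probability Density Function: Definitions

given a continuous RV $\mathbf{X}$ with values $\mathbf{x} \in \mathbb{R}^D$

Probability Density Function (PDF)

$$p(\mathbf{x}) = p(x_1, ..., x_D) \triangleq \frac{\partial^D F}{\partial x_1 \cdots \partial x_D}$$

we assume the above partial derivative of F exists

properties:

$F(\mathbf{x}) = P_{\mathbf{X}}(\mathbf{X} \leq \mathbf{x}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_D} p(\xi_1, ..., \xi_D) \, d\xi_1 \cdots d\xi_D$

$P_{\mathbf{X}}(\mathbf{x} < \mathbf{X} \leq \mathbf{x} + d\mathbf{x}) \approx p(\mathbf{x}) \, dx_1 \cdots dx_D = p(\mathbf{x}) \, d\mathbf{x}$

$P_{\mathbf{X}}(\mathbf{a} < \mathbf{X} \leq \mathbf{b}) = \int_{a_1}^{b_1} \cdots \int_{a_D}^{b_D} p(\mathbf{x}) \, dx_1 \cdots dx_D$

N.B.: $p(\mathbf{x})$ acts as a density in the above computations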


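Joint Probability Density Function: Definitions

for a discrete RV $\mathbf{X}$ we have instead

Probability Mass Function (PMF): $p(\mathbf{x}) \triangleq P_{\mathbf{X}}(\mathbf{X} = \mathbf{x})$

in the above properties we can remove $d\mathbf{x}$ and replace integrals with sums

the CDF can be defined as

$$F(\mathbf{x}) \triangleq P_{\mathbf{X}}(\mathbf{X} \leq \mathbf{x}) = \sum_{\boldsymbol{\xi}_i \leq \mathbf{x}} p(\boldsymbol{\xi}_i)$$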


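Joint PDF: Some Properties

reconsider

1 $F(\mathbf{x}) = P_{\mathbf{X}}(\mathbf{X} \leq \mathbf{x}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_D} p(\xi_1, ..., \xi_D) \, d\xi_1 \cdots d\xi_D$

2 $P_{\mathbf{X}}(\mathbf{x} < \mathbf{X} \leq \mathbf{x} + d\mathbf{x}) \approx p(\mathbf{x}) \, dx_1 \cdots dx_D = p(\mathbf{x}) \, d\mathbf{x}$

the first implies $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(\mathbf{x}) \, d\mathbf{x} = 1$ (consider $(x_1, ..., x_D) \to (\infty, ..., \infty)$)

the second implies $p(\mathbf{x}) \geq 0$ for all $\mathbf{x} \in \mathbb{R}^D$

it is possible that $p(\mathbf{x}) > 1$ for some $\mathbf{x} \in \mathbb{R}^D$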
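A density exceeding 1 is easy to exhibit in the simplest case D = 1. The sketch below assumes a Gaussian with σ = 0.1 (an arbitrary narrow choice): its peak value is well above 1, yet it still integrates to 1. Helper names are illustrative.

```python
import math

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def trapezoid(g, a, b, steps=4000):
    """Trapezoidal estimate of the integral of g over [a, b]."""
    h = (b - a) / steps
    inner = sum(g(a + i * h) for i in range(1, steps))
    return h * (0.5 * g(a) + inner + 0.5 * g(b))

mu, sigma = 0.0, 0.1
peak = gaussian_pdf(mu, mu, sigma)
area = trapezoid(lambda x: gaussian_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)

print(f"peak density p(mu) = {peak:.3f}")
print(f"integral of p over the real line (numerically) = {area:.6f}")
```

A density is a probability per unit length (or volume), so only its integrals are bounded by 1, not its pointwise values.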


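Joint PDF: Marginal PDF

suppose we want the PDF of $\mathbf{Q} \triangleq (X_1, X_2, ..., X_{D-1})^T$ (we don’t care about $X_D$)

$F_{\mathbf{Q}}(\mathbf{q}) = P_{\mathbf{Q}}(\mathbf{Q} \leq \mathbf{q}) = P_{\mathbf{X}}(\mathbf{X} \leq (\mathbf{q}, \infty)^T)$ ($X_D$ can take any value in $(-\infty, \infty)$)

$$P_{\mathbf{X}}(\mathbf{X} \leq (\mathbf{q}, \infty)^T) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_{D-1}} \int_{-\infty}^{\infty} p_{\mathbf{X}}(\boldsymbol{\xi}) \, d\xi_1 \cdots d\xi_{D-1} \, d\xi_D = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_{D-1}} \left( \int_{-\infty}^{\infty} p_{\mathbf{X}}(\boldsymbol{\xi}) \, d\xi_D \right) d\xi_1 \cdots d\xi_{D-1}$$

hence we can define the marginal PDF

$$p_{\mathbf{Q}}(\mathbf{q}) = p_{\mathbf{Q}}(x_1, ..., x_{D-1}) \triangleq \int_{-\infty}^{\infty} p_{\mathbf{X}}(x_1, ..., x_D) \, dx_D$$

and one has

$$F_{\mathbf{Q}}(\mathbf{q}) = F_{\mathbf{Q}}(x_1, ..., x_{D-1}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_{D-1}} p_{\mathbf{Q}}(\xi_1, ..., \xi_{D-1}) \, d\xi_1 \cdots d\xi_{D-1}$$

the above procedure can also be used to marginalize more variables

the above procedure can be used to obtain a marginal PMF for discrete variables by removing the $d\mathbf{x}$ and replacing integrals with sums, i.e.

$$p_{\mathbf{Q}}(\mathbf{q}) = p_{\mathbf{Q}}(x_1, x_2, ..., x_{D-1}) \triangleq \sum_{x_D} p_{\mathbf{X}}(x_1, x_2, ..., x_D)$$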
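In the discrete case marginalization is just a sum over the unwanted variable. The sketch below uses a hypothetical joint PMF over two variables, stored as a dictionary keyed by value tuples; the table and the function name are made up for illustration.

```python
# Hypothetical joint PMF p(x1, x2) with x1 in {0, 1} and x2 in {0, 1, 2}
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.15, (1, 1): 0.30, (1, 2): 0.15,
}

def marginalize_last(joint_pmf):
    """Sum out the last variable: p_Q(q) = sum over x_D of p_X(q, x_D)."""
    marginal = {}
    for x, p in joint_pmf.items():
        q = x[:-1]                      # drop the last coordinate
        marginal[q] = marginal.get(q, 0.0) + p
    return marginal

p_x1 = marginalize_last(joint)
for q, p in sorted(p_x1.items()):
    print(f"p(x1 = {q[0]}) = {p:.2f}")
```

Applying `marginalize_last` repeatedly removes more variables, mirroring the remark that the procedure extends to marginalizing several variables.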


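Joint PDF: Conditional PDF and Independence

suppose we want the PDF of $\mathbf{Q} \triangleq (X_1, X_2, ..., X_{D-1})^T$ given $X_D = x_D$

$$P_{\mathbf{Q}|X_D}(\mathbf{q} < \mathbf{Q} \leq \mathbf{q} + d\mathbf{q} \mid x_D < X_D \leq x_D + dx_D) = \frac{P_{\mathbf{X}}(\mathbf{x} < \mathbf{X} \leq \mathbf{x} + d\mathbf{x})}{P_{X_D}(x_D < X_D \leq x_D + dx_D)}$$

$P_{\mathbf{X}}(\mathbf{x} < \mathbf{X} \leq \mathbf{x} + d\mathbf{x}) \approx p_{\mathbf{X}}(\mathbf{x}) \, dx_1 \cdots dx_D$

$P_{X_D}(x_D < X_D \leq x_D + dx_D) \approx p_{X_D}(x_D) \, dx_D$

hence $P_{\mathbf{Q}|X_D}(\mathbf{q} < \mathbf{Q} \leq \mathbf{q} + d\mathbf{q} \mid x_D < X_D \leq x_D + dx_D) \approx \frac{p_{\mathbf{X}}(\mathbf{x})}{p_{X_D}(x_D)} \, dx_1 \cdots dx_{D-1}$

we can define the conditional PDF

$$p_{\mathbf{Q}|X_D}(\mathbf{q} \mid x_D) = p_{\mathbf{Q}|X_D}(x_1, ..., x_{D-1} \mid x_D) \triangleq \frac{p_{\mathbf{X}}(\mathbf{x})}{p_{X_D}(x_D)}$$

$\mathbf{Q}$ and $X_D$ are independent $\iff p_{\mathbf{X}}(\mathbf{x}) = p_{\mathbf{Q}}(\mathbf{q}) \, p_{X_D}(x_D)$

if $\mathbf{Q}$ and $X_D$ are independent then $p_{\mathbf{Q}|X_D}(\mathbf{q} \mid x_D) = p_{\mathbf{Q}}(\mathbf{q})$

the above definitions can be generalized to define the conditional PDF w.r.t. more variables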
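The same definitions carry over to PMFs, where they can be checked directly on small tables. The sketch below uses two hypothetical joint PMFs over binary variables, one dependent and one built as a product of marginals; all names and numbers are illustrative.

```python
def marginal(joint, axis):
    """Marginal PMF of the variable at position `axis` of a two-variable joint."""
    out = {}
    for x, p in joint.items():
        out[x[axis]] = out.get(x[axis], 0.0) + p
    return out

def conditional_first_given_second(joint, x2):
    """p(x1 | x2) = p(x1, x2) / p(x2), defined when p(x2) > 0."""
    p2 = marginal(joint, 1)[x2]
    return {x[0]: p / p2 for x, p in joint.items() if x[1] == x2}

def is_independent(joint, tol=1e-12):
    """Check p(x1, x2) = p(x1) p(x2); assumes the table lists every (x1, x2) pair."""
    p1, p2 = marginal(joint, 0), marginal(joint, 1)
    return all(abs(p - p1[a] * p2[b]) <= tol for (a, b), p in joint.items())

dependent = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}
independent = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}

print(conditional_first_given_second(dependent, 0))   # differs from the marginal of x1
print(marginal(dependent, 0))
print(is_independent(dependent), is_independent(independent))
```

For the independent table, conditioning on x2 leaves the distribution of x1 unchanged, which is the last property stated above.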


Covariance

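covariance of two RVs X and Y

$$\mathrm{cov}[X, Y] \triangleq E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y] \quad (= \mathrm{cov}[Y, X])$$

if $\mathbf{x} \in \mathbb{R}^D$, the mean value is

$$E[\mathbf{x}] \triangleq \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \mathbf{x} \, p(\mathbf{x}) \, d\mathbf{x} = \begin{pmatrix} E[x_1] \\ \vdots \\ E[x_D] \end{pmatrix} \in \mathbb{R}^D$$

if $\mathbf{x} \in \mathbb{R}^D$, the covariance matrix is

$$\mathrm{cov}[\mathbf{x}] \triangleq E[(\mathbf{x} - E[\mathbf{x}])(\mathbf{x} - E[\mathbf{x}])^T] = \begin{pmatrix} \mathrm{var}[X_1] & \mathrm{cov}[X_1, X_2] & \cdots & \mathrm{cov}[X_1, X_D] \\ \mathrm{cov}[X_2, X_1] & \mathrm{var}[X_2] & \cdots & \mathrm{cov}[X_2, X_D] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}[X_D, X_1] & \mathrm{cov}[X_D, X_2] & \cdots & \mathrm{var}[X_D] \end{pmatrix} \in \mathbb{R}^{D \times D}$$

N.B.: $\mathrm{cov}[\mathbf{x}] = \mathrm{cov}[\mathbf{x}]^T$ and $\mathrm{cov}[\mathbf{x}] \succeq 0$ (positive semidefinite)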
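The covariance matrix definition translates directly into code once expectations are replaced by sample averages. The sketch below is an illustration on made-up 2D data and divides by n (a plug-in estimate of the expectation rather than the unbiased n − 1 version); function names are hypothetical.

```python
def mean_vector(samples):
    n, d = len(samples), len(samples[0])
    return [sum(s[j] for s in samples) / n for j in range(d)]

def covariance_matrix(samples):
    """Plug-in estimate of E[(x - E[x])(x - E[x])^T], dividing by n."""
    n, d = len(samples), len(samples[0])
    mu = mean_vector(samples)
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
             for j in range(d)]
            for i in range(d)]

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]   # made-up samples
cov = covariance_matrix(data)
for row in cov:
    print([round(v, 4) for v in row])
```

The off-diagonal entries coincide (symmetry), and for a 2 x 2 matrix positive semidefiniteness amounts to non-negative diagonal entries and a non-negative determinant.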


Correlation

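correlation coefficient of two RVs X and Y

$$\mathrm{corr}[X, Y] \triangleq \frac{\mathrm{cov}[X, Y]}{\sqrt{\mathrm{var}[X] \, \mathrm{var}[Y]}}$$

it can be shown that $-1 \leq \mathrm{corr}[X, Y] \leq 1$ (homework¹)

$\mathrm{corr}[X, Y] = 0 \iff \mathrm{cov}[X, Y] = 0$

if $\mathbf{x} \in \mathbb{R}^D$, its correlation matrix is

$$\mathrm{corr}[\mathbf{x}] \triangleq \begin{pmatrix} \mathrm{corr}[X_1, X_1] & \mathrm{corr}[X_1, X_2] & \cdots & \mathrm{corr}[X_1, X_D] \\ \mathrm{corr}[X_2, X_1] & \mathrm{corr}[X_2, X_2] & \cdots & \mathrm{corr}[X_2, X_D] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{corr}[X_D, X_1] & \mathrm{corr}[X_D, X_2] & \cdots & \mathrm{corr}[X_D, X_D] \end{pmatrix} \in \mathbb{R}^{D \times D}$$

N.B.: $\mathrm{corr}[\mathbf{x}] = \mathrm{corr}[\mathbf{x}]^T$

¹ use the fact that $\left( \int f(t) g(t) \, dt \right)^2 \leq \int f^2 \, dt \int g^2 \, dt$ (Cauchy-Schwarz inequality)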


Correlation and Independence

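Property 1: there is a linear relationship between X and Y iff $|\mathrm{corr}[X, Y]| = 1$, i.e.

$$|\mathrm{corr}[X, Y]| = 1 \iff Y = aX + b \text{ with } a \neq 0$$

($\mathrm{corr}[X, Y] = 1$ if $a > 0$ and $\mathrm{corr}[X, Y] = -1$ if $a < 0$)

the correlation coefficient represents a degree of linear relationship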


Correlation and Independence

Property 2: if X and Y are independent ($p(X, Y) = p(X)p(Y)$) then $\mathrm{cov}[X, Y] = 0$ and $\mathrm{corr}[X, Y] = 0$ (homework), i.e.

$$X \perp Y \implies \mathrm{corr}[X, Y] = 0$$

Property 3: $\mathrm{corr}[X, Y] = 0 \;\not\!\!\implies X \perp Y$

example: with $X \sim U(-1, 1)$ and $Y = X^2$ (quadratic dependency) one has $\mathrm{corr}[X, Y] = 0$ (homework)

[Figure: other examples where $\mathrm{corr}[X, Y] = 0$ but there is a clear dependence between X and Y]

a more general measure of dependence is the mutual information
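The quadratic example can be checked by simulation, without giving away the analytic homework. The sketch below draws X from U(−1, 1), then compares the sample correlation of X with X² (fully dependent, yet uncorrelated) and with a linear function of X. The sample size, seed, and the coefficients −2 and 1 are arbitrary choices.

```python
import random

def corr(xs, ys):
    """Sample correlation coefficient cov / sqrt(var_x * var_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50000)]
quadratic = [x * x for x in xs]            # Y = X^2: deterministic function of X
linear = [-2.0 * x + 1.0 for x in xs]      # Y = aX + b with a < 0

print(f"corr[X, X^2]     = {corr(xs, quadratic):+.4f}")
print(f"corr[X, -2X + 1] = {corr(xs, linear):+.4f}")
```

The first value is close to 0 even though Y is completely determined by X; the second is −1 because the relationship is linear with negative slope.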


Multivariate Gaussian (Normal) Distribution

X is a continuous RV with values x ∈ RD

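$\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, i.e. X has a Multivariate Normal distribution (MVN) or multivariate Gaussian

$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) \triangleq \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]$$

mean: $E[\mathbf{x}] = \boldsymbol{\mu}$

mode: $\boldsymbol{\mu}$

covariance matrix: $\mathrm{cov}[\mathbf{x}] = \boldsymbol{\Sigma} \in \mathbb{R}^{D \times D}$ where $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}^T$ and $\boldsymbol{\Sigma} \succeq 0$ (the density above requires $\boldsymbol{\Sigma}$ to be invertible, i.e. $\boldsymbol{\Sigma} \succ 0$)

precision matrix: $\boldsymbol{\Lambda} \triangleq \boldsymbol{\Sigma}^{-1}$

spherical (isotropic) covariance: $\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}_D$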
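For D = 2 the density can be evaluated with the closed-form inverse and determinant of a 2 x 2 matrix. The following sketch is an illustration under that restriction, not a general implementation; `mvn_pdf_2d` is a hypothetical helper, and the spherical case is compared with a product of two univariate Gaussians.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mvn_pdf_2d(x, mu, cov):
    """N(x | mu, cov) for D = 2, with cov given as ((s11, s12), (s21, s22))."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    # (x - mu)^T cov^{-1} (x - mu), using cov^{-1} = [[s22, -s12], [-s21, s11]] / det
    quad = (s22 * d0 * d0 - (s12 + s21) * d0 * d1 + s11 * d1 * d1) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

sigma2 = 0.25
x, mu = (0.3, -0.2), (0.0, 0.0)
spherical = mvn_pdf_2d(x, mu, ((sigma2, 0.0), (0.0, sigma2)))
product = gaussian_pdf(x[0], mu[0], sigma2) * gaussian_pdf(x[1], mu[1], sigma2)
print(f"spherical MVN: {spherical:.6f}, product of univariate PDFs: {product:.6f}")
```

With a spherical covariance the components are independent, so the joint density factorizes into the univariate densities.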


Multivariate Gaussian (Normal) Distribution: Bivariate Normal


Multivariate Student t Distribution

X is a continuous RV with values x ∈ RD

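$\mathbf{X} \sim \mathcal{T}(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu)$, i.e. X has a Multivariate Student t distribution

$$\mathcal{T}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) \triangleq \frac{\Gamma(\nu/2 + D/2)}{\Gamma(\nu/2)} \, \frac{|\boldsymbol{\Sigma}|^{-1/2}}{\nu^{D/2} \pi^{D/2}} \left[ 1 + \frac{1}{\nu} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]^{-\frac{\nu + D}{2}}$$

mean: $E[\mathbf{x}] = \boldsymbol{\mu}$

mode: $\boldsymbol{\mu}$

$\boldsymbol{\Sigma} = \boldsymbol{\Sigma}^T$ is now called the scale matrix

covariance matrix: $\mathrm{cov}[\mathbf{x}] = \frac{\nu}{\nu - 2} \boldsymbol{\Sigma}$ (defined if $\nu > 2$)

N.B.: this distribution is similar to the MVN but it is more robust w.r.t. outliers due to its fatter tails (see the previous slides about the univariate Student t distribution)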


Dirichlet Distribution

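X is a continuous RV with values $\mathbf{x} \in S_K$

probability simplex $S_K \triangleq \{ \mathbf{x} \in \mathbb{R}^K : 0 \leq x_i \leq 1, \; \sum_{i=1}^{K} x_i = 1 \}$

the vector $\mathbf{x} = (x_1, ..., x_K)$ can be used to represent a set of K probabilities

$\mathbf{X} \sim \mathrm{Dir}(\boldsymbol{\alpha})$, i.e. X has a Dirichlet distribution

$$\mathrm{Dir}(\mathbf{x} \mid \boldsymbol{\alpha}) \triangleq \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^{K} x_i^{\alpha_i - 1} \, \mathbb{I}(\mathbf{x} \in S_K)$$

where $\boldsymbol{\alpha} \in \mathbb{R}^K$ (with $\alpha_i > 0$) and $B(\boldsymbol{\alpha})$ is a generalization of the beta function to K variables²

$$B(\boldsymbol{\alpha}) = B(\alpha_1, ..., \alpha_K) \triangleq \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma(\alpha_0)}, \qquad \alpha_0 = \sum_{i=1}^{K} \alpha_i$$

$E[x_k] = \frac{\alpha_k}{\alpha_0}$, $\quad \mathrm{mode}[x_k] = \frac{\alpha_k - 1}{\alpha_0 - K}$, $\quad \mathrm{var}[x_k] = \frac{\alpha_k (\alpha_0 - \alpha_k)}{\alpha_0^2 (\alpha_0 + 1)}$

N.B.: this distribution is a multivariate generalization of the beta distribution

² see the slide about the gamma distribution for the definition of $\Gamma(\alpha)$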
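The statement that the Dirichlet generalizes the beta distribution can be checked on the moment formulas: for K = 2 and α = (a, b), the first component should have the mean and variance of Beta(a, b). The sketch below uses $\mathrm{var}[x_k] = \alpha_k(\alpha_0 - \alpha_k) / (\alpha_0^2(\alpha_0 + 1))$; the function names and the parameter values are illustrative.

```python
def dirichlet_mean(alpha):
    """E[x_k] = alpha_k / alpha_0 for each component k."""
    a0 = sum(alpha)
    return [a / a0 for a in alpha]

def dirichlet_var(alpha):
    """var[x_k] = alpha_k (alpha_0 - alpha_k) / (alpha_0^2 (alpha_0 + 1))."""
    a0 = sum(alpha)
    return [a * (a0 - a) / (a0 ** 2 * (a0 + 1.0)) for a in alpha]

def beta_mean(a, b):
    return a / (a + b)

def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1.0))

a, b = 2.0, 5.0
print(dirichlet_mean([a, b])[0], beta_mean(a, b))
print(dirichlet_var([a, b])[0], beta_var(a, b))
print(sum(dirichlet_mean([1.0, 2.0, 3.0])))   # component means sum to 1
```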


Credits

Kevin Murphy’s book
