Lectures prepared by: Elchanan Mossel and Elena Shvets
Berkeley
Stat 134, Fall 2005
Introduction to Probability
Follows Jim Pitman’s book:
Probability
Section 3.3
Histo 1: X = 2*Bin(300, 1/2) − 300; E[X] = 0. [Histogram over −50 to 50]
Histo 2: Y = 2*Bin(30, 1/2) − 30; E[Y] = 0. [Histogram over −50 to 50]
Histo 3: Z = 4*Bin(10, 1/4) − 10; E[Z] = 0. [Histogram over −50 to 50]
Histo 4: W = 0; E[W] = 0. [Histogram: all probability mass at 0]
A natural question:
• Is there a good parameter that allows us to distinguish between these distributions?
• Is there a way to measure the spread?
Variance and Standard Deviation
• The variance of X, denoted by Var(X), is the mean squared deviation of X from its expected value µ = E(X):
Var(X) = E[(X − µ)²].
• The standard deviation of X, denoted by SD(X), is the square root of the variance of X:
SD(X) = √Var(X).
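A minimal sketch (mine, not from the lecture) of how Var(X) and SD(X) can be computed directly from a discrete distribution given as a value-to-probability table; the fair die is an assumed example:

```python
from math import sqrt

def mean(dist):
    # E(X) = sum over x of x * P(X = x)
    return sum(x * p for x, p in dist.items())

def variance(dist):
    # Var(X) = E[(X - mu)^2]
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

die = {x: 1/6 for x in range(1, 7)}   # uniform on {1,...,6}
print(variance(die))                  # 35/12 ≈ 2.9167
print(sqrt(variance(die)))            # SD(X) ≈ 1.7078
```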
Computational Formula for Variance
Claim: Var(X) = E(X²) − E(X)².
Proof:
E[(X − µ)²] = E[X² − 2µX + µ²]
            = E[X²] − 2µ E[X] + µ²
            = E[X²] − 2µ² + µ²
            = E[X²] − E[X]².
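As a quick numeric sanity check (my addition, not the lecture's), both sides of the computational formula agree on the fair die:

```python
die = {x: 1/6 for x in range(1, 7)}
mu = sum(x * p for x, p in die.items())
lhs = sum((x - mu) ** 2 * p for x, p in die.items())    # E[(X - mu)^2]
rhs = sum(x * x * p for x, p in die.items()) - mu ** 2  # E[X^2] - E[X]^2
print(lhs, rhs)  # both 35/12 ≈ 2.9167
```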
Properties of Variance and SD
1. Claim: Var(X) ≥ 0.
   Pf: Var(X) = ∑ (x − µ)² P(X = x) ≥ 0.
2. Claim: Var(X) = 0 iff P[X = µ] = 1.
Variance and SD
For a general distribution, Chebyshev's inequality states that every random variable X is likely to be close to E(X), give or take a few multiples of SD(X).
Chebyshev's Inequality:
For every random variable X and all k > 0:
P(|X − E(X)| ≥ k SD(X)) ≤ 1/k².
Chebyshev's Inequality
P(|X − E(X)| ≥ k SD(X)) ≤ 1/k²
Proof:
• Let µ = E(X) and σ = SD(X).
• Observe that |X − µ| ≥ kσ ⇔ |X − µ|² ≥ k²σ².
• The RV |X − µ|² is non-negative, so we can use Markov's inequality:
P(|X − µ|² ≥ k²σ²) ≤ E[|X − µ|²] / (k²σ²) = σ² / (k²σ²) = 1/k².
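A small simulation sketch (my addition; the Binomial(100, 1/2) example is assumed, not from the slides) comparing the actual tail probability with the Chebyshev bound:

```python
import random

n, p, k, trials = 100, 0.5, 2, 20_000
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5  # mean and SD of Bin(n, p)

# Empirical frequency of |X - mu| >= k*sigma over many simulated draws.
tail = sum(
    abs(sum(random.random() < p for _ in range(n)) - mu) >= k * sigma
    for _ in range(trials)
) / trials
print(tail, "<=", 1 / k**2)  # e.g. ~0.06, comfortably below the bound 0.25
```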
Variance of Indicators
Suppose IA is an indicator of an event A with probability p. Observe that IA² = IA.
[Diagram: on A, IA = 1 = IA²; on Ac, IA = 0 = IA².]
E(IA²) = E(IA) = P(A) = p, so:
Var(IA) = E(IA²) − E(IA)² = p − p² = p(1 − p).
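A quick empirical check (a sketch of mine; p = 0.3 is an arbitrary assumed value): the sample variance of an indicator approaches p(1 − p):

```python
import random

p, trials = 0.3, 100_000
samples = [1 if random.random() < p else 0 for _ in range(trials)]
m = sum(samples) / trials
var = sum((s - m) ** 2 for s in samples) / trials
print(var, "vs", p * (1 - p))  # both ≈ 0.21
```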
Variance of a Sum of Independent Random Variables
Claim: if X1, X2, …, Xn are independent, then:
Var(X1 + X2 + … + Xn) = Var(X1) + Var(X2) + … + Var(Xn).
Pf: It suffices to prove the claim for 2 random variables.
E[(X + Y − E(X+Y))²] = E[(X − E(X) + Y − E(Y))²]
= E[(X − E(X))²] + 2 E[(X − E(X))(Y − E(Y))] + E[(Y − E(Y))²]
= Var(X) + Var(Y) + 2 E[X − E(X)] E[Y − E(Y)]   (multiplication rule for independent RVs)
= Var(X) + Var(Y) + 0, since E[X − E(X)] = 0.
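A simulation sketch (mine, with two fair dice as the assumed example): the empirical variance of the sum matches Var(X1) + Var(X2) = 2 · 35/12:

```python
import random

trials = 100_000
sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(trials)]
m = sum(sums) / trials
var = sum((s - m) ** 2 for s in sums) / trials
print(var, "vs", 2 * 35 / 12)  # both ≈ 5.83
```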
Variance and Mean under Scaling and Shifts
• Claim: SD(aX + b) = |a| SD(X).
• Proof:
Var[aX + b] = E[(aX + b − aµ − b)²] = E[a²(X − µ)²] = a²σ².
Taking square roots gives SD(aX + b) = |a|σ.
• Corollary: If a random variable X has E(X) = µ and SD(X) = σ > 0, then X* = (X − µ)/σ has E(X*) = 0 and SD(X*) = 1.
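A sketch of the corollary in code (my addition; the Gaussian sample with µ = 10, σ = 3 is an assumed example): standardizing brings the sample mean and SD to roughly 0 and 1:

```python
import random

xs = [random.gauss(10, 3) for _ in range(100_000)]  # assumed mu=10, sigma=3
mu = sum(xs) / len(xs)
sd = (sum((x - mu) ** 2 for x in xs) / len(xs)) ** 0.5
zs = [(x - mu) / sd for x in xs]  # X* = (X - mu) / sigma
zm = sum(zs) / len(zs)
zsd = (sum((z - zm) ** 2 for z in zs) / len(zs)) ** 0.5
print(round(zm, 3), round(zsd, 3))  # ≈ 0.0 and 1.0
```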
Square Root Law
Let X1, X2, …, Xn be independent random variables with the same distribution as X, and let Sn be their sum:
Sn = ∑i=1..n Xi,
and let X̄n = Sn/n be their average. Then:
E(Sn) = nµ and SD(Sn) = σ√n;
E(X̄n) = µ and SD(X̄n) = σ/√n,
where µ = E(X) and σ = SD(X).
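A simulation sketch (mine; fair-coin indicators with σ = 1/2 are the assumed example) showing SD(Sn) growing like √n:

```python
import random

def sd_of_sum(n, trials=20_000):
    # Empirical SD of the sum of n independent fair-coin indicators.
    sums = [sum(random.randint(0, 1) for _ in range(n)) for _ in range(trials)]
    m = sum(sums) / trials
    return (sum((s - m) ** 2 for s in sums) / trials) ** 0.5

for n in (25, 100, 400):
    print(n, round(sd_of_sum(n), 2), "theory:", 0.5 * n ** 0.5)
# quadrupling n doubles the SD: roughly 2.5, 5.0, 10.0
```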
Weak Law of Large Numbers
Thm: Let X1, X2, … be a sequence of independent random variables with the same distribution. Let µ denote the common expected value µ = E(Xi), and let
X̄n = (X1 + X2 + … + Xn)/n.
Then for every ε > 0:
P(|X̄n − µ| < ε) → 1 as n → ∞.
Weak Law of Large Numbers
Proof: Let µ = E(Xi) and σ = SD(Xi). Then from the square root law we have:
E(X̄n) = µ and SD(X̄n) = σ/√n.
Now Chebyshev's inequality gives us:
P(|X̄n − µ| ≥ ε) = P(|X̄n − µ| ≥ (ε√n/σ) · (σ/√n)) ≤ σ²/(nε²).
For a fixed ε, the right-hand side tends to 0 as n tends to ∞.
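A sketch illustrating the theorem (my addition; die rolls with µ = 3.5 are the assumed example): the running average settles near µ as n grows:

```python
import random

rolls = []
for n in (10, 100, 1_000, 10_000, 100_000):
    while len(rolls) < n:
        rolls.append(random.randint(1, 6))
    print(n, sum(rolls) / n)  # running average approaches mu = 3.5
```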
The Normal Approximation
• Let Sn = X1 + … + Xn be the sum of n independent random variables with the same distribution.
• Then for large n, the distribution of Sn is approximately normal with mean E(Sn) = nµ and SD(Sn) = σ√n, where µ = E(Xi) and σ = SD(Xi).
In other words, for the standardized sum:
P(a ≤ (Sn − nµ)/(σ√n) ≤ b) ≈ Φ(b) − Φ(a).
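A simulation sketch (mine; 100 die rolls are the assumed example): the probability that Sn falls below its mean plus one SD is close to Φ(1) ≈ 0.84:

```python
import random
from math import sqrt

n, trials = 100, 20_000
mu, sigma = 3.5, sqrt(35 / 12)      # mean and SD of a single die roll
cutoff = n * mu + sigma * sqrt(n)   # one SD above E(Sn)
hits = sum(
    sum(random.randint(1, 6) for _ in range(n)) <= cutoff
    for _ in range(trials)
)
print(hits / trials)  # ≈ 0.84 ≈ Φ(1)
```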
Sums of Repeated Independent Random Variables
Suppose Xi represents the number obtained on the i-th roll of a die. Then Xi has a uniform distribution on the set {1, 2, 3, 4, 5, 6}.
Distribution of X1: [histogram over values 1–6, each with probability 1/6]
Sum of Two Dice
We can obtain the distribution of S2 = X1 + X2 by the convolution formula:
P(S2 = k) = ∑i=1..k−1 P(X1 = i) P(X2 = k − i | X1 = i)
          = ∑i=1..k−1 P(X1 = i) P(X2 = k − i),   by independence.
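The convolution formula translates directly into code; this sketch (mine, not the lecture's) computes the distribution of a sum of two independent discrete random variables, and iterating it gives the four-dice sum on the next slide:

```python
def convolve(dist1, dist2):
    # P(S = k) = sum over i of P(X1 = i) * P(X2 = k - i), by independence
    out = {}
    for i, p1 in dist1.items():
        for j, p2 in dist2.items():
            out[i + j] = out.get(i + j, 0) + p1 * p2
    return out

die = {x: 1/6 for x in range(1, 7)}
s2 = convolve(die, die)
print(s2[7])                 # 6/36 ≈ 0.167, the most likely total of two dice
s4 = convolve(s2, s2)        # S4 = S2 + S2', as on the next slide
print(max(s4, key=s4.get))   # 14, the mode for four dice
```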
Distribution of S2: [histogram over values 2–12]
Sum of Four Dice
We can obtain the distribution of S4 = X1 + X2 + X3 + X4 = S2 + S′2 again by the convolution formula:
P(S4 = k) = ∑i=1..k−1 P(S2 = i) P(S′2 = k − i | S2 = i)
          = ∑i=1..k−1 P(S2 = i) P(S′2 = k − i),   by independence of S2 and S′2.
Distribution of S4: [histogram over values 4–24]
Distribution of S8: [histogram over values 8–48]
Distribution of S16: [histogram over values 16–96]
Distribution of S32: [histogram over values 32–192]
Distribution of X1: [histogram over values 1–3]
Distribution of S2: [histogram over values 2–6]
Distribution of S4: [histogram over values 4–12]
Distribution of S8: [histogram over values 8–24]
Distribution of S16: [histogram over values 16–48]
Distribution of S32: [histogram over values 32–92]
Distribution of X1: [histogram over values 0–5]
Distribution of S2: [histogram over values 0–10]
Distribution of S4: [histogram over values 0–20]
Distribution of S8: [histogram over values 0–40]
Distribution of S16: [histogram over values 0–80]
Distribution of S32: [histogram over values 0–160]