Lectures prepared by: Elchanan Mossel and Elena Shvets
Berkeley
Stat 134, Fall 2005
Introduction to Probability
Follows Jim Pitman’s book:
Probability
Section 3.3
Histo 1: X = 2*Bin(300, 1/2) − 300; E[X] = 0. [Histogram over −50 to 50]
Histo 2: Y = 2*Bin(30, 1/2) − 30; E[Y] = 0. [Histogram over −50 to 50]
Histo 3: Z = 4*Bin(10, 1/4) − 10; E[Z] = 0. [Histogram over −50 to 50]
Histo 4: W = 0; E[W] = 0. [Histogram: all probability mass at 0]
A natural question:
• Is there a good parameter that allows us to distinguish between these distributions?
• Is there a way to measure the spread?
Variance and Standard Deviation
• The variance of X, denoted by Var(X), is the mean squared deviation of X from its expected value µ = E(X):
Var(X) = E[(X − µ)²].
• The standard deviation of X, denoted by SD(X), is the square root of the variance of X:
SD(X) = √Var(X).
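A minimal sketch (mine, not from the lecture) of how Var(X) and SD(X) can be computed directly from a discrete distribution given as a value-to-probability table; the fair die is an assumed example:

```python
from math import sqrt

def mean(dist):
    # E(X) = sum over x of x * P(X = x)
    return sum(x * p for x, p in dist.items())

def variance(dist):
    # Var(X) = E[(X - mu)^2]
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

die = {x: 1/6 for x in range(1, 7)}   # uniform on {1,...,6}
print(variance(die))                  # 35/12 ≈ 2.9167
print(sqrt(variance(die)))            # SD(X) ≈ 1.7078
```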
Computational Formula for Variance
Claim: Var(X) = E(X²) − E(X)².
Proof:
E[(X − µ)²] = E[X² − 2µX + µ²]
            = E[X²] − 2µ E[X] + µ²
            = E[X²] − 2µ² + µ²
            = E[X²] − E[X]².
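As a quick numeric sanity check (my addition, not the lecture's), both sides of the computational formula agree on the fair die:

```python
die = {x: 1/6 for x in range(1, 7)}
mu = sum(x * p for x, p in die.items())
lhs = sum((x - mu) ** 2 * p for x, p in die.items())    # E[(X - mu)^2]
rhs = sum(x * x * p for x, p in die.items()) - mu ** 2  # E[X^2] - E[X]^2
print(lhs, rhs)  # both 35/12 ≈ 2.9167
```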
Properties of Variance and SD
1. Claim: Var(X) ≥ 0.
   Pf: Var(X) = ∑ (x − µ)² P(X = x) ≥ 0.
2. Claim: Var(X) = 0 iff P[X = µ] = 1.
Variance and SD
For a general distribution, Chebyshev's inequality states that every random variable X is likely to be close to E(X), give or take a few multiples of SD(X).
Chebyshev's Inequality:
For every random variable X and all k > 0:
P(|X − E(X)| ≥ k SD(X)) ≤ 1/k².
Chebyshev's Inequality
P(|X − E(X)| ≥ k SD(X)) ≤ 1/k²
Proof:
• Let µ = E(X) and σ = SD(X).
• Observe that |X − µ| ≥ kσ ⇔ |X − µ|² ≥ k²σ².
• The RV |X − µ|² is non-negative, so we can use Markov's inequality:
P(|X − µ|² ≥ k²σ²) ≤ E[|X − µ|²] / (k²σ²) = σ² / (k²σ²) = 1/k².
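A small simulation sketch (my addition; the Binomial(100, 1/2) example is assumed, not from the slides) comparing the actual tail probability with the Chebyshev bound:

```python
import random

n, p, k, trials = 100, 0.5, 2, 20_000
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5  # mean and SD of Bin(n, p)

# Empirical frequency of |X - mu| >= k*sigma over many simulated draws.
tail = sum(
    abs(sum(random.random() < p for _ in range(n)) - mu) >= k * sigma
    for _ in range(trials)
) / trials
print(tail, "<=", 1 / k**2)  # e.g. ~0.06, comfortably below the bound 0.25
```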
Variance of Indicators
Suppose IA is an indicator of an event A with probability p. Observe that IA² = IA.
[Diagram: on A, IA = 1 = IA²; on Ac, IA = 0 = IA².]
E(IA²) = E(IA) = P(A) = p, so:
Var(IA) = E(IA²) − E(IA)² = p − p² = p(1 − p).
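A quick empirical check (a sketch of mine; p = 0.3 is an arbitrary assumed value): the sample variance of an indicator approaches p(1 − p):

```python
import random

p, trials = 0.3, 100_000
samples = [1 if random.random() < p else 0 for _ in range(trials)]
m = sum(samples) / trials
var = sum((s - m) ** 2 for s in samples) / trials
print(var, "vs", p * (1 - p))  # both ≈ 0.21
```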
Variance of a Sum of Independent Random Variables
Claim: if X1, X2, …, Xn are independent, then:
Var(X1 + X2 + … + Xn) = Var(X1) + Var(X2) + … + Var(Xn).
Pf: It suffices to prove the claim for 2 random variables.
E[(X + Y − E(X+Y))²] = E[(X − E(X) + Y − E(Y))²]
= E[(X − E(X))²] + 2 E[(X − E(X))(Y − E(Y))] + E[(Y − E(Y))²]
= Var(X) + Var(Y) + 2 E[X − E(X)] E[Y − E(Y)]   (multiplication rule for independent RVs)
= Var(X) + Var(Y) + 0, since E[X − E(X)] = 0.
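A simulation sketch (mine, with two fair dice as the assumed example): the empirical variance of the sum matches Var(X1) + Var(X2) = 2 · 35/12:

```python
import random

trials = 100_000
sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(trials)]
m = sum(sums) / trials
var = sum((s - m) ** 2 for s in sums) / trials
print(var, "vs", 2 * 35 / 12)  # both ≈ 5.83
```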
Variance and Mean under Scaling and Shifts
• Claim: SD(aX + b) = |a| SD(X).
• Proof:
Var[aX + b] = E[(aX + b − aµ − b)²] = E[a²(X − µ)²] = a²σ².
Taking square roots gives SD(aX + b) = |a|σ.
• Corollary: If a random variable X has E(X) = µ and SD(X) = σ > 0, then X* = (X − µ)/σ has E(X*) = 0 and SD(X*) = 1.
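A sketch of the corollary in code (my addition; the Gaussian sample with µ = 10, σ = 3 is an assumed example): standardizing brings the sample mean and SD to roughly 0 and 1:

```python
import random

xs = [random.gauss(10, 3) for _ in range(100_000)]  # assumed mu=10, sigma=3
mu = sum(xs) / len(xs)
sd = (sum((x - mu) ** 2 for x in xs) / len(xs)) ** 0.5
zs = [(x - mu) / sd for x in xs]  # X* = (X - mu) / sigma
zm = sum(zs) / len(zs)
zsd = (sum((z - zm) ** 2 for z in zs) / len(zs)) ** 0.5
print(round(zm, 3), round(zsd, 3))  # ≈ 0.0 and 1.0
```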
Square Root Law
Let X1, X2, …, Xn be independent random variables with the same distribution as X, and let Sn be their sum:
Sn = ∑i=1..n Xi,
and let X̄n = Sn/n be their average. Then:
E(Sn) = nµ and SD(Sn) = σ√n;
E(X̄n) = µ and SD(X̄n) = σ/√n,
where µ = E(X) and σ = SD(X).
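A simulation sketch (mine; fair-coin indicators with σ = 1/2 are the assumed example) showing SD(Sn) growing like √n:

```python
import random

def sd_of_sum(n, trials=20_000):
    # Empirical SD of the sum of n independent fair-coin indicators.
    sums = [sum(random.randint(0, 1) for _ in range(n)) for _ in range(trials)]
    m = sum(sums) / trials
    return (sum((s - m) ** 2 for s in sums) / trials) ** 0.5

for n in (25, 100, 400):
    print(n, round(sd_of_sum(n), 2), "theory:", 0.5 * n ** 0.5)
# quadrupling n doubles the SD: roughly 2.5, 5.0, 10.0
```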
Weak Law of Large Numbers
Thm: Let X1, X2, … be a sequence of independent random variables with the same distribution. Let µ denote the common expected value µ = E(Xi), and let
X̄n = (X1 + X2 + … + Xn)/n.
Then for every ε > 0:
P(|X̄n − µ| < ε) → 1 as n → ∞.
Weak Law of Large Numbers
Proof: Let µ = E(Xi) and σ = SD(Xi). Then from the square root law we have:
E(X̄n) = µ and SD(X̄n) = σ/√n.
Now Chebyshev's inequality gives us:
P(|X̄n − µ| ≥ ε) = P(|X̄n − µ| ≥ (ε√n/σ) · (σ/√n)) ≤ σ²/(nε²).
For a fixed ε, the right-hand side tends to 0 as n tends to ∞.
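A sketch illustrating the theorem (my addition; die rolls with µ = 3.5 are the assumed example): the running average settles near µ as n grows:

```python
import random

rolls = []
for n in (10, 100, 1_000, 10_000, 100_000):
    while len(rolls) < n:
        rolls.append(random.randint(1, 6))
    print(n, sum(rolls) / n)  # running average approaches mu = 3.5
```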
The Normal Approximation
• Let Sn = X1 + … + Xn be the sum of n independent random variables with the same distribution.
• Then for large n, the distribution of Sn is approximately normal with mean E(Sn) = nµ and SD(Sn) = σ√n, where µ = E(Xi) and σ = SD(Xi).
In other words, for the standardized sum:
P(a ≤ (Sn − nµ)/(σ√n) ≤ b) ≈ Φ(b) − Φ(a).
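A simulation sketch (mine; 100 die rolls are the assumed example): the probability that Sn falls below its mean plus one SD is close to Φ(1) ≈ 0.84:

```python
import random
from math import sqrt

n, trials = 100, 20_000
mu, sigma = 3.5, sqrt(35 / 12)      # mean and SD of a single die roll
cutoff = n * mu + sigma * sqrt(n)   # one SD above E(Sn)
hits = sum(
    sum(random.randint(1, 6) for _ in range(n)) <= cutoff
    for _ in range(trials)
)
print(hits / trials)  # ≈ 0.84 ≈ Φ(1)
```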
Sums of Repeated Independent Random Variables
Suppose Xi represents the number obtained on the i-th roll of a die. Then Xi has a uniform distribution on the set {1, 2, 3, 4, 5, 6}.
Distribution of X1: [histogram over values 1–6, each with probability 1/6]
Sum of Two Dice
We can obtain the distribution of S2 = X1 + X2 by the convolution formula:
P(S2 = k) = ∑i=1..k−1 P(X1 = i) P(X2 = k − i | X1 = i)
          = ∑i=1..k−1 P(X1 = i) P(X2 = k − i),   by independence.
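The convolution formula translates directly into code; this sketch (mine, not the lecture's) computes the distribution of a sum of two independent discrete random variables, and iterating it gives the four-dice sum on the next slide:

```python
def convolve(dist1, dist2):
    # P(S = k) = sum over i of P(X1 = i) * P(X2 = k - i), by independence
    out = {}
    for i, p1 in dist1.items():
        for j, p2 in dist2.items():
            out[i + j] = out.get(i + j, 0) + p1 * p2
    return out

die = {x: 1/6 for x in range(1, 7)}
s2 = convolve(die, die)
print(s2[7])                 # 6/36 ≈ 0.167, the most likely total of two dice
s4 = convolve(s2, s2)        # S4 = S2 + S2', as on the next slide
print(max(s4, key=s4.get))   # 14, the mode for four dice
```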
Distribution of S2: [histogram over values 2–12]
Sum of Four Dice
We can obtain the distribution of S4 = X1 + X2 + X3 + X4 = S2 + S′2 again by the convolution formula:
P(S4 = k) = ∑i=1..k−1 P(S2 = i) P(S′2 = k − i | S2 = i)
          = ∑i=1..k−1 P(S2 = i) P(S′2 = k − i),   by independence of S2 and S′2.
Distribution of S4: [histogram over values 4–24]
Distribution of S8: [histogram over values 8–48]
Distribution of S16: [histogram over values 16–96]
Distribution of S32: [histogram over values 32–192]
Distribution of X1: [histogram over values 1–3]
Distribution of S2: [histogram over values 2–6]
Distribution of S4: [histogram over values 4–12]
Distribution of S8: [histogram over values 8–24]
Distribution of S16: [histogram over values 16–48]
Distribution of S32: [histogram over values 32–92]
Distribution of X1: [histogram over values 0–5]
Distribution of S2: [histogram over values 0–10]
Distribution of S4: [histogram over values 0–20]
Distribution of S8: [histogram over values 0–40]
Distribution of S16: [histogram over values 0–80]
Distribution of S32: [histogram over values 0–160]