
STAT:5100 (22S:193) Statistical Inference I
Week 13

Luke Tierney

University of Iowa

Fall 2015


Monday, November 16, 2015

Recap

• Normal populations

• Order statistics

• Marginal CDF and density of an order statistic

• Little “o” and big “O” notation


Monday, November 16, 2015 Order Statistics

Example

• A N(µ, σ²) population has both mean µ and median µ.
• Either the sample mean $\bar X_n$ or the sample median $\tilde X_n$ could be used to estimate µ.
• Which would produce a better estimate?
• We can explore this question using both simulation and theory; a simulation sketch follows below.
• Some R code: http://www.stat.uiowa.edu/~luke/classes/193/median.R.
• The standard deviation of the sample median seems to satisfy
$$\mathrm{SD}(\tilde X_n) \approx \frac{1.25\,\sigma}{\sqrt{n}}$$
• Many statistics have sampling distributions that follow this square root relationship.
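The median.R script linked above is not reproduced here; the following is a minimal sketch in the same spirit (the sample size, replication count, and seed are arbitrary choices):

```r
## Compare the sampling variability of the mean and the median for N(0,1)
## data; a sketch in the spirit of median.R with arbitrary n, nrep, seed.
set.seed(42)
n <- 100        # sample size
nrep <- 10000   # number of simulated samples
means   <- replicate(nrep, mean(rnorm(n)))
medians <- replicate(nrep, median(rnorm(n)))
sd(means)               # close to 1 / sqrt(n) = 0.1
sd(medians)             # close to 1.25 / sqrt(n) = 0.125
sd(medians) / sd(means) # ratio near 1.25
```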


Monday, November 16, 2015 Order Statistics

Example (continued)

• Suppose we are considering estimating µ with $\bar X$ using a sample of size n.
• What would be the equivalent sample size $n_E$ we would need to achieve the same accuracy using the median?
• We need to solve the equation
$$\mathrm{Var}(\bar X_n) = \mathrm{Var}(\tilde X_{n_E})$$
or
$$\frac{\sigma^2}{n} \approx \frac{(1.25)^2\,\sigma^2}{n_E}$$
• The solution is $n_E \approx (1.25)^2 n = 1.5625\,n$.
• The ratio $n/n_E \approx 1/1.5625 = 0.64$ is called the relative efficiency of $\tilde X$ to $\bar X$.
• $\tilde X$ is less efficient than $\bar X$ if the data really are normally distributed.
• But $\tilde X$ is much more robust to outliers than $\bar X$.


Monday, November 16, 2015 Approximations and Limits

Approximations and Limits

• If we can't say anything precise about a sampling distribution, we often look for approximations.
• Approximations are usually stated as limit results.
• This is common in mathematics, for example:
  • The statement "f is differentiable at x∗" means
  $$\lim_{x \to x^*} \frac{f(x) - f(x^*)}{x - x^*} = f'(x^*)$$
  • This can also be expressed as
  $$f(x) = f(x^*) + f'(x^*)(x - x^*) + o(x - x^*)$$
  as x → x∗.
  • This suggests the linear approximation
  $$f(x) \approx f(x^*) + f'(x^*)(x - x^*)$$
  for x close to x∗; a numeric illustration follows below.
• Care and experience are needed in interpreting "≈" and "close to."
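To make the o(x − x∗) term concrete, here is a small numeric check; the choice f(x) = exp(x) and x∗ = 0 is arbitrary:

```r
## The linear approximation error of f(x) = exp(x) at x* = 0 shrinks
## faster than the step h itself; f and x* are arbitrary choices.
f <- exp
xstar <- 0
for (h in 10^-(1:5)) {
  err <- f(xstar + h) - (f(xstar) + exp(xstar) * h)  # f'(x) = exp(x)
  cat(sprintf("h = %.0e  error = %.3e  error/h = %.3e\n", h, err, err / h))
}
## error/h -> 0 as h -> 0, i.e. the error is o(h).
```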


Monday, November 16, 2015 Approximations and Limits

• We will look at two kinds of convergence results:
  • Convergence of sequences of random variables
  • Convergence of sequences of probability distributions
• This will help with questions like:
  • Should $\bar X_n$ be close to µ for large samples?
  • Can the probability distribution of the error $\bar X_n - \mu$ be approximated by a normal distribution?
• These will be formalized as limits as n → ∞:
  • $\bar X_n$ converges to µ.
  • The distribution of
  $$\frac{\bar X_n - \mu}{\sigma/\sqrt{n}}$$
  converges to a standard normal distribution.


Monday, November 16, 2015 Approximations and Limits

• We will develop tools to help us answer questions like:
  • If Xn → X and Yn → Y, does this imply that Xn + Yn → X + Y?
  • If f is continuous and Xn → X, does this imply f(Xn) → f(X)?
  • If the distribution of
  $$\frac{\bar X_n - \mu}{\sigma/\sqrt{n}}$$
  converges to a N(0, 1) distribution and Sn → σ, can we conclude that the distribution of
  $$\frac{\bar X_n - \mu}{S_n/\sqrt{n}}$$
  also converges to a N(0, 1) distribution?


Monday, November 16, 2015 Convergence of Sequences of Random Variables

Convergence of Sequences of Random Variables

Examples

• Suppose we want to use a statistic Tn to estimate a parameter θ.
• To decide whether this makes sense, a minimal requirement might be that Tn → θ as n → ∞.
• This property is known as consistency.
• The Weak Law of Large Numbers is an example of such a result.
• In showing that an approximate distribution for $\sqrt{n}(\bar X_n - \mu)/\sigma$ can also be used as an approximate distribution for $\sqrt{n}(\bar X_n - \mu)/S_n$, a useful step is to show that
$$\frac{\bar X_n - \mu}{S_n/\sqrt{n}} - \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} = \frac{\bar X_n - \mu}{\sigma/\sqrt{n}}\left(\frac{\sigma}{S_n} - 1\right) \to 0$$


Monday, November 16, 2015 Convergence of Sequences of Random Variables

• Some simulations: http://www.stat.uiowa.edu/~luke/classes/193/convergence.R.
• X1, X2, . . . are i.i.d. Bernoulli(p) and
$$\hat P_n = \frac{\sum_{i=1}^n X_i}{n}.$$
• Almost all sample paths converge to p; a sketch of this sample-path picture follows below.
• This means
$$P(\hat P_n \to p) = P(\{s \in S : \hat P_n(s) \to p\}) = 1.$$
• This is called almost sure convergence.
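The convergence.R script linked above is not reproduced here; this is a minimal sketch of the sample-path picture (p, n, and the number of paths are arbitrary choices):

```r
## Running proportions of successes for several Bernoulli(p) sequences;
## a sketch in the spirit of convergence.R with arbitrary p, n, npaths.
set.seed(1)
p <- 0.3; n <- 5000; npaths <- 8
plot(NULL, xlim = c(1, n), ylim = c(0, 1), log = "x",
     xlab = "n", ylab = expression(hat(P)[n]))
for (j in 1:npaths)
  lines(cumsum(rbinom(n, size = 1, prob = p)) / seq_len(n))
abline(h = p, lty = 2)  # almost every path settles down at p
```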


Monday, November 16, 2015 Almost Sure Convergence

Almost Sure Convergence

Definition

A sequence X1, X2, . . . of random variables converges almost surely to a random variable X if
$$P\left(\lim_{n\to\infty} X_n = X\right) = 1$$
or
$$P\left(\left\{s \in S : \lim_{n\to\infty} X_n(s) = X(s)\right\}\right) = 1$$

Notation:
$$X_n \overset{\text{a.s.}}{\to} X$$
$$X_n \to X \quad \text{a.s.}$$
$$\operatorname{P\text{-}lim}_{n\to\infty} X_n = X$$


Monday, November 16, 2015 Almost Sure Convergence

Example

Theorem (Strong Law of Large Numbers)

Let X1, X2, . . . be i.i.d. with E[|X1|] < ∞ and µ = E[X1]. Then
$$\bar X_n \to \mu$$
almost surely.


Monday, November 16, 2015 Almost Sure Convergence

Example

• Let Z be a standard normal random variable.
• Define Zn as
$$Z_n = \frac{i}{n} \quad \text{if} \quad \frac{i - 0.5}{n} \le Z < \frac{i + 0.5}{n}$$
for all integers i.
• Using the notation {b} for the closest integer to b:
$$Z_n = \frac{1}{n}\{nZ\}.$$
• Then Zn → Z almost surely; a numeric illustration follows below:
  • For any number z define $z_n = \frac{1}{n}\{nz\}$.
  • Then $|z - z_n| \le \frac{1}{n} \to 0$; i.e. zn → z.
  • Therefore $Z_n(s) = \frac{1}{n}\{nZ(s)\} \to Z(s)$ for all s ∈ S.
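A quick numeric illustration of the discretization (the single simulated draw below is arbitrary):

```r
## Discretize one standard normal draw on a grid of width 1/n; the
## discretized values Zn = {nZ}/n converge to Z. The draw is arbitrary.
set.seed(7)
Z <- rnorm(1)
for (n in c(1, 10, 100, 1000)) {
  Zn <- round(n * Z) / n  # {b} = closest integer to b
  cat(sprintf("n = %4d  Zn = %.6f  |Zn - Z| = %.2e\n", n, Zn, abs(Zn - Z)))
}
## |Zn - Z| <= 1/n for every realization, so Zn -> Z.
```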


Wednesday, November 18, 2015

Recap

• Normal populations

• Approximations and limits — motivations from calculus

• Almost sure convergence

• Strong Law of Large Numbers


Wednesday, November 18, 2015 Almost Sure Convergence

Theorem
Suppose Xn → X almost surely and f is continuous. Then f(Xn) → f(X) almost surely.

Proof.

• Let A = {s ∈ S : Xn(s) → X(s)}.
• Since f is continuous, for any s ∈ A
$$f(X_n(s)) \to f(X(s))$$
• Therefore
$$\{s \in S : f(X_n(s)) \to f(X(s))\} \supset A$$
• So
$$P(f(X_n) \to f(X)) \ge P(A) = 1.$$


Wednesday, November 18, 2015 Almost Sure Convergence

Example

• Suppose X1, X2, . . . are independent draws from a population with finite mean µ and finite variance σ².
• The sample variance can be written as
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n-1}\left[\sum_{i=1}^n (X_i - \mu)^2 - n(\bar X_n - \mu)^2\right] = \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 - (\bar X_n - \mu)^2\right] = \frac{n}{n-1}\,U_n$$


Wednesday, November 18, 2015 Almost Sure Convergence

Example (continued)

• By the Strong Law of Large Numbers
$$\bar X_n \to \mu \quad \text{a.s.}$$
$$\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 \to \sigma^2 \quad \text{a.s.}$$
• Therefore
$$U_n = \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 - (\bar X_n - \mu)^2 \to \sigma^2 \quad \text{a.s.}$$
• So
$$S_n^2 = \frac{n}{n-1}\,U_n \to \sigma^2 \quad \text{a.s.}$$
• Since the square root is continuous, we also have
$$S_n = \sqrt{S_n^2} \to \sqrt{\sigma^2} = \sigma \quad \text{a.s.}$$
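A small simulation consistent with this result; the Exponential(1) population and the sample sizes are arbitrary choices:

```r
## Sn approaches sigma as n grows, here for an Exponential(1) population
## (sigma = 1); the population and the sample sizes are arbitrary.
set.seed(3)
for (n in c(10, 100, 10000, 1000000)) {
  x <- rexp(n)  # mean 1, variance 1
  cat(sprintf("n = %7d  Sn = %.4f\n", n, sd(x)))
}
## Sn settles down near sigma = 1.
```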


Wednesday, November 18, 2015 Almost Sure Convergence

• Almost sure convergence, if you can show that you have it, is the easiest form of convergence to work with.
• But almost sure convergence can be difficult to verify.
• It is also more than we need for many useful results.
• It is useful to look for other notions of convergence that may be easier to verify.
• One alternative is convergence in probability.


Wednesday, November 18, 2015 Convergence in Probability

Convergence in Probability

Definition

A sequence of random variables X1, X2, . . . converges in probability to a random variable X if for every ε > 0
$$\lim_{n\to\infty} P(|X_n - X| \ge \varepsilon) = 0$$
or
$$\lim_{n\to\infty} P(|X_n - X| < \varepsilon) = 1$$

Notation:
$$X_n \overset{P}{\to} X$$
$$\operatorname{plim}_{n\to\infty} X_n = X$$


Wednesday, November 18, 2015 Convergence in Probability

Examples

• Weak Law of Large Numbers: $\bar X_n \overset{P}{\to} \mu$.
• Suppose U1, U2, . . . are independent Uniform[0, 1] random variables.
• Let Xn = max{U1, . . . , Un} and let X ≡ 1.
• Then for any ε > 0
$$P(|X_n - X| \ge \varepsilon) = P(X_n \le 1 - \varepsilon) = \begin{cases} (1 - \varepsilon)^n & \text{if } \varepsilon < 1 \\ 0 & \text{otherwise} \end{cases} \;\to 0$$
• So $X_n \overset{P}{\to} 1$; a quick numeric check follows below.
• It is also true that Xn → 1 almost surely.
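A check of the simulated rate against the exact formula; ε, the sample sizes, and the replication count are arbitrary choices:

```r
## P(|Xn - 1| >= eps) = (1 - eps)^n for Xn = max of n Uniform[0,1] draws;
## compare simulation to the exact value. eps, n values, nrep are arbitrary.
set.seed(11)
eps <- 0.05
for (n in c(10, 50, 100, 200)) {
  sim <- mean(replicate(5000, max(runif(n)) <= 1 - eps))
  cat(sprintf("n = %3d  simulated = %.4f  exact = %.4f\n",
              n, sim, (1 - eps)^n))
}
```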


Wednesday, November 18, 2015 Convergence in Probability

• If Xn → X almost surely, then $X_n \overset{P}{\to} X$:
  • For any ε > 0
  $$P(|X_n - X| \ge \varepsilon) = E\left[\mathbf{1}_{\{|X_n - X| \ge \varepsilon\}}\right].$$
  • For any s ∈ S where Xn(s) → X(s) we have
  $$\mathbf{1}_{\{|X_n(s) - X(s)| \ge \varepsilon\}} \to 0.$$
  • So $\mathbf{1}_{\{|X_n - X| \ge \varepsilon\}} \to 0$ almost surely.
  • By the dominated convergence theorem this implies that
  $$P(|X_n - X| \ge \varepsilon) \to 0.$$


Wednesday, November 18, 2015 Convergence in Probability

• It is possible to have convergence in probability but not almost sure convergence.
• Let X1, X2, . . . be independent Bernoulli random variables with P(Xn = 1) = 1/n.
• For any ε > 0 with ε ≤ 1
$$P(|X_n| \ge \varepsilon) = \frac{1}{n} \to 0.$$
• So $X_n \overset{P}{\to} 0$.
• But for every n
$$P(\text{all of } X_n, X_{n+1}, \ldots \text{ are zero}) = \prod_{k=n}^{\infty}\left(1 - \frac{1}{k}\right) \le \exp\left\{-\sum_{k=n}^{\infty} \frac{1}{k}\right\} = 0$$
• So with probability one the sequence X1, X2, . . . contains infinitely many ones and cannot converge almost surely to zero; the sketch below shows this empirically.
• So almost sure convergence is stronger than convergence in probability.
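An empirical look at this behavior (the horizon values N are arbitrary): however far out we look, new ones keep appearing.

```r
## Simulate independent Xk ~ Bernoulli(1/k) and record the last index
## k <= N with Xk = 1; it keeps growing with the horizon N (arbitrary Ns).
set.seed(5)
for (N in c(100, 1000, 10000, 100000, 1000000)) {
  x <- rbinom(N, size = 1, prob = 1 / seq_len(N))
  cat(sprintf("N = %7d  last index with Xk = 1: %d\n",
              N, max(which(x == 1))))
}
```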


Wednesday, November 18, 2015 A Sufficient Condition

A Sufficient Condition

Suppose for some p ≥ 1 we have $E[|X_n|^p] < \infty$ for all n, $E[|X|^p] < \infty$, and
$$\lim_{n\to\infty} E[|X_n - X|^p] = 0.$$
Then $X_n \overset{P}{\to} X$.

Proof.
By Markov's inequality,
$$P(|X_n - X| \ge \varepsilon) = P(|X_n - X|^p \ge \varepsilon^p) \le \frac{E[|X_n - X|^p]}{\varepsilon^p} \to 0$$
for any ε > 0.


Wednesday, November 18, 2015 A Sufficient Condition

• If $E[|X_n - X|^p] \to 0$ then Xn is said to converge to X in Lp.
• Usually we use this for p = 2.
• This is called convergence in mean square.
• If the limit is a constant a then the sufficient condition becomes
$$E[(X_n - a)^2] = \mathrm{Var}(X_n) + (E[X_n] - a)^2 \to 0.$$
• This convergence holds if and only if both
$$\mathrm{Var}(X_n) \to 0 \quad\text{and}\quad E[X_n] \to a.$$
• If Xn is used to estimate a then:
  • $E[X_n] - a$ is called the bias of Xn;
  • $E[(X_n - a)^2]$ is the Mean Squared Error (MSE); a numeric check of its decomposition follows below.
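The bias-variance decomposition of the MSE is easy to check numerically; here the divisor-n variance estimator for N(0, 1) data serves as an arbitrary example of a biased estimator of a = σ² = 1:

```r
## Numeric check of MSE = Var + bias^2 for the divisor-n variance
## estimator of N(0,1) data, a biased estimator of a = 1.
set.seed(9)
n <- 20
est <- replicate(100000, { x <- rnorm(n); mean((x - mean(x))^2) })
mse    <- mean((est - 1)^2)
decomp <- var(est) + (mean(est) - 1)^2
c(mse = mse, var_plus_bias2 = decomp)  # agree up to Monte Carlo error
```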


Wednesday, November 18, 2015 Weak Law of Large Numbers

Weak Law of Large Numbers

Theorem
Let X1, X2, . . . be i.i.d. with mean µ and finite variance σ². Let $\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then
$$\bar X_n \overset{P}{\to} \mu.$$

Proof.
$$E[(\bar X_n - \mu)^2] = \mathrm{Var}(\bar X_n) = \frac{\sigma^2}{n} \to 0$$

This is sometimes called (weak) consistency of $\bar X_n$ for µ.


Wednesday, November 18, 2015 Distance and Convergence

Distance and Convergence

• One way to develop a notion of convergence for complicated objects, like random variables, is to define a distance between two objects.
• A distance is a function d(x, y) with these properties:
  • d(x, y) ≥ 0 for all x, y.
  • d(x, y) = d(y, x) for all x, y.
  • d(x, y) = 0 if and only if x = y.
  • d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z.
• A distance is also called a metric.
• A metric space is a set together with a distance.
• Convergence xn → x in a metric space means d(xn, x) → 0.


Wednesday, November 18, 2015 Distance and Convergence

Examples

• Lp convergence corresponds to convergence with respect to the distance
$$d(X_n, X) = E[|X_n - X|^p]^{1/p}.$$
• To satisfy the requirement that d(X, Y) = 0 implies X = Y we need to work in terms of equivalence classes of almost surely equal random variables.
• Convergence in probability also corresponds to convergence with respect to a distance; one possible distance is the Ky Fan distance
$$d(X, Y) = E[\min\{|X - Y|, 1\}];$$
a Monte Carlo sketch of this distance follows below.
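The Ky Fan distance rarely has a closed form, but it is simple to estimate by Monte Carlo; the pair of variables below is an arbitrary example:

```r
## Monte Carlo estimate of the Ky Fan distance E[min(|X - Y|, 1)] for
## X ~ N(0,1) and Y = X + 0.1 Z with Z ~ N(0,1); an arbitrary example.
set.seed(13)
nrep <- 100000
x <- rnorm(nrep)
y <- x + 0.1 * rnorm(nrep)
mean(pmin(abs(x - y), 1))  # small: X and Y are close in probability
```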


Wednesday, November 18, 2015 Distances for Probabilities

Distances for Probabilities

• One possible distance between two probabilities P and Q is the total variation distance
$$d_{TV}(P, Q) = \sup_{A \in \mathcal{B}} |P(A) - Q(A)|.$$
• If P and Q are both continuous with densities f and g then
$$d_{TV}(P, Q) = \frac{1}{2}\int |f(x) - g(x)|\,dx;$$
a numeric evaluation of this formula follows below.
• An analogous result holds if P and Q are both discrete.
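The density formula is straightforward to evaluate numerically; the pair N(0, 1) and N(1, 1) below is an arbitrary example:

```r
## Total variation distance between N(0,1) and N(1,1) via the density
## formula (1/2) * integral of |f - g|; the pair of normals is arbitrary.
f <- function(x) dnorm(x)
g <- function(x) dnorm(x, mean = 1)
0.5 * integrate(function(x) abs(f(x) - g(x)), -Inf, Inf)$value
## about 0.383, matching the closed form 2 * pnorm(0.5) - 1 for a unit shift
```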


Wednesday, November 18, 2015 Distances for Probabilities

• If Fn and F are probability distributions with densities or PMFs fn and f, and fn(x) → f(x) for all x, then
$$d_{TV}(F_n, F) \to 0.$$
• This is known as Scheffé's Theorem.
• If P is continuous and Q is discrete, then
$$d_{TV}(P, Q) = 1.$$
• So total variation distance cannot be used to help with approximating continuous distributions with discrete ones, or vice versa.


Friday, November 20, 2015

Recap

• Almost sure convergence

• Strong Law of Large Numbers

• Convergence in probability

• Weak Law of Large Numbers

• Lp convergence

• Distances and convergence


Friday, November 20, 2015 Distances for Probabilities

• A distance among cumulative distribution functions is the Kolmogorov distance:
$$d_K(F, G) = \sup_{x \in \mathbb{R}} |F(x) - G(x)|$$
• This is a useful distance for continuous distributions or for discrete distributions with a common support.
• It is useful for capturing convergence of a sequence of discrete distributions to a continuous distribution.
• For general discrete distributions it has some undesirable features.


Friday, November 20, 2015 Distances for Probabilities

Example

• Let Fy(x) be the CDF of a random variable that equals y with probability one:
$$F_y(x) = \begin{cases} 1 & \text{if } x \ge y \\ 0 & \text{if } x < y. \end{cases}$$
• Let yn = 1/n.
• Then $d_K(F_{y_n}, F_0) = 1$ for all n.


Friday, November 20, 2015 Distances for Probabilities

• An alternative distance among CDFs is the Lévy distance:
$$d_L(F, G) = \inf\{\varepsilon > 0 : F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon \text{ for all } x \in \mathbb{R}\}$$
• Another way of defining this distance:
  • Think of placing a square parallel to the axes with side ε in a gap between F and G.
  • dL is the largest ε that will fit.
• The Lévy distance between a N(0, 1) and a N(1, 1) distribution is approximately 0.28; a numeric computation follows below.
• For point mass distributions Fx and Fy the Lévy distance is $d_L(F_x, F_y) = |x - y|$ when |x − y| ≤ 1 (the Lévy distance never exceeds 1).
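A numeric computation of the Lévy distance on a grid, consistent with the value quoted above (grid range, resolution, and bisection depth are arbitrary choices):

```r
## Approximate the Levy distance between N(0,1) and N(1,1): bisect on the
## smallest eps with F(x - eps) - eps <= G(x) <= F(x + eps) + eps on a grid.
Fcdf <- function(x) pnorm(x)
Gcdf <- function(x) pnorm(x, mean = 1)
x <- seq(-8, 8, by = 0.001)
holds <- function(eps)
  all(Fcdf(x - eps) - eps <= Gcdf(x) & Gcdf(x) <= Fcdf(x + eps) + eps)
lo <- 0; hi <- 1
for (i in 1:40) {  # bisection on eps
  mid <- (lo + hi) / 2
  if (holds(mid)) hi <- mid else lo <- mid
}
hi  # roughly 0.28, as quoted above
```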


Friday, November 20, 2015 Distances for Probabilities

• Two useful results: Suppose Xn ∼ Fn and X ∼ F. Then
  • $d_L(F_n, F) \to 0$ if and only if Fn(x) → F(x) for all x where F is continuous.
  • $d_L(F_n, F) \to 0$ if and only if
  $$E[g(X_n)] \to E[g(X)]$$
  for all bounded, continuous functions g.


Friday, November 20, 2015 Convergence in Distribution

Convergence in Distribution

Definition

A sequence of random variables X1, X2, . . . converges in distribution to a random variable X if
$$\lim_{n\to\infty} F_{X_n}(x) = F_X(x)$$
for all x where FX is continuous.

• This is different: it is really about distributions, not random variables.
• This is also called weak convergence of distributions.
• It corresponds to convergence in the Lévy distance.

Notation:
$$X_n \overset{D}{\to} X$$
$$X_n \Rightarrow X$$
$$\mathcal{L}(X_n) \to \mathcal{L}(X)$$


Friday, November 20, 2015 Convergence in Distribution

Example

• Suppose X ∼ N(0, 1) and let $X_n = (-1)^n X$.
• Then Xn ∼ N(0, 1) for all n by symmetry, so Xn → X in distribution.
• But Xn does not converge to X almost surely or in probability.


Friday, November 20, 2015 Convergence in Distribution

Theorem
A sequence of random variables X1, X2, . . . converges to a random variable X in distribution if and only if
$$P(X_n \in A) \to P(X \in A)$$
for every open set A with P(X ∈ ∂A) = 0, where ∂A is the boundary of A.

Theorem
A sequence of random variables X1, X2, . . . converges to a random variable X in distribution if and only if
$$E[g(X_n)] \to E[g(X)]$$
for all bounded, continuous functions g.

Theorem
Suppose Xn has MGF Mn, n = 1, 2, . . ., X has MGF M, M is finite in a neighborhood of the origin, and Mn(t) → M(t) for all t in a neighborhood of the origin. Then $X_n \overset{D}{\to} X$.


Friday, November 20, 2015 Convergence in Distribution

Theorem
If $X_n \overset{P}{\to} X$ then $X_n \overset{D}{\to} X$.

Theorem
If c is a constant and $X_n \overset{D}{\to} c$ then $X_n \overset{P}{\to} c$.


Friday, November 20, 2015 Convergence in Distribution

Example

• Suppose U1, U2, . . . are independent Uniform[0, 1] random variables.
• Let Xn = max{U1, . . . , Un}.
• Then for 0 < x < 1
$$F_{X_n}(x) = x^n \to 0.$$
• For x ≤ 0 we have $F_{X_n}(x) = 0$, and $F_{X_n}(x) = 1$ for x ≥ 1.
• So $F_{X_n}(x) \to F_X(x)$ for all x, where FX is the CDF of X ≡ 1.
• So $X_n \overset{D}{\to} X$.


Friday, November 20, 2015 Convergence in Distribution

Example (continued)

• Now suppose Yn = min{U1, . . . , Un} and Y ≡ 0.
• The CDF of Yn is $F_{Y_n}(y) = 1 - (1 - y)^n$ for 0 ≤ y ≤ 1.
• For y > 0 we have $F_{Y_n}(y) \to 1$.
• But for y = 0 we have $F_{Y_n}(0) = 0$ for all n.
• So $F_{Y_n}(y) \to F_Y(y)$ for all y except y = 0, where FY is not continuous.
• So $Y_n \overset{D}{\to} Y$.
• $Y_n \overset{P}{\to} 0$ as well (also almost surely).


Friday, November 20, 2015 Convergence in Distribution

Example (continued)

• At what rate does Yn → 0?
• The mean of Yn is
$$E[Y_n] = \int_0^\infty (1 - F_{Y_n}(t))\,dt = \int_0^1 (1 - t)^n\,dt = \frac{1}{n+1} = O(n^{-1}).$$
• What happens to the distribution of Vn = nYn?
• For 0 ≤ v ≤ n the CDF of Vn is
$$F_{V_n}(v) = P(V_n \le v) = P(Y_n \le v/n) = 1 - (1 - v/n)^n \to 1 - e^{-v}$$
• So Vn converges in distribution to V ∼ Exponential(1); a quick simulation follows below.
• The distribution of Yn = Vn/n is approximately Exponential(λ = n).
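A minimal simulation consistent with this limit (n and the replication count are arbitrary choices):

```r
## Compare the simulated distribution of Vn = n * min(U1, ..., Un) with
## its Exponential(1) limit; n and the replication count are arbitrary.
set.seed(17)
n <- 50
vn <- replicate(10000, n * min(runif(n)))
qqplot(qexp(ppoints(500)), vn,
       xlab = "Exp(1) quantiles", ylab = expression(V[n]))
abline(0, 1, lty = 2)  # points fall near the line y = x
```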


Friday, November 20, 2015 Convergence in Distribution

Example

• Let $\hat P_n$ be the sample proportion of successes in n Bernoulli(p) trials.
• What can we say about the distribution of $\hat P_n$ for large n?
• It is useful to look at the standardized version
$$Z_n = \frac{\hat P_n - p}{\sqrt{p(1-p)/n}}.$$
• Some simulations: http://www.stat.uiowa.edu/~luke/classes/193/convergence.R
• The sample paths do not converge.
• Their probability distributions do converge.
• The limiting distribution is the standard normal distribution.
• This suggests that the distribution of $\hat P_n$ for large n is approximately
$$N\left(p, \frac{p(1-p)}{n}\right).$$
• This is an example of the Central Limit Theorem; a simulation sketch follows below.
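Again in the spirit of the linked convergence.R script, a minimal sketch of the distributional convergence (p, n, and the replication count are arbitrary choices):

```r
## Standardized sample proportions against the N(0,1) density; a sketch
## in the spirit of convergence.R with arbitrary p, n, nrep.
set.seed(19)
p <- 0.3; n <- 400; nrep <- 10000
phat <- rbinom(nrep, size = n, prob = p) / n
z <- (phat - p) / sqrt(p * (1 - p) / n)
hist(z, breaks = 50, freq = FALSE, main = "", xlab = expression(Z[n]))
curve(dnorm(x), add = TRUE, lwd = 2)  # the standard normal limit
```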


Friday, November 20, 2015 Central Limit Theorem

Theorem (Central Limit Theorem)

Let X1, X2, . . . be i.i.d. from a population with an MGF that is finite near the origin. Then X1 has finite mean µ and finite variance σ². Let
$$Z_n = \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} = \sqrt{n}\,\frac{\bar X_n - \mu}{\sigma}$$
and let Z ∼ N(0, 1). Then Zn → Z in distribution, i.e.
$$P(Z_n \le z) \to \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\,du$$
for all z.


Friday, November 20, 2015 Central Limit Theorem

• If we only assume $E[X_1^2] < \infty$ then the theorem is still true; the proof works with characteristic functions.
• Independence and identical distribution can be weakened somewhat.
