STATS 200 (Stanford University, Summer 2015)

Solutions to Final Exam Sample Questions
(The justification for why the critical point is indeed the maximum is not required for full credit since this fact is fairly obvious by inspection of the log-likelihood.)
This document contains some questions that are fairly representative of the content, style, and difficulty of the questions that will appear on the final exam. Most of these questions come from actual exams that I gave in previous editions of the course. Please keep the following things in mind:

• This document is much longer than the actual final exam will be.

• All material covered in the lecture notes (up through the end of class on Monday, August 10) is eligible for inclusion on the final exam, regardless of whether it is covered by any of the sample questions below.

• The final exam will be cumulative, but material not covered on the midterm exam will be represented more heavily. The proportion of questions on the final exam that are drawn from pre-midterm material may not be exactly equal to the proportion of the sample questions below that are drawn from pre-midterm material.
1. Note: Parts (a)–(f) of this question already appeared in the midterm exam sample questions, but they are repeated here to maintain the original format of the overall question.
Let X_1, …, X_n ∼ iid Beta(1, θ), where θ > 0 is unknown.

Note: The Beta(1, θ) distribution has pdf

f_θ(x) = θ(1 − x)^{θ−1} if 0 < x < 1, and f_θ(x) = 0 otherwise.

Also,

E_θ(X_1) = 1/(1 + θ),  Var_θ(X_1) = θ/[(1 + θ)²(2 + θ)].
You may use these facts without proof.
(a) Find the maximum likelihood estimator θ̂_n^MLE of θ.
Note: Recall that any logarithm of a number between 0 and 1 is negative.
Solution: Differentiating the log-likelihood yields

ℓ′_{X_n}(θ) = ∂/∂θ [n log θ + (θ − 1) ∑_{i=1}^n log(1 − X_i)] = n/θ + ∑_{i=1}^n log(1 − X_i) = 0 ⟺ θ = −n/∑_{i=1}^n log(1 − X_i).

This point is the only critical point, and it can be seen from the form of the log-likelihood that ℓ_{X_n}(θ) → −∞ both as θ → 0 and as θ → ∞. Then the critical point is indeed the maximum, and

θ̂_n^MLE = −n/∑_{i=1}^n log(1 − X_i).

(The justification for why the critical point is indeed the maximum is not required for full credit since this fact is fairly obvious by inspection of the log-likelihood.)
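As a quick numerical sanity check of the formula above (not part of the exam solution), one can simulate from Beta(1, θ) with an arbitrarily chosen θ and confirm that the estimator lands near the truth; NumPy is assumed available:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 3.0           # true parameter (arbitrary choice for the check)
n = 200_000
x = rng.beta(1.0, theta, size=n)   # Beta(1, theta) sample

# MLE from part (a): theta_hat = -n / sum(log(1 - X_i))
theta_hat = -n / np.log(1.0 - x).sum()
print(theta_hat)   # should be close to 3.0
```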
(b) Let θ_0 be fixed and known, where 0 < θ_0 < 1. Find the Wald test of H_0: θ = θ_0 versus H_1: θ ≠ θ_0 and state how to choose the critical value to give the test approximate size α, where 0 < α < 1.
Note: You may use either version of the Wald test.
Solution: Observe that

J_n = −ℓ″_{X_n}(θ̂_n^MLE) = (∑_{i=1}^n X_i)·[(n + ∑_{i=1}^n X_i)/∑_{i=1}^n X_i]² + n·[(n + ∑_{i=1}^n X_i)/n]² = (n + ∑_{i=1}^n X_i)³/(n ∑_{i=1}^n X_i) = I_n(θ̂_n^MLE),

so both versions of the Wald test reject H_0 if and only if

√[(n + ∑_{i=1}^n X_i)³/(n ∑_{i=1}^n X_i)] · |n/(n + ∑_{i=1}^n X_i) − θ_0| ≥ z_{α/2},

where z_{α/2} is the number such that P(Z ≥ z_{α/2}) = α/2 for a standard normal random variable Z.
(c) Let θ_0 be fixed and known, where 0 < θ_0 < 1. Find the score test of H_0: θ = θ_0 versus H_1: θ ≠ θ_0 and state how to choose the critical value to give the test approximate size α, where 0 < α < 1.
Solution: The score function is

ℓ′_{X_n}(θ) = ∂/∂θ [∑_{i=1}^n X_i log(1 − θ) + n log θ] = −(1/(1 − θ)) ∑_{i=1}^n X_i + n/θ = [n − θ(n + ∑_{i=1}^n X_i)]/[θ(1 − θ)],

and the Fisher information is I_n(θ) = n/[θ²(1 − θ)], so the score test rejects H_0 if and only if

√(θ_0²(1 − θ_0)/n) · |n − θ_0(n + ∑_{i=1}^n X_i)|/[θ_0(1 − θ_0)] ≥ z_{α/2},

where z_{α/2} is the number such that P(Z ≥ z_{α/2}) = α/2 for a standard normal random variable Z.
(d) Find the Wald confidence interval for θ with approximate confidence level 1 − α, where 0 < α < 1.
Note: You may use either version of the Wald confidence interval.
Solution: Since I_n(θ̂_n^MLE) = J_n, both versions of the Wald confidence interval are

{θ ∈ (0, 1) : n/(n + ∑_{i=1}^n X_i) − z_{α/2} √[n ∑_{i=1}^n X_i/(n + ∑_{i=1}^n X_i)³] < θ < n/(n + ∑_{i=1}^n X_i) + z_{α/2} √[n ∑_{i=1}^n X_i/(n + ∑_{i=1}^n X_i)³]},

where z_{α/2} is the number such that P(Z ≥ z_{α/2}) = α/2 for a standard normal random variable Z.
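The interval depends on the data only through n and ∑X_i, so it is easy to compute; a minimal sketch (the data summaries below are hypothetical, and only the Python standard library is used):

```python
from statistics import NormalDist

def wald_ci(n, sum_x, alpha=0.05):
    """Wald interval from part (d), computed from n and the sum of the X_i."""
    z = NormalDist().inv_cdf(1 - alpha / 2)        # z_{alpha/2}
    center = n / (n + sum_x)                        # the MLE of theta
    se = (n * sum_x / (n + sum_x) ** 3) ** 0.5      # 1 / sqrt(J_n)
    return center - z * se, center + z * se

lo, hi = wald_ci(100, 150)   # hypothetical n and sum of observations
print((lo, hi))              # interval centered at 100/250 = 0.4
```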
3. Let X_1, …, X_n be iid random variables such that E_{μ,σ²}(X_1) = μ and Var_{μ,σ²}(X_1) = σ² are both finite. However, suppose that X_1, …, X_n are not normally distributed. Now define

X̄_n = (1/n) ∑_{i=1}^n X_i.
(a) Do we know for certain that X̄_n is a consistent estimator of μ?

Solution: Yes, by the law of large numbers. (An explanation is not required.)

(b) Do we know for certain that the distribution of X̄_n is approximately normal for large n?

Solution: Yes, by the central limit theorem. (An explanation is not required.)
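Both answers can be illustrated numerically (this is not part of the exam solution). Below, the population is a decidedly non-normal Exp(1) distribution, so μ = 1 and σ² = 1; the sample size and replication count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 400, 20_000
means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)

z = (means - 1.0) * np.sqrt(n)      # standardized sample means
frac = np.mean(np.abs(z) <= 1.96)   # near 0.95 if approximately normal
print(z.mean(), z.std(), frac)
```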
4. Let X_1, …, X_n be iid continuous random variables with pdf

f_θ(x) = 2θx exp(−θx²) if x ≥ 0, and f_θ(x) = 0 if x < 0,

where θ > 0 is unknown. Suppose we assign a Gamma(a, b) prior to θ, where a > 0 and b > 0 are known.

Note: The Gamma(a, b) distribution has pdf

f(x) = [b^a/Γ(a)] x^{a−1} exp(−bx) if x > 0, and f(x) = 0 if x ≤ 0,

and its mean is a/b. You may use these facts without proof.
(a) Find the posterior distribution of θ .
Solution: Ignoring terms that do not depend on θ, the posterior is
π(θ xn) ∝ θn exp−θn
i=1
x2
i θa−1 exp(−bθ) 1(0,∞)(θ)∝ θa+n−1 exp
−
b +
n
i=1
x2
i
θ
1(0,∞)
(θ
),
which we recognize as the unnormalized pdf of a Gamma(a + n, b +∑ni=1 x2
i ) distri-bution. Thus, θ xn ∼ Gamma(a + n, b +∑n
i=1 x2
i ).
(b) Find (or simply state) the posterior mean of θ.

Solution: The posterior mean of θ is simply E(θ | x_n) = (a + n)/(b + ∑_{i=1}^n x_i²).
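The conjugate update can be checked by simulation (not part of the exam solution): the posterior mean (a + n)/(b + ∑x_i²) should concentrate near the true θ for large n. The hyperparameters, true θ, and sampler below are arbitrary choices; the draws use the inverse cdf of f_θ, which is F_θ(x) = 1 − exp(−θx²):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 2.0, 1.0        # hypothetical prior hyperparameters
theta_true = 4.0
n = 5_000
# If U ~ Uniform(0,1), then sqrt(-log(1 - U) / theta) has cdf 1 - exp(-theta x^2).
x = np.sqrt(-np.log(1.0 - rng.uniform(size=n)) / theta_true)

post_mean = (a + n) / (b + np.sum(x ** 2))   # posterior mean from part (b)
print(post_mean)   # should be near theta_true = 4.0 for large n
```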
5. Let X_1, …, X_n ∼ iid Poisson(λ), where λ > 0 is unknown.

Note: The Poisson(λ) distribution has pmf

f_λ(x) = λ^x exp(−λ)/x! if x ∈ {0, 1, 2, …}, and f_λ(x) = 0 if x ∉ {0, 1, 2, …},

and its mean and variance are both λ. Also, the maximum likelihood estimator of λ is

λ̂_n^MLE = (1/n) ∑_{i=1}^n X_i.

You may use these facts without proof.
(a) Let λ_0 > 0 be fixed and known. Find the likelihood ratio test of H_0: λ = λ_0 versus H_1: λ ≠ λ_0. (You do not need to state how to choose the critical value for this part of the question.)
Solution: Evaluating the likelihood at λ = λ_0 and at λ = λ̂_n^MLE yields

L_{X_n}(λ_0) = λ_0^{∑_{i=1}^n X_i} exp(−nλ_0)/∏_{i=1}^n X_i!,  L_{X_n}(λ̂_n^MLE) = (n^{−1} ∑_{i=1}^n X_i)^{∑_{i=1}^n X_i} exp[−n(n^{−1} ∑_{i=1}^n X_i)]/∏_{i=1}^n X_i!.

Then the likelihood ratio statistic is

Λ(X_n) = L_{X_n}(λ_0)/L_{X_n}(λ̂_n^MLE) = (nλ_0/∑_{i=1}^n X_i)^{∑_{i=1}^n X_i} exp(∑_{i=1}^n X_i − nλ_0),

and the likelihood ratio test rejects H_0 if and only if Λ(X_n) ≤ k for some critical value k ≥ 0. (Simplification is not necessary for full credit.)
(b) State how the critical value of the likelihood ratio test in part (a) can be chosen to give the test approximate size α.
Solution: To give the likelihood ratio test in part (a) approximate size α, we reject H_0 if and only if

−2 log Λ(X_n) ≥ w_α,

or equivalently, if and only if

Λ(X_n) ≤ exp(−w_α/2),

where w_α is the number such that P(W ≥ w_α) = α for a χ²_1 random variable W.
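As a sanity check (not part of the exam solution), w_α can be computed from the standard normal quantile, since W = Z² for standard normal Z, and the approximate size can then be verified by simulation under H_0; λ_0, n, and the replication count below are arbitrary:

```python
import numpy as np
from statistics import NormalDist

# chi^2_1 critical value: P(W >= w_alpha) = alpha  <=>  w_alpha = z_{alpha/2}^2.
alpha = 0.05
w_alpha = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # about 3.84

rng = np.random.default_rng(3)
lam0, n, reps = 5.0, 200, 5_000
s = rng.poisson(lam0, size=(reps, n)).sum(axis=1)
stat = 2 * (s * np.log(s / (n * lam0)) + n * lam0 - s)   # -2 log Lambda
size = np.mean(stat >= w_alpha)
print(size)   # should be close to 0.05 under H0
```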
6. Let X be a single continuous random variable with pdf

f_μ(x) = exp(μ − x)/[1 + exp(μ − x)]² = (1/4) sech²[(μ − x)/2] = (1/4) sech²[(x − μ)/2],

where sech is the hyperbolic secant function, and cdf

F_μ(x) = 1/[1 + exp(μ − x)],

where μ ∈ ℝ is unknown.

Note: The maximum likelihood estimator of μ is μ̂^MLE = X. Also, sech(t) = sech(−t) for all t ∈ ℝ, and sech(t) is a strictly decreasing function of |t|. You may use these facts without proof.
(a) Show that the likelihood ratio test of H_0: μ = 0 versus H_1: μ ≠ 0 rejects H_0 if and only if |X| ≥ c for some critical value c. (You do not need to state how to choose the critical value for this part of the question.)
Solution: Evaluating the likelihood at μ = 0 and at μ = μ̂^MLE yields

L_X(0) = exp(−X)/[1 + exp(−X)]² = (1/4) sech²(X/2),  L_X(X) = 1/(1 + 1)² = 1/4,

so the likelihood ratio statistic is

Λ(X) = L_X(0)/L_X(X) = 4 exp(−X)/[1 + exp(−X)]² = sech²(X/2),

and we reject H_0 if and only if Λ(X) ≤ k for some critical value k ≥ 0. The note tells us that sech(X/2) = sech(−X/2) and that sech(X/2) is a strictly decreasing function of |X/2|, so rejecting H_0 if and only if sech²(X/2) ≤ k is equivalent to rejecting H_0 if and only if |X| ≥ c for some c.
(b) State how the critical value c of the likelihood ratio test in part (a) can be chosen to give the test size α (exactly, not just approximately), where 0 < α < 1.
Solution: The test has size α if and only if

α = P_{μ=0}(|X| ≥ c) = P_{μ=0}(X ≤ −c) + P_{μ=0}(X ≥ c) = F_0(−c) + 1 − F_0(c) = 1/[1 + exp(c)] + 1 − 1/[1 + exp(−c)] = 2/[1 + exp(c)].

Then we choose c = log(2α⁻¹ − 1) to achieve size α. (Simplifying and solving for c is not necessary for full credit.)
(c) For the likelihood ratio test with size α in parts (a) and (b), find the probability of a type II error if the true value of μ happens to be some μ ≠ 0.

Solution: Since μ ≠ 0,

P_μ(type II error) = P_μ(|X| < c) = P_μ(−c < X < c) = F_μ(c) − F_μ(−c) = 1/[1 + exp(μ − c)] − 1/[1 + exp(μ + c)] = (2α⁻¹ − 1)/[(2α⁻¹ − 1) + exp(μ)] − 1/[1 + (2α⁻¹ − 1) exp(μ)].

(Inserting the value of c is not necessary for full credit.)
(d) Suppose we observe X = x_obs. Find the p-value of the likelihood ratio test for the observed data x_obs.

Note: Be sure your answer is correct for both positive and negative values of x_obs.

Solution: The p-value is

p(x_obs) = P_{μ=0}(|X| ≥ |x_obs|) = P_{μ=0}(X ≤ −|x_obs|) + P_{μ=0}(X ≥ |x_obs|) = F_0(−|x_obs|) + 1 − F_0(|x_obs|) = 1/[1 + exp(|x_obs|)] + 1 − 1/[1 + exp(−|x_obs|)] = 2/[1 + exp(|x_obs|)].

(Simplification is not necessary for full credit.)
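The p-value formula 2/[1 + exp(|x_obs|)] is symmetric in x_obs and equals 1 at x_obs = 0, the least extreme possible observation; a minimal sketch:

```python
import math

def p_value(x_obs):
    # p-value of the LRT from part (d): 2 / (1 + exp(|x_obs|))
    return 2.0 / (1.0 + math.exp(abs(x_obs)))

print(p_value(0.0))                  # 1.0 at the least extreme observation
print(p_value(2.0), p_value(-2.0))   # equal, by symmetry
```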
7. Suppose that we call a hypothesis test trivial if its rejection region is either the empty set or the entire sample space, i.e., a trivial test is a test that either never rejects H_0 or always rejects H_0. Now let X ∼ Bin(n, θ), where θ is unknown, and consider testing H_0: θ = 1/2 versus H_1: θ ≠ 1/2. Find a necessary and sufficient condition (in terms of n) for the existence of a test of these hypotheses with level α = 0.05 = 1/20 that is not trivial.

Solution: A test of these hypotheses with rejection region R has level 0.05 if and only if P_{θ=1/2}(X ∈ R) ≤ 0.05. For such a test to be nontrivial, the rejection region R cannot be the empty set. Now observe that the points in the sample space with the smallest probability when θ = 1/2 are X = 0 and X = n, each of which has probability 1/2^n when θ = 1/2. Hence, there exists a nonempty rejection region R with P_{θ=1/2}(X ∈ R) ≤ 0.05 if and only if 1/2^n ≤ 0.05. This inequality holds if and only if n ≥ 5. Hence, there exists a nontrivial test of these hypotheses with level α = 0.05 if and only if n ≥ 5. Also, since it is clear that n must be a positive integer for the question to make sense, any condition that is equivalent to n ≥ 5 when applied to the positive integers, such as n ≥ (log 20)/(log 2), is also acceptable.
8. Let X_1, …, X_n be iid Exp(1) random variables with pdf

f(x) = exp(−x) if x ≥ 0, and f(x) = 0 if x < 0.

Then let Y_n = max_{1≤i≤n} X_i. Find a sequence of constants a_n such that Y_n − a_n converges in distribution to a random variable with cdf G(t) = exp[−exp(−t)]. (This limiting distribution is called the Gumbel distribution.)

Hint: For any c ∈ ℝ, (1 + n⁻¹c)^n → exp(c) as n → ∞.
Solution: Let F denote the cdf of the Exp(1) distribution, which is

F(x) = 1 − exp(−x) if x ≥ 0, and F(x) = 0 if x < 0.

Now let G_n denote the cdf of the random variable Y_n − a_n. Then

G_n(t) = P(Y_n − a_n ≤ t) = P(Y_n ≤ t + a_n) = P(max_{1≤i≤n} X_i ≤ t + a_n) = ∏_{i=1}^n P(X_i ≤ t + a_n) = [P(X_1 ≤ t + a_n)]^n = [F(t + a_n)]^n,

which equals [1 − exp(−t − a_n)]^n if t + a_n ≥ 0 and 0 if t + a_n < 0. Now let a_n = log n. Then for every t ∈ ℝ, t + a_n ≥ 0 for all sufficiently large n. Then for every t ∈ ℝ,

G_n(t) = [1 − exp(−t − log n)]^n = [1 − n⁻¹ exp(−t)]^n → exp[−exp(−t)] = G(t).

Thus, Y_n − log n converges in distribution to a random variable with cdf G(t).
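The convergence can be seen in a small simulation (not part of the exam solution): the empirical cdf of Y_n − log n at t = 0 should be near G(0) = exp(−1) ≈ 0.368. The sample size and replication count below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 2_000, 5_000
y = rng.exponential(1.0, size=(reps, n)).max(axis=1) - np.log(n)

emp = np.mean(y <= 0.0)   # empirical cdf of Y_n - log n at t = 0
print(emp, np.exp(-1))    # should be close to each other
```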
9. Let X_1, …, X_n be iid continuous random variables with pdf

f_μ(x) = (2πx³)^{−1/2} exp[−(x − μ)²/(2μ²x)] if x > 0, and f_μ(x) = 0 otherwise,

where μ > 0 is unknown.

(a) Find the maximum likelihood estimator μ̂ of μ.
Solution: Differentiating the log-likelihood yields

ℓ′_{x_n}(μ) = ∂/∂μ [−(n/2) log(2π) − (3/2) ∑_{i=1}^n log x_i + n/μ − (1/(2μ²)) ∑_{i=1}^n x_i − (1/2) ∑_{i=1}^n 1/x_i] = −n/μ² + (1/μ³) ∑_{i=1}^n x_i = 0 ⟺ μ = (1/n) ∑_{i=1}^n x_i = x̄_n.
Although it is not necessary to obtain full credit, we technically should now verify that this critical point does indeed maximize the likelihood since it is not immediately clear that this is the case. (In particular, it is not obvious what the likelihood does as μ → ∞.) However, the second derivative at the critical point is

ℓ″_{x_n}(x̄_n) = 2n/(x̄_n)³ − 3nx̄_n/(x̄_n)⁴ = −n/(x̄_n)³ < 0,

so this critical point is indeed a maximum. Hence, the MLE of μ is μ̂ = X̄_n.
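Assuming the log-likelihood terms above, the density here matches an inverse Gaussian distribution with unit shape parameter, which NumPy exposes as the "Wald" sampler; a quick check that the sample mean recovers μ (the true value below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(9)
mu = 2.5                                # hypothetical true mean
x = rng.wald(mu, 1.0, size=200_000)     # inverse Gaussian with shape 1

mu_hat = x.mean()    # the MLE from part (a) is the sample mean
print(mu_hat)        # should be close to 2.5
```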
(b) Let μ_0 > 0 be fixed and known. Find the likelihood ratio test of H_0: μ = μ_0 versus H_1: μ ≠ μ_0. (You do not need to state how to choose the critical value to give the test a particular size in this part of the question.)
Solution: The likelihood ratio statistic is

Λ(X) = L_X(μ_0)/L_X(μ̂) = [∏_{i=1}^n (2πX_i³)^{−1/2} exp(1/μ_0 − X_i/(2μ_0²) − 1/(2X_i))] / [∏_{i=1}^n (2πX_i³)^{−1/2} exp(1/X̄_n − X_i/(2X̄_n²) − 1/(2X_i))]
= exp[n/μ_0 − nX̄_n/(2μ_0²) − n/X̄_n + n/(2X̄_n)] = exp[−(n/(2X̄_n))(X̄_n/μ_0 − 1)²],

and we reject H_0 if and only if Λ(X) ≤ k for some choice of k.
(c) State how the critical value of the likelihood ratio test in part (b) can be chosen to give the test approximate size α.
Solution: To give the likelihood ratio test in part (b) approximate size α, we reject H_0 if and only if

−2 log Λ(X_n) ≥ w_α,

or equivalently, if and only if

Λ(X_n) ≤ exp(−w_α/2),

where w_α is the number such that P(W ≥ w_α) = α for a χ²_1 random variable W.
(d) Explain how the likelihood ratio test in part (b) would change if the alternative hypothesis were H_1: μ < μ_0 instead.

Solution: The MLE must now be found over the restricted parameter space μ ∈ (0, μ_0] rather than over μ > 0. Thus, if X̄_n > μ_0, the MLE is instead μ̂ = μ_0, which yields Λ(X) = 1. If instead X̄_n ≤ μ_0, then there is no change.
10. Let X_1, …, X_n be iid continuous random variables with pdf

f_θ(x) = θ(θ + 1)/(θ + x)² if 0 ≤ x ≤ 1, and f_θ(x) = 0 otherwise,

where θ > 0 is unknown. It can be shown that the Fisher information in the sample is

I_n(θ) = n/[3θ²(θ + 1)²].

Use this fact to find (or simply state) the asymptotic distribution of the maximum likelihood estimator θ̂_n of θ.

Note: There is no need to actually find the form of θ̂_n or to verify the result for the Fisher information. Also, you may assume that the regularity conditions of Section 7.4 of the notes hold.

Solution: Let θ̂_n denote the MLE of θ. Then √n(θ̂_n − θ) →D N[0, 3θ²(θ + 1)²].
11. Let X_1, …, X_n be iid continuous random variables with pdf

f_θ(x) = θ/(θ + x)² if x ≥ 0, and f_θ(x) = 0 if x < 0,

where θ > 0 is unknown. Let θ̂_n denote the maximum likelihood estimator of θ (which you do not need to find).

Note: It can be shown by simple calculus that

E_θ(X_1) = E_θ(1/X_1) = ∞,  E_θ[1/(θ + X_1)] = 1/(2θ),  E_θ[1/(θ + X_1)²] = 1/(3θ²),

so you may use any of these facts without proof. You may also assume that the relevant regularity conditions (i.e., those of Section 7.4 of the notes) are satisfied.
(a) Find the score function for the sample and show explicitly that its expectation is zero (i.e., do not simply cite the result from the notes that says that the expectation is zero).

Solution: The score function is (for θ > 0)

ℓ′_{X_n}(θ) = ∂/∂θ ∑_{i=1}^n [log θ − 2 log(θ + X_i)] = ∂/∂θ [n log θ − 2 ∑_{i=1}^n log(θ + X_i)] = n/θ − 2 ∑_{i=1}^n 1/(θ + X_i).

Then clearly E_θ[ℓ′_{X_n}(θ)] = 0 since E_θ[1/(θ + X_i)] = 1/(2θ) for each i ∈ {1, …, n}.
(b) Find the Fisher information for the sample.
Solution: We first find the Fisher information per observation. The second derivative of the log-likelihood for X_1 is (for θ > 0)

ℓ″_{X_1}(θ) = ∂²/∂θ² [log θ − 2 log(θ + X_1)] = ∂/∂θ [1/θ − 2/(θ + X_1)] = −1/θ² + 2/(θ + X_1)².

Then

I_1(θ) = −E_θ[ℓ″_{X_1}(θ)] = 1/θ² − 2 E_θ[1/(θ + X_1)²] = 1/θ² − 2/(3θ²) = 1/(3θ²),

and thus

I_n(θ) = n I_1(θ) = n/(3θ²)

is the Fisher information for the sample.
(c) Find (or simply state) the asymptotic distribution of θ̂_n.

Note: Your answer should be a formal probabilistic result involving convergence in distribution.

Solution: Since the Fisher information per observation is I_1(θ) = 1/(3θ²), the asymptotic distribution of the maximum likelihood estimator θ̂_n of θ is √n(θ̂_n − θ) →D N(0, 3θ²) by the standard result from the notes.
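The two moments quoted in the note can be verified numerically (not part of the exam solution). Since F_θ(x) = x/(θ + x), inverse-cdf sampling gives X = θU/(1 − U) for U ∼ Uniform(0, 1); θ below is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n = 2.0, 1_000_000
u = rng.uniform(size=n)
x = theta * u / (1.0 - u)   # inverse-cdf draw: F(x) = x / (theta + x)

m1 = np.mean(1.0 / (theta + x))        # near 1/(2 theta) = 0.25
m2 = np.mean(1.0 / (theta + x) ** 2)   # near 1/(3 theta^2) = 1/12
print(m1, m2)
```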
12. Let X_1, …, X_n ∼ iid N(μ, 1), where μ ∈ ℝ is unknown.

(a) State an estimator of μ that is consistent but not unbiased.

Solution: There exist many consistent estimators of μ that are not unbiased, such as n⁻¹ + X̄_n or (n − 1)X̄_n/n. More generally, any estimator of the form a_n + b_n X̄_n is consistent if a_n → 0 and b_n → 1 and is biased (i.e., not unbiased) if a_n ≠ 0 or b_n ≠ 1. (However, there also exist estimators that meet the required criteria that are not of this form.)

(b) State an estimator of μ that is consistent but not asymptotically efficient.

Solution: Again, there exist many consistent estimators of μ that are not asymptotically efficient. A simple example is to take the estimator to be the mean of just the even-numbered observations among X_1, …, X_n. (Taking the estimator to be the mean of any subset of X_1, …, X_n meets the required criteria if the fraction of observations in the subset tends to a limit strictly between 0 and 1, e.g., the fraction of even-numbered observations tends to 1/2.)

(c) Is (X̄_n)² = (n⁻¹ ∑_{i=1}^n X_i)² an unbiased estimator of μ²? Why or why not?

Solution: No, since E_μ[(X̄_n)²] = [E_μ(X̄_n)]² + Var_μ(X̄_n) = μ² + n⁻¹ ≠ μ².

(d) Is (X̄_n)² = (n⁻¹ ∑_{i=1}^n X_i)² a consistent estimator of μ²? Why or why not?

Solution: Yes, since (X̄_n)² →P μ² by the continuous mapping theorem, noting that X̄_n →P μ by the weak law of large numbers.
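The bias computation in part (c) is easy to see in simulation (not part of the exam solution): the average of (X̄_n)² over many replications should exceed μ² by about 1/n. The values of μ, n, and the replication count below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(10)
mu, n, reps = 1.0, 50, 100_000
xbar = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)

bias = np.mean(xbar ** 2) - mu ** 2
print(bias)   # should be close to 1/n = 0.02
```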
13. Lemma 7.2.1 of the notes states that E_θ[ℓ′_{X_n}(θ)] = 0 for all θ in the parameter space Θ. This result uses the regularity condition that X_n = (X_1, …, X_n) is an iid sample. Now suppose that we were to remove the condition of independence while keeping all other regularity conditions in place. Explain why the result that E_θ[ℓ′_{X_n}(θ)] = 0 for all θ ∈ Θ would still be true.

Solution: If X_1, …, X_n are not independent, then the log-likelihood is ℓ_{X_n}(θ) = log f_θ(X_n), where f_θ now denotes the joint pdf (or pmf) of X_n = (X_1, …, X_n). Then

E_θ[ℓ′_{X_n}(θ)] = ∫ [∂/∂θ log f_θ(x_n)] f_θ(x_n) dx_n = ∫ [∂/∂θ f_θ(x_n)] dx_n = ∂/∂θ ∫ f_θ(x_n) dx_n = ∂/∂θ (1) = 0,

where the interchange of differentiation and integration is justified by the remaining regularity conditions. Nothing in this argument requires independence, so the result would still be true.
14. Let X_1, …, X_n be iid random variables from a distribution that depends on an unknown parameter θ ∈ ℝ. This distribution has the following properties:

use it to state which of the two estimators is better for large n.

Solution to (c): ARE_θ[θ̃_n^(1), θ̃_n^(2)] = v^(2)(θ)/v^(1)(θ) = (2 log 2)/3 for all θ ∈ ℝ. Note that log 2 < log e = 1, so 2 log 2 < 3. Thus, θ̃_n^(2) is better than θ̃_n^(1) for large n.
15. Let X_1, …, X_n be iid discrete random variables with pmf

p_θ(x) = [(x + k − 1)!/(x!(k − 1)!)] θ^k (1 − θ)^x if x ∈ {0, 1, 2, …}, and p_θ(x) = 0 otherwise,

where k is a known positive integer, θ is unknown, and 0 < θ < 1.

(a) Find the maximum likelihood estimator θ̂_n of θ.

Note: For the purposes of this question, you can ignore any possible data values for which the MLE does not exist.

Solution: Differentiating the log-likelihood yields

ℓ′_{x_n}(θ) = ∂/∂θ [nk log θ + ∑_{i=1}^n x_i log(1 − θ) + ∑_{i=1}^n log((x_i + k − 1)!/(x_i!(k − 1)!))] = nk/θ − (1/(1 − θ)) ∑_{i=1}^n x_i = 0 ⟺ (1 − θ)nk = θ ∑_{i=1}^n x_i ⟺ θ = nk/(nk + ∑_{i=1}^n x_i).

It is clear from the form of the likelihood that this point is indeed a maximum. Thus, the maximum likelihood estimator of θ is

θ̂_n = nk/(nk + ∑_{i=1}^n X_i)

as long as X_i > 0 for some i ∈ {1, …, n}. (Otherwise the MLE does not exist, but the question says we can ignore this possibility.)
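A simulation check of the formula above (not part of the exam solution): NumPy's negative-binomial sampler counts failures before the k-th success, which matches this pmf with success probability θ. The values of k and θ below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(6)
k, theta, n = 3, 0.4, 100_000
x = rng.negative_binomial(k, theta, size=n)

theta_hat = n * k / (n * k + x.sum())   # MLE from part (a)
print(theta_hat)   # should be close to 0.4
```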
(b) Find the asymptotic distribution of the maximum likelihood estimator θ̂_n of θ.

Note: You may use without proof the fact that E_θ(X_1) = (1 − θ)k/θ, and you may assume that the regularity conditions of Section 7.4 of the notes hold.
16. Let X be a single continuous random variable with pdf

f_θ(x) = 1/{π[1 + (x − θ)²]},

where θ ∈ ℝ is unknown. Let θ_0 ∈ ℝ be fixed and known, and consider testing H_0: θ = θ_0 versus H_1: θ ≠ θ_0.

Note: The cdf of X is F_θ(x) = (1/π) arctan(x − θ) + 1/2, noting that the arctan function is strictly increasing and is also an odd function, i.e., arctan(−u) = −arctan(u) for all u ∈ ℝ. Also, the π that appears in the pdf and cdf is the usual mathematical constant, i.e., π ≈ 3.14. You may use any of these facts without proof.
(a) Show that the likelihood ratio test of these hypotheses rejects H_0 if and only if X ≤ θ_0 − c or X ≥ θ_0 + c, where c ≥ 0. (You do not need to state how to choose the critical value for this part of the question.)

Solution: The maximum likelihood estimator of θ is clearly θ̂ = X. Then the likelihood ratio statistic is

Λ(X) = L_X(θ_0)/L_X(θ̂) = 1/[1 + (X − θ_0)²],

and we reject H_0 if and only if Λ(X) ≤ k for some k. Now note that Λ(X) ≤ k is equivalent to |X − θ_0| ≥ c for some c ≥ 0.
(b) Let 0 < α < 1. Find the value of c such that the test in part (a) has size α (exactly).

Note: The inverse of the arctan function is simply the tan function.

Solution: The test has size α if and only if

α = P_{θ_0}(|X − θ_0| ≥ c) = P_{θ_0}(X ≤ θ_0 − c) + P_{θ_0}(X ≥ θ_0 + c) = F_{θ_0}(θ_0 − c) + 1 − F_{θ_0}(θ_0 + c) = [(1/π) arctan(−c) + 1/2] + 1 − [(1/π) arctan(c) + 1/2] = 1 − (2/π) arctan(c).

Then we choose c = tan[π(1 − α)/2] to achieve size α.
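Solving the size equation α = 1 − (2/π) arctan(c) gives c = tan[π(1 − α)/2], which can be checked by simulation under H_0 (not part of the exam solution); α and θ_0 below are arbitrary choices:

```python
import math
import numpy as np

alpha = 0.10
c = math.tan(math.pi * (1 - alpha) / 2)   # critical value solving the size equation

rng = np.random.default_rng(7)
theta0 = 1.5
x = theta0 + rng.standard_cauchy(100_000)   # draws under H0
size = np.mean(np.abs(x - theta0) >= c)
print(size)   # should be close to 0.10
```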
17. Let X be a single random variable with a continuous uniform distribution on [0, θ], where θ > 0 is unknown. Consider testing H_0: θ = 1 versus H_1: θ ≠ 1.

Note: The maximum likelihood estimator of θ is θ̂ = X. You may use this fact without proof.

(a) Let 0 < α < 1. Find the likelihood ratio test of these hypotheses, and find the critical value that gives the test size α.

Solution: Evaluating the likelihood at θ = 1 yields

L_X(1) = 1 if X ≤ 1, and L_X(1) = 0 if X > 1,
while evaluating the likelihood at θ̂ yields

L_X(θ̂) = 1/θ̂ = 1/X.

Then the likelihood ratio test statistic is

Λ(X) = X if X ≤ 1, and Λ(X) = 0 if X > 1,

and we reject H_0 if and only if Λ(X) ≤ k for some critical value k. To give the test size α, we must choose k so that α = P_{θ=1}[Λ(X) ≤ k]. Observe that if θ = 1, then X and Λ(X) are both uniformly distributed on [0, 1]. Then P_{θ=1}[Λ(X) ≤ k] = k. Thus, we take k = α to give the test size α.

(b) Find values c_1 > 0 and c_2 > 0 such that the likelihood ratio test with size α in part (a) takes the form "Reject H_0 if and only if either X ≤ c_1 or X > c_2."

Solution: Observe that Λ(X) ≤ k = α if and only if either X ≤ α or X > 1. Thus, c_1 = α and c_2 = 1.
(c) Give two reasons why it would not be appropriate to choose the critical value for part (a) based on the result that −2 log Λ has an approximate χ²_1 distribution.

Solution: First, this result depends on various regularity conditions that are not satisfied since the support of the distribution of X depends on the unknown parameter θ. Second, this approximation is based on an asymptotic result, which means it holds for large n. However, here n = 1.
18. Let X_1, …, X_n be iid Poisson(λ) random variables with pmf

p_λ(x) = λ^x exp(−λ)/x! if x ∈ {0, 1, 2, …}, and p_λ(x) = 0 otherwise,

where λ > 0 is unknown. Then let λ_0 > 0 be fixed and known, and consider testing H_0: λ = λ_0 versus H_1: λ ≠ λ_0.

Note: The MLE of λ is λ̂_n = X̄_n, and E_λ(X_1) = λ. You may use these facts without proof.
(a) Find the Wald test of these hypotheses, and state how to choose the critical value to give the test approximate size α. (You may use either version of the Wald test.)
Solution (Version I): First, observe that

ℓ″_{X_n}(λ) = ∂²/∂λ² [∑_{i=1}^n X_i log λ − nλ − ∑_{i=1}^n log(X_i!)] = −nX̄_n/λ².

Then

J_n = −ℓ″_{X_n}(λ̂_n) = n/X̄_n.

Then the Wald test rejects H_0 if and only if

√(n/X̄_n) |X̄_n − λ_0| ≥ z_{α/2},

where z_{α/2} is the number such that P(Z ≥ z_{α/2}) = α/2 for a standard normal random variable Z.
Solution (Version II): First, observe that

ℓ″_{X_n}(λ) = ∂²/∂λ² [∑_{i=1}^n X_i log λ − nλ − ∑_{i=1}^n log(X_i!)] = −nX̄_n/λ².

Then

I_n(λ) = −E_λ[ℓ″_{X_n}(λ)] = n/λ,

so

I_n(λ̂_n) = n/X̄_n.

Then the Wald test rejects H_0 if and only if

√(n/X̄_n) |X̄_n − λ_0| ≥ z_{α/2},

where z_{α/2} is the number such that P(Z ≥ z_{α/2}) = α/2 for a standard normal random variable Z.
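The approximate size of this Wald test can be verified by simulation under H_0 (not part of the exam solution); λ_0, n, and the replication count below are arbitrary choices:

```python
import numpy as np
from statistics import NormalDist

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}

rng = np.random.default_rng(8)
lam0, n, reps = 4.0, 500, 5_000
xbar = rng.poisson(lam0, size=(reps, n)).mean(axis=1)

stat = np.sqrt(n / xbar) * np.abs(xbar - lam0)   # sqrt(n / Xbar) |Xbar - lam0|
size = np.mean(stat >= z)
print(size)   # should be close to 0.05 under H0
```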