Page 1:

Slides

Advanced Statistics

Winter Term 2014/2015 (October 13, 2014 – November 24, 2014)

Mondays, 12.00 – 13.30, Room: J 498
Mondays, 14.15 – 15.45, Room: J 498

Prof. Dr. Bernd Wilfling

Westfälische Wilhelms-Universität Münster

Page 2:

Contents

1 Introduction
  1.1 Syllabus
  1.2 Why ’Advanced Statistics’?

2 Random Variables, Distribution Functions, Expectation, Moment Generating Functions
  2.1 Basic Terminology
  2.2 Random Variable, Cumulative Distribution Function, Density Function
  2.3 Expectation, Moments and Moment Generating Functions
  2.4 Special Parametric Families of Univariate Distributions

3 Joint and Conditional Distributions, Stochastic Independence
  3.1 Joint and Marginal Distribution
  3.2 Conditional Distribution and Stochastic Independence
  3.3 Expectation and Joint Moment Generating Functions
  3.4 The Multivariate Normal Distribution

4 Distributions of Functions of Random Variables
  4.1 Expectations of Functions of Random Variables
  4.2 Cumulative-distribution-function Technique
  4.3 Moment-generating-function Technique
  4.4 General Transformations

5 Methods of Estimation
  5.1 Sampling, Estimators, Limit Theorems
  5.2 Properties of Estimators
  5.3 Methods of Estimation
    5.3.1 Least-Squares Estimators
    5.3.2 Method-of-moments Estimators
    5.3.3 Maximum-Likelihood Estimators

6 Hypothesis Testing
  6.1 Basic Terminology
  6.2 Classical Testing Procedures
    6.2.1 Wald Test
    6.2.2 Likelihood-Ratio Test
    6.2.3 Lagrange-Multiplier Test

i

Page 3:

References and Related Reading

In German:

Mosler, K. und F. Schmid (2011). Wahrscheinlichkeitsrechnung und schließende Statistik (4. Auflage). Springer Verlag, Heidelberg.

Schira, J. (2012). Statistische Methoden der VWL und BWL – Theorie und Praxis (4. Auflage). Pearson Studium, München.

Wilfling, B. (2013). Statistik I. Skript zur Vorlesung Statistik I – Deskriptive Statistik im Wintersemester 2013/2014 an der Westfälischen Wilhelms-Universität Münster.

Wilfling, B. (2014). Statistik II. Skript zur Vorlesung Statistik II – Wahrscheinlichkeitsrechnung und schließende Statistik im Sommersemester 2014 an der Westfälischen Wilhelms-Universität Münster.

In English:

Chiang, A. (1984). Fundamental Methods of Mathematical Economics, 3. edition. McGraw-Hill, Singapore.

Feller, W. (1968). An Introduction to Probability Theory and its Applications, Vol. 1. John Wiley & Sons, New York.

Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. 2. John Wiley & Sons, New York.

Garthwaite, P.H., Jolliffe, I.T. and B. Jones (2002). Statistical Inference, 3. edition. Oxford University Press, Oxford.

Mood, A.M., Graybill, F.A. and D.C. Boes (1974). Introduction to the Theory of Statistics, 3. edition. McGraw-Hill, Tokyo.

ii

Page 4:

1. Introduction

1.1 Syllabus

Aim of this course:

• Consolidation of

– probability calculus

– statistical inference

(on the basis of previous Bachelor courses)

• Preparatory course to Econometrics, Empirical Economics

1

Page 5:

Web-site:

• http://www1.wiwi.uni-muenster.de/oeew/

−→ Study −→ Courses winter term 2014/2015

−→ Advanced Statistics

Style:

• Lecture is based on slides

• Slides are downloadable as PDF-files from the web-site

References:

• See ’Contents’

2

Page 6:

How to get prepared for the exam:

• Courses

• Class in ’Advanced Statistics’ (Fri, 10.00 – 11.30 [Room: J 498] and Fri, 12.00 – 13.30 [Room: J 498], October 17, 2014 – November 28, 2014)

Auxiliary material to be used in the exam:

• Pocket calculator (non-programmable)

• Course-slides (clean)

• No textbooks

3

Page 7:

Class teacher:

• Dipl.-Vw. Sarah Meyer (see personal web-site)

4

Page 8:

1.2 Why ’Advanced Statistics’?

Contents of the BA course Statistics II:

• Random experiments, events, probability

• Random variables, distributions

• Samples, statistics

• Estimators

• Tests of hypotheses

Aim of the BA course ’Statistics II’:

• Elementary understanding of statistical concepts (sampling, estimation, hypothesis-testing)

5

Page 9:

Now:

• Course in Advanced Statistics (probability calculus and mathematical statistics)

Aim of this course:

• Better understanding of distribution theory

• How can we find good estimators?

• How can we construct good tests of hypotheses?

6

Page 10:

Preliminaries:

• BA courses:
  Mathematics
  Statistics I
  Statistics II

• The slides for the BA courses Statistics I+II are downloadable from the web-site (in German)

Later courses based on ’Advanced Statistics’:

• All courses belonging to the three modules ’Econometrics and Empirical Economics’ (Econometrics I+II, Analysis of Time Series, ...)

7

Page 11:

2. Random Variables, Distribution Functions, Expectation, Moment Generating Functions

Aim of this section:

• Mathematical definition of the concepts

random variable

(cumulative) distribution function

(probability) density function

expectation and moments

moment generating function

8

Page 12:

Preliminaries:

• Repetition of the notions

random experiment

outcome (sample point) and sample space

event

probability

(see Wilfling (2014), Chapter 2)

9

Page 13:

2.1 Basic Terminology

Definition 2.1: (Random experiment)

A random experiment is an experiment

(a) for which we know in advance all conceivable outcomes that it can take on, but

(b) for which we do not know in advance the actual outcome that it eventually takes on.

Random experiments are performed in controllable trials.

10

Page 14:

Examples of random experiments:

• Drawing of lottery numbers

• Roulette, tossing a coin, rolling a die

• ’Technical experiments’ (testing the hardness of lots from steel production etc.)

In economics:

• Random experiments (according to Def. 2.1) are rare (historical data, trials are not controllable)

• Modern discipline: Experimental Economics

11

Page 15:

Definition 2.2: (Sample point, sample space)

Each conceivable outcome ω of a random experiment is called a sample point. The totality of conceivable outcomes (or sample points) is defined as the sample space and is denoted by Ω.

Examples:

• Random experiment of tossing a single dice:

Ω = {1, 2, 3, 4, 5, 6}

• Random experiment of tossing a coin until HEAD shows up:

Ω = {H, TH, TTH, TTTH, TTTTH, . . .}

• Random experiment of measuring tomorrow’s exchange rate between the euro and the US-$:

Ω = [0,∞)

12

Page 16:

Obviously:

• The number of elements in Ω can be either (1) finite or (2) infinite, but countable or (3) infinite and uncountable

Now:

• Definition of the notion Event based on mathematical sets

Definition 2.3: (Event)

An event of a random experiment is a subset of the sample space Ω. We say ’the event A occurs’ if the random experiment has an outcome ω ∈ A.

13

Page 17:

Remarks:

• Events are typically denoted by A, B, C, . . . or A1, A2, . . .

• A = Ω is called the sure event (since for every sample point ω we have ω ∈ A)

• A = ∅ (empty set) is called the impossible event (since for every ω we have ω ∉ A)

• If the event A is a subset of the event B (A ⊂ B) we say that ’the occurrence of A implies the occurrence of B’ (since for every ω ∈ A we also have ω ∈ B)

Obviously:

• Events are represented by mathematical sets −→ application of set operations to events

14

Page 18:

Combining events (set operations):

• Intersection: A1 ∩ A2 ∩ . . . ∩ An occurs if all Ai occur

• Union: A1 ∪ A2 ∪ . . . ∪ An occurs if at least one Ai occurs

• Set difference: C = A\B occurs if A occurs and B does not occur

• Complement: C = Ω\A ≡ Ā occurs if A does not occur

• The events A and B are called disjoint if A ∩ B = ∅ (both events cannot occur simultaneously)
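These set operations map directly onto Python’s built-in set type. As an illustrative sketch (my own, not part of the slides), with two events from the die-rolling experiment:

```python
# Events as Python sets over the sample space of rolling a die
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}            # 'even number'
B = {4, 5, 6}            # 'at least four'

print(A & B)             # intersection: A and B both occur
print(A | B)             # union: at least one of A, B occurs
print(A - B)             # set difference A\B
print(omega - A)         # complement of A
print(A.isdisjoint(omega - A))  # A and its complement are disjoint: True
```

The event names `A` and `B` are chosen for illustration only; any subsets of Ω behave the same way.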

15

Page 19:

Now:

• For any arbitrary event A we are looking for a number P(A) which represents the probability that A occurs

• Formally:

P : A −→ P (A)

(P (·) is a set function)

Question:

• Which properties should the probability function (set function) P(·) have?

16

Page 20:

Definition 2.4: (Kolmogorov-axioms)

The following axioms for P (·) are called Kolmogorov-axioms:

• Nonnegativity: P (A) ≥ 0 for every A

• Standardization: P (Ω) = 1

• Additivity: For two disjoint events A and B (i.e. for A ∩ B = ∅) P(·) satisfies

P(A ∪ B) = P(A) + P(B)

17

Page 21:

Easy to check:

• The three axioms imply several additional properties and rules when computing with probabilities

Theorem 2.5: (General properties)

The Kolmogorov-axioms imply the following properties:

• Probability of the complementary event:

P(Ā) = 1 − P(A)

• Probability of the impossible event:

P (∅) = 0

• Range of probabilities:

0 ≤ P (A) ≤ 1

18

Page 22:

Next:

• General rules when computing with probabilities

Theorem 2.6: (Calculation rules)

The Kolmogorov-axioms imply the following calculation rules (A, B, C are arbitrary events):

• Addition rule (I):

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

(probability that A or B occurs)

19

Page 23:

• Addition rule (II):

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C)

(probability that A or B or C occurs)

• Probability of the ’difference event’:

P(A\B) = P(A ∩ B̄) = P(A) − P(A ∩ B)
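The calculation rules of Theorem 2.6 can be verified by brute-force enumeration under the classical probability P(A) = |A|/|Ω|. A minimal Python sketch (my own illustration; the three events are arbitrary choices):

```python
from fractions import Fraction

# Classical probability on the fair-die sample space: P(A) = |A| / |Omega|
omega = frozenset(range(1, 7))

def P(A):
    return Fraction(len(A), len(omega))

A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

# Addition rule (I)
assert P(A | B) == P(A) + P(B) - P(A & B)

# Addition rule (II)
assert P(A | B | C) == (P(A) + P(B) + P(C)
                        - P(A & B) - P(B & C) - P(A & C)
                        + P(A & B & C))

# Probability of the 'difference event'
assert P(A - B) == P(A) - P(A & B)
print("all rules verified")
```

Exact `Fraction` arithmetic avoids any floating-point rounding in the comparisons.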

20

Page 24:

Notice:

• If B implies A (i.e. if B ⊂ A) it follows that

P(A\B) = P(A) − P(B)

21

Page 25:

2.2 Random Variable, Cumulative Distribution Function, Density Function

Frequently:

• Instead of being interested in a concrete sample point ω ∈ Ω itself, we are rather interested in a number depending on ω

Examples:

• Profit in euro when playing roulette

• Profit earned when selling a stock

• Monthly salary of a randomly selected person

Intuitive meaning of a random variable:

• Rule translating the abstract ω into a number

22

Page 26:

Definition 2.7: (Random variable [rv])

A random variable, denoted by X or X(·), is a mathematical function of the form

X : Ω −→ R
     ω −→ X(ω).

Remarks:

• A random variable relates each sample point ω ∈ Ω to a real number

• Intuitively: A random variable X characterizes a number that is a priori unknown

23

Page 27:

• When the random experiment is carried out, the random variable X takes on the value x

• x is called realization or value of the random variable X after the random experiment has been carried out

• Random variables are denoted by capital letters, realizations are denoted by small letters

• The rv X describes the situation ex ante, i.e. before carrying out the random experiment

• The realization x describes the situation ex post, i.e. after having carried out the random experiment

24

Page 28:

Example 1:

• Consider the experiment of tossing a single coin (H=Head, T=Tail). Let the rv X represent the ’Number of Heads’

• We have

Ω = {H, T}

The random variable X can take on two values:

X(T ) = 0, X(H) = 1

25

Page 29:

Example 2:

• Consider the experiment of tossing a coin three times. Let X represent the ’Number of Heads’

• We have

Ω = {(H, H, H), (H, H, T), . . . , (T, T, T)} = {ω1, ω2, . . . , ω8}

The rv X is defined by

X(ω) = number of H in ω

• Obviously: X relates distinct ω’s to the same number, e.g.

X((H, H, T)) = X((H, T, H)) = X((T, H, H)) = 2

26

Page 30:

Example 3:

• Consider the experiment of randomly selecting 1 person from a group of people. Let X represent the person’s status of employment

• We have

Ω = {’employed’, ’unemployed’} = {ω1, ω2}

• X can be defined as

X(ω1) = 1, X(ω2) = 0

27

Page 31:

Example 4:

• Consider the experiment of measuring tomorrow’s price of a specific stock. Let X denote the stock price

• We have Ω = [0,∞), i.e. X is defined by

X(ω) = ω

Conclusion:

• The random variable X can take on distinct values with specific probabilities

28

Page 32:

Question:

• How can we determine these specific probabilities and how can we calculate with them?

Simplifying notation: (a, b, x ∈ R)

• P(X = a) ≡ P({ω | X(ω) = a})

• P(a < X < b) ≡ P({ω | a < X(ω) < b})

• P(X ≤ x) ≡ P({ω | X(ω) ≤ x})

Solution:

• We can compute these probabilities via the so-called cumulative distribution function of X

29

Page 33:

Intuitively:

• The cumulative distribution function of the random variable X characterizes the probabilities according to which the possible values x are distributed along the real line (the so-called distribution of X)

Definition 2.8: (Cumulative distribution function [cdf])

The cumulative distribution function of a random variable X, denoted by FX, is defined to be the function

FX : R −→ [0,1]
      x −→ FX(x) = P({ω | X(ω) ≤ x}) = P(X ≤ x).

30

Page 34:

Example:

• Consider the experiment of tossing a coin three times. Let X represent the ’Number of Heads’

• We have

Ω = {(H, H, H), (H, H, T), . . . , (T, T, T)} = {ω1, ω2, . . . , ω8}

• For the probabilities of X we find

P(X = 0) = P({(T, T, T)}) = 1/8

P(X = 1) = P({(T, T, H), (T, H, T), (H, T, T)}) = 3/8

P(X = 2) = P({(T, H, H), (H, T, H), (H, H, T)}) = 3/8

P(X = 3) = P({(H, H, H)}) = 1/8

31

Page 35:

• Thus, the cdf is given by

FX(x) = 0      for x < 0
        0.125  for 0 ≤ x < 1
        0.5    for 1 ≤ x < 2
        0.875  for 2 ≤ x < 3
        1      for x ≥ 3

Remarks:

• In practice, it will be sufficient to only know the cdf FX of X

• In many situations, it will appear impossible to exactly specify the sample space Ω or the explicit function X : Ω −→ R. However, often we may derive the cdf FX from other factual considerations
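For the three-coin-toss example, the density values and the step cdf above can be checked by enumerating all eight equally likely outcomes. A minimal Python sketch (my own illustration, not part of the slides):

```python
from itertools import product
from fractions import Fraction

# Sample space of three coin tosses: 8 equally likely outcomes
omega = list(product("HT", repeat=3))
p = Fraction(1, 8)

def density(x):
    # f_X(x) = P(X = x), where X = number of heads
    return sum(p for w in omega if w.count("H") == x)

def cdf(x):
    # F_X(x) = P(X <= x): sum the density over all
    # support points not exceeding x
    return sum(density(k) for k in range(4) if k <= x)

print(density(1))  # 3/8
print(cdf(1.5))    # 1/2
```

Exact rational arithmetic via `Fraction` reproduces the slide values 1/8, 3/8, 3/8, 1/8 without rounding.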

32

Page 36:

General properties of FX:

• FX(x) is a monotone, nondecreasing function

• We have

lim_{x → −∞} FX(x) = 0 and lim_{x → +∞} FX(x) = 1

• FX is continuous from the right; that is,

lim_{z → x, z > x} FX(z) = FX(x)

33

Page 37:

Summary:

• Via the cdf FX(x) we can answer the following question:

’What is the probability that the random variable X takes on a value that does not exceed x?’

Now:

• Consider the question:

’What is the value which X does not exceed with a prespecified probability p ∈ (0,1)?’

−→ quantile function of X

34

Page 38:

Definition 2.9: (Quantile function)

Consider the rv X with cdf FX. For every p ∈ (0,1) the quantile function of X, denoted by QX(p), is defined as

QX : (0,1) −→ R
        p −→ QX(p) = min{x | FX(x) ≥ p}.

The value of the quantile function xp = QX(p) is called the pth quantile of X.

Remarks:

• The pth quantile xp of X is defined as the smallest number x satisfying FX(x) ≥ p

• In other words: The pth quantile xp is the smallest value that X does not exceed with probability at least p
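Definition 2.9 translates directly into a search over the steps of the cdf. A sketch in Python using the coin-toss cdf from slide 32 (my own illustration, not part of the slides):

```python
from fractions import Fraction

# Step points of the cdf F_X from the three-coin-toss example
steps = [(0, Fraction(1, 8)), (1, Fraction(1, 2)),
         (2, Fraction(7, 8)), (3, Fraction(1))]

def quantile(p):
    # Q_X(p) = min{x | F_X(x) >= p} (Definition 2.9)
    for x, F in steps:
        if F >= p:
            return x
    raise ValueError("p must lie in (0, 1)")

print(quantile(Fraction(1, 2)))  # median: 1
print(quantile(Fraction(3, 4)))  # 0.75-quantile: 2
```

Because the step points are listed in increasing order, the first step whose cdf value reaches p is exactly the minimum required by the definition.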

35

Page 39:

Special quantiles:

• Median: p = 0.5

• Quartiles: p = 0.25, 0.5, 0.75

• Quintiles: p = 0.2, 0.4, 0.6, 0.8

• Deciles: p = 0.1, 0.2, . . . , 0.9

Now:

• Consideration of two distinct classes of random variables (discrete vs. continuous rv’s)

36

Page 40:

Reason:

• Each class requires a specific mathematical treatment

Mathematical tools for analyzing discrete rv’s:

• Finite and infinite sums

Mathematical tools for analyzing continuous rv’s:

• Differential and integral calculus

Remarks:

• Some rv’s are partly discrete and partly continuous

• Such rv’s are not treated in this course

37

Page 41:

Definition 2.10: (Discrete random variable)

A random variable X will be defined to be discrete if it can take on either

(a) only a finite number of values x1, x2, . . . , xJ or

(b) an infinite, but countable number of values x1, x2, . . .

each with strictly positive probability; that is, if for all j = 1, . . . , J, . . . we have

P(X = xj) > 0 and ∑_j P(X = xj) = 1.

38

Page 42:

Examples of discrete variables:

• Countable variables (’X = Number of . . .’)

• Encoded qualitative variables

Further definitions:

Definition 2.11: (Support of a discrete random variable)

The support of a discrete rv X, denoted by supp(X), is defined to be the totality of all values that X can take on with a strictly positive probability:

supp(X) = {x1, . . . , xJ} or supp(X) = {x1, x2, . . .}.

39

Page 43:

Definition 2.12: (Discrete density function)

For a discrete random variable X the function

fX(x) = P (X = x)

is defined to be the discrete density function of X.

Remarks:

• The discrete density function fX(·) takes on strictly positive values only for elements of the support of X. For realizations of X that do not belong to the support of X, i.e. for x ∉ supp(X), we have fX(x) = 0:

fX(x) = P(X = xj) > 0  for x = xj ∈ supp(X)
        0              for x ∉ supp(X)

40

Page 44:

• The discrete density function fX(·) has the following properties:

fX(x) ≥ 0 for all x

∑_{xj ∈ supp(X)} fX(xj) = 1

• For any arbitrary set A ⊂ R the probability of the event {ω | X(ω) ∈ A} = {X ∈ A} is given by

P(X ∈ A) = ∑_{xj ∈ A} fX(xj)

41

Page 45:

Example:

• Consider the experiment of tossing a coin three times and let X = ’Number of Heads’ (see slide 31)

• Obviously: X is discrete and has the support

supp(X) = {0, 1, 2, 3}

• The discrete density function of X is given by

fX(x) = P(X = 0) = 0.125  for x = 0
        P(X = 1) = 0.375  for x = 1
        P(X = 2) = 0.375  for x = 2
        P(X = 3) = 0.125  for x = 3
        0                 for x ∉ supp(X)

42

Page 46:

• The cdf of X is given by (see slide 32)

FX(x) = 0      for x < 0
        0.125  for 0 ≤ x < 1
        0.5    for 1 ≤ x < 2
        0.875  for 2 ≤ x < 3
        1      for x ≥ 3

Obviously:

• The cdf FX(·) can be obtained from fX(·):

FX(x) = P(X ≤ x) = ∑_{xj ∈ supp(X), xj ≤ x} fX(xj)

43

Page 47:

Conclusion:

• The cdf of a discrete random variable X is a step function with steps at the points xj ∈ supp(X). The height of the step at xj is given by

FX(xj) − lim_{x → xj, x < xj} FX(x) = P(X = xj) = fX(xj),

i.e. the step height is equal to the value of the discrete density function at xj (relationship between cdf and discrete density function)

44

Page 48:

Now:

• Definition of continuous random variables

Intuitively:

• In contrast to discrete random variables, continuous random variables can take on an uncountable number of values (e.g. every real number on a given interval)

In fact:

• Definition of a continuous random variable is quite technical

45

Page 49:

Definition 2.13: (Continuous rv, probability density function)

A random variable X is called continuous if there exists a function fX : R −→ [0,∞) such that the cdf of X can be written as

FX(x) = ∫_{−∞}^{x} fX(t) dt for all x ∈ R.

The function fX(x) is called the probability density function (pdf) of X.

Remarks:

• The cdf FX(·) of a continuous random variable X is a primitive function of the pdf fX(·)

• FX(x) = P(X ≤ x) is equal to the area under the pdf fX(·) between the limits −∞ and x

46

Page 50:

[Figure: Cdf FX(·) and pdf fX(·) — the area under fX(t) up to the point x equals P(X ≤ x) = FX(x)]

47

Page 51:

Properties of the pdf fX(·):

1. A pdf fX(·) cannot take on negative values, i.e.

   fX(x) ≥ 0 for all x ∈ R

2. The area under a pdf is equal to one, i.e.

   ∫_{−∞}^{+∞} fX(x) dx = 1

3. If the cdf FX(x) is differentiable we have

   fX(x) = F′X(x) ≡ dFX(x)/dx

48

Page 52:

Example: (Uniform distribution over [0,10])

• Consider the random variable X with pdf

fX(x) = 0    for x ∉ [0,10]
        0.1  for x ∈ [0,10]

• Derivation of the cdf FX: For x < 0 we have

FX(x) = ∫_{−∞}^{x} fX(t) dt = ∫_{−∞}^{x} 0 dt = 0

49

Page 53:

For x ∈ [0,10] we have

FX(x) = ∫_{−∞}^{x} fX(t) dt
      = ∫_{−∞}^{0} 0 dt + ∫_{0}^{x} 0.1 dt
      = [0.1 · t]_{0}^{x}
      = 0.1 · x − 0.1 · 0
      = 0.1 · x

50

Page 54:

For x > 10 we have

FX(x) = ∫_{−∞}^{x} fX(t) dt
      = ∫_{−∞}^{0} 0 dt + ∫_{0}^{10} 0.1 dt + ∫_{10}^{∞} 0 dt
      = 0 + 1 + 0
      = 1
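The three cases derived above can be cross-checked numerically. The sketch below (my own, not part of the slides) approximates the defining integral with a midpoint Riemann sum and compares it with FX(x) = 0.1·x on [0,10]:

```python
# Numerical check of the uniform-[0,10] cdf derived above:
# F_X(x) = 0 for x < 0, 0.1*x on [0,10], 1 for x > 10.
def pdf(x):
    return 0.1 if 0 <= x <= 10 else 0.0

def cdf(x, n=100_000):
    # midpoint Riemann-sum approximation of the integral of the
    # pdf from 0 up to x (the pdf vanishes below 0 anyway)
    if x <= 0:
        return 0.0
    h = x / n
    return sum(pdf((i + 0.5) * h) for i in range(n)) * h

for x, expected in [(5, 0.5), (10, 1.0), (12, 1.0)]:
    assert abs(cdf(x) - expected) < 1e-4
print("Riemann sum matches F_X(x) = 0.1*x on [0,10]")
```

The tolerance 1e-4 accounts for the discretization error at the boundary x = 10 when integrating past the support.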

51

Page 55:

Now:

• Interval probabilities, i.e. (for a, b ∈ R, a < b)

P (X ∈ (a, b]) = P (a < X ≤ b)

• We have

P(a < X ≤ b) = P({ω | a < X(ω) ≤ b})
             = P({ω | X(ω) > a} ∩ {ω | X(ω) ≤ b})
             = 1 − P(({ω | X(ω) > a} ∩ {ω | X(ω) ≤ b})ᶜ)
             = 1 − P({ω | X(ω) > a}ᶜ ∪ {ω | X(ω) ≤ b}ᶜ)
             = 1 − P({ω | X(ω) ≤ a} ∪ {ω | X(ω) > b})

(Aᶜ = Ω\A denotes the complement; the fourth line uses De Morgan’s law)

52

Page 56:

             = 1 − [P(X ≤ a) + P(X > b)]
             = 1 − [FX(a) + (1 − P(X ≤ b))]
             = 1 − [FX(a) + 1 − FX(b)]
             = FX(b) − FX(a)
             = ∫_{−∞}^{b} fX(t) dt − ∫_{−∞}^{a} fX(t) dt
             = ∫_{a}^{b} fX(t) dt
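Applying the result P(a < X ≤ b) = FX(b) − FX(a) to the uniform-[0,10] example gives a one-line computation. A minimal sketch (my own illustration):

```python
# Interval probability for the uniform-[0,10] example:
# P(a < X <= b) = F_X(b) - F_X(a), with F_X(x) = 0.1*x on [0,10]
def F(x):
    return max(0.0, min(1.0, 0.1 * x))

def interval_prob(a, b):
    return F(b) - F(a)

print(interval_prob(2, 5))   # 0.3
```

Clamping F to [0,1] implements the three branches of the cdf derived on the preceding slides in a single expression.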

53

Page 57:

[Figure: Interval probability between the limits a and b — the area under fX(x) between a and b equals P(a < X ≤ b)]

54

Page 58:

Important result for a continuous rv X:

P (X = a) = 0 for all a ∈ R

Proof:

P(X = a) = lim_{b → a} P(a < X ≤ b) = lim_{b → a} ∫_{a}^{b} fX(x) dx = ∫_{a}^{a} fX(x) dx = 0

Conclusion:

• The probability that a continuous random variable X takes on a single explicit value is always zero

55

Page 59:

[Figure: Probability of a single value — the area under fX(x) over an interval shrinking towards a (ticks a, b1, b2, b3)]

56

Page 60:

Notice:

• This does not imply that the event {X = a} cannot occur

Consequence:

• Since for continuous random variables we always have P(X = a) = 0 for all a ∈ R, it follows that

P(a < X < b) = P(a ≤ X < b) = P(a ≤ X ≤ b) = P(a < X ≤ b) = FX(b) − FX(a)

(when computing interval probabilities for continuous rv’s, it does not matter if the interval is open or closed)

57

Page 61:

2.3 Expectation, Moments and Moment Generating Functions

Repetition:

• Expectation of an arbitrary random variable X

Definition 2.14: (Expectation)

The expectation of the random variable X, denoted by E(X), isdefined by

E(X) = Σ_{xj ∈ supp(X)} xj · P(X = xj) , if X is discrete

E(X) = ∫_{−∞}^{+∞} x · fX(x) dx , if X is continuous.

58


Remarks:

• The expectation of the random variable X is approximately equal to the sum of all realizations, each weighted by the probability of its occurrence

• Instead of E(X) we often write µX

• There exist random variables that do not have an expectation (see class)

59


Example 1: (Discrete random variable)

• Consider the experiment of tossing two dice. Let X represent the absolute difference of the two dice. What is the expectation of X?

• The support of X is given by

supp(X) = {0, 1, 2, 3, 4, 5}

60


• The discrete density function of X is given by

fX(x) = P(X = 0) = 6/36 for x = 0
        P(X = 1) = 10/36 for x = 1
        P(X = 2) = 8/36 for x = 2
        P(X = 3) = 6/36 for x = 3
        P(X = 4) = 4/36 for x = 4
        P(X = 5) = 2/36 for x = 5
        0 for x ∉ supp(X)

• This gives

E(X) = 0 · 6/36 + 1 · 10/36 + 2 · 8/36 + 3 · 6/36 + 4 · 4/36 + 5 · 2/36 = 70/36 = 1.9444

61


Example 2: (Continuous random variable)

• Consider the continuous random variable X with pdf

fX(x) = x/4 , for 1 ≤ x ≤ 3
        0 , elsewise

• To calculate the expectation we split up the integral:

E(X) = ∫_{−∞}^{+∞} x · fX(x) dx

= ∫_{−∞}^{1} 0 dx + ∫_{1}^{3} x · (x/4) dx + ∫_{3}^{+∞} 0 dx

= ∫_{1}^{3} x²/4 dx = (1/4) · [x³/3]_{1}^{3} = (1/4) · (27/3 − 1/3) = 26/12 = 2.1667
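As a numerical cross-check (my own sketch, not from the slides), a midpoint Riemann sum over the support reproduces this value:

```python
# pdf from Example 2: f(x) = x/4 on [1, 3], 0 elsewhere
def f(x):
    return x / 4 if 1 <= x <= 3 else 0.0

# midpoint Riemann sum of E(X) = integral of x * f(x) dx over [1, 3]
n = 100_000
a, b = 1.0, 3.0
h = (b - a) / n
expectation = sum((a + (i + 0.5) * h) * f(a + (i + 0.5) * h) * h for i in range(n))
print(round(expectation, 4))  # 2.1667 (= 26/12)
```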

Frequently:

• Random variable X plus discrete density or pdf fX is known

• We have to find the expectation of the transformed random variable

Y = g(X)

63


Theorem 2.15: (Expectation of a transformed rv)

Let X be a random variable with discrete density or pdf fX(·). For any Baire-function g : R −→ R the expectation of the transformed random variable Y = g(X) is given by

E(Y ) = E[g(X)] = Σ_{xj ∈ supp(X)} g(xj) · P(X = xj) , if X is discrete

E(Y ) = E[g(X)] = ∫_{−∞}^{+∞} g(x) · fX(x) dx , if X is continuous.

64


Remarks:

• All functions considered in this course are Baire-functions

• For the special case g(x) = x (the identity function) Theorem 2.15 coincides with Definition 2.14
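To illustrate the theorem (an illustrative sketch; `expectation_of` is my own helper, not slide notation), the discrete formula can be applied to the dice-difference density of Example 1:

```python
from fractions import Fraction

# Discrete density of X = |difference of two dice| (Example 1)
density = {0: Fraction(6, 36), 1: Fraction(10, 36), 2: Fraction(8, 36),
           3: Fraction(6, 36), 4: Fraction(4, 36), 5: Fraction(2, 36)}

def expectation_of(g):
    # Theorem 2.15, discrete case: E[g(X)] = sum of g(x_j) * P(X = x_j)
    return sum(g(x) * p for x, p in density.items())

print(expectation_of(lambda x: x))      # 35/18 -> identity recovers Definition 2.14
print(expectation_of(lambda x: x * x))  # 35/6  -> E(X^2)
```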

Next:

• Some important rules for calculating expected values

65


Theorem 2.16: (Properties of expectations)

Let X be an arbitrary random variable (discrete or continuous), c, c1, c2 ∈ R constants and g, g1, g2 : R −→ R functions. Then:

1. E(c) = c.

2. E[c · g(X)] = c · E[g(X)].

3. E[c1 · g1(X) + c2 · g2(X)] = c1 · E[g1(X)] + c2 · E[g2(X)].

4. If g1(x) ≤ g2(x) for all x ∈ R then

E[g1(X)] ≤ E[g2(X)].

Proof: Class

66


Now:

• Consider the random variable X (discrete or continuous) and the explicit function g(x) = [x − E(X)]²

−→ variance and standard deviation of X

Definition 2.17: (Variance, standard deviation)

For any random variable X the variance, denoted by Var(X), is defined as the expected quadratic distance between X and its expectation E(X); that is

Var(X) = E[(X − E(X))²].

The standard deviation of X, denoted by SD(X), is defined to be the (positive) square root of the variance:

SD(X) = +√Var(X).

67


Remark:

• Setting g(X) = [X − E(X)]² in Theorem 2.15 (on slide 64) yields the following explicit formulas for discrete and continuous random variables:

Var(X) = E[g(X)] = Σ_{xj ∈ supp(X)} [xj − E(X)]² · P(X = xj) , if X is discrete

Var(X) = E[g(X)] = ∫_{−∞}^{+∞} [x − E(X)]² · fX(x) dx , if X is continuous

68


Example: (Discrete random variable)

• Consider again the experiment of tossing two dice with X representing the absolute difference of the two dice (see Example 1 on slide 60). The variance is given by

Var(X) = (0 − 70/36)² · 6/36 + (1 − 70/36)² · 10/36
       + (2 − 70/36)² · 8/36 + (3 − 70/36)² · 6/36
       + (4 − 70/36)² · 4/36 + (5 − 70/36)² · 2/36
       = 2.05247
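A hedged sketch (not on the slides) confirming this value with exact rational arithmetic:

```python
from fractions import Fraction

density = {0: Fraction(6, 36), 1: Fraction(10, 36), 2: Fraction(8, 36),
           3: Fraction(6, 36), 4: Fraction(4, 36), 5: Fraction(2, 36)}

mu = sum(x * p for x, p in density.items())               # E(X) = 35/18
var = sum((x - mu) ** 2 * p for x, p in density.items())  # Definition 2.17
print(var)         # 665/324
print(float(var))  # ~2.05247
```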

Notice:

• The variance is an expectation per definitionem
−→ rules for expectations are applicable

69


Theorem 2.18: (Rules for variances)

Let X be an arbitrary random variable (discrete or continuous) and a, b ∈ R real constants; then

1. Var(X) = E(X²) − [E(X)]².

2. Var(a + b · X) = b² · Var(X).

Proof: Class

Next:

• Two important inequalities dealing with expectations and transformed random variables

70


Theorem 2.19: (Chebyshev inequality)

Let X be an arbitrary random variable and g : R −→ R+ a non-negative function. Then, for every k > 0 we have

P[g(X) ≥ k] ≤ E[g(X)] / k.

Special case:

• Consider

g(x) = [x − E(X)]² and k = r² · Var(X) (r > 0)

• Theorem 2.19 implies

P([X − E(X)]² ≥ r² · Var(X)) ≤ Var(X) / (r² · Var(X)) = 1/r²

71


• Now:

P([X − E(X)]² ≥ r² · Var(X)) = P(|X − E(X)| ≥ r · SD(X))

= 1 − P(|X − E(X)| < r · SD(X))

• It follows that

P(|X − E(X)| < r · SD(X)) ≥ 1 − 1/r²

(specific Chebyshev inequality)

72


Remarks:

• The specific Chebyshev inequality provides a minimal probability of the event that any arbitrary random variable X takes on a value from the following interval:

[E(X) − r · SD(X), E(X) + r · SD(X)]

• For example, for r = 3 we have

P(|X − E(X)| < 3 · SD(X)) ≥ 1 − 1/3² = 8/9

which is equivalent to

P(E(X) − 3 · SD(X) < X < E(X) + 3 · SD(X)) ≥ 0.8889

or

P(X ∈ (E(X) − 3 · SD(X), E(X) + 3 · SD(X))) ≥ 0.8889
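The bound can be inspected empirically for the dice-difference distribution of Example 1 (an illustrative check under my own variable names):

```python
import math
from fractions import Fraction

density = {0: Fraction(6, 36), 1: Fraction(10, 36), 2: Fraction(8, 36),
           3: Fraction(6, 36), 4: Fraction(4, 36), 5: Fraction(2, 36)}
mu = float(sum(x * p for x, p in density.items()))
sd = math.sqrt(sum((x - mu) ** 2 * float(p) for x, p in density.items()))

# exact probability mass inside the Chebyshev interval vs. the 1 - 1/r^2 bound
for r in (1.5, 2, 3):
    prob = sum(float(p) for x, p in density.items() if abs(x - mu) < r * sd)
    bound = 1 - 1 / r ** 2
    print(f"r={r}: P = {prob:.4f} >= bound {bound:.4f}")
```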

73


Theorem 2.20: (Jensen inequality)

Let X be a random variable with mean E(X) and let g : R −→ R be a convex function, i.e. for all x we have g″(x) ≥ 0; then

E [g(X)] ≥ g(E[X]).

Remarks:

• If the function g is concave (i.e. if g″(x) ≤ 0 for all x) then Jensen's inequality states that E[g(X)] ≤ g(E[X])

• Notice that in general we have

E[g(X)] ≠ g(E[X])

74


Example:

• Consider the random variable X and the function g(x) = x²

• We have g″(x) = 2 ≥ 0 for all x, i.e. g is convex

• It follows from Jensen’s inequality that

E[g(X)] = E(X²) ≥ g(E[X]) = [E(X)]²

i.e.

E(X²) − [E(X)]² ≥ 0

• This implies

Var(X) = E(X²) − [E(X)]² ≥ 0

(the variance of an arbitrary rv cannot be negative)

75


Now:

• Consider the random variable X with expectation E(X) = µX, the integer number n ∈ N and the functions

g1(x) = xⁿ

g2(x) = [x − µX]ⁿ

Definition 2.21: (Moments, central moments)

(a) The n-th moment of X, denoted by µ′n, is defined as

µ′n ≡ E[g1(X)] = E(Xⁿ).

(b) The n-th central moment of X about µX, denoted by µn, is defined as

µn ≡ E[g2(X)] = E[(X − µX)ⁿ].

76


Relations:

• µ′1 = E(X) = µX (the 1st moment coincides with E(X))

• µ1 = E[X − µX] = E(X) − µX = 0 (the 1st central moment is always equal to 0)

• µ2 = E[(X − µX)²] = Var(X) (the 2nd central moment coincides with Var(X))

77


Remarks:

• The first four moments of a random variable X are important measures of the probability distribution (expectation, variance, skewness, kurtosis)

• The moments of a random variable X play an important role in theoretical and applied statistics

• In some cases, when all moments are known, the cdf of a random variable X can be determined

78


Question:

• Can we find a function that gives us a representation of all moments of a random variable X?

Definition 2.22: (Moment generating function)

Let X be a random variable with discrete density or pdf fX(·). The expected value of e^(t·X) is defined to be the moment generating function of X if the expected value exists for every value of t in some interval −h < t < h, h > 0. That is, the moment generating function of X, denoted by mX(t), is defined as

mX(t) = E[e^(t·X)].

79


Remarks:

• The moment generating function mX(t) is a function in t

• There are rv’s X for which mX(t) does not exist

• If mX(t) exists it can be calculated as

mX(t) = E[e^(t·X)] = Σ_{xj ∈ supp(X)} e^(t·xj) · P(X = xj) , if X is discrete

mX(t) = E[e^(t·X)] = ∫_{−∞}^{+∞} e^(t·x) · fX(x) dx , if X is continuous

80


Question:

• Why is mX(t) called the moment generating function?

Answer:

• Consider the nth derivative of mX(t) with respect to t:

(dⁿ/dtⁿ) mX(t) = Σ_{xj ∈ supp(X)} (xj)ⁿ · e^(t·xj) · P(X = xj) for discrete X

(dⁿ/dtⁿ) mX(t) = ∫_{−∞}^{+∞} xⁿ · e^(t·x) · fX(x) dx for continuous X

81


• Now, evaluate the nth derivative at t = 0:

(dⁿ/dtⁿ) mX(0) = Σ_{xj ∈ supp(X)} (xj)ⁿ · P(X = xj) for discrete X

(dⁿ/dtⁿ) mX(0) = ∫_{−∞}^{+∞} xⁿ · fX(x) dx for continuous X

In both cases this equals E(Xⁿ) = µ′n

(see Definition 2.21(a) on slide 76)

82


Example:

• Let X be a continuous random variable with pdf

fX(x) = 0 , for x < 0
        λ · e^(−λ·x) , for x ≥ 0

(exponential distribution with parameter λ > 0)

• We have

mX(t) = E[e^(t·X)] = ∫_{−∞}^{+∞} e^(t·x) · fX(x) dx = ∫_{0}^{+∞} λ · e^((t−λ)·x) dx = λ/(λ − t) for t < λ

83


• It follows that

m′X(t) = λ/(λ − t)² and m″X(t) = 2λ/(λ − t)³

and thus

m′X(0) = E(X) = 1/λ and m″X(0) = E(X²) = 2/λ²
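Numerically differentiating the mgf at t = 0 recovers these moments; a sketch assuming the (illustrative) parameter value λ = 2:

```python
# mgf of the exponential distribution, m(t) = lam / (lam - t), with lam = 2
lam = 2.0

def m(t):
    return lam / (lam - t)

h = 1e-4
first = (m(h) - m(-h)) / (2 * h)             # central difference ~ m'(0) = E(X)
second = (m(h) - 2 * m(0) + m(-h)) / h ** 2  # ~ m''(0) = E(X^2)
print(round(first, 6))   # 0.5 = 1/lam
print(round(second, 6))  # 0.5 = 2/lam^2
```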

Now:

• Important result on moment generating functions

84


Theorem 2.23: (Identification property)

Let X and Y be two random variables with densities fX(·) and fY (·), respectively. Suppose that mX(t) and mY (t) both exist and that mX(t) = mY (t) for all t in the interval −h < t < h for some h > 0. Then the two cdf's FX(·) and FY (·) are equal; that is FX(x) = FY (x) for all x.

Remarks:

• Theorem 2.23 states that there is a unique cdf FX(x) for a given moment generating function mX(t)
−→ if we can find mX(t) for X then, at least theoretically, we can find the distribution of X

• We will make use of this property in Section 4

85


Example:

• Suppose that a random variable X has the moment generating function

mX(t) = 1/(1 − t) for −1 < t < 1

• Then the pdf of X is given by

fX(x) = 0 , for x < 0
        e^(−x) , for x ≥ 0

(exponential distribution with parameter λ = 1)

86


2.4 Special Parametric Families of Univariate Distributions

Up to now:

• General mathematical properties of arbitrary distributions

• Discrimination: discrete vs continuous distributions

• Consideration of

the cdf FX(x)

the discrete density or the pdf fX(x)

expectations of the form E[g(X)]

the moment generating function mX(t)

87


Central result:

• The distribution of a random variable X is (essentially) determined by fX(x) or FX(x)

• FX(x) can be determined by fX(x) (cf. slide 46)

• fX(x) can be determined by FX(x) (cf. slide 48)

Question:

• How many different distributions are known to exist?

88


Answer:

• Infinitely many

But:

• In practice, there are some important parametric families of distributions that provide 'good' models for representing real-world random phenomena

• These families of distributions are described in detail in all textbooks on mathematical statistics (see e.g. Mosler & Schmid (2008), Mood et al. (1974))

89


• Important families of discrete distributions

Bernoulli distribution

Binomial distribution

Geometric distribution

Poisson distribution

• Important families of continuous distributions

Uniform or rectangular distribution

Exponential distribution

Normal distribution

90


Remark:

• The single most important family of distributions is the normal distribution

Definition 2.24: (Normal distribution)

A continuous random variable X is defined to be normally distributed with parameters µ ∈ R and σ² > 0, denoted by X ∼ N(µ, σ²), if its pdf is given by

fX(x) = (1 / (√(2π) · σ)) · e^(−(1/2) · ((x − µ)/σ)²) , x ∈ R.
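A minimal sketch (my own code, not part of the slides) of this density, together with a Riemann-sum check that it integrates to 1 for the N(5, 3) case:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # pdf of N(mu, sigma^2) as in Definition 2.24
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

print(round(normal_pdf(0.0), 4))  # 0.3989 = 1/sqrt(2*pi), the standard normal at 0

# midpoint Riemann sum of the N(5, 3) density over a wide interval
n, lo, hi = 200_000, -10.0, 20.0
h = (hi - lo) / n
total = sum(normal_pdf(lo + (i + 0.5) * h, mu=5, sigma=math.sqrt(3)) * h
            for i in range(n))
print(round(total, 4))  # 1.0
```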

91


PDF's of the normal distribution

(figure: densities of N(0,1), N(5,1), N(5,3) and N(5,5))

92


Remarks:

• The special normal distribution N(0,1) is called standard normal distribution, the pdf of which is denoted by ϕ(x)

• The properties as well as calculation rules for normally distributed random variables are important pre-conditions for this course (see Wilfling (2014), Section 3.4)

93


3. Joint and Conditional Distributions, Stochastic Independence

Aim of this section:

• Multidimensional random variables (random vectors) (joint and marginal distributions)

• Stochastic (in)dependence and conditional distribution

• Multivariate normal distribution (definition, properties)

Literature:

• Mood, Graybill, Boes (1974), Chapter IV, pp. 129-174

• Wilfling (2014), Chapter 4

94


3.1 Joint and Marginal Distribution

Now:

• Consider several random variables simultaneously

Applications:

• Several economic applications

• Statistical inference

95


Definition 3.1: (Random vector)

Let X1, · · · , Xn be a set of n random variables each representing the same random experiment, i.e.

Xi : Ω −→ R for i = 1, . . . , n.

Then X = (X1, . . . , Xn)′ is called an n-dimensional random variable or an n-dimensional random vector.

Remark:

• In the literature random vectors are often denoted by

X = (X1, . . . , Xn) or more simply by X1, . . . , Xn

96


• For n = 2 it is common practice to write

X = (X, Y )′ or (X, Y ) or X, Y

• Realizations are denoted by small letters:

x = (x1, . . . , xn)′ ∈ Rn or x = (x, y)′ ∈ R2

Now:

• Characterization of the probability distribution of the random vector X

97


Definition 3.2: (Joint cumulative distribution function)

Let X = (X1, . . . , Xn)′ be an n-dimensional random vector. The function

FX1,...,Xn : Rn −→ [0,1]

defined by

FX1,...,Xn(x1, . . . , xn) = P (X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn)

is called the joint cumulative distribution function of X.

Remark:

• Definition 3.2 applies to discrete as well as to continuous random variables X1, . . . , Xn

98


Some properties of the bivariate cdf (n = 2):

• FX,Y (x, y) is monotone increasing in x and y

• lim_{x→−∞} FX,Y (x, y) = 0

• lim_{y→−∞} FX,Y (x, y) = 0

• lim_{x→+∞, y→+∞} FX,Y (x, y) = 1

Remark:

• Analogous properties hold for the n-dimensional cdf FX1,...,Xn(x1, . . . , xn)

99


Now:

• Joint discrete versus joint continuous random vectors

Definition 3.3: (Joint discrete random vector)

The random vector X = (X1, . . . , Xn)′ is defined to be a joint discrete random vector if it can assume only a finite (or a countably infinite) number of realizations x = (x1, . . . , xn)′ such that

P (X1 = x1, X2 = x2, . . . , Xn = xn) > 0

and

Σ P(X1 = x1, X2 = x2, . . . , Xn = xn) = 1,

where the summation is over all possible realizations of X.

100


Definition 3.4: (Joint continuous random vector)

The random vector X = (X1, . . . , Xn)′ is defined to be a joint continuous random vector if and only if there exists a nonnegative function fX1,...,Xn(x1, . . . , xn) such that

FX1,...,Xn(x1, . . . , xn) = ∫_{−∞}^{xn} . . . ∫_{−∞}^{x1} fX1,...,Xn(u1, . . . , un) du1 . . . dun

for all (x1, . . . , xn). The function fX1,...,Xn is defined to be a joint probability density function of X.

Example:

• Consider X = (X, Y )′ with joint pdf

fX,Y (x, y) = x + y , for (x, y) ∈ [0,1] × [0,1]
             0 , elsewise

101


Joint pdf fX,Y (x, y)

(figure: 3D surface plot of f(x, y) = x + y over the unit square)

102


• The joint cdf can be obtained by

FX,Y (x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} fX,Y (u, v) du dv

= ∫_{0}^{y} ∫_{0}^{x} (u + v) du dv

= . . .

= 0.5 · (x²y + xy²) , for (x, y) ∈ [0,1] × [0,1]
  0.5 · (x² + x) , for (x, y) ∈ [0,1] × [1,∞)
  0.5 · (y² + y) , for (x, y) ∈ [1,∞) × [0,1]
  1 , for (x, y) ∈ [1,∞) × [1,∞)

(Proof: Class)
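The closed form can be spot-checked against a double Riemann sum of the pdf (an illustrative sketch; the helper names are my own):

```python
def F_closed(x, y):
    # joint cdf on [0,1]^2: 0.5 * (x^2*y + x*y^2)
    return 0.5 * (x * x * y + x * y * y)

def F_numeric(x, y, n=400):
    # midpoint double Riemann sum of f(u, v) = u + v over [0, x] x [0, y]
    hx, hy = x / n, y / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * hx
        for j in range(n):
            v = (j + 0.5) * hy
            total += (u + v) * hx * hy
    return total

print(F_closed(0.5, 0.5))             # 0.125
print(round(F_numeric(0.5, 0.5), 6))  # 0.125
```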

103


Remarks:

• If X = (X1, . . . , Xn)′ is a joint continuous random vector, then

∂ⁿFX1,...,Xn(x1, . . . , xn) / (∂x1 · · · ∂xn) = fX1,...,Xn(x1, . . . , xn)

• The volume under the joint pdf represents probabilities:

P(a1^L < X1 ≤ a1^U, . . . , an^L < Xn ≤ an^U) = ∫_{an^L}^{an^U} . . . ∫_{a1^L}^{a1^U} fX1,...,Xn(u1, . . . , un) du1 . . . dun

104


• In this course:

Emphasis on joint continuous random vectors

Analogous results for joint discrete random vectors (see Mood, Graybill, Boes (1974), Chapter IV)

Now:

• Determination of the distribution of a single random variable Xi from the joint distribution of the random vector (X1, . . . , Xn)′

−→ marginal distribution

105


Definition 3.5: (Marginal distribution)

Let X = (X1, . . . , Xn)′ be a continuous random vector with joint cdf FX1,...,Xn and joint pdf fX1,...,Xn. Then

FX1(x1) = FX1,...,Xn(x1,+∞,+∞, . . . ,+∞,+∞)

FX2(x2) = FX1,...,Xn(+∞, x2,+∞, . . . ,+∞,+∞)

. . .

FXn(xn) = FX1,...,Xn(+∞,+∞,+∞, . . . ,+∞, xn)

are called marginal cdfs while

106


fX1(x1) = ∫_{−∞}^{+∞} . . . ∫_{−∞}^{+∞} fX1,...,Xn(x1, x2, . . . , xn) dx2 . . . dxn

fX2(x2) = ∫_{−∞}^{+∞} . . . ∫_{−∞}^{+∞} fX1,...,Xn(x1, x2, . . . , xn) dx1 dx3 . . . dxn

· · ·

fXn(xn) = ∫_{−∞}^{+∞} . . . ∫_{−∞}^{+∞} fX1,...,Xn(x1, x2, . . . , xn) dx1 dx2 . . . dxn−1

are called marginal pdfs of the one-dimensional (univariate) random variables X1, . . . , Xn.

107


Example:

• Consider the bivariate pdf

fX,Y (x, y) = 40(x − 0.5)² y³ (3 − 2x − y) , for (x, y) ∈ [0,1] × [0,1]
             0 , elsewise

108


Bivariate pdf fX,Y (x, y)

(figure: 3D surface plot of the joint pdf over the unit square)

109


• The marginal pdf of X obtains as

fX(x) = ∫_{0}^{1} 40(x − 0.5)² y³ (3 − 2x − y) dy

= 40(x − 0.5)² ∫_{0}^{1} (3y³ − 2xy³ − y⁴) dy

= 40(x − 0.5)² [ (3/4)y⁴ − (2x/4)y⁴ − (1/5)y⁵ ]_{0}^{1}

= 40(x − 0.5)² (3/4 − 2x/4 − 1/5)

= −20x³ + 42x² − 27x + 5.5
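Integrating out y numerically (a sketch of the marginalization idea, with my own function names) agrees with the polynomial just derived:

```python
def joint_pdf(x, y):
    # f_{X,Y}(x, y) = 40 * (x - 0.5)^2 * y^3 * (3 - 2x - y) on [0,1]^2
    return 40 * (x - 0.5) ** 2 * y ** 3 * (3 - 2 * x - y)

def marginal_numeric(x, n=10_000):
    # integrate the irrelevant component y out over [0, 1]
    h = 1.0 / n
    return sum(joint_pdf(x, (j + 0.5) * h) * h for j in range(n))

def marginal_closed(x):
    return -20 * x ** 3 + 42 * x ** 2 - 27 * x + 5.5

for x in (0.0, 0.25, 0.5, 1.0):
    print(round(marginal_numeric(x), 4), round(marginal_closed(x), 4))  # columns agree
```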

110


Marginal pdf fX(x)

(figure: plot of fX(x) on [0,1])

111


• The marginal pdf of Y obtains as

fY (y) = ∫_{0}^{1} 40(x − 0.5)² y³ (3 − 2x − y) dx

= 40y³ ∫_{0}^{1} (x − 0.5)² (3 − 2x − y) dx

= −(10/3) y³ (y − 2)

112


Marginal pdf fY (y)

(figure: plot of fY (y) on [0,1])

113


Remarks:

• When considering the marginal instead of the joint distributions, we are faced with an information loss (the joint distribution uniquely determines all marginal distributions, but the converse does not hold in general)

• Besides the respective univariate marginal distributions, there are also multivariate distributions which can be obtained from the joint distribution of X = (X1, . . . , Xn)′

114


Example:

• For n = 5 consider X = (X1, . . . , X5)′ with joint pdf fX1,...,X5

• Then the marginal pdf of Z = (X1, X3, X5)′ obtains as

fX1,X3,X5(x1, x3, x5) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} fX1,...,X5(x1, x2, x3, x4, x5) dx2 dx4

(integrate out the irrelevant components)

115


3.2 Conditional Distribution and Stochastic Independence

Now:

• Distribution of a random variable X under the condition that another random variable Y has already taken on the realization y (conditional distribution of X given Y = y)

116


Definition 3.6: (Conditional distribution)

Let X = (X, Y )′ be a bivariate continuous random vector with joint pdf fX,Y (x, y). The conditional density of X given Y = y is defined to be

fX|Y =y(x) = fX,Y (x, y) / fY (y).

Analogously, the conditional density of Y given X = x is defined to be

fY |X=x(y) = fX,Y (x, y) / fX(x).

117


Remark:

• Conditional densities of random vectors are defined analogously, e.g.

fX1,X2,X4|X3=x3,X5=x5(x1, x2, x4) = fX1,X2,X3,X4,X5(x1, x2, x3, x4, x5) / fX3,X5(x3, x5)

118


Example:

• Consider the bivariate pdf

fX,Y (x, y) = 40(x − 0.5)² y³ (3 − 2x − y) , for (x, y) ∈ [0,1] × [0,1]
             0 , elsewise

with marginal pdf

fY (y) = −(10/3) y³ (y − 2)

(cf. Slides 108-112)

119


• It follows that

fX|Y =y(x) = fX,Y (x, y) / fY (y)

= [40(x − 0.5)² y³ (3 − 2x − y)] / [−(10/3) y³ (y − 2)]

= 12(x − 0.5)² (3 − 2x − y) / (2 − y)
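Since a conditional density is itself a density, it must integrate to 1 in x for every fixed y; a quick numerical check (my own sketch, not from the slides):

```python
def conditional_pdf(x, y):
    # f_{X|Y=y}(x) = 12 * (x - 0.5)^2 * (3 - 2x - y) / (2 - y) on 0 <= x, y <= 1
    return 12 * (x - 0.5) ** 2 * (3 - 2 * x - y) / (2 - y)

# midpoint Riemann sum over x in [0, 1] for several conditioning values y
n = 10_000
h = 1.0 / n
for y in (0.01, 0.5, 0.95):
    total = sum(conditional_pdf((i + 0.5) * h, y) * h for i in range(n))
    print(round(total, 6))  # 1.0 for each y
```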

120


Conditional pdf fX|Y =0.01(x) of X given Y = 0.01

(figure: plot of the conditional density on [0,1])

121


Conditional pdf fX|Y =0.95(x) of X given Y = 0.95

(figure: plot of the conditional density on [0,1])

122


Now:

• Combine the concepts 'joint distribution' and 'conditional distribution' to define the notion 'stochastic independence' (for two random variables first)

Definition 3.7: (Stochastic Independence [I])

Let (X, Y )′ be a bivariate continuous random vector with joint pdf fX,Y (x, y). X and Y are defined to be stochastically independent if and only if

fX,Y (x, y) = fX(x) · fY (y) for all x, y ∈ R.

123


Remarks:

• Alternatively, stochastic independence can be defined via the cdfs: X and Y are stochastically independent, if and only if

FX,Y (x, y) = FX(x) · FY (y) for all x, y ∈ R.

• If X and Y are independent, we have

fX|Y =y(x) = fX,Y (x, y) / fY (y) = [fX(x) · fY (y)] / fY (y) = fX(x)

fY |X=x(y) = fX,Y (x, y) / fX(x) = [fX(x) · fY (y)] / fX(x) = fY (y)

• If X and Y are independent and g and h are two continuous functions, then g(X) and h(Y ) are also independent
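For contrast, the earlier example fX,Y (x, y) = x + y on [0,1] × [0,1] is not independent: integrating the other variable out gives the marginals fX(x) = x + 0.5 and fY (y) = y + 0.5 (my own computation, not stated on the slides), and their product differs from the joint pdf. A sketch:

```python
def joint(x, y):
    # joint pdf x + y on the unit square
    return x + y

def marginal(t):
    # integral of (t + s) ds over [0, 1] = t + 0.5
    return t + 0.5

x, y = 0.25, 0.75
print(joint(x, y))                # 1.0
print(marginal(x) * marginal(y))  # 0.9375 -> joint != product, so not independent
```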

124


Now:

• Extension to n random variables

Definition 3.8: (Stochastic independence [II])

Let (X1, . . . , Xn)′ be a continuous random vector with joint pdf fX1,...,Xn(x1, . . . , xn) and joint cdf FX1,...,Xn(x1, . . . , xn). X1, . . . , Xn are defined to be stochastically independent if and only if for all (x1, . . . , xn)′ ∈ Rn

fX1,...,Xn(x1, . . . , xn) = fX1(x1) · . . . · fXn(xn)

or

FX1,...,Xn(x1, . . . , xn) = FX1(x1) · . . . · FXn(xn).

125


Remarks:

• For discrete random vectors we define: X1, . . . , Xn are stochastically independent if and only if for all (x1, . . . , xn)′ ∈ Rn

P (X1 = x1, . . . , Xn = xn) = P (X1 = x1) · . . . · P (Xn = xn)

or

FX1,...,Xn(x1, . . . , xn) = FX1(x1) · . . . · FXn(xn)

• In the case of independence, the joint distribution results from the marginal distributions

• If X1, . . . , Xn are stochastically independent and g1, . . . , gn are continuous functions, then Y1 = g1(X1), . . . , Yn = gn(Xn) are also stochastically independent
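The factorization in the discrete case can be checked numerically. A minimal sketch using numpy (the two uniform variables and the sample size are illustrative assumptions, not from the slides): for independent X and Y, the empirical joint pmf should be close to the outer product of the empirical marginals.

```python
import numpy as np

# Simulate two independent discrete random variables and check that
# P(X = x, Y = y) ≈ P(X = x) * P(Y = y) for all cells.
rng = np.random.default_rng(0)
n = 200_000
x = rng.integers(0, 3, size=n)   # X uniform on {0, 1, 2}
y = rng.integers(0, 2, size=n)   # Y uniform on {0, 1}, drawn independently

joint = np.zeros((3, 2))
for i in range(3):
    for j in range(2):
        joint[i, j] = np.mean((x == i) & (y == j))

px = joint.sum(axis=1)           # marginal pmf of X
py = joint.sum(axis=0)           # marginal pmf of Y
outer = np.outer(px, py)         # product of the marginals

print(np.max(np.abs(joint - outer)))  # close to 0 under independence
```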

126


3.3 Expectation and Joint Moment Generating Functions

Now:

• Definition of the expectation of a function

g : Rn −→ R, (x1, . . . , xn) 7−→ g(x1, . . . , xn)

of a continuous random vector X = (X1, . . . , Xn)′

127


Definition 3.9: (Expectation of a function)

Let (X1, . . . , Xn)′ be a continuous random vector with joint pdf fX1,...,Xn(x1, . . . , xn) and g : Rn −→ R a real-valued continuous function. The expectation of the function g of the random vector is defined to be

E[g(X1, . . . , Xn)] = ∫_{−∞}^{+∞} . . . ∫_{−∞}^{+∞} g(x1, . . . , xn) · fX1,...,Xn(x1, . . . , xn) dx1 . . . dxn.

128


Remarks:

• For a discrete random vector (X1, . . . , Xn)′ the analogous definition is

E[g(X1, . . . , Xn)] = Σ g(x1, . . . , xn) · P(X1 = x1, . . . , Xn = xn),

where the summation is over all realizations of the vector

• Definition 3.9 includes the expectation of a univariate random variable X: Set n = 1 and g(x) = x

−→ E(X1) ≡ E(X) = ∫_{−∞}^{+∞} x · fX(x) dx

• Definition 3.9 includes the variance of X: Set n = 1 and g(x) = [x − E(X)]²

−→ Var(X1) ≡ Var(X) = ∫_{−∞}^{+∞} [x − E(X)]² · fX(x) dx

129


• Definition 3.9 includes the covariance of two variables: Set n = 2 and g(x1, x2) = [x1 − E(X1)] · [x2 − E(X2)]

−→ Cov(X1, X2) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} [x1 − E(X1)][x2 − E(X2)] · fX1,X2(x1, x2) dx1 dx2

• Via the covariance we define the correlation coefficient:

Corr(X1, X2) = Cov(X1, X2) / [√Var(X1) · √Var(X2)]

• General properties of expected values, variances, covariances and the correlation coefficient −→ Class
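These definitions translate directly into sample analogues. A Monte Carlo sketch (not part of the slides; the particular dependent pair below is an illustrative assumption) estimating Cov and Corr from simulated data:

```python
import numpy as np

# Estimate Cov(X1, X2) and Corr(X1, X2) from simulated data and compare
# with the closed-form values: here Cov = 0.5 and Corr = 0.5 / sqrt(1.25).
rng = np.random.default_rng(1)
n = 500_000
x1 = rng.normal(0.0, 1.0, size=n)
x2 = 0.5 * x1 + rng.normal(0.0, 1.0, size=n)   # correlated with x1

cov = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))     # sample covariance
corr = cov / (np.sqrt(x1.var()) * np.sqrt(x2.var()))   # correlation coefficient

print(cov, corr)   # close to 0.5 and 0.447
```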

130


Now:

• Expectations and variances of random vectors

Definition 3.10: (Expected vector, covariance matrix)

Let X = (X1, . . . , Xn)′ be a random vector. The expected vector of X is defined to be

E(X) = (E(X1), . . . , E(Xn))′.

The covariance matrix of X is defined to be

Cov(X) =
[ Var(X1)      Cov(X1, X2)  . . .  Cov(X1, Xn) ]
[ Cov(X2, X1)  Var(X2)      . . .  Cov(X2, Xn) ]
[ . . .        . . .        . . .  . . .       ]
[ Cov(Xn, X1)  Cov(Xn, X2)  . . .  Var(Xn)     ].

131


Remark:

• Obviously, the covariance matrix is symmetric by definition

Now:

• Expected vectors and covariance matrices under linear transformations of random vectors

Let

• X = (X1, . . . , Xn)′ be an n-dimensional random vector

• A be an (m× n) matrix of real numbers

• b be an (m× 1) column vector of real numbers

132


Obviously:

• Y = AX + b is an (m × 1) random vector:

Y =
[ a11 a12 . . . a1n ]   [ X1 ]   [ b1 ]
[ a21 a22 . . . a2n ] · [ X2 ] + [ b2 ]
[ . . .             ]   [ ...]   [ ...]
[ am1 am2 . . . amn ]   [ Xn ]   [ bm ]

=
[ a11X1 + a12X2 + . . . + a1nXn + b1 ]
[ a21X1 + a22X2 + . . . + a2nXn + b2 ]
[ . . .                              ]
[ am1X1 + am2X2 + . . . + amnXn + bm ]

133


• The expected vector of Y is given by

E(Y) =
[ a11E(X1) + a12E(X2) + . . . + a1nE(Xn) + b1 ]
[ a21E(X1) + a22E(X2) + . . . + a2nE(Xn) + b2 ]
[ . . .                                       ]
[ am1E(X1) + am2E(X2) + . . . + amnE(Xn) + bm ]

= A E(X) + b

• The covariance matrix of Y is given by

Cov(Y) =
[ Var(Y1)      Cov(Y1, Y2)  . . .  Cov(Y1, Ym) ]
[ Cov(Y2, Y1)  Var(Y2)      . . .  Cov(Y2, Ym) ]
[ . . .        . . .        . . .  . . .       ]
[ Cov(Ym, Y1)  Cov(Ym, Y2)  . . .  Var(Ym)     ]

= A Cov(X) A′

(Proof: Class)

134


Remark:

• Cf. the analogous results for univariate variables:

E(a · X + b) = a · E(X) + b

Var(a · X + b) = a² · Var(X)

Up to now:

• Expected values for unconditional distributions

Now:

• Expected values for conditional distributions (cf. Definition 3.6, Slide 117)

135


Definition 3.11: (Conditional expected value of a function)

Let (X, Y)′ be a continuous random vector with joint pdf fX,Y(x, y) and let g : R2 −→ R be a real-valued function. The conditional expected value of the function g given X = x is defined to be

E[g(X, Y)|X = x] = ∫_{−∞}^{+∞} g(x, y) · fY|X(y) dy.

136


Remarks:

• An analogous definition applies to a discrete random vector (X, Y)′

• Definition 3.11 naturally extends to higher-dimensional distributions

• For g(x, y) = y we obtain the special case E[g(X, Y)|X = x] = E(Y|X = x)

• Note that E[g(X, Y )|X = x] is a function of x

137


Example:

• Consider the joint pdf

fX,Y(x, y) = x + y for (x, y) ∈ [0,1] × [0,1], and 0 elsewise

• The conditional distribution of Y given X = x is given by

fY|X=x(y) = (x + y)/(x + 0.5) for (x, y) ∈ [0,1] × [0,1], and 0 elsewise

• For g(x, y) = y the conditional expectation is given as

E(Y|X = x) = ∫₀¹ y · (x + y)/(x + 0.5) dy = [1/(x + 0.5)] · (x/2 + 1/3)

138


Remarks:

• Consider the function g(x, y) = g(y) (i.e. g does not depend on x)

• Denote h(x) = E[g(Y )|X = x]

• We calculate the unconditional expectation of the transformed variable h(X)

• We have

139

E{E[g(Y)|X = x]} = E[h(X)] = ∫_{−∞}^{+∞} h(x) · fX(x) dx

= ∫_{−∞}^{+∞} E[g(Y)|X = x] · fX(x) dx

= ∫_{−∞}^{+∞} [ ∫_{−∞}^{+∞} g(y) · fY|X(y) dy ] · fX(x) dx

= ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(y) · fY|X(y) · fX(x) dy dx

= ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(y) · fX,Y(x, y) dy dx

= E[g(Y)]

140


Theorem 3.12:

Let (X, Y)′ be an arbitrary discrete or continuous random vector. Then

E[g(Y)] = E{E[g(Y)|X = x]}

and, in particular,

E[Y] = E{E[Y|X = x]}.

Now:

• Three important rules for conditional and unconditional expected values

141


Theorem 3.13:

Let (X, Y)′ be an arbitrary discrete or continuous random vector and g1(·), g2(·) two unidimensional functions. Then

1. E[g1(Y ) + g2(Y )|X = x] = E[g1(Y )|X = x] + E[g2(Y )|X = x],

2. E[g1(Y ) · g2(X)|X = x] = g2(x) · E[g1(Y )|X = x].

3. If X and Y are stochastically independent we have

E[g1(X) · g2(Y )] = E[g1(X)] · E[g2(Y )].

142


Finally:

• Moment generating function for random vectors

Definition 3.14: (Joint moment generating function)

Let X = (X1, . . . , Xn)′ be an arbitrary discrete or continuous random vector. The joint moment generating function of X is defined to be

mX1,...,Xn(t1, . . . , tn) = E[e^{t1·X1 + . . . + tn·Xn}]

if this expectation exists for all t1, . . . , tn with −h < tj < h for an arbitrary value h > 0 and for all j = 1, . . . , n.

143


Remarks:

• Via the joint moment generating function mX1,...,Xn(t1, . . . , tn) we can derive the following mathematical objects:

the marginal moment generating functions mX1(t1), . . . , mXn(tn)

the moments of the marginal distributions

the so-called joint moments

144


Important result: (cf. Theorem 2.23, Slide 85)

For any given joint moment generating function mX1,...,Xn(t1, . . . , tn) there exists a unique joint cdf FX1,...,Xn(x1, . . . , xn)

145


3.4 The Multivariate Normal Distribution

Now:

• Extension of the univariate normal distribution

Definition 3.15: (Multivariate normal distribution)

Let X = (X1, . . . , Xn)′ be a continuous random vector. X is defined to have a multivariate normal distribution with parameters

µ = (µ1, . . . , µn)′ and Σ =
[ σ1²  · · ·  σ1n ]
[ . . .            ]
[ σn1  · · ·  σn² ],

if for x = (x1, . . . , xn)′ ∈ Rn its joint pdf is given by

fX(x) = (2π)^{−n/2} [det(Σ)]^{−1/2} · exp{ −(1/2) (x − µ)′ Σ^{−1} (x − µ) }.

146


Remarks:

• See Chiang (1984, p. 92) for a definition and the properties of the determinant det(A) of the matrix A

• Notation:

X ∼ N(µ,Σ)

• µ is a column vector with µ1, . . . , µn ∈ R

• Σ is a regular, positive definite, symmetric (n× n) matrix

• Role of the parameters:

E(X) = µ and Cov(X) = Σ

147


• Joint pdf of the multivariate standard normal distribution N(0, In):

φ(x) = (2π)^{−n/2} · exp{ −(1/2) x′x }

• Cf. the analogy to the univariate pdf in Definition 2.24, Slide 91

Properties of the N(µ,Σ) distribution:

• Partial vectors (marginal distributions) of X also have multivariate normal distributions, i.e. if

X = [ X1 ; X2 ] ∼ N( [ µ1 ; µ2 ], [ Σ11 Σ12 ; Σ21 Σ22 ] )

then

X1 ∼ N(µ1, Σ11) and X2 ∼ N(µ2, Σ22)

148


• Thus, all univariate variables of X = (X1, . . . , Xn)′ have univariate normal distributions:

X1 ∼ N(µ1, σ1²), X2 ∼ N(µ2, σ2²), . . . , Xn ∼ N(µn, σn²)

• The conditional distributions are also (univariately or multivariately) normal:

X1|X2 = x2 ∼ N( µ1 + Σ12 Σ22^{−1} (x2 − µ2), Σ11 − Σ12 Σ22^{−1} Σ21 )

• Linear transformations: Let A be an (m × n) matrix, b an (m × 1) vector of real numbers and X = (X1, . . . , Xn)′ ∼ N(µ, Σ). Then

AX + b ∼ N(Aµ + b, AΣA′)

149


Example:

• Consider

X ∼ N(µ, Σ) with µ = (0, 1)′ and Σ =
[ 1    0.5 ]
[ 0.5  2   ]

• Find the distribution of Y = AX + b where

A =
[ 1  2 ]
[ 3  4 ], b = (1, 2)′

• It follows that Y ∼ N(Aµ + b, AΣA′)

• In particular,

Aµ + b = (3, 6)′ and AΣA′ =
[ 11  24 ]
[ 24  53 ]

150


Now:

• Consider the bivariate case (n = 2), i.e.

X = (X, Y)′, E(X) = (µX, µY)′, Σ =
[ σX²  σXY ]
[ σYX  σY² ]

• We have

σXY = σYX = Cov(X, Y) = σX · σY · Corr(X, Y) = σX · σY · ρ

• The joint pdf follows from Definition 3.15 with n = 2

fX,Y(x, y) = 1/[2π σX σY √(1 − ρ²)] · exp{ −1/[2(1 − ρ²)] × [ (x − µX)²/σX² − 2ρ(x − µX)(y − µY)/(σX σY) + (y − µY)²/σY² ] }

(Derivation: Class)

151


fX,Y(x, y) for µX = µY = 0, σX = σY = 1 and ρ = 0

152

[Figure: surface plot of f(x, y) for x, y ∈ [−2, 2]; density values up to about 0.15]


fX,Y(x, y) for µX = µY = 0, σX = σY = 1 and ρ = 0.9

153

[Figure: surface plot of f(x, y) for x, y ∈ [−2, 2]; density values up to about 0.3]


Remarks:

• The marginal distributions are given by

X ∼ N(µX, σX²) and Y ∼ N(µY, σY²)

−→ interesting result for the normal distribution: If (X, Y)′ has a bivariate normal distribution, then X and Y are independent if and only if ρ = Corr(X, Y) = 0

• The conditional distributions are given by

X|Y = y ∼ N( µX + ρ (σX/σY) (y − µY), σX² (1 − ρ²) )

Y|X = x ∼ N( µY + ρ (σY/σX) (x − µX), σY² (1 − ρ²) )

(Proof: Class)

154


4. Distributions of Functions of Random Variables

Setup:

• Consider as given the joint distribution of X1, . . . , Xn

(i.e. consider as given fX1,...,Xn and FX1,...,Xn)

• Consider k functions

g1 : Rn −→ R, . . . , gk : Rn −→ R

• Find the joint distribution of the k random variables

Y1 = g1(X1, . . . , Xn), . . . , Yk = gk(X1, . . . , Xn)

(i.e. find fY1,...,Yk and FY1,...,Yk)

155


Example:

• Consider as given X1, . . . , Xn with fX1,...,Xn

• Consider the functions

g1(X1, . . . , Xn) = Σ_{i=1}^n Xi and g2(X1, . . . , Xn) = (1/n) Σ_{i=1}^n Xi

• Find fY1,Y2 with Y1 = Σ_{i=1}^n Xi and Y2 = (1/n) Σ_{i=1}^n Xi

Remark:

• From the joint distribution fY1,...,Yk we can derive the k marginal distributions fY1, . . . , fYk (cf. Chapter 3, Slides 106, 107)

156


Aim of this chapter:

• Techniques for finding the (marginal) distribution(s) of (Y1, . . . , Yk)

157


4.1 Expectations of Functions of Random Variables

Simplification:

• In a first step, we are not interested in the exact distributions, but merely in certain expected values of Y1, . . . , Yk

Expectation two ways:

• Consider as given the (continuous) random variables X1, . . . , Xn and the function g : Rn −→ R

• Consider the random variable Y = g(X1, . . . , Xn) and find the expectation E[g(X1, . . . , Xn)]

158


• Two ways of calculating E(Y):

E(Y) = ∫_{−∞}^{+∞} y · fY(y) dy

or

E(Y) = ∫_{−∞}^{+∞} . . . ∫_{−∞}^{+∞} g(x1, . . . , xn) · fX1,...,Xn(x1, . . . , xn) dx1 . . . dxn

(cf. Definition 3.9, Slide 128)

• It can be proved that both ways of calculating E(Y) are equivalent

−→ choose the most convenient calculation

159


Now:

• Calculation rules for expected values, variances, covariances of sums of random variables

Setting:

• X1, . . . , Xn are given continuous or discrete random variables with joint density fX1,...,Xn

• The (transforming) function g : Rn −→ R is given by

g(x1, . . . , xn) = Σ_{i=1}^n xi

160


• In a first step, find the expectation and the variance of

Y = g(X1, . . . , Xn) = Σ_{i=1}^n Xi

Theorem 4.1: (Expectation and variance of a sum)

For the given random variables X1, . . . , Xn we have

E(Σ_{i=1}^n Xi) = Σ_{i=1}^n E(Xi)

and

Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi) + 2 · Σ_{i=1}^n Σ_{j=i+1}^n Cov(Xi, Xj).

161


Implications:

• For given constants a1, . . . , an ∈ R we have

E(Σ_{i=1}^n ai · Xi) = Σ_{i=1}^n ai · E(Xi)

(why?)

• For two random variables X1 and X2 we have

E(X1 ± X2) = E(X1) ± E(X2)

• If X1, . . . , Xn are stochastically independent, it follows that Cov(Xi, Xj) = 0 for all i ≠ j and hence

Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi)

162


Now:

• Calculating the covariance of two sums of random variables

Theorem 4.2: (Covariance of two sums)

Let X1, . . . , Xn and Y1, . . . , Ym be two sets of random variables and let a1, . . . , an and b1, . . . , bm be two sets of constants. Then

Cov(Σ_{i=1}^n ai · Xi, Σ_{j=1}^m bj · Yj) = Σ_{i=1}^n Σ_{j=1}^m ai · bj · Cov(Xi, Yj).

163


Implications:

• The variance of a weighted sum of random variables is given by

Var(Σ_{i=1}^n ai · Xi) = Cov(Σ_{i=1}^n ai · Xi, Σ_{j=1}^n aj · Xj)

= Σ_{i=1}^n Σ_{j=1}^n ai · aj · Cov(Xi, Xj)

= Σ_{i=1}^n ai² · Var(Xi) + Σ_{i=1}^n Σ_{j=1, j≠i}^n ai · aj · Cov(Xi, Xj)

= Σ_{i=1}^n ai² · Var(Xi) + 2 · Σ_{i=1}^n Σ_{j=i+1}^n ai · aj · Cov(Xi, Xj)

164


• For two random variables X1 and X2 we have

Var(X1 ±X2) = Var(X1) + Var(X2)± 2 ·Cov(X1, X2),

and if X1 and X2 are independent we have

Var(X1 ±X2) = Var(X1) + Var(X2)

Finally:

• Important result concerning the expectation of a product of two random variables

165


Setting:

• Let X1, X2 be both continuous or both discrete random variables with joint density fX1,X2

• Let g : R2 −→ R be defined as g(x1, x2) = x1 · x2

• Find the expectation of

Y = g(X1, X2) = X1 · X2

Theorem 4.3: (Expectation of a product)

For the random variables X1, X2 we have

E (X1 ·X2) = E(X1) · E(X2) + Cov(X1, X2).
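Theorem 4.3 also holds exactly for the sample analogues (with 1/n-normalized moments), which makes a neat numerical check. A sketch with an arbitrarily chosen dependent pair (an illustrative assumption):

```python
import numpy as np

# E(X1 * X2) = E(X1) E(X2) + Cov(X1, X2); here E(X1) = 1, E(X2) = 2.5
# and Cov(X1, X2) = 0.5, so the theoretical value is 3.
rng = np.random.default_rng(6)
n = 1_000_000
x1 = rng.normal(1.0, 1.0, size=n)
x2 = 0.5 * x1 + rng.normal(2.0, 1.0, size=n)    # dependent on x1

lhs = np.mean(x1 * x2)
rhs = x1.mean() * x2.mean() + np.mean((x1 - x1.mean()) * (x2 - x2.mean()))
print(lhs, rhs)   # identical up to rounding; both close to 3
```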

166


Implication:

• If X1 and X2 are stochastically independent, we have

E (X1 ·X2) = E(X1) · E(X2)

Remarks:

• A formula for Var(X1 ·X2) also exists

• In many cases, there are no explicit formulas for expected values and variances of other transformations (e.g. for ratios of random variables)

167


4.2 The Cumulative-distribution-function Technique

Motivation:

• Consider as given the random variables X1, . . . , Xn with joint density fX1,...,Xn

• Find the joint distribution of Y1, . . . , Yk where Yj = gj(X1, . . . , Xn) for j = 1, . . . , k

• The joint cdf of Y1, . . . , Yk is defined to be

FY1,...,Yk(y1, . . . , yk) = P (Y1 ≤ y1, . . . , Yk ≤ yk)

(cf. Definition 3.2, Slide 98)

168


• Now, for each y1, . . . , yk the event

{Y1 ≤ y1, . . . , Yk ≤ yk} = {g1(X1, . . . , Xn) ≤ y1, . . . , gk(X1, . . . , Xn) ≤ yk},

i.e. the latter event is an event described in terms of the given functions g1, . . . , gk and the given random variables X1, . . . , Xn

−→ since the joint distribution of X1, . . . , Xn is assumed given, presumably the probability of the latter event can be calculated and consequently FY1,...,Yk determined

169


Example 1:

• Consider n = 1 (i.e. consider X1 ≡ X with cdf FX) and k = 1 (i.e. g1 ≡ g and Y1 ≡ Y)

• Consider the function

g(x) = a · x + b, b ∈ R, a > 0

• Find the distribution of

Y = g(X) = a ·X + b

170


• The cdf of Y is given by

FY(y) = P(Y ≤ y) = P[g(X) ≤ y] = P(a · X + b ≤ y) = P(X ≤ (y − b)/a) = FX((y − b)/a)

• If X is continuous, the pdf of Y is given by

fY(y) = F′Y(y) = (1/a) · fX((y − b)/a)

(cf. Slide 48)

171


Example 2:

• Consider n = 1 and k = 1 and the function

g(x) = e^x

• The cdf of Y = g(X) = e^X is given by

FY(y) = P(Y ≤ y) = P(e^X ≤ y) = P[X ≤ ln(y)] = FX[ln(y)]

• If X is continuous, the pdf of Y is given by

fY(y) = F′Y(y) = fX[ln(y)] / y

172


Now:

• Consider n = 2 and k = 2, i.e. consider X1 and X2 with joint density fX1,X2(x1, x2)

• Consider the functions

g1(x1, x2) = x1 + x2 and g2(x1, x2) = x1 − x2

• Find the distributions of the sum and the difference of two random variables

• Derivation via the two-dimensional cdf-technique

173


Theorem 4.4: (Distribution of a sum / difference)

Let X1 and X2 be two continuous random variables with joint pdf fX1,X2(x1, x2). Then the pdfs of Y1 = X1 + X2 and Y2 = X1 − X2 are given by

fY1(y1) = ∫_{−∞}^{+∞} fX1,X2(x1, y1 − x1) dx1 = ∫_{−∞}^{+∞} fX1,X2(y1 − x2, x2) dx2

and

fY2(y2) = ∫_{−∞}^{+∞} fX1,X2(x1, x1 − y2) dx1 = ∫_{−∞}^{+∞} fX1,X2(y2 + x2, x2) dx2.

174


Implication:

• If X1 and X2 are independent, then

fY1(y1) = ∫_{−∞}^{+∞} fX1(x1) · fX2(y1 − x1) dx1

fY2(y2) = ∫_{−∞}^{+∞} fX1(x1) · fX2(x1 − y2) dx1

Example:

• Let X1 and X2 be independent random variables both with pdf

fX1(x) = fX2(x) = 1 for x ∈ [0,1], and 0 elsewise

• Find the pdf of Y = X1 + X2 (Class)

175


Now:

• Analogous results for the product and the ratio of two random variables

Theorem 4.5: (Distribution of a product / ratio)

Let X1 and X2 be continuous random variables with joint pdf fX1,X2(x1, x2). Then the pdfs of Y1 = X1 · X2 and Y2 = X1/X2 are given by

fY1(y1) = ∫_{−∞}^{+∞} (1/|x1|) · fX1,X2(x1, y1/x1) dx1

and

fY2(y2) = ∫_{−∞}^{+∞} |x2| · fX1,X2(y2 · x2, x2) dx2.

176


4.3 The Moment-generating-function Technique

Motivation:

• Consider as given the random variables X1, . . . , Xn with joint pdf fX1,...,Xn

• Again, find the joint distribution of Y1, . . . , Yk where Yj = gj(X1, . . . , Xn) for j = 1, . . . , k

177


• According to Definition 3.14, Slide 143, the joint moment generating function of the Y1, . . . , Yk is defined to be

mY1,...,Yk(t1, . . . , tk) = E[e^{t1·Y1 + . . . + tk·Yk}]

= ∫_{−∞}^{+∞} . . . ∫_{−∞}^{+∞} e^{t1·g1(x1,...,xn) + . . . + tk·gk(x1,...,xn)} × fX1,...,Xn(x1, . . . , xn) dx1 . . . dxn

• If mY1,...,Yk(t1, . . . , tk) can be recognized as the joint moment generating function of some known joint distribution, it will follow that Y1, . . . , Yk has that joint distribution by virtue of the identification property (cf. Slide 145)

178


Example:

• Consider n = 1 and k = 1 where the random variable X1 ≡ X has a standard normal distribution

• Consider the function g1(x) ≡ g(x) = x²

• Find the distribution of Y = g(X) = X²

• The moment generating function of Y is given by

mY(t) = E[e^{t·Y}] = E[e^{t·X²}] = ∫_{−∞}^{+∞} e^{t·x²} · fX(x) dx

179


= ∫_{−∞}^{+∞} e^{t·x²} · (1/√(2π)) · e^{−x²/2} dx

= . . .

= [ (1/2) / (1/2 − t) ]^{1/2} for t < 1/2

• This is the moment generating function of a gamma distribution with parameters λ = 1/2 and r = 1/2 (see Mood, Graybill, Boes (1974), pp. 540/541)

−→ Y = X² ∼ Γ(0.5, 0.5)

180


Now:

• Distribution of sums of independent random variables

Preliminaries:

• Consider the moment generating function of such a sum

• Let X1, . . . , Xn be independent random variables and let Y = Σ_{i=1}^n Xi

• The moment generating function of Y is given by

mY(t) = E[e^{t·Y}] = E[e^{t·Σ_{i=1}^n Xi}] = E[e^{t·X1} · e^{t·X2} · . . . · e^{t·Xn}]

= E[e^{t·X1}] · E[e^{t·X2}] · . . . · E[e^{t·Xn}]   [Theorem 3.13(c)]

= mX1(t) · mX2(t) · . . . · mXn(t)

181


Theorem 4.6: (Moment generating function of a sum)

Let X1, . . . , Xn be stochastically independent random variables with existing moment generating functions mX1(t), . . . , mXn(t) for all t ∈ (−h, h), h > 0. Then the moment generating function of the sum Y = Σ_{i=1}^n Xi is given by

mY(t) = Π_{i=1}^n mXi(t) for t ∈ (−h, h).

Hopefully:

• The distribution of the sum Y = Σ_{i=1}^n Xi may be identified from the moment generating function of the sum mY(t)

182


Example 1:

• Assume that X1, . . . , Xn are independent and identically distributed exponential random variables with parameter λ > 0

• The moment generating function of each Xi (i = 1, . . . , n) is given by

mXi(t) = λ/(λ − t) for t < λ

(cf. Mood, Graybill, Boes (1974), pp. 540/541)

• So the moment generating function of the sum Y = Σ_{i=1}^n Xi is given by

mY(t) = Π_{i=1}^n mXi(t) = (λ/(λ − t))^n

183


• This is the moment generating function of a Γ(n, λ) distribution (cf. Mood, Graybill, Boes (1974), pp. 540/541)

−→ the sum of n independent, identically distributed exponential random variables with parameter λ has a Γ(n, λ) distribution

184


Example 2:

• Assume that X1, . . . , Xn are independent random variables and that Xi ∼ N(µi, σi²)

• Furthermore, let a1, . . . , an ∈ R be constants

• Then the distribution of the weighted sum is given by

Y = Σ_{i=1}^n ai · Xi ∼ N( Σ_{i=1}^n ai · µi, Σ_{i=1}^n ai² · σi² )

(Proof: Class)

185


4.4 General Transformations

Up to now:

• Techniques that allow us, under special circumstances, to find the distributions of the transformed variables

Y1 = g1(X1, . . . , Xn), . . . , Yk = gk(X1, . . . , Xn)

However:

• These methods do not necessarily hit the mark (e.g. if calculations get too complicated)

186

Page 190: Slides Advanced Statistics - uni-muenster.de · Slides Advanced Statistics Winter Term 2014/2015 (October 13, 2014 – November 24, 2014) ... Chiang, A. (1984). Fundamental Methods

Resort:

• There are constructive methods by which it is generally pos-sible (under rather mild conditions) to find the distributionsof transformed random variables−→ transformation theorems

Here:

• We restrict attention to the simplest case where n = 1, k = 1,i.e. we consider the transformation Y = g(X)

• For multivariate extensions (i.e. for n ≥ 1, k ≥ 1) see Mood,Graybill, Boes (1974), pp. 203-212

187

Page 191: Slides Advanced Statistics - uni-muenster.de · Slides Advanced Statistics Winter Term 2014/2015 (October 13, 2014 – November 24, 2014) ... Chiang, A. (1984). Fundamental Methods

Theorem 4.7: (Transformation theorem for densities)

Suppose X is a continuous random variable with pdf fX(x). Set D = {x : fX(x) > 0}. Furthermore, assume that

(a) the transformation g : D −→ W with y = g(x) is a one-to-one transformation of D onto W,

(b) the derivative with respect to y of the inverse function g⁻¹ : W −→ D with x = g⁻¹(y) is continuous and nonzero for all y ∈ W.

Then Y = g(X) is a continuous random variable with pdf

fY(y) = |dg⁻¹(y)/dy| · fX(g⁻¹(y)) for y ∈ W, and fY(y) = 0 elsewise.

Remark:

• The transformation g : D −→ W with y = g(x) is called one-to-one if for every y ∈ W there exists exactly one x ∈ D with y = g(x)

Example:

• Suppose X has the pdf

fX(x) = θ · x^(−θ−1) for x ∈ [1, +∞), and fX(x) = 0 elsewise

(Pareto distribution with parameter θ > 0)

• Find the distribution of Y = ln(X)

• We have D = [1, +∞), g(x) = ln(x), W = [0, +∞)

• Furthermore, g(x) = ln(x) is a one-to-one transformation of D = [1, +∞) onto W = [0, +∞) with inverse function

x = g⁻¹(y) = e^y

• Its derivative with respect to y is given by

dg⁻¹(y)/dy = e^y,

i.e. the derivative is continuous and nonzero for all y ∈ [0, +∞)

• Hence, the pdf of Y = ln(X) is given by

fY(y) = e^y · θ · (e^y)^(−θ−1) = θ · e^(−θ·y) for y ∈ [0, +∞), and fY(y) = 0 elsewise
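The result Y = ln(X) ∼ Exp(θ) can be checked by simulation. Pareto draws are obtained by inverse-transform sampling (if U ∼ Uniform(0, 1), then U^(−1/θ) has the Pareto pdf above), and the log-transformed draws should then have mean 1/θ (a sketch; θ = 3 is an illustrative value):

```python
import math
import random

random.seed(2)

theta = 3.0      # Pareto parameter (illustrative)
reps = 100_000

# Inverse-transform sampling: if U ~ Uniform(0,1), then X = U**(-1/theta)
# has the Pareto pdf f(x) = theta * x**(-theta - 1) on [1, +inf).
x = [random.random() ** (-1.0 / theta) for _ in range(reps)]

# Theorem 4.7 predicts Y = ln(X) ~ Exp(theta), hence E(Y) = 1/theta.
y = [math.log(xi) for xi in x]
mean_y = sum(y) / reps
```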

5. Methods of Estimation

Setting:

• Let X be a random variable (or let X be a random vector) representing a random experiment

• We are interested in the actual distribution of X (or of the vector X)

Notice:

• In practice the actual distribution of X is a priori unknown

Therefore:

• Collect information on the unknown distribution by repeatedly observing the random experiment (and thus the random variable X)

−→ random sample
−→ statistic
−→ estimator

5.1 Sampling, Estimators, Limit Theorems

Setting:

• Let X represent the random experiment under consideration (X is a univariate random variable)

• We intend to observe the random experiment (i.e. X) n times

• Prior to the explicit realizations we may consider the potential observations as a set of n random variables X1, . . . , Xn

Definition 5.1: (Random sample)

The random variables X1, . . . , Xn are defined to be a random sample from X if

(a) each Xi, i = 1, . . . , n, has the same distribution as X,

(b) X1, . . . , Xn are stochastically independent.

The number n is called the sample size.

Remarks:

• We assume that, in principle, the random experiment can be repeated as often as desired

• We call the realizations x1, . . . , xn of the random sample X1, . . . , Xn the observed or the concrete sample

• Considering the random sample X1, . . . , Xn as a random vector, we see that its joint density is given by

fX1,...,Xn(x1, . . . , xn) = ∏_{i=1}^n fXi(xi)

(since the Xi's are independent; cf. Definition 3.8, Slide 125)

Model of a random sample

[Diagram: the random process X generates the sampling variables X1, . . . , Xn (random variables), whose potential realizations x1, . . . , xn are the outcomes of the 1st, 2nd, . . . , n-th experiment.]

Now:

• Consider functions of the sampling variables X1, . . . , Xn

−→ statistic
−→ estimator

Definition 5.2: (Statistic)

Let X1, . . . , Xn be a random sample from X and let g : Rⁿ −→ R be a real-valued function with n arguments that does not contain any unknown parameters. Then the random variable

T = g(X1, . . . , Xn)

is called a statistic.

Examples:

• Sample mean:

X̄ = g1(X1, . . . , Xn) = (1/n) · ∑_{i=1}^n Xi

• Sample variance:

S² = g2(X1, . . . , Xn) = (1/n) · ∑_{i=1}^n (Xi − X̄)²

• Sample standard deviation:

S = g3(X1, . . . , Xn) = √[ (1/n) · ∑_{i=1}^n (Xi − X̄)² ]
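For a concrete sample x1, . . . , xn, the three statistics above are computed directly from their defining formulas (a sketch with made-up data):

```python
import math

# Concrete sample x1, ..., xn (made-up observations)
x = [2.1, 3.4, 1.9, 4.0, 2.6]
n = len(x)

xbar = sum(x) / n                               # sample mean
s2 = sum((xi - xbar) ** 2 for xi in x) / n      # sample variance (1/n version, as on the slide)
s = math.sqrt(s2)                               # sample standard deviation
```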

Remarks:

• All these concepts can be extended to the multivariate case

• The statistic T = g(X1, . . . , Xn) is a function of random variables and hence it is itself a random variable
−→ a statistic has a distribution (and, in particular, an expectation and a variance)

Purposes of statistics:

• Statistics provide information on the distribution of X

• Statistics are central tools for
estimating parameters
hypothesis-testing on parameters

Random samples and statistics

[Diagram: measurement turns the random sample (X1, . . . , Xn) into the sample realization (x1, . . . , xn); correspondingly, the statistic g(X1, . . . , Xn) takes the realization g(x1, . . . , xn).]

Now:

• Let X be a random variable with unknown cdf FX(x)

• We may be interested in one or several unknown parameters of X

• Let θ denote this unknown vector of parameters, e.g.

θ = [E(X), Var(X)]′

• Frequently, the distribution family of X is known, e.g. X ∼ N(µ, σ²), but we do not know the specific parameters. Then

θ = [µ, σ²]′

• We will estimate the unknown parameter vector on the basis of statistics from a random sample X1, . . . , Xn

Definition 5.3: (Estimator, estimate)

The statistic θ̂(X1, . . . , Xn) is called estimator (or point estimator) of the unknown parameter vector θ. After having observed the concrete sample x1, . . . , xn, we call the realization of the estimator θ̂(x1, . . . , xn) an estimate.

Remarks:

• The estimator θ̂(X1, . . . , Xn) is a random variable or a random vector
−→ an estimator has a (joint) distribution, an expected value (or vector) and a variance (or a covariance matrix)

• The estimate θ̂(x1, . . . , xn) is a number (or a vector of numbers)

Example:

• Let X ∼ N(µ, σ²) with unknown parameters µ and σ²

• The vector of parameters to be estimated is given by

θ = [µ, σ²]′ = [E(X), Var(X)]′

• Potential estimators of µ and σ² are

µ̂ = (1/n) ∑_{i=1}^n Xi and σ̂² = 1/(n − 1) · ∑_{i=1}^n (Xi − µ̂)²

−→ an estimator of θ is given by

θ̂ = [µ̂, σ̂²]′ = [ (1/n) ∑_{i=1}^n Xi , 1/(n − 1) · ∑_{i=1}^n (Xi − µ̂)² ]′

Question:

• Why do we need this seemingly complicated concept of an estimator in the form of a random variable?

Answer:

• To establish a comparison between alternative estimators of the parameter vector θ

Example:

• Let θ = Var(X) denote the unknown variance of X

• Two alternative estimators of θ are

θ̂1(X1, . . . , Xn) = (1/n) ∑_{i=1}^n (Xi − X̄)²

θ̂2(X1, . . . , Xn) = 1/(n − 1) · ∑_{i=1}^n (Xi − X̄)²

Question:

• Which estimator is better and for what reasons?
−→ properties (goodness criteria) of point estimators (see Section 5.2)
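A Monte Carlo experiment makes the difference between θ̂1 and θ̂2 visible: averaging each estimator over many samples approximates its expectation, and only the 1/(n − 1) version is centered at σ² (a sketch; X ∼ N(0, 4) and n = 5 are illustrative choices):

```python
import random

random.seed(3)

sigma2 = 4.0     # true variance (illustrative); X ~ N(0, 4)
n = 5
reps = 200_000

sum_t1 = sum_t2 = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((xi - xbar) ** 2 for xi in xs)
    sum_t1 += ss / n          # theta1-hat: divisor n
    sum_t2 += ss / (n - 1)    # theta2-hat: divisor n - 1

avg_t1 = sum_t1 / reps   # approximates E(theta1-hat) = (n-1)/n * sigma2
avg_t2 = sum_t2 / reps   # approximates E(theta2-hat) = sigma2
```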

Notice:

• Some of these criteria qualify estimators in terms of their properties when the sample size becomes large (n → ∞, large-sample properties)

Therefore:

• Explanation of the concept of stochastic convergence:

Central-limit theorem
Weak law of large numbers
Convergence in probability
Convergence in distribution

Theorem 5.4: (Univariate central-limit theorem)

Let X be any arbitrary random variable with E(X) = µ and Var(X) = σ². Let X1, . . . , Xn be a random sample from X and let

X̄n = (1/n) ∑_{i=1}^n Xi

denote the arithmetic sample mean. Then, for n → ∞, we have

X̄n ∼ N(µ, σ²/n) and √n · (X̄n − µ)/σ ∼ N(0, 1).

Next:

• Generalization to the multivariate case
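The theorem can be illustrated even for a strongly skewed X: for Exp(1) (so µ = σ = 1) the standardized sample mean √n · (X̄n − µ)/σ is already close to N(0, 1) for moderate n (a simulation sketch; n = 50 is an illustrative choice):

```python
import random

random.seed(4)

# X ~ Exp(1): mu = 1 and sigma = 1, a clearly non-normal distribution.
n = 50
reps = 50_000

z = []
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z.append(n ** 0.5 * (xbar - 1.0) / 1.0)   # sqrt(n) * (Xbar_n - mu) / sigma

mean_z = sum(z) / reps
var_z = sum((zi - mean_z) ** 2 for zi in z) / reps
# mean_z and var_z should be close to 0 and 1, as for a N(0,1) variable
```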

Theorem 5.5: (Multivariate central-limit theorem)

Let X = (X1, . . . , Xm)′ be any arbitrary random vector with E(X) = µ and Cov(X) = Σ. Let X1, . . . , Xn be a (multivariate) random sample from X and let

X̄n = (1/n) ∑_{i=1}^n Xi

denote the multivariate arithmetic sample mean. Then, for n → ∞, we have

X̄n ∼ N(µ, (1/n) Σ) and √n · (X̄n − µ) ∼ N(0, Σ).

Remarks:

• A multivariate random sample from the random vector X arises naturally by replacing all univariate random variables in Definition 5.1 (Slide 194) by corresponding multivariate random vectors

• Note the formal analogy to the univariate case in Theorem 5.4 (be aware of matrix-calculus rules!)

Next:

• Famous theorem on the arithmetic sample mean

Theorem 5.6: (Weak law of large numbers)

Let X1, X2, . . . be a sequence of independent and identically distributed random variables with

E(Xi) = µ < ∞ and Var(Xi) = σ² < ∞.

Consider the random variable

X̄n = (1/n) ∑_{i=1}^n Xi

(arithmetic sample mean). Then, for any ε > 0 we have

lim_{n→∞} P(|X̄n − µ| ≥ ε) = 0.
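The statement lim P(|X̄n − µ| ≥ ε) = 0 can be approximated by simulation: for Uniform(0, 1) draws (µ = 0.5) the deviation probability for a fixed ε shrinks as n grows (a sketch; ε = 0.05 is an illustrative choice):

```python
import random

random.seed(5)

mu = 0.5       # mean of Uniform(0, 1)
eps = 0.05     # illustrative choice of epsilon
reps = 2_000

def prob_deviation(n):
    """Monte Carlo estimate of P(|Xbar_n - mu| >= eps)."""
    hits = 0
    for _ in range(reps):
        xbar = sum(random.random() for _ in range(n)) / n
        if abs(xbar - mu) >= eps:
            hits += 1
    return hits / reps

p_10 = prob_deviation(10)      # sizeable deviation probability
p_1000 = prob_deviation(1000)  # essentially zero
```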

Remarks:

• Theorem 5.6 is known as the weak law of large numbers

• Irrespective of how small we choose ε > 0, the probability that X̄n deviates more than ±ε from its expectation µ tends to zero when the sample size increases

• Notice the analogy between a sequence of independent and identically distributed random variables and the definition of a random sample from X on Slide 194

Next:

• The first important concept of limiting behaviour

Definition 5.7: (Convergence in probability)

Let Y1, Y2, . . . be a sequence of random variables. We say that the sequence Y1, Y2, . . . converges in probability to θ if for any ε > 0 we have

lim_{n→∞} P(|Yn − θ| ≥ ε) = 0.

We denote convergence in probability by

plim Yn = θ or Yn →p θ.

Remarks:

• Specific case: weak law of large numbers

plim X̄n = µ or X̄n →p µ

• Typically (but not necessarily) a sequence of random variables converges in probability to a constant θ ∈ R

• For multivariate sequences of random vectors Y1, Y2, . . . Definition 5.7 has to be applied to the respective corresponding elements

• The concept of convergence in probability is important for qualifying estimators

Next:

• Alternative concepts of stochastic convergence

Definition 5.8: (Convergence in distribution)

Let Y1, Y2, . . . be a sequence of random variables and let Z also be a random variable. We say that the sequence Y1, Y2, . . . converges in distribution to the distribution of Z if

lim_{n→∞} FYn(y) = FZ(y) for every y ∈ R at which FZ is continuous.

We denote convergence in distribution by

Yn →d Z.

Remarks:

• Specific case: central-limit theorem

Yn = √n · (X̄n − µ)/σ →d U ∼ N(0, 1)

• In the case of convergence in distribution, the sequence of rv's always converges to a limiting random variable

Theorem 5.9: (Rules for probability limits)

Let X1, X2, . . . and Y1, Y2, . . . be sequences of random variables with plim Xn = a and plim Yn = b. Then

(a) plim (Xn ± Yn) = a ± b,

(b) plim (Xn · Yn) = a · b,

(c) plim (Xn / Yn) = a/b (for b ≠ 0),

(d) (Slutsky theorem) If g : R −→ R is a function continuous at a ∈ R, then

plim g(Xn) = g(a).

Remark:

• There is a property similar to Slutsky's theorem that holds for the convergence in distribution

Theorem 5.10: (Rule for limiting distributions)

Let X1, X2, . . . be a sequence of random variables and let Z be a random variable such that Xn →d Z. If h : R −→ R is a continuous function, then

h(Xn) →d h(Z).

Next:

• Connection of both convergence concepts

Theorem 5.11: (Cramér theorem)

Let X1, X2, . . . and Y1, Y2, . . . be sequences of random variables, let Z be a random variable and a ∈ R a constant. Assume that plim Xn = a and Yn →d Z. Then

(a) Xn + Yn →d a + Z,

(b) Xn · Yn →d a · Z.

Example:

• Let X1, . . . , Xn be a random sample from X with E(X) = µ and Var(X) = σ²

• It can be shown that

plim S*n² = plim 1/(n − 1) · ∑_{i=1}^n (Xi − X̄n)² = σ²

plim Sn² = plim (1/n) ∑_{i=1}^n (Xi − X̄n)² = σ²

• For g1(x) = x/σ², Slutsky's theorem yields

plim g1(S*n²) = plim S*n²/σ² = g1(σ²) = 1

plim g1(Sn²) = plim Sn²/σ² = g1(σ²) = 1

• For g2(x) = σ/√x, Slutsky's theorem yields

plim g2(S*n²) = plim σ/S*n = g2(σ²) = 1

plim g2(Sn²) = plim σ/Sn = g2(σ²) = 1

• From the central-limit theorem we know that

√n · (X̄n − µ)/σ →d U ∼ N(0, 1)

• Now, Cramér's theorem yields

g2(S*n²) · √n · (X̄n − µ)/σ = (σ/S*n) · √n · (X̄n − µ)/σ = √n · (X̄n − µ)/S*n →d 1 · U = U ∼ N(0, 1)

• Analogously, Cramér's theorem yields

√n · (X̄n − µ)/Sn →d U ∼ N(0, 1)
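The practical value of this last result is that σ may be replaced by its estimate: the studentized mean √n · (X̄n − µ)/S*n is approximately N(0, 1) even when X is not normal, which a simulation can illustrate (a sketch; X ∼ Exp(2), so µ = 0.5, with σ treated as unknown):

```python
import random

random.seed(6)

lam = 2.0      # X ~ Exp(2), so mu = 0.5 (sigma is treated as unknown)
mu = 0.5
n = 100
reps = 20_000

t = []
for _ in range(reps):
    xs = [random.expovariate(lam) for _ in range(n)]
    xbar = sum(xs) / n
    s2_star = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # S*_n^2
    t.append(n ** 0.5 * (xbar - mu) / s2_star ** 0.5)

mean_t = sum(t) / reps
var_t = sum((ti - mean_t) ** 2 for ti in t) / reps
# mean_t and var_t should be roughly 0 and 1, as predicted by the limiting N(0,1)
```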

5.2 Properties of Estimators

Content of Definition 5.3 (Slide 202):

• An estimator is defined to be a statistic (a function of the random sample)
−→ there are several alternative estimators of the unknown parameter vector θ

Example:

• Assume that X ∼ N(0, σ²) with unknown variance σ² and let X1, . . . , Xn be a random sample from X

• Alternative estimators of θ = σ² are

θ̂1 = (1/n) ∑_{i=1}^n (Xi − X̄)² and θ̂2 = 1/(n − 1) · ∑_{i=1}^n (Xi − X̄)²

Important questions:

• Are there reasonable criteria according to which we can select a 'good' estimator?

• How can we construct 'good' estimators?

First goodness property of point estimators:

• Concept of repeated sampling:
Draw several random samples from X
Consider the estimator for each random sample
An 'average' of the estimates should be 'close' to the unknown parameter (no systematic bias)

−→ unbiasedness of an estimator

Definition 5.12: (Unbiasedness, bias)

An estimator θ̂(X1, . . . , Xn) of the unknown parameter θ is defined to be an unbiased estimator if its expectation coincides with the parameter to be estimated, i.e. if

E[θ̂(X1, . . . , Xn)] = θ.

The bias of the estimator is defined as

Bias(θ̂) = E(θ̂) − θ.

Remarks:

• Definition 5.12 easily generalizes to the multivariate case

• The bias of an unbiased estimator is equal to zero

Now:

• Important and very general result

Theorem 5.13: (Unbiased estimators of E(X) and Var(X))

Let X1, . . . , Xn be a random sample from X where X may be arbitrarily distributed with unknown expectation µ = E(X) and unknown variance σ² = Var(X). Then the estimators

µ̂(X1, . . . , Xn) = X̄ = (1/n) · ∑_{i=1}^n Xi

and

σ̂²(X1, . . . , Xn) = S² = 1/(n − 1) · ∑_{i=1}^n (Xi − X̄)²

are always unbiased estimators of the parameters µ = E(X) and σ² = Var(X), respectively.

Remarks:

• Proof: Class

• Note that no explicit distribution of X is required

• Unbiasedness does, in general, not carry over to parameter transformations. For example,

S = √S² is not an unbiased estimator of σ = SD(X) = √Var(X)

Question:

• How can we compare two alternative unbiased estimators of the parameter θ?

Definition 5.14: (Relative efficiency)

Let θ̂1 and θ̂2 be two unbiased estimators of the unknown parameter θ. θ̂1 is defined to be relatively more efficient than θ̂2 if

Var(θ̂1) ≤ Var(θ̂2)

for all possible parameter values of θ and

Var(θ̂1) < Var(θ̂2)

for at least one possible parameter value of θ.

Example:

• Assume θ = E(X)

• Consider the estimators

θ̂1(X1, . . . , Xn) = (1/n) ∑_{i=1}^n Xi

θ̂2(X1, . . . , Xn) = X1/2 + 1/(2(n − 1)) · ∑_{i=2}^n Xi

• Which estimator is relatively more efficient? (Class)

Question:

• How can we compare two estimators if (at least) one estimator is biased?
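Both estimators in this example are unbiased, so Definition 5.14 applies; a short calculation gives Var(θ̂1) = σ²/n and Var(θ̂2) = σ²/4 + σ²/(4(n − 1)), and a simulation confirms that θ̂1 has the smaller variance (a sketch with X ∼ N(0, 1) and n = 5, both illustrative choices):

```python
import random

random.seed(7)

n = 5            # illustrative sample size; X ~ N(0, 1), so sigma2 = 1
reps = 200_000

est1, est2 = [], []
for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    est1.append(sum(xs) / n)                                 # theta1-hat
    est2.append(xs[0] / 2 + sum(xs[1:]) / (2 * (n - 1)))     # theta2-hat

def mc_var(vals):
    """Monte Carlo variance of a list of estimates."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

var1 = mc_var(est1)   # theory: sigma2/n = 0.2
var2 = mc_var(est2)   # theory: sigma2/4 + sigma2/(4*(n - 1)) = 0.3125
```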

Definition 5.15: (Mean-squared error)

Let θ̂ be an estimator of the parameter θ. The mean-squared error of the estimator θ̂ is defined to be

MSE(θ̂) = E[(θ̂ − θ)²] = Var(θ̂) + [Bias(θ̂)]².

Remarks:

• If an estimator is unbiased, then its MSE is equal to the variance of the estimator

• The MSE of an estimator θ̂ depends on the value of the unknown parameter θ
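The decomposition MSE = Var + Bias² can be checked numerically for the biased 1/n variance estimator, whose bias is −σ²/n (a sketch; X ∼ N(0, 4) and n = 5 are illustrative):

```python
import random

random.seed(8)

sigma2 = 4.0    # true variance (illustrative); X ~ N(0, 4)
n = 5
reps = 200_000

est = []
for _ in range(reps):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    est.append(sum((x - xbar) ** 2 for x in xs) / n)   # biased 1/n estimator

mean_est = sum(est) / reps
bias = mean_est - sigma2                                # theory: -sigma2/n = -0.8
var_est = sum((e - mean_est) ** 2 for e in est) / reps
mse = sum((e - sigma2) ** 2 for e in est) / reps

# The empirical moments satisfy mse = var_est + bias**2 (up to rounding).
```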

Next:

• Comparison of alternative estimators via their MSE's

Definition 5.16: (MSE efficiency)

Let θ̂1 and θ̂2 be two alternative estimators of the unknown parameter θ. θ̂1 is defined to be more MSE efficient than θ̂2 if

MSE(θ̂1) ≤ MSE(θ̂2)

for all possible parameter values of θ and

MSE(θ̂1) < MSE(θ̂2)

for at least one possible parameter value of θ.

'Unbiased' vs 'biased' estimator

[Figure: sampling densities of two estimators θ̂1(X1, . . . , Xn) and θ̂2(X1, . . . , Xn), the first centered on the true parameter value θ, the second shifted away from it.]

Remarks:

• Frequently, two estimators of θ are not comparable with respect to MSE efficiency since their respective MSE curves cross

• There is no general mathematical principle for constructing MSE-efficient estimators

• However, there are methods for finding the estimator with uniformly minimum variance among all unbiased estimators
−→ restriction to the class of all unbiased estimators

• These specific methods are not discussed here (Rao-Blackwell theorem, Lehmann-Scheffé theorem)

• Here, we consider only one important result

Theorem 5.17: (Cramér-Rao lower bound for variance)

Let X1, . . . , Xn be a random sample from X and let θ be a parameter to be estimated. Consider the joint density of the random sample fX1,...,Xn(x1, . . . , xn) and define the value

CR(θ) ≡ { E[ (∂ ln fX1,...,Xn(X1, . . . , Xn) / ∂θ)² ] }⁻¹.

Under certain (regularity) conditions we have for any unbiased estimator θ̂(X1, . . . , Xn)

Var(θ̂) ≥ CR(θ).

Remarks:

• The value CR(θ) is the minimal variance that any unbiased estimator can take on
−→ goodness criterion for unbiased estimators

• If for an unbiased estimator θ̂(X1, . . . , Xn)

Var(θ̂) = CR(θ),

then θ̂ is called UMVUE (Uniformly Minimum-Variance Unbiased Estimator)

Second goodness property of point estimators:

• Consider an increasing sample size (n → ∞)

Notation: θ̂n(X1, . . . , Xn) = θ̂(X1, . . . , Xn)

Analysis of the asymptotic distribution properties of θ̂n

−→ consistency of an estimator

Definition 5.18: ((Weak) consistency)

The estimator θ̂n(X1, . . . , Xn) is called (weakly) consistent for θ if it converges in probability to θ, i.e. if

plim θ̂n(X1, . . . , Xn) = θ.

Example:

• Assume that X ∼ N(µ, σ²) with known σ² (e.g. σ² = 1)

• Consider the following two estimators of µ:

µ̂n(X1, . . . , Xn) = (1/n) ∑_{i=1}^n Xi

µ̂*n(X1, . . . , Xn) = (1/n) ∑_{i=1}^n Xi + 2/n

• µ̂n is (weakly) consistent for µ (Theorem 5.6, Slide 210: weak law of large numbers)

• µ̂*n is (weakly) consistent for µ (this follows from Theorem 5.9(a), Slide 215)

• Exact distribution of µ̂n:

µ̂n ∼ N(µ, σ²/n)

(linear transformation of the normal distribution)

• Exact distribution of µ̂*n:

µ̂*n ∼ N(µ + 2/n, σ²/n)

(linear transformation of the normal distribution)
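The behaviour of the biased but consistent estimator µ̂*n can be simulated: its average estimate sits near µ + 2/n, so the bias 2/n dies out as n grows (a sketch with µ = 0 and σ² = 1):

```python
import random

random.seed(9)

mu, sigma = 0.0, 1.0   # illustrative: X ~ N(0, 1)
reps = 20_000

def avg_estimate(n):
    """Average of mu*-hat_n = Xbar_n + 2/n over many samples."""
    total = 0.0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        total += xbar + 2.0 / n
    return total / reps

avg_2 = avg_estimate(2)       # bias 2/n = 1.0
avg_200 = avg_estimate(200)   # bias 2/n = 0.01
```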

Pdf's of the estimator µ̂n for n = 2, 10, 20 (σ² = 1)

[Figure: densities of µ̂n ∼ N(µ, 1/n) for µ = 0; all curves are centered at µ = 0 and become more concentrated as n grows.]

Pdf's of the estimator µ̂*n for n = 2, 10, 20 (σ² = 1)

[Figure: densities of µ̂*n ∼ N(µ + 2/n, 1/n) for µ = 0; the curves are centered at 2/n and shift toward µ = 0 while becoming more concentrated as n grows.]

Remarks:

• Sufficient (but not necessary) condition for consistency:

lim_{n→∞} E(θ̂n) = θ (asymptotic unbiasedness)

lim_{n→∞} Var(θ̂n) = 0

• Possible properties of an estimator:

consistent and unbiased
inconsistent and unbiased
consistent and biased
inconsistent and biased

Next:

• Application of the central-limit theorem to estimators

−→ asymptotic normality of an estimator

Definition 5.19: (Asymptotic normality)

An estimator θ̂n(X1, . . . , Xn) of the parameter θ is called asymptotically normal if there exist (1) a sequence of real constants θ1, θ2, . . . and (2) a function V(θ) such that

√n · (θ̂n − θn) →d U ∼ N(0, V(θ)).

Remarks:

• Alternative notation:

θ̂n approx. ∼ N(θn, V(θ)/n)

• The concept of asymptotic normality naturally extends to multivariate settings

5.3 Methods of Estimation

Up to now:

• Definitions + properties of estimators

Next:

• Construction of estimators

Three classical methods:

• Method of Least Squares (LS)

• Method of Moments (MM)

• Maximum-Likelihood method (ML)

Remarks:

• There are further methods (e.g. the Generalized Method of Moments, GMM)

• Here: focus on ML estimation

5.3.1 Least-Squares Estimators

History:

• Introduced by A.M. Legendre (1752-1833) and C.F. Gauß (1777-1855)

Idea:

• Approximate the (noisy) observations x1, . . . , xn by functions gi(θ1, . . . , θm), i = 1, . . . , n, m < n, such that

S(x1, . . . , xn; θ) = ∑_{i=1}^n [xi − gi(θ)]² −→ min over θ

• The LS estimator is then defined to be

θ̂(X1, . . . , Xn) = argmin_θ S(X1, . . . , Xn; θ)
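A minimal illustration of the LS principle: with gi(θ) = θ (a single location parameter) the criterion S(θ) = ∑ (xi − θ)² is minimized by the sample mean, which can be verified by comparing S at the minimizer with nearby values (a sketch with made-up data):

```python
# Least-squares with a single location parameter: g_i(theta) = theta, so
# S(theta) = sum((x_i - theta)**2); solving dS/dtheta = 0 gives the
# closed-form minimizer theta-hat = sample mean.
x = [2.0, 3.5, 1.5, 3.0]   # made-up observations

theta_hat = sum(x) / len(x)

def S(theta):
    """Sum-of-squares criterion to be minimized."""
    return sum((xi - theta) ** 2 for xi in x)

# Nearby parameter values give a strictly larger criterion value.
```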

Remark:

• The LS method is central to the linear regression model (cf. the courses 'Econometrics' I + II)

5.3.2 Method-of-Moments Estimators

History:

• Introduced by K. Pearson (1857-1936)

Definition 5.20: (Theoretical and sample moments)

(a) Let X be a random variable with expectation E(X). The theoretical p-th moment of X, denoted by µ′p, is defined as

µ′p = E(X^p).

The theoretical p-th central moment of X, denoted by µp, is defined as

µp = E[X − E(X)]^p.

(b) Let X1, . . . , Xn be a random sample from X and let X̄ denote the arithmetic sample mean. Then the p-th sample moment, denoted by µ̂′p, is defined as

µ̂′p = (1/n) ∑_{i=1}^n Xi^p.

The p-th central sample moment, denoted by µ̂p, is defined as

µ̂p = (1/n) ∑_{i=1}^n (Xi − X̄)^p.

Remarks:

• The theoretical moments µ′p and µp had already been introduced in Definition 2.21 (Slide 76)

• The sample moments µ̂′p and µ̂p are (weakly) consistent estimators of the theoretical moments µ′p and µp

• The arithmetic sample mean is the 1st sample moment of X1, . . . , Xn

• The sample variance is the 2nd central sample moment of X1, . . . , Xn

General setting:

• Based on the random sample X1, . . . , Xn from X, estimate the r unknown parameters θ1, . . . , θr

Basic idea of the method of moments:

1. Express the r theoretical moments as functions of the r unknown parameters:

µ′1 = g1(θ1, . . . , θr)
. . .
µ′r = gr(θ1, . . . , θr)

249


2. Express the r unknown parameters as functions of the r theoretical moments:

θ1 = h1(µ1, . . . , µr, µ′1, . . . , µ′r)
. . .
θr = hr(µ1, . . . , µr, µ′1, . . . , µ′r)

3. Replace the theoretical moments by the sample moments:

θ̂1(X1, . . . , Xn) = h1(µ̂1, . . . , µ̂r, µ̂′1, . . . , µ̂′r)
. . .
θ̂r(X1, . . . , Xn) = hr(µ̂1, . . . , µ̂r, µ̂′1, . . . , µ̂′r)

250


Example: (Exponential distribution)

• Let the random variable X have an exponential distribution with parameter λ > 0 and pdf

fX(x) = { λ·e^{−λx}  for x > 0
        { 0           otherwise

• The expectation and the variance of X are given by

E(X) = 1/λ

Var(X) = 1/λ²

251


• Method-of-moments estimator via the expectation:

1. We know that

E(X) = µ′1 = 1/λ

2. This implies

λ = 1/µ′1

3. Method-of-moments estimator of λ:

λ̂(X1, . . . , Xn) = 1 / [(1/n) Σ_{i=1}^{n} Xi]

252


• Method-of-moments estimator via the variance:

1. We know that

Var(X) = µ2 = 1/λ²

2. This implies

λ = √(1/µ2)

3. Method-of-moments estimator of λ:

λ̂(X1, . . . , Xn) = 1 / √[(1/n) Σ_{i=1}^{n} (Xi − X̄)²]

−→ Method-of-moments estimators of an unknown parameter are not unique

253
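A small simulation illustrates this non-uniqueness: both method-of-moments estimators from the example are computed on the same pseudo-random exponential sample and generally return different values (the helper names and the choice λ = 2 are illustrative, not from the slides):

```python
import math
import random

def lambda_hat_mean(xs):
    # via E(X) = 1/lambda:  lambda_hat = 1/xbar
    return len(xs) / sum(xs)

def lambda_hat_var(xs):
    # via Var(X) = 1/lambda^2:  lambda_hat = 1/sqrt(mu_hat_2)
    xbar = sum(xs) / len(xs)
    mu2_hat = sum((x - xbar) ** 2 for x in xs) / len(xs)
    return 1.0 / math.sqrt(mu2_hat)

random.seed(0)
xs = [random.expovariate(2.0) for _ in range(10000)]  # true lambda = 2
print(lambda_hat_mean(xs), lambda_hat_var(xs))  # both close to 2, but not equal
```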


Remarks:

• Method-of-moments estimators are (weakly) consistent since

plim θ̂1 = plim h1(µ̂1, . . . , µ̂r, µ̂′1, . . . , µ̂′r)
        = h1(plim µ̂1, . . . , plim µ̂r, plim µ̂′1, . . . , plim µ̂′r)
        = h1(µ1, . . . , µr, µ′1, . . . , µ′r)
        = θ1

(the second equality uses the continuity of h1)

• In general, method-of-moments estimators are not unbiased

• Method-of-moments estimators typically are asymptotically normal

• The asymptotic variances are often hard to determine

254


5.3.3 Maximum-Likelihood Estimators

History:

• Introduced by Ronald Fisher (1890-1962)

Basic idea behind ML estimation:

• We estimate the unknown parameters θ1, . . . , θr in such a manner that the likelihood of the observed sample x1, . . . , xn, which we express as a function of the unknown parameters, becomes maximal

255


Example:

• Consider an urn containing black and white balls

• The ratio of numbers is known to be 3 : 1

• It is not known if the black or the white balls are more numerous

• Draw n balls with replacement

• Let X denote the number of black balls in the sample

• Discrete density of X:

P(X = x) = (n choose x) · p^x · (1 − p)^{n−x},  x ∈ {0, 1, . . . , n},  p ∈ {0.25, 0.75}

(binomial distribution)

256


• p ∈ {0.25, 0.75} is the parameter to be estimated

• Consider a particular sample of size n = 3
−→ potential realizations:

Number of black balls x:   0      1      2      3
P(X = x; p = 0.25):        27/64  27/64  9/64   1/64
P(X = x; p = 0.75):        1/64   9/64   27/64  27/64

• Intuitive estimation: We estimate p by that value which ex ante maximizes the probability of observing the actual realization x:

p̂ = { 0.25  for x = 0, 1
    { 0.75  for x = 2, 3

−→ Maximum-Likelihood (ML) estimation

257
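The table and the intuitive estimator can be reproduced directly (a sketch; `pmf` and `p_hat` are hypothetical helper names):

```python
from math import comb

n = 3

def pmf(x, p):
    # binomial probability P(X = x; p) for n = 3 draws with replacement
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def p_hat(x):
    # ML principle: pick the p in {0.25, 0.75} that maximizes P(X = x; p)
    return max((0.25, 0.75), key=lambda p: pmf(x, p))

for x in range(n + 1):
    print(x, round(pmf(x, 0.25) * 64), round(pmf(x, 0.75) * 64), p_hat(x))
# x = 0, 1 yield p_hat = 0.25;  x = 2, 3 yield p_hat = 0.75
```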


Next:

• Formalization of the ML estimation technique

Notions:

• Likelihood function, loglikelihood function

• ML estimator

Definition 5.21: (Likelihood function)

The likelihood function of n random variables X1, . . . , Xn is defined to be the joint density of the n random variables, say fX1,...,Xn(x1, . . . , xn; θ), which is considered to be a function of the parameter vector θ.

258


Remarks:

• If X1, . . . , Xn is a random sample from the continuous random variable X with pdf fX(x; θ), then

fX1,...,Xn(x1, . . . , xn; θ) = Π_{i=1}^{n} fXi(xi; θ) = Π_{i=1}^{n} fX(xi; θ)

• The likelihood function is often denoted by L(θ; x1, . . . , xn) or L(θ), i.e. in the above-mentioned case

L(θ; x1, . . . , xn) = L(θ) = Π_{i=1}^{n} fX(xi; θ)

259


• If the X1, . . . , Xn are a sample from a discrete random variable X, the likelihood function is given by

L(θ; x1, . . . , xn) = P(X1 = x1, . . . , Xn = xn; θ) = Π_{i=1}^{n} P(X = xi; θ)

(likelihood = probability that the observed sample occurs)

Example:

• Let X1, . . . , Xn be a random sample from X ∼ N(µ, σ²). Then θ = (µ, σ²)′ and the likelihood function is given by

L(θ; x1, . . . , xn) = Π_{i=1}^{n} [1/√(2πσ²)] · e^{−(1/2)[(xi−µ)/σ]²}

= [1/(2πσ²)]^{n/2} · exp{ −[1/(2σ²)] Σ_{i=1}^{n} (xi − µ)² }

260


Definition 5.22: (Maximum-likelihood estimator)

Let L(θ; x1, . . . , xn) be the likelihood function of the random variables X1, . . . , Xn. If θ̂ [where θ̂ = θ̂(x1, . . . , xn) is a function of the observations x1, . . . , xn] is the value of θ which maximizes L(θ; x1, . . . , xn), then θ̂(X1, . . . , Xn) is the maximum-likelihood estimator of θ.

Remarks:

• We obtain the ML estimator via (1) maximizing the likelihood function,

L(θ̂; x1, . . . , xn) = max_θ L(θ; x1, . . . , xn),

and (2) replacing the realizations x1, . . . , xn by the random variables X1, . . . , Xn

261


• It is often easier to maximize the loglikelihood function

ln[L(θ; x1, . . . , xn)]

(L(θ) and ln[L(θ)] have their maxima at the same value of θ)

• We derive θ̂ = (θ̂1, . . . , θ̂r)′ by solving the system of equations

∂ ln[L(θ; x1, . . . , xn)] / ∂θ1 = 0
. . .
∂ ln[L(θ; x1, . . . , xn)] / ∂θr = 0

262


Example:

• Let X1, . . . , Xn be a random sample from X ∼ N(µ, σ²) with the likelihood function

L(µ, σ²) = [1/(2πσ²)]^{n/2} · exp{ −[1/(2σ²)] Σ_{i=1}^{n} (xi − µ)² }

• The loglikelihood function is given by

L*(µ, σ²) = ln[L(µ, σ²)]

= −(n/2) ln(2π) − (n/2) ln(σ²) − [1/(2σ²)] Σ_{i=1}^{n} (xi − µ)²

263


• The partial derivatives are given by

∂L*(µ, σ²)/∂µ = (1/σ²) Σ_{i=1}^{n} (xi − µ)

and

∂L*(µ, σ²)/∂σ² = −(n/2)·(1/σ²) + [1/(2σ⁴)] Σ_{i=1}^{n} (xi − µ)²

• Setting these equal to zero, solving the system of equations and replacing the realizations by the random variables yields the ML estimators

µ̂(X1, . . . , Xn) = (1/n) Σ_{i=1}^{n} Xi = X̄

σ̂²(X1, . . . , Xn) = (1/n) Σ_{i=1}^{n} (Xi − X̄)²

264
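A quick numerical check (a sketch with simulated data; the seed and the parameter values are arbitrary, not from the slides) confirms that the closed-form estimators do maximize the loglikelihood:

```python
import math
import random

def normal_mle(xs):
    """Closed-form ML estimators: mu_hat = xbar, sigma2_hat = (1/n)*sum((x - xbar)^2)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n  # note the 1/n (not 1/(n-1)) factor
    return mu, sigma2

def loglik(mu, sigma2, xs):
    """Normal loglikelihood L*(mu, sigma^2) as on the slide."""
    n = len(xs)
    rss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(sigma2) - rss / (2 * sigma2)

random.seed(1)
xs = [random.gauss(5.0, 2.0) for _ in range(1000)]
mu_hat, s2_hat = normal_mle(xs)

# the closed-form solution beats nearby parameter values
assert loglik(mu_hat, s2_hat, xs) > loglik(mu_hat + 0.1, s2_hat, xs)
assert loglik(mu_hat, s2_hat, xs) > loglik(mu_hat, 1.2 * s2_hat, xs)
print(mu_hat, s2_hat)
```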


General properties of ML estimators:

• Distributional assumptions are necessary

• Under rather mild regularity conditions ML estimators have nice properties:

1. If θ̂ is the ML estimator of θ, then g(θ̂) is the ML estimator of g(θ)
(equivariance property)

2. (Weak) consistency:

plim θ̂n = θ

265


3. Asymptotic normality:

√n (θ̂n − θ) →d U ∼ N(0, V(θ))

4. Asymptotic efficiency:
V(θ) coincides with the Cramér-Rao lower bound

5. Direct computation (numerical methods)

6. Quasi-ML estimation:
ML estimators computed on the basis of normally distributed random samples are robust even if the random sample actually is not normally distributed
(robustness against distribution misspecification)

266


6. Hypothesis Testing

Setting:

• Let X represent the random experiment under consideration

• Let X have the unknown cdf FX(x)

• We are interested in an unknown parameter θ in the distribution of X

Now:

• Testing of a statistical hypothesis on the unknown θ on the basis of a random sample X1, . . . , Xn

267


Example 1:

• In our local pub the glasses are said to contain 0.4 litres of beer. We suspect that in many cases the glasses actually contain less than 0.4 litres of beer

• Let X represent the process of 'filling a glass of beer'

• Let θ = E(X) denote the expected amount of beer filled into one glass

• On the basis of a random sample X1, . . . , Xn we would like to test

θ = 0.4 versus θ < 0.4

268


Example 2:

• We know from past data that the risk of a specific stock (measured by the standard deviation of the stock return) has been equal to 25%. Now, there is a change in the managerial board of the firm. Does this change affect the risk of the stock?

• Let X represent the stock return

• Let θ = √Var(X) = SD(X) denote the standard deviation of the return

• On the basis of a random sample X1, . . . , Xn we would like to test

θ = 0.25 versus θ ≠ 0.25

269


6.1 Basic Terminology

Definition 6.1: (Parameter test)

Let X be a random variable and let θ be an unknown parameter in the distribution of X. A parameter test constitutes a statistical procedure for deciding on a hypothesis concerning the unknown parameter θ on the basis of a random sample X1, . . . , Xn from X.

Statistical hypothesis-testing problem:

• Let Θ denote the set of all possible parameter values (i.e. θ ∈ Θ; we call Θ the parameter space)

• Let Θ0 ⊂ Θ be a subset of the parameter space

270


• Consider the following statements:

H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ \ Θ0 = Θ1

• H0 is called the null hypothesis, H1 is called the alternative hypothesis

Types of hypotheses:

• If |Θ0| = 1 (i.e. Θ0 = {θ0}) and H0 : θ = θ0, then H0 is called simple

• Otherwise H0 is called composite

• An analogous terminology applies to H1

271


Types of hypothesis tests:

• Let θ0 ∈ Θ be a real constant. Then

H0 : θ = θ0 versus H1 : θ ≠ θ0

is called a two-sided test

• The tests

H0 : θ ≤ θ0 versus H1 : θ > θ0

and

H0 : θ ≥ θ0 versus H1 : θ < θ0

are called one-sided tests (right- and left-sided tests)

272


Next:

• Consider the general testing problem

H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1 = Θ \ Θ0

General procedure:

• Based on a random sample X1, . . . , Xn from X decide on whether to reject H0 in favor of H1 or not

Explicit procedure:

• Select an 'appropriate' test statistic T(X1, . . . , Xn) and determine an 'appropriate' critical region K ⊂ R

• Decision:

T(X1, . . . , Xn) ∈ K =⇒ reject H0
T(X1, . . . , Xn) ∉ K =⇒ do not reject (accept) H0

273


Notice:

• T(X1, . . . , Xn) is a random variable
−→ the decision is random
−→ possibility of wrong decisions

• Types of errors:

                        Decision based on test
Reality          reject H0             accept H0
H0 true          type I error          correct decision
H0 false         correct decision      type II error

Conclusion:

• Type I error: test rejects H0 when H0 is true

• Type II error: test accepts H0 when H0 is false

274


When do wrong decisions occur?

• The type I error occurs if

T (X1, . . . , Xn) ∈ K

when for the true parameter θ we have θ ∈ Θ0

• The type II error occurs if

T(X1, . . . , Xn) ∉ K,

when for the true parameter θ we have θ ∈ Θ1

275


Question:

• When does a hypothesis test of the form

H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1 = Θ \ Θ0

have ’good’ properties?

Intuitively:

• A test is 'good' if it possesses low probabilities of committing type I and type II errors

Next:

• Formal instrument for measuring type I and type II error probabilities

276


Definition 6.2: (Power function of a test)

Consider a hypothesis test of the general form given on Slide 276 with the test statistic T(X1, . . . , Xn) and an 'appropriately chosen' critical region K. The power function of the test, denoted by G(θ), is defined to be the probability that the test rejects H0 when θ is the true (unknown) parameter. Formally,

G : Θ −→ [0,1]

with

G(θ) = P(T(X1, . . . , Xn) ∈ K).

277


Remark:

• Using the power function of a test, we can express the probabilities of the type I error as

G(θ) for all θ ∈ Θ0

and the probabilities of the type II error as

1 − G(θ) for all θ ∈ Θ1

Question:

• What should an ideal test look like?

Intuitively:

• A test would be ideal if the probabilities of both the type I and the type II errors were constantly equal to zero
−→ the test would yield the correct decision with probability 1

278
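For a concrete one-sided Gauss test of H0: θ ≥ θ0 versus H1: θ < θ0 (with illustrative numbers in the spirit of the beer example, σ = 0.02 and n = 9 assumed known), the power function has a closed form, and one can check that the type-I-error probabilities on Θ0 stay below α = 0.05:

```python
import math

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def G(theta, theta0=0.4, sigma=0.02, n=9, z_alpha=1.645):
    """
    Power function of the left-sided Gauss test with critical region
    T < -z_alpha:  G(theta) = Phi(-z_alpha - sqrt(n)*(theta - theta0)/sigma).
    All numbers are illustrative, not from the slides.
    """
    return Phi(-z_alpha - math.sqrt(n) * (theta - theta0) / sigma)

print(G(0.40))  # roughly 0.05: maximal type-I-error probability (the size alpha)
print(G(0.39))  # power against theta = 0.39
print(G(0.38))  # power grows as theta moves deeper into H1
```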


Example:

• For θ0 ∈ Θ consider the testing problem

H0 : θ ≤ θ0 versus H1 : θ > θ0

Power function of an ideal test

279


Unfortunately:

• It can be shown mathematically that, in general, such an ideal test does not exist

Way out:

• For the selected test statistic T(X1, . . . , Xn) consider the maximal type-I-error probability

α = max_{θ∈Θ0} P(T(X1, . . . , Xn) ∈ K) = max_{θ∈Θ0} G(θ)

• Now, fix the critical region K in such a way that α takes on a prespecified small value

280


−→ all type-I-error probabilities are less than or equal to α

• Frequently used α-values: α = 0.01, α = 0.05, α = 0.1

Definition 6.3: (Size of test)

Consider a hypothesis test of the general form given on Slide 276 with the test statistic T(X1, . . . , Xn) and an appropriately chosen critical region K. The size of the test (also known as the significance level of the test) is defined to be the maximal type-I-error probability

α = max_{θ∈Θ0} P(T(X1, . . . , Xn) ∈ K) = max_{θ∈Θ0} G(θ).

281


Implications of this test construction:

• The probability of the test rejecting H0 when in fact H0 is true (i.e. the type-I-error probability) is at most α
−→ if, for a concrete sample, the test rejects H0, we can be quite sure that H0 is in fact false
(we say that H1 is statistically significant)

• By contrast, we cannot control for the type-II-error probability (i.e. for the probability of the test accepting H0 when in fact H0 is false)
−→ if, for a concrete sample, the test accepts H0, then there is no probability assessment of a potentially wrong decision
(acceptance of H0 simply means: the data are not inconsistent with H0)

282


Therefore:

• It is crucial how to formulate H0 and H1

• We formulate our research hypothesis in H1
(hoping that, for a concrete sample, our test rejects H0)

Example:

• Consider Example 1 on Slide 268

• If, for a concrete sample, our test rejects H0, we can be quite sure that (on average) the glasses contain less than 0.4 litres of beer

• If our test accepts H0, we cannot make a statistically significant statement
(the data are not inconsistent with H0)

283


6.2 Classical Testing Procedures

Next:

• Three general classical testing procedures based on the loglikelihood function of a random sample

Setting:

• Let X1, . . . , Xn be a random sample from X

• Let θ ∈ R be an unknown parameter

• Let L(θ) = L(θ;x1, . . . , xn) denote the likelihood function

284


• Let ln[L(θ)] denote the loglikelihood function

• Assume g : R −→ R to be a continuous function

• Consider the testing problem:

H0 : g(θ) = q versus H1 : g(θ) ≠ q

Fundamental to all three tests:

• Maximum-likelihood estimator θ̂ML of θ

285


6.2.1 Wald Test

History:

• Suggested by A. Wald (1902-1950)

Idea behind this test:

• If H0 : g(θ) = q is true, then the random variable g(θ̂ML) − q should not be significantly different from zero

286


Previous knowledge:

• Equivariance property of the ML estimator (Slide 265)
−→ g(θ̂ML) is the ML estimator of g(θ)

• Asymptotic normality (Slide 266)
−→ [g(θ̂ML) − g(θ)] →d U ∼ N(0, Var(g(θ̂ML)))

• The asymptotic variance Var(g(θ̂ML)) needs to be estimated from the data

Wald test statistic:

W = [g(θ̂ML) − q]² / V̂ar[g(θ̂ML)]  →d (under H0)  U ∼ χ²_1

287


Test decision:

• Reject H0 at the significance level α if W > χ²_{1;1−α}

Remarks:

• The Wald test is a pure test against H0
(it is not necessary to exactly specify H1)

• The Wald principle can be applied to any consistent, asymptotically normally distributed estimator

288
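A sketch of the Wald principle for the exponential example from Section 5.3, testing H0: λ = λ0: here g is the identity, λ̂ML = 1/X̄, and, as an assumption made explicit here rather than on the slides, the asymptotic variance is estimated by the inverse Fisher information λ̂²/n. The data are a stylized exponential(2) 'sample' built from an evenly spaced probability grid so that the output is deterministic:

```python
import math

CHI2_1_95 = 3.841  # 0.95 quantile of the chi-square distribution with 1 df

def wald_test_exponential(xs, lam0):
    """Wald test of H0: lambda = lam0 against H1: lambda != lam0."""
    n = len(xs)
    lam_hat = n / sum(xs)          # ML estimator 1/xbar
    var_hat = lam_hat ** 2 / n     # estimated asymptotic variance (inverse Fisher info)
    W = (lam_hat - lam0) ** 2 / var_hat
    return W, W > CHI2_1_95

n = 500
xs = [-math.log(1.0 - (i + 0.5) / n) / 2.0 for i in range(n)]  # stylized exp(2) data

print(wald_test_exponential(xs, lam0=2.0))  # W close to 0: do not reject H0
print(wald_test_exponential(xs, lam0=3.0))  # W large: reject H0
```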


Wald test statistic for H0 : g(θ) = 0 versus H1 : g(θ) ≠ 0

289



6.2.2 Likelihood-Ratio Test (LR Test)

Idea behind this test:

• Consider the likelihood function L(θ) at 2 points:

max_{θ: g(θ)=q} L(θ)  (= L(θ̂H0))

max_{θ∈Θ} L(θ)  (= L(θ̂ML))

• Consider the quantity

λ = L(θ̂H0) / L(θ̂ML)

• Properties of λ:
0 ≤ λ ≤ 1
If H0 is true, then λ should be close to one

290


LR test statistic:

LR = −2 ln(λ) = 2 { ln[L(θ̂ML)] − ln[L(θ̂H0)] }  →d (under H0)  U ∼ χ²_1

Properties of the LR test statistic:

• 0 ≤ LR < ∞

• If H0 is true, then LR should be close to zero

Test decision:

• Reject H0 at the significance level α if LR > χ²_{1;1−α}

291


Remarks:

• The LR test verifies if the distance in the loglikelihood functions, ln[L(θ̂ML)] − ln[L(θ̂H0)], is significantly larger than 0

• The LR test does not require the computation of any asymptotic variance

292
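The LR principle for the same exponential example: H0: λ = λ0 is a simple hypothesis, so the restricted maximum is just L(λ0). This is an illustrative sketch, not slide material; the grid-based 'sample' keeps the output deterministic:

```python
import math

CHI2_1_95 = 3.841  # 0.95 quantile of the chi-square distribution with 1 df

def loglik_exp(lam, xs):
    # exponential loglikelihood: ln L(lambda) = n*ln(lambda) - lambda*sum(x_i)
    return len(xs) * math.log(lam) - lam * sum(xs)

def lr_test_exponential(xs, lam0):
    lam_ml = len(xs) / sum(xs)  # unrestricted ML estimator 1/xbar
    LR = 2.0 * (loglik_exp(lam_ml, xs) - loglik_exp(lam0, xs))
    return LR, LR > CHI2_1_95

n = 500
xs = [-math.log(1.0 - (i + 0.5) / n) / 2.0 for i in range(n)]  # stylized exp(2) data

print(lr_test_exponential(xs, lam0=2.0))  # LR close to 0: do not reject H0
print(lr_test_exponential(xs, lam0=3.0))  # LR large: reject H0
```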


LR test statistic for H0 : g(θ) = 0 versus H1 : g(θ) ≠ 0

293



6.2.3 Lagrange-Multiplier Test (LM Test)

History:

• Named after J.L. Lagrange (1736-1813), whose multiplier principle underlies the test statistic

Idea behind this test:

• For the ML estimator θ̂ML we have

∂ ln[L(θ)]/∂θ |_{θ=θ̂ML} = 0

• If H0 : g(θ) = q is true, then the slope of the loglikelihood function at the point θ̂H0 should not be significantly different from zero

294


LM test statistic:

LM = ( ∂ ln[L(θ)]/∂θ |_{θ=θ̂H0} )² · [ V̂ar(θ̂H0) ]^{−1}  →d (under H0)  U ∼ χ²_1

Test decision:

• Reject H0 at the significance level α if LM > χ²_{1;1−α}

295
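The LM principle for the exponential example needs only the restricted value λ0. As a hedged sketch (the variance estimate via the Fisher information is an assumption spelled out here, not taken from the slides): the score is ∂ln L/∂λ = n/λ − Σ xi, its variance is estimated by I(λ0) = n/λ0², and LM = score²/I(λ0):

```python
import math

CHI2_1_95 = 3.841  # 0.95 quantile of the chi-square distribution with 1 df

def lm_test_exponential(xs, lam0):
    """
    LM (score) test of H0: lambda = lam0.  The score is
    d lnL/d lambda = n/lambda - sum(x_i); its variance is estimated by
    the Fisher information I(lambda0) = n/lambda0^2.
    """
    n = len(xs)
    score = n / lam0 - sum(xs)
    fisher_info = n / lam0 ** 2
    LM = score ** 2 / fisher_info
    return LM, LM > CHI2_1_95

n = 500
xs = [-math.log(1.0 - (i + 0.5) / n) / 2.0 for i in range(n)]  # stylized exp(2) data

print(lm_test_exponential(xs, lam0=2.0))  # LM close to 0: do not reject H0
print(lm_test_exponential(xs, lam0=3.0))  # LM large: reject H0
```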


LM test statistic for H0 : g(θ) = 0 versus H1 : g(θ) ≠ 0

296



Remarks:

• The test statistics of both the Wald and the LM tests contain the estimated variances of the estimator θ̂H0

• These unknown variances can be estimated consistently by the so-called Fisher information

• Many econometric tests are based on these three construction principles

• The three tests are asymptotically equivalent, i.e. for large sample sizes they produce identical test decisions

• The three principles can be extended to the testing of hypotheses on a parameter vector θ

• If θ ∈ R^m, then all 3 test statistics have a χ²_m distribution under H0

297


The 3 tests in one graph

298
