8/6/2019 Probability Models and Their Parametric Estimation
1/237
PROBABILITY MODELS AND THEIR PARAMETRIC ESTIMATION
NET/JRF/CSIR EXAMINATIONS
A. SANTHAKUMARAN
Dr. A. Santhakumaran
Associate Professor and Head
Department of Statistics
Salem Sowdeswari College
Salem - 636010
Tamil - Nadu
E-mail: ask.stat@yahoo.com
About the Author
Dr. A. Santhakumaran is Associate Professor and Head of the Department of Statistics at
Salem Sowdeswari College, Salem - 636010, Tamil Nadu. He holds a Ph.D. in Statistics -
Mathematics from the Ramanujan Institute for Advanced Study in Mathematics, University
of Madras. His interests are in Stochastic Processes and Their Applications. He has
to his credit over 31 research papers in Feedback Queues, Statistical Quality Control and
Reliability Theory. He is the author of the book Fundamentals of Testing Statistical
Hypotheses and Research Methodology.
Acknowledgments
My special thanks to the Correspondent and Secretary of Salem Sowdeswari
College, Salem, and to my colleagues for their enthusiastic and unstinted support
rendered for publishing this book. I am grateful to Professor V. Thangaraj, RIASM,
University of Madras, for his encouragement for writing the book. My greatest debt is to
Dr. J. Subramaniam, Professor of Mathematics, Bannari Amman Institute of Technology,
Sathyamangalam, who read most of the manuscript and whose critical comments
resulted in numerous significant improvements. My thanks to Mr. G. Narayanan, Ramanujan
Institute Computer Centre, RIASM, University of Madras, for the suggestions
rendered by him towards the successful completion of the LaTeX typesetting of the book.

Finally, I wish to express my gratitude to all my teachers under whose influence
I have come to appreciate statistics as the science of a winding and twisting network
connecting Mathematics, Scientific Philosophy, Computer Software and other
intellectual sources of the millennium.

A. SANTHAKUMARAN
PREFACE
Even though the science of Statistics originated more than 200 years ago,
it was recognized as a separate discipline only in the early 1940s in India. From then
till now, Statistics has evolved into a versatile, powerful and indispensable instrument for
analyzing statistical data in real life problems. We have reached a stage where no
empirical science can afford to ignore the science of Statistics, since the recognition
of patterns can be achieved through it. Because of the speedy growth of modern
science and technology, one who learns Statistics must have capacity, knowledge and
intellect. A bird has the capacity to imitate when taught. A child is not born with a
language, but it is born with an innate capacity to learn language. So when we teach
a child, the child manipulates the structure and creates sentences. A bird cannot do
this; the child has the knowledge and capacity to create new sentences. If a man has
ability and knowledge, his inventiveness and innovation constitute intellect.
If a student has ability, knowledge and intellect, then he will be able to learn and
implement Statistics successfully. If these three faculties are lacking, learning Statistics
will not be possible. We shall give a number of examples drawn from the story of the
improvement of natural knowledge and the success of decision making. It shows how
statistical ideas played an important role in scientific investigations and other decision
making processes. The most successful man in life is one who makes the best decision
based on the available information. Practically it is a very difficult task to take a
decision on a real life problem. We illustrate this with the help of the following examples.

One wants to know in how many ways a bread can be divided into two equivalent
parts. Immediately one reflects that it can be divided in a finite number of ways. In fact
the bread can be divided into two equivalent parts in an infinite number of ways. Naturally
every article can have infinitely many dimensions. Our interest of study may be one
dimension, namely the length of the bread; area (= length × breadth), two dimensions;
volume (= length × breadth × height), three dimensions; and so on. Analogous to these
are the measures of average (location), measures of variability (scale) and measures of
skewness and kurtosis (shape).
Another example: a new two wheeler is introduced by a manufacturer in the
market. The manufacturer wants to announce how many kilometers per litre the
two wheeler gives on the road. For this purpose, the manufacturer rides the two wheeler on
the road three times and observes that it gives 50 km per litre, 55 km
per litre and 60 km per litre respectively. It immediately comes to mind that the two
wheeler gives (50 + 55 + 60)/3 = 55 km per litre. This is absolutely wrong. Actually the
two wheeler gives 60 km per litre, the value of the maximum order statistic.
A cyclist pedals from his house to his college at a speed of 10 mph and returns to
his house from the college at a speed of 15 mph. He wants to know his average speed.
Assume that the distance between the house and the college is x miles. Then the
average speed of the cyclist

= Total distance / Total time taken = 2x / (x/10 + x/15) = 12 mph

which is the Harmonic Mean.

Seven students and a master want to cross a river from one side to the other. The
students are not able to swim across the river. The master measures the average height
of the students, which is 5.5. He also measures the depth of the river from one side
to the other side at 10 places: 2, 2.5, 4, 5.5, 6, 6.5, 10, 2.5, 1.5, 1, which gives an
average depth of 4.15. The master takes a decision to cross the river on foot,
since the average height of the students is greater than the average depth of the river. The
students fail to cross the river, since at some places the depth of the river is more than
5.5. The master is not happy with his decision. The master would have succeeded in his
decision if the minimum height of the students had been greater than the maximum depth of
the river.
Keeping this in mind, the first chapter of the book deals with some of the well
known distributions and the recognition of the pattern of statistical distributions. Chapter 2
gives the criteria of point estimation. Chapter 3 illustrates the properties of complete
families of distributions. Chapter 4 focuses on the study of optimal estimation. Chapter 5
explains the methods of estimation. Chapter 6 discusses interval estimation. Chapter 7
consists of Bayesian estimation.
DISTINCTIVE FEATURES
Care has been taken to provide conceptual clarity, simplicity and up to date materials.

Properly graded and solved problems to illustrate each concept and procedure are presented in the text.

About 300 solved problems and 50 remarks.

A chapter on complete family of distributions.

It is intended to serve as a text book for a one semester course on Statistical Inference
for Under - Graduate and Post - Graduate Statistics in Indian universities
and other Applicable Sciences, Allied Statistical Courses, Mathematical Sciences
and various Competitive Examinations like ISS, UGC Junior Fellowship,
SLET, NET etc.
Salem - 636010 A. Santhakumaran
January 2010
CONTENTS
1 Diagnosis of Statistical Pattern 1 - 32
1.1 Introduction
1.2 Collection of data
1.3 Diagnosis of a Distribution with Data
1.4 Discrete Probability Models
1.5 Continuous Probability Models
1.6 Diagnosis of Probability Models
1.7 Quantile - Quantile plot
2 Criteria of Point Estimation 33 - 73
2.1 Introduction
2.2 Point estimator
2.3 Problems of point estimation
2.4 Criteria of the point estimation
2.5 Consistency
2.6 Sufficient condition for consistency
2.7 Unbiased estimator
2.8 Sufficient Statistic
2.9 Neyman Factorizability Criterion
2.10 Exponential family of distributions
2.11 Distribution Admitting Sufficient Statistic
2.12 Joint Sufficient Statistics
2.13 Efficient estimator
3 Complete Family of Distributions 74 - 94
3.1 Introduction
3.2 Completeness
3.3 Minimal Sufficient Statistic
4 Optimal Estimation 95 - 151
4.1 Introduction
4.2 Uniformly Minimum Variance Unbiased Estimator
4.3 Uncorrelatedness Approach
4.4 Rao - Blackwell Theorem
4.5 Lehmann - Scheffe Theorem
4.6 Inequality Approach
4.7 Cramer Rao Inequality
4.8 Chapman - Robbins Inequality
4.9 Efficiency
4.10 Extension of Cramer- Rao Inequality
4.11 Cramer - Rao Inequality - Multiparameter case
4.12 Bhattacharya Inequality
5 Methods of Estimation 152 - 203
5.1 Introduction
5.2 Method of Maximum Likelihood Estimation
5.3 Numerical Methods of Maximum Likelihood Estimation
5.4 Optimum property of MLE
5.5 Method of Minimum Variance Bound Estimation
5.6 Method of Moment Estimation
5.7 Method of Minimum Chi - Square Estimation
5.8 Method of Least Square Estimation
5.9 Gauss Markoff Theorem
6 Interval Estimation 204 - 226
6.1 Introduction
6.2 Confidence Intervals
6.3 Alternative Method of Confidence Intervals
6.4 Shortest Length Confidence Intervals
7 Bayes Estimation 227 - 245
7.1 Introduction
7.2 Bayes point estimation
7.3 Bayes confidence intervals
References
Glossary of Notation
Appendix
Answers to problems
Index
1. DIAGNOSIS OF STATISTICAL PATTERN
1.1 Introduction
Statistics is a decision making tool which aims to resolve real life problems.
It originated more than 2000 years ago, but it was recognized as a separate discipline
in India only from 1940. From then till now, Statistics has been evolving as a versatile,
powerful and indispensable instrument for investigation in all fields of real life problems.
It provides a wide variety of analytical tools. We have reached a stage where no empirical
science can afford to ignore the science of Statistics, since the recognition of patterns
can be achieved through it.
Statistics is a method of obtaining and analyzing data in order to take decisions
based on them. In India, during the period of Chandragupta Maurya there was an efficient
system of collecting official and administrative statistics. During Akbar's reign ( 1556
- 1605 AD ) people maintained good records of land and agricultural statistics. Statistical
surveys were also conducted during his reign.
Sir Ronald A. Fisher, known as the Father of Statistics, placed Statistics on a very
sound footing by applying it to various diversified fields. His contributions to Statistics
led to a very responsible position of Statistics among the sciences.

Professor P. C. Mahalanobis is the founder of Statistics in India. He was a
physicist by training, a statistician by instinct and an economist by conviction. The
Government of India observes 29th June, the birthday of Professor Prasanta Chandra
Mahalanobis, as National Statistics Day. Professor C. R. Rao is an Indian legend
whose career spans the history of modern statistics. He is considered by many to be
the greatest living statistician in the world today.
There are many definitions of the term Statistics. Some authors have defined
Statistics as statistical data (plural sense) and others as statistical methods (singular
sense).
Statistics as Statistical Data
Yule and Kendall state: "By statistics we mean quantitative data affected to a
marked extent by multiplicity of causes." Their definition points out the following
characteristics:

Statistics are aggregates of facts.
Statistics are affected to a marked extent by multiplicity of causes.
Statistics are numerically expressed.
Statistics are enumerated or estimated according to reasonable standards of accuracy.
Statistics are collected in a systematic manner.
Statistics are collected for a pre - determined purpose.
Statistics should be placed in relation to each other.
Statistics as Statistical Methods
One of the best definitions of Statistics is given by Croxton and Cowden. They
define Statistics as the science which deals with the collection, analysis and interpretation
of numerical data. This definition points out the scientific ways of:

Data collection
Data presentation
Data analysis
Data interpretation
Statistics as Statistical Models and Methods
Statistics is an imposing form of Mathematics. The usage of statistical methods
has been briskly expanding since the late 20th century, because statistical models and
methods have great application value in many inter - disciplinary sciences. So we define
Statistics as the science of a winding and twisting network connecting Mathematics,
Scientific Philosophy, Computer Software and other intellectual sources of the
millennium.

This definition reveals that statisticians work to translate real life problems
into mathematical models by using assumptions or axioms or principles. They then
derive exact solutions by their knowledge, thereby intellectually validate the results,
and express their merits in non-mathematical forms which are consistent with real
life problems.
In real life problems, there are many situations where the actions of the entities
within the system under study cannot be predicted with 100 percent perfection.
There is always some variation. The variation can be classified into two categories:
variation due to assignable causes, which has to be identified and eliminated, and
variation due to chance causes, which lies within the 6σ limits. This is also called
natural variation. In general, the reduction of natural variation is not necessary and
involves more cost, so it is not feasible to reduce the natural variation. However,
some appropriate statistical patterns of recognition may well describe the causes of
variations.
An appropriate statistical pattern can be diagnosed by repeated sampling of the
phenomenon of interest. Then, through the systematic study of these data, a statistician
can identify a known distribution suitable for the data and estimate the parameters of the
distribution. A statistician takes continuous effort in the selection of a distributional
form.
There are four steps in the diagnosis of a statistical distribution. They are
(i) Data collection
Data collection for real life problems often requires substantial knowledge of
the problems, planning time and resource commitment.
(ii) Identification of statistical pattern
When the data are available, identification of a probability distribution begins
by developing a frequency distribution or Histogram of the data. Based on the
pattern of frequency distribution and knowledge on the nature and behaviour of
the process, a family of distributions is chosen.
(iii) Parameter selection
Choose the parameters that determine a specific instance of the distribution family
when the data are available. These parameters are estimated from the data.
(iv) Validity of the distribution
The validity of the chosen distribution and the associated parameters is evaluated
with the help of statistical tests. The validity of the various assumptions made
on the parameters is assessed only at a certain level of significance.
If the chosen distribution is not a good approximation of the data, then the analyst
goes to the second step, chooses a different family of distributions and repeats the
procedure.
If several iterations of this procedure fail to give a fit between an assumed
distributional form and the collected data, then the empirical form of the distribution
may be used.
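The four steps above can be sketched computationally. The following fragment is a minimal illustration with hypothetical simulated data: it "collects" inter-arrival times, assumes an Exponential family, estimates the parameter θ by maximum likelihood (θ̂ = 1/x̄), and computes a chi-square statistic as a rough validity check. The sample, the true rate 0.5 and the bin edges are illustrative assumptions, not part of the text.

```python
import math
import random

random.seed(1)
data = [random.expovariate(0.5) for _ in range(500)]   # step (i): data collection

theta_hat = 1.0 / (sum(data) / len(data))              # step (iii): MLE of theta

# step (iv): compare observed and expected counts over a few bins
edges = [0.0, 1.0, 2.0, 4.0, 8.0, float("inf")]
observed = [sum(1 for x in data if lo <= x < hi)
            for lo, hi in zip(edges, edges[1:])]

def exp_cdf(x, theta):
    """CDF of the Exponential distribution, F(x) = 1 - exp(-theta * x)."""
    return 1.0 - math.exp(-theta * x) if x < float("inf") else 1.0

expected = [len(data) * (exp_cdf(hi, theta_hat) - exp_cdf(lo, theta_hat))
            for lo, hi in zip(edges, edges[1:])]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(theta_hat, 3), round(chi_square, 2))
```

A small chi-square value (relative to the tabulated critical value for the appropriate degrees of freedom) would support the chosen family; a large value sends the analyst back to step (ii).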
1.2 Collection of Data
Collection of data is one of the important tasks in finding a solution for real life
problems. Even if the statistical pattern of the real life problem is valid, if the data
are inaccurately collected, inappropriately analyzed or not representative of the real life
problem, then the data will be misleading when used for decision making.
One can learn data collection from actual experience. The following suggestions
may enhance and facilitate data collection. Data collection and analysis must
be tackled with great care.
(i) Before collecting data, planning is very important. It could commence with a
practice of pre - observing experience. Try to collect the data while pre - observing.
Forms for the data are devised for the purpose. It is very likely that these forms
will have to be modified several times before the actual data collection begins.
Watch for unusual situations or abnormal circumstances and consider how they
will be handled. Planning is very important even if the data are collected
automatically. After collecting the data, find out whether the collected data are
appropriate or not.
(ii) If the data being collected are adequate to diagnose the statistical distribution,
then determine the apt distribution. If the data being used are useless for diagnosing
the statistical distribution, then there is no need to collect superfluous data.
(iii) Try to combine homogeneous data sets. Check data for homogeneity in successive
time periods and, during the same time period, on successive intervals of time.
(iv) Beware of the possibility of data censoring, in which a quantity of interest is not
observed in its entirety. This problem most often occurs when the analyst is
interested in the time required to complete some process but the process begins
prior to or finishes after the completion of the observation period. Censoring can
result in especially long process times being left out of the data sample.
(v) One may use a scatter diagram, which indicates the relationship between two
variables of interest.
(vi) Consider the possibility that a sequence of observations which appear to be in-
dependent may possess autocorrelation. Autocorrelation may exist in successive
time periods.
1.3 Diagnosis of a Distribution with Data

The methods for selecting families of distributions are applicable only if
statistical data are available. The specific distribution within a family is specified by
estimating its parameters. Estimating the parameters of a family of distributions leads
to the theory of estimation.
The formation of a frequency distribution or Histogram is useful in guessing the
shape of a distribution. Hines and Montgomery state that the number of class
intervals should approximately equal the square root of the sample size. If the intervals are
too wide, the Histogram will be coarse or blocky, and its shape and other details will not
bring out the pattern of the data. So one has to allow the interval sizes to change until a
good choice is found. The Histogram for continuous data corresponds to the probability
density function of a theoretical distribution. If continuous, a line drawn through the
centre point of each class interval frequency should result in a shape like that of a
probability density function (pdf) (see Figure 1.2).

The Histogram for discrete data, where there are a large number of data points,
should have a cell for each value in the range of the data. However, if there are only a few
data points, it may be necessary to combine adjacent cells to eliminate the ragged
appearance of the Histogram. If the Histogram is associated with discrete data, it should
look like a probability mass function (pmf) (see Figure 1.1).
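The class-interval rule quoted above can be sketched as follows. The data values here are hypothetical; the only assumption taken from the text is that the number of intervals is roughly the square root of the sample size.

```python
import math

data = [2.1, 3.4, 0.7, 5.6, 4.4, 2.9, 3.8, 1.2, 4.9, 2.5,
        3.1, 0.4, 5.1, 2.2, 3.6, 4.1]

k = round(math.sqrt(len(data)))          # number of class intervals
lo, hi = min(data), max(data)
width = (hi - lo) / k

freq = [0] * k
for x in data:
    # place x in its interval; the maximum value goes into the last interval
    i = min(int((x - lo) / width), k - 1)
    freq[i] += 1

print(k, freq)
```

Plotting `freq` against the interval midpoints gives the Histogram whose shape is compared with candidate pdf or pmf shapes.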
1.4 Discrete Distributions
Discrete random variables are used to describe random phenomena in which
only integer values can occur. The following are some important distributions.
1.4.1 Bernoulli distribution
An experiment consists of n trials; each trial results in a success or a failure, and each
trial is repeated under the same conditions. Let Xj = 1 if the j th trial
results in a success and Xj = 0 if the j th trial results in a failure,
so the sample space has the values 0 and 1. The trials are independent, each trial has
only two possible outcomes (success or failure), and the probability of success
remains constant from trial to trial. For one trial the pmf
p(x) = θ^x (1 − θ)^(1 − x)   x = 0, 1, 0 < θ < 1
     = 0                     otherwise

is the Bernoulli distribution function.
Under the above assumptions, if X denotes the quality of a produced item in a
production process, then X follows a Bernoulli distribution.
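A quick numerical sketch of the Bernoulli pmf and the production-process interpretation; the defect rate θ = 0.3 and the number of simulated items are illustrative assumptions.

```python
import random

def bernoulli_pmf(x, theta):
    """p(x) = theta^x (1 - theta)^(1 - x) for x in {0, 1}."""
    return theta ** x * (1 - theta) ** (1 - x) if x in (0, 1) else 0.0

theta = 0.3
random.seed(2)
# simulate the quality (1 = defective) of 10000 produced items
trials = [1 if random.random() < theta else 0 for _ in range(10000)]
print(bernoulli_pmf(1, theta), round(sum(trials) / len(trials), 3))
```

The observed proportion of 1's approaches p(1) = θ as the number of items grows.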
1.4.2 Binomial Distribution
Let X be a random variable denoting the number of successes in n Bernoulli
trials. Then X is called a Binomial random variable with parameters n and θ.
Here the sample space is {0, 1, 2, ..., n} and the pmf is

p(x) = [n! / (x!(n − x)!)] θ^x (1 − θ)^(n − x)   x = 0, 1, ..., n, 0 < θ < 1
     = 0                                         otherwise

In the Binomial distribution, the mean nθ is always greater than the variance
nθ(1 − θ). If X1, X2, ..., Xn are independent and identically distributed Bernoulli
random variables, then ∑ni=1 Xi ~ b(n, θ). Problems relating to tossing a coin
or throwing dice lead to the Binomial distribution. In a production process, the
number x of defective units in a random sample of n units follows a Binomial
distribution.
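The Binomial pmf and the mean-greater-than-variance property can be checked directly; n = 10 and θ = 0.4 are example values, not taken from the text.

```python
from math import comb

def binomial_pmf(x, n, theta):
    """p(x) = C(n, x) theta^x (1 - theta)^(n - x)."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

n, theta = 10, 0.4
mean = n * theta
variance = n * theta * (1 - theta)
total = sum(binomial_pmf(x, n, theta) for x in range(n + 1))
print(mean, round(variance, 3), round(total, 6))
```

Since 0 < 1 − θ < 1, the variance nθ(1 − θ) is always a fraction of the mean nθ, which the printed values confirm.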
1.4.3 Geometric Distribution
A random variable X related to a sequence of Bernoulli trials in which (x + 1)
trials are needed to achieve the first success has pmf

p(x) = θ(1 − θ)^x   x = 0, 1, 2, ..., 0 < θ < 1
     = 0            otherwise

It is the probability that the event {X = x} occurs, i.e., that there are x failures
followed by a success.

A couple decides to have children until they have a male child. If the
probability of having a male child in the family is p, they wish to know how many
children to expect before the first male child is born. Let X denote the number of
female children preceding the first male child; then X is a Geometric random
variable.
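The couple's problem can be answered from the Geometric mean (1 − p)/p and confirmed by simulation; p = 0.5 and the number of simulated families are illustrative assumptions.

```python
import random

def geometric_pmf(x, p):
    """x failures (female children) followed by one success (male child)."""
    return p * (1 - p) ** x

p = 0.5
expected_failures = (1 - p) / p   # expected female children before the first male

random.seed(3)
def one_family():
    girls = 0
    while random.random() >= p:   # a girl is born; keep trying
        girls += 1
    return girls

avg = sum(one_family() for _ in range(20000)) / 20000
print(expected_failures, round(avg, 2))
```

With p = 0.5 the couple expects one female child before the first male child, and the simulated average agrees.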
1.4.4 Negative Binomial Distribution
If X1, X2, ..., Xn are iid Geometric variables, then T = t(X) = ∑ni=1 Xi is
a Negative Binomial variate whose pmf is

p(t) = [(t + n − 1)! / (t!(n − 1)!)] θ^n (1 − θ)^t   t = 0, 1, 2, ...
     = 0                                             otherwise
A random variable X related to a sequence of Bernoulli trials in which x
failures precede the nth success in (x + n) trials has pmf

p(x) = [(x + n − 1)! / ((n − 1)!x!)] θ^n (1 − θ)^x   x = 0, 1, 2, ...
     = 0                                             otherwise

This will happen if the last trial results in a success and among the previous
(n + x − 1) trials there are exactly x failures. Note that if n = 1, then p(x)
is the Geometric distribution function. The Negative Binomial distribution has Mean
< Variance. In a production process, the number of units that are required to
achieve the nth defective in x + n units follows a Negative Binomial distribution.
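The pmf above, its normalization, and the Mean < Variance property can be checked numerically; n = 3 and θ = 0.4 are example values.

```python
from math import comb

def neg_binomial_pmf(t, n, theta):
    """p(t) = C(t + n - 1, t) theta^n (1 - theta)^t."""
    return comb(t + n - 1, t) * theta ** n * (1 - theta) ** t

n, theta = 3, 0.4
mean = n * (1 - theta) / theta            # n(1 - theta) / theta
variance = n * (1 - theta) / theta ** 2   # n(1 - theta) / theta^2
total = sum(neg_binomial_pmf(t, n, theta) for t in range(400))
print(round(mean, 3), round(variance, 3), round(total, 6))
```

Dividing the two moments shows variance = mean/θ > mean whenever 0 < θ < 1, and the n = 1 case collapses to the Geometric pmf.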
1.4.5 Multinomial Distribution

If the sample space of a random experiment is split into more than two
mutually exclusive and exhaustive events, then one can define a random variable
which leads to the Multinomial distribution. Let E1, E2, ..., Ek be k mutually
exclusive and exhaustive events of a random experiment with respective
probabilities θ1, θ2, ..., θk, such that θ1 + θ2 + ... + θk = 1 and
0 < θi < 1, i = 1, 2, ..., k. Then the probability that E1 occurs x1 times, E2
occurs x2 times, ..., Ek occurs xk times in n independent trials is known
as the Multinomial distribution, with pmf given by

pθ1,θ2,...,θk (x1, x2, ..., xk) = [n! / (x1!x2! ... xk!)] θ1^x1 θ2^x2 ... θk^xk   where ∑ki=1 xi = n
                               = 0                                               otherwise

If k = 2, that is, the number of mutually exclusive events is only two, then the
Multinomial distribution becomes a Binomial distribution, as given by

pθ1,θ2 (x1, x2) = [n! / (x1!x2!)] θ1^x1 θ2^x2   where x1 + x2 = n and θ1 + θ2 = 1
                = 0                             otherwise

That is, x2 = n − x1 and θ2 = 1 − θ1, which implies

pθ1 (x1) = [n! / (x1!(n − x1)!)] θ1^x1 (1 − θ1)^(n − x1)   0 < θ1 < 1, x1 = 0, 1, ..., n
         = 0                                               otherwise

Consider two brands A and B. Each individual in the population prefers brand
A to brand B with probability θ1, prefers B to A with probability θ2, and is
indifferent between brands A and B with probability θ3 = 1 − θ1 − θ2. In
a random sample of n individuals, X1 prefer brand A, X2 prefer brand B
and X3 prefer neither A nor B. Then the three random variables follow a
Trinomial distribution, i.e.,

pθ1,θ2,θ3 (x1, x2, x3) = P{X1 = x1, X2 = x2, X3 = x3}
                       = [n! / (x1!x2!x3!)] θ1^x1 θ2^x2 θ3^x3   x1 + x2 + x3 = n
                       = 0                                      otherwise
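The brand-preference example can be sketched with a small Trinomial pmf; the preference probabilities and sample size below are illustrative assumptions.

```python
from math import factorial

def trinomial_pmf(x1, x2, x3, t1, t2, t3):
    """P{X1 = x1, X2 = x2, X3 = x3} = n!/(x1! x2! x3!) t1^x1 t2^x2 t3^x3."""
    n = x1 + x2 + x3
    coeff = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coeff * t1 ** x1 * t2 ** x2 * t3 ** x3

t1, t2, t3 = 0.5, 0.3, 0.2   # prefer A, prefer B, indifferent
n = 4
# the pmf must sum to 1 over all (x1, x2, x3) with x1 + x2 + x3 = n
total = sum(trinomial_pmf(a, b, n - a - b, t1, t2, t3)
            for a in range(n + 1) for b in range(n + 1 - a))
print(round(total, 6))
```

Fixing x3 = 0 and θ3 = 0 recovers the Binomial reduction described above.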
1.4.6 Discrete Uniform Distribution
A random variable X is said to follow the Uniform distribution on N points
(x1, x2, ..., xN) if its pmf is given by

pN(x) = PN{X = xi} = 1/N   i = 1, 2, ..., N and N ∈ I+
      = 0                  otherwise

A random experiment with complete uncertainty but whose outcomes are equally
probable may be described by the Uniform distribution. In a finite population of N
units, selecting any unit xi, i = 1, 2, ..., N from the population by the simple
random sampling technique gives a discrete uniform distribution.
1.4.7 Hypergeometric Distribution
One situation in which Bernoulli trials are encountered is that in which an object
is drawn at random from a collection of objects of two types in a box. In
order to repeat this experiment so that the results are independent and identically
distributed, it is necessary to replace each object drawn and to mix the objects
before the next one is drawn. This process is referred to as sampling with
replacement. If the sampling is done without replacement of the objects drawn, the
resulting trials are still of the Bernoulli type but no longer independent.
For example, four balls are drawn one at a time, at random and without
replacement, from 8 balls in a box, 3 black and 5 red. The probability that the third
ball drawn is black is

P{3rd ball black} = P(RRB) + P(RBB) + P(BRB) + P(BBB)
                  = (5/8)(4/7)(3/6) + (5/8)(3/7)(2/6) + (3/8)(5/7)(2/6) + (3/8)(2/7)(1/6)
                  = 3/8

which is the same as the probability that the first ball drawn is black. It should
not be surprising that this probability for a black ball is the same on the third draw
as on the first draw.
In the general case, n objects are to be drawn at random, one at a time, from
a collection of N objects, M of one kind and N − M of another kind. The
one kind of object will be thought of as a success and coded 1; the other kind
is coded 0. Let X1, X2, ..., Xn denote the sequence of coded outcomes; that
is, Xi is 1 or 0 according to whether the ith draw results in a success or a failure.
The total number of successes in n trials is just the sum of the X's,

Sn = X1 + X2 + ... + Xn

as it was in the case of independent identically distributed Bernoulli trials. That
is, the probability of a 1 on the ith trial is the same at each trial:

P{Xi = 1} = M/N   i = 1, 2, ..., n
One can observe first that the probability of a given ordered sequence of n of the
N objects is

1 / [N(N − 1)(N − 2) ... (N − n + 1)]

The probability that an object of type 1 occurs in the ith position of the sequence is

P{Xi = 1} = [M(N − 1)(N − 2) ... (N − n + 1)] / [N(N − 1) ... (N − n + 2)(N − n + 1)]
          = M/N   i = 1, 2, ..., n

where M is the number of ways of filling the ith position with an object
coded 1 and (N − 1)(N − 2) ... (N − n + 1) is the number of ways of filling
the remaining (n − 1) places in the sequence from the (N − 1) remaining
objects. It does not matter whether the number of successes refers to n objects
drawn one at a time at random or to n objects drawn simultaneously at random.
The probability function of Sn is

P{Sn = k} = C(M, k) C(N − M, n − k) / C(N, n)   k = 0, 1, 2, ..., min(n, M)
          = 0                                   otherwise

where C(a, b) = a! / (b!(a − b)!). The random variable Sn with the above
probability function is said to have a Hypergeometric distribution. The mean of
Sn is easily obtained from the representation of a Hypergeometric variable as a sum
of Bernoulli trials. That is,

E[Sn] = E[X1 + X2 + ... + Xn]
      = E[X1] + E[X2] + ... + E[Xn]
      = 1 · P{X1 = 1} + 0 · P{X1 = 0} + ... + 1 · P{Xn = 1} + 0 · P{Xn = 0}
      = M/N + ... + M/N
      = nM/N

Variance of Sn = n (M/N) [(N − M)/N] [(N − n)/(N − 1)]   if N ∈ I+   (1.1)

If p = M/N is the probability at each trial that the object drawn is of the type of
which there are initially M, and q = 1 − p, then

Variance of Sn = npq (N − n)/(N − 1)   if N ∈ I+   (1.2)
The above formula (1.2) differs from the Binomial variance npq by the extra factor
(N − n)/(N − 1). The variance of Sn is npq (N − n)/(N − 1) in the no replacement
case and npq in the replacement case for fixed p and fixed n, since the factor
(N − n)/(N − 1) → 1 as N becomes infinitely large. Thus the Hypergeometric
distribution is exact whereas the Binomial distribution is an approximate one.

50 students of the M.Sc. Statistics course in a certain college are divided at random
into 5 batches of 10 each for the annual practical examination in Statistics. The
class consists of 20 resident students and 30 non - resident students. X denotes the
number of resident students in the first batch who appear for the practical examination.
The Hypergeometric distribution is apt to describe the random variable X and
has the pmf

P{X = x} = C(20, x) C(30, 10 − x) / C(50, 10)   x = 0, 1, 2, ..., 10
         = 0                                    otherwise
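The practical-examination example can be checked numerically: the pmf should sum to one and the mean should equal nM/N = 10 × 20/50 = 4.

```python
from math import comb

def hypergeometric_pmf(x, N, M, n):
    """P{X = x} = C(M, x) C(N - M, n - x) / C(N, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

N, M, n = 50, 20, 10   # class size, resident students, batch size
total = sum(hypergeometric_pmf(x, N, M, n) for x in range(n + 1))
mean = sum(x * hypergeometric_pmf(x, N, M, n) for x in range(n + 1))
print(round(total, 6), round(mean, 6))
```

So, on average, 4 of the 10 students in the first batch are resident students.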
1.4.8 Poisson Distribution
The Poisson random variable is used to describe rare events, for example the number of
air crashes occurring on a Monday between 3 pm and 5 pm. The pmf of the Poisson random
variable is given as

p(x) = e^(−λ) λ^x / x!   λ > 0, x = 0, 1, 2, ...
     = 0                 otherwise

where λ is a parameter. One of the important properties of the Poisson
distribution is that the mean and variance are the same and are equal to λ. If
X1, X2, ..., Xn are iid Poisson random variables with parameter λ, then the
sum ∑ni=1 Xi follows a Poisson distribution with parameter nλ.

After correcting 50 pages of the proof of a book, the proof readers find
that there are, on the average, 2 errors per 5 pages. One would like to know the
number of pages with 0, 1, 2, 3 errors in 10000 pages of the first print of
the book. X denotes the number of errors per page; then the random variable
X follows the Poisson distribution with parameter λ = 2/5 = 0.4.
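The proof-reading example can be completed numerically: with λ = 0.4, the expected number of pages with x errors among 10000 pages is 10000 · p(x).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """p(x) = e^(-lam) lam^x / x!."""
    return exp(-lam) * lam ** x / factorial(x)

lam, pages = 0.4, 10000
expected_pages = [round(pages * poisson_pmf(x, lam)) for x in range(4)]
print(expected_pages)   # pages with 0, 1, 2, 3 errors
```

About 6703 pages are expected to be error free, with rapidly fewer pages carrying 1, 2 or 3 errors.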
1.4.9 Power series distribution
If a random variable X follows a Power series distribution, then its pmf is

P{X = x} = ax θ^x / f(θ)   x ∈ S; ax ≥ 0, θ > 0
         = 0               otherwise

where f(θ) is a generating function, i.e., f(θ) = ∑x∈S ax θ^x, θ > 0, so that
f(θ) is positive, finite and differentiable, and S is a non - empty countable
subset of the non - negative integers.
Particular cases

(i) Binomial Distribution

Let θ = p/(1 − p), f(θ) = (1 + θ)^n and S = {0, 1, 2, 3, ..., n}, a set of non -
negative integers. Then

f(θ) = ∑x∈S ax θ^x
(1 + θ)^n = ∑nx=0 ax θ^x  ⇒  ax = C(n, x)

Pp{X = x} = C(n, x) [p/(1 − p)]^x / [1 + p/(1 − p)]^n
          = C(n, x) p^x q^(n − x)   x = 0, 1, 2, ..., n, q = 1 − p
          = 0                       otherwise
(ii) Negative Binomial Distribution

Let θ = p/(1 + p), f(θ) = (1 − θ)^(−n) and S = {0, 1, 2, ...}, 0 < θ < 1 and
n ∈ I+. Now

f(θ) = ∑x∈S ax θ^x
(1 − θ)^(−n) = ∑∞x=0 ax θ^x  ⇒  ax = (−1)^x C(−n, x)

Since (−1)^x C(−n, x) = C(n + x − 1, x), we have ax = C(n + x − 1, x) and

P{X = x} = C(n + x − 1, x) [p/(1 + p)]^x [1 − p/(1 + p)]^n
         = C(n + x − 1, x) p^x (1 + p)^(−x) (1 + p)^(−n)
         = C(n + x − 1, x) p^x (1 + p)^(−(n + x))
         = C(−n, x) (−p)^x (1 + p)^(−(n + x))   x = 0, 1, 2, ...
(iii) Poisson distribution
Let f(θ) = e^θ and S = {0, 1, 2, ...}. Now

f(θ) = ∑x∈S ax θ^x
e^θ = ∑∞x=0 ax θ^x  ⇒  ax = 1/x!

P{X = x} = ax θ^x / f(θ)
         = (1/x!) θ^x e^(−θ)
         = e^(−θ) θ^x / x!   x = 0, 1, 2, ...
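The power-series representation of the Binomial case can be verified numerically: with θ = p/(1 − p), f(θ) = (1 + θ)^n and ax = C(n, x), the ratio ax θ^x / f(θ) reproduces the usual Binomial pmf. The values n = 6 and p = 0.3 are example choices.

```python
from math import comb

n, p = 6, 0.3
theta = p / (1 - p)
f = (1 + theta) ** n

for x in range(n + 1):
    power_series = comb(n, x) * theta ** x / f
    direct = comb(n, x) * p ** x * (1 - p) ** (n - x)
    # the two forms must agree term by term (up to rounding)
    assert abs(power_series - direct) < 1e-12

print("power series form matches the Binomial pmf")
```

The same kind of check works for the Negative Binomial and Poisson cases with their respective f(θ) and ax.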
1.5 Continuous Distributions
A continuous random variable can be used to describe random phenomena in which
the variable X of interest can take any value x in some interval, with P{X = x} = 0
for every x in that interval.
1.5.1 Uniform Distribution
A random variable X is uniformly distributed on an interval [a, b] if its pdf is
given by

pa,b(x) = 1/(b − a)   a ≤ x ≤ b
        = 0           otherwise

Note that P{x1 < X < x2} = F(x2) − F(x1) = (x2 − x1)/(b − a) is proportional to the
length of the interval, for all x1 and x2 satisfying a ≤ x1 ≤ x2 ≤ b. If a random
phenomenon has complete unpredictability, then it can be described by the Uniform
distribution.
1.5.2 Normal Distribution
A random variable X with mean μ (−∞ < μ < ∞) and variance σ² (> 0) has a Normal distribution if it has the pdf

    p_{μ,σ²}(x) = [1/(σ√(2π))] e^{−(1/(2σ²))[x − μ]²},  −∞ < x < ∞
                = 0 otherwise

A quantity formed as the sum of many component times of a random experiment can be thought of as Normally distributed. For example, the time to assemble a product, which is the sum of the times required for each assembly operation, may be described by a Normal random variable.
1.5.3 Exponential Distribution
A random variable X is said to be Exponentially distributed with parameter θ > 0 if its pdf is given by

    p_θ(x) = θe^{−θx},  x > 0
           = 0 otherwise
The value of the intercept on the vertical axis is always equal to the value of θ. Note that the pdfs for different values of θ eventually intersect, since a larger θ gives a higher intercept but a faster decay; the Exponential distribution has its mode at the origin. The mean and the standard deviation are equal in the Exponential distribution (both equal 1/θ). In a random phenomenon, the times between independent events which have the memoryless property may appropriately follow an Exponential random variable. For example, the times between the arrivals of a large number of customers who act independently of each other may be adequately fitted by an Exponential distribution.
1.5.4 Gamma Distribution
A function used to define the Gamma distribution is the Gamma function Γ(α). A random variable X follows a Gamma distribution if

    p_{α,θ}(x) = [θ^α/Γ(α)] e^{−θx} x^{α−1},  x > 0, α > 0, θ > 0
               = 0 otherwise

where α is called the shape parameter and θ is called the scale parameter. Σᵢ₌₁ⁿ Xᵢ ~ G(n, 1/θ) if each Xᵢ ~ exp(1/θ), i.e., the sum of n iid Exponential variables with mean 1/θ is Gamma distributed. The cumulative distribution function F(x) = P{X ≤ x} of the random variable X is given by

    F(x) = [1/Γ(α)] ∫₀ˣ θ^α t^{α−1} e^{−θt} dt,  x > 0
         = 0 otherwise
1.5.5 Erlang Distribution
The pdf of the Gamma distribution becomes the Erlang distribution of order k when α = k, an integer. When α = k, a positive integer, and the rate parameter is taken as kθ (so that the mean is 1/θ), the cumulative distribution function F(x) is given by

    F(x) = 1 − Σᵢ₌₀^{k−1} e^{−kθx} (kθx)ⁱ/i!,  x > 0
         = 0 otherwise

where the sum consists of Poisson terms with mean kθx.
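A small numerical sketch (assuming, as above, an Erlang of order k with rate kθ; the function names are illustrative) can confirm that the Poisson-sum form of F(x) agrees with direct numerical integration of the corresponding Gamma pdf:

```python
import math

def erlang_cdf(x, k, theta):
    """F(x) = 1 - sum_{i=0}^{k-1} e^{-k*theta*x} (k*theta*x)^i / i!"""
    lam = k * theta
    return 1 - sum(math.exp(-lam * x) * (lam * x) ** i / math.factorial(i)
                   for i in range(k))

def erlang_pdf(t, k, theta):
    """Gamma pdf with shape k and rate k*theta."""
    lam = k * theta
    return lam ** k * t ** (k - 1) * math.exp(-lam * t) / math.factorial(k - 1)

# midpoint-rule integration of the pdf should reproduce the cdf
k, theta, x = 3, 0.5, 2.0
steps = 20000
h = x / steps
numeric = sum(erlang_pdf((i + 0.5) * h, k, theta) * h for i in range(steps))
assert abs(erlang_cdf(x, k, theta) - numeric) < 1e-6
```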
1.5.6 Weibull Distribution
A random variable X has a Weibull distribution if it has pdf

    p_{ν,α,β}(x) = (β/α) [(x − ν)/α]^{β−1} exp[−((x − ν)/α)^β],  x ≥ ν
                 = 0 otherwise

The three parameters of the Weibull distribution are ν (−∞ < ν < ∞), which is the location parameter, α (> 0), which is the scale parameter, and β (> 0), which is the shape parameter. When ν = 0, the Weibull pdf becomes

    p_{α,β}(x) = (β/α) (x/α)^{β−1} exp[−(x/α)^β],  x ≥ 0
               = 0 otherwise

When ν = 0 and β = 1, the Weibull distribution is reduced to the Exponential distribution with pdf

    p_α(x) = (1/α) e^{−x/α},  x ≥ 0
           = 0 otherwise
1.5.7 Triangular Distribution
A random variable X has a Triangular distribution if its pdf is given by

    p_{a,b,c}(x) = 2(x − a)/[(b − a)(c − a)],  a ≤ x ≤ b
                 = 2(c − x)/[(c − b)(c − a)],  b < x ≤ c
                 = 0 otherwise

where a ≤ b ≤ c. The mode occurs at x = b. Since a ≤ b ≤ c, it follows that (2a + c)/3 ≤ E[X] ≤ (a + 2c)/3. The mode is used more often than the mean to characterize the Triangular distribution.
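The mean of the Triangular distribution is E[X] = (a + b + c)/3 (a standard fact, not derived above), so the stated bounds follow directly from a ≤ b ≤ c. A minimal check with illustrative values:

```python
a, b, c = 2.0, 5.0, 11.0            # any values with a <= b <= c
mean = (a + b + c) / 3              # E[X] for the Triangular distribution
# the bounds stated in the text: (2a + c)/3 <= E[X] <= (a + 2c)/3
assert (2 * a + c) / 3 <= mean <= (a + 2 * c) / 3
```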
1.5.8 Empirical Distribution
An empirical distribution may be either continuous or discrete in nature. It is used to establish a statistical model for the available data whenever there is a discrepancy from the aimed distribution, or whenever one is unable to arrive at a known distribution.

(a) Empirical Continuous Distributions
The times taken to install 100 machines are collected. The data are given in Table 1.1, which gives the number of machines together with the time taken. For example, 30 machines were installed between 0 and 1 hour, 25 between 1 and 2 hours, 20 between 2 and 3 hours and 25 between 3 and 4 hours. X denotes the time taken to install a machine.
Table 1.1 Distribution of the time taken to install the machines

    Duration (hours)   Frequency   p(x)   F(x) = P{X ≤ x}
    0 ≤ x ≤ 1             30       .30         .30
    1 < x ≤ 2             25       .25         .55
    2 < x ≤ 3             20       .20         .75
    3 < x ≤ 4             25       .25        1.00
(b) Empirical Discrete Distributions
At the end of the day, the numbers of shipments on the loading docks of an export company are observed as 0, 1, 2, 3, 4 and 5 with frequencies 23, 15, 12, 10, 25 and 15 respectively. Let X be the number of shipments on the loading docks of the company at the end of the day. Then X is a discrete random variable which takes the values 0, 1, 2, 3, 4 and 5, with the distribution given in Table 1.2. Figure 1.1 is the Histogram of the number of shipments on the loading docks of the company.
Table 1.2 Distribution of number of shipments

    Number of shipments x   Frequency   P{X = x}   F(x) = P{X ≤ x}
            0                  23          .23           .23
            1                  15          .15           .38
            2                  12          .12           .50
            3                  10          .10           .60
            4                  25          .25           .85
            5                  15          .15          1.00
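The cumulative column of Table 1.2 is just a running sum of the relative frequencies; a minimal sketch:

```python
from itertools import accumulate

# shipment counts from Table 1.2
freqs = {0: 23, 1: 15, 2: 12, 3: 10, 4: 25, 5: 15}
n = sum(freqs.values())                       # 100 observations
pmf = {x: f / n for x, f in freqs.items()}
cdf = dict(zip(freqs, accumulate(pmf.values())))
assert abs(cdf[2] - 0.50) < 1e-9 and abs(cdf[5] - 1.00) < 1e-9
```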
Figure 1.1 Histogram of number of shipments (frequency versus number of shipments, 0 to 5)
1.6 Diagnosis of distributions
If the variable of interest occurs at a roughly constant rate over time (the memoryless property), an Exponential distribution is apt to fit the data. If the value of the variable of an item can be either positive or negative about a central value, then a Normal distribution is appropriate for the data. When the variable of interest seems to follow the Normal probability distribution but the random variable is restricted to be greater than or less than a certain value, the truncated Normal distribution will be adequate to fit the data. The Gamma and Weibull distributions are also used to describe such data, and the Exponential distribution is a special case of both the Gamma and Weibull distributions. The differences between the Exponential, Gamma and Weibull distributions involve the locations of the modes of the pdfs and the shapes of their tails for large and small times. The Exponential distribution has its mode at the origin, but the Gamma and Weibull distributions have their modes at some point (≥ 0) which is a function of the parameter values selected. The tail of the Gamma distribution is long, like that of an Exponential distribution, while the tail of the Weibull distribution may decline more rapidly or less rapidly than that of an Exponential distribution. In practice, if there are more large values of the variable than an Exponential distribution can account for, a Weibull distribution may provide a better fit to the data.
Illustration 1.6.1
Sixteen equipments were produced and placed on test; Table 1.3 gives the lengths of the time intervals between failures, in hours.

Table 1.3 Equipment time between failures

    Equipment number        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    Time between failures  19 12 16  1 15  5 10  1 46  7 33 25  4  9  1 10
For the sake of simplicity in processing the data, one can set up the ordered set as given below:

Table 1.4 Ordered set of equipment time between failures

    Order number            1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    Time between failures   1  1  1  4  5  7  9 10 10 12 15 16 19 25 33 46
On this basis, one may construct a Histogram to judge the pattern of the data in Table 1.4. An approximate value of the class interval can be determined from the formula

    Δt = (maximum value − minimum value) / (1 + 3.3 log₁₀ N)

where the maximum and minimum are the values in the ordered set and N is the total number of items of the order statistics. In this case the maximum value is 46, the minimum value is 1 and N is 16. Thus Δt = 45/(1 + 3.3 log₁₀ 16) = 9.05 ≈ 10 = width of the class interval.

Table 1.5 Frequency distribution

    Time interval         0 - 10   10 - 20   20 - 30   30 - 40   40 - 50
    Number of equipment        9         4         1         1         1
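The interval-width formula above is of Sturges' type; a sketch applying it to the Table 1.3 data:

```python
import math

def class_width(values):
    """(max - min) / (1 + 3.3 * log10(N)): approximate class-interval width."""
    return (max(values) - min(values)) / (1 + 3.3 * math.log10(len(values)))

times = [19, 12, 16, 1, 15, 5, 10, 1, 46, 7, 33, 25, 4, 9, 1, 10]
w = class_width(times)       # 45 / (1 + 3.3 log10 16), about 9.05
assert abs(w - 9.05) < 0.01  # rounded up to 10 in the text
```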
A Histogram is drawn based on the frequency distribution in Table 1.5 and is given in Figure 1.2.

Figure 1.2 Histogram of time to failures (number of equipment versus time interval; frequencies 9, 4, 1, 1, 1)

The Histogram reveals that the distribution could be Negative Exponential, or the right portion of the Normal distribution. Assume the time to failure follows an Exponential distribution of the form

    p_θ(x) = θe^{−θx},  θ > 0, x > 0
           = 0 otherwise
How far the assumption is valid has to be verified. The validity of the assumption is tested by the χ² test of goodness of fit.

Table 1.6 Distribution of time to failures

    Interval    p_i     Expected frequency E   Observed frequency O
    0 - 10     .5262          8.41                     9
    10 - 20    .2493          3.98                     4
    20 - 30    .1181          1.886                    1
    30 - 40    .0559           .894                    1
    40 - 50    .0265           .454                    1
where p_i = ∫_{xᵢ}^{xᵢ₊₁} θe^{−θx} dx = e^{−θxᵢ} − e^{−θxᵢ₊₁}, xᵢ = 0, 10, 20, ..., 50. If the cell frequencies are less than 5, cells are pooled so that each is 5 or more. One then gets two classes only, i.e., the expected frequencies are equal to 8 each, and the corresponding observed frequencies are 9 and 7 respectively. With only two classes, the χ² test of goodness of fit fails to test the validity of the assumption that the sample data come from an Exponential distribution with parameter θ = 1/13.38 = .0747 = failure rate per unit hour, where the mean life time of the equipments = 214/16 = 13.38 hours. To test the validity of the assumption that the time to failure follows an Exponential distribution, consider the likelihood function of the cell frequencies o₁ = 9 and o₂ = 7:

    L = [n!/(o₁! o₂!)] (E₁/n)^{o₁} (E₂/n)^{o₂},  o₁ + o₂ = n
      = 0 otherwise

Under H₀ the likelihood function follows a Binomial probability law b(16, p), where p = E₁/n. To test the hypothesis H₀: the fit is the best one vs H₁: the fit is not the best one is equivalent to testing H₀: p ≤ .5 vs H₁: p > .5. The UMP level α = .05 test is given by

    φ(x) = 1    if x > 11
         = .17  if x = 11
         = 0 otherwise

The observed value is 9, which is less than 11. There is no evidence to reject the hypothesis H₀: the data come from an Exponential distribution at the 5% level of significance. Thus the time to failure of the equipments follows an Exponential distribution. One may conclude that on average the equipment would operate for 13.38 hours without failure.
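The cell probabilities p_i of Table 1.6 follow from the fitted Exponential with θ = 1/13.38; a sketch recomputing them:

```python
import math

theta = 16 / 214                  # 1/13.38, failure rate per hour
edges = [0, 10, 20, 30, 40, 50]
p = [math.exp(-theta * a) - math.exp(-theta * b)
     for a, b in zip(edges, edges[1:])]
expected = [16 * pi for pi in p]  # expected cell frequencies
assert abs(p[0] - 0.5262) < 0.001
assert abs(expected[1] - 3.98) < 0.05
```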
1.7 Quantile - Quantile plot
The construction of Histograms and the recognition of a distributional shape are necessary ingredients for selecting a family of distributions to represent a sample of data. A Histogram, however, is not useful for evaluating the fit of the chosen distribution. When there is a small number of data points (≤ 30), a Histogram can be rather ragged. Further, perception of the fit depends on the widths of the Histogram intervals. Even if the intervals are well chosen, grouping the data into cells makes it difficult to compare a Histogram to a continuous pdf. A quantile - quantile (q - q) plot is a useful tool for evaluating distribution fit that does not suffer from these problems.

If X is a random variable with cumulative distribution function F(x), then the q - quantile of X is that value y such that F(y) = P{X ≤ y} = q, for 0 < q < 1. When F(x) has an inverse, y = F⁻¹(q). Let x₁, x₂, ..., xₙ be sample observations of X. Order the observations from the smallest to the largest and denote these as y_j, j = 1 to n, where y₁ ≤ y₂ ≤ ... ≤ yₙ. One can denote by j the rank or order number; therefore j = 1 for the smallest and j = n for the largest. The q - q plot is based on the fact that y_j is an estimate of the (j − 1/2)/n quantile of X, i.e., y_j is approximately F⁻¹((j − 1/2)/n).
A distribution with cumulative distribution function F(x) is a possible representation of the random variable X. If F(x) is a member of an appropriate family of distributions, then a plot of y_j versus F⁻¹((j − 1/2)/n) will be approximately a straight line. If F(x) is from an appropriate family of distributions and also has appropriate parameter values, then the line will have slope 1. On the other hand, if the assumed distribution is inappropriate, the points will deviate from a straight line in a systematic manner. The decision whether to accept or reject some hypothesized distribution is subjective.

In the construction of a q - q plot, the following should be borne in mind:
(i) The observed values will never fall exactly on a straight line.
(ii) The ordered values are not independent, since they have been ranked.
(iii) The variances of the extremes are much higher than the variances in the middle of the plot. Greater discrepancies can be accepted at the extremes; the linearity of the points in the middle of the plot is more important than the linearity at the extremes.
Illustration 1.7.1
A sample of 20 repair times of an electronic watch was considered. The repair time X is a random variable; the values are in seconds. The values are arranged in increasing order of magnitude in Table 1.7.

Table 1.7 Repair times of an electronic watch

    j  Value    j  Value    j  Value    j  Value
    1  88.54    6  88.82   11  88.98   16  89.26
    2  88.56    7  88.85   12  89.02   17  89.30
    3  88.60    8  88.90   13  89.08   18  89.35
    4  88.64    9  88.95   14  89.18   19  89.41
    5  88.75   10  88.97   15  89.25   20  89.45
Table 1.8 Normal quantiles

     j   (j − 1/2)/20   z_j = Φ⁻¹((j − 1/2)/20)   x_j = 88.993 + .08 z_j
     1      .025              −1.96                      88.84
     2      .075              −1.41                      88.88
     3      .125              −1.13                      88.90
     4      .175              −0.93                      88.92
     5      .225              −0.75                      88.94
     6      .275              −0.60                      88.95
     7      .325              −0.45                      88.96
     8      .375              −0.31                      88.97
     9      .425              −0.18                      88.98
    10      .475              −0.06                      88.99
    11      .525                .06                      89.00
    12      .575                .18                      89.01
    13      .625                .31                      89.02
    14      .675                .45                      89.03
    15      .725                .60                      89.04
    16      .775                .75                      89.05
    17      .825                .93                      89.07
    18      .875               1.13                      89.08
    19      .925               1.41                      89.11
    20      .975               1.96                      89.15

The ordered observations are then plotted versus F⁻¹((j − 1/2)/n) for j = 1, 2, ..., 20, where F(·) is the cumulative distribution function of the Normal random variable X with mean 88.993 seconds and standard deviation .08 seconds, to obtain the q - q plot. The plotted values are shown in Figure 1.3. The general perception of a straight line is quite clear in the q - q plot, supporting the hypothesis of a Normal distribution.
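The x_j column of Table 1.8 can be reproduced with the standard library's NormalDist (a sketch; the mean and standard deviation are those fitted in the text):

```python
from statistics import NormalDist

fitted = NormalDist(mu=88.993, sigma=0.08)
n = 20
# x_j = F^{-1}((j - 1/2)/n) for the fitted normal distribution
x = [fitted.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
assert abs(x[0] - 88.84) < 0.01      # j = 1 row of Table 1.8
assert abs(x[19] - 89.15) < 0.01     # j = 20 row
```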
Figure 1.3 q - q plot of the repair times (Normal quantile y_j versus time x_j)

Note: The diagnosis of statistical distributions for real life problems is not exact; at best the fitted distributions represent reasonable approximations.
Problems
1.1 The mean and variance of the number of defective items drawn randomly one by one with replacement from a lot are found to be 10 and 6 respectively. The distribution of the number of defective items is:
(a) Poisson with mean 10
(b) Binomial with n = 25 and p = 0.4
(c) Normal with mean 10 and variance 6
(d) None of the above
1.2 If X is a Poisson random variate with mean 3, then P{|X − 3| < 1} will be:
(a) (1/2)e⁻³  (b) 3e⁻³  (c) 4.5e⁻³  (d) 27e⁻³
1.3 Let U₍₁₎, U₍₂₎, ..., U₍ₙ₎ be the order statistics of a random sample U₁, U₂, ..., Uₙ of size n from the Uniform (0, 1) distribution. Then the conditional distribution of U₁ given U₍ₙ₎ = u₍ₙ₎ is given by:
(a) Uniform on (0, u₍ₙ₎)
(b) P{U₁ = u₍ₙ₎} = 1/n, and with probability (n − 1)/n, U₁ is uniformly distributed over (0, u₍ₙ₎)
(c) Beta(1/n, (n − 1)/n)
(d) Uniform (0, 1)
1.4 A biased coin is tossed 4 times or until a head turns up, whichever occurs earlier. The distribution of the number of tails turning up is:
(a) Binomial (b) Geometric (c) Negative Binomial (d) Hypergeometric
1.5 If X and Y are independent Exponential random variables with the same mean θ, then the distribution of min(X, Y) is:
(a) Exponential with mean θ/2
(b) Exponential with mean θ
(c) not Exponential
(d) Exponential with mean 2θ
1.6 The χ² goodness of fit test is based on the assumption that the character under study is:
(a) Normal (b) Non - Normal (c) any distribution (d) no assumption is required
1.7 The exact distribution of the χ² goodness of fit statistic, when each experimental unit is classified into one of k categories in a random sample of size n, depends on the:
(a) Hypergeometric distribution
(b) Normal distribution
(c) Multinomial distribution
(d) Binomial distribution
1.8 If X₁ ~ b(n₁, θ₁), X₂ ~ b(n₂, θ₂) and X₁, X₂ are independent, then the sum of the variates X₁ + X₂ is distributed as:
(a) Hypergeometric distribution
(b) Binomial distribution
(c) Poisson distribution
(d) None of the above
1.9 If X₁ ~ b(n₁, θ), X₂ ~ b(n₂, θ) and X₁, X₂ are independent, then the sum of the variates X₁ + X₂ is distributed as:
(a) Hypergeometric distribution
(b) Binomial distribution
(c) Poisson distribution
(d) None of the above
1.10 If X₁ ~ P(θ₁), X₂ ~ P(θ₂) and X₁, X₂ are independent, then the sum of the variates X₁ + X₂ is distributed as:
(a) Hypergeometric distribution
(b) Binomial distribution
(c) Poisson distribution
(d) None of the above
1.11 The skewness of a Binomial distribution will be zero if:
(a) p < .5  (b) p > .5  (c) p = .5  (d) p ≠ .5
1.12 If the sample size n = 2, the Student's t - distribution reduces to:
(a) Normal distribution
(b) F - distribution
(c) χ² - distribution
(d) Cauchy distribution
1.13 The reciprocal property of the F_{n₁,n₂} distribution can be expressed as:
(a) F_{n₂,n₁}(1 − α) = 1/F_{n₁,n₂}(α)
(b) P{F_{n₁,n₂} ≥ c} = P{F_{n₂,n₁} ≤ 1/c}
(c) F_{n₂,n₁}(1 − α/2) = 1/F_{n₁,n₂}(α/2)
(d) All the above
1.14 The distribution for which the moment generating function is not useful in finding the moments is the:
(a) Binomial distribution
(b) Negative Binomial distribution
(c) Hypergeometric distribution
(d) Geometric distribution
1.15 The probability of selecting a unit from a population of N units by the simple random sampling technique follows a:
(b) Binomial distribution
(c) Geometric distribution
(d) discrete Uniform distribution
1.16 A production process is a sequence of Bernoulli trials; the number x of defective units in a sample of n units follows a:
(a) Bernoulli distribution
(b) Binomial distribution
(c) Multinomial distribution
(d) Hypergeometric distribution
1.17 A random variable X is related to a sequence of Bernoulli trials in which the number of trials needed to achieve the first success is (x + 1); then the distribution of X
is :
(a) Bernoulli distribution
(b) Binomial distribution
(c) Multinomial distribution
(d) Geometric distribution
1.18 If X₁, X₂, ..., Xₙ are iid Geometric variables, then Σᵢ₌₁ⁿ Xᵢ follows a:
(a) Negative Binomial distribution
(b) Binomial distribution
(c) Multinomial distribution
(d) Geometric distribution
1.19 A random variable X is related to a sequence of Bernoulli trials in which x failures precede the nth success in (x + n) trials; this is a:
(a) Binomial distribution
(b) Multinomial distribution
(c) Negative Binomial distribution
(d) Geometric distribution
1.20 If a random experiment has only two mutually exclusive outcomes of a Bernoulli
trial, then the random variable leads to:
(a) Binomial distribution
(b) Multinomial distribution
(c) Negative Binomial distribution
(d) Geometric distribution
1.21 A box contains N balls, M of which are white and N − M are red. If X denotes the number of white balls in a sample of n balls drawn with replacement, then X is a:
(a) Binomial variate
(b) Bernoulli variate
(c) Negative Binomial variate
(d) Hypergeometric variate
1.22 The number of independent events that occur in a fixed amount of time may
follow:
(a) Exponential distribution
(b) Poisson distribution
(c) Geometric distribution
(d) Gamma distribution
1.23 A power series distribution

    P{X = x} = aₓθˣ/f(θ),  x ∈ S, aₓ ≥ 0
             = 0 otherwise

where f(θ) = (1 + θ)ⁿ, θ = p/(1 − p) and S = {0, 1, 2, ..., n}. Then the random variable X has a
(a) Geometric distribution
(b) Bernoulli distribution
(c) Binomial distribution
(d) Negative Binomial distribution
1.24 The given probability function p(x) = 2/3ˣ⁺¹ for x = 0, 1, 2, 3, ..., represents a:
(a) Negative Binomial distribution
(b) Binomial distribution
(c) Bernoulli distribution
(d) Geometric distribution
1.25 Dinesh Kumar receives 2, 2, 4 and 4 telephone calls on 4 randomly selected days.
Assuming that the telephone calls follow Poisson distribution, the estimate of the
number of telephone calls in 8 days is:
(a) 12 (b) 3 (c) 24 (d) none of the above
1.26 The exact distribution of the χ² goodness of fit statistic, when each experimental unit is classified into one of two categories in a random sample of size n, depends on the:
(a) Hypergeometric distribution
(b) Normal distribution
(c) Multinomial distribution
(d) Binomial distribution
1.27 The pmf of a random variable X is

    p(x) = Σ_{k=0}^∞ (−1)ᵏ C(k + x, k) θˣ⁺ᵏ,  x = 0, 1, ...
         = 0 otherwise

It is known as the:
(a) Binomial (b) Negative Binomial (c) Poisson (d) Geometric
2. CRITERIA OF POINT ESTIMATION
2.1 Introduction
In real life applications, determining an appropriate distribution from a random sample is a major task. Faulty assumptions about distributions will lead to misleading recommendations. Once a family of distributions indexed by a parameter has been selected, the next step is to estimate the parameters of the distribution. The criteria of point estimators for many standard distributions are described in this chapter.

The set of all admissible values of the parameters of a distribution is called the parameter space Ω. Any member of the parameter space is called a parameter. For example, suppose a random variable X is assumed to follow a Normal distribution with mean μ and variance σ². The parameter space is Ω = {(μ, σ²) | −∞ < μ < ∞, 0 < σ² < ∞}. Suppose a random sample X₁, X₂, X₃, ..., Xₙ is taken on X. One seeks a statistic T = t(X) from the sample X₁, X₂, ..., Xₙ which gives the best value for the parameter. The particular value of the statistic, t(x) = x̄ based on the observed values x₁, x₂, ..., xₙ, is called an estimate. If the statistic T = X̄ is used to estimate the unknown parameter μ, then the sample mean X̄ is called an estimator of μ. Thus an estimator is a rule or a procedure to estimate the value of μ, and the numerical value x̄ is called an estimate of μ.
2.2 Point Estimator
Let X₁, X₂, ..., Xₙ be n independent identically distributed (iid) random variables drawn from a population with probability density function (pdf) p_θ(x), θ ∈ Ω. The statistic T = t(X) is said to be a point estimator of θ if the function t(X) maps each sample point to a single point of the parameter space Ω.

2.3 Problems of Point Estimation

The problems involved in point estimation are:
(i) to select or choose a statistic T = t(X),
(ii) to find the distribution function of the statistic T = t(X), and
(iii) to verify that the selected statistic satisfies the criteria of point estimation.
2.4 Criteria of the Point Estimation
The criteria of the point estimation are
(i) Consistency
(ii) Unbiasedness
(iii) Sufficiency and
(iv) Efficiency
2.5 Consistency
Consistency is a convergence property of an estimator. It is an asymptotic or large
sample size property. Let X₁, X₂, ..., Xₙ be an iid random sample drawn from a population with common distribution P_θ, θ ∈ Ω. An estimator T = t(X) is consistent for θ if for every ε > 0 and for each fixed θ ∈ Ω, P_θ{|T − θ| > ε} → 0 as n → ∞, i.e., T →P θ as n → ∞ for fixed θ ∈ Ω.

Example 2.1 Let X₁, X₂, ..., Xₙ be a random sample drawn from a Normal population with mean μ and known variance σ². The statistic T = X̄ is chosen as an estimator of the parameter μ, and X̄ ~ N(μ, σ²/n). To test the consistency of the estimator, consider, for every ε > 0 and fixed μ,

    P{|X̄ − μ| > ε} = 1 − P{|X̄ − μ| < ε}
                   = 1 − P{μ − ε < X̄ < μ + ε}
                   = 1 − P{−ε√n/σ < Z < ε√n/σ},  where Z = √n(X̄ − μ)/σ ~ N(0, 1)
                   → 0 as n → ∞

since P{−ε√n/σ < Z < ε√n/σ} → 1 as n → ∞. Thus X̄ is a consistent estimator of μ.

Example 2.2 Let X₁, X₂, ..., Xₙ be a random sample drawn from a Cauchy population with location parameter θ. Consider, for every ε > 0 and fixed θ,

    P{|X̄ − θ| > ε} = 1 − P{θ − ε < X̄ < θ + ε}
                   = 1 − ∫_{θ−ε}^{θ+ε} (1/π) · 1/[1 + (x − θ)²] dx
                       since X̄ follows the Cauchy distribution with location θ and scale 1
                   = 1 − (1/π) ∫_{−ε}^{ε} 1/(1 + z²) dz,  where z = x − θ
                   = 1 − (1/π) [tan⁻¹(z)]_{−ε}^{ε}
                   = 1 − (2/π) tan⁻¹(ε),  since tan⁻¹(−ε) = −tan⁻¹(ε)

Thus P{|X̄ − θ| > ε} does not tend to 0 as n → ∞, i.e., X̄ does not converge in probability to θ as n → ∞. For a Cauchy population the sample mean X̄ is not a consistent estimator of the parameter θ.
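A simulation sketch contrasting the two examples (seed, μ and sample size are illustrative): for Normal data the sample mean settles near μ, while for Cauchy data it remains as dispersed as a single observation.

```python
import math
import random

random.seed(1)

def sample_mean(draw, n):
    return sum(draw() for _ in range(n)) / n

mu, n = 5.0, 100000
# Normal(mu, 1): X-bar concentrates around mu (Example 2.1)
xbar_normal = sample_mean(lambda: random.gauss(mu, 1.0), n)
assert abs(xbar_normal - mu) < 0.05

# Cauchy with location mu: X-bar has the same Cauchy law for every n,
# so no analogous assertion would be safe (Example 2.2)
xbar_cauchy = sample_mean(
    lambda: mu + math.tan(math.pi * (random.random() - 0.5)), n)
```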
2.6 Sufficient condition for consistency
Theorem 2.1 If {Tₙ}ₙ₌₁^∞ is a sequence of estimators such that E[Tₙ] → θ and V[Tₙ] → 0 as n → ∞, then the statistic Tₙ is a consistent estimator of the parameter θ.

Proof Consider

    E[Tₙ − θ]² = E[(Tₙ − E[Tₙ] + E[Tₙ] − θ)²]
               = E[(Tₙ − E[Tₙ])²] + {E[Tₙ] − θ}²  since E[Tₙ − E[Tₙ]] = 0
               = V[Tₙ] + {E[Tₙ] − θ}²

By Chebychev's inequality,

    P{|Tₙ − θ| > ε} ≤ (1/ε²) E[Tₙ − θ]²
                    ≤ (1/ε²) [V[Tₙ] + {E[Tₙ] − θ}²]
                    → 0 as n → ∞

since V[Tₙ] → 0 and E[Tₙ] → θ as n → ∞. Hence Tₙ is a consistent estimator of θ.

Remark 2.2 The conditions are only sufficient, not necessary. If {Xₙ}ₙ₌₁^∞ is a sequence of iid random variables from a population with finite mean μ = E[X], then X̄ converges to μ in probability for each fixed μ; this is Khintchine's Weak Law of Large Numbers. That is, the sample mean X̄ is a consistent estimator of the population mean μ whenever that mean finitely exists, without requiring the condition V[X̄] → 0 as n → ∞ for every fixed μ ∈ Ω. Thus consistency already follows from the existence of the expectation of the statistic; the assumption of finite variance of the statistic is not needed.
For illustration, the Cauchy pdf is

    p(x) = (1/π) · 1/(1 + x²),  −∞ < x < ∞
         = 0 otherwise

The mean E[X] does not exist finitely, i.e.,

    E[X] = (1/π) ∫_{−∞}^{∞} x/(1 + x²) dx

is divergent. But the Cauchy principal value is

    (1/π) lim_{t→∞} ∫_{−t}^{t} x/(1 + x²) dx = (1/2π) lim_{t→∞} ∫_{−t}^{t} 2x/(1 + x²) dx
                                             = (1/2π) lim_{t→∞} [log(1 + x²)]_{−t}^{t}
                                             = (1/2π) lim_{t→∞} [log(1 + t²) − log(1 + t²)]
                                             = 0

The Cauchy principal value 0 is taken as the mean of the Cauchy distribution; the mean of the Cauchy distribution does not finitely exist. Hence for the Cauchy population, the sample mean X̄ is not a consistent estimator of the parameter θ.

Example 2.3 If X₁, X₂, ..., Xₙ is a random sample drawn from a Normal population N(0, σ²), show that (1/3n) Σ_{k=1}ⁿ Xₖ⁴ is a consistent estimator of σ⁴.

Let T = (1/3n) Σ_{k=1}ⁿ Xₖ⁴. Then

    E[T] = (1/3n) Σ_{k=1}ⁿ E[Xₖ⁴]
         = (1/3n) Σ_{k=1}ⁿ E[(Xₖ − 0)⁴]  since E[Xₖ] = 0, k = 1, 2, ...
         = (1/3n) n μ₄ = (1/3n) 3nσ⁴ = σ⁴

since μ₄ = 3σ⁴, where μ₂ₙ = 1·3·5 ··· (2n − 1) σ²ⁿ, n = 1, 2, ...

    V[T] = [1/(3n)²] Σ_{k=1}ⁿ V[Xₖ⁴]
         = [1/(3n)²] Σ_{k=1}ⁿ (E[Xₖ⁸] − {E[Xₖ⁴]}²)
         = [1/(3n)²] n [μ₈ − μ₄²]
         = (1/9n) [105σ⁸ − (3σ⁴)²]  since μ₈ = 1·3·5·7 σ⁸ = 105σ⁸
         = (1/9n) 96σ⁸ → 0 as n → ∞
Thus T is a consistent estimator of σ⁴.

Example 2.4 Let X₁, X₂, ..., Xₙ be a random sample drawn from a population with rectangular distribution on (0, θ), θ > 0. Show that (Πᵢ₌₁ⁿ Xᵢ)^{1/n} is a consistent estimator of θe⁻¹.

Let GM = (Πᵢ₌₁ⁿ Xᵢ)^{1/n}, Xᵢ > 0, i = 1, 2, ..., n. Then

    log GM = (1/n) Σᵢ₌₁ⁿ log Xᵢ

    E[log X] = (1/θ) ∫₀^θ log x dx
             = (1/θ) [x log x]₀^θ − (1/θ) ∫₀^θ dx
             = log θ − (1/θ) lim_{x→0} x log x − 1
             = log θ − 1

since lim_{x→0} x log x = lim_{x→0} (log x)/(1/x) = lim_{x→0} (1/x)/(−1/x²) = 0.

    E[(log X)²] = (1/θ) ∫₀^θ (log x)² dx
                = (1/θ) [x(log x)²]₀^θ − (1/θ) ∫₀^θ 2x (log x)/x dx
                = (log θ)² − (2/θ) [θ log θ − θ]  since lim_{x→0} x(log x)² = 0
                = (log θ)² − 2 log θ + 2

    V[log X] = (log θ)² − 2 log θ + 2 − (log θ − 1)² = 1

    V[log GM] = (1/n²) Σᵢ₌₁ⁿ V[log Xᵢ] = 1/n

and E[log GM] = log θ − 1, while V[log GM] → 0 as n → ∞, θ > 0.

Thus log GM is a consistent estimator of log θ − 1 = log(θe⁻¹), i.e., GM is a consistent estimator of θe⁻¹.
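A simulation sketch of Example 2.4 (seed and θ illustrative): the geometric mean of Uniform(0, θ) draws approaches θe⁻¹. Logs are averaged to avoid underflow of the n-fold product.

```python
import math
import random

random.seed(0)
theta, n = 3.0, 200000
# draw from (0, theta]; 1 - random() avoids log(0)
log_gm = sum(math.log(theta * (1.0 - random.random()))
             for _ in range(n)) / n
gm = math.exp(log_gm)
assert abs(gm / theta - math.exp(-1)) < 0.01   # GM/theta -> e^{-1}
```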
Example 2.5 Let X₁, X₂, ..., Xₙ be an iid random sample drawn from a population with E[Xᵢ] = μ and V[Xᵢ] = σ², i = 1, 2, ..., n. Prove that [2/(n(n + 1))] Σᵢ₌₁ⁿ iXᵢ is a consistent estimator of μ.

    E[Σᵢ₌₁ⁿ iXᵢ] = E[X₁ + 2X₂ + ... + nXₙ]
                 = μ + 2μ + ... + nμ
                 = μ[1 + 2 + ... + n] = μ n(n + 1)/2

    [2/(n(n + 1))] E[Σᵢ₌₁ⁿ iXᵢ] = μ

    V[Σᵢ₌₁ⁿ iXᵢ] = Σᵢ₌₁ⁿ i² V[Xᵢ] = σ² Σᵢ₌₁ⁿ i² = σ² n(n + 1)(2n + 1)/6

    V[[2/(n(n + 1))] Σᵢ₌₁ⁿ iXᵢ] = (2σ²/3) · (2n + 1)/(n(n + 1)) → 0 as n → ∞

Thus [2/(n(n + 1))] Σᵢ₌₁ⁿ iXᵢ is a consistent estimator of μ.
Consistent estimator is not unique

Example 2.6 Let T = max_{1≤i≤n}{Xᵢ} be the nth order statistic of a random sample of size n drawn from a population with a Uniform distribution on the interval (0, θ). The pdf of T is

    p_θ(t) = n tⁿ⁻¹/θⁿ,  0 < t < θ, θ > 0
           = 0 otherwise

    E[T] = (n/θⁿ) ∫₀^θ tⁿ dt = nθ/(n + 1)

    E[T²] = nθ²/(n + 2),  V[T] = nθ²/[(n + 2)(n + 1)²]

Thus E[T] → θ and V[T] → 0 as n → ∞, so T is a consistent estimator of θ. Also E[((n + 1)/n) T] = θ and V[((n + 1)/n) T] = θ²/(n(n + 2)) → 0 as n → ∞, i.e., ((n + 1)/n) T is also a consistent estimator of θ. The statistics T and ((n + 1)/n) T are two consistent estimators of the same parameter θ. Thus a consistent estimator is not unique.
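A simulation sketch of Example 2.6, showing both T and ((n + 1)/n)T close to θ for large n (seed, θ and n illustrative):

```python
import random

random.seed(42)
theta, n = 10.0, 5000
sample = [random.uniform(0.0, theta) for _ in range(n)]
t = max(sample)               # consistent, though E[T] = n*theta/(n+1) < theta
t_adj = (n + 1) / n * t       # also consistent (and unbiased)
assert abs(t - theta) < 0.05 and abs(t_adj - theta) < 0.05
```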
2.7 Invariance Property of Consistent Estimators

If T = t(X) is a consistent estimator of θ, then aₙT, T + cₙ, and aₙT + cₙ are also consistent estimators of θ, where aₙ = 1 + k/n for fixed k, so that aₙ → 1 and cₙ → 0 as n → ∞, for every fixed θ ∈ Ω. In general, we have Theorem 2.2.

Theorem 2.2 If Tₙ = tₙ(X) is a consistent estimator of λ(θ), and Ψ(λ(θ)) is a continuous function of λ(θ), then Ψ(Tₙ) is a consistent estimator of Ψ(λ(θ)).

Proof Given that Tₙ = tₙ(X) is a consistent estimator of λ(θ), i.e., Tₙ →P λ(θ) as n → ∞, for given ε > 0 and δ > 0 there exists a positive integer N(ε, δ) such that

    P{|Tₙ − λ(θ)| < ε} > 1 − δ  ∀ n ≥ N.

Also Ψ(·) is a continuous function: for every ε₁ > 0 there exists ε > 0 such that |Tₙ − λ(θ)| < ε implies |Ψ(Tₙ) − Ψ(λ(θ))| < ε₁.

For any two events A and B, if A ⊆ B then P(A) ≤ P(B). Let A = {|Tₙ − λ(θ)| < ε} and B = {|Ψ(Tₙ) − Ψ(λ(θ))| < ε₁}; then A ⊆ B, so

    P{|Ψ(Tₙ) − Ψ(λ(θ))| < ε₁} ≥ P{|Tₙ − λ(θ)| < ε} > 1 − δ  ∀ n ≥ N,

i.e., Ψ(Tₙ) →P Ψ(λ(θ)) as n → ∞, i.e., Ψ(Tₙ) is a consistent estimator of Ψ(λ(θ)).
Example 2.7 Suppose T = t(X) is a statistic with pdf p_θ(x) for θ > 0, θ ∈ Ω. Prove that T² = t²(X) is a consistent estimator of θ² if T = t(X) is a consistent estimator of θ.

Given that T is a consistent estimator of θ, by the definition of a consistent estimator, P{|T − θ| < ε} → 1 as n → ∞, for every ε > 0, θ ∈ Ω. Consider, for 0 < ε < θ,

    P{|T − θ| < ε} = P{θ − ε < T < θ + ε}
                   = P{(θ − ε)² < T² < (θ + ε)²}
                   = P{−2θε < T² − θ² − ε² < 2θε}
                   = P{|T′ − θ²| < ε′},  where ε′ = 2θε and T′ = T² − ε²

so P{|T′ − θ²| < ε′} → 1 as n → ∞. Since ε > 0 is arbitrary, T′ = T² − ε² differs from T² by as little as desired, hence P{|T² − θ²| < ε′} → 1 as n → ∞. Thus T² is a consistent estimator of θ².
2.8 Unbiased Estimator
For a statistic g(T), if the mathematical expectation is equal to a parametric function λ(θ), then g(T) is called an unbiased estimator of λ(θ), i.e.,

    E[g(T)] = λ(θ)  ∀ θ ∈ Ω.

Otherwise, the statistic g(T) is said to be a biased estimator of λ(θ). An unbiased estimator is also called a zero-bias estimator. A statistic g(T) is said to be an asymptotically unbiased estimator if E[g(T)] → λ(θ) as n → ∞, ∀ θ ∈ Ω.
Example 2.8 A random variable X has the pdf

    p_θ(x) = 2θx      if 0 < x < 1
           = (1 − θ)  if 1 ≤ x < 2, 0 < θ < 1
           = 0 otherwise

Show that g(X), a measurable function of X, is an unbiased estimator of θ if and only if ∫₀¹ x g(x) dx = 1/2 and ∫₁² g(x) dx = 0.

Assume g(X) is an unbiased estimator of θ, i.e., E[g(X)] = θ:

    ∫₀¹ g(x) 2θx dx + ∫₁² g(x)(1 − θ) dx = θ

    θ [∫₀¹ 2x g(x) dx − ∫₁² g(x) dx] + ∫₁² g(x) dx = θ  ∀ θ, 0 < θ < 1

    ⇒ ∫₀¹ 2x g(x) dx − ∫₁² g(x) dx = 1 and ∫₁² g(x) dx = 0

    i.e., ∫₀¹ x g(x) dx = 1/2 and ∫₁² g(x) dx = 0.

Conversely, if ∫₀¹ x g(x) dx = 1/2 and ∫₁² g(x) dx = 0, then g(X) is an unbiased estimator of θ:

    E[g(X)] = ∫₀¹ 2θ x g(x) dx + ∫₁² (1 − θ) g(x) dx
            = 2θ ∫₀¹ x g(x) dx + (1 − θ) ∫₁² g(x) dx
            = 2θ · (1/2) + (1 − θ) · 0 = θ

Thus g(X) is an unbiased estimator of θ.

Example 2.9 If T denotes the number of successes in n independent and identical trials of an experiment with probability of success θ, obtain unbiased estimators of θ² and θ(1 − θ), 0 < θ < 1.
Let Xᵢ ~ b(1, θ), i = 1, 2, ..., n; then T = Σᵢ₌₁ⁿ Xᵢ ~ b(n, θ). If g(T) is the unbiased estimator of λ(θ) = θ(1 − θ), then E[g(T)] = θ(1 − θ):

    Σ_{t=0}ⁿ g(t) C(n, t) θᵗ (1 − θ)ⁿ⁻ᵗ = θ(1 − θ)

    Σ_{t=0}ⁿ g(t) C(n, t) [θ/(1 − θ)]ᵗ = θ(1 − θ)¹⁻ⁿ

Put ρ = θ/(1 − θ), so that θ = ρ/(1 + ρ) and 1 − θ = 1/(1 + ρ). Then

    Σ_{t=0}ⁿ g(t) C(n, t) ρᵗ = [ρ/(1 + ρ)] (1 + ρ)ⁿ⁻¹
                             = ρ (1 + ρ)ⁿ⁻²
                             = ρ [1 + C(n−2, 1)ρ + C(n−2, 2)ρ² + ... + ρⁿ⁻²]

Equating the coefficients of ρᵗ on both sides,

    g(t) C(n, t) = C(n − 2, t − 1)

    g(t) = [(n − 2)!/((t − 1)!(n − t − 1)!)] · [t!(n − t)!/n!]
         = t(n − t)/(n(n − 1)),  n = 2, 3, ...

Thus the unbiased estimator of θ(1 − θ) is T(n − T)/(n(n − 1)), n = 2, 3, ...
Let the unbiased estimator of θ² be given by E[g(T)] = θ²:

    Σ_{t=0}ⁿ g(t) C(n, t) [θ/(1 − θ)]ᵗ (1 − θ)ⁿ = θ²

    Σ_{t=0}ⁿ g(t) C(n, t) ρᵗ = θ²(1 − θ)⁻ⁿ = ρ²(1 + ρ)ⁿ⁻²
                             = ρ²[1 + C(n−2, 1)ρ + ... + C(n−2, t−2)ρᵗ⁻² + ... + ρⁿ⁻²]

Equating the coefficients of ρᵗ,

    g(t) C(n, t) = C(n − 2, t − 2)

    g(t) = [(n − 2)!/((t − 2)!(n − t)!)] · [t!(n − t)!/n!]
         = t(t − 1)/(n(n − 1)),  n = 2, 3, ...

Thus the unbiased estimator of θ² is

    g(T) = T(T − 1)/(n(n − 1)),  n = 2, 3, ...
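An exact check of Example 2.9 by direct summation over the Binomial pmf (n and θ illustrative):

```python
from math import comb

def expectation(g, n, theta):
    """E[g(T)] for T ~ Binomial(n, theta), by direct summation."""
    return sum(g(t) * comb(n, t) * theta ** t * (1 - theta) ** (n - t)
               for t in range(n + 1))

n, theta = 7, 0.3
e_sq = expectation(lambda t: t * (t - 1) / (n * (n - 1)), n, theta)
e_pq = expectation(lambda t: t * (n - t) / (n * (n - 1)), n, theta)
assert abs(e_sq - theta ** 2) < 1e-12           # unbiased for theta^2
assert abs(e_pq - theta * (1 - theta)) < 1e-12  # unbiased for theta(1 - theta)
```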
Example 2.10 Obtain an unbiased estimator of $\frac{1}{\theta}$, given a single sample observation from a Geometric population with pmf
$$p(x) = \begin{cases} \theta(1-\theta)^{x-1} & x = 1, 2, 3, \ldots,\ 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
If $g(X)$ is an unbiased estimator of $\frac{1}{\theta}$, then $E[g(X)] = \frac{1}{\theta}$:
$$\sum_{x=1}^\infty g(x)\,\theta(1-\theta)^{x-1} = \frac{1}{\theta} \quad\Rightarrow\quad \sum_{x=1}^\infty g(x)(1-\theta)^{x-1} = \frac{1}{\theta^2}$$
Take $q = 1-\theta$, so that $\theta = 1-q$:
$$\sum_{x=1}^\infty g(x)\,q^{x-1} = (1-q)^{-2} = 1 + 2q + 3q^2 + \cdots + x\,q^{x-1} + \cdots$$
Equating the coefficients of $q^{x-1}$ gives
$$g(x) = x, \quad x = 1, 2, 3, \ldots$$
Thus $g(X) = X$ is the unbiased estimator of $\frac{1}{\theta}$.
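A quick numerical illustration (not from the book; $\theta$ is an arbitrary choice): truncating the series $E[X] = \sum_x x\,\theta(1-\theta)^{x-1}$ reproduces $\frac{1}{\theta}$.

```python
# Truncated expectation of X under the Geometric(theta) pmf
theta = 0.4
e = sum(x * theta * (1 - theta) ** (x - 1) for x in range(1, 2000))
print(e, 1 / theta)   # both approx 2.5
```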
Unbiased estimator need not exist
Example 2.11 Assume $X \sim b(1,\theta)$, $0 < \theta < 1$. Given a single observation $x$ of $X$ from a Bernoulli population, show that no unbiased estimator of $\theta^2$ exists.
$$p(x) = \begin{cases} \theta^x(1-\theta)^{1-x} & x = 0, 1,\ 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
Suppose there were an unbiased estimator $g(X)$ of $\theta^2$. That is,
$$E[g(X)] = \theta^2 \quad\Rightarrow\quad \sum_{x=0}^1 g(x)\,\theta^x(1-\theta)^{1-x} = \theta^2$$
$$g(0)(1-\theta) + g(1)\theta = \theta^2 \quad\Rightarrow\quad \theta[g(1) - g(0)] + g(0) = \theta^2$$
The left side is a polynomial of degree at most one in $\theta$, while the right side has degree two, so the identity cannot hold for all $\theta$: equating coefficients forces $g(0) = 0$ and $g(1) = 0$, i.e., $g(x) = 0$ for $x = 0, 1$, which would make $\theta^2 = 0$ for every $\theta$. But $\theta^2$ ranges over $(0,1)$. Hence an unbiased estimator of $\theta^2$ does not exist.
Example 2.12 If $X \sim b(n,\theta)$, show that there exists no unbiased estimator of the parameter $\frac{1}{\theta}$.
Consider $E[g(X)] = \frac{1}{\theta}$:
$$\sum_{x=0}^n g(x)\frac{n!}{x!\,(n-x)!}\,\theta^x(1-\theta)^{n-x} = \frac{1}{\theta}$$
$$\sum_{x=0}^n g(x)\binom{n}{x}\left(\frac{\theta}{1-\theta}\right)^x(1-\theta)^n = \frac{1}{\theta}$$
$$\sum_{x=0}^n g(x)\binom{n}{x}\rho^x = \frac{(1+\rho)^{n+1}}{\rho}, \quad\text{where } \rho = \frac{\theta}{1-\theta}$$
As $\theta \to 0$, i.e., $\rho \to 0$, the left side tends to the finite value $g(0)$, while the right side $\frac{(1+\rho)^{n+1}}{\rho} \to \infty$. Thus no unbiased estimator of the parameter $\frac{1}{\theta}$ exists.
Unbiased estimator is unique
Example 2.13 A random sample $X$ is drawn from a Bernoulli population $b(1,\theta)$, $\theta \in \Theta = \{\frac{1}{4}, \frac{1}{2}\}$. Then there exists a unique unbiased estimator of $\theta^2$.
Let $E[g(X)] = \theta^2$, i.e.,
$$\sum_{x=0}^1 g(x)\,\theta^x(1-\theta)^{1-x} = \theta^2 \quad\Rightarrow\quad g(0)(1-\theta) + g(1)\theta = \theta^2$$
When $\theta = \frac{1}{4}$:
$$3g(0) + g(1) = \frac{1}{4} \qquad (2.1)$$
When $\theta = \frac{1}{2}$:
$$g(0) + g(1) = \frac{1}{2} \qquad (2.2)$$
Solving equations (2.1) and (2.2) for $g(0)$ and $g(1)$, one gets $g(0) = -\frac{1}{8}$ and $g(1) = \frac{5}{8}$, i.e.,
$$g(x) = \begin{cases} -\frac{1}{8} & x = 0 \\ \frac{5}{8} & x = 1 \end{cases}$$
Thus $g(X)$ is the unbiased estimator of $\theta^2$, and it is unique.
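The two linear conditions can be solved numerically as a check (an illustration, not from the book; numpy is used only for the $2\times 2$ solve):

```python
import numpy as np

# 3*g(0) + g(1) = 1/4   (theta = 1/4)
#   g(0) + g(1) = 1/2   (theta = 1/2)
A = np.array([[3.0, 1.0], [1.0, 1.0]])
b = np.array([0.25, 0.5])
g0, g1 = np.linalg.solve(A, b)
print(g0, g1)   # -0.125 and 0.625, i.e. -1/8 and 5/8
```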
Unbiased estimator is not unique
Example 2.14 Let $X_1, X_2, \ldots, X_n$ be an iid random sample drawn from a population with Poisson distribution $P(\lambda)$. Then $g_1(X) = \bar{X}$ and $g_2(X) = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$ are two unbiased estimators of $\lambda$, since for the Poisson distribution the mean and variance are both $\lambda$. Consider the statistic $g(X) = \alpha g_1(X) + (1-\alpha)g_2(X)$, $0 < \alpha < 1$. Then $E[g(X)] = \lambda$ for every such $\alpha$, so the unbiased estimator is not unique.
Example 2.15 Show that the mean $\bar{X}$ of a random sample of size $n$ drawn from a population with probability density function
$$p(x) = \begin{cases} \frac{1}{\theta}e^{-x/\theta} & 0 < x < \infty,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$
is an unbiased estimator of $\theta$ and has variance $\frac{\theta^2}{n}$.
Let $T = \sum_{i=1}^n X_i \sim G(n, \theta)$. The pdf of $T$ is
$$p(t) = \begin{cases} \frac{1}{\theta^n\,\Gamma(n)}\,e^{-t/\theta}\,t^{n-1} & 0 < t < \infty,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$E[T] = \int_0^\infty \frac{1}{\theta^n\,\Gamma(n)}\,e^{-t/\theta}\,t^{(n+1)-1}\,dt = \frac{\theta^{n+1}\,\Gamma(n+1)}{\theta^n\,\Gamma(n)} = n\theta$$
$$E\left[\sum_{i=1}^n X_i\right] = n\theta \quad\Rightarrow\quad E[n\bar{X}] = n\theta \quad\Rightarrow\quad E[\bar{X}] = \theta,\ \theta > 0$$
Similarly $E[T^2] = n(n+1)\theta^2$, so $V[T] = n\theta^2$.
$$\therefore\quad V[\bar{X}] = V\left[\frac{\sum_{i=1}^n X_i}{n}\right] = \frac{1}{n^2}\,V[T] = \frac{1}{n^2}\,n\theta^2 = \frac{\theta^2}{n}$$
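The two moments just derived can be checked by simulation (an illustration, not from the book; the values of $\theta$, $n$, and the number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 5, 200_000
# Each row is one sample of size n from the exponential with mean theta
xbar = rng.exponential(scale=theta, size=(reps, n)).mean(axis=1)
print(xbar.mean())   # close to theta = 2.0
print(xbar.var())    # close to theta**2 / n = 0.8
```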
Example 2.16 Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a normal population with mean zero and variance $\sigma^2$, $0 < \sigma^2 < \infty$. Show that $\frac{\sum_{i=1}^n X_i^2}{n}$ is an unbiased estimator of $\sigma^2$ and has variance $\frac{2\sigma^4}{n}$.
Define $ns^2 = \sum_{i=1}^n X_i^2$; then $Y = \frac{ns^2}{\sigma^2}$ has a $\chi^2$ distribution with $n$ degrees of freedom, i.e., $Y \sim G(\frac{n}{2}, \frac{1}{2})$:
$$p(y) = \begin{cases} \frac{1}{2^{n/2}\,\Gamma(\frac{n}{2})}\,e^{-y/2}\,y^{\frac{n}{2}-1} & 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$
$$E[Y] = \int_0^\infty \frac{1}{2^{n/2}\,\Gamma(\frac{n}{2})}\,e^{-y/2}\,y^{(\frac{n}{2}+1)-1}\,dy = \frac{1}{2^{n/2}\,\Gamma(\frac{n}{2})}\cdot\frac{\Gamma(\frac{n}{2}+1)}{(\frac{1}{2})^{\frac{n}{2}+1}} = n$$
Similarly $E[Y^2] = n^2 + 2n$, so $V[Y] = 2n$. But $Y = \frac{ns^2}{\sigma^2}$,
$$\therefore\quad E_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = n \quad\Rightarrow\quad E_{\sigma^2}[s^2] = \sigma^2$$
Thus $\frac{\sum X_i^2}{n}$ is an unbiased estimator of $\sigma^2$. Further,
$$V_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = 2n \quad\Rightarrow\quad \frac{n^2}{\sigma^4}\,V_{\sigma^2}[s^2] = 2n \quad\Rightarrow\quad V_{\sigma^2}[s^2] = \frac{2\sigma^4}{n}$$
Example 2.17 Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3 drawn from a uniform population with pdf
$$p(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$
Show that $4Y_1$ and $2Y_2$ are unbiased estimators of $\theta$. Also find the variances of these estimators.
The pdf of $Y_1$ is
$$p(y_1) = \begin{cases} \frac{3!}{1!\,2!}\,\frac{1}{\theta}\left[\int_{y_1}^{\theta}\frac{1}{\theta}\,dx\right]^2 & 0 < y_1 < \theta \\ 0 & \text{otherwise} \end{cases}$$
i.e.,
$$p(y_1) = \begin{cases} \frac{3}{\theta}\left[1 - \frac{y_1}{\theta}\right]^2 & 0 < y_1 < \theta \\ 0 & \text{otherwise} \end{cases}$$
$$E[Y_1] = \frac{3}{\theta}\int_0^\theta y_1\left(1 - \frac{y_1}{\theta}\right)^2 dy_1 = 3\theta\int_0^1 t(1-t)^2\,dt \quad\text{where } t = \frac{y_1}{\theta}$$
$$= 3\theta\int_0^1 t^{2-1}(1-t)^{3-1}\,dt = 3\theta\,\frac{\Gamma(2)\,\Gamma(3)}{\Gamma(5)} = \frac{\theta}{4} > 0$$
so $E[4Y_1] = \theta$. Similarly $E[Y_1^2] = \frac{\theta^2}{10}$, hence
$$V[Y_1] = \frac{\theta^2}{10} - \frac{\theta^2}{16} = \frac{3\theta^2}{80}$$
$$\therefore\quad V[4Y_1] = 16\,V[Y_1] = \frac{3\theta^2}{5}$$
The pdf of $Y_2$ is
$$p(y_2) = \frac{3!}{1!\,1!\,1!}\,\frac{1}{\theta}\left[\int_0^{y_2}\frac{1}{\theta}\,dx\right]\left[\int_{y_2}^{\theta}\frac{1}{\theta}\,dx\right]$$
$$p(y_2) = \begin{cases} \frac{6}{\theta^2}\,y_2\left[1 - \frac{y_2}{\theta}\right] & 0 < y_2 < \theta \\ 0 & \text{otherwise} \end{cases}$$
$$\therefore\quad E[Y_2] = \frac{\theta}{2}$$
so $2Y_2$ is an unbiased estimator of $\theta$. Also $E[Y_2^2] = \frac{3\theta^2}{10}$ and $V[Y_2] = \frac{\theta^2}{20}$
hence $V[2Y_2] = \frac{\theta^2}{5}$.

Example 2.18 Let $Y_1$ and $Y_2$ be two independent and unbiased estimators of $\theta$. If the variance of $Y_1$ is twice the variance of $Y_2$, find the constants $k_1$ and $k_2$ so that $k_1Y_1 + k_2Y_2$ is an unbiased estimator of $\theta$ with the smallest possible variance for such a linear combination.
Given $E[Y_1] = \theta$, $E[Y_2] = \theta$, $V[Y_1] = 2\sigma^2$ and $V[Y_2] = \sigma^2$. Also unbiasedness requires $E[k_1Y_1 + k_2Y_2] = \theta$:
$$k_1E[Y_1] + k_2E[Y_2] = \theta \quad\Rightarrow\quad k_1 + k_2 = 1, \quad\text{i.e., } k_2 = 1 - k_1.$$
Consider
$$\phi = V[k_1Y_1 + k_2Y_2] = k_1^2\,V[Y_1] + k_2^2\,V[Y_2] = 2k_1^2\sigma^2 + (1-k_1)^2\sigma^2 = 3k_1^2\sigma^2 - 2k_1\sigma^2 + \sigma^2$$
Differentiating twice with respect to $k_1$,
$$\frac{d\phi}{dk_1} = 6k_1\sigma^2 - 2\sigma^2, \qquad \frac{d^2\phi}{dk_1^2} = 6\sigma^2$$
For a minimum, $\frac{d\phi}{dk_1} = 0$ and $\frac{d^2\phi}{dk_1^2} > 0$:
$$6k_1\sigma^2 - 2\sigma^2 = 0 \quad\Rightarrow\quad k_1 = \frac{1}{3} \text{ and } k_2 = \frac{2}{3}.$$
Thus $\frac{1}{3}Y_1 + \frac{2}{3}Y_2$ has minimum variance.
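The minimization can be confirmed with a direct grid search over $k_1$ (an illustration, not from the book; the grid resolution is an arbitrary choice):

```python
import numpy as np

# Variance of k*Y1 + (1-k)*Y2 with V[Y1] = 2*sigma2 and V[Y2] = sigma2
sigma2 = 1.0
k = np.linspace(0.0, 1.0, 10_001)
var = (3 * k**2 - 2 * k + 1) * sigma2
print(k[np.argmin(var)])   # approx 1/3
```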
Consistent estimator need not be unbiased
Example 2.19 Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ drawn from a normal population with mean $\theta$ and variance $\sigma^2$. Define $s^2 = \frac{1}{n}\sum_{i=1}^n(X_i - \bar{X})^2$; then $Y = \frac{ns^2}{\sigma^2}$ has a $\chi^2$ distribution with $(n-1)$ degrees of freedom, and $Y \sim G(\frac{n-1}{2}, \frac{1}{2})$. It has the pdf
$$p(y) = \begin{cases} \frac{1}{2^{\frac{n-1}{2}}\,\Gamma(\frac{n-1}{2})}\,e^{-y/2}\,y^{\frac{n-1}{2}-1} & 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$
$$E[Y^r] = \int_0^\infty \frac{1}{2^{\frac{n-1}{2}}\,\Gamma(\frac{n-1}{2})}\,e^{-y/2}\,y^{\frac{n-1}{2}+r-1}\,dy = \frac{1}{2^{\frac{n-1}{2}}\,\Gamma(\frac{n-1}{2})}\cdot\frac{\Gamma(\frac{n-1}{2}+r)}{(\frac{1}{2})^{\frac{n-1}{2}+r}} = \frac{2^r}{\Gamma(\frac{n-1}{2})}\,\Gamma\!\left(\frac{n-1}{2}+r\right)$$
When $r = 1$,
$$E[Y] = \frac{2}{\Gamma(\frac{n-1}{2})}\cdot\frac{n-1}{2}\,\Gamma\!\left(\frac{n-1}{2}\right) = n-1$$
$$\therefore\quad E_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = n-1 \quad\Rightarrow\quad E_{\sigma^2}[s^2] = \frac{n-1}{n}\,\sigma^2 \quad\text{and}\quad V_{\sigma^2}[s^2] = \frac{2(n-1)}{n^2}\,\sigma^4$$
Thus $E_{\sigma^2}[s^2] \to \sigma^2$ and $V_{\sigma^2}[s^2] \to 0$ as $n \to \infty$,
$$\therefore\quad \frac{1}{n}\sum_{i=1}^n(X_i - \bar{X})^2 \text{ is a consistent estimator of } \sigma^2.$$
But $E_{\sigma^2}[s^2] \ne \sigma^2$, so $\frac{1}{n}\sum_{i=1}^n(X_i - \bar{X})^2$ is not an unbiased estimator of $\sigma^2$.
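This bias-versus-consistency behaviour can be seen numerically (my own illustration, not from the book; $\sigma^2$, the sample sizes, and the replication count are arbitrary choices): with the $\frac{1}{n}$ divisor the sample variance averages to $\frac{n-1}{n}\sigma^2$, yet approaches $\sigma^2$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0
for n in (5, 50, 500):
    x = rng.normal(0.0, np.sqrt(sigma2), size=(10_000, n))
    s2 = x.var(axis=1)   # ddof=0, i.e. the 1/n divisor
    # mean of s2 is close to (n-1)/n * sigma2, which tends to sigma2
    print(n, s2.mean(), (n - 1) / n * sigma2)
```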
Example 2.20 Illustrate with an example an estimator that is both consistent and unbiased.
Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ drawn from a normal population with mean $\theta$ and variance $\sigma^2$. Define $s^2 = \frac{1}{n}\sum_{i=1}^n(X_i - \bar{X})^2$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar{X})^2$; then $Y = \frac{ns^2}{\sigma^2}$ has a $\chi^2$ distribution with $(n-1)$ degrees of freedom, $Y \sim G(\frac{n-1}{2}, \frac{1}{2})$, with $E_{\sigma^2}[s^2] = \frac{n-1}{n}\sigma^2$ and $V_{\sigma^2}[s^2] = \frac{2(n-1)}{n^2}\sigma^4$. Since
$$(n-1)S^2 = ns^2 \quad\Rightarrow\quad S^2 = \frac{n}{n-1}\,s^2,$$
$$E_{\sigma^2}[S^2] = \frac{n}{n-1}\,E_{\sigma^2}[s^2] = \frac{n}{n-1}\cdot\frac{n-1}{n}\,\sigma^2 = \sigma^2$$
$$V_{\sigma^2}[S^2] = \frac{n^2}{(n-1)^2}\,V_{\sigma^2}[s^2] = \frac{n^2}{(n-1)^2}\cdot\frac{2(n-1)}{n^2}\,\sigma^4 = \frac{2\sigma^4}{n-1} \to 0 \text{ as } n \to \infty$$
Thus $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar{X})^2$ is a consistent and also unbiased estimator of $\sigma^2$.

Example 2.21 Give an example that an unbiased estimator need not be consistent.
Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a normal population with mean $\theta$ and known variance $\sigma^2$. Then the estimator $X_1$ (the first observation) of the sample is unbiased but not consistent. It is unbiased since $E[X_1] = \theta$ and $V[X_1] = \sigma^2$, but
$$P\{|X_1 - \theta| < \epsilon\} = P\{\theta - \epsilon < X_1 < \theta + \epsilon\} = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{\theta-\epsilon}^{\theta+\epsilon} e^{-\frac{1}{2\sigma^2}(x_1-\theta)^2}\,dx_1,$$
which is a constant less than 1 and free of $n$, so it does not tend to 1 as $n \to \infty$.
$\therefore\ X_1$ is an unbiased but not consistent estimator of $\theta$.

Example 2.22 Give an example of an estimator that is neither consistent nor unbiased.
Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3 drawn from a uniform population with pdf, for given $\theta$,
$$p(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$
Then $Y_1$ is neither consistent nor unbiased for $\theta$, since $E[Y_1] = \frac{\theta}{4} \ne \theta$ and
$$P\left\{\left|Y_1 - \frac{\theta}{4}\right| < \epsilon\right\} = P\left\{\frac{\theta}{4} - \epsilon < Y_1 < \frac{\theta}{4} + \epsilon\right\} = \frac{3}{\theta}\int_{\frac{\theta}{4}-\epsilon}^{\frac{\theta}{4}+\epsilon}\left(1 - \frac{y_1}{\theta}\right)^2 dy_1 \nrightarrow 1 \text{ as } n \to \infty.$$
Thus $Y_1$, the first order statistic, is neither a consistent nor an unbiased estimator of $\theta$.
2.9 Sufficient Statistic

A sufficient statistic conveys all the information about the distribution of a random variable that is contained in the sample. It serves to identify only a family of distributions, not the particular parameter values of the distributions.

Definition 2.1 Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ drawn from a population with pdf $p(x \mid \theta)$. Let $T = t(X)$ be a statistic whose pdf is $p(t)$. For a continuous random variable $X$, $T = t(X)$ is said to be a sufficient statistic iff
$$\frac{p(x_1, x_2, \ldots, x_n)}{p(t)}$$
is independent of $\theta$ for every given $T = t$. Similarly, for a discrete random variable $X$, $T = t(X)$ is said to be a sufficient statistic iff
$$P\{X_1 = x_1, X_2 = x_2, \ldots \mid T = t\}$$
is independent of $\theta$ for every given $T = t$.

Example 2.23 Let $X$ be a single observation from a population with pmf $p(x)$, $0 < \theta < 1$:
$$p(x) = \begin{cases} \frac{[\theta(1-\theta)]^{|x|}}{2} & x = -1, 1 \\ 1 - \theta(1-\theta) & x = 0 \\ 0 & \text{otherwise} \end{cases}$$
Show that $|X|$ is sufficient.
Let $Y = |X|$. Then $P\{Y = 0\} = P\{|X| = 0\} = P\{X = 0\} = 1 - \theta(1-\theta)$ and
$$P\{Y = 1\} = P\{|X| = 1\} = P\{X = 1 \text{ or } X = -1\} = P\{X = 1\} + P\{X = -1\} = \theta(1-\theta).$$
Consider
$$P\{X = 1 \mid Y = 1\} = \frac{P\{X = 1,\ |X| = 1\}}{P\{Y = 1\}} = \frac{P\{X = 1\}}{P\{Y = 1\}} = \frac{\theta(1-\theta)/2}{\theta(1-\theta)} = \frac{1}{2},$$
which is independent of $\theta$.
Therefore $Y = |X|$ is sufficient.

Example 2.24 Let $X_1, X_2, \ldots, X_n$ be an independent random sample, where $X_i$ has pdf
$$p(x_i) = \begin{cases} e^{-(x_i - i\theta)} & x_i > i\theta,\ i = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$
Show that $T = \min_{1\le i\le n}\left(\frac{X_i}{i}\right)$ is a sufficient statistic.
Let $Y_i = \frac{X_i}{i}$; then $Y_i$ has pdf $p(y) = i\,e^{-i(y-\theta)}$, $y > \theta$, so that $P\{Y_i > t\} = e^{-i(t-\theta)}$ for $t > \theta$. Take $T = \min_{1\le i\le n} Y_i$. Since the $Y_i$ are independent,
$$P\{T > t\} = \prod_{i=1}^n e^{-i(t-\theta)} = e^{-(\sum_i i)(t-\theta)},$$
so the pdf of $T$ is
$$p(t) = \left(\sum_{i=1}^n i\right) e^{-(\sum_i i)(t-\theta)}, \quad \theta < t < \infty.$$
Also the joint pdf of the sample is $p(x_1, \ldots, x_n) = e^{-\sum_i (x_i - i\theta)}$, hence
$$\frac{p(x_1, x_2, \ldots, x_n)}{p(t)} = \frac{e^{-\sum_i x_i + \theta\sum_i i}}{\left(\sum_i i\right) e^{-t\sum_i i + \theta\sum_i i}} = \frac{1}{\sum_i i}\,e^{t\sum_i i - \sum_i x_i}.$$
It is independent of $\theta$. Thus $T = \min_{1\le i\le n} Y_i = \min_{1\le i\le n}\frac{X_i}{i}$ is sufficient.

Example 2.25 Let $X_1$ and $X_2$ be iid Poisson random variables with parameter
$\theta$. Prove that
(i) X1 + X2 is a sufficient statistic.
(ii) X1 + 2X2 is not a sufficient statistic.
(i) Given that
$$P\{X_1 = x_1\} = \begin{cases} \frac{e^{-\theta}\theta^{x_1}}{x_1!} & x_1 = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad\text{and}\qquad P\{X_2 = x_2\} = \begin{cases} \frac{e^{-\theta}\theta^{x_2}}{x_2!} & x_2 = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
Let $T = X_1 + X_2$; then $T \sim P(2\theta)$:
$$P\{T = t\} = \begin{cases} \frac{e^{-2\theta}(2\theta)^t}{t!} & t = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
Consider
$$P\{X_1 = x_1, X_2 = x_2 \mid T = t\} = \frac{P\{X_1 = x_1, X_2 = t - x_1\}}{P\{T = t\}} = \frac{P\{X_1 = x_1\}\,P\{X_2 = t - x_1\}}{P\{T = t\}}$$
$$= \frac{\frac{e^{-\theta}\theta^{x_1}}{x_1!}\cdot\frac{e^{-\theta}\theta^{t-x_1}}{(t-x_1)!}}{\frac{e^{-2\theta}(2\theta)^t}{t!}} = \frac{t!}{(t-x_1)!\,x_1!\,2^t},$$
which is independent of $\theta$.
$\therefore\ X_1 + X_2$ is a sufficient statistic.
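A numeric sketch of part (i) (my own illustration, not from the book): the conditional distribution of $X_1$ given $T = t$ reduces to Binomial$(t, \frac{1}{2})$, the same for every $\theta$; the value $t = 4$ and the two $\theta$'s below are arbitrary choices.

```python
from math import exp, factorial

def pois(x, theta):
    # Poisson(theta) pmf at x
    return exp(-theta) * theta**x / factorial(x)

t = 4
for theta in (0.5, 2.0):
    p_t = exp(-2 * theta) * (2 * theta) ** t / factorial(t)   # T ~ Poisson(2*theta)
    cond = [pois(x1, theta) * pois(t - x1, theta) / p_t for x1 in range(t + 1)]
    print(theta, [round(c, 4) for c in cond])   # identical list for both theta
```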
(ii) Consider
$$P\{X_1 + 2X_2 = 2\} = P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 2, X_2 = 0\}$$
$$= P\{X_1 = 0\}P\{X_2 = 1\} + P\{X_1 = 2\}P\{X_2 = 0\} = \theta e^{-2\theta} + \frac{\theta^2}{2}e^{-2\theta} = \theta e^{-2\theta}\left[1 + \frac{\theta}{2}\right]$$
Therefore
$$P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + 2X_2 = 2\}} = \frac{\theta e^{-2\theta}}{\theta e^{-2\theta}\left[1 + \frac{\theta}{2}\right]} = \frac{2}{2 + \theta},$$
which depends on $\theta$.
$\therefore\ X_1 + 2X_2$ is not a sufficient statistic.

Example 2.26 Let $X_1$ and $X_2$ be two independent Bernoulli random variables such
that $P\{X_1 = 1\} = 1 - P\{X_1 = 0\} = \theta$, $0 < \theta < 1$, and $P\{X_2 = 1\} = 1 - P\{X_2 = 0\} = 2\theta$, $0 < \theta \le \frac{1}{2}$. Show that $X_1 + X_2$ is not a sufficient statistic.
Let $T = X_1 + X_2$. Consider
$$P\{T = 1\} = P\{X_1 + X_2 = 1\} = P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 1, X_2 = 0\} = (1-\theta)2\theta + \theta(1-2\theta) = \theta(3 - 4\theta)$$
$$\therefore\quad P\{X_1 = 0 \mid X_1 + X_2 = 1\} = \frac{P\{X_1 = 0,\ X_1 + X_2 = 1\}}{P\{X_1 + X_2 = 1\}} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + X_2 = 1\}} = \frac{(1-\theta)2\theta}{\theta(3-4\theta)} = \frac{2(1-\theta)}{3 - 4\theta},$$
which depends on $\theta$.
$\therefore\ X_1 + X_2$ is not a sufficient statistic.

Example 2.27 Let $X_1$ and $X_2$ denote a random sample drawn from a normal population $N(\theta, 1)$, $-\infty < \theta < \infty$. Show that $T = X_1 + X_2$ is a sufficient statistic.
The joint pdf of $X_1$ and $X_2$ is
$$p(x_1, x_2) = p(x_1)p(x_2) = \frac{1}{2\pi}\,e^{-\frac{1}{2}(x_1-\theta)^2 - \frac{1}{2}(x_2-\theta)^2}$$
Let $T = X_1 + X_2 \sim N(2\theta, 2)$:
$$p(t) = \begin{cases} \frac{1}{2\sqrt{\pi}}\,e^{-\frac{1}{4}(t - 2\theta)^2} & -\infty < t < \infty \\ 0 & \text{otherwise} \end{cases}$$
The definition of sufficient statistic gives
$$\frac{p(x_1, x_2)}{p(t)} = \frac{\frac{1}{2\pi}\,e^{-\frac{1}{2}[x_1^2 + x_2^2 - 2\theta(x_1+x_2) + 2\theta^2]}}{\frac{1}{2\sqrt{\pi}}\,e^{-\frac{1}{4}[t^2 - 4\theta t + 4\theta^2]}}$$
$$= \frac{1}{\sqrt{\pi}}\cdot\frac{e^{-\frac{1}{2}(x_1^2 + x_2^2) + \theta(x_1+x_2) - \theta^2}}{e^{-\frac{1}{4}(x_1+x_2)^2 + \theta(x_1+x_2) - \theta^2}} = \frac{1}{\sqrt{\pi}}\,e^{-\frac{1}{2}(x_1^2 + x_2^2) + \frac{1}{4}(x_1+x_2)^2},$$
which is independent of $\theta$.
$\therefore\ T = X_1 + X_2$ is a sufficient statistic.
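The independence of this ratio from $\theta$ can also be confirmed numerically (an illustration, not from the book; the sample point and the two $\theta$ values are arbitrary choices):

```python
import numpy as np

def ratio(x1, x2, theta):
    # Joint N(theta,1) density of (x1, x2) divided by the N(2*theta, 2)
    # density of t = x1 + x2
    joint = np.exp(-0.5 * (x1 - theta) ** 2 - 0.5 * (x2 - theta) ** 2) / (2 * np.pi)
    t = x1 + x2
    p_t = np.exp(-0.25 * (t - 2 * theta) ** 2) / (2 * np.sqrt(np.pi))
    return joint / p_t

x1, x2 = 0.7, -1.3
print(ratio(x1, x2, theta=0.0), ratio(x1, x2, theta=3.0))   # equal values
```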
Example 2.28 Let $X_1, X_2, X_3$ be a sample from $b(1,\theta)$. Show that $X_1X_2 + X_3$ is not sufficient.
Let $Y = X_1X_2$ and $T = X_1X_2 + X_3$; then
$$P\{Y = 0\} = P\{X_1 = 0, X_2 = 0\} + P\{X_1 = 1, X_2 = 0\} + P\{X_1 = 0, X_2 = 1\}$$
$$= (1-\theta)^2 + \theta(1-\theta) + (1-\theta)\theta = 1 - \theta^2$$
$$P\{Y = 1\} = P\{X_1 = 1, X_2 = 1\} = \theta^2$$
$$P\{Y + X_3 = 1\} = P\{Y = 0, X_3 = 1\} + P\{Y = 1, X_3 = 0\} = (1-\theta^2)\theta + \theta^2(1-\theta),$$
i.e., $P\{T = 1\} = \theta(1-\theta)(1+2\theta)$. Consider
$$P\{Y = 1 \mid T = 1\} = \frac{P\{Y = 1, T = 1\}}{P\{T = 1\}} = \frac{P\{Y = 1\}\,P\{X_3 = 0\}}{P\{T = 1\}} = \frac{\theta^2(1-\theta)}{\theta(1-\theta)(1+2\theta)} =