8/6/2019 Probability Models and Their Parametric Estimation
1/237
PROBABILITY MODELS AND THEIR PARAMETRIC ESTIMATION
NET/JRF/CSIR EXAMINATIONS
A. SANTHAKUMARAN
Dr. A. Santhakumaran
Associate Professor and Head
Department of Statistics
Salem Sowdeswari College
Salem - 636010
Tamil - Nadu
E-mail: ask.stat@yahoo.com
About the Author
Dr. A. Santhakumaran is Associate Professor and Head of the Department of Statistics at
Salem Sowdeswari College, Salem - 636010, Tamil Nadu. He holds a Ph.D. in Statistics -
Mathematics from the Ramanujan Institute for Advanced Study in Mathematics, University
of Madras. His interests are in Stochastic Processes and Their Applications. He has
to his credit over 31 research papers in Feedback Queues, Statistical Quality Control and
Reliability Theory. He is the author of the book Fundamentals of Testing Statistical
Hypotheses and Research Methodology.
Acknowledgments
My special thanks to the Correspondent and Secretary of Salem Sowdeswari
College, Salem, and to my colleagues for their enthusiastic and unstinted support
rendered for publishing this book. I am grateful to Professor V. Thangaraj, RIASM,
University of Madras, for his encouragement for writing the book. My greatest debt is to
Dr. J. Subramaniam, Professor of Mathematics, Bannari Amman Institute of Technology,
Sathyamangalam, who read most of the manuscript and whose critical comments
resulted in numerous significant improvements. My thanks to Mr. G. Narayanan, Ramanujan
Institute Computer Centre, RIASM, University of Madras, for the suggestions
rendered by him towards the successful completion of the LaTeX typesetting of the book.

Finally, I wish to express my gratitude to all my teachers under whose influence
I have come to appreciate statistics as the science of a winding and twisting network
connecting Mathematics, Scientific Philosophy, Computer Software and other
intellectual sources of the millennium.

A. SANTHAKUMARAN
PREFACE
Even though the science of Statistics originated more than 200 years ago,
it was recognized as a separate discipline only in the early 1940s in India. From then
till now, Statistics has evolved into a versatile, powerful and indispensable instrument for
analyzing statistical data in real life problems. We have reached a stage where no
empirical science can afford to ignore the science of Statistics, since the recognition
of patterns can be achieved through it. Because of the speedy growth of modern
science and technology, one who learns Statistics must have capacity, knowledge and
intellect. A bird has the capacity to imitate when taught. A child is not born with a
language, but it is born with an innate capacity to learn language. So when we teach
a child, the child manipulates the structure and creates sentences. A bird cannot do
this; the child has the knowledge and capacity to create new sentences. If a man has
ability and knowledge, his inventiveness and innovation constitute intellect.
If a student has ability, knowledge and intellect, then he will be able to learn and
implement Statistics successfully. If these three faculties are lacking, learning Statistics
will not be possible. We shall give a number of examples drawn from the story of the
improvement of natural knowledge and the success of decision making. It shows how
statistical ideas played an important role in scientific investigations and other decision
making processes. The most successful man in life is one who makes the best decision
based on the available information. Practically it is a very difficult task to take a
decision on a real life problem. We illustrate this with the help of the following examples.

One wants to know in how many ways a bread can be divided into two equivalent
parts. Immediately one reflects that it can be divided in a finite number of ways. In fact
the bread can be divided into two equivalent parts in an infinite number of ways. Naturally
every article can have infinitely many dimensions. Our interest of study may be one
dimension, namely the length of the bread; area (= length × breadth), two dimensions;
volume (= length × breadth × height), three dimensions; and so on. Analogous to these
are the measures of average (location), measures of variability (scale) and measures of
skewness and kurtosis (shape).
Another example: a new two wheeler is introduced by a manufacturer in the
market. The manufacturer wants to announce how many kilometers per litre the
two wheeler gives on the road. For this purpose, the manufacturer rides the two wheeler on
the road three times and observes that it gives 50 km per litre, 55 km
per litre and 60 km per litre respectively. It immediately comes to mind that the two
wheeler gives (50 + 55 + 60)/3 = 55 km per litre. This is absolutely wrong. Actually the
two wheeler gives 60 km per litre, the value of the maximum order statistic.
A cyclist pedals from his house to his college at a speed of 10 mph and returns to
his house from the college at a speed of 15 mph. He wants to know his average speed.
Assume that the distance between the house and the college is x miles. Then the
average speed of the cyclist

= Total distance / Total time taken = 2x / (x/10 + x/15) = 12 mph

which is the Harmonic Mean.

Seven students and a master want to cross a river from one side to the other. The
students are not able to swim across the river. The master measures the average height
of the students, which is 5.5. He also measures the depth of the river from one side
to the other side at 10 places: 2, 2.5, 4, 5.5, 6, 6.5, 10, 2.5, 1.5, 1, which gives an
average depth of 4.15. The master takes a decision to cross the river on foot,
since the average height of the students is greater than the average depth of the river. The
students fail to cross the river, since at some places the depth of the river is more than
5.5. The master is not happy with his decision. The master would have succeeded in his
decision if the minimum height of the students had been greater than the maximum depth of
the river.
Keeping this in mind, the first chapter of the book deals with some of the well
known distributions and the recognition of the pattern of statistical distributions. Chapter 2
gives the criteria of point estimation. Chapter 3 illustrates the properties of complete
families of distributions. Chapter 4 focuses on the study of optimal estimation. Chapter 5
explains the methods of estimation. Chapter 6 discusses interval estimation. Chapter 7
consists of Bayesian estimation.
DISTINCTIVE FEATURES
Care has been taken to provide conceptual clarity, simplicity and up to date materials.

Properly graded and solved problems to illustrate each concept and procedure are presented in the text.

About 300 solved problems and 50 remarks.

A chapter on complete family of distributions.

It is intended to serve as a text book for a one semester course on Statistical Inference
for Under - Graduate and Post - Graduate Statistics in Indian universities
and other Applicable Sciences, Allied Statistical Courses, Mathematical Sciences
and various Competitive Examinations like ISS, UGC Junior Fellowship,
SLET, NET etc.
Salem - 636010 A. Santhakumaran
January 2010
CONTENTS
1 Diagnosis of Statistical Pattern 1 - 32
1.1 Introduction
1.2 Collection of data
1.3 Diagnosis of a Distribution with Data
1.4 Discrete Probability Models
1.5 Continuous Probability Models
1.6 Diagnosis of Probability Models
1.7 Quantile - Quantile plot
2 Criteria of Point Estimation 33 - 73
2.1 Introduction
2.2 Point estimator
2.3 Problems of point estimation
2.4 Criteria of the point estimation
2.5 Consistency
2.6 Sufficient condition for consistency
2.7 Unbiased estimator
2.8 Sufficient Statistic
2.9 Neyman Factorizability Criterion
2.10 Exponential family of distributions
2.11 Distribution Admitting Sufficient Statistic
2.12 Joint Sufficient Statistics
2.13 Efficient estimator
3 Complete Family of Distributions 74 - 94
3.1 Introduction
3.2 Completeness
3.3 Minimal Sufficient Statistic
4 Optimal Estimation 95 - 151
4.1 Introduction
4.2 Uniformly Minimum Variance Unbiased Estimator
4.3 Uncorrelatedness Approach
4.4 Rao - Blackwell Theorem
4.5 Lehmann - Scheffe Theorem
4.6 Inequality Approach
4.7 Cramer Rao Inequality
4.8 Chapman - Robbins Inequality
4.9 Efficiency
4.10 Extension of Cramer- Rao Inequality
4.11 Cramer - Rao Inequality - Multiparameter case
4.12 Bhattacharya Inequality
5 Methods of Estimation 152 - 203
5.1 Introduction
5.2 Method of Maximum Likelihood Estimation
5.3 Numerical Methods of Maximum Likelihood Estimation
5.4 Optimum property of MLE
5.5 Method of Minimum Variance Bound Estimation
5.6 Method of Moment Estimation
5.7 Method of Minimum Chi - Square Estimation
5.8 Method of Least Square Estimation
5.9 Gauss Markoff Theorem
6 Interval Estimation 204 - 226
6.1 Introduction
6.2 Confidence Intervals
6.3 Alternative Method of Confidence Intervals
6.4 Shortest Length Confidence Intervals
7 Bayes Estimation 227 - 245
7.1 Introduction
7.2 Bayes point estimation
7.3 Bayes confidence intervals
References
Glossary of Notation
Appendix
Answers to problems
Index
1. DIAGNOSIS OF STATISTICAL PATTERN
1.1 Introduction
Statistics is a decision making tool which aims to resolve real life problems.
It originated more than 2000 years ago, but it was recognized as a separate discipline
in India only from 1940. From then till now, Statistics has been evolving as a versatile,
powerful and indispensable instrument for investigation in all fields of real life problems.
It provides a wide variety of analytical tools. We have reached a stage where no empirical
science can afford to ignore the science of Statistics, since the recognition of patterns
can be achieved through it.
Statistics is a method of obtaining and analyzing data in order to take decisions
based on them. In India, during the period of Chandragupta Maurya there was an efficient
system of collecting official and administrative statistics. During Akbar's reign ( 1556
- 1605 AD ) people maintained good records of land and agricultural statistics. Statistical
surveys were also conducted during his reign.
Sir Ronald A. Fisher, known as the Father of Statistics, placed Statistics on a very
sound footing by applying it to various diversified fields. His contributions to Statistics
led to a very responsible position of Statistics among the sciences.

Professor P. C. Mahalanobis is the founder of Statistics in India. He was a
physicist by training, a statistician by instinct and an economist by conviction. The
Government of India observes 29th June, the birthday of Professor Prasanta Chandra
Mahalanobis, as National Statistics Day. Professor C. R. Rao is an Indian legend
whose career spans the history of modern statistics. He is considered by many to be
the greatest living statistician in the world today.
There are many definitions of the term Statistics. Some authors have defined
Statistics as statistical data (plural sense) and others as statistical methods (singular
sense).
Statistics as Statistical Data
Yule and Kendall state: "By statistics we mean quantitative data affected to a
marked extent by multiplicity of causes." Their definition points out the following
characteristics:

Statistics are aggregates of facts.
Statistics are affected to a marked extent by multiplicity of causes.
Statistics are numerically expressed.
Statistics are enumerated or estimated according to reasonable standards of accuracy.
Statistics are collected in a systematic manner.
Statistics are collected for a pre - determined purpose.
Statistics should be placed in relation to each other.
Statistics as Statistical Methods
One of the best definitions of Statistics is given by Croxton and Cowden. They
define Statistics as the science which deals with the collection, analysis and interpretation
of numerical data. This definition points out the scientific ways of:

Data collection
Data presentation
Data analysis
Data interpretation
Statistics as Statistical Models and Methods
Statistics is an imposing form of Mathematics. The usage of statistical methods
has been briskly expanding since the late 20th century, because statistical models and
methods have great application value in many inter - disciplinary sciences. So we define
Statistics as the science of a winding and twisting network connecting Mathematics,
Scientific Philosophy, Computer Software and other intellectual sources of the
millennium.

This definition reveals that statisticians work to translate real life problems
into mathematical models by using assumptions or axioms or principles. They then
derive exact solutions by their knowledge, thereby intellectually validate the results,
and express their merits in non-mathematical forms which are consistent with real
life problems.
In real life problems, there are many situations where the actions of the entities
within the system under study cannot be predicted with 100 percent perfection.
There is always some variation. The variation can be classified into two categories:
variation due to assignable causes, which has to be identified and eliminated, and
variation due to chance causes, which lies within the 6σ limits. This is also called
natural variation. In general, the reduction of natural variation is not necessary and
involves more cost, so it is not feasible to reduce the natural variation. However,
some appropriate statistical patterns of recognition may well describe the causes of
variations.
An appropriate statistical pattern can be diagnosed by repeated sampling of the
phenomenon of interest. Then, through the systematic study of these data, a statistician
can identify a known distribution suitable for the data and estimate the parameters of the
distribution. A statistician takes continuous effort in the selection of a distributional
form.
There are four steps in the diagnosis of a statistical distribution. They are
(i) Data collection
Data collection for real life problems often requires substantial knowledge of
the problems, planning time and resource commitment.
(ii) Identification of statistical pattern
When the data are available, identification of a probability distribution begins
by developing a frequency distribution or Histogram of the data. Based on the
pattern of frequency distribution and knowledge on the nature and behaviour of
the process, a family of distributions is chosen.
(iii) Parameter selection
Choose the parameters that determine a specific instance of the distribution family
when the data are available. These parameters are estimated from the data.
(iv) Validity of the distribution
The validity of the chosen distribution and the associated parameters is evaluated
with the help of statistical tests. The validity of the various assumptions made
on the parameters is assessed only at a certain level of significance.
If the chosen distribution is not a good approximation of the data, then the analyst
goes to the second step, chooses a different family of distributions and repeats the
procedure.
If several iterations of this procedure fail to give a fit between an assumed
distributional form and the collected data, then the empirical form of the distribution
may be used.
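The four steps above can be sketched computationally. The following fragment is a minimal illustration with hypothetical simulated data: it "collects" inter-arrival times, assumes an Exponential family, estimates the parameter θ by maximum likelihood (θ̂ = 1/x̄), and computes a chi-square statistic as a rough validity check. The sample, the true rate 0.5 and the bin edges are illustrative assumptions, not part of the text.

```python
import math
import random

random.seed(1)
data = [random.expovariate(0.5) for _ in range(500)]   # step (i): data collection

theta_hat = 1.0 / (sum(data) / len(data))              # step (iii): MLE of theta

# step (iv): compare observed and expected counts over a few bins
edges = [0.0, 1.0, 2.0, 4.0, 8.0, float("inf")]
observed = [sum(1 for x in data if lo <= x < hi)
            for lo, hi in zip(edges, edges[1:])]

def exp_cdf(x, theta):
    """CDF of the Exponential distribution, F(x) = 1 - exp(-theta * x)."""
    return 1.0 - math.exp(-theta * x) if x < float("inf") else 1.0

expected = [len(data) * (exp_cdf(hi, theta_hat) - exp_cdf(lo, theta_hat))
            for lo, hi in zip(edges, edges[1:])]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(theta_hat, 3), round(chi_square, 2))
```

A small chi-square value (relative to the tabulated critical value for the appropriate degrees of freedom) would support the chosen family; a large value sends the analyst back to step (ii).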
1.2 Collection of Data
Collection of data is one of the important tasks in finding a solution for real life
problems. Even if the statistical pattern of the real life problem is valid, if the data
are inaccurately collected, inappropriately analyzed or not representative of the real life
problem, then the data will be misleading when used for decision making.
One can learn data collection from actual experience. The following suggestions
may enhance and facilitate data collection. Data collection and analysis must
be tackled with great care.
(i) Before collecting data, planning is very important. It could commence with a
practice of pre - observing experience. Try to collect the data while pre - observing.
Forms for the data are devised for the purpose. It is very likely that these forms
will have to be modified several times before the actual data collection begins.
Watch for unusual situations or abnormal circumstances and consider how they
will be handled. Planning is very important even if the data are collected
automatically. After collecting the data, find out whether the collected data are
appropriate or not.
(ii) If the data being collected are adequate to diagnose the statistical distribution,
then determine the apt distribution. If the data being used are useless for diagnosing
the statistical distribution, then there is no need to collect superfluous data.
(iii) Try to combine homogeneous data sets. Check data for homogeneity in successive
time periods and, during the same time period, on successive intervals of time.
(iv) Beware of the possibility of data censoring, in which a quantity of interest is not
observed in its entirety. This problem most often occurs when the analyst is
interested in the time required to complete some process but the process begins
prior to or finishes after the completion of the observation period. Censoring can
result in especially long process times being left out of the data sample.
(v) One may use a scatter diagram, which indicates the relationship between two
variables of interest.
(vi) Consider the possibility that a sequence of observations which appear to be in-
dependent may possess autocorrelation. Autocorrelation may exist in successive
time periods.
1.3 Diagnosis of a Distribution with Data

The methods for selecting families of distributions are applicable only if
statistical data are available. The specific distribution within a family is specified by
estimating its parameters. Estimating the parameters of a family of distributions leads
to the theory of estimation.
The formation of a frequency distribution or Histogram is useful in guessing the
shape of a distribution. Hines and Montgomery state that the number of class
intervals should approximately equal the square root of the sample size. If the intervals are
too wide, the Histogram will be coarse or blocky, and its shape and other details will not
bring out the pattern of the data. So one has to allow the interval sizes to change until a
good choice is found. The Histogram for continuous data corresponds to the probability
density function of a theoretical distribution. If continuous, a line drawn through the
centre point of each class interval frequency should result in a shape like that of a
probability density function (pdf) (see Figure 1.2).

The Histogram for discrete data, where there are a large number of data points,
should have a cell for each value in the range of the data. However, if there are only a few
data points, it may be necessary to combine adjacent cells to eliminate the ragged
appearance of the Histogram. If the Histogram is associated with discrete data, it should
look like a probability mass function (pmf) (see Figure 1.1).
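The class-interval rule quoted above can be sketched as follows. The data values here are hypothetical; the only assumption taken from the text is that the number of intervals is roughly the square root of the sample size.

```python
import math

data = [2.1, 3.4, 0.7, 5.6, 4.4, 2.9, 3.8, 1.2, 4.9, 2.5,
        3.1, 0.4, 5.1, 2.2, 3.6, 4.1]

k = round(math.sqrt(len(data)))          # number of class intervals
lo, hi = min(data), max(data)
width = (hi - lo) / k

freq = [0] * k
for x in data:
    # place x in its interval; the maximum value goes into the last interval
    i = min(int((x - lo) / width), k - 1)
    freq[i] += 1

print(k, freq)
```

Plotting `freq` against the interval midpoints gives the Histogram whose shape is compared with candidate pdf or pmf shapes.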
1.4 Discrete Distributions
Discrete random variables are used to describe random phenomena in which
only integer values can occur. The following are some important distributions.
1.4.1 Bernoulli distribution
An experiment consists of n trials; each trial results in a success or a failure, and each
trial is repeated under the same conditions. Let Xj = 1 if the j th trial
results in a success and Xj = 0 if the j th trial results in a failure,
so the sample space has the values 0 and 1. The trials are independent, each trial has
only two possible outcomes (success or failure), and the probability of success
remains constant from trial to trial. For one trial the pmf
p(x) = θ^x (1 − θ)^(1 − x)   x = 0, 1, 0 < θ < 1
     = 0                     otherwise

is the Bernoulli distribution function.
Under the above assumptions, if X denotes the quality of a produced item in a
production process, then X follows a Bernoulli distribution.
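A quick numerical sketch of the Bernoulli pmf and the production-process interpretation; the defect rate θ = 0.3 and the number of simulated items are illustrative assumptions.

```python
import random

def bernoulli_pmf(x, theta):
    """p(x) = theta^x (1 - theta)^(1 - x) for x in {0, 1}."""
    return theta ** x * (1 - theta) ** (1 - x) if x in (0, 1) else 0.0

theta = 0.3
random.seed(2)
# simulate the quality (1 = defective) of 10000 produced items
trials = [1 if random.random() < theta else 0 for _ in range(10000)]
print(bernoulli_pmf(1, theta), round(sum(trials) / len(trials), 3))
```

The observed proportion of 1's approaches p(1) = θ as the number of items grows.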
1.4.2 Binomial Distribution
Let X be a random variable denoting the number of successes in n Bernoulli
trials. Then X is called a Binomial random variable with parameters n and θ.
Here the sample space is {0, 1, 2, ..., n} and the pmf is

p(x) = [n! / (x!(n − x)!)] θ^x (1 − θ)^(n − x)   x = 0, 1, ..., n, 0 < θ < 1
     = 0                                         otherwise

In the Binomial distribution, the mean nθ is always greater than the variance
nθ(1 − θ). If X1, X2, ..., Xn are independent and identically distributed Bernoulli
random variables, then ∑ni=1 Xi ~ b(n, θ). Problems relating to tossing a coin
or throwing dice lead to the Binomial distribution. In a production process, the
number x of defective units in a random sample of n units follows a Binomial
distribution.
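The Binomial pmf and the mean-greater-than-variance property can be checked directly; n = 10 and θ = 0.4 are example values, not taken from the text.

```python
from math import comb

def binomial_pmf(x, n, theta):
    """p(x) = C(n, x) theta^x (1 - theta)^(n - x)."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

n, theta = 10, 0.4
mean = n * theta
variance = n * theta * (1 - theta)
total = sum(binomial_pmf(x, n, theta) for x in range(n + 1))
print(mean, round(variance, 3), round(total, 6))
```

Since 0 < 1 − θ < 1, the variance nθ(1 − θ) is always a fraction of the mean nθ, which the printed values confirm.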
1.4.3 Geometric Distribution
A random variable X related to a sequence of Bernoulli trials in which (x + 1)
trials are needed to achieve the first success has pmf

p(x) = θ(1 − θ)^x   x = 0, 1, 2, ..., 0 < θ < 1
     = 0            otherwise

It is the probability that the event {X = x} occurs, i.e., that there are x failures
followed by a success.

A couple decides to have children until they have a male child. If the
probability of having a male child in the family is p, they wish to know how many
children to expect before the first male child is born. Let X denote the number of
female children preceding the first male child; then X is a Geometric random
variable.
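The couple's problem can be answered from the Geometric mean (1 − p)/p and confirmed by simulation; p = 0.5 and the number of simulated families are illustrative assumptions.

```python
import random

def geometric_pmf(x, p):
    """x failures (female children) followed by one success (male child)."""
    return p * (1 - p) ** x

p = 0.5
expected_failures = (1 - p) / p   # expected female children before the first male

random.seed(3)
def one_family():
    girls = 0
    while random.random() >= p:   # a girl is born; keep trying
        girls += 1
    return girls

avg = sum(one_family() for _ in range(20000)) / 20000
print(expected_failures, round(avg, 2))
```

With p = 0.5 the couple expects one female child before the first male child, and the simulated average agrees.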
1.4.4 Negative Binomial Distribution
If X1, X2, ..., Xn are iid Geometric variables, then T = t(X) = ∑ni=1 Xi is
a Negative Binomial variate whose pmf is

p(t) = [(t + n − 1)! / (t!(n − 1)!)] θ^n (1 − θ)^t   t = 0, 1, 2, ...
     = 0                                             otherwise
A random variable X related to a sequence of Bernoulli trials in which x
failures precede the nth success in (x + n) trials has pmf

p(x) = [(x + n − 1)! / ((n − 1)!x!)] θ^n (1 − θ)^x   x = 0, 1, 2, ...
     = 0                                             otherwise

This will happen if the last trial results in a success and among the previous
(n + x − 1) trials there are exactly x failures. Note that if n = 1, then p(x)
is the Geometric distribution function. The Negative Binomial distribution has Mean
< Variance. In a production process, the number of units that are required to
achieve the nth defective in x + n units follows a Negative Binomial distribution.
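The pmf above, its normalization, and the Mean < Variance property can be checked numerically; n = 3 and θ = 0.4 are example values.

```python
from math import comb

def neg_binomial_pmf(t, n, theta):
    """p(t) = C(t + n - 1, t) theta^n (1 - theta)^t."""
    return comb(t + n - 1, t) * theta ** n * (1 - theta) ** t

n, theta = 3, 0.4
mean = n * (1 - theta) / theta            # n(1 - theta) / theta
variance = n * (1 - theta) / theta ** 2   # n(1 - theta) / theta^2
total = sum(neg_binomial_pmf(t, n, theta) for t in range(400))
print(round(mean, 3), round(variance, 3), round(total, 6))
```

Dividing the two moments shows variance = mean/θ > mean whenever 0 < θ < 1, and the n = 1 case collapses to the Geometric pmf.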
1.4.5 Multinomial Distribution

If the sample space of a random experiment is split into more than two
mutually exclusive and exhaustive events, then one can define a random variable
which leads to the Multinomial distribution. Let E1, E2, ..., Ek be k mutually
exclusive and exhaustive events of a random experiment with respective
probabilities θ1, θ2, ..., θk, such that θ1 + θ2 + ... + θk = 1 and
0 < θi < 1, i = 1, 2, ..., k. Then the probability that E1 occurs x1 times, E2
occurs x2 times, ..., Ek occurs xk times in n independent trials is known
as the Multinomial distribution, with pmf given by

pθ1,θ2,...,θk (x1, x2, ..., xk) = [n! / (x1!x2! ... xk!)] θ1^x1 θ2^x2 ... θk^xk   where ∑ki=1 xi = n
                               = 0                                               otherwise

If k = 2, that is, the number of mutually exclusive events is only two, then the
Multinomial distribution becomes a Binomial distribution, as given by

pθ1,θ2 (x1, x2) = [n! / (x1!x2!)] θ1^x1 θ2^x2   where x1 + x2 = n and θ1 + θ2 = 1
                = 0                             otherwise

That is, x2 = n − x1 and θ2 = 1 − θ1, which implies

pθ1 (x1) = [n! / (x1!(n − x1)!)] θ1^x1 (1 − θ1)^(n − x1)   0 < θ1 < 1, x1 = 0, 1, ..., n
         = 0                                               otherwise

Consider two brands A and B. Each individual in the population prefers brand
A to brand B with probability θ1, prefers B to A with probability θ2, and is
indifferent between brands A and B with probability θ3 = 1 − θ1 − θ2. In
a random sample of n individuals, X1 prefer brand A, X2 prefer brand B
and X3 prefer neither A nor B. Then the three random variables follow a
Trinomial distribution, i.e.,

pθ1,θ2,θ3 (x1, x2, x3) = P{X1 = x1, X2 = x2, X3 = x3}
                       = [n! / (x1!x2!x3!)] θ1^x1 θ2^x2 θ3^x3   x1 + x2 + x3 = n
                       = 0                                      otherwise
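The brand-preference example can be sketched with a small Trinomial pmf; the preference probabilities and sample size below are illustrative assumptions.

```python
from math import factorial

def trinomial_pmf(x1, x2, x3, t1, t2, t3):
    """P{X1 = x1, X2 = x2, X3 = x3} = n!/(x1! x2! x3!) t1^x1 t2^x2 t3^x3."""
    n = x1 + x2 + x3
    coeff = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coeff * t1 ** x1 * t2 ** x2 * t3 ** x3

t1, t2, t3 = 0.5, 0.3, 0.2   # prefer A, prefer B, indifferent
n = 4
# the pmf must sum to 1 over all (x1, x2, x3) with x1 + x2 + x3 = n
total = sum(trinomial_pmf(a, b, n - a - b, t1, t2, t3)
            for a in range(n + 1) for b in range(n + 1 - a))
print(round(total, 6))
```

Fixing x3 = 0 and θ3 = 0 recovers the Binomial reduction described above.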
1.4.6 Discrete Uniform Distribution
A random variable X is said to follow the Uniform distribution on N points
(x1, x2, ..., xN) if its pmf is given by

pN(x) = PN{X = xi} = 1/N   i = 1, 2, ..., N and N ∈ I+
      = 0                  otherwise

A random experiment with complete uncertainty but whose outcomes are equally
probable may be described by the Uniform distribution. In a finite population of N
units, selecting any unit xi, i = 1, 2, ..., N from the population by the simple
random sampling technique gives a discrete uniform distribution.
1.4.7 Hypergeometric Distribution
One situation in which Bernoulli trials are encountered is that in which an object
is drawn at random from a collection of objects of two types in a box. In
order to repeat this experiment so that the results are independent and identically
distributed, it is necessary to replace each object drawn and to mix the objects
before the next one is drawn. This process is referred to as sampling with
replacement. If the sampling is done without replacement of the objects drawn, the
resulting trials are still of the Bernoulli type but no longer independent.
For example, four balls are drawn one at a time, at random and without
replacement, from 8 balls in a box, 3 black and 5 red. The probability that the third
ball drawn is black is

P{3rd ball black} = P(RRB) + P(RBB) + P(BRB) + P(BBB)
                  = (5/8)(4/7)(3/6) + (5/8)(3/7)(2/6) + (3/8)(5/7)(2/6) + (3/8)(2/7)(1/6)
                  = 3/8

which is the same as the probability that the first ball drawn is black. It should
not be surprising that this probability for a black ball is the same on the third draw
as on the first draw.
In the general case, n objects are to be drawn at random, one at a time, from
a collection of N objects, M of one kind and N − M of another kind. The
one kind of object will be thought of as a success and coded 1; the other kind
is coded 0. Let X1, X2, ..., Xn denote the sequence of coded outcomes; that
is, Xi is 1 or 0 according to whether the ith draw results in a success or a failure.
The total number of successes in n trials is just the sum of the X's,

Sn = X1 + X2 + ... + Xn

as it was in the case of independent identically distributed Bernoulli trials. That
is, the probability of a 1 on the ith trial is the same at each trial:

P{Xi = 1} = M/N   i = 1, 2, ..., n
One can observe first that the probability of a given ordered sequence of n of the
N objects is

1 / [N(N − 1)(N − 2) ... (N − n + 1)]

The probability that an object of type 1 occurs in the ith position of the sequence is

P{Xi = 1} = [M(N − 1)(N − 2) ... (N − n + 1)] / [N(N − 1) ... (N − n + 2)(N − n + 1)]
          = M/N   i = 1, 2, ..., n

where M is the number of ways of filling the ith position with an object
coded 1 and (N − 1)(N − 2) ... (N − n + 1) is the number of ways of filling
the remaining (n − 1) places in the sequence from the (N − 1) remaining
objects. It does not matter whether the number of successes refers to n objects
drawn one at a time at random or to n objects drawn simultaneously at random.
The probability function of Sn is

P{Sn = k} = C(M, k) C(N − M, n − k) / C(N, n)   k = 0, 1, 2, ..., min(n, M)
          = 0                                   otherwise

where C(a, b) = a! / (b!(a − b)!). The random variable Sn with the above
probability function is said to have a Hypergeometric distribution. The mean of
Sn is easily obtained from the representation of a Hypergeometric variable as a sum
of Bernoulli trials. That is,

E[Sn] = E[X1 + X2 + ... + Xn]
      = E[X1] + E[X2] + ... + E[Xn]
      = 1 · P{X1 = 1} + 0 · P{X1 = 0} + ... + 1 · P{Xn = 1} + 0 · P{Xn = 0}
      = M/N + ... + M/N
      = nM/N

Variance of Sn = n (M/N) [(N − M)/N] [(N − n)/(N − 1)]   if N ∈ I+   (1.1)

If p = M/N is the probability at each trial that the object drawn is of the type of
which there are initially M, and q = 1 − p, then

Variance of Sn = npq (N − n)/(N − 1)   if N ∈ I+   (1.2)
The above formula (1.2) differs from the Binomial variance npq by the extra factor
(N − n)/(N − 1). The variance of Sn is npq (N − n)/(N − 1) in the no replacement
case and npq in the replacement case for fixed p and fixed n, since the factor
(N − n)/(N − 1) → 1 as N becomes infinitely large. Thus the Hypergeometric
distribution is exact whereas the Binomial distribution is an approximate one.

50 students of the M.Sc. Statistics course in a certain college are divided at random
into 5 batches of 10 each for the annual practical examination in Statistics. The
class consists of 20 resident students and 30 non - resident students. X denotes the
number of resident students in the first batch who appear for the practical examination.
The Hypergeometric distribution is apt to describe the random variable X and
has the pmf

P{X = x} = C(20, x) C(30, 10 − x) / C(50, 10)   x = 0, 1, 2, ..., 10
         = 0                                    otherwise
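The practical-examination example can be checked numerically: the pmf should sum to one and the mean should equal nM/N = 10 × 20/50 = 4.

```python
from math import comb

def hypergeometric_pmf(x, N, M, n):
    """P{X = x} = C(M, x) C(N - M, n - x) / C(N, n)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

N, M, n = 50, 20, 10   # class size, resident students, batch size
total = sum(hypergeometric_pmf(x, N, M, n) for x in range(n + 1))
mean = sum(x * hypergeometric_pmf(x, N, M, n) for x in range(n + 1))
print(round(total, 6), round(mean, 6))
```

So, on average, 4 of the 10 students in the first batch are resident students.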
1.4.8 Poisson Distribution
The Poisson random variable is used to describe rare events, for example the number of
air crashes occurring on a Monday between 3 pm and 5 pm. The pmf of the Poisson random
variable is given as

p(x) = e^(−λ) λ^x / x!   λ > 0, x = 0, 1, 2, ...
     = 0                 otherwise

where λ is a parameter. One of the important properties of the Poisson
distribution is that the mean and variance are the same and are equal to λ. If
X1, X2, ..., Xn are iid Poisson random variables with parameter λ, then the
sum ∑ni=1 Xi follows a Poisson distribution with parameter nλ.

After correcting 50 pages of the proof of a book, the proof readers find
that there are, on the average, 2 errors per 5 pages. One would like to know the
number of pages with 0, 1, 2, 3 errors in 10000 pages of the first print of
the book. X denotes the number of errors per page; then the random variable
X follows the Poisson distribution with parameter λ = 2/5 = 0.4.
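The proof-reading example can be completed numerically: with λ = 0.4, the expected number of pages with x errors among 10000 pages is 10000 · p(x).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """p(x) = e^(-lam) lam^x / x!."""
    return exp(-lam) * lam ** x / factorial(x)

lam, pages = 0.4, 10000
expected_pages = [round(pages * poisson_pmf(x, lam)) for x in range(4)]
print(expected_pages)   # pages with 0, 1, 2, 3 errors
```

About 6703 pages are expected to be error free, with rapidly fewer pages carrying 1, 2 or 3 errors.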
1.4.9 Power series distribution
If a random variable X follows a Power series distribution, then its pmf is

P{X = x} = ax θ^x / f(θ)   x ∈ S; ax ≥ 0, θ > 0
         = 0               otherwise

where f(θ) is a generating function, i.e., f(θ) = ∑x∈S ax θ^x, θ > 0, so that
f(θ) is positive, finite and differentiable, and S is a non - empty countable
subset of the non - negative integers.
Particular cases

(i) Binomial Distribution

Let θ = p/(1 − p), f(θ) = (1 + θ)^n and S = {0, 1, 2, 3, ..., n}, a set of non -
negative integers. Then

f(θ) = ∑x∈S ax θ^x
(1 + θ)^n = ∑nx=0 ax θ^x  ⇒  ax = C(n, x)

Pp{X = x} = C(n, x) [p/(1 − p)]^x / [1 + p/(1 − p)]^n
          = C(n, x) p^x q^(n − x)   x = 0, 1, 2, ..., n, q = 1 − p
          = 0                       otherwise
(ii) Negative Binomial Distribution

Let θ = p/(1 + p), f(θ) = (1 − θ)^(−n) and S = {0, 1, 2, ...}, 0 < θ < 1 and
n ∈ I+. Now

f(θ) = ∑x∈S ax θ^x
(1 − θ)^(−n) = ∑∞x=0 ax θ^x  ⇒  ax = (−1)^x C(−n, x)

Since (−1)^x C(−n, x) = C(n + x − 1, x), we have ax = C(n + x − 1, x) and

P{X = x} = C(n + x − 1, x) [p/(1 + p)]^x [1 − p/(1 + p)]^n
         = C(n + x − 1, x) p^x (1 + p)^(−x) (1 + p)^(−n)
         = C(n + x − 1, x) p^x (1 + p)^(−(n + x))
         = C(−n, x) (−p)^x (1 + p)^(−(n + x))   x = 0, 1, 2, ...
(iii) Poisson distribution
Let f(θ) = e^θ and S = {0, 1, 2, ...}. Now

f(θ) = ∑x∈S ax θ^x
e^θ = ∑∞x=0 ax θ^x  ⇒  ax = 1/x!

P{X = x} = ax θ^x / f(θ)
         = (1/x!) θ^x e^(−θ)
         = e^(−θ) θ^x / x!   x = 0, 1, 2, ...
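The power-series representation of the Binomial case can be verified numerically: with θ = p/(1 − p), f(θ) = (1 + θ)^n and ax = C(n, x), the ratio ax θ^x / f(θ) reproduces the usual Binomial pmf. The values n = 6 and p = 0.3 are example choices.

```python
from math import comb

n, p = 6, 0.3
theta = p / (1 - p)
f = (1 + theta) ** n

for x in range(n + 1):
    power_series = comb(n, x) * theta ** x / f
    direct = comb(n, x) * p ** x * (1 - p) ** (n - x)
    # the two forms must agree term by term (up to rounding)
    assert abs(power_series - direct) < 1e-12

print("power series form matches the Binomial pmf")
```

The same kind of check works for the Negative Binomial and Poisson cases with their respective f(θ) and ax.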
1.5 Continuous Distributions
A continuous random variable can be used to describe random phenomena in which
the variable X of interest can take any value x in some interval, with P{X = x} = 0
for every x in that interval.
1.5.1 Uniform Distribution
A random variable X is uniformly distributed on an interval [a, b] if its pdf is
given by

pa,b(x) = 1/(b − a)   a ≤ x ≤ b
        = 0           otherwise

Note that P{x1 < X < x2} = F(x2) − F(x1) = (x2 − x1)/(b − a) is proportional to the
length of the interval, for all x1 and x2 satisfying a ≤ x1 ≤ x2 ≤ b. If a random
phenomenon has complete unpredictability, then it can be described by the Uniform
distribution.
1.5.2 Normal Distribution
A random variable X with mean μ (−∞ < μ < ∞) and variance σ² (> 0) has a Normal distribution if it has the pdf

    p_{μ,σ²}(x) = [1/(σ√(2π))] e^{−(1/(2σ²))[x − μ]²},  −∞ < x < ∞
                = 0 otherwise

A quantity formed as the sum of many component times of a random experiment can be thought of as Normally distributed. For example, the time to assemble a product, which is the sum of the times required for each assembly operation, may be described by a Normal random variable.
1.5.3 Exponential Distribution
A random variable X is said to be Exponentially distributed with parameter θ > 0 if its pdf is given by

    p_θ(x) = θe^{−θx},  x > 0
           = 0 otherwise
The value of the intercept on the vertical axis is always equal to the value of θ. Note that the pdfs for different values of θ eventually intersect, since a larger θ gives a higher intercept but a faster decay; the Exponential distribution has its mode at the origin. The mean and the standard deviation are equal in the Exponential distribution (both equal 1/θ). In a random phenomenon, the times between independent events which have the memoryless property may appropriately follow an Exponential random variable. For example, the times between the arrivals of a large number of customers who act independently of each other may be adequately fitted by an Exponential distribution.
1.5.4 Gamma Distribution
A function used to define the Gamma distribution is the Gamma function Γ(α). A random variable X follows a Gamma distribution if

    p_{α,θ}(x) = [θ^α/Γ(α)] e^{−θx} x^{α−1},  x > 0, α > 0, θ > 0
               = 0 otherwise

where α is called the shape parameter and θ is called the scale parameter. Σᵢ₌₁ⁿ Xᵢ ~ G(n, 1/θ) if each Xᵢ ~ exp(1/θ), i.e., the sum of n iid Exponential variables with mean 1/θ is Gamma distributed. The cumulative distribution function F(x) = P{X ≤ x} of the random variable X is given by

    F(x) = [1/Γ(α)] ∫₀ˣ θ^α t^{α−1} e^{−θt} dt,  x > 0
         = 0 otherwise
1.5.5 Erlang Distribution
The pdf of the Gamma distribution becomes the Erlang distribution of order k when α = k, an integer. When α = k, a positive integer, and the rate parameter is taken as kθ (so that the mean is 1/θ), the cumulative distribution function F(x) is given by

    F(x) = 1 − Σᵢ₌₀^{k−1} e^{−kθx} (kθx)ⁱ/i!,  x > 0
         = 0 otherwise

where the sum consists of Poisson terms with mean kθx.
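A small numerical sketch (assuming, as above, an Erlang of order k with rate kθ; the function names are illustrative) can confirm that the Poisson-sum form of F(x) agrees with direct numerical integration of the corresponding Gamma pdf:

```python
import math

def erlang_cdf(x, k, theta):
    """F(x) = 1 - sum_{i=0}^{k-1} e^{-k*theta*x} (k*theta*x)^i / i!"""
    lam = k * theta
    return 1 - sum(math.exp(-lam * x) * (lam * x) ** i / math.factorial(i)
                   for i in range(k))

def erlang_pdf(t, k, theta):
    """Gamma pdf with shape k and rate k*theta."""
    lam = k * theta
    return lam ** k * t ** (k - 1) * math.exp(-lam * t) / math.factorial(k - 1)

# midpoint-rule integration of the pdf should reproduce the cdf
k, theta, x = 3, 0.5, 2.0
steps = 20000
h = x / steps
numeric = sum(erlang_pdf((i + 0.5) * h, k, theta) * h for i in range(steps))
assert abs(erlang_cdf(x, k, theta) - numeric) < 1e-6
```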
1.5.6 Weibull Distribution
A random variable X has a Weibull distribution if it has pdf

    p_{ν,α,β}(x) = (β/α) [(x − ν)/α]^{β−1} exp[−((x − ν)/α)^β],  x ≥ ν
                 = 0 otherwise

The three parameters of the Weibull distribution are ν (−∞ < ν < ∞), which is the location parameter, α (> 0), which is the scale parameter, and β (> 0), which is the shape parameter. When ν = 0, the Weibull pdf becomes

    p_{α,β}(x) = (β/α) (x/α)^{β−1} exp[−(x/α)^β],  x ≥ 0
               = 0 otherwise

When ν = 0 and β = 1, the Weibull distribution is reduced to the Exponential distribution with pdf

    p_α(x) = (1/α) e^{−x/α},  x ≥ 0
           = 0 otherwise
1.5.7 Triangular Distribution
A random variable X has a Triangular distribution if its pdf is given by

    p_{a,b,c}(x) = 2(x − a)/[(b − a)(c − a)],  a ≤ x ≤ b
                 = 2(c − x)/[(c − b)(c − a)],  b < x ≤ c
                 = 0 otherwise

where a ≤ b ≤ c. The mode occurs at x = b. Since a ≤ b ≤ c, it follows that (2a + c)/3 ≤ E[X] ≤ (a + 2c)/3. The mode is used more often than the mean to characterize the Triangular distribution.
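The mean of the Triangular distribution is E[X] = (a + b + c)/3 (a standard fact, not derived above), so the stated bounds follow directly from a ≤ b ≤ c. A minimal check with illustrative values:

```python
a, b, c = 2.0, 5.0, 11.0            # any values with a <= b <= c
mean = (a + b + c) / 3              # E[X] for the Triangular distribution
# the bounds stated in the text: (2a + c)/3 <= E[X] <= (a + 2c)/3
assert (2 * a + c) / 3 <= mean <= (a + 2 * c) / 3
```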
1.5.8 Empirical Distribution
An empirical distribution may be either continuous or discrete in nature. It is used to establish a statistical model for the available data whenever there is a discrepancy from the aimed distribution, or whenever one is unable to arrive at a known distribution.

(a) Empirical Continuous Distributions
The times taken to install 100 machines are collected. The data are given in Table 1.1, which gives the number of machines together with the time taken. For example, 30 machines were installed between 0 and 1 hour, 25 between 1 and 2 hours, 20 between 2 and 3 hours and 25 between 3 and 4 hours. X denotes the time taken to install a machine.
Table 1.1 Distribution of the time taken to install the machines

    Duration (hours)   Frequency   p(x)   F(x) = P{X ≤ x}
    0 ≤ x ≤ 1             30       .30         .30
    1 < x ≤ 2             25       .25         .55
    2 < x ≤ 3             20       .20         .75
    3 < x ≤ 4             25       .25        1.00
(b) Empirical Discrete Distributions
At the end of the day, the numbers of shipments on the loading docks of an export company are observed as 0, 1, 2, 3, 4 and 5 with frequencies 23, 15, 12, 10, 25 and 15 respectively. Let X be the number of shipments on the loading docks of the company at the end of the day. Then X is a discrete random variable which takes the values 0, 1, 2, 3, 4 and 5, with the distribution given in Table 1.2. Figure 1.1 is the Histogram of the number of shipments on the loading docks of the company.
Table 1.2 Distribution of number of shipments

    Number of shipments x   Frequency   P{X = x}   F(x) = P{X ≤ x}
            0                  23          .23           .23
            1                  15          .15           .38
            2                  12          .12           .50
            3                  10          .10           .60
            4                  25          .25           .85
            5                  15          .15          1.00
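The cumulative column of Table 1.2 is just a running sum of the relative frequencies; a minimal sketch:

```python
from itertools import accumulate

# shipment counts from Table 1.2
freqs = {0: 23, 1: 15, 2: 12, 3: 10, 4: 25, 5: 15}
n = sum(freqs.values())                       # 100 observations
pmf = {x: f / n for x, f in freqs.items()}
cdf = dict(zip(freqs, accumulate(pmf.values())))
assert abs(cdf[2] - 0.50) < 1e-9 and abs(cdf[5] - 1.00) < 1e-9
```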
Figure 1.1 Histogram of number of shipments (frequency versus number of shipments, 0 to 5)
1.6 Diagnosis of distributions
If the variable of interest occurs at a roughly constant rate over time (the memoryless property), an Exponential distribution is apt to fit the data. If the value of the variable of an item can be either positive or negative about a central value, then a Normal distribution is appropriate for the data. When the variable of interest seems to follow the Normal probability distribution but the random variable is restricted to be greater than or less than a certain value, the truncated Normal distribution will be adequate to fit the data. The Gamma and Weibull distributions are also used to describe such data, and the Exponential distribution is a special case of both the Gamma and Weibull distributions. The differences between the Exponential, Gamma and Weibull distributions involve the locations of the modes of the pdfs and the shapes of their tails for large and small times. The Exponential distribution has its mode at the origin, but the Gamma and Weibull distributions have their modes at some point (≥ 0) which is a function of the parameter values selected. The tail of the Gamma distribution is long, like that of an Exponential distribution, while the tail of the Weibull distribution may decline more rapidly or less rapidly than that of an Exponential distribution. In practice, if there are more large values of the variable than an Exponential distribution can account for, a Weibull distribution may provide a better fit to the data.
Illustration 1.6.1
Sixteen equipments were produced and placed on test; Table 1.3 gives the lengths of the time intervals between failures, in hours.

Table 1.3 Equipment time between failures

    Equipment number        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    Time between failures  19 12 16  1 15  5 10  1 46  7 33 25  4  9  1 10
For the sake of simplicity in processing the data, one can set up the ordered set as given below:

Table 1.4 Ordered set of equipment time between failures

    Order number            1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    Time between failures   1  1  1  4  5  7  9 10 10 12 15 16 19 25 33 46
On this basis, one may construct a Histogram to judge the pattern of the data in Table 1.4. An approximate value of the class interval can be determined from the formula

    Δt = (maximum value − minimum value) / (1 + 3.3 log₁₀ N)

where the maximum and minimum are the values in the ordered set and N is the total number of items of the order statistics. In this case the maximum value is 46, the minimum value is 1 and N is 16. Thus Δt = 45/(1 + 3.3 log₁₀ 16) = 9.05 ≈ 10 = width of the class interval.

Table 1.5 Frequency distribution

    Time interval         0 - 10   10 - 20   20 - 30   30 - 40   40 - 50
    Number of equipment        9         4         1         1         1
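The interval-width formula above is of Sturges' type; a sketch applying it to the Table 1.3 data:

```python
import math

def class_width(values):
    """(max - min) / (1 + 3.3 * log10(N)): approximate class-interval width."""
    return (max(values) - min(values)) / (1 + 3.3 * math.log10(len(values)))

times = [19, 12, 16, 1, 15, 5, 10, 1, 46, 7, 33, 25, 4, 9, 1, 10]
w = class_width(times)       # 45 / (1 + 3.3 log10 16), about 9.05
assert abs(w - 9.05) < 0.01  # rounded up to 10 in the text
```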
A Histogram is drawn based on the frequency distribution in Table 1.5 and is given in Figure 1.2.

Figure 1.2 Histogram of time to failures (number of equipment versus time interval; frequencies 9, 4, 1, 1, 1)

The Histogram reveals that the distribution could be Negative Exponential, or the right portion of the Normal distribution. Assume the time to failure follows an Exponential distribution of the form

    p_θ(x) = θe^{−θx},  θ > 0, x > 0
           = 0 otherwise
How far the assumption is valid has to be verified. The validity of the assumption is tested by the χ² test of goodness of fit.

Table 1.6 Distribution of time to failures

    Interval    p_i     Expected frequency E   Observed frequency O
    0 - 10     .5262          8.41                     9
    10 - 20    .2493          3.98                     4
    20 - 30    .1181          1.886                    1
    30 - 40    .0559           .894                    1
    40 - 50    .0265           .454                    1
where p_i = ∫_{xᵢ}^{xᵢ₊₁} θe^{−θx} dx = e^{−θxᵢ} − e^{−θxᵢ₊₁}, xᵢ = 0, 10, 20, ..., 50. If the cell frequencies are less than 5, cells are pooled so that each is 5 or more. One then gets two classes only, i.e., the expected frequencies are equal to 8 each, and the corresponding observed frequencies are 9 and 7 respectively. With only two classes, the χ² test of goodness of fit fails to test the validity of the assumption that the sample data come from an Exponential distribution with parameter θ = 1/13.38 = .0747 = failure rate per unit hour, where the mean life time of the equipments = 214/16 = 13.38 hours. To test the validity of the assumption that the time to failure follows an Exponential distribution, consider the likelihood function of the cell frequencies o₁ = 9 and o₂ = 7:

    L = [n!/(o₁! o₂!)] (E₁/n)^{o₁} (E₂/n)^{o₂},  o₁ + o₂ = n
      = 0 otherwise

Under H₀ the likelihood function follows a Binomial probability law b(16, p), where p = E₁/n. To test the hypothesis H₀: the fit is the best one vs H₁: the fit is not the best one is equivalent to testing H₀: p ≤ .5 vs H₁: p > .5. The UMP level α = .05 test is given by

    φ(x) = 1    if x > 11
         = .17  if x = 11
         = 0 otherwise

The observed value is 9, which is less than 11. There is no evidence to reject the hypothesis H₀: the data come from an Exponential distribution at the 5% level of significance. Thus the time to failure of the equipments follows an Exponential distribution. One may conclude that on average the equipment would operate for 13.38 hours without failure.
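The cell probabilities p_i of Table 1.6 follow from the fitted Exponential with θ = 1/13.38; a sketch recomputing them:

```python
import math

theta = 16 / 214                  # 1/13.38, failure rate per hour
edges = [0, 10, 20, 30, 40, 50]
p = [math.exp(-theta * a) - math.exp(-theta * b)
     for a, b in zip(edges, edges[1:])]
expected = [16 * pi for pi in p]  # expected cell frequencies
assert abs(p[0] - 0.5262) < 0.001
assert abs(expected[1] - 3.98) < 0.05
```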
1.7 Quantile - Quantile plot
The construction of Histograms and the recognition of a distributional shape are necessary ingredients for selecting a family of distributions to represent a sample of data. A Histogram, however, is not useful for evaluating the fit of the chosen distribution. When there is a small number of data points (≤ 30), a Histogram can be rather ragged. Further, perception of the fit depends on the widths of the Histogram intervals. Even if the intervals are well chosen, grouping the data into cells makes it difficult to compare a Histogram to a continuous pdf. A quantile - quantile (q - q) plot is a useful tool for evaluating distribution fit that does not suffer from these problems.

If X is a random variable with cumulative distribution function F(x), then the q - quantile of X is that value y such that F(y) = P{X ≤ y} = q, for 0 < q < 1. When F(x) has an inverse, y = F⁻¹(q). Let x₁, x₂, ..., xₙ be sample observations of X. Order the observations from the smallest to the largest and denote these as y_j, j = 1 to n, where y₁ ≤ y₂ ≤ ... ≤ yₙ. One can denote by j the rank or order number; therefore j = 1 for the smallest and j = n for the largest. The q - q plot is based on the fact that y_j is an estimate of the (j − 1/2)/n quantile of X, i.e., y_j is approximately F⁻¹((j − 1/2)/n).
A distribution with cumulative distribution function F(x) is a possible representation of the random variable X. If F(x) is a member of an appropriate family of distributions, then a plot of y_j versus F⁻¹((j − 1/2)/n) will be approximately a straight line. If F(x) is from an appropriate family of distributions and also has appropriate parameter values, then the line will have slope 1. On the other hand, if the assumed distribution is inappropriate, the points will deviate from a straight line in a systematic manner. The decision whether to accept or reject some hypothesized distribution is subjective.

In the construction of a q - q plot, the following should be borne in mind:
(i) The observed values will never fall exactly on a straight line.
(ii) The ordered values are not independent, since they have been ranked.
(iii) The variances of the extremes are much higher than the variances in the middle of the plot. Greater discrepancies can be accepted at the extremes; the linearity of the points in the middle of the plot is more important than the linearity at the extremes.
Illustration 1.7.1
A sample of 20 repair times of an electronic watch was considered. The repair time X is a random variable; the values are in seconds. The values are arranged in increasing order of magnitude in Table 1.7.

Table 1.7 Repair times of an electronic watch

    j  Value    j  Value    j  Value    j  Value
    1  88.54    6  88.82   11  88.98   16  89.26
    2  88.56    7  88.85   12  89.02   17  89.30
    3  88.60    8  88.90   13  89.08   18  89.35
    4  88.64    9  88.95   14  89.18   19  89.41
    5  88.75   10  88.97   15  89.25   20  89.45
Table 1.8 Normal quantiles

     j   (j − 1/2)/20   z_j = Φ⁻¹((j − 1/2)/20)   x_j = 88.993 + .08 z_j
     1      .025              −1.96                      88.84
     2      .075              −1.41                      88.88
     3      .125              −1.13                      88.90
     4      .175              −0.93                      88.92
     5      .225              −0.75                      88.94
     6      .275              −0.60                      88.95
     7      .325              −0.45                      88.96
     8      .375              −0.31                      88.97
     9      .425              −0.18                      88.98
    10      .475              −0.06                      88.99
    11      .525                .06                      89.00
    12      .575                .18                      89.01
    13      .625                .31                      89.02
    14      .675                .45                      89.03
    15      .725                .60                      89.04
    16      .775                .75                      89.05
    17      .825                .93                      89.07
    18      .875               1.13                      89.08
    19      .925               1.41                      89.11
    20      .975               1.96                      89.15

The ordered observations are then plotted versus F⁻¹((j − 1/2)/n) for j = 1, 2, ..., 20, where F(·) is the cumulative distribution function of the Normal random variable X with mean 88.993 seconds and standard deviation .08 seconds, to obtain the q - q plot. The plotted values are shown in Figure 1.3. The general perception of a straight line is quite clear in the q - q plot, supporting the hypothesis of a Normal distribution.
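The x_j column of Table 1.8 can be reproduced with the standard library's NormalDist (a sketch; the mean and standard deviation are those fitted in the text):

```python
from statistics import NormalDist

fitted = NormalDist(mu=88.993, sigma=0.08)
n = 20
# x_j = F^{-1}((j - 1/2)/n) for the fitted normal distribution
x = [fitted.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
assert abs(x[0] - 88.84) < 0.01      # j = 1 row of Table 1.8
assert abs(x[19] - 89.15) < 0.01     # j = 20 row
```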
Figure 1.3 q - q plot of the repair times (Normal quantile y_j versus time x_j)

Note: The diagnosis of statistical distributions for real life problems is not exact; at best the fitted distributions represent reasonable approximations.
Problems
1.1 The mean and variance of the number of defective items drawn randomly one by one with replacement from a lot are found to be 10 and 6 respectively. The distribution of the number of defective items is:
(a) Poisson with mean 10
(b) Binomial with n = 25 and p = 0.4
(c) Normal with mean 10 and variance 6
(d) None of the above
1.2 If X is a Poisson random variate with mean 3, then P{|X − 3| < 1} will be:
(a) (1/2)e⁻³  (b) 3e⁻³  (c) 4.5e⁻³  (d) 27e⁻³
1.3 Let U₍₁₎, U₍₂₎, ..., U₍ₙ₎ be the order statistics of a random sample U₁, U₂, ..., Uₙ of size n from the Uniform (0, 1) distribution. Then the conditional distribution of U₁ given U₍ₙ₎ = u₍ₙ₎ is given by:
(a) Uniform on (0, u₍ₙ₎)
(b) P{U₁ = u₍ₙ₎} = 1/n, and with probability (n − 1)/n, U₁ is uniformly distributed over (0, u₍ₙ₎)
(c) Beta(1/n, (n − 1)/n)
(d) Uniform (0, 1)
1.4 A biased coin is tossed 4 times or until a head turns up, whichever occurs earlier. The distribution of the number of tails turning up is:
(a) Binomial (b) Geometric (c) Negative Binomial (d) Hypergeometric
1.5 If X and Y are independent Exponential random variables with the same mean θ, then the distribution of min(X, Y) is:
(a) Exponential with mean θ/2
(b) Exponential with mean θ
(c) not Exponential
(d) Exponential with mean 2θ
1.6 The χ² goodness of fit test is based on the assumption that the character under study is:
(a) Normal (b) Non - Normal (c) any distribution (d) no assumption is required
1.7 The exact distribution of the χ² goodness of fit statistic, when each experimental unit is classified into one of k categories in a random sample of size n, depends on the:
(a) Hypergeometric distribution
(b) Normal distribution
(c) Multinomial distribution
(d) Binomial distribution
1.8 If X₁ ~ b(n₁, θ₁), X₂ ~ b(n₂, θ₂) and X₁, X₂ are independent, then the sum of the variates X₁ + X₂ is distributed as:
(a) Hypergeometric distribution
(b) Binomial distribution
(c) Poisson distribution
(d) None of the above
1.9 If X₁ ~ b(n₁, θ), X₂ ~ b(n₂, θ) and X₁, X₂ are independent, then the sum of the variates X₁ + X₂ is distributed as:
(a) Hypergeometric distribution
(b) Binomial distribution
(c) Poisson distribution
(d) None of the above
1.10 If X₁ ~ P(θ₁), X₂ ~ P(θ₂) and X₁, X₂ are independent, then the sum of the variates X₁ + X₂ is distributed as:
(a) Hypergeometric distribution
(b) Binomial distribution
(c) Poisson distribution
(d) None of the above
1.11 The skewness of a Binomial distribution will be zero if:
(a) p < .5  (b) p > .5  (c) p = .5  (d) p ≠ .5
1.12 If the sample size n = 2, the Student's t - distribution reduces to:
(a) Normal distribution
(b) F - distribution
(c) χ² - distribution
(d) Cauchy distribution
1.13 The reciprocal property of the F_{n₁,n₂} distribution can be expressed as:
(a) F_{n₂,n₁}(1 − α) = 1/F_{n₁,n₂}(α)
(b) P{F_{n₁,n₂} ≥ c} = P{F_{n₂,n₁} ≤ 1/c}
(c) F_{n₂,n₁}(1 − α/2) = 1/F_{n₁,n₂}(α/2)
(d) All the above
1.14 The distribution for which the moment generating function is not useful in finding the moments is the:
(a) Binomial distribution
(b) Negative Binomial distribution
(c) Hypergeometric distribution
(d) Geometric distribution
1.15 The probability of selecting a unit from a population of N units by the simple random sampling technique follows a:
(b) Binomial distribution
(c) Geometric distribution
(d) discrete Uniform distribution
1.16 A production process is a sequence of Bernoulli trials; the number x of defective units in a sample of n units follows a:
(a) Bernoulli distribution
(b) Binomial distribution
(c) Multinomial distribution
(d) Hypergeometric distribution
1.17 A random variable X is related to a sequence of Bernoulli trials in which the number of trials needed to achieve the first success is (x + 1); then the distribution of X
is :
(a) Bernoulli distribution
(b) Binomial distribution
(c) Multinomial distribution
(d) Geometric distribution
1.18 If X₁, X₂, ..., Xₙ are iid Geometric variables, then Σᵢ₌₁ⁿ Xᵢ follows a:
(a) Negative Binomial distribution
(b) Binomial distribution
(c) Multinomial distribution
(d) Geometric distribution
1.19 A random variable X is related to a sequence of Bernoulli trials in which x failures precede the nth success in (x + n) trials; this is a:
(a) Binomial distribution
(b) Multinomial distribution
(c) Negative Binomial distribution
(d) Geometric distribution
1.20 If a random experiment has only two mutually exclusive outcomes of a Bernoulli
trial, then the random variable leads to:
(a) Binomial distribution
(b) Multinomial distribution
(c) Negative Binomial distribution
(d) Geometric distribution
1.21 A box contains N balls, M of which are white and N − M are red. If X denotes the number of white balls in a sample of n balls drawn with replacement, then X is a:
(a) Binomial variate
(b) Bernoulli variate
(c) Negative Binomial variate
(d) Hypergeometric variate
1.22 The number of independent events that occur in a fixed amount of time may
follow:
(a) Exponential distribution
(b) Poisson distribution
(c) Geometric distribution
(d) Gamma distribution
1.23 A power series distribution

    P{X = x} = aₓθˣ/f(θ),  x ∈ S, aₓ ≥ 0
             = 0 otherwise

where f(θ) = (1 + θ)ⁿ, θ = p/(1 − p) and S = {0, 1, 2, ..., n}. Then the random variable X has a
(a) Geometric distribution
(b) Bernoulli distribution
(c) Binomial distribution
(d) Negative Binomial distribution
1.24 The given probability function p(x) = 2/3ˣ⁺¹ for x = 0, 1, 2, 3, ..., represents a:
(a) Negative Binomial distribution
(b) Binomial distribution
(c) Bernoulli distribution
(d) Geometric distribution
1.25 Dinesh Kumar receives 2, 2, 4 and 4 telephone calls on 4 randomly selected days.
Assuming that the telephone calls follow Poisson distribution, the estimate of the
number of telephone calls in 8 days is:
(a) 12 (b) 3 (c) 24 (d) none of the above
1.26 The exact distribution of the χ² goodness of fit statistic, when each experimental unit is classified into one of two categories in a random sample of size n, depends on the:
(a) Hypergeometric distribution
(b) Normal distribution
(c) Multinomial distribution
(d) Binomial distribution
1.27 The pmf of a random variable X is

    p(x) = Σ_{k=0}^∞ (−1)ᵏ C(k + x, k) θˣ⁺ᵏ,  x = 0, 1, ...
         = 0 otherwise

It is known as the:
(a) Binomial (b) Negative Binomial (c) Poisson (d) Geometric
2. CRITERIA OF POINT ESTIMATION
2.1 Introduction
In real life applications, determining an appropriate distribution from a random sample is a major task. Faulty assumptions about distributions will lead to misleading recommendations. Once a family of distributions indexed by a parameter has been selected, the next step is to estimate the parameters of the distribution. The criteria of point estimators for many standard distributions are described in this chapter.

The set of all admissible values of the parameters of a distribution is called the parameter space Ω. Any member of the parameter space is called a parameter. For example, suppose a random variable X is assumed to follow a Normal distribution with mean μ and variance σ². The parameter space is Ω = {(μ, σ²) | −∞ < μ < ∞, 0 < σ² < ∞}. Suppose a random sample X₁, X₂, X₃, ..., Xₙ is taken on X. One seeks a statistic T = t(X) from the sample X₁, X₂, ..., Xₙ which gives the best value for the parameter. The particular value of the statistic, t(x) = x̄ based on the observed values x₁, x₂, ..., xₙ, is called an estimate. If the statistic T = X̄ is used to estimate the unknown parameter μ, then the sample mean X̄ is called an estimator of μ. Thus an estimator is a rule or a procedure to estimate the value of μ, and the numerical value x̄ is called an estimate of μ.
2.2 Point Estimator
Let X₁, X₂, ..., Xₙ be n independent identically distributed (iid) random variables drawn from a population with probability density function (pdf) p_θ(x), θ ∈ Ω. The statistic T = t(X) is said to be a point estimator of θ if the function t(X) maps each sample point to a single point of the parameter space Ω.

2.3 Problems of Point Estimation

The problems involved in point estimation are:
(i) to select or choose a statistic T = t(X),
(ii) to find the distribution function of the statistic T = t(X), and
(iii) to verify that the selected statistic satisfies the criteria of point estimation.
2.4 Criteria of the Point Estimation
The criteria of the point estimation are
(i) Consistency
(ii) Unbiasedness
(iii) Sufficiency and
(iv) Efficiency
2.5 Consistency
Consistency is a convergence property of an estimator. It is an asymptotic or large
sample size property. Let X₁, X₂, ..., Xₙ be an iid random sample drawn from a population with common distribution P_θ, θ ∈ Ω. An estimator T = t(X) is consistent for θ if for every ε > 0 and for each fixed θ ∈ Ω, P_θ{|T − θ| > ε} → 0 as n → ∞, i.e., T →P θ as n → ∞ for fixed θ ∈ Ω.

Example 2.1 Let X₁, X₂, ..., Xₙ be a random sample drawn from a Normal population with mean μ and known variance σ². The statistic T = X̄ is chosen as an estimator of the parameter μ, and X̄ ~ N(μ, σ²/n). To test the consistency of the estimator, consider, for every ε > 0 and fixed μ,

    P{|X̄ − μ| > ε} = 1 − P{|X̄ − μ| < ε}
                   = 1 − P{μ − ε < X̄ < μ + ε}
                   = 1 − P{−ε√n/σ < Z < ε√n/σ},  where Z = √n(X̄ − μ)/σ ~ N(0, 1)
                   → 0 as n → ∞

since P{−ε√n/σ < Z < ε√n/σ} → 1 as n → ∞. Thus X̄ is a consistent estimator of μ.

Example 2.2 Let X₁, X₂, ..., Xₙ be a random sample drawn from a Cauchy population with location parameter θ. Consider, for every ε > 0 and fixed θ,

    P{|X̄ − θ| > ε} = 1 − P{θ − ε < X̄ < θ + ε}
                   = 1 − ∫_{θ−ε}^{θ+ε} (1/π) · 1/[1 + (x − θ)²] dx
                       since X̄ follows the Cauchy distribution with location θ and scale 1
                   = 1 − (1/π) ∫_{−ε}^{ε} 1/(1 + z²) dz,  where z = x − θ
                   = 1 − (1/π) [tan⁻¹(z)]_{−ε}^{ε}
                   = 1 − (2/π) tan⁻¹(ε),  since tan⁻¹(−ε) = −tan⁻¹(ε)

Thus P{|X̄ − θ| > ε} does not tend to 0 as n → ∞, i.e., X̄ does not converge in probability to θ as n → ∞. For a Cauchy population the sample mean X̄ is not a consistent estimator of the parameter θ.
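A simulation sketch contrasting the two examples (seed, μ and sample size are illustrative): for Normal data the sample mean settles near μ, while for Cauchy data it remains as dispersed as a single observation.

```python
import math
import random

random.seed(1)

def sample_mean(draw, n):
    return sum(draw() for _ in range(n)) / n

mu, n = 5.0, 100000
# Normal(mu, 1): X-bar concentrates around mu (Example 2.1)
xbar_normal = sample_mean(lambda: random.gauss(mu, 1.0), n)
assert abs(xbar_normal - mu) < 0.05

# Cauchy with location mu: X-bar has the same Cauchy law for every n,
# so no analogous assertion would be safe (Example 2.2)
xbar_cauchy = sample_mean(
    lambda: mu + math.tan(math.pi * (random.random() - 0.5)), n)
```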
2.6 Sufficient condition for consistency
Theorem 2.1 If {Tₙ}ₙ₌₁^∞ is a sequence of estimators such that E[Tₙ] → θ and V[Tₙ] → 0 as n → ∞, then the statistic Tₙ is a consistent estimator of the parameter θ.

Proof Consider

    E[Tₙ − θ]² = E[(Tₙ − E[Tₙ] + E[Tₙ] − θ)²]
               = E[(Tₙ − E[Tₙ])²] + {E[Tₙ] − θ}²  since E[Tₙ − E[Tₙ]] = 0
               = V[Tₙ] + {E[Tₙ] − θ}²

By Chebychev's inequality,

    P{|Tₙ − θ| > ε} ≤ (1/ε²) E[Tₙ − θ]²
                    ≤ (1/ε²) [V[Tₙ] + {E[Tₙ] − θ}²]
                    → 0 as n → ∞

since V[Tₙ] → 0 and E[Tₙ] → θ as n → ∞. Hence Tₙ is a consistent estimator of θ.

Remark 2.2 The conditions are only sufficient, not necessary. If {Xₙ}ₙ₌₁^∞ is a sequence of iid random variables from a population with finite mean μ = E[X], then X̄ converges to μ in probability for each fixed μ; this is Khintchine's Weak Law of Large Numbers. That is, the sample mean X̄ is a consistent estimator of the population mean μ whenever that mean finitely exists, without requiring the condition V[X̄] → 0 as n → ∞ for every fixed μ ∈ Ω. Thus consistency already follows from the existence of the expectation of the statistic; the assumption of finite variance of the statistic is not needed.
For illustration, the Cauchy pdf is

    p(x) = (1/π) · 1/(1 + x²),  −∞ < x < ∞
         = 0 otherwise

The mean E[X] does not exist finitely, i.e.,

    E[X] = (1/π) ∫_{−∞}^{∞} x/(1 + x²) dx

is divergent. But the Cauchy principal value is

    (1/π) lim_{t→∞} ∫_{−t}^{t} x/(1 + x²) dx = (1/2π) lim_{t→∞} ∫_{−t}^{t} 2x/(1 + x²) dx
                                             = (1/2π) lim_{t→∞} [log(1 + x²)]_{−t}^{t}
                                             = (1/2π) lim_{t→∞} [log(1 + t²) − log(1 + t²)]
                                             = 0

The Cauchy principal value 0 is taken as the mean of the Cauchy distribution; the mean of the Cauchy distribution does not finitely exist. Hence for the Cauchy population, the sample mean X̄ is not a consistent estimator of the parameter θ.

Example 2.3 If X₁, X₂, ..., Xₙ is a random sample drawn from a Normal population N(0, σ²), show that (1/3n) Σ_{k=1}ⁿ Xₖ⁴ is a consistent estimator of σ⁴.

Let T = (1/3n) Σ_{k=1}ⁿ Xₖ⁴. Then

    E[T] = (1/3n) Σ_{k=1}ⁿ E[Xₖ⁴]
         = (1/3n) Σ_{k=1}ⁿ E[(Xₖ − 0)⁴]  since E[Xₖ] = 0, k = 1, 2, ...
         = (1/3n) n μ₄ = (1/3n) 3nσ⁴ = σ⁴

since μ₄ = 3σ⁴, where μ₂ₙ = 1·3·5 ··· (2n − 1) σ²ⁿ, n = 1, 2, ...

    V[T] = [1/(3n)²] Σ_{k=1}ⁿ V[Xₖ⁴]
         = [1/(3n)²] Σ_{k=1}ⁿ (E[Xₖ⁸] − {E[Xₖ⁴]}²)
         = [1/(3n)²] n [μ₈ − μ₄²]
         = (1/9n) [105σ⁸ − (3σ⁴)²]  since μ₈ = 1·3·5·7 σ⁸ = 105σ⁸
         = (1/9n) 96σ⁸ → 0 as n → ∞
Thus T is a consistent estimator of σ⁴.

Example 2.4 Let X₁, X₂, ..., Xₙ be a random sample drawn from a population with rectangular distribution on (0, θ), θ > 0. Show that (Πᵢ₌₁ⁿ Xᵢ)^{1/n} is a consistent estimator of θe⁻¹.

Let GM = (Πᵢ₌₁ⁿ Xᵢ)^{1/n}, Xᵢ > 0, i = 1, 2, ..., n. Then

    log GM = (1/n) Σᵢ₌₁ⁿ log Xᵢ

    E[log X] = (1/θ) ∫₀^θ log x dx
             = (1/θ) [x log x]₀^θ − (1/θ) ∫₀^θ dx
             = log θ − (1/θ) lim_{x→0} x log x − 1
             = log θ − 1

since lim_{x→0} x log x = lim_{x→0} (log x)/(1/x) = lim_{x→0} (1/x)/(−1/x²) = 0.

    E[(log X)²] = (1/θ) ∫₀^θ (log x)² dx
                = (1/θ) [x(log x)²]₀^θ − (1/θ) ∫₀^θ 2x (log x)/x dx
                = (log θ)² − (2/θ) [θ log θ − θ]  since lim_{x→0} x(log x)² = 0
                = (log θ)² − 2 log θ + 2

    V[log X] = (log θ)² − 2 log θ + 2 − (log θ − 1)² = 1

    V[log GM] = (1/n²) Σᵢ₌₁ⁿ V[log Xᵢ] = 1/n

and E[log GM] = log θ − 1, while V[log GM] → 0 as n → ∞, θ > 0.

Thus log GM is a consistent estimator of log θ − 1 = log(θe⁻¹), i.e., GM is a consistent estimator of θe⁻¹.
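A simulation sketch of Example 2.4 (seed and θ illustrative): the geometric mean of Uniform(0, θ) draws approaches θe⁻¹. Logs are averaged to avoid underflow of the n-fold product.

```python
import math
import random

random.seed(0)
theta, n = 3.0, 200000
# draw from (0, theta]; 1 - random() avoids log(0)
log_gm = sum(math.log(theta * (1.0 - random.random()))
             for _ in range(n)) / n
gm = math.exp(log_gm)
assert abs(gm / theta - math.exp(-1)) < 0.01   # GM/theta -> e^{-1}
```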
Example 2.5 Let X₁, X₂, ..., Xₙ be an iid random sample drawn from a population with E[Xᵢ] = μ and V[Xᵢ] = σ², i = 1, 2, ..., n. Prove that [2/(n(n + 1))] Σᵢ₌₁ⁿ iXᵢ is a consistent estimator of μ.

    E[Σᵢ₌₁ⁿ iXᵢ] = E[X₁ + 2X₂ + ... + nXₙ]
                 = μ + 2μ + ... + nμ
                 = μ[1 + 2 + ... + n] = μ n(n + 1)/2

    [2/(n(n + 1))] E[Σᵢ₌₁ⁿ iXᵢ] = μ

    V[Σᵢ₌₁ⁿ iXᵢ] = Σᵢ₌₁ⁿ i² V[Xᵢ] = σ² Σᵢ₌₁ⁿ i² = σ² n(n + 1)(2n + 1)/6

    V[[2/(n(n + 1))] Σᵢ₌₁ⁿ iXᵢ] = (2σ²/3) · (2n + 1)/(n(n + 1)) → 0 as n → ∞

Thus [2/(n(n + 1))] Σᵢ₌₁ⁿ iXᵢ is a consistent estimator of μ.
Consistent estimator is not unique

Example 2.6 Let T = max_{1≤i≤n}{Xᵢ} be the nth order statistic of a random sample of size n drawn from a population with a Uniform distribution on the interval (0, θ). The pdf of T is

    p_θ(t) = n tⁿ⁻¹/θⁿ,  0 < t < θ, θ > 0
           = 0 otherwise

    E[T] = (n/θⁿ) ∫₀^θ tⁿ dt = nθ/(n + 1)

    E[T²] = nθ²/(n + 2),  V[T] = nθ²/[(n + 2)(n + 1)²]

Thus E[T] → θ and V[T] → 0 as n → ∞, so T is a consistent estimator of θ. Also E[((n + 1)/n) T] = θ and V[((n + 1)/n) T] = θ²/(n(n + 2)) → 0 as n → ∞, i.e., ((n + 1)/n) T is also a consistent estimator of θ. The statistics T and ((n + 1)/n) T are two consistent estimators of the same parameter θ. Thus a consistent estimator is not unique.
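A simulation sketch of Example 2.6, showing both T and ((n + 1)/n)T close to θ for large n (seed, θ and n illustrative):

```python
import random

random.seed(42)
theta, n = 10.0, 5000
sample = [random.uniform(0.0, theta) for _ in range(n)]
t = max(sample)               # consistent, though E[T] = n*theta/(n+1) < theta
t_adj = (n + 1) / n * t       # also consistent (and unbiased)
assert abs(t - theta) < 0.05 and abs(t_adj - theta) < 0.05
```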
2.7 Invariance Property of Consistent Estimators

If T = t(X) is a consistent estimator of θ, then aₙT, T + cₙ, and aₙT + cₙ are also consistent estimators of θ, where aₙ = 1 + k/n for fixed k, so that aₙ → 1 and cₙ → 0 as n → ∞, for every fixed θ ∈ Ω. In general, we have Theorem 2.2.

Theorem 2.2 If Tₙ = tₙ(X) is a consistent estimator of λ(θ), and Ψ(λ(θ)) is a continuous function of λ(θ), then Ψ(Tₙ) is a consistent estimator of Ψ(λ(θ)).

Proof Given that Tₙ = tₙ(X) is a consistent estimator of λ(θ), i.e., Tₙ →P λ(θ) as n → ∞, for given ε > 0 and δ > 0 there exists a positive integer N(ε, δ) such that

    P{|Tₙ − λ(θ)| < ε} > 1 − δ  ∀ n ≥ N.

Also Ψ(·) is a continuous function: for every ε₁ > 0 there exists ε > 0 such that |Tₙ − λ(θ)| < ε implies |Ψ(Tₙ) − Ψ(λ(θ))| < ε₁.

For any two events A and B, if A ⊆ B then P(A) ≤ P(B). Let A = {|Tₙ − λ(θ)| < ε} and B = {|Ψ(Tₙ) − Ψ(λ(θ))| < ε₁}; then A ⊆ B, so

    P{|Ψ(Tₙ) − Ψ(λ(θ))| < ε₁} ≥ P{|Tₙ − λ(θ)| < ε} > 1 − δ  ∀ n ≥ N,

i.e., Ψ(Tₙ) →P Ψ(λ(θ)) as n → ∞, i.e., Ψ(Tₙ) is a consistent estimator of Ψ(λ(θ)).
Example 2.7 Suppose T = t(X) is a statistic with pdf p_θ(x) for θ > 0, θ ∈ Ω. Prove that T² = t²(X) is a consistent estimator of θ² if T = t(X) is a consistent estimator of θ.

Given that T is a consistent estimator of θ, by the definition of a consistent estimator, P{|T − θ| < ε} → 1 as n → ∞, for every ε > 0, θ ∈ Ω. Consider, for 0 < ε < θ,

    P{|T − θ| < ε} = P{θ − ε < T < θ + ε}
                   = P{(θ − ε)² < T² < (θ + ε)²}
                   = P{−2θε < T² − θ² − ε² < 2θε}
                   = P{|T′ − θ²| < ε′},  where ε′ = 2θε and T′ = T² − ε²

so P{|T′ − θ²| < ε′} → 1 as n → ∞. Since ε > 0 is arbitrary, T′ = T² − ε² differs from T² by as little as desired, hence P{|T² − θ²| < ε′} → 1 as n → ∞. Thus T² is a consistent estimator of θ².
2.8 Unbiased Estimator
For a statistic g(T), if the mathematical expectation is equal to a parametric function λ(θ), then g(T) is called an unbiased estimator of λ(θ), i.e.,

    E[g(T)] = λ(θ)  ∀ θ ∈ Ω.

Otherwise, the statistic g(T) is said to be a biased estimator of λ(θ). An unbiased estimator is also called a zero-bias estimator. A statistic g(T) is said to be an asymptotically unbiased estimator if E[g(T)] → λ(θ) as n → ∞, ∀ θ ∈ Ω.
Example 2.8 A random variable X has the pdf

    p_θ(x) = 2θx      if 0 < x < 1
           = (1 − θ)  if 1 ≤ x < 2, 0 < θ < 1
           = 0 otherwise

Show that g(X), a measurable function of X, is an unbiased estimator of θ if and only if ∫₀¹ x g(x) dx = 1/2 and ∫₁² g(x) dx = 0.

Assume g(X) is an unbiased estimator of θ, i.e., E[g(X)] = θ:

    ∫₀¹ g(x) 2θx dx + ∫₁² g(x)(1 − θ) dx = θ

    θ [∫₀¹ 2x g(x) dx − ∫₁² g(x) dx] + ∫₁² g(x) dx = θ  ∀ θ, 0 < θ < 1

    ⇒ ∫₀¹ 2x g(x) dx − ∫₁² g(x) dx = 1 and ∫₁² g(x) dx = 0

    i.e., ∫₀¹ x g(x) dx = 1/2 and ∫₁² g(x) dx = 0.

Conversely, if ∫₀¹ x g(x) dx = 1/2 and ∫₁² g(x) dx = 0, then g(X) is an unbiased estimator of θ:

    E[g(X)] = ∫₀¹ 2θ x g(x) dx + ∫₁² (1 − θ) g(x) dx
            = 2θ ∫₀¹ x g(x) dx + (1 − θ) ∫₁² g(x) dx
            = 2θ · (1/2) + (1 − θ) · 0 = θ

Thus g(X) is an unbiased estimator of θ.

Example 2.9 If T denotes the number of successes in n independent and identical trials of an experiment with probability of success θ, obtain unbiased estimators of θ² and θ(1 − θ), 0 < θ < 1.
Let Xᵢ ~ b(1, θ), i = 1, 2, ..., n; then T = Σᵢ₌₁ⁿ Xᵢ ~ b(n, θ). If g(T) is the unbiased estimator of λ(θ) = θ(1 − θ), then E[g(T)] = θ(1 − θ):

    Σ_{t=0}ⁿ g(t) C(n, t) θᵗ (1 − θ)ⁿ⁻ᵗ = θ(1 − θ)

    Σ_{t=0}ⁿ g(t) C(n, t) [θ/(1 − θ)]ᵗ = θ(1 − θ)¹⁻ⁿ

Put ρ = θ/(1 − θ), so that θ = ρ/(1 + ρ) and 1 − θ = 1/(1 + ρ). Then

    Σ_{t=0}ⁿ g(t) C(n, t) ρᵗ = [ρ/(1 + ρ)] (1 + ρ)ⁿ⁻¹
                             = ρ (1 + ρ)ⁿ⁻²
                             = ρ [1 + C(n−2, 1)ρ + C(n−2, 2)ρ² + ... + ρⁿ⁻²]

Equating the coefficients of ρᵗ on both sides,

    g(t) C(n, t) = C(n − 2, t − 1)

    g(t) = [(n − 2)!/((t − 1)!(n − t − 1)!)] · [t!(n − t)!/n!]
         = t(n − t)/(n(n − 1)),  n = 2, 3, ...

Thus the unbiased estimator of θ(1 − θ) is T(n − T)/(n(n − 1)), n = 2, 3, ...
Let the unbiased estimator of θ² be given by E[g(T)] = θ²:

    Σ_{t=0}ⁿ g(t) C(n, t) [θ/(1 − θ)]ᵗ (1 − θ)ⁿ = θ²

    Σ_{t=0}ⁿ g(t) C(n, t) ρᵗ = θ²(1 − θ)⁻ⁿ = ρ²(1 + ρ)ⁿ⁻²
                             = ρ²[1 + C(n−2, 1)ρ + ... + C(n−2, t−2)ρᵗ⁻² + ... + ρⁿ⁻²]

Equating the coefficients of ρᵗ,

    g(t) C(n, t) = C(n − 2, t − 2)

    g(t) = [(n − 2)!/((t − 2)!(n − t)!)] · [t!(n − t)!/n!]
         = t(t − 1)/(n(n − 1)),  n = 2, 3, ...

Thus the unbiased estimator of θ² is

    g(T) = T(T − 1)/(n(n − 1)),  n = 2, 3, ...
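An exact check of Example 2.9 by direct summation over the Binomial pmf (n and θ illustrative):

```python
from math import comb

def expectation(g, n, theta):
    """E[g(T)] for T ~ Binomial(n, theta), by direct summation."""
    return sum(g(t) * comb(n, t) * theta ** t * (1 - theta) ** (n - t)
               for t in range(n + 1))

n, theta = 7, 0.3
e_sq = expectation(lambda t: t * (t - 1) / (n * (n - 1)), n, theta)
e_pq = expectation(lambda t: t * (n - t) / (n * (n - 1)), n, theta)
assert abs(e_sq - theta ** 2) < 1e-12           # unbiased for theta^2
assert abs(e_pq - theta * (1 - theta)) < 1e-12  # unbiased for theta(1 - theta)
```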
Example 2.10 Obtain an unbiased estimator of $\frac{1}{\theta}$, given a single sample observation from a Geometric population with pmf
$$p(x) = \begin{cases} \theta(1-\theta)^{x-1} & x = 1, 2, 3, \ldots,\ 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
If $g(X)$ is an unbiased estimator of $\frac{1}{\theta}$, then $E[g(X)] = \frac{1}{\theta}$:
$$\sum_{x=1}^\infty g(x)\,\theta(1-\theta)^{x-1} = \frac{1}{\theta} \quad\Rightarrow\quad \sum_{x=1}^\infty g(x)(1-\theta)^{x-1} = \frac{1}{\theta^2}$$
Take $q = 1-\theta$, so that $\theta = 1-q$:
$$\sum_{x=1}^\infty g(x)\,q^{x-1} = (1-q)^{-2} = 1 + 2q + 3q^2 + \cdots + x\,q^{x-1} + \cdots$$
Equating the coefficients of $q^{x-1}$ gives
$$g(x) = x, \quad x = 1, 2, 3, \ldots$$
Thus $g(X) = X$ is the unbiased estimator of $\frac{1}{\theta}$.
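A quick numerical illustration (not from the book; $\theta$ is an arbitrary choice): truncating the series $E[X] = \sum_x x\,\theta(1-\theta)^{x-1}$ reproduces $\frac{1}{\theta}$.

```python
# Truncated expectation of X under the Geometric(theta) pmf
theta = 0.4
e = sum(x * theta * (1 - theta) ** (x - 1) for x in range(1, 2000))
print(e, 1 / theta)   # both approx 2.5
```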
Unbiased estimator need not exist
Example 2.11 Assume $X \sim b(1,\theta)$, $0 < \theta < 1$. Given a single observation $x$ of $X$ from a Bernoulli population, show that no unbiased estimator of $\theta^2$ exists.
$$p(x) = \begin{cases} \theta^x(1-\theta)^{1-x} & x = 0, 1,\ 0 < \theta < 1 \\ 0 & \text{otherwise} \end{cases}$$
Suppose there were an unbiased estimator $g(X)$ of $\theta^2$. That is,
$$E[g(X)] = \theta^2 \quad\Rightarrow\quad \sum_{x=0}^1 g(x)\,\theta^x(1-\theta)^{1-x} = \theta^2$$
$$g(0)(1-\theta) + g(1)\theta = \theta^2 \quad\Rightarrow\quad \theta[g(1) - g(0)] + g(0) = \theta^2$$
The left side is a polynomial of degree at most one in $\theta$, while the right side has degree two, so the identity cannot hold for all $\theta$: equating coefficients forces $g(0) = 0$ and $g(1) = 0$, i.e., $g(x) = 0$ for $x = 0, 1$, which would make $\theta^2 = 0$ for every $\theta$. But $\theta^2$ ranges over $(0,1)$. Hence an unbiased estimator of $\theta^2$ does not exist.
Example 2.12 If $X \sim b(n,\theta)$, show that there exists no unbiased estimator of the parameter $\frac{1}{\theta}$.
Consider $E[g(X)] = \frac{1}{\theta}$:
$$\sum_{x=0}^n g(x)\frac{n!}{x!\,(n-x)!}\,\theta^x(1-\theta)^{n-x} = \frac{1}{\theta}$$
$$\sum_{x=0}^n g(x)\binom{n}{x}\left(\frac{\theta}{1-\theta}\right)^x(1-\theta)^n = \frac{1}{\theta}$$
$$\sum_{x=0}^n g(x)\binom{n}{x}\rho^x = \frac{(1+\rho)^{n+1}}{\rho}, \quad\text{where } \rho = \frac{\theta}{1-\theta}$$
As $\theta \to 0$, i.e., $\rho \to 0$, the left side tends to the finite value $g(0)$, while the right side $\frac{(1+\rho)^{n+1}}{\rho} \to \infty$. Thus no unbiased estimator of the parameter $\frac{1}{\theta}$ exists.
Unbiased estimator is unique
Example 2.13 A random sample $X$ is drawn from a Bernoulli population $b(1,\theta)$, $\theta \in \Theta = \{\frac{1}{4}, \frac{1}{2}\}$. Then there exists a unique unbiased estimator of $\theta^2$.
Let $E[g(X)] = \theta^2$, i.e.,
$$\sum_{x=0}^1 g(x)\,\theta^x(1-\theta)^{1-x} = \theta^2 \quad\Rightarrow\quad g(0)(1-\theta) + g(1)\theta = \theta^2$$
When $\theta = \frac{1}{4}$:
$$3g(0) + g(1) = \frac{1}{4} \qquad (2.1)$$
When $\theta = \frac{1}{2}$:
$$g(0) + g(1) = \frac{1}{2} \qquad (2.2)$$
Solving equations (2.1) and (2.2) for $g(0)$ and $g(1)$, one gets $g(0) = -\frac{1}{8}$ and $g(1) = \frac{5}{8}$, i.e.,
$$g(x) = \begin{cases} -\frac{1}{8} & x = 0 \\ \frac{5}{8} & x = 1 \end{cases}$$
Thus $g(X)$ is the unbiased estimator of $\theta^2$, and it is unique.
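The two linear conditions can be solved numerically as a check (an illustration, not from the book; numpy is used only for the $2\times 2$ solve):

```python
import numpy as np

# 3*g(0) + g(1) = 1/4   (theta = 1/4)
#   g(0) + g(1) = 1/2   (theta = 1/2)
A = np.array([[3.0, 1.0], [1.0, 1.0]])
b = np.array([0.25, 0.5])
g0, g1 = np.linalg.solve(A, b)
print(g0, g1)   # -0.125 and 0.625, i.e. -1/8 and 5/8
```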
Unbiased estimator is not unique
Example 2.14 Let $X_1, X_2, \ldots, X_n$ be an iid random sample drawn from a population with Poisson distribution $P(\lambda)$. Then $g_1(X) = \bar{X}$ and $g_2(X) = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$ are two unbiased estimators of $\lambda$, since for the Poisson distribution the mean and variance are both $\lambda$. Consider the statistic $g(X) = \alpha g_1(X) + (1-\alpha)g_2(X)$, $0 < \alpha < 1$. Then $E[g(X)] = \lambda$ for every such $\alpha$, so the unbiased estimator is not unique.
Example 2.15 Show that the mean $\bar{X}$ of a random sample of size $n$ drawn from a population with probability density function
$$p(x) = \begin{cases} \frac{1}{\theta}e^{-x/\theta} & 0 < x < \infty,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$
is an unbiased estimator of $\theta$ and has variance $\frac{\theta^2}{n}$.
Let $T = \sum_{i=1}^n X_i \sim G(n, \theta)$. The pdf of $T$ is
$$p(t) = \begin{cases} \frac{1}{\theta^n\,\Gamma(n)}\,e^{-t/\theta}\,t^{n-1} & 0 < t < \infty,\ \theta > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$E[T] = \int_0^\infty \frac{1}{\theta^n\,\Gamma(n)}\,e^{-t/\theta}\,t^{(n+1)-1}\,dt = \frac{\theta^{n+1}\,\Gamma(n+1)}{\theta^n\,\Gamma(n)} = n\theta$$
$$E\left[\sum_{i=1}^n X_i\right] = n\theta \quad\Rightarrow\quad E[n\bar{X}] = n\theta \quad\Rightarrow\quad E[\bar{X}] = \theta,\ \theta > 0$$
Similarly $E[T^2] = n(n+1)\theta^2$, so $V[T] = n\theta^2$.
$$\therefore\quad V[\bar{X}] = V\left[\frac{\sum_{i=1}^n X_i}{n}\right] = \frac{1}{n^2}\,V[T] = \frac{1}{n^2}\,n\theta^2 = \frac{\theta^2}{n}$$
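The two moments just derived can be checked by simulation (an illustration, not from the book; the values of $\theta$, $n$, and the number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 5, 200_000
# Each row is one sample of size n from the exponential with mean theta
xbar = rng.exponential(scale=theta, size=(reps, n)).mean(axis=1)
print(xbar.mean())   # close to theta = 2.0
print(xbar.var())    # close to theta**2 / n = 0.8
```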
Example 2.16 Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a normal population with mean zero and variance $\sigma^2$, $0 < \sigma^2 < \infty$. Show that $\frac{\sum_{i=1}^n X_i^2}{n}$ is an unbiased estimator of $\sigma^2$ and has variance $\frac{2\sigma^4}{n}$.
Define $ns^2 = \sum_{i=1}^n X_i^2$; then $Y = \frac{ns^2}{\sigma^2}$ has a $\chi^2$ distribution with $n$ degrees of freedom, i.e., $Y \sim G(\frac{n}{2}, \frac{1}{2})$:
$$p(y) = \begin{cases} \frac{1}{2^{n/2}\,\Gamma(\frac{n}{2})}\,e^{-y/2}\,y^{\frac{n}{2}-1} & 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$
$$E[Y] = \int_0^\infty \frac{1}{2^{n/2}\,\Gamma(\frac{n}{2})}\,e^{-y/2}\,y^{(\frac{n}{2}+1)-1}\,dy = \frac{1}{2^{n/2}\,\Gamma(\frac{n}{2})}\cdot\frac{\Gamma(\frac{n}{2}+1)}{(\frac{1}{2})^{\frac{n}{2}+1}} = n$$
Similarly $E[Y^2] = n^2 + 2n$, so $V[Y] = 2n$. But $Y = \frac{ns^2}{\sigma^2}$,
$$\therefore\quad E_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = n \quad\Rightarrow\quad E_{\sigma^2}[s^2] = \sigma^2$$
Thus $\frac{\sum X_i^2}{n}$ is an unbiased estimator of $\sigma^2$. Further,
$$V_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = 2n \quad\Rightarrow\quad \frac{n^2}{\sigma^4}\,V_{\sigma^2}[s^2] = 2n \quad\Rightarrow\quad V_{\sigma^2}[s^2] = \frac{2\sigma^4}{n}$$
Example 2.17 Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3 drawn from a uniform population with pdf
$$p(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$
Show that $4Y_1$ and $2Y_2$ are unbiased estimators of $\theta$. Also find the variances of these estimators.
The pdf of $Y_1$ is
$$p(y_1) = \begin{cases} \frac{3!}{1!\,2!}\,\frac{1}{\theta}\left[\int_{y_1}^{\theta}\frac{1}{\theta}\,dx\right]^2 & 0 < y_1 < \theta \\ 0 & \text{otherwise} \end{cases}$$
i.e.,
$$p(y_1) = \begin{cases} \frac{3}{\theta}\left[1 - \frac{y_1}{\theta}\right]^2 & 0 < y_1 < \theta \\ 0 & \text{otherwise} \end{cases}$$
$$E[Y_1] = \frac{3}{\theta}\int_0^\theta y_1\left(1 - \frac{y_1}{\theta}\right)^2 dy_1 = 3\theta\int_0^1 t(1-t)^2\,dt \quad\text{where } t = \frac{y_1}{\theta}$$
$$= 3\theta\int_0^1 t^{2-1}(1-t)^{3-1}\,dt = 3\theta\,\frac{\Gamma(2)\,\Gamma(3)}{\Gamma(5)} = \frac{\theta}{4} > 0$$
so $E[4Y_1] = \theta$. Similarly $E[Y_1^2] = \frac{\theta^2}{10}$, hence
$$V[Y_1] = \frac{\theta^2}{10} - \frac{\theta^2}{16} = \frac{3\theta^2}{80}$$
$$\therefore\quad V[4Y_1] = 16\,V[Y_1] = \frac{3\theta^2}{5}$$
The pdf of $Y_2$ is
$$p(y_2) = \frac{3!}{1!\,1!\,1!}\,\frac{1}{\theta}\left[\int_0^{y_2}\frac{1}{\theta}\,dx\right]\left[\int_{y_2}^{\theta}\frac{1}{\theta}\,dx\right]$$
$$p(y_2) = \begin{cases} \frac{6}{\theta^2}\,y_2\left[1 - \frac{y_2}{\theta}\right] & 0 < y_2 < \theta \\ 0 & \text{otherwise} \end{cases}$$
$$\therefore\quad E[Y_2] = \frac{\theta}{2}$$
so $2Y_2$ is an unbiased estimator of $\theta$. Also $E[Y_2^2] = \frac{3\theta^2}{10}$ and $V[Y_2] = \frac{\theta^2}{20}$
hence $V[2Y_2] = \frac{\theta^2}{5}$.

Example 2.18 Let $Y_1$ and $Y_2$ be two independent and unbiased estimators of $\theta$. If the variance of $Y_1$ is twice the variance of $Y_2$, find the constants $k_1$ and $k_2$ so that $k_1Y_1 + k_2Y_2$ is an unbiased estimator of $\theta$ with the smallest possible variance for such a linear combination.
Given $E[Y_1] = \theta$, $E[Y_2] = \theta$, $V[Y_1] = 2\sigma^2$ and $V[Y_2] = \sigma^2$. Also unbiasedness requires $E[k_1Y_1 + k_2Y_2] = \theta$:
$$k_1E[Y_1] + k_2E[Y_2] = \theta \quad\Rightarrow\quad k_1 + k_2 = 1, \quad\text{i.e., } k_2 = 1 - k_1.$$
Consider
$$\phi = V[k_1Y_1 + k_2Y_2] = k_1^2\,V[Y_1] + k_2^2\,V[Y_2] = 2k_1^2\sigma^2 + (1-k_1)^2\sigma^2 = 3k_1^2\sigma^2 - 2k_1\sigma^2 + \sigma^2$$
Differentiating twice with respect to $k_1$,
$$\frac{d\phi}{dk_1} = 6k_1\sigma^2 - 2\sigma^2, \qquad \frac{d^2\phi}{dk_1^2} = 6\sigma^2$$
For a minimum, $\frac{d\phi}{dk_1} = 0$ and $\frac{d^2\phi}{dk_1^2} > 0$:
$$6k_1\sigma^2 - 2\sigma^2 = 0 \quad\Rightarrow\quad k_1 = \frac{1}{3} \text{ and } k_2 = \frac{2}{3}.$$
Thus $\frac{1}{3}Y_1 + \frac{2}{3}Y_2$ has minimum variance.
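The minimization can be confirmed with a direct grid search over $k_1$ (an illustration, not from the book; the grid resolution is an arbitrary choice):

```python
import numpy as np

# Variance of k*Y1 + (1-k)*Y2 with V[Y1] = 2*sigma2 and V[Y2] = sigma2
sigma2 = 1.0
k = np.linspace(0.0, 1.0, 10_001)
var = (3 * k**2 - 2 * k + 1) * sigma2
print(k[np.argmin(var)])   # approx 1/3
```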
Consistent estimator need not be unbiased
Example 2.19 Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ drawn from a normal population with mean $\theta$ and variance $\sigma^2$. Define $s^2 = \frac{1}{n}\sum_{i=1}^n(X_i - \bar{X})^2$; then $Y = \frac{ns^2}{\sigma^2}$ has a $\chi^2$ distribution with $(n-1)$ degrees of freedom, and $Y \sim G(\frac{n-1}{2}, \frac{1}{2})$. It has the pdf
$$p(y) = \begin{cases} \frac{1}{2^{\frac{n-1}{2}}\,\Gamma(\frac{n-1}{2})}\,e^{-y/2}\,y^{\frac{n-1}{2}-1} & 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$
$$E[Y^r] = \int_0^\infty \frac{1}{2^{\frac{n-1}{2}}\,\Gamma(\frac{n-1}{2})}\,e^{-y/2}\,y^{\frac{n-1}{2}+r-1}\,dy = \frac{1}{2^{\frac{n-1}{2}}\,\Gamma(\frac{n-1}{2})}\cdot\frac{\Gamma(\frac{n-1}{2}+r)}{(\frac{1}{2})^{\frac{n-1}{2}+r}} = \frac{2^r}{\Gamma(\frac{n-1}{2})}\,\Gamma\!\left(\frac{n-1}{2}+r\right)$$
When $r = 1$,
$$E[Y] = \frac{2}{\Gamma(\frac{n-1}{2})}\cdot\frac{n-1}{2}\,\Gamma\!\left(\frac{n-1}{2}\right) = n-1$$
$$\therefore\quad E_{\sigma^2}\left[\frac{ns^2}{\sigma^2}\right] = n-1 \quad\Rightarrow\quad E_{\sigma^2}[s^2] = \frac{n-1}{n}\,\sigma^2 \quad\text{and}\quad V_{\sigma^2}[s^2] = \frac{2(n-1)}{n^2}\,\sigma^4$$
Thus $E_{\sigma^2}[s^2] \to \sigma^2$ and $V_{\sigma^2}[s^2] \to 0$ as $n \to \infty$,
$$\therefore\quad \frac{1}{n}\sum_{i=1}^n(X_i - \bar{X})^2 \text{ is a consistent estimator of } \sigma^2.$$
But $E_{\sigma^2}[s^2] \ne \sigma^2$, so $\frac{1}{n}\sum_{i=1}^n(X_i - \bar{X})^2$ is not an unbiased estimator of $\sigma^2$.
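This bias-versus-consistency behaviour can be seen numerically (my own illustration, not from the book; $\sigma^2$, the sample sizes, and the replication count are arbitrary choices): with the $\frac{1}{n}$ divisor the sample variance averages to $\frac{n-1}{n}\sigma^2$, yet approaches $\sigma^2$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0
for n in (5, 50, 500):
    x = rng.normal(0.0, np.sqrt(sigma2), size=(10_000, n))
    s2 = x.var(axis=1)   # ddof=0, i.e. the 1/n divisor
    # mean of s2 is close to (n-1)/n * sigma2, which tends to sigma2
    print(n, s2.mean(), (n - 1) / n * sigma2)
```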
Example 2.20 Illustrate with an example an estimator that is both consistent and unbiased.
Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ drawn from a normal population with mean $\theta$ and variance $\sigma^2$. Define $s^2 = \frac{1}{n}\sum_{i=1}^n(X_i - \bar{X})^2$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar{X})^2$; then $Y = \frac{ns^2}{\sigma^2}$ has a $\chi^2$ distribution with $(n-1)$ degrees of freedom, $Y \sim G(\frac{n-1}{2}, \frac{1}{2})$, with $E_{\sigma^2}[s^2] = \frac{n-1}{n}\sigma^2$ and $V_{\sigma^2}[s^2] = \frac{2(n-1)}{n^2}\sigma^4$. Since
$$(n-1)S^2 = ns^2 \quad\Rightarrow\quad S^2 = \frac{n}{n-1}\,s^2,$$
$$E_{\sigma^2}[S^2] = \frac{n}{n-1}\,E_{\sigma^2}[s^2] = \frac{n}{n-1}\cdot\frac{n-1}{n}\,\sigma^2 = \sigma^2$$
$$V_{\sigma^2}[S^2] = \frac{n^2}{(n-1)^2}\,V_{\sigma^2}[s^2] = \frac{n^2}{(n-1)^2}\cdot\frac{2(n-1)}{n^2}\,\sigma^4 = \frac{2\sigma^4}{n-1} \to 0 \text{ as } n \to \infty$$
Thus $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i - \bar{X})^2$ is a consistent and also unbiased estimator of $\sigma^2$.

Example 2.21 Give an example that an unbiased estimator need not be consistent.
Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a normal population with mean $\theta$ and known variance $\sigma^2$. Then the estimator $X_1$ (the first observation) of the sample is unbiased but not consistent. It is unbiased since $E[X_1] = \theta$ and $V[X_1] = \sigma^2$, but
$$P\{|X_1 - \theta| < \epsilon\} = P\{\theta - \epsilon < X_1 < \theta + \epsilon\} = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{\theta-\epsilon}^{\theta+\epsilon} e^{-\frac{1}{2\sigma^2}(x_1-\theta)^2}\,dx_1,$$
which is a constant less than 1 and free of $n$, so it does not tend to 1 as $n \to \infty$.
$\therefore\ X_1$ is an unbiased but not consistent estimator of $\theta$.

Example 2.22 Give an example of an estimator that is neither consistent nor unbiased.
Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3 drawn from a uniform population with pdf, for given $\theta$,
$$p(x) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$$
Then $Y_1$ is neither consistent nor unbiased for $\theta$, since $E[Y_1] = \frac{\theta}{4} \ne \theta$ and
$$P\left\{\left|Y_1 - \frac{\theta}{4}\right| < \epsilon\right\} = P\left\{\frac{\theta}{4} - \epsilon < Y_1 < \frac{\theta}{4} + \epsilon\right\} = \frac{3}{\theta}\int_{\frac{\theta}{4}-\epsilon}^{\frac{\theta}{4}+\epsilon}\left(1 - \frac{y_1}{\theta}\right)^2 dy_1 \nrightarrow 1 \text{ as } n \to \infty.$$
Thus $Y_1$, the first order statistic, is neither a consistent nor an unbiased estimator of $\theta$.
2.9 Sufficient Statistic

A sufficient statistic conveys all the information about the distribution of a random variable that is contained in the sample. It serves to identify only a family of distributions, not the particular parameter values of the distributions.

Definition 2.1 Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ drawn from a population with pdf $p(x \mid \theta)$. Let $T = t(X)$ be a statistic whose pdf is $p(t)$. For a continuous random variable $X$, $T = t(X)$ is said to be a sufficient statistic iff
$$\frac{p(x_1, x_2, \ldots, x_n)}{p(t)}$$
is independent of $\theta$ for every given $T = t$. Similarly, for a discrete random variable $X$, $T = t(X)$ is said to be a sufficient statistic iff
$$P\{X_1 = x_1, X_2 = x_2, \ldots \mid T = t\}$$
is independent of $\theta$ for every given $T = t$.

Example 2.23 Let $X$ be a single observation from a population with pmf $p(x)$, $0 < \theta < 1$:
$$p(x) = \begin{cases} \frac{[\theta(1-\theta)]^{|x|}}{2} & x = -1, 1 \\ 1 - \theta(1-\theta) & x = 0 \\ 0 & \text{otherwise} \end{cases}$$
Show that $|X|$ is sufficient.
Let $Y = |X|$. Then $P\{Y = 0\} = P\{|X| = 0\} = P\{X = 0\} = 1 - \theta(1-\theta)$ and
$$P\{Y = 1\} = P\{|X| = 1\} = P\{X = 1 \text{ or } X = -1\} = P\{X = 1\} + P\{X = -1\} = \theta(1-\theta).$$
Consider
$$P\{X = 1 \mid Y = 1\} = \frac{P\{X = 1,\ |X| = 1\}}{P\{Y = 1\}} = \frac{P\{X = 1\}}{P\{Y = 1\}} = \frac{\theta(1-\theta)/2}{\theta(1-\theta)} = \frac{1}{2},$$
which is independent of $\theta$.
Therefore $Y = |X|$ is sufficient.

Example 2.24 Let $X_1, X_2, \ldots, X_n$ be an independent random sample, where $X_i$ has pdf
$$p(x_i) = \begin{cases} e^{-(x_i - i\theta)} & x_i > i\theta,\ i = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$
Show that $T = \min_{1\le i\le n}\left(\frac{X_i}{i}\right)$ is a sufficient statistic.
Let $Y_i = \frac{X_i}{i}$; then $Y_i$ has pdf $p(y) = i\,e^{-i(y-\theta)}$, $y > \theta$, so that $P\{Y_i > t\} = e^{-i(t-\theta)}$ for $t > \theta$. Take $T = \min_{1\le i\le n} Y_i$. Since the $Y_i$ are independent,
$$P\{T > t\} = \prod_{i=1}^n e^{-i(t-\theta)} = e^{-(\sum_i i)(t-\theta)},$$
so the pdf of $T$ is
$$p(t) = \left(\sum_{i=1}^n i\right) e^{-(\sum_i i)(t-\theta)}, \quad \theta < t < \infty.$$
Also the joint pdf of the sample is $p(x_1, \ldots, x_n) = e^{-\sum_i (x_i - i\theta)}$, hence
$$\frac{p(x_1, x_2, \ldots, x_n)}{p(t)} = \frac{e^{-\sum_i x_i + \theta\sum_i i}}{\left(\sum_i i\right) e^{-t\sum_i i + \theta\sum_i i}} = \frac{1}{\sum_i i}\,e^{t\sum_i i - \sum_i x_i}.$$
It is independent of $\theta$. Thus $T = \min_{1\le i\le n} Y_i = \min_{1\le i\le n}\frac{X_i}{i}$ is sufficient.

Example 2.25 Let $X_1$ and $X_2$ be iid Poisson random variables with parameter
$\theta$. Prove that
(i) X1 + X2 is a sufficient statistic.
(ii) X1 + 2X2 is not a sufficient statistic.
(i) Given that
$$P\{X_1 = x_1\} = \begin{cases} \frac{e^{-\theta}\theta^{x_1}}{x_1!} & x_1 = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \qquad\text{and}\qquad P\{X_2 = x_2\} = \begin{cases} \frac{e^{-\theta}\theta^{x_2}}{x_2!} & x_2 = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
Let $T = X_1 + X_2$; then $T \sim P(2\theta)$:
$$P\{T = t\} = \begin{cases} \frac{e^{-2\theta}(2\theta)^t}{t!} & t = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$
Consider
$$P\{X_1 = x_1, X_2 = x_2 \mid T = t\} = \frac{P\{X_1 = x_1, X_2 = t - x_1\}}{P\{T = t\}} = \frac{P\{X_1 = x_1\}\,P\{X_2 = t - x_1\}}{P\{T = t\}}$$
$$= \frac{\frac{e^{-\theta}\theta^{x_1}}{x_1!}\cdot\frac{e^{-\theta}\theta^{t-x_1}}{(t-x_1)!}}{\frac{e^{-2\theta}(2\theta)^t}{t!}} = \frac{t!}{(t-x_1)!\,x_1!\,2^t},$$
which is independent of $\theta$.
$\therefore\ X_1 + X_2$ is a sufficient statistic.
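A numeric sketch of part (i) (my own illustration, not from the book): the conditional distribution of $X_1$ given $T = t$ reduces to Binomial$(t, \frac{1}{2})$, the same for every $\theta$; the value $t = 4$ and the two $\theta$'s below are arbitrary choices.

```python
from math import exp, factorial

def pois(x, theta):
    # Poisson(theta) pmf at x
    return exp(-theta) * theta**x / factorial(x)

t = 4
for theta in (0.5, 2.0):
    p_t = exp(-2 * theta) * (2 * theta) ** t / factorial(t)   # T ~ Poisson(2*theta)
    cond = [pois(x1, theta) * pois(t - x1, theta) / p_t for x1 in range(t + 1)]
    print(theta, [round(c, 4) for c in cond])   # identical list for both theta
```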
(ii) Consider
$$P\{X_1 + 2X_2 = 2\} = P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 2, X_2 = 0\}$$
$$= P\{X_1 = 0\}P\{X_2 = 1\} + P\{X_1 = 2\}P\{X_2 = 0\} = \theta e^{-2\theta} + \frac{\theta^2}{2}e^{-2\theta} = \theta e^{-2\theta}\left[1 + \frac{\theta}{2}\right]$$
Therefore
$$P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + 2X_2 = 2\}} = \frac{\theta e^{-2\theta}}{\theta e^{-2\theta}\left[1 + \frac{\theta}{2}\right]} = \frac{2}{2 + \theta},$$
which depends on $\theta$.
$\therefore\ X_1 + 2X_2$ is not a sufficient statistic.

Example 2.26 Let $X_1$ and $X_2$ be two independent Bernoulli random variables such
that $P\{X_1 = 1\} = 1 - P\{X_1 = 0\} = \theta$, $0 < \theta < 1$, and $P\{X_2 = 1\} = 1 - P\{X_2 = 0\} = 2\theta$, $0 < \theta \le \frac{1}{2}$. Show that $X_1 + X_2$ is not a sufficient statistic.
Let $T = X_1 + X_2$. Consider
$$P\{T = 1\} = P\{X_1 + X_2 = 1\} = P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 1, X_2 = 0\} = (1-\theta)2\theta + \theta(1-2\theta) = \theta(3 - 4\theta)$$
$$\therefore\quad P\{X_1 = 0 \mid X_1 + X_2 = 1\} = \frac{P\{X_1 = 0,\ X_1 + X_2 = 1\}}{P\{X_1 + X_2 = 1\}} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 + X_2 = 1\}} = \frac{(1-\theta)2\theta}{\theta(3-4\theta)} = \frac{2(1-\theta)}{3 - 4\theta},$$
which depends on $\theta$.
$\therefore\ X_1 + X_2$ is not a sufficient statistic.

Example 2.27 Let $X_1$ and $X_2$ denote a random sample drawn from a normal population $N(\theta, 1)$, $-\infty < \theta < \infty$. Show that $T = X_1 + X_2$ is a sufficient statistic.
The joint pdf of $X_1$ and $X_2$ is
$$p(x_1, x_2) = p(x_1)p(x_2) = \frac{1}{2\pi}\,e^{-\frac{1}{2}(x_1-\theta)^2 - \frac{1}{2}(x_2-\theta)^2}$$
Let $T = X_1 + X_2 \sim N(2\theta, 2)$:
$$p(t) = \begin{cases} \frac{1}{2\sqrt{\pi}}\,e^{-\frac{1}{4}(t - 2\theta)^2} & -\infty < t < \infty \\ 0 & \text{otherwise} \end{cases}$$
The definition of sufficient statistic gives
$$\frac{p(x_1, x_2)}{p(t)} = \frac{\frac{1}{2\pi}\,e^{-\frac{1}{2}[x_1^2 + x_2^2 - 2\theta(x_1+x_2) + 2\theta^2]}}{\frac{1}{2\sqrt{\pi}}\,e^{-\frac{1}{4}[t^2 - 4\theta t + 4\theta^2]}}$$
$$= \frac{1}{\sqrt{\pi}}\cdot\frac{e^{-\frac{1}{2}(x_1^2 + x_2^2) + \theta(x_1+x_2) - \theta^2}}{e^{-\frac{1}{4}(x_1+x_2)^2 + \theta(x_1+x_2) - \theta^2}} = \frac{1}{\sqrt{\pi}}\,e^{-\frac{1}{2}(x_1^2 + x_2^2) + \frac{1}{4}(x_1+x_2)^2},$$
which is independent of $\theta$.
$\therefore\ T = X_1 + X_2$ is a sufficient statistic.
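The independence of this ratio from $\theta$ can also be confirmed numerically (an illustration, not from the book; the sample point and the two $\theta$ values are arbitrary choices):

```python
import numpy as np

def ratio(x1, x2, theta):
    # Joint N(theta,1) density of (x1, x2) divided by the N(2*theta, 2)
    # density of t = x1 + x2
    joint = np.exp(-0.5 * (x1 - theta) ** 2 - 0.5 * (x2 - theta) ** 2) / (2 * np.pi)
    t = x1 + x2
    p_t = np.exp(-0.25 * (t - 2 * theta) ** 2) / (2 * np.sqrt(np.pi))
    return joint / p_t

x1, x2 = 0.7, -1.3
print(ratio(x1, x2, theta=0.0), ratio(x1, x2, theta=3.0))   # equal values
```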
Example 2.28 Let $X_1, X_2, X_3$ be a sample from $b(1,\theta)$. Show that $X_1X_2 + X_3$ is not sufficient.
Let $Y = X_1X_2$ and $T = X_1X_2 + X_3$; then
$$P\{Y = 0\} = P\{X_1 = 0, X_2 = 0\} + P\{X_1 = 1, X_2 = 0\} + P\{X_1 = 0, X_2 = 1\}$$
$$= (1-\theta)^2 + \theta(1-\theta) + (1-\theta)\theta = 1 - \theta^2$$
$$P\{Y = 1\} = P\{X_1 = 1, X_2 = 1\} = \theta^2$$
$$P\{Y + X_3 = 1\} = P\{Y = 0, X_3 = 1\} + P\{Y = 1, X_3 = 0\} = (1-\theta^2)\theta + \theta^2(1-\theta),$$
i.e., $P\{T = 1\} = \theta(1-\theta)(1+2\theta)$. Consider
$$P\{Y = 1 \mid T = 1\} = \frac{P\{Y = 1, T = 1\}}{P\{T = 1\}} = \frac{P\{Y = 1\}\,P\{X_3 = 0\}}{P\{T = 1\}} = \frac{\theta^2(1-\theta)}{\theta(1-\theta)(1+2\theta)} =