Lecture Slides by Dr. Muhammad Hanif Mian for Workshop on Recent Developments in Survey Sampling
(August 26-27, 2006)

AN ALTERNATIVE ESTIMATOR FOR $\bar{Y}$

Since repetition of the observation of a repeated unit in a sample selected with srswr does not provide additional information for estimating $\bar{Y}$, the mean of the values of the distinct units in a sample of n units may be considered as an alternative estimator. That is, if $y'_1, y'_2, \ldots, y'_d$ denote the values of the distinct units in a simple random sample of n units selected with replacement ($d \le n$), then the suggested alternative estimator is
$$\bar{y}' = \frac{1}{d}\sum_{i=1}^{d} y'_i. \qquad (3.21)$$
This estimator is unbiased for $\bar{Y}$ and is more efficient than the sample mean
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\sum_{i=1}^{d} r_i y'_i,$$
where $r_i$ is the number of repetitions of the ith distinct unit and $\sum_{i=1}^{d} r_i = n$.
The variance of $\bar{y}'$ can be obtained by noting that two stages of randomization are involved: (i) d is a random variable taking values 1 to n with certain probabilities, and (ii) selection of the d distinct units from the N units with equal probability and without replacement. Applying the formula of simple random sampling, we get
$$\mathrm{Var}(\bar{y}') = \left\{E\left(\frac{1}{d}\right)-\frac{1}{N}\right\}\frac{N}{N-1}\sigma^2, \qquad (3.22)$$
where
$$E\left(\frac{1}{d}\right) = \frac{1^{n-1}+2^{n-1}+\cdots+N^{n-1}}{N^{n}}.$$
Neglecting terms of degree greater than $\left(\frac{1}{N}\right)^2$ in (3.22), we get
$$\mathrm{Var}(\bar{y}') = \frac{N\sigma^2}{N-1}\left(\frac{1}{n}-\frac{1}{2N}+\frac{n-1}{12N^2}\right). \qquad (3.23)$$
An unbiased estimator of $\mathrm{Var}(\bar{y}')$ is given by
$$\mathrm{var}(\bar{y}') = s_d^2\left\{\left(\frac{1}{d}-\frac{1}{N}\right)+\frac{n-1}{N(N-1)}\right\}, \qquad (3.24)$$
where $s_d^2 = 0$ for $d=1$ and $s_d^2 = \frac{1}{d-1}\sum_{i=1}^{d}\left(y'_i-\bar{y}'\right)^2$ for $d \ge 2$.
The second term in the curly brackets in (3.24), namely $(n-1)/\{N(N-1)\}$, is likely to be negligibly small compared to the first term, and hence the variance estimator may be taken as
$$\mathrm{var}(\bar{y}') = \left(\frac{1}{d}-\frac{1}{N}\right)s_d^2. \qquad (3.25)$$
It may be noted that if N is considerably larger than n, the chance of repetition of a unit in the sample will be small, and hence the gain in using $\bar{y}'$ instead of $\bar{y}$ will be only marginal. The results mentioned in this section have been discussed in detail by Basu (1958), Raj and Khamis (1958) and Pathak (1962).
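These properties are easy to check empirically. The sketch below uses made-up data (not from the text): it draws srswr samples by unit index and compares the ordinary sample mean with the distinct-units mean $\bar y'$ of (3.21); both centre on $\bar Y$, and the distinct-units mean has the smaller variance.

```python
# Hypothetical population; a sketch, not the author's computation.
import random
import statistics

random.seed(1)
y = [3, 7, 4, 9, 6, 5, 8, 2, 10, 6]   # made-up population values, N = 10
N, n, reps = len(y), 6, 20000

means, dmeans = [], []
for _ in range(reps):
    idx = [random.randrange(N) for _ in range(n)]   # srswr: units may repeat
    means.append(sum(y[i] for i in idx) / n)
    d = set(idx)                                    # distinct units only
    dmeans.append(sum(y[i] for i in d) / len(d))

# both estimators centre on the population mean (6.0 here);
# the distinct-units mean is less variable
print(round(statistics.mean(means), 1), round(statistics.mean(dmeans), 1))
print(statistics.pvariance(dmeans) < statistics.pvariance(means))
```

Note that the distinct units are identified by sample position (index), not by value, since two different units can share the same y-value.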
UNBIASED RATIO ESTIMATOR

We have seen that under simple random sampling the classical (conventional) ratio estimator is biased. Lahiri (1951) suggested that the classical ratio estimator can be made unbiased if the selection procedure is changed; Midzuno (1950) and Sen (1951) proved the same result. Lahiri suggested that the first unit be selected with probability proportional to the aggregate of the sizes (PPAS), that is, with probability proportional to $\sum_{i=1}^{N}X_i$, and the remaining $n-1$ units with equal probability and without replacement. Midzuno (1951) simplified this procedure: the first unit is selected with probability proportional to $X_i$ (the measure of size), and the remaining $(n-1)$ units as in Lahiri (1951). The idea was introduced by Ikeda (1950), as reported by Midzuno (1951). This sampling scheme has a striking resemblance to simple random sampling without replacement; in fact, it may be viewed as a generalization of simple random sampling to the case where extra information on the population is available.

Suppose we have a population of N units. The probability that the ith unit is the first to be selected, with the subsequent $(n-1)$ units drawn with equal probability and without replacement, is
$$\frac{x_i}{\sum_{i=1}^{N}X_i}\cdot\frac{1}{\binom{N-1}{n-1}}.$$
Similarly, the probability that the jth unit is the first to be selected, with the subsequent $(n-1)$ draws made with equal probability and without replacement, is
$$\frac{x_j}{\sum_{i=1}^{N}X_i}\cdot\frac{1}{\binom{N-1}{n-1}},$$
and so on; the probability $P(s)$ for the two selections together is therefore
$$P(s) = \frac{x_i+x_j}{\sum_{i=1}^{N}X_i}\cdot\frac{1}{\binom{N-1}{n-1}}.$$
Since there are n such selections, the probability of the selection of the whole sample will be
$$P(s) = \frac{\sum_{i=1}^{n}x_i}{X\binom{N-1}{n-1}} \qquad (6.6.1)$$
$$= \frac{N\bar x}{X}\cdot\frac{1}{\binom{N}{n}}, \qquad (6.6.2)$$
since $\binom{N-1}{n-1} = \frac{n}{N}\binom{N}{n}$.
The classical ratio estimator is
$$y'' = \frac{\sum_{i=1}^{n}y_i}{\sum_{i=1}^{n}x_i}\,X = \frac{\bar y}{\bar x}X. \qquad (6.1.3)$$
THEOREM (6.2): The classical ratio estimator is unbiased under the Ikeda-Midzuno-Sen-Lahiri selection procedure. This property will be referred to as the Ratio Estimator Property.

THEOREM (6.3): The ratio estimator is unbiased, with variance
$$\mathrm{Var}(y'') = \binom{N}{n}^{-1}\sum_{s}\frac{NX\bar y^2}{\bar x} - Y^2. \qquad (6.6.5)$$
PROOF: Taking the expectation of (6.1.3), we get
$$E(y'') = E\left[\frac{\bar y}{\bar x}X\right] = \sum_{s}\frac{\bar y}{\bar x}X\,P(s).$$
Using (6.6.2), we have
$$E(y'') = \sum_{s}\frac{\bar y}{\bar x}X\cdot\frac{N\bar x}{X\binom{N}{n}} = \binom{N}{n}^{-1}\sum_{s}N\bar y = Y. \qquad\Diamond$$
Proceeding in the same way we can derive the variance expression of $y''$, i.e.
$$\mathrm{Var}(y'') = \binom{N}{n}^{-1}\sum_{s}\frac{NX\bar y^2}{\bar x} - Y^2. \qquad (6.6.6)$$
THEOREM (6.4): An unbiased estimator of $\mathrm{Var}(y'')$ is
$$v(y'') = y''^2 - \frac{X}{n\bar x}\sum_{i=1}^{n}y_i^2 - \frac{(N-1)X}{n(n-1)\bar x}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}y_iy_j \qquad (6.6.7)$$
$$= y''^2 - \frac{NX}{\bar x}\left[\bar y^2 - \frac{N-n}{Nn}s_y^2\right]. \qquad (6.6.8)$$
PROOF: It may be proved that $E[v(y'')] = \mathrm{Var}(y'')$. Using $P(s)$ from (6.6.2),
$$E\left[\frac{X}{n\bar x}\sum_{i=1}^{n}y_i^2\right] = \sum_{s}\frac{X}{n\bar x}\sum_{i\in s}y_i^2\cdot\frac{N\bar x}{X\binom{N}{n}} = \frac{N}{n}\binom{N}{n}^{-1}\sum_{s}\sum_{i\in s}y_i^2 = \sum_{i=1}^{N}Y_i^2, \qquad (6.6.9)$$
since each unit occurs in $\binom{N-1}{n-1} = \frac{n}{N}\binom{N}{n}$ of the samples,
and similarly
$$E\left[\frac{(N-1)X}{n(n-1)\bar x}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}y_iy_j\right] = \frac{N(N-1)}{n(n-1)}\cdot\frac{n(n-1)}{N(N-1)}\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\ne i}}^{N}Y_iY_j = \sum_{i=1}^{N}\sum_{\substack{j=1\\ j\ne i}}^{N}Y_iY_j, \qquad (6.6.10)$$
since each ordered pair of distinct units occurs in $\binom{N-2}{n-2}$ of the samples.
Hence
$$E[v(y'')] = E(y''^2) - \sum_{i=1}^{N}Y_i^2 - \sum_{i=1}^{N}\sum_{\substack{j=1\\ j\ne i}}^{N}Y_iY_j = E(y''^2) - Y^2 = \mathrm{Var}(y''). \qquad\Diamond$$
Similarly, an unbiased variance estimator for the corresponding estimator of the population total can be shown to take the same form as (6.6.8).
RATIO ESTIMATOR AS MODEL-UNBIASED

Consider all estimators $y'$ of Y that are linear functions of the sample values $y_i$, that is, of the form
$$y' = \sum_{i=1}^{n}c_iy_i, \qquad (6.8.1)$$
where the $c_i$ do not depend on the $y_i$'s, though they may be functions of the $x_i$. The choice of the $c_i$'s is restricted to those that give unbiased estimation of Y. The estimator with the smallest variance is called the best linear unbiased estimator. The model is:
$$y_i = \beta x_i + \varepsilon_i,\qquad E(\varepsilon_i)=0,\quad \mathrm{Cov}(\varepsilon_i,\varepsilon_j)=0\ (i\ne j),\quad \mathrm{Var}(\varepsilon_i)=\sigma_i^2=\sigma^2X_i^{\gamma},\quad \tfrac12\le\gamma\le2, \qquad (6.8.2)$$
where the $\varepsilon_i$ are independent of the $x_i$, and $x_i > 0$. The $x_i$ ($i = 1, 2, \ldots, N$) are known. The model is the same as that employed by Cochran (1953), and appears to have originated with H.F. Smith (1938). Useful references to this model are Cochran (1953, 63, 77), Brewer (1963b), Godambe and Joshi (1965), Hanif (1969), Foreman and Brewer (1971), Royall (1970, 1975), Brewer and Hanif (1983), Cassel et al. (1976), Isaki and Fuller (1982), Hansen, Madow and Tepping (1983), Samiuddin et al. (1992) and many others.
Brewer (1963b) defined an unbiased ratio estimator under model (6.8.2). He used a concept of unbiasedness different from that of randomization (design-based) theory; Royall (1970) also used this model. Brewer and Royall regard an estimator $y'$ (of the population total) as unbiased if $E(y') = E(Y)$ in repeated selections of the finite population sampled under the model. Under model (6.8.2) Brewer (1963b) proved that the classical ratio estimator is model-unbiased and is the best linear unbiased estimator for any sample [random or not] selected solely according to the values of the $X_i$. This result holds good if the following conditions are satisfied: (i) the relation between the study variable ($y_i$) and the benchmark ($x_i$) is linear and passes through the origin; (ii) the variance of $y_i$ about this line is proportional to $x_i$.

THEOREM (6.6): Under the model (6.8.2) the classical ratio estimator is unbiased, with variance
$$\mathrm{Var}(y'') = \frac{\lambda X\left(X-n\bar x\right)}{n\bar x}. \qquad (6.8.3)$$
PROOF: We know that
$$y' = \sum_{i=1}^{n}c_iy_i. \qquad (6.8.4)$$
Using model (6.8.2) we have
$$y' = \sum_{i=1}^{n}c_i\left(\beta x_i+\varepsilon_i\right) = \beta\sum_{i=1}^{n}c_ix_i + \sum_{i=1}^{n}c_i\varepsilon_i.$$
Since $E(\varepsilon_i)=0$, we then have
$$E(y') = \beta\sum_{i=1}^{n}c_ix_i + \sum_{i=1}^{n}c_iE(\varepsilon_i) = \beta\sum_{i=1}^{n}c_ix_i. \qquad (6.8.5)$$
We also know that $Y_i = \beta X_i + \varepsilon_i$, or $E(Y) = \beta X$. (6.8.6) Now
$$E\left[y'-Y\right] = \beta\sum_{i=1}^{n}c_ix_i + \sum_{i=1}^{n}c_iE(\varepsilon_i) - \beta X - E\left(\sum_{i=1}^{N}\varepsilon_i\right) = \beta\left[\sum_{i=1}^{n}c_ix_i - X\right] = 0\quad\text{if }\ \sum_{i=1}^{n}c_ix_i = X. \qquad (6.8.7)$$
Therefore we say that $y'$ is model-unbiased if
$$\sum_{i=1}^{n}c_ix_i = X. \qquad (6.8.8)$$
The variance of $y'$ is
$$\mathrm{Var}(y') = E(y'^2) - \left[E(y')\right]^2. \qquad (6.8.9)$$
Now
$$E(y'^2) = \beta^2\left(\sum_{i=1}^{n}c_ix_i\right)^2 + E\left(\sum_{i=1}^{n}c_i\varepsilon_i\right)^2 + 2\beta\sum_{i=1}^{n}c_ix_i\,E\left(\sum_{i=1}^{n}c_i\varepsilon_i\right).$$
Using the conditions of the model we have
$$E(y'^2) = \beta^2\left(\sum_{i=1}^{n}c_ix_i\right)^2 + \sum_{i=1}^{n}c_i^2\mathrm{Var}(\varepsilon_i). \qquad (6.8.10)$$
Using (6.8.5) and (6.8.10) in (6.8.9), we have
$$\mathrm{Var}(y') = \sum_{i=1}^{n}c_i^2\mathrm{Var}(\varepsilon_i). \qquad (6.8.11)$$
For simplicity let us assume $\mathrm{Var}(\varepsilon_i) = \lambda x_i$; then (6.8.11) becomes
$$\mathrm{Var}(y') = \lambda\sum_{i=1}^{n}c_i^2x_i. \qquad (6.8.12)$$
We can minimize $\mathrm{Var}(y')$ with respect to the $c_i$ subject to the constraint (6.8.8). The Lagrangian is
$$\phi = \lambda\sum_{i=1}^{n}c_i^2x_i - \mu\left[\sum_{i=1}^{n}c_ix_i - X\right].$$
Differentiating with respect to $c_i$ and equating to zero, we get
$$\frac{\partial\phi}{\partial c_i} = 2\lambda c_ix_i - \mu x_i = 0,$$
or $c_i = \frac{\mu}{2\lambda} = c$ (a constant). We know from (6.8.7) that $\sum_{i=1}^{n}c_ix_i = X$, or $c\,n\bar x = X$, so that
$$c = \frac{X}{n\bar x}.$$
Hence
$$y' = \sum_{i=1}^{n}c_iy_i = \frac{X}{n\bar x}\sum_{i=1}^{n}y_i = \frac{\sum_{i=1}^{n}y_i}{\sum_{i=1}^{n}x_i}\,X = y''.$$
The best linear unbiased estimator is thus $y' = y''$, the classical (conventional) ratio estimator. For the derivation of $\mathrm{Var}(y'')$ we proceed as follows:
$$y'' - Y = \beta\sum_{i=1}^{n}c_ix_i + \sum_{i=1}^{n}c_i\varepsilon_i - \beta\sum_{i=1}^{N}X_i - \sum_{i=1}^{N}\varepsilon_i.$$
Since $\sum_{i=1}^{n}c_ix_i = X$ and $c_i = \frac{X}{n\bar x}$,
$$y'' - Y = \frac{X}{n\bar x}\sum_{i=1}^{n}\varepsilon_i - \sum_{i=1}^{N}\varepsilon_i.$$
Dividing $\sum_{i=1}^{N}\varepsilon_i$ into sample and non-sample values, we have
$$y'' - Y = \left(\frac{X}{n\bar x}-1\right)\sum_{i=1}^{n}\varepsilon_i - \sum_{i=1}^{N-n}\varepsilon_i.$$
Squaring and taking the expectation,
$$E\left(y''-Y\right)^2 = \mathrm{Var}(y'') = \left(\frac{X}{n\bar x}-1\right)^2E\left(\sum_{i=1}^{n}\varepsilon_i\right)^2 + E\left(\sum_{i=1}^{N-n}\varepsilon_i\right)^2,$$
so that
$$\mathrm{Var}(y'') = \left(\frac{X}{n\bar x}-1\right)^2\sum_{i=1}^{n}\mathrm{var}(\varepsilon_i) + \sum_{i=1}^{N-n}\mathrm{var}(\varepsilon_i).$$
Substituting the value $\mathrm{var}(\varepsilon_i) = \lambda x_i$, we have
$$\mathrm{Var}(y'') = \left(\frac{X-n\bar x}{n\bar x}\right)^2\lambda\sum_{i=1}^{n}x_i + \lambda\sum_{i=1}^{N-n}x_i = \left(\frac{X-n\bar x}{n\bar x}\right)^2\lambda\,n\bar x + \lambda\left(X-n\bar x\right)$$
$$= \frac{\lambda\left(X-n\bar x\right)^2}{n\bar x} + \lambda\left(X-n\bar x\right) = \frac{\lambda X\left(X-n\bar x\right)}{n\bar x}. \qquad (6.8.3)$$
Using all these assumptions, a model-unbiased estimator $\hat\lambda$ of $\lambda$ from the sample is easily shown to be
$$\hat\lambda = \frac{1}{n-1}\sum_{i=1}^{n}\frac{1}{x_i}\left(y_i - r\,x_i\right)^2, \qquad (6.8.13)$$
where $r = \bar y/\bar x$.
Putting this value of $\hat\lambda$ in (6.8.3), a model-unbiased variance estimator is
$$\widehat{\mathrm{Var}}(y'') = \frac{X\left(X-n\bar x\right)}{n\bar x}\cdot\frac{1}{n-1}\sum_{i=1}^{n}\frac{1}{x_i}\left(y_i - r\,x_i\right)^2. \qquad (6.8.14)$$
This model-based unbiased estimator is not only superior to $y'$ but is the best of a whole class of estimators. For details see Brewer (1963b, 1979), Royall (1970), Royall and Herson (1973) and Samiuddin et al. (1978).

6.9 COMPARISON OF $y''$ AND $y'$ UNDER A STOCHASTIC MODEL

It is an established fact that the choice of a suitable sample plan is central to the design of a sample survey. Sample design can be regarded as comprising separate selection and estimation procedures, but the choices of these are so interdependent that they must be considered together for virtually all purposes. Sometimes the nature of the sample plan is determined by circumstances, but usually the designer is faced with a choice, and frequently it is obvious which of a number of possible plans will be most efficient in terms of minimum sampling error for given cost (or vice versa). Standard sampling theory, using imputed values for such quantities as the means, variances and correlation coefficients of the (finite) population, or of strata or clusters within it, can often indicate which design is most efficient. Sometimes, however, this is not so. A well-known example is the comparison between classical ratio estimation and estimation using unequal probabilities. To obtain a straightforward answer in this case, Cochran (1953) made use of a certain super-population model (6.8.2) which is intuitively attractive and appears to have some empirical basis. The purpose here is to compare the classical ratio estimator and the unbiased method of estimation using equal probabilities, using large-sample results obtained from a generalization of the model. The comparison for probability proportional to size sampling will be discussed in Chapters 7, 8 and 9. The stochastic model is used here for the purpose of comparing efficiencies.

6.9.1. Unbiased Estimate for Population Total Based on Simple Random
Sampling

THEOREM (6.7). Under the linear stochastic model (6.8.2) the ratio estimator is more efficient than the unbiased estimator, the difference of their variances being $\beta^2\sigma_{x'}^2 > 0$.
PROOF

We know that
$$y' = \frac{N}{n}\sum_{i=1}^{n}y_i.$$
Substituting from (6.8.2) we get
$$y' = \frac{N}{n}\sum_{i=1}^{n}\left(\beta x_i+\varepsilon_i\right) = \beta\frac{N}{n}\sum_{i=1}^{n}x_i + \frac{N}{n}\sum_{i=1}^{n}\varepsilon_i = \beta x' + \frac{N}{n}\sum_{i=1}^{n}\varepsilon_i, \qquad (6.9.1)$$
where $x' = \frac{N}{n}\sum_{i=1}^{n}x_i$.
Also
$$Y = \sum_{i=1}^{N}Y_i = \beta\sum_{i=1}^{N}X_i + \sum_{i=1}^{N}\varepsilon_i = \beta X + \sum_{i=1}^{N}\varepsilon_i, \qquad (6.9.2)$$
or
$$\sum_{i=1}^{N}\varepsilon_i = Y - \beta X. \qquad (6.9.3)$$
Now
$$\mathrm{Var}(y') = E\left(y'-Y\right)^2 = E\left[\left(y'-\beta X\right)+\left(\beta X-Y\right)\right]^2 = E\left(y'-\beta X\right)^2 - E\left(Y-\beta X\right)^2, \qquad (6.9.4)$$
as the cross-product term is equal to $-2E\left(Y-\beta X\right)^2$. The first term of (6.9.4), using (6.8.2), will be
$$E\left(y'-\beta X\right)^2 = E_ME_D\left[\beta\left(x'-X\right)+\frac{N}{n}\sum_{i=1}^{n}\varepsilon_i\right]^2 = \beta^2\sigma_{x'}^2 + \frac{N}{n}\sum_{i=1}^{N}\sigma_i^2, \qquad (6.9.5)$$
where $\sigma_{x'}^2$ is the design variance of $x'$, and $E_M$, $E_D$ denote expectation under the model and the design respectively.
Similarly
$$E\left(\sum_{i=1}^{N}\varepsilon_i\right)^2 = E\left(Y-\beta X\right)^2,$$
or
$$E\left(Y-\beta X\right)^2 = \sum_{i=1}^{N}\sigma_i^2. \qquad (6.9.6)$$
Using (6.9.5) and (6.9.6) in (6.9.4) we get:
$$\mathrm{Var}(y') = \sigma_{y'}^2 = \beta^2\sigma_{x'}^2 + \frac{N-n}{n}\sum_{i=1}^{N}\sigma_i^2. \qquad (6.9.7)$$
Ratio Estimator
$$y'' = \frac{y'}{x'}X \qquad (6.1.3)$$
$$= \frac{\frac{N}{n}\sum_{i=1}^{n}\left(\beta x_i+\varepsilon_i\right)}{x'}X = \frac{\beta x' + \frac{N}{n}\sum_{i=1}^{n}\varepsilon_i}{x'}X. \qquad (6.9.8)$$
Now
$$\mathrm{Var}(y'') = E\left[y''-Y\right]^2 = E\left[\left(y''-\beta X\right)+\left(\beta X-Y\right)\right]^2 = E\left(y''-\beta X\right)^2 - E\left(Y-\beta X\right)^2. \qquad (6.9.9)$$
Now
$$E\left(y''-\beta X\right)^2 = E_ME_D\left[\frac{\beta x'+\frac{N}{n}\sum_{i=1}^{n}\varepsilon_i}{x'}X - \beta X\right]^2 = E_ME_D\left[\frac{N}{n}\frac{X}{x'}\sum_{i=1}^{n}\varepsilon_i\right]^2 = E_D\left[\frac{N^2}{n^2}\frac{X^2}{x'^2}\sum_{i\in s}\sigma_i^2\right] \simeq \frac{N}{n}\sum_{i=1}^{N}\sigma_i^2,$$
taking $X/x' \simeq 1$ for large samples, and
$$E\left(Y-\beta X\right)^2 = \sum_{i=1}^{N}\sigma_i^2. \qquad (6.9.10)$$
Therefore
$$\mathrm{Var}(y'') = \frac{N}{n}\sum_{i=1}^{N}\sigma_i^2 - \sum_{i=1}^{N}\sigma_i^2 = \frac{N-n}{n}\sum_{i=1}^{N}\sigma_i^2. \qquad (6.9.11)$$
Comparing (6.9.7) and (6.9.11) we have:
$$\mathrm{Var}(y') - \mathrm{Var}(y'') = \beta^2\sigma_{x'}^2 + \frac{N-n}{n}\sum_{i=1}^{N}\sigma_i^2 - \frac{N-n}{n}\sum_{i=1}^{N}\sigma_i^2 = \beta^2\sigma_{x'}^2.$$
Since $\beta^2\sigma_{x'}^2$ is always positive, the ratio estimator will always be more efficient under this model.

Foreman and Brewer (1971) used the following model: $Y_i = \alpha + \beta X_i + \varepsilon_i$, with the same assumptions as given in (6.8.2). They compared various methods of estimation and proved that the ratio method of estimation is more efficient than the unbiased method provided $|\alpha| < |\beta\bar X|$.

SOME RECENT DEVELOPMENTS ON RATIO ESTIMATORS

Recently two benchmark variables have been used to increase efficiency. Some of these estimators are given here.

6.10.1 Modification of Classical Ratio Estimator - I

Chand (1975) developed a chain ratio-type estimator in the context of two-phase sampling. It seems sensible to study the possibility of adapting it to the new situation, although the force of its argument is somewhat lost in the single-phase case.

THEOREM (6.8). An estimator suggested by Samiuddin and Hanif (2006) using two auxiliary variables, i.e. a ratio-cum-ratio estimator, is
$$T_2 = \bar y\,\frac{X}{\bar x}\,\frac{Z}{\bar z}, \qquad (6.10.1)$$
with mean square error
$$MSE(T_2) = \theta_1\bar Y^2\left[C_y^2+C_x^2+C_z^2-2\rho_{yx}C_xC_y-2\rho_{yz}C_yC_z+2\rho_{xz}C_xC_z\right]. \qquad (6.10.2)$$
This estimator is constructed by multiplying the classical ratio estimator by $Z/\bar z$.
PROOF: Using the concept given in (6.2.23), we get
$$T_2 - \bar Y = \left(\bar Y + e_y\right)\left(1-\frac{e_x}{X}\right)\left(1-\frac{e_z}{Z}\right) - \bar Y. \qquad (6.10.3)$$
Ignoring second and higher order terms, we get
$$T_2 - \bar Y \simeq e_y - \frac{\bar Y}{X}e_x - \frac{\bar Y}{Z}e_z. \qquad (6.10.4)$$
The mean square error of $T_2$ will be
$$MSE(T_2) = E\left(T_2-\bar Y\right)^2 = E\left(\bar y\frac{X}{\bar x}\frac{Z}{\bar z}-\bar Y\right)^2. \qquad (6.10.5)$$
Using (6.10.4) in (6.10.5) we get
$$E\left(T_2-\bar Y\right)^2 \simeq E\left(e_y-\frac{\bar Y}{X}e_x-\frac{\bar Y}{Z}e_z\right)^2$$
$$\simeq E\left[e_y^2 + \frac{\bar Y^2}{X^2}e_x^2 + \frac{\bar Y^2}{Z^2}e_z^2 - \frac{2\bar Y}{X}e_ye_x - \frac{2\bar Y}{Z}e_ye_z + \frac{2\bar Y^2}{XZ}e_xe_z\right].$$
Applying the expectations we get
$$MSE(T_2) \simeq \theta_1\left[\bar Y^2C_y^2 + \frac{\bar Y^2}{X^2}X^2C_x^2 + \frac{\bar Y^2}{Z^2}Z^2C_z^2 - \frac{2\bar Y}{X}\bar YXC_xC_y\rho_{xy} - \frac{2\bar Y}{Z}\bar YZC_yC_z\rho_{yz} + \frac{2\bar Y^2}{XZ}XZC_xC_z\rho_{xz}\right].$$
On simplification we get
$$MSE(T_2) = \theta_1\bar Y^2\left[C_y^2+C_x^2+C_z^2-2\rho_{yx}C_xC_y-2\rho_{yz}C_yC_z+2\rho_{xz}C_xC_z\right] \qquad (6.10.2)$$
$$= \theta_1\bar Y^2\left[C_y^2+C_x^2-2\rho_{xy}C_xC_y\right] + \theta_1\bar Y^2\left[C_z^2-2\rho_{yz}C_yC_z+2\rho_{xz}C_xC_z\right]$$
$$= MSE(T_1) + \theta_1\bar Y^2\left[C_z^2-2\rho_{yz}C_yC_z+2\rho_{xz}C_xC_z\right],$$
where $T_1 = \bar y\,X/\bar x$ is the classical ratio estimator.
6.10.2 Revised Ratio Estimator (an estimator with a suitable "a" involving two auxiliary variables)

THEOREM (6.9). A possible estimator, involving a suitable "a" and two auxiliary variables, suggested by Samiuddin and Hanif (2006), is given in (6.10.13). Putting the value of "a" from (6.10.12) in (6.10.13) and simplifying, we get
$$MSE(T_3) = \theta\bar Y^2\left[C_y^2+C_z^2-2\rho_{yz}C_yC_z - \frac{\left(\rho_{xy}C_xC_y-\rho_{yz}C_yC_z-\rho_{xz}C_xC_z+C_z^2\right)^2}{C_x^2+C_z^2-2\rho_{xz}C_xC_z}\right]. \qquad (6.10.7)$$
Since $\alpha = 0$ and $\alpha = 1$ are special cases of $T_5$, we conclude that
$$MSE(T_5) \le MSE\left(\bar y\frac{X}{\bar x}\right)\quad\text{and}\quad MSE(T_5) \le MSE\left(\bar y\frac{Z}{\bar z}\right).$$
In $T_5$, $\alpha$ will have to be replaced by its sample estimate.

SAMPLING WITH PROBABILITIES PROPORTIONAL TO SIZE (WITH REPLACEMENT)
7.1. INTRODUCTION. In previous chapters, equal-probability selection procedures and estimation methods have been discussed. In this and subsequent chapters those selection procedures will be considered in which the probability of selection varies from unit to unit (unequal probability). In equal probability sampling the selection does not depend on how large or small a unit is, but in probability proportional to size sampling these considerations are made. The probabilities must be known for all units of the population. The general theory of unequal-probability sampling was perhaps first presented by Hansen and Hurwitz (1943). They demonstrated that the use of unequal selection probabilities within a stratum frequently yields a far more efficient estimator of the total than equal-probability selection, provided the measure of size $Z_i$ (with $Z = \sum_{i=1}^{N}Z_i$) is sufficiently correlated with the estimand (variable under study) $Y_i$. A method of selection in which the units are selected with probability proportional to a given measure of size related to the characteristic under study is called unequal probability sampling, or probability proportional to size sampling, commonly known as PPS or $\pi$PS sampling.

7.2. SAMPLING WITH UNEQUAL PROBABILITIES WITH
REPLACEMENT [PPS SAMPLING]. The use of unequal probabilities in sampling was first suggested by Hansen and Hurwitz (1943). Prior to that date there had been substantial developments in sampling theory and practice, but all had been based on the assumption that probabilities of selection within each stratum would be equal. They proposed a two-stage sampling scheme (to be discussed in Chapter 11). The first-stage selection takes place in independent draws: at each draw, a single first-stage unit is selected with probability proportional to a measure of size, the number of second-stage sampling units within each first-stage unit. At the second stage, the same number of second-stage units is selected from each sampled first-stage unit. Because it is possible for the same first-stage unit to be selected more than once, this type of unequal probability sampling is generally known as sampling with replacement. Since, however, the independence of the draws is not a necessary condition for the units to have a non-zero probability of being selected more than once, another name, first suggested by Hartley and Rao (1962), is
multinomial sampling, a term justified by the multinomial distribution of the numbers of times the units occur in the sample. Unequal probability can, however, be used in single-stage designs. This scheme compared favorably with other two-stage sampling schemes which used equal probabilities of selection at the first stage and then took either a fixed number or a constant proportion of sub-sampling units from each selected first-stage unit. The selection procedure is explained as follows. A list of 523 villages of Multan district, along with the population of males and females, is given in Appendix-I. In order to understand the selection procedure of probability proportional to size sampling, a 5% sample has been selected from this population. To select the sample we cumulate the measures of size; under this selection procedure, 26 (5% of the total villages) random numbers are selected from 001 to 956204. These random numbers, along with the serial numbers of the villages, total population and initial probabilities of selection, are given (the data are shown on the next page). If any unit is selected more than once, it is included in the sample again.

7.3 EXPECTATION. If the ith unit is selected from a population of N units with probability $P_i = Z_i/\sum_{i=1}^{N}Z_i$, then an unbiased estimator, $y'_{HH}$ or $y'_{PPS}$, of the population total Y, as suggested by Hansen and Hurwitz (1943), is
$$y'_{HH} = y'_{PPS} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}, \qquad (7.3.1)$$
where HH denotes Hansen and Hurwitz, and pps denotes probability proportional to size.

THEOREM (7.1): If a sample of size n is drawn from a population of N units with probability proportional to size and with replacement, then $y'_{HH}$ is an unbiased estimator of the population total Y.
PROOF: We know that
$$y'_{HH} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}. \qquad (7.3.1)$$
Taking the expectation,
$$E(y'_{HH}) = \frac{1}{n}\sum_{i=1}^{n}E\left(\frac{y_i}{p_i}\right) = E\left(\frac{y_i}{p_i}\right) = \sum_{i=1}^{N}\frac{Y_i}{P_i}P_i = Y. \qquad\Diamond$$
Therefore $y'_{HH}$ is an unbiased estimator of the population total Y.

THEOREM (7.2): If a sample of size n is drawn from a population of N units with probability proportional to size and with replacement, the variance of $y'_{HH}$ is
$$\mathrm{Var}(y'_{HH}) = \frac{1}{n}\left(\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right). \qquad (7.4.1)$$
PROOF. We know that $\mathrm{Var}(y'_{HH}) = E(y'^2_{HH}) - Y^2$. Substituting the value of $y'_{HH}$ from (7.3.1), we have
$$\mathrm{Var}(y'_{HH}) = E\left(\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}\right)^2 - Y^2 = \frac{1}{n^2}E\left[\sum_{i=1}^{n}\left(\frac{y_i}{p_i}\right)^2 + \sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}\frac{y_i}{p_i}\frac{y_j}{p_j}\right] - Y^2$$
$$= \frac{1}{n^2}\left[n\sum_{i=1}^{N}\frac{Y_i^2}{P_i} + n(n-1)\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{Y_i}{P_i}\frac{Y_j}{P_j}P_{ij}\right] - Y^2.$$
Since the selections of the population units are independent, $P_{ij} = P_iP_j$; substituting this value, the cross term becomes $n(n-1)Y^2$, so
$$\mathrm{Var}(y'_{HH}) = \frac{1}{n^2}\left[n\sum_{i=1}^{N}\frac{Y_i^2}{P_i} + n(n-1)Y^2\right] - Y^2.$$
On simplification we get
$$\mathrm{Var}(y'_{HH}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right]. \qquad\Diamond$$
This expression may alternatively be written as
$$\mathrm{Var}(y'_{HH}) = \frac{1}{n}\sum_{i=1}^{N}P_i\left(\frac{Y_i}{P_i}-Y\right)^2 \qquad (7.4.2)$$
$$= \frac{1}{2n}\sum_{i=1}^{N}\sum_{j=1}^{N}P_iP_j\left(\frac{Y_i}{P_i}-\frac{Y_j}{P_j}\right)^2 \qquad (7.4.3)$$
$$= \frac{1}{n}\sum_{i=1}^{N}\frac{1}{P_i}\left(Y_i-P_iY\right)^2. \qquad (7.4.4)$$
7.4.1 An Alternative Proof (using Indicator Variables). Let $a_i$ be the number of times the ith unit of the population occurs in the sample (Chapter 2); then the joint distribution of the $a_i$ is
$$\frac{n!}{a_1!\,a_2!\cdots a_N!}\,P_1^{a_1}P_2^{a_2}\cdots P_N^{a_N}. \qquad (7.4.5)$$
Then
$$E(a_i) = nP_i;\qquad \mathrm{Var}(a_i) = nP_i\left(1-P_i\right);\qquad \mathrm{Cov}(a_i,a_j) = -nP_iP_j. \qquad (7.4.6)$$
An unbiased estimator of the population total is
$$y'_{HH} = \frac{1}{n}\sum_{i=1}^{N}\frac{Y_i}{P_i}a_i. \qquad (7.4.7)$$
The unbiasedness can be proved easily: taking the expectation of (7.4.7) and putting $E(a_i) = nP_i$ from (7.4.6), we get
$$E(y'_{HH}) = \frac{1}{n}\sum_{i=1}^{N}\frac{Y_i}{P_i}E(a_i) = \frac{1}{n}\sum_{i=1}^{N}\frac{Y_i}{P_i}\,nP_i = Y.$$
The variance of $y'_{HH}$ may be written (see Chapter 2) as
$$\mathrm{Var}(y'_{HH}) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i^2}\mathrm{Var}(a_i) + \sum_{i=1}^{N}\sum_{\substack{j=1\\ j\ne i}}^{N}\frac{Y_i}{P_i}\frac{Y_j}{P_j}\mathrm{Cov}(a_i,a_j)\right]. \qquad (7.4.8)$$
Putting the values of $\mathrm{Var}(a_i)$ and $\mathrm{Cov}(a_i,a_j)$ from (7.4.6) into (7.4.8) and simplifying, we get (7.4.1).
It follows that if $P_i = Y_i/Y$ the variance is zero. In practice this ideal situation cannot, of course, be realized, as the probabilities cannot be chosen proportional to $Y_i$, which has still to be observed. But the situation can be approximated if it is possible to choose $P_i$ proportional to some measure of size $Z_i$ which is known for all units in the population and which may be assumed approximately proportional to $Y_i$. The $Z_i$ is then called the size of the ith unit, and the least possible variance is obtained by choosing the probabilities proportional to the sizes.
An analogous expression for the covariance of $y'_{HH}$ and $x'_{HH}$, in the case of sampling with replacement and with probabilities proportional to size, may be written in a straightforward manner, i.e.
$$\mathrm{Cov}(y'_{HH},x'_{HH}) = \frac{1}{n}\sum_{i=1}^{N}P_i\left(\frac{Y_i}{P_i}-Y\right)\left(\frac{X_i}{P_i}-X\right). \qquad (7.4.9)$$
7.4.1. Unbiased Variance Estimator

THEOREM (7.3): If a sample of size n is drawn from a population of N units with probability proportional to size and with replacement, then an unbiased estimator of (7.4.1) is
$$\mathrm{var}(y'_{HH}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i}-y'_{HH}\right)^2. \qquad (7.4.10)$$
PROOF. Taking the expectation of (7.4.10),
$$E\left[\mathrm{var}(y'_{HH})\right] = \frac{1}{n(n-1)}E\left[\sum_{i=1}^{n}\left(\frac{y_i}{p_i}-y'_{HH}\right)^2\right].$$
Now
$$\sum_{i=1}^{n}\left(\frac{y_i}{p_i}-y'_{HH}\right)^2 = \sum_{i=1}^{n}\left(\frac{y_i}{p_i}-Y\right)^2 - n\left(y'_{HH}-Y\right)^2.$$
Taking the expectation of this equation,
$$E\left[\sum_{i=1}^{n}\left(\frac{y_i}{p_i}-y'_{HH}\right)^2\right] = \sum_{i=1}^{n}E\left(\frac{y_i}{p_i}-Y\right)^2 - nE\left(y'_{HH}-Y\right)^2$$
$$= n\sum_{i=1}^{N}P_i\left(\frac{Y_i}{P_i}-Y\right)^2 - n\,\mathrm{Var}(y'_{HH}) = n^2\,\mathrm{Var}(y'_{HH}) - n\,\mathrm{Var}(y'_{HH}) = n(n-1)\,\mathrm{Var}(y'_{HH}),$$
using (7.4.2). Using this result in (7.4.10), we get
$$E\left[\mathrm{var}(y'_{HH})\right] = \mathrm{Var}(y'_{HH}). \qquad\Diamond$$
(7.4.10) may be written as
$$\mathrm{var}(y'_{HH}) = \frac{1}{2n^2(n-1)}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\frac{y_i}{p_i}-\frac{y_j}{p_j}\right)^2. \qquad (7.4.11)$$
For calculation purposes, an alternative form of (7.4.10) is
$$\mathrm{var}(y'_{HH}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n}\frac{y_i^2}{p_i^2} - n\,y'^2_{PPS}\right]. \qquad (7.4.12)$$
An unbiased covariance expression may be written analogous to (7.4.9) as
$$\mathrm{cov}(y'_{HH},x'_{HH}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i}-y'\right)\left(\frac{x_i}{p_i}-x'\right). \qquad (7.4.13)$$
Though this scheme is based on a with-replacement process, it is preferred for use in large-scale sample surveys for the following reasons:
(i) selection of the sample is simple,
(ii) it can be used for any finite predetermined number of units in the sample,
(iii) an unbiased variance estimator is simple, and
(iv) it is also comparatively easy to obtain an unbiased variance estimator of the total in multistage designs.
This selection procedure may be more efficient than simple random sampling if the measure of size is approximately proportional to the estimand, i.e. if $Y_i$ and $Z_i$ are linearly related with the regression line passing through the origin.
EXAMPLE (7.2): Select a sample of 26 villages, using the probability proportional to size with replacement selection procedure, from the data given in Appendix-I. Estimate the total number of persons in the 523 villages and compare this result with the actual population of the 523 villages. Estimate $\mathrm{var}(y'_{PPS})$ and calculate the standard error of this estimate.

Solution:
(i) $y'_{PPS} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i} = 1606783.2382$, whereas the actual total for the 523 villages is 1797841.
(ii)
$$\mathrm{var}(y'_{PPS}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i}-y'_{PPS}\right)^2 = \frac{1}{26\times25}\left(\cdots\right) = 50186431493.$$
(iii) $S.E.(y'_{PPS}) = \sqrt{\mathrm{var}(y'_{PPS})} = 224023.2834.$
(iv) $C.L. = 1606783.2382 \pm 2\times224023.2834.$
This may also be calculated as
$$\mathrm{var}(y'_{PPS}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n}\frac{y_i^2}{p_i^2} - n\,y'^2_{PPS}\right] = \frac{1}{26\times25}\left[\cdots - 26\times(1606783.2382)^2\right] = 50186431493.$$

7.4.2. Comparison of Simple Random Sampling with Replacement and Probability Proportional to Size with Replacement

We know that
$$\mathrm{Var}(y'_{HH}) = \frac{1}{n}\left(\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right). \qquad (7.4.1)$$
If $P_i = 1/N$ then (7.4.1) becomes
$$\mathrm{Var}(y'_{ran}) = \frac{1}{n}\left(N\sum_{i=1}^{N}Y_i^2 - Y^2\right) = \frac{N}{n}\left(\sum_{i=1}^{N}Y_i^2 - \frac{Y^2}{N}\right), \qquad (7.4.14)$$
which is the variance expression for simple random sampling with replacement.
Putting $P_i = Z_i/Z$ in (7.4.1) and subtracting from (7.4.14), we obtain
$$\mathrm{Var}(y'_{ran}) - \mathrm{Var}(y'_{HH}) = \frac{N}{n}\sum_{i=1}^{N}Y_i^2\left(1-\frac{\bar Z}{Z_i}\right), \qquad (7.4.15)$$
where $\bar Z = \sum_{i=1}^{N}Z_i/N$.
Probability proportional to size (PPS) sampling with replacement will thus be more efficient than simple random sampling provided
$$\sum_{i=1}^{N}\frac{Y_i^2}{Z_i}\left(Z_i-\bar Z\right) > 0, \qquad (7.4.16)$$
i.e. if $Z_i$ and $Y_i^2/Z_i$ are positively correlated.
However, it was noted by Raj (1954) that the estimator based on PPS sampling with replacement turns out to be inefficient, compared to the unbiased estimate based on simple random sampling with replacement, if the regression line of $Y_i$ on $Z_i$ is far from the origin.
7.4.3 Comparison of $\mathrm{Var}(y'_{ran})$ and $\mathrm{Var}(y'_{HH})$ Using a Linear Stochastic Model

We have already shown in (7.4.15) that
$$\mathrm{Var}(y'_{ran}) - \mathrm{Var}(y'_{HH}) = \frac{N}{n}\sum_{i=1}^{N}Y_i^2\left(1-\frac{\bar Z}{Z_i}\right) \qquad (7.4.15)$$
$$= \frac{N}{n}\sum_{i=1}^{N}\frac{Y_i^2}{Z_i}\left(Z_i-\bar Z\right). \qquad (7.4.17)$$
For the purpose of comparison, let us take the linear model defined in (6.8.2) of Chapter 6, i.e. assume that the finite population $Y_1, Y_2, \ldots, Y_N$ is a random sample from an infinite super-population in which
$$Y_i = \beta Z_i + \varepsilon_i,\qquad E(\varepsilon_i)=0,\quad E(\varepsilon_i\varepsilon_j)=0\ (i\ne j),\quad E(\varepsilon_i^2)=\sigma_i^2=\sigma^2Z_i^{\gamma},\quad \tfrac12\le\gamma\le2. \qquad (6.8.2)$$
Substituting the value of $Y_i$ from the model in (7.4.17), we have
7.6. ALTERNATIVE ESTIMATOR TO THE HANSEN AND HURWITZ ESTIMATOR

Pathak (1962) described an estimator for the sampling scheme suggested by Hansen and Hurwitz (1943). Suppose we have a sample of three units selected from a population of N units, and let the selected sample have observations $y_i, y_i, y_j$ with probabilities $p_i, p_i, p_j$ respectively; then Pathak (1962) defines the estimator
$$y'_p = \frac{1}{3}\left[\frac{y_i}{p_i} + \frac{y_j}{p_j} + \frac{y_i+y_j}{p_i+p_j}\right], \qquad (7.6.1)$$
or for sample size n it may be written as
$$y'_p = \frac{1}{n}\left[\sum_{i=1}^{n-1}\frac{y_i}{p_i} + \frac{\sum_{i=1}^{n}y_i}{\sum_{i=1}^{n}p_i}\right]. \qquad (7.6.2)$$
This is more efficient than the Hansen and Hurwitz (1943) estimator, but more difficult to calculate. The gain in precision is small unless the sampling fraction is large.
7.7. RATIO ESTIMATION FOR PPS SAMPLING

We know that
$$y'_{HH} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}\quad\text{and}\quad x'_{HH} = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i}{p_i}.$$
Therefore
$$y''_{HH} = \frac{\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}}{\frac{1}{n}\sum_{i=1}^{n}\frac{x_i}{p_i}}\,X. \qquad (7.7.1)$$
From Hansen, Hurwitz and Madow (1953), we have
$$\mathrm{Var}(y''_{HH}) = \mathrm{Var}(y'_{HH}) - 2R\,\mathrm{Cov}(y'_{HH},x'_{HH}) + R^2\,\mathrm{Var}(x'_{HH}). \qquad (6.2.19)$$
Using (7.4.2) and (7.4.9), and the analogous expression
$$\mathrm{Var}(x'_{HH}) = \frac{1}{n}\sum_{i=1}^{N}P_i\left(\frac{X_i}{P_i}-X\right)^2, \qquad (7.7.2)$$
in (6.2.19), on simplification we get
$$\mathrm{Var}(y''_{HH}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - 2R\sum_{i=1}^{N}\frac{Y_iX_i}{P_i} + R^2\sum_{i=1}^{N}\frac{X_i^2}{P_i}\right] - \left(Y-RX\right)^2 \qquad (7.7.3)$$
$$= \frac{1}{n}\sum_{i=1}^{N}\frac{1}{P_i}\left(Y_i-RX_i\right)^2, \qquad (7.7.4)$$
since $Y = RX$. This may equally be put as
$$\mathrm{Var}(y''_{HH}) = \frac{1}{n}\sum_{i=1}^{N}P_i\left(\frac{Y_i}{P_i}-R\frac{X_i}{P_i}\right)^2. \qquad (7.7.5)$$
An approximately unbiased estimator of $\mathrm{Var}(y''_{HH})$ may be written in a straightforward way, or may be derived, as
$$\mathrm{var}(y''_{HH}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i}-r\frac{x_i}{p_i}\right)^2, \qquad (7.7.6)$$
or
$$\mathrm{var}(y''_{HH}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i}-\frac{y'}{x'}\frac{x_i}{p_i}\right)^2. \qquad (7.7.7)$$
CHAPTER-4

TWO-PHASE SAMPLING

4.1 Introduction
Consider the problem of estimating the population mean $\bar Y$ of a study variable Y from a finite population of N units. When information on one or more auxiliary variables, say X and Z, which are correlated with the variable Y, is available or can be cheaply obtained, ratio or regression type estimates can be used to improve efficiency. These cases may include knowledge of $\bar X$, or $\bar Z$, or both $\bar X$ and $\bar Z$. There are, however, situations where prior knowledge about these may be lacking and a census or complete count is too costly. Two-phase sampling is used to gain information about x and z cheaply from a bigger first-phase sample. A sub-sample is then selected from the units selected at the first phase, and Y is observed for the selected units. Useful references in this area are Mohanty (1967), Chand (1975), Ahmed (1977), Kiregyera (1980, 1984), Sahoo et al. (1993) and Roy (2003). We have used linear models and the method of least squares (L.S.), following Roy (2003), to deal with different situations. The results, as expected, are encouraging. We have also indicated how slight adjustments can be made in earlier works to improve the efficiency of the estimates. An implication of this is that some of these earlier works do not fully utilize the available information.

Let N be the size of the population, from which a sample of size $n_1$ ($n_1 < N$) is drawn using simple random sampling without replacement. The values of X and Z are noted for the units selected. From this sample a sub-sample of size $n_2$ ($n_2 < n_1$) is again selected using simple random sampling without replacement, and Y is observed. Further, let $\bar y_2$, $\bar x_2$ and $\bar z_2$ be the sample means of the y, x and z variables respectively, based on the sample of size $n_2$, and let $\bar x_1$ and $\bar z_1$ be the sample means based on the first-phase sample of size $n_1$ of the variables x and z respectively. Various situations of interest may arise depending on the availability of information about $\bar X$ and $\bar Z$. We will deal with them separately.
To suit different situations we introduce the following notation. Let
$$S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2, \qquad \theta_1 = \frac{1}{n_1} - \frac{1}{N}, \qquad \theta_2 = \frac{1}{n_2} - \frac{1}{N}, \qquad C_y^2 = S_y^2/\bar{Y}^2,$$
with $C_x^2$ and $C_z^2$ similarly defined. Also $\rho_{xy}$, $\rho_{yz}$ and $\rho_{xz}$ denote the population correlation coefficients between X and Y, Y and Z, and X and Z respectively. We will also write
$$\bar{y}_1 = \bar{Y} + e_{y_1}, \qquad \bar{x}_1 = \bar{X} + e_{x_1}, \qquad \bar{z}_1 = \bar{Z} + e_{z_1},$$
$$E\!\left(e_{x_1}^2\right) = \theta_1 \bar{X}^2 C_x^2, \qquad E\!\left(e_{y_1}^2\right) = \theta_1 \bar{Y}^2 C_y^2, \qquad E\!\left(e_{z_1}^2\right) = \theta_1 \bar{Z}^2 C_z^2, \qquad E\!\left(e_{x_1} e_{y_1}\right) = \theta_1 \bar{X}\bar{Y}\rho_{xy} C_x C_y,$$
$$\bar{x}_2 = \bar{X} + e_{x_2}, \qquad E\!\left(e_{x_2}^2\right) = \theta_2 \bar{X}^2 C_x^2, \qquad E\!\left(e_{x_2} e_{y_2}\right) = \theta_2 \bar{X}\bar{Y} C_x C_y \rho_{xy},$$
$$E\!\left(e_{x_2} - e_{x_1}\right)^2 = \left(\theta_2 - \theta_1\right)\bar{X}^2 C_x^2, \qquad\qquad (4.1.1)$$
$$E\!\left[e_{y_2}\!\left(e_{x_2} - e_{x_1}\right)\right] = \left(\theta_2 - \theta_1\right)\bar{Y}\bar{X}\, C_y C_x \rho_{xy},$$
$$E\!\left[e_{x_1}\!\left(e_{z_1} - e_{z_2}\right)\right] = 0,$$
$$E\!\left(e_{y_2} e_{x_1}\right) = \theta_1 \bar{Y}\bar{X}\, C_y C_x \rho_{xy},$$
with other terms similarly defined. We will also assume that $e_{y_1}$ and $e_{y_2}$ are small in comparison with $\bar{Y}$, with similar assumptions for the auxiliary variables. We will look into the following situations separately.
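The expectations in (4.1.1) can be checked empirically. The sketch below (hypothetical population, `numpy` assumed) verifies $E\left(e_{x_2} - e_{x_1}\right)^2 = (\theta_2 - \theta_1)\bar{X}^2 C_x^2$ and $E\left[e_{x_1}(e_{z_1} - e_{z_2})\right] = 0$ by repeated two-phase sampling:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical population (illustrative values only).
N, n1, n2, reps = 1000, 200, 50, 20000
X = rng.uniform(10, 20, N)
Z = 0.5 * X + rng.normal(0, 1, N)
Xbar, Zbar = X.mean(), Z.mean()

theta1, theta2 = 1 / n1 - 1 / N, 1 / n2 - 1 / N
Cx2 = X.var(ddof=1) / Xbar**2
theory = (theta2 - theta1) * Xbar**2 * Cx2   # E(e_x2 - e_x1)^2 from (4.1.1)

sq_diffs, cross = [], []
for _ in range(reps):
    s1 = rng.choice(N, size=n1, replace=False)       # first phase (SRSWOR)
    s2 = rng.choice(s1, size=n2, replace=False)      # second phase sub-sample
    ex1, ex2 = X[s1].mean() - Xbar, X[s2].mean() - Xbar
    ez1, ez2 = Z[s1].mean() - Zbar, Z[s2].mean() - Zbar
    sq_diffs.append((ex2 - ex1) ** 2)
    cross.append(ex1 * (ez1 - ez2))

print(np.mean(sq_diffs), theory, np.mean(cross))
```

The second result follows because $\mathrm{Cov}(\bar{x}_1, \bar{z}_2) = \mathrm{Cov}(\bar{x}_1, \bar{z}_1)$ under sub-sampling, so the two covariances cancel.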
i) In addition to the sample, the population means $\bar{X}$ and $\bar{Z}$ are given. We may call this the complete information case.
ii) In addition to the sample, only $\bar{X}$ is given ($\bar{Z}$ being unknown). We will call this the partial information case.
iii) Only the information in the sample is available, i.e. $\bar{X}$ and $\bar{Z}$ are unknown. We will call this the no additional information case.
4.2 Ratio and Regression Estimators

In this section the following ratio and regression estimators, along with their mean square errors, are considered.

a) $T_{1(2)} = \dfrac{\bar{y}_2}{\bar{x}_2}\,\bar{X}$  [$\bar{X}$ is known]

b) $T_{2(2)} = \dfrac{\bar{y}_2}{\bar{x}_2}\,\bar{x}_1$  [no information]

c) $T_{3(2)} = \bar{y}_2 + b_{yx}\left(\bar{x}_1 - \bar{x}_2\right)$  [no information]
4.2.1 Ratio Estimator with known information

Consider
$$T_{1(2)} = \frac{\bar{y}_2}{\bar{x}_2}\,\bar{X} \qquad (4.2.1)$$
Using (4.1.1) we get
$$T_{1(2)} = \frac{\bar{Y} + e_{y_2}}{\bar{X} + e_{x_2}}\,\bar{X}
\approx \bar{Y}\left(1 + \frac{e_{y_2}}{\bar{Y}}\right)\left(1 - \frac{e_{x_2}}{\bar{X}}\right)
\approx \bar{Y} + e_{y_2} - \frac{\bar{Y}}{\bar{X}}\,e_{x_2},$$
so that
$$T_{1(2)} - \bar{Y} = e_{y_2} - \frac{\bar{Y}}{\bar{X}}\,e_{x_2}.$$
The mean square error of $T_{1(2)}$ will be
$$\mathrm{MSE}\left(T_{1(2)}\right) = E\left(T_{1(2)} - \bar{Y}\right)^2 = E\left(e_{y_2} - \frac{\bar{Y}}{\bar{X}}\,e_{x_2}\right)^2 \qquad (4.2.2)$$
Expanding the square on the R.H.S. of (4.2.2) we get
$$= E\left[e_{y_2}^2 + \frac{\bar{Y}^2}{\bar{X}^2}\,e_{x_2}^2 - 2\,\frac{\bar{Y}}{\bar{X}}\,e_{y_2} e_{x_2}\right]$$
Using (4.1.1),
$$\mathrm{MSE}\left(T_{1(2)}\right) = \theta_2 \bar{Y}^2 C_y^2 + \theta_2\,\frac{\bar{Y}^2}{\bar{X}^2}\,\bar{X}^2 C_x^2 - 2\theta_2\,\frac{\bar{Y}}{\bar{X}}\,\bar{Y}\bar{X}\, C_y C_x \rho_{xy}$$
On simplification we get
$$V\left(T_{1(2)}\right) = \mathrm{MSE}\left(T_{1(2)}\right) = \theta_2 \bar{Y}^2\left[C_y^2 + C_x^2 - 2\rho_{xy} C_x C_y\right] \qquad (4.2.3)$$
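The first-order result (4.2.3) can be checked against the empirical MSE of $T_{1(2)}$ over repeated samples. This is a simulation sketch with a hypothetical population (`numpy` assumed); since $T_{1(2)}$ uses the known $\bar{X}$, only the second-phase sample matters here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population (illustrative values only).
N, n2, reps = 2000, 100, 4000
X = rng.uniform(10, 20, N)
Y = 2.0 * X + rng.normal(0, 2, N)
Ybar, Xbar = Y.mean(), X.mean()

# First-order MSE from (4.2.3).
theta2 = 1 / n2 - 1 / N
Cy = Y.std(ddof=1) / Ybar
Cx = X.std(ddof=1) / Xbar
rho = np.corrcoef(X, Y)[0, 1]
mse_theory = theta2 * Ybar**2 * (Cy**2 + Cx**2 - 2 * rho * Cx * Cy)

# Empirical MSE of T1 = (ybar2 / xbar2) * Xbar over repeated SRSWOR samples.
errs = []
for _ in range(reps):
    s = rng.choice(N, size=n2, replace=False)
    T1 = Y[s].mean() / X[s].mean() * Xbar
    errs.append((T1 - Ybar) ** 2)
mse_emp = float(np.mean(errs))

print(mse_theory, mse_emp)
```

Agreement is only to the first order of approximation, so the two numbers match up to Monte Carlo noise and $O(n_2^{-2})$ terms.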
4.2.2 Ratio Estimator with no information
Consider
$$T_{2(2)} = \frac{\bar{y}_2}{\bar{x}_2}\,\bar{x}_1 \qquad (4.2.4)$$
Using (4.1.1) in (4.2.4) we get
$$T_{2(2)} = \frac{\bar{Y} + e_{y_2}}{\bar{X} + e_{x_2}}\left(\bar{X} + e_{x_1}\right)$$
$$T_{2(2)} = \left(\bar{Y} + e_{y_2}\right)\left(\bar{X} + e_{x_1}\right)\left(\bar{X} + e_{x_2}\right)^{-1}
= \left(\bar{Y} + e_{y_2}\right)\left(1 + \frac{e_{x_1}}{\bar{X}}\right)\left(1 - \frac{e_{x_2}}{\bar{X}} + \cdots\right)
\approx \bar{Y} + e_{y_2} + \frac{\bar{Y}}{\bar{X}}\left(e_{x_1} - e_{x_2}\right)$$
or
$$T_{2(2)} - \bar{Y} = e_{y_2} + \frac{\bar{Y}}{\bar{X}}\left(e_{x_1} - e_{x_2}\right)$$
The mean square error of $T_{2(2)}$ is
$$\mathrm{MSE}\left(T_{2(2)}\right) = E\left(T_{2(2)} - \bar{Y}\right)^2 = E\left[e_{y_2} + \frac{\bar{Y}}{\bar{X}}\left(e_{x_1} - e_{x_2}\right)\right]^2 \qquad (4.2.5)$$
$$= E\left[e_{y_2}^2 + \frac{\bar{Y}^2}{\bar{X}^2}\left(e_{x_1} - e_{x_2}\right)^2 + 2\,\frac{\bar{Y}}{\bar{X}}\,e_{y_2}\left(e_{x_1} - e_{x_2}\right)\right]$$
Using (4.1.1) we get
$$\mathrm{MSE}\left(T_{2(2)}\right) = \theta_2 \bar{Y}^2 C_y^2 + \left(\theta_2 - \theta_1\right)\frac{\bar{Y}^2}{\bar{X}^2}\,\bar{X}^2 C_x^2 - 2\left(\theta_2 - \theta_1\right)\frac{\bar{Y}}{\bar{X}}\,\bar{Y}\bar{X}\, C_y C_x \rho_{xy}$$
$$= \theta_2 \bar{Y}^2 C_y^2 + \left(\theta_2 - \theta_1\right)\bar{Y}^2 C_x^2 - 2\left(\theta_2 - \theta_1\right)\bar{Y}^2 C_x C_y \rho_{xy}$$
or
$$V\left(T_{2(2)}\right) = \mathrm{MSE}\left(T_{2(2)}\right) = \bar{Y}^2\left[\theta_2 C_y^2 + \left(\theta_2 - \theta_1\right)\left(C_x^2 - 2\rho_{xy} C_x C_y\right)\right] \qquad (4.2.6)$$
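As with the previous estimator, (4.2.6) can be verified by simulating the full two-phase scheme. The sketch below uses a hypothetical population (`numpy` assumed) and compares the empirical MSE of $T_{2(2)}$ with the first-order formula:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical population (illustrative values only).
N, n1, n2, reps = 2000, 400, 100, 4000
X = rng.uniform(10, 20, N)
Y = 2.0 * X + rng.normal(0, 2, N)
Ybar, Xbar = Y.mean(), X.mean()

# First-order MSE from (4.2.6).
theta1, theta2 = 1 / n1 - 1 / N, 1 / n2 - 1 / N
Cy, Cx = Y.std(ddof=1) / Ybar, X.std(ddof=1) / Xbar
rho = np.corrcoef(X, Y)[0, 1]
mse_theory = Ybar**2 * (theta2 * Cy**2
                        + (theta2 - theta1) * (Cx**2 - 2 * rho * Cx * Cy))

# Empirical MSE of T2 = (ybar2 / xbar2) * xbar1 over repeated two-phase samples.
errs = []
for _ in range(reps):
    s1 = rng.choice(N, size=n1, replace=False)
    s2 = rng.choice(s1, size=n2, replace=False)
    T2 = Y[s2].mean() / X[s2].mean() * X[s1].mean()
    errs.append((T2 - Ybar) ** 2)
mse_emp = float(np.mean(errs))

print(mse_theory, mse_emp)
```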
4.2.3 Regression Estimator with no information
Consider
$$T_{3(2)} = \bar{y}_2 + b_{yx}\left(\bar{x}_1 - \bar{x}_2\right) \qquad (4.2.7)$$
Using (4.1.1), and replacing the sample regression coefficient $b_{yx}$ by its population counterpart $\beta_{yx}$ to the first order of approximation, we get
$$T_{3(2)} = \bar{Y} + e_{y_2} + \beta_{yx}\left(e_{x_1} - e_{x_2}\right)$$
or
$$T_{3(2)} - \bar{Y} = e_{y_2} + \beta_{yx}\left(e_{x_1} - e_{x_2}\right) \qquad (4.2.8)$$
The mean square error of $T_{3(2)}$ is
$$\mathrm{MSE}\left(T_{3(2)}\right) = E\left(T_{3(2)} - \bar{Y}\right)^2 = E\left[e_{y_2} + \beta_{yx}\left(e_{x_1} - e_{x_2}\right)\right]^2 \qquad (4.2.9)$$
or
$$\mathrm{MSE}\left(T_{3(2)}\right) = E\left[e_{y_2}^2 + \beta_{yx}^2\left(e_{x_1} - e_{x_2}\right)^2 + 2\beta_{yx}\,e_{y_2}\left(e_{x_1} - e_{x_2}\right)\right] \qquad (4.2.10)$$
or
$$\mathrm{MSE}\left(T_{3(2)}\right) = \theta_2 \bar{Y}^2 C_y^2 + \left(\theta_2 - \theta_1\right)\beta_{yx}^2\, \bar{X}^2 C_x^2 - 2\left(\theta_2 - \theta_1\right)\beta_{yx}\,\bar{Y}\bar{X}\, C_y C_x \rho_{xy}$$
Substituting the value $\beta_{yx} = \dfrac{\rho_{xy}\bar{Y} C_y}{\bar{X} C_x}$,
$$\mathrm{MSE}\left(T_{3(2)}\right) = \theta_2 \bar{Y}^2 C_y^2 + \left(\theta_2 - \theta_1\right)\rho_{xy}^2\, \bar{Y}^2 C_y^2 - 2\left(\theta_2 - \theta_1\right)\rho_{xy}^2\, \bar{Y}^2 C_y^2$$
On simplification we get
$$\mathrm{MSE}\left(T_{3(2)}\right) = \bar{Y}^2 C_y^2\left[\theta_2 + \left(\theta_1 - \theta_2\right)\rho_{xy}^2\right]$$
$$V\left(T_{3(2)}\right) = \mathrm{MSE}\left(T_{3(2)}\right) = \bar{Y}^2 C_y^2\left[\theta_2\left(1 - \rho_{xy}^2\right) + \theta_1\rho_{xy}^2\right] \qquad (4.2.11)$$
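With the population coefficient $\beta_{yx}$ held fixed, the estimator is exactly linear in the error terms, so (4.2.11) can be reproduced closely by simulation. A sketch with a hypothetical population (`numpy` assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population (illustrative values only).
N, n1, n2, reps = 2000, 400, 100, 4000
X = rng.uniform(10, 20, N)
Y = 2.0 * X + rng.normal(0, 2, N)
Ybar = Y.mean()

theta1, theta2 = 1 / n1 - 1 / N, 1 / n2 - 1 / N
Cy = Y.std(ddof=1) / Ybar
rho = np.corrcoef(X, Y)[0, 1]
mse_theory = Ybar**2 * Cy**2 * (theta2 * (1 - rho**2) + theta1 * rho**2)

# Population regression coefficient beta_yx = rho * Sy / Sx, as used in the
# derivation (the sample coefficient b_yx replaced by its population value).
beta = rho * Y.std(ddof=1) / X.std(ddof=1)

errs = []
for _ in range(reps):
    s1 = rng.choice(N, size=n1, replace=False)
    s2 = rng.choice(s1, size=n2, replace=False)
    T3 = Y[s2].mean() + beta * (X[s1].mean() - X[s2].mean())
    errs.append((T3 - Ybar) ** 2)
mse_emp = float(np.mean(errs))

print(mse_theory, mse_emp)
```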
4.3 Mohanty's [1967] Estimator and some modifications
In this section the following estimators are considered.

a) $T_{4(2)} = \left[\bar{y}_2 + b_{yx}\left(\bar{x}_1 - \bar{x}_2\right)\right]\dfrac{\bar{Z}}{\bar{z}_2}$

b) $T_{5(2)} = \left[\bar{y}_2 + b_{yx}\left(\bar{x}_1 - \bar{x}_2\right)\right]\dfrac{\bar{z}_1}{\bar{z}_2}$

c) $T_{6(2)} = \left[\bar{y}_2 + b_{yz}\left(\bar{z}_1 - \bar{z}_2\right)\right]\dfrac{\bar{X}}{\bar{x}_2}$

d) $T_{7(2)} = \left[\bar{y}_2 + b_{yx}\left(\bar{X} - \bar{x}_1\right)\right]\dfrac{\bar{z}_1}{\bar{z}_2}$
4.3.1 Mohanty (1967) considered estimation when $\bar{Z}$ is known:
$$T_{4(2)} = \left[\bar{y}_2 + b_{yx}\left(\bar{x}_1 - \bar{x}_2\right)\right]\frac{\bar{Z}}{\bar{z}_2} \qquad (4.3.1)$$
Using (4.1.1) in (4.3.1) we get
$$T_{4(2)} = \left[\bar{Y} + e_{y_2} + \beta_{yx}\left(e_{x_1} - e_{x_2}\right)\right]\frac{\bar{Z}}{\bar{Z} + e_{z_2}}$$
On simplification we get
$$T_{4(2)} \approx \bar{Y} + e_{y_2} + \beta_{yx}\left(e_{x_1} - e_{x_2}\right) - \frac{\bar{Y}}{\bar{Z}}\,e_{z_2}$$
or
$$T_{4(2)} - \bar{Y} = e_{y_2} + \beta_{yx}\left(e_{x_1} - e_{x_2}\right) - \frac{\bar{Y}}{\bar{Z}}\,e_{z_2}$$
The MSE of $T_{4(2)}$ is
$$E\left(T_{4(2)} - \bar{Y}\right)^2 = E\left[e_{y_2} + \beta_{yx}\left(e_{x_1} - e_{x_2}\right) - \frac{\bar{Y}}{\bar{Z}}\,e_{z_2}\right]^2 \qquad (4.3.2)$$
$$= E\left[e_{y_2}^2 + \beta_{yx}^2\left(e_{x_1} - e_{x_2}\right)^2 + \frac{\bar{Y}^2}{\bar{Z}^2}\,e_{z_2}^2 + 2\beta_{yx}\,e_{y_2}\left(e_{x_1} - e_{x_2}\right) - 2\,\frac{\bar{Y}}{\bar{Z}}\,e_{y_2} e_{z_2} - 2\,\frac{\bar{Y}}{\bar{Z}}\,\beta_{yx}\,e_{z_2}\left(e_{x_1} - e_{x_2}\right)\right]$$
$$\mathrm{MSE}\left(T_{4(2)}\right) = \theta_2 \bar{Y}^2 C_y^2 + \left(\theta_2 - \theta_1\right)\beta_{yx}^2\, \bar{X}^2 C_x^2 + \theta_2\,\frac{\bar{Y}^2}{\bar{Z}^2}\,\bar{Z}^2 C_z^2$$
$$\qquad + 2\left(\theta_1 - \theta_2\right)\beta_{yx}\,\bar{Y}\bar{X}\, C_y C_x \rho_{xy} - 2\theta_2\,\frac{\bar{Y}}{\bar{Z}}\,\bar{Y}\bar{Z}\, C_y C_z \rho_{yz}$$
$$\qquad - 2\,\frac{\bar{Y}}{\bar{Z}}\left(\theta_1 - \theta_2\right)\beta_{yx}\,\bar{Z}\bar{X}\, C_z C_x \rho_{xz} \qquad (4.3.3)$$
Putting the value $\beta_{yx} = \dfrac{\rho_{xy}\, C_y \bar{Y}}{\bar{X}\, C_x}$ in (4.3.3) we get
$$\mathrm{MSE}\left(T_{4(2)}\right) = \theta_2 \bar{Y}^2 C_y^2 + \left(\theta_2 - \theta_1\right)\frac{\rho_{xy}^2 C_y^2 \bar{Y}^2}{\bar{X}^2 C_x^2}\,\bar{X}^2 C_x^2 + \theta_2 \bar{Y}^2 C_z^2$$
$$\qquad + 2\left(\theta_1 - \theta_2\right)\frac{\rho_{xy}\bar{Y} C_y}{\bar{X} C_x}\,\bar{Y}\bar{X}\, C_y C_x \rho_{xy}$$
$$\qquad - 2\theta_2 \bar{Y}^2 C_y C_z \rho_{yz} - 2\left(\theta_1 - \theta_2\right)\frac{\rho_{xy}\bar{Y} C_y}{\bar{X} C_x}\,\frac{\bar{Y}}{\bar{Z}}\,\bar{Z}\bar{X}\, C_z C_x \rho_{xz} \qquad (4.3.4)$$
On simplification,
$$\mathrm{MSE}\left(T_{4(2)}\right) = \bar{Y}^2\left[\theta_2 C_y^2 - \left(\theta_2 - \theta_1\right)\rho_{xy}^2 C_y^2 + \theta_2 C_z^2 - 2\theta_2 \rho_{yz} C_y C_z - 2\left(\theta_1 - \theta_2\right)\rho_{xy}\rho_{xz}\, C_y C_z\right]$$
Adding and subtracting $\theta_2 \rho_{yz}^2 C_y^2$ inside the bracket,
$$= \bar{Y}^2\left[\theta_2 C_y^2\left(1 - \rho_{yz}^2\right) + \theta_2\left(\rho_{yz}^2 C_y^2 + C_z^2 - 2\rho_{yz} C_y C_z\right) - \left(\theta_2 - \theta_1\right)\rho_{xy}^2 C_y^2 + 2\left(\theta_2 - \theta_1\right)\rho_{xy}\rho_{xz}\, C_y C_z\right]$$
or
$$\mathrm{MSE}\left(T_{4(2)}\right) = \bar{Y}^2\left[\theta_2\left\{C_y^2\left(1 - \rho_{yz}^2\right) + \left(C_z - \rho_{yz} C_y\right)^2\right\} - \left(\theta_2 - \theta_1\right)C_y\left(\rho_{xy}^2 C_y - 2\rho_{xy}\rho_{xz}\, C_z\right)\right]$$
This is a modification of Chand (1975). Kiregyera (1980) assumed that $Z_i$ is closely related to $X_i$ but, compared with $X_i$, is only remotely related to $Y_i$. This assumption may not always be realized in practice; therefore $T_{8(2)}$ may not be used effectively in many situations.
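The MSE expressions derived above can be compared numerically. The sketch below plugs illustrative (hypothetical) population parameters into (4.2.3), (4.2.6), (4.2.11) and the final expression for $T_{4(2)}$:

```python
# Illustrative (hypothetical) population parameters.
Ybar = 50.0
Cy, Cx, Cz = 0.25, 0.30, 0.28
rho_xy, rho_yz, rho_xz = 0.8, 0.7, 0.9
N, n1, n2 = 5000, 500, 100
theta1 = 1 / n1 - 1 / N
theta2 = 1 / n2 - 1 / N

# (4.2.3): ratio estimator with known Xbar.
mse_T1 = theta2 * Ybar**2 * (Cy**2 + Cx**2 - 2 * rho_xy * Cx * Cy)
# (4.2.6): two-phase ratio estimator, no additional information.
mse_T2 = Ybar**2 * (theta2 * Cy**2
                    + (theta2 - theta1) * (Cx**2 - 2 * rho_xy * Cx * Cy))
# (4.2.11): two-phase regression estimator.
mse_T3 = Ybar**2 * Cy**2 * (theta2 * (1 - rho_xy**2) + theta1 * rho_xy**2)
# Final expression for Mohanty's estimator T4(2) with known Zbar.
mse_T4 = Ybar**2 * (theta2 * (Cy**2 * (1 - rho_yz**2)
                              + (Cz - rho_yz * Cy)**2)
                    - (theta2 - theta1) * Cy * (rho_xy**2 * Cy
                                                - 2 * rho_xy * rho_xz * Cz))

print(mse_T1, mse_T2, mse_T3, mse_T4)
```

Note that $\mathrm{MSE}(T_{2(2)}) - \mathrm{MSE}(T_{3(2)}) = (\theta_2 - \theta_1)\bar{Y}^2\left(C_x - \rho_{xy} C_y\right)^2 \ge 0$, so with the same two-phase information the regression estimator never does worse than the ratio estimator to this order of approximation.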