16. Mean Square Estimation
Given some information that is related to an unknown quantity of interest, the problem is to obtain a good estimate for the unknown in terms of the observed data.

Suppose $X_1, X_2, \ldots, X_n$ represent a sequence of random variables about which one set of observations is available, and $Y$ represents an unknown random variable. The problem is to obtain a good estimate for $Y$ in terms of the observations $X_1, X_2, \ldots, X_n$. Let

$$\hat{Y} = \varphi(X_1, X_2, \ldots, X_n) = \varphi(\underline{X}) \tag{16-1}$$

represent such an estimate for $Y$. Note that $\varphi(\cdot)$ can be a linear or a nonlinear function of the observations $X_1, X_2, \ldots, X_n$. Clearly

$$\varepsilon(\underline{X}) = Y - \hat{Y} = Y - \varphi(\underline{X}) \tag{16-2}$$

represents the error in the above estimate, and $|\varepsilon|^2$ the square of
PILLAI
the error. Since $\varepsilon$ is a random variable, $E\{|\varepsilon|^2\}$ represents the mean square error. One strategy to obtain a good estimator would be to minimize the mean square error by varying over all possible forms of $\varphi(\cdot)$, and this procedure gives rise to the Minimization of the Mean Square Error (MMSE) criterion for estimation. Thus under the MMSE criterion, the estimator $\varphi(\cdot)$ is chosen such that the mean square error $E\{|\varepsilon|^2\}$ is at its minimum.

Next we show that the conditional mean of $Y$ given $\underline{X}$ is the best estimator in the above sense.

Theorem 1: Under the MMSE criterion, the best estimator for the unknown $Y$ in terms of $X_1, X_2, \ldots, X_n$ is given by the conditional mean of $Y$ given $\underline{X}$. Thus

$$\hat{Y} = \varphi(\underline{X}) = E\{Y \mid \underline{X}\}. \tag{16-3}$$

Proof: Let $\hat{Y} = \varphi(\underline{X})$ represent an estimate of $Y$ in terms of $\underline{X} = (X_1, X_2, \ldots, X_n)$. Then the error is $\varepsilon = Y - \hat{Y}$, and the mean square error is given by

$$\sigma_\varepsilon^2 = E\{|\varepsilon|^2\} = E\{|Y - \hat{Y}|^2\} = E\{|Y - \varphi(\underline{X})|^2\}. \tag{16-4}$$
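Since

$$E[z] = E_{\underline{X}}[\,E_z\{z \mid \underline{X}\}\,] \tag{16-5}$$

we can rewrite (16-4) as

$$\sigma_\varepsilon^2 = E\{|Y - \varphi(\underline{X})|^2\} = E_{\underline{X}}[\,E_Y\{|Y - \varphi(\underline{X})|^2 \mid \underline{X}\}\,],$$

where the inner expectation is with respect to $Y$, and the outer one is with respect to $\underline{X}$. Thus

$$\sigma_\varepsilon^2 = E[\,E\{|Y - \varphi(\underline{X})|^2 \mid \underline{X}\}\,] = \int_{-\infty}^{+\infty} E\{|Y - \varphi(\underline{X})|^2 \mid \underline{X}\}\, f_{\underline{X}}(\underline{X})\, d\underline{X}. \tag{16-6}$$

To obtain the best estimator $\varphi$, we need to minimize $\sigma_\varepsilon^2$ in (16-6) with respect to $\varphi$. In (16-6), since $f_{\underline{X}}(\underline{X}) \ge 0$, $E\{|Y - \varphi(\underline{X})|^2 \mid \underline{X}\} \ge 0$, and the variable $\varphi$ appears only in the integrand term, minimization of the mean square error $\sigma_\varepsilon^2$ in (16-6) with respect to $\varphi$ is equivalent to minimization of $E\{|Y - \varphi(\underline{X})|^2 \mid \underline{X}\}$ with respect to $\varphi$.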
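Since $\underline{X}$ is fixed at some value, $\varphi(\underline{X})$ is no longer random, and hence minimization of $E\{|Y - \varphi(\underline{X})|^2 \mid \underline{X}\}$ is equivalent to

$$\frac{\partial}{\partial \varphi} E\{|Y - \varphi(\underline{X})|^2 \mid \underline{X}\} = 0. \tag{16-7}$$

This gives

$$E\{(Y - \varphi(\underline{X})) \mid \underline{X}\} = 0 \tag{16-8}$$

or

$$E\{Y \mid \underline{X}\} - E\{\varphi(\underline{X}) \mid \underline{X}\} = 0.$$

But

$$E\{\varphi(\underline{X}) \mid \underline{X}\} = \varphi(\underline{X}), \tag{16-9}$$

since when $\underline{X} = \underline{x}$, $\varphi(\underline{X})$ is a fixed number $\varphi(\underline{x})$. Using (16-9)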
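in (16-8) we get the desired estimator to be

$$\hat{Y} = \varphi(\underline{X}) = E\{Y \mid \underline{X}\} = E\{Y \mid X_1, X_2, \ldots, X_n\}. \tag{16-10}$$

Thus the conditional mean of $Y$ given $X_1, X_2, \ldots, X_n$ represents the best estimator for $Y$ that minimizes the mean square error.

The minimum value of the mean square error is given by

$$\sigma_{\min}^2 = E\{|Y - E(Y \mid \underline{X})|^2\} = E[\,E\{|Y - E(Y \mid \underline{X})|^2 \mid \underline{X}\}\,] = E\{\operatorname{var}(Y \mid \underline{X})\} \ge 0. \tag{16-11}$$

As an example, suppose $Y = X^3$ is the unknown. Then the best MMSE estimator is given by

$$\hat{Y} = E\{Y \mid X\} = E\{X^3 \mid X\} = X^3. \tag{16-12}$$

Clearly if $Y = X^3$, then indeed $\hat{Y} = X^3$ is the best estimator for $Y$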
in terms of $X$. Thus the best estimator can be nonlinear.

Next, we will consider a less trivial example.
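Before that example, Theorem 1 can be sanity-checked on a small discrete case. The sketch below is only an illustration: the joint pmf `joint`, the competing estimator $2X$, and the search grid are all made-up assumptions, not part of the lecture. It computes $E\{Y \mid X\}$ exactly with fractions, compares its mean square error with that of the competitor, and then searches a coarse grid of candidate estimators.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y), chosen only for illustration.
joint = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 8), (1, 3): Fraction(3, 8),
}

def conditional_mean(joint):
    """Return {x: E[Y | X = x]} for a discrete joint pmf."""
    px, sxy = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p          # marginal P(X = x)
        sxy[x] = sxy.get(x, 0) + y * p    # sum of y * P(X = x, Y = y)
    return {x: sxy[x] / px[x] for x in px}

def mse(joint, estimator):
    """Mean square error E{|Y - estimator(X)|^2}."""
    return sum(p * (y - estimator(x)) ** 2 for (x, y), p in joint.items())

cond = conditional_mean(joint)
mse_cond = mse(joint, lambda x: cond[x])   # error of the conditional mean
mse_other = mse(joint, lambda x: 2 * x)    # error of an arbitrary competitor

# Coarse search over estimators phi with phi(0), phi(1) in {0, 1/2, ..., 4}.
grid = [Fraction(k, 2) for k in range(9)]
best = min((mse(joint, lambda x, t=(a, b): t[x]), a, b)
           for a in grid for b in grid)

print(cond, mse_cond, mse_other, best)
```

For this pmf the grid search lands on the conditional mean itself, and its error equals $E\{\operatorname{var}(Y \mid X)\}$ as in (16-11).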
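Example: Let

$$f_{X,Y}(x, y) = \begin{cases} kxy, & 0 < x < y < 1, \\ 0, & \text{otherwise}, \end{cases}$$

where $k > 0$ is a suitable normalization constant. To determine the best estimate for $Y$ in terms of $X$, we need $f_{Y|X}(y \mid x)$. Here

$$f_X(x) = \int_x^1 f_{X,Y}(x, y)\, dy = \int_x^1 kxy\, dy = \frac{kxy^2}{2}\bigg|_x^1 = \frac{kx(1 - x^2)}{2}, \quad 0 < x < 1.$$

Thus

$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{kxy}{kx(1 - x^2)/2} = \frac{2y}{1 - x^2}; \quad 0 < x < y < 1. \tag{16-13}$$

Hence the best MMSE estimator is given by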
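$$\begin{aligned}
\hat{Y} = \varphi(X) = E\{Y \mid X\} &= \int_x^1 y\, f_{Y|X}(y \mid x)\, dy = \int_x^1 y\, \frac{2y}{1 - x^2}\, dy = \frac{2}{1 - x^2} \int_x^1 y^2\, dy \\
&= \frac{2}{3}\, \frac{y^3}{1 - x^2}\bigg|_x^1 = \frac{2}{3}\, \frac{1 - x^3}{1 - x^2} = \frac{2}{3}\, \frac{(1 + x + x^2)}{1 + x}.
\end{aligned} \tag{16-14}$$

Once again the best estimator is nonlinear. In general the best estimator $E\{Y \mid \underline{X}\}$ is difficult to evaluate, and hence next we will examine the special subclass of best linear estimators.

Best Linear Estimator

In this case the estimator $\hat{Y}$ is a linear function of the observations $X_1, X_2, \ldots, X_n$. Thus

$$\hat{Y}_l = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n = \sum_{i=1}^{n} a_i X_i, \tag{16-15}$$

where $a_1, a_2, \ldots, a_n$ are unknown quantities to be determined. The mean square error is given by ($\varepsilon = Y - \hat{Y}_l$)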
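$$E\{|\varepsilon|^2\} = E\{|Y - \hat{Y}_l|^2\} = E\Big\{\Big|Y - \sum_{i=1}^{n} a_i X_i\Big|^2\Big\} \tag{16-16}$$

and under the MMSE criterion $a_1, a_2, \ldots, a_n$ should be chosen so that the mean square error $E\{|\varepsilon|^2\}$ is at its minimum possible value. Let $\sigma_n^2$ represent that minimum possible value. Then

$$\sigma_n^2 = \min_{a_1, a_2, \ldots, a_n} E\Big\{\Big|Y - \sum_{i=1}^{n} a_i X_i\Big|^2\Big\}. \tag{16-17}$$

To minimize (16-16), we can equate

$$\frac{\partial}{\partial a_k} E\{|\varepsilon|^2\} = 0, \quad k = 1, 2, \ldots, n. \tag{16-18}$$

This gives

$$\frac{\partial}{\partial a_k} E\{|\varepsilon|^2\} = E\Big\{\frac{\partial |\varepsilon|^2}{\partial a_k}\Big\} = 2 E\Big[\varepsilon \Big(\frac{\partial \varepsilon}{\partial a_k}\Big)^{*}\Big] = 0. \tag{16-19}$$

But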
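$$\frac{\partial \varepsilon}{\partial a_k} = \frac{\partial \big(Y - \sum_{i=1}^{n} a_i X_i\big)}{\partial a_k} = \frac{\partial Y}{\partial a_k} - \frac{\partial \big(\sum_{i=1}^{n} a_i X_i\big)}{\partial a_k} = -X_k. \tag{16-20}$$

Substituting (16-20) into (16-19), we get

$$\frac{\partial E\{|\varepsilon|^2\}}{\partial a_k} = -2 E\{\varepsilon X_k^*\} = 0,$$

or the best linear estimator must satisfy

$$E\{\varepsilon X_k^*\} = 0, \quad k = 1, 2, \ldots, n. \tag{16-21}$$

Notice that in (16-21), $\varepsilon$ represents the estimation error $\big(Y - \sum_{i=1}^{n} a_i X_i\big)$, and $X_k$, $k = 1 \to n$, represents the data. Thus from (16-21), the error $\varepsilon$ is orthogonal to the data $X_k$, $k = 1 \to n$, for the best linear estimator. This is the orthogonality principle.

In other words, in the linear estimator (16-15), the unknown constants $a_1, a_2, \ldots, a_n$ must be selected such that the error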
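$\varepsilon = Y - \sum_{i=1}^{n} a_i X_i$ is orthogonal to every data $X_1, X_2, \ldots, X_n$ for the best linear estimator that minimizes the mean square error.

Interestingly a general form of the orthogonality principle holds good in the case of nonlinear estimators also.

Nonlinear Orthogonality Rule: Let $h(\underline{X})$ represent any functional form of the data and $E\{Y \mid \underline{X}\}$ the best estimator for $Y$ given $\underline{X}$. With $e = Y - E\{Y \mid \underline{X}\}$ we shall show that

$$E\{e\, h(\underline{X})\} = 0, \tag{16-22}$$

implying that

$$e = Y - E\{Y \mid \underline{X}\} \perp h(\underline{X}).$$

This follows since

$$\begin{aligned}
E\{e\, h(\underline{X})\} &= E\{(Y - E[Y \mid \underline{X}])\, h(\underline{X})\} \\
&= E\{Y h(\underline{X})\} - E\{E[Y \mid \underline{X}]\, h(\underline{X})\} \\
&= E\{Y h(\underline{X})\} - E\{E[Y h(\underline{X}) \mid \underline{X}]\} \\
&= E\{Y h(\underline{X})\} - E\{Y h(\underline{X})\} = 0.
\end{aligned}$$

Thus in the nonlinear version of the orthogonality rule the error is orthogonal to any functional form of the data.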
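Both forms of the orthogonality rule can be checked numerically on the density $f_{X,Y}(x, y) = kxy$, $0 < x < y < 1$, of the earlier example. The sketch below is an illustration under stated assumptions: it uses $k = 8$ (the value that makes the density integrate to one), a plain midpoint rule on a 200 by 200 grid, the test function $h(X) = X^3$, and a single-coefficient linear estimator $aX$; the helper names are hypothetical.

```python
def integrate_over_triangle(g, nx=200, ny=200):
    """Midpoint-rule approximation of the double integral of g(x, y)
    over the region 0 < x < y < 1."""
    total = 0.0
    hx = 1.0 / nx
    for i in range(nx):
        x = (i + 0.5) * hx
        hy = (1.0 - x) / ny              # y runs from x to 1
        for j in range(ny):
            y = x + (j + 0.5) * hy
            total += g(x, y) * hx * hy
    return total

K = 8.0  # normalization: the integral of x*y over the triangle is 1/8

def expect(g):
    """E{g(X, Y)} under the density K*x*y on 0 < x < y < 1."""
    return integrate_over_triangle(lambda x, y: g(x, y) * K * x * y)

def cond_mean(x):
    """E{Y | X = x} from (16-14)."""
    return (2.0 / 3.0) * (1.0 + x + x * x) / (1.0 + x)

# Nonlinear rule (16-22): the error is orthogonal to a function h(X) = X^3.
orth_nonlinear = expect(lambda x, y: (y - cond_mean(x)) * x ** 3)

# Linear rule (16-21) with n = 1: pick a so that E{(Y - aX) X} = 0.
a = expect(lambda x, y: x * y) / expect(lambda x, y: x * x)
orth_linear = expect(lambda x, y: (y - a * x) * x)

# Mean square errors of the two estimators.
mse_cond = expect(lambda x, y: (y - cond_mean(x)) ** 2)
mse_lin = expect(lambda x, y: (y - a * x) ** 2)

print(orth_nonlinear, orth_linear, a, mse_cond, mse_lin)
```

Both inner products come out numerically zero, the coefficient is close to the closed-form value $a = E\{XY\}/E\{X^2\} = 4/3$, and the conditional mean has the smaller mean square error.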
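The orthogonality principle in (16-21) can be used to obtain the unknowns $a_1, a_2, \ldots, a_n$ in the linear case.

For example suppose $n = 2$, and we need to estimate $Y$ in terms of $X_1$ and $X_2$ linearly. Thus

$$\hat{Y}_l = a_1 X_1 + a_2 X_2.$$

From (16-21), the orthogonality rule gives

$$\begin{aligned}
E\{\varepsilon X_1^*\} &= E\{(Y - a_1 X_1 - a_2 X_2) X_1^*\} = 0, \\
E\{\varepsilon X_2^*\} &= E\{(Y - a_1 X_1 - a_2 X_2) X_2^*\} = 0.
\end{aligned}$$

Thus

$$\begin{aligned}
E\{|X_1|^2\}\, a_1 + E\{X_2 X_1^*\}\, a_2 &= E\{Y X_1^*\}, \\
E\{X_1 X_2^*\}\, a_1 + E\{|X_2|^2\}\, a_2 &= E\{Y X_2^*\},
\end{aligned}$$

or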
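$$\begin{pmatrix} E\{|X_1|^2\} & E\{X_2 X_1^*\} \\ E\{X_1 X_2^*\} & E\{|X_2|^2\} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} E\{Y X_1^*\} \\ E\{Y X_2^*\} \end{pmatrix}. \tag{16-23}$$

(16-23) can be solved to obtain $a_1$ and $a_2$ in terms of the cross-correlations.

The minimum value of the mean square error $\sigma_n^2$ in (16-17) is given by

$$\begin{aligned}
\sigma_n^2 &= \min_{a_1, a_2, \ldots, a_n} E\{|\varepsilon|^2\} = \min_{a_1, a_2, \ldots, a_n} E\{\varepsilon \varepsilon^*\} = \min_{a_1, a_2, \ldots, a_n} E\Big\{\varepsilon \Big(Y - \sum_{i=1}^{n} a_i X_i\Big)^{*}\Big\} \\
&= \min_{a_1, a_2, \ldots, a_n} E\{\varepsilon Y^*\} - \min_{a_1, a_2, \ldots, a_n} \sum_{i=1}^{n} a_i^* E\{\varepsilon X_i^*\}.
\end{aligned} \tag{16-24}$$

But using (16-21), the second term in (16-24) is zero, since the error is orthogonal to the data $X_i$, where $a_1, a_2, \ldots, a_n$ are chosen to be optimum. Thus the minimum value of the mean square error is given by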
$$\sigma_n^2 = E\{\varepsilon Y^*\} = E\Big\{\Big(Y - \sum_{i=1}^{n} a_i X_i\Big) Y^*\Big\} = E\{|Y|^2\} - \sum_{i=1}^{n} a_i E\{X_i Y^*\}, \tag{16-25}$$

where $a_1, a_2, \ldots, a_n$ are the optimum values from (16-21). Since the linear estimate in (16-15) is only a special case of the general estimator $\varphi(\underline{X})$ in (16-1), the best linear estimator that satisfies (16-21) cannot be superior to the best nonlinear estimator $E\{Y \mid \underline{X}\}$. Often the best linear estimator will be inferior to the best estimator in (16-3).

This raises the following question. Are there situations in which the best estimator in (16-3) also turns out to be linear? In those situations it is enough to use (16-21) and obtain the best linear estimators, since they also represent the best global estimators. Such is the case if $Y$ and $X_1, X_2, \ldots, X_n$ are distributed as jointly Gaussian. We summarize this in the next theorem and prove that result.

Theorem 2: If $X_1, X_2, \ldots, X_n$ and $Y$ are jointly Gaussian zero
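mean random variables, then the best estimate for $Y$ in terms of $X_1, X_2, \ldots, X_n$ is always linear.

Proof: Let

$$\hat{Y} = \varphi(X_1, X_2, \ldots, X_n) = E\{Y \mid \underline{X}\} \tag{16-26}$$

represent the best (possibly nonlinear) estimate of $Y$, and

$$\hat{Y}_l = \sum_{i=1}^{n} a_i X_i \tag{16-27}$$

the best linear estimate of $Y$. Then from (16-21)

$$\varepsilon \triangleq Y - \hat{Y}_l = Y - \sum_{i=1}^{n} a_i X_i \tag{16-28}$$

is orthogonal to the data $X_k$, $k = 1 \to n$. Thus

$$E\{\varepsilon X_k^*\} = 0, \quad k = 1 \to n. \tag{16-29}$$

Also from (16-28),

$$E\{\varepsilon\} = E\{Y\} - \sum_{i=1}^{n} a_i E\{X_i\} = 0. \tag{16-30}$$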
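Using (16-29)-(16-30), we get

$$E\{\varepsilon X_k^*\} = E\{\varepsilon\} E\{X_k^*\} = 0, \quad k = 1 \to n. \tag{16-31}$$

From (16-31), we obtain that $\varepsilon$ and $X_k$ are zero mean uncorrelated random variables for $k = 1 \to n$. But $\varepsilon$ itself represents a Gaussian random variable, since from (16-28) it represents a linear combination of a set of jointly Gaussian random variables. Thus $\varepsilon$ and $\underline{X}$ are jointly Gaussian and uncorrelated random variables. As a result, $\varepsilon$ and $\underline{X}$ are independent random variables. Thus from their independence

$$E\{\varepsilon \mid \underline{X}\} = E\{\varepsilon\}. \tag{16-32}$$

But from (16-30), $E\{\varepsilon\} = 0$, and hence from (16-32)

$$E\{\varepsilon \mid \underline{X}\} = 0. \tag{16-33}$$

Substituting (16-28) into (16-33), we get

$$E\{\varepsilon \mid \underline{X}\} = E\Big\{Y - \sum_{i=1}^{n} a_i X_i \,\Big|\, \underline{X}\Big\} = 0$$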
or

$$E\{Y \mid \underline{X}\} = E\Big\{\sum_{i=1}^{n} a_i X_i \,\Big|\, \underline{X}\Big\} = \sum_{i=1}^{n} a_i X_i = \hat{Y}_l. \tag{16-34}$$

From (16-26), $E\{Y \mid \underline{X}\} = \varphi(\underline{x})$ represents the best possible estimator, and from (16-28), $\sum_{i=1}^{n} a_i X_i$ represents the best linear estimator. Thus the best linear estimator is also the best possible overall estimator in the Gaussian case.

Next we turn our attention to prediction problems using linear estimators.

Linear Prediction

Suppose $X_1, X_2, \ldots, X_n$ are known and $X_{n+1}$ is unknown. Thus $Y = X_{n+1}$, and this represents a one-step prediction problem. If the unknown is $X_{n+k}$, then it represents a k-step ahead prediction problem. Returning to the one-step predictor, let $\hat{X}_{n+1}$ represent the best linear predictor. Then
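$$\hat{X}_{n+1} \triangleq -\sum_{i=1}^{n} a_i X_i, \tag{16-35}$$

where the error

$$\begin{aligned}
\varepsilon_n = X_{n+1} - \hat{X}_{n+1} &= X_{n+1} + \sum_{i=1}^{n} a_i X_i \\
&= a_1 X_1 + a_2 X_2 + \cdots + a_n X_n + X_{n+1} = \sum_{i=1}^{n+1} a_i X_i, \quad a_{n+1} = 1,
\end{aligned} \tag{16-36}$$

is orthogonal to the data, i.e.,

$$E\{\varepsilon_n X_k^*\} = 0, \quad k = 1 \to n. \tag{16-37}$$

Using (16-36) in (16-37), we get

$$E\{\varepsilon_n X_k^*\} = \sum_{i=1}^{n+1} a_i E\{X_i X_k^*\} = 0, \quad k = 1 \to n. \tag{16-38}$$

Suppose $X_i$ represents the sample of a wide sense stationary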
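stochastic process $X(t)$ so that

$$E\{X_i X_k^*\} = R(i - k) = r_{i-k} = r_{k-i}^*. \tag{16-39}$$

Thus (16-38) becomes

$$E\{\varepsilon_n X_k^*\} = \sum_{i=1}^{n+1} a_i r_{i-k} = 0, \quad a_{n+1} = 1, \quad k = 1 \to n. \tag{16-40}$$

Expanding (16-40) for $k = 1, 2, \ldots, n$, we get the following set of linear equations.

$$\begin{aligned}
a_1 r_0 + a_2 r_1 + a_3 r_2 + \cdots + a_n r_{n-1} + r_n &= 0 &&\leftarrow k = 1 \\
a_1 r_1^* + a_2 r_0 + a_3 r_1 + \cdots + a_n r_{n-2} + r_{n-1} &= 0 &&\leftarrow k = 2 \\
&\;\;\vdots \\
a_1 r_{n-1}^* + a_2 r_{n-2}^* + a_3 r_{n-3}^* + \cdots + a_n r_0 + r_1 &= 0 &&\leftarrow k = n.
\end{aligned} \tag{16-41}$$

Similarly using (16-25), the minimum mean square error is given by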
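$$\begin{aligned}
\sigma_n^2 = E\{|\varepsilon_n|^2\} = E\{\varepsilon_n Y^*\} &= E\{\varepsilon_n X_{n+1}^*\} = E\Big\{\Big(\sum_{i=1}^{n+1} a_i X_i\Big) X_{n+1}^*\Big\} = \sum_{i=1}^{n+1} a_i r_{n+1-i}^* \\
&= a_1 r_n^* + a_2 r_{n-1}^* + a_3 r_{n-2}^* + \cdots + a_n r_1^* + r_0.
\end{aligned} \tag{16-42}$$

The $n$ equations in (16-41) together with (16-42) can be represented as

$$\begin{pmatrix}
r_0 & r_1 & r_2 & \cdots & r_n \\
r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\
r_2^* & r_1^* & r_0 & \cdots & r_{n-2} \\
\vdots & & & & \vdots \\
r_{n-1}^* & r_{n-2}^* & \cdots & r_0 & r_1 \\
r_n^* & r_{n-1}^* & \cdots & r_1^* & r_0
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \\ 1 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix}. \tag{16-43}$$

Let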
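$$T_n = \begin{pmatrix}
r_0 & r_1 & r_2 & \cdots & r_n \\
r_1^* & r_0 & r_1 & \cdots & r_{n-1} \\
\vdots & & & & \vdots \\
r_n^* & r_{n-1}^* & \cdots & r_1^* & r_0
\end{pmatrix}. \tag{16-44}$$

Notice that $T_n$ is Hermitian Toeplitz and positive definite. Using (16-44), the unknowns in (16-43) can be represented as

$$\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \\ 1 \end{pmatrix} = T_n^{-1} \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix} = \sigma_n^2 \left( \text{Last column of } T_n^{-1} \right). \tag{16-45}$$

Let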
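$$T_n^{-1} = \begin{pmatrix}
T_n^{11} & T_n^{12} & \cdots & T_n^{1,n+1} \\
T_n^{21} & T_n^{22} & \cdots & T_n^{2,n+1} \\
\vdots & & & \vdots \\
T_n^{n+1,1} & T_n^{n+1,2} & \cdots & T_n^{n+1,n+1}
\end{pmatrix}. \tag{16-46}$$

Then from (16-45),

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix} = \sigma_n^2 \begin{pmatrix} T_n^{1,n+1} \\ T_n^{2,n+1} \\ \vdots \\ T_n^{n,n+1} \\ T_n^{n+1,n+1} \end{pmatrix}. \tag{16-47}$$

Thus

$$\sigma_n^2 = \frac{1}{T_n^{n+1,n+1}} > 0, \tag{16-48}$$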
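and

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix} = \frac{1}{T_n^{n+1,n+1}} \begin{pmatrix} T_n^{1,n+1} \\ T_n^{2,n+1} \\ \vdots \\ T_n^{n,n+1} \\ T_n^{n+1,n+1} \end{pmatrix}. \tag{16-49}$$

Eq. (16-49) represents the best linear predictor coefficients, and they can be evaluated from the last column of $T_n^{-1}$ in (16-45). Using these, the best one-step ahead predictor in (16-35) takes the form

$$\hat{X}_{n+1} = -\frac{1}{T_n^{n+1,n+1}} \sum_{i=1}^{n} \big(T_n^{i,n+1}\big) X_i, \tag{16-50}$$

and from (16-48), the minimum mean square error is given by the reciprocal of the $(n+1, n+1)$ entry of $T_n^{-1}$.

From (16-36), since the one-step linear prediction error

$$\varepsilon_n = X_{n+1} + a_n X_n + a_{n-1} X_{n-1} + \cdots + a_1 X_1, \tag{16-51}$$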
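we can represent (16-51) formally as follows

$$X_{n+1} \;\longrightarrow\; \boxed{\,1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}\,} \;\longrightarrow\; \varepsilon_n$$

Thus, let

$$A_n(z) = 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}, \tag{16-52}$$

then from the above figure, we also have the representation

$$\varepsilon_n \;\longrightarrow\; \boxed{\,\frac{1}{A_n(z)}\,} \;\longrightarrow\; X_{n+1}.$$

The filter

$$H(z) = \frac{1}{A_n(z)} = \frac{1}{1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}} \tag{16-53}$$

represents an AR(n) filter, and this shows that linear prediction leads to an auto regressive (AR) model.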
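The predictor recipe in (16-44)-(16-49) can be tried on a small case. The following is a minimal sketch, assuming a real-valued autocorrelation sequence $r_k = \rho^{|k|}$ with $\rho = 1/2$ and $n = 2$ (both chosen only for illustration). The helper names `solve` and `one_step_predictor` are hypothetical, and exact fractions are used so the result can be compared by hand.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [v - factor * w for v, w in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

def one_step_predictor(r):
    """Given real autocorrelations r[0..n], return (a, sigma2): the
    coefficients a_1..a_n of (16-35) and the minimum mean square error."""
    n = len(r) - 1
    # T_n of (16-44); for real r the (i, k) entry is r_|i-k|.
    T = [[r[abs(i - k)] for k in range(n + 1)] for i in range(n + 1)]
    unit = [Fraction(0)] * n + [Fraction(1)]
    last_col = solve(T, unit)                  # last column of T_n^{-1}
    sigma2 = 1 / last_col[n]                   # (16-48)
    a = [c * sigma2 for c in last_col[:n]]     # (16-49)
    return a, sigma2

rho = Fraction(1, 2)
r = [rho ** k for k in range(3)]               # assumed r_k = rho^|k|, n = 2
a, sigma2 = one_step_predictor(r)
print(a, sigma2)
```

For this correlation sequence the sketch returns $a_1 = 0$, $a_2 = -1/2$ and $\sigma_2^2 = 3/4$, so the predictor (16-35) reduces to $\hat{X}_3 = X_2/2$ and the filter (16-52) is $A_2(z) = 1 - z^{-1}/2$.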
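The polynomial $A_n(z)$ in (16-52)-(16-53) can be simplified using (16-43)-(16-44). To see this, we rewrite $A_n(z)$ as

$$\begin{aligned}
A_n(z) &= a_1 z^{-n} + a_2 z^{-(n-1)} + \cdots + a_{n-1} z^{-2} + a_n z^{-1} + 1 \\
&= [z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1] \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ 1 \end{pmatrix}
= [z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1]\, T_n^{-1} \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix}.
\end{aligned} \tag{16-54}$$

To simplify (16-54), we can make use of the following matrix identity

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} I & -A^{-1}B \\ 0 & I \end{pmatrix} = \begin{pmatrix} A & 0 \\ C & D - CA^{-1}B \end{pmatrix}. \tag{16-55}$$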
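Taking determinants, we get

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A|\, |D - CA^{-1}B|. \tag{16-56}$$

In particular if $D \equiv 0$ is a scalar (so that $C$ is a row vector and $B$ a column vector), we get

$$CA^{-1}B = -\frac{1}{|A|} \begin{vmatrix} A & B \\ C & 0 \end{vmatrix}. \tag{16-57}$$

Using (16-57) in (16-54), with

$$C = [z^{-n}, z^{-(n-1)}, \ldots, z^{-1}, 1], \quad A = T_n, \quad B = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ \sigma_n^2 \end{pmatrix},$$

we get

$$A_n(z) = -\frac{1}{|T_n|} \begin{vmatrix} T_n & B \\ C & 0 \end{vmatrix}
= \frac{\sigma_n^2}{|T_n|} \begin{vmatrix}
r_0 & r_1 & \cdots & r_n \\
r_1^* & r_0 & \cdots & r_{n-1} \\
\vdots & & & \vdots \\
r_{n-1}^* & r_{n-2}^* & \cdots & r_1 \\
z^{-n} & z^{-(n-1)} & \cdots & 1
\end{vmatrix}, \tag{16-58}$$

where the second form follows by expanding the bordered determinant along its last column.

Referring back to (16-43), using Cramer's rule to solve for $a_{n+1}\,(= 1)$, we get

$$a_{n+1} = \frac{\begin{vmatrix}
r_0 & \cdots & r_{n-1} & 0 \\
\vdots & & \vdots & \vdots \\
r_{n-1}^* & \cdots & r_0 & 0 \\
r_n^* & \cdots & r_1^* & \sigma_n^2
\end{vmatrix}}{|T_n|} = \sigma_n^2\, \frac{|T_{n-1}|}{|T_n|} = 1$$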
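or

$$\sigma_n^2 = \frac{|T_n|}{|T_{n-1}|} > 0. \tag{16-59}$$

Thus the polynomial (16-58) reduces to

$$A_n(z) = \frac{1}{|T_{n-1}|} \begin{vmatrix}
r_0 & r_1 & \cdots & r_n \\
r_1^* & r_0 & \cdots & r_{n-1} \\
\vdots & & & \vdots \\
r_{n-1}^* & r_{n-2}^* & \cdots & r_1 \\
z^{-n} & z^{-(n-1)} & \cdots & 1
\end{vmatrix} = 1 + a_n z^{-1} + a_{n-1} z^{-2} + \cdots + a_1 z^{-n}. \tag{16-60}$$

The polynomial $A_n(z)$ in (16-53) can be alternatively represented as in (16-60), and $H(z) = \dfrac{1}{A_n(z)} \sim AR(n)$ in fact represents a stable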
AR filter of order $n$, whose input error signal $\varepsilon_n$ is white noise of constant spectral height equal to $|T_n| / |T_{n-1}|$ and output is $X_{n+1}$. It can be shown that $A_n(z)$ has all its zeros in $|z| < 1$ provided $|T_n| > 0$, thus establishing stability.

Linear Prediction Error

From (16-59), the mean square error using $n$ samples is given by

$$\sigma_n^2 = \frac{|T_n|}{|T_{n-1}|} > 0. \tag{16-61}$$

Suppose one more sample from the past is available to evaluate $X_{n+1}$ (i.e., $X_n, X_{n-1}, \ldots, X_1, X_0$ are available). Proceeding as above, the new coefficients and the mean square error $\sigma_{n+1}^2$ can be determined. From (16-59)-(16-61),

$$\sigma_{n+1}^2 = \frac{|T_{n+1}|}{|T_n|}. \tag{16-62}$$
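The determinant ratio in (16-61) is easy to evaluate for a concrete correlation sequence. The sketch below is an illustration only: it assumes the real autocorrelation $r_0 = 2$, $r_1 = 1$, $r_k = 0$ for $k \ge 2$ (that of $X_n = W_n + W_{n-1}$ with unit-variance white $W_n$), and the helper names `det` and `toeplitz_det` are hypothetical.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination, exact with Fractions."""
    m = [list(row) for row in m]
    n, d = len(m), Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            d = -d                       # a row swap flips the sign
        d *= m[col][col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            m[i] = [v - factor * w for v, w in zip(m[i], m[col])]
    return d

def toeplitz_det(r, n):
    """|T_n| for real autocorrelations r; T_n is (n+1) x (n+1) as in (16-44)."""
    return det([[Fraction(r[abs(i - k)]) for k in range(n + 1)]
                for i in range(n + 1)])

# Assumed example: r_0 = 2, r_1 = 1, r_k = 0 for k >= 2.
r = [2, 1, 0, 0, 0, 0]

# sigma_n^2 = |T_n| / |T_{n-1}| for n = 1, ..., 5, as in (16-61).
errors = [toeplitz_det(r, n) / toeplitz_det(r, n - 1) for n in range(1, 6)]
print(errors)
```

Here $|T_n| = n + 2$, so the errors are $3/2, 4/3, 5/4, 6/5, 7/6$: each extra past sample lowers the prediction error for this particular sequence.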
Using another matrix identity it is easy to show that

$$|T_{n+1}| = \frac{|T_n|^2}{|T_{n-1}|} \big(1 - |s_{n+1}|^2\big). \tag{16-63}$$

Since $|T_k| > 0$, we must have $(1 - |s_{n+1}|^2) > 0$ or $|s_{n+1}| < 1$ for every $n$. From (16-63), we have

$$\underbrace{\frac{|T_{n+1}|}{|T_n|}}_{\sigma_{n+1}^2} = \underbrace{\frac{|T_n|}{|T_{n-1}|}}_{\sigma_n^2} \big(1 - |s_{n+1}|^2\big)$$

or

$$\sigma_{n+1}^2 = \sigma_n^2 \big(1 - |s_{n+1}|^2\big) \le \sigma_n^2, \tag{16-64}$$

since $(1 - |s_{n+1}|^2) \le 1$. Thus the mean square error does not increase as more and more samples are used from the past in the linear predictor. In general from (16-64), the mean square errors for the one-step predictor form a monotonic nonincreasing sequence
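$$\sigma_n^2 \ge \sigma_{n+1}^2 \ge \cdots \ge \sigma_k^2 \ge \cdots \to \sigma_\infty^2 \tag{16-65}$$

whose limiting value $\sigma_\infty^2 \ge 0$. Clearly, $\sigma_\infty^2 \ge 0$ corresponds to the irreducible error in linear prediction using the entire past samples, and it is related to the power spectrum of the underlying process $X(nT)$ through the relation

$$\sigma_\infty^2 = \exp\left[\frac{1}{2\pi} \int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega\right] \ge 0, \tag{16-66}$$

where $S_{XX}(\omega) \ge 0$ represents the power spectrum of $X(nT)$. For any finite power process, we have

$$\int_{-\pi}^{+\pi} S_{XX}(\omega)\, d\omega < \infty,$$

and since $S_{XX}(\omega) \ge 0$, $\ln S_{XX}(\omega) \le S_{XX}(\omega)$. Thus

$$\int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega \le \int_{-\pi}^{+\pi} S_{XX}(\omega)\, d\omega < \infty. \tag{16-67}$$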
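The limit in (16-66) can be evaluated numerically for a given spectrum. The sketch below is a rough illustration, assuming the strictly positive spectrum $S_{XX}(\omega) = (1 - \rho^2)/(1 - 2\rho\cos\omega + \rho^2)$ with $\rho = 1/2$, whose autocorrelations are $r_k = \rho^{|k|}$; the function name `irreducible_error` and the midpoint rule with 4096 points are arbitrary choices.

```python
import math

def irreducible_error(spectrum, n=4096):
    """Midpoint-rule evaluation of exp[(1/(2*pi)) * integral of ln S(w)
    over (-pi, pi)], i.e. the right side of (16-66)."""
    h = 2.0 * math.pi / n
    total = sum(math.log(spectrum(-math.pi + (k + 0.5) * h)) for k in range(n))
    return math.exp(total * h / (2.0 * math.pi))

rho = 0.5  # assumed correlation parameter

def ar1_spectrum(w):
    # Power spectrum with autocorrelations r_k = rho^|k| (an assumed example).
    return (1.0 - rho ** 2) / (1.0 - 2.0 * rho * math.cos(w) + rho ** 2)

sigma_inf2 = irreducible_error(ar1_spectrum)
print(sigma_inf2)
```

For this spectrum the integral of $\ln(1 - 2\rho\cos\omega + \rho^2)$ over a full period vanishes, so the result should agree with $1 - \rho^2 = 0.75$, a strictly positive irreducible error.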
Moreover, if the power spectrum is strictly positive at every frequency, i.e.,

$$S_{XX}(\omega) > 0, \quad \text{in } -\pi < \omega < \pi, \tag{16-68}$$

then from (16-66)

$$\int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega > -\infty, \tag{16-69}$$

and hence

$$\sigma_\infty^2 = \exp\left[\frac{1}{2\pi} \int_{-\pi}^{+\pi} \ln S_{XX}(\omega)\, d\omega\right] > e^{-\infty} = 0. \tag{16-70}$$

That is, for processes that satisfy the strict positivity condition in (16-68) almost everywhere in the interval $(-\pi, \pi)$, the final minimum mean square error is strictly positive (see (16-70)). Such processes are not completely predictable even using their entire set of past samples, or they are inherently stochastic,
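since the next output contains information that is not contained in the past samples. Such processes are known as regular stochastic processes, and their power spectrum $S_{XX}(\omega)$ is strictly positive.

[Figure: Power spectrum $S_{XX}(\omega)$ of a regular stochastic process]

Conversely, if a process has the following power spectrum,

[Figure: power spectrum $S_{XX}(\omega)$ versus $\omega$, with $\omega_1$ and $\omega_2$ marked on the frequency axis]

such that $S_{XX}(\omega) = 0$ in $\omega_1 < \omega < \omega_2$, then from (16-70), $\sigma_\infty^2 = 0$.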
Such processes are completely predictable from their past data samples. In particular, the process $X(nT)$ in (16-71) is completely predictable from its past samples, since $S_{XX}(\omega)$ consists of line spectrum.

$X(nT)$ in (16-71) is a shape deterministic stochastic process.
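Complete predictability of a line-spectrum signal can be illustrated with a single sinusoid. The sketch below is an assumption-laden toy, not the process of (16-71) itself: it takes $x(n) = A\cos(\omega n + \phi)$ with made-up values of $A$, $\omega$, $\phi$, and uses the trigonometric identity $\cos(\theta + \omega) = 2\cos\omega\cos\theta - \cos(\theta - \omega)$ as a two-tap linear predictor.

```python
import math

# Hypothetical sinusoid parameters, chosen only for illustration.
omega, phase, amp = 0.7, 0.3, 2.0

def x(n):
    """Sample n of a single sinusoid (one spectral line)."""
    return amp * math.cos(omega * n + phase)

def predict_next(x_n, x_prev):
    """Two-tap linear predictor from the identity
    cos((n+1)w + p) = 2 cos(w) cos(nw + p) - cos((n-1)w + p)."""
    return 2.0 * math.cos(omega) * x_n - x_prev

# Largest one-step prediction error over a run of samples.
max_err = max(abs(x(n + 1) - predict_next(x(n), x(n - 1)))
              for n in range(1, 200))
print(max_err)
```

The prediction error is zero up to floating-point rounding, whatever amplitude and phase are chosen, which is the behaviour the zero irreducible error describes.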