Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Gaussian Linear Models

Prof. Nicholas Zabaras
Materials Process Design and Control Laboratory
Sibley School of Mechanical and Aerospace Engineering
101 Frank H. T. Rhodes Hall
Cornell University
Ithaca, NY 14853-3801
Email: [email protected]
URL: http://mpdc.mae.cornell.edu/

January 24, 2014
Information/Canonical Parametrization
A useful application of this parametrization is the multiplication of Gaussians. You can show the following:
$$\mathcal{N}_c(\xi_f, \lambda_f)\,\mathcal{N}_c(\xi_g, \lambda_g) \propto \mathcal{N}_c(\xi_f + \xi_g,\ \lambda_f + \lambda_g)$$

Compare with the corresponding, more complicated expression in terms of moments:

$$\mathcal{N}(\mu_f, \sigma_f^2)\,\mathcal{N}(\mu_g, \sigma_g^2) \propto \mathcal{N}\!\left(\frac{\mu_f \sigma_g^2 + \mu_g \sigma_f^2}{\sigma_f^2 + \sigma_g^2},\ \frac{\sigma_f^2 \sigma_g^2}{\sigma_f^2 + \sigma_g^2}\right)$$
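As a quick numerical check of this identity, here is a minimal Python/NumPy sketch; the helper names to_canonical/to_moment are ours, not from any library:

```python
import numpy as np

def to_canonical(mu, var):
    """Moment -> canonical (information) parameters: lambda = 1/var, xi = mu/var."""
    lam = 1.0 / var
    return mu * lam, lam

def to_moment(xi, lam):
    """Canonical -> moment parameters."""
    var = 1.0 / lam
    return xi * var, var

# Two Gaussian factors N(mu_f, var_f) and N(mu_g, var_g)
mu_f, var_f = 1.0, 2.0
mu_g, var_g = -0.5, 0.5

# Product in canonical form: the parameters simply add.
xi_f, lam_f = to_canonical(mu_f, var_f)
xi_g, lam_g = to_canonical(mu_g, var_g)
mu_p, var_p = to_moment(xi_f + xi_g, lam_f + lam_g)

# Same product via the moment-parametrization formula above.
mu_m = (mu_f * var_g + mu_g * var_f) / (var_f + var_g)
var_m = var_f * var_g / (var_f + var_g)

print(mu_p, var_p)   # -0.2, 0.4
print(mu_m, var_m)   # identical values
```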
Bayes’ Theorem and Gaussian Linear Models
Consider a linear Gaussian model: a Gaussian marginal distribution p(x) and a Gaussian conditional distribution p(y|x), in which p(y|x) has a mean that is a linear function of x and a covariance that is independent of x.

We want to use Bayes' rule to find p(y) and p(x|y).

We start with the joint distribution over z = (x, y), whose logarithm is quadratic in the components of z, so p(z) is Gaussian:
$$p(\mathbf{x}) = \mathcal{N}(\mathbf{x}\,|\,\boldsymbol{\mu},\ \boldsymbol{\Lambda}^{-1})$$
$$p(\mathbf{y}\,|\,\mathbf{x}) = \mathcal{N}(\mathbf{y}\,|\,\mathbf{A}\mathbf{x} + \mathbf{b},\ \mathbf{L}^{-1})$$
Bayes’ Theorem and Gaussian Linear Models
$$p(\mathbf{x}) = \mathcal{N}(\mathbf{x}\,|\,\boldsymbol{\mu},\ \boldsymbol{\Lambda}^{-1})$$
$$p(\mathbf{y}\,|\,\mathbf{x}) = \mathcal{N}(\mathbf{y}\,|\,\mathbf{A}\mathbf{x} + \mathbf{b},\ \mathbf{L}^{-1})$$
$$\ln p(\mathbf{z}) = \ln p(\mathbf{x}) + \ln p(\mathbf{y}\,|\,\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Lambda}\, (\mathbf{x}-\boldsymbol{\mu}) - \frac{1}{2}(\mathbf{y}-\mathbf{A}\mathbf{x}-\mathbf{b})^T \mathbf{L}\, (\mathbf{y}-\mathbf{A}\mathbf{x}-\mathbf{b}) + \text{const}$$
Covariance of the Joint Distribution
We can immediately write down the precision matrix of z by reading off the second-order terms of ln p(z); inverting it blockwise gives the covariance of z:

$$\mathbf{R} = \begin{pmatrix} \boldsymbol{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A} & -\mathbf{A}^T\mathbf{L} \\ -\mathbf{L}\mathbf{A} & \mathbf{L} \end{pmatrix}, \qquad \text{cov}[\mathbf{z}] = \mathbf{R}^{-1} = \begin{pmatrix} \boldsymbol{\Lambda}^{-1} & \boldsymbol{\Lambda}^{-1}\mathbf{A}^T \\ \mathbf{A}\boldsymbol{\Lambda}^{-1} & \mathbf{L}^{-1} + \mathbf{A}\boldsymbol{\Lambda}^{-1}\mathbf{A}^T \end{pmatrix}$$

Here we used an earlier result on (blockwise) matrix inversion.
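A small numerical sanity check of these blocks, as a Python/NumPy sketch; the particular μ, Λ, A, b, L below are made-up test values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Model: p(x) = N(mu, Lambda^{-1}), p(y|x) = N(Ax + b, L^{-1})
mu = np.array([1.0, -1.0])
Lam = np.array([[2.0, 0.5], [0.5, 1.0]])      # prior precision (SPD)
A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]])
b = np.array([0.5, 0.0, -0.5])
L = np.diag([4.0, 2.0, 1.0])                   # noise precision

# Joint precision R over z = (x, y), and the covariance block formula.
R = np.block([[Lam + A.T @ L @ A, -A.T @ L],
              [-L @ A,            L       ]])
Lam_inv, L_inv = np.linalg.inv(Lam), np.linalg.inv(L)
cov_formula = np.block([[Lam_inv,       Lam_inv @ A.T],
                        [A @ Lam_inv,   L_inv + A @ Lam_inv @ A.T]])
print(np.allclose(np.linalg.inv(R), cov_formula))  # True

# Monte Carlo check: sample z = (x, y) and compare the empirical covariance.
n = 200_000
x = rng.multivariate_normal(mu, Lam_inv, size=n)
y = x @ A.T + b + rng.multivariate_normal(np.zeros(3), L_inv, size=n)
z = np.hstack([x, y])
print(np.round(np.cov(z.T), 2))  # close to cov_formula
```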
Example of Linear Gaussian Systems: Inferring the Mean
The posterior precision is the sum of the prior precision plus one contribution of the data precision for each observed data point:

$$\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2} \ \Longleftrightarrow\ \sigma_N^2 = \frac{\sigma^2\sigma_0^2}{N\sigma_0^2 + \sigma^2}, \qquad \mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\,\mu_{ML}$$

For N→∞ the posterior peaks around μ_ML and the posterior variance goes to zero, i.e., the MLE is recovered within the Bayesian paradigm.

If we process the data sequentially, we can write for the posterior mean after the collection of one data point (N=1) the following:

$$\mu_1 = y - \frac{\sigma^2}{\sigma^2 + \sigma_0^2}\,(y - \mu_0) \qquad (\text{shrinkage of the data } y \text{ towards the prior mean } \mu_0)$$

Shrinkage is also often measured with the signal-to-noise ratio:

$$SNR = \frac{\mathbb{E}[x^2]}{\mathbb{E}[\epsilon^2]} = \frac{\sigma_0^2 + \mu_0^2}{\sigma^2}, \qquad \text{for the observed signal } y = x + \epsilon, \quad x \sim \mathcal{N}(\mu_0, \sigma_0^2), \quad \epsilon \sim \mathcal{N}(0, \sigma^2)$$

How about when $\sigma_0^2 \to \infty$? In this case note that

$$\sigma_N^2 \to \frac{\sigma^2}{N} \quad \text{and} \quad \mu_N \to \mu_{ML}$$
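A minimal sketch of this sequential updating for a scalar mean (Python/NumPy; the true mean, noise variance, and prior below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

sigma2 = 1.0                  # known observation noise variance
mu0, sigma2_0 = 0.0, 10.0     # weak prior on the unknown mean
true_mu = 2.0

mu_n, sigma2_n = mu0, sigma2_0
for _ in range(100):
    y = rng.normal(true_mu, np.sqrt(sigma2))
    # N=1 update, treating the current posterior as the prior:
    # shrink the new observation y towards the current mean.
    mu_n = y - sigma2 / (sigma2 + sigma2_n) * (y - mu_n)
    sigma2_n = sigma2 * sigma2_n / (sigma2 + sigma2_n)

print(mu_n, sigma2_n)  # mu_n near 2.0; sigma2_n near sigma2/N = 0.01
```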
Example of Linear Gaussian Systems: Inferring an Unknown Vector
Consider the following linear Gaussian system:

$$\mathbf{y} = \{y_1, y_2, \dots, y_N\}, \qquad y_i \sim \mathcal{N}(\mathbf{x}, \boldsymbol{\Sigma}_y), \qquad \text{with prior } \mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$$

Consider an effective observation $\bar{\mathbf{y}} = \frac{1}{N}\sum_{i=1}^N y_i$ with precision $N\boldsymbol{\Sigma}_y^{-1}$.*

From our earlier results for the linear Gaussian system

$$p(\mathbf{x}) = \mathcal{N}(\mathbf{x}\,|\,\boldsymbol{\mu}, \boldsymbol{\Lambda}^{-1}), \qquad p(\mathbf{y}\,|\,\mathbf{x}) = \mathcal{N}(\mathbf{y}\,|\,\mathbf{A}\mathbf{x}+\mathbf{b}, \mathbf{L}^{-1}),$$
$$p(\mathbf{x}\,|\,\mathbf{y}) = \mathcal{N}\big(\mathbf{x}\,\big|\,(\boldsymbol{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A})^{-1}\{\mathbf{A}^T\mathbf{L}(\mathbf{y}-\mathbf{b}) + \boldsymbol{\Lambda}\boldsymbol{\mu}\},\ (\boldsymbol{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A})^{-1}\big),$$

with A = I and b = 0, we have:

$$p(\mathbf{x}\,|\,y_1, y_2, \dots, y_N) = \mathcal{N}\big(\mathbf{x}\,\big|\,\boldsymbol{\Sigma}_N(\boldsymbol{\Sigma}_0^{-1}\boldsymbol{\mu}_0 + N\boldsymbol{\Sigma}_y^{-1}\bar{\mathbf{y}}),\ \boldsymbol{\Sigma}_N\big), \qquad \boldsymbol{\Sigma}_N = (\boldsymbol{\Sigma}_0^{-1} + N\boldsymbol{\Sigma}_y^{-1})^{-1}$$

-------------------------------
* Note that the effective observation $\bar{\mathbf{y}}$ comes with precision $N\boldsymbol{\Sigma}_y^{-1}$. One can see this by writing the likelihood of the N data as $\prod_{i=1}^N \mathcal{N}(y_i\,|\,\mathbf{x}, \boldsymbol{\Sigma}_y)$, which (up to a constant in $\mathbf{x}$) can equivalently be written as $\mathcal{N}(\bar{\mathbf{y}}\,|\,\mathbf{x}, \tfrac{1}{N}\boldsymbol{\Sigma}_y)$.
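A compact sketch of this posterior computation (Python/NumPy); the 2d numbers mirror the figure on the next slide and are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

x_true = np.array([0.5, 0.5])
Sigma_y = 0.1 * np.array([[2.0, 1.0], [1.0, 1.0]])  # known sensor noise
mu0 = np.zeros(2)
Sigma0 = 0.1 * np.eye(2)                             # prior on x

N = 10
Y = rng.multivariate_normal(x_true, Sigma_y, size=N)
ybar = Y.mean(axis=0)                                # effective observation

# Posterior: Sigma_N = (Sigma0^{-1} + N Sigma_y^{-1})^{-1}
P0, Py = np.linalg.inv(Sigma0), np.linalg.inv(Sigma_y)
Sigma_N = np.linalg.inv(P0 + N * Py)
mu_N = Sigma_N @ (P0 @ mu0 + N * Py @ ybar)

print(mu_N)     # close to x_true = [0.5, 0.5]
print(Sigma_N)  # much tighter than the prior
```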
Bayesian Inference for the Mean of a 2d Gaussian
Illustration of Bayesian inference for the mean of a 2d Gaussian. (a) The data is generated from yi ∼ N(x,Σy), where x = [0.5, 0.5]T and Σy = 0.1[2, 1; 1, 1]. We assume the sensor noise covariance Σy is known but x is unknown. The black cross represents x. (b) The prior is p(x) = N(x|0, 0.1I2). (c) We show the posterior after 10 data points have been observed.
Think of this as identifying a missile location x from noisy measurements yi.
Sensor Fusion
We observe y1 = (0,−1) (red cross) and y2 = (1, 0) (green cross) and infer E(μ|y1, y2, θ) (black cross). (a) Equally reliable sensors, so the posterior mean estimate is in between the two circles. (b) Sensor 2 is more reliable, so the estimate shifts more towards the green circle. (c) Sensor 1 is more reliable in the vertical direction, Sensor 2 is more reliable in the horizontal direction. The estimate is an appropriate combination of the two measurements.
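A sketch of the fusion computation behind these panels (Python/NumPy); the sensor covariances below are assumed values chosen to reproduce the qualitative behavior of panel (c):

```python
import numpy as np

# Two sensors observe the same unknown 2d location x: y_i ~ N(x, Sigma_i).
y1 = np.array([0.0, -1.0])
y2 = np.array([1.0, 0.0])

# Sensor 1 reliable vertically, sensor 2 reliable horizontally (assumed values).
Sigma1 = 0.01 * np.array([[10.0, 1.0], [1.0, 1.0]])
Sigma2 = 0.01 * np.array([[1.0, 1.0], [1.0, 10.0]])

# With a vague prior, the posterior mean is the precision-weighted combination.
P1, P2 = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)
Sigma_post = np.linalg.inv(P1 + P2)
mu_post = Sigma_post @ (P1 @ y1 + P2 @ y2)

print(mu_post)  # pulled towards y1 vertically and towards y2 horizontally
```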
Interpolating Noisy Data

We revisit the interpolation example discussed earlier, but now with noisy data. We observe N noisy measurements yᵢ of selected components of x = (x₁, …, x_D):

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$$

Here A is an N×D matrix that picks the observed elements out of x.

The prior is as before: a smoothness prior with precision λ on the first differences of x.

Using the linear Gaussian system results, we can compute the needed posterior (see an example on the next slide).

The posterior mean can also be computed by solving the following regularized optimization problem, whose second term penalizes rapid variability of the data (a first-derivative Tikhonov regularizer):

$$\min_{\mathbf{x}}\ \frac{1}{2\sigma^2}\,\|\mathbf{A}\mathbf{x} - \mathbf{y}\|^2 + \frac{\lambda}{2}\,\|\mathbf{L}_1 \mathbf{x}\|^2, \qquad \mathbf{L}_1 \text{ the } (D{-}1)\times D \text{ first-difference matrix}$$
D. Calvetti and E. Somersalo, Introduction to Bayesian Scientific Computing, 2007
Interpolating Noisy Data
We now see that the prior precision λ affects the posterior mean as well as the posterior variance (in contrast to the noise-free case).

For a strong prior (large λ), the estimate is very smooth and the uncertainty is low; for a weak prior (small λ), the estimate is wiggly and the uncertainty (away from the data) is high.
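A minimal end-to-end sketch of this noisy interpolation (Python/NumPy), assuming the first-difference smoothness prior described above; the grid size, noise level, λ, and test signal are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

D, N = 100, 10           # grid size, number of noisy observations
sigma = 0.1              # observation noise std
lam = 30.0               # prior precision: large -> smooth, small -> wiggly

# A picks N observed components of x; y are noisy values of a smooth function.
obs = np.sort(rng.choice(D, size=N, replace=False))
A = np.zeros((N, D)); A[np.arange(N), obs] = 1.0
t = np.linspace(0, 1, D)
y = np.sin(2 * np.pi * t[obs]) + sigma * rng.normal(size=N)

# First-difference matrix L1 ((D-1) x D): (L1 x)_j = x_{j+1} - x_j
L1 = np.diff(np.eye(D), axis=0)

# Posterior precision and mean of the linear Gaussian system
# (equivalently, the minimizer of the Tikhonov-regularized least squares).
P = A.T @ A / sigma**2 + lam * (L1.T @ L1)
mu_post = np.linalg.solve(P, A.T @ y / sigma**2)
var_post = np.diag(np.linalg.inv(P))

print(mu_post[obs].round(2))  # close to y where observed
print(var_post.max())         # largest uncertainty away from the data
```

Rerunning with a larger λ visibly smooths mu_post and shrinks var_post, matching the strong-prior behavior described above.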