Prediction variance
Sampling plans for linear regression
Given a domain, we can reduce the prediction error by a good choice of the sampling points.
The choice of sampling locations is called design of experiments (DOE).
We will consider DOEs for linear regression using linear and quadratic polynomials, where errors are due to noise in the data.
With a given number of points, the best DOE is one that minimizes the prediction variance (reviewed in the next few slides).
The simplest DOE is the full factorial design, where we sample each variable (factor) at a fixed number of values (levels).
With k factors and three levels each, we sample 3^k points.
Practical only for low dimensions.
For the VVUQ course, we will cover slides 1, 5, 6, 9-13, 14, 16.
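As a minimal sketch of the full factorial idea (in Python rather than MATLAB, and with the function name `full_factorial` and the levels -1, 0, 1 chosen here for illustration only), the design is the Cartesian product of the levels of each factor:

```python
from itertools import product

def full_factorial(k, levels=(-1.0, 0.0, 1.0)):
    """Return every combination of the given levels for k factors."""
    return list(product(levels, repeat=k))

# Three factors at three levels each: 3**3 = 27 points.
design = full_factorial(3)
n_points = len(design)
print(n_points)
```

The point count grows as the number of levels raised to the power k, which is why the design is practical only in low dimensions.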
1 Linear Regression
2 Model-based error for linear regression
3 Prediction variance
Linear regression model
Define then
With some algebra
Standard error
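One standard way to write these four steps is sketched below. The notation is an assumption, not taken from the slides: basis functions \xi_i, estimated coefficients b with covariance matrix \Sigma_b, design matrix X, noise variance \sigma^2 and its estimate \hat{\sigma}^2. The third line uses \Sigma_b = \sigma^2 (X^T X)^{-1}, which holds for uncorrelated noise of equal variance.

```latex
\begin{align}
\hat{y}(\mathbf{x}) &= \sum_{i=1}^{n_\beta} b_i\,\xi_i(\mathbf{x}) \\
x^{(m)}_i = \xi_i(\mathbf{x}) \quad &\Rightarrow \quad \hat{y} = \mathbf{x}^{(m)T}\mathbf{b} \\
\mathrm{Var}\left[\hat{y}(\mathbf{x})\right] &= \mathbf{x}^{(m)T}\,\Sigma_b\,\mathbf{x}^{(m)}
  = \sigma^2\,\mathbf{x}^{(m)T}\left(X^T X\right)^{-1}\mathbf{x}^{(m)} \\
s_y &= \hat{\sigma}\,\sqrt{\mathbf{x}^{(m)T}\left(X^T X\right)^{-1}\mathbf{x}^{(m)}}
\end{align}
```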
4 Prediction variance for full factorial design
Recall that standard error (square root of prediction variance) is
For a full factorial design the domain is normally a box.
Cheapest full factorial design: two levels (not good for quadratic polynomials).
For a linear polynomial the standard error is then
Maximum error at vertices
What does the ratio in the square root represent?
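For a two-level full factorial design on the box [-1,1]^n (an assumed normalization), the columns of X for a linear polynomial are orthogonal, so X^T X is diagonal and the factor x^T (X^T X)^{-1} x under the square root is easy to evaluate. The following Python sketch, with the hypothetical helper `variance_factor`, computes that factor at a vertex and at the center for n = 2:

```python
from itertools import product

def variance_factor(design, x):
    """x_m^T (X^T X)^{-1} x_m for a linear polynomial, assuming X^T X is diagonal."""
    rows = [(1.0,) + tuple(p) for p in design]   # columns: 1, x1, ..., xn
    x_m = (1.0,) + tuple(x)
    n_beta = len(x_m)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_beta)]
           for i in range(n_beta)]
    # This shortcut is valid only for orthogonal designs.
    if any(xtx[i][j] != 0 for i in range(n_beta) for j in range(n_beta) if i != j):
        raise ValueError("design is not orthogonal")
    return sum(x_m[i] ** 2 / xtx[i][i] for i in range(n_beta))

n = 2
design = list(product((-1.0, 1.0), repeat=n))     # 2**n = 4 points
at_vertex = variance_factor(design, (1.0, 1.0))   # (n + 1) / 2**n = 0.75
at_center = variance_factor(design, (0.0, 0.0))   # 1 / 2**n = 0.25
stability_ratio = at_vertex / at_center           # 3.0
print(at_vertex, at_center, stability_ratio)
```

At a vertex the factor equals the number of coefficients divided by the number of points, here 3/4.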
5 Designs for linear polynomials
Traditionally use only two levels.
Orthogonal design when X^T X is diagonal.
The full factorial design is orthogonal; it is not so easy to produce other orthogonal designs with fewer points.
It is beneficial to place the points at the edges of the design domain.
Stability: small variation of the prediction variance in the domain is also a desirable property.
The full factorial design is often not affordable because of the
large number of points, especially in high dimensions. However,
even with a smaller number of points, it pays to retain the
orthogonality of the columns of X, which leads to a diagonal
X^T X. This is not easy to do for an arbitrary number of points.
The minimum number of points is, of course, n+1, equal to the
number of coefficients of the linear polynomial. This is called a
saturated design, and it does not allow us to estimate the noise
from the data.
The quest for small values of the prediction variance usually
pushes the design towards placing points on the boundary of the
domain. However, another desired property is stability, which means
limited variation of the prediction variance between different
points in the domain. Stability is usually measured by the ratio of
the highest to the lowest prediction variance. This sometimes
pushes us to add points at the center of the domain, as we will see
in later examples.
6 Example
7 Comparison
8 Quadratic Polynomial
A quadratic polynomial has (n+1)(n+2)/2 coefficients, so we need at least that many points.
Need at least three different values of each variable.
Simplest DOE is the three-level full factorial design.
Impractical for n>5.
Also unreasonable ratio between number of points and number of coefficients.
For example, for n=8 we get 6561 samples for 45 coefficients.
My rule of thumb is that you want twice as many points as coefficients.
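The growth of the three-level full factorial design relative to the number of quadratic coefficients can be tabulated with a short Python sketch (the function names are illustrative, not from the slides):

```python
def n_coefficients(n):
    """Number of coefficients of a full quadratic polynomial in n variables."""
    return (n + 1) * (n + 2) // 2

def n_full_factorial_points(n, levels=3):
    """Number of points in a full factorial design with the given levels."""
    return levels ** n

for n in range(1, 9):
    n_coef = n_coefficients(n)
    n_pts = n_full_factorial_points(n)
    print(f"n={n}: {n_coef} coefficients, {n_pts} points, ratio {n_pts / n_coef:.1f}")
```

For n=4 this gives 15 coefficients and 81 points, and for n=8 it gives 45 coefficients and 6561 points.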
A quadratic polynomial in n variables has (n+1)(n+2)/2
coefficients, so you need at least that many data points. You also
need three different values of each variable. We can meet both
requirements with a full factorial design with three levels.
However, for n>5 this will normally give more points than we can
afford to evaluate. In addition, for n>3 we get an unreasonable
ratio between the number of points and the number of coefficients.
A reasonable ratio is normally around 2. For n=4 we have 15
coefficients and 81 points, and for n=8 we have 45 coefficients and
6561 points.
9 Central Composite Design
10 Repeated observations at origin
Unlike linear designs, prediction variance is high at the origin.
Repetition at the origin decreases the variance there and improves stability.
What other rationale is there for choosing the origin for repetition?
Repetition also gives an independent measure of the magnitude of the noise.
It can also be used for lack-of-fit tests.
11 Without repetition (9 points)
Contours of prediction variance for spherical CCD design.
How come it is rotatable?
12 Center repeated 5 times (13 points)
With five repetitions we reduce the maximum prediction variance and greatly improve the uniformity.
Five points is the optimum for uniformity.
We now add four repetitions at the origin, for a total of five
center points, and we get a much improved prediction variance, both
in terms of the maximum and in terms of the stability. This design
can be obtained from MATLAB by using the ccdesign function with the
call below. The 'center' option tells it to repeat points at the
center. We can give the actual number of repetitions, or ask, as we
do below, for the number that leads to the most uniform prediction
variance.
d=ccdesign(2,'center','uniform')
d =
   -1.0000   -1.0000
   -1.0000    1.0000
    1.0000   -1.0000
    1.0000    1.0000
   -1.4142         0
    1.4142         0
         0   -1.4142
         0    1.4142
         0         0
         0         0
         0         0
         0         0
         0         0
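The effect of the center repetitions can be checked numerically. The Python sketch below rebuilds the two-factor spherical CCD by hand (it does not call ccdesign) and evaluates N x^T (X^T X)^{-1} x for a full quadratic model. Scaling the variance by the number of points N, and the helper names `ccd`, `solve` and `scaled_variance`, are assumptions made here for illustration.

```python
import math

def quadratic_row(x1, x2):
    """Terms of a full quadratic polynomial in two variables."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ccd(n_center):
    """Four corners, four axial points at distance sqrt(2), and center repeats."""
    a = math.sqrt(2.0)
    corners = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
    axial = [(-a, 0.0), (a, 0.0), (0.0, -a), (0.0, a)]
    return corners + axial + [(0.0, 0.0)] * n_center

def scaled_variance(design, point):
    """N * x_m^T (X^T X)^{-1} x_m, with N the number of design points."""
    X = [quadratic_row(*p) for p in design]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    x_m = quadratic_row(*point)
    z = solve(xtx, x_m)
    return len(design) * sum(u * v for u, v in zip(x_m, z))

origin_9 = scaled_variance(ccd(1), (0.0, 0.0))    # one center point, 9 runs
origin_13 = scaled_variance(ccd(5), (0.0, 0.0))   # five center points, 13 runs
edge_13 = scaled_variance(ccd(5), (1.0, 0.0))     # 13 runs, at radius 1
print(origin_9, origin_13, edge_13)
```

Under these assumptions the value at the origin drops from 9 with one center point to 2.6 with five, while the value at radius 1 for the 13-point design is about 3.49. Whether the contour plots on the slides use the same scaling and region is not confirmed by the text.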
13 Top hat question
For the case of fitting a quadratic polynomial (6 coefficients) in two dimensions, we reduced the maximum prediction variance from 9 to 3.5 by repeating the observation at the origin five times, requiring 13 observations instead of 9.
By what factor do you expect the prediction variance to change if you increased the number of points from 9 to 13 without targeting the point of highest variance?
9/13; 3/7; 3/sqrt(13); sqrt(3/7)
Variance optimal designs
Full factorial and CCD are not flexible in number of points.
Standard error
A key to most optimal DOE methods is the moment matrix.
A good design of experiments will maximize the terms in this matrix, especially the diagonal elements.
D-optimal designs maximize the determinant of the moment matrix.
The determinant is inversely proportional to the square of the volume of the confidence region on the coefficients.
15 Example
Given the model y=b1x1+b2x2, and the two data points (0,0) and (1,0), find the optimum third data point (p,q) in the unit square.
We have X = [0 0; 1 0; p q], so X^T X = [1+p^2 pq; pq q^2] and det(X^T X) = q^2.
The determinant is largest for q=1, so that the third point is (p,1), for any value of p.
Finding a D-optimal design in higher dimensions is a difficult optimization problem, often solved heuristically.
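The result for the third point can be confirmed by brute force. This Python sketch evaluates det(X^T X) for the model y=b1x1+b2x2 with data points (0,0), (1,0) and (p,q) on a grid over the unit square; the grid spacing of 0.1 and the function name `det_xtx` are choices made here for illustration.

```python
from fractions import Fraction

def det_xtx(p, q):
    """det(X^T X) for rows (0,0), (1,0), (p,q) and the model y = b1*x1 + b2*x2."""
    points = [(0, 0), (1, 0), (p, q)]
    a = sum(x1 * x1 for x1, _ in points)    # entry (1,1)
    b = sum(x1 * x2 for x1, x2 in points)   # entries (1,2) and (2,1)
    c = sum(x2 * x2 for _, x2 in points)    # entry (2,2)
    return a * c - b * b

# Exact rational grid over the unit square.
grid = [Fraction(i, 10) for i in range(11)]
best = max(det_xtx(p, q) for p in grid for q in grid)
maximizers = [(p, q) for p in grid for q in grid if det_xtx(p, q) == best]
print(best, [(float(p), float(q)) for p, q in maximizers])
```

Every grid point with q=1 attains the same maximum, so the determinant criterion alone does not fix p.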
16 MATLAB example
>> ny=6; nbeta=6;
>> [dce,x]=cordexch(2,ny,'quadratic');
>> dce'
     1     1    -1    -1     0     1
    -1     1     1    -1    -1     0
>> scatter(dce(:,1),dce(:,2),200,'filled')
>> det(x'*x)/ny^nbeta
ans = 0.0055
With 12 points:
>> ny=12;
>> [dce,x]=cordexch(2,ny,'quadratic');
>> dce'
    -1     1    -1     0     1     0     1    -1     1     0    -1     1
     1    -1    -1    -1     1     1    -1    -1     0     0     0     1
>> scatter(dce(:,1),dce(:,2),200,'filled')
>> det(x'*x)/ny^nbeta
ans = 0.0102
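The quantity det(x'*x)/ny^nbeta can be reproduced without the toolbox once the design points are known. The Python sketch below takes the six points printed for the 6-point design and assumes a full quadratic model with a constant term (the column order is an assumption and does not affect the determinant). The function names are illustrative.

```python
from fractions import Fraction

def quadratic_row(x1, x2):
    """Constant, linear, interaction and squared terms in two variables."""
    return [1, x1, x2, x1 * x2, x1 * x1, x2 * x2]

def determinant(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def moment_determinant(points):
    """det(X^T X) / n_y**n_beta for the quadratic model."""
    X = [quadratic_row(x1, x2) for x1, x2 in points]
    n_y, n_beta = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n_beta)]
           for i in range(n_beta)]
    return determinant(xtx) / Fraction(n_y) ** n_beta

# Columns of dce' from the 6-point run.
six_points = [(1, -1), (1, 1), (-1, 1), (-1, -1), (0, -1), (1, 0)]
d6 = moment_determinant(six_points)
print(float(d6))
```

For these six points the exact value is 4/729, which rounds to the 0.0055 shown in the session.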
17 Other criteria
A-optimal designs minimize the trace of the inverse of the moment matrix.
This minimizes the sum of the variances of the coefficients.
G-optimality minimizes the maximum of the prediction variance.
18 Example
For the previous example, find the A-optimal design.
With X^T X = [1+p^2 pq; pq q^2], the trace of its inverse is (1+p^2+q^2)/q^2.
Minimum at (0,1), so this point is both A-optimal and D-optimal.
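The A-optimal point can be checked the same way. This Python sketch evaluates the trace of (X^T X)^{-1} for data points (0,0), (1,0) and (p,q) on a grid over the unit square, skipping q=0 where X^T X is singular. Using X^T X in place of the moment matrix only changes the trace by a constant factor; the grid and the function name are illustrative choices.

```python
from fractions import Fraction

def trace_of_inverse(p, q):
    """trace((X^T X)^{-1}) for rows (0,0), (1,0), (p,q), model y = b1*x1 + b2*x2."""
    a = 1 + p * p          # entry (1,1) of X^T X
    b = p * q              # off-diagonal entry
    c = q * q              # entry (2,2)
    det = a * c - b * b
    # The inverse of [a b; b c] is [c -b; -b a] / det, so its trace is (a + c) / det.
    return (a + c) / det

grid = [Fraction(i, 10) for i in range(11)]
candidates = [(p, q) for p in grid for q in grid if q != 0]
best_point = min(candidates, key=lambda pq: trace_of_inverse(*pq))
best_value = trace_of_inverse(*best_point)
print(float(best_point[0]), float(best_point[1]), float(best_value))
```

Unlike the determinant, the trace depends on p, which is why the A-optimal criterion selects the single point (0,1).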
Problems
Create a 13-point D-optimal design in two-dimensional space and compare its prediction variance to that of the CCD design shown on Slide 13.
Generate noisy data for the function y=(x+y)^2, fit it using the two designs, and compare the accuracy of the coefficients.