Sampling plans for linear regression

Prediction variance

Slide 1: Sampling plans for linear regression

Given a domain, we can reduce the prediction error by a good choice of the sampling points.
The choice of sampling locations is called design of experiments (DOE).
We will consider DOEs for linear regression using linear and quadratic polynomials, where errors are due to noise in the data.
With a given number of points, the best DOE is the one that minimizes the prediction variance (reviewed in the next few slides).
The simplest DOE is the full factorial design, where we sample each variable (factor) at a fixed number of values (levels).
With k factors and three levels each, we sample 3^k points.
Practical only for low dimensions.
For the VVUQ course we will cover slides 1, 5, 6, 9-13, 14, 16.
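As a small illustration of how quickly a full factorial design grows, here is a minimal Python sketch. The function name `full_factorial` and the coded levels -1, 0, 1 are assumptions made for this example; the slides do not prescribe them.

```python
from itertools import product

def full_factorial(k, levels=(-1.0, 0.0, 1.0)):
    """Return every combination of the given levels for k factors."""
    return list(product(levels, repeat=k))

# The number of points is len(levels) ** k, so it grows exponentially with k.
for k in (2, 3, 5, 8):
    print(k, len(full_factorial(k)))
```

With three levels, two factors give 9 points while eight factors already give 6561, which is why the design is practical only in low dimensions.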

Slide 2: Linear Regression

Slide 3: Model-based error for linear regression

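Slide 4: Prediction variance

Linear regression model: $\hat{y}(\mathbf{x}) = \sum_{i=1}^{n_\beta} b_i \xi_i(\mathbf{x})$

Define the vector $\mathbf{x}^{(m)}$ with components $x_i^{(m)} = \xi_i(\mathbf{x})$; then $\hat{y} = \mathbf{x}^{(m)T}\mathbf{b}$

With some algebra: $\mathrm{Var}[\hat{y}] = \mathbf{x}^{(m)T}\Sigma_b\,\mathbf{x}^{(m)} = \sigma^2\,\mathbf{x}^{(m)T}(X^TX)^{-1}\mathbf{x}^{(m)}$

Standard error: $s_y = \hat{\sigma}\sqrt{\mathbf{x}^{(m)T}(X^TX)^{-1}\mathbf{x}^{(m)}}$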

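Slide 5: Prediction variance for full factorial design

Recall that the standard error (square root of the prediction variance) is $s_y = \hat{\sigma}\sqrt{\mathbf{x}^{(m)T}(X^TX)^{-1}\mathbf{x}^{(m)}}$
For a full factorial design the domain is normally a box.
Cheapest full factorial design: two levels (not good for quadratic polynomials).
For a linear polynomial in n variables, with the box scaled to $-1 \le x_i \le 1$, the standard error is then $s_y = \hat{\sigma}\sqrt{\dfrac{1 + x_1^2 + x_2^2 + \cdots + x_n^2}{2^n}}$

Maximum error at vertices: $s_y = \hat{\sigma}\sqrt{\dfrac{n+1}{2^n}}$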

What does the ratio in the square root represent?
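The vertex and center values can be checked numerically. The sketch below assumes a two-level full factorial design on a box scaled to [-1, 1]^n and a linear polynomial with an intercept; the helper names `solve` and `variance_factor` are made up for this example. It evaluates the factor x^T (X^T X)^-1 x that multiplies the noise variance.

```python
from itertools import product

def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def variance_factor(points, x):
    """x_m^T (X^T X)^-1 x_m for a linear polynomial with intercept."""
    rows = [[1.0, *p] for p in points]   # one row of X per design point
    xm = [1.0, *x]                       # basis vector at the prediction point
    k = len(xm)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    z = solve(xtx, xm)
    return sum(u * v for u, v in zip(xm, z))

n = 3
design = list(product((-1.0, 1.0), repeat=n))    # 2^n vertices of the box
vertex = variance_factor(design, (1.0,) * n)     # compare with (n + 1) / 2^n
center = variance_factor(design, (0.0,) * n)     # compare with 1 / 2^n
print(vertex, (n + 1) / 2 ** n)
print(center, 1 / 2 ** n)
```

The standard error is the estimated noise standard deviation times the square root of this factor.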

Slide 6: Designs for linear polynomials

Traditionally use only two levels.
Orthogonal design when X^T X is diagonal.
Full factorial design is orthogonal; it is not so easy to produce other orthogonal designs with fewer points.
It is beneficial to place the points at the edges of the design domain.
Stability: small variation of the prediction variance in the domain is also a desirable property.

The full factorial design is often not affordable because of the large number of points, especially in high dimensions. However, even with a smaller number of points, it pays to retain the orthogonality of the columns of X, which leads to a diagonal X^T X. This is not easy to do for an arbitrary number of points. The minimum number of points is, of course, n+1, when the number of points equals the number of coefficients of the linear polynomial. This is called a saturated design, and it does not allow us to estimate the noise from the data.
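To make the orthogonality condition concrete, the sketch below checks whether X^T X is diagonal for three designs in n=3 variables. The half-fraction design (third variable set to the product of the first two) is a classical fractional factorial construction brought in here as an assumption for illustration; it is not taken from the slides. The helper names `gram` and `is_diagonal` are also made up.

```python
from itertools import product

def gram(points):
    """X^T X for a linear polynomial with intercept."""
    rows = [[1.0, *p] for p in points]
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]

def is_diagonal(m):
    """True when every off-diagonal entry is exactly zero."""
    size = len(m)
    return all(m[i][j] == 0 for i in range(size) for j in range(size) if i != j)

full = list(product((-1.0, 1.0), repeat=3))                         # 8 points
half = [(a, b, a * b) for a, b in product((-1.0, 1.0), repeat=2)]   # 4 points = n + 1
seven = full[:-1]                                                   # full design minus one vertex

print(is_diagonal(gram(full)), is_diagonal(gram(half)), is_diagonal(gram(seven)))
```

The 4-point half fraction is both orthogonal and saturated, while simply dropping one vertex from the full factorial design destroys orthogonality.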

The quest for small values of the prediction variance usually pushes the design towards placing points on the boundary of the domain. However, another desired property is stability, which means limited variation of the prediction variance between different points in the domain. Stability is usually measured by the ratio of the highest to the lowest prediction variance. This sometimes pushes us to add points at the center of the domain, as we will see in later examples.

Slide 7: Example

Slide 8: Comparison

Slide 9: Quadratic polynomial

A quadratic polynomial has (n+1)(n+2)/2 coefficients, so we need at least that many points.
Need at least three different values of each variable.
Simplest DOE is the three-level full factorial design.
  Impractical for n>5.
  Also an unreasonable ratio between the number of points and the number of coefficients.
  For example, for n=8 we get 6561 samples for 45 coefficients.
My rule of thumb is that you want twice as many points as coefficients.
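The counts quoted for the quadratic polynomial can be tabulated with a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for the example.

```python
def n_coefficients(n):
    """Number of coefficients of a full quadratic polynomial in n variables."""
    return (n + 1) * (n + 2) // 2

def n_points_three_level(n):
    """Number of points in a three-level full factorial design."""
    return 3 ** n

# Columns: n, coefficients, points, points per coefficient
for n in range(1, 9):
    c, p = n_coefficients(n), n_points_three_level(n)
    print(n, c, p, round(p / c, 1))
```

For n=2 the ratio is 9/6 = 1.5, close to the rule of thumb of 2; by n=8 it exceeds 100.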

A quadratic polynomial in n variables has (n+1)(n+2)/2 coefficients, so you need at least that many data points. You also need at least three different values of each variable. A full factorial design with three levels satisfies both requirements. However, for n>5 this will normally give more points than we can afford to evaluate. In addition, for n>3 we get an unreasonable ratio between the number of points and the number of coefficients; a ratio of about 2 is normally desired. For n=4 we have 15 coefficients and 81 points, and for n=8 we have 45 coefficients and 6561 points.

Slide 10: Central Composite Design
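A central composite design combines the two-level factorial vertices, a pair of axial points along each coordinate axis, and one or more center points. The sketch below builds such a design in coded units. The function name `central_composite` and its arguments are assumptions for this example, and the choice alpha = sqrt(n) (the spherical variant, with axial points on the same sphere as the vertices) is one common option rather than the only one.

```python
from itertools import product
from math import sqrt

def central_composite(n, alpha, n_center=1):
    """Factorial vertices, axial points at distance alpha, and center points."""
    factorial = list(product((-1.0, 1.0), repeat=n))
    axial = []
    for i in range(n):
        for s in (-alpha, alpha):
            pt = [0.0] * n
            pt[i] = s
            axial.append(tuple(pt))
    center = [(0.0,) * n] * n_center
    return factorial + axial + center

ccd = central_composite(2, sqrt(2.0))    # 4 + 4 + 1 = 9 points
for point in ccd:
    print(point)
```

The number of points is 2^n + 2n + n_center, which grows far more slowly than 3^n for moderate n.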

Slide 11: Repeated observations at origin

Unlike linear designs, prediction variance is high at the origin.
Repetition at the origin decreases the variance there and improves stability.
What other rationale is there for choosing the origin for repetition?
Repetition also gives an independent measure of the magnitude of the noise.
Can also be used for lack-of-fit tests.

Slide 12: Without repetition (9 points)

Contours of prediction variance for spherical CCD design.
How come it is rotatable?

Slide 13: Center repeated 5 times (13 points)

With five repetitions we reduce the maximum prediction variance and greatly improve the uniformity.
Five points is the optimum for uniformity.

We now add four repetitions at the origin, for a total of five center points, and we get a much improved prediction variance, both in terms of the maximum and in terms of stability. This design can be obtained from Matlab by using the ccdesign function with the call below. The 'center' option tells it to repeat points at the center. We can give the actual number of repetitions or, as we do below, ask for the number that leads to the most uniform prediction variance.

d=ccdesign(2,'center', 'uniform')
d =
   -1.0000   -1.0000
   -1.0000    1.0000
    1.0000   -1.0000
    1.0000    1.0000
   -1.4142         0
    1.4142         0
         0   -1.4142
         0    1.4142
         0         0
         0         0
         0         0
         0         0
         0         0
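The effect of the center repetitions can be checked without Matlab. The sketch below assumes the quantity of interest is the scaled prediction variance n_y x^T (X^T X)^-1 x for a full quadratic polynomial; that scaling is an assumption on my part, although it does give 9 at the origin for the 9-point design. The helper names `basis`, `solve` and `scaled_variance` are made up for this example.

```python
from math import sqrt

def basis(x1, x2):
    """Full quadratic basis in two variables (6 terms)."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def scaled_variance(points, x):
    """n_y * x_m^T (X^T X)^-1 x_m for the quadratic basis above."""
    rows = [basis(*p) for p in points]
    xm = basis(*x)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    z = solve(xtx, xm)
    return len(points) * sum(u * v for u, v in zip(xm, z))

a = sqrt(2.0)
base = [(-1, -1), (-1, 1), (1, -1), (1, 1), (-a, 0), (a, 0), (0, -a), (0, a)]
for n_center in (1, 5):
    design = base + [(0, 0)] * n_center
    # Columns: number of points, value at the origin, value at (1, 0)
    print(len(design), scaled_variance(design, (0, 0)), scaled_variance(design, (1, 0)))
```

Under these assumptions the value at the origin drops from 9 to 2.6 when the center is repeated five times, while the value at (1, 0) is about 4.2 for 9 points and about 3.5 for 13 points.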

Slide 14: Top hat question

For the case of fitting a quadratic polynomial (6 coefficients) in two dimensions, we reduced the maximum prediction variance from 9 to 3.5 by repeating the observation at the origin five times, requiring 13 observations instead of 9.
By what factor do you expect the prediction variance to change if you increased the number of points from 9 to 13 without targeting the point of highest variance?
9/13; 3/7; 3/sqrt(13); sqrt(3/7)

Slide 15: Variance optimal designs

Full factorial and CCD are not flexible in the number of points.
Standard error: $s_y = \hat{\sigma}\sqrt{\mathbf{x}^{(m)T}(X^TX)^{-1}\mathbf{x}^{(m)}}$
A key to most optimal DOE methods is the moment matrix: $M = \dfrac{X^TX}{n_y}$

A good design of experiments will maximize the terms in this matrix, especially the diagonal elements.
D-optimal designs maximize the determinant of the moment matrix.
The determinant is inversely proportional to the square of the volume of the confidence region on the coefficients.

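Slide 16: Example

Given the model y=b1x1+b2x2 and the two data points (0,0) and (1,0), find the optimum third data point (p,q) in the unit square.
We have
$X = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ p & q \end{bmatrix}, \quad X^TX = \begin{bmatrix} 1+p^2 & pq \\ pq & q^2 \end{bmatrix}, \quad \det(X^TX) = q^2$

So the third point is (p,1), for any value of p.
Finding a D-optimal design in higher dimensions is a difficult optimization problem, often solved heuristically.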
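The result of this small example can be confirmed by brute force. The sketch below evaluates det(X^T X) on a grid over the unit square; the grid spacing of 0.1 and the function name `det_xtx` are arbitrary choices for illustration.

```python
def det_xtx(p, q):
    """det(X^T X) for y = b1*x1 + b2*x2 with data points (0,0), (1,0), (p,q)."""
    pts = [(0.0, 0.0), (1.0, 0.0), (p, q)]
    s11 = sum(x1 * x1 for x1, _ in pts)
    s12 = sum(x1 * x2 for x1, x2 in pts)
    s22 = sum(x2 * x2 for _, x2 in pts)
    return s11 * s22 - s12 * s12

grid = [i / 10 for i in range(11)]
best = max(det_xtx(p, q) for p in grid for q in grid)
# Collect every grid point that attains the maximum (up to rounding).
winners = [(p, q) for p in grid for q in grid if abs(det_xtx(p, q) - best) < 1e-12]
print(best, winners)
```

Every maximizer on the grid has q = 1, whatever the value of p, since the determinant depends on q only.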

Slide 17: Matlab example

>> ny=6;nbeta=6;                          % number of points, number of coefficients
>> [dce,x]=cordexch(2,ny,'quadratic');    % D-optimal design by coordinate exchange
>> dce'
     1     1    -1    -1     0     1
    -1     1     1    -1    -1     0
>> scatter(dce(:,1),dce(:,2),200,'filled')
>> det(x'*x)/ny^nbeta                     % determinant of the moment matrix
ans = 0.0055

With 12 points:
>> ny=12;
>> [dce,x]=cordexch(2,ny,'quadratic');
>> dce'
    -1     1    -1     0     1     0     1    -1     1     0    -1     1
     1    -1    -1    -1     1     1    -1    -1     0     0     0     1
>> scatter(dce(:,1),dce(:,2),200,'filled')
>> det(x'*x)/ny^nbeta
ans = 0.0102
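The D-criterion value reported for the 6-point design can be reproduced outside Matlab. The sketch below rebuilds the model matrix for the listed points and evaluates det(X^T X)/ny^nbeta. The column order 1, x1, x2, x1*x2, x1^2, x2^2 is an assumption about the 'quadratic' model (the determinant does not depend on the order), and the helper names are made up for this example.

```python
def quad_row(x1, x2):
    """One row of X for a full quadratic model in two variables (assumed column order)."""
    return [1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2]

def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n, d = len(m), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[piv][c] == 0:
            return 0.0
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            d = -d                      # a row swap flips the sign
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return d

def d_criterion(points):
    """det(X^T X) / ny^nbeta, the determinant of the moment matrix."""
    rows = [quad_row(*p) for p in points]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    return det(xtx) / len(points) ** k

# The six points read column by column from the first dce' listing.
six = [(1, -1), (1, 1), (-1, 1), (-1, -1), (0, -1), (1, 0)]
print(round(d_criterion(six), 4))
```

For this saturated design det(X^T X) = 256, so the criterion is 256/6^6, about 0.0055. The same function can be applied to the 12-point listing.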

Slide 18: Other criteria

A-optimal minimizes the trace of the inverse of the moment matrix.
This minimizes the sum of the variances of the coefficients.
G-optimality minimizes the maximum of the prediction variance.

Slide 19: Example

For the previous example, find the A-optimal design.
$\mathrm{tr}\left[(X^TX)^{-1}\right] = \dfrac{1 + p^2 + q^2}{q^2}$

Minimum at (0,1), so this point is both A-optimal and D-optimal.
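A grid search confirms the A-optimal point for this example. The sketch below works with X^T X rather than the moment matrix; the two differ only by the constant factor 1/n_y, which does not change where the minimum lies. The function name `trace_inverse` and the grid spacing are assumptions for illustration.

```python
def trace_inverse(p, q):
    """Trace of (X^T X)^-1 for y = b1*x1 + b2*x2 with points (0,0), (1,0), (p,q)."""
    pts = [(0.0, 0.0), (1.0, 0.0), (p, q)]
    s11 = sum(x1 * x1 for x1, _ in pts)
    s12 = sum(x1 * x2 for x1, x2 in pts)
    s22 = sum(x2 * x2 for _, x2 in pts)
    d = s11 * s22 - s12 * s12
    if abs(d) < 1e-12:
        return float("inf")     # singular design: the coefficients cannot be estimated
    return (s11 + s22) / d      # for a 2x2 matrix, trace of the inverse = trace / determinant

grid = [i / 10 for i in range(11)]
best_value, best_point = min((trace_inverse(p, q), (p, q)) for p in grid for q in grid)
print(best_value, best_point)
```

Unlike the D-optimality criterion, the trace depends on p as well as q, so the minimizer is the single point (0, 1).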

Slide 20: Problems

Create a 13-point D-optimal design in two-dimensional space and compare its prediction variance to that of the CCD design shown on Slide 13.
Generate noisy data for the function y=(x+y)^2, fit it using the two designs, and compare the accuracy of the coefficients.