Slide 1
Local surrogates
- To model a complex wavy function we need a lot of data.
- Modeling a wavy function with high-order polynomials is inherently ill-conditioned.
- With a lot of data we normally predict function values using only nearby values.
- We may fit several local surrogates, as in the figure.
For example, if you have the price of gasoline every first of
the month from 2000 through 2009, how many values would you use to
estimate the price on June 15, 2007?
Linear regression using low order polynomials is ideal for
approximating simple functions when the data is contaminated with
substantial noise. In that case we may want many data points in order to filter out the noise, and this is done well by giving all the points equal weight. On the other hand, when we model a complex wavy function without much noise, we need a large number of points in order to capture the local behavior of the function.
It is possible to model a complex function with high-order polynomials, but this is inherently ill-conditioned. Even if we use orthogonal polynomials to alleviate the ill-conditioning, we often get poor accuracy.
Instead, with a lot of data about a wavy function, we should use
only data near the point where we want to estimate the function.
For example, if we have a table that lists the price of gasoline
every first of the month from 2000 to 2009, and we want to estimate
the price on June 15, 2007, we would probably interpolate linearly
using the values for June 1, 2007 and July 1, 2007.
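Concretely, since June 15 is 14 of the 30 days between June 1 and July 1, linear interpolation gives

p(June 15) ≈ p(June 1) + (14/30)·[p(July 1) − p(June 1)].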
The figure compares a single global surrogate fitted to the entire data set with three local surrogates fitted to the data in three different regions.
Slide 2
Popular local surrogates
- Moving least squares: weighting more heavily points near the prediction location.
- Radial basis neural network: regression with local functions that decay away from data points.
- Kriging: radial basis functions, but with a fitting philosophy based not on the error at data points but on the correlation between function values at near and far points.
In this lecture we will study two local surrogates, moving least squares and radial basis neural networks. Moving least squares performs weighted linear regression, with nearby points having larger weights than faraway points. Radial basis neural networks achieve a similar result by having the data values at the points multiply functions that peak at the data point and decay rapidly away from it. A third popular local surrogate, kriging, is covered in a separate lecture, because it is currently the most popular local surrogate and is more versatile than the other two. However, it is also much more computationally expensive, especially for a large number of data points.

Slide 3
Review of Linear Regression
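Presumably the review covers the standard least-squares result: with a surrogate \(\hat{y}(x) = \xi(x)^T \beta\) built from basis functions \(\xi\) (the notation here is mine), the coefficients that minimize the sum of squared residuals are

\[ \hat{\beta} = (X^T X)^{-1} X^T y, \]

where \(X\) is the matrix of basis functions evaluated at the data points.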
Slide 4
Moving least squares
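A sketch of the idea (the Gaussian weight function below is one common choice, assumed here rather than taken from the slide): at each prediction point x, moving least squares performs a weighted least-squares fit of a low-order polynomial, with weights that decay with distance from x,

\[ w_i(x) = \exp\!\left(-\frac{\|x - x_i\|^2}{\theta^2}\right), \]

so the coefficients are recomputed at every prediction point, and \(\theta\) controls how quickly faraway points lose influence.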
Slide 5
Weighted least squares
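In the standard formulation, weighted least squares minimizes the weighted sum of squared residuals; with \(W = \mathrm{diag}(w_i)\),

\[ \hat{\beta} = \arg\min_{\beta} \sum_i w_i \big(y_i - \xi(x_i)^T \beta\big)^2 = (X^T W X)^{-1} X^T W y. \]

With all weights equal this reduces to the ordinary least squares of the review slide.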
Slide 6
Six-hump camelback function
Definition:
Function fit with moving least squares using quadratic polynomials.
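The standard six-hump camelback function (presumably the definition intended above) is

\[ f(x_1, x_2) = \left(4 - 2.1 x_1^2 + \tfrac{1}{3} x_1^4\right) x_1^2 + x_1 x_2 + \left(-4 + 4 x_2^2\right) x_2^2, \]

usually considered on \(x_1 \in [-3, 3]\), \(x_2 \in [-2, 2]\).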
Slide 7
Effect of number of points and decay rate
Slide 8
Radial basis neural networks
(Figure: schematic of a radial basis network, with the input x feeding radial basis functions a1, a2, a3 that are combined through weights W1, W2, W3 and a bias b into the output; an inset shows the radial basis function, which peaks at 1 and equals 0.5 at inputs of ±0.833.)
Slide 9
In regression notation
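A sketch of the notation, assuming Matlab's radbas basis function exp(-n^2), whose input is scaled so that the function drops to 0.5 at a distance of one spread (hence the factor 0.8326 = sqrt(ln 2), matching the ±0.833 in the figure):

\[ \hat{y}(x) = b_0 + \sum_{i=1}^{n_b} W_i \exp\!\left(-\left(\frac{0.8326\,\|x - x_i\|}{\text{spread}}\right)^2\right), \]

where the exponentials play the role of basis functions and the weights \(W_i\) (plus the output bias \(b_0\)) are the regression coefficients.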
Slide 10
Example
Evaluate the function y = x + 0.5sin(5x) at 21 points in the interval [1,9], fit an RBF to it, and compare the surrogate to the function over the interval [0,10].
- A fit using the default options in Matlab achieves zero rms error by using all data points as basis functions (neurons).
- Very good interpolation, but even mild extrapolation is horrible.
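A minimal script reproducing this experiment (a sketch assuming the Neural Network Toolbox; with its default error goal of 0, newrb keeps adding neurons until the data are fitted exactly, here one neuron per point):

x = linspace(1,9,21);               % 21 data points in [1,9]
y = x + 0.5*sin(5*x);               % function values
net = newrb(x,y);                   % default goal = 0: exact interpolation
xt = linspace(0,10,201);            % test grid, including extrapolation
yt = sim(net,xt);                   % surrogate predictions
plot(xt, xt+0.5*sin(5*xt), xt, yt, '--', x, y, 'o')
legend('true function','RBF surrogate','data')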
Slide 11
Accept 0.1 mean squared error
net = newrb(x,y,0.1,1,20,1); (spread set to 1; 11 neurons were used)
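For reference, the arguments of newrb(P,T,goal,spread,MN,DF) are the data, the mean squared error goal (0.1 here), the spread of the radial basis functions (1), the maximum number of neurons (20), and the number of neurons between progress displays (1).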
With about half of the data points used as basis functions, the fit is more like polynomial regression. Interpolation is not as good, but the trend is captured, so extrapolation is not as disastrous. Obviously, if we just wanted to capture the trend, we would have been better off with a polynomial.
At the other extreme, we can allow a fairly large mean squared error and get a more robust fit in terms of the trend, at the expense of poorer interpolation. Here we specify a mean squared error of 0.1, which corresponds to an rms error of about 0.3.

As can be seen in the figure, the fit now captures the trend better, so that extrapolation is less risky. This is done by using only 11 of the 21 data points as neurons, so that the ratio between data points and coefficients is similar to what we normally look for in polynomial regression. However, this comes at the expense of much poorer accuracy in the interpolation range. Obviously, though, if we merely want to capture the overall trend, we should not be using a purely local surrogate like a radial basis neural network. Kriging, for example, uses the same local shape functions, but it also permits modeling a trend, so the local functions are used only to model the departure from the trend.

Slide 12
Too narrow a spread
net = newrb(x,y,0.1,0.2,20,1); (17 neurons used)
With a spread of 0.2 and the points 0.4 apart (21 points in [1,9]), the shape functions decay to less than 0.02 at the nearest point. This means that each data point is fitted individually, so we get spikes at the data points. A rule of thumb is that the spread should not be smaller than the distance to the nearest point.
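A quick check of that figure (assuming the shape function exp(-(d/spread)^2); Matlab's radbas includes the extra 0.8326 scaling, which would give about 0.06 instead):

exp(-(0.4/0.2)^2)   % = 0.0183, shape-function value at the nearest point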
Slide 13
Problems
1. Fit the example with weighted least squares. You can use Matlab's lscov to perform the fit; compare the result to the neural network fit. (A starting sketch follows below.)
2. Repeat the example with 41 points, experimenting with the parameters of newrb. How much of what you see did you expect?
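A possible starting point for the first problem (the quadratic basis, Gaussian weights, decay length, and prediction point are illustrative choices, not prescribed by the slides):

x = linspace(1,9,21)';              % the 21 data points of the example
y = x + 0.5*sin(5*x);               % function values
xp = 5;                             % hypothetical prediction point
theta = 1;                          % assumed decay length of the weights
A = [ones(size(x)) x x.^2];         % quadratic polynomial basis
w = exp(-((x - xp)/theta).^2);      % Gaussian weights centered at xp
beta = lscov(A, y, w);              % weighted least-squares coefficients
yp = [1 xp xp^2]*beta;              % moving least-squares prediction at xp

Sweeping xp over the interval gives the moving least-squares curve to compare against newrb.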