Part 5a: LEAST-SQUARES REGRESSION
– Simple Linear Regression
– Polynomial Regression
– Multiple Regression
– Statistical Analysis of L-S Theory
– Non-Linear Regression
Introduction:
Consider the falling object in air problem: the velocity of a mass $m$ falling through air is measured at times $t_0, t_1, \dots, t_n$, giving data pairs $(t_i, v_i)$.
[Figure: measured velocities $v_0, v_1, \dots, v_n$ of a falling mass $m$ at times $t_0, t_1, \dots, t_n$.]
The $(t)$ values are considered to be error-free. Every measurement of $(v)$ contains some error. Assume the errors in $(v)$ are normally distributed (random error). Find the "best fit" curve to represent $v(t)$.
"Best fit"
Consider a set of n scattered data points. Find a line that "best fits" the scattered data.
There are a number of ways to define the "best fit" line. However, we want one that is unique, i.e., uniquely determined for a particular set of data. A uniquely defined best-fit line can be found by minimizing the sum of the squares of the residuals (or spread) from each data point:

$$S_r = \sum_{i=1}^{n} \left(y_{i,\text{meas}} - y_{i,\text{fit}}\right)^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$$

Simple Linear Regression

$$y = a_0 + a_1 x \qquad (a_0 = \text{intercept}, \; a_1 = \text{slope})$$

Find $a_0$ and $a_1$ that minimize $S_r$ (least squares).
To minimize $S_r(a_0, a_1)$, differentiate and set to zero:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right) = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} \left[\left(y_i - a_0 - a_1 x_i\right) x_i\right] = 0$$
or

$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$
$$0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2$$

which rearrange to

$$n\, a_0 + \left(\sum x_i\right) a_1 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$$

We need to solve these simultaneous equations for the unknowns $a_0$ and $a_1$. These are the normal equations for simple linear L-S regression.
Solving for $a_1$ and $a_0$ gives

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

and

$$a_0 = \bar{y} - a_1 \bar{x}, \qquad \bar{y} = \frac{\sum y_i}{n}, \quad \bar{x} = \frac{\sum x_i}{n}$$
EX: Find a linear fit for the following set of measurements:
x y
1 0.5
2 2.5
3 2.0
4 4.0
5 3.5
6 6.0
7 5.5
$$n = 7, \quad \sum x_i = 28, \quad \sum y_i = 24, \quad \sum x_i^2 = 140, \quad \sum x_i y_i = 119.5$$
$$\bar{x} = 4, \quad \bar{y} = 3.4286$$

$$a_1 = \frac{7(119.5) - (28)(24)}{7(140) - (28)^2} = 0.8393$$
$$a_0 = \bar{y} - a_1 \bar{x} = 3.4286 - 0.8393(4) = 0.0714$$

Fit: $y = 0.0714 + 0.8393\, x$
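As a quick check, here is a minimal Python sketch (NumPy only, using the data above) that evaluates these formulas directly:

```python
import numpy as np

# Measurements from the example above
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])

n = len(x)
# Slope and intercept from the normal-equation solution
a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a0 = y.mean() - a1 * x.mean()

print(a0, a1)  # approximately 0.0714 and 0.8393
```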
Quantification of Error:

$$s_{y/x} = \sqrt{\frac{S_r}{n-2}} \qquad \text{standard error of the L-S estimate}$$

$$S_t = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2 \qquad \text{sum of the squares of the residuals about the mean}$$

$$S_r = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2 \qquad \text{sum of the squares of the residuals about the linear regression}$$

$$s_y = \sqrt{\frac{S_t}{n-1}} \qquad \text{standard deviation}$$
All these approaches are based on the assumptions: $x$ is error-free; $y$ has normally distributed error.
$$r^2 = \frac{S_t - S_r}{S_t} \qquad r: \text{"correlation coefficient"}$$

$S_r = 0$ ($r = 1$): perfect fit. $S_r = S_t$ ($r = 0$): no improvement by fitting the line.

Alternative formulation for the correlation coefficient:

$$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\;\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$$
Note: $r \approx 1$ does not always mean that the fit is "good". You should always plot the data along with the regression curve to judge the goodness of the fit.
[Figure: four sets of data, all with the same r = 0.816 but very different scatter.]
The "coefficient of determination" is defined as $r^2 = (S_t - S_r)/S_t$.
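A short Python sketch of these error measures (NumPy only; x, y, a0, a1 are assumed to come from a linear fit such as the example above):

```python
import numpy as np

def fit_quality(x, y, a0, a1):
    """Standard error of the estimate and r^2 for a linear fit y = a0 + a1*x."""
    n = len(x)
    St = np.sum((y - y.mean())**2)       # spread about the mean
    Sr = np.sum((y - a0 - a1 * x)**2)    # spread about the regression line
    s_yx = np.sqrt(Sr / (n - 2))         # standard error of the estimate
    r2 = (St - Sr) / St                  # coefficient of determination
    return s_yx, r2
```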
Linearization of non-linear relationships:
Many engineering applications involve non-linear relationships, e.g., exponential, power law, or saturation growth rate:

$$y = a_1 e^{b_1 x} \;\text{(exponential)} \qquad y = a_2 x^{b_2} \;\text{(power law)} \qquad y = a_3 \frac{x}{b_3 + x} \;\text{(saturation growth rate)}$$

These relationships can be linearized by some mathematical operations:

$$\ln y = \ln a_1 + b_1 x \qquad \log y = \log a_2 + b_2 \log x \qquad \frac{1}{y} = \frac{1}{a_3} + \frac{b_3}{a_3}\,\frac{1}{x}$$

A linear L-S fit can then be applied to find the coefficients.
EX: Fit a power law relationship to the following dataset. Calculate the logarithms of both variables:
x y
1 0.5
2 1.7
3 3.4
4 5.7
5 8.4
log x log y
0 -0.301
0.301 0.226
0.477 0.534
0.602 0.753
0.699 0.922
Power law model: $y = a_2 x^{b_2}$, linearized as $\log y = \log a_2 + b_2 \log x$ (find $a_2$ and $b_2$).
Applying simple linear regression to the (log x, log y) data gives:
slope = 1.75 and intercept = -0.300

$$b_2 = 1.75, \qquad \log a_2 = -0.300 \;\Rightarrow\; a_2 = 0.5$$

$$y = 0.5\, x^{1.75}$$
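A minimal Python sketch of this procedure (NumPy only; np.polyfit performs the linear L-S fit on the log-transformed data):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([0.5, 1.7, 3.4, 5.7, 8.4])

# Linearize: log y = log a2 + b2 * log x, then fit a straight line
b2, log_a2 = np.polyfit(np.log10(x), np.log10(y), 1)
a2 = 10**log_a2

print(a2, b2)  # approximately 0.5 and 1.75
```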
Polynomial Regression
In some cases, we may want to fit our data to a curve rather than a line. We can then apply polynomial regression (in fact, linear regression is nothing but polynomial regression of order m = 1).
Data to fit to a second-order polynomial:

$$y = a_0 + a_1 x + a_2 x^2$$

Sum of the squares of the residuals (spread):

$$S_r = \sum_{i=1}^{n} \left(y_{i,\text{obs}} - y_{i,\text{fit}}\right)^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)^2$$
To minimize Sr(a0, a1, a2), take derivatives and equate to zero:
$$\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right) = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} x_i \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right) = 0$$

$$\frac{\partial S_r}{\partial a_2} = -2 \sum_{i=1}^{n} x_i^2 \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right) = 0$$
Three linear equations with three unknowns $a_0, a_1, a_2$ (all summations are over $i = 1..n$):

$$n\, a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i$$
$$\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i$$

"normal equations"
This set of equations can be solved by any linear solution techniques (e.g., Gauss elimination, LU Dec., Cholesky Dec., etc.)
The approach can be generalized to an order-(m) polynomial in the same way. The fit function becomes

$$y = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m$$

This requires the solution of an order-(m+1) system of linear equations. The standard error becomes

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$

because (m+1) degrees of freedom are lost from the n data points due to extraction of the (m+1) coefficients.
EX 17.5: Fit a 2nd-order polynomial to the following data:
xi yi
0 2.1
1 7.7
2 13.6
3 27.2
4 40.9
5 61.1
$$m = 2, \quad n = 6, \quad \sum x_i = 15, \quad \sum y_i = 152.6$$
$$\sum x_i^2 = 55, \quad \sum x_i^3 = 225, \quad \sum x_i^4 = 979$$
$$\sum x_i y_i = 585.6, \quad \sum x_i^2 y_i = 2488.8$$
System of linear equations:

$$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}$$

We get

$$\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 2.47857 \\ 2.35929 \\ 1.86071 \end{Bmatrix}$$

Then, the fit function:

$$y = 2.47857 + 2.35929\, x + 1.86071\, x^2$$

Standard error:

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}} = \sqrt{\frac{3.74657}{6 - 3}} = 1.12$$

where

$$S_r = \sum_{i=1}^{6} \left(y_i - 2.47857 - 2.35929\, x_i - 1.86071\, x_i^2\right)^2 = 3.74657$$
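A minimal Python sketch of this example (NumPy only; np.linalg.solve handles the 3×3 normal equations, and np.polyfit would give the same coefficients directly):

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])
n, m = len(x), 2

# Assemble and solve the normal equations for a 2nd-order polynomial
A = np.array([[n,            x.sum(),      (x**2).sum()],
              [x.sum(),      (x**2).sum(), (x**3).sum()],
              [(x**2).sum(), (x**3).sum(), (x**4).sum()]])
b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])
a0, a1, a2 = np.linalg.solve(A, b)

Sr = np.sum((y - a0 - a1 * x - a2 * x**2)**2)
s_yx = np.sqrt(Sr / (n - (m + 1)))
print(a0, a1, a2, s_yx)  # approx. 2.47857, 2.35929, 1.86071, 1.12
```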
Multiple Linear Regression
In some cases, data may have two or more independent variables. For a function of two variables, $y(x_1, x_2)$, linear regression gives a planar fit function.
[Figure: data points scattered about a fitted plane over the $(x_1, x_2)$ domain.]
Function to fit:

$$y = a_0 + a_1 x_1 + a_2 x_2$$

Sum of the squares of the residuals (spread):

$$S_r = \sum_{i=1}^{n} \left(y_{i,\text{obs}} - y_{i,\text{fit}}\right)^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}\right)^2$$
Minimizing the spread function gives:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}\right) = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} x_{1,i} \left(y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}\right) = 0$$

$$\frac{\partial S_r}{\partial a_2} = -2 \sum_{i=1}^{n} x_{2,i} \left(y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i}\right) = 0$$
The system of equations to be solved (normal equations for multiple linear regression):

$$\begin{bmatrix} n & \sum x_{1,i} & \sum x_{2,i} \\ \sum x_{1,i} & \sum x_{1,i}^2 & \sum x_{1,i} x_{2,i} \\ \sum x_{2,i} & \sum x_{1,i} x_{2,i} & \sum x_{2,i}^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_{1,i}\, y_i \\ \sum x_{2,i}\, y_i \end{Bmatrix}$$
EX 17.7: Fit a planar surface to the following data
We first do the following calculations:
x1 x2 y
0 0 5
2 1 10
2.5 2 9
1 3 0
4 6 3
7 2 27
x1 x2 y x1² x2² x1x2 x1y x2y
0 0 5 0 0 0 0 0
2 1 10 4 1 2 20 10
2.5 2 9 6.25 4 5 22.5 18
1 3 0 1 9 3 0 0
4 6 3 16 36 24 12 18
7 2 27 49 4 14 189 54
Σ: 16.5 14 54 76.25 54 48 243.5 100
The system of equations for the fit coefficients:

$$\begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 54 \\ 243.5 \\ 100 \end{Bmatrix}$$

returns

$$a_0 = 5, \quad a_1 = 4, \quad a_2 = -3$$

The fit function:

$$y = 5 + 4 x_1 - 3 x_2$$
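A minimal Python sketch of this example (NumPy only), solving the 3×3 normal equations assembled from the sums above:

```python
import numpy as np

x1 = np.array([0, 2, 2.5, 1, 4, 7])
x2 = np.array([0, 1, 2, 3, 6, 2], dtype=float)
y = np.array([5, 10, 9, 0, 3, 27], dtype=float)
n = len(y)

# Normal equations for the planar fit y = a0 + a1*x1 + a2*x2
A = np.array([[n,        x1.sum(),        x2.sum()],
              [x1.sum(), (x1**2).sum(),   (x1 * x2).sum()],
              [x2.sum(), (x1 * x2).sum(), (x2**2).sum()]])
b = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])
a = np.linalg.solve(A, b)
print(a)  # approximately [5, 4, -3]
```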
For the general case of a function of m variables, the same strategy can be applied. The fit function in this case:

$$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m$$

Standard error:

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$

A useful application of multiple regression is fitting a power law equation of multiple variables of the form:

$$y = a_0\, x_1^{a_1}\, x_2^{a_2} \cdots x_m^{a_m}$$

Linearization of this equation gives

$$\log y = \log a_0 + a_1 \log x_1 + \dots + a_m \log x_m$$

The coefficients in the last equation can be calculated using multiple linear regression, then substituted into the original power law equation.
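A minimal sketch of this idea in Python (np.linalg.lstsq performs the multiple linear regression on the log-transformed data; the sample arrays below are illustrative assumptions, not from the notes):

```python
import numpy as np

# Illustrative data (assumed): y = a0 * x1^a1 * x2^a2 with a0=2, a1=0.5, a2=1.5
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 2.0 * x1**0.5 * x2**1.5

# Multiple linear regression on the linearized model:
# log y = log a0 + a1*log x1 + a2*log x2
Z = np.column_stack([np.ones_like(x1), np.log10(x1), np.log10(x2)])
coeffs, *_ = np.linalg.lstsq(Z, np.log10(y), rcond=None)
a0 = 10**coeffs[0]
print(a0, coeffs[1], coeffs[2])  # recovers 2, 0.5, 1.5
```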
Generalization of L-S Regression:
In the most general form, L-S regression can be stated as

$$y = a_0 z_0 + a_1 z_1 + \dots + a_m z_m$$

where $z_0, z_1, \dots, z_m$ are functions:

Multiple regression: $z_0 = 1,\; z_1 = x_1,\; \dots,\; z_m = x_m$
Polynomial regression: $z_0 = 1,\; z_1 = x,\; \dots,\; z_m = x^m$

In general, this form is called "linear regression" because the fit function depends linearly on the fitting coefficients.
Other functions can be defined for fitting as well, e.g.,

$$y = a_0 + a_1 \cos t + a_2 \sin t$$

For a particular data point,

$$y_i = a_0 z_{0,i} + a_1 z_{1,i} + \dots + a_m z_{m,i} + e_i$$
For n data points (in matrix form):

$$\{y\} = [Z]\{a\} + \{e\}$$

$$[Z] = \begin{bmatrix} z_{01} & z_{11} & \dots & z_{m1} \\ z_{02} & z_{12} & \dots & z_{m2} \\ \vdots & \vdots & & \vdots \\ z_{0n} & z_{1n} & \dots & z_{mn} \end{bmatrix}$$

[Z] is calculated from the measured independent variables; it has n rows and (m+1) columns, where m is the order of the fit function and n is the number of data points. [Z] is generally not a square matrix.

$$\{y\} = \begin{Bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{Bmatrix} \;\text{(data)}, \qquad \{a\} = \begin{Bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{Bmatrix} \;\text{(coefficients)}, \qquad \{e\} = \begin{Bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{Bmatrix} \;\text{(residuals)}$$
Sum of the squares of the residuals:

$$S_r = \sum_{i=1}^{n} \left(y_i - \sum_{j=0}^{m} a_j z_{j,i}\right)^2$$

To determine the fit coefficients, minimize $S_r(a_0, a_1, \dots, a_m)$. This is equivalent to solving:

$$[Z]^T [Z] \{a\} = [Z]^T \{y\}$$

These are the normal equations for general L-S regression. This representation includes the simple linear, polynomial, and multiple linear regression methods as special cases.
Solution approaches:

$$[Z]^T [Z] \{a\} = [Z]^T \{y\}$$

$[Z]^T[Z]$ is a symmetric, square matrix of size $(m+1) \times (m+1)$.
Elimination methods are best suited for the solution of the above linear system:
LU Decomposition / Gauss Elimination
Cholesky Decomposition
Cholesky decomposition in particular is fast and requires less storage. Furthermore, it is very appropriate when the order of the polynomial fit model (m) is not known beforehand: successive higher-order models can be developed efficiently. Similarly, increasing the number of variables in multiple regression is very efficient using Cholesky decomposition.
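A minimal Python sketch of the general approach (NumPy/SciPy; scipy.linalg.cho_factor and cho_solve perform the Cholesky solve of the normal equations):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def ls_fit(Z, y):
    """General linear L-S fit: solve Z^T Z a = Z^T y via Cholesky decomposition."""
    ZtZ = Z.T @ Z            # symmetric (m+1) x (m+1) matrix
    Zty = Z.T @ y
    c, low = cho_factor(ZtZ)
    return cho_solve((c, low), Zty)

# Example: quadratic fit, basis functions z0 = 1, z1 = x, z2 = x^2
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])
Z = np.column_stack([np.ones_like(x), x, x**2])
print(ls_fit(Z, y))  # same coefficients as EX 17.5
```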
Statistical Analysis of L-S Theory
Some definitions:
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \qquad \text{mean}$$

$$s_y = \sqrt{\frac{\sum \left(y_i - \bar{y}\right)^2}{n-1}} \qquad \text{standard deviation}$$

$$s_y^2 = \frac{S_t}{n-1} \qquad \text{variance}$$

For a perfectly normal distribution: about 68% of the data fall within mean ± std, and about 95% fall within mean ± 2 std.
If a histogram of the data shows a bell-shaped curve, the data are normally distributed. Such data have well-defined statistics:
$\mu$: true mean, $\sigma$: true standard deviation.
Confidence intervals:
A confidence interval estimates an interval within which the parameter is expected to fall, with a certain degree of confidence. Find L and U values such that

$$P(L \le \mu \le U) = 1 - \alpha$$

where $\alpha$ is the significance level and $\mu$ is the true mean. For a 95% confidence interval, $\alpha = 0.05$.

$$L = \bar{y} - t_{\alpha/2,\, n-1} \frac{s_y}{\sqrt{n}} \qquad U = \bar{y} + t_{\alpha/2,\, n-1} \frac{s_y}{\sqrt{n}}$$

$t_{\alpha/2,\, n-1}$ comes from the t-distribution (tabulated in books; in Excel, TINV(α, n−1)).
E.g., for $\alpha = 0.05$ and $n - 1 = 20$ degrees of freedom, $t_{\alpha/2,\, n-1} = 2.086$.
The t-distribution is used to compromise between a perfect and an imperfect estimate. For example, if the data are few (small n), the t-value becomes larger, hence giving a more conservative confidence interval.
EX: Some measurements of the coefficient of thermal expansion of steel (×10⁻⁶ 1/°F):

6.495 6.595 6.615 6.635 6.485 6.555
6.665 6.505 6.435 6.625 6.715 6.655
6.755 6.625 6.715 6.575 6.655 6.605
6.565 6.515 6.555 6.395 6.775 6.685

Find the mean and corresponding 95% confidence intervals for a) the first 8 measurements, b) the first 16 measurements, c) all 24 measurements.

For n = 8:
$$\bar{y} = 6.59, \quad s_y = 0.089921, \quad t_{\alpha/2,\, n-1} = t_{0.025,\, 7} = 2.364623$$

$$L = \bar{y} - t_{\alpha/2,\, n-1} \frac{s_y}{\sqrt{n}} = 6.59 - 2.364623\, \frac{0.089921}{\sqrt{8}} = 6.5148$$

$$U = \bar{y} + t_{\alpha/2,\, n-1} \frac{s_y}{\sqrt{n}} = 6.59 + 2.364623\, \frac{0.089921}{\sqrt{8}} = 6.6652$$

For eight measurements, there is a 95% probability that the true mean falls between 6.5148 and 6.6652.
The cases of n = 16 and n = 24 can be handled in a similar fashion. Hence we obtain:

n mean(y) sy tα/2,n-1 L U
8 6.5900 0.089921 2.364623 6.5148 6.6652
16 6.5794 0.095845 2.131451 6.5283 6.6304
24 6.6000 0.097133 2.068655 6.5590 6.6410

The results show that the confidence interval narrows as the number of measurements increases (even though sy increases with increasing n!).
For n = 24 we have 95% confidence that the true mean is between 6.5590 and 6.6410.
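A minimal Python sketch of these calculations (SciPy's t distribution replaces the Excel TINV lookup; the measurements are listed in the order that reproduces the subsets of cases a) and b) above):

```python
import numpy as np
from scipy import stats

# The 24 measurements, in measurement order
data = np.array([6.495, 6.665, 6.755, 6.565, 6.595, 6.505, 6.625, 6.515,
                 6.615, 6.435, 6.715, 6.555, 6.635, 6.625, 6.575, 6.395,
                 6.485, 6.715, 6.655, 6.775, 6.555, 6.655, 6.605, 6.685])

alpha = 0.05
for n in (8, 16, 24):
    y = data[:n]
    ybar, sy = y.mean(), y.std(ddof=1)      # sample mean and standard deviation
    t = stats.t.ppf(1 - alpha / 2, n - 1)   # two-tailed t value, df = n-1
    half = t * sy / np.sqrt(n)
    print(n, round(ybar, 4), round(ybar - half, 4), round(ybar + half, 4))
```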
Confidence Intervals for L-S regression:
Using the matrix inverse for the solution of {a} is inefficient:

$$\{a\} = \left([Z]^T [Z]\right)^{-1} [Z]^T \{y\}$$

However, the inverse matrix $\left([Z]^T [Z]\right)^{-1}$ carries useful statistical information about the goodness of the fit. With $u_{ij}$ denoting the elements of this inverse matrix:
Diagonal terms give the variances (var) of the fit coefficients.
Off-diagonal terms give the covariances (cov) of the fit coefficients.

$$\operatorname{var}(a_{i-1}) = u_{ii}\, s_{y/x}^2 \qquad \operatorname{cov}(a_{i-1}, a_{j-1}) = u_{ij}\, s_{y/x}^2$$

These statistics allow calculation of confidence intervals for the fit coefficients.
Calculating confidence intervals for simple linear regression, $y = a_0 + a_1 x$:

For the intercept ($a_0$):
$$L = a_0 - t_{\alpha/2,\, n-2}\, s(a_0) \qquad U = a_0 + t_{\alpha/2,\, n-2}\, s(a_0)$$

For the slope ($a_1$):
$$L = a_1 - t_{\alpha/2,\, n-2}\, s(a_1) \qquad U = a_1 + t_{\alpha/2,\, n-2}\, s(a_1)$$

where $s(a_i) = \sqrt{\operatorname{var}(a_i)}$ is the standard error of the coefficient (extracted from the inverse matrix).
EX 17.8: Compare the measured versus model data shown below. a) Plot the measured versus model values. b) Apply the simple linear regression formula to assess the adequacy of the measured versus model data. c) Recompute the regression using the matrix approach, estimate the standard error of the estimate and of the fit parameters, and develop confidence intervals.
Measured value  Model value
10 8.953
16.3 16.405
23 22.607
27.5 27.769
31 32.065
35.6 35.641
39 38.617
41.5 41.095
42.9 43.156
45 44.872
46 46.301
45.5 47.49
46 48.479
49 49.303
50 49.988
a) [Figure: scatter plot of model value (vertical axis) versus measured value (horizontal axis), with the fitted line.]

b) Applying the simple linear regression formula gives

$$y = -0.859 + 1.032\, x \qquad (x: \text{measured}, \; y: \text{model})$$
c) For the statistical analysis, first form the [Z] matrix and {y} vector:

$$[Z] = \begin{bmatrix} 1 & 10 \\ 1 & 16.3 \\ \vdots & \vdots \\ 1 & 50 \end{bmatrix}, \qquad \{y\} = \begin{Bmatrix} 8.953 \\ 16.405 \\ \vdots \\ 49.988 \end{Bmatrix}$$

Then the normal equations $[Z]^T[Z]\{a\} = [Z]^T\{y\}$ become

$$\begin{bmatrix} 15 & 548.3 \\ 548.3 & 22191.21 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix} = \begin{Bmatrix} 552.741 \\ 22421.43 \end{Bmatrix}$$

Solution using the matrix inverse, $\{a\} = \left([Z]^T[Z]\right)^{-1}[Z]^T\{y\}$:

$$\begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix} = \begin{bmatrix} 0.688414 & -0.01701 \\ -0.01701 & 0.000465 \end{bmatrix} \begin{Bmatrix} 552.741 \\ 22421.43 \end{Bmatrix} = \begin{Bmatrix} -0.85872 \\ 1.031592 \end{Bmatrix}$$
Standard error for the fit function:

$$s_{y/x} = \sqrt{\frac{S_r}{n-2}} = 0.863403$$

Standard errors for the coefficients:

$$s(a_0) = \sqrt{u_{11}\, s_{y/x}^2} = \sqrt{0.688414\, (0.863403)^2} = 0.716372$$

$$s(a_1) = \sqrt{u_{22}\, s_{y/x}^2} = \sqrt{0.000465\, (0.863403)^2} = 0.018625$$

For a 95% confidence interval ($\alpha = 0.05$, $n - 2 = 13$ degrees of freedom; Excel returns TINV(0.05, 13) = 2.160368):

$$a_0 = -0.85872 \pm t_{\alpha/2,\, n-2}\, s(a_0) = -0.85872 \pm 2.160368\,(0.716372) = -0.85872 \pm 1.547627$$

$$a_1 = 1.031592 \pm t_{\alpha/2,\, n-2}\, s(a_1) = 1.031592 \pm 2.160368\,(0.018625) = 1.031592 \pm 0.040237$$

The desired values of slope = 1 and intercept = 0 fall within these intervals, hence we can conclude that a good fit exists between the measured and model values.
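A minimal Python sketch of the matrix approach in part c) (NumPy only; scipy.stats.t.ppf could replace the hard-coded t value):

```python
import numpy as np

x = np.array([10, 16.3, 23, 27.5, 31, 35.6, 39, 41.5,
              42.9, 45, 46, 45.5, 46, 49, 50])          # measured
y = np.array([8.953, 16.405, 22.607, 27.769, 32.065, 35.641,
              38.617, 41.095, 43.156, 44.872, 46.301, 47.49,
              48.479, 49.303, 49.988])                   # model
n = len(x)

Z = np.column_stack([np.ones(n), x])
U = np.linalg.inv(Z.T @ Z)            # inverse carries the coefficient statistics
a = U @ Z.T @ y                       # a0, a1

s_yx = np.sqrt(np.sum((y - Z @ a)**2) / (n - 2))   # standard error of the estimate
s_a = np.sqrt(np.diag(U)) * s_yx                    # standard errors of a0, a1

t = 2.160368                          # t_{0.025,13}, e.g., from TINV(0.05, 13)
print(a, a - t * s_a, a + t * s_a)    # coefficients and their 95% intervals
```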
Non-linear Regression
In some cases we must fit a non-linear model to the data, e.g.,

$$y = a_0 \left(1 - e^{-a_1 x}\right)$$

Here $y$ does not depend linearly on the parameters $a_0$ and $a_1$, so the generalized L-S formulation cannot be used for such models. The same approach of minimizing the sum of the squares of the residuals is applied, but the solution is sought iteratively.
Gauss-Newton method:
A Taylor series expansion is used to (approximately) linearize the model, and then standard L-S theory is applied to obtain improved estimates of the fit parameters.
In the most general form:

$$y = f(x;\, a_0, a_1, \dots, a_m)$$

Taylor series expansion around the fit parameters (i: i-th data point, j: iteration number):

$$f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)}{\partial a_0}\, \Delta a_0 + \frac{\partial f(x_i)}{\partial a_1}\, \Delta a_1$$

so that

$$y_{i,\text{meas}} - f(x_i)_j = \frac{\partial f(x_i)}{\partial a_0}\, \Delta a_0 + \frac{\partial f(x_i)}{\partial a_1}\, \Delta a_1 + e_i$$
In matrix form:

$$\{d\} = [Z_j]\{\Delta a\}$$

where

$$[Z_j] = \begin{bmatrix} \partial f_1/\partial a_0 & \partial f_1/\partial a_1 \\ \partial f_2/\partial a_0 & \partial f_2/\partial a_1 \\ \vdots & \vdots \\ \partial f_n/\partial a_0 & \partial f_n/\partial a_1 \end{bmatrix}, \qquad \{d\} = \begin{Bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{Bmatrix}, \qquad \{\Delta a\} = \begin{Bmatrix} \Delta a_0 \\ \Delta a_1 \end{Bmatrix}$$

with the partial derivatives evaluated at the current iterate j.
Then, applying the generalized L-S formula:

$$[Z_j]^T [Z_j] \{\Delta a\} = [Z_j]^T \{d\}$$

We solve the above system for $\{\Delta a\}$ to obtain improved values of the parameters:

$$a_{0,\,j+1} = a_{0,\,j} + \Delta a_0, \qquad a_{1,\,j+1} = a_{1,\,j} + \Delta a_1$$

The procedure is iterated until the error is acceptable:

$$\left|\frac{a_{0,\,j+1} - a_{0,\,j}}{a_{0,\,j+1}}\right| < \varepsilon_0, \qquad \left|\frac{a_{1,\,j+1} - a_{1,\,j}}{a_{1,\,j+1}}\right| < \varepsilon_1$$
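To make the procedure concrete, a minimal Python sketch of Gauss-Newton for the model $y = a_0(1 - e^{-a_1 x})$ (NumPy only; the sample data and starting guesses below are illustrative assumptions, not from the notes):

```python
import numpy as np

def gauss_newton(x, y, a0, a1, tol=1e-6, max_iter=50):
    """Fit y = a0*(1 - exp(-a1*x)) by Gauss-Newton iteration."""
    a = np.array([a0, a1], dtype=float)
    for _ in range(max_iter):
        f = a[0] * (1 - np.exp(-a[1] * x))
        # Jacobian: partial derivatives of f with respect to a0 and a1
        Z = np.column_stack([1 - np.exp(-a[1] * x),
                             a[0] * x * np.exp(-a[1] * x)])
        d = y - f                                  # current residuals {d}
        da = np.linalg.solve(Z.T @ Z, Z.T @ d)     # normal equations for {Δa}
        a += da
        if np.all(np.abs(da / a) < tol):           # relative-change stopping test
            break
    return a

# Illustrative data (assumed): noisy samples roughly following the model
x = np.array([0.25, 0.75, 1.25, 1.75, 2.25])
y = np.array([0.28, 0.57, 0.68, 0.74, 0.79])
print(gauss_newton(x, y, a0=1.0, a1=1.0))
```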