Mar 31, 2015
Linear Least Squares Approximation
By
Kristen Bauer, Renee Metzger, Holly Soper, Amanda Unklesbay
Linear Least Squares
Is the line of best fit for a group of points
It seeks to minimize the sum, over all data points, of the squared difference between the function value and the data value.
It is the earliest form of linear regression
Gauss and Legendre
The method of least squares was first published by Legendre in 1805 and by Gauss in 1809. Although Legendre's work was published earlier, Gauss claimed he had the method since 1795. Both mathematicians applied the method to determine the orbits of bodies about the sun. Gauss went on to publish further development of the method in 1821.
Example
Consider the points (1, 2.1), (2, 2.9), (5, 6.1), and (7, 8.3) with the best fit line f(x) = 0.9x + 1.4
The squared errors are:
x1 = 1  f(1) = 2.3  y1 = 2.1  e1 = (2.3 – 2.1)² = 0.04
x2 = 2  f(2) = 3.2  y2 = 2.9  e2 = (3.2 – 2.9)² = 0.09
x3 = 5  f(5) = 5.9  y3 = 6.1  e3 = (5.9 – 6.1)² = 0.04
x4 = 7  f(7) = 7.7  y4 = 8.3  e4 = (7.7 – 8.3)² = 0.36
So the total squared error is 0.04 + 0.09 + 0.04 + 0.36 = 0.53
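The arithmetic above can be checked with a few lines of code. This is a sketch in Python (the slides themselves use Mathematica later on); the data and line are exactly those of the example.

```python
# Squared-error computation for the example points and the line f(x) = 0.9x + 1.4.
points = [(1, 2.1), (2, 2.9), (5, 6.1), (7, 8.3)]

def f(x):
    return 0.9 * x + 1.4

# Squared vertical distance from each point to the line.
errors = [(f(x) - y) ** 2 for x, y in points]
total = sum(errors)
print([round(e, 2) for e in errors])  # [0.04, 0.09, 0.04, 0.36]
print(round(total, 2))                # 0.53
```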
By finding better coefficients of the best fit line, we can make this error smaller…
We want to minimize the vertical distance between the points and the line.
• E = (d1)² + (d2)² + (d3)² + … + (dn)² for n data points
• E = [f(x1) – y1]² + [f(x2) – y2]² + … + [f(xn) – yn]²
• E = [mx1 + b – y1]² + [mx2 + b – y2]² + … + [mxn + b – yn]²
• E = ∑(mxi + b – yi)²
E must be MINIMIZED!
How do we do this?  E = ∑(mxi + b – yi)²
Treat the xi and yi as constants, since we are trying to find m and b.
So… PARTIALS!
∂E/∂m = 0 and ∂E/∂b = 0
But how do we know if this will yield maximums, minimums, or saddle points?
[Surface plots: a minimum point, a maximum point, and a saddle point]
Minimum!
Since the expression E is a sum of squares and is therefore positive (i.e. it looks like an upward paraboloid), we know the solution must be a minimum. We can prove this by using the 2nd Partials Derivative Test.
2nd Partials Test
Suppose the gradient of f at (x0, y0) is 0. (An instance of this is ∂E/∂m = ∂E/∂b = 0.)
We set
A = ∂²f/∂x²,  B = ∂²f/∂y∂x,  C = ∂²f/∂y²
and form the discriminant D = AC – B².
1) If D < 0, then (x0, y0) is a saddle point.
2) If D > 0, then f takes on
   A local minimum at (x0, y0) if A > 0
   A local maximum at (x0, y0) if A < 0
Calculating the Discriminant
Starting from E = ∑(mxi + b – yi)², with ∂E/∂m = ∑2xi(mxi + b – yi) and ∂E/∂b = ∑2(mxi + b – yi):

A = ∂²E/∂m² = ∂/∂m [∑2xi(mxi + b – yi)] = 2∑xi²
B = ∂²E/∂b∂m = ∂/∂b [∑2xi(mxi + b – yi)] = 2∑xi
C = ∂²E/∂b² = ∂/∂b [∑2(mxi + b – yi)] = 2∑1 = 2n

D = AC – B² = (2∑xi²)(2n) – (2∑xi)² = 4n∑xi² – 4(∑xi)²
1) If D < 0, then (x0, y0) is a saddle point.
2) If D > 0, then f takes on
   A local minimum at (x0, y0) if A > 0
   A local maximum at (x0, y0) if A < 0
Now D = 4n∑xi² – 4(∑xi)² > 0 by an inductive proof showing that
n ∑i=1..n xi² > (∑i=1..n xi)²
when not all the x's have the same value. Those details are not covered in this presentation. We know A > 0 since A = 2∑xi² is always positive (when not all x's have the same value).
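The inequality can be spot-checked numerically. This Python sketch is only a check on the example data, not the inductive proof the slide refers to:

```python
# Spot-check of n * sum(x_i^2) > (sum x_i)^2 when the x_i are not all equal.
xs = [1, 2, 5, 7]                    # x-values from the earlier example
n = len(xs)
lhs = n * sum(x * x for x in xs)     # n times the sum of squares
rhs = sum(xs) ** 2                   # square of the sum
print(lhs, rhs)                      # 316 225 -> lhs > rhs, so D > 0

# When every x is the same, the two sides are equal and D = 0.
same = [3, 3, 3]
assert len(same) * sum(x * x for x in same) == sum(same) ** 2
```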
Therefore…
Setting ∂E/∂m and ∂E/∂b equal to zero will yield two minimizing equations of E, the sum of the squares of the error.
Thus, the linear least squares algorithm (as presented) is valid and we can continue.
E = ∑(mxi + b – yi)² is minimized (as just shown) when the partial derivatives with respect to each of the variables are zero, i.e. ∂E/∂m = 0 and ∂E/∂b = 0.

∂E/∂b = ∑2(mxi + b – yi) = 0  →  m∑xi + ∑b = ∑yi  →  mSx + bn = Sy
∂E/∂m = ∑2xi(mxi + b – yi) = 2∑(mxi² + bxi – xiyi) = 0  →  m∑xi² + b∑xi = ∑xiyi  →  mSxx + bSx = Sxy

NOTE:
∑xi = Sx   ∑yi = Sy   ∑xi² = Sxx   ∑xiyi = Sxy
Next we will solve the system of equations for unknowns m and b:
mSxx + bSx = Sxy
mSx + bn = Sy

Solving for m…
nmSxx + bnSx = nSxy          (multiply the first equation by n)
mSxSx + bnSx = SySx          (multiply the second equation by Sx)
nmSxx – mSxSx = nSxy – SySx  (subtract)
m(nSxx – SxSx) = nSxy – SySx (factor out m)
m = (nSxy – SySx) / (nSxx – SxSx)
Next we will solve the system of equations for unknowns m and b:
mSxx + bSx = Sxy
mSx + bn = Sy

Solving for b…
mSxSxx + bSxSx = SxSxy          (multiply the first equation by Sx)
mSxSxx + bnSxx = SySxx          (multiply the second equation by Sxx)
bSxSx – bnSxx = SxySx – SySxx   (subtract)
b(SxSx – nSxx) = SxySx – SySxx  (solve for b)
b = (SxxSy – SxySx) / (nSxx – SxSx)
Example: Find the linear least squares approximation to the data: (1,1), (2,4), (3,8)

Use these formulas:
m = (nSxy – SySx) / (nSxx – SxSx)
b = (SxxSy – SxySx) / (nSxx – SxSx)

Sx = 1 + 2 + 3 = 6
Sxx = 1² + 2² + 3² = 14
Sy = 1 + 4 + 8 = 13
Sxy = 1(1) + 2(4) + 3(8) = 33
n = number of points = 3

m = (3(33) – 6(13)) / (3(14) – 6(6)) = 21/6 = 3.5
b = (14(13) – 33(6)) / (3(14) – 6(6)) = –16/6 ≈ –2.667

The line of best fit is y = 3.5x – 2.667
THE ALGORITHM
in Mathematica
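The original slide shows the algorithm as Mathematica code, which did not survive extraction. A minimal equivalent sketch in Python, built directly from the Sx, Sy, Sxx, Sxy formulas derived above (the function name is this sketch's own choice):

```python
def least_squares(points):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(points)
    Sx = sum(x for x, _ in points)          # sum of x_i
    Sy = sum(y for _, y in points)          # sum of y_i
    Sxx = sum(x * x for x, _ in points)     # sum of x_i^2
    Sxy = sum(x * y for x, y in points)     # sum of x_i * y_i
    denom = n * Sxx - Sx * Sx               # zero only when all x-values coincide
    m = (n * Sxy - Sy * Sx) / denom
    b = (Sxx * Sy - Sxy * Sx) / denom
    return m, b

# Reproduces the worked example: m = 3.5, b = -16/6 ≈ -2.667
m, b = least_squares([(1, 1), (2, 4), (3, 8)])
print(m, round(b, 3))
```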
Activity
For this activity we are going to use the linear least squares approximation in a real life situation.
You are going to be given a box score from either a baseball or softball game.
With the box score you are given, you are going to write out the points (with the x coordinate being the number of at-bats that player had in the game and the y coordinate being the number of hits that player had in the game).
After doing that you are going to use the linear least squares approximation to find the best fitting line.
The slope of the best fitting line you find will be the team's batting average for that game.
In Conclusion…
E = ∑(mxi + b – yi)² is the sum of the squared error between the set of data points {(x1,y1), …, (xi,yi), …, (xn,yn)} and the line approximating the data, f(x) = mx + b.
By minimizing the error by calculus methods, we get equations for m and b that yield the least squared error:
m = (nSxy – SySx) / (nSxx – SxSx)
b = (SxxSy – SxySx) / (nSxx – SxSx)
Advantages
Many common methods of approximating data seek to minimize the measure of difference between the approximating function and given data points.
Advantages of using the squares of differences at each point, rather than just the difference, absolute value of difference, or other measures of error, include:
– Positive differences do not cancel negative differences
– Differentiation is not difficult
– Small differences become smaller and large differences become larger
Disadvantages
Algorithm will fail if data points fall in a vertical line.
Linear Least Squares will not be the best fit for data that is not linear.
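The vertical-line failure is visible directly in the formulas: when every xi is the same, the denominator nSxx – SxSx is zero and the slope is undefined. A quick sketch (the example points are made up for illustration):

```python
# All points share x = 2, i.e. they fall on a vertical line.
points = [(2, 1), (2, 3), (2, 5)]
n = len(points)
Sx = sum(x for x, _ in points)
Sxx = sum(x * x for x, _ in points)
denom = n * Sxx - Sx * Sx
print(denom)  # 0 -> the slope formula would divide by zero
```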
The End