Part 4, Chapter 13 - CAU · cau.ac.kr/~jjang14/NAE/Chap13.pdf · 2009. 10. 23.
Part 4
Chapter 13
Linear Regression
All images copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
PowerPoints organized by Dr. Michael R. Gustafson II, Duke University
Revised by Prof. Jang, CAU
Chapter Objectives
• Familiarizing yourself with some basic descriptive statistics and the normal distribution.
• Knowing how to compute the slope and intercept of a best fit straight line with linear regression.
• Knowing how to compute and understand the meaning of the coefficient of determination and the standard error of the estimate.
• Understanding how to use transformations to linearize nonlinear equations so that they can be fit with linear regression.
• Knowing how to implement linear regression with MATLAB.
Introduction (1/2)
A free-falling bungee jumper is subjected to an air resistance force. This force is proportional to the square of the velocity, as in $F_D = c_d v^2$. Experiments can play a critical role in this formulation.
Wind tunnel experiment to measure how the force of air resistance depends on velocity.
Introduction (2/2)
Plot of force vs. wind velocity for an object suspended in a wind tunnel
v (m/s) 10 20 30 40 50 60 70 80
F (N) 25 70 380 550 610 1220 830 1450
The forces increase with increasing velocity.
What kind of relationship holds between force and velocity: linear, square, or something else?
How can we fit a “best” line or curve to these data?
Statistics Review: Measures of Location
• Arithmetic mean: the most common measure of central tendency.
– The sum of the individual data points ($y_i$) divided by the number of points $n$:
$$\bar{y} = \frac{\sum y_i}{n}$$
• Median: the midpoint of a group of data.
– For an odd number of measurements, the median is the middle value.
– For an even number of measurements, the median is the arithmetic mean of the two middle values.
• Mode: the value that occurs most frequently in a group of data.
Statistics Review: Measures of Spread
• Standard deviation: the most common measure of spread for a sample about the mean:
$$s_y = \sqrt{\frac{S_t}{n-1}}$$
where $S_t$ is the sum of the squares of the data residuals:
$$S_t = \sum (y_i - \bar{y})^2$$
and $n-1$ is referred to as the degrees of freedom.
• Variance:
$$s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n-1}$$
• Coefficient of variation: the ratio of the standard deviation to the mean:
$$\text{c.v.} = \frac{s_y}{\bar{y}} \times 100\%$$
Normal Distribution
• Data distribution: the shape with which the data is spread around the mean.
• A histogram is constructed by sorting the measurements into intervals, or bins.
• If we have a very large set of data, the histogram can be approximated by a smooth, symmetric, bell-shaped curve called the normal distribution.
[Figure: histogram and the normal distribution curve]
• The range between $\bar{y} - s_y$ and $\bar{y} + s_y$ will encompass approximately 68% of the total measurements.
• The range between $\bar{y} - 2s_y$ and $\bar{y} + 2s_y$ will encompass approximately 95% of the total measurements.
Descriptive Statistics in MATLAB
• MATLAB has several built-in commands to compute and display descriptive statistics. Assuming some column vector s:
– mean(s), median(s), mode(s)
• Calculate the mean, median, and mode of s. mode is a part of the statistics toolbox.
– min(s), max(s)
• Calculate the minimum and maximum value in s.
– var(s), std(s)
• Calculate the variance and standard deviation of s.
• Note: if a matrix is given, the statistics will be returned for each column.
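As a quick illustration, the commands above can be applied to the force measurements from the wind tunnel table. This is only a minimal sketch: the variable names (F, Fmean, sy_manual, and so on) are chosen here for illustration and do not come from the slides.

```matlab
% Force data (N) from the wind tunnel table, stored as a column vector
F = [25 70 380 550 610 1220 830 1450]';
n = length(F);

Fmean = mean(F);      % arithmetic mean
Fmed  = median(F);    % even n: mean of the two middle values
Fmin  = min(F);
Fmax  = max(F);
Fstd  = std(F);       % sample standard deviation (divides by n-1)
Fvar  = var(F);       % sample variance

% Cross-check std against the definition s_y = sqrt(St/(n-1))
St = sum((F - Fmean).^2);
sy_manual = sqrt(St/(n - 1));

% Coefficient of variation in percent
cv = Fstd/Fmean*100;

fprintf('mean = %.3f, median = %.1f, std = %.2f, c.v. = %.2f%%\n', ...
        Fmean, Fmed, Fstd, cv);
```

The manual value sy_manual should agree with std(F), since std uses the n-1 normalization by default.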
Histograms in MATLAB
• [n, x] = hist(s, x)
– Determine the number of elements in each bin of data in s. x is a vector containing the center values of the bins.
• [n, x] = hist(s, m)
– Determine the number of elements in each bin of data in s using m bins. x will contain the centers of the bins. The default case is m = 10.
• hist(s, x) or hist(s, m) or hist(s)
– With no output arguments, hist will actually produce a histogram.
Histogram Example
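The original example figure did not survive extraction. As a stand-in, here is a small sketch that bins the force data from the wind tunnel table with hist. The choice of four bins and the variable names are assumptions made for illustration only.

```matlab
% Force data (N) from the wind tunnel table
s = [25 70 380 550 610 1220 830 1450];

% Count the elements falling in m = 4 equally spaced bins
m = 4;
[n, x] = hist(s, m);   % n: counts per bin, x: bin centers

% Display the bin centers next to their counts
disp([x(:) n(:)]);

% With no output arguments, hist draws the histogram instead:
% hist(s, m)
```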
Linear Least-Squares Regression
• Linear least-squares regression is a method to determine the “best” coefficients in a linear model for a given data set.
• “Best” for least-squares regression means minimizing the sum of the squares of the estimate residuals, which are the differences between the model and the observations. For a straight line model $y = a_0 + a_1 x$, this gives:
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
• This method will yield a unique line for a given set of data.
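Least-Squares Fit of a Straight Line
• For a minimum to occur, it is necessary that
$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum \left[ (y_i - a_0 - a_1 x_i)\, x_i \right] = 0$$
which can be written as
$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$
$$0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2$$
or, as the normal equations,
$$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$$
• These two equations can be solved simultaneously for
$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$, respectively.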
Example 13.2
i    x_i = v (m/s)   y_i = F (N)   (x_i)^2   x_i y_i
1    10              25            100       250
2    20              70            400       1400
3    30              380           900       11400
4    40              550           1600      22000
5    50              610           2500      30500
6    60              1220          3600      73200
7    70              830           4900      58100
8    80              1450          6400      116000
Σ    360             5135          20400     312850

$$\bar{x} = \frac{360}{8} = 45, \qquad \bar{y} = \frac{5135}{8} = 641.875$$
$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{8(312850) - 360(5135)}{8(20400) - (360)^2} = 19.47024$$
$$a_0 = \bar{y} - a_1 \bar{x} = 641.875 - 19.47024(45) = -234.2857$$
$$F = -234.2857 + 19.47024\, v$$
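The hand calculation in Example 13.2 can be checked with a few lines of MATLAB. This is a minimal sketch of the slope and intercept formulas applied to the tabulated data. The variable names (sx, sy, sx2, sxy) are illustrative and not taken from the slides.

```matlab
% Data from Example 13.2: wind velocity (m/s) and force (N)
x = [10 20 30 40 50 60 70 80];
y = [25 70 380 550 610 1220 830 1450];
n = length(x);

% Sums that appear in the normal equations
sx  = sum(x);
sy  = sum(y);
sx2 = sum(x.^2);
sxy = sum(x.*y);

% Slope and intercept of the least-squares line
a1 = (n*sxy - sx*sy)/(n*sx2 - sx^2);
a0 = sy/n - a1*sx/n;      % a0 = ybar - a1*xbar

fprintf('F = %.4f + %.5f v\n', a0, a1);
```

The negative intercept means the fitted line predicts a negative force at low velocity, which is physically unrealistic and hints that a straight line may not be the best model for these data.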
Quantification of Error
• Recall for a straight line, the sum of the squares of the estimate residuals:
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
where $e_i$ represents the vertical distance between the data and the regression line.
• Standard error of the estimate: quantifies how good the fit (regression line) is.
$$s_{y/x} = \sqrt{\frac{S_r}{n-2}}$$
Standard Error of the Estimate
• Regression data showing (a) the spread of data around the mean of the dependent data and (b) the spread of the data around the best fit line:
(a) $$s_y = \sqrt{\frac{S_t}{n-1}}, \qquad S_t = \sum (y_i - \bar{y})^2$$
(b) $$s_{y/x} = \sqrt{\frac{S_r}{n-2}}, \qquad S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
• The reduction in spread represents the improvement due to linear regression.
Coefficient of Determination
• The coefficient of determination $r^2$ is the difference between the sum of the squares of the data residuals and the sum of the squares of the estimate residuals, normalized by the sum of the squares of the data residuals:
$$r^2 = \frac{S_t - S_r}{S_t}$$
– $r^2$ represents the percentage of the original uncertainty explained by the model.
– $S_t - S_r$ quantifies the improvement due to describing the data in terms of a straight line rather than a mean.
• For a perfect fit, $S_r = 0$ and hence $r^2 = 1$.
• If $r^2 = 0$, then $S_r = S_t$ and there is no improvement of fit over simply picking the mean.
• If $r^2 < 0$, the fit is worse than simply picking the mean!
Example 13.3
$$F_{est} = -234.2857 + 19.47024\, v$$
i    x_i = v (m/s)   y_i = F (N)   a0+a1x_i   (y_i-ȳ)^2   (y_i-a0-a1x_i)^2
1    10              25            -39.58     380535      4171
2    20              70            155.12     327041      7245
3    30              380           349.82     68579       911
4    40              550           544.52     8441        30
5    50              610           739.23     1016        16699
6    60              1220          933.93     334229      81837
7    70              830           1128.63    35391       89180
8    80              1450          1323.33    653066      16044
Σ    360             5135                     1808297     216118

$$S_t = \sum (y_i - \bar{y})^2 = 1808297, \qquad S_r = \sum (y_i - a_0 - a_1 x_i)^2 = 216118$$
$$s_y = \sqrt{\frac{1808297}{8-1}} = 508.26, \qquad s_{y/x} = \sqrt{\frac{216118}{8-2}} = 189.79$$
Because $s_{y/x} < s_y$, the linear regression has an advantage.
$$r^2 = \frac{1808297 - 216118}{1808297} = 0.8805$$
88.05% of the original uncertainty has been explained by the linear model.
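The error measures in Example 13.3 can be reproduced directly from their definitions. The sketch below refits the line and then evaluates the sums of squares, the two standard errors, and the coefficient of determination. Variable names such as yfit, syx, and r2 are assumptions made for illustration.

```matlab
% Data from Example 13.3: wind velocity (m/s) and force (N)
x = [10 20 30 40 50 60 70 80];
y = [25 70 380 550 610 1220 830 1450];
n = length(x);

% Least-squares slope and intercept
a1 = (n*sum(x.*y) - sum(x)*sum(y))/(n*sum(x.^2) - sum(x)^2);
a0 = mean(y) - a1*mean(x);
yfit = a0 + a1*x;               % model predictions at the data points

% Sums of squares of the data residuals and the estimate residuals
St = sum((y - mean(y)).^2);
Sr = sum((y - yfit).^2);

sy  = sqrt(St/(n - 1));         % standard deviation about the mean
syx = sqrt(Sr/(n - 2));         % standard error of the estimate
r2  = (St - Sr)/St;             % coefficient of determination

fprintf('sy = %.2f, sy/x = %.2f, r^2 = %.4f\n', sy, syx, r2);
```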
Nonlinear Relationships
• Linear regression is predicated on the fact that the relationship between the dependent and independent variables is linear; this is not always the case.
• The first step in any regression analysis is to plot and visually determine what kind of model (linear or nonlinear) is appropriate.
• Three common examples of nonlinear models are:
exponential: $y = \alpha_1 e^{\beta_1 x}$
power: $y = \alpha_2 x^{\beta_2}$
saturation-growth-rate: $y = \alpha_3 \dfrac{x}{\beta_3 + x}$
Linearization of Nonlinear Relationships
• One option for finding the coefficients for a nonlinear fit is to linearize it. For the three common models, this may involve taking logarithms or inversion:

Model                     Nonlinear                                   Linearized
exponential               $y = \alpha_1 e^{\beta_1 x}$                $\ln y = \ln \alpha_1 + \beta_1 x$
power                     $y = \alpha_2 x^{\beta_2}$                  $\log y = \log \alpha_2 + \beta_2 \log x$
saturation-growth-rate    $y = \alpha_3 \dfrac{x}{\beta_3 + x}$       $\dfrac{1}{y} = \dfrac{1}{\alpha_3} + \dfrac{\beta_3}{\alpha_3} \dfrac{1}{x}$
Transformation Examples
Example 13.4 (1/2)
Q. Fit Eq. (13.23) to the data below using log transformation.

i    x_i    y_i     log x_i   log y_i   (log x_i)^2   log x_i log y_i
1    10     25      1.000     1.398     1.000         1.398
2    20     70      1.301     1.845     1.693         2.401
3    30     380     1.477     2.580     2.182         3.811
4    40     550     1.602     2.740     2.567         4.390
5    50     610     1.699     2.785     2.886         4.732
6    60     1220    1.778     3.086     3.162         5.488
7    70     830     1.845     2.919     3.404         5.386
8    80     1450    1.903     3.161     3.622         6.016
Σ                   12.606    20.515    20.516        33.622
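Example 13.4 (2/2)
$$a_1 = \frac{n \sum \log x_i \log y_i - \sum \log x_i \sum \log y_i}{n \sum (\log x_i)^2 - \left(\sum \log x_i\right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$$
$$\bar{x} = \frac{12.606}{8} = 1.5757, \qquad \bar{y} = \frac{20.515}{8} = 2.5644$$
$$a_1 = \frac{8(33.622) - 12.606(20.515)}{8(20.516) - (12.606)^2} = 1.9842$$
$$a_0 = 2.5644 - 1.9842(1.5757) = -0.5620$$
$$\log y = -0.5620 + 1.9842 \log x$$
Comparing with $\log y = \log \alpha_2 + \beta_2 \log x$ gives $\alpha_2 = 10^{-0.5620} = 0.2741$ and $\beta_2 = 1.9842$, so
$$F = 0.2741\, v^{1.9842}$$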
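The log-transformation fit of Example 13.4 can be sketched in MATLAB by applying the straight-line formulas to log10 of the data and then transforming the intercept back. The names lx, ly, alpha2, and beta2 are chosen here for illustration. Small differences from the hand calculation come from the three-decimal rounding in the table.

```matlab
% Data from Example 13.4: wind velocity (m/s) and force (N)
v = [10 20 30 40 50 60 70 80];
F = [25 70 380 550 610 1220 830 1450];

% Transform to log10 so the power model becomes a straight line
lx = log10(v);
ly = log10(F);
n  = length(lx);

% Straight-line least-squares fit in the transformed variables
a1 = (n*sum(lx.*ly) - sum(lx)*sum(ly))/(n*sum(lx.^2) - sum(lx)^2);
a0 = mean(ly) - a1*mean(lx);

% Back-transform: log(alpha2) = a0, beta2 = a1
alpha2 = 10^a0;
beta2  = a1;

fprintf('F = %.4f v^%.4f\n', alpha2, beta2);
```

The fitted exponent is close to 2, which is consistent with the drag force being proportional to the square of the velocity.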
Linear Regression Program
MATLAB Functions
• MATLAB has a built-in function polyfit that fits a least-squares nth order polynomial to data:
– p = polyfit(x, y, n)
• x: independent data
• y: dependent data
• n: order of polynomial to fit
• p: coefficients of polynomial $f(x) = p_1 x^n + p_2 x^{n-1} + \dots + p_n x + p_{n+1}$
• MATLAB’s polyval command can be used to compute a value using the coefficients.
– y = polyval(p, x)
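For a straight line, polyfit with n = 1 should reproduce the slope and intercept found by hand in Example 13.2. The sketch below fits the wind tunnel data and evaluates the line at v = 45 m/s, a point chosen here purely for illustration (it is the mean velocity, so the prediction should equal the mean force).

```matlab
% Wind tunnel data: velocity (m/s) and force (N)
v = [10 20 30 40 50 60 70 80];
F = [25 70 380 550 610 1220 830 1450];

% First-order (straight line) least-squares fit
p = polyfit(v, F, 1);    % p(1) is the slope, p(2) is the intercept

% Evaluate the fitted line at v = 45 m/s
F45 = polyval(p, 45);

fprintf('slope = %.5f, intercept = %.4f, F(45) = %.3f\n', p(1), p(2), F45);
```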