Assumptions of linear regression
Nicolaus Copernicus University – Department of Animal Ecology

[Figure: Mammals. Log-log plot of brain weight [g] against body weight [kg]; fitted power function y = 9.24x^0.73, R² = 0.95]
[Figure: Ground beetles at two adjacent sites. Species counts at a poplar plantation plotted against an agricultural field; fitted power function y = 4.4x^0.53, R² = 0.19]
• There is a hypothesis about dependent and independent variables. The relation is supposed to be linear. We have a hypothesis about the distribution of errors around the hypothesized regression line.
• There is a hypothesis about dependent and independent variables. The relation is non-linear. We have no data about the distribution of errors around the hypothesized regression line.
• There is no clear hypothesis about dependent and independent variables. The relation is non-linear. We have no data about the distribution of errors around the hypothesized regression line.
[Figure: # predator species against # prey species; fitted line y = 1.16x + 4.17, R² = 0.49]
[Figure: y plotted against x with the fitted regression line; vertical segments mark the deviations of the observed y-values from the line]
Least squares method

D = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} [y_i - (a x_i + b)]^2
Assumptions:
• A linear model applies
• The x-variable has no error term
• The distribution of the y errors around the regression line is normal
\frac{\partial D}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a x_i - b)\, x_i = 0

\frac{\partial D}{\partial b} = -2 \sum_{i=1}^{n} (y_i - a x_i - b) = 0
a = \frac{\sigma_{xy}}{\sigma_x^2}, \quad b = \bar{y} - a \bar{x}
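Below is a minimal numeric sketch of this solution in Python; the data are simulated for illustration and are not the slide's data.

```python
# Least-squares fit of y = a*x + b via a = s_xy / s_x^2, b = ybar - a*xbar
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 30, 50)
y = 1.2 * x + 4.0 + rng.normal(0, 3.0, 50)  # normal errors around the line

a = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # covariance / variance of x
b = y.mean() - a * x.mean()
print(f"y = {a:.2f}x + {b:.2f}")
```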
The second example, the mammal brain-weight data, is nonlinear.
We hypothesize the allometric relation W = aB^z
Linearised regression model:

W = a B^z \exp(\varepsilon) \quad\Rightarrow\quad \ln W = \ln a + z \ln B + \varepsilon

Assumption: the distribution of errors is lognormal.
Nonlinear regression model:

W = a B^z + \varepsilon

Assumption: the distribution of errors is normal.
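As an illustration of the two error models, here is a hedged sketch that fits W = aB^z both ways on simulated data (the values 9.24 and 0.73 are taken from the mammal fit above; the error SD and sample size are assumptions):

```python
# Fit W = a*B^z by (i) linearised log-log regression (lognormal errors)
# and (ii) direct nonlinear least squares (normal errors); simulated data.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
B = 10 ** rng.uniform(-3, 3, 100)                      # body weight [kg]
W = 9.24 * B**0.73 * np.exp(rng.normal(0, 0.3, 100))   # lognormal errors

# (i) linearised model: ln W = ln a + z ln B + eps
z_lin, ln_a = np.polyfit(np.log(B), np.log(W), 1)
print("linearised:  a =", np.exp(ln_a), " z =", z_lin)

# (ii) nonlinear model: W = a*B^z + eps
(a_nl, z_nl), _ = curve_fit(lambda b, a, z: a * b**z, B, W, p0=[1.0, 0.5])
print("nonlinear:   a =", a_nl, " z =", z_nl)
```

With this multiplicative error structure the linearised fit typically recovers the exponent more reliably, because the nonlinear fit is dominated by the largest W-values.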
Simulated data with two error structures: Y = e^{0.1X} + norm(0; Y) and Y = X^{0.5} · e^{norm(0; Y)}

[Figure: log-log plot of y against x; fitted power function y = 0.60x^0.56]
[Figure: plot of y against x on linear axes; fitted functions y = 1.57x^0.46, y = 1.16e^{0.089x}, and y = 1.04e^{0.098x}]
In both cases we have some sort of autocorrelation.
Using logarithms reduces the effect of autocorrelation and makes the distribution of errors more homogeneous.
Nonlinear estimation instead puts more weight on the larger y-values.
If there is no autocorrelation, the log-transformation puts more weight on smaller values.
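A small simulation (assumed, not from the slides) makes the weighting effect visible: with multiplicative errors the raw-scale residuals grow with y, while the log-scale residuals stay homogeneous.

```python
# Multiplicative errors: Y = X^0.5 * exp(eps). Compare residual spread on the
# raw scale and on the log scale for small vs. large x.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1.0, 1000.0, 200)
y = x**0.5 * np.exp(rng.normal(0, 0.3, 200))

raw = y - x**0.5                        # residuals around the true curve
logr = np.log(y) - 0.5 * np.log(x)      # residuals after log-transformation
print("raw SD (small x, large x):", raw[:100].std(), raw[100:].std())
print("log SD (small x, large x):", logr[:100].std(), logr[100:].std())
```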
Problems might arise from intercorrelation between the predictor variables (multicollinearity).
We solve the problem by a step-wise approach, eliminating the variables that are either not significant or give unreasonable parameter values (see the sketch below).
The explained variance of this final model is higher than that of the previous one.
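A minimal sketch of such a backward elimination, assuming a significance threshold of p < 0.05 and the statsmodels OLS routine (both are assumptions, not from the slides; the data are simulated):

```python
# Backward elimination: refit, drop the least significant predictor, repeat.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=100)   # columns 2, 3 are pure noise

cols = list(range(X.shape[1]))
while cols:
    fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    p = fit.pvalues[1:]               # p-values of the predictors (skip intercept)
    if p.max() < 0.05:                # every remaining term is significant
        break
    cols.pop(int(p.argmax()))         # eliminate the least significant variable
print("retained predictor columns:", cols)
```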
[Figure: ln(# species predicted) against ln(# species observed); fitted line y = 0.6966x + 0.7481, R² = 0.6973]
Polynomial regression / general additive model:

Y = a_0 + a_1 X_1 + a_2 X_2 + \dots + a_n X_n + a_{11} X_1^2 + a_{22} X_2^2 + \dots + a_{12} X_1 X_2 + a_{13} X_1 X_3 + a_{23} X_2 X_3 + \dots
Multiple regression solves systems of intrinsically linear algebraic equations
A = (X'X)^{-1} X'Y
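A sketch of this matrix solution with numpy on hypothetical data; np.linalg.solve is used instead of an explicit inverse, which is the numerically safer way to apply the same formula.

```python
# Normal equations: A = (X'X)^-1 X'Y
import numpy as np

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # intercept + 2 predictors
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 0.1, 50)

A = np.linalg.solve(X.T @ X, X.T @ Y)  # solves (X'X) A = X'Y
print(A)
```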
• The matrix X'X must not be singular. That is, the variables have to be independent; otherwise we speak of multicollinearity. Collinearity of r < 0.7 is in most cases tolerable.
• To be safely applied, multiple regression needs at least 10 times as many cases as there are variables in the model.
• Statistical inference assumes that the errors have a normal distribution around the mean.
• The model assumes linear (or algebraic) dependencies. Check first for non-linearities.
• Check the distribution of the residuals Yexp − Yobs. This distribution should be random.
• Check whether the parameters have realistic values.
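A short sketch of two of these checks on hypothetical data: pairwise predictor correlations (the r < 0.7 rule above) and the spread of the residuals.

```python
# Check collinearity among predictors and the residuals Yexp - Yobs.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(0, 0.2, 100)

print(np.corrcoef(X, rowvar=False))     # off-diagonal |r| should stay < 0.7

Xc = np.column_stack([np.ones(len(Y)), X])
A = np.linalg.solve(Xc.T @ Xc, Xc.T @ Y)
resid = Xc @ A - Y                      # Yexp - Yobs, should scatter randomly
print("residual mean and SD:", resid.mean(), resid.std())
```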
Multiple regression is a hypothesis-testing and not a hypothesis-generating technique!!
Standardized coefficients of correlation
Z-transformed distributions have a mean of 0 and a standard deviation of 1.
B = (Z_X' Z_X)^{-1} Z_X' Z_Y

r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\, s_X s_Y} = \frac{1}{n-1} \sum_{i=1}^{n} Z_{X,i} Z_{Y,i}
R = \frac{1}{n-1} Z' Z = \begin{pmatrix} r_{11} & \cdots & r_{1n} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & r_{nn} \end{pmatrix}
B = R_{XX}^{-1} R_{XY}
In the case of bivariate regression Y = aX + b, R_{XX} = 1; hence B = R_{XY}.
Hence the use of Z-transformed values results in standardized correlation coefficients, termed β-values.
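A closing sketch on simulated data, showing that solving B = R_XX^{-1} R_XY on Z-transformed variables yields the standardized β-values:

```python
# Standardized (beta) regression coefficients from Z-transformed variables
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))
Y = 3 * X[:, 0] + X[:, 1] + rng.normal(size=200)

Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # mean 0, SD 1 per column
Zy = (Y - Y.mean()) / Y.std(ddof=1)

Rxx = Zx.T @ Zx / (len(Y) - 1)    # correlation matrix of the predictors
Rxy = Zx.T @ Zy / (len(Y) - 1)    # correlations of the predictors with Y
beta = np.linalg.solve(Rxx, Rxy)  # B = Rxx^-1 Rxy
print("beta values:", beta)
```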