Variance and covariance Sums of squares General linear models.
Variance and covariance
Let $\mathbf{U} = (a_1, a_2, \ldots, a_n)'$ be a vector of $n$ observations with total

$$T = a_1 + a_2 + \ldots + a_n = \sum_{i=1}^{n} a_i$$

With the mean $\mu = T/n$ and the mean vector $\mathbf{M} = (\mu, \mu, \ldots, \mu)'$, the sample variance is

$$\mathrm{Variance} = \frac{1}{n-1}\sum_{i=1}^{n}(a_i - \mu)^2 = \frac{1}{n-1}(\mathbf{U}-\mathbf{M})'(\mathbf{U}-\mathbf{M}) = \frac{1}{n-1}\mathbf{V}'\mathbf{V}$$

where $\mathbf{V} = \mathbf{U} - \mathbf{M}$ is the centred vector.
For two vectors $\mathbf{A} = (a_1, \ldots, a_n)'$ and $\mathbf{B} = (b_1, \ldots, b_n)'$ the covariance is

$$\mathrm{Covariance} = \sigma_{AB} = \frac{1}{n-1}\sum_{i=1}^{n}(a_i - \bar{A})(b_i - \bar{B})$$
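The vector forms of the variance and covariance above can be checked numerically. A minimal sketch with made-up sample vectors (all values hypothetical), comparing the dot-product formulas against NumPy's built-in sample estimators:

```python
import numpy as np

# Hypothetical sample vectors U (values a_i) and B (values b_i); n = 5 cases.
U = np.array([2.0, 4.0, 4.0, 5.0, 7.0])
B = np.array([1.0, 3.0, 2.0, 6.0, 8.0])
n = len(U)

# Centre each vector: V = U - M, where M holds the mean in every entry.
V = U - U.mean()
W = B - B.mean()

# Variance as a dot product of the centred vector with itself: V'V / (n-1).
var_U = V @ V / (n - 1)

# Covariance as the dot product of the two centred vectors.
cov_UB = V @ W / (n - 1)

print(var_U, np.var(U, ddof=1))    # the two variance computations agree
print(cov_UB, np.cov(U, B)[0, 1])  # ... and so do the covariances
```

Note the `ddof=1` argument: like the formulas above, it divides by $n-1$, the sample (not population) estimator.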
Sums of squares
General linear models
In matrix notation the covariance is

$$\mathrm{Covariance} = \frac{1}{n-1}(\mathbf{A}-\bar{X}_A)'(\mathbf{B}-\bar{X}_B)$$
The coefficient of correlation
$$r_{xy} = \frac{\mathrm{cov}(xy)}{s_x s_y}$$

with

$$\mathrm{var}(x) = \frac{1}{n-1}(\mathbf{X}-\mathbf{M}_X)'(\mathbf{X}-\mathbf{M}_X)$$

$$\mathrm{var}(y) = \frac{1}{n-1}(\mathbf{Y}-\mathbf{M}_Y)'(\mathbf{Y}-\mathbf{M}_Y)$$

$$\mathrm{cov}(xy) = \frac{1}{n-1}(\mathbf{X}-\mathbf{M}_X)'(\mathbf{Y}-\mathbf{M}_Y)$$

Hence

$$R = \frac{(\mathbf{X}-\mathbf{M}_X)'(\mathbf{Y}-\mathbf{M}_Y)}{\sqrt{(\mathbf{X}-\mathbf{M}_X)'(\mathbf{X}-\mathbf{M}_X)\,(\mathbf{Y}-\mathbf{M}_Y)'(\mathbf{Y}-\mathbf{M}_Y)}}$$
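The correlation coefficient as a ratio of dot products of centred vectors can be verified directly. A short sketch with hypothetical paired samples, compared against NumPy's `corrcoef`:

```python
import numpy as np

# Hypothetical paired samples x and y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

cx = x - x.mean()  # centred vector (X - M_X)
cy = y - y.mean()  # centred vector (Y - M_Y)

# r = (X-M_X)'(Y-M_Y) / sqrt[(X-M_X)'(X-M_X) * (Y-M_Y)'(Y-M_Y)]
r = (cx @ cy) / np.sqrt((cx @ cx) * (cy @ cy))

print(r, np.corrcoef(x, y)[0, 1])  # both routes give the same r
```

The $1/(n-1)$ factors of variance and covariance cancel in the ratio, which is why they do not appear in the code.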
For a matrix **X** that contains several variables, the following holds:

$$\mathbf{R} = \frac{1}{n-1}\,\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\mathbf{M})'(\mathbf{X}-\mathbf{M})\,\boldsymbol{\Sigma}^{-1} = \boldsymbol{\Sigma}^{-1}\mathbf{D}\,\boldsymbol{\Sigma}^{-1}$$

The matrix **R** is a symmetric matrix that contains all pairwise correlations between the variables. The diagonal matrix $\boldsymbol{\Sigma}$ contains the standard deviations as entries. $\mathbf{X}-\mathbf{M}$ is called the central matrix.
Since we deal with samples:

$$\mathbf{D} = \frac{1}{n-1}(\mathbf{X}-\mathbf{M})'(\mathbf{X}-\mathbf{M}) \qquad \text{(covariance matrix)}$$

$$\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \end{pmatrix}$$
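The identity $\mathbf{R} = \boldsymbol{\Sigma}^{-1}\mathbf{D}\,\boldsymbol{\Sigma}^{-1}$ can be sketched numerically. The data matrix below is random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))  # hypothetical data: 30 cases, 3 variables
n = X.shape[0]

C = X - X.mean(axis=0)        # central matrix X - M
D = C.T @ C / (n - 1)         # covariance matrix D

# Sigma^-1: inverse of the diagonal matrix of standard deviations
S_inv = np.diag(1.0 / np.sqrt(np.diag(D)))

R = S_inv @ D @ S_inv         # correlation matrix R = Sigma^-1 D Sigma^-1

print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True
print(np.allclose(np.diag(R), 1.0))                  # diagonal of R is all ones
```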
Linear regression
European bat species and environmental correlates
[Figure: Mammals — brain weight [g] against body weight [kg] on log–log axes; fitted power function y = 9.24x^0.73, R² = 0.95]
[Figure: Ground beetles at two adjacent sites — abundance in an agricultural field against a poplar plantation; fitted power function y = 4.4x^0.53, R² = 0.19]
Case 1:
• There is a hypothesis about dependent and independent variables.
• The relation is supposed to be linear.
• We have a hypothesis about the distribution of errors around the hypothesized regression line.

Case 2:
• There is a hypothesis about dependent and independent variables.
• The relation is non-linear.
• We have no data about the distribution of errors around the hypothesized regression line.

Case 3:
• There is no clear hypothesis about dependent and independent variables.
• The relation is non-linear.
• We have no data about the distribution of errors around the hypothesized regression line.
Problems might arise from intercorrelation between the predictor variables (multicollinearity). We solve this by a step-wise approach, eliminating the variables that are either not significant or give unreasonable parameter values. The variance explanation of the final model is higher than that of the previous one.
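A simple screening for multicollinearity is to inspect the pairwise correlations among the predictors before fitting. A minimal sketch with fabricated predictors, where `x3` is deliberately built as a near-copy of `x1`, using the |r| = 0.7 rule of thumb mentioned below:

```python
import numpy as np

# Hypothetical predictors: x3 is nearly a copy of x1, so that pair is collinear.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = x1 + 0.1 * rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

R = np.corrcoef(X, rowvar=False)

# Flag predictor pairs whose correlation exceeds the |r| = 0.7 rule of thumb.
flagged = [(i, j)
           for i in range(R.shape[0])
           for j in range(i + 1, R.shape[1])
           if abs(R[i, j]) >= 0.7]
print(flagged)  # the (x1, x3) pair should be flagged
```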
[Figure: ln(# species predicted) against ln(# species observed); fitted line y = 0.6966x + 0.7481, R² = 0.6973]
$$Y = a_0 + a_1X_1 + a_2X_2 + a_3X_3 + \ldots + a_nX_n + b_{12}X_1X_2 + b_{13}X_1X_3 + b_{23}X_2X_3 + \ldots$$
Multiple regression solves systems of intrinsically linear algebraic equations
$$\mathbf{A} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$
• The matrix X'X must not be singular. That is, the variables have to be independent; otherwise we speak of multicollinearity. Collinearity of r < 0.7 is in most cases tolerable.
• To be safely applied, multiple regression needs at least 10 times as many cases as variables in the model.
• Statistical inference assumes that errors have a normal distribution around the mean.
• The model assumes linear (or algebraic) dependencies. Check first for non-linearities.
• Check the distribution of residuals Y_exp − Y_obs. This distribution should be random.
• Check whether the parameters have realistic values.
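The normal-equation solution A = (X'X)⁻¹X'Y can be sketched on fabricated data with known coefficients (all values hypothetical); the recovered parameters should land close to the true ones:

```python
import numpy as np

# Hypothetical data generated from known coefficients: Y = 1 + 2*X1 - 0.5*X2 + noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.01 * rng.normal(size=40)

# Design matrix with a leading column of ones for the intercept a0.
Xd = np.column_stack([np.ones(40), X])

# A = (X'X)^-1 X'Y, solved with np.linalg.solve rather than an explicit inverse
# (numerically safer, and it fails loudly if X'X is singular).
A = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)
print(A)  # close to [1, 2, -0.5]
```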
Multiple regression is a hypothesis-testing and not a hypothesis-generating technique!
Polynomial regression General additive model
Standardized coefficients of correlation
$$Z_x = \frac{x - \bar{x}}{s_x}$$

Z-transformed distributions have a mean of 0 and a standard deviation of 1.
$$\mathbf{B} = (\mathbf{Z}_X'\mathbf{Z}_X)^{-1}\mathbf{Z}_X'\mathbf{Z}_Y$$
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(X_i-\bar{X})(Y_i-\bar{Y})}{s_X s_Y} = \frac{1}{n-1}\sum_{i=1}^{n} Z_{X_i} Z_{Y_i}$$
$$\frac{1}{n-1}\mathbf{Z}'\mathbf{Z} = \frac{1}{n-1}\begin{pmatrix} \sum Z_{x_1}Z_{x_1} & \cdots & \sum Z_{x_1}Z_{x_n} \\ \vdots & \ddots & \vdots \\ \sum Z_{x_n}Z_{x_1} & \cdots & \sum Z_{x_n}Z_{x_n} \end{pmatrix} = \begin{pmatrix} 1 & \cdots & r_{1n} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & 1 \end{pmatrix} = \mathbf{R}$$
$$\mathbf{B} = \mathbf{R}_{xx}^{-1}\mathbf{R}_{XY}$$

In the case of bivariate regression Y = aX + b, R_xx = 1; hence B = R_XY. Hence the use of Z-transformed values results in standardized correlation coefficients, termed β-values.
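The equivalence between the standardized normal equations and the correlation-matrix form B = R_xx⁻¹R_XY can be checked on fabricated data (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))  # hypothetical predictors
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=60)
n = X.shape[0]

def zscore(a):
    # Z-transform: subtract the mean, divide by the sample standard deviation.
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

Zx, Zy = zscore(X), zscore(y)

# Beta values from the standardized normal equations B = (Zx'Zx)^-1 Zx'Zy.
B = np.linalg.solve(Zx.T @ Zx, Zx.T @ Zy)

# The same betas from the correlation matrices Rxx and Rxy
# (the 1/(n-1) factors cancel in the solve, so both routes agree).
Rxx = Zx.T @ Zx / (n - 1)
Rxy = Zx.T @ Zy / (n - 1)
B2 = np.linalg.solve(Rxx, Rxy)
print(np.allclose(B, B2))  # True: both routes give the same standardized betas
```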