Dec 22, 2015

Page 1:

04/19/23 H.S. 1

Linear Regression

Hein Stigum

Presentation, data and programs at:

http://folk.uio.no/heins/courses

Page 2:

CONCEPTS
Linear regression

Page 3:

Outcome and regression types

• Numerical data
  – Discrete (number of partners): Poisson regression
  – Continuous (weight): Linear regression
• Categorical data
  – Nominal (disease/no disease): Logistic regression
  – Ordinal (small/medium/large): Ordinal regression

Page 4:

Regression idea

Model:

$$y = b_0 + b_1 x + e$$

y = outcome
x = covariate
b_1 = coefficient, effect of x
e = residual error

Model with many cofactors:

$$y = b_0 + b_1 x_1 + b_2 x_2 + e$$

x_1, x_2 = covariates

[Figure: birth weight (gram), 2500 to 5000, against gestational age (days), 250 to 310]
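The single-covariate model above can be fitted by ordinary least squares. The sketch below is an illustration in Python rather than the Stata used in the course; the function name `fit_simple_ols` and the five data points are made up for the example and are not the course data.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + e."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx              # effect of x
    b0 = y_mean - b1 * x_mean   # expected y at x = 0
    return b0, b1


# Made-up illustration data (not the course data set)
gest = [260, 270, 280, 290, 300]          # gestational age (days)
weight = [3150, 3420, 3500, 3710, 3820]   # birth weight (gram)

b0, b1 = fit_simple_ols(gest, weight)
# Residual error e for each observation
residuals = [w - (b0 + b1 * g) for g, w in zip(gest, weight)]
```

Here `b1` is the estimated change in weight per day of gestational age, and the residuals sum to zero by construction.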

Page 5:

Measures and Assumptions

$$\text{weight} = b_0 + b_1\,\text{gest} + b_2\,\text{sex} + e$$

• Adjusted effects
  – b1 is the increase in weight per day of gestational age
  – b1 is adjusted for sex (the covariate with coefficient b2)
• Assumptions
  – Independent errors
  – Linear effects
  – Constant error variance
• Robustness
  – Influence

Page 6:

Workflow

• DAG
• Plots: distribution and scatter
• Bivariate analysis
• Regression
  – Model estimation
  – Test of assumptions
    • Independent errors (discuss)
    • Linear effects (plot)
    • Constant error variance (plot)
  – Robustness
    • Influence

Page 7:

ANALYSIS
Continuous outcome: Linear regression, Birth weight

Page 8:

DAGs

[Figure: DAG with nodes E (gest age), D (birth weight), C1 (sex) and C2 (parity)]

Associations: bivariate (unadjusted)
Causal effects: multivariable (adjusted)

Draw your assumptions before your conclusions

Page 9:

Plot outcome by exposure

OK

Be clear on the research question:
• overall birth weight: linear regression
• low birth weight: logistic regression
• linear and logistic can give opposite results

Effects on linear regression:
• May lead to non-constant error variance
• May have highly influential outliers

Page 10:

Plot outcome by exposure, cont.

Linear effects?

Yes

Page 11:

Bivariate analysis

Outcome: birthweight

                    N      Mean    p-value
All                 564    3604
Gestational age                    <0.001
  <=280 days        230    3436
  >280 days         288    3744
Sex                                0.004
  Boy               291    3668
  Girl              273    3535
Parity                             <0.001
  0                 225    3485
  1                 215    3677
  2                 123    3695

Page 12:

REGRESSION
Continuous outcome: Linear regression, Birth weight

Page 13:

Categorical covariates

• 2 categories
  – OK, but know the coding
• 3+ categories
  – Use “dummies”
    • “Dummies” are 0/1 variables used to create contrasts
    • Want 3 categories for parity: 0, 1 and 2-7 children
    • Choose 0 as reference
    • Make dummies for the two other categories

generate Parity1 = (parity==1) if parity<.

generate Parity2_7 = (parity>=2) if parity<.
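The same dummy coding can be sketched in Python to show what the two `generate` lines produce. This is an illustration only: the helper name `make_parity_dummies` and the example values are hypothetical, and `None` is used here to stand in for a missing parity value.

```python
def make_parity_dummies(parity):
    """Return (Parity1, Parity2_7) for one parity value.

    Parity 0 is the reference category, so it gets 0 on both dummies.
    A missing value (None) stays missing on both dummies.
    """
    if parity is None:
        return None, None
    return int(parity == 1), int(parity >= 2)


# Made-up example values, including one missing observation
parities = [0, 1, 2, 5, None]
dummies = [make_parity_dummies(p) for p in parities]
```

Each observation ends up with at most one dummy equal to 1, and the reference category is the one where both are 0.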

Page 14:

Model estimation

Syntax:
regress weight gest sex Parity1 Parity2_7

Page 15:

Create meaningful constant

$$E(y) = \text{Expected birth weight} = b_0 + b_1\,\text{gest} + b_2\,\text{sex} + b_3\,\text{Parity1} + b_4\,\text{Parity2\_7}$$

Expected birth weight at:

gest=0, sex=0, parity=0:
$$b_0 = 1972 \text{ gr}$$

gest=280, sex=1, parity=0:
$$b_0 + b_1 \cdot 280 + b_2 \cdot 1 = 3524 \text{ gr}$$

Alternative: center variables
gen gest280=gest-280   // gest280 has a meaningful zero at 280 days
gen sex0=sex-1         // sex0 has a meaningful zero at boys
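A small Python sketch of why centering helps: subtracting 280 from the covariate leaves the slope unchanged and moves the constant to the expected outcome at 280 days. The function name and data are made up for the illustration, and only one covariate is used to keep it short.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + e."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1


# Made-up illustration data (not the course data set)
gest = [260, 270, 280, 290, 300]
weight = [3150, 3420, 3500, 3710, 3820]

# Uncentered: the constant is the expected weight at gest = 0 days
b0_raw, b1_raw = fit_simple_ols(gest, weight)

# Centered: the constant is the expected weight at gest = 280 days
gest280 = [g - 280 for g in gest]
b0_centered, b1_centered = fit_simple_ols(gest280, weight)
```

The centered constant equals `b0_raw + 280 * b1_raw`, so both fits describe the same line; only the interpretation of the constant changes.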

Page 16:

Model results

                       coeff     95% conf. int.
Birth weight at ref    3524.3
Gestational age
  per day              6.0       (3.9, 8.2)
Sex
  Boy                  0
  Girl                 -139.2    (-228.9, -49.5)
Parity
  0                    0
  1                    232.0     (130.6, 333.5)
  2-7                  226.0     (106.9, 345)
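The coefficients in the table can be combined into an expected birth weight for a given child. The Python sketch below assumes the reference is a boy at 280 days with parity 0, as set up under "Create meaningful constant"; the function and constant names are hypothetical.

```python
# Coefficients copied from the model results table (gram)
REF_WEIGHT = 3524.3       # assumed reference: boy, 280 days, parity 0
B_GEST_PER_DAY = 6.0      # per day of gestational age
B_GIRL = -139.2           # girl versus boy
B_PARITY = {"0": 0.0, "1": 232.0, "2-7": 226.0}


def expected_birth_weight(gest_days, girl, parity_group):
    """Expected birth weight (gram) from the fitted coefficients."""
    return (REF_WEIGHT
            + B_GEST_PER_DAY * (gest_days - 280)
            + (B_GIRL if girl else 0.0)
            + B_PARITY[parity_group])


# A girl born at 290 days to a mother with one previous child:
# 3524.3 + 6.0*10 - 139.2 + 232.0
example = expected_birth_weight(290, girl=True, parity_group="1")
```

At the reference values the function returns the constant, 3524.3 gram.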

Page 17:

Test of assumptions

• Discuss
  – Independent residuals?
• Plot residuals versus predicted y
  – Linear effects?
  – Constant variance?

[Figure: residuals (-1000 to 1500) against linear prediction (3200 to 4000). Outlier not included.]
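A minimal Python sketch of how the points for a residual-versus-predicted plot can be prepared, with made-up data and hypothetical names. The comparison of average absolute residuals in the lower and upper half of the predictions is only a rough numeric companion to the plot, not a formal test.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + e."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1


# Made-up illustration data (not the course data set)
gest = [250, 260, 270, 280, 290, 300]
weight = [3000, 3250, 3350, 3600, 3650, 3900]

b0, b1 = fit_simple_ols(gest, weight)
predicted = [b0 + b1 * g for g in gest]
residuals = [w - p for w, p in zip(weight, predicted)]

# (predicted, residual) pairs ordered by predicted value, ready to plot
plot_points = sorted(zip(predicted, residuals))

# Rough check of constant variance: average absolute residual in each half
half = len(plot_points) // 2
spread_low = sum(abs(r) for _, r in plot_points[:half]) / half
spread_high = sum(abs(r) for _, r in plot_points[half:]) / (len(plot_points) - half)
```

A curved pattern in `plot_points` would suggest a non-linear effect, and clearly different values of `spread_low` and `spread_high` would suggest non-constant variance.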

Page 18:

Violations of assumptions

• Dependent residuals
  – Use linear mixed models
• Non-linear effects
  – Add square term
  – Or use piecewise linear
• Non-constant variance
  – Use robust variance estimation

[Figures: one panel plotted against gest (200 to 300); one panel with axes labelled "res" and "p" (3400 to 3800)]
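As an illustration of robust variance estimation, the Python sketch below compares the classical standard error of the slope with a heteroscedasticity-robust (White, HC0) standard error in a single-covariate model. The choice of HC0 is an assumption made for the example; statistical packages may apply small-sample corrections, so their numbers can differ. The function name and data are made up.

```python
import math


def slope_with_standard_errors(x, y):
    """Return (b1, classical SE, HC0 robust SE) for y = b0 + b1*x + e."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    b0 = y_mean - b1 * x_mean
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

    # Classical: one common error variance, estimated by RSS / (n - 2)
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # HC0: each observation contributes its own squared residual
    meat = sum(((xi - x_mean) ** 2) * (e ** 2) for xi, e in zip(x, resid))
    se_robust = math.sqrt(meat / sxx ** 2)
    return b1, se_classical, se_robust


# Made-up data where the spread grows with x
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.7, 9.0, 8.2, 15.0]
b1, se_classical, se_robust = slope_with_standard_errors(x, y)
```

The slope estimate is the same under both approaches; only the standard error, and therefore the confidence interval, changes.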

Page 19:

Influence

[Figure: birth weight (2000 to 6000) against gestational age (200 to 700), showing one outlier and two fitted lines: regression with outlier and regression without outlier]

Page 20:

Measures of influence

• Measure change in:
  – Predicted outcome
  – Deviance
  – Coefficients (beta)
    • Delta beta

Remove obs 1, see change
Remove obs 2, see change

[Figure: influence (-0.6 to 0.2) by observation Id]
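The "remove one observation, see change" idea can be sketched in Python for a single-covariate model. Here delta beta is taken as the slope on the full data minus the slope with one observation left out; this sign convention is an assumption, and standardized versions also exist. Names and data are made up.

```python
def slope(x, y):
    """Least-squares slope b1 for the model y = b0 + b1*x + e."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


def delta_betas(x, y):
    """Change in slope caused by each observation (full minus leave-one-out)."""
    b_full = slope(x, y)
    changes = []
    for i in range(len(x)):
        # Refit without observation i
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        changes.append(b_full - slope(x_rest, y_rest))
    return changes


# Made-up data: the last point is an outlier in both x and y
x = [1, 2, 3, 4, 10]
y = [1, 2, 3, 4, 0]
deltas = delta_betas(x, y)
most_influential = max(range(len(deltas)), key=lambda i: abs(deltas[i]))
```

Plotting `deltas` against the observation id gives the kind of influence plot described on this slide.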

Page 21:

Delta beta for gestational age

[Figure: Dfbeta for gest (-10 to 0) against weight (2000 to 6000); observation 539 is labelled]

beta for gestational age = 6.04

If obs nr 539 is removed, beta will change from 6 to 16

Page 22:

Removing outlier

                       Full data                   Outlier removed (final model)
                       coeff   95% conf. int.      coeff   95% conf. int.
Birth weight at ref    3524                        3531
Gestational age
  per day              6       (4, 8)              17      (13, 20)
Sex
  Boy                  0                           0
  Girl                 -139    (-229, -49)         -166    (-252, -80)
Parity
  0                    0                           0
  1                    232     (131, 333)          229     (132, 326)
  2-7                  226     (107, 345)          225     (112, 339)

One outlier affected two estimates

Page 23:

Summing up

• DAGs
  – Guide analysis
• Plots
  – Unequal variance, non-linearity, outliers
• Bivariate analysis
• Linear regression
  – Fit model
  – Check assumptions
  – Check robustness
  – Make meaningful constant