
Regression: Motivation

Jan 03, 2016


dennis-delacruz

Transcript
Page 1: Regression: Motivation

Regression: Motivation

One dimensional data

(Summary by Mean)

10 20 30 40 50

Page 2: Regression: Motivation

X      (X-a)^2
10     (10-a)^2
20     (20-a)^2
30     (30-a)^2
40     (40-a)^2
50     (50-a)^2
Sum:   150    T

T is at its minimum when a = mean = 30.
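The claim that T is smallest at the mean can be checked numerically. The following is a minimal sketch in Python; the function name `sum_sq` and the list of candidate values for a are illustrative choices, not part of the slides.

```python
# One-dimensional data from the slide.
data = [10, 20, 30, 40, 50]


def sum_sq(a, xs):
    """T(a): sum of squared deviations of the data from a candidate summary a."""
    return sum((x - a) ** 2 for x in xs)


mean = sum(data) / len(data)  # 150 / 5

# Evaluate T at a few candidate summaries (hypothetical choices) around the mean.
for a in (20, 25, 30, 35, 40):
    print(a, sum_sq(a, data))
```

Among the candidates tried, T is smallest at a = 30, the mean.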

Page 3: Regression: Motivation

Regression

Estriol     Birth Wt        Estriol     Birth Wt
(mg/24 hr)  (g/100)         (mg/24 hr)  (g/100)
7           25              17          30
9           25              17          32
9           25              17          36
12          27              18          35
14          27              18          37
14          30              19          31
15          32              19          34
15          34              20          38
15          34              21          30
15          35              22          40
16          27              24          28
16          24              24          43
16          30              25          32
16          31              25          39
16          32              27          34
16          35

Page 4: Regression: Motivation

Regression

• Concerns
  – Data summarization
    • (As in one dimensional data)
  – Prediction of low birthweight babies
    • (for special prenatal care for those at high risk)

Page 5: Regression: Motivation

Scatter plot

[Figure: scatter plot of birthweight (vertical axis) against estriol (horizontal axis)]

Page 6: Regression: Motivation

Lines through scatter plot to represent the data

[Figure: scatter plot with candidate lines (Line 2, Line 3, Line 4, Line 5) drawn through it. Horizontal axis: Estriol (mg/24 hr); vertical axis: Birthweight (g/100)]

Page 7: Regression: Motivation

Regression line: the best line, the best representation of the data

[Fig Reg 1.6: Regression Line through Scatter Plot. Horizontal axis: Estriol (mg/24 hr); vertical axis: Birthweight (g/100)]

Page 8: Regression: Motivation

What is this with a line and numbers anyway?

• They can be the same information in two different forms or languages

• But lines require less space to record, are easier to remember, and are easy to comprehend

• Lines can be a pictorial or mathematical representation of numerical data

Page 9: Regression: Motivation

• A line: Y = 2 + 3X

Slope = 3

Intercept = 2

(interpretation ??)

Numbers generated by the line

x     y
0     2
1     5
2     8
…     …
50    152
…     …
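The table of numbers can be regenerated from the equation. This is a small sketch; the function name `line` and its default arguments are illustrative.

```python
def line(x, intercept=2, slope=3):
    """Value of Y on the line Y = intercept + slope * X."""
    return intercept + slope * x


# Reproduce the rows shown on the slide.
for x in (0, 1, 2, 50):
    print(x, line(x))
```

The two parameters are all that needs to be stored: every (x, y) pair in the table follows from them.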

Page 10: Regression: Motivation

Representation of bivariate measurements in different forms

• Equation: Y = 2 + 3x

• Data/Numbers:

  x     y
  0     2
  1     5
  2     8
  …     …
  50    152
  …     …

• Picture/Graph:

  [Figure: graph of the line in the X-Y plane; X axis marked 0 and 3, Y axis marked 2 and 11]

Page 11: Regression: Motivation

Straight lines

[Figure: A Straight Line, drawn in the X-Y plane with the intercept marked on the Y axis]

[Figure: Two Straight Lines with the Same Slope but Different Intercepts]

Page 12: Regression: Motivation

Straight lines

[Figure: Two Straight Lines with the Same Intercept but Different Slopes]

[Figure: Straight Line with Zero Slope and Zero Intercept; lines labeled "Zero Slope" and "Zero Intercept"]

Page 13: Regression: Motivation

Regression: what line will generate the data?

Estriol     Birth Wt        Estriol     Birth Wt
(mg/24 hr)  (g/100)         (mg/24 hr)  (g/100)
7           25              17          30
9           25              17          32
9           25              17          36
12          27              18          35
14          27              18          37
14          30              19          31
15          32              19          34
15          34              20          38
15          34              21          30
15          35              22          40
16          27              24          28
16          24              24          43
16          30              25          32
16          31              25          39
16          32              27          34
16          35

Page 14: Regression: Motivation

Regression: what line will generate the data?

[Figure: scatter plot of birthweight (vertical axis) against estriol (horizontal axis)]

Page 15: Regression: Motivation

Which is the best line?

[Figure: scatter plot with five candidate lines (Line 1 through Line 5) drawn through it. Horizontal axis: Estriol (mg/24 hr); vertical axis: Birthweight (g/100)]

Page 16: Regression: Motivation

The best line: Birthweight = 21.52 + 0.608 × Estriol

[Figure: Regression Line through Scatter Plot. Horizontal axis: Estriol (mg/24 hr); vertical axis: Birthweight (g/100)]
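The coefficients of the best line can be recomputed from the 31 observations listed later in the deck (Pages 25 and 26). The sketch below uses the textbook least-squares formulas in plain Python; the function name `fit_line` and the variable names are illustrative, and the slides do not say which software routine produced the published values.

```python
# Estriol (mg/24 hr) and birthweight (g/100) for the 31 patients.
estriol = [7, 9, 9, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
           17, 17, 17, 18, 18, 19, 19, 20, 21, 22, 24, 24, 25, 25, 27]
bweight = [25, 25, 25, 27, 27, 30, 32, 34, 34, 35, 27, 24, 30, 31, 32, 35,
           30, 32, 36, 35, 37, 31, 34, 38, 30, 40, 28, 43, 32, 39, 34]


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(estriol, bweight)
print(round(intercept, 2), round(slope, 3))  # 21.52 0.608
```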

Page 17: Regression: Motivation

Computer output

Coefficients (a)

Model 1       B         Std. Error   Beta    t       Sig.    Lower Bound   Upper Bound
(Constant)    21.523    2.620                8.214   .000    16.164        26.883
ESTRIOL       .608      .147         .610    4.143   .000    .308          .908

B and Std. Error: Unstandardized Coefficients. Beta: Standardized Coefficients. Lower Bound and Upper Bound: 95% Confidence Interval for B.

a. Dependent Variable: BWEIGHT
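For readers who want to see where the columns of this output come from, here is a sketch that recomputes the standard errors, t statistics, and the 95% confidence interval for the slope using the usual simple-regression formulas. It is an illustration under stated assumptions, not a description of what the statistics package does internally: the variable names are hypothetical, and the critical value 2.045 (Student's t, 29 degrees of freedom, two-sided 95%) is hard-coded from a t table rather than computed.

```python
import math

estriol = [7, 9, 9, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
           17, 17, 17, 18, 18, 19, 19, 20, 21, 22, 24, 24, 25, 25, 27]
bweight = [25, 25, 25, 27, 27, 30, 32, 34, 34, 35, 27, 24, 30, 31, 32, 35,
           30, 32, 36, 35, 37, 31, 34, 38, 30, 40, 28, 43, 32, 39, 34]

n = len(estriol)
x_bar = sum(estriol) / n
y_bar = sum(bweight) / n
sxx = sum((x - x_bar) ** 2 for x in estriol)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, bweight))

# Least-squares estimates (column B).
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual standard deviation with n - 2 degrees of freedom.
sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(estriol, bweight))
s = math.sqrt(sse / (n - 2))

# Standard errors (column Std. Error) and t statistics (column t).
se_slope = s / math.sqrt(sxx)
se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

# Assumption: critical value taken from a t table (29 df, two-sided 95%).
T_CRIT = 2.045
ci_slope = (slope - T_CRIT * se_slope, slope + T_CRIT * se_slope)

print(round(se_intercept, 3), round(se_slope, 3))
print(round(t_intercept, 3), round(t_slope, 3))
print(round(ci_slope[0], 3), round(ci_slope[1], 3))
```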

Page 18: Regression: Motivation

Regression

The Saga continues

Page 19: Regression: Motivation

Out of curiosity

How did this accomplish what we wanted (i.e. data summarization and identifying women who might need special prenatal care)?

Page 20: Regression: Motivation

1. We end up with the line Birthweight = 21.52 + 0.608 × Estriol, hoping that this line will generate the original data.

2. In the univariate case, the mean is, in a sense, the value closest to the data. In a similar way, the regression line is the line closest to the data. In that sense it summarizes the data.

Page 21: Regression: Motivation

Recall

One dimensional data

(Summary by Mean)

10 20 30 40 50

Page 22: Regression: Motivation

Recall

X      (X-a)^2       Bweight   (Bweight-L)^2
10     (10-a)^2      25        (25-L)^2
20     (20-a)^2      25        (25-L)^2
30     (30-a)^2      25        (25-L)^2
40     (40-a)^2      27        (27-L)^2
50     (50-a)^2      …         …

a = mean = 30 minimizes the sum of (X-a)^2.

L = 21.52 + 0.608 × Estriol minimizes the sum of (Bweight-L)^2. This is the regression line.
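The parallel between the two minimizations can be written out. In the sketch below, the symbols b_0 and b_1 (intercept and slope), the index i, and the bar notation for means are notation introduced here for illustration; the slides only use a and L.

```latex
% Univariate case: choose a to minimize T(a).
T(a) = \sum_{i=1}^{n} (x_i - a)^2, \qquad
\frac{dT}{da} = -2 \sum_{i=1}^{n} (x_i - a) = 0
\;\Longrightarrow\;
a = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}

% Bivariate case: L_i = b_0 + b_1 x_i is the value the line generates
% for patient i; choose b_0 and b_1 to minimize S.
S(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

\frac{\partial S}{\partial b_0} = 0, \quad
\frac{\partial S}{\partial b_1} = 0
\;\Longrightarrow\;
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
           {\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
```

For the one-dimensional data the first result gives a = 30; for the estriol data the second gives the line quoted above, with x as estriol and y as birthweight.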

Page 23: Regression: Motivation

Prediction

• Women that need special care

• If low birthweight is defined as < 2500 g (25 in units of g/100), then women with an estriol level < 5.72 mg/24 hr are at high risk of having low birthweight babies.
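The 5.72 cutoff comes from solving the regression equation for estriol at a birthweight of 25 (2500 g in units of g/100). A minimal sketch, with illustrative names:

```python
INTERCEPT = 21.52  # from the fitted line
SLOPE = 0.608


def estriol_cutoff(bweight_cutoff):
    """Estriol level at which the fitted line predicts the given birthweight."""
    return (bweight_cutoff - INTERCEPT) / SLOPE


cutoff = estriol_cutoff(25)  # 2500 g expressed in g/100
print(round(cutoff, 2))  # 5.72
```

Because the slope is positive, estriol levels below this cutoff correspond to predicted birthweights below 2500 g.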

Page 24: Regression: Motivation

• So is everything fine and dandy?

• Not necessarily
  – How closely does the regression line generate the data?
  – How much is estriol responsible for birthweight?
  – Was there something that would have better predicted women at risk?

Page 25: Regression: Motivation

Columns (c) to (e): birthweights, observed and generated from each line. Last two columns: squared difference between the observed birthweight and the value generated from each line.

Obs. No.   Estriol   Observed   Line 1.1   Line 1.2   Line 1.1       Line 1.2
(a)        (b)       Data (c)   (d)        (e)        [(c)-(d)]^2    [(c)-(e)]^2
1          7         25         20.5       25.776     20.25          0.6022
2          9         25         23.5       26.992     2.25           3.9681
3          9         25         23.5       26.992     2.25           3.9681
4          12        27         28.0       28.816     1.00           3.2979
5          14        27         31.0       30.032     16.00          9.1930
6          14        30         31.0       30.032     1.00           0.0010
7          15        32         32.5       30.640     0.25           1.8496
8          15        34         32.5       30.640     2.25           11.2896
9          15        34         32.5       30.640     2.25           11.2896
10         15        35         32.5       30.640     6.25           19.0096
11         16        27         34.0       31.248     49.00          18.0455
12         16        24         34.0       31.248     100.00         52.5335
13         16        30         34.0       31.248     16.00          1.5575
14         16        31         34.0       31.248     9.00           0.0615
15         16        32         34.0       31.248     4.00           0.5655
16         16        35         34.0       31.248     1.00           14.0775
17         17        30         35.5       31.856     30.25          3.4447
18         17        32         35.5       31.856     12.25          0.0207
19         17        36         35.5       31.856     0.25           17.1727
20         18        35         37.0       32.464     4.00           6.4313
21         18        37         37.0       32.464     0.00           20.5753
22         19        31         38.5       33.072     56.25          4.2932
23         19        34         38.5       33.072     20.25          0.8612
24         20        38         40.0       33.680     4.00           18.6624
25         21        30         41.5       34.288     132.25         18.3869
26         22        40         43.0       34.896     9.00           26.0508
27         24        28         46.0       36.112     324.00         65.8045
28         24        43         46.0       36.112     9.00           47.4445
29         25        32         47.5       36.720     240.25         22.2784
30         25        39         47.5       36.720     72.25          5.1984
31         27        34         50.5       37.936     272.25         15.4921

Sum                  534.00    992.00     1111.00    992.00     1419.00        423.43
Mean                 17.23     32.00      35.84      32.00      -              -
Variance             22.58     22.47      50.81      8.35       -              -
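The two column totals (1419.00 and 423.43) can be recomputed directly. In this sketch the equation used for Line 1.2 is the regression line with rounded coefficients, as on the slides. The equation for Line 1.1, 10 + 1.5 × Estriol, is an assumption inferred from the tabulated values in column (d); the slides do not state it. The function name `sum_sq_diff` is illustrative.

```python
estriol = [7, 9, 9, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
           17, 17, 17, 18, 18, 19, 19, 20, 21, 22, 24, 24, 25, 25, 27]
bweight = [25, 25, 25, 27, 27, 30, 32, 34, 34, 35, 27, 24, 30, 31, 32, 35,
           30, 32, 36, 35, 37, 31, 34, 38, 30, 40, 28, 43, 32, 39, 34]


def sum_sq_diff(intercept, slope, xs, ys):
    """Sum of squared differences between observed y and the line's value."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Line 1.1: coefficients inferred from column (d), an assumption.
total_line_1_1 = sum_sq_diff(10, 1.5, estriol, bweight)
# Line 1.2: the regression line with coefficients rounded as on the slides.
total_line_1_2 = sum_sq_diff(21.52, 0.608, estriol, bweight)

print(round(total_line_1_1, 2), round(total_line_1_2, 2))
```

The regression line gives the smaller total, which is the sense in which it is the line closest to the data.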

Page 26: Regression: Motivation

E = estriol, BW = observed birthweight, Pred = birthweight predicted by the regression line, Diff = BW - Pred

E        BW       Pred        Diff
7.00     25.00    25.78076    -0.78076
9.00     25.00    26.99714    -1.99714
9.00     25.00    26.99714    -1.99714
12.00    27.00    28.82171    -1.82171
14.00    27.00    30.03810    -3.03810
14.00    30.00    30.03810    -0.03810
15.00    32.00    30.64629    1.35371
15.00    34.00    30.64629    3.35371
15.00    34.00    30.64629    3.35371
15.00    35.00    30.64629    4.35371
16.00    27.00    31.25448    -4.25448
16.00    24.00    31.25448    -7.25448
16.00    30.00    31.25448    -1.25448
16.00    31.00    31.25448    -0.25448
16.00    32.00    31.25448    0.74552
16.00    35.00    31.25448    3.74552
17.00    30.00    31.86267    -1.86267
17.00    32.00    31.86267    0.13733
17.00    36.00    31.86267    4.13733
18.00    35.00    32.47086    2.52914
18.00    37.00    32.47086    4.52914
19.00    31.00    33.07905    -2.07905
19.00    34.00    33.07905    0.92095
20.00    38.00    33.68724    4.31276
21.00    30.00    34.29543    -4.29543
22.00    40.00    34.90362    5.09638
24.00    28.00    36.12000    -8.12000
24.00    43.00    36.12000    6.88000
25.00    32.00    36.72819    -4.72819
25.00    39.00    36.72819    2.27181
27.00    34.00    37.94457    -3.94457

Page 27: Regression: Motivation

How good is the regression

[Fig Reg 1.6: Regression Line through Scatter Plot. Horizontal axis: Estriol (mg/24 hr); vertical axis: Birthweight (g/100)]

Page 28: Regression: Motivation

How good is the regression

• R^2 = 0.372
  – Estriol explains about 37.2% of the variation in the birthweights. The remaining 62.8% is due to other factors.
  – At estriol 16, we have several birthweights (24, 27, 30, 31, 32 and 35). If estriol were the only factor for birthweight, we would not see this variation.
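R^2 can be recomputed as one minus the ratio of the residual sum of squares to the total sum of squares around the mean birthweight. The sketch below does this in plain Python and also lists the birthweights observed at estriol 16; all names are illustrative.

```python
estriol = [7, 9, 9, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
           17, 17, 17, 18, 18, 19, 19, 20, 21, 22, 24, 24, 25, 25, 27]
bweight = [25, 25, 25, 27, 27, 30, 32, 34, 34, 35, 27, 24, 30, 31, 32, 35,
           30, 32, 36, 35, 37, 31, 34, 38, 30, 40, 28, 43, 32, 39, 34]

n = len(estriol)
x_bar = sum(estriol) / n
y_bar = sum(bweight) / n
sxx = sum((x - x_bar) ** 2 for x in estriol)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, bweight))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Total variation in birthweight, and the part left over after the fit.
ss_total = sum((y - y_bar) ** 2 for y in bweight)
ss_resid = sum((y - intercept - slope * x) ** 2
               for x, y in zip(estriol, bweight))

r_squared = 1 - ss_resid / ss_total
print(round(r_squared, 3))  # 0.372

# Birthweights of the patients who all have estriol 16.
at_16 = sorted(y for x, y in zip(estriol, bweight) if x == 16)
print(at_16)  # [24, 27, 30, 31, 32, 35]
```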

Page 29: Regression: Motivation

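How good is the regression

Regression line and 95% confidence intervals around predicted values

[Figure: birthweight (Bweight) plotted against estriol, showing the observed values, the regression line, and the upper and lower 95% confidence limits]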

Page 30: Regression: Motivation

Other factors

Multiple Regression

Page 31: Regression: Motivation

Regression Diagnostics

Residual Analysis

Page 32: Regression: Motivation

Diagnostics

• Residual for a patient (observation)
  – Difference between the observed birthweight and the birthweight the regression line would generate (predict)

• Example (for the first patient)
  – Observed birthweight = 25
  – Generated = 21.52 + 0.608 × estriol = 21.52 + 0.608(7) = 25.776
  – Residual = 25 - 25.776 = -0.776
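The same arithmetic can be wrapped in a small function and applied to any patient. A minimal sketch, with an illustrative function name:

```python
def residual(estriol, observed_bweight, intercept=21.52, slope=0.608):
    """Observed birthweight minus the birthweight generated by the line."""
    predicted = intercept + slope * estriol
    return observed_bweight - predicted


# First patient: estriol 7, observed birthweight 25.
print(round(residual(7, 25), 3))  # -0.776
```

A negative residual means the observed birthweight lies below the regression line; a positive one means it lies above.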

Page 33: Regression: Motivation

Diagnostics

• Residual plots

• Plot of residuals against predicted values

• For checking assumptions
  – Normality, linearity and homoscedasticity

Page 34: Regression: Motivation

Non-normality

Heteroscedasticity

Nonlinearity

Page 35: Regression: Motivation

Diagnostics

• Diagnostics for influential patients (observations)

  – Change in the estimated parameters (slope and intercept) when the analysis is redone without the patient in question

  – Patients with high leverage and a large residual will have greater influence.
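Redoing the analysis without one patient at a time can be sketched as a simple loop. This is an illustration of the idea only; it does not reproduce any particular influence statistic reported by a statistics package, and the names `fit_line` and `changes` are hypothetical.

```python
estriol = [7, 9, 9, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
           17, 17, 17, 18, 18, 19, 19, 20, 21, 22, 24, 24, 25, 25, 27]
bweight = [25, 25, 25, 27, 27, 30, 32, 34, 34, 35, 27, 24, 30, 31, 32, 35,
           30, 32, 36, 35, 37, 31, 34, 38, 30, 40, 28, 43, 32, 39, 34]


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


full_intercept, full_slope = fit_line(estriol, bweight)

# For each patient, refit without that patient and record how far the
# intercept and slope move from the full-data estimates.
changes = []
for i in range(len(estriol)):
    xs = estriol[:i] + estriol[i + 1:]
    ys = bweight[:i] + bweight[i + 1:]
    b0, b1 = fit_line(xs, ys)
    changes.append((b0 - full_intercept, b1 - full_slope))

# Patient (1-based) whose removal changes the slope the most.
most = max(range(len(changes)), key=lambda i: abs(changes[i][1]))
print(most + 1, estriol[most], bweight[most], changes[most])
```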

Page 36: Regression: Motivation

Diagnostics

• Standardized and studentized (or jackknife) residuals

  – Large values of these residuals for a patient indicate an outlier
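These scaled residuals can be sketched with the standard simple-regression formulas. Terminology differs between textbooks and packages, so the definitions used here are stated as assumptions: the leverage is h = 1/n + (x - mean)^2 / Sxx, the "standardized" residual is the raw residual divided by s·sqrt(1 - h), and the "jackknife" residual rescales it using the fit with that patient left out. All variable names are illustrative.

```python
import math

estriol = [7, 9, 9, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16,
           17, 17, 17, 18, 18, 19, 19, 20, 21, 22, 24, 24, 25, 25, 27]
bweight = [25, 25, 25, 27, 27, 30, 32, 34, 34, 35, 27, 24, 30, 31, 32, 35,
           30, 32, 36, 35, 37, 31, 34, 38, 30, 40, 28, 43, 32, 39, 34]

n = len(estriol)
x_bar = sum(estriol) / n
y_bar = sum(bweight) / n
sxx = sum((x - x_bar) ** 2 for x in estriol)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(estriol, bweight))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Raw residuals and the residual standard deviation (n - 2 df).
raw = [y - (intercept + slope * x) for x, y in zip(estriol, bweight)]
s = math.sqrt(sum(e ** 2 for e in raw) / (n - 2))

# Leverage of each patient: how far its estriol is from the mean estriol.
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in estriol]

# Residual scaled by its own estimated standard deviation (assumed definition).
standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(raw, leverage)]

# Jackknife version: equivalent to scaling with s estimated without patient i.
jackknife = [r * math.sqrt((n - 3) / (n - 2 - r ** 2)) for r in standardized]

worst = max(range(n), key=lambda i: abs(standardized[i]))
print(worst + 1, estriol[worst], bweight[worst],
      round(standardized[worst], 2), round(jackknife[worst], 2))
```

With these data the largest scaled residual belongs to observation 27 (estriol 24, birthweight 28), the point lying furthest below the regression line.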