BIOSTATS 640 – Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users

sol_regression(1 of 2) STATA.docx Page 1 of 18

Unit 2 – Regression and Correlation

1 of 2 - Practice Problems Solutions – Stata Users

1. In this exercise, you will gain some practice doing a simple linear regression using a Stata data set called week02.dta. This data set has n=31 observations of boiling points (Y=boiling) and temperature (X=temp). Tip – In this exercise you will need to create a new variable newy = 100*log10(boiling).

. generate newy=100*log10(boiling)

Carry out an exploratory analysis to determine whether the relationship between temperature and boiling point is better represented using

(i) $Y = \beta_0 + \beta_1 X$, or

(ii) $100 \log_{10}(Y) = \beta_0' + \beta_1' X$

In developing your answer, try your hand at producing (a) Estimates of the regression line parameters (b) Analysis of variance tables (c) $R^2$ (d) Scatter plot with overlay of fitted line. Complete your answer with a one paragraph text that is an interpretation of your work. Take your time with this and have fun.


Tip! A really good practice is to maintain a log of your session. Here is how.

Step 1: From the main menu at upper left, click: FILE > LOG > BEGIN

Step 2: At the dialog box SAVE AS: Type in a file name of your choosing. Mine is “carol_log”. Note that I have NOT typed an extension. Stata does this for you. Do NOT click on SAVE just yet.

Step 3: At the dialog box WHERE: Click on the drop down menu and select a path of your choosing. Again, do NOT click on SAVE just yet.

Step 4: At the dialog box FILE FORMAT: Click on the drop down menu at right.

Step 5: From the drop down menu, choose STATA LOG. Now you can click on SAVE.
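The same log can also be started from the Command window instead of the menus. The following is a minimal sketch, assuming the example file name carol_log from Step 2 and assuming that Stata's current working directory is where you want the log saved.

```stata
* Open a plain-text log in the current working directory.
* carol_log is the example file name from Step 2; the option replace
* overwrites any existing log file of the same name.
log using carol_log, text replace

* ... the commands for your session go here ...

* Close the log when the session is finished
log close
```

The option text asks for a plain-text log, which is convenient if you plan to paste the log into a word processor afterwards.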


My Stata Session

Note – I imported my log into an MS Word document and, there, edited all my misspellings and errors.

Key: BLACK – commands. Comments begin with an asterisk and are in bold. Blue – output.

      name:  <unnamed>
       log:  /Users/carolbigelow/Desktop/hw_week02.log
  log type:  text

. *
. ***** Turn off screen pause of output
. set more off
. *
. ***** Read data that you have already downloaded into stata workspace
. ***** At top left, FILE > OPEN
. *
. ***** Preliminary) Use command codebook with option compact for a quick look at data
. codebook, compact

Variable   Obs Unique      Mean     Min     Max  Label
------------------------------------------------------
temp        31     29  191.7839   180.6   210.8
boiling     31     31  19.76958  12.267  29.211

. *
. ***** Preliminary) Use command generate to create newy = 100* log10(boiling)
. generate newy=100*log10(boiling)
. *
. ***** Simple Linear Regression
. *
. ** a-c) Use command regress yvariable xvariable for: fitted line, anova, and R-squared
. regress boiling temp

      Source |       SS       df       MS              Number of obs =      31
-------------+------------------------------           F(  1,    29) =  336.60
       Model |  450.558755     1  450.558755           Prob > F      =  0.0000
    Residual |  38.8187237    29  1.33857668           R-squared     =  0.9207
-------------+------------------------------           Adj R-squared =  0.9179
       Total |  489.377479    30  16.3125826           Root MSE      =   1.157

------------------------------------------------------------------------------
     boiling |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        temp |   .4437942   .0241895    18.35   0.000     .3943211    .4932674
       _cons |    -65.343   4.643815   -14.07   0.000    -74.84067   -55.84533
------------------------------------------------------------------------------
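The quantities asked for in parts (a) to (c) can also be pulled from Stata's stored results rather than read off the printed table. This is a brief sketch, assuming week02.dta is in memory; the by-hand ratio simply uses the sums of squares printed in the anova table above.

```stata
* Refit quietly so that the stored results refer to this model
quietly regress boiling temp

* R-squared, overall F statistic and root MSE as stored by regress
display "R-squared = " e(r2)
display "F         = " e(F)
display "Root MSE  = " e(rmse)

* R-squared by hand from the anova table: Model SS / Total SS
display 450.558755/489.377479

* Fitted slope and intercept
display "slope     = " _b[temp]
display "intercept = " _b[_cons]
```

The by-hand ratio should agree with the reported R-squared of 0.9207 after rounding.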


. *

. ** d) Use command graph twoway for scatter plot with overlay of fitted line

. ** No frills graph

. graph twoway (scatter boiling temp) (lfit boiling temp)

. ** Graph with aesthetics

. graph twoway (scatter boiling temp, symbol(d)) (lfit boiling temp), title("Simple Linear Regression") subtitle("Y=boiling on X=temp") caption("boiling.png", size(vsmall))

Tip – I used the option caption( ) so that the name of my saved file (which I am about to save) will appear on the graph itself. Handy!


Tip! Save your graph as a “.png” image so that you can “paste” it elsewhere. Here is how.

Step 1: Create your graph (see above).

Step 2: With the graph window still active, click on the SAVE icon.

Step 3: From the drop down menu for File format: Choose PORTABLE NETWORK GRAPHICS (*.png)
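If you prefer typing to clicking, the graph can also be saved from the Command window. This is a small sketch, assuming the file name boiling.png used in the caption above and assuming the file should go into the current working directory.

```stata
* Redraw the graph so that it is the active graph
graph twoway (scatter boiling temp) (lfit boiling temp)

* Write the active graph to a PNG file; the option replace
* overwrites an existing file of the same name
graph export "boiling.png", replace
```

Stata infers the output format from the file extension, so no separate format option is needed here.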


. *

. ***** Simple Linear Regression of Y=100*log10(boiling) on X=temp

. *

. ** a-c) Use command regress yvariable xvariable for: fitted line, anova, and R-squared
. regress newy temp

      Source |       SS       df       MS              Number of obs =      31
-------------+------------------------------           F(  1,    29) =  223.69
       Model |  1962.25312     1  1962.25312           Prob > F      =  0.0000
    Residual |  254.398153    29   8.7723501           R-squared     =  0.8852
-------------+------------------------------           Adj R-squared =  0.8813
       Total |  2216.65127    30  73.8883757           Root MSE      =  2.9618

------------------------------------------------------------------------------
        newy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        temp |   .9261545   .0619247    14.96   0.000     .7995043    1.052805
       _cons |  -48.85826   11.88807    -4.11   0.000    -73.17209   -24.54443
------------------------------------------------------------------------------

. *
. ** d) Scatter plot with overlay of fitted line using the command graph twoway
. ** No frills graph
. graph twoway (scatter newy temp) (lfit newy temp)


. ** Graph with aesthetics

. graph twoway (scatter newy temp, symbol(d)) (lfit newy temp), title("Simple Linear Regression") subtitle("Y=100*log10(boiling) on X=temp") caption("newy.png", size(vsmall))


Putting it all together …

(a) Estimates of Regression Line Parameters

i. Y = -65.34 + 0.44*X
ii. 100 log10(Y) = -48.86 + 0.93*X

Table 1. Parameter estimates for dependent variable Y = boiling (Boiling Point)

Variable   Label        DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept  Intercept     1           -65.34300         4.64382   -14.07    <.0001
Temp       Temperature   1             0.44379         0.02419    18.35    <.0001

Table 2. Parameter estimates for dependent variable Y = newy (100 log10 (Boiling Point))

Variable   Label        DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept  Intercept     1           -48.85830        11.88807    -4.11    0.0003
Temp       Temperature   1             0.92615         0.06192    14.96    <.0001

(b) Analysis of Variance Tables

Y = boiling (Boiling Point)

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             1       450.55876    450.55876   336.60  <.0001
Error            29        38.81874      1.33858
Corrected Total  30       489.37750

Y = newy (100 log10 (Boiling Point))

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             1      1962.25366   1962.25366   223.69  <.0001
Error            29       254.39829      8.77235
Corrected Total  30      2216.65195


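(c) $R^2$

Y = boiling (Boiling Point): $R^2$ = 0.9207
Y = newy (100 log10 (Boiling Point)): $R^2$ = 0.8852

(d) Scatterplot with Overlay of Fitted Line

[Figure: Y = boiling (Boiling Point) versus X = temp, with fitted line]

[Figure: Y = newy (100 log10 (Boiling Point)) versus X = temp, with fitted line]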

Solution (one paragraph of text that is interpretation of analysis): Did you notice that the scatter plot of these data reveals two outlying values? Their inclusion may or may not be appropriate. If all n=31 data points are included in the analysis, then the model that explains more of the variability in boiling point is Y=boiling point modeled linearly in X=temperature. It has a greater $R^2$ (92% v 89%). Be careful: it would not make sense to compare the residual mean squares of the two models, because the scales of measurement involved are different.
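One way to look more closely at the outlying values mentioned above is to examine the residuals from the fitted line. The sketch below is only an illustration: the variable name resid1 is made up, and the cutoff of 2 root-MSEs is a common rule of thumb rather than anything prescribed by the exercise.

```stata
* Refit the untransformed model
quietly regress boiling temp

* Residual-versus-fitted plot with a horizontal reference line at zero
rvfplot, yline(0)

* Save the residuals in a new variable (resid1 is an arbitrary name)
predict resid1, residuals

* List observations whose residual exceeds 2 root-MSEs in absolute value
list temp boiling resid1 if abs(resid1) > 2*e(rmse)
```

Repeating the same steps after regress newy temp lets you check whether the same observations stand out on the transformed scale.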


2. A psychiatrist wants to know whether the level of pathology (Y) in psychotic patients 6 months after treatment could be predicted with reasonable accuracy from knowledge of pretreatment symptom ratings of thinking disturbance (X1) and hostile suspiciousness (X2).

(a) The least squares estimation equation involving both independent variables is given by Y = -0.628 + 23.639(X1) – 7.147(X2) Using this equation, determine the predicted level of pathology (Y) for a patient with pretreatment scores of 2.80 on thinking disturbance and 7.0 on hostile suspiciousness. How does the predicted value obtained compare with the actual value of 25 observed for this patient?

$\hat{Y} = -0.628 + 23.639 X_1 - 7.147 X_2$ with $X_1 = 2.80$ and $X_2 = 7.0$ ⇒ $\hat{Y} = 15.5322$. This value is lower than the observed value of 25.
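Stata's display command works as a calculator, so the prediction can be checked directly. This sketch just plugs the given coefficients and scores into the fitted equation; no data set is needed.

```stata
* Predicted level of pathology at X1 = 2.80 and X2 = 7.0
display -0.628 + 23.639*2.80 - 7.147*7.0

* Observed minus predicted (the residual for this patient)
display 25 - (-0.628 + 23.639*2.80 - 7.147*7.0)
```

The first line should reproduce 15.5322, and the second shows that the observed value exceeds the prediction by about 9.47.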

(b) Using the analysis of variance tables below, carry out the overall regression F tests for models containing both X1 and X2, X1 alone, and X2 alone.

Source              DF   Sum of Squares
Regression on X1     1             1546
Residual            51            12246

Source              DF   Sum of Squares
Regression on X2     1              160
Residual            51            13632

Source                  DF   Sum of Squares
Regression on X1, X2     2             2784
Residual                50            11008

Model Containing X1 and X2

$$F = \frac{(\text{SSQ regression on } X_1 \text{ and } X_2)/(\text{regression df})}{(\text{SSQ residual})/(\text{residual df})} = \frac{2{,}784/2}{11{,}008/50} = 6.3227$$

on DF = 2, 50; p-value = 0.00356 → Application of the null hypothesis model has led to an extremely unlikely result (p-value = .00356), prompting statistical rejection of the null hypothesis. The fitted linear model in X1 and X2 explains statistically significantly more of the variability in level of pathology (Y) than is explained by $\bar{Y}$ (the intercept model) alone.

Model Containing X1 ALONE

$$F = \frac{(\text{SSQ regression on } X_1)/(\text{regression df})}{(\text{SSQ residual})/(\text{residual df})} = \frac{1{,}546/1}{12{,}246/51} = 6.4385$$

on DF = 1, 51; p-value = 0.01427. Here, too, application of the null hypothesis model has led to an extremely unlikely result (p-value = .014), prompting statistical rejection of the null hypothesis. The fitted linear model in X1 explains statistically significantly more of the variability in level of pathology (Y) than is explained by $\bar{Y}$ (the intercept model) alone.

Model Containing X2 ALONE

$$F = \frac{(\text{SSQ regression on } X_2)/(\text{regression df})}{(\text{SSQ residual})/(\text{residual df})} = \frac{160/1}{13{,}632/51} = 0.5986$$

on DF = 1, 51; p-value = 0.44268. Here, application of the null hypothesis model has not led to an extremely unlikely result (p-value = .44). The null hypothesis is therefore not rejected. The fitted linear model in X2 does not explain statistically significantly more of the variability in level of pathology (Y) than is explained by $\bar{Y}$ (the intercept model) alone.
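The three overall F statistics and their p-values can be checked with display and Stata's Ftail() function, which returns the upper-tail probability of the F distribution. This is a sketch that uses only the sums of squares and degrees of freedom from the anova tables in part (b).

```stata
* Model containing X1 and X2: F on DF = 2, 50
display (2784/2)/(11008/50)
display Ftail(2, 50, (2784/2)/(11008/50))

* Model containing X1 alone: F on DF = 1, 51
display (1546/1)/(12246/51)
display Ftail(1, 51, (1546/1)/(12246/51))

* Model containing X2 alone: F on DF = 1, 51
display (160/1)/(13632/51)
display Ftail(1, 51, (160/1)/(13632/51))
```

Ftail() takes the numerator degrees of freedom first, then the denominator degrees of freedom, then the F statistic.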

(c) Based on your results in part (b), how would you rate the importance of the two variables in predicting Y?

$X_1$ explains a significant proportion of the variability in Y when modelled as a linear predictor. $X_2$ does not. (However, we don't know if a different functional form might have been important.)


(d) What are the $R^2$ values for the three regressions referred to in part (b)?

Total SSQ = (Regression SSQ) + (Residual SSQ) is constant across the three models. Therefore total SSQ can be calculated from just one anova table: Total SSQ = 1,546 + 12,246 = 13,792

$$R^2(X_1 \text{ only}) = \frac{\text{Regression SSQ}}{\text{Total SSQ}} = \frac{1{,}546}{13{,}792} = 0.1121$$

$$R^2(X_2 \text{ only}) = \frac{160}{13{,}792} = 0.0116$$

$$R^2(X_1 \text{ and } X_2) = \frac{2{,}784}{13{,}792} = 0.2019$$
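These ratios are easy to confirm with display. The sketch below uses only the sums of squares from the anova tables in part (b).

```stata
* Total SSQ from the X1-only table: regression SSQ + residual SSQ
display 1546 + 12246

* R-squared = regression SSQ / total SSQ, for each of the three models
display "X1 only:   " 1546/13792
display "X2 only:   " 160/13792
display "X1 and X2: " 2784/13792
```

As a check on the claim that total SSQ is constant, the other two tables give the same total: 160 + 13632 and 2784 + 11008 both equal 13792.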

(e) What is the best model involving either one or both of the two independent variables?

Eliminate from consideration the model with $X_2$ only. Compare the model with $X_1$ alone versus the model with $X_1$ and $X_2$ using a partial F test.

$$\text{Partial } F = \frac{\{(\text{SSQ regression on } X_1, X_2) - (\text{SSQ regression on } X_1)\}/\Delta\text{DF}}{(\text{SSQ residual for model with } X_1, X_2)/(\text{residual DF})} = \frac{(2{,}784 - 1{,}546)/1}{11{,}008/50} = 5.6232$$

on DF = 1, 50; p-value = 0.02162. Addition of X2 to the model containing X1 is statistically significant (p-value = .02). → The more appropriate model includes X1 and X2.
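The partial F computation can be reproduced with a scalar and Ftail(). This is a sketch using the anova table entries from part (b); the scalar name partialF is just a label chosen for illustration.

```stata
* Numerator: gain in regression SSQ from adding X2, per added df
* Denominator: residual mean square of the larger model
scalar partialF = ((2784 - 1546)/1)/(11008/50)
display partialF

* Upper-tail p-value on DF = 1, 50
display Ftail(1, 50, partialF)
```

The statistic comes out at approximately 5.62, with a p-value of roughly 0.02.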


3. In an experiment to describe the toxic action of a certain chemical on silkworm larvae, the relationship of log10(dose) and log10(larva weight) to log10(survival) was sought. The data, obtained by feeding each larva a precisely measured dose of the chemical in an aqueous solution and then recording the survival time (i.e., time until death), are given in the table. Also given are relevant computer results and the analysis of variance tables.

Larva   Y = log10(survival time)   X1 = log10(dose)   X2 = log10(weight)
  1              2.836                   0.150               0.425
  2              2.966                   0.214               0.439
  3              2.687                   0.487               0.301
  4              2.679                   0.509               0.325
  5              2.827                   0.570               0.371
  6              2.442                   0.593               0.093
  7              2.421                   0.640               0.140
  8              2.602                   0.781               0.406
  9              2.556                   0.739               0.364
 10              2.441                   0.832               0.156
 11              2.420                   0.865               0.247
 12              2.439                   0.904               0.278
 13              2.385                   0.942               0.141
 14              2.452                   1.090               0.289
 15              2.351                   1.194               0.193

Y = 2.952 – 0.550 (X1)
Y = 2.187 + 1.370 (X2)
Y = 2.593 – 0.381 (X1) + 0.871 (X2)

Source              DF   Sum of Squares
Regression on X1     1           0.3633
Residual            13           0.1480

Source              DF   Sum of Squares
Regression on X2     1           0.3367
Residual            13           0.1746

Source                  DF   Sum of Squares
Regression on X1, X2     2           0.4642
Residual                12           0.0471
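If you would like to work with these data in Stata without a saved data set, the 15 observations from the problem statement can be typed in with the input command. This is a sketch only: the variable names id, y, x1 and x2 are chosen to match the ones that appear in the Stata session for problem #4, and the variable labels are optional.

```stata
* Start with an empty workspace, then type in the 15 larvae
clear
input id y x1 x2
 1 2.836 0.150 0.425
 2 2.966 0.214 0.439
 3 2.687 0.487 0.301
 4 2.679 0.509 0.325
 5 2.827 0.570 0.371
 6 2.442 0.593 0.093
 7 2.421 0.640 0.140
 8 2.602 0.781 0.406
 9 2.556 0.739 0.364
10 2.441 0.832 0.156
11 2.420 0.865 0.247
12 2.439 0.904 0.278
13 2.385 0.942 0.141
14 2.452 1.090 0.289
15 2.351 1.194 0.193
end

* Optional labels
label variable y  "log10(survival)"
label variable x1 "log10(dose)"
label variable x2 "log10(weight)"

* Quick check that all 15 rows were read
codebook, compact
```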

(a) Test for the significance of the overall regression involving both independent variables X1 and X2.

X1 and X2

$$F = \frac{(\text{SSQ regression on } X_1 \text{ and } X_2)/2}{(\text{SSQ residual})/12} = \frac{0.4642/2}{0.0471/12} = 59.18 \quad \text{on DF} = 2, 12$$

p-value < 0.0001

Application of the null hypothesis model has led to an extremely unlikely result (p-value < .0001), prompting statistical rejection of the null hypothesis. The fitted linear model in X1 and X2 explains statistically significantly more of the variability in log10(survival time) (Y) than is explained by $\bar{Y}$ (the intercept model) alone.

(b) Test to see whether using X1 alone significantly helps in predicting survival time.

X1 alone

$$F = \frac{(\text{SSQ regression on } X_1)/1}{(\text{SSQ residual})/13} = \frac{0.3633/1}{0.1480/13} = 31.9115 \quad \text{on DF} = 1, 13$$

p-value = 0.00008

Application of the null hypothesis model has led to an extremely unlikely result (p-value = .00008), prompting statistical rejection of the null hypothesis. The fitted linear model in X1 explains statistically significantly more of the variability in log10(survival time) (Y) than is explained by $\bar{Y}$ (the intercept model) alone.

(c) Test to see whether using X2 alone significantly helps in predicting survival time.

X2 alone

$$F = \frac{(\text{SSQ regression on } X_2)/1}{(\text{SSQ residual})/13} = \frac{0.3367/1}{0.1746/13} = 25.07 \quad \text{on DF} = 1, 13$$

p-value = 0.00027

Application of the null hypothesis model has led to an extremely unlikely result (p-value = .00027), prompting statistical rejection of the null hypothesis. The fitted linear model in X2 explains statistically significantly more of the variability in log10(survival time) (Y) than is explained by $\bar{Y}$ (the intercept model) alone.
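The F tests in parts (a) to (c) can be checked with display and Ftail(). This sketch uses the rounded sums of squares printed in the anova tables, so the results may differ from the reported values in the last digit or two.

```stata
* (a) X1 and X2: F on DF = 2, 12
display (0.4642/2)/(0.0471/12)
display Ftail(2, 12, (0.4642/2)/(0.0471/12))

* (b) X1 alone: F on DF = 1, 13
display (0.3633/1)/(0.1480/13)
display Ftail(1, 13, (0.3633/1)/(0.1480/13))

* (c) X2 alone: F on DF = 1, 13
display (0.3367/1)/(0.1746/13)
display Ftail(1, 13, (0.3367/1)/(0.1746/13))
```

For part (a) the rounded inputs give roughly 59.13 rather than 59.18; the latter is what Stata reports when it works from the unrounded sums of squares.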


(d) Compute $R^2$ for each of the three models.

Total SSQ = 0.5113

$$R^2(X_1 \text{ and } X_2) = \frac{0.4642}{0.5113} = 0.9079$$

$$R^2(X_1 \text{ alone}) = \frac{0.3633}{0.5113} = 0.7105$$

$$R^2(X_2 \text{ alone}) = \frac{0.3367}{0.5113} = 0.6585$$

(e) Which independent predictor do you consider to be the best single predictor of survival time?

Using just the criteria of the overall F test and comparison of $R^2$, the single predictor model containing $X_1$ is better.

(f) Which model involving one or both of the independent predictors do you prefer and why?

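Partial F for comparing the model with $X_1$ alone versus the model with $X_1$ and $X_2$:

$$\text{Partial } F = \frac{(\Delta\,\text{regression SSQ})/(\Delta\,\text{regression DF})}{(\text{residual SSQ, model with } X_1, X_2)/(\text{residual DF, model with } X_1, X_2)} = \frac{(0.4642 - 0.3633)/(2 - 1)}{0.0471/12} = 25.707$$

on DF = 1, 12; p-value = 0.0003. Choose the model with both X1 and X2.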
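As before, the partial F statistic and its p-value can be verified with a scalar and Ftail(). This is a sketch based on the rounded anova table entries; the scalar name partialF is an arbitrary label.

```stata
* Gain in regression SSQ from adding X2, per added df,
* divided by the residual mean square of the model with X1 and X2
scalar partialF = ((0.4642 - 0.3633)/(2 - 1))/(0.0471/12)
display partialF

* Upper-tail p-value on DF = 1, 12
display Ftail(1, 12, partialF)
```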


4. Using Stata, try your hand at reproducing the analysis of variance tables you worked with in problem #3.

. * The following assumes you: a) downloaded larvae.dta; and b) opened it
. * Stata commands describe and notes to get information on data set
. * Tip – You need to issue the command clear before you load larvae.dta
. describe

Contains data from /Users/carolbigelow/Desktop/larvae.dta
  obs:            15                          PubHlth 640 Unit 2 Regression - Larvae data
 vars:             4                          10 Feb 2013 16:36
 size:           240                          (_dta has notes)
-----------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------------------
id              float  %9.0g                  larva id
y               float  %9.0g                  log10(survival)
x1              float  %9.0g                  log10(dose)
x2              float  %9.0g                  log10(weight)
-----------------------------------------------------------------------------------------
Sorted by:

. notes

_dta:
  1.  "Week 3 homework assignment exercises 2 and 3"

. ***** 2) Fit model containing x1 alone using command regress yvariable xvariable
. regress y x1

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  1,    13) =   31.91
       Model |  .363274049     1  .363274049           Prob > F      =  0.0001
    Residual |  .147992891    13  .011384069           R-squared     =  0.7105
-------------+------------------------------           Adj R-squared =  0.6883
       Total |   .51126694    14  .036519067           Root MSE      =   .1067

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.5498559   .0973376    -5.65   0.000    -.7601409   -.3395709
       _cons |   2.952199    .073555    40.14   0.000     2.793293    3.111105
------------------------------------------------------------------------------


. ***** 2) Model containing x2 alone
. regress y x2

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  1,    13) =   25.07
       Model |  .336674125     1  .336674125           Prob > F      =  0.0002
    Residual |  .174592816    13  .013430217           R-squared     =  0.6585
-------------+------------------------------           Adj R-squared =  0.6322
       Total |   .51126694    14  .036519067           Root MSE      =  .11589

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |   1.375579   .2747401     5.01   0.000     .7820388    1.969119
       _cons |   2.184706   .0819958    26.64   0.000     2.007565    2.361847
------------------------------------------------------------------------------

. ***** 2) Model containing both x1 and x2
. regress y x1 x2

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  2,    12) =   59.18
       Model |   .46420206     2   .23210103           Prob > F      =  0.0000
    Residual |   .04706488    12  .003922073           R-squared     =  0.9079
-------------+------------------------------           Adj R-squared =  0.8926
       Total |   .51126694    14  .036519067           Root MSE      =  .06263

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.3784805   .0663741    -5.70   0.000    -.5230972   -.2338637
          x2 |   .8749749   .1724835     5.07   0.000     .4991656    1.250784
       _cons |   2.588996   .0836079    30.97   0.000      2.40683    2.771162
------------------------------------------------------------------------------
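The partial F test from problem #3(f) can also be obtained directly in Stata with the postestimation command test, instead of by hand from the anova tables. This is a sketch, assuming larvae.dta is still in memory.

```stata
* Fit the model with both predictors
quietly regress y x1 x2

* Test that the coefficient on x2 is zero, given that x1 is in the model;
* with a single added predictor this is the partial F test
test x2

* The same statistic is the square of the t statistic for x2
display (_b[x2]/_se[x2])^2
```

With one added predictor, the partial F equals the squared t statistic for that predictor, which is why the value of about 25.7 matches the square of the t of 5.07 shown in the table above.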