-
BIOSTATS 640 Exam 1 – Spring 2016 Name
________________________________________________
Z:\bigelow\...\2016\...\BE640 Exam 1 2016.doc Page 1 of 16
BIOSTATS 640 Intermediate Biostatistics
Spring 2016 Examination 1
Units 1 and 2 – Review of Introductory Biostatistics &
Regression and Correlation
Due: Wednesday March 2, 2016

Before you begin: This is a “take-home” exam. You are welcome to
use any reference materials you wish. You are welcome to use the
computer as you wish, too. However, you MUST work this exam by
yourself and you may not consult with anyone.

Instructions and Checklist:
__ 1. Start each problem on a new page.
__ 2. Write your name on every page.
__ 3. Make a photo-copy of your exam for safekeeping prior to submission.
__ 4. Complete the signature page.
__ 5. Please DO NOT submit a copy of the exam questions!! I have them….

How to submit your exam (sorry – faxed exams are NOT permitted):

(1) ONLINE Students. Please be sure your name is somewhere on your
submission. Next, save it as a SINGLE FILE pdf using the naming
convention lastname_exam1.pdf. Email it to me at:
[email protected]

(2) Worcester Section. Please be sure your name is somewhere on your
submission. Next, save it as a SINGLE FILE pdf using the naming
convention lastname_exam1.pdf. Email it to me at:
[email protected]

(3) Amherst Section. Please put your exam (stapled please) in my
mailbox, located in the mail room on the 4th floor of Arnold House.
If you are unable to come to Arnold House on Wednesday March 2, 2016,
I will accept a pdf (see instructions for online students).

(4) ALL. I will also accept exams sent by U.S. Post. Please mail with
postmark no later than March 2, 2016 to:
Carol Bigelow
School of Public Health/402 Arnold House
University of Massachusetts/Amherst
715 North Pleasant Street
Amherst, MA 01003-9304
Tel. 413-545-1319
Signature

This is to confirm that in completing this exam, I worked
independently and did not consult with anyone.

Name: ___________________________________________________________
Date: ___________________________

Thank you!
1. (10 points, total) Diagnostic related groups (DRG’s) are used
in the payment for the health care of Medicare-funded patients. The
following are lengths of stay LOS (days) for 50 patients with a
specific DRG.
 1  2  3  5  6  8 13 18 26 43
 1  2  4  5  7  9 15 19 29 49
 2  2  4  5  7  9 15 19 31 52
 2  3  4  6  8 10 17 20 34 67
 2  3  5  6  8 12 17 23 36 96

1a. (1 point) Calculate the sample mean and the sample standard
deviation.
1b. (5 points) Calculate the five percentiles: minimum, first
quartile, median, third quartile, maximum.
1c. (2 points) By any means you like, construct a box plot graphical
summary.
1d. (2 points) In no more than 1-2 sentences, in your
assessment, is this distribution skewed? Explain.
2. (10 points, total)
2a. (2 points) If 25% of 11 year-old children have no decayed,
missing or filled (DMF) teeth, what is the probability that in a
random sample of 20 11 year-old children, there will be exactly 3
with no DMF teeth?
2b. (2 points) (Same setting as for question #2a). If 25% of 11
year-old children have no decayed, missing or filled (DMF) teeth,
what is the probability that in a random sample of 20 11 year-old
children, there will be fewer than 3 with no DMF teeth?
2c. (3 points) Suppose that, among all persons 17 years of age
and older, half the males and one third of the females are current
smokers. What is the probability that a random sample of 10 males
and 15 females includes exactly 4 male and 6 female smokers? You
may assume independence of the males and females.
2d. (3 points) (Same setting as for question #2c). Suppose that
among all persons 17 years of age and over, half the males and one
third of the females are current smokers. In a random sample of 10
males and 15 females, what is the probability that none smoke?
3. (10 points total)
For questions 3a and 3b: According to a recent Census Bureau report,
59% of Americans have private health insurance, 25% have government
health insurance (meaning: Medicare or Medicaid or military health
care) and 16% have no health insurance.
3a. (3 points) Estimate the probability that a randomly selected
American has health insurance.
3b. (3 points) Given that a randomly selected American is known to
have health insurance, estimate the probability that it is private.
For question 3c: In a jury trial, suppose the probability that the
defendant is convicted, given guilt, is 0.95, and the probability
that the defendant is acquitted, given innocence, is 0.95. Suppose
that 90% of all defendants truly are guilty.
3c. (4 points) Given that the defendant is convicted, what is
the probability that he or she was actually innocent?
4. (10 points total)
4a. (5 points) Suppose that the weight W of male patients
registered at a certain diet clinic is normally distributed with mean
µ = 190 and variance σ² = 100. For a random sample of n=25, find the
values “a” and “b” such that
$P[\, a \le \bar{W}_{n=25} \le b \,] = 0.80$
Recall: $\bar{W}_{n=25}$ is the sample mean of 25 observations of W.
4b. (5 points) A random sample of 32 persons attending a certain
diet clinic was found to have lost, over a three-week period, an
average of 30 pounds with a sample standard deviation of 11 pounds.
Calculate a 99% confidence interval estimate of the true mean
weight loss, over a three-week period, experienced by all persons
attending the clinic. You may assume that the distribution of
weight loss is normal.
5. (10 points total) An environmentalist is interested in
determining if the pH of the creek water behind his house is
affected by the new development upstream. He knows that a neutral
stream has a pH of 7. He draws 16 samples of water and measures for
each the pH. Is the pH of the creek statistically significantly
different from neutral? Following are the data:
Sample    1    2    3    4    5    6    7    8
pH      7.5  7.6  7.1  6.2  6.3  6.9  7.1  7.3

Sample    9   10   11   12   13   14   15   16
pH      6.3  6.6  7.1  7.1  6.3  6.9  6.7  6.9
5a. (5 points) Using the critical region approach with type I
error = 0.05, conduct a statistical significance test to evaluate
this claim. You may assume normality. In reporting your answer,
please provide
- The null and alternative hypothesis (1 point)
- The name of the test statistic used to develop the correct critical
  region (1 point)
- The values defining the critical region (1 point)
- The value of the test statistic (1 point)
- Interpretation of your findings (1 point)
5b. (5 points) What is the achieved level of significance (the
p-value)?
6. (20 points total) It has been suggested that, compared to
older children and adults, very young children have a higher
metabolism that gives them more energy. Suppose we wish to
investigate this hypothesis in a simple linear regression analysis.
The data are pulse rates (Y) and age (X) for a sample of n=22
children. The following statistics have been calculated for
you.
$\sum_{i=1}^{22} x_i = 233$
$\sum_{i=1}^{22} x_i^2 = 3345$
$\sum_{i=1}^{22} y_i = 1725$
$\sum_{i=1}^{22} y_i^2 = 140{,}933$
$\sum_{i=1}^{22} x_i y_i = 16{,}748$
n = 22
6a. (2 points) State the assumptions necessary for a simple
linear model relating pulse rate (Y) and age (X).
6b. (2 points) Calculate the least squares estimate of the slope and
intercept.
6c. (2 points) Complete the following analysis of variance table.

Source             DF     Sum of Squares   Mean Square   F-Ratio   p-value
Regression         ?___   ?___             ?___          ?___      ?___
Residual           ?___   ?___             ?___
Total, corrected   ?___   ?___
Tip! $\sum_{i=1}^{n} (y_i - \bar{y})^2 = \left( \sum_{i=1}^{n} y_i^2 \right) - n\,(\bar{y})^2$
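
For reference, the identity in the tip follows from expanding the square and using the fact that the observations sum to n times their mean. The short derivation below is a sketch of that algebra only; it uses no exam data.

```latex
\begin{aligned}
\sum_{i=1}^{n} (y_i - \bar{y})^2
  &= \sum_{i=1}^{n} y_i^2 \;-\; 2\,\bar{y} \sum_{i=1}^{n} y_i \;+\; n\,\bar{y}^2 \\
  &= \sum_{i=1}^{n} y_i^2 \;-\; 2\,n\,\bar{y}^2 \;+\; n\,\bar{y}^2
     \qquad \text{since } \sum_{i=1}^{n} y_i = n\,\bar{y} \\
  &= \sum_{i=1}^{n} y_i^2 \;-\; n\,\bar{y}^2
\end{aligned}
```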
6. (20 points total) - continued
6d. (2 points) Test the fitted model for statistical significance.
In reporting your answer, please include statements of: the null and
alternative hypotheses, the formula for the test statistic, the value
of the test statistic, the p-value and, most importantly, your
interpretation!
6e. (2 points) Using the answer you obtained in #6b, what is the
predicted pulse for an average 12 year-old child?
6f. (2 points) Consider your answer to #6e. What is the estimated
standard error of the estimated mean prediction you obtained?
6g. (2 points) Using the answer you obtained in #6b, what is the
predicted pulse for the individual John Smith who is 12 years old?
6h. (2 points) Consider your answer to #6g. What is the estimated
standard error of the estimated individual prediction you obtained?
6i. (2 points) How does John Smith compare with other children his
age if his actual pulse is 75?
6j. (2 points) Comment on the difference between the two predictions
(questions #6e and #6g) and the two standard errors (questions #6f
and #6h).
7. (10 points total) A multiple linear regression analysis of
n=19 cases of coronary artery disease investigated three predictors
in relationship to Y = VO2 max.
X1 = maximal ejection fraction
X2 = maximal heart rate
X3 = maximal systolic blood pressure
Preliminary descriptive statistics on the 19 values of Y = VO2 max
yielded a sample mean $\bar{Y}$ = 37.052 and s = 8.7017. Suppose
several multiple predictor models are fit and you are given the
following.

Predictors in the model   Sum of Squares Residual (due error)
X1, X2, X3                 790.76
X1, X2                     791.49
X1, X3                    1270.24
X2, X3                     814.16
X1                        1357.48
X2                         814.41
X3                        1281.19
7a. (2 points) Complete the following analysis of variance table
by completing the 10 cells with “?___”

Source                    DF     SSQ    MSQ    F-Ratio   R2
Regression {X1, X2, X3}   ?___   ?___   ?___   ?___      ?___
Residual                  ?___   ?___   ?___
Total, corrected          ?___   ?___

7b. (3 points) Complete the following analysis of variance table by
completing the 7 cells with “?___”

Source              DF     SSQ
Regression
  (X3)              1      ?___
  (X2 | X3)         1      ?___
  (X1 | X2, X3)     1      ?___
Residual            ?___   ?___
Total, corrected    ?___   ?___
7 - CONTINUED
7c. (5 points) Carry out the appropriate test to compare the
following two models
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + E$
versus
$Y = \beta_0 + \beta_3 X_3 + E$
In your answer, please indicate
7c (i). (1 point) The null and alternative hypotheses.
7c (ii). (1 point) The test statistic formula and its value for
these data.
7c (iii). (1 point) The achieved level of significance (p-value).
7c (iv). (2 points) Interpretation of your findings.
8. (20 points total) Low birth weight is of concern because of
its association with infant mortality and birth defects. A woman’s
behavior during pregnancy (including diet, smoking habits, prenatal
care) can affect the chances of carrying a baby to term and of
delivering a baby of normal birth weight. The following is a code
sheet of variables that were investigated in a multivariable
regression analysis of n=189 birth weight outcomes. In these
analyses the dependent variable is Y=BWT. The predictors of
interest are AGE, LWT, SMOKE, PTL, HT, UI, and FTV.
Variable   Variable Name and Coding
Y = BWT    Birth weight (grams)
AGE        Age of mother (years)
LWT        Weight at last menstrual period (pounds)
SMOKE      Indicator smoked during pregnancy (1=yes, 0=no)
PTL        Indicator history of premature labor (1=yes, 0=no)
HT         Indicator of hypertension (1=yes, 0=no)
UI         Indicator of uterine irritability (1=yes, 0=no)
FTV        Number of visits to doctor during first trimester
           (integer, 0, 1, 2, etc)

Selected calculations have been performed for you.

                     LWT       Y = BWT
N of Cases           189       189
Minimum              80.000    709.000
Maximum              250.00    4990.000
Mean                 129.15    2944.656
Standard deviation   30.579    729.022

DEP VAR: BWT   N: 189   MULTIPLE R: .420   SQUARED MULTIPLE R: .176
ADJUSTED SQUARED MULTIPLE R: .144

Variable   Coefficient   Standard Error   T        P (2 tail)
CONSTANT   2508.341      294.508          8.517    0.000
AGE        4.692         9.722            0.483    0.630
LWT        4.276         1.725            2.479    0.014
SMOKE      -226.379      102.517          -2.208   0.028
PTL        -72.254       105.526          -0.685   0.494
HT         -642.251      209.345          -3.068   0.002
UI         -526.027      143.902          -3.655   0.000
FTV        -7.987        48.118           -0.166   0.868
8a (5 points) Complete the following analysis of variance
table.

Source             Sum of Squares   DF      Mean Square     F-Ratio      P
Regression         ?___________     ?____   2,516,982.109   ?_________   ?_____
Residual           ?___________     ?____   ?____________
Total, corrected   ? ____________

Hint: Notice that the standard deviation of Y =BWT =
729.022.
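
The hint relies on the usual relationship between a sample standard deviation s and the total (corrected) sum of squares: total SS = (n - 1) * s^2. The sketch below checks that relationship on a small set of made-up values, which are an assumption for illustration and are not the exam data. The helper name `total_corrected_ss` is also hypothetical.

```python
import math
import statistics


def total_corrected_ss(n, sample_sd):
    """Recover the total (corrected) sum of squares from n and the sample SD."""
    return (n - 1) * sample_sd ** 2


# Hypothetical illustrative values, NOT the birth weight data from the exam.
values = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

# Direct calculation: sum of squared deviations about the sample mean.
mean = statistics.mean(values)
direct = sum((v - mean) ** 2 for v in values)

# Same quantity recovered from n and the sample standard deviation.
from_sd = total_corrected_ss(len(values), statistics.stdev(values))

print(direct, from_sd, math.isclose(direct, from_sd))
```

The same function can be applied to any reported n and standard deviation, which is why a single descriptive statistic is enough to fill in the "Total, corrected" row of an analysis of variance table.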
8b (5 points) What is the test statistic and p-value for the
test of the global hypothesis that the fit of the multiple linear
model explains a statistically significantly greater proportion of
the variability in BWT than is explained by the average BWT
alone?
8c (10 points) On the next page are the results of two
hierarchical multivariable models fit to the same data. The smaller
model is a simple linear regression model with LWT as the predictor
variable. The larger model is a 4 predictor model that contains LWT
plus SMOKE, HT, and UI as predictor variables. Carry out an
appropriate hypothesis test to determine which model should be
reported.
Model 1.
DEP VAR: BWT   N: 189   MULTIPLE R: .186   SQUARED MULTIPLE R: .035
ADJUSTED SQUARED MULTIPLE R: .029

Variable   Coefficient   Standard Error   T        P (2 tail)
CONSTANT   2369.672      228.431          10.374   0.000
LWT        4.429         1.713            2.586    0.010

Model 1.
Source       Sum of Squares   DF    Mean Square     F-Ratio   P
Regression   3,448,881.301    1     3,448,881.301   6.686     .010
Residual     .964682E+08      187   515872.574

Model 2.
DEP VAR: BWT   N: 189   MULTIPLE R: .416   SQUARED MULTIPLE R: .173
ADJUSTED SQUARED MULTIPLE R: .155

Variable   Coefficient   Standard Error   T        P (2 tail)
CONSTANT   2575.769      226.819          11.356   0.000
LWT        4.510         1.660            2.716    0.007
SMOKE      -240.081      100.139          -2.397   0.018
HT         -649.271      206.349          -3.145   0.002
UI         -548.924      139.440          -3.937   0.000

Model 2.
Source       Sum of Squares   DF    Mean Square     F-Ratio   P
Regression   .173312E+08      4     4,332,798.603   9.653     .000
Residual     .825859E+08      184   448,836.186
EXTRA CREDIT Up to 10 points, up to a maximum total exam score
of 100
Radial keratotomy is a type of surgery performed to reduce
myopia in nearsighted patients. The Prospective Evaluation of
Radial Keratotomy (PERK) study was initiated in 1983 with the goal
of investigating the effects of radial keratotomy. In one study the
outcome of interest was Y = 5-year post surgical change in
refractive error (diopters) in relationship to a hypothesized
predictor X1 = baseline refractive error. A sample of n=54 was
studied.

Now suppose we want to investigate whether the relationship of
Y = 5-year post surgical change in refractive error (diopters) to
X1 = baseline refractive error is different, depending on gender.
To address this, two new variables are created, Z and X1Z:
Z = 1 if patient is male
  = 0 if patient is female
X1Z = (Z) * (X1)
Recall from class what this kind of new variable does:
X1Z = (Z) * (X1) = (1) * X1 if patient is male
                 = 0 if patient is female

The following two models were fit and yielded the following output.

Model 1: Y regressed on X1 and Z
ŷ = 2.752647 - 0.309731*x1 - 0.412878*z

                   df   Sum of squares   Mean square   F       p-value
Model              2    15.30101         7.65207       6.009   0.0045
Error              51   64.94880         1.27351
Total, corrected   53   80.25294

Model 2: Y regressed on X1 and Z and X1Z
ŷ = 3.178210 - 0.201008*x1 - 1.995126*z - 0.383826*x1z

                   df   Sum of squares   Mean square   F       p-value
Model              3    19.65170         6.55057       5.405   0.0027
Error              50   60.60124         1.21202
Total, corrected   53   80.25294
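
As a reminder of how the product variable X1Z described on this page is built, here is a minimal sketch. The function name `interaction_terms` and the four data values are hypothetical and are not taken from the PERK study. Only the coding Z = 1 for male and Z = 0 for female comes from the text above.

```python
def interaction_terms(x1, z):
    """Return the product X1Z = Z * X1 for each patient."""
    return [xi * zi for xi, zi in zip(x1, z)]


# Hypothetical baseline refractive errors and gender indicators
# (Z = 1 if male, 0 if female), for illustration only.
x1 = [2.5, 4.0, 3.25, 5.0]
z = [1, 0, 1, 0]

x1z = interaction_terms(x1, z)

# X1Z equals X1 for males and 0 for females.
for xi, zi, prod in zip(x1, z, x1z):
    print(f"x1={xi}  z={zi}  x1z={prod}")
```

Because X1Z is zero for every female patient, any coefficient attached to it can only change the fitted line for males.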
(2 points) State a single multiple linear regression model that
defines straight-line models relating Y = 5-year post surgical
change in refractive error (diopters) to X1 = baseline refractive
error for both males and females. Be sure to define all terms.

(3 points) Using the output provided on the previous page, carry out
the appropriate statistical test to test whether the lines for
males and females coincide. In reporting your answer, be sure to
state the null and alternative hypotheses, show your work, and
interpret your results.

(3 points) Again using the output provided on the previous page,
carry out the appropriate statistical test to test whether the lines
for males and females are parallel. In reporting your answer, be sure
to state the null and alternative hypotheses, show your work, and
interpret your results.

(2 points) In 1-2 sentences at most, comment on the comparison of the
straight-line models relating Y = 5-year post surgical change in
refractive error (diopters) to X1 = baseline refractive error for
both males and females.