PubHlth 640 Exam 1 – Spring 2015 (v3 2/24/2015) Name
________________________________________________
Z:\bigelow\...\2015\BE640 Exam 1 2015.doc Page 1 of 15
PubHlth 640 Intermediate Biostatistics
Spring 2015 Examination 1
Units 1 and 2 – Review of Introductory Biostatistics &
Regression and Correlation
Due: Monday March 2, 2015
Before you begin: This is a “take-home” exam. You are welcome to
use any reference materials you wish. You are welcome to use the
computer as you wish, too. However, you MUST work this exam by
yourself and you may not consult with anyone.

Instructions and Checklist:
__ 1. Start each problem on a new page.
__ 2. Write your name on every page.
__ 3. Make a photo-copy of your exam for safekeeping prior to
submission.
__ 4. Complete the signature page.
__ 5. Please DO NOT submit a copy of the exam questions!! I have
them….

How to submit your exam (sorry – Faxed exams are NOT permitted):
(1) ONLINE Students. Please be sure your name is somewhere on your
submission. Next, save it as a SINGLE FILE pdf using the naming
convention lastname_exam1.pdf. Email it to me at:
[email protected]
(2) Worcester Section. Please bring your exam (stapled please
please please) to class on Monday March 2, 2015. If you are unable
to come to class that day, I will accept a pdf (see instructions
for online students).
(3) Amherst Section. Please put your exam (stapled please please
please) in my mailbox, located in the mail room on the 4th floor of
Arnold House. If you are unable to come to Arnold House on Monday
the 2nd, I will accept a pdf (see instructions for online
students).
(4) ALL. I will also accept exams sent by U.S. Post. Please mail
with postmark no later than March 2, 2015 to:
Carol Bigelow
School of Public Health/402 Arnold House
University of Massachusetts/Amherst
715 North Pleasant Street
Amherst, MA 01003-9304
Tel. 413-545-1319
Signature
This is to confirm that in completing this exam, I worked
independently and did not consult with anyone.
Name: ___________________________________________________________
Date: ___________________________
Thank you!
1. (10 points, total) Two healthy parents have a child with a
severe autosomal recessive condition that cannot be identified by
prenatal diagnosis. They realize that the risk of this condition
for subsequent offspring is 0.25, but wish to embark on a second
pregnancy. During the early stages of the pregnancy, an ultrasound
test determines that there are twins.
1a. (3 points) Suppose the twins are identical (monozygotic).
What is the probability that both twins are affected? What is the
probability that exactly one twin is affected?
1b. (3 points) Suppose the twins are fraternal (dizygotic). What is
the probability that both twins are affected? What is the
probability that exactly one twin is affected?
1c. (4 points) Suppose the probability that the twins are identical
is 0.33 and, thus, the probability that the twins are fraternal is
0.67. What is the overall probability that both twins are affected?
What is the probability that exactly one twin is affected?
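One way to organize the reasoning in problem 1 is the law of total probability over zygosity. The sketch below is illustrative only; it assumes that identical twins always share affected status and that fraternal twins are independent, each with risk 0.25. The function name is hypothetical.

```python
# Sketch: law of total probability for the twin problem (problem 1).
p = 0.25  # risk of the recessive condition per conception (from the problem)

def both_and_exactly_one(p, identical):
    """Return (P(both affected), P(exactly one affected)) for a twin pair."""
    if identical:
        # Assumption: monozygotic twins share one genotype, so their
        # outcomes always coincide.
        return p, 0.0
    # Assumption: dizygotic twins are independent, each with risk p.
    return p * p, 2 * p * (1 - p)

mz_both, mz_one = both_and_exactly_one(p, identical=True)
dz_both, dz_one = both_and_exactly_one(p, identical=False)

p_mz, p_dz = 0.33, 0.67  # zygosity probabilities given in 1c

# Weight each conditional probability by the probability of its twin type.
overall_both = p_mz * mz_both + p_dz * dz_both
overall_one = p_mz * mz_one + p_dz * dz_one

print("identical:", mz_both, mz_one)
print("fraternal:", dz_both, dz_one)
print("overall:  ", overall_both, overall_one)
```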
2. (10 points, total) An outbreak of acute gastroenteritis
occurred at a nursing home in Baltimore, Maryland in December 1980.
A total of 46 out of 98 residents of the nursing home became ill.
People living in the nursing home shared rooms: 13 rooms contained
2 occupants, 4 rooms contained 3 occupants, and 15 rooms contained
4 occupants. One question that arises is whether or not a
clustering of disease occurred for persons living in the same room.
The following table shows the observed distribution of cases.
2a. (2 points) If the binomial distribution holds, what is the
probability of finding two cases of acute gastroenteritis in a room
with 2 occupants?
2b. (2 points) If the binomial distribution holds, what is the
probability of finding two or more cases of acute gastroenteritis
in a room with 3 occupants?
2c. (2 points) If the binomial distribution holds, what is the
probability of finding two or more cases of acute gastroenteritis in
a room with 4 occupants?
2d. (2 points) One useful measure of geographical clustering is
the number of rooms for which the number of cases of acute
gastroenteritis is two or more. If the binomial distribution holds,
what is the expected number of rooms with two or more cases of
acute gastroenteritis over the entire nursing home?
2e. (2 points) Finally, compare your answer to question #2d (the
expected number of rooms with 2 or more cases) with the observed
number of rooms with 2 or more cases. In 1-2 sentences, what is your
assessment of whether or not there is any evidence of clustering of
disease within rooms?
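The binomial calculations in problem 2 can be sketched as follows. This is a minimal illustration, assuming every resident has the same illness probability 46/98 independently of roommates (the "no clustering" model); the helper names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))

# Assumption: the overall attack rate is the per-resident probability.
p_ill = 46 / 98

# occupants per room -> number of such rooms (from the problem statement)
rooms = {2: 13, 3: 4, 4: 15}

# Probability that a room of each size has two or more cases.
p_two_plus = {n: prob_at_least(2, n, p_ill) for n in rooms}

# Expected number of rooms with two or more cases, summed over room sizes.
expected_rooms = sum(count * p_two_plus[n] for n, count in rooms.items())

for n, prob in p_two_plus.items():
    print(f"{n} occupants: P(2 or more cases) = {prob:.4f}")
print(f"expected rooms with 2 or more cases = {expected_rooms:.2f}")
```

The expected count can then be set beside the observed count from the table of cases.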
3. (10 points total) The distribution of serum levels of alpha
tocopherol (serum vitamin E) is approximately normal with mean µ =
860 µg/dL and standard deviation σ = 340 µg/dL.
3a. (3 points) What percent of people have serum alpha tocopherol
levels between 400 and 1000 µg/dL?
3b. (3 points) Suppose a person is identified as having toxic
levels of alpha tocopherol if his or her serum level is > 2000
µg/dL. What percentage of people will be so identified?
3c. (4 points) A study is undertaken for evidence of toxicity among
2000 people who regularly take vitamin-E supplements. The
investigators found that 4 people have serum alpha tocopherol
levels > 2000 µg/dL. Is this an unusual number of people with toxic
levels of serum alpha tocopherol?
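A possible way to check the normal-distribution calculations in problem 3 uses only the Python standard library. The sketch assumes, for 3c, that the 2000 supplement users are independent and follow the same normal distribution as the general population; that assumption is exactly what the question is probing.

```python
from math import comb
from statistics import NormalDist

# Serum alpha tocopherol, as given: mean 860, sd 340 (µg/dL).
serum = NormalDist(mu=860, sigma=340)

# 3a: proportion between 400 and 1000 µg/dL.
p_between = serum.cdf(1000) - serum.cdf(400)

# 3b: proportion above the toxicity threshold of 2000 µg/dL.
p_toxic = 1 - serum.cdf(2000)

# 3c: number above 2000 among n people, modeled as Binomial(n, p_toxic).
n = 2000
expected_toxic = n * p_toxic
p_at_least_4 = 1 - sum(
    comb(n, k) * p_toxic**k * (1 - p_toxic) ** (n - k) for k in range(4)
)

print(f"P(400 < X < 1000) = {p_between:.4f}")
print(f"P(X > 2000)       = {p_toxic:.5f}")
print(f"expected count    = {expected_toxic:.2f}")
print(f"P(count >= 4)     = {p_at_least_4:.4f}")
```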
4. (10 points total) Hypertensive patients are screened at a
neighborhood health clinic and are given methyl dopa, a strong
antihypertensive medication for their condition. They are asked to
come back 1 week later and have their blood pressures measured
again. Suppose the initial and follow-up systolic blood pressures
(SBP) of the patients are given in the table below.
Patient id   Initial SBP   Follow-up SBP
 1           200.0         188.0
 2           194.0         212.0
 3           236.0         186.0
 4           163.0         150.0
 5           240.0         200.0
 6           225.0         222.0
 7           203.0         190.0
 8           180.0         154.0
 9           177.0         180.0
10           240.0         225.0
To test the effectiveness of the drug, we want to measure the
difference (D = Initial – Follow up) between initial and follow-up
systolic blood pressures for each person.
4a. (2 points) What are the sample mean and sd of D?
4b. (2 points) What is the estimated standard error of the mean
difference?
4c. (3 points) Assume that D is normally distributed. Construct a
99% confidence interval for µD.
4d. (3 points) Using your answer to question #4c, what is your
opinion regarding the effectiveness of methyl dopa in reducing
systolic blood pressure?
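The paired-difference calculations in problem 4 could be sketched as below. The critical value 3.250 is the tabled 0.995 quantile of the t distribution with 9 degrees of freedom, hard-coded here so the example needs only the standard library; treat it as an assumption to verify against your own table or software.

```python
from math import sqrt
from statistics import mean, stdev

initial = [200.0, 194.0, 236.0, 163.0, 240.0, 225.0, 203.0, 180.0, 177.0, 240.0]
follow_up = [188.0, 212.0, 186.0, 150.0, 200.0, 222.0, 190.0, 154.0, 180.0, 225.0]

# D = Initial - Follow up, one difference per patient.
d = [a - b for a, b in zip(initial, follow_up)]
n = len(d)

d_bar = mean(d)        # sample mean of D
s_d = stdev(d)         # sample sd of D (n - 1 in the denominator)
se = s_d / sqrt(n)     # estimated standard error of the mean difference

# Assumption: tabled t quantile, 0.995 with n - 1 = 9 df.
T_995_DF9 = 3.250
ci_99 = (d_bar - T_995_DF9 * se, d_bar + T_995_DF9 * se)

print(f"mean D = {d_bar:.2f}, sd D = {s_d:.3f}, SE = {se:.3f}")
print(f"99% CI for the mean difference: ({ci_99[0]:.2f}, {ci_99[1]:.2f})")
```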
5. (10 points total) Suppose that, among 25-34 year old males in
the general population, the average daily intake of linoleic acid
is 15 g. As part of a dietary-instruction program, ten 25-34 year
old males adopted a vegetarian diet for one month. While on the
diet, their average daily intake of linoleic acid had sample mean
= 13 g and sample standard deviation = 4 g. Suppose we are
uncertain what effect a vegetarian diet will have on the level of
linoleic acid.
5a. (2 points) What are the null and alternative hypotheses in
this case?
5b. (2 points) Using the p-value method, carry out the appropriate
statistical hypothesis test to compare the mean level of linoleic
acid in the vegetarian population with that of the general
population.
5c. (2 points) Suppose the sample standard deviation, based on a
sample of n=20 subjects, is s=5. Using the critical region method,
test the null hypothesis HO: σ² = 16 versus HA: σ² ≠ 16.
5d. (2 points) Next, consider a sample of n = 20. Suppose the
sample standard deviation in this sample is s = 5. Use the p-value
method to test the null hypothesis HO: σ² = 16 versus HA: σ² ≠ 16.
5e. (2 points) Now go back to the sample size n = 10. Using the
summary statistics provided at the beginning of this question
(sample mean = 13 g and sample standard deviation = 4 g), compute a
90% confidence interval estimate of the true mean intake of
linoleic acid in the vegetarian population of 25-34 year old
males.
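The test statistics in problem 5 can be computed as in the sketch below. The exam does not state a significance level, so a two-sided 0.05 level is assumed here. The critical values are tabled quantiles hard-coded for illustration (t with 9 df, chi-square with 19 df); exact p-values would need a statistics package or a table.

```python
from math import sqrt

# --- 5b and 5e: one-sample t procedures for the mean (n = 10) ---
mu0, n, x_bar, s = 15.0, 10, 13.0, 4.0
se = s / sqrt(n)
t_stat = (x_bar - mu0) / se          # refer to t with n - 1 = 9 df

# Assumption: tabled t quantiles with 9 df.
T_975_DF9 = 2.262                    # two-sided test at the 0.05 level
T_950_DF9 = 1.833                    # 90% confidence interval
reject_mean = abs(t_stat) > T_975_DF9
ci_90 = (x_bar - T_950_DF9 * se, x_bar + T_950_DF9 * se)

# --- 5c: chi-square test for a variance (n = 20, s = 5) ---
n_var, s_var, sigma2_0 = 20, 5.0, 16.0
chi2_stat = (n_var - 1) * s_var**2 / sigma2_0   # refer to chi-square, 19 df

# Assumption: tabled chi-square quantiles with 19 df (0.025 and 0.975).
CHI2_025_DF19, CHI2_975_DF19 = 8.907, 32.852
reject_var = chi2_stat < CHI2_025_DF19 or chi2_stat > CHI2_975_DF19

print(f"t = {t_stat:.3f}, reject at 0.05: {reject_mean}")
print(f"90% CI for the mean: ({ci_90[0]:.2f}, {ci_90[1]:.2f})")
print(f"chi-square = {chi2_stat:.4f}, reject at 0.05: {reject_var}")
```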
6. (10 points total) In a study of crop losses due to air
pollution, plots of Blue Lake snap beans were grown in n = 12
open-top field chambers, which were fumigated with various
concentrations of sulfur dioxide (X), in ppm. After a month of
fumigation, the plants were harvested and the total yield (Y) of
bean pods, in kg, was recorded for each chamber. Some preliminary
calculations have been performed for you.
n = 12
x̄ = 0.12      sx = 0.11724
ȳ = 1.117     sy = 0.31175
rxy = −0.8506
SSQ (residual) = 0.2955
6a. (2 points) Calculate the linear regression of Y on X by
obtaining the values of the estimated slope ( β̂1 ) and intercept (
β̂0 ).
6b. (2 points) Produce the analysis of variance table by
completing the “?” entries in the table below.
Source       Sum of Squares   DF      Mean Square     F-Ratio      P
Regression   ?___________     ?____   ?____________   ?_________   ?_____
Residual     ?___________     ?____   ?____________
Total        ?_____________   ?___
6c. (2 points) Under the assumption that the linear model is
applicable, calculate a 95% confidence interval estimate of an
individual (single chamber) yield of beans exposed to x=0.24 ppm of
sulfur dioxide.
6d. (2 points) Under the assumption that the linear model is
applicable, calculate a 95% confidence interval estimate of the
mean yield of beans grown under conditions of exposure to x=0.24
ppm of sulfur dioxide.
6e. (2 points) What percent of the observed variability in yield is
explained by the fitted model?
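The simple linear regression quantities in problem 6 can be recovered from the summary statistics alone. The sketch below uses the standard identities slope = r·sy/sx, SST = (n−1)·sy², and Sxx = (n−1)·sx². It reads the two unlabeled summary values 0.12 and 1.117 as the sample means of X and Y, and hard-codes the tabled t quantile 2.228 (0.975, 10 df); both are assumptions.

```python
from math import sqrt

# Summary statistics from the problem (0.12 and 1.117 taken as sample means).
n = 12
x_bar, s_x = 0.12, 0.11724
y_bar, s_y = 1.117, 0.31175
r_xy = -0.8506
sse = 0.2955                       # residual sum of squares

# 6a: least-squares slope and intercept.
slope = r_xy * s_y / s_x
intercept = y_bar - slope * x_bar

# 6b: analysis of variance pieces.
sst = (n - 1) * s_y**2             # total sum of squares, corrected
ssr = sst - sse                    # regression sum of squares
mse = sse / (n - 2)                # residual mean square, n - 2 df
f_ratio = (ssr / 1) / mse          # refer to F with (1, n - 2) df

# 6e: proportion of variability explained.
r_squared = ssr / sst

# 6c, 6d: intervals at x0 = 0.24 ppm.
x0 = 0.24
sxx = (n - 1) * s_x**2
y_hat = intercept + slope * x0
se_mean = sqrt(mse * (1 / n + (x0 - x_bar) ** 2 / sxx))
se_indiv = sqrt(mse * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx))

T_975_DF10 = 2.228                 # assumption: tabled t quantile, 10 df
ci_mean = (y_hat - T_975_DF10 * se_mean, y_hat + T_975_DF10 * se_mean)
ci_indiv = (y_hat - T_975_DF10 * se_indiv, y_hat + T_975_DF10 * se_indiv)

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"SSR = {ssr:.4f}, SST = {sst:.4f}, F = {f_ratio:.2f}, R^2 = {r_squared:.4f}")
print(f"mean yield CI:       ({ci_mean[0]:.3f}, {ci_mean[1]:.3f})")
print(f"individual yield CI: ({ci_indiv[0]:.3f}, {ci_indiv[1]:.3f})")
```

The interval for an individual chamber is wider than the interval for the mean because it adds the variance of a single new observation.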
7. (10 points total) To assess physical conditioning in normal
individuals, it is useful to know how much energy they are capable
of expending. Since the process of expending energy requires
oxygen, one way to evaluate this is to look at the rate at which
they use oxygen at peak physical activity. To examine the peak
physical activity, tests have been designed where the individual
runs on a treadmill. At specified time intervals, the speed at
which the treadmill moves and the grade of the treadmill both
increase. The individual is then systematically run to maximum
physical capacity. The maximum capacity is determined by the
individual; the person stops when unable to go further. Because
physical conditioning is relative to the size of the individual,
such measures take into account body size. One of these is VO2 MAX
(ml/kg/min); this is computed by looking at the volume of oxygen
used per minute per kilogram of body weight. Consider the following
multiple predictor regression analysis of n=94 sedentary males with
treadmill tests. The dependent (outcome) variable is Y = VO2 MAX .
There are four predictors:
X1 = treadmill duration (seconds)
X2 = maximum heart rate (beats/minute)
X3 = height (centimeters)
X4 = weight (kilograms)
A partial display of the regression results is provided.
Coefficients Table
Constant or Predictor        β̂          SE(β̂)
X1 = treadmill duration      0.0510     0.00416
X2 = max heart rate          0.0191     0.0258
X3 = height                 -0.0320     0.0444
X4 = weight                  0.0089     0.0520
Constant (intercept)         2.89       11.17
Analysis of Variance Table
Source       Sum of Squares   DF      Mean Square     F-Ratio      P
Regression   4,314.69         ?____   ?____________   ?_________   ?_____
Residual     ?___________     ?____   ?____________
Total        5,245.31         ?____
7a. (2 points) Compute the t-statistic value for testing the
adjusted statistical significance of X1 = treadmill duration. What
is its achieved significance (the p-value)? Do we reject β1 = 0 at
the 10% significance level?
7b. (3 points) Fill in the missing values in the analysis of
variance table (-- corrected 2/17/2015 --).
Source       Sum of Squares   DF      Mean Square     F-Ratio      P
Regression   4,314.69         ?____   ?____________   ?_________   ?_____
Residual     ?___________     ?____   ?____________
Total        5,245.31         ?____
7c. (3 points) Next, test the overall significance of the fitted
model. In developing your answer, be sure to state the null and
alternative hypotheses. In 1-2 sentences, interpret your findings.
7d. (2 points) What is R²? In reporting your answer, give its
numerical value and, in 1 sentence, explain its meaning.
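The bookkeeping behind problem 7 follows from SST = SSR + SSE, with degrees of freedom k for regression and n − k − 1 for residual. A minimal sketch under those standard identities (p-values would require a t or F table, or a statistics package):

```python
# Values from the problem: n = 94 subjects, k = 4 predictors.
n, k = 94, 4
ssr, sst = 4314.69, 5245.31

# Analysis of variance table entries.
sse = sst - ssr
df_reg, df_res, df_tot = k, n - k - 1, n - 1
msr = ssr / df_reg
mse = sse / df_res
f_ratio = msr / mse            # refer to F with (df_reg, df_res) df

# Proportion of the variability in VO2 MAX explained by the model.
r_squared = ssr / sst

# 7a: t statistic for X1 = treadmill duration, adjusted for X2, X3, X4.
beta_x1, se_x1 = 0.0510, 0.00416
t_x1 = beta_x1 / se_x1         # refer to t with df_res df

print(f"SSE = {sse:.2f}, DF = ({df_reg}, {df_res}, {df_tot})")
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_ratio:.2f}")
print(f"R^2 = {r_squared:.4f}, t(X1) = {t_x1:.2f}")
```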
8. (10 points total) Consider a multiple linear regression to
evaluate some hypothesized associations with plasma lipid levels of
total cholesterol (Y), mg/dL, in a sample of 25 patients suffering
from hyperlipoproteinemia. Two predictors were considered:
X1 = weight (kg)
X2 = age (years)
Three models were fit. The table below shows the estimated
regression model and the residual sum of squares (SSE) for each
model. The total sum of squares, corrected is SSY = 145,377.04.
Model   Fitted line                          Sum of Squares Residual, SSE
1       Ŷ = 199.2975 + 1.622X1               135,145.3138
2       Ŷ = 102.5751 + 5.321X2               43,444.3743
3       Ŷ = 77.983 + 0.417X1 + 5.217X2       42,806.2254
8a. (3 points) For each model, what is the predicted cholesterol
level Ŷ for a 30-year old patient who weighs 70 kg? Next, suppose
the observed cholesterol for this patient is Y = 263 mg/dL. In 1-2
sentences, compare each of the 3 predicted values Ŷ with the
observed Y = 263 mg/dL.
8b. (3 points) For each model, what is the R² value?
8c. (4 points) If you use R² and model simplicity as your selection
criteria, what model appears to be the best predictive model?
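The predictions and R² values in problem 8 can be tabulated with a few lines of Python. This sketch uses R² = 1 − SSE/SSY, the usual definition for a model with an intercept; the dictionary layout is just one convenient arrangement.

```python
ssy = 145377.04                 # total sum of squares, corrected
weight, age = 70.0, 30.0        # patient in 8a: 70 kg, 30 years old
observed = 263.0                # observed cholesterol, mg/dL

# model number -> (fitted line as a function of weight and age, SSE)
models = {
    1: (lambda x1, x2: 199.2975 + 1.622 * x1, 135145.3138),
    2: (lambda x1, x2: 102.5751 + 5.321 * x2, 43444.3743),
    3: (lambda x1, x2: 77.983 + 0.417 * x1 + 5.217 * x2, 42806.2254),
}

pred = {m: fit(weight, age) for m, (fit, _) in models.items()}
r2 = {m: 1 - sse / ssy for m, (_, sse) in models.items()}

for m in models:
    error = observed - pred[m]
    print(f"model {m}: predicted = {pred[m]:.2f}, "
          f"observed - predicted = {error:.2f}, R^2 = {r2[m]:.4f}")
```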
9. (10 points total) A psychologist performed a multiple
predictor linear regression analysis of anxiety level (Y), measured
on a scale ranging from 1 to 50, as the average of an index
determined at three points in a 2-week period. Three predictors
were considered:
X1 = systolic blood pressure (mm Hg)
X2 = IQ
X3 = Job satisfaction, measured on a scale ranging 1 to 25.
The following table summarizes the results obtained from a
“variables-added-in-order” regression on data from a sample of size
n=22.
Source                DF    Sum of Squares
Regression
   X1                  1    981.326
   X2 | X1             1    190.232
   X3 | (X1, X2)       1    129.431
Residual, SSE         18    442.292
9a. (5 points) Test for the significance of each independent
variable as it enters the model. For each test, state the null and
alternative hypotheses in terms of regression coefficient
parameters.
9b. (5 points) Test for the significance of adding both X2 and X3
to a model already containing X1. State the null and alternative
hypotheses in terms of regression coefficient parameters.
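The partial F statistics in problem 9 can be sketched as below. One convention is assumed here: each "as it enters" test uses the residual mean square of the model containing only the variables entered so far. Some texts instead use the full-model mean square error throughout, which gives different values for the first two tests. The function name is hypothetical.

```python
def partial_f(ss_added, df_added, sse, df_res):
    """F = (extra regression SS / its df) / (residual SS / residual df)."""
    return (ss_added / df_added) / (sse / df_res)

# Sequential ("variables-added-in-order") sums of squares from the table.
ss_x1 = 981.326
ss_x2_given_x1 = 190.232
ss_x3_given_x1_x2 = 129.431
sse_full, df_full = 442.292, 18          # residual of the 3-predictor model

# Residuals of the smaller models: later terms fold back into the residual.
sse_x1_x2 = sse_full + ss_x3_given_x1_x2           # 19 df
sse_x1 = sse_x1_x2 + ss_x2_given_x1                # 20 df

# 9a: each variable as it enters (assumed convention, see lead-in).
f_x1 = partial_f(ss_x1, 1, sse_x1, df_full + 2)                  # F(1, 20)
f_x2 = partial_f(ss_x2_given_x1, 1, sse_x1_x2, df_full + 1)      # F(1, 19)
f_x3 = partial_f(ss_x3_given_x1_x2, 1, sse_full, df_full)        # F(1, 18)

# 9b: adding X2 and X3 together to a model already containing X1.
f_x2_x3 = partial_f(ss_x2_given_x1 + ss_x3_given_x1_x2, 2,
                    sse_full, df_full)                           # F(2, 18)

print(f"F for X1 = {f_x1:.3f}")
print(f"F for X2 | X1 = {f_x2:.3f}")
print(f"F for X3 | X1, X2 = {f_x3:.3f}")
print(f"F for X2, X3 | X1 = {f_x2_x3:.3f}")
```

Each statistic is then compared with the tabled F quantile for the degrees of freedom noted in the comments.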
10. (10 points total) A randomized controlled trial is performed
to test the effectiveness of a new medication in reducing the
duration of “heartburn” following meals. 120 subjects give informed
consent and participate in the trial. Prior to randomization, a
baseline (before medication) average time of heartburn in minutes
(X1) is recorded. Participants are then randomized. 60 subjects are
randomized to receive the new medication. The other 60 are
randomized to receive a placebo. X2 is the indicator (dummy)
variable, indicating receipt of the new medication. The response
variable of interest in this trial is the participant’s average
time of heartburn (Y) on the “post-treatment” occasion of
measurement. For your reference, the total sum of squares,
corrected is TSS = 17,851.7. Thus, this is a multiple predictor
regression setting with two predictors:
X1 = baseline average time of heartburn (minutes)
X2 = Indicator of randomization to the “active” treatment
10a. (1 point) State the definition of X2, the indicator of
randomization to the “active” treatment. State the definition of an
appropriate interaction variable, for the interaction of X1 and X2.
10b. (1 point) What is the linear model for the expected value of Y
= post-treatment average duration of heartburn if it is related to
treatment only, with no confounding and no modification by
baseline? In developing your answer, use terms such as E [ Y | X1,
X2] for the expected value of Y for given values of X1 and X2, β0
for the intercept term, etc.
10c. (1 point) What is the linear model for the expected value of Y
= post-treatment average duration of heartburn if it is related to
treatment, with confounding but no modification by baseline? In
developing your answer, use terms such as E [ Y | X1, X2] for the
expected value of Y for given values of X1 and X2, β0 for the
intercept term, etc.
10d. (1 point) What is the linear model for the expected value of Y
= post-treatment average duration of heartburn if it is related to
treatment, and differently so (modified by) depending on baseline?
In developing your answer, use terms such as E [ Y | X1, X2] for
the expected value of Y for given values of X1 and X2, β0 for the
intercept term, etc.
10e. (1 point) Complete the following “model 1” analysis of
variance table by filling in the “?_____”.
Source             DF        Sum of Squares   Mean Square   F        p-value
Regression X1      ?_____    ?______          ?_____        ?_____   ?_____
Residual, SSE      ?______   ?_____           73.43
Total, corrected   ?____     ?_______
10f. (1 point) Complete the following “model 2” analysis of
variance table by filling in the “?_____”.
Source             DF        Sum of Squares   Mean Square   F        p-value
Regression X2      ?_____    ?______          ?_____        ?_____   ?_____
Residual, SSE      ?______   ?_____           142.73
Total, corrected   ?____     ?_______
10g. (1 point) Complete the following “model 3” analysis of
variance table by filling in the “?_____”.
Source              DF        Sum of Squares   Mean Square   F        p-value
Regression X1, X2   ?_____    ?______          ?_____        ?_____   ?_____
Residual, SSE       ?______   ?_____           71.66
Total, corrected    ?____     ?_______
10h. (1 point) Complete the following “model 4” analysis of
variance table by filling in the “?_____”.
Source                     DF        Sum of Squares   Mean Square   F        p-value
Regression X1, X2, X1*X2   ?_____    ?______          ?_____        ?_____   ?_____
Residual, SSE              ?______   ?_____           72.28
Total, corrected           ?____     ?_______
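The four analysis of variance tables in 10e through 10h share one pattern: given the residual mean square, the sample size n = 120, and TSS = 17,851.7, the remaining entries follow from the degrees of freedom. The sketch below shows that pattern; the helper name and dictionary keys are hypothetical, and p-values are left to an F table or statistics package.

```python
def anova_from_mse(tss, n, k, mse):
    """Rebuild ANOVA entries from the residual mean square.

    tss: total sum of squares (corrected); n: sample size;
    k: number of predictors in the model; mse: residual mean square.
    """
    df_res = n - k - 1
    sse = mse * df_res          # residual sum of squares
    ssr = tss - sse             # regression sum of squares
    msr = ssr / k
    return {
        "df_reg": k, "df_res": df_res, "df_tot": n - 1,
        "ssr": ssr, "sse": sse, "msr": msr, "mse": mse,
        "f": msr / mse,         # refer to F with (k, df_res) df
    }

tss, n = 17851.7, 120

# model label -> (number of predictors, residual mean square from the table)
models = {
    "model 1 (X1)": (1, 73.43),
    "model 2 (X2)": (1, 142.73),
    "model 3 (X1, X2)": (2, 71.66),
    "model 4 (X1, X2, X1*X2)": (3, 72.28),
}

tables = {name: anova_from_mse(tss, n, k, mse)
          for name, (k, mse) in models.items()}

for name, t in tables.items():
    print(f"{name}: DF = ({t['df_reg']}, {t['df_res']}), "
          f"SSR = {t['ssr']:.2f}, SSE = {t['sse']:.2f}, F = {t['f']:.2f}")
```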
10i. (2 points) Your turn! Just give it a try. It’s only 2
points. Carry out the appropriate assessments of these fitted
models to assess the benefit of the experimental treatment. Is
there an overall benefit? Is it confounded by baseline? Is it
modified by baseline? Report your findings in a 1 paragraph report.
Here are the estimated betas and associated estimated standard
errors for your use.
                             Model 1   Model 2   Model 3   Model 4
X1 = baseline     β̂ =        0.81      -         0.89      0.89
                  sê(β̂) =    0.07      -         0.08      0.14
X2 = treatment    β̂ =        -         5.80      -3.49     -3.70
                  sê(β̂) =    -         2.18      1.76      5.91
X1 * X2           β̂ =        -         -         -         0.01
                  sê(β̂) =    -         -         -         0.17