Stat 401XV Final Exam Spring 2017

I have neither given nor received unauthorized assistance on this exam.

________________________________________________________
Name Signed                                          Date

_________________________________________________________
Name Printed

ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will receive NO partial credit. Correct numerical answers to difficult questions unaccompanied by supporting reasoning may not receive full credit. SHOW YOUR WORK/EXPLAIN YOURSELF!
1. (6 pts) A so-called "k out of n system" will function provided at least k of its n components function. Consider a "4 out of 5 system" with independent components that each have reliability (probability of functioning) p. I need to know how large p must be in order to have overall system reliability (probability of functioning) .99. Set up an equation you could solve in order to find this p for me.

2. Customers arrive at a service counter with inter-arrival times (times between consecutive arrivals) modeled as independent exponential random variables with mean 1 min.

a) (4 pts) Under this model, what fraction of inter-arrival times are less than .5 min?

b) (7 pts) Under this model, approximate the probability that fewer than 50 customers arrive in a particular 60 minute period. (Hint: This is the probability that the sum of 50 inter-arrival times is larger than 60.)
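The set-ups for problems 1 and 2 can be checked numerically. The following is a minimal Python sketch, not part of the exam: the function names are illustrative, the bisection solver is just one convenient way to solve the reliability equation, and the normal (central limit theorem) approximation is assumed to be what the hint in 2 b) has in mind.

```python
from math import comb, exp, sqrt
from statistics import NormalDist


def k_out_of_n_reliability(k: int, n: int, p: float) -> float:
    """P(at least k of n independent components function), each with reliability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def solve_component_reliability(k: int, n: int, target: float, tol: float = 1e-10) -> float:
    """Bisection for the p whose system reliability equals target.

    Bisection applies because system reliability is increasing in p on [0, 1].
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if k_out_of_n_reliability(k, n, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exponential_cdf(t: float, mean: float) -> float:
    """P(T < t) for an exponential random variable T with the given mean."""
    return 1 - exp(-t / mean)


def prob_sum_exceeds(total: float, count: int, mean: float) -> float:
    """Normal approximation to P(sum of `count` iid exponential(mean) times > total)."""
    mu = count * mean            # mean of the sum
    sigma = sqrt(count) * mean   # sd of the sum (an exponential's sd equals its mean)
    return 1 - NormalDist(mu, sigma).cdf(total)


p_needed = solve_component_reliability(4, 5, 0.99)    # problem 1
frac_short = exponential_cdf(0.5, 1.0)                # problem 2 a)
prob_fewer_than_50 = prob_sum_exceeds(60.0, 50, 1.0)  # problem 2 b)
print(p_needed, frac_short, prob_fewer_than_50)
```

The equation being solved in problem 1 is 5p^4(1 - p) + p^5 = .99, which is what `k_out_of_n_reliability(4, 5, p)` evaluates.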
3. A student project concerned measurement of resistivity of a type of copper wire at two different temperatures. Seven pieces of this wire were used in the study, and measured resistivities at 0.0 °C and at 21.8 °C are in the following table. (Units are $10^{-8}\ \Omega\,\text{m}$.)

Wire                     1     2     3     4     5     6     7
21.8 °C Resistivity   1.72  1.56  1.68  1.64  1.69  1.71  1.72

a) (6 pts) Give and interpret a 95% lower confidence bound for the mean increase in resistivity of this wire associated with an increase in temperature from 0.0 °C to 21.8 °C. (PLUG IN COMPLETELY, but there is no need to simplify. Say what the "95%" means.)

b) (5 pts) Give a two-sided interval that you are "95% sure" will bracket 99% of measured increases in resistivity of this wire associated with an increase of temperature from 0.0 °C to 21.8 °C. (PLUG IN COMPLETELY, but there is no need to simplify.)

In a second study concerning resistivity of this wire, two different meters were both used in measuring resistance at 21.8 °C for the same n = 70 specimens. For 50 of the 70 specimens/trials, meter A produced a higher reading than did meter B.

c) (6 pts) Give a p-value for assessing whether there is clear evidence that the fraction of specimens for which meter A produces a higher reading than meter B exceeds .5.
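For part c), a one-sided test of H0: p = .5 against Ha: p > .5 can be sketched as follows. This is an illustrative Python sketch under the assumption that either the usual large-sample z test or the exact binomial tail is acceptable. The function names are not from the exam.

```python
from math import comb, sqrt
from statistics import NormalDist


def upper_p_value_normal(successes: int, n: int, p0: float = 0.5) -> float:
    """Large-sample p-value for H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)


def upper_p_value_exact(successes: int, n: int, p0: float = 0.5) -> float:
    """Exact tail probability P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(comb(n, x) * p0**x * (1 - p0) ** (n - x) for x in range(successes, n + 1))


# 50 of the 70 specimens had meter A reading higher than meter B.
z_p = upper_p_value_normal(50, 70)
exact_p = upper_p_value_exact(50, 70)
print(z_p, exact_p)
```

Both versions treat the 70 specimens as independent trials, which is an assumption about the study rather than something the problem states.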
4. Beginning on Page 8 there is R analysis of a partially replicated $2^3$ factorial experiment due to R. Snee treated in Engineering Statistics by Hogg and Ledolter. It concerned the effects of factors

Factor                       Levels
A - Polymer Type             Standard (1) vs New (But Expensive) (2)
B - Polymer Concentration    .01% (1) vs .04% (2)
C - Amount of an Additive    2 lb (1) vs 12 lb (2)

on y = percentage impurity produced by a chemical process. Use that in the following questions.

a) (5 pts) Give "margins of error" based on 95% two-sided confidence limits to associate with the 8 sample means in the study. (Some of these "sample means" are based on only 1 observation.)

Where combination n = 1: ______________    Where combination n = 2: ______________

b) (6 pts) Give the value of an F statistic and degrees of freedom for testing the hypothesis that all 8 experimental combinations produce the same mean purity.

F = ___________________    d.f. = _______ , _______

c) (4 pts) Based on the last 3 runs of the lm() routine with these data, what model for y in terms of the experimental variables do you judge to be best? (Name and interpret values of detectable effects and say what other effects are not detectable.)

d) (4 pts) For the first case, the predicted value produced in the final lm() run is .895. If it were printed out, what would be the corresponding value for the next-to-final run? If it is .895 say why. If it is not .895 say why not.
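The margins of error in part a) have the form t · s_pooled · sqrt(1/n), which the following Python sketch evaluates. The pooled standard deviation and the error degrees of freedom must come from the R output on Page 8, which is not reproduced here, so the two constants below are placeholders chosen only to make the example run.

```python
from math import sqrt


def margin_of_error(t_quantile: float, s_pooled: float, n: int) -> float:
    """Half-width of a two-sided t interval for a mean based on n observations."""
    return t_quantile * s_pooled * sqrt(1.0 / n)


# Placeholders, NOT values from the exam's R output:
T_QUANTILE = 2.776  # upper .025 point of the t distribution with 4 d.f.
S_POOLED = 0.5      # hypothetical pooled sample standard deviation

margin_n1 = margin_of_error(T_QUANTILE, S_POOLED, 1)
margin_n2 = margin_of_error(T_QUANTILE, S_POOLED, 2)
print(margin_n1, margin_n2)
```

Whatever the actual values are, the margin for a combination with n = 1 is sqrt(2) times the margin for a combination with n = 2, because both use the same pooled estimate.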
5. There is a dataset on the UCI Machine Learning Data Set Repository that provides 1-10 quality ratings by experts (y) for wine samples and corresponding results of 11 chemical analyses ($\mathbf{x} = (x_1, x_2, \ldots, x_{11})$). This problem concerns data analysis for 1599 red wine samples. Beginning on Page 10 there is relevant R code and output.

Consider first a SLR analysis of the variable quality using the predictor variable alcohol. Below is a scatterplot for these variables and the least squares line through the data pairs. (The plotting locations have been randomly "jittered" slightly to minimize the visual effects of over-plotting.)

a) (4 pts) Say what the plot suggests about the appropriateness of the Gaussian simple linear regression model (particularly the modeling of "errors" $\varepsilon_i$).

b) (4 pts) Would you be willing to use a 95% prediction interval for the expert quality rating, y, of a new specimen with alcohol content x = 11 based on these data and the Gaussian SLR model? Explain.

c) (5 pts) Is there definitive evidence that average quality rating increases with alcohol content? Provide quantitative support for your answer based on the R output.
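The quantitative support asked for in part c) is the t statistic for the slope, b1 / SE(b1), which R reports in its coefficient table. As a reminder of where that number comes from, here is a minimal Python sketch. The data pairs are made up for illustration and are not the wine data.

```python
from math import sqrt


def slope_t_statistic(x, y):
    """Least squares slope, its standard error, and the t statistic for H0: slope = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))  # estimated error standard deviation
    se_b1 = s / sqrt(sxx)
    return b1, se_b1, b1 / se_b1


# Hypothetical (alcohol, quality) pairs, for illustration only.
alcohol = [9.0, 9.5, 10.0, 10.5, 11.0, 12.0]
quality = [5, 5, 6, 5, 6, 7]
b1, se_b1, t_stat = slope_t_statistic(alcohol, quality)
print(b1, se_b1, t_stat)
```

A large positive t statistic relative to the t distribution with n - 2 degrees of freedom is the evidence of an increasing mean response. Whether the Gaussian model behind that reference distribution is appropriate is the subject of part a).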
Suppose that one suspends any concerns about model assumptions and adopts the usual MLR model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{11} x_{11} + \varepsilon$$

for quality rating as a function of the 11 chemical analysis results.

d) (4 pts) Interpret the fitted regression coefficient for $x_2$ = grams of acetic acid per cubic decimeter.

e) (8 pts) Give the value of an F statistic and degrees of freedom for judging whether, after accounting for $x_{11}$ = alcohol content, the other 10 chemical analysis results add detectably to one's ability to predict quality rating.

F = _______________    d.f. = _______ , _______

Consider now only the issue of building an effective predictor of y = quality rating. (Leave behind Gaussian model assumptions.)

f) (4 pts) Below is a table of some summaries for several linear predictors fit by least squares. Which linear predictor (set of chemical analysis terms) is most attractive and why?

Chemical Analysis Terms    $R^2$    $\sqrt{MSE}$    CV-RMSPE
1 through 11               .3606    .6480           .6504
2,3,5,6,7,9,10,11          .3599    .6477           .6491
2,5,6,7,9,10,11            .3595    .6477           .6489
2,5,7,9,10,11              .3572    .6487           .6495
2,5,7,10,11                .3515    .6514           .6519
2,7,10,11                  .3438    .6550           .6551
2,10,11                    .3359    .6587           .6587
2,11                       .3170    .6678           .6674
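The F statistic in part e) compares the full 11-predictor model with the reduced model containing only alcohol content. A minimal Python sketch of that partial F computation, written in terms of the two R² values, follows. The full-model R² of .3606 is from the table in part f). The reduced-model R² is a placeholder, since the alcohol-only fit appears in the R output rather than in this table.

```python
def partial_f_from_r2(r2_full: float, r2_reduced: float, n: int,
                      p_full: int, p_reduced: int):
    """Partial F statistic and its degrees of freedom for nested linear models.

    p_full and p_reduced count predictors, not including the intercept.
    """
    q = p_full - p_reduced    # numerator d.f.: number of predictors being tested
    df_error = n - p_full - 1  # denominator d.f.: error d.f. of the full model
    f = ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_error)
    return f, q, df_error


R2_FULL = 0.3606    # from the table in part f)
R2_REDUCED = 0.25   # placeholder, NOT the value from the exam's R output

f_stat, df_num, df_den = partial_f_from_r2(R2_FULL, R2_REDUCED, n=1599, p_full=11, p_reduced=1)
print(f_stat, df_num, df_den)
```

The R² form is algebraically the same as the more familiar ((SSE_reduced - SSE_full) / q) / (SSE_full / df_error), because both error sums of squares share the same total sum of squares.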
g) Searching for an elastic net predictor for y = quality rating based on the 11 predictors, the best CV-RMSPE available seems to be about .6502 for α ≈ .0011 and λ ≈ .03. The predictions it produces are not much different from ordinary MLR. Why is this not surprising given the elastic net parameters and what you know about the MLR model from part f)?

h) There is code and output from train() in caret for k-nearest-neighbor and random forest predictors for y = quality rating based on the 11 predictors. What value of "k" is best for the former and what value of "mtry" is best for the latter? How do these predictors compare to each other and to MLR predictors in terms of performance? (Give numerical support for your latter answer.)

i) The printout presents a scatterplot matrix and correlations between y and MLR, kNN, and random forest predictions. It seems impossible to improve much on the best of these predictors using a linear combination of them. Based on the information available to you, give a rationale for this happening.

j) Rather than predict y, one could instead use a classification tree to identify chemical analysis vectors $(x_1, x_2, \ldots, x_{11})$ that produce y ≥ 7. The tree below was fit using rpart (and cp = .022). What is the misclassification rate for this tree on the training set? Describe in simple terms what chemical analysis results it associates with a quality score of 7 or more.

("1" is the "y ≥ 7" class and "to the left" is "the condition holds" circumstance.)
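For part g), it can help to see how the elastic net penalty splits into ridge and lasso pieces at the reported tuning values. The Python sketch below assumes the glmnet parameterization of the penalty, λ[(1 - α)/2 · Σβ² + α · Σ|β|]. The exam does not state which implementation was used behind train(), so that is an assumption, and the coefficient vector is hypothetical.

```python
def elastic_net_penalty(coefs, alpha: float, lam: float) -> float:
    """Elastic net penalty in the glmnet parameterization (intercept excluded from coefs)."""
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * ((1 - alpha) / 2 * l2_squared + alpha * l1)


# Hypothetical coefficients, for illustration only.
coefs = [0.5, -1.0, 0.25]
alpha, lam = 0.0011, 0.03  # tuning values reported in part g)

total = elastic_net_penalty(coefs, alpha, lam)
lasso_part = lam * alpha * sum(abs(b) for b in coefs)
lasso_share = lasso_part / total
print(total, lasso_share)
```

With α this close to 0 the lasso piece is a tiny fraction of the penalty, so the fit behaves like a lightly penalized ridge regression and little variable selection takes place.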