Confidence Interval for a Proportion

Example
74% of a company’s customers would like to see new product packaging. A random sample of 50 customers is taken. X is the number of customers in the sample who would like to see the new packaging; then the sample proportion is $\hat{p} = X/n$. The mean and standard deviation for p̂ are

Mean of p̂: $\mu_{\hat{p}} = p = 0.74$
Standard deviation of p̂: $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.74 \times 0.26}{50}} = 0.06203$

Since np = 50(0.74) = 37 and n(1 – p) = 50(0.26) = 13 are both at least 10, the distribution of p̂ is approximately Normal. Using the Normal approximation we know that the probability is approximately 95% that p̂ falls within 1.96 standard deviations of the mean. 95% of all samples have a sample proportion p̂ between 0.74 – 1.96(0.06203) = 0.618 and 0.74 + 1.96(0.06203) = 0.862. As 1.96(0.06203) = 0.1216 we could equivalently state “p̂ is within 0.122 of 0.74.” For this situation approximately 95% of all possible samples yield a proportion p̂ within 0.122 of 0.740 (within 12.2 percentage points of 74.0%).

In general, provided a Normal approximation can be used, 95% of all possible samples yield a proportion p̂ within $1.96\sqrt{p(1-p)/n}$ of p.

An approximate 95% Confidence Interval
The section begins with a description of steps that lead to a usable result. (A rigorous treatment of the issue requires a good deal of mathematical statistics.) The explanations provided below are simplified.

We’ve observed the following: For approximately 95% of all samples p̂ is within $1.96\sqrt{p(1-p)/n}$ of p.

Flip-flopping p and p̂ yields the following statement (which is true): For approximately 95% of all samples p is within $1.96\sqrt{\hat{p}(1-\hat{p})/n}$ of p̂.

In this second statement the interval is random. Different samples yield different values for p̂, which result in different intervals.
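The figures from the packaging example (n = 50, p = 0.74) can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the last step goes a bit beyond the text by adding up exact Binomial probabilities to see how close the "approximately 95%" claim is.

```python
from math import comb, sqrt

n, p = 50, 0.74          # sample size and population proportion from the example

sd_phat = sqrt(p * (1 - p) / n)          # standard deviation of p-hat
margin = 1.96 * sd_phat                  # 1.96 standard deviations
low, high = p - margin, p + margin       # range that holds about 95% of samples

def binom_pmf(k, n, p):
    """Exact Binomial probability of k Successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exact probability that the sample proportion X/n lands inside [low, high].
exact_prob = sum(binom_pmf(k, n, p) for k in range(n + 1) if low <= k / n <= high)

print(f"SD of p-hat: {sd_phat:.5f}")
print(f"Range:       {low:.3f} to {high:.3f}")
print(f"Exact probability p-hat falls in that range: {exact_prob:.3f}")
```

The exact probability will not be exactly 0.95, because X can only take whole-number values; that is part of why the text keeps saying "approximately."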
For one sample yielding a result p̂, the interval

$$\hat{p} \pm E \quad \text{where} \quad E = 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

forms an approximate 95% confidence interval for p. p̂ is the point estimate of p; E is the error
margin associated with the estimate.
That is: (approximately) 95% of all random samples (of size n) produce an interval including p
within the bounds. When we obtain the random sample and compute the interval from it, we no
longer have anything random. At this point we state that we are (approximately) 95% confident
that p is within the interval bounds.
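As a minimal sketch of the recipe, here is the interval computed for one sample. The function name `proportion_ci_95` and the sample (37 Successes in 50 trials) are hypothetical, chosen only so the numbers line up with the packaging example.

```python
from math import sqrt

def proportion_ci_95(x, n):
    """Approximate 95% confidence interval for p from x Successes in n trials."""
    p_hat = x / n                                   # point estimate
    margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)   # error margin E
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Hypothetical sample: 37 of 50 customers want the new packaging.
p_hat, margin, (lower, upper) = proportion_ci_95(x=37, n=50)
print(f"p-hat = {p_hat:.3f}, E = {margin:.3f}")
print(f"{lower:.3f} < p < {upper:.3f}")
```

Once the sample is in hand, nothing in this calculation is random anymore, which is why the result is described with the word confidence.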
This formula comes with a few restrictions.
1. It should be applied to situations where units are randomly selected.
2. If sampling is without replacement, check the 20 Times Rule – the population must be at least 20 times the sample size to use this result. (If not, and you know the population size, you can use the adjustment described a bit later in this section. However: small populations are uncommon, and the adjustment is rarely needed. You should recognize that when the population is not at least 20 times the sample size, then our recipe above for error margin does not work. [1])
3. The actual counts of Successes X and Failures (n – X) must both be at least 10. [2] (If not, either you have too small a sample, or the Success probability is likely too close to 0 or 1, for the Normal to be a decent approximation.) If this is not the case, you must seek alternative strategies. (Minitab, and most other statistical software, can obtain the confidence interval by an “exact” method that doesn’t require the Normal distribution. When the counts X and (n – X) are both at least 10, the exact method and the approximate interval given here will be quite similar.)
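The two checks that can be automated (the 20 Times Rule and the "both at least 10" rule) might be sketched like this. The function name and the messages are illustrative assumptions; random selection itself cannot be verified from the counts, so it is not checked.

```python
def check_conditions(x, n, population_size=None):
    """Return a list of reasons the approximate interval should not be used."""
    problems = []
    # Restriction 3: at least 10 Successes and at least 10 Failures.
    if x < 10 or n - x < 10:
        problems.append("need at least 10 Successes and 10 Failures")
    # Restriction 2: population at least 20 times the sample (if size is known).
    if population_size is not None and population_size < 20 * n:
        problems.append("population is less than 20 times the sample size")
    return problems

print(check_conditions(37, 50))                         # both counts are fine
print(check_conditions(2, 15))                          # too few Successes
print(check_conditions(60, 92, population_size=1000))   # population too small
```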
We state such an interval in one of four equivalent styles:
$\hat{p} - E < p < \hat{p} + E$      $(\hat{p} - E,\ \hat{p} + E)$      $\hat{p} - E$ to $\hat{p} + E$      $\hat{p} \pm E$
The first is preferred, as it indicates what is being estimated: p. In the first through third versions,
the lower value is always stated first. Every confidence interval should be accompanied by an
interpretation that states the confidence level (here 95%).
[1] In fact our formula gives too large an error margin. So in essence you can be more than 95% confident in our result when it’s misapplied to situations where the population is small. Probably not the worst error in the world.
[2] Some sources use 5 in place of 10. 10 is better. 5 is somewhat OK, but if either of these values is between 5 and 9, you’d be better off getting some help from a statistician, rather than using our methods.
statistical software. 2) The formula – along with trial and error and some examples – can
assist with your understanding of properties of confidence intervals.
Fact worth knowing
95% is the standard confidence level for scientific polls published in the media and online. If a
poll does not publish an error margin, you may assume that sampling is not random – the poll is not a scientific one. Keep in mind also that many polls with stated error margins are not done
properly. You should have less than 95% confidence in results from such polls.
Nomenclature
There’s some terminology that goes with each of these quantities. (The terminology is useful
because it is generalized to other situations.)
The confidence level (or just confidence) is C . Usually C = 0.90, 0.95 or 0.99 – generally
we prefer to have a high amount of confidence in our statements. However: There is
nothing illegal or necessarily wrong about a 50% CI. (It’s just that 50% CIs miss the
target quantity half of the time.)
Don’t confuse the confidence level (or just plain confidence) with the confidence
interval. The confidence interval is the interval of values you obtain.
p̂ is the (point) estimate of p. It’s a single “good” estimate for p from the sample of data.
It is the prevalence of Successes in the sample.
$z_{\alpha/2}$ is the critical value (from the Standard Normal) that goes with confidence level C.
(Some reference materials use simpler notation like $z^*$ – and leave it to common sense
that this z is the one that goes with the confidence C.)
The two endpoints of the interval are the bounds: lower bound and upper bound. The width W of a confidence interval is the distance from the lower to the upper bound. The ± part in total is referred to as the “error margin” E. 2E = W.
$\sqrt{\hat{p}(1-\hat{p})/n}$ is often called the “(estimated) standard error of p̂.” A standard error is essentially a standard deviation. [5] Recall: Standard deviation measures “typical deviation from mean.” The deviation of p̂ from its mean p is a (sampling) error. It’s common to see the abbreviation SE for standard error. In this case:

$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

The error margin E for this interval can be expressed $E = z_{\alpha/2}\, SE(\hat{p})$.
[5] This quantity really is an estimated standard deviation, as the standard deviation of p̂ is $\sqrt{p(1-p)/n}$.
Many textbooks make the formulas look shorter by using a second letter q to stand for Failure rate. So q = (1 – p) and $\hat{q} = 1 - \hat{p} = (n - x)/n$. When this is done

$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
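To tie the nomenclature together, here is a small sketch that turns a confidence level C into a critical value and then into SE, E and W. It relies on `statistics.NormalDist` from Python's standard library (Python 3.8 or later); the function names are made up for illustration. The sample values n = 870 and p̂ = 0.39 are borrowed from the grand jury exercise later in this section.

```python
from math import sqrt
from statistics import NormalDist

def critical_value(confidence):
    """z for a two-sided interval: leaves (1 - C)/2 in each tail of the Standard Normal."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

def standard_error(p_hat, n):
    """Estimated standard error of p-hat."""
    return sqrt(p_hat * (1 - p_hat) / n)

def error_margin(p_hat, n, confidence):
    """E = z * SE(p-hat)."""
    return critical_value(confidence) * standard_error(p_hat, n)

for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.2f}  z = {critical_value(c):.3f}")

e = error_margin(p_hat=0.39, n=870, confidence=0.99)
print(f"E = {e:.3f}, W = {2 * e:.3f}")
```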
More on interpreting the interval
If you followed the development above, you can deduce the proper interpretation of a confidence
interval. It's also possible to take the justification for granted, and come to an interpretive
understanding.
Different samples give different results. Consider all possible samples. Obtain, for each sample,
a 95% confidence interval. Some of these intervals include p, some do not. Most do. In fact: 95%
of all of them do.
In a statistical study, a single sample is drawn randomly. The data are collected and summarized,
and a 95% confidence interval is computed. We have one sample - selected at random from the
collection of all samples. Because 95% of all samples lead to an interval that covers p, we are
95% confident that the particular interval we have covers p.
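The "all possible samples" idea can be imitated by simulation. The sketch below draws many random samples from a population with a known p, builds the interval from each, and counts how often the interval covers p. The choices p = 0.74, n = 50, the number of trials and the seed are arbitrary assumptions for illustration.

```python
import random
from math import sqrt

def coverage(p, n, trials, seed=1):
    """Fraction of simulated samples whose approximate 95% interval covers p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.random() < p for _ in range(n))     # Successes in one sample
        p_hat = x / n
        margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials

rate = coverage(p=0.74, n=50, trials=5000)
print(f"Fraction of intervals covering p: {rate:.3f}")
```

Expect a value near 0.95 but not exactly 0.95: the interval is approximate, and a simulation only looks at some of the possible samples.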
We use the word confidence, rather than probability. In statistical applications where parameters
are estimated, those parameters are thought of as fixed values describing populations. They do
not vary. The parameter p is either in the interval or not. There is no probability involved.
Where did the probability go?
There was probability - before the sample was selected. This is similar to tossing a coin. Before it's tossed the probability of a Head is 1/2. But once the toss is completed, the probability - for
that toss - is either 0 or 1, depending on the outcome. In this application, the probability is either
1 (the interval covers p) or 0 (the interval doesn't), depending on whether in fact the interval does
or does not cover p. Not knowing p, we cannot tell. All we know is that 95% of all samples yield
an interval covering p. So we are 95% confident that ours does. (Similarly, after the coin is
tossed, if you're unable to see the result, you can be 50% confident it's a Head. The word
probability doesn't apply here.)
In short: Use the word "probability" for random things that haven't yet taken place. Once they've
taken place, even if there are unknowns, use the word confidence. The unknowns merely reflect
Three values impact the error margin of a confidence interval.
1. the prevalence of Success ( p̂ )
2. the sample size (n)
3. the confidence level (C )
Undertake an investigation: How do changes in each of these impact the error margin? These
issues are addressed through the exercises.
In some respects the properties you discover will convince you that statistics makes sense: The
numbers work out in ways that common sense would anticipate in advance. (Common sense
would never anticipate the precise results. [6] But certain procedural properties do make sense.
That's what you want to discover.)
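One way to run the investigation is to change one value at a time and watch the error margin. The sketch below does this around an arbitrary baseline (p̂ = 0.5, n = 400, C = 0.95); the baseline and the function name are assumptions for illustration, not values from the exercises.

```python
from math import sqrt
from statistics import NormalDist

def error_margin(p_hat, n, confidence):
    """E = z * sqrt(p_hat * (1 - p_hat) / n) for the 1 sample Z interval."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

print("Changing the prevalence (n = 400, C = 0.95):")
for p_hat in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"  p-hat = {p_hat:.1f}   E = {error_margin(p_hat, 400, 0.95):.4f}")

print("Changing the sample size (p-hat = 0.5, C = 0.95):")
for n in (100, 400, 1600):
    print(f"  n = {n:5d}     E = {error_margin(0.5, n, 0.95):.4f}")

print("Changing the confidence level (p-hat = 0.5, n = 400):")
for c in (0.90, 0.95, 0.99):
    print(f"  C = {c:.2f}      E = {error_margin(0.5, 400, c):.4f}")
```

Look for three patterns: which prevalence gives the largest E, what happens to E when n is multiplied by 4, and which direction E moves as C goes up.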
What a Confidence Interval Cannot Do
Notice that the 95% in a 95% confidence interval refers to the percent of all samples that yield an
interval that covers p. If we choose one such sample at random, we’re 95% confident in that
result. The error margin in a confidence interval addresses errors due to random sampling.
The error margin in a confidence interval does not include the effects of other errors. Poorly
recorded data is one source of error. Or perhaps the study didn’t really sample randomly. In these
cases, quantifying sampling error is not enough.
All these other factors will lead to additional estimation error – error that is not captured by our
formula. So while you can still use the formula when other types of error are present, it doesn’t
give a 95% confidence interval. The actual confidence is unknown. For analyses involving
nonrandom data, the actual confidence will be considerably lower than 95%. That’s a real issue
in many studies.
Polling Refusals
Suppose (to oversimplify) that 88 million people approve of the President and 72 million
disapprove. So the President’s approval rating is p = 88/160 = 0.55.
A telephone poll is taken. But: The people that approve of the President are crankier than those
that do not. They are less likely to put up with an intruding phone call. In fact, 40% of the
approvers will not respond (that’s 35.2 million people). The disapprovers are more willing to take the call: only 10% of them will refuse (that’s 7.2 million people). Here’s the breakdown (in millions):

            Approve   Disapprove   Total
Respond        52.8         64.8   117.6
Refuse         35.2          7.2    42.4
Total          88.0         72.0   160.0

The problem here is that our sample is going to reflect the views of only the responders. Of the
117.6 million responders, 52.8 million approve, for a rating of 52.8/117.6 = 0.45.
[6] In fact, given the randomness involved in selecting the samples, and the various other attributes that change from
problem to problem (n, p, x, as well as the size of the population), it is remarkable that the formula we have is so
While people will be randomly called, those who refuse to respond will not be included in the
results. So: Our poll will be estimating 0.45. With a sample size of 1000, the error margin will be
around 0.03. While some samples will give results higher than 0.45, it is highly unlikely that
we’ll get a sample that produces a confidence interval including 0.55. After all: The interval is
designed to include 0.45.
This is an example of biased estimation. A result is biased if it systematically [on average]
produces the wrong result. Yes: We could get a random sample that has an unusually high number of Approvers, and luckily gives an interval including 0.55. But we are unlikely to do so, because
on average our estimate is 0.45 not 0.55. That’s what bias is: The average result from the
sampling procedure is not equal to the intended result.
If we know that nonresponse occurred with 40% probability among Approvers, and 10% among
Disapprovers, we could adjust the survey results accordingly, and produce an unbiased estimate.
But generally nonresponse rates are unknown, and the rate changes from survey to survey,
depending on what the issue is. It is difficult to adjust results to compensate for the nonresponse
issue.
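The arithmetic behind the refusal example, and the adjustment that would be possible if the refusal rates were known, can be sketched as follows. The reweighting step (dividing each group's share by its response rate) is one simple way to do the adjustment, shown as an illustration rather than as the method any particular pollster uses; all variable names are made up.

```python
approve, disapprove = 88.0, 72.0                 # millions, from the example
refuse_approve, refuse_disapprove = 0.40, 0.10   # refusal rates, from the example

resp_approve = approve * (1 - refuse_approve)              # approvers who respond
resp_disapprove = disapprove * (1 - refuse_disapprove)     # disapprovers who respond

true_p = approve / (approve + disapprove)                  # what we want: 0.55
poll_p = resp_approve / (resp_approve + resp_disapprove)   # what the poll estimates

# Adjustment when refusal rates are known: weight each group's observed share
# by the inverse of its response rate, then renormalize.
w_approve = poll_p / (1 - refuse_approve)
w_disapprove = (1 - poll_p) / (1 - refuse_disapprove)
adjusted_p = w_approve / (w_approve + w_disapprove)

print(f"true p = {true_p:.3f}, poll targets {poll_p:.3f}, adjusted {adjusted_p:.3f}")
```

The adjustment recovers 0.55 only because the refusal rates plugged in are the true ones. With guessed rates, the adjusted figure is only as good as the guesses.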
Poll results are even harder to interpret when sampling is not done randomly. Internet polls choose subjects by convenience and interest. Only people who care enough to vote will vote.
These people may be significantly different in their views than the population of interest. No
error margin can fix up such polls. (Hopefully results are stated without an error margin.)
Statistical Software
Statistical software will compute the confidence interval for p. All you need to do is input three
values: n, x, and the confidence C , along with specifying the method the software should use.
The interval you have learned is the (approximate) 1 sample Z interval for a proportion.
Good software has other choices – they use a different "formula" than that above. The formula
you have is approximate, and requires at least ten Successes and ten Failures to allow using the
Normal. For cases where this condition is not met, you may have statistical software compute the
interval using a different method/formula. [7] In fact, where there are at least ten Successes and
Failures, you may use the alternative method in place of the 1 sample Z interval; you’ll get
slightly different results. [8] For really large samples, these differences will be quite small.
[7] Usually called the “Exact Binomial” interval or method.
One quirk about one method software may use: The intervals may not be balanced: The estimate p̂
is not exactly in the middle of the interval. This is particularly noticeable for results with small
percentages of either Successes or Failures. (If you have, say, 2 Successes in only 10 – 20 trials, then ultimately the value of p is quite small. So the distribution is clamped against the left edge of the
range of values, and has right skew. Skewness is an expression of imbalance; so it’s no surprise
that the interval is not balanced. This is a case where you could not use the formula stated above
– 2 is too few Successes. The “both at least 10” restriction prevents use of a Normal
approximation when things aren’t at all close to Normal.) If both x and n – x are large, the
interval – no matter which method is used – is nearly balanced about p̂, and in fact, the exact
interval and the interval from your formula will give very similar results.
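To see the imbalance, here is a sketch of one common "exact" construction, the Clopper-Pearson interval, found by bisection on Binomial tail probabilities. Whether a given software package uses exactly this construction is an assumption; check its documentation. The function names and the example of 2 Successes in 15 trials are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect(func, target, lo=0.0, hi=1.0, iters=60):
    """Find p in [lo, hi] with func(p) = target, for func increasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, confidence=0.95):
    """Clopper-Pearson interval: each tail probability equals (1 - C)/2."""
    alpha = 1 - confidence
    # Lower bound: the p at which P(X >= x) = alpha/2 (increasing in p).
    lower = 0.0 if x == 0 else bisect(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) = alpha/2 (decreasing in p, so negate).
    upper = 1.0 if x == n else bisect(lambda p: -binom_cdf(x, n, p), -alpha / 2)
    return lower, upper

x, n = 2, 15
lower, upper = exact_ci(x, n)
p_hat = x / n
print(f"p-hat = {p_hat:.3f}, exact 95% interval: {lower:.3f} to {upper:.3f}")
print(f"distance below p-hat: {p_hat - lower:.3f}, above: {upper - p_hat:.3f}")
```

With so few Successes the upper bound sits much farther from p̂ than the lower bound does, which is the right skew described above.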
Sample Size Determination
The error margin for our confidence interval is

$$E = z_{\alpha/2}\, SE(\hat{p}) = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

This is equivalent to

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p})$$

Suppose, prior to the study, we desire an error margin of E. If we can produce a reasonable educated guess for the prevalence of Successes (p̂), then an appropriate minimum sample size for the study is

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p})$$
Example 1
Suppose you want to estimate the proportion of students at a large university who are
nearsighted. The prevalence for the general population is around 0.45. Use this as a guess to
determine how many students would need to be included in a random sample if you wanted the
error margin for a 95% confidence interval to be less than or equal to 2%.
Recall that the error margin quantifies the maximum reasonable difference between the observed
value p̂ and the population value p.
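Here is a sketch of the sample size calculation applied to Example 1 (guess 0.45, error margin 0.02, 95% confidence). The function name is hypothetical, and rounding up to the next whole number is an assumption, chosen because a smaller n would leave the error margin slightly above the target.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(error_margin, guess, confidence=0.95):
    """Smallest whole n with z * sqrt(guess * (1 - guess) / n) <= error_margin."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil((z / error_margin) ** 2 * guess * (1 - guess))

n_needed = required_sample_size(error_margin=0.02, guess=0.45)
print(f"Students needed: {n_needed}")
```

For these inputs the calculation comes out to 2377 students.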
[8] For really large samples, these differences will be quite small. One quirk about the Exact Binomial method: The
intervals may not be balanced: The estimate is generally not exactly in the middle of the interval. This is particularly
noticeable for results with small prevalence of either Successes or Failures. (Any p other than 0.5 implies some
asymmetry, and these intervals reflect this.) If both x and n – x are large, the Exact Binomial interval is nearly
symmetric about the sample proportion, and in fact, the exact interval and the interval from your formula will give
the prevalence – we can use 0.5. This guarantees that the actual error margin is no larger than
what is desired. On the other hand, it also pretty much guarantees that we will take a larger
sample than is necessary (only if the prevalence turns out to be 50% will the sample be “just
large enough”).
Remark 3
Whenever the range of plausible guesses includes 0.5, use 0.5 as the guess. This rule works
when one has no idea what the prevalence is: the range of plausible guesses is from 0 to 1,
which certainly includes 0.5.
Most two-candidate political races are reasonably close. Pollsters [9] generally use 0.50 to
determine the sample size. Using a guess of 0.50 tends not to lead to dramatic oversampling
unless the result falls below 1/3 or above 2/3.
Example 3
Production line defects occur infrequently at an industrial plant. In the past the rate has generally been between 2% and 6% (this value would change over time as the production line, and the
employees working on it, change). What sample size is required to estimate the current rate at
90% confidence with error margin no larger than 4%?
If we assume 2%, then the required size is 33; if 6% is assumed, the required size is 96. Here’s a plot of the relationship.
[Plot: required sample size versus assumed defect rate, for rates from 2% to 6%]
(The relationship is not exactly linear. However: Linear interpolation would work well here. In general, as long as the proportion is confined to a small range of values to one side of 0.50, interpolation does work fine.)
You can see that 6% requires the largest n (it’s closest to 0.5). To cover all historical
possibilities, use n = 96. If the rate is actually less than 0.06, you will have oversampled. What if
you sample fewer than 96? Perhaps a good idea, but if the rate is near 0.06 you won’t get the
desired error margin. And, of course, if production falls seriously out of control, you might see a
result much higher than 6% – leading to an error margin considerably larger than 0.04.
Remark 4
A good idea is to produce a range of plausible guesses, and find the sample size for a number of
values within that range. Graph this relationship. If the final decision isn’t yours, you can place
your graph in front of the decision maker.
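The table behind such a graph takes only a few lines. This sketch uses the setting of Example 3 (90% confidence, error margin 0.04, guesses from 2% to 6%); the function name is hypothetical. It rounds every result up, so the 2% row shows 34, where Example 3 quotes 33 (the unrounded value is about 33.1).

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(error_margin, guess, confidence):
    """Smallest whole n with z * sqrt(guess * (1 - guess) / n) <= error_margin."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil((z / error_margin) ** 2 * guess * (1 - guess))

# Sample size for each plausible defect rate, keyed by the rate in percent.
sizes = {pct: required_sample_size(0.04, pct / 100, 0.90) for pct in range(2, 7)}

for pct, n in sizes.items():
    print(f"guess = {pct}%   n = {n}")
```

Feeding these pairs to any plotting tool gives the picture to place in front of the decision maker.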
[9] People or organizations who are paid to conduct polls.
1. There are 8640 students enrolled at SUNY Oswego this semester; 5146 live more than 50
miles from campus. A professor (unaware of these figures) samples 92 students and finds
that 60 of them live more than 50 miles from campus.
a) Identify the following: i) The population proportion p; ii) The sample count X; iii) The
sample proportion p̂.
b) Which of p and p̂ is a parameter? Which is a statistic?
A student (also unaware of the whole-campus figures) is about to randomly select 142
students to estimate the proportion who live more than 50 miles away.
c) For the student: What are the mean and standard deviation for p̂ ? Interpret this mean.
(Be sure to include the phrase “all possible samples” in your statement.)
2. The saturation rate for a particular kind of marketing via a newspaper ad is 15%. That is: 15% of all newspaper buyers will read the ad. For a new ad, marketers randomly sample 30
buyers and determine that 2 have read the ad.
a) Identify values for p, X , and p̂ .
b) Which of p and p̂ is a parameter? Which is a statistic?
For the following exercises, when you interpret results, use the word “all” or “population.”
3. A random sample of 212 adoptive parents finds that 85 of them stated “No Preference” for
their child’s gender. Use this sample data to construct a 95% confidence interval estimate for
the proportion of adoptive parents who state “No Preference.” Explicitly identify the following:
a) The point estimate.
b) The critical value ($z_{\alpha/2}$).
c) The error margin.
d) Write the interval bounds in this format: p̂ ± E.
e) Express the interval in this format: ________ < ________ < ________ .
f) What confidence do you have in this result?
g) Explain what p represents in this situation. Is its value known?
h) p̂ : Parameter or Statistic? p: Parameter or Statistic?
i) Interpret your interval in words. We are 95% confident that…
4. The Genetics and IVF Institute conducted a clinical trial of the XSORT method designed to
increase the probability of conceiving a girl. 325 babies were born to parents using XSORT,
and 295 of them were girls. Use this data to construct a 99% confidence interval for the
proportion of girls born to parents using XSORT. Interpret your result.
5. Do individuals have the ability to temporarily postpone death to survive a major holiday? (The hypothesis would be that these holidays are family affairs that give a dying person
incentive to live a bit longer.) In one study, 12000 deaths, over the period from one week
before to one week after Thanksgiving, were examined. Of these, 6062 occurred in the week
before Thanksgiving. Give a 95% confidence interval for the proportion of deaths in this two
week period that occur in the earlier week. Interpret your result. Does your data conclusively
support the “postpone death” theory? (Hint: Check where 0.5 lands relative to your interval.)
6. Complete the small table indicating which critical value from the Standard Normal table goes
with the given levels of confidence.

C                 50%    75%    90%     95%     98%    99%    99.9%   99.99%
$z_{\alpha/2}$    ____   ____   1.645   1.960   ____   ____   ____    3.891
7. Over a period of 11 years in Hidalgo County, Texas, 870 people were selected for grand jury
duty, and 39% of them were Mexican-American. (Notice that you are told the value of p̂ –
you don’t have to compute it: p̂ = 0.39. From this you can deduce the number X of
Mexican-Americans in the sample. Since 0.39(870) = 339.3, the number must be 339 (it
can’t be 339.3 – you can’t select 3-10ths of a Mexican-American). The given value is
rounded for convenience: 339/870 = 0.3899 to four significant digits, and 0.390 to three,
which is sufficient for computing purposes.)
a) Assume these data represent a random sample of jury-duty-eligible county citizens.
Obtain a 99% confidence interval for the percent of all county citizens that are Mexican-American. Interpret your result.
b) It was determined that 79.1% of all county citizens were Mexican-American. What does
your confidence interval suggest about selection for jury duty?
8. Perform an investigation of the relationship between confidence and error margin. Here’s
how.
a) Take exercise 7, where n = 870 and p̂ = 0.390. You’ve already obtained a 99%
confidence interval: 0.390 ± 0.043. The error margin is 0.043. Now obtain a 95% confidence interval; determine the error margin.
b) Compute intervals for each of the confidence levels specified in the table. Fill in the table
below with the error margin for the various levels of confidence.
c) Write a single sentence describing the relationship between the proportion p̂ and the
error margin of the confidence interval.
Parents’ Status    # of grads    p̂      95% CI    Error Margin
Welfare            40
Poor               80
Low Income         120
Middle Income      200
High Income        280
Wealthy            320
Super rich         360
15. Go back to problem 10. The 95% confidence interval is 0.7021 < p < 0.7737. The point estimate for the proportion of green peas is p̂ = 0.7379 with error margin 0.0358. State the
95% confidence interval for the proportion of peas that are yellow. You should be able to do
so using the results shown here and addition and subtraction.
16. A poll reveals that candidate D has 44.2% of sampled voters leaning towards D (error margin
3.5%). Remember: All media polls are done at 95% confidence unless stated otherwise.
a) Interpret this result.
b) Suppose there is only one other candidate, R. Give the 95% confidence interval for
candidate R.
c) Suppose there are instead three candidates, R, D and U. Is it possible to give an estimate
and error margin for candidate R?
d) In a two candidate race, is it possible that D is actually ahead? (Hint: Suppose U doesn’t
stand for a candidate, but indicates “Undecided.”)
17. A recent newspaper opinion poll found that 81% of Americans are in favor of a military
drawdown in Iraq (error margin 4%). Interpret this statement. Include a confidence level.
Summary
In #12 above you confirmed that (assuming the sample size and confidence stay the same) the
error margin is largest when p̂ = 0.5 and gets smaller as the proportion falls away from 0.5. This
makes intuitive sense: When the prevalence is near 50% there is more uncertainty; when the
prevalence is near 0 (or 1) there is more certainty.
18. At SUNY Oswego n = 125 students are randomly selected; 100 of them are opposed to a
proposal that calls for the college to jam cell phone signals in classrooms. (This would
prevent texting in class.) You can confirm that a 90% confidence interval is
0.741 < p < 0.859.
a) Give the values of the point estimate and error margin for the interval.
b) This survey was also conducted at Penn State University. The results of the survey were
exactly the same: 100 of 125 sampled students opposed the proposal. What is the 90%
confidence interval for Penn State?
c) It should be noted that Penn State has about 5 times more students than does SUNY
Oswego. What impact does the population size have on the error margin for a confidence
interval? Consult the formula: Where does the population size come into play?
19. Jake, John and Jaspar are conducting a study: What proportion of SUNY Oswego students
stay in Oswego over the Halloween weekend?
a) Describe in words what p represents. Is p a parameter or a statistic? Would the value of p
be easy to obtain?
b) They sample students randomly: 56 of 80 sampled students stay in Oswego. This gives
p̂ = 0.70. Is this value a statistic or a parameter?
c) Jake determines correctly that the error margin for a 95% confidence interval is 0.100.
The confidence interval is 0.600 < p < 0.800. John says “I think we should use a 90%
confidence level.” If John’s directive is followed, will the error margin increase or
decrease?
d) Jaspar too is frustrated by these results. He’s OK with the 90% confidence. But for him, the error margin of 10% is too large. Jaspar says “I want an error margin of about 2%.”
Assuming the result falls at 70%, is a larger or smaller sample required to achieve an
error margin of 0.02? What sample size is required to achieve an error margin of 0.02?
20. In one study of college students, 83.0% admitted to having cheated on a test, with an error
margin of 7.0% (using 95% confidence). A second similar study found the same result of
83.0%; however, the sample size was three times as large. Is the error margin for the second
study (again at 95% confidence) larger or smaller than 7.0%? What is the error margin for
the second study?
21. Surveys of people were taken in three countries: Mexico, the United States, and Canada. The same number of people were surveyed within each country. In the U.S., 40% of people
agreed with the statement “There is an urgent need to take action on global warming.” In
Mexico the result was 25%; in Canada 80%.
a) Does the size of the country’s population have anything to do with the error margin?
b) Convince yourself that the answer to part a is “No.” For which of these countries is the
error margin for a confidence interval the largest? The smallest? (Assume the same
confidence level is used for all three results.)
22. You want a 95% confidence interval estimate with error margin 4% for the proportion of
science majors who are left handed. How many science majors do you sample?
a) Describe in words the parameter you are estimating. What symbol is it given?
b) Assume you have no idea what the prevalence of lefties is for this population. Use a
guess of 0.5 to determine the required sample size.
c) In the general population, 10% of people are lefties. Use this value to determine the
sample size.
d) Which of the answers from b or c is the better choice?
e) It turns out that 24 of 217 sampled science majors are lefties. Obtain the confidence
interval. How does the error margin compare to 4%?
23. Suppose you undertook a study of the day of the week that babies are born. You are
interested in the proportion of babies born on a weekend (Saturday or Sunday). Your goal is
a 90% confidence interval with error margin no greater than 3.5%.
a) Explain why a guess of 0.50 is unreasonable.
b) What is a better value for this guess?
c) In fact, 25% is probably an adequate value for the guess. If “all days are equally likely,”
then 2/7 = 28.6% should be born on weekends. However, in recent years there is more of
a trend for doctors to induce labor, which usually happens on a weekday! Use 0.25 to obtain a sample size for this study.
d) If the actual prevalence is p̂ = 0.25, determine the confidence interval when the sample
size from c is used. Identify the error margin – does it meet the goal of 0.035? What
would such a result say about the 2/7 hypothesis?
e) If the actual prevalence is 0.20 and the sample size from c is used, how will the error
margin compare to 0.035? Explain.
24. Consider a large city’s mayoral race where there are two candidates.
a) Determine the required sample size for a media poll to estimate the percent of people who favor the Republican candidate with error margin 3%.
b) Does the required sample size depend on the population of the city?
25. What proportion of people die during summer (as officially defined)? You decide to
investigate this issue by collecting data. How many obituaries would you examine in order to
obtain a 98% confidence interval estimate with error margin of 1%?
b) They are the same for each pair. A Success prevalence of 90% is equivalent to a Failure
prevalence of 10%, so the error margins must be the same.
c) Error margin is largest for prevalence p̂ = 0.5 and drops (symmetrically) as the prevalence
gets further from 0.5 – on either side of 0.5.
15. Yellow has prevalence 1 – 0.7379 = 0.2621. So the interval is 0.2621 ± 0.0358. Or take
1 – 0.702 = 0.298 and 1 – 0.774 = 0.226 to get (0.226, 0.298).
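Both routes to the yellow-pea interval in answer 15 can be confirmed with a few lines; the variable names are made up for illustration. The point is that the complement keeps the same error margin, and the bounds simply swap roles.

```python
green_hat, margin = 0.7379, 0.0358      # point estimate and error margin for green

# Route 1: complement the point estimate, keep the same error margin.
yellow_hat = 1 - green_hat
yellow_low, yellow_high = yellow_hat - margin, yellow_hat + margin

# Route 2: complement the green bounds; the upper green bound gives the lower yellow bound.
green_low, green_high = green_hat - margin, green_hat + margin
alt_low, alt_high = 1 - green_high, 1 - green_low

print(f"yellow: {yellow_hat:.4f} +/- {margin:.4f}")
print(f"route 1: ({yellow_low:.3f}, {yellow_high:.3f})")
print(f"route 2: ({alt_low:.3f}, {alt_high:.3f})")
```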
16. a) I am 95% confident that between 40.7% and 47.7% of all voters lean towards D.
b) 55.8% ± 3.5%. c) No. We don’t know how the 55.8% is split up. d) Yes. If there are, for
instance, 20% undecided, then the result for R is around 35.8%.
17. This is a media poll – the confidence is 95%. I am 95% confident that between 77% and 85%
of all Americans favor a drawdown.
18. a) The point estimate is 0.800, the error margin is 0.059. b) The interval at Penn State is
exactly the same. c) The population size does not play into this. The formula for error margin depends only upon the prevalence p̂, the sample size n, and the confidence level. This is an underappreciated fact
about sampling and statistical analysis: As long as a population is “large” (at least 20 times
bigger than the sample), its size is pretty much immaterial. What matters in most practical
situations is the sample size.
19. a) p is the proportion of all SUNY Oswego students who stay in Oswego. It’s a parameter.
It’d be difficult to get this value – you’d have to census virtually every student. b) 0.70 is a
statistic – it describes a sample. c) For a 90% confidence interval the error margin will be
smaller. (See #7.) d) To get an error margin that is 5 times smaller will require a sample size that
is 5² = 25 times larger. (See exercises 11 – 13.) That’s 2000 students.
20. The error margin for a larger sample size will be smaller. It will not be three times smaller. It
will be √3 = 1.732 times smaller: 0.07 / 1.732 = 0.0404 – about 4%. (See exercises 11 – 13.)
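The square-root relationship used in answers 19 d) and 20 can be sketched as two small helper functions. The names are hypothetical; both assume the prevalence and the confidence level stay the same while only the sample size changes.

```python
from math import sqrt

def scaled_margin(margin, size_factor):
    """Error margin after the sample size is multiplied by size_factor."""
    return margin / sqrt(size_factor)

def size_factor_for(margin_now, margin_wanted):
    """How many times larger the sample must be to reach the wanted margin."""
    return (margin_now / margin_wanted) ** 2

# Answer 20: same 83.0% result, sample three times as large.
print(f"new margin: {scaled_margin(0.07, 3):.4f}")

# Answer 19 d): shrink the margin from 0.10 to 0.02, starting from n = 80.
factor = size_factor_for(0.10, 0.02)
print(f"sample must be {factor:.0f} times larger: n = {80 * factor:.0f}")
```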
21. a) No. b) The error margin is smallest for Canada and largest for the United States. (Go back
and examine #14.)
22. a) The symbol is p. p = the proportion of all science majors at this university who are left
handed. b) $n = \left(\frac{1.96}{0.04}\right)^2 (0.5)(0.5) = 600.25$. Select 601 science majors.
c) $n = \left(\frac{1.96}{0.04}\right)^2 (0.1)(0.9) = 216.09$. Select 217 science majors. d) 217 is the better choice. The lefty
rate for scientists is going to be fairly close to that for the general population. (Not only that, but
the sample size is smaller!) e) 0.1106 ± 0.0417. Pretty close to 0.04 for the error margin. It
missed a little because the actual lefty rate was slightly closer to 0.5 than the guess of 0.10 that