Page 1: Stat Homework 2

According to the Census Bureau's 2007 Current Population Survey, the mean and median income of people at least 25 years old who had a bachelor's degree but no higher degree were $46,453 and $58,886 (not necessarily in that order).

1. Which of these numbers is the mean and which is the median? Explain your reasoning.

A. The median is $58,886 and the mean is $46,453. This is because economic variables are usually skewed to the left, which pulls the mean above the median.

B. The mean is $58,886 and the median is $46,453. This is because economic variables are usually skewed to the left, which pulls the mean above the median.

C. The median is $58,886 and the mean is $46,453. This is because economic variables are usually skewed to the right, which pulls the mean above the median.

D. The mean is $58,886 and the median is $46,453. This is because economic variables are usually skewed to the right, which pulls the mean above the median.

2. Retirement seems a long way off and we need money now, so saving for retirement is hard. Among households with an employed person aged 21 to 64, only 63% own a retirement account. The mean value in these accounts is $112,300, but the median value is just $31,600. For people 55 or older, the mean is $222,100 and the median is $64,400. What explains the differences between the two measures of center?

A. The distributions are probably right-skewed, because most of those with retirement savings have not saved much (giving low medians), but a few have saved hundreds of thousands or more (thus pulling the means up sharply.)

B. The distributions are probably left-skewed, because most of those with retirement savings have not saved much (giving low medians), but a few have saved hundreds of thousands or more (thus pulling the means up sharply.)

C. The distributions are probably right-skewed, because most of those with retirement savings have saved hundreds of thousands or more (giving high means), but a few have saved very small amounts (giving small medians).

D. The distributions are probably left-skewed, because most of those with retirement savings have saved hundreds of thousands or more (giving high means), but a few have saved very small amounts (giving small medians).

The National Association of College and University Business Officers collects data on college endowments. In 2007, 785 colleges and universities reported the value of their endowments. When the endowment values are arranged in order, what are the positions of the median and the quartiles in this ordered list? Note, use half integers to represent results in between actual positions. Be sure you calculate your results manually exactly as described in the text and not using software which may have slightly different definitions for the median and quartiles.

3. The median is in position (Answer to 1 decimal place)

Answer

The median's position is calculated using the formula (n + 1)/2 = 393, with n = 785 being the number of observations.

4. The first quartile is in position (Answer to 1 decimal place)

Answer 196.5

5. The third quartile is in position (Answer to 1 decimal place)

Answer 589.5
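The position rule behind questions 3 to 5 can be checked with a short Python sketch. The function name `median_and_quartile_positions` is an illustrative choice, not something from the text. It assumes the manual rule used in the answers: the median sits at position (n + 1)/2, and the quartiles are the medians of the observations below and above that position.

```python
def median_and_quartile_positions(n):
    """Return the 1-based positions of Q1, the median and Q3 in an ordered list of n values.

    Assumed rule: the median is at (n + 1) / 2; Q1 is the median of the
    observations strictly below that position and Q3 is the median of the
    observations strictly above it. Half-integers mean "between two positions".
    """
    median_pos = (n + 1) / 2
    half = n // 2                   # number of observations on each side of the median
    q1_pos = (half + 1) / 2         # median position within the lower half
    q3_pos = (n - half) + q1_pos    # same offset, counted from the start of the upper half
    return q1_pos, median_pos, q3_pos


# Endowment data: 785 colleges and universities.
print(median_and_quartile_positions(785))
```

Software packages often use slightly different quartile definitions, which is why the question asks for a manual calculation.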

Here is the distribution of the weight at birth for all babies born in the United States in 2005:

Weight                    Count
Less than 500 grams       6,599
500 to 999 grams          23,864
1,000 to 1,499 grams      31,325
1,500 to 1,999 grams      66,453
2,000 to 2,499 grams      210,324
2,500 to 2,999 grams      748,042
3,000 to 3,499 grams      1,596,944
3,500 to 3,999 grams      1,114,887
4,000 to 4,499 grams      289,098
4,500 to 4,999 grams      42,119
5,000 to 5,499 grams      4,715

6. For comparison with other years and with other countries, we prefer a histogram of the percents in each weight class rather than the counts. Explain why.

A. The use of percents will help us find outlier years/countries where the columns of the histogram don't add up to 100%.

B. Calculating percents makes it easier to display the data using a pie graph.

C. Different years and countries may have different overall numbers of newborns, making a comparison based on the absolute numbers difficult.

D. None of the answers are correct.

The correct answer is C.

A - By definition, if a histogram is plotted correctly and encompasses all of the data, then all of the columns have to add up to the total number of observations or to 100%. Anything else is a mistake.

B - A pie graph is not used to represent distributions.

D - Answer C is correct.

Points Earned: 1/1

Correct Answer: C

Your Response: C

7. How many babies were there?

Correct Answer: 4,134,370
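As a quick check of questions 6 and 7, the following Python sketch totals the counts from the birth-weight table and converts them to percents. The variable names are illustrative assumptions; only the counts come from the table.

```python
# Counts per weight class, as listed in the 2005 birth-weight table.
birth_counts = {
    "Less than 500 grams": 6_599,
    "500 to 999 grams": 23_864,
    "1,000 to 1,499 grams": 31_325,
    "1,500 to 1,999 grams": 66_453,
    "2,000 to 2,499 grams": 210_324,
    "2,500 to 2,999 grams": 748_042,
    "3,000 to 3,499 grams": 1_596_944,
    "3,500 to 3,999 grams": 1_114_887,
    "4,000 to 4,499 grams": 289_098,
    "4,500 to 4,999 grams": 42_119,
    "5,000 to 5,499 grams": 4_715,
}

total_babies = sum(birth_counts.values())

# Percents are comparable across years and countries with different totals.
percents = {cls: 100 * count / total_babies for cls, count in birth_counts.items()}

print(f"Total: {total_babies:,}")
for cls, pct in percents.items():
    print(f"{cls:<22} {pct:6.2f}%")
```

The percents are the bar heights a percent-scale histogram would use.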

8. Make a histogram of the distribution, using percents on the vertical scale. Choose the correct histogram below.

A. Histogram I.

B. Histogram II.

C. Histogram III.

D. Histogram IV.

Histogram II is the correct one. It is easily identified by the relative heights of the three largest classes.

Points Earned: 1/1

Correct Answer: B

Your Response: B

9. What are the positions of the median and quartiles in the ordered list of all birth weights? Match your results below.

1. 1,033,593
2. 1,004,684.5
3. 1,004,685
4. 2,009,366.5
5. 2,009,367
6. 2,067,185.5
7. 3,100,778
8. 3,014,051.5
9. 3,014,052
10. 3,014,052.5

A. The first quartile's position is

B. The median's position is

C. The third quartile's position is

There are a total of n = 4,134,370 observations. The median's position is (n + 1)/2 = 2,067,185.5. The first quartile's position is calculated as the median of the first 2,067,185 observations, which gives (2,067,185 + 1)/2 = 1,033,593. The third quartile's position is calculated as the median of the last 2,067,185 observations, which gives 2,067,185 + 1,033,593 = 3,100,778.

Points Earned: 0/3

Correct Answer: A:1, B:6, C:7

Your Response: A:3, B:5, C:8

10. In which weight classes do the median and quartiles fall?

1. Less than 500 grams
2. 500 to 999 grams
3. 1,000 to 1,499 grams
4. 1,500 to 1,999 grams
5. 2,000 to 2,499 grams
6. 2,500 to 2,999 grams
7. 3,000 to 3,499 grams
8. 3,500 to 3,999 grams
9. 4,000 to 4,499 grams
10. 4,500 to 4,999 grams
11. 5,000 to 5,499 grams

A. The first quartile's class is

B. The median's class is

C. The third quartile's class is

After finding the positions of the median and quartiles, we can find the associated classes by adding up the counts of all preceding classes to get the position at which each class begins. The following table summarizes the starting positions of the classes, computed from the counts in the birth-weight table.

Weight                    Starts at Position
Less than 500 grams       1
500 to 999 grams          6,600
1,000 to 1,499 grams      30,464
1,500 to 1,999 grams      61,789
2,000 to 2,499 grams      128,242
2,500 to 2,999 grams      338,566
3,000 to 3,499 grams      1,086,608
3,500 to 3,999 grams      2,683,552
4,000 to 4,499 grams      3,798,439
4,500 to 4,999 grams      4,087,537
5,000 to 5,499 grams      4,129,656

Using the median's position, 2,067,185.5, we see that it is in the class "3,000 to 3,499 grams". Similarly, the first quartile (in position 1,033,593) falls in the class "2,500 to 2,999 grams", while the third quartile (in position 3,100,778) is in the class "3,500 to 3,999 grams".
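One way to check the class lookup is sketched below in Python. It assumes the counts from the birth-weight table and the positions from question 9 (1,033,593; 2,067,185.5; 3,100,778). The helper name `class_of` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

classes = [
    "Less than 500 grams", "500 to 999 grams", "1,000 to 1,499 grams",
    "1,500 to 1,999 grams", "2,000 to 2,499 grams", "2,500 to 2,999 grams",
    "3,000 to 3,499 grams", "3,500 to 3,999 grams", "4,000 to 4,499 grams",
    "4,500 to 4,999 grams", "5,000 to 5,499 grams",
]
counts = [6_599, 23_864, 31_325, 66_453, 210_324, 748_042,
          1_596_944, 1_114_887, 289_098, 42_119, 4_715]

# Running totals: the last ordered position that belongs to each class.
last_positions = list(accumulate(counts))


def class_of(position):
    """Return the weight class containing the given 1-based position.

    A half-integer position lying exactly between two classes is assigned
    to the upper one; none of the positions used here is such a case.
    """
    return classes[bisect_left(last_positions, position)]


for label, pos in [("Q1", 1_033_593), ("Median", 2_067_185.5), ("Q3", 3_100_778)]:
    print(label, "->", class_of(pos))
```

The lookup only needs the cumulative counts, so the individual birth weights are never required.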

Points Earned: 0/3

Correct Answer: A:6, B:7, C:8

Your Response: A:3, B:6, C:9

We asked the students in a large first-year college class how many minutes they studied on a typical weeknight. Here are the responses of random samples of 30 women and 30 men from the class:

Women                       Men
180  120  180  360  240      90  120   30   90  200
120  180  120  240  170      90   45   30  120   75
150  120  180  180  150     150  120   60  240  300
200  150  180  150  180     240   60  120   60   30
120   60  120  180  180      30  230  120   95  150
 90  240  180  115  120       0  200  120  120  180

The most common methods for formal comparison of two groups use x̄ and s to summarize the data.

11. What kinds of distributions are best summarized by x̄ and s?

A. Skewed distributions without outliers.

B. Distributions that are fairly symmetric and free of outliers.

C. Symmetric distributions, outliers make no difference.

D. Distributions of economic variables, since they are usually skewed to the right.

Neither the mean nor the standard deviation is a resistant measure: both are strongly influenced by outliers and skewness. Therefore only fairly symmetric distributions without outliers are good candidates for using the mean and standard deviation - Answer B.

Points Earned: 1/1

Correct Answer: B

12. One over-zealous student in each group claimed to study at least 300 minutes (five hours) per night. Let's check their influence on x̄ and s. By how much does removing these observations change x̄ for the men's group? Note that negative results indicate a decrease in x̄ when the over-zealous student was removed.

A. 12.86

B. 7.36

C. -6.30

D. -7.36

The mean for all of the men is 117.17, while removing the over-zealous student gives 110.86. Using the unrounded means, the overall change is 110.8621 − 117.1667 ≈ -6.30.

13. By how much does removing the over-zealous student change s for the men's group?

A. -66.88

B. 6.30

C. -6.30

D. -7.36

The standard deviation for all of the men is 74.24, while removing the over-zealous student gives 66.88, for an overall change of 66.88 − 74.24 = -7.36.

14. By how much does removing the over-zealous student change x̄ for the women's group?

A. 6.30

B. -12.86

C. -6.30

D. -6.72

The mean for all of the women is 165.17, while removing the over-zealous student gives 158.45, for an overall change of 158.45 − 165.17 = -6.72.

15. By how much does removing the over-zealous student change s for the women's group?

A. -66.88

B. -12.86

C. -6.30

D. -7.36

The standard deviation for all of the women is 56.51, while removing the over-zealous student gives 43.65, for an overall change of 43.65 − 56.51 = -12.86.
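The four changes in questions 12 to 15 can be reproduced with Python's standard `statistics` module. This is a sketch under stated assumptions: the two lists are the study times as read from the table above, and `change_when_removed` is a hypothetical helper name. `statistics.stdev` divides by n − 1, matching the definition the text asks for.

```python
from statistics import mean, stdev

# Study times in minutes, as read from the table above (30 values per group).
women = [180, 120, 180, 360, 240, 120, 180, 120, 240, 170,
         150, 120, 180, 180, 150, 200, 150, 180, 150, 180,
         120, 60, 120, 180, 180, 90, 240, 180, 115, 120]
men = [90, 120, 30, 90, 200, 90, 45, 30, 120, 75,
       150, 120, 60, 240, 300, 240, 60, 120, 60, 30,
       30, 230, 120, 95, 150, 0, 200, 120, 120, 180]


def change_when_removed(data, value):
    """Return (change in mean, change in s) after dropping one occurrence of value."""
    trimmed = list(data)
    trimmed.remove(value)
    return mean(trimmed) - mean(data), stdev(trimmed) - stdev(data)


men_change = change_when_removed(men, 300)      # the over-zealous man
women_change = change_when_removed(women, 360)  # the over-zealous woman

print("men:   mean %+.2f, s %+.2f" % men_change)
print("women: mean %+.2f, s %+.2f" % women_change)
```

A single observation out of 30 moves both x̄ and s noticeably, which is what "not resistant" means in practice.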

Here are the survival times in days of 72 guinea pigs after they were injected with infectious bacteria in a medical experiment. Survival times, whether of machines under stress or cancer patients after treatment, usually have distributions that are skewed to the right.

43 45 53 56 56 57 58 66 67 73 74 79

80 80 81 81 81 82 83 83 84 88 89 91

91 92 92 97 99 99 100 100 101 102 102 102

103 104 107 108 109 113 114 118 121 123 126 128

137 138 139 144 145 147 156 162 174 178 179 184

191 198 211 214 243 249 329 380 403 511 522 598

16. Make a histogram of the distribution using classes 50 days wide (for example the second class has values 50 < days ≤ 100). Which of the histograms below correctly describes the distribution?

A. Histogram I.

B. Histogram II.

C. Histogram III.

D. Histogram IV.

The correct choice is Histogram III. Make sure you chose the classes exactly as specified. Note that the second class (50 < days ≤ 100) has 30 guinea pigs, and Histogram III is the only one that reflects this.

Points Earned: 1/1

Correct Answer: C

Your Response: C

17. Describe the distribution's main features. Mark the appropriate features below.

A. Right skewed.

B. Symmetrical.

C. Left skewed.

D. Single peaked.

E. Double peaked.

F. None of the answers are correct.

The distribution is best described as right skewed with a single main peak.

Points Earned: 1/2

Correct Answer: A, D

Your Response: A

18. Which numerical summary would you choose for these data?

A. Mean and standard deviation.

B. Five-number summary.

C. None of the answers are correct.

Since the distribution is single peaked, a numerical summary is applicable. The skewness of the distribution means that the five-number summary is better suited than the mean and standard deviation (neither of which is resistant to skewed tails and outliers).

Points Earned: 0/1

Correct Answer: B

Your Response: A

19. Calculate your chosen summary. Mark numerical measures that are not relevant to your numerical summary as so. Note that the five-number summary may vary slightly depending on the definitions used by different calculator/software applications. Therefore if applicable, calculate it manually exactly as described by the procedures in the text. As for the standard deviation, if it's relevant, make sure that you calculate it as defined in the text, dividing by (n− 1) and not by n as done by some calculators/software applications.

1. 42
2. 43
3. 43.5
4. 81.5
5. 82.5
6. 102.5
7. 103
8. 103.5
9. 151.5
10. 153
11. 598
12. Not Relevant.

A. Mean.

B. Standard deviation.

C. Minimum.

D. First Quartile.

E. Median.

F. Third Quartile.

G. Maximum.

The correct numerical measure is the five-number summary. Refer to examples 2.3 and 2.5 for explanations on how to calculate the median and quartiles.
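The five-number summary for the guinea pig data can be verified with the Python sketch below. It assumes the manual rule the question insists on (quartiles as medians of the lower and upper halves, excluding the middle value when n is odd). `five_number_summary` is an illustrative name, not a library function.

```python
from statistics import median

# Survival times in days for the 72 guinea pigs listed above.
survival_days = [
    43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79,
    80, 80, 81, 81, 81, 82, 83, 83, 84, 88, 89, 91,
    91, 92, 92, 97, 99, 99, 100, 100, 101, 102, 102, 102,
    103, 104, 107, 108, 109, 113, 114, 118, 121, 123, 126, 128,
    137, 138, 139, 144, 145, 147, 156, 162, 174, 178, 179, 184,
    191, 198, 211, 214, 243, 249, 329, 380, 403, 511, 522, 598,
]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the halves-of-the-data rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # observations below the median position
    upper = data[len(data) - half:]   # observations above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


print(five_number_summary(survival_days))
```

Library quartile functions may interpolate differently, so their results can differ slightly from this manual rule.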

Points Earned: 5/7

Correct Answer: A:12, B:12, C:2, D:5, E:6, F:9, G:11

Your Response: A:12, B:12, C:1, D:3, E:6, F:9, G:11

The table below gives the mean number of births in the United States on each day of the week during an entire year.

Day          Births
Sunday       7,374
Monday       11,704
Tuesday      13,169
Wednesday    13,038
Thursday     13,013
Friday       12,664
Saturday     8,459

20. Based on these boxplots, give a more detailed description of how births depend on the day of the week. Mark the correct answers below.

A. There is a marked drop in weekend birthrates, with at least 75% of the weekday observations not overlapping with at least 75% of the weekend observations.

B. There is a marked drop in weekend birthrates, with no overlap between the weekend and weekday observations.

C. There is a marked drop in weekend birthrates, with an overlap of more than 75% between the weekend and weekday observations.

D. All of the days have highly skewed distributions.

E. The weekend days have similar distributions.

F. Most weekdays have similar distributions.

The correct answers are A, E, and F.

A - Note that there is no overlap between weekend observations below the third quartile and weekday observations above the first quartile, meaning that at least 75% of the weekend observations don't overlap with at least 75% of the weekday observations.

B - Is wrong, since there are overlapping observations between the weekends and weekdays, as can be seen by the minimal number of births during weekdays that overlap with the weekend distributions and the maximal number of weekend births that overlap with the weekday distributions.

C - Is wrong, see explanation for A.

D - Is wrong, since most days have fairly symmetrical distributions, as can be seen by the median falling almost exactly in between the quartiles. The only possible exception is Tuesday, which has a slight right-hand skew.

E, F - Are correct, since in general the weekday distributions overlap between themselves, as do the weekend distributions.

Points Earned: 1/3

Correct Answer: A, E, F

Your Response: A, C, F

21. A report says that "the median credit card debt of American households is zero." We know that many households have large amounts of credit card debt. Explain how the median debt can nonetheless be zero. Choose the most plausible explanation:

A. The median debt can nonetheless be zero because it is not a resistant measure.

B. The median debt is zero because the distribution is left-skewed.

C. The median debt is zero because the first and the third quartiles are probably equal.

D. The median debt is zero because more than half of credit card debts are zero.

Households with no credit cards, as well as those which pay off the balance each month, have no credit card debt. If we list the credit card debt figures for all American households, more than half of the numbers in that list equal zero, so the median is zero.

Points Earned: 1/1

Correct Answer: D

Your Response: D

This is a standard deviation contest. You must choose four numbers from the whole numbers 0 to 10, with repeats allowed.

22. Choose four numbers that have the smallest possible standard deviation. What is s in this case? Round your answer to 3 decimal digits.

Answer

As long as you choose four identical numbers, the standard deviation will be zero.

Points Earned: 0/1

Correct Answer: 0.000

Your Response: 0,1,2,3

23. Is there more than one possibility for choosing four numbers that have the smallest possible standard deviation?

A. Yes.

B. No.

As long as you choose four identical numbers, the standard deviation will be zero, leaving us with 11 possible choices in the range 0 to 10.

Points Earned: 0/1

Correct Answer: A

Your Response:

24. Choose four numbers that have the largest possible standard deviation. Match your choice of numbers below in rising order. Note that the number 0 is the 11th choice.

1. 1
2. 2
3. 3
4. 4
5. 5
6. 6
7. 7
8. 8
9. 9
10. 10
11. 0

A. First number (smallest).

B. Second number.

C. Third number.

D. Fourth number (largest).

See explanation in next question.

Points Earned: 2/4

Correct Answer: A:11, B:11, C:10, D:10

Your Response: A:11, B:3, C:7, D:10

25. Is there more than one way to choose four numbers that give the largest possible standard deviation?

A. Yes.

B. No.

The choice that gives the maximal standard deviation (which turns out to be 5.774) is (0, 0, 10, 10). Let's see how we arrived at this result. It is clear that in order to get the maximal standard deviation the distribution of numbers should have the largest spread, and therefore it should consist of the numbers that are the furthest apart, namely 0 and 10. This leaves us with three combinations to check:

(0, 0, 0, 10), s = 5
(0, 0, 10, 10), s = 5.774
(0, 10, 10, 10), s = 5
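The argument above narrows the search to three candidates. As a cross-check, the sketch below simply tries every choice of four whole numbers from 0 to 10 (repeats allowed, order ignored). It uses only Python's standard library, and the variable names are illustrative assumptions.

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Every unordered choice of four whole numbers from 0 to 10, repeats allowed.
choices = list(combinations_with_replacement(range(11), 4))

# Largest spread: statistics.stdev divides by n - 1, as the text requires.
best = max(choices, key=stdev)
largest_s = stdev(best)

# Smallest spread: s is zero exactly when all four numbers are identical.
zero_spread = [c for c in choices if stdev(c) < 1e-12]

# How many choices reach the maximum?
ties_for_largest = [c for c in choices if abs(stdev(c) - largest_s) < 1e-12]

print(best, round(largest_s, 3))
print(len(zero_spread), "choices give s = 0")
print(len(ties_for_largest), "choice(s) give the largest s")
```

The exhaustive search covers 1,001 candidate sets, so it is a practical way to confirm the reasoning for questions 22 to 26.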

Points Earned: 1/1

Correct Answer: B

Your Response: B

26. What is the value of the largest possible standard deviation? Round your answer to 2 decimal digits.

Answer

The choice of numbers for the maximal standard deviation is (0,0,10,10), see explanation in previous question. These give a standard deviation of 5.77. Make sure that when calculating the standard deviation, you divide by (n − 1) and not by n as done by some calculators/software applications. See Example 2.7 for a detailed calculation of the standard deviation.

Points Earned: 0/1

Correct Answer: 5.77

Your Response:

In 2007, the Boston Red Sox won the World Series for the second time in 4 years. The table below gives the salaries of the Red Sox players as of opening day of the 2007 season.

27. Describe the distribution of salaries with a histogram using classes 2 million dollars wide. Which of the histograms below depicts the distribution correctly?

Table 2.2 Salaries for the 2007 Boston Red Sox World Series team

Player            Salary         Player              Salary         Player              Salary
Josh Beckett      $6,666,667     Jon Lester          $384,000       Jonathan Papelbon   $425,000
Alex Cora         $2,000,000     Javier Lopez        $402,000       Dustin Pedroia      $380,000
Coco Crisp        $3,833,333     Mike Lowell         $9,000,000     Manny Ramirez       $17,016,381
Manny Delcarmen   $380,000       Julio Lugo          $8,250,000     Curt Schilling      $13,000,000
J.D. Drew         $14,400,000    Daisuke Matsuzaka   $6,333,333     Kyle Snyder         $535,000
Jacoby Ellsbury   $380,000       Doug Mirabelli      $750,000       Mike Timlin         $2,800,000
Eric Gagne        $6,000,000     Hideki Okajima      $1,225,000     Jason Varitek       $11,000,000
Eric Hinske       $5,725,000     David Ortiz         $13,250,000    Kevin Youkilis      $424,000
Bobby Kielty      $2,100,000

A. Histogram I.

B. Histogram II.

C. Histogram III.

D. Histogram IV.

The correct answer is Histogram II.

Points Earned: 0/1

Correct Answer: B

Your Response: C

28. Which numerical summary would you choose for these data?

A. Mean and standard deviation.

B. Five-number summary.

C. Both are equally suited.

The skewness of the distribution means that the five-number summary is better suited than the mean and standard deviation (neither of which is resistant to skewed tails and outliers).

Points Earned: 1/1

Correct Answer: B

Your Response: B

29. Calculate your chosen summary. Mark numerical measures that are not relevant to your numerical summary as so. Note that the five-number summary may vary slightly depending on the definitions used by different calculator/software applications. Therefore if applicable, calculate it manually exactly as described by the procedures in the text. As for the standard deviation, if it's relevant, make sure that you calculate it as defined in the text, dividing by (n− 1) and not by n as done by some calculators/software applications.

1. $380,000
2. $850,000
3. $1,175,000
4. $424,500
5. $1,850,000
6. $2,800,000
7. $5,234,351
8. $4,630,838
9. $5,066,389
10. $8,625,000
11. $17,016,381
12. Not Relevant.

A. Mean.

B. Standard deviation.

C. Minimum.

D. First Quartile.

E. Median.

F. Third Quartile.

G. Maximum.

The correct numerical measure is the five-number summary. Refer to examples 2.2 and 2.4 for explanations on how to calculate the median and quartiles.
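To see why a resistant summary suits these salaries, the sketch below compares the mean with the median using Python's `statistics` module. The list is transcribed from Table 2.2, and the variable names are illustrative assumptions.

```python
from statistics import mean, median

# Opening-day 2007 salaries in dollars, transcribed from Table 2.2 (25 players).
salaries = [
    6_666_667, 2_000_000, 3_833_333, 380_000, 14_400_000, 380_000,
    6_000_000, 5_725_000, 2_100_000,
    384_000, 402_000, 9_000_000, 8_250_000, 6_333_333, 750_000,
    1_225_000, 13_250_000,
    425_000, 380_000, 17_016_381, 13_000_000, 535_000, 2_800_000,
    11_000_000, 424_000,
]

mean_salary = mean(salaries)
median_salary = median(salaries)

print(f"mean   = ${mean_salary:,.0f}")
print(f"median = ${median_salary:,.0f}")
# A few very large contracts pull the mean far above the median,
# which is the usual signature of a right-skewed distribution.
print("mean / median =", round(mean_salary / median_salary, 2))
```

The mean comes out well above the median, consistent with the right skew described for this distribution.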

Points Earned: 3/7

Correct Answer: A:9, B:7, C:1, D:4, E:6, F:10, G:11

Your Response: A:-, B:-, C:1, D:-, E:-, F:10, G:11

30. Based on your graph and numerical summary, describe the distribution's main features. Mark the appropriate features below.

A. Right skewed.

B. Symmetrical.

C. Left skewed.

D. None of the answers are correct.

E. There are outliers.

F. There are no outliers.

The distribution is best described as right skewed with several outliers.

Points Earned: 0/2

Correct Answer: A, F

Your Response: C

How well have stocks done over the past generation? The Standard & Poor's 500 stock index describes the average performance of the stocks of 500 leading companies. Because the average is weighted by the total market value of each company's stock, the index emphasizes larger companies. Here are the real (that is, adjusted for the changing buying power of the dollar) returns on the S&P 500 for the years 1971 to 2006:

What can you say about the distribution of real returns on stocks? Follow the four-step process in your answer.

31. STATE: Which of the options below clearly states the practical question we are trying to answer from the available data?

A. If you had $1 in the beginning of 1972, how many dollars would you have by the end of 2006?

B. What is the likelihood of making a profit by investing in the stock market?

C. How can we describe the distribution of returns on stocks (shape, center and spread)?

D. Is it better to invest in large companies or in the smaller ones?

The correct answer is C. The others are wrong for the following reasons:

A - Even though we can get the answer from the data, this tells us nothing about the distribution of returns, which is what we're trying to describe.

B - This still doesn't relate directly to the distribution of returns.

D - Is not the question asked, and the data cannot provide an answer to it.

Points Earned: 1/1

Correct Answer: C

Your Response: C

32. FORMULATE: Which of the following statistical methods are relevant in this particular case? Select the applicable methods below. This is a general question, answer it in the context of the STATE step.

A. Use numerical measures such as the five-number summary or the mean and standard deviation to describe the distribution.

B. Plot the data using histograms or stemplots.

C. Plot the data using a time plot.

D. Use a pie chart to get a feeling for the shape of the distribution.

E. Use a bar graph to get a feeling for the shape of the distribution.

F. Look for trends and cyclical behavior in the time plot.

According to the STATE step, we are interested in describing the shape of the distribution. Therefore we first need to plot it using a histogram or stemplot (time plots, bar graphs and pie charts are not applicable to distributions), and then we could describe the distribution using numerical measures such as the mean and standard deviation or the five-number summary, depending on the exact shape of the distribution.

Points Earned: 1/2

Correct Answer: A, B

Your Response: B

33. SOLVE: Plot the data using a histogram with classes 10% wide. Compare your result to the histograms below and choose the correct one.

A. Histogram I.

B. Histogram II.

C. Histogram III.

D. Histogram IV.

Histogram I is the correct answer.

Points Earned: 0/1

Correct Answer: A

Your Response: B

34. SOLVE (continued): Which numerical summary would you choose for these data?

A. Mean and standard deviation.

B. Five-number summary.

C. Neither of the above.

The distribution has a relatively regular single-peaked shape, and therefore numerical summaries are applicable. The skewness of the distribution means that the five-number summary is better suited than the mean and standard deviation (neither of which is resistant to skewed tails and outliers).

Points Earned: 1/1

Correct Answer: B

Your Response: B

35. SOLVE (continued): Calculate your chosen summary. Mark numerical measures that are not relevant to your numerical summary as so. Note that the five-number summary may vary slightly depending on the definitions used by different calculator/software applications. Therefore if applicable, calculate it manually exactly as described by the procedures in the text. As for the standard deviation, if it's relevant, make sure that you calculate it as defined in the text, dividing by (n− 1) and not by n as done by some calculators/software applications.

1. -34.5400%
2. -5.4715%
3. -2.2640%
4. 7.9245%
5. 11.6770%
6. 17.7560%
7. 19.0085%
8. 22.4145%
9. 26.3105%
10. 26.5345%
11. 34.1670%
12. Not Relevant.

A. Mean.

B. Standard deviation.

C. Minimum.

D. First Quartile.

E. Median.

F. Third Quartile.

G. Maximum.

The correct numerical measure is the five-number summary. Refer to examples 2.2 and 2.4 for explanations on how to calculate the median and quartiles.

Points Earned: 0/7

Correct Answer: A:12, B:12, C:1, D:2, E:5, F:8, G:11

Your Response: A:-, B:-, C:-, D:-, E:-, F:-, G:-

36. CONCLUDE: Which of the following are conclusions you can draw based on your statistical analysis?

A. The distribution is right skewed, just like most economic variables.

B. If you invested $1 in the stock market in 1972, by 2006 you would have $7.69.

C. On average, bigger companies have higher returns than small ones.

D. The distribution has a left skew.

E. The center of the stock market returns distribution is positive.

F. In more than half of the surveyed years, the stock returns were above 10%.

The correct answers are D, E, and F. Answer F is a direct consequence of the median being 11.677%.

A - Is wrong, since the distribution is left-skewed.

The rest of the answers are irrelevant and/or do not answer our question from the STATE step. Some of them jump ahead to conclusions that cannot be based on the data at hand.

Points Earned: 0/3

Correct Answer: D, E, F

Your Response:

People gain weight when they take in more energy from food than they expend. Table 2.4 compares volunteer subjects who were lean with others who were mildly obese. None of the subjects followed an exercise program. The subjects wore sensors that recorded every move for 10 days. The table shows the average minutes per day spent in activity (standing and walking) and in lying down. Compare the distributions of time spent actively for lean and obese subjects and also the distributions of time spent lying down. How does the behavior of lean and mildly obese people differ?

37. State: Which of the options below clearly states the practical question we are trying to answer from the available data?

A. Do lean people spend more energy than obese people in daily activities?

B. How do lean and obese people differ in time spent in activity and in time spent lying down?

C. Are there differences in time spent by each group in the two activities?

D. Compare the two groups for the difference between energy they take from food and the energy they expend in daily activities.

State: How do lean and obese people differ in time spent in activity and in time spent lying down?

Points Earned: 0/1

Correct Answer: B

Your Response:

38. Plan: Which of the options below is most appropriate for planning your statistical analysis?

A. Compare each pair of distributions using graphs.

B. Compare each pair of distributions using graphs, means and standard deviations.

C. Compare each pair of distributions using numerical summaries.

D. Compare each pair of distributions by first using graphs and then numerical summaries.

Plan: We will compare each pair of distributions using graphs and numerical summaries.

Points Earned: 0/1

Correct Answer: D

Your Response:

39. Solve: Draw back-to-back stemplots. Choose the option that best describes your stemplots.

A. None of the stemplots show any particular skewness.

B. None of the stemplots show any particular skewness but there are some outliers.

C. The distributions are sharply skewed to the left but no outliers are apparent.

D. The "Time active-lean" group is considerably skewed, but the other distributions are quite symmetric.

Solve: Below are two back-to-back stemplots; histograms or boxplots could also be used. None of the stemplots show any particular skewness.

Points Earned: 0/1

Correct Answer: A

Your Response:

40. Solve: Which of the options below is the most appropriate numerical summary for these data?

A. Five-number summary.

B. Means and standard deviations.

C. Medians and standard deviations.

D. Five-number summary and means and standard deviations.

Since none of the distributions show particular skewness, either means and standard deviations or five-number summaries would be suitable.

Points Earned: 1/1

Correct Answer: D

Your Response: D

41. Conclude: The means, standard deviations and five-number summaries of the distributions are shown below:

What is your conclusion based on this analysis? True or False: "There is no noticeable difference between the two groups of people, in time spent in activity and in time spent lying down."

Conclude: In both the stemplots and the numerical summaries, we observe that lean subjects spent more active time than the obese subjects. There was little difference in time spent lying down.

Points Earned: 1/1

Correct Answer: False

Your Response: False

The table below gives carbon dioxide (CO2) emissions per person for countries with population at least 20 million. A stemplot or histogram shows that the distribution is strongly skewed to the right. The United States and several other countries appear to be high outliers.

Data Set

42. Give the five-number summary. Note that the five-number summary may vary slightly depending on the definitions used by different calculators and software applications. Therefore, calculate it manually, exactly as described by the procedures in the text. Match your answers below. The values are given in metric tons per person.

1. 0.1

2. 0.55

3. 0.95

4. 0.85

5. 2.50

6. 2.85

7. 3.3

8. 3.95

9. 4.60

10. 4.85

11. 7.4

12. 19.6

A. Minimum.

B. First Quartile.

C. Median.

D. Third Quartile.

E. Maximum.

Refer to Exercise 1.36 for more information.

Points Earned: 3/5

Correct Answer: A:1, B:3, C:7, D:11, E:12

Your Response: A:1, B:3, C:5, D:9, E:12

43. Does the five-number summary suggest that the distribution is right-skewed? Explain.

A. No, one cannot get any indication of a distribution's skewness without making a stemplot or histogram.

B. No, in order to see a skew, we need the mean and standard deviation.

C. Surprisingly, the numbers indicate a left skew.

D. Yes, one can see that a distribution is skewed by the position of the median relative to the quartiles. In this case the median is closer to the first quartile, indicating a right-hand skew.

D is the correct answer.

A - While it is true that a plot gives more information than a numerical summary, the five-number summary contains enough information to give an indication of a distribution's center, spread and skew, as D explains.

B - The mean and standard deviation give no indication of a distribution's skew. As D explains, the five-number summary does.

C - This is wrong; see the explanation in D.

Points Earned: 1/1

Correct Answer: D

Your Response: D
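The comparison described in option D can be checked with a little arithmetic. The Python sketch below is only an illustration: it takes the five-number summary from the answer key to question 42 (0.1, 0.95, 3.3, 7.4, 19.6), and the variable names are hypothetical. It compares the distance from the median to each quartile; a larger gap on the upper side points to a right skew.

```python
# Five-number summary taken from the answer key to question 42.
minimum, q1, median, q3, maximum = 0.1, 0.95, 3.3, 7.4, 19.6

# Distance from the median to each quartile.
lower_gap = median - q1   # how far Q1 sits below the median
upper_gap = q3 - median   # how far Q3 sits above the median

# Simple rule of thumb (an assumption of this sketch, not a formal test):
# the side with the larger gap is the side of the longer tail.
if upper_gap > lower_gap:
    shape = "right-skewed"
elif upper_gap < lower_gap:
    shape = "left-skewed"
else:
    shape = "roughly symmetric"

print(f"median - Q1 = {lower_gap:.2f}, Q3 - median = {upper_gap:.2f}: {shape}")
```

Here the median is about 2.35 above the first quartile but about 4.1 below the third quartile, which agrees with the right skew described in option D.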

44. Below is a stemplot of the carbon dioxide emissions distribution. It suggests that a few countries are outliers. How many countries are outliers according to the 1.5 × IQR rule?

A. No countries.

B. 1 country.

C. 2 countries.

D. 3 countries.

E. 4 countries.

The 1.5 × IQR rule limits for outliers are calculated as follows:

First we calculate the IQR from the quartiles: IQR = Q3 − Q1 = 7.05

Next we calculate the limits:

Lower limit = Q1 − 1.5 × IQR = −9.825

Upper limit = Q3 + 1.5 × IQR = 18.375

Only the United States falls outside these limits and therefore there is only one outlier according to the 1.5 × IQR rule. See Example 2.6 for more details.

Points Earned: 1/1

Correct Answer: C

Your Response: C

45. Do the 1.5 × IQR rule’s suggestions about which countries are and are not outliers match what you see in the stemplot?

A. Yes.

B. No.

The plot shows that there are 3 outliers, Australia, Canada and the United States. On the other hand, the rule points out only the United States as an outlier.

Points Earned: 0/1

Correct Answer: B

Your Response: A

The table below gives the salaries of the Red Sox players as of opening day of the 2007 season.

Data Set

46. Which members of the Boston Red Sox have salaries that are suspected outliers by the 1.5 × IQR rule? Match your answers below. Make sure you calculate the quartiles as defined by the text.

1. Is an outlier.

2. Is not an outlier.

A. Josh Beckett

B. Curt Schilling

C. David Ortiz

The quartiles are Q1 = $424,500 and Q3 = $8,625,000. Then the 1.5 × IQR rule limits for outliers are calculated as follows:

First we calculate the IQR from the quartiles: IQR = Q3 − Q1 = $8,200,500

Next we calculate the upper limit: Q3 + 1.5 × IQR = $20,925,750

Outliers are those salaries above $20,925,750; there are no such salaries.

Points Earned: 2/3

Correct Answer: A:2, B:2, C:2

Your Response: A:1, B:2, C:2
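The arithmetic in the explanation above can be reproduced with a short Python sketch. It assumes only the two quartiles quoted in the explanation; the helper name `iqr_limits` is hypothetical and is not taken from the text.

```python
def iqr_limits(q1, q3, k=1.5):
    """Return the (lower, upper) limits of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of the 2007 Red Sox salaries, as quoted in the explanation.
q1, q3 = 424_500, 8_625_000

lower_limit, upper_limit = iqr_limits(q1, q3)
print(f"IQR = ${q3 - q1:,}")
print(f"Lower limit = ${lower_limit:,.0f}")
print(f"Upper limit = ${upper_limit:,.0f}")
```

The lower limit is negative, so no salary can fall below it; a salary is a suspected outlier only if it exceeds the upper limit.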

How well have stocks done over the past generation? The Wilshire 5000 index describes the average performance of all U.S. stocks. The average is weighted by the total market value of each company's stock, so think of the index as measuring the performance of the average investor. Here are the percent returns on the Wilshire 5000 index for the years 1971 to 2006:

Year  Return    Year  Return    Year  Return
1971   16.19    1983   22.71    1995   36.41
1972   17.34    1984    3.27    1996   21.56
1973  -18.78    1985   31.46    1997   31.48
1974  -27.87    1986   15.61    1998   24.31
1975   37.38    1987    1.75    1999   24.23
1976   26.77    1988   17.59    2000  -10.89
1977   -2.97    1989   28.53    2001  -10.97
1978    8.54    1990   -6.03    2002  -20.86
1979   24.40    1991   33.58    2003   31.64
1980   33.21    1992    9.02    2004   12.48
1981   -3.98    1993   10.67    2005    6.38
1982   20.43    1994    0.06    2006   15.77

Data Set

47. The returns on stocks vary a lot: they range from a loss of more than 27% to a gain of more than 34%. Are any of these years suspected outliers by the 1.5 × IQR rule? Match your answers below. Calculate the quartiles as defined by the text.

1. Is an outlier.

2. Is not an outlier.

A. 1995

B. 1997

C. 2002

D. 1974

The quartiles are Q1 = 0.905% and Q3 = 25.585%. The 1.5 × IQR rule limits for outliers are calculated as follows:

First we calculate the IQR from the quartiles: IQR = Q3 − Q1 = 24.68%

Next we calculate the limits:

Lower limit = Q1 − 1.5 × IQR = −36.115%

Upper limit = Q3 + 1.5 × IQR = 62.605%

These limits clearly fall outside the range of the entire distribution, and therefore there are no outliers according to the 1.5 × IQR rule. See Exercise 2.44 for more details. Note that the quartiles were calculated according to the definitions in the text.

Points Earned: 2/4

Correct Answer: A:2, B:2, C:2, D:2

Your Response: A:2, B:2, C:1, D:1
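The quartiles and limits quoted in the explanation above can be reproduced from the table of returns. The Python sketch below is an illustration under one assumption: that "quartiles as defined by the text" means Q1 is the median of the observations below the position of the overall median and Q3 is the median of those above it. The function name `text_quartiles` is hypothetical. Software packages often use other quartile definitions and may give slightly different values.

```python
from statistics import median

# Percent returns on the Wilshire 5000 index, 1971 to 2006 (from the table).
returns_by_year = {
    1971: 16.19, 1972: 17.34, 1973: -18.78, 1974: -27.87, 1975: 37.38,
    1976: 26.77, 1977: -2.97, 1978: 8.54, 1979: 24.40, 1980: 33.21,
    1981: -3.98, 1982: 20.43, 1983: 22.71, 1984: 3.27, 1985: 31.46,
    1986: 15.61, 1987: 1.75, 1988: 17.59, 1989: 28.53, 1990: -6.03,
    1991: 33.58, 1992: 9.02, 1993: 10.67, 1994: 0.06, 1995: 36.41,
    1996: 21.56, 1997: 31.48, 1998: 24.31, 1999: 24.23, 2000: -10.89,
    2001: -10.97, 2002: -20.86, 2003: 31.64, 2004: 12.48, 2005: 6.38,
    2006: 15.77,
}

def text_quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the data.

    Assumption: when the number of observations is odd, the overall
    median is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]
    upper_half = data[(n + 1) // 2:]
    return median(lower_half), median(upper_half)

q1, q3 = text_quartiles(returns_by_year.values())
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Years whose return falls outside the 1.5 x IQR limits.
outlier_years = [
    year for year, r in returns_by_year.items()
    if r < lower_limit or r > upper_limit
]

print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.2f}")
print(f"Limits: {lower_limit:.3f} to {upper_limit:.3f}")
print("Suspected outliers:", outlier_years)
```

With 36 observations, each half holds 18 values, so each quartile is the average of two adjacent returns. Even the worst year, 1974 (−27.87%), lies inside the lower limit, so none of the four years in the question is flagged.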
