1
STAT 100 - Richards
Statistical Concepts and Reasoning
Lecture 5
2
Today’s Statistic
15
Researchers at the Cancer Genome Project have identified all genetic mutations occurring during the lifetime of a cancer patient. They found that a mutation occurs, on average, for every 15 cigarettes smoked by a typical lung-cancer patient.
Source: http://www.wellcome.ac.uk/News/Media-office/Press-releases/2009/WTX058047.htm
3
Chapter 9: Plots, Graphs, Pictures
How to create good and bad displays of statistical data
Darrell Huff, “How To Lie With Statistics,” 1954
Edward Tufte, “Envisioning Information,” 1990
4
Redwood Transit System (RTS) is the public bus system for Humboldt County, California.
5
The New York Times, March 1987
“Gotti is acquitted by a Federal jury in conspiracy case”
“The last piece of evidence requested by the jury for re-examination was a chart introduced by the defense that showed the criminal backgrounds of seven prosecution witnesses. It listed 69 crimes, including murder, drug possession and sales, and kidnapping.”
“It was a chart listing the lengthy criminal records of seven prosecution witnesses who had obtained promises of leniency and other favors from the Government in return for their testimony against Mr. Gotti . . .”
6
Source: Tufte, “Envisioning Information”
7
U.S. Population and Violent Crime*
Table 9.2, p. 176
Year                        1982  1983  1985  1986  1987  1988  1989  1990  1991
U.S. population (millions)   231   234   239   241   243   246   248   249   252
Which of the lines corresponds to “Binge alcohol” and which to “deaths”?
Why are states listed in alphabetical order? Why are some states’ names missing?
The vertical scale is the actual number of cases. California has the highest spike because it has the largest population of all states.
A proper scale is the number of cases as a percentage of the state’s population. Always use data adjusted for the state’s population size.
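The point about raw counts versus population-adjusted rates can be sketched in a few lines of Python. The three states and all of their numbers below are hypothetical, invented only for illustration; they are not taken from the slide's graph.

```python
# Hypothetical counts and populations, invented purely for illustration.
states = {
    "State A": {"cases": 5000, "population": 30_000_000},
    "State B": {"cases": 900, "population": 3_000_000},
    "State C": {"cases": 400, "population": 1_000_000},
}


def rate_per_100k(cases, population):
    """Return the number of cases per 100,000 residents."""
    return cases / population * 100_000


# Rate for every state, keyed by state name.
rates = {
    name: rate_per_100k(d["cases"], d["population"])
    for name, d in states.items()
}

# The state with the most cases need not be the state with the highest rate.
largest_count = max(states, key=lambda s: states[s]["cases"])
largest_rate = max(rates, key=rates.get)

for name in states:
    print(f"{name}: {states[name]['cases']:>5} cases, "
          f"{rates[name]:.1f} per 100,000")
print("Highest raw count:", largest_count)
print("Highest rate:", largest_rate)
```

With these made-up numbers the most populous state has the tallest spike in raw counts but the lowest rate, which is exactly the distortion a raw-count scale produces.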
12
A Congressman showed this graph to “prove” that life is getting easier for families in the 40-60th percentile of incomes. What is wrong with this graph?
13
Problems with the graph
The vertical axis does not start at zero; is something being hidden from us?
The grid-lines and data labels are not explained.
The horizontal scale is chosen so that the graph suggests an enormous fall in the middle fifth’s taxes.
The red and blue bars are misleading and irrelevant.
The graph is 4-dimensional: Year, number of jobs at year end, number of jobs created, and political affiliation of the Administration.
Even 3-dimensional graphs are often confusing.
The data ignore confounding factors, such as: women entering the workforce, population growth, shifts from part-time to full-time jobs, changes in wages, changes in GDP.
And, just who should get credit for job creation?
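The first problem listed above, a vertical axis that does not start at zero, can be made concrete with a little arithmetic. The sketch below uses hypothetical values (a quantity falling from 10.0 to 9.0, and an axis starting at 8), not the numbers from the Congressman's graph, and the function name `drawn_ratio` is made up for this example.

```python
def drawn_ratio(first, last, baseline):
    """Ratio of the two bar heights as drawn when the axis starts at `baseline`."""
    return (first - baseline) / (last - baseline)


# Hypothetical values, invented for illustration: a quantity falls from 10.0 to 9.0.
first, last = 10.0, 9.0

true_ratio = drawn_ratio(first, last, baseline=0.0)       # axis starts at zero
truncated_ratio = drawn_ratio(first, last, baseline=8.0)  # axis starts at 8

print(f"Axis from 0: first bar is {true_ratio:.2f} times as tall as the last")
print(f"Axis from 8: first bar is {truncated_ratio:.2f} times as tall as the last")
```

A 10% fall is drawn as if the quantity had been cut in half once the axis starts at 8.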
19
In general, pie charts and pictograms are bad; histograms are good.
20
Is the picture a triangle or pyramid?
The area of the bottom part of the triangle is more than 73.80% of the total area of the triangle.
The volume of the bottom part of the pyramid is far more than 73.8% of the total volume of the pyramid.
A simple bar chart would have provided accurate information.
Why 73.80%? Why not approximate to 74%?
73.80% gives a (false) sense of exactness.
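To see how far off the picture can be, here is a rough calculation. It assumes, since the slide's picture is not reproduced in this transcript, that the bottom slice spans 73.8% of the figure's height. The part above the cut is a smaller copy of the whole shape, so its share shrinks with the square of its height for a triangle and with the cube for a pyramid. The function name `bottom_fraction` is made up for this sketch.

```python
def bottom_fraction(height_share, dimensions):
    """Share of a triangle (dimensions=2) or pyramid (dimensions=3)
    lying in the bottom `height_share` of its height."""
    top_share = 1.0 - height_share        # height of the similar shape above the cut
    return 1.0 - top_share ** dimensions  # everything that is not in the top piece


# Assumption: the picture cuts the height at 73.8% from the base.
h = 0.738

triangle = bottom_fraction(h, 2)
pyramid = bottom_fraction(h, 3)

print(f"Triangle: bottom slice holds {triangle:.1%} of the area")
print(f"Pyramid:  bottom slice holds {pyramid:.1%} of the volume")
```

Under that assumption the bottom slice holds about 93.1% of the triangle's area and about 98.2% of the pyramid's volume, so the reader sees a share much larger than the 73.8% the label reports.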
21
The moral of the story: Read Chapter 9 carefully.
It is an easy chapter to read.
You will become a better citizen when you read Chapter 9 carefully.