Posted on 27-Dec-2015


How to Lie with Statistics
Edward H. Freeman

Statisticulation: misinforming people by the use of statistical material.

The better you know the subject, the less likely you are to misuse it by mistake, or be taken in by those who misuse statistics on purpose.

Lying With Statistics

• He uses statistics as a drunken man uses lampposts – for support rather than for illumination. Andrew Lang

Definition of Statistics: The science of producing unreliable facts from reliable figures. We reach conclusions from our statistics – often incorrectly.

How To Lie With Statistics

• Written in 1954 by Darrell Huff (1913-2001), editor of Better Homes and Gardens.

• Huff had no formal training in statistics.
• Over 1,500,000 copies sold in English.
• Breezy, highly readable informal style, later copied by the Dummies books.
• 152 pages – lots of pictures.

I couldn’t find a photograph of Darrell Huff anywhere on the Internet.

Other books by Darrell Huff
• The Complete How To Figure It: Using Math in Everyday Life
• How to Take a Chance
• How to work with concrete and masonry
• Score: The strategy of taking tests
• Complete Book of Home Improvement
• How to figure the odds on everything
• Pictures by Pete, a career story of a young commercial photographer,…
• Twenty careers of tomorrow
• Woodpulp and Ink: The less reputable newsstand magazines, 1919-1939
• How to save on the home you want

Mark Twain’s Definition of a Classic

Something that everybody wants to have read and nobody wants to read.

Popular Books on Technical Subjects

• The Universe and Doctor Einstein by Lincoln Barnett – “among the clearest, most readable expositions of relativity theory.”

• Mathematics for the Nonmathematician by Morris Kline – “entertaining overview follows development of mathematics from ancient Greeks to present.”

• A Brief History of Time by Stephen Hawking – “easy, good-natured humor and an ability to illustrate highly complex propositions with analogies plucked from daily life.”

Chapter 5 – The Gee Whiz Graph

West Hartford Real Estate Sales 2007 (houses sold)

• Coldwell: 254
• Raveis: 187
• Prudential: 173
• Re/Max: 107

Let’s Make a Graph!!

[Bar chart: 2007 sales for Coldwell, Raveis, Prudential, and Re/Max, with the vertical axis running from 0 to 300]

Let’s make another graph! Chop off the bottom and stretch the top.

[Bar chart titled “Year After Year Your #1 Team in West Hartford Real Estate!”: the same sales figures, with the vertical “Sales” axis starting at 100 and running to 250]

[Bar chart titled “Nobody is Selling Anything!”: the same sales figures for Coldwell, Raveis, Prudential, and Re/Max, with the vertical axis running from 0 to 10,000]

Squeeze everything to the bottom.
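The three charts differ only in where the vertical axis starts and ends. A minimal sketch of the arithmetic behind the trick, using the sales figures above; the helper names `apparent_ratio` and `share_of_axis` are hypothetical, chosen for illustration:

```python
# Hypothetical helper: how much taller bar `a` looks than bar `b`
# when the vertical axis starts at `baseline` instead of zero.
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar heights for values a and b above the axis baseline."""
    if b <= baseline:
        raise ValueError("bar b must sit above the baseline to be visible")
    return (a - baseline) / (b - baseline)


# Hypothetical helper: fraction of the plot's height a bar of `value` fills.
def share_of_axis(value: float, axis_min: float, axis_max: float) -> float:
    return (value - axis_min) / (axis_max - axis_min)


sales = {"Coldwell": 254, "Raveis": 187, "Prudential": 173, "Re/Max": 107}

# Honest chart, axis from 0: Coldwell's bar is about 2.4 times Re/Max's.
honest = apparent_ratio(sales["Coldwell"], sales["Re/Max"])
# Gee-whiz chart, axis from 100: the same two bars now differ 22-fold.
gee_whiz = apparent_ratio(sales["Coldwell"], sales["Re/Max"], baseline=100)
# Squeezed chart, axis up to 10,000: even the tallest bar fills about 2.5% of the plot.
squeezed = share_of_axis(sales["Coldwell"], 0, 10_000)

print(f"honest {honest:.2f}x, gee-whiz {gee_whiz:.1f}x, tallest bar fills {squeezed:.1%}")
```

Nothing about the data changes between the three charts; only the baseline and the top of the axis move.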

Points to Ponder

• Is the data true and accurate? The Excel Graphing Fallacy.

• Are there other factors?
– Prudential specializes in McMansions.
– There is another agency, not on the chart, that sold 1,000 houses last year.

• Does a seller care how many houses a realtor sells, or only that she sells his house?

DATA DISTORTION

Year | % of Doctors Who Are Family Practitioners | Ratio to Population | Number of Doctors
1964 | 27.0% | 1:2247 | 8023
1975 | 16.9% | 1:3157 | 6064
1990 | 12.0% | 1:4232 | 5212
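Each column tells the same 1964–1990 story with a different amount of drama. A minimal sketch computing the relative change in each column; the column labels follow the table, reading "Ratio to Population" 1:N as N people per doctor (an interpretation):

```python
# Figures from the table: (year, % family practitioners, people per doctor, number of doctors)
rows = [
    (1964, 27.0, 2247, 8023),
    (1975, 16.9, 3157, 6064),
    (1990, 12.0, 4232, 5212),
]


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100


first, last = rows[0], rows[-1]
changes = {
    "share of doctors in family practice": pct_change(first[1], last[1]),
    "people per doctor": pct_change(first[2], last[2]),
    "number of doctors": pct_change(first[3], last[3]),
}
for label, change in changes.items():
    print(f"{label}: {change:+.1f}%")
```

Picking the column is picking the headline: a 35% drop, a 56% drop, or an 88% rise.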

UNREADABLE CHARTS (but don’t it look nice!?)

The Pentagon Spaghetti Slide

The Global Warming Chart

The Well-Chosen Average

Chapter 2 – The Well-Chosen Average
Three Types of Averages:

• Mean – The traditional average ($5,700)
• Median – The one in the middle: 12 make more and 12 make less. ($3,000)
• Mode – The salary that occurs most often. ($2,000)

Salary Employees

$45,000 1

$15,000 1

$10,000 2

$5,700 1 (Mean)

$5,000 3

$3,700 4

$3,000 1 (Median)

$2,000 12 (Mode)
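The three averages can be checked directly against the salary table. A minimal sketch using Python's standard `statistics` module:

```python
import statistics

# Salary table from the slide: salary -> number of employees earning it
payroll = {45_000: 1, 15_000: 1, 10_000: 2, 5_700: 1, 5_000: 3, 3_700: 4, 3_000: 1, 2_000: 12}

# Expand into one entry per employee (25 in all)
salaries = [salary for salary, count in payroll.items() for _ in range(count)]

mean = statistics.mean(salaries)      # total payroll divided by head count
median = statistics.median(salaries)  # the 13th of 25 sorted salaries
mode = statistics.mode(salaries)      # the most common salary
print(len(salaries), mean, median, mode)
```

This gives 25 employees with a mean of $5,700, a median of $3,000, and a mode of $2,000: three honest "averages" for the same payroll.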

Mary Ann or Ginger?

Probably not

Probably not:

Smoking is one of the leading causes of statistics. Fletcher Knebel

An Explanation

• The question, asked in a 1976 market research survey, was whether dentists would recommend sugared gum, sugarless gum, or no gum at all to their patients who chew gum.

• Out of about 1,200 dentists, 85% recommended sugarless gum, with the rest pretty much going to “no gum at all.”

• There is no hard evidence any dentist was in favor of sugared gum.

http://www.bookofodds.com/Daily-Life-Activities/Articles/A0471-No-Gum-at-All-1-in-10

An Explanation (Continued)

• Compare with: “Four out of five oncologists recommend low tar cigarettes for their patients who smoke.”

• Would any dentist say that their patients should chew gum – sugarless or not?

Two more quotes about statistics

• The theory of probabilities is at bottom nothing but common sense reduced to calculus. Laplace

• There are two kinds of statistics, the kind you look up and the kind you make up. Rex Stout

Sample with the Built-In Bias

• Time Magazine (circa 1950): “The average Yalesman, Class of 1924, makes $25,111 a year.”
• Four categories of alumni
– Those who responded
– Those who did not reply
– Those whose addresses are unknown
– Those who are dead

Those Who Responded

• Did they tell the truth? Will one multimillionaire skew the average? (Outliers)
• If they lied:
– Did they exaggerate? (to impress their fellow graduates)
– Did they underreport? (to avoid problems with the IRS)
– Do the liars balance each other out?
– Do we know?
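How far can one multimillionaire move the average? A minimal sketch with ten hypothetical respondents (the incomes are invented for illustration):

```python
import statistics

# Hypothetical respondents: nine earn $20,000, one is a multimillionaire.
incomes = [20_000] * 9 + [2_000_000]

mean = statistics.mean(incomes)      # pulled far upward by the single outlier
median = statistics.median(incomes)  # barely notices it
print(f"mean ${mean:,.0f}, median ${median:,.0f}")
```

One reply out of ten lifts the mean to $218,000 while the median stays at $20,000, which is why a survey that reports only the mean deserves a second look.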

Those who didn’t reply
Those whose addresses are unknown

• Fact: Well-known alumni are easy to locate.
• Who did not reply to the survey?
– Low Income Earners – “clerks, mechanics, tramps, unemployed alcoholics, barely surviving writers and artists…people of whom it would take half a dozen or more to add up to an income of $25,111.”
– Tax Cheats – Those who don’t want anybody (the IRS) to know their income.
– Private People – Those who don’t consider their income anybody’s business.
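When who answers depends on what they earn, the survey average drifts toward the people who answer. A minimal sketch of the expected effect; the two groups, their incomes, and their response rates are all hypothetical:

```python
# Hypothetical alumni groups: (number of alumni, annual income, chance they answer the survey)
groups = [
    (500, 5_000, 0.2),   # low earners: often unreachable or unwilling to reply
    (500, 40_000, 0.8),  # well-known, well-paid alumni: easy to find, happy to reply
]


def population_mean(groups):
    """True mean income over every alumnus."""
    total = sum(n for n, _, _ in groups)
    return sum(n * income for n, income, _ in groups) / total


def expected_respondent_mean(groups):
    """Mean income among the alumni we expect to hear back from."""
    respondents = sum(n * rate for n, _, rate in groups)
    return sum(n * rate * income for n, income, rate in groups) / respondents


print(population_mean(groups), expected_respondent_mean(groups))
```

The class really averages $22,500, but the replies average $33,000. No respondent has to lie for the published figure to be wrong.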

Chapter 10 – How to Talk Back to a Statistic

• Who Says So?
– The OK Name. Freeman Institute for Advanced Statistical Control and Organization. (FIASCO)
– The PowerPoint/Excel Syndrome
– “When an OK name is cited (e.g. the Harvard Institute), make sure that the authority stands behind the information, not merely somewhere alongside it.”

Some More Quotations

• Statistics show that many people watch our show from the bedroom and people you ask into your bedroom have to be more interesting than those you ask into your living room.

Jack Paar, late night host

• Then there is the man who drowned crossing a stream with an average depth of six inches. W.I.E. Gates

Who Says So? (Continued)

– Chicago Journal of Commerce
• Sent out a survey to 1,200 corporations asking about price gouging and hoarding during the Korean War.
• 169 responded (14%). The figures below are percentages of all 1,200 corporations surveyed.
– 9% said they had not raised prices.
– 5% said they had raised prices.
– 86% didn’t answer at all.

• “The survey shows that corporations have done exactly the opposite of what the enemies of the American business have charged.” (Emphasis mine)
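The headline rests on a fraction of a fraction. A minimal sketch comparing the claim as a share of respondents with the same claim as a share of everyone surveyed, using the slide's percentages:

```python
# Percentages from the slide, all taken over the 1,200 corporations surveyed
surveyed = 1_200
pct_no_raise, pct_raised, pct_silent = 9, 5, 86

responded = pct_no_raise + pct_raised  # 14% of those surveyed answered
# Share of *respondents* who said they held prices steady: 9 / 14, about 64%
share_of_respondents = pct_no_raise / responded
# Share of *all* corporations surveyed who said so: just 9%
share_of_surveyed = pct_no_raise / 100

print(f"{share_of_respondents:.0%} of respondents, but only {share_of_surveyed:.0%} of the "
      f"{surveyed:,} corporations surveyed; {pct_silent}% said nothing")
```

"About two-thirds of corporations held prices" and "9% of corporations held prices" describe the same replies; the 86% who stayed silent are the missing context.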

Who Says So? (Continued)

• Whom did they ask?
• Who responded?
• What did they say?
• What did you expect them to say?
• Can they be taken at their word?

Say you were standing with one foot in the oven and one foot in an ice bucket. According to the percentage people, you should be perfectly comfortable. Bobby Bragan

How Does He Know?

• Is the sample accurate?
• Is the sample big enough to represent the entire population?
• Are the people in this room a fair representation of all voters in Connecticut? (Age, Race, Gender, etc.)
• Names Taken Out of the Telephone Book.
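For "big enough," a common yardstick is the margin of error of a proportion. A minimal sketch using the standard normal approximation for a simple random sample at roughly 95% confidence; the function name is hypothetical:

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample of size n.

    Uses the normal approximation z * sqrt(p * (1 - p) / n); p = 0.5 gives the widest margin.
    It measures random sampling error only: a biased sample (e.g. names taken from the
    telephone book) stays biased no matter how large n gets.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 1_000, 10_000):
    print(f"n = {n:>6,}: about ±{margin_of_error(n):.1%}")
```

The margin shrinks only with the square root of the sample size (about ±10% at 100, ±3% at 1,000, ±1% at 10,000), and it says nothing about whether the right people were asked.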

What’s Missing?

• 33% of the first class of women admitted to Johns Hopkins University married faculty members.

• Look Magazine – “A survey of 2800 mothers shows that over half of the mothers of children born with Down’s syndrome were over 35 years old.”
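A percentage hides its denominator. A minimal sketch of a formatter that always shows the counts behind the percent; the function name and the three-student class below are hypothetical, for illustration only:

```python
def pct_with_base(part: int, whole: int) -> str:
    """Format a percentage together with the counts it was computed from."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return f"{part / whole:.0%} ({part} of {whole})"


# Hypothetical: if a class had only three women and one married a professor,
# "33%" is literally true, and almost meaningless without the "1 of 3" beside it.
print(pct_with_base(1, 3))
```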

Torture numbers, and they'll confess to anything. Gregg Easterbrook

What’s Missing (Examples)

• You are three times more likely to be hit by lightning than you are to be attacked by a shark.

• “This is the first time I have ever seen you sober.”
• April Retail Sales were higher this year than last year.

Another Quotation

• USA Today has come out with a new survey – apparently, three out of four people make up 75% of the population. David Letterman

Did Somebody Change the Subject?

• Census Bureau – Half a million more farms in 1930 than in 1935. Definition of a farm was revised in 1932.

• 1950 Census – More people aged 65 – 70 than there were people aged 55 – 60 in the 1940 census.
– Not explained by immigration.
– Social Security
– Vanity

Did Somebody Change the Subject? (Continued)

• “We could take a prisoner from Alcatraz and board him at the Waldorf-Astoria cheaper.” Senator William Langer (R – ND)

• Later went to prison himself.
• Compares the cost of a hotel room with the total cost of maintaining a prisoner (food, security).

Senator William Langer (R-ND), 1886 – 1959

How is the Question Phrased?

• Building in bias. Bias can be built into a questionnaire by little more than careless wording.

• Compare
– Should the government help people who face losing their homes to foreclosure?

with

– Should you be forced to pay more taxes to help people make the payments on their houses?

Both accurately describe what will happen.

Correlation vs. Causation

• Post Hoc
– Ice cream sales go up during the summer.
– Homicides increase during the summer.
– Therefore, ice cream leads to murder, or murder leads to ice cream.
– Correlation is not causation. Perhaps neither of these things has produced the other, but both are a product of some third factor (it’s hot). Be careful when somebody says that A leads to B.
– Mudders, Tampa Bay and UConn Basketball

Non-Representative Sample

• Practically all statistics are based on a sample of a population. So:
– how was the sample chosen?
– how big is the sample?
– what population does it claim to represent?
– what population does it actually represent?
– The Self-Selecting Sample

Other ways to Misuse Statistics

• Overgeneralizing. Example: Studying only men, and then generalizing conclusions to both men and women.

• Interpreting probability as certainty. Example:
– Finding that women are more likely than men to favor strict gun control only means that women have a higher probability of favoring strict gun control than men.

– It does not mean that all women favor strict gun control and all men do not favor it.

Other ways to Misuse Statistics

• Faking data. A famous instance of this occurred in a study of separated identical twins.
– The researcher wished to show that despite separation, twins remained similar in certain traits.
– It was later shown that the data were fabricated.
– Lies, Lies, I can’t believe a word you say. The Knickerbockers, 1966

• Using data selectively. Sometimes a survey includes many questions, but the researcher reports on only a few of the answers.

Data Precision

Quoting specific numbers, especially with decimal points, can look authoritative. "Real estate values up 4.95%": why would someone be so precise if they didn't know their stuff? The numbers can be wild guesses, but precision gives an air of authority.
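A figure should carry no more digits than its uncertainty supports. A minimal sketch of one common rule of thumb (round to the decimal place of the uncertainty's leading digit); the function name and the ±2-point uncertainty are hypothetical:

```python
import math


def round_to_uncertainty(value: float, uncertainty: float) -> float:
    """Round value to the decimal place of the uncertainty's leading digit.

    Rule of thumb: digits finer than the uncertainty carry no information.
    """
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    decimals = -math.floor(math.log10(uncertainty))
    return round(value, decimals)


# Hypothetical: if the real-estate estimate is only good to about ±2 percentage points,
# "up 4.95%" honestly reads as "up about 5%".
print(round_to_uncertainty(4.95, 2.0))
```

The reverse also holds: a claim quoted to two decimal places without any stated uncertainty tells you nothing about how accurate it is.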

Samples that went wrong (Part I)
1936 Presidential Election

• FDR v. Alfred Landon, Republican Governor of Kansas

• Literary Digest – Survey of 2,500,000 voters concludes that Landon would win in a landslide

• Survey consisted of
– Subscribers
– Readily available lists: automobile owners, telephone subscribers

• The Literary Digest folded in 1938.

“President” Alfred Landon (1887 – 1987)

Samples that went wrong (Part II)
1948 Presidential Election

• Virtually every poll indicated that incumbent President Harry S. Truman would be defeated by New York Governor Thomas E. Dewey. Truman won, overcoming a three-way split in his own party.

Six Questions to Ask About Any Statistic

1. Who created it? Do they have an agenda? (e.g., a Republican pollster)

2. Why was it created? For research or to persuade?

3. How was it created? What methodology was used?

4. What is missing? Is there some hidden context?

5. Is it relevant? Does it tend to mislead the reader?

6. Does it make sense? If it sounds ridiculous, it probably is.

How to Avoid Being Sucked In

• Be suspicious of any data that does not identify the number of cases sampled or does not provide the probable error.

• Be skeptical of the conclusions reached.
• Are they playing with your emotions?

How to Avoid Being Sucked In (Continued)

• Compared to what?
– Associated Press: Almost a third (29%) of all deaths among nuclear workers aged 44 to 65 were linked to cancer.

– An independent party observed that 35% of all deaths of those between 44 and 65 years of age are attributable to cancer; therefore, the workers died from cancer at a lower rate than others.

– AP numbers don’t prove anything, one way or the other.
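Both figures are shares of deaths, not death rates, so comparing them settles nothing. A minimal sketch with two hypothetical groups (all counts invented) in which the workers have the *lower* cancer share yet the *higher* cancer death rate:

```python
def cancer_stats(population: int, deaths: int, cancer_deaths: int) -> tuple[float, float]:
    """Return (cancer share of all deaths, cancer deaths per 1,000 people)."""
    share = cancer_deaths / deaths              # the kind of figure the AP reported
    rate = cancer_deaths / population * 1_000   # what "dying at a lower rate" would need
    return share, rate


# Hypothetical groups of 10,000 people each: (population, deaths from all causes, cancer deaths)
groups = {
    "nuclear workers": (10_000, 100, 29),
    "everyone aged 44-65": (10_000, 60, 21),
}

for name, counts in groups.items():
    share, rate = cancer_stats(*counts)
    print(f"{name}: {share:.0%} of deaths were cancer, {rate:.1f} cancer deaths per 1,000")
```

With these made-up counts the workers' cancer share is 29% against 35%, but their cancer death rate is 2.9 per 1,000 against 2.1, because more of them die overall. Without the population and total-death counts, the share alone can point either way.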

Conclusions

• Statistics are commonly used to support a biased position or an outright fabrication for two reasons.
– Few people understand statistics well enough to question them.
– Lying with statistics requires no actual lying. If the most favorable data is highlighted and the most unfavorable data is suppressed, statistics can be manipulated to illustrate just about any point of view, allowing the manipulator’s hands to remain unsullied.

Finally

• Statistics are no substitute for judgment. Henry Clay

Questions or Comments?
