Statistical Literacy at the Reference Desk Why you should care, and what you can do about it.
Mar 27, 2015
“Nothing exists until it is measured”. -- Niels Bohr
“Innumeracy is the mathematical equivalent of illiteracy”. -- Joel Best
What we’ll cover…
• Background and context.
• How you can recognize good, reliable, well-reported statistics.
• A chance for YOU to interpret some statistics.
What is ‘Statistical Literacy’?
STATISTICAL LITERACY, NUMERACY AND THE FUTURE
Peter Holmes, Senior Consultant, RSS Centre for Statistical Education,
Nottingham Trent University, Nottingham, England, 2003
“I think the whole thing started in England. Brits do start some things. We started with a word. We had a word that you didn’t have. In 1959, there was a government report in England that talked about the numeracy problem. … it was talking about the education of 16-year-olds saying that they needed to be literate. There was a literacy strand, but they also needed to be numerate. So there was a numeracy strand.
So from 1959, we have had a very good English word called numeracy.”
“… There’s now “Statistical Numeracy,” “Statistical Literacy,” or “Statistical Reasoning” or “Statistical Thinking” …”
“But they’re all in the same ballpark. The word numeracy when it was first introduced was in the context of the ability to use numbers in practice.”
“… particularly in the context of statistics that you might have to read and interpret. In fact in that first use of [numeracy] in 1959, it was in terms of reading tables.”
STATISTICAL LITERACY, NUMERACY AND THE FUTURE, Peter Holmes, 2003
A more recent take on Statistical Literacy…
“Statistical Literacy studies the use of statistics as evidence in arguments” (Schield, Milo 1998,1999)
"A key element of statistical literacy is assembly: how the statistics are defined, selected and presented"
Schield, Milo (2004). "Information Literacy, Statistical Literacy
and Data Literacy". IASSIST Quarterly 28 (2-3): 6-11.
“Literacy matters. There is no argument about that fundamental statement. But
numeracy counts. Research in numeracy trails research in literacy by 50 years. It will
never catch up if elected leaders and politically appointed officials continue to
exclude numeracy. That means numeracy needs to count more.”
Lynda E. Colgan:
Kingston Whig-Standard,
January 18, 2006, p. 5
What Librarians Need to Know:
• Know about and how to use major statistical sources (print and electronic, national and international)
• Know about value-added commercial products that may ‘hide’ statistical details from us.
• Be critical consumers of statistics
• Be familiar with and able to make informed decisions about the use of charts, graphs, mapping, etc. used in the presentation of statistics.
Summarized from: Data and Statistical Literacy for Librarians
Ann S. Gray, IASSIST Quarterly, Summer/Fall 2004
Special Issue: Developing Statistical Literacy
Issue 2/3
Published 2001
Published 2004
More Damned Lies and Statistics: How Numbers Confuse Public Issues
Statistics
The word “statistics”
• Origins in the 1600s
• ‘Political arithmetic’ used to calculate population size & life expectancy
• A growing population was thought to reflect a healthy ‘state’ – so early number crunchers became known as ‘statists’.
• Hence, development of the term ‘statistics’…
Statistics crop up in a variety of circumstances in Libraries…
Copyright: Unshelved.com (c) Overdue Media LLC and used with permission
Contrary to Laine’s email signoff:
“Smoking is a major cause of statistics”
statistics are, in fact, a major ‘cause’ of social problems.
Statistics identify and define social issues (a.k.a. problems) and provide ‘ammunition’ to those who would promote these issues.
Belief in ‘the numbers’, especially those reported by ‘experts’, typically solidifies popular conviction that a problem exists.
Statistics Create Social Problems
Issue or situation
• Measurement: ‘Official’ statistics, polls, etc.
• Promotion: Activists, media, officials, experts, etc.
• Awareness: General public awareness and/or involvement
• Opposition: Defence of policies, interests, etc.
Number Laundering
Best describes three types of people when it comes to statistics: Cynical, Naïve, and Critical
Cynical – Suspicious of statistics; as consumers of statistics, not willing to give them much stock. They will often discount or ignore statistics that don’t align with their views. Worse, as producers of statistics, cynics will collect and report statistics in such a way as to support their point of view.
Derived from Best, 2001, p 162-167
Naïve – “Slightly more sophisticated than the Awestruck”; they think they understand something about statistics (but often don’t), and are basically accepting of any numbers they encounter, and accept that they mean what they appear to mean. As consumers of numbers, they are bad enough, but as producers of numbers they can be as dangerous as cynics, if not worse.
Derived from Best, 2001, p 162-167
Critical Thinkers – Not negative or hostile; thoughtful in approaching statistics. Recognize that statistics summarize complex information into relatively simple numbers and that as a consequence “some of the complexity is lost”.
Statistics are a product of choices and more specifically a compromise among choices. Given this, approaching statistics with a ‘critical’ eye is only being prudent and responsible. ‘Critical thinkers’ ask questions about statistics.
Derived from Best, 2001, p 162-167
Some Common Problems
Geographic comparisons – “there is a good chance statistics gathered from different places are based on different definitions and different measurements”.
For example, comparing US and Canadian statistics on ‘race’ is complicated by different perspectives on this issue (i.e. definitions and measurements can vary widely).
“Cult ‘X’ is the fastest growing religion in Canada”
On closer examination, the cult grew from 20 to 200 members (a tenfold rise, i.e. a 900% increase). To match this, the Catholic Church in Canada would have to grow from 13 million to 130 million – far more than the population of Canada.
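The arithmetic behind the "fastest growing" claim can be checked with a few lines of Python. This is only an illustrative sketch: the membership figures are the ones quoted on the slide, and the helper name `percent_increase` is a made-up name for this example.

```python
def percent_increase(old, new):
    """Percentage increase going from old to new."""
    return 100 * (new - old) / old


# Figures quoted on the slide.
cult_before, cult_after = 20, 200
church_before = 13_000_000

growth_factor = cult_after / cult_before            # how many times larger
cult_pct = percent_increase(cult_before, cult_after)

# Membership the larger group would need to show the same relative growth.
church_needed = church_before * growth_factor

print(f"Cult: +{cult_after - cult_before} members, "
      f"x{growth_factor:.0f}, +{cult_pct:.0f}%")
print(f"Church would need {church_needed:,.0f} members "
      f"(+{church_needed - church_before:,.0f}) to match")
```

The same growth factor means 180 new members for the small group and well over a hundred million for the large one, which is why a growth rate reported without its base says very little.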
SIZE MATTERS…
Comparing groups
(derived from Best, 2001 p. 113)
Numbers vs Percentages
• “Most poor people are white”
Take, for example, a population of 700 families:
Group                       Families   Number poor   Percentage poor
White families                   600            60               10%
Visible minority families        100            20               20%
In absolute numbers, more white families are poor, but…
Proportionally, more visible minority families are poor.
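A short sketch of the same comparison in Python, using the 700-family example from the slide. The dictionary layout and the names `poverty_rate` and `share_of_poor` are assumptions made for illustration only.

```python
# Example population from the slide: 700 families in two groups.
groups = {
    "white": {"families": 600, "poor": 60},
    "visible minority": {"families": 100, "poor": 20},
}

total_poor = sum(g["poor"] for g in groups.values())

# Rate: what percentage of each group is poor (denominator = the group).
poverty_rate = {name: 100 * g["poor"] / g["families"]
                for name, g in groups.items()}

# Share: what percentage of all poor families belong to each group
# (denominator = all poor families).
share_of_poor = {name: 100 * g["poor"] / total_poor
                 for name, g in groups.items()}

for name in groups:
    print(f"{name}: {groups[name]['poor']} poor families, "
          f"rate {poverty_rate[name]:.0f}%, "
          f"share of all poor {share_of_poor[name]:.0f}%")
```

Both "most poor people are white" and "visible minority families are twice as likely to be poor" are true of this data; they simply use different denominators.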
Mutant Statistics
“Not all statistics start out bad”.
Even good numbers can be “stretched, twisted, distorted, or mangled”… generating “mutant statistics”.
Best, 2001, pp. 62 - 95
Generalizations, Transformations, & Confusion
There are three main ways “mutant statistics” are created: generalizations, transformations, and confusion.
Robert Ludlum,
The Ambler Warning
2005, p. 465-466.
An Economist, Physicist, and Statistician were driving through Scotland, and they see a brown cow…
The Economist says, “Fascinating that the cows in Scotland are brown”.
The Physicist says, “I’m afraid you’re overgeneralizing from the evidence. All we know is that some cows in Scotland are brown.”
The Statistician shakes his head at both of them. “Wrong again. Completely unwarranted by the evidence. All we can infer, logically, is that there exists at least one cow in this country, at least one side of which is brown.”
Generalizations…
Generalizations
Measuring ALL the cases of a given social phenomenon is normally not feasible. We collect samples and generalize, but problems can arise:
Definitions
Measurements
Sampling
Best, 2001, pp. 62 - 95
Definitions – In 1996, “... news media reported on what was considered to be a rash of arson fires against black churches in the southern U.S. Amid those images were fears of raging racism.”
Statistics were suspect because of poor definitions of what was an ‘appropriate’ church fire to include in the counts.
Analysis of six years of federal, state and local data found that the number of arson cases was up, but that these increases applied to both black and white churches in roughly equal proportions.
…There was NO dramatic increase in the number of insurance claims made against church fires.
http://www.emergency.com/arsnstat.htm & Best, 2001, pp. 62 - 95
Measurements – Hate crimes statistics are gathered across many jurisdictions.
Best, 2001, pp. 62 - 95
Race; Religion; Sexual Orientation; Ethnicity/National Origin; Disability; Multiple-Bias Incidents
But, ultimately, any crime could be a hate crime. It comes down to a question of ‘motive’ – and how do you objectively and consistently measure ‘motive’?
Sampling – Bad sampling can give rise to mutant statistics. If you’re in the wrong place, or at the right place at the wrong time, your sample won’t be representative. A report on ‘racial profiling’ by Kingston Police was criticized for this.
Best, 2001, pp. 62 - 95
Calculation of the Police Stop Rate:
Police Stop Rate = (Number of Stops / Population Estimate) × 1,000
BUT…
How, when and where was this
‘mini-census’ conducted?
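To see why the population estimate matters so much, here is a minimal sketch of the stop-rate calculation. The numbers are hypothetical and are not taken from the Kingston report; the function name `stop_rate_per_1000` is also an assumption for this example.

```python
def stop_rate_per_1000(stops, population_estimate):
    """Stops per 1,000 people in the estimated population."""
    if population_estimate <= 0:
        raise ValueError("population estimate must be positive")
    return 1000 * stops / population_estimate


# Hypothetical figures: the same number of stops, divided by two
# different estimates of how many people were in the area.
stops = 150
rate_low_estimate = stop_rate_per_1000(stops, 1_000)
rate_high_estimate = stop_rate_per_1000(stops, 1_500)

print(f"Population estimated at 1,000: {rate_low_estimate:.0f} stops per 1,000")
print(f"Population estimated at 1,500: {rate_high_estimate:.0f} stops per 1,000")
```

The count of stops is the same in both cases; only the denominator changes, so the reliability of the rate depends on how, when and where the population was counted.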
Transformations
This form of ‘mutant statistics’ results from transforming the meaning of a number.
Take the estimate that 6% of the 52,000 Roman Catholic Priests in the US are at some point in their adult lives sexually preoccupied with young people
Source: A former priest turned psychologist who treated disturbed clergy and derived this estimate from his observations.
transformed into 6% of priests are pedophiles.
Best, 2001, pp. 62 - 95
Transformations:
1. People forgot that it was an estimate and treated it as fact.
2. The original ‘sample’ was drawn from priests who sought psychological help (hence a biased sample) and generalized to all priests.
3. People turned “Sexual preoccupation” into actual behaviour.
4. “Young people” were morphed into “children” – bringing the word ‘pedophile’ into the mix.
Confusion
“Garbling complex statistics”
Wendy Watkins of Carleton University provided an example:
Two polling companies, Decima and Compass, surveyed Canadians regarding Harper’s policy on the Middle East.
Decima – 30% approval of policy
Statistic based on a single question: “What do you think about Harper’s Middle East policy?”
Compass – 60% approval of policy
Statistic based on an amalgam of responses to several questions – Israel’s right to defend itself… Syria flouting UN sanctions… Iran flouting UN sanctions… etc.
Compass Survey sponsored by a ‘right-leaning’ Think Tank
“This kind of statistics is about as valid as the one that argues that the average Canadian has one testicle”
• Now, over to Suzette…
How can you recognize good, reliable, well-reported statistics?
A critical view
Look at:
• Who collected the data (source)
• Why were they collected
• How were they collected
• What was counted
• When the data were collected
• How were the data processed after collection (added up, averaged, grouped etc.)
• How are the data being presented.
• Always read the footnotes!
Who? - Formal Organizations
• Statistics Canada (National statistical agency)
• United Nations Statistics Division (national statistics)
• OECD (intergovernmental organization)
• Provincial and Municipal governments
– Ontario
– City of Toronto
• Societies and Associations:
– Cancer Society; Amnesty International etc.
Sources
• Companies:
– Sears Canada; Ford etc.
• Consumer advocacy groups:
– International Coffee Organization
– Dairy Farmers of Canada
• Publications (print and electronic)
– Annual reports from companies and societies
– Journal articles, print and electronic
– Newspapers, print and electronic, such as Toronto Star, Globe and Mail
– Commercial databases such as Datastream
Sources – Media etc.
• Media
– Magazines range from National Enquirer to Chatelaine, Maclean’s to the Economist
– Newsfeeds - Reuters to more dubious ones
• Informal Organizations
– Wikipedia – variable content
– User groups – again a range from professional ones to casual ones
– Blogs, Chatrooms
Good or Quality statistics
• If the figures are from a “reputable” source then usually considered “good”
• But still consider the “Why?” Especially for companies, opinion polls, consumer organizations, advocacy organizations such as Greenpeace, United Way etc.
• Can get question bias
• Can get sample bias
Why were the data collected?
• Government planning at all levels
• Political reasons (good, bad or neutral)
• Academic research
• Commercial reasons (company finances, resellers of data, media, etc.)
• Baseline data (environment, health)
• Advocacy organizations (Greenpeace, Amnesty International, Cancer Society)
How were the data collected?
• Census and Statistics Canada surveys: can be considered a “gold standard”
• Academic research
• Companies, product associations
• Media
How - Newspapers, Magazines
• Maclean’s University issue
– “Now in its 16th year, the annual Maclean’s rankings assess Canadian universities on a diverse range of factors”
– “From its inception, Maclean’s has consulted with academic experts about the design, composition and methodology of the rankings.”
– Universities boycotting it now
• Globe and Mail University survey
– students register themselves, therefore self-selection
– More than 32,700 students answered over 100 questions
– “Our assessment has spread to 49 schools -- up from 37”
• Toronto Life surveys
– Talk to 100 pedestrians about a topic
What is being counted?
Need to be aware of definitions so you can get comparable data over time and place
• If it is a number, what does that number represent:
– a person, a household, a family?
– Total, single or multiple responses?
– income or earnings?
– a weight, kilograms or pounds?
– a currency, Can$ or U.S.$?
– Is it a percentage?
– Is it in “millions” or does the table have a ‘000 sign?
What is the unit of measurement?
• Is it a rate e.g. Unemployment rate?
• Is it indexed e.g. Consumer price index?
– What is the base date
– Has the “basket of goods” changed
• Is it seasonally adjusted?
• Are classifications comparable:
– NAICS 2000 vs. SIC 1980, definition of pet food may have changed
– Concordances exist
Household internet use at home by internet activity
What is being measured?
Internet use by individuals by type of activity
What is the unit of measurement - Geography
• Make sure that if data are from different tables or sources that they are for the same geographic area
– North America vs. U.S.A.
– Maritimes vs. Atlantic Canada
– City of Toronto 1998 and before vs. City of Toronto after amalgamation. In the late 1990s many municipalities amalgamated
– Prior to 1949 Newfoundland was not part of Canada
– Nunavut included in the Northwest Territories prior to 1999
Date of the Data!
• Data are often several years old before publication
• There should always be a date that tells you what time period the data are for and the unit of time – monthly, quarterly, annual etc.
• Census data – the income information is always for the previous year so the 2006 census will give income for 2005
Presentation of the data
• Often crucial for the awareness of the value of statistics
• Can be in the form of :
• Text
• Tables
• Graphs and charts
• Maps
Text: Mackenzie Investments Burn Rate (RRSP season)
Table: $ thousands
http://www40.statcan.ca/l01/cst01/comm02b.htm
Table: Weight and Footnote
http://www.ico.org/prices/m1-a.htm
Graph: Exaggerated Vertical Scale
Map: Changing the variable displayed can make a significant difference to the impact the map makes on the user
Average income
Median income
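Why the choice between average and median income changes the picture can be shown with a tiny, entirely hypothetical set of household incomes. The figures below are invented for illustration and come from no Statistics Canada table; the calculation uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical household incomes for one small area (dollars).
# One very high income is included on purpose.
incomes = [20_000, 30_000, 35_000, 40_000, 500_000]

average_income = statistics.mean(incomes)    # pulled up by the outlier
median_income = statistics.median(incomes)   # the middle household

print(f"Average income: ${average_income:,.0f}")
print(f"Median income:  ${median_income:,.0f}")
```

A map shaded by average income would make this area look wealthy, while a map shaded by median income would show that the typical household is not.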
HELP!
• See Bibliography
• See Statistics Canada website
Statistics Canada Resources
http://www.statcan.ca/english/edu/power/about/about2.htm
Statistics Canada Resources
http://www.statcan.ca/english/freepub/11-533-XIE/11-533-XIE2005001.htm
Discussion Points
• What are the responsibilities of reference desk staff in evaluating statistics and educating users?
– Do we review the stats with the user when we direct the user to them or is caveat emptor?
– Should we direct users to a website or a handout that talks about how to recognize “good” statistics?
Discussion points
• What are the chances of people actually reading the necessary information?
• Does our responsibility vary with the type of library we work in?
– School
– Public
– Post secondary
Statistics Canada Resources
http://www.statcan.ca/english/edu/power/toc/contents.htm
Statistics Canada Resources
http://www.statcan.ca/english/concepts/index.htm
Statistics Canada Resources
http://www.statcan.ca/english/freepub/11-533-XIE/2005001/using/reading.htm
Lies, Damn Lies and Statistics! (attributed to Disraeli, 1804-1881)
Scepticism about statistics has been around for a long time – need to be a critical thinker!
What should we look at to get some idea of the validity and reliability of the statistics we or our users have found?
Sources (Who) (adapted from Rice, 2006)
Formal Organizat. | Publications | Media | Informal Organizat. | Individuals
National Govt. | Books | T.V. | Special Interest | Statisticians
Local Govt | Journal Art. | Magazines | E-Mail | Experts
Universities | Reports | Radio | User groups | Teachers
Companies | Newspapers | Newsfeeds | Chatroom | Colleagues
Non-Govt Organizat. | Commercial websites | Open Repositories | Web Pages (Wikipedia) | Librarians
Societies | Opinion Polls | Blogs | Family
How were the data collected?
• Census and Statistics Canada surveys
– Usually a lengthy user guide that gives you details of the methodology http://www.statcan.ca
– Structured questionnaire with carefully phrased questions e.g. Census form
– Selected sample – who were selected and why, which populations were over or under sampled e.g. some native communities “opt” out of the census
– How and when it was carried out – personal interview, telephone survey, web survey. What the follow-up was to get responses from missed respondents.
How were the data collected?
• Academic research
– Usually can get methodology from researcher
– May be mentioned in book or article
– May be web-link to method and data
• Companies, product associations
– May be somewhere on the website e.g. http://www.ico.org
– May not give much detail
• Media often only give “source” and no details e.g. Statistics Canada
Internet use by individuals by type of activity
Reading tables 101
Laine Ruus <[email protected]>
University of Toronto Data Library Service
2007/02/02
OLA Super Conference 2007 <http://www.chass.utoronto.ca/datalib/misc/ola07_stat_literacy.ppt>
Take a table, one that Statistics Canada publishes like this:
Source: STC cat no. 71-001-XIE200612
We can now make part of the table look like…
Full vs part-time employment by gender, Canada, 2005
…this (note, it’s a different date, and therefore different numbers from the previous slide):
Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed.
And compute some percentages to make it look like…
Full vs part-time employment by gender, Canada, 2005
More males work full-time than part-time: True/False
More females work full-time than part-time: True/False
Three times as many women as men work part-time: True/False
Women are three times more likely to work part-time than men: True/False
Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed.
…this:
Full vs part-time employment by gender, Canada, 2005
Of those who work full-time, 2/3 are men: True/False
Of those who work part-time, 2/3 are women: True/False
Almost twice as many women work part-time as full-time: True/False
100%
100%
Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed.
…but the table behind the numbers is…
Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed.
Do you agree with this Toronto Star reporter?
Source: Toronto Star, Dec. 9, 2006
Now for a slightly more complex table:
Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed.
Less than 15% of males who work full time are over 55: True/False
Of males who work part time, the largest number are youth: True/False
Fewer women 25-54 work part-time than full-time: True/False
Same table – but where’s the 100% now?
Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed
Twice as many young women as young men work part-time: True/False
Twice as many women as men over 65 work part-time: True/False
Women over 65 are twice as likely to work part-time as men: True/False
Most of the men who work part time are under 24 or over 65: True/False
And here’s what the table values/counts are:
Source: Labour force historical review: table cd1t15an. [computer file] 2006 ed
In this table, where’s the 100% total?
Lesson 1:
• Can compare sizes of percentages and rates only within the row/column in which they have been computed (i.e. the one in which they add up to 100%)
• Between rows/columns, can only compare relative proportions or likelihoods, or counts.
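Lesson 1 can be sketched with a small full-time/part-time table. The counts below are hypothetical round numbers (in thousands) chosen for illustration; they are not the Labour force historical review figures shown on the slides, and the function names are assumptions for this example.

```python
# Hypothetical counts in thousands (NOT the published figures).
counts = {
    "men":   {"full_time": 7000, "part_time": 1000},
    "women": {"full_time": 5000, "part_time": 2000},
}


def row_percentages(table):
    """Percentages within each row: each sex adds up to 100%."""
    result = {}
    for row_name, row in table.items():
        row_total = sum(row.values())
        result[row_name] = {col: 100 * n / row_total for col, n in row.items()}
    return result


def column_percentages(table):
    """Percentages within each column: each work type adds up to 100%."""
    col_totals = {}
    for row in table.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {row_name: {col: 100 * n / col_totals[col] for col, n in row.items()}
            for row_name, row in table.items()}


by_sex = row_percentages(counts)
by_work_type = column_percentages(counts)

# "How many times as many" compares counts across rows.
count_ratio = counts["women"]["part_time"] / counts["men"]["part_time"]
# "How many times as likely" compares row percentages across rows.
likelihood_ratio = by_sex["women"]["part_time"] / by_sex["men"]["part_time"]

print(f"Part-time, % of each sex: men {by_sex['men']['part_time']:.1f}%, "
      f"women {by_sex['women']['part_time']:.1f}%")
print(f"Women as % of all part-timers: {by_work_type['women']['part_time']:.1f}%")
print(f"Women/men part-time counts: {count_ratio:.1f}x; "
      f"likelihood: {likelihood_ratio:.2f}x")
```

With these made-up counts, twice as many women as men work part-time, yet women are more than twice as likely to do so, because the two statements rest on different denominators. Finding where the 100% sits tells you which statement a given percentage supports.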
Source: Census of Canada, 2001: legal marital status, age groups, and sex for population (Topic based tabulations; 97f0004xcb2001001)
Source: Census of Canada, 2001: legal marital status, common-law status, age groups, sex and household living arrangements for population 15 years and over (Topic based tabulations; 97f0004xcb2001040)
Why are these two numbers so different?
Which one is correct?
Lesson 2: make sure you can identify what’s in the denominator as well as what’s in the numerator!
Here’s what the academic called the table
Source: Chappell, N. et al./ Aging in contemporary Canada. Toronto: Prentice Hall, 2003. Page 131.
And this is what the original Statistics Canada publication called the same table:
Same table, different titles. Which one would you use?
Source: Women in Canada. STC cat no. 89-503, pl. 116
Employment rate and participation rate are not the same thing:
• participation rate = (labour force / total population 15 and over) * 100
• employment rate = (employed labour force / total population 15 and over) * 100
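A minimal sketch of the two rates, using hypothetical figures in thousands (not taken from the Labour force historical review). The variable and function names are assumptions for this example.

```python
def rate_per_100(numerator, denominator):
    """Numerator as a percentage of the denominator."""
    return 100 * numerator / denominator


# Hypothetical figures in thousands (NOT published data).
population_15_and_over = 26_000
labour_force = 17_500            # employed plus unemployed
employed_labour_force = 16_400   # employed only

participation_rate = rate_per_100(labour_force, population_15_and_over)
employment_rate = rate_per_100(employed_labour_force, population_15_and_over)

print(f"Participation rate: {participation_rate:.1f}%")
print(f"Employment rate:    {employment_rate:.1f}%")
```

Both rates share a denominator and differ only in the numerator, so a table mislabelled with one name when it reports the other will be off by the share of the population that is unemployed.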
Source: Labour force historical review 1999 ed.: table tab01an.ivt.
This is the original table from the Labour force historical review CD-ROM
participation rate = (labour force / total population 15 and over) * 100
Lesson 3: whenever possible, go back to the original data collector.