COMMUNICATING DATA USING GRAPHICS MIS2502 Data Analytics.

Post on 04-Jan-2016



What makes a good chart?

Minard’s map of Napoleon’s campaign into Russia, 1869. Reprinted in Tufte (2009), p. 41.

What makes a good chart?

http://www.popvssoda.com/countystats/total-county.html

What makes a good chart?

Zhang et al. (2010), “A case study of micro-blogging in the enterprise: use, value, and related issues,” Proceedings of the 28th International Conference on Human Factors in Computing Systems.

This is from an academic conference paper.

What are the problems with this chart?

Some basic principles (adapted from Tufte 2009)

Tufte’s fundamental principle: Above all else, show the data

Principle 1: The chart should tell a story

Examples?

http://www.evl.uic.edu/aej/491/week03.html

http://flowingdata.com/2009/11/26/fox-news-makes-the-best-pie-chart-ever/

http://www.nejm.org/doi/pdf/10.1056/NEJMon1211064

http://www.ngoilgas.com/news/oil-spill-latest-the-cost-of-clumsiness/

Principle 2: The chart should have graphical integrity

• Basically, it should not “lie” or mislead the reader.

Tufte’s “Lie Factor”

• Lie Factor = Graphical (Drawn) Difference / Actual Difference

Should be ~ 1

< 1 = understated effect

> 1 = exaggerated effect
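As a worked illustration of the Lie Factor ratio, here is a minimal Python sketch. It assumes each "difference" is measured as a relative change (|second - first| / first), which is how Tufte measures the size of an effect. The function names and the numbers in the example are hypothetical and do not come from the slides.

```python
def effect_size(first, second):
    """Relative change from first to second, e.g. 0.5 for a 50% increase."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return effect_size(drawn_first, drawn_second) / effect_size(data_first, data_second)


# Hypothetical example: the data doubles (100 -> 200), but the
# drawn bar grows from 1 inch to 4 inches.
print(lie_factor(100, 200, 1.0, 4.0))  # 3.0: the graphic exaggerates the change
```

A result near 1 means the drawing is faithful to the data. Values well above 1 exaggerate the effect, and values below 1 understate it.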

Examples of the “lie factor”

Reprinted from Tufte (2009), p. 57 & p. 62

A more recent, basic example

http://20bits.com/articles/politics-and-tuftes-lie-factor/

The original graphic from Real Clear Politics, 2008. (Look at the y-axis.)

The adjusted graphic.

Other tips to avoid “lying”

[Figure: “Hypothetical Industries, Inc.” chart of Revenue and Adjusted Revenue by Year, 2003 to 2010; y-axis from 80 to 140.]

[Figure: “Hypothetical City Crime” chart, thefts per 100,000 citizens, shown two ways: 2009 and 2010 only, with the y-axis running from 350 to 410, vs. 2003 to 2010, with the y-axis running from 25 to 425.]
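The effect of a truncated y-axis can be quantified with the Lie Factor. The sketch below is only an illustration under stated assumptions: it assumes the height of each drawn mark is proportional to its value minus the axis minimum, and it treats 400 and 370 (two of the data labels on the Hypothetical City Crime chart in these slides) as the 2009 and 2010 values. The function names are hypothetical.

```python
def drawn_height(value, axis_min):
    """Height of a mark above the axis floor (assumed proportional to value - axis_min)."""
    return value - axis_min


def lie_factor_for_axis(first, second, axis_min):
    """Lie Factor for a two-point comparison drawn on an axis that starts at axis_min."""
    data_effect = abs(second - first) / first
    h1 = drawn_height(first, axis_min)
    h2 = drawn_height(second, axis_min)
    drawn_effect = abs(h2 - h1) / h1
    return drawn_effect / data_effect


# Assumed values for 2009 and 2010 (thefts per 100,000 citizens).
thefts_2009, thefts_2010 = 400, 370

for axis_min in (350, 25, 0):
    factor = lie_factor_for_axis(thefts_2009, thefts_2010, axis_min)
    print(f"y-axis starts at {axis_min:>3}: lie factor = {factor:.2f}")
```

Under these assumptions, starting the axis at 350 gives a Lie Factor of about 8, while starting it at 25 gives about 1.07 and starting it at 0 gives exactly 1.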

Avoid an Implied Comparison of Incomparable Values

Other tips to avoid “lying” or misleading

Principle 3: The chart should minimize graphical complexity

Generally, the simpler the better…

When a table is better than a chart

• For a few data points, a table can do just as well…

[Figure: “Total Sales by Salesperson” chart; y-axis from $0.00 to $250,000.00.]

Salesperson   Total Sales
Peacock       $225,763.68
Leverling     $201,196.27
Davolio       $182,500.09
Fuller        $162,503.78
Callahan      $123,032.67
King          $116,962.99
Dodsworth      $75,048.04
Suyama         $72,527.63
Buchanan       $68,792.25

The table carries more information in less space and is more precise.
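For a data set this small, producing the table takes only a few lines. The following Python sketch uses the sales totals from the slide; the function name and the column widths are illustrative choices, not part of the original material.

```python
# Total sales by salesperson, taken from the slide.
sales = {
    "Peacock": 225763.68,
    "Leverling": 201196.27,
    "Davolio": 182500.09,
    "Fuller": 162503.78,
    "Callahan": 123032.67,
    "King": 116962.99,
    "Dodsworth": 75048.04,
    "Suyama": 72527.63,
    "Buchanan": 68792.25,
}


def format_sales_table(totals):
    """Return a plain-text table sorted from highest to lowest total."""
    rows = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    name_width = max(len("Salesperson"), max(len(name) for name in totals))
    lines = [f"{'Salesperson':<{name_width}}  {'Total Sales':>12}"]
    for name, total in rows:
        amount = f"${total:,.2f}"  # thousands separators, two decimals
        lines.append(f"{name:<{name_width}}  {amount:>12}")
    return "\n".join(lines)


print(format_sales_table(sales))
```

Right-aligning the amounts keeps the decimal points in a column, so the reader can compare magnitudes at a glance while still seeing exact values.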

The Ultimate Table: The Box Score

• Large amount of information in a very small space

• So why does this work?

• Depends on the reader’s knowledge of the data

The Business Box Score?

• Applying the same concept to our salesforce example.

• How does this help? How could it hurt?

Sales Performance – March 2011

Salesperson   TS   WD   BD  NC  DOR
Peacock      225    3   40  20   28
Leverling    201    2   45  18   27
Davolio      182    5   38  22   28
Fuller       162    2   22  16   20
Callahan     123    1   15  14   15
King         116  0.5   20  12   18
Dodsworth     75  0.3   12  10   20
Suyama        72    0    8  10    8
Buchanan      68    0    8   8   12

Key:
TS – total sales
WD – worst day
BD – best day
NC – number of customers
DOR – days on the road

Data Ink

Data-ink ratio = data ink / total ink used in the graphic

Should be ~ 1

< 1 = more non-data-related ink in the graphic

= 1 implies all ink is devoted to data

Tufte’s principle: Erase ink whenever possible
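Tufte defines the data-ink ratio as the ink that encodes data divided by the total ink used in the graphic. The short Python sketch below works through that arithmetic. The ink amounts, element names, and function name are all hypothetical, chosen only to show how removing non-data ink moves the ratio toward 1.

```python
def data_ink_ratio(ink_by_element, data_elements):
    """Share of total ink spent on elements that encode data."""
    total_ink = sum(ink_by_element.values())
    data_ink = sum(
        ink for name, ink in ink_by_element.items() if name in data_elements
    )
    return data_ink / total_ink


# Hypothetical ink amounts (arbitrary units) for two versions of one chart.
cluttered = {"data line": 40, "gridlines": 35, "background fill": 20, "border": 5}
cleaned = {"data line": 40, "axis ticks": 10}

print(data_ink_ratio(cluttered, {"data line"}))  # 0.4
print(data_ink_ratio(cleaned, {"data line"}))    # 0.8
```

The data line is identical in both versions. Only the ink around it changes, which is why erasing gridlines, fills, and borders raises the ratio.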

Being conscious of data ink

[Figure: three versions of the “Hypothetical City Crime” chart (thefts per 100,000 citizens, 2003 to 2010), running from lower data-ink ratio (worse) to higher data-ink ratio (better). Two versions use a y-axis from 25 to 425; the third labels the values directly (200, 270, 320, 330, 370, 350, 400, 370).]

What makes a good chart?

[Figure: two versions of the “2011 Total Sales” chart: Sum of Extended Price (0 to 160,000) by Order Date.]

Sometimes it’s really a matter of preference.

Both of these minimize non-data ink.

Why isn’t a table better here?

3-D Charts

[Figure: 3-D chart “Total Sales by Salesperson”; y-axis from $0.00 to $250,000.00.]

Evaluate this from a data-ink perspective. How does it affect the clarity of the chart?

Chartjunk: Data Ink “gone wild”

[Figure: “Hypothetical City Crime” chart, thefts per 100,000 citizens, 2003 to 2010; y-axis from 25 to 425.]

Example: Moiré effects (Tufte 2009)

[Figure: “Total Sales by Salesperson” chart; y-axis from $0.00 to $250,000.00.]

Example: The Grid

[Figure: “Hypothetical City Crime” chart, thefts per 100,000 citizens, 2003 to 2010; y-axis from 25 to 425.]

Why are these examples of chartjunk?

What could you do to remedy them?

Data Ink Working Against Us

Evaluate this chart in terms of Data Ink.

Are there better visualizations?

Data Ink Working For Us

Evaluate this chart in terms of Data Ink.

Imagine this as a bar chart.

As a table!!

Stacked Bar Charts are Often Trouble

• Original chart from the BBC website

• Why is this so difficult to read?

• What would be a better way to visualize it?

http://j-walkblog.com/index.php?/weblog/posts/bad_charts/

Avoid Multi-Axis Graphs
