PLOTCON NYC: At Least 23 Visualizations and When to Use Them in 30 Minutes

Jan 08, 2017

Data & Analytics

Plotly
Transcript
Page 1: PLOTCON NYC: At Least 23 Visualizations and When to Use Them in 30 Minutes

25 VISUALIZATIONS AND WHEN TO USE THEM

EDUARDO ARIÑO DE LA RUBIA
CHIEF DATA SCIENTIST

eduardo@dominodatalab.com

AN “OUT OF MY LEAGUE” PRODUCTION

Page 2

PICTURE SLIDE
DATA SCIENTIST

A BIT ABOUT ME

Page 3

Robotics, Vision Systems, Job Shop Scheduling, Optimization/Ops, Neural Networks, NLP

Page 8

GOD

Page 9

THE GUY GOD ASKS DATAVIZ ADVICE

Page 10

???

Page 13

A DISCLAIMER

There are many kinds of data. I am only talking about tabular data.

That is, data arranged in a table or systematic arrangement by columns, rows, etc…

There is non-tabular data out there, like networks and trees and whatnot. I ain’t messin’ with that. (Except maps)

COWARDLY STATEMENT

Page 14

STANDING ON THE SHOULDERS OF GIANTS IS NICE…

This presentation is based on the work of Dr. Andrew Abela’s “Extreme Presentation” method, as well as the Financial Times’ fantastic Chart Doctor feature. There is a lot of amazing work out there to help you pick the right way to present your data. None of what I’m saying is my own personal research. It’s reading other smart people’s stuff and then telling you.

CITATION

Page 15

Product: Open/Flexible + Full-Lifecycle Support

2. Experiment & Harden

3. Operationalize / Deploy

Faster Experimentation

More Collaboration

Reproducibility & Auditing

Integrate models into the business

More Time for Research

Automatic Version Control

Environment Management

Sharing and Discussion

Publishing & Deployment Tools

Data

Code

Compute automation

https://app.dominodatalab.com/u/earino/plotcon2016

Page 16

OUR CATEGORIES

DEVIATION

Emphasize variations (+/-) from a fixed reference point. Typically the reference point is zero, but it can also be a target or a long-term average. Can also be used to show sentiment (positive/neutral/negative).

CORRELATION

Show the relationship between two or more variables. Be mindful that, unless you tell them otherwise, many readers will assume the relationships you show them to be causal (i.e. one causes the other).

RANKING

Use where an item’s position in an ordered list is more important than its absolute or relative value. Don’t be afraid to highlight the points of interest.

DISTRIBUTION

Show values in a dataset and how often they occur. The shape (or ‘skew’) of a distribution can be a memorable way of highlighting the lack of uniformity or equality in the data.

Page 17

OUR CATEGORIES

CHANGE

Give emphasis to changing trends. These can be short (intra-day) movements or extended series traversing decades or centuries. Choosing the correct time period is important to provide suitable context for the reader.

COMPOSITION

Show how a single entity can be broken down into its component elements. If the reader’s interest is solely in the size of the components, consider a magnitude-type chart instead.

SPATIAL

Used only when precise locations or geographical patterns in data are more important to the reader than anything else.

Page 18

1 DEVIATION

Emphasize variations (+/-) from a fixed reference point.

Page 19

DIVERGING BAR

A simple standard bar chart that can handle both negative and positive magnitude values.

DEVIATION
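
As a rough illustration of the diverging bar idea, the sketch below computes each value's signed deviation from a reference point and draws a text bar to the left or right of a center line. The function names and the quarterly figures are hypothetical and only serve the example; a real chart would come from a plotting library.

```python
def deviations(values, reference=0.0):
    """Return each value's signed distance from the reference point."""
    return {label: value - reference for label, value in values.items()}


def diverging_bars(values, reference=0.0, width=10):
    """Render one text bar per label, growing left (-) or right (+) of a center line."""
    devs = deviations(values, reference)
    largest = max(abs(d) for d in devs.values()) or 1.0  # avoid dividing by zero
    rows = []
    for label, dev in devs.items():
        length = round(abs(dev) / largest * width)
        left = ("#" * length).rjust(width) if dev < 0 else " " * width
        right = "#" * length if dev > 0 else ""
        rows.append(f"{label:>8} {left}|{right}")
    return rows


# Hypothetical quarterly profit figures, measured against a reference of zero.
profit = {"Q1": 4.0, "Q2": -2.0, "Q3": 1.0, "Q4": -4.0}
for row in diverging_bars(profit):
    print(row)
```

Passing a different `reference` (a target or a long-term average) changes where the center line sits without changing the rest of the logic.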

Page 20

SPINE CHART

Splits a single value into 2 contrasting components (e.g. male/female).

DEVIATION

Page 21

AREA CHART

The shaded area of these charts allows a balance to be shown, either against a baseline or between two series.

DEVIATION

Page 22

2 CORRELATION

Show the relationship between two or more variables.

Page 23

SCATTERPLOT

The standard way to show the relationship between two continuous variables, each of which has its own axis.

CORRELATION

Page 24

BUBBLE

Like a scatterplot, but adds additional detail by sizing the circles according to a third variable and coloring them according to a fourth.

CORRELATION
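
One detail worth sketching for bubble charts is how the third variable maps to circle size. A common convention, which is an assumption here and not something the slide states, is to make the circle's area (not its radius) proportional to the value, so a value four times larger does not look sixteen times larger. The function name and `max_radius` parameter are hypothetical.

```python
import math


def bubble_radii(values, max_radius=20.0):
    """Scale radii so that circle AREA, not radius, is proportional to value.

    Assumes all values are non-negative and at least one is positive.
    """
    largest = max(values)
    return [max_radius * math.sqrt(v / largest) for v in values]


# Hypothetical third-variable values for four bubbles.
sizes = [25, 50, 100, 400]
for value, radius in zip(sizes, bubble_radii(sizes)):
    print(value, round(radius, 2))
```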

Page 25

ANIMATED BUBBLE

Like a scatterplot, but adds additional detail by sizing the circles according to a third variable, coloring them according to a fourth, and using animation for a fifth!

CORRELATION

Page 26

HEAT MAP

A good way of showing the patterns between 2 categories of data; less good at showing fine differences in amounts.

Ordering the entries can be quite powerful!

CORRELATION
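
To make the point about ordering concrete, here is a minimal sketch that reorders the rows and columns of a small count matrix by their totals before it would be drawn as a heat map. Sorting by total is just one possible ordering (clustering is another); the function name and the sample data are hypothetical.

```python
def order_by_total(matrix, row_labels, col_labels):
    """Reorder rows and columns of a count matrix by descending totals."""
    row_order = sorted(range(len(row_labels)), key=lambda r: -sum(matrix[r]))
    col_order = sorted(
        range(len(col_labels)), key=lambda c: -sum(row[c] for row in matrix)
    )
    ordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return (
        ordered,
        [row_labels[r] for r in row_order],
        [col_labels[c] for c in col_order],
    )


# Hypothetical counts: rows are product lines, columns are regions.
counts = [[1, 2, 0], [5, 9, 3], [2, 6, 1]]
cells, rows, cols = order_by_total(counts, ["A", "B", "C"], ["north", "south", "east"])
print(cols)
for label, row in zip(rows, cells):
    print(label, row)
```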

Page 27

3 RANKING

Use where an item’s position in an ordered list is more important than its absolute or relative value.

Page 28

ORDERED BAR

Standard bar charts display the ranks of values much more easily when sorted into order.

RANKING
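
The sorting step is the whole trick for an ordered bar chart. A small sketch, assuming positive values and using hypothetical labels and numbers, that sorts before rendering text bars:

```python
def ordered_bars(values, width=20):
    """Sort labels by value (largest first) and render one text bar per label."""
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    largest = ranked[0][1]
    return [
        f"{rank}. {label:<10} {'#' * round(value / largest * width)} {value}"
        for rank, (label, value) in enumerate(ranked, start=1)
    ]


# Hypothetical sales totals per team.
sales = {"red": 12, "blue": 40, "green": 25}
for row in ordered_bars(sales):
    print(row)
```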

Page 29

ORDERED COLUMN

See previous slide.

RANKING

Page 30

SLOPE GRAPH

Perfect for showing how ranks have changed over time or vary between categories.

There are many ggplot2 implementations :)

RANKING
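
The data behind a slope graph is just each item's rank at two points. A minimal sketch of that preparation step follows; the function names and scores are hypothetical, and ties are broken by input order, which is an arbitrary choice.

```python
def ranks(scores):
    """Map each label to its rank (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {label: position for position, label in enumerate(ordered, start=1)}


def rank_changes(before, after):
    """Return (label, rank_before, rank_after) rows, one per line of a slope graph."""
    r0, r1 = ranks(before), ranks(after)
    return [(label, r0[label], r1[label]) for label in sorted(r0, key=r0.get)]


# Hypothetical scores for the same items in two different years.
year_one = {"alpha": 30, "beta": 20, "gamma": 10}
year_two = {"alpha": 12, "beta": 50, "gamma": 25}
for label, start, end in rank_changes(year_one, year_two):
    print(f"{label}: {start} -> {end}")
```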

Page 31

With great power comes great responsibility. These can quickly become an unmanageable mess…

Page 32

LOLLIPOP CHART

Lollipops draw more attention to the data value than standard bar/column and can also show rank and value effectively.

RANKING

Page 33

4 DISTRIBUTION

Show values in a dataset and how often they occur.

Page 34

HISTOGRAM

The standard way to show a statistical distribution. Keep the gaps between columns small to highlight the ‘shape’ of the data.

DISTRIBUTION
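
What a histogram actually computes is a count per bin. The sketch below uses equal-width bins between the minimum and maximum, which is only one of several binning choices; the function name and sample data are hypothetical.

```python
def histogram(values, bins=5):
    """Count values in equal-width bins spanning [min, max]."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts


# Hypothetical measurements.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram(data, bins=4)
for left, count in zip(edges, counts):
    print(f"{left:5.1f} {'#' * count}")
```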

Page 35

BOX PLOT

Summarize multiple distributions by showing the median (centre) and range of the data.

DISTRIBUTION
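
The numbers a box plot draws can be sketched with Python's standard `statistics` module. The quartile method (`"inclusive"`) is an assumption, since plotting tools differ in how they interpolate quartiles; the function name and the two groups are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return the min, quartiles, median and max that a box plot is built from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Hypothetical measurements for two groups, summarized side by side.
groups = {"control": [1, 2, 3, 4, 5], "treated": [2, 4, 6, 8, 20]}
for name, values in groups.items():
    print(name, five_number_summary(values))
```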

Page 36

VIOLIN PLOT

Similar to a box plot but more effective with complex distributions (data that cannot be summarized with a simple average).

Also, only nerds understand it

DISTRIBUTION

Page 37

POPULATION PYRAMID

A standard way of showing the age and sex breakdown of a population distribution; effectively, back-to-back histograms.

DISTRIBUTION
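
"Back-to-back histograms" can be shown literally with text bars: one series grows to the left of the age band label, the other to the right. This is a sketch with hypothetical counts and a hypothetical function name.

```python
def pyramid_rows(age_bands, male, female, width=10):
    """Render back-to-back text bars, one row per age band, oldest band on top."""
    largest = max(max(male), max(female))
    rows = []
    for band, m, f in zip(age_bands, male, female):
        left = ("#" * round(m / largest * width)).rjust(width)
        right = "#" * round(f / largest * width)
        rows.append(f"{left} {band:^7} {right}")
    return rows[::-1]


# Hypothetical population counts (thousands) per age band.
bands = ["0-19", "20-39", "40-59", "60+"]
men = [50, 45, 30, 15]
women = [48, 46, 33, 20]
for row in pyramid_rows(bands, men, women):
    print(row)
```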

Page 38

5 CHANGE

Give emphasis to changing trends. These can be short (intra-day) movements or extended series traversing decades or centuries.

Page 39

LINE CHART

The standard way to show a changing time series. If data are irregular, consider markers to represent data points.

CHANGE

Page 40

FAN CHART

Use to show the uncertainty in future projections. Usually this grows the further forward the projection goes.

CHANGE

Page 41

AREA CHART

Use with care: these are good at showing changes to the total, but seeing change in components can be very difficult.

CHANGE

Page 42

CALENDAR HEAT MAP

A great way of showing temporal patterns (daily, weekly, monthly), at the expense of showing precision in quantity.

CHANGE
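
The layout step behind a calendar heat map is arranging daily values into a week-by-weekday grid. A minimal sketch using ISO weeks from the standard library; the function name and activity counts are hypothetical, and the coloring itself is left to whatever plotting tool draws the grid.

```python
from datetime import date


def calendar_grid(daily_values):
    """Arrange {date: value} into {(ISO year, ISO week): [Mon..Sun values]}.

    Days without data stay None so gaps remain visible.
    """
    grid = {}
    for day, value in daily_values.items():
        year, week, weekday = day.isocalendar()
        grid.setdefault((year, week), [None] * 7)[weekday - 1] = value
    return dict(sorted(grid.items()))


# Hypothetical daily activity counts.
activity = {date(2017, 1, 2): 5, date(2017, 1, 8): 7, date(2017, 1, 9): 1}
for (year, week), values in calendar_grid(activity).items():
    print(year, week, values)
```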

Page 43

6 COMPOSITION

Show how a single entity can be broken down into its component elements.

Page 44

STACKED COLUMN

A simple way of showing part-to-whole relationships but can be difficult to read with more than a few components.

COMPOSITION

Page 45

PIE CHART

A common way of showing part-to-whole data, but be aware that it’s difficult to accurately compare the size of the segments.

COMPOSITION

Page 46

WAFFLE

Good for showing % information. They work best when used on whole numbers and work well in multiple layout form.

COMPOSITION
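
Because waffle charts work best on whole numbers, the shares have to be rounded so the squares still add up to the full grid. The sketch below uses largest-remainder rounding, which is one reasonable choice and not something the slide prescribes; the function name and survey data are hypothetical.

```python
def waffle_cells(shares, cells=100):
    """Turn raw shares into whole-number cell counts that sum exactly to `cells`."""
    total = sum(shares.values())
    exact = {k: v / total * cells for k, v in shares.items()}
    counts = {k: int(x) for k, x in exact.items()}  # round everything down first
    leftover = cells - sum(counts.values())
    # Hand the remaining cells to the categories with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# Hypothetical survey responses.
answers = {"yes": 46, "no": 37, "unsure": 8}
print(waffle_cells(answers))
```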

Page 47

7 SPATIAL

Used only when precise locations or geographical patterns in data are more important to the reader than anything else.

Page 48

POPULATION TILES

A great way of showing how areas have different population sizes and different behaviors, not distorted by geographic size.

(tilegramsR is amazing)

SPATIAL

Page 49

REGION HEX

Keeps the overall shape and layout of the geography so that it’s identifiable, yet lets you focus on the state or province level analysis.

SPATIAL

Page 50

HEAT MAP

Grid-based data values mapped with an intensity color scale. Like a choropleth map, but not snapped to an admin/political unit.

SPATIAL
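
"Grid-based" here means the values are aggregated per grid cell instead of per administrative region. A minimal sketch of that binning step, assuming plain (x, y) coordinates and square cells; the function name and the points are hypothetical, and mapping counts to a color scale is left to the plotting tool.

```python
import math
from collections import Counter


def grid_counts(points, cell_size=1.0):
    """Count (x, y) points per square grid cell, keyed by integer (column, row)."""
    return Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )


# Hypothetical event locations in arbitrary map units.
events = [(0.2, 0.3), (0.9, 0.1), (1.5, 0.5), (-0.1, 0.0)]
for cell, count in sorted(grid_counts(events).items()):
    print(cell, count)
```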

Page 51

AND FINALLY…

Gosh, there are a lot of choices. You mean you can’t just pick whichever one is prettiest? Well, you can; it just may not communicate anything to anyone. That’s up to you. Understanding what you’re trying to communicate, and what the key components of that communication are, makes the difference between effective and ineffective data visualization.

CONCLUSION

Page 52

THANK YOU

EDUARDO ARIÑO DE LA RUBIA
CHIEF DATA SCIENTIST, DOMINO DATA LAB

PLOTLY AND PLOTCON AND ANNA!

https://app.dominodatalab.com/u/earino/plotcon2016