Visualization and
Data Mining
Napoleon's Invasion of Russia, 1812
Napoleon
Marey, 1885
Snow’s Cholera Map, 1855
Asia at night
South and North Korea at night
Seoul, South Korea
North Korea
Notice how dark it is
Visualization Role
• Support interactive exploration
• Help in result presentation
• Disadvantage: requires human eyes
• Can be misleading
Bad Visualization: Spreadsheet with misleading Y-axis

Year  Sales
1999  2110
2000  2105
2001  2120
2002  2121
2003  2124

[Chart: Sales by year, 1999 to 2003; Y-axis runs from 2095 to 2130]

Y-axis scale gives WRONG impression of big change
Better Visualization

Year  Sales
1999  2110
2000  2105
2001  2120
2002  2121
2003  2124

[Chart: Sales by year, 1999 to 2003; Y-axis runs from 0 to 3000]

Axis scale from 0 to 3000 gives correct impression of small change
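To put a number on the difference between the two charts, the sketch below compares how tall the 2003 value is drawn relative to the 1999 value under each axis baseline. The function name `drawn_height_ratio` is an assumption made for illustration, and "height above the axis minimum" is only a rough stand-in for what the eye perceives.

```python
# Sales figures copied from the slide's table.
sales = {1999: 2110, 2000: 2105, 2001: 2120, 2002: 2121, 2003: 2124}


def drawn_height_ratio(first, last, axis_min):
    """Ratio of drawn heights (last / first) when the Y-axis starts at axis_min.

    Hypothetical helper: a point's drawn height is its distance above the
    bottom of the axis, not its actual value.
    """
    return (last - axis_min) / (first - axis_min)


first, last = sales[1999], sales[2003]

truncated = drawn_height_ratio(first, last, axis_min=2095)  # axis of the bad chart
zero_based = drawn_height_ratio(first, last, axis_min=0)    # axis of the better chart
actual = last / first                                       # true ratio in the data

print(f"truncated axis : 2003 is drawn {truncated:.2f}x as tall as 1999")
print(f"zero-based axis: 2003 is drawn {zero_based:.3f}x as tall as 1999")
print(f"actual sales ratio: {actual:.3f}")
```

With the axis starting at 2095 the last point sits almost twice as high as the first, although sales grew by well under 1%. With the axis starting at 0 the drawn ratio equals the ratio in the data.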
Lie Factor=14.8
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
Lie Factor
\[
\text{Lie Factor}
= \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}}
= \frac{(5.3 - 0.6)/0.6}{(27.5 - 18.0)/18.0}
= \frac{7.833}{0.528}
= 14.8
\]
Tufte requirement: 0.95<Lie Factor<1.05
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
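The Lie Factor arithmetic can be checked with a few lines of Python. This is a minimal sketch that reads the slide's example as a graphic growing from 0.6 to 5.3 units while the data grows from 18.0 to 27.5. The function names are assumptions made for illustration.

```python
def relative_change(first, second):
    """Size of an effect as a fraction of the starting value."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in graphic divided by size of effect in data."""
    shown = relative_change(graphic_first, graphic_second)
    real = relative_change(data_first, data_second)
    return shown / real


def within_tufte_bounds(factor):
    """Tufte requirement from the slide: 0.95 < Lie Factor < 1.05."""
    return 0.95 < factor < 1.05


lf = lie_factor(0.6, 5.3, 18.0, 27.5)
print(round(lf, 1))             # 14.8
print(within_tufte_bounds(lf))  # False
```

A graphic whose drawn sizes are proportional to the data has a Lie Factor of 1 and passes the check.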
Tufte’s Principles of Graphical Excellence
• Give the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
• Tell the truth about the data!
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
Visualization Methods
• Visualizing in 1-D, 2-D and 3-D
  • well-known visualization methods
• Visualizing more dimensions
  • Parallel Coordinates
  • Other ideas
1-D (Univariate) Data Representations
[Figure: Histogram; count axis ticks 1, 3, 5, 7; value axis from 0 to 20]
[Figure: Tukey box plot; labels: low, high, Middle 50%, Mean]
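Both 1-D representations reduce to simple counting. The sketch below computes the numbers behind a box plot and a histogram for a made-up sample. The `values` list is hypothetical data chosen to fall in the 0 to 20 range, and `histogram` is an illustrative helper, not a library function.

```python
import statistics

# Hypothetical sample in the 0..20 range (not taken from the slides).
values = [2, 4, 5, 7, 8, 8, 9, 10, 11, 11, 12, 13, 15, 18]

# Box plot ingredients: extremes, quartiles (the box spans the middle 50%), mean.
q1, median, q3 = statistics.quantiles(values, n=4)
summary = {
    "low": min(values),
    "q1": q1,
    "median": median,
    "q3": q3,
    "high": max(values),
    "mean": statistics.mean(values),
}


def histogram(data, lo, hi, bins):
    """Count how many values fall into each of `bins` equal-width intervals.

    Assumes every value lies in [lo, hi]; hi itself goes into the last bin.
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


counts = histogram(values, lo=0, hi=20, bins=4)
for i, c in enumerate(counts):
    print(f"{i * 5:>2} to {(i + 1) * 5:<2} {'#' * c}")
print(summary)
```

The quartile values depend on the interpolation method; `statistics.quantiles` uses its default "exclusive" method here.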
2-D (Bivariate) Data
• Scatter plot, …
[Figure: scatter plot with axes price and mileage]
3-D Data (projection)
[Figure: 3-D projection plot; one axis labeled price]
3-D image (requires 3-D blue and red glasses)
Taken by Mars Rover Spirit, Jan 2004
Visualizing in 4+ Dimensions
• Scatterplots
• Parallel Coordinates
• Chernoff faces
• …
Multiple Views
Give each variable its own display
   A  B  C  D  E
1  4  1  8  3  5
2  6  3  4  2  1
3  5  7  2  4  3
4  2  6  3  1  5

[Figure: one display per variable A to E, each showing rows 1 to 4]
Problem: does not show correlations
Scatterplot Matrix
Represent each possible pair of variables in their own 2-D scatterplot (car data)
Q: Useful for what? A: linear correlations (e.g. horsepower & weight)
Q: Misses what? A: multivariate effects
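A scatterplot matrix has one panel per pair of variables, and each panel answers the linear-correlation question for that pair. The sketch below enumerates the pairs and computes a Pearson correlation for each. Because the car data is not reproduced in the text, it uses the small A to E table from the Multiple Views slide as stand-in data. The `pearson` helper is written out by hand as an illustration.

```python
from itertools import combinations
from math import sqrt

# Columns of the A..E table from the Multiple Views slide (rows 1 to 4).
table = {
    "A": [4, 6, 5, 2],
    "B": [1, 3, 7, 6],
    "C": [8, 4, 2, 3],
    "D": [3, 2, 4, 1],
    "E": [5, 1, 3, 5],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is 0).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# One entry per panel of the scatterplot matrix (each unordered pair once).
pairs = list(combinations(table, 2))
correlations = {(a, b): pearson(table[a], table[b]) for a, b in pairs}

for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Five variables give 10 distinct panels. Each coefficient describes only one pair at a time, which is why effects involving three or more variables at once do not show up.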
Parallel Coordinates
• Encode variables along a horizontal row
• Vertical line specifies values
[Figure: dataset in Cartesian coordinates]
[Figure: same dataset in parallel coordinates]
Invented by Alfred Inselberg while at IBM, 1985
Example: Visualizing Iris Data
sepal length  sepal width  petal length  petal width
5.1           3.5          1.4           0.2
4.9           3            1.4           0.2
...           ...          ...           ...
5.9           3            5.1           1.8
Iris setosa
Iris versicolor
Iris virginica
Flower Parts
• Petal, a non-reproductive part of the flower
• Sepal, a non-reproductive part of the flower
Parallel Coordinates
[Figure: a single Sepal Length axis with the value 5.1 marked]
sepal length  sepal width  petal length  petal width
5.1           3.5          1.4           0.2
Parallel Coordinates: 2-D
[Figure: Sepal Length and Sepal Width axes, with a line joining 5.1 and 3.5]
sepal length  sepal width  petal length  petal width
5.1           3.5          1.4           0.2
Parallel Coordinates: 4-D
[Figure: Sepal Length, Sepal Width, Petal Length and Petal Width axes, with one line through 5.1, 3.5, 1.4 and 0.2]
sepal length  sepal width  petal length  petal width
5.1           3.5          1.4           0.2
Parallel Visualization of Iris data
[Figure: parallel coordinates plot of the Iris data; the values 5.1, 3.5, 1.4 and 0.2 are marked on the four axes]
Parallel Visualization Summary
• Each data point is a line
• Similar points correspond to similar lines
• Lines crossing over correspond to negatively correlated attributes
• Interactive exploration and clustering
• Problems: order of axes, limit to ~20 dimensions
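The statement that each data point is a line can be made concrete by computing the vertices of that line. The sketch below turns the three Iris rows quoted earlier into polylines, one vertex per axis. Scaling each axis to its own minimum and maximum is an assumption made here so the axes share a common height; the function name `to_polylines` is likewise hypothetical.

```python
# Axis order and rows as listed in the Iris example.
axes = ["sepal length", "sepal width", "petal length", "petal width"]
rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [5.9, 3.0, 5.1, 1.8],
]


def to_polylines(rows):
    """Return one polyline per row as a list of (axis_index, height) vertices.

    Heights are rescaled per axis to the range 0..1 (assumption: min-max
    scaling over the rows given). A constant axis is drawn at mid-height.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    polylines = []
    for row in rows:
        line = []
        for i, value in enumerate(row):
            span = highs[i] - lows[i]
            height = (value - lows[i]) / span if span else 0.5
            line.append((i, height))
        polylines.append(line)
    return polylines


polylines = to_polylines(rows)
for row, line in zip(rows, polylines):
    print(row, "->", [(axes[i], round(h, 2)) for i, h in line])
```

Reordering `axes` (and the columns with it) changes which attributes sit next to each other and therefore which crossings are visible, which is the axis-order problem noted in the summary.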
Chernoff Faces
Encode different variables’ values in characteristics of human face
Cute applets:
http://www.cs.uchicago.edu/~wiseman/chernoff/
http://hesketh.com/schampeo/projects/Faces/chernoff.html
Interactive Face
Chernoff faces, example
Visualization Summary
• Many methods
• Visualization is possible in more than 3-D
• Aim for graphical excellence