Information Visualization in Data Mining
S.T. Balke, Department of Chemical Engineering and Applied Chemistry, University of Toronto


Feb 18, 2016



Sudeshna Sen

PowerPoint presentation.
Transcript
Page 1: Information Visualization in Data Mining

Information Visualization in Data Mining
S.T. Balke
Department of Chemical Engineering and Applied Chemistry
University of Toronto

Page 2: Motivation

Data visualization:

– relies primarily on human cognition for value discovery;
– permits direct incorporation of human ingenuity and analytic capabilities into data mining;
– can deal very effectively with very large quantities of data;
– combines powerfully with machine-based discovery techniques.

Page 3: Uses

Exploratory analysis
– Data cleaning
– Generating hypotheses

Confirmatory analysis
– Confirming or rejecting hypotheses

Presentation
– Communicating your work

Page 4

http://www.alz.washington.edu/DATA2001/GERALD1/sld011.htm

Page 5: Calculated Properties of the Anscombe Data Sets

The following values are the same (to the precision shown) for all four data sets:

– mean of the x values = 9.0
– mean of the y values = 7.5
– equation of the least-squares regression line: y = 3 + 0.5x
– sum of squares of x about its mean = 110.0

Page 6: Calculated Properties of the Anscombe Data Sets

– regression sum of squares (variation in y accounted for by x) = 27.5
– residual sum of squares (about the regression line) = 13.75
– correlation coefficient = 0.82
– coefficient of determination = 0.67
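The figures on Pages 5 and 6 can be recomputed directly. The sketch below uses Anscombe's published quartet values and plain Python (no libraries). `summarize` and `QUARTET` are illustrative names, not part of the original slides. Every set should print (nearly) the same numbers, which is exactly why the plots on the next page are needed to tell the sets apart.

```python
# Anscombe's quartet: four data sets with (nearly) identical summary statistics.
# Sets I-III share the same x values; set IV has its own.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics listed on Pages 5 and 6 for one data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # sum of squares of x about its mean
    syy = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares of y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # least-squares slope
    intercept = mean_y - slope * mean_x
    ss_reg = slope * sxy               # regression sum of squares (explained by x)
    ss_res = syy - ss_reg              # residual sum of squares about the fitted line
    r = sxy / (sxx * syy) ** 0.5       # Pearson correlation coefficient
    return {
        "mean_x": mean_x, "mean_y": mean_y, "slope": slope, "intercept": intercept,
        "sxx": sxx, "ss_reg": ss_reg, "ss_res": ss_res, "r": r, "r2": r * r,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: mean x={s['mean_x']:.1f} mean y={s['mean_y']:.2f} "
              f"y={s['intercept']:.2f}+{s['slope']:.3f}x Sxx={s['sxx']:.1f} "
              f"SSreg={s['ss_reg']:.2f} SSres={s['ss_res']:.2f} "
              f"r={s['r']:.2f} r2={s['r2']:.2f}")
```

Small differences in the third decimal place between sets come from the rounding in Anscombe's published y values.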

Page 7: The Anscombe Data

Page 8: Marey, 1885

Page 9: Snow’s Cholera Map, 1855

Page 10

http://pupgg.princeton.edu/disk20/anonymous/groth/lick/licknorth.gif

Page 11: Graphical Excellence

Graphical displays should:
– show the data;
– induce the viewer to think about the substance, not the methodology;
– avoid distorting what the data say;
– present many numbers in a small space;
– make large data sets coherent;
– encourage the eye to compare different pieces of data;
– reveal the data at several levels of detail (from broad overview to fine structure);
– serve a reasonably clear purpose: description, exploration, tabulation, or decoration;
– be closely integrated with the statistical and verbal descriptions of the data set.

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Page 12: Graphical Excellence

Graphical excellence:
– gives the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space;
– is nearly always multivariate;
– requires telling the truth about the data.

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Page 13

Lie Factor = 14.8

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Page 14: Lie Factor

$$\text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}}$$

For the example on the previous page, the size shown in the graphic grows from 0.6 to 5.3, while the data value grows from 18.0 to 27.5:

$$\text{Lie Factor} = \frac{\dfrac{5.3 - 0.6}{0.6} \times 100}{\dfrac{27.5 - 18.0}{18.0} \times 100} = \frac{783\%}{53\%} \approx 14.8$$

Requirement: 0.95 < Lie Factor < 1.05

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
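The Lie Factor is a ratio of two relative changes, so the "× 100" in numerator and denominator cancels. A minimal sketch, assuming each effect is measured as a fractional change from a starting value to an ending value; the function names are illustrative only.

```python
def relative_change(start: float, end: float) -> float:
    """Fractional change from start to end (0.5 means +50%)."""
    return (end - start) / start


def lie_factor(graphic_start: float, graphic_end: float,
               data_start: float, data_end: float) -> float:
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return relative_change(graphic_start, graphic_end) / relative_change(data_start, data_end)


def is_acceptable(lf: float, low: float = 0.95, high: float = 1.05) -> bool:
    """True when the Lie Factor falls inside the required band."""
    return low < lf < high


if __name__ == "__main__":
    lf = lie_factor(0.6, 5.3, 18.0, 27.5)
    print(f"Lie Factor = {lf:.1f}, acceptable: {is_acceptable(lf)}")
    # prints "Lie Factor = 14.8, acceptable: False"
```

An honest graphic, in which the drawn size changes by the same proportion as the data, gives a Lie Factor of exactly 1.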

Page 15: Using Area for One-Dimensional Data

Lie Factor = 2.8

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
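One common way area inflates a one-dimensional change: the data value grows by a factor k, and the icon representing it is scaled by k in both width and height, so its area grows by k². The sketch below assumes the viewer reads the change from area; it is an idealized illustration and does not reproduce the measurements behind the 2.8 on this slide. `area_lie_factor` is a hypothetical helper.

```python
def area_lie_factor(k: float) -> float:
    """Lie Factor when a 1-D value grows by factor k but the icon is scaled by k
    in BOTH width and height, so the change is perceived through area (k**2)."""
    if k == 1.0:
        raise ValueError("k must differ from 1 (no change in the data)")
    data_effect = k - 1.0          # relative change in the data
    area_effect = k * k - 1.0      # relative change in icon area
    return area_effect / data_effect  # algebraically equal to k + 1


if __name__ == "__main__":
    for k in (1.5, 2.0, 3.0):
        print(f"data grows x{k}: Lie Factor = {area_lie_factor(k):.1f}")
    # prints 2.5, 3.0 and 4.0 for the three cases
```

Under this assumption the distortion grows with the size of the change itself, which is one reason for the guideline that the number of information-carrying dimensions should not exceed the number of dimensions in the data.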

Page 16: More Guidelines

– The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
– No legends: label the graph directly.
– Graphics must not quote data out of context.

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Page 17: Data-Ink Ratio

$$\text{Data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}$$

Data-ink ratio = proportion of a graphic’s ink devoted to the non-redundant display of data-information.

Data-ink ratio = 1.0 - (proportion of a graphic that can be erased without loss of data-information)

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
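The two definitions of the data-ink ratio agree when every mark is classified as either data-ink or erasable ink. A minimal sketch, assuming ink can be tallied per chart element in some consistent unit; the `Element` class, both function names, and the ink amounts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One mark in a chart with a rough ink amount (any consistent unit)."""
    name: str
    ink: float
    erasable: bool  # True if removing it loses no data-information


def data_ink_ratio(elements):
    """Data-ink divided by total ink used to print the graphic."""
    total = sum(e.ink for e in elements)
    data_ink = sum(e.ink for e in elements if not e.erasable)
    return data_ink / total


def data_ink_ratio_by_erasure(elements):
    """1.0 minus the proportion of ink erasable without loss of data-information."""
    total = sum(e.ink for e in elements)
    erasable = sum(e.ink for e in elements if e.erasable)
    return 1.0 - erasable / total


if __name__ == "__main__":
    # Hypothetical ink budget for a simple bar chart (illustrative numbers only).
    chart = [
        Element("bars", 45.0, erasable=False),
        Element("value labels", 15.0, erasable=False),
        Element("heavy gridlines", 25.0, erasable=True),
        Element("3-D shading", 15.0, erasable=True),
    ]
    print(data_ink_ratio(chart), data_ink_ratio_by_erasure(chart))
```

Erasing the gridlines and shading in this example would raise the ratio to 1.0 without losing any data-information.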

Page 18: Maximize Data Density

$$\text{data density of a graphic} = \frac{\text{number of entries in data matrix}}{\text{area of data graphic}}$$

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
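A worked example of the data-density formula, assuming the data matrix has `n_rows` observations of `n_cols` variables and the data graphic occupies a rectangular `width` × `height` region in any area unit. The function name and the numbers are hypothetical.

```python
def data_density(n_rows: int, n_cols: int, width: float, height: float) -> float:
    """Number of entries in the data matrix divided by the area of the data graphic."""
    entries = n_rows * n_cols
    area = width * height
    return entries / area


if __name__ == "__main__":
    # Hypothetical: 50 observations of 2 variables plotted in a 5 x 4 region.
    print(data_density(50, 2, 5.0, 4.0))  # 5.0 entries per unit area
    # The same data in a 2.5 x 2 region is four times as dense.
    print(data_density(50, 2, 2.5, 2.0))  # 20.0
```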

Page 19: Beware Chartjunk

NO: “Isn’t it remarkable that the computer can be programmed to draw like that.”

YES: “My, what interesting data!”

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Page 20: How to Say Nothing with Information Visualization

http://www.crs4.it/~zip/13ways.html

– Never include a color legend.
– Avoid annotation.
– Never mention error characteristics of the visualization method.
– When in doubt, smooth.
– Don’t say how long it took to plot.
– Never compare your results with other data visualization techniques.
– Never cite references for the data.
– Claim generality but show results from a single data set.
– Use the viewing angle to hide blemishes in 3D objects.

Page 21: An Overview of Information Visualization Methods

http://www.informatik.uni-halle.de/~keim/tutorials.html

Page 22: Methods of Interest

– Scatterplot matrices
– Parallel coordinates
– Pixel-oriented methods
– Icon-based methods
– Dimensional stacking
– Treemap
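The methods above are only named on this slide. As a rough sketch of one of them, parallel coordinates, in one common formulation (not necessarily the one used in this course): each dimension gets its own vertical axis, and each record becomes a polyline crossing axis i at its value on that dimension, here min-max scaled to [0, 1]. The function name and the example records are hypothetical.

```python
def parallel_coordinates(records, dims):
    """Return one polyline (list of (x, y) vertices) per record.

    Axis i is the vertical line x = i; a record crosses it at its value on
    dims[i], min-max scaled to [0, 1] across all records.
    """
    lo = {d: min(r[d] for r in records) for d in dims}
    hi = {d: max(r[d] for r in records) for d in dims}
    polylines = []
    for r in records:
        vertices = []
        for i, d in enumerate(dims):
            span = hi[d] - lo[d]
            # A constant column has no range; place it mid-axis.
            y = (r[d] - lo[d]) / span if span else 0.5
            vertices.append((float(i), y))
        polylines.append(vertices)
    return polylines


if __name__ == "__main__":
    # Hypothetical records with three numeric attributes.
    rows = [
        {"a": 1, "b": 10, "c": 5},
        {"a": 3, "b": 20, "c": 5},
        {"a": 2, "b": 15, "c": 5},
    ]
    for line in parallel_coordinates(rows, ["a", "b", "c"]):
        print(line)
```

Each polyline can be handed to any line-drawing routine; for pandas users, `pandas.plotting.parallel_coordinates` is a ready-made alternative.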

Page 23

Assignment 1: see handout.

Page 24: Some Websites of Interest

– http://dmoz.org/Computers/Software/Databases/Data_Mining/Public_Domain_Software/
– http://www.cs.man.ac.uk/~ngg/InfoViz/Projects_and_Products/Visualization/

Try a search at google.com using the following keywords together:

name_of_method download software