
Interactive Exploration of Multidimensional Data

Dec 31, 2015

Page 1: Interactive Exploration of Multidimensional Data

Interactive Exploration of Multidimensional Data

By: Sanket Sinha, Nitin Madnani

Page 2: Interactive Exploration of Multidimensional Data

Is It Really That Common?

You bet:
- Demographics
- Economics
- Census
- Microarray Gene Expression
- Engineering
- Psychology
- Health

Page 3: Interactive Exploration of Multidimensional Data

I can’t see it, I tell ya!

Visualization challenges for >= 3D:
- Relationship comprehension is difficult
- Discovering outliers, clusters and gaps is almost impossible
- Orderly exploration is not possible with standard visualization systems
- Navigation is cognitively onerous and disorienting (3D)
- Occlusion (3D)

Page 4: Interactive Exploration of Multidimensional Data

Standard Solution

Can you say “Pro-jek-shun”?

Use lower dimensional projections of data:
- 1D: Histograms
- 2D: Scatterplots

Page 5: Interactive Exploration of Multidimensional Data

But there are so many!

For 13 dimensions (columns):
- Number of histograms = 13
- Number of scatterplots = C(13,2) = 78

Must examine a series of these to gain insights

Unsystematic == Inefficient
Must have order!
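The counts on this slide can be checked with a short sketch. It assumes only that every column yields one histogram and every unordered pair of columns yields one scatterplot; the column names `dim1` to `dim13` and the helper `enumerate_projections` are hypothetical placeholders.

```python
from itertools import combinations
from math import comb


def enumerate_projections(columns):
    """Return the axis-parallel 1D and 2D projections of a list of columns."""
    histograms = list(columns)                      # one histogram per column
    scatterplots = list(combinations(columns, 2))   # one scatterplot per unordered pair
    return histograms, scatterplots


# Hypothetical column names for a 13-dimensional dataset.
columns = [f"dim{i}" for i in range(1, 14)]
histograms, scatterplots = enumerate_projections(columns)

print(len(histograms), len(scatterplots))  # 13 78
assert len(scatterplots) == comb(13, 2)
```

The number of scatterplots grows quadratically with the number of columns, which is why browsing them in no particular order quickly becomes impractical.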

Page 6: Interactive Exploration of Multidimensional Data

Introducing Rank-by-feature

- Allows projections to be examined in an orderly fashion
- A powerful framework for interactive detection of:
  - Inter-dimension relationships
  - Gaps
  - Outliers
  - Patterns

Page 7: Interactive Exploration of Multidimensional Data

How does it work?

- Framework defines ranking criteria for 1D & 2D projections
- User selects criterion of interest
- All projections are scored on the criterion and ranked
- User examines projections in the order recommended
- Eureka*!!

*Disclaimer: All users may not be able to make life-altering discoveries
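The workflow above can be sketched as a small scoring loop. This is an illustration under stated assumptions, not the tool's actual code: the table is a plain dictionary of columns, the function names `rank_projections` and `biggest_gap` are hypothetical, and the example criterion is the "biggest gap" idea from the 1D criteria.

```python
from itertools import combinations


def rank_projections(table, criterion, dims=1):
    """Score every `dims`-dimensional projection of `table` and sort by score.

    table     -- dict mapping a column name to a list of numbers
    criterion -- function taking `dims` lists of numbers and returning a score
    """
    scored = []
    for combo in combinations(sorted(table), dims):
        score = criterion(*(table[name] for name in combo))
        scored.append((score, combo))
    # Highest score first: this is the order in which the user would browse.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def biggest_gap(values):
    """Example 1D criterion: the largest distance between neighbouring values."""
    ordered = sorted(values)
    return max(b - a for a, b in zip(ordered, ordered[1:]))


# Made-up data, for illustration only.
table = {
    "a": [1, 2, 3, 4, 5],
    "b": [1, 2, 3, 4, 50],
    "c": [1, 1, 2, 2, 12],
}
ranking = rank_projections(table, biggest_gap, dims=1)
for score, combo in ranking:
    print(combo, score)
```

Passing `dims=2` together with a two-argument criterion ranks scatterplots with the same loop.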

Page 8: Interactive Exploration of Multidimensional Data

Ranking Criteria - 1D

- Normality: Indicative of how “Gaussian” the dataset is
- Uniformity: How “uniform” is the dataset? (How high is the entropy?)
- Outliers: The number of potential outliers in the dataset
- Gap: The size of the biggest gap
- Uniqueness: Number of unique data points
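A minimal sketch of four of these criteria follows. The slide does not say how each score is computed, so the choices here are assumptions: uniformity is taken as the Shannon entropy of an equal-width histogram with 10 bins, and potential outliers are counted with the common 1.5 × IQR rule. Normality is left out because the slide does not name a test. All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import quantiles


def histogram_entropy(values, bins=10):
    """Shannon entropy (in bits) of an equal-width histogram; higher is more uniform."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return 0.0  # every value falls into one bin
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def outlier_count(values, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high)


def biggest_gap(values):
    """Largest distance between neighbouring values after sorting."""
    ordered = sorted(values)
    return max(b - a for a, b in zip(ordered, ordered[1:]))


def uniqueness(values):
    """Number of distinct values."""
    return len(set(values))


# Made-up column, for illustration only.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(outlier_count(values), biggest_gap(values), uniqueness(values))  # 1 91 10
print(histogram_entropy(values))
```

Each function takes one column and returns one number, so any of them can serve as the criterion used to order the histograms.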

Page 9: Interactive Exploration of Multidimensional Data

Ranking Criteria - 2D

- Linear Correlation: Pearson’s correlation coefficient
- LSE: Least Square Error from the optimal quadratic curve fit
- Quadracity: Quadratic coefficient from fitting curve equation
- Uniformity: Joint entropy
- ROI: Number of items in a Region Of Interest
- Outliers: Number of potential outliers
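Three of these criteria can be sketched with the standard library alone. The assumptions: the quadratic curve is y = a·x² + b·x + c fitted by ordinary least squares, LSE is the sum of squared residuals of that fit, quadracity is the coefficient a, and the region of interest is an axis-aligned rectangle. Joint entropy and 2D outliers are omitted because the slide does not specify how they are computed. All function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def quadratic_fit(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c, sse)."""
    s = [sum(x ** k for x in xs) for k in range(5)]                  # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^k
    a, b, c = solve_linear(
        [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]],
        [t[2], t[1], t[0]],
    )
    sse = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    return a, b, c, sse


def roi_count(xs, ys, x_range, y_range):
    """Number of points inside an axis-aligned rectangular region of interest."""
    return sum(
        1
        for x, y in zip(xs, ys)
        if x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]
    )


# Made-up data on a perfect parabola, for illustration only.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
a, b, c, sse = quadratic_fit(xs, ys)
print(pearson_r(xs, ys), a, sse)
print(roi_count(xs, ys, (-1, 1), (0, 1)))  # 3
```

The example data has no linear trend but fits a parabola exactly, which shows why a scatterplot can rank low on linear correlation yet stand out on the quadratic criteria.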

Page 10: Interactive Exploration of Multidimensional Data

Put A Demo Where Your Mouth Is!

Page 11: Interactive Exploration of Multidimensional Data

HCE Overview

Page 12: Interactive Exploration of Multidimensional Data

The Input Dialog Box

Perform Filtering & Normalization

Page 13: Interactive Exploration of Multidimensional Data

Histogram Ordering

Page 14: Interactive Exploration of Multidimensional Data

Scatterplot Ordering

Page 15: Interactive Exploration of Multidimensional Data

Tabular View of Data

Select specific data records and annotate if needed

Page 16: Interactive Exploration of Multidimensional Data

Questions/Critiques

- What does “outlierness” mean?
- Cannot identify datapoints in histogram or scatterplot browser without switching to table view
  - Especially in ROI
- How to intuitively interpret:
  - Outliers in 2D
  - LSE
  - Quadracity