Information Visualization and Visual Analytics roles, challenges, and examples Giuseppe Santucci

Mar 10, 2018

Page 1: Information Visualization and Visual Analytics roles ...santucci/InformationVisualization/Slides/08... · Information Visualization and Visual Analytics roles, challenges, ... •

Information Visualization and

Visual Analytics roles, challenges, and examples

Giuseppe Santucci

Page 2

VisDis and the Database & User Interface

• The VisDis and the Database/Interface group background is about:
  – Visual Information Access
  – Data quality
  – Data integration
  – Adaptive Interfaces
  – User Centered Design
  – Usability and Accessibility
  – Infovis evaluation
  – Visual quality metrics
  – Visual Analytics
    • Data sampling
    • Density map optimization

Page 3

Outline

• Information Visualization
  – Main issues
• Data overloading
  – Visual Analytics
  – Automatic data analysis
  – Three examples
• Projects and books

Page 4

Information visualization!

1. Infovis is perfect for exploration, when we don’t know exactly what to look at. It supports vague goals
2. Infovis is perfect to explain complex data and to support decisions

• Other approaches to data analysis
  – Statistics: strong verification, but does not support exploration and vague goals
  – Data mining: actionable and reliable, but black box, not interactive, question-response style
  – Visual analytics (formerly Visual Data Mining) is trying to join the two worlds

Page 5

Canonical steps in infovis – STEP 1

DATA → Internal Representation

Encoding of values
  – Univariate data
  – Bivariate data
  – Trivariate data
  – Multidimensional data

Encoding of relations
  – Temporal data
  – Map & Diagrams
  – Graphs/Trees
  – Data streams

[Figure: example chart with the category labels Sport, Literature, Mathematics, Physics, History, Geography, Art, Chemistry]

Page 6

Canonical steps in infovis – STEP 2

Internal Representation → Presentation

Space limitations
  – Scrolling
  – Overview + details
  – Distortion
  – Suppression
  – Zoom & pan
  – Semantic zoom

Time limitation
Perceptual issues
Cognitive issues

Page 7

SO WE ARE DONE! (?)

Page 8

Outline

• Information Visualization
• Data overloading
  – Visual Analytics
  – Automatic data analysis
  – Three examples
• Projects and books and conferences

Page 9

Data size and complexity!

• 100 million FedEx transactions per day
• 150 million VISA credit card transactions per day
• 300 million long distance AT&T calls per day
• 50 billion e-mails per day
• 600 billion IP packets per day
• 1 trillion (10^12) web pages (according to Google), corresponding to about 3 petabytes of data
• Google processes 20 petabytes of data per day
• Data streams (sensor networks, IP traffic, etc.)

kilobyte, megabyte, gigabyte, terabyte, petabyte …

Page 10

Rescuing information

• In different situations people need to exploit and use hidden information resting in unexplored large data sets
  – decision-makers
  – analysts
  – engineers
  – emergency response teams
  – ...
• Several techniques exist devoted to this aim
  – Automatic analysis techniques (e.g., data mining)
  – Manual analysis techniques (e.g., Information visualization)
• Petabyte datasets require a joint effort:

Page 11

Visual Analytics

Page 12

VA is highly interdisciplinary

[Diagram: components of Visual Analytics: Scientific & Information Visualisation; Data Management; Data Mining; Spatio-Temporal Data; Human Perception + Cognition; Infrastructure; Evaluation]

Each component presents challenging issues

Page 13

Visualization

• Scientific Visualization & Information Visualization
  – interactivity & scalability issues
• Challenges: design of new scalable structures that support:
  – Visual abstractions (e.g., clustering, sampling, etc.)
  – Rapid update of visual displays for billion-record databases (10 frames per second)

Page 14

Data Management

• Answering a query against a large data set is now possible

Among the other challenges:
• Integration of heterogeneous data such as numeric data, graphs, text, audio and video signals, semi-structured data
• Data streams - In many applications data are continuously produced (sensor data, stock market data, news data, etc.)
• Data provenance - Understanding where data come from
• Data reduction - Visualizing billions of records is not possible. We need to reduce and abstract the data to support interaction at different detail levels (see, e.g., Google Earth)
• ...
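
As one illustration of data reduction on a continuously produced stream, here is a minimal Python sketch of reservoir sampling (Algorithm R), which keeps a uniform random sample of fixed size without storing the whole stream. It is only one possible reduction technique and is not prescribed by these slides; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream):
        if n < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item n replaces a stored item with probability k / (n + 1).
            j = rng.randint(0, n)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical stream of one million sensor readings reduced to 1,000.
    sample = reservoir_sample(range(1_000_000), 1_000, random.Random(42))
    print(len(sample))
```

The memory used depends only on `k`, which is why this kind of sampling suits streams whose total length is not known in advance.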

Page 15

Data mining

• Methods to automatically extract insights
  – Supervised learning from examples: using training samples to learn models for the classification (or prediction) of previously unseen data samples
  – Cluster analysis, which aims to extract structure from unknown data, grouping data instances into classes based on mutual similarity, and to identify outliers
  – Association rule mining (analysis of co-occurrence of data items) and dimensionality reduction
• Challenges come from:
  – semi-structured and complex data (web data, documents)
  – interaction with visualizations

Page 16

Spatio-Temporal Data

• Data about time and space are widespread
  – geographic measurements
  – GPS position data
  – remote sensing applications (e.g., satellite data)
• Finding spatial relationships and patterns among these data is of special interest
• The analysis of data with references both in space and in time is a challenging research topic:
  – scale: clusters and other phenomena may only occur at particular scales, which may not be the scale at which data are recorded
  – uncertainty: spatio-temporal data are often incomplete, interpolated, collected at different times, etc.
  – …

Page 17

Perception and cognition

• A critical element is the human being
  – Visual analysis tasks require the careful design of apt human-computer interfaces
  – Challenges: need to integrate Psychology, Sociology, Neurosciences, and Design issues
    • user-centred analysis and modelling
    • multimodal interaction techniques for visualization and exploration of large information spaces
    • availability of improved display resources
    • novel interaction algorithms
    • perceptual, cognitive and graphical principles which in combination lead to improved visual communication of data and analysis results

[Diagram: interaction cycle with the stages Form Intention, Form Action plan, Execute Action, Evaluation, Interpretation, Perception]

Page 18

Evaluation and Infrastructure

• How to assess (evaluate) the effectiveness of visual analytics environments is a topic of lively debate
• The same happens for infrastructures: agreed solutions are still under investigation

Both topics are still in the phase of workshop results... D3!

Page 19

Back to the Automatic Data Analysis

We can classify the automatic activities in three main groups:

1. Deriving new values from the dataset for ad-hoc visualization
   • This is the least standard and the most creative part of the process
2. Data reduction / data mining
   • Clustering / classification / …
   • Sampling / pixel-oriented visualization
   • Dimension reduction
3. Visualization improvement
   • Data distribution
   • Perceptual issues
   • Cognitive issues

Page 20

Example for group 1

Deriving new values from the dataset for ad-hoc visualization

(you are going to visualize DERIVED data)

Page 21

A Visual Analytics example (Group 1)
Deriving new values from the dataset for ad-hoc visualization

• How to visually compare J. London and M. Twain books?
• [D. A. Keim and D. Oelke. Literature Fingerprinting: A New Method for Visual Literary Analysis. 2007 IEEE Symp. on Visual Analytics Science and Technology (VAST '07)]

1. Split the book into several text blocks (e.g., pages, paragraphs, sentences)
2. Measure, for each text block, a relevant feature (e.g., average sentence length, word usage, etc.)
3. Associate the relevant feature with a visual attribute (e.g., color)
4. Visualize it
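
The four steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation from the cited paper: blocks are fixed groups of sentences, the feature is the average sentence length in words, and the visual attribute is a grey level in 0..255. All function names and the `sentences_per_block` parameter are hypothetical.

```python
import re


def split_into_blocks(text, sentences_per_block=3):
    """Step 1: split the text into blocks of a fixed number of sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [sentences[i:i + sentences_per_block]
            for i in range(0, len(sentences), sentences_per_block)]


def average_sentence_length(block):
    """Step 2: average number of words per sentence in one block."""
    return sum(len(sentence.split()) for sentence in block) / len(block)


def to_grey_level(value, lo, hi):
    """Step 3: map a feature value linearly onto a grey level in 0..255."""
    if hi == lo:
        return 0
    return round(255 * (value - lo) / (hi - lo))


def fingerprint(text, sentences_per_block=3):
    """Steps 1-3 together: one grey level per text block."""
    blocks = split_into_blocks(text, sentences_per_block)
    if not blocks:
        return []
    features = [average_sentence_length(block) for block in blocks]
    lo, hi = min(features), max(features)
    return [to_grey_level(f, lo, hi) for f in features]


if __name__ == "__main__":
    # Step 4 (here reduced to printing): each number is one block's grey level.
    sample = "A b. C d. E f g h i j."
    print(fingerprint(sample, sentences_per_block=2))
```

In a real fingerprint the grey levels would be drawn as a grid of colored cells, one per block, so that two books can be compared side by side.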

Page 22

J. London vs M. Twain: average sentence lengths

Page 23

User interaction (a non uniform book?)

Page 24

Details of a book

Page 25

What about the Bible?

Page 26

Example 2 Data reduction / data mining

Page 27

Visual Analytics of Anomaly Detection in Large Data Streams (paper from Daniel Keim's group)

• You have to monitor a network composed of 8 systems with 16 servers each
• Each server provides basic information
  – CPU % occupation
  – DISK % occupation
  – MEM % occupation
  – ...
  – That corresponds to 128 (8 × 16) temporal data streams per measure (overplotting!!)

[Figure: line chart of CPU % (y) over time (x)]

Page 28

Pixel-oriented visualization

• 28 days (5 min windows), about 8k observations
• Each observation takes a pixel
• The color codes the CPU %
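
The "about 8k observations" figure follows from the window size: 28 days of 5-minute windows give 28 × 24 × 12 = 8,064 values. The Python sketch below checks that arithmetic and lays the values out one per pixel. The row-by-row layout with one day per row (288 pixels wide) and the grey-level color coding are assumptions chosen for illustration; the slides do not state the arrangement used in the original tool.

```python
MINUTES_PER_DAY = 24 * 60


def observation_count(days=28, window_minutes=5):
    """Number of observations when each window yields one value."""
    return days * MINUTES_PER_DAY // window_minutes


def to_pixel_rows(values, width):
    """Lay out a time series row by row, one value per pixel."""
    return [values[i:i + width] for i in range(0, len(values), width)]


def cpu_to_grey(cpu_percent):
    """Map a CPU % value (clamped to 0..100) onto a grey level in 0..255."""
    clamped = max(0.0, min(100.0, cpu_percent))
    return round(255 * clamped / 100)


if __name__ == "__main__":
    n = observation_count()
    print(n)
    # Hypothetical CPU series: a constant 10 % load, one value per window.
    series = [10.0] * n
    rows = to_pixel_rows([cpu_to_grey(v) for v in series], width=288)
    print(len(rows), len(rows[0]))
```

With one day per row, daily patterns line up vertically, which is one common reason for choosing such a layout.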

Page 29

The whole system

Color is preattentive!

Page 30

Automated analysis

• Computing high CPU % clusters
• That selects hot time intervals
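
A very simple way to select hot time intervals is to look for runs of consecutive observations above a threshold. The Python sketch below does exactly that; it is a stand-in for illustration and not the clustering method of the cited paper, and the `threshold` and `min_length` parameters are hypothetical.

```python
def hot_intervals(cpu_series, threshold=90.0, min_length=3):
    """Return (start, end) index pairs, end exclusive, of runs where CPU %
    stays at or above threshold for at least min_length observations."""
    intervals = []
    start = None
    for i, value in enumerate(cpu_series):
        if value >= threshold:
            if start is None:
                start = i  # a new hot run begins here
        elif start is not None:
            if i - start >= min_length:
                intervals.append((start, i))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(cpu_series) - start >= min_length:
        intervals.append((start, len(cpu_series)))
    return intervals


if __name__ == "__main__":
    series = [10, 95, 96, 97, 20, 99, 30, 91, 92, 93, 94]
    print(hot_intervals(series))
```

The `min_length` filter discards isolated spikes, so that only sustained load is reported as a hot interval.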

Page 31

Automated analysis...

• Detecting persistent anomalies

Page 32

Looking for correlations

Page 33

Example 3 Visualization improvement

Page 34

A Visual Analytics example (Group 3 – Visualization improvement)
Data distribution and perceptual issues

Density maps

[Figure: 8x8 pixel grid; some pixels are empty, and one pixel has 4 data items plotted on it: d=4]

We can map the density values to a 256-level grey or color scale
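
Computing a density map amounts to counting how many data items land on each pixel. The Python sketch below shows this for a small grid; the function name `density_map`, the data ranges, and the sample points are assumptions made for the example.

```python
from collections import Counter


def density_map(points, width, height, x_range, y_range):
    """Count how many data items fall on each pixel of a width x height grid.

    Returns a Counter keyed by (row, col); empty pixels are simply absent
    and read back as density 0.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    counts = Counter()
    for x, y in points:
        # Scale to pixel coordinates; the maximum value goes to the last pixel.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        counts[(row, col)] += 1
    return counts


if __name__ == "__main__":
    # Four items share one pixel (d=4); a fifth item sits alone elsewhere.
    points = [(0.1, 0.1)] * 4 + [(0.9, 0.9)]
    d = density_map(points, 8, 8, (0.0, 1.0), (0.0, 1.0))
    print(d[(0, 0)], d[(7, 7)], d[(3, 3)])
```

The resulting counts are the density values d that the following slides map onto color codes.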

Page 35

The case study (Infovis contest 2005)

• About 60,000 USA companies plotted on an 800x450 (360,000 pixels) scatter plot
• 126 distinct density values ranging on [1..1,633]
• 7,042 active pixels (i.e., hosting at least one company):
  – 2,526 pixels (36%) host exactly one company (d=1)
  – 1,182 pixels (17%) host two companies (d=2)
  – ...
  – 1 pixel (about 0.014%) hosts 1,633 companies (d=1633)

Page 36

What is the problem?

• The choice of the right mapping is crucial, because the density frequency distribution is very skewed

[Figure: histogram of pixel number (y) against density (x, 126 distinct values); 36% of the active pixels at d=1, 17% at d=2, down to a single pixel at d=1633]

Page 37

The mapping

126 different data densities = { 1, 2, … , 1,633 }

256 Color Codes = { 0, 1, 2, … , 255 }

?

Available solutions
- Linear mapping
- Non-linear mappings

Page 38

Linear mapping

$$ \mathrm{ColorCode}(d) = \mathrm{Round}\!\left( 255 \cdot \frac{d - d_{\min}}{d_{\max} - d_{\min}} \right) $$

• Straightforward solution
• Useless in this situation
• Most pixels share very low color codes
• Few color codes are used (46 out of 256)
• Different low density values are represented by the same color code: densities in [1..10] are mapped on codes {1,2}

[Figure: transfer function (densities to colors, with collisions) and color code frequency distribution]
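
To see why the linear mapping collapses low densities, here is a short Python sketch of the formula with the case-study range (d_min = 1, d_max = 1,633). The function name is an assumption, and the exact codes depend on the rounding rule and on whether code 0 is reserved for empty pixels, so they may differ by one from the figures quoted on the slide.

```python
def linear_color_code(d, d_min, d_max):
    """Linear mapping of a density d in [d_min, d_max] onto a code in 0..255."""
    if d_max == d_min:
        return 0
    return round(255 * (d - d_min) / (d_max - d_min))


if __name__ == "__main__":
    d_min, d_max = 1, 1633
    low_codes = {d: linear_color_code(d, d_min, d_max) for d in range(1, 11)}
    # Ten different densities share only a couple of color codes (collisions).
    print(sorted(set(low_codes.values())))
    print(linear_color_code(d_max, d_min, d_max))
```

Since most active pixels have very low densities, almost the whole picture ends up drawn with these few nearly identical codes.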

Page 39

Density function mapping

$$ \mathrm{ColorCode}(d_j) = \mathrm{Round}\!\left( 255 \cdot \sum_{i=1}^{j} \frac{N_{AP}(D = d_i)}{N_{AP}} \right) $$

where N_AP is the number of active pixels and N_AP(D = d_i) is the number of active pixels with density d_i.

• Hermann et al. [HMM00]
• Quite similar to histogram equalization
• Better than linear mapping
• Few color codes are used (39 out of 256)
• Lowest color code unnecessarily high: codes ranging only on [91..255]
• Different high density values are represented by the same color code: densities in [48..1,633] -> [250,255]

[Figure: transfer function (TF) and color code frequency distribution]
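
The density function mapping is a cumulative-frequency mapping, which a few lines of Python make concrete. In this sketch the counts for d=1 and d=2 come from the case study, while the remaining pixels are lumped into a single hypothetical density (d=3) plus the single pixel at d=1633, purely to keep the example small; the function name is an assumption.

```python
from itertools import accumulate


def density_function_mapping(pixel_counts):
    """Map each density to Round(255 * cumulative share of active pixels).

    pixel_counts: dict mapping a density value to the number of active
    pixels that have that density.
    """
    densities = sorted(pixel_counts)
    total = sum(pixel_counts.values())
    cumulative = accumulate(pixel_counts[d] for d in densities)
    return {d: round(255 * c / total) for d, c in zip(densities, cumulative)}


if __name__ == "__main__":
    # 2,526 and 1,182 are from the case study; the rest is a toy simplification.
    counts = {1: 2526, 2: 1182, 3: 3333, 1633: 1}
    codes = density_function_mapping(counts)
    print(codes[1], codes[1633])
```

Because 36% of the active pixels already have d=1, the lowest code is about 255 × 0.36 ≈ 91, which matches the [91..255] range reported on the slide.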

Page 40

Our proposal

We take into account that:
• densities and color codes are discrete and finite
• color codes that are too close are hardly distinguishable (for human beings)

[E. Bertini, A. Di Girolamo, G. Santucci - See what you know: analyzing data distribution to improve density map visualization – Eurovis 2007 conference]

Page 41

Uniform scale mapping

We use a reduced color scale, e.g. with 15 codes (NL=15):

0 18 36 55 73 91 109 128 146 164 182 200 219 237 255

[Figure: target color code frequency distribution; each code c_1, c_2, c_3, …, c_NL receives N_AP / N_L pixels]

This implies that different density values will necessarily be represented by the same color code. To reduce the degradation, the mapping is performed by an algorithm that tries to assign the same number of pixels to each code.
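
The reduced scale and the "same number of pixels per code" goal can be sketched in Python as follows. The first function reproduces the 15 evenly spaced codes listed above. The second is a simple equal-frequency assignment based on cumulative pixel counts; it is an illustrative stand-in and not the algorithm of the cited Eurovis 2007 paper, and both function names are assumptions.

```python
def reduced_color_scale(n_levels):
    """n_levels evenly spaced color codes between 0 and 255."""
    return [round(255 * k / (n_levels - 1)) for k in range(n_levels)]


def uniform_scale_mapping(pixel_counts, n_levels):
    """Assign densities to codes so that each code gets roughly
    total / n_levels pixels (simple equal-frequency binning)."""
    scale = reduced_color_scale(n_levels)
    total = sum(pixel_counts.values())
    mapping = {}
    seen = 0  # pixels already assigned to lower densities
    for d in sorted(pixel_counts):
        # Place the density by the midpoint of its pixels in the cumulative order.
        midpoint = seen + pixel_counts[d] / 2
        level = min(int(midpoint * n_levels / total), n_levels - 1)
        mapping[d] = scale[level]
        seen += pixel_counts[d]
    return mapping


if __name__ == "__main__":
    print(reduced_color_scale(15))
    # Hypothetical, perfectly balanced toy distribution.
    print(uniform_scale_mapping({1: 10, 2: 10, 3: 10, 4: 10}, 4))
```

With a skewed distribution a single density can hold more than one code's share of pixels, so a simple binning like this one cannot make the per-code counts exactly equal.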

Page 42

NDV>NL: uniform scale mapping

• Full color scale usage [0..255]
• All the color codes are used
• Maximum color code separation

Because densities are discrete, the algorithm cannot guarantee exactly NAP/NL pixels per code; through a peak analysis it minimizes the variance.

[Figure: color code frequency distribution]

$$ \mathrm{ColorCode}(d_j) = \mathrm{DistributedPixels} $$

Page 43

Visual comparison

[Figure panels: Linear mapping, Density function mapping, Uniform scale mapping]

Page 44

Visual comparison

Page 45
Page 46
Page 47

The parcel dataset

Postal parcels plotted by weight (x) and volume (y)

Page 48

Grey scale

Mapping              CSU    CsAR   CS
Linear               0.53   1      2.83
Density Function     0.18   0.62   5.23
Uniform color scale  1      1      8.79

Page 49

Conclusions

• Visual Analytics is a new (exciting) emerging research field
• Information visualization is a core component of VA
• Automated data analysis can be classified in three main groups
  – Deriving new values (more creative)
  – Data reduction (sometimes creative)
  – Image improvement (very technical)
• It is highly interdisciplinary and requires a collaborative approach
• It is more a METHODOLOGY / VISION than a technique
• However, a collection of available results / proposals is quickly growing

Page 50

The new (European) book on VA

• Illuminating the Path: The Research and Development Agenda for Visual Analytics
  – 2005, focusing on USA homeland security
• Mastering the Information Age: Solving Problems with Visual Analytics (2010)
  – One of the major outcomes of Vismaster
  – Available for free at: http://www.vismaster.eu/

Page 51

5 books you HAVE to read (greedy order)

• Robert Spence - Information Visualization: Design for Interaction (2nd Edition) - Addison-Wesley (ACM Press) - BASIC ISSUES
• Chaomei Chen - Information Visualization - Second Edition - Springer - AN UPDATED OVERVIEW
• Mastering the Information Age: Solving Problems with Visual Analytics (2010) - VISMASTER BOOK
• Colin Ware - Information Visualization, Third Edition: Perception for Design (Interactive Technologies) - Morgan Kaufmann - PERCEPTUAL ISSUES
• Card, Mackinlay, Shneiderman - Readings in Information Visualization - 1999 - HISTORICAL

Page 52

Visual Analytics projects

Page 53

The Vismaster CA project

Page 54

The Promise NoE project

Page 55

PanopteSec Network Cyber Security

• 3-year European IP project!

Page 56
Page 57

PanopteSec: Call for Master Thesis

• Design, implement, and test a Visual Analytics Environment for network security
• D3 framework
• It includes the Information visualization homework