Visual Data Exploration: Having a Conversation With Complex Data to Understand What Else it Contains SDV Nice April 2016 Simon Fitall CEO Galileo Analytics


Jan 21, 2017

Transcript
Page 1: II-SDV 2016 Simon Fitall -

Visual Data Exploration: Having a Conversation With Complex Data to Understand What Else it Contains

SDV Nice April 2016

Simon Fitall

CEO Galileo Analytics

Page 2: II-SDV 2016 Simon Fitall -

“All truths are easy to understand once they are discovered; the point is to discover them.”

Galileo Galilei, 15 February 1564 – 8 January 1642

Data discovery and the Scientific Process

Page 3: II-SDV 2016 Simon Fitall -

What Would Galileo Do (WWGD)?

What DID Galileo Do?

• Invention

Page 4: II-SDV 2016 Simon Fitall -

What Would Galileo Do (WWGD)?

What DID Galileo Do?

• Invention

• Experimentation

Page 5: II-SDV 2016 Simon Fitall -

What Would Galileo Do (WWGD)?

What DID Galileo Do?

• Invention

• Experimentation

• Observation

What has been the traditional approach to data analysis over the last 40 years?

Page 6: II-SDV 2016 Simon Fitall -
Page 7: II-SDV 2016 Simon Fitall -
Page 8: II-SDV 2016 Simon Fitall -
Page 9: II-SDV 2016 Simon Fitall -

Edgar F. Codd

August 23, 1923 – April 18, 2003

“A Relational Model of Data for Large Shared Data Banks”

1970

Page 10: II-SDV 2016 Simon Fitall -

Why is a New Approach Necessary?

A revolution in the availability of data

A revolution in the sources of data

A revolution in the creation of data

The observable universe has grown beyond all recognition – and continues to grow at an increasing rate

Page 11: II-SDV 2016 Simon Fitall -

From Experimentation to Exploration

Experimentation

• Predetermined data variables

• Predefined cohorts

• Aggregated data for most studies

• Predefined analytics

• Thousands of separate studies covering population groups – point solutions

Exploration

• N x 10^2 data variables

• N x 10^3 data sources

• N x 10^6 points of service

• N x 10^9 patients

• N x 10^12 data points PER DAY

• Almost infinite granularity at longitudinal patient level – multi-point solutions

Predefined Cohorts

N x 10^12 data points PER DAY

Page 12: II-SDV 2016 Simon Fitall -

The magnitude of the analytical issue….

Diagnosis to SNP:

1-1 is smaller than a golf ball

1-2 is a basketball

1-3 is a sports stadium
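One way to get a feel for this growth is to count candidate hypotheses directly. The sketch below is only an illustration: it assumes that "1-1", "1-2" and "1-3" mean one diagnosis paired with one, two or three SNPs, and the counts of diagnoses and SNPs are hypothetical figures, not values from the presentation.

```python
from math import comb


def hypothesis_count(n_diagnoses: int, n_snps: int, snps_per_hypothesis: int) -> int:
    """Number of distinct (diagnosis, SNP-set) pairings that could be tested."""
    return n_diagnoses * comb(n_snps, snps_per_hypothesis)


# Hypothetical sizes, chosen only for illustration.
N_DIAGNOSES = 10_000
N_SNPS = 1_000_000

for k in (1, 2, 3):
    total = hypothesis_count(N_DIAGNOSES, N_SNPS, k)
    print(f"1-{k}: {total:.3e} candidate hypotheses")
```

Each extra SNP in the pairing multiplies the search space by roughly the number of SNPs, which is why exhaustive testing stops being practical so quickly.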

Page 13: II-SDV 2016 Simon Fitall -

Clearly we need an alternative

• We CANNOT test all the hypotheses and find all the cohorts

• However, research suggests that we need to look at the broad scope of dimensions available in the data

• So we must restrict the dimensions of interest

• Visual data exploration is a possible route for analysis…….

Page 14: II-SDV 2016 Simon Fitall -

Characteristics of Effective Data Exploration

WHAT – are we looking for?

WHO – do we want looking for it?

HOW – do we want to look for it?

Page 15: II-SDV 2016 Simon Fitall -

Insights…....

Point the Hubble telescope at an apparently empty piece of space and what do you find?

Thousands of GALAXIES!

Page 16: II-SDV 2016 Simon Fitall -

Insights Example: Top Non-Respiratory Co-morbidities

Patients with COPD vs. All Patient Average

[Bar chart. Y-axis: 0% to 60%. Categories: Anxiety and Depression, Joint Pain, Heart Disease, Cancer, Oedema. Series: % COPD patients with diagnosis; % All patients with diagnosis. Data labels in transcript order: 57%, 33%, 30%, 22%, 20%, 31%, 26%, 20%, 15%, 9%.]

Source: Cegedim / Galileo Cosmos
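A comparison of this kind can be sketched in a few lines. The example below is a minimal illustration, not the Galileo Cosmos method: the patient records are hypothetical toy data, and the function names `prevalence` and `compare` are invented for this sketch.

```python
from collections import Counter

# Hypothetical toy records: each patient is a set of diagnosis labels.
patients = [
    {"COPD", "ANXIETY AND DEPRESSION", "HEART DISEASE"},
    {"COPD", "ANXIETY AND DEPRESSION"},
    {"COPD", "JOINT PAIN"},
    {"JOINT PAIN"},
    {"HEART DISEASE"},
    {"ANXIETY AND DEPRESSION"},
    set(),
    {"CANCER"},
]


def prevalence(records, exclude=()):
    """Share of records carrying each diagnosis, as a fraction of all records."""
    counts = Counter(d for r in records for d in r if d not in exclude)
    return {d: n / len(records) for d, n in counts.items()}


def compare(records, index_diagnosis):
    """Co-morbidity prevalence in the index cohort versus all patients."""
    cohort = [r for r in records if index_diagnosis in r]
    in_cohort = prevalence(cohort, exclude={index_diagnosis})
    overall = prevalence(records, exclude={index_diagnosis})
    rows = [(d, in_cohort[d], overall.get(d, 0.0)) for d in in_cohort]
    # Highest cohort prevalence first; ties broken alphabetically.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


for diagnosis, cohort_share, overall_share in compare(patients, "COPD"):
    print(f"{diagnosis:<24} cohort {cohort_share:.0%}   all patients {overall_share:.0%}")
```

The same two-column output is what a clustered bar chart like the one on this slide would plot, one pair of bars per co-morbidity.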

Page 17: II-SDV 2016 Simon Fitall -

WHO – do we want looking for Insights?

Content matter experts

With intellectual curiosity

• A clinician with an unusual patient cohort

• A researcher needing to recruit to a clinical trial

• A public health specialist wishing to better understand disease patterns

Page 18: II-SDV 2016 Simon Fitall -

Visual Exploration of Data

• Visual analytics

• Organizes data in solar systems of interrelated variables (cohorts)

• Easy to use and understand

• Explore multiple hypotheses

• Coding free to allow access to content matter experts

Page 19: II-SDV 2016 Simon Fitall -

Visual Exploration of Data

• Dynamic charting (by cohort)

• Characterises the cohort, especially outliers

• Charting adjusts with changes in underlying analysis

• Define and refine cohorts as you explore the data

• Full descriptive statistics (direct interface with “R”)

Page 20: II-SDV 2016 Simon Fitall -

Findings: Cohort with inconsistent lab results associated with prescribing of different types of drug

Source: Cegedim / Galileo Cosmos

Prescribing by Product Class

Lab results by class

These values were unexpected because they are inconsistent with normal usage

Page 21: II-SDV 2016 Simon Fitall -

Findings: Cohorts with unexpected characteristics of fourth level of co-morbidities in respiratory disease

Source: Cegedim / Galileo Cosmos

Expanding the display to explore co-morbidities of interest…..

Analyse any node to create pivot table of all data at lower levels
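The idea of pivoting from a node can be sketched as follows. This is an assumption-laden illustration rather than the product's implementation: a node is modelled as a path of diagnoses, "lower levels" are taken to mean any further diagnoses held by patients at that node, and the records and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records; "dx" is the set of diagnoses for one patient.
patients = [
    {"sex": "M", "dx": {"COPD", "HEART DISEASE", "CANCER"}},
    {"sex": "M", "dx": {"COPD", "HEART DISEASE"}},
    {"sex": "F", "dx": {"COPD", "HEART DISEASE", "ANXIETY AND DEPRESSION"}},
    {"sex": "F", "dx": {"COPD", "JOINT PAIN"}},
    {"sex": "M", "dx": {"HEART DISEASE"}},
]


def patients_at_node(records, node_path):
    """Patients who carry every diagnosis on the path from the root to the node."""
    required = set(node_path)
    return [r for r in records if required <= r["dx"]]


def pivot_below_node(records, node_path, row_field):
    """Count patients at the node by row_field x each further diagnosis."""
    table = defaultdict(lambda: defaultdict(int))
    for record in patients_at_node(records, node_path):
        for diagnosis in record["dx"] - set(node_path):
            table[record[row_field]][diagnosis] += 1
    return {row: dict(cols) for row, cols in table.items()}


print(pivot_below_node(patients, ("COPD",), "sex"))
```

Moving the node one level deeper, for example to ("COPD", "HEART DISEASE"), narrows the cohort and produces a smaller pivot over the remaining co-morbidities.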

Page 22: II-SDV 2016 Simon Fitall -

Findings: Identified new cohort of men with a disproportionate presence of co-morbidity

Source: Cegedim / Galileo Cosmos

COPD + Heart Disease + Cancer show significantly more men with Anxiety & Depression

Page 23: II-SDV 2016 Simon Fitall -

Visual Exploration Summary

Visual

Array of visual methods for exploring and viewing the data

Combining array-based data mapping with browser-based GUI

Fast

Rapid iteration of multiple cohorts, with full characterization

Calculation on demand to reduce overhead costs

Flexible

Any combination of variables explored in multiple ways

Array mapping allows unlimited cross-analysis

“Near Limitless”

Parallel processing, sharding, multi-core expansion limited only by available hardware

Page 24: II-SDV 2016 Simon Fitall -

Just part of a 17,000-node display

Each node represents a unique patient cohort

Node colors represent different stages of therapy

• Background calculations can explore multiple characteristics of all nodes

• Display can select only those nodes that meet required criteria

• The USER selects what they want to view

• All done across large datasets including clinical and genomic

• Watch this space…….
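Selecting only the nodes that meet required criteria could look something like the sketch below. It is a minimal illustration under stated assumptions: the `CohortNode` fields, the therapy stage names and the thresholds are all hypothetical, and nothing here reflects how the actual display is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortNode:
    node_id: int
    therapy_stage: str  # hypothetical label; would drive the node color
    n_patients: int
    pct_male: float


def select_nodes(nodes, *criteria):
    """Return the nodes for which every user-supplied criterion is true."""
    return [n for n in nodes if all(check(n) for check in criteria)]


# Hypothetical precomputed node characteristics (the "background calculations").
nodes = [
    CohortNode(1, "first-line", 1200, 0.48),
    CohortNode(2, "second-line", 310, 0.71),
    CohortNode(3, "second-line", 45, 0.80),
    CohortNode(4, "maintenance", 900, 0.66),
]

# The user chooses what to view: sizeable cohorts that skew male.
visible = select_nodes(
    nodes,
    lambda n: n.n_patients >= 100,
    lambda n: n.pct_male > 0.6,
)
print([n.node_id for n in visible])
```

Passing criteria as plain predicates keeps the choice of what to display with the user, in line with the point that the USER selects what they want to view.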

Page 25: II-SDV 2016 Simon Fitall -

[email protected]

Thank You

Questions