On the role of Interactivity and Data Placement in Big Data Analytics Srini Parthasarathy OSU

Mar 31, 2015

Page 1: On the role of Interactivity and Data Placement in Big Data Analytics Srini Parthasarathy OSU.

On the role of Interactivity and Data Placement in Big Data Analytics

Srini Parthasarathy, OSU

Page 2:

The Data Deluge: Data, Data Everywhere

Page 3:

Data Storage is Cheap

$600 to buy a disk drive that can store all of the world’s music

[McKinsey Global Institute Special Report, June ’11]

Page 4:

Data does not exist in isolation.

Page 5:

Data almost always exists in connection with other data, an integral part of the value proposition.

Page 6:

• Social networks
• Protein interactions
• Internet
• VLSI networks
• Data dependencies
• Neighborhood graphs

Page 7:

Big Data Problem: All this data is useful only if we can scalably extract knowledge from such complex data.

Page 8:

THIS TALK

• THE ROLE OF DATA PLACEMENT IN BIG DATA SYSTEMS

• THE ROLE OF VISUALIZATION AND INTERACTION IN BIG DATA ANALYSIS

Page 9:

GLOBAL GRAPHS

Page 10:

GLOBAL GRAPHS

• What?
– System for deploying applications processing complex data

• Why?
– Seeks balance between high productivity and high performance

• How?
– Built on top of PNL’s GlobalArrays
– Trees (GlobalTrees, GlobalForests)
– Relational Arrays (ArrayDB-GA)
– Graphs (GlobalGraphs)

• Data Placement is key to high performance

Page 11:

Importance of Data Placement

• Locality
– Placing related items close to each other so they may be processed together

• Mitigating Impact of Data Skew
– Reducing load imbalance in a parallel setting
– Reducing variance in partition samples

• Generating Stratified Samples
– Improving interactive performance
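
The placement goals listed above can pull in different directions, and a toy example may make the contrast concrete. The Python below is only an illustrative sketch, assuming every item has already been tagged with a stratum of similar items; the function names `place_for_locality` and `place_for_balance` and the toy data are hypothetical and are not taken from GlobalGraphs.

```python
from collections import defaultdict

def place_for_locality(items, num_partitions):
    """Keep every stratum whole: all items of a stratum share one partition."""
    partitions = defaultdict(list)
    strata = sorted({stratum for _, stratum in items})
    home = {s: i % num_partitions for i, s in enumerate(strata)}
    for item, stratum in items:
        partitions[home[stratum]].append(item)
    return dict(partitions)

def place_for_balance(items, num_partitions):
    """Deal each stratum round-robin so every partition gets a share of it."""
    by_stratum = defaultdict(list)
    for item, stratum in items:
        by_stratum[stratum].append(item)
    partitions = defaultdict(list)
    slot = 0
    for stratum in sorted(by_stratum):
        for item in by_stratum[stratum]:
            partitions[slot % num_partitions].append(item)
            slot += 1
    return dict(partitions)

# Hypothetical skewed input: six items in one stratum, two in another.
items = [(f"d{i}", "heavy" if i < 6 else "light") for i in range(8)]
local = place_for_locality(items, 2)
balanced = place_for_balance(items, 2)
```

In this toy run, the locality policy keeps related items together but leaves one partition with six items and the other with two, while the round-robin policy gives each partition four items drawn from both strata, so each partition also behaves like a small stratified sample.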

Page 12:

Key Ideas

• Pivotization
– Convert data with complex structure into sets
– Each element of set captures features of local topology

• Hashing into Strata: Hash related sets into similar bins
– Can employ a sketch-clustering algorithm

• Partitioning: Place Strata into partitions for
– Locality
– Mitigating Data Skew
– Samples
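
To illustrate the pivotization step, here is a minimal Python sketch that turns a small labeled tree into a set. It assumes, purely for illustration, that a pivot is a triple (label of the lowest common ancestor, label of u, label of v) for each node pair (u, v) where neither node is an ancestor of the other; the slides do not define the pivot, so this definition, the `pivot_set` function, and the toy tree are all assumptions.

```python
from itertools import combinations

def ancestors(node, parent):
    """Path from node up to the root, node first."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def pivot_set(parent, label):
    """Set of (lca_label, label_u, label_v) triples for a labeled tree."""
    pivots = set()
    for u, v in combinations(sorted(parent), 2):
        up_u = ancestors(u, parent)
        up_v = set(ancestors(v, parent))
        # First node on u's path to the root that is also above v.
        lca = next(n for n in up_u if n in up_v)
        if lca in (u, v):
            continue  # skip pairs where one node is an ancestor of the other
        lu, lv = sorted((label[u], label[v]))
        pivots.add((label[lca], lu, lv))
    return pivots

# Hypothetical tree: A is the root with children B and C; B has child E.
parent = {0: None, 1: 0, 2: 0, 3: 1}
label = {0: "A", 1: "B", 2: "C", 3: "E"}
ps = pivot_set(parent, label)
```

Each triple records a small piece of local topology, so two trees that share many triples are structurally similar in this assumed sense, and set-similarity tools can then be applied to them.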

Page 13:

[Figure: data placement pipeline, stages as labeled on the slide]

DATA (Δ): Δ1 ... Δ25 (items drawn with node labels A, B, C, E, F, L)
-> PIVOT TRANSFORMATIONS
-> PIVOT SETS (PS): PS-1 ... PS-25
-> MINWISE HASHING on PIVOT SETS
-> SKETCHES (SK): SK-1 = {1050, 2020, 3130, 1800} ... SK-25 = {1050, 2020, 7225, 2020}
-> SKETCH SORT or SKETCH CLUSTER
-> Strata (S): S-1 ... S-128; S-4 holds (Δ1, SK-1), (Δ5, SK-5), (Δ12, SK-12), (Δ25, SK-25)
-> PARTITIONING & REPLICATION
-> Partitions P-1 ... P-8; P-2 holds S-4, S-7, S-8, S-12, ..., S-128; P-8 holds S-3, S-4, S-9, S-12, ..., S-127
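
The min-wise hashing and strata stages of the figure can be sketched in a few lines of Python. This is a generic min-hash illustration rather than the talk's implementation: the hash family `(a * x + b) mod PRIME`, the use of `zlib.crc32` to turn pivots into integers, the sketch length of 4, and grouping by identical sketches (a crude stand-in for the sketch sort or sketch cluster step) are all assumptions.

```python
import random
import zlib
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime, larger than any crc32 value

def make_hashes(k, seed=0):
    """k random (a, b) pairs defining hash functions (a * x + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]

def sketch(pivots, hashes):
    """Min-wise sketch: the minimum of each hash function over the set."""
    codes = [zlib.crc32(repr(p).encode()) for p in pivots]
    return tuple(min((a * c + b) % PRIME for c in codes) for a, b in hashes)

def estimate_jaccard(sk1, sk2):
    """Fraction of matching sketch positions, an estimate of set overlap."""
    return sum(x == y for x, y in zip(sk1, sk2)) / len(sk1)

def stratify(sketches):
    """Put items with identical sketches into the same stratum."""
    strata = defaultdict(list)
    for item_id, sk in sketches.items():
        strata[sk].append(item_id)
    return list(strata.values())

# Hypothetical pivot sets: d1 and d2 are identical, d3 is unrelated.
pivot_sets = {
    "d1": {("A", "B", "C"), ("A", "C", "E")},
    "d2": {("A", "C", "E"), ("A", "B", "C")},
    "d3": {("B", "E", "F")},
}
hashes = make_hashes(4)
sketches = {item: sketch(ps, hashes) for item, ps in pivot_sets.items()}
strata = stratify(sketches)
```

Items with identical pivot sets always receive identical sketches, so `d1` and `d2` fall into the same stratum. A real deployment would likely group similar, not just identical, sketches, which is where a sort or clustering step over the sketches would come in.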

Page 14:

Frequent Tree Mining

• Our proposed approaches show 100X gains

Page 15:

WebGraph Compression

• Linear Scaleup with no loss in compression ratio

Page 16:

PRISM-HD: PRobing the Intrinsic Structure and Makeup of High-dimensional Data

Page 17:

Visualization and Interactivity are key to discovery

Page 18:

PRISM-HD

• What?
– A novel mechanism for exploring complex data

• Why?
– User is often overwhelmed with characteristics of data
– Befuddled on where to start

• How?
– Given a similarity measure of interest
– Compute similarity graph at threshold (t)
  • Key: Graphs are dimensionless
– Provide user graph visualization cues
  • User determines next threshold and repeats
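
The threshold loop described on this slide can be illustrated with a naive Python sketch. It assumes cosine similarity as the similarity measure of interest and builds the graph by brute force over all pairs; the function names and the toy points are hypothetical, and nothing here reflects how PRISM-HD itself computes the graph.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_graph(points, t, sim=cosine):
    """Edge set: all index pairs whose similarity is at least t."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if sim(points[i], points[j]) >= t}

# Hypothetical data: two tight pairs that are only weakly related to each other.
points = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.0, 0.9, 0.4)]
graphs = {t: similarity_graph(points, t) for t in (0.95, 0.5, 0.05)}
for t, edges in graphs.items():
    print(f"t={t}: {len(edges)} edge(s)")
```

Lowering the threshold can only add edges, so each graph contains the previous one; a user inspecting the graph at one threshold can then decide which threshold to try next. The all-pairs loop is quadratic and is meant only for toy inputs.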

Page 19:

[Figure panels: HIGH THRESHOLD | MODERATE THRESHOLD | LOW THRESHOLD]

Page 20:

Benefits of Knowledge Caching

Page 21:

Benefits of Incremental Processing on Twitter

Incremental estimates on Twitter (t1 = 0.95)

Page 22:

PRISM-HD and Global Graphs in Context: Leveraging Social Media in Emergency Response

Page 23:

Concluding Remarks

• Data is everywhere
• Data is fraught with complexities
– Dimensionality, dynamics, structure, massive…
• Both data placement and data interactivity have an important role to play in big data analytics
– PRISM-HD and GlobalGraphs can help!

Page 24:

Thanks for your attention

Contact: [email protected]

Mining Simulation Data

Medical Image Analysis

Protein Interaction Network (yeast)

Acknowledgements: Various NSF, NIH, DOE and industry grants