Transcript
Page 1: On the role of Interactivity and Data Placement in Big Data Analytics Srini Parthasarathy OSU.

On the role of Interactivity and Data Placement in Big Data Analytics

Srini Parthasarathy, OSU

Page 2:

The Data Deluge: Data Data Everywhere

Page 3:

Data Storage is Cheap

$600 buys a disk drive that can store all of the world’s music

[McKinsey Global Institute Special Report, June ’11]

Page 4:

Data does not exist in isolation.

Page 5:

Data almost always exists in connection with other data; these connections are an integral part of the value proposition.

Page 6:

[Figure panels: Social networks; Protein interactions; Internet; VLSI networks; Data dependencies; Neighborhood graphs]

Page 7:

Big Data Problem: All this data is useful only if we can scalably extract knowledge from such complex data.

Page 8:

THIS TALK

• THE ROLE OF DATA PLACEMENT IN BIG DATA SYSTEMS

• THE ROLE OF VISUALIZATION AND INTERACTION IN BIG DATA ANALYSIS

Page 9:

GLOBAL GRAPHS

Page 10:

GLOBAL GRAPHS

• What?
– System for deploying applications processing complex data

• Why?
– Seeks balance between high productivity and high performance

• How?
– Built on top of PNL’s GlobalArrays
– Trees (GlobalTrees, GlobalForests)
– Relational Arrays (ArrayDB-GA)
– Graphs (GlobalGraphs)

• Data Placement is key to high performance

Page 11:

Importance of Data Placement

• Locality
– Placing related items close to each other so they may be processed together

• Mitigating Impact of Data Skew
– Reducing load imbalance in a parallel setting
– Reducing variance in partition samples

• Generating Stratified Samples
– Improving interactive performance

Page 12:

Key Ideas

• Pivotization
– Convert data with complex structure into sets
– Each element of set captures features of local topology

• Hashing into Strata: Hash related sets into similar bins
– Can employ a sketch-clustering algorithm

• Partitioning: Place Strata into partitions for
– Locality
– Mitigating Data Skew
– Samples
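
The first two ideas can be illustrated with a minimal Python sketch. It is not the talk's implementation, and everything in it is an assumption made for illustration: trees are given as {parent: [children]} dictionaries, a pivot is either a parent-child edge or a parent with a pair of its children, the hash is a seeded BLAKE2b digest, and strata are formed by grouping items on a prefix of their sketch (a crude stand-in for sketch sorting or sketch clustering). The names pivot_set, minhash_sketch and strata_by_sketch are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def pivot_set(tree):
    """Turn a labeled tree {parent: [children]} into a set of local-topology features.

    Assumed pivot definition: every parent-child edge, plus every
    (parent, child_a, child_b) triple for pairs of children of one parent.
    """
    pivots = set()
    for parent, children in tree.items():
        for child in children:
            pivots.add((parent, child))
        for a, b in combinations(sorted(children), 2):
            pivots.add((parent, a, b))
    return pivots


def _hash(feature, seed):
    """Deterministic 64-bit hash of a feature under a given seed."""
    data = f"{seed}:{feature!r}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(pivots, num_hashes=4):
    """Minwise-hash sketch: the minimum hash value under each seeded hash function."""
    if not pivots:
        raise ValueError("cannot sketch an empty pivot set")
    return tuple(min(_hash(p, seed) for p in pivots) for seed in range(num_hashes))


def strata_by_sketch(sketches, prefix_len=1):
    """Group item ids whose sketches agree on the first prefix_len values."""
    strata = defaultdict(list)
    for item_id, sketch in sketches.items():
        strata[sketch[:prefix_len]].append(item_id)
    return dict(strata)


# Toy data: d1 and d2 share most of their structure, d3 shares none.
trees = {
    "d1": {"A": ["B", "C"], "B": ["L", "E"]},
    "d2": {"A": ["B", "C"], "B": ["L", "E"], "C": ["F"]},
    "d3": {"X": ["Y", "Z"]},
}
sketches = {item: minhash_sketch(pivot_set(t)) for item, t in trees.items()}
strata = strata_by_sketch(sketches)
for key, items in strata.items():
    print(key, items)
```

Items with heavily overlapping pivot sets are likely, though not guaranteed, to agree on sketch values and so to land in the same stratum; a longer prefix makes strata smaller and more homogeneous.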

Page 13:

[Figure: data placement pipeline, in four stages]

1. Pivot transformations: each data item Δ1 … Δ25 (a small labeled tree over nodes such as A, B, C, E, F, L) is converted into a pivot set PS-1 … PS-25, with pivots such as (A; B, C), (A; F, C), (A; E, C).
2. Minwise hashing on pivot sets: each pivot set yields a sketch, e.g. SK-1 = {1050, 2020, 3130, 1800} and SK-25 = {1050, 2020, 7225, 2020}.
3. Sketch sort or sketch cluster: (Δ, SK) pairs such as (Δ1, SK-1), (Δ5, SK-5), (Δ12, SK-12), (Δ25, SK-25) are grouped into strata S-1 … S-128.
4. Partitioning & replication: strata are assigned to partitions P-1 … P-8; the strata lists shown include S-4, S-7, S-8, S-12, …, S-128 and S-3, S-4, S-9, S-12, …, S-127.
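
The last stage, placing strata into partitions, could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the system's actual placement policy: place_for_locality and place_for_balance are hypothetical names, strata are plain {name: [items]} dictionaries, load is measured as item count, and replication is not modeled. The first function keeps each stratum whole (locality); the second stripes every stratum across all partitions so that each partition resembles a stratified sample (skew mitigation).

```python
import heapq


def place_for_locality(strata, num_partitions):
    """Keep each stratum whole; give the largest strata to the least-loaded partition first."""
    heap = [(0, p) for p in range(num_partitions)]  # (current load, partition index)
    heapq.heapify(heap)
    partitions = [[] for _ in range(num_partitions)]
    by_size = sorted(strata.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _name, items in by_size:
        load, p = heapq.heappop(heap)
        partitions[p].extend(items)
        heapq.heappush(heap, (load + len(items), p))
    return partitions


def place_for_balance(strata, num_partitions):
    """Stripe every stratum round-robin so each partition holds a share of each stratum."""
    partitions = [[] for _ in range(num_partitions)]
    slot = 0
    for name in sorted(strata):
        for item in strata[name]:
            partitions[slot % num_partitions].append(item)
            slot += 1
    return partitions


# Toy strata of unequal size (a skewed distribution).
demo_strata = {
    "S-1": [f"a{i}" for i in range(6)],
    "S-2": [f"b{i}" for i in range(4)],
    "S-3": [f"c{i}" for i in range(2)],
}
local = place_for_locality(demo_strata, 2)
balanced = place_for_balance(demo_strata, 2)
print([len(p) for p in local], [len(p) for p in balanced])
```

The trade-off is the one named on the slides: whole strata favor locality because related items are processed together, while striped strata give every partition a similar mix of items, which reduces load imbalance and the variance of per-partition samples.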

Page 14:

Frequent Tree Mining

• Our proposed approaches show 100X gains

Page 15:

WebGraph Compression

• Linear Scaleup with no loss in compression ratio

Page 16:

PRISM-HD: PRobing the Intrinsic Structure and Makeup of High-dimensional Data

Page 17:

Visualization and Interactivity are key to discovery

Page 18:

PRISM-HD

• What?
– A novel mechanism for exploring complex data

• Why?
– User is often overwhelmed with characteristics of data
– Befuddled on where to start

• How?
– Given similarity measure-of-interest
– Compute similarity graph at threshold (t)
• Key: Graphs are dimensionless
– Provide user graph visualization cues
• User determines next threshold and repeats
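
The threshold loop described on this slide can be sketched in a few lines of Python. This is a brute-force illustration, not PRISM-HD itself: cosine similarity is assumed as the measure of interest, all pairs are compared directly, and the "cues" are reduced to three simple graph statistics. The names similarity_graph and graph_cues and the toy points are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(points, t, sim=cosine):
    """Adjacency sets of the graph that links every pair with similarity >= t."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if sim(points[i], points[j]) >= t:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def graph_cues(adj):
    """Simple cues for the user: edge count, component count, largest component size."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one connected component
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(size)
    edges = sum(len(nbs) for nbs in adj.values()) // 2
    return {"edges": edges, "components": len(sizes), "largest": max(sizes, default=0)}


# Toy data: two tight pairs of points and one point lying between them.
points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (1.0, 1.0)]
for t in (0.95, 0.75, 0.5):  # the user lowers the threshold step by step
    print(t, graph_cues(similarity_graph(points, t)))
```

Lowering the threshold can only add edges, so components merge as t decreases; the user reads the cues at each step and picks the next threshold accordingly.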

Page 19:

[Figure panels: High threshold; Moderate threshold; Low threshold]

Page 20:

Benefits of Knowledge Caching

Page 21:

Benefits of Incremental Processing on Twitter

Incremental estimates on Twitter t1 = 0.95


Page 22:

PRISM-HD and Global Graphs in Context: Leveraging Social Media in Emergency Response

Page 23:

Concluding Remarks

• Data is everywhere

• Data is fraught with complexities
– Dimensionality, dynamics, structure, massive…

• Both data placement and data interactivity have an important role to play in big data analytics
– PRISM-HD and GlobalGraphs can help!

Page 24:

Thanks for your attention

Contact: [email protected]

Mining Simulation Data

Medical Image Analysis

Protein Interaction Network (yeast)

Acknowledgements: Various NSF, NIH, DOE and industry grants

