Visualizing “Big” Data Sean Kandel & Jeffrey Heer Trifacta Inc. @trifacta

2013.10.24 big datavisualization

Sep 08, 2014


Sean Kandel

When the number of data elements gets large - thousands to billions or more data points - standard visual representations and interaction techniques break down. In this talk, we will survey methods for scaling interactive visualizations to data sets too large to process or explore using traditional means. I will compare data reduction techniques such as sampling, aggregation and model fitting, as well as interesting hybrid approaches, and discuss their trade-offs. I will also describe methods to enable real-time interactive exploration within standards-compliant web browsers. Attendees will learn effective visualization techniques and interaction methods that are applicable to billion+ element databases.
Transcript
Page 1: 2013.10.24 big datavisualization

Visualizing “Big” Data
Sean Kandel & Jeffrey Heer
Trifacta Inc. @trifacta

Page 2: 2013.10.24 big datavisualization

How can we visualize and interact with billion+ record databases in real-time?

Page 3: 2013.10.24 big datavisualization

Two Challenges:
1. Effective visual encoding
2. Real-time interaction

Page 4: 2013.10.24 big datavisualization

Perceptual and interactive scalability should be limited by the chosen resolution of the visualized data, not the number of records.

Page 5: 2013.10.24 big datavisualization

Perception

Page 6: 2013.10.24 big datavisualization

Data
Sampling
Modeling
Binning

Page 7: 2013.10.24 big datavisualization

Google Fusion Tables (Sampling)

Page 8: 2013.10.24 big datavisualization

imMens (Binned Aggregation)

Page 9: 2013.10.24 big datavisualization

Bin > Aggregate (> Smooth) > Plot

1. Bin Divide data domain into discrete “buckets”
Categories: Already discrete (but check cardinality)
Numbers: Choose bin intervals (uniform, quantile, ...)
Time: Choose time unit: Hour, Day, Month, etc.
Geo: Bin x, y coordinates after cartographic projection
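
The binning choices above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names (`uniform_bin`, `quantile_edges`, `edge_bin`, `time_bin`) are hypothetical, and values are assumed to be plain numbers or `datetime` objects.

```python
import bisect
from datetime import datetime

def uniform_bin(value, lo, hi, n_bins):
    """Index of the equal-width bin holding value, clamped to [0, n_bins - 1]."""
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)

def quantile_edges(values, n_bins):
    """Interior bin edges chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]

def edge_bin(value, edges):
    """Index of the bin for value, given sorted interior edges."""
    return bisect.bisect_right(edges, value)

def time_bin(ts, unit):
    """Bin a datetime by a chosen time unit (0-based for month and day)."""
    return {"month": ts.month - 1, "day": ts.day - 1, "hour": ts.hour}[unit]

skewed = [1, 2, 3, 4, 100, 200, 300, 400]
edges = quantile_edges(skewed, 4)          # two values land in each bin
bins = [edge_bin(v, edges) for v in skewed]
```

With skewed data like `skewed`, uniform bins would put half the values in the first bucket, while the quantile edges spread them evenly; that trade-off is the reason the bin interval is a choice rather than a default.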

Page 10: 2013.10.24 big datavisualization

Number of Bins?

Page 11: 2013.10.24 big datavisualization

Hexagonal or Rectangular Bins?

[Figure: 100,000 data points shown with rectangular bins and with hexagonal bins]

Hex bins better estimate density for 2D plots, but the improvement is marginal [Scott 92], while rectangles support reuse and query processing.
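
One way to read "rectangles support reuse" is that a fine rectangular grid can be merged into a coarser one by summing blocks of cells, with no second pass over the raw points. The sketch below illustrates that reading; `bin_2d` and `coarsen` are hypothetical helpers, and it assumes points lie inside the given ranges and that the grid size is divisible by the merge factor.

```python
def bin_2d(points, x_range, y_range, nx, ny):
    """Count (x, y) points into an ny-by-nx grid of rectangular bins."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        grid[j][i] += 1
    return grid

def coarsen(grid, factor):
    """Merge factor-by-factor blocks of bins by summing their counts."""
    ny, nx = len(grid), len(grid[0])
    return [
        [sum(grid[j + dj][i + di] for dj in range(factor) for di in range(factor))
         for i in range(0, nx, factor)]
        for j in range(0, ny, factor)
    ]

points = [(0.1, 0.1), (0.3, 0.1), (0.9, 0.9), (0.6, 0.2)]
fine = bin_2d(points, (0, 1), (0, 1), 4, 4)
coarse = coarsen(fine, 2)   # same result as binning the points at 2 x 2
```

Hexagonal cells do not nest this way, so a coarser hexagonal grid would generally require re-binning the underlying data.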

Page 12: 2013.10.24 big datavisualization

Bin > Aggregate (> Smooth) > Plot

2. Aggregate Count, Sum, Average, Min, Max, ...
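
As an illustration of the aggregate step, the following sketch computes the listed summaries per bin in one pass. It assumes the records have already been reduced to `(bin_index, value)` pairs; the `aggregate` function and its dictionary layout are hypothetical.

```python
def aggregate(records, n_bins):
    """Per-bin count, sum, mean, min and max for (bin_index, value) pairs."""
    stats = [{"count": 0, "sum": 0.0, "min": None, "max": None}
             for _ in range(n_bins)]
    for b, v in records:
        s = stats[b]
        s["count"] += 1
        s["sum"] += v
        s["min"] = v if s["min"] is None else min(s["min"], v)
        s["max"] = v if s["max"] is None else max(s["max"], v)
    for s in stats:
        # Empty bins have no defined mean.
        s["mean"] = s["sum"] / s["count"] if s["count"] else None
    return stats

stats = aggregate([(0, 1.0), (0, 3.0), (2, 5.0)], n_bins=3)
```

The output size depends only on the number of bins, not on the number of records, which is what lets the later plotting and interaction steps scale.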

Page 13: 2013.10.24 big datavisualization

Bin > Aggregate (> Smooth) > Plot

(3. Smooth Optional: smooth aggregates [Wickham ’13])
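
The slide does not say which smoother is meant, so the sketch below swaps in a simple weighted moving average over neighbouring bins purely for illustration; it is not the method from [Wickham ’13]. The function name and default weights are assumptions.

```python
def smooth_counts(counts, weights=(0.25, 0.5, 0.25)):
    """Weighted moving average over neighbouring bins.

    At the edges, weights that fall outside the array are dropped and the
    remaining weights are renormalised.
    """
    half = len(weights) // 2
    out = []
    for i in range(len(counts)):
        total = norm = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(counts):
                total += w * counts[j]
                norm += w
        out.append(total / norm)
    return out

spiky = smooth_counts([0, 4, 0])     # the spike is spread to its neighbours
flat = smooth_counts([3, 3, 3, 3])   # a constant series is left unchanged
```

Because smoothing runs on the aggregates rather than the raw records, its cost also depends only on the number of bins.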

Page 14: 2013.10.24 big datavisualization

[1] Wickham 2013

Page 15: 2013.10.24 big datavisualization

Bin > Aggregate (> Smooth) > Plot

4. Plot Visualize the aggregate summary values

Page 16: 2013.10.24 big datavisualization

Plot: Visual Encoding

Choose Most Effective Encoding [Cleveland & McGill ’84]

1D Plot -> Position or Length Encoding
Histograms, line charts, etc.

2D Plot -> Area or Color Encoding
Spatial dimensions (x, y) already allocated.
While less effective than area for magnitude estimation, color can be used at the per-pixel level and provides an overall “gestalt”.

Page 17: 2013.10.24 big datavisualization

Standard Color Ramp
Counts near zero are white.
-> Outliers are missed

Add Discontinuity after Zero
Counts near zero remain visible.
-> Outliers can be seen

Page 18: 2013.10.24 big datavisualization

Linear Alpha Interpolation is not perceptually linear.

Cube-Root Alpha Interpolation approximates perceptual linearity.

Page 19: 2013.10.24 big datavisualization

Color Encoding

Luminance (in range 0-1)

Min. Non-Zero Intensity (α=0.15) [1]
Perceptual Scaling (γ=1/3) [2]
User-Adjustable Min/Max Values [3]

[1] Keep small non-zero values visible (outliers!)
[2] Match color ramp to perceptual distances
[3] Enable exploration across value ranges
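
The three parameters above can be combined into a single mapping from a bin's value to an intensity in the range 0-1. The slide gives the parameter values but not the exact formula, so the combination below is an assumption, and `bin_alpha` is a hypothetical name.

```python
def bin_alpha(value, vmin, vmax, min_alpha=0.15, gamma=1 / 3):
    """Map a bin value to an intensity in [0, 1].

    Empty bins stay fully transparent; any non-zero bin gets at least
    min_alpha, and the remainder follows a gamma (cube-root) curve between
    the user-adjustable vmin and vmax.
    """
    if value <= 0:
        return 0.0
    span = vmax - vmin
    t = 0.0 if span <= 0 else (value - vmin) / span
    t = min(max(t, 0.0), 1.0)   # clamp values outside [vmin, vmax]
    return min_alpha + (1.0 - min_alpha) * t ** gamma

lonely = bin_alpha(1, 0, 1000)      # a single record among bins of up to 1000
busiest = bin_alpha(1000, 0, 1000)
```

Under these assumptions a bin holding one record out of a maximum of 1000 gets an intensity of about 0.235, whereas a plain linear ramp starting at zero would give 0.001, which is effectively invisible.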

Page 20: 2013.10.24 big datavisualization

Design Space of Binned Plots

Page 21: 2013.10.24 big datavisualization

Interaction

Page 22: 2013.10.24 big datavisualization

Interaction Techniques?
1. Select: Detail-on-Demand
2. Navigate: Pan & Zoom
3. Query: Brush & Link

Page 23: 2013.10.24 big datavisualization
Page 24: 2013.10.24 big datavisualization
Page 25: 2013.10.24 big datavisualization

5-D Data Cube
Month, Day, Hour, X, Y

[Figure: data cube with axes X, Y, Day (0 to 30), Hour (0 to 23), and Month (0 to 11)]

12 x 31 x 24 x 512 x 512 = ~2.3 billion cells

Page 26: 2013.10.24 big datavisualization

Brushing January
Month, Day, Hour, X, Y

[Figure: data cube with axes X, Y, Day (0 to 30), Hour (0 to 23), and Month (0 to 11)]

31 x 24 x 512 x 512 = ~195 million cells
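
The cell counts on these two slides follow directly from the bin counts per dimension. The short check below reproduces the arithmetic; the dictionary and function names are illustrative only.

```python
from math import prod

# Bins per dimension, as given on the slides.
CUBE_BINS = {"month": 12, "day": 31, "hour": 24, "x": 512, "y": 512}

def cell_count(bins, fixed=()):
    """Cells left in the cube after fixing each listed dimension to one bin."""
    return prod(n for dim, n in bins.items() if dim not in fixed)

full_cube = cell_count(CUBE_BINS)                    # 2,340,421,632
january = cell_count(CUBE_BINS, fixed=("month",))    # 195,035,136
```

Brushing a single month therefore still touches roughly 195 million cells in a dense cube, which is the motivation for the smaller data tiles introduced next.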

Page 27: 2013.10.24 big datavisualization
Page 28: 2013.10.24 big datavisualization
Page 29: 2013.10.24 big datavisualization

Multivariate Data Tiles
1. Send data, not pixels
2. Embed multi-dim data

Page 30: 2013.10.24 big datavisualization

Full 5-D Cube

For any pair of 1D or 2D binned plots, the maximum number of dimensions needed to support brushing & linking is four.
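
A small sketch can make the four-dimension bound concrete: a tile linking two plots only needs the dimensions those two plots show, and every other dimension can be summed out of the full cube. The code uses a sparse dictionary cube for brevity; `roll_up` and `dims_needed` are hypothetical helpers, not the imMens implementation.

```python
from collections import Counter

DIMS = ("month", "day", "hour", "x", "y")

def dims_needed(plot_a, plot_b):
    """Dimensions a tile must keep so that brushing plot_a can update plot_b."""
    return tuple(d for d in DIMS if d in plot_a or d in plot_b)

def roll_up(cube, keep):
    """Sum a sparse cube {cell_tuple: count} down to the dimensions in keep."""
    idx = [DIMS.index(d) for d in keep]
    tile = Counter()
    for cell, count in cube.items():
        tile[tuple(cell[i] for i in idx)] += count
    return tile

# Three occupied cells, keyed as (month, day, hour, x, y).
cube = {(0, 5, 13, 10, 20): 3, (0, 6, 13, 10, 20): 2, (1, 5, 9, 11, 20): 4}

keep = dims_needed(("x", "y"), ("month",))   # 2D heatmap linked to a month histogram
tile = roll_up(cube, keep)                   # 3-D tile: month, x, y
```

Two 2D plots with no shared dimension give the worst case of four dimensions; a 1D histogram paired with a 2D heatmap, as here, needs only three.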

Σ Σ Σ Σ

Page 31: 2013.10.24 big datavisualization

X: 512 bins
Y: 512 bins

Page 32: 2013.10.24 big datavisualization
Page 33: 2013.10.24 big datavisualization
Page 34: 2013.10.24 big datavisualization
Page 35: 2013.10.24 big datavisualization
Page 36: 2013.10.24 big datavisualization
Page 37: 2013.10.24 big datavisualization
Page 38: 2013.10.24 big datavisualization
Page 39: 2013.10.24 big datavisualization
Page 40: 2013.10.24 big datavisualization
Page 41: 2013.10.24 big datavisualization
Page 42: 2013.10.24 big datavisualization
Page 43: 2013.10.24 big datavisualization
Page 44: 2013.10.24 big datavisualization
Page 45: 2013.10.24 big datavisualization
Page 46: 2013.10.24 big datavisualization
Page 47: 2013.10.24 big datavisualization
Page 48: 2013.10.24 big datavisualization

Full 5-D Cube: ~2.3B bins

13 3-D Data Tiles: ~17.6M bins (in 352KB!)

Σ Σ Σ Σ

Page 49: 2013.10.24 big datavisualization

Query & Render on GPU via WebGL

Pack data tiles as PNG image files, bind to WebGL as image textures.
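
The slide does not describe the byte layout used inside the PNG files. As one plausible scheme, assumed here for illustration only, each bin count can be spread across the four 8-bit channels of an RGBA pixel, which a lossless image format then stores unchanged. `pack_counts` and `unpack_counts` are hypothetical names, and writing the actual PNG file is left to an image library.

```python
def pack_counts(counts):
    """Encode each count (0 <= c < 2**32) as four bytes: R, G, B, A."""
    pixels = bytearray()
    for c in counts:
        pixels += c.to_bytes(4, "big")
    return bytes(pixels)

def unpack_counts(pixels):
    """Decode RGBA bytes back into the original counts."""
    return [int.from_bytes(pixels[i:i + 4], "big")
            for i in range(0, len(pixels), 4)]

counts = [0, 1, 255, 256, 70000, 2**32 - 1]
pixels = pack_counts(counts)        # 4 bytes per bin
restored = unpack_counts(pixels)
```

A shader reading such a texture would reassemble the count from the four channel values in the same way `unpack_counts` does.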

Page 50: 2013.10.24 big datavisualization

Query & Render on GPU via WebGL

Σ

Invoke program for each output bin.
Executes in parallel on GPU.
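
The per-output-bin program can be emulated on the CPU to show what each invocation computes. In this sketch, which is an assumption about the query rather than the talk's shader code, a 2-D tile is indexed as `tile[output_bin][brushed_bin]`, and each output bin sums the tile over the brushed range. On the GPU these calls would run in parallel; the list comprehension here runs them one after another.

```python
def query_output_bin(tile, out_bin, brush_lo, brush_hi):
    """Value of one output bin: sum the tile over the brushed bins (inclusive)."""
    return sum(tile[out_bin][b] for b in range(brush_lo, brush_hi + 1))

def run_query(tile, brush_lo, brush_hi):
    """Evaluate every output bin independently, as a GPU would in parallel."""
    return [query_output_bin(tile, o, brush_lo, brush_hi)
            for o in range(len(tile))]

# Two output bins (rows) by three brushable bins (columns).
tile = [[1, 2, 3],
        [4, 5, 6]]

first_only = run_query(tile, 0, 0)
last_two = run_query(tile, 1, 2)
everything = run_query(tile, 0, 2)
```

Each output bin depends only on its own row of the tile, so no invocation needs another's result; that independence is what makes the query a good fit for parallel execution.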

Page 51: 2013.10.24 big datavisualization

Query & Render on GPU via WebGL

Σ

Page 52: 2013.10.24 big datavisualization

Performance Benchmarks
Simulate interaction: brushing & linking across binned plots.

- imMens vs. Profiler
- 4x4 and 5x5 plots
- 10 to 50 bins

Measure time from selection to render.

Test setup:
2.3 GHz MacBook Pro (4-core)
NVIDIA GeForce GT 650M
Google Chrome v.23.0

Page 53: 2013.10.24 big datavisualization

~50fps querying of visual summaries of 1B data points.

[Chart: In-Memory Data Cube vs. imMens, by Number of Data Points; 5 dimensions x 50 bins/dim x 25 plots]

Page 54: 2013.10.24 big datavisualization

[1] Lins et al., InfoVis 2013

[2] Sismanis et al., SIGMOD 2002

NanoCubes

Page 55: 2013.10.24 big datavisualization

[1] Lins et al., InfoVis 2013

NanoCubes

Page 56: 2013.10.24 big datavisualization

Resources
imMens: vis.stanford.edu/projects/immens
Tableau Public: tableausoftware.com/public
BigVis (R): github.com/hadley/bigvis
Nanocubes: nanocubes.net
BlinkDB: blinkdb.org
MapD: geops.csail.mit.edu/docs/

Page 57: 2013.10.24 big datavisualization

Acknowledgments
Zhicheng “Leo” Liu
Biye Jiang

Page 58: 2013.10.24 big datavisualization

Visualizing “Big” Data
Sean Kandel & Jeffrey Heer
Trifacta Inc. @trifacta