SC05 November, 2005 clyne@ncar.ucar.edu Supercomputing • Communications • Data NCAR Scientific Computing Division Desktop techniques for the exploration of terascale size, time-varying data sets John Clyne & Alan Norton Scientific Computing Division National Center for Atmospheric Research Boulder, CO USA
Page 1: SC05 November, 2005 clyne@ncar.ucar.edu Desktop techniques for the exploration of terascale size, time-varying data sets John Clyne & Alan Norton Scientific.

SC05 November, 2005 clyne@ncar.ucar.edu

Supercomputing • Communications • Data

NCAR Scientific Computing Division

Desktop techniques for the exploration of terascale size, time-varying data sets

John Clyne & Alan Norton

Scientific Computing Division

National Center for Atmospheric Research

Boulder, CO USA

Page 2:

National Center for Atmospheric Research

Space Weather • Turbulence • Atmospheric Chemistry • Climate • Weather • The Sun

More than just the atmosphere… from the earth’s oceans to the solar interior

Page 3:

Goals

1. Improve scientists’ ability to investigate and understand complex phenomena found in high-resolution fluid flow simulations

– Accelerate analysis process and improve scientific productivity

– Enable exploration of data sets heretofore impractical due to unwieldy size

– Gain insight into physical processes governing fluid dynamics widely found in the natural world

2. Demonstrate visualization’s ability to aid in day-to-day scientific discovery process

Page 4:

Problem motivation: Analysis of high resolution numerical turbulence simulations

• Simulations are huge!!

– May require months of supercomputer time

– Multi-variate (typically 5 to 8 variables)

– Time-varying data

– A single experiment may yield terabytes of numerical data

• Analysis requirements are formidable

– Numerical outputs simulate phenomena not easily observed!!!

– Interesting domain regions (ROIs) may not be known a priori

• Additionally…

– Historical focus of computing centers on batch processing

– Dichotomy of batch and interactive processing needs

– Currently available analysis tools inadequate for large data needs

  • Single-threaded, 32-bit, in-core algorithms

  • Lack advanced visualization capabilities

– Currently available visualization tools ill-suited for analysis

Page 5:

[Numerical] models that can currently be run on typical supercomputing platforms produce data in amounts that make storage expensive, movement cumbersome, visualization difficult, and detailed analysis impossible.  The result is a significantly reduced scientific return from the nation's largest computational efforts.

And furthermore…

Page 6:

A sampling of various technology performance curves

• Not all technologies advance at same rate!!!

[Figure: Performance gains from 1980 to present. Vertical axis: improvement factor, log scale from 1 to 100,000. Series: disk drive internal data rate, disk drive interface data rate, Ethernet network bandwidth, Intel microprocessor clock speed, drive capacity.]

Page 7:

Example: Compressible plume dynamics

• 504x504x2048

• 5 variables (u,v,w,rho,temp)

• ~500 time steps saved

• 9 TB of storage

• Six months compute time required on 112 IBM SP RS/6000 processors

• Three months for post-processing

• Data may be analyzed for several years

M. Rast, 2004. Image courtesy of Joseph Mendoza, NCAR/SCD
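The size of this data set can be sanity-checked from the grid dimensions, variable count, and number of saved time steps listed above. The sketch below is only a rough estimate: the bytes per value are an assumption, since the slide does not state the storage precision, and "~500" time steps is taken as exactly 500.

```python
# Back-of-the-envelope size of the compressible plume data set.
# Grid, variable count, and step count come from the slide; the bytes per
# value are an ASSUMPTION (the slide does not state the storage precision).

NX, NY, NZ = 504, 504, 2048
NUM_VARIABLES = 5        # u, v, w, rho, temp
NUM_TIME_STEPS = 500     # "~500" on the slide, taken as exactly 500 here


def dataset_bytes(bytes_per_value: int) -> int:
    """Total bytes for all variables and time steps at the given precision."""
    points_per_volume = NX * NY * NZ
    return points_per_volume * NUM_VARIABLES * NUM_TIME_STEPS * bytes_per_value


if __name__ == "__main__":
    for label, nbytes in (("32-bit values", 4), ("64-bit values", 8)):
        total_tb = dataset_bytes(nbytes) / 1e12
        print(f"{label}: {total_tb:.1f} TB")
```

Under these assumptions the estimate is roughly 5 TB at 32 bits and roughly 10 TB at 64 bits per value, which brackets the 9 TB quoted on the slide. Either way, a single variable at a single time step is already on the order of gigabytes.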

Page 8:

Visualization and Analysis Platform for oceanic, atmospheric, and solar Research (VAPoR)

Key components

• Domain specific: numerically simulated turbulence in the natural sciences

• Data processing language: data post-processing and quantitative analysis

• Advanced visualization: identify spatial/temporal ROIs

• Multiresolution: enable speed/quality tradeoffs

This work is funded in part through a U.S. National Science Foundation, Information Technology Research program grant

Combination of visualization with a multiresolution data representation that provides sufficient data reduction to enable interactive work on time-varying data

Page 9:

Page 10:

Multiresolution Data Representation

• Geometry reduction (Schroeder et al., 1992; Lindstrom & Silva, 2001; Shaffer & Garland, 2001)

• Data reduction (Cignoni et al., 1994; Wilhelms & Van Gelder, 1994; Pascucci & Frank, 2001; Clyne, 2003)

• Wavelet-based progressive data access

– Mathematical transforms similar to Fourier transformations

– Invertible and lossless

– Numerically efficient forward and inverse transform

– No additional storage costs

– Permit hierarchical representations of functions

– See Clyne, VIIP 2003

[Figure: Visualization pipeline. Data source → data → Transform (e.g. isosurface, cut plane) → geometry → Render → pixels, with an Analyze & Manipulate stage producing text and 2D graphics. "Reduce" steps appear at two points in the pipeline.]
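The wavelet properties listed on this slide (invertible, no additional storage, hierarchical) can be illustrated with a small one-dimensional sketch. It uses Haar-style pairwise averaging and differencing purely as an assumption for illustration; the slides do not say which wavelet family VAPoR uses, and the function names here are hypothetical.

```python
# Minimal 1D Haar-style multiresolution sketch.
# ASSUMPTION: Haar averaging/differencing stands in for whatever wavelet
# the authors actually use; function names are illustrative only.

def haar_forward(signal):
    """Transform a list of length 2**n into [coarsest average, details...].

    The output has the same length as the input, so the hierarchy costs
    no additional storage.
    """
    out = list(signal)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    length = n
    while length > 1:
        half = length // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:length] = averages + details
        length = half
    return out


def haar_reconstruct(coeffs, levels_to_skip=0):
    """Invert the transform, optionally stopping early for a coarser signal.

    levels_to_skip=0 returns the original samples; each skipped level
    halves the number of samples returned.
    """
    total_levels = len(coeffs).bit_length() - 1
    if not 0 <= levels_to_skip <= total_levels:
        raise ValueError("levels_to_skip out of range")
    current = [coeffs[0]]
    for _ in range(total_levels - levels_to_skip):
        size = len(current)
        details = coeffs[size:2 * size]
        refined = []
        for avg, det in zip(current, details):
            refined.extend((avg + det, avg - det))
        current = refined
    return current


if __name__ == "__main__":
    data = [4, 6, 10, 12, 8, 6, 5, 5]
    coeffs = haar_forward(data)
    print(haar_reconstruct(coeffs))                     # full resolution
    print(haar_reconstruct(coeffs, levels_to_skip=1))   # half resolution
```

Reading only the leading coefficients yields a coarser approximation, and reading more of them refines it. This is the progressive-access idea: a coarse view needs only a fraction of the stored data to be read.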

Page 11:

Putting it all together

• Visual data browsing permits rapid identification of features of interest, reducing data domain

• Multiresolution data representation affords a second level of data reduction by permitting speed/quality trade-offs, enabling rapid hypothesis testing

• Quantitative operators and data processing enable data analysis

• Result: Integrated environment for large-data exploration and discovery

Goal: Avoid unnecessary and expensive full-domain calculations

– Execute on human time scales!!!

[Figure: workflow diagram with stages Visual data browsing, Data manipulation, and Quantitative analysis, and Refine/Coarsen transitions]

Page 12:

Compressible Convection

[Figure: compressible convection at 128³ and 512³. M. Rast, 2002]

Page 13:

Compressible plume

Resolution:    504x504x2048   252x252x1024   126x126x512   63x63x256
Problem size:  Full           1/8            1/64          1/512

Compressible plume data set shown at native and progressively coarser resolutions
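The resolution and problem-size figures on this slide follow from halving each of the three grid dimensions per level, which divides the number of grid points by 8. A minimal sketch, with a hypothetical function name and with rounding up of odd dimensions as an assumption the slide does not address:

```python
from fractions import Fraction


def coarsened_levels(shape, num_levels):
    """List (grid shape, problem-size fraction) for successive halvings.

    Each level halves every dimension, so a 3D problem shrinks by 1/8 per
    level. ASSUMPTION: odd dimensions are rounded up when halved.
    """
    levels = []
    current = tuple(shape)
    for level in range(num_levels):
        levels.append((current, Fraction(1, 8 ** level)))
        current = tuple((d + 1) // 2 for d in current)
    return levels


if __name__ == "__main__":
    for grid, fraction in coarsened_levels((504, 504, 2048), 4):
        print("x".join(str(d) for d in grid), fraction)
```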

Page 14:

Rendering timings

[Figure: rendering time in seconds (log scale) versus resolution (Full, 1/2, 1/4, 1/8). Panels: 512³ Compressible Convection; 504²x2048 Compressible Plume. Series: Mdb, Vtk. Platforms: SGI Octane2, 1x600MHz R14k; SGI Origin, 10x600MHz R14k. Annotation: "Interactive!!"]

Reduced resolution affords responsive interaction while preserving all but finest features

Page 15:

Derived quantities

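$p$: pressure

$\rho$: density

$T$: temperature

$\chi$: ionization potential

$N_A$: Avogadro’s number

$m_e$: electron mass

$k$: Boltzmann’s constant

$h$: Planck’s constant

(1) $p = \rho T$

(2) $\frac{y^2}{1 - y} = \frac{1}{\rho N_A} \left( \frac{2 \pi m_e k T}{h^2} \right)^{3/2} e^{-\chi / kT}$

(3) $\omega^2 = |\nabla \times \mathbf{u}|^2$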

Derived quantities produced from the simulation’s field variables as a post-process

Page 16:

Calculation timings for derived quantities

[Figure: calculation time in seconds (log scale) versus resolution (Full, 1/2, 1/4, 1/8) for pressure (eq 1), ionization (eq 2), and enstrophy (eq 3)]

Note: 1/2 resolution is 1/8 problem size, etc.

Deriving new quantities on interactive time scales only possible with data reduction

SGI Origin, 10x600MHz R14k

Page 17:

Error in approximations

• Error is highly dependent on operation performed

• Algebraic operations tested introduced low error even after substantial coarsening

• Error grows rapidly for gradient calculation

• Point-wise error gives no indication of global (average) error

Point-wise, normalized, maximum absolute error:

$\max_i \frac{|s_i - \hat{s}_i|}{|s_i|}$

Resolution   p (Eq 1)   y (Eq 2)   $\omega^2$ (Eq 3)
Full         0          0          0
1/2          1.09       0.03       85.57
1/4          2.53       0.14       97.3
1/8          3.79       0.65       99.8
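The error measure on this slide can be sketched in a few lines. The exact formula is an assumption: it is read here as the maximum over grid points of |s_i - ŝ_i| / |s_i|, where s is the full-resolution value and ŝ the value from coarsened data. Skipping points where s_i is zero is also an assumption, and both function names are hypothetical. A mean version is included to illustrate the slide's point that point-wise error says little about average error.

```python
# ASSUMPTION: the slide's metric is max_i |s_i - s_hat_i| / |s_i|.
# Points with s_i == 0 are skipped, which is also an assumption.

def normalized_errors(reference, approximation):
    """Per-point normalized absolute error, skipping zero reference values."""
    if len(reference) != len(approximation):
        raise ValueError("inputs must have the same length")
    return [abs(s - s_hat) / abs(s)
            for s, s_hat in zip(reference, approximation) if s != 0]


def max_normalized_error(reference, approximation):
    """Worst-case point-wise normalized error."""
    return max(normalized_errors(reference, approximation), default=0.0)


def mean_normalized_error(reference, approximation):
    """Average normalized error over the same points."""
    errors = normalized_errors(reference, approximation)
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    full = [10.0, 10.0, 10.0, 10.0, 0.5]
    coarse = [10.0, 10.0, 10.0, 10.0, 0.0]   # one small value lost entirely
    print(max_normalized_error(full, coarse))    # dominated by a single point
    print(mean_normalized_error(full, coarse))   # much smaller on average
```

In this made-up example a single small value that vanishes after coarsening drives the maximum error to 1.0 while the mean stays at 0.2, so a large point-wise maximum does not by itself mean the whole field is badly approximated.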

Page 18:

Integrated visualization and analysis on interactively selected subdomains:

[Figure: vertical vorticity of the flow and Mach number of the vertical velocity, each shown for the full domain seen from above and for a subdomain seen from the side]

Efficient analysis requires rapid calculation and visualization of unanticipated derived quantities. This can be facilitated by a combination of subdomain selection and resolution reduction.

Page 19:

A test of multiresolution analysis: Force balance in supersonic downflows

Sites of supersonic downflow are also those of very high vertical vorticity. The cores of the vortex tubes are evacuated, with centripetal acceleration balancing that due to the inward-directed pressure gradient. Buoyancy forces are maximum on the tube periphery due to mass flux convergence.

The same interpretation results from analysis at half resolution.

[Figure: force-balance terms (centripetal acceleration, radial pressure gradient, buoyancy) and vertical vorticity, compared at full and half resolution]

Subdomain selection and reduced resolution together yield data reduction by a factor of 128
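The factor of 128 combines two independent reductions. Half resolution alone gives a factor of 8 in three dimensions, so the remaining factor of 16 would come from subdomain selection. That 1/16 subdomain fraction is inferred by division and is an assumption; the slide states only the combined figure. The function name below is hypothetical.

```python
from fractions import Fraction


def combined_reduction(subdomain_fraction, halvings):
    """Overall data-reduction factor from an ROI fraction and N halvings.

    Each halving of a 3D grid keeps 1/8 of the points.
    """
    resolution_fraction = Fraction(1, 8 ** halvings)
    return 1 / (Fraction(subdomain_fraction) * resolution_fraction)


if __name__ == "__main__":
    # ASSUMPTION: the selected subdomain holds 1/16 of the full volume.
    print(combined_reduction(Fraction(1, 16), halvings=1))
```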

Page 20:

Summary

• Presented a prototype, integrated analysis environment aimed at aiding the investigation of high-resolution numerical fluid flow simulations

• Orders of magnitude data reduction achieved through:

1. Visualization: Reduce full domain to ROI

2. Multiresolution: Enable speed/quality trade-offs

• Coarsened data frequently suitable for rapid hypothesis testing that may later be verified at full resolution

Page 21:

Future work

• Quantify and predict error in results obtained with various mathematical operations applied to coarsened data

• Investigate lossy and lossless data compression

• Add support for less regular meshes

• Explore other scientific domains – Climate, weather, atmospheric chemistry,…

Page 22:

Future???

[Figure: original data compared with 20:1 lossy compression]

Page 23:

Acknowledgements

• Steering Committee

– Nic Brummell - CU, JILA

– Aimé Fournier – NCAR, IMAGe

– Helene Politano - Observatoire de la Cote d'Azur

– Pablo Mininni, NCAR, IMAGe

– Yannick Ponty - Observatoire de la Cote d'Azur

– Annick Pouquet - NCAR, ESSL

– Mark Rast - NCAR, HAO

– Duane Rosenberg - NCAR, IMAGe

– Matthias Rempel - NCAR, HAO

– Yuhong Fan - NCAR, HAO

• Developers

– Alan Norton – NCAR, SCD

– John Clyne – NCAR, SCD

• Research Collaborators

– Kwan-Liu Ma, U.C. Davis

– Hiroshi Akiba, U.C. Davis

– Han-Wei Shen, Ohio State

– Liya Li, Ohio State

• Systems Support

– Joey Mendoza, NCAR, SCD

Page 24:

Questions???

http://www.scd.ucar.edu/hss/dasg/software/vapor