Top Banner
SDVIS AND IN-SITU VISUALIZATION ON TACC’S STAMPEDE-KNL Paul A. Navrátil, Ph.D. Manager – Scalable Visualization Technologies [email protected] 1
37

SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

Jun 25, 2020

Download

Documents

dariahiddleston
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

SDVIS AND IN-SITU VISUALIZATIONON TACC’S STAMPEDE-KNL

Paul A. Navrátil, Ph.D.Manager – Scalable Visualization Technologies

[email protected]

1

Page 2: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

2High-Fidelity Visualization Natively on Xeon and Xeon Phi

Page 3: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

OUTLINE� Stampede Architecture

� Stampede – Sandy Bridge� Stampede - KNL� Stampede 2 – KNL + Skylake

� Software-Defined Visualization Stack� VNC� OpenSWR� OSPRay

� Path to In-Situ� ParaView Catalyst� VisIt Libsim� Direct renderer integration

3

Page 4: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

4

STAMPEDE ARCHITECTURE

Page 5: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

STAMPEDE SANDY BRIDGE

� Mellanox FDR Interconnect

� 6400 compute nodes, each with:� 2x Intel Xeon E5-2680

“Sandy Bridge”

� 1x Intel Xeon Phi SE10P

� 32 GB RAM / 8 GB Phi RAM

� 16 Large Memory nodes, each with:� 4x Intel Xeon E5-4650 “Sandy Bridge”

� 2x NVIDIA Quadro 2000 “Fermi”

� 1 TB RAM

� 128 GPU nodes, each with:� 2x Intel Xeon E5-2680

� 1x Intel Xeon Phi SE10P

� 1x NVIDIA Tesla K20 “Kepler”

� 32 GB RAM

5

Page 6: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

STAMPEDE KNL

� Deployed in 2016 as planned upgrade to Stampede

� First KNL-based system in Top500� Intel OmniPath interconnect� 508 nodes, each with:

� 1x Intel Xeon Phi 7250� 96 GB RAM + 16 GB MCDRAM

� Notes:� Shared $WORK and $SCRATCH

Separate $HOME directories� Separate Login Node

login-knl1.stampede.tacc.utexas.edu

� Login is Intel Xeon E5-2695 “Haswell”

� Compile on compute nodeor use “–xMIC-AVX512” on login

� “normal” and ”development” queues are Cache-Quadrant

� Other MCDRAM configs available by queue name

6

Page 7: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

7

Page 8: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

STAMPEDE 2 (COMING 2017)

� ~18 PF Dell Intel Xeon + Intel Xeon Phi system

� Combine KNL + Skylake + OmniPath + 3D XPoint

� Phase 1: Spring 2017� Stampede KNL + 4200 new KNL nodes + new filesystem

� 60% of Stampede Sandy Bridge to remain operational during this phase

� Phase 2: Fall 2017� 1736 Intel Skylake nodes

� Phase 3: Spring 2018� Add 3D XPoint memory to subset of nodes

8

Page 9: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

KEY ARCHITECTURAL TAKE-AWAY

� Current and near-future cyberinfrastructure will use processors with many cores� Each core contains wide vector units: use them for max utilization (e.g., AVX512)

� Fortunately the Software-Defined Visualization stack is optimized for such processors!

� Use your preferred rendering method independent of the underlying hardware� Performant rasterization

� Performant ray tracing

� Visualization and analysis on the simulation machine

9

Page 10: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

SOFTWARE-DEFINED VISUALIZATION – WHY?

10

FILE SIZE100GBPS

10 GBPS 1 GBPS 300 MBPS 54 MBPS

1 GB < 1 sec 1 sec 10 sec 35 sec 2.5 min

1 TB ~100 sec ~17 min ~3 hours ~10 hours ~43 hours

1 PB ~1 day ~12 days ~121 days >1 year ~5 years

Increasingly Difficult to Move Data from Simulation Machine

Page 11: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

11

SOFTWARE-DEFINED VISUALIZATION

Page 12: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

SOFTWARE-DEFINED VISUALIZATION – WHY?

12

Page 13: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

SOFTWARE-DEFINED VISUALIZATION – WHY?

13

Page 14: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

SOFTWARE-DEFINED VISUALIZATION – WHY?

14

Page 15: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

SOFTWARE-DEFINED VISUALIZATION – WHY?

15

Page 16: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

SOFTWARE-DEFINED VISUALIZATION STACK� OpenSWR Software Rasterizer

� openswr.org

� Performant rasterization for Xeon and Xeon Phi

� Thread-parallel vector processing (previous parallel Mesa3D only has threaded fragments)

� Support for wide vector instruction sets, particularly AVX2 (and soon AVX512)

� Integrated into Mesa3D 12.0 as optional driver (mesa3d.org)

� Best Uses� Lines

� Graphs

� User Interfaces

16

Page 17: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

SOFTWARE-DEFINED VISUALIZATION STACK

� OSPRay Ray Tracer� ospray.org

� Performant ray tracing for Xeon and Xeon Phi incorporating Embree kernels

� Thread- and wide-vector parallel using Intel ISPC (including AVX512 support)

� Parallel rendering support via distributed framebuffer

� Best Uses� Photorealistic rendering

� Realistic lighting

� Realistic material effects

� Large geometry

� Implicit geometry (e.g., molecular ”ball and stick” models)

17

Page 18: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

SOFTWARE-DEFINED VISUALIZATION STACK

� GraviT Scheduling Framework� tacc.github.io/GraviT/

� Large-scale, data-distributed ray tracing (uses OSPRay for rendering engine target)

� Parallel rendering support via distributed ray scheduling

� Best Uses� Large distrubted data

� Data outside of renderer control

� Incoherent ray-intensive sampling (e.g., global illumination approximations)

18

Page 19: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

OSPRAY TEST SUITE – SAMPLE IMAGES

19

Test 0 Test 1 Test 4

Test 2 Test 3 Test 5

Test 6Test 7 Test 8

Page 20: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

OSPRAY TEST SUITE – MCDRAM PERFORMANCE RESULTS

20

Page 21: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

PARAVIEW TEST SUITE – MANYSPHERES

21

Page 22: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

22

Likely VNC limited

Page 23: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

23

Likely VNC limited

Page 24: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

24

Definitely VNC limited!

Page 25: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

FIU CORE SAMPLE – SAMPLE IMAGE

25

Page 26: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

26

Likely VNC limited

Page 27: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

27

Likely VNC limited

Page 28: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

28

Likely hitting VNC desktop limits

Definitely VNC limited!

Page 29: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

29

PATH TO IN-SITU VISUALIZATION

Page 30: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

WHY IN-SITU VISUALIZATION?

� Processors (like KNL) enabling larger, more detailed simulations

� File system technologies not scaling at same rate (if at all….)

� Touching disk is expensive:� During simulation: time checkpointing is (often) not time computing

� During analysis: loading the data is (often) the overwhelming majority of runtime

� In-situ capabilities overcome this data bottleneck� Render directly from resident simulation data

� Tightly coupled vis opens doors for online analysis, computational steering, etc

30

Page 31: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

CURRENT IN-SITU OPTIONS

� Simulation developer� Implement visualization API (ParaView Catalyst, VisIt libsim, VTK)

� Implement data framework (ADIOS, etc)

� Implement direct rendering calls (OSPRay API, etc)

� Simulation user� Hope the developers do one of the above J

� Do one of the above yourself L

� Hope technology keeps post-hoc analysis viable (3D XPoint NVRAM might help)

31

Page 32: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

IN-SITU VISUALIZATION APIS

� ParaView Catalyst (and Cinema)(www.paraview.org/in-situ/)

� VisIt Libsim(www.visitusers.org/index.php?title=Libsim_Batch)

� Direct VTK integration(www.vtk.org)

� Visualization ops already implemented

� Need coordination b/t teams to ensure simulation and vis performance

32

Image courtesy of Kitware Inc.

Page 33: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

IN-SITU-COMPATIBLE DATA FRAMEWORKS

� ADIOS – https://www.olcf.ornl.gov/center-projects/adios/

� Damaris – https://hal.inria.fr/hal-00859603/en

� DIY – http://www.mcs.anl.gov/~tpeterka/software.html

� GLEAN – http://www.mcs.anl.gov/project/glean-situ-visualization-and-analysis

� SCIRun – http://www.sci.utah.edu/cibc-software/scirun.html

� (Possibly) more invasive implementation effort

� (Possibly) broader benefits beyond visualization (framework now controls data)

� Requires engagement from simulation team to ensure performance and accuracy

33

Page 34: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

IN-SITU DIRECT RENDERING

� Render directly within simulation using API (e.g., OSPRay, OpenGL, etc)

� Must implement visualization operations within simulation code� Lightest weight, lowest overhead

� Requires visualization algorithm knowledge for efficient implementation

� Locks in particular rendering and visualization modes

34

Page 35: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

IN-SITU FUTURE?

35

Useful perspectives at ISAV – http://conferences.computer.org/isav/2016/

Page 36: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

TACC/KITWARE IPCC – UNIMPEDED IN SITUVISUALIZATION ON INTEL XEON AND INTEL XEON PHI

� Optimize ParaView Catalyst for current and near-future Intel architectures� KNL, Skylake, Omnipath, 3D XPoint NVRAM

� Use Stampede-KNL as testbed to target TACC Stampede 2, NERSC Cori, LANL Trinity

� Focus on data and rendering paths for OpenSWR and OSPRay� Parallelize VTK data processing filters

� Catalyst integration with simulation

� Targeted algorithm improvements � Increase processor and memory utilization

36

Page 37: SDVIS AND IN-SITU VISUALIZATION TACC’STAMPEDE-KNL · 2x Intel Xeon E5 -2680 “Sandy Bridge” 1x Intel Xeon Phi SE10P 32 GB RAM / 8 GB Phi RAM 16 Large Memory nodes, each with:

THANK [email protected]

37