Page 1: Introduction to Research 2011

Introduction to Research 2011

Ashok Srinivasan

Florida State University

www.cs.fsu.edu/~asriniva

Part of the machine room at ORNL

The Cell processor powers the Roadrunner at LANL

NVIDIA GPUs power Tianhe-1A in China

Images from ORNL, IBM, NVIDIA

Page 2: Introduction to Research 2011

Outline

Research

High Performance Computing Applications and Software: multicore processors, massively parallel processors, computational nanotechnology, simulation-based policy making

Potential Research Topics

Page 3: Introduction to Research 2011

Research Areas

High Performance Computing, Applications in Computational Sciences, Scalable Algorithms, Mathematical Software

Current topics: Computational Nanotechnology, HPC on Multicore Processors, Massively Parallel Applications

New Topics: Simulation-based policy analysis

Old Topics: Computational Finance, Parallel Random Number Generation, Monte Carlo Linear Algebra, Computational Fluid Dynamics, Image Compression

Page 4: Introduction to Research 2011

Importance of Supercomputing

Fundamental scientific understanding: nano-materials, drug design

Solution of bigger problems: climate modeling

More accurate solutions: automobile crash tests

Solutions with time constraints: disaster mitigation

Study of complex interactions for policy decisions: urban planning

Page 5: Introduction to Research 2011

Some Applications

Increasing relevance to industry: in 1993, fewer than 30% of the top 500 supercomputers were commercial; now, 57% are commercial

A variety of application areas

Commercial: finance and insurance, medicine, aerospace and automobiles, telecom, oil exploration, shoes (Nike!), potato chips, toys

Scientific: weather prediction, earthquake modeling, epidemic modeling, materials, energy, computational biology, astrophysics

Page 6: Introduction to Research 2011

Supercomputing Power

The amount of parallelism is also increasing, with the high end having over 200,000 cores

Page 7: Introduction to Research 2011

Geographic Distribution

North America has over half of the top 500 systems

However, Europe and East Asia also have a significant share

China is determined to be a supercomputing superpower: two of its national supercomputing centers have top-five supercomputers

Japan has the top machine and two in the top five, and is planning a $1.3 billion exascale supercomputer in 2020

Page 8: Introduction to Research 2011

Asian Supercomputing Trends

Page 9: Introduction to Research 2011

Challenges in Supercomputing

Hardware can be obtained with enough money, but obtaining good performance on large systems is difficult

Some DOE applications ran at 1% efficiency on 10,000 cores; they will have to deal with a million threads soon, and with a billion at the exascale

Don't think of supercomputing as a means of solving current problems faster, but as a means of solving problems we earlier thought we could not solve

Development of software tools to make the machines easier to use

Page 10: Introduction to Research 2011

Architectural Trends

Massive parallelism: 10K-processor systems will be commonplace; the large end already has over 500K processors

Single-chip multiprocessing: all processors will be multicore

Heterogeneous multicore processors: the Cell used in the PS3, GPGPUs, the 80-core processor from Intel; processors with hundreds of cores are already commercially available

Distributed environments, such as the Grid

But it is hard to get good performance on these systems

Page 11: Introduction to Research 2011

Accelerating Applications with GPUs

Over a hundred cores per GPU

Hide memory latency with thousands of threads

Can accelerate a traditional computer to a teraflop

GPU cluster at FSU: Quantum Monte Carlo applications; algorithms (linear algebra, FFT, compression, etc.)

Page 12: Introduction to Research 2011

Small Discrete Fourier Transforms (DFT) on GPUs

GPUs are effective for large DFTs, but not for small DFTs; however, they can be effective for a large number of small DFTs

Useful for AFQMC

We use the asymptotically slow matrix-multiplication based DFT for very small sizes (a sketch of this idea follows below)

We combine it with a mixed-radix approach for larger sizes

We use asynchronous memory transfer to hide the host-device data transfer overhead
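
To make the matrix-multiplication idea concrete, here is a minimal NumPy sketch (CPU-only, not the GPU implementation described above; the function names are illustrative). The whole batch of small transforms reduces to a single matrix-matrix product, which is exactly the kind of operation that maps well onto GPUs.

```python
import numpy as np

def dft_matrix(n):
    """Return the n x n DFT matrix F with F[j, k] = exp(-2*pi*i*j*k / n)."""
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n)

def batched_small_dft(batch):
    """batch: complex array of shape (num_transforms, n), one small signal per row.
    All the transforms are computed with a single matrix-matrix product."""
    n = batch.shape[1]
    return batch @ dft_matrix(n).T

# Example: 512 simultaneous length-8 DFTs, checked against NumPy's FFT.
x = np.random.rand(512, 8) + 1j * np.random.rand(512, 8)
assert np.allclose(batched_small_dft(x), np.fft.fft(x, axis=1))
```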

Page 13: Introduction to Research 2011

Comparison of DFT Performance

[Figures: 512 simultaneous DFTs without host-device data transfer; one plot for 2-D DFTs and one for 3-D DFTs]

Page 14: Introduction to Research 2011

Petascale Quantum Monte Carlo

Originally a DOE-funded project involving collaboration between ORNL, UIUC, Cornell, UTK, CWM, and NCSU

Now funded by ORAU/ORNL

Goal: scale Quantum Monte Carlo applications to petascale (one million gigaflops) machines through load balancing, fault tolerance, and other optimizations

Page 15: Introduction to Research 2011

Load Balancing

In current implementations, such as QWalk and QMCPack, cores send their excess walkers to cores with fewer walkers

In the new algorithm (based on the alias method), cores may send more than their excess, and may receive walkers even if they originally had an excess

Load can be balanced with each core receiving from at most one other core (see the sketch below)

Also optimal in the maximum number of walkers received

The total number of walkers sent may be up to twice the optimal
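
As a rough illustration of this idea, here is a simplified sketch (not the QWalk or QMCPack code; the function name and tie-breaking are assumptions) of building a redistribution plan in the spirit of Walker's alias method, so that every under-loaded core receives from exactly one sender.

```python
def alias_balance_plan(walkers):
    """walkers: list with the walker count on each core.
    Returns a list of (sender, receiver, count) transfers that brings every
    core to the average load (within one walker), with each core receiving
    from at most one other core."""
    n = len(walkers)
    total = sum(walkers)
    base, extra = divmod(total, n)  # the first `extra` cores keep one more walker

    # Surplus of each core relative to its target load.
    surplus = [w - (base + (1 if i < extra else 0)) for i, w in enumerate(walkers)]
    donors = [i for i in range(n) if surplus[i] > 0]
    receivers = [i for i in range(n) if surplus[i] < 0]

    plan = []
    while receivers:
        r = receivers.pop()
        d = donors.pop()
        need = -surplus[r]
        plan.append((d, r, need))   # d sends `need` walkers to r; r is now done
        surplus[d] -= need
        surplus[r] = 0
        if surplus[d] > 0:
            donors.append(d)        # d still has excess and may send again
        elif surplus[d] < 0:
            receivers.append(d)     # d sent more than its excess; it will receive later
    return plan

# Example: six cores with uneven walker counts (target is 8 walkers each).
print(alias_balance_plan([12, 3, 9, 4, 7, 13]))
# [(5, 4, 1), (5, 3, 4), (2, 1, 5), (0, 2, 4)]
```

In the example, core 2 sends more than its excess and later receives walkers itself, yet no core receives from more than one other core, matching the properties listed above.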

Page 16: Introduction to Research 2011

Performance Comparison

[Figures: comparisons with QWalk; mean number of walkers migrated and maximum number of receives]

Page 17: Introduction to Research 2011

Process-Node Affinity

The node allocation is not necessarily ideal for minimizing communication

Process-node affinity can, therefore, be important

[Figure: allocated nodes for a 12,000-core run on Jaguar]

Page 18: Introduction to Research 2011

Load Balancing with Affinity

Renumbering the nodes improves both load balancing and AllGather time (a toy renumbering sketch follows below)

[Figures: results on Jaguar; basic load balancing versus load balancing after renumbering]
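
As a toy illustration only (an assumed setup, not the scheme actually used in the Jaguar runs), one simple way to renumber ranks is to sort the allocated nodes by their torus coordinates, so that ranks with nearby numbers sit on physically nearby nodes.

```python
def renumber_by_coordinates(node_coords):
    """node_coords: dict mapping each rank to the (x, y, z) torus coordinates
    of the node it runs on.  Returns a dict old_rank -> new_rank that orders
    the ranks lexicographically by node coordinates."""
    ordered = sorted(node_coords, key=lambda r: node_coords[r])
    return {old: new for new, old in enumerate(ordered)}

# Example: four ranks scattered over a small torus.
coords = {0: (3, 1, 0), 1: (0, 0, 0), 2: (3, 0, 0), 3: (0, 1, 0)}
print(renumber_by_coordinates(coords))  # {1: 0, 3: 1, 2: 2, 0: 3}
```

Since many communication steps, including collectives such as AllGather, tend to exchange data between ranks with related numbers, keeping consecutive ranks physically close can shorten those messages; the actual renumbering used on Jaguar may well differ.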

Page 19: Introduction to Research 2011

Potential Research Topics

High Performance Computing on Multicore Processors: algorithms, applications, and libraries on GPUs

Applications on Massively Parallel Processors: Quantum Monte Carlo applications; load balancing and communication optimizations

Simulation-based policy decisions: combine scientific computing with models of social interactions to help make policy decisions