CS4960: Parallel Programming
Guest Lecture: Parallel Programming for Scientific Computing
Mary Hall, September 22, 2008
Transcript
Page 1

CS4960: Parallel Programming
Guest Lecture: Parallel Programming for Scientific Computing
Mary Hall, September 22, 2008

Page 2

Outline

•Introduction

•The fastest computer in the world today

•Large-scale scientific simulations

•Why writing fast parallel programs is hard

•New parallel programming languages

Material for this lecture provided by Kathy Yelick and Jim Demmel (UC Berkeley), and Brad Chamberlain (Cray)

Page 3

Parallel and Distributed Computing

• Limited to supercomputers?
  - No! Everywhere!

• Scientific applications?
  - These are still important, but many new commercial and consumer applications are also going to emerge.

• Programming tools adequate and established?
  - No! Many new research challenges (my research area)

Page 4

Why is This Course Important? Why Now?

• We are seeing a convergence of high-end, conventional, and embedded computing
  - On-chip architectures look like parallel computers
  - Languages, software development and compilation strategies originally developed for the high end (supercomputers) are now becoming important for many other domains

• Why?
  - Technology trends

• Looking to the future
  - Parallel computing for the masses demands better parallel programming paradigms
  - And more people who are trained in writing parallel programs (you!)
  - How to put all these vast machine resources to the best use!

Page 5

The fastest computer in the world today

• What is its name? Roadrunner
• Where is it located? Los Alamos National Laboratory
• How many processors does it have? 18,802 processor chips (~123,284 "processors")
• What kind of processors? AMD Opterons and IBM Cell/BE (the processor in the PlayStation 3)
• How fast is it? 1.026 Petaflop/second, about one quadrillion operations per second (1.026 x 10^15)

See http://www.top500.org

Page 6

Scientific Simulation: The Third Pillar of Science

• Traditional scientific and engineering paradigm:
  1) Do theory or paper design.
  2) Perform experiments or build system.

• Limitations:
  - Too difficult -- build large wind tunnels.
  - Too expensive -- build a throw-away passenger jet.
  - Too slow -- wait for climate or galactic evolution.
  - Too dangerous -- weapons, drug design, climate experimentation.

• Computational science paradigm:
  3) Use high performance computer systems to simulate the phenomenon
     - Based on known physical laws and efficient numerical methods.

Page 7

Some Particularly Challenging Computations

• Science
  - Global climate modeling
  - Biology: genomics; protein folding; drug design
  - Astrophysical modeling
  - Computational chemistry
  - Computational materials science and nanoscience

• Engineering
  - Semiconductor design
  - Earthquake and structural modeling
  - Computational fluid dynamics (airplane design)
  - Combustion (engine design)
  - Crash simulation

• Business
  - Financial and economic modeling
  - Transaction processing, web services and search engines

• Defense
  - Nuclear weapons -- test by simulations
  - Cryptography

Page 8

Example: Global Climate Modeling Problem

• Problem is to compute:
    f(latitude, longitude, elevation, time) ->
        temperature, pressure, humidity, wind velocity

• Approach:
  - Discretize the domain, e.g., a measurement point every 10 km
  - Devise an algorithm to predict the weather at time t+Δt given the state at time t

• Uses:
  - Predict major events, e.g., El Niño
  - Use in setting air emissions standards

Source: http://www.epm.ornl.gov/chammp/chammp.html

Page 9

[Figure] High Resolution Climate Modeling on NERSC-3 (P. Duffy et al., LLNL)

Page 10

Some Characteristics of Scientific Simulation

• Discretize physical or conceptual space into a grid
  - Simpler if regular; may be more representative if adaptive

• Perform local computations on the grid
  - Given yesterday's temperature and weather pattern, what is today's expected temperature?

• Communicate partial results between grids
  - Contribute the local weather result to understand the global weather pattern.

• Repeat for a set of time steps

• Possibly perform other calculations with results
  - Given the weather model, what area should evacuate for a hurricane?
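The loop structure in these bullets can be made concrete with a toy example. The following is a minimal sketch, assuming a simple averaging (heat-diffusion-style) update on a small 2D grid; the update rule and all names are illustrative, not from the lecture:

```python
# Sketch of the simulation pattern above: discretize a 2D space into a
# grid, repeatedly apply a local (stencil) computation, and repeat for a
# set of time steps. Boundary cells are held fixed.

def step(grid):
    """One time step: each interior cell becomes the average of its
    four neighbors (a purely local computation on the grid)."""
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])
    return new

def simulate(grid, steps):
    """Repeat the local update for a set of time steps."""
    for _ in range(steps):
        grid = step(grid)
    return grid

# A 5x5 grid with a hot spot in the middle; heat spreads outward.
g = [[0.0] * 5 for _ in range(5)]
g[2][2] = 100.0
result = simulate(g, 10)
```

In a real parallel run, each processor would own a block of this grid and exchange its boundary (halo) rows and columns with its neighbors before each step; that exchange is the "communicate partial results" phase from the slide.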

Page 11

More Examples: Parallel Computing in Data Analysis

• Finding information amidst large quantities of data

• General themes of sifting through large, unstructured data sets:
  - Has there been an outbreak of some medical condition in a community?
  - Which doctors are most likely involved in fraudulent charging to Medicare?
  - When should white socks go on sale?
  - What advertisements should be sent to you?

• Data collected and stored at enormous speeds (Gbyte/hour)
  - remote sensors on a satellite
  - telescopes scanning the skies
  - microarrays generating gene expression data
  - scientific simulations generating terabytes of data
  - NSA analysis of telecommunications

Page 12

Why writing (fast) parallel programs is hard

Page 13

Parallel Programming Complexity

An Analogy to Preparing Thanksgiving Dinner

• Enough parallelism? (Amdahl's Law)
  - Suppose you want to just serve turkey

• Granularity
  - How frequently must each assistant report to the chef?
  - After each stroke of a knife? Each step of a recipe? Each dish completed?

• Locality
  - Grab the spices one at a time? Or collect the ones that are needed prior to starting a dish?

• Load balance
  - Each assistant gets a dish? Preparing stuffing vs. cooking green beans?

• Coordination and Synchronization
  - The person chopping onions for stuffing can also supply green beans
  - Start the pie after the turkey is out of the oven

All of these things make parallel programming even harder than sequential programming.

Page 14

Finding Enough Parallelism

• Suppose only part of an application seems parallel

• Amdahl's law
  - Let s be the fraction of work done sequentially, so (1-s) is the fraction parallelizable
  - P = number of processors

    Speedup(P) = Time(1)/Time(P)
               <= 1/(s + (1-s)/P)
               <= 1/s

• Even if the parallel part speeds up perfectly, performance is limited by the sequential part
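The bound on this slide is easy to check numerically. A small sketch (the function name is mine):

```python
# Amdahl's Law from the slide: Speedup(P) <= 1/(s + (1-s)/P) <= 1/s,
# where s is the sequential fraction and P is the number of processors.

def amdahl_speedup(s, p):
    """Upper bound on speedup with sequential fraction s on p processors."""
    return 1.0 / (s + (1.0 - s) / p)

# Even with only 10% sequential work, 100 processors give at most ~9.2x,
# and no processor count can beat the 1/s = 10x ceiling.
for p in (2, 10, 100, 10**6):
    bound = amdahl_speedup(0.1, p)
    assert bound <= 1 / 0.1  # never exceeds 1/s
```

Note how quickly the bound saturates: going from 100 to a million processors barely moves it, because the sequential fraction dominates.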

Page 15

Overhead of Parallelism

• Given enough parallel work, this is the biggest barrier to getting desired speedup

• Parallelism overheads include:
  - cost of starting a thread or process
  - cost of communicating shared data
  - cost of synchronizing
  - extra (redundant) computation

• Each of these can be in the range of milliseconds (= millions of flops) on some systems

• Tradeoff: the algorithm needs sufficiently large units of work to run fast in parallel (i.e., large granularity), but not so large that there is not enough parallel work
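The granularity tradeoff can be illustrated with a toy cost model (the model and its numbers are mine, not from the lecture): tiny chunks of work drown in per-chunk overhead, while chunks larger than the work per processor leave processors idle.

```python
# Toy model: n units of work split into chunks of size g on p processors.
# Each chunk costs g*t of useful work plus a fixed overhead o (startup,
# communication, synchronization).
import math

def parallel_time(n, g, p, o, t=1.0):
    """Estimated time: chunks dealt round-robin to p processors;
    the busiest processor determines the finish time."""
    chunks = math.ceil(n / g)
    per_proc = math.ceil(chunks / p)   # chunks run by the busiest processor
    return per_proc * (g * t + o)

# 10,000 work units, 10 processors, overhead worth 50 work units per chunk:
# g=1 is swamped by overhead, g=10,000 has no parallelism at all, and a
# moderate granularity in between wins.
times = {g: parallel_time(10_000, g, p=10, o=50.0)
         for g in (1, 10, 100, 1000, 10_000)}
```

Under this model, the best granularity here is g = 1000 (exactly one chunk per processor), which matches the slide's advice: large units of work, but still enough of them to keep every processor busy.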

Page 16

Locality and Parallelism

• Large memories are slow; fast memories are small

• Storage hierarchies are large and fast on average

• Parallel processors, collectively, have a large, fast cache
  - the slow accesses to "remote" data we call "communication"

• Algorithm should do most work on local data

[Diagram: a conventional storage hierarchy (Proc + Cache, L2 Cache, L3 Cache, Memory), replicated across several processors and linked by potential interconnects]
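The "do most work on local data" advice applies even on one processor. In the sketch below (example mine, not from the slide), both traversals compute the same sum, but the row-major one visits elements in storage order; with a contiguous 2D array (as in C or NumPy) that pattern moves sequentially through memory and uses the cache hierarchy in the diagram far better than the strided, column-major one. Plain Python lists only approximate contiguous storage, so treat this as an illustration of the access pattern rather than a timing benchmark.

```python
# Same result, different locality: row-major traversal follows storage
# order; column-major traversal strides N elements ahead on every access.
N = 200
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def sum_row_major(m):
    """Cache-friendly: the inner loop walks along one row."""
    total = 0
    for row in m:
        for x in row:
            total += x
    return total

def sum_col_major(m):
    """Strided: the inner loop jumps from row to row within one column."""
    total = 0
    for j in range(len(m[0])):
        for i in range(len(m)):
            total += m[i][j]
    return total
```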

Page 17

Load Imbalance

• Load imbalance is the time that some processors in the system are idle due to
  - insufficient parallelism (during that phase)
  - unequal size tasks

• Examples of the latter
  - adapting to "interesting parts of a domain"
  - tree-structured computations
  - fundamentally unstructured problems

• Algorithm needs to balance load
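The idle time described above can be quantified directly; a sketch (the metric and the numbers are mine): since every processor waits for the most loaded one, the wasted processor-time is the gap between each processor's load and the maximum.

```python
# Measuring load imbalance across processors, given each processor's
# total work (in arbitrary time units).

def idle_time(loads):
    """Total processor-time spent waiting for the most loaded processor."""
    longest = max(loads)
    return sum(longest - load for load in loads)

def imbalance(loads):
    """Max load / mean load; 1.0 means perfectly balanced."""
    return max(loads) * len(loads) / sum(loads)

balanced = [4, 4, 4, 4]
skewed = [10, 2, 2, 2]   # e.g., one processor got the "interesting" subdomain
```

With the skewed distribution, three of the four processors sit idle for 8 units each even though the total work (16 units) is the same as in the balanced case; this is exactly the waste that load-balancing algorithms try to eliminate.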

Page 18

New Parallel Programming Languages for Scientific Computing

Page 19

A Brief Look at Chapel (Cray)

• History:
  - Starting in 2002, the Defense Advanced Research Projects Agency (DARPA) funded 5 industry players to investigate the new revolutionary, commercially viable high-end computer system of 2010
  - Now, two teams (IBM and Cray) are still building systems to be deployed next year
  - Both, along with Sun, have introduced new languages:
    - Chapel (Cray)
    - X10 (IBM)
    - Fortress (Sun)

• We will look at Chapel for the next few slides

Page 20

Summary of Lecture

• Scientific simulation discretizes some space into a grid
  - Perform local computations on the grid
  - Communicate partial results between grids
  - Repeat for a set of time steps
  - Possibly perform other calculations with results

• Writing fast parallel programs is difficult
  - Amdahl's Law: must parallelize most of the computation
  - Data Locality
  - Communication and Synchronization
  - Load Imbalance

• Challenge for new productive parallel programming languages
  - Express data partitioning and parallelism at a high level
  - Still obtain high performance!