Page 1: Parallel Languages: Past, Present and Future

HOPL 2007

Parallel Languages: Past, Present and Future

Katherine Yelick

U.C. Berkeley and Lawrence Berkeley National Lab

Page 2: Parallel Languages: Past, Present and Future

Internal Outline

• Two components: control and data (communication/sharing)
• One key question: how much to virtualize, i.e., hide the machine?
• Tradeoff: hiding improves programmability (productivity) and portability, while exposing gives programmers control to improve performance
• Importance of machine trends
  • Future: partitioned vs. cache-coherent (cc) shared memory
  • Transactions will save us
• PGAS: what is it? What about OpenMP?
• Looking ahead towards multicore: these are not SMPs. Partitioned vs. cc shared memory
• What works for performance: nothing virtualized *at runtime*, except Charm++
• Open problem: load balancing with locality

Page 3: Parallel Languages: Past, Present and Future

Two Parallel Language Questions

• What is the parallel control model?
  • data parallel (single thread of control)
  • dynamic threads
  • single program multiple data (SPMD)
• What is the model for sharing/communication? (see the sketch below)
  • shared memory: load/store
  • message passing: send/receive
  • Synchronization is implied for message passing, but not for shared memory
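A minimal sketch of the two communication models above, in C with MPI for the message-passing side (the shared-memory contrast is described in the comments). The ranks and the payload value are illustrative; only MPI_Send/MPI_Recv and the usual init/finalize calls are assumed.

    /* Message passing: send/receive moves data AND synchronizes.
     * Compile with mpicc, run with at least 2 ranks. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* The receive cannot complete before the matching send: the
             * implied synchronization noted above.  With shared memory,
             * rank 1 would simply load a shared variable, and a separate
             * flag or barrier would be needed to know the data is ready. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }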

Page 4: Parallel Languages: Past, Present and Future

[Diagram: machine classes: vector machines, distributed memory machines, DSM (distributed shared memory)]

Page 5: Parallel Languages: Past, Present and Future

[Chart: Top500 performance, 1993-2014 (log scale, 10 MFlop/s to 1 EFlop/s), showing the SUM, #1, and #500 Top500 series plus a Desktop series with an exponential fit. Annotations: "1 PFlop system (100K cores?)", "Petaflop Desktop By 2026?", "6-8 years", "8-10 years". Slide source: Horst Simon, LBNL]

Page 6: Parallel Languages: Past, Present and Future

HPC Programming: Where are We?

• BG/L at LLNL has 64K processor cores
• There were 68K transistors in the MC68000
• A BG/Q system with 1.5M processors may have more processors than there are logic gates per processor
• Trend towards simpler cores, but more of them
• HPC applications developers write programs that are as complex as describing where every single bit must move between the transistors in the MC68000
• We need to at least get to “assembly language” level

Slide source: Horst Simon and John Shalf, LBNL/NERSC

Page 7: Parallel Languages: Past, Present and Future

A Brief History of Languages

• When vector machines were king
  • Parallel “languages” were loop annotations (IVDEP)
  • Performance was fragile, but there was good user support
• When SIMD machines were king
  • Data parallel languages popular and successful (CMF, *Lisp, C*, …)
  • Quite powerful: can handle irregular data (sparse mat-vec multiply; see the sketch below)
  • Irregular computation is less clear (multi-physics, adaptive meshes, backtracking search, sparse matrix factorization)
• When shared memory multiprocessors (SMPs) were king
  • Shared memory models, e.g., OpenMP, Posix Threads, are popular
• When clusters took over
  • Message Passing (MPI) became dominant

We are at the mercy of hardware, but we’ll take the blame.
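To make the “irregular data” point concrete, a hedged sketch of sparse matrix-vector multiply in C with OpenMP standing in for the SIMD-era data parallel languages (the function and parameter names are illustrative). The row loop is the data parallel operation; the gather through colind is the irregular part.

    /* y = A*x for an n-row sparse matrix in CSR form. */
    void spmv_csr(int n, const int *rowptr, const int *colind,
                  const double *val, const double *x, double *y) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {          /* one independent row per iteration */
            double sum = 0.0;
            for (int k = rowptr[i]; k < rowptr[i+1]; k++)
                sum += val[k] * x[colind[k]];  /* irregular gather from x */
            y[i] = sum;
        }
    }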

Page 8: Parallel Languages: Past, Present and Future

Partitioned Global Address Space Languages

• Global address space: any thread may directly read/write data allocated by another (shared memory semantics)
• Partitioned: data is designated local/remote (message passing performance model)

[Diagram: a partitioned global address space spanning threads p0, p1, …, pn; each thread has private variables (x, y) plus local (l:) and global (g:) pointers into the shared space]

• 3 older languages: UPC, CAF, and Titanium (see the UPC sketch below)
  • All three use an SPMD execution model
  • Success: in current NSF PetaApps RFP, procurements, etc.
  • Why: portable (multiple compilers, including source-to-source); simple compiler/runtime; performance sometimes better than MPI
• 3 newer HPCS languages: X10, Fortress, and Chapel
  • All three use a dynamic parallelism model with data parallel constructs
• Challenge: an improvement over past models that is just large enough
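A minimal PGAS sketch in UPC (Unified Parallel C), one of the three older languages above. The array size and output are made up; shared, MYTHREAD, THREADS, upc_forall, and upc_barrier are core UPC. Compile with a UPC compiler such as Berkeley upcc.

    #include <upc.h>
    #include <stdio.h>

    #define N 16

    /* "shared" places the array in the global address space; with the
     * default block size of 1, a[i] has affinity to thread i % THREADS. */
    shared int a[N];

    int main(void) {
        int i;
        /* SPMD: every thread runs main.  upc_forall gives iteration i to
         * the thread with affinity to &a[i], so these writes are local
         * (the partitioned performance model). */
        upc_forall (i = 0; i < N; i++; &a[i])
            a[i] = MYTHREAD;

        upc_barrier;  /* synchronize before reading other threads' data */

        /* Global address space: thread 0 reads every element, remote ones
         * included, with ordinary loads; no explicit receive is needed,
         * but each remote read costs roughly a message. */
        if (MYTHREAD == 0)
            for (i = 0; i < N; i++)
                printf("a[%d] = %d\n", i, a[i]);
        return 0;
    }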

Page 9: Parallel Languages: Past, Present and Future

Open Problems

• Can load balance if we don’t care about locality (Cilk; see the sketch below)
• Can we mix in locality?
  • If the user places the work explicitly, can we move it? They can unknowingly overload resources at the “place” because of an execution schedule chosen by the runtime
• Can generate SPMD from data parallel (ZPL, NESL, HPF)
  • But those performance results depend on pinning
  • E.g., compile a program and run it on P processors; what happens if a task needs to use some of them?
• Can multicore support better programming models?
  • A multicore chip is not an SMP (and certainly not a cluster)
  • 10-100x higher bandwidth on chip
  • 10-100x lower latency on chip
• Are transactions a panacea?
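For the Cilk point above, a sketch of that model: dynamic threads with a work-stealing runtime that balances load automatically but says nothing about where data lives. This uses the later Cilk Plus spelling (cilk_spawn/cilk_sync); the MIT Cilk of this era wrote spawn/sync inside a cilk function.

    #include <cilk/cilk.h>
    #include <stdio.h>

    /* Naive parallel Fibonacci, the classic Cilk example.  Each spawn
     * creates a task an idle worker may steal, so load balances itself;
     * nothing says where work runs, which is the locality gap above. */
    long fib(int n) {
        if (n < 2) return n;
        long x = cilk_spawn fib(n - 1);  /* child may be stolen */
        long y = fib(n - 2);             /* continuation stays here */
        cilk_sync;                       /* wait for the spawned child */
        return x + y;
    }

    int main(void) {
        printf("fib(30) = %ld\n", fib(30));
        return 0;
    }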

Page 10: Parallel Languages: Past, Present and Future

Predictions

• Parallelism will explode
  • Number of cores will double every 12-24 months
  • Petaflop (million processor) machines will be common in HPC by 2015 (all top 500 machines will have this)
• Performance will become a software problem
  • Parallelism and locality will be key concerns for many programmers, not just an HPC problem
• A new programming model will emerge for multicore programming
  • Can one language cover the laptop-to-top500 space?
• Locality will continue to be important
  • On-chip to off-chip as well as node to node