Parallel Processing 1 Parallel Processing (CS 676) Overview Jeremy R. Johnson

Dec 17, 2015

Parallel Processing 1

Parallel Processing (CS 676)

Overview

Jeremy R. Johnson

Parallel Processing 2

Goals

• Parallelism: To run large and difficult programs fast.

• Course: To become effective parallel programmers
– “How to Write Parallel Programs”
– “Parallelism will become, in the not too distant future, an essential part of every programmer’s repertoire”
– “Coordination – a general phenomenon of which parallelism is one example – will become a basic and widespread phenomenon in CS”

• Why?
– Some problems require extensive computing power to solve
– The most powerful computer by definition is a parallel machine
– Parallel computing is becoming ubiquitous
– Distributed & networked computers with simultaneous users require coordination

Parallel Processing 3

Top 500

Parallel Processing 4

LINPACK Benchmark

• Solve a dense $N \times N$ system of linear equations, $y = Ax$, using Gaussian elimination with partial pivoting

– $\tfrac{2}{3}N^3 + 2N^2$ floating-point operations

• High Performance LINPACK used to measure performance for TOP500 (introduced by Jack Dongarra)

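\[
\begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
=
\begin{pmatrix}
l_{11} & 0 & 0 \\
l_{21} & l_{22} & 0 \\
l_{31} & l_{32} & l_{33}
\end{pmatrix}
\begin{pmatrix}
u_{11} & u_{12} & u_{13} \\
0 & u_{22} & u_{23} \\
0 & 0 & u_{33}
\end{pmatrix}
\]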
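To get a feel for the operation count on this slide, the sketch below evaluates (2/3)N^3 + 2N^2 and divides it by a machine's rate. The function names, the problem size, and the choice to treat the quoted rates as sustained rates are assumptions made for illustration; real LINPACK runs reach only a fraction of peak.

```python
def linpack_flops(n):
    """Operation count (2/3)N^3 + 2N^2 for solving a dense N x N system."""
    return (2 * n**3) / 3 + 2 * n**2


def seconds_at(flops, rate):
    """Time to execute `flops` operations at `rate` operations per second."""
    return flops / rate


# Rates as quoted elsewhere in this deck. Treating them as sustained
# rates is an optimistic assumption made only for this estimate.
CRAY_2 = 3.9e9      # 3.9 gigaflops
JAGUAR = 1.75e15    # 1.75 petaflops

if __name__ == "__main__":
    n = 100_000  # hypothetical problem size
    work = linpack_flops(n)
    for name, rate in [("Cray 2", CRAY_2), ("Cray Jaguar", JAGUAR)]:
        print(f"{name}: {seconds_at(work, rate):.3g} s for N = {n}")
```

Because the cubic term dominates, doubling N multiplies the work by roughly eight, which is why benchmark problem sizes grow with machine size.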

Parallel Processing 5

Example LU Decomposition

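• Solve the following linear system

\[
\begin{aligned}
y + z &= 1 \\
x + z &= 1 \\
x + y &= 1
\end{aligned}
\]

• Find LU decomposition A = PLU

\[
A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}
\]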
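The example can be worked through in code. The following is a minimal sketch of Gaussian elimination with partial pivoting, not the course's own implementation. The names `plu` and `solve` are hypothetical, and the permutation is returned as a row order `perm` rather than as a matrix P: the rows of A taken in the order `perm` equal the product LU. Exact fractions are used so the result can be checked by hand.

```python
from fractions import Fraction


def plu(a):
    """Return (perm, L, U) such that row i of L*U equals row perm[i] of `a`."""
    n = len(a)
    u = [[Fraction(v) for v in row] for row in a]
    l = [[Fraction(0)] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[p][k] == 0:
            raise ValueError("matrix is singular")
        u[k], u[p] = u[p], u[k]
        l[k], l[p] = l[p], l[k]
        perm[k], perm[p] = perm[p], perm[k]
        # Eliminate column k below the pivot, recording the multipliers in L.
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]
            l[i][k] = m
            for j in range(k, n):
                u[i][j] -= m * u[k][j]
    for i in range(n):
        l[i][i] = Fraction(1)  # unit diagonal
    return perm, l, u


def solve(a, b):
    """Solve a*x = b using the factorization from `plu`."""
    perm, l, u = plu(a)
    n = len(a)
    y = [Fraction(0)] * n
    for i in range(n):  # forward substitution on the permuted right-hand side
        y[i] = Fraction(b[perm[i]]) - sum(l[i][j] * y[j] for j in range(i))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (y[i] - sum(u[i][j] * x[j] for j in range(i + 1, n))) / u[i][i]
    return x


A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
x, y, z = solve(A, [1, 1, 1])
```

The zero in the top-left corner of A means elimination cannot start without a row swap, which is why the permutation is needed here. Substituting back into the three equations confirms the solution x = y = z = 1/2.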

Parallel Processing 6

Big Machines

Cray 2
DoE-Lawrence Livermore National Laboratory (1985)
3.9 gigaflops
8 processor vector machine

Cray XMP/4
DoE, LANL,… (1983)
941 megaflops
4 processor vector machine

Parallel Processing 7

Big Machines

Cray Jaguar
ORNL (2009)
1.75 petaflops
224,256 AMD Opteron cores

Tianhe-1A
NSC Tianjin, China (2010)
2.507 petaflops
14,336 Xeon X5670 processors
7,168 Nvidia Tesla M2050 GPUs

Parallel Processing 8

Need for Parallelism

Parallel Processing 9

Multicore

Intel Core i7

Parallel Processing 10

Multicore

IBM Blue Gene/L
2004-2007
478.2 teraflops
65,536 “compute nodes”

Cyclops64
80 gigaflops
80 cores @ 500 megahertz
multiply-accumulate
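The 80 gigaflops figure for Cyclops64 can be checked from the other numbers on the slide. The worked estimate below assumes that each core completes one multiply-accumulate per cycle and that a multiply-accumulate counts as two floating-point operations; both are assumptions for this check, not statements from the slide.

\[
\text{peak} \approx 80 \ \text{cores} \times 500 \times 10^{6} \ \tfrac{\text{cycles}}{\text{s}} \times 2 \ \tfrac{\text{flops}}{\text{cycle}} = 80 \times 10^{9} \ \tfrac{\text{flops}}{\text{s}} = 80 \ \text{gigaflops}
\]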

Parallel Processing 11

Multicore

Parallel Processing 12

Multicore

Parallel Processing 13

GPU

Nvidia GTX 480
1.34 teraflops
480 SP (700 MHz)
Fermi chip
3 billion transistors

Parallel Processing 14

Google Server

• 2003: 15,000 servers ranging from 533 MHz Intel Celeron to dual 1.4 GHz Intel Pentium III

• 2005: 200,000 servers

• 2006: upwards of servers

Drexel Machines

• Tux
• 5 nodes
– 4 Quad-Core AMD Opteron 8378 processors (2.4 GHz)
– 32 GB RAM

• Draco
• 20 nodes
– Dual Xeon Processor X5650 (2.66 GHz)
– 6 GTX 480
– 72 GB RAM
• 4 nodes
– 6 C2070 GPUs

Parallel Processing 15

Parallel Processing 16

Programming Challenge

• “But the primary challenge for an 80-core chip will be figuring out how to write software that can take advantage of all that horsepower.”

• Read more: http://news.cnet.com/Intel-shows-off-80-core-processor/21001006_36158181.html?tag=mncol#ixzz1AHCK1LEc

Parallel Processing 17

Basic Idea

• One way to solve a problem fast is to break the problem into pieces, and arrange for all of the pieces to be solved simultaneously.

• The more pieces, the faster the job goes, up to the point where the pieces become too small to make the effort of breaking up and distributing worth the bother.

• A “parallel program” is a program that uses the breaking up and handing-out approach to solve large or difficult problems.
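The break-up-and-hand-out approach can be sketched in a few lines. The example below splits a list into chunks, hands each chunk to a worker, and combines the partial results. The function names and the sum-of-squares task are hypothetical, chosen only to show the structure. Threads are used to keep the sketch short. In CPython, threads do not speed up CPU-bound work like this, so a real program would likely hand the chunks to processes or to separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, pieces):
    """Break `data` into at most `pieces` contiguous chunks of near-equal size."""
    size = max(1, -(-len(data) // pieces))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """The work done on one piece."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, pieces=4):
    """Hand each chunk to a worker, then combine the partial sums."""
    chunks = split(data, pieces)
    if not chunks:
        return 0
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = pool.map(sum_of_squares, chunks)
        return sum(partials)


if __name__ == "__main__":
    numbers = list(range(1_000_000))
    print(parallel_sum_of_squares(numbers, pieces=8))
```

Raising `pieces` shrinks each chunk but adds more hand-out and combine work, which is the trade-off described in the second bullet: past some point the overhead per piece outweighs the work it contains.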