Parallel & Distributed Computer Systems Dr. Mohammad Ansari
Example : parallelize a simple problem

Jul 04, 2015

Dr. Mohammad Ansari http://uqu.edu.sa/staff/ar/4300205
Page 1: Example : parallelize a simple problem

Parallel & Distributed

Computer Systems

Dr. Mohammad Ansari

Page 2: Example : parallelize a simple problem

Course Details

Delivery
◦ Lectures/discussions: English
◦ Assessments: English
◦ Ask questions in class if you don’t understand
◦ Email me after class if you do not want to ask in class
◦ DO NOT LEAVE QUESTIONS TILL THE DAY BEFORE THE EXAM!!!

Assessments (this may change)
◦ Homework (~1 per week): 10%
◦ Midterm: 20%
◦ 1 project + final exam OR 2 projects: 35% + 35%

Page 3: Example : parallelize a simple problem

Course Details

Textbook
◦ Principles of Parallel Programming, Lin & Snyder

Other sources of information:
◦ COMP 322, Rice University
◦ CS 194, UC Berkeley
◦ Cilk lectures, MIT

Many sources of information on the internet for writing parallelized code

Page 4: Example : parallelize a simple problem

Teaching Materials & Assignments

Everything is on Jusur
◦ Lectures
◦ Homeworks

Submit homework through Jusur

Homework is given out on Saturday

Homework is due the following Saturday

You lose 10% for each day it is late

Page 5: Example : parallelize a simple problem

Homework 1

First homework is available on Jusur
◦ Install Linux on your computer
  It is needed for future homework
  It is needed to access the supercomputers
◦ Check settings/hardware
  Submit pictures of your settings
  Submit a description of your processor
◦ Deadline: 27/03/1431 (submit on Jusur)

Page 6: Example : parallelize a simple problem

Cheating in Homework/Projects

Cheating
◦ If you cheat, you get zero
◦ If you help others cheat, you will also get zero
◦ Copy + paste from the Internet, e.g. Wikipedia or elsewhere, is also cheating (called plagiarism)
◦ You can read any source of information, but you must write answers in your own words
◦ If you have problems, please ask for help.

Page 7: Example : parallelize a simple problem

Outline

Previous lecture:
◦ Why study parallel computing?
◦ Topics covered in this course

This lecture:
◦ Example problem

Next week:
◦ Parallel processor architectures

Page 8: Example : parallelize a simple problem

Example Problem

We will parallelize a simple problem

Begin to explore some of the issues related to parallel programming, and the performance of parallel programs

Page 9: Example : parallelize a simple problem

Example Problem: Array Sum

Add all the numbers in a large array

It has 100 million elements

int size = 100000000;

int array[] = {7,3,15,10,13,18,6,4,…};

What code should we write for a sequential program?

Page 10: Example : parallelize a simple problem

Example Problem: Sequential

int sum = 0;

int i = 0;

for(i = 0; i < size; i++) {

sum += array[i]; //sum=sum+array[i];

}

Page 11: Example : parallelize a simple problem

Example Problem: Sequential

Page 12: Example : parallelize a simple problem

How Do We Parallelize?

Objective: Thinking about parallelism
◦ Multiple processors need something to do
  A program/software has to be split into parts
  Each part can be executed on a different processor.
◦ How do we improve performance over a single processor?
  If a problem takes 2 seconds on a single processor
  And we break it into two (equal) parts: 1 second for each part
  And we execute the two parts separately, but in parallel, on two processors, then we improve performance

Page 13: Example : parallelize a simple problem

How Do We Parallelize?

[Figure: timeline comparing sequential and parallel execution. Sequentially, CPU 0 runs Part 0 then Part 1, finishing at time 2. In parallel, CPU 0 runs Part 0 while CPU 1 runs Part 1, finishing at time 1.]

Page 14: Example : parallelize a simple problem

How Do We Start Parallelizing?

What parts can be done separately?
◦ What parts can we do on separate processors?
◦ Meaning: What parts have no data dependence
◦ Data dependence:
  The execution of an instruction (or line of code) is dependent on execution of a previous instruction (or line of code).
◦ Data independence:
  The execution of an instruction (or line of code) is not dependent on execution of a previous instruction (or line of code).

Page 15: Example : parallelize a simple problem

Example of Data Dependence

int x = 0;

int y = 5;

x = 3;

y = y + x; //Is this line dependent on the previous line?

Page 16: Example : parallelize a simple problem

Data Dependence & Parallelism

In a sequential program, data dependence does not matter: each instruction executes in sequence.
◦ Instructions execute one by one

In a parallel program, data independence allows parallel execution of instructions. Data dependence prevents parallel execution of instructions.
◦ Reduces parallel performance
◦ Reduces the number of processors that can be used

Page 17: Example : parallelize a simple problem

Why is Data Dependence Bad For Parallel Programs?

Does not allow correct parallel execution. Parallelizing the two dependent lines

x = 3;
y = y + x;

across two CPUs gives the wrong result:

CPU0      CPU1
x = 3;    y = y + x; //y = 5 (5 + 0): CPU1 read x before CPU0 wrote 3

Page 18: Example : parallelize a simple problem

Why is Data Dependence Bad For Parallel Programs?

Does not allow correct parallel execution. To get the right answer for

x = 3;
y = y + x;

CPU1 must WAIT for CPU0:

CPU0      CPU1
x = 3;    WAIT
          y = y + x; //y = 8 (5 + 3): correct, but CPU1 sat idle

Page 19: Example : parallelize a simple problem

Why is Data Dependence Bad For Parallel Programs?

Does not allow correct parallel execution. Dependent code effectively runs on one CPU:

CPU0
x = 3;
y = y + x; //y = 8 (5 + 3)

Page 20: Example : parallelize a simple problem

Example of Data Independence

int x = 0;

int y = 5;

x = 3;

y = y + 5; //Is this line dependent on the previous line?

Page 21: Example : parallelize a simple problem

Why is Data Independence Useful?

Allows correct parallel execution

CPU0      CPU1
x = 3;    y = y + 5; //y = 10 (5 + 5): neither line needs the other's result

Page 22: Example : parallelize a simple problem

Back to Array Sum Example

Does the code have data dependence?

int sum = 0;

for(int i = 0; i < size; i++) {

sum += array[i]; //sum=sum+array[i];

}

Page 23: Example : parallelize a simple problem

Back to Array Sum Example

Does the code have data dependence?

int sum = 0;

for(int i = 0; i < size; i++) {

sum += array[i]; //sum=sum+array[i];

}

Not so easy to see

Page 24: Example : parallelize a simple problem

Back to Array Sum Example

Let’s unroll the loop:

int sum = 0;
sum += array[0]; //sum = sum + array[0];
sum += array[1]; //sum = sum + array[1];
sum += array[2]; //sum = sum + array[2];
sum += array[3]; //sum = sum + array[3];
…

Now we can see dependence!

Page 25: Example : parallelize a simple problem

Example Problem: Sequential

Page 26: Example : parallelize a simple problem

Removing Dependencies

Sometimes this is possible.
◦ Dependencies discussed in detail later.

Tip: Can be useful to look at the problem being solved by the code, and not the code itself.

Page 27: Example : parallelize a simple problem

Break Sum into Pieces

[Figure: the array {7, 3, 1, 0, 2, 9, 5, 8, 3, 6} is split in half. P0 sums the first half into S0, P1 sums the second half into S1, and S0 + S1 gives SUM.]

Page 28: Example : parallelize a simple problem

Some Details…

A program executes inside a process

If we want to use multiple processors
◦ We need multiple processes
◦ One process for each processor (not a fixed rule)

Processes are big, heavyweight

Threads are lighter than processes
◦ But same strategy
◦ One thread for each processor (not a fixed rule)

We will talk about threads and processes later, if necessary

Page 29: Example : parallelize a simple problem

What Does the Code Look Like?

int numThreads = 2; //Assume one thread per core, and 2 cores
int sum = 0;
int i = 0;
int middleSum[numThreads]; //Assume initialized to zero
int threadSetSize = size/numThreads;

//Each thread will execute this code with a different threadID
for(i = threadID*threadSetSize; i < (threadID+1)*threadSetSize; i++)
{
    middleSum[threadID] += array[i];
}

//Only thread 0 will execute this code
if (threadID==0) {
    for(i = 0; i < numThreads; i++) {
        sum += middleSum[i];
    }
}

Page 30: Example : parallelize a simple problem

Load Balancing

Which processor is doing more work?

[Figure: the array {7, 3, 1, 0, 2, 9, 5, 8, 3, 6} split between P0 (first half → S0) and P1 (second half → S1); the final SUM = S0 + S1 is extra work for the processor that combines the partial sums.]

Page 31: Example : parallelize a simple problem

Load Balancing

[Figure: timeline comparing sequential and imbalanced parallel execution. Sequentially, P0 runs Part 0 then Part 1, finishing at time 2.0. With an uneven split, P0 finishes its part early but P1 runs until time 1.3, so the parallel runtime is 1.3 rather than the ideal 1.0.]

Page 32: Example : parallelize a simple problem

Example Problem: Array Sum

Parallelized code is more complex

Requires us to think differently about how to solve the problem
◦ Need to think about breaking it into parts
◦ Analyze data dependencies, remove if possible
◦ Need to load balance for better performance

Page 33: Example : parallelize a simple problem

Example Problem: Array Sum

However, the parallel code is broken
◦ Thread 0 adds all the middle sums.
◦ What if thread 0 finishes its own work, but other threads have not?

Page 34: Example : parallelize a simple problem

Synchronization

P0 will probably finish before P1

[Figure: the array split between P0 and P1 again; if P0 computes SUM = S0 + S1 as soon as it finishes S0, it may read S1 before P1 has finished computing it.]

Page 35: Example : parallelize a simple problem

How Can We Fix The Code to GUARANTEE It Works Correctly?

int numThreads = 2; //Assume one thread per core, and 2 cores
int sum = 0;
int i = 0;
int middleSum[numThreads]; //Assume initialized to zero
int threadSetSize = size/numThreads;

//Each thread will execute this code with a different threadID
for(i = threadID*threadSetSize; i < (threadID+1)*threadSetSize; i++)
{
    middleSum[threadID] += array[i];
}

//Only thread 0 will execute this code
if (threadID==0) {
    for(i = 0; i < numThreads; i++) {
        sum += middleSum[i];
    }
}

Page 36: Example : parallelize a simple problem

Synchronization

Sometimes we need to coordinate/organize threads

If we don’t, the code might calculate the wrong answer to the problem
◦ Can happen even if load balance is perfect

Synchronization is concerned with this coordination/organization

Page 37: Example : parallelize a simple problem

Code with Synchronization Fixed

int numThreads = 2; //Assume one thread per core, & 2 cores
int sum = 0;
int i = 0;
int middleSum[numThreads]; //Assume initialized to zero
int threadSetSize = size/numThreads;

//Each thread will execute this code with a different threadID
for(i = threadID*threadSetSize; i < (threadID+1)*threadSetSize; i++)
{
    middleSum[threadID] += array[i];
}

waitForAllThreads(); //Wait for all threads

//Only thread 0 will execute this code
if (threadID==0) {
    for(i = 0; i < numThreads; i++) {
        sum += middleSum[i];
    }
}

Page 38: Example : parallelize a simple problem

Synchronization

The example shows a barrier

This is one type of synchronization

Barriers require all threads to reach that point in the code, before any thread is allowed to continue

It is like a gate. All threads come to the gate, and then it opens.

Page 39: Example : parallelize a simple problem

Generalizing the Solution

We only looked at how to parallelize for 2 threads

But the code is more general
◦ Can use any number of threads
◦ Important that code is written this way
◦ We will look at this in more detail later

Page 40: Example : parallelize a simple problem

Parallel Program

Page 41: Example : parallelize a simple problem

Performance

Now the program is correct. Let’s look at performance.

[Figure: bar chart "Time on 2-core Processor" comparing runtime with 1 Thread, 2 Threads, and 4 Threads, on a vertical axis from 0 to 1.]

Page 42: Example : parallelize a simple problem

Performance

Two threads are not 2x faster. Why?
◦ The problem is called false sharing
◦ To understand this, we have to look at the computer architecture
◦ We will study this in the next lecture

Four threads are slower than two threads. Why?
◦ The processor only has two cores
◦ Four threads add scheduling overhead, wasting time

Page 43: Example : parallelize a simple problem

Summary

Used an example to start looking at how to parallelize code, and some of the main issues
◦ Data dependence
◦ Load balancing
◦ Synchronization

Each will be discussed in more detail in later lectures