Programming with OpenMP* Intel Software College

Dec 21, 2015


Programming with OpenMP*

Intel Software College


Copyright © 2008, Intel Corporation. All rights reserved.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Objectives

Upon completion of this module you will be able to use OpenMP to:

• implement data parallelism

• implement task parallelism


Agenda

What is OpenMP?

Parallel regions

Worksharing

Data environment

Synchronization

Curriculum Placement

Optional Advanced topics


What Is OpenMP?

Portable, shared-memory threading API

• Fortran, C, and C++

• Multi-vendor support for both Linux and Windows

Standardizes task & loop-level parallelism

Supports coarse-grained parallelism

Combines serial and parallel code in single source

Standardizes ~ 20 years of compiler-directed threading experience

http://www.openmp.org

Current spec is OpenMP 3.0: 318 pages (combined C/C++ and Fortran)


Programming Model

Fork-Join Parallelism:

• Master thread spawns a team of threads as needed

• Parallelism is added incrementally: that is, the sequential program evolves into a parallel program

[Figure: fork-join diagram in which the master thread forks a team of threads at each parallel region]
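The fork-join model can be sketched in a few lines of C. This is a minimal illustration, not part of the original deck: the helper name count_team_members and the MAX_THREADS limit are assumptions made for the example. The `#ifdef _OPENMP` guard lets the same source compile as plain serial code when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 64   /* hypothetical upper bound for this sketch */

/* Hypothetical helper: runs one parallel region and returns how many
   distinct threads (up to MAX_THREADS) took part in it. */
int count_team_members(void)
{
    int seen[MAX_THREADS] = {0};
    int total = 0;

    /* Fork: the master thread spawns a team of threads here. */
    #pragma omp parallel
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        if (id < MAX_THREADS)
            seen[id] = 1;   /* each thread writes only its own slot */
    }
    /* Join: only the master thread continues past this point. */

    for (int i = 0; i < MAX_THREADS; i++)
        total += seen[i];
    return total;
}
```

Compiled without OpenMP support the function returns 1; with it, the result depends on the team size chosen by the runtime.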


A Few Syntax Details to Get Started

Most of the constructs in OpenMP are compiler directives or pragmas

• For C and C++, the pragmas take the form:
#pragma omp construct [clause [clause]…]

• For Fortran, the directives take one of the forms:
C$OMP construct [clause [clause]…]
!$OMP construct [clause [clause]…]
*$OMP construct [clause [clause]…]

Header file or Fortran 90 module:
#include "omp.h"
use omp_lib


Agenda

What is OpenMP?

Parallel regions

Worksharing

Data environment

Synchronization

Curriculum Placement

Optional Advanced topics


Parallel Region & Structured Blocks (C/C++)

Most OpenMP constructs apply to structured blocks

• Structured block: a block with one point of entry at the top and one point of exit at the bottom

• The only "branches" allowed are STOP statements in Fortran and exit() in C/C++

A structured block:

#pragma omp parallel
{
  int id = omp_get_thread_num();

more:
  res[id] = do_big_job(id);

  if (conv(res[id])) goto more;
}
printf("All done\n");

Not a structured block:

if (go_now()) goto more;
#pragma omp parallel
{
  int id = omp_get_thread_num();
more:
  res[id] = do_big_job(id);
  if (conv(res[id])) goto done;
  goto more;
}
done:
if (!really_done()) goto more;


Activity 1: Hello Worlds

Modify the “Hello, Worlds” serial code to run multithreaded using OpenMP*


Agenda

What is OpenMP?

Parallel regions

Worksharing – Parallel For

Data environment

Synchronization

Curriculum Placement

Optional Advanced topics


Worksharing

Worksharing is the general term used in OpenMP to describe distribution of work across threads.

Three examples of worksharing in OpenMP are:

• omp for construct

• omp sections construct

• omp task construct

Automatically divides work among threads


omp for Construct

Threads are assigned an independent set of iterations

Threads must wait at the end of work-sharing construct

[Figure: iterations i = 1 through i = 12 divided among the threads created at #pragma omp parallel by #pragma omp for, with an implicit barrier at the end]

// assume N=12
#pragma omp parallel
#pragma omp for
  for (i = 1; i < N+1; i++)
    c[i] = a[i] + b[i];
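A compilable version of this loop might look like the sketch below. The function name vec_add and its parameters are assumptions for illustration, and the loop is indexed from 0 rather than from 1 as on the slide. Each iteration touches a different element of c, so the iterations are independent and can be handed to different threads.

```c
/* Hypothetical helper: stores a[i] + b[i] in c[i] for i in [0, n). */
void vec_add(const float *a, const float *b, float *c, int n)
{
    int i;   /* the loop variable of an omp for loop is made private */

    #pragma omp parallel for
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
    /* implicit barrier: every element of c is written before returning */
}
```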


Combining constructs

These two code segments are equivalent

#pragma omp parallel
{
  #pragma omp for
  for (i=0; i< MAX; i++) {
    res[i] = huge();
  }
}

#pragma omp parallel for
for (i=0; i< MAX; i++) {
  res[i] = huge();
}


The Private Clause

Reproduces the variable for each task

• Variables are un-initialized; C++ object is default constructed

• Any value external to the parallel region is undefined

// a and b are assumed to be arrays declared at file scope
void work(float* c, int N) {
  float x, y; int i;
  #pragma omp parallel for private(x,y)
  for(i=0; i<N; i++) {
    x = a[i]; y = b[i]; c[i] = x + y;
  }
}


Activity 2 – Parallel Mandelbrot

Objective: create a parallel version of Mandelbrot. Modify the code to add OpenMP worksharing clauses to parallelize the computation of Mandelbrot.

Follow the activity called Mandelbrot in the student lab doc


The schedule clause

The schedule clause affects how loop iterations are mapped onto threads

schedule(static [,chunk])
• Blocks of iterations of size “chunk” are assigned to threads
• Round robin distribution
• Low overhead, may cause load imbalance

schedule(dynamic[,chunk])
• Threads grab “chunk” iterations
• When done with iterations, thread requests next set
• Higher threading overhead, can reduce load imbalance

schedule(guided[,chunk])
• Dynamic schedule starting with large block
• Size of the blocks shrinks; no smaller than “chunk”
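As a small sketch of when dynamic scheduling may help, the loop below has iterations whose cost grows with i, so equal-sized static blocks would leave some threads with much more work than others. The names triangle_work and fill_triangles, and the chunk size of 4, are assumptions chosen for the example.

```c
/* Hypothetical workload: iteration i costs roughly i units of work. */
static long triangle_work(int i)
{
    long acc = 0;
    for (int k = 0; k <= i; k++)
        acc += k;
    return acc;   /* equals i * (i + 1) / 2 */
}

/* Fills out[i] with triangle_work(i) for i in [0, n). */
void fill_triangles(long *out, int n)
{
    int i;

    /* dynamic: a thread that finishes its 4 iterations requests the
       next available chunk instead of waiting for a fixed share */
    #pragma omp parallel for schedule(dynamic, 4)
    for (i = 0; i < n; i++)
        out[i] = triangle_work(i);
}
```

Swapping the clause for schedule(static) or schedule(guided, 4) changes only how iterations are handed out; the values written to out stay the same.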


Schedule Clause Example

#pragma omp parallel for schedule (static, 8)
for( int i = start; i <= end; i += 2 )
{
  if ( TestForPrime(i) ) gPrimesFound++;
}

Iterations are divided into chunks of 8

• If start = 3, then first chunk is i={3,5,7,9,11,13,15,17}
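The chunk arithmetic can be worked through in code. The sketch below is an illustration, not OpenMP API: the helper names static_owner and iteration_value are made up for this example. It relies on chunks of a static schedule being dealt to threads round robin in thread-number order.

```c
/* For schedule(static, chunk): the k-th iteration (k counted from 0)
   falls in chunk number k / chunk, and chunks are dealt round robin,
   so it is executed by thread (k / chunk) % nthreads. */
int static_owner(int k, int chunk, int nthreads)
{
    return (k / chunk) % nthreads;
}

/* Value of i in the k-th iteration of: for (i = start; ...; i += step) */
int iteration_value(int start, int step, int k)
{
    return start + k * step;
}
```

With start = 3 and a step of 2, iterations 0 through 7 carry i = 3 through i = 17 and all belong to thread 0; iteration 8 (i = 19) starts the chunk for thread 1.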


Activity 3 – Mandelbrot Scheduling

Objective: create a parallel version of Mandelbrot that uses OpenMP dynamic scheduling

Follow the activity called Mandelbrot Scheduling in the student lab doc


Agenda

What is OpenMP?

Parallel regions

Worksharing – Parallel Sections

Data environment

Synchronization

Curriculum Placement

Optional Advanced topics


Task Decomposition

a = alice();
b = bob();
s = boss(a, b);
c = cy();
printf ("%6.2f\n", bigboss(s,c));

alice, bob, and cy can be computed in parallel

[Figure: dependency graph in which alice and bob feed boss, and boss and cy feed bigboss]


omp sections

#pragma omp sections

Must be inside a parallel region

Precedes a code block containing N blocks of code that may be executed concurrently by N threads

Encompasses each omp section

#pragma omp section

Precedes each block of code within the encompassing block described above

May be omitted for first parallel section after the parallel sections pragma

Enclosed program segments are distributed for parallel execution among available threads


Functional Level Parallelism w sections

#pragma omp parallel sections
{
#pragma omp section /* Optional */
  a = alice();
#pragma omp section
  b = bob();
#pragma omp section
  c = cy();
}

s = boss(a, b);
printf ("%6.2f\n", bigboss(s,c));
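To make the sections example compilable, the sketch below supplies stand-in bodies for the five functions. Those bodies and the wrapper name run_pipeline are assumptions for illustration only; the slide does not say what the functions compute.

```c
/* Hypothetical stand-ins for the functions named on the slide. */
static float alice(void) { return 1.0f; }
static float bob(void)   { return 2.0f; }
static float cy(void)    { return 3.0f; }
static float boss(float a, float b)    { return a + b; }
static float bigboss(float s, float c) { return s * c; }

float run_pipeline(void)
{
    float a, b, c, s;   /* declared outside the region, so shared */

    #pragma omp parallel sections
    {
        #pragma omp section
        a = alice();
        #pragma omp section
        b = bob();
        #pragma omp section
        c = cy();
    }   /* implicit barrier: a, b, and c are all set past this point */

    s = boss(a, b);          /* needs both a and b, so it runs after */
    return bigboss(s, c);
}
```

Each section writes a different variable, so the three sections do not interfere with one another.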


Advantage of Parallel Sections

Independent sections of code can execute concurrently – reduce execution time

Serial Parallel

#pragma omp parallel sections

{

#pragma omp section

phase1();

#pragma omp section

phase2();

#pragma omp section

phase3();

}


Agenda

What is OpenMP?

Parallel regions

Worksharing – Tasks

Data environment

Synchronization

Curriculum Placement

Optional Advanced topics


New Addition to OpenMP

Tasks – Main change for OpenMP 3.0

• Allows parallelization of irregular problems
  • unbounded loops
  • recursive algorithms
  • producer/consumer


What are tasks?

Tasks are independent units of work

• Threads are assigned to perform the work of each task

• Tasks may be deferred

• Tasks may be executed immediately

The runtime system decides which of the above

• Tasks are composed of:
  • code to execute
  • data environment
  • internal control variables (ICV)

Serial Parallel


Simple Task Example

#pragma omp parallel
// assume 8 threads
{
  #pragma omp single private(p)
  {
    …
    while (p) {
      #pragma omp task
      {
        processwork(p);
      }
      p = p->next;
    }
  }
}

A pool of 8 threads is created at the parallel construct

One thread gets to execute the while loop

The single “while loop” thread creates a task for each instance of processwork()


Task Construct – Explicit Task View

• A team of threads is created at the omp parallel construct

• A single thread is chosen to execute the while loop; let's call this thread “L”

• Thread L operates the while loop, creates tasks, and fetches next pointers

• Each time L crosses the omp task construct it generates a new task, which is then assigned to a thread

• Each task runs on one of the team's threads

• All tasks complete at the barrier at the end of the parallel region’s single construct

#pragma omp parallel
{
  #pragma omp single
  { // block 1
    node * p = head;
    while (p) { // block 2
      #pragma omp task
      process(p);
      p = p->next; // block 3
    }
  }
}
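A self-contained version of this pattern might look like the sketch below. The node layout, the body of process (squaring a value), and the wrapper process_list are assumptions added so the example compiles; only the parallel / single / task structure comes from the slide.

```c
#include <stddef.h>

/* Hypothetical list node for this sketch. */
typedef struct node {
    int value;
    int result;
    struct node *next;
} node;

/* Hypothetical per-node work: each call touches only its own node. */
static void process(node *p)
{
    p->result = p->value * p->value;
}

void process_list(node *head)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            node *p = head;
            while (p) {
                /* p is private to the thread running the single block,
                   so each task gets its own copy (firstprivate) */
                #pragma omp task
                process(p);
                p = p->next;
            }
        }   /* implicit barrier: all generated tasks have completed */
    }
}
```

Because the tasks write to different nodes, the results do not depend on which thread runs which task.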


Why are tasks useful?

#pragma omp parallel
{
  #pragma omp single
  { // block 1
    node * p = head;
    while (p) { // block 2
      #pragma omp task
      process(p);
      p = p->next; // block 3
    }
  }
}

Have potential to parallelize irregular patterns and recursive function calls

[Figure: two timelines. Single threaded: Block 1, then Block 2 Tasks 1 to 3 alternating with Block 3. Four threads (Thr1 to Thr4): Block 1 and Block 3 run on one thread while the Block 2 tasks run concurrently on the others. Labels: Time, Time Saved, Idle]


Activity 4 – Linked List using Tasks

while(p != NULL){
  do_work(p->data);
  p = p->next;
}

Objective: Modify the linked list pointer chasing code to implement tasks to parallelize the application

Follow the Linked List task activity called LinkedListTask in the student lab doc

Note: A companion lab, LinkedListWorkSharing, uses worksharing to solve the same problem

There are also task labs on recursive functions, for example quicksort and fibonacci


When are tasks guaranteed to be complete?

Tasks are guaranteed to be complete:

• At thread or task barriers

• At the directive: #pragma omp barrier

• At the directive: #pragma omp taskwait


Task Completion Example

#pragma omp parallel
{
  #pragma omp task
  foo();               // multiple foo tasks created here, one for each thread
  #pragma omp barrier  // all foo tasks guaranteed to be completed here
  #pragma omp single
  {
    #pragma omp task
    bar();             // one bar task created here
  }                    // bar task guaranteed to be completed here
}


Agenda

What is OpenMP?

Parallel regions

Worksharing

Data environment

Synchronization

Curriculum Placement

Optional Advanced topics


Data Scoping – What’s shared

OpenMP uses a shared-memory programming model

Shared variable: a variable whose name provides access to the same block of storage for each task region

• Shared clause can be used to make items explicitly shared

• Global variables are shared among tasks
  • C/C++: file scope variables, namespace scope variables, static variables, variables with const-qualified type having no mutable member, and static variables declared in a scope inside the construct
  • Fortran: COMMON blocks, SAVE variables, MODULE variables


Data Scoping – What’s private

But, not everything is shared...

• Examples of implicitly determined private variables:
  • Stack (local) variables in functions called from parallel regions are PRIVATE
  • Automatic variables within a statement block are PRIVATE
  • Loop iteration variables are private
  • Implicitly declared private variables within tasks will be treated as firstprivate

Firstprivate clause declares one or more list items to be private to a task, and initializes each of them with the value the original item has when the construct is encountered


A Data Environment Example

Which variables are shared and which variables are private?

float A[10];

int main()
{
  int index[10];
#pragma omp parallel
  {
    Work(index);
  }
  printf("%d\n", index[1]);
}

extern float A[10];

void Work(int *index)
{
  float temp[10];
  static int count;
  <...>
}

[Figure: A, index, and count held in shared storage; a separate copy of temp for each thread]

A, index, and count are shared by all threads, but temp is local to each thread


Data Scoping Issue – fib example

int fib ( int n )
{
  int x, y;
  if ( n < 2 ) return n;
#pragma omp task
  x = fib(n-1);
#pragma omp task
  y = fib(n-2);
#pragma omp taskwait
  return x+y;
}

n is a private variable; it will be treated as firstprivate in both tasks

x is a private variable
y is a private variable

What’s wrong here?

Can’t use private variables outside of tasks


Data Scoping Example – fib example

int fib ( int n )
{
  int x, y;
  if ( n < 2 ) return n;
#pragma omp task shared(x)
  x = fib(n-1);
#pragma omp task shared(y)
  y = fib(n-2);
#pragma omp taskwait
  return x+y;
}

n is firstprivate in both tasks

x & y are shared

Good solution: we need both values to compute the sum
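The slide shows only the recursive function. One way to call it is sketched below: the tasks need a team of threads to run on, so the first call is made from a single construct inside a parallel region. The split into fib_task and a driver named fib is an assumption made for this example.

```c
/* Recursive part, as on the slide: x and y are shared so the values
   written inside the tasks are visible after the taskwait. */
static int fib_task(int n)
{
    int x, y;
    if (n < 2) return n;
    #pragma omp task shared(x)
    x = fib_task(n - 1);
    #pragma omp task shared(y)
    y = fib_task(n - 2);
    #pragma omp taskwait    /* wait for both child tasks */
    return x + y;
}

/* Hypothetical driver: one thread starts the recursion while the rest
   of the team picks up the generated tasks. */
int fib(int n)
{
    int result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

Creating a task for every call is costly for small n; this sketch favors clarity over speed.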


Data Scoping Issue – List Traversal

List ml; //my_list
Element *e;
#pragma omp parallel
#pragma omp single
{
  for(e=ml->first;e;e=e->next)
#pragma omp task
    process(e);
}

What’s wrong here?

Possible data race! Shared variable e updated by multiple tasks


Data Scoping Example – List Traversal

List ml; //my_list
Element *e;
#pragma omp parallel
#pragma omp single
{
  for(e=ml->first;e;e=e->next)
#pragma omp task firstprivate(e)
    process(e);
}

Good solution – e is firstprivate


Data Scoping Example – List Traversal

List ml; //my_list
Element *e;
#pragma omp parallel
#pragma omp single private(e)
{
  for(e=ml->first;e;e=e->next)
#pragma omp task
    process(e);
}

Good solution – e is firstprivate


Data Scoping Example – Multiple List Traversal

List ml; //my_list
#pragma omp parallel
#pragma omp for
for (int i=0; i<N; i++)
{
  Element *e;
  for(e=ml->first;e;e=e->next)
#pragma omp task
    process(e);
}

Good solution – e is firstprivate


Agenda

What is OpenMP?

Parallel regions

Worksharing

Data environment

Synchronization

Curriculum Placement

Optional Advanced topics


Example: Dot Product

    float dot_prod(float* a, float* b, int N)
    {
        float sum = 0.0;
        #pragma omp parallel for shared(sum)
        for (int i = 0; i < N; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

What is wrong?

Page 45: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Race Condition

A race condition is nondeterministic behavior caused by the times at which two or more threads access a shared variable

For example, suppose both Thread A and Thread B are executing the statement

area += 4.0 / (1.0 + x*x);

Page 46: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Two Timings

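Timing 1: Thread B reads area after Thread A has stored its update

    Value of area    Thread A    Thread B
    11.667
                     +3.765
    15.432
                                 15.432
                                 +3.563
    18.995

Timing 2: Thread B reads area before Thread A has stored its update

    Value of area    Thread A    Thread B
    11.667
                     +3.765
                                 11.667
    15.432
                                 +3.563
    15.230

The order of thread execution causes nondeterministic behavior in a data race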
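The two timings can be replayed deterministically, without any threads, by writing out each thread's load, add, and store as separate steps. This is only an illustrative sketch; the function names timing_one and timing_two are made up here.

```c
/* Each "thread" performs area += term as three steps:
   load area into a temporary, add the term, store the result. */

/* Timing 1: Thread B loads area after Thread A has stored. */
double timing_one(double area, double term_a, double term_b) {
    double tmp_a = area;    /* A loads */
    tmp_a += term_a;        /* A adds */
    area = tmp_a;           /* A stores */
    double tmp_b = area;    /* B loads A's result */
    tmp_b += term_b;        /* B adds */
    area = tmp_b;           /* B stores: both updates are kept */
    return area;
}

/* Timing 2: Thread B loads before Thread A stores,
   so A's update is overwritten and lost. */
double timing_two(double area, double term_a, double term_b) {
    double tmp_a = area;    /* A loads */
    tmp_a += term_a;        /* A adds */
    double tmp_b = area;    /* B loads the stale value */
    area = tmp_a;           /* A stores */
    tmp_b += term_b;        /* B adds to the stale value */
    area = tmp_b;           /* B stores: A's update is lost */
    return area;
}
```

Called with the slide's numbers (11.667, 3.765, 3.563), the first function gives about 18.995 and the second about 15.230.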

Page 47: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Protect Shared Data

Must protect access to shared, modifiable data

    float dot_prod(float* a, float* b, int N)
    {
        float sum = 0.0;
        #pragma omp parallel for shared(sum)
        for (int i = 0; i < N; i++) {
            #pragma omp critical
            sum += a[i] * b[i];
        }
        return sum;
    }

Page 48: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

OpenMP* Critical Construct

#pragma omp critical [(lock_name)]

Defines a critical region on a structured block

    float RES;
    #pragma omp parallel
    {
        float B;
        #pragma omp for
        for (int i = 0; i < niters; i++) {
            B = big_job(i);
            #pragma omp critical (RES_lock)
            consum(B, RES);
        }
    }

Threads wait their turn: only one at a time calls consum(), thereby protecting RES from race conditions

Naming the critical construct RES_lock is optional

Good Practice – Name all critical sections

Page 49: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

OpenMP* Reduction Clause

reduction (op : list)

The variables in “list” must be shared in the enclosing parallel region

Inside parallel or work-sharing construct:
• A PRIVATE copy of each list variable is created and initialized depending on the “op”
• These copies are updated locally by threads
• At end of construct, local copies are combined through “op” into a single value and combined with the value in the original SHARED variable

Page 50: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Reduction Example

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
        sum += a[i] * b[i];
    }

• Local copy of sum for each thread
• All local copies of sum added together and stored in “global” variable
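Putting the reduction clause back into the dot product function from the earlier slides gives a complete version without the race and without a critical section. A minimal sketch; only the const qualifiers are added relative to the slide's signature.

```c
/* Dot product with a reduction: each thread accumulates into a
   private copy of sum (initialized to 0 for +), and the private
   copies are added into the original sum at the end of the loop. */
float dot_prod(const float *a, const float *b, int N) {
    float sum = 0.0f;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because floating-point addition is not exactly associative, the result for large inputs may differ in the last bits from a serial run, depending on how the iterations are divided.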

Page 51: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

C/C++ Reduction Operations

A range of associative operators can be used with reduction

Initial values are the ones that make sense mathematically

    Operator    Initial Value
    +           0
    *           1
    -           0
    ^           0
    &           ~0
    |           0
    &&          1
    ||          0
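Two of the less common operators from the table, sketched on small helper functions. The function names all_positive and product are invented for this example.

```c
/* Returns 1 if every element is positive, using the && reduction.
   Each private copy of ok starts at 1, the initial value for &&. */
int all_positive(const int *v, int n) {
    int ok = 1;
    #pragma omp parallel for reduction(&&:ok)
    for (int i = 0; i < n; i++) {
        ok = ok && (v[i] > 0);
    }
    return ok;
}

/* Product of all elements, using the * reduction.
   Each private copy of p starts at 1, the initial value for *. */
long product(const int *v, int n) {
    long p = 1;
    #pragma omp parallel for reduction(*:p)
    for (int i = 0; i < n; i++) {
        p *= v[i];
    }
    return p;
}
```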

Page 52: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Numerical Integration Example

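f(x) = \frac{4.0}{1+x^2}

\int_0^1 \frac{4.0}{1+x^2}\,dx = \pi

[Figure: plot of f(x) against X from 0.0 to 1.0, with the vertical axis marked at 2.0 and 4.0]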

    #include <stdio.h>

    static long num_steps = 100000;
    double step, pi;

    int main(void)
    {
        int i;
        double x, sum = 0.0;

        step = 1.0 / (double) num_steps;
        for (i = 0; i < num_steps; i++) {
            x = (i + 0.5) * step;
            sum = sum + 4.0 / (1.0 + x*x);
        }
        pi = step * sum;
        printf("Pi = %f\n", pi);
        return 0;
    }

Page 53: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Activity 5 - Computing Pi

Parallelize the numerical integration code using OpenMP

What variables can be shared?

What variables need to be private?

What variables should be set up for reductions?

    #include <stdio.h>

    static long num_steps = 100000;
    double step, pi;

    int main(void)
    {
        int i;
        double x, sum = 0.0;

        step = 1.0 / (double) num_steps;
        for (i = 0; i < num_steps; i++) {
            x = (i + 0.5) * step;
            sum = sum + 4.0 / (1.0 + x*x);
        }
        pi = step * sum;
        printf("Pi = %f\n", pi);
        return 0;
    }
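One possible answer to the three questions, as a sketch rather than the official solution: step can be shared because it is only read, x must be private, and sum is the reduction variable. Wrapping the loop in a function named compute_pi is a choice made for this example.

```c
/* Midpoint-rule estimate of the integral of 4/(1+x^2) over [0,1]. */
double compute_pi(long num_steps) {
    double step = 1.0 / (double) num_steps;  /* shared, read-only */
    double sum = 0.0;                        /* reduction variable */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        /* x is declared inside the loop, so it is private. */
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

With x declared outside the loop, as in the activity code, the same effect would need an explicit private(x) clause.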

Page 54: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Single Construct

Denotes block of code to be executed by only one thread
• First thread to arrive is chosen

Implicit barrier at end

    #pragma omp parallel
    {
        DoManyThings();
        #pragma omp single
        {
            ExchangeBoundaries();
        } // threads wait here for single
        DoManyMoreThings();
    }

Page 55: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Master Construct

Denotes block of code to be executed only by the master thread

No implicit barrier at end

    #pragma omp parallel
    {
        DoManyThings();
        #pragma omp master
        { // if not master skip to next stmt
            ExchangeBoundaries();
        }
        DoManyMoreThings();
    }

Page 56: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Implicit Barriers

Several OpenMP* constructs have implicit barriers
• parallel – necessary barrier – cannot be removed
• for
• single

Unnecessary barriers hurt performance and can be removed with the nowait clause
• The nowait clause is applicable to:
  • for construct
  • single construct

Page 57: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Nowait Clause

Use when threads unnecessarily wait between independent computations

    #pragma omp single nowait
    { [...] }

    #pragma omp for nowait
    for (...)
    { ... };

    #pragma omp for schedule(dynamic,1) nowait
    for (int i = 0; i < n; i++)
        a[i] = bigFunc1(i);

    #pragma omp for schedule(dynamic,1)
    for (int j = 0; j < m; j++)
        b[j] = bigFunc2(j);
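The two-loop fragment needs an enclosing parallel region to be complete. A sketch of how it might look inside one function; the bodies of bigFunc1 and bigFunc2 are trivial stand-ins and fill_both is a hypothetical name.

```c
/* Hypothetical stand-ins for the slide's bigFunc1 and bigFunc2. */
static double bigFunc1(int i) { return 2.0 * i; }
static double bigFunc2(int j) { return j + 0.5; }

/* Fill a[0..n) and b[0..m). The loops write to different arrays,
   so a thread that finishes its share of the first loop can start
   on the second loop without waiting for the others. */
void fill_both(double *a, int n, double *b, int m) {
    #pragma omp parallel
    {
        #pragma omp for schedule(dynamic, 1) nowait
        for (int i = 0; i < n; i++)
            a[i] = bigFunc1(i);

        /* The implicit barrier here is kept, so both arrays are
           complete when the parallel region ends. */
        #pragma omp for schedule(dynamic, 1)
        for (int j = 0; j < m; j++)
            b[j] = bigFunc2(j);
    }
}
```

Removing the barrier is only safe because the second loop never reads a; if it did, nowait would introduce a race.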

Page 58: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Barrier Construct

Explicit barrier synchronization

Each thread waits until all threads arrive

    #pragma omp parallel shared (A, B, C)
    {
        DoSomeWork(A, B);
        printf("Processed A into B\n");
        #pragma omp barrier
        DoSomeWork(B, C);
        printf("Processed B into C\n");
    }

Page 59: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Atomic Construct

Special case of a critical section

Applies only to simple update of memory location

    #pragma omp parallel for shared(x, y, index, n)
    for (i = 0; i < n; i++) {
        #pragma omp atomic
        x[index[i]] += work1(i);
        y[i] += work2(i);
    }
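A histogram is a typical case for atomic: the update is a simple increment of one memory location, and which location is only known at run time. A small sketch with invented names (histogram, bin_of, counts):

```c
/* Count how many values fall into each bin. Different iterations
   may hit the same bin, so the increment must be atomic; updates
   to different bins can still proceed in parallel. */
void histogram(const int *bin_of, int n, int *counts) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        counts[bin_of[i]] += 1;
    }
}
```

The caller is expected to zero counts first and to make sure every bin_of[i] is a valid index.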

Page 60: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Agenda

What is OpenMP?

Parallel regions

Worksharing

Data environment

Synchronization

Curriculum Placement

Optional Advanced topics

Page 61: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

OpenMP materials suggested for:

This material is suggested for Faculty of:
• Algorithms or Data Structure course
• C/C++/Fortran language courses
• Applied Math, Applied Science, Engineering programming courses where loops access arrays in C/C++ or Fortran

Page 62: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

What OpenMP essentials should be covered in a university course?

Discussion of what OpenMP is and what it can do

Parallel regions

Work sharing – parallel for

Work queuing – tasks

Shared & private variables

Protection of shared data, critical sections, locks etc

Reduction clause

Optional topics
• nowait, atomic, firstprivate, lastprivate,…
• API – omp_set_num_threads(), omp_get_num_threads()

Page 63: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Agenda

What is OpenMP?

Parallel regions

Worksharing

Data environment

Synchronization

Curriculum Placement

Optional Advanced topics

Page 64: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Advanced Concepts

Page 65: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Parallel Construct – Implicit Task View

• Tasks are created in OpenMP even without an explicit task directive.
• Let's look at how tasks are created implicitly for the code snippet below
  • Thread encountering parallel construct packages up a set of implicit tasks
  • Team of threads is created.
  • Each thread in team is assigned to one of the tasks (and tied to it).
  • Barrier holds original master thread until all implicit tasks are finished.

    #pragma omp parallel
    {
        int mydata;
        code…
    }

[Figure: the parallel construct is packaged into three implicit tasks, each with its own mydata and copy of the code, assigned to Thread 1, Thread 2, and Thread 3 and followed by a Barrier]

Page 66: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Task Construct

#pragma omp task [clause[[,]clause] ...]
    structured-block

where clause can be one of:

    if (expression)
    untied
    shared (list)
    private (list)
    firstprivate (list)
    default( shared | none )

Page 67: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Tied & Untied Tasks

• Tied Tasks:
  • A tied task gets a thread assigned to it at its first execution and the same thread services the task for its lifetime
  • A thread executing a tied task can be suspended and sent off to execute some other task, but eventually the same thread will return to resume execution of its original tied task
  • Tasks are tied unless explicitly declared untied

• Untied Tasks:
  • An untied task has no long-term association with any given thread. Any thread not otherwise occupied is free to execute an untied task. The thread assigned to execute an untied task may only change at a "task scheduling point".
  • An untied task is created by appending “untied” to the task directive
  • Example: #pragma omp task untied

Page 68: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Task switching

Task switching: the act of a thread switching from the execution of one task to another task.

The purpose of task switching is to distribute threads among the unassigned tasks in the team to avoid piling up long queues of unassigned tasks

Task switching, for tied tasks, can only occur at task scheduling points located within the following constructs
• encountered task constructs
• encountered taskwait constructs
• encountered barrier directives
• implicit barrier regions
• at the end of the tied task region

Untied tasks have implementation dependent scheduling points

Page 69: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Task switching example

The thread executing the “for loop”, AKA the generating task, generates many tasks in a short time so...

The SINGLE generating task will have to suspend for a while when the “task pool” fills up
• Task switching is invoked to start draining the “pool”
• When the “pool” is sufficiently drained, the single task can begin generating more tasks again

    int exp; // exp either T or F

    #pragma omp single
    {
        for (i = 0; i < ONEZILLION; i++)
            #pragma omp task
            process(item[i]);
    }

Page 70: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Optional foil - OpenMP* API

Get the thread number within a team

int omp_get_thread_num(void);

Get the number of threads in a team

int omp_get_num_threads(void);

Usually not needed for OpenMP codes
• Can lead to code not being serially consistent
• Does have specific uses (debugging)
• Must include a header file

#include <omp.h>

Page 71: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Optional foil - Monte Carlo Pi

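\frac{\#\ \text{of darts hitting circle}}{\#\ \text{of darts in square}} = \frac{\tfrac{1}{4}\pi r^2}{r^2} = \frac{1}{4}\pi

\pi = 4\,\frac{\#\ \text{of darts hitting circle}}{\#\ \text{of darts in square}}

    loop 1 to MAX
        x.coor = (random#)
        y.coor = (random#)
        dist = sqrt(x^2 + y^2)
        if (dist <= 1)
            hits = hits + 1
    pi = 4 * hits/MAX

[Figure: quarter circle of radius r inside a square of side r]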

Page 72: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Optional foil - Making Monte Carlo’s Parallel

    hits = 0
    call SEED48(1)
    DO I = 1, max
        x = DRAND48()
        y = DRAND48()
        IF (SQRT(x*x + y*y) .LT. 1) THEN
            hits = hits+1
        ENDIF
    END DO
    pi = REAL(hits)/REAL(max) * 4.0

What is the challenge here?
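The usual difficulty with code like this is that a generator such as DRAND48 keeps one hidden state shared by all threads. One way around it, sketched here in C and not the Intel MKL VSL approach, is a stateless generator that derives each random number from the loop index, so threads share nothing. The mixing constants are those of SplitMix64; the names mix64, to_unit, and monte_carlo_pi are made up for this example.

```c
#include <stdint.h>

/* SplitMix64-style mixing of a counter. The function is stateless,
   so no generator state is shared between threads. */
static uint64_t mix64(uint64_t counter) {
    uint64_t z = (counter + 1) * 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Map the top 53 of 64 random bits to a double in [0, 1). */
static double to_unit(uint64_t bits) {
    return (double)(bits >> 11) * (1.0 / 9007199254740992.0);
}

/* Throw max darts at the unit square and count those that land
   inside the quarter circle of radius 1. */
double monte_carlo_pi(long max) {
    long hits = 0;
    #pragma omp parallel for reduction(+:hits)
    for (long i = 0; i < max; i++) {
        double x = to_unit(mix64(2 * (uint64_t)i));
        double y = to_unit(mix64(2 * (uint64_t)i + 1));
        if (x * x + y * y < 1.0)   /* same test as sqrt(...) < 1 */
            hits++;
    }
    return 4.0 * (double)hits / (double)max;
}
```

Because every dart depends only on its index, the estimate is the same for any number of threads, which makes the parallel version easy to check against a serial run.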

Page 73: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Optional Activity 6: Computing Pi

Use the Intel® Math Kernel Library (Intel® MKL) VSL:
• Intel MKL’s VSL (Vector Statistical Library)
• VSL creates an array, rather than a single random number
• VSL can have multiple seeds (one for each thread)

Objective:
• Use basic OpenMP* syntax to make Pi parallel
• Choose the best code to divide the task up
• Categorize properly all variables

Page 74: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Firstprivate Clause

Variables initialized from shared variable

C++ objects are copy-constructed

    incr = 0;
    #pragma omp parallel for firstprivate(incr)
    for (I = 0; I <= MAX; I++) {
        if ((I % 2) == 0) incr++;
        A[I] = incr;
    }
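A smaller sketch in which the effect of firstprivate does not depend on how iterations are divided among threads. The function name fill_offsets and its parameters are invented for illustration.

```c
/* Each thread's private copy of base starts at the value base had
   before the region. With private(base) instead, the copies would
   be uninitialized and the results undefined. */
void fill_offsets(int *out, int n, int base) {
    #pragma omp parallel for firstprivate(base)
    for (int i = 0; i < n; i++) {
        out[i] = base + i;
    }
}
```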

Page 75: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Lastprivate Clause

Variables update shared variable using value from last iteration

C++ objects are updated as if by assignment

    void sq2(int n, double *lastterm)
    {
        double x;
        int i;
        #pragma omp parallel
        #pragma omp for lastprivate(x)
        for (i = 0; i < n; i++) {
            x = a[i]*a[i] + b[i]*b[i];
            b[i] = sqrt(x);
        }
        *lastterm = x;
    }
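A reduced sketch of the same idea that is easy to check by hand. The function name last_square is made up; the point is only that x after the loop holds the value from the sequentially last iteration.

```c
/* Returns the value x had in the sequentially last iteration
   (i == n - 1), whichever thread executed it. Requires n >= 1. */
int last_square(const int *a, int n) {
    int x = 0;
    #pragma omp parallel for lastprivate(x)
    for (int i = 0; i < n; i++) {
        x = a[i] * a[i];
    }
    return x;
}
```

For the array {3, 1, 4, 5} the function returns 25, the square of the last element, regardless of the number of threads.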

Page 76: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Threadprivate Clause

Preserves global scope for per-thread storage

Legal for name-space-scope and file-scope

Use copyin to initialize from master thread

    struct Astruct A;
    #pragma omp threadprivate(A)
    …

    #pragma omp parallel copyin(A)
        do_something_to(&A);
    …

    #pragma omp parallel
        do_something_else_to(&A);

Private copies of “A” persist between regions

Page 77: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

20+ Library Routines

Runtime environment routines:
• Modify/check the number of threads
    omp_[set|get]_num_threads()
    omp_get_thread_num()
    omp_get_max_threads()
• Are we in a parallel region?
    omp_in_parallel()
• How many processors in the system?
    omp_get_num_procs()
• Explicit locks
    omp_[set|unset]_lock()
• And many more...

Page 78: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Library Routines

To fix the number of threads used in a program
• Set the number of threads
• Then save the number returned

    #include <omp.h>

    int main(void)
    {
        int num_threads;
        // Request as many threads as you have processors.
        omp_set_num_threads(omp_get_num_procs());

        #pragma omp parallel
        {
            int id = omp_get_thread_num();

            // Protect this operation because memory stores are not atomic
            #pragma omp single
            num_threads = omp_get_num_threads();

            do_lots_of_stuff(id);
        }
        return 0;
    }

Page 79: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Page 80: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Agenda

What is OpenMP?

Runtime functions/environment variables

Parallel regions

Worksharing/Workqueuing

Data environment

Synchronization

Other Helpful Constructs and Clauses

Page 81: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Environment Variables

Set the default number of threads
    OMP_NUM_THREADS integer

Set the default scheduling protocol
    OMP_SCHEDULE "schedule[, chunk_size]"

Enable dynamic thread adjustment
    OMP_DYNAMIC [TRUE|FALSE]

Enable nested parallelism
    OMP_NESTED [TRUE|FALSE]

Page 82: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

20+ Library Routines

Runtime environment routines:
• Modify/check the number of threads
    omp_[set|get]_num_threads()
    omp_get_thread_num()
    omp_get_max_threads()
• Are we in a parallel region?
    omp_in_parallel()
• How many processors in the system?
    omp_get_num_procs()
• Explicit locks
    omp_[set|unset]_lock()
• And many more...

This course focuses on the directives-only approach to OpenMP, which makes incremental parallelism easy

Page 83: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

What is a Thread?

An execution entity with a stack and associated static memory, called threadprivate memory.

Page 84: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Load Imbalance

Unequal work loads lead to idle threads and wasted time.

    #pragma omp parallel
    {
        #pragma omp for
        for ( ; ; ) {
        }
    }

[Figure: per-thread timelines along a time axis showing Busy and Idle periods]
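One common remedy, which this slide does not spell out, is a dynamic schedule like the one used in the nowait example. In the sketch below iteration i does i units of work, so an even split of the index range would leave the thread with the highest indices doing most of the work. The function name triangular_work and the chunk size of 4 are arbitrary choices for illustration.

```c
/* Iteration i performs i units of work. schedule(dynamic, 4) hands
   out chunks of four iterations as threads become free, instead of
   giving each thread one fixed block of the index range. */
long triangular_work(int n) {
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++) {
        long local = 0;
        for (int k = 0; k < i; k++)   /* work grows with i */
            local += 1;
        total += local;
    }
    return total;
}
```

Dynamic scheduling adds some overhead per chunk, so whether it pays off depends on how uneven the iterations really are.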

Page 85: Programming with OpenMP* Intel Software College. Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or.

Synchronization

Lost time waiting for locks

    #pragma omp parallel
    {
        #pragma omp critical
        {
            ...
        }
        ...
    }

[Figure: per-thread timelines along a time axis showing Busy, Idle, and In Critical periods]