Transcript
Intel® Software College
Recognizing Potential Parallelism
What Is Parallel Computing?
Attempt to speed solution of a particular task by
1. Dividing task into sub-tasks
2. Executing sub-tasks simultaneously on multiple processors
Successful attempts require both
1. Understanding of where parallelism can be effective
2. Knowledge of how to design and implement good solutions
Copyright © 2006, Intel Corporation. All rights reserved.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
Methodology
Study problem, sequential program, or code segment
Look for opportunities for parallelism
Try to keep all processors busy doing useful work
Ways of Exploiting Parallelism
Domain decomposition
Task decomposition
Pipelining
Domain Decomposition
First, decide how data elements should be divided among processors
Second, decide which tasks each processor should be doing
Example: Vector addition
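A minimal sketch of the vector-addition example, assuming the data is split into nearly equal contiguous blocks. The names add_block and vector_add_decomposed are illustrative and do not come from the slides. The blocks are processed one after another here only to avoid a threading library; since no block depends on another, each could run on its own processor.

```c
#include <assert.h>
#include <stddef.h>

/* Add one contiguous block of the vectors: c[lo..hi) = a[lo..hi) + b[lo..hi).
   In a parallel program each processor would run this on its own block. */
static void add_block(const double *a, const double *b, double *c,
                      size_t lo, size_t hi)
{
    for (size_t i = lo; i < hi; i++)
        c[i] = a[i] + b[i];
}

/* Step 1: divide the n elements into p nearly equal blocks.
   Step 2: give every block the same task (element-wise addition). */
void vector_add_decomposed(const double *a, const double *b, double *c,
                           size_t n, size_t p)
{
    assert(p > 0);
    for (size_t id = 0; id < p; id++) {
        size_t lo = id * n / p;        /* first index owned by block id */
        size_t hi = (id + 1) * n / p;  /* one past the last owned index */
        add_block(a, b, c, lo, hi);
    }
}
```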
Domain Decomposition
Find the largest element of an array
CPU 0 CPU 1 CPU 2 CPU 3
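One way to sketch the largest-element example, assuming four blocks to match the four CPUs shown. Each block produces a local maximum, and a final pass combines the four local results. The function names are hypothetical, and the blocks are scanned in turn rather than on separate processors.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CPUS 4  /* matches the four CPUs on the slide */

/* Largest element of a[lo..hi); the caller guarantees lo < hi. */
static int block_max(const int *a, size_t lo, size_t hi)
{
    int best = a[lo];
    for (size_t i = lo + 1; i < hi; i++)
        if (a[i] > best)
            best = a[i];
    return best;
}

int array_max(const int *a, size_t n)
{
    assert(n >= NUM_CPUS);  /* every block must be non-empty */
    int local[NUM_CPUS];

    /* Phase 1: each "CPU" scans only its own block. */
    for (size_t id = 0; id < NUM_CPUS; id++)
        local[id] = block_max(a, id * n / NUM_CPUS, (id + 1) * n / NUM_CPUS);

    /* Phase 2: combine the per-block results into the global answer. */
    return block_max(local, 0, NUM_CPUS);
}
```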
Task (Functional) Decomposition
First, divide tasks among processors
Second, decide which data elements are going to be accessed (read and/or written) by which processors
Example: Event-handler for GUI
Task Decomposition
[Figure: call graph of the functions f(), g(), h(), q(), r(), and s(); successive animation frames show the functions being assigned to CPU 0, CPU 1, and CPU 2]
Pipelining
Special kind of task decomposition
“Assembly line” parallelism
Example: 3D rendering in computer graphics
Input -> Model -> Project -> Clip -> Rasterize -> Output
Processing One Data Set (Steps 1-4)
[Figure: one data set passes through the Model, Project, Clip, and Rasterize stages, one stage per step]
The pipeline processes 1 data set in 4 steps
Processing Two Data Sets (Steps 1-5)
[Figure: two data sets pass through the Model, Project, Clip, and Rasterize stages, the second one step behind the first]
The pipeline processes 2 data sets in 5 steps
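The two counts above (4 steps for 1 data set, 5 steps for 2) fit a simple pattern, assuming every stage takes exactly one step: the first data set needs one step per stage, and each further data set finishes one step later. A small sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>

/* Steps needed to push `data_sets` items through a pipeline of `stages`
   equal-length stages: the first item takes `stages` steps, and each
   further item finishes one step after the previous one. */
unsigned pipeline_steps(unsigned stages, unsigned data_sets)
{
    assert(stages > 0 && data_sets > 0);
    return stages + data_sets - 1;
}

/* The same work without overlap: every item takes `stages` steps alone. */
unsigned sequential_steps(unsigned stages, unsigned data_sets)
{
    return stages * data_sets;
}
```

For the four-stage rendering pipeline, five data sets would take 4 + 5 - 1 = 8 steps, compared with 20 steps if each data set had to finish before the next one started.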
Pipelining Five Data Sets (Steps 1-8)
[Figure: data sets 0 through 4 move through the pipeline stages on CPU 0, CPU 1, CPU 2, and CPU 3 over eight steps]
Dependence Graph
Graph = (nodes, arrows)
Node for each
Variable assignment (except index variables)
Constant
Operator or function call
Arrows indicate use of variables and constants
Data flow
Control flow
Dependence Graph Example #1
for (i = 0; i < 3; i++)
    a[i] = b[i] / 2.0;
[Figure: dependence graph in which each a[i] is produced by its own / node from b[i] and the constant 2; no arrows connect different iterations]
Domain decomposition possible
Dependence Graph Example #2
for (i = 1; i < 4; i++)
    a[i] = a[i-1] * b[i];
[Figure: dependence graph in which each a[i] is produced by a * node from a[i-1] and b[i], forming a chain from a[0] to a[3]]
No domain decomposition
Dependence Graph Example #3
a = f(x, y, z);
b = g(w, x);
t = a + b;
c = h(z);
s = t / c;
[Figure: dependence graph with inputs w, x, y, z feeding f, g, and h; a and b feed t; t and c feed the / node that produces s]
Task decomposition with 3 CPUs.
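A possible sketch of this task decomposition, using the OpenMP sections construct (OpenMP itself is introduced later in the deck). The bodies of f, g, and h are made-up stand-ins, since the slides do not define them, and compute_s is a hypothetical wrapper. The three calls depend only on the inputs, so each can go to its own CPU; t and s are computed after the join.

```c
#include <assert.h>

/* Hypothetical stand-ins for f, g, and h from the slide. */
static double f(double x, double y, double z) { return x + y + z; }
static double g(double w, double x)           { return w * x; }
static double h(double z)                     { return z + 1.0; }

double compute_s(double w, double x, double y, double z)
{
    double a, b, c;

    /* a, b, and c depend only on the inputs, so the three calls are
       independent tasks. Without OpenMP support the pragmas are ignored
       and the calls simply run one after another. */
    #pragma omp parallel sections
    {
        #pragma omp section
        a = f(x, y, z);
        #pragma omp section
        b = g(w, x);
        #pragma omp section
        c = h(z);
    }

    /* t needs both a and b; s needs t and c. These run after the join. */
    double t = a + b;
    assert(c != 0.0);
    return t / c;
}
```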
Speculative Computation in a Turn-Based Strategy Game
Make moves for distant AI-controlled countries in parallel
Risk: Unexpected Interaction
Orange Cannot Move a Ship that Has Already Been Sunk by Green
Solution: Reverse Time
Must be able to “undo” an erroneous, speculative computation
Analogous to what is done in hardware after incorrect branch prediction
Speculative computations typically do not have a big payoff in parallel computing
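The "undo" requirement could be met in several ways. The sketch below assumes the simplest one, a full snapshot of the game state taken before the speculative move. GameState and both functions are hypothetical and only illustrate the checkpoint-and-rollback idea.

```c
#include <assert.h>
#include <string.h>

#define MAX_SHIPS 8

/* Hypothetical game state: position and status of each ship. */
typedef struct {
    int position[MAX_SHIPS];
    int sunk[MAX_SHIPS];     /* nonzero once a ship has been destroyed */
} GameState;

/* Speculatively move a ship, saving a snapshot so the move can be undone. */
void speculative_move(GameState *state, GameState *snapshot,
                      int ship, int new_position)
{
    assert(ship >= 0 && ship < MAX_SHIPS);
    memcpy(snapshot, state, sizeof *state);   /* checkpoint before speculating */
    state->position[ship] = new_position;
}

/* Validate the speculation against what really happened earlier in the
   turn order. Returns 1 if the move stands, 0 if it was rolled back. */
int commit_or_undo(GameState *state, const GameState *snapshot,
                   int ship, int sunk_by_earlier_player)
{
    if (sunk_by_earlier_player) {
        memcpy(state, snapshot, sizeof *state);  /* "reverse time" */
        state->sunk[ship] = 1;                   /* apply the real outcome */
        return 0;
    }
    return 1;
}
```

The snapshot copy is pure overhead whenever the speculation succeeds, which is one reason the payoff tends to be small.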
Shared-Memory Model and Threads
Fork/Join Programming Model
When program begins execution, only master thread active
Master thread executes sequential portions of program
For parallel portions of program, master thread forks (creates or awakens) additional threads
At join (end of parallel section of code), extra threads are suspended or die
Relating Fork/Join to Code
[Figure: a program listing in which stretches of sequential code alternate with for loops marked as parallel code]
Incremental Parallelization
Sequential program a special case of threaded program
Programmers can add parallelism incrementally
Profile program execution
Repeat
Choose best opportunity for parallelization
Transform sequential code into parallel code
Until further improvements not worth the effort
Utility of Threads
Threads are flexible enough to implement
Domain decomposition
Functional decomposition
Pipelining
Domain Decomposition Using Threads
[Figure: Thread 0, Thread 1, and Thread 2 each run f() against shared memory]
Functional Decomposition Using Threads
[Figure: Thread 0 and Thread 1 run different functions, e(), f(), g(), and h(), against shared memory]
Pipelining Using Threads
[Figure: Thread 0, Thread 1, and Thread 2 run e(), f(), and g() as pipeline stages; data sets flow from input through shared memory to output, with data sets 5, 6, ... waiting at the input]
Shared versus Private Variables
[Figure: two threads, each with its own private variables, both accessing a common set of shared variables]
The Threads Model
[Figure: three threads, each with its own private variables, all connected to one set of shared variables]
Implementing Domain Decompositions
What Is OpenMP?
OpenMP is an API for parallel programming
First developed by the OpenMP Architecture Review Board (1997), now a standard
Designed for shared-memory multiprocessors
Set of compiler directives, library functions, and environment variables, but not a language
Can be used with C, C++, or Fortran
Based on fork/join model of threads
Strengths and Weaknesses of OpenMP
Strengths
Well-suited for domain decompositions
Available on Unix and Windows NT
Weaknesses
Not well-tailored for functional decompositions
Compilers do not have to check for such errors as deadlocks and race conditions
Syntax of Compiler Directives
A C/C++ compiler directive is called a pragma
Pragmas use preprocessor-directive syntax (#pragma) and are interpreted by the compiler
All OpenMP pragmas have the syntax:
#pragma omp <rest of pragma>
Pragmas appear immediately before relevant construct
Pragma: parallel for
The compiler directive
#pragma omp parallel for
tells the compiler that the for loop which immediately follows can be executed in parallel
The number of loop iterations must be computable at run time before loop executes
Loop must not contain a break, return, or exit
Loop must not contain a goto to a label outside loop
Example
int first, i, *marked, prime, size;
...
#pragma omp parallel for
for (i = first; i < size; i += prime)
    marked[i] = 1;
Matching Threads with CPUs
Function omp_get_num_procs returns the number of processors available to the parallel program
int omp_get_num_procs (void);
Example:
int t;
...
t = omp_get_num_procs();
Matching Threads with CPUs (cont.)
Function omp_set_num_threads allows you to set the number of threads that should be active in parallel sections of code
void omp_set_num_threads (int t);
The function can be called with different arguments at different points in the program
Example:
int t;
…
omp_set_num_threads (t);
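The two calls are often combined to request one thread per processor. A minimal sketch, assuming that policy suits the program. The wrapper match_threads_to_cpus is a hypothetical name, and the _OPENMP guard lets the code build even when OpenMP support is switched off.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Request one thread per available processor and return the count used.
   When the compiler is not run in OpenMP mode, _OPENMP is undefined and
   the function reports a single thread. */
int match_threads_to_cpus(void)
{
    int t = 1;
#ifdef _OPENMP
    t = omp_get_num_procs();   /* processors available to the program */
    omp_set_num_threads(t);    /* applies to later parallel regions */
#endif
    assert(t >= 1);
    return t;
}
```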
Which Loop to Make Parallel?
main () {
  int i, j, k;
  float **a, **b;
  ...
  for (k = 0; k < N; k++)        /* Loop-carried dependences */
    for (i = 0; i < N; i++)      /* Can execute in parallel */
      for (j = 0; j < N; j++)    /* Can execute in parallel */
        a[i][j] = MIN(a[i][j], a[i][k] + a[k][j]);
}
Grain Size
There is a fork/join for every instance of
#pragma omp parallel for
for ( ) {
  ...
}
Since fork/join is a source of overhead, we want to maximize the amount of work done for each fork/join; i.e., the grain size
Hence we choose to make the middle loop (the i loop) parallel: it is the outermost loop without loop-carried dependences
Almost Right, but Not Quite
main () {
  int i, j, k;
  float **a, **b;
  ...
  for (k = 0; k < N; k++)
    #pragma omp parallel for
    for (i = 0; i < N; i++)
      for (j = 0; j < N; j++)
        a[i][j] = MIN(a[i][j], a[i][k] + a[k][j]);
}
Problem: j is a shared variable
Problem Solved with private Clause
main () {
  int i, j, k;
  float **a, **b;
  ...
  for (k = 0; k < N; k++)
    #pragma omp parallel for private (j)
    for (i = 0; i < N; i++)
      for (j = 0; j < N; j++)
        a[i][j] = MIN(a[i][j], a[i][k] + a[k][j]);
}
The private clause tells the compiler to make the listed variables private
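The fragment above can be turned into a function that compiles on its own. This is a sketch under stated assumptions: the name shortest_paths and the flat row-major array (instead of float **) are hypothetical choices made to keep it short, and the diagonal of the matrix is assumed to be zero so that row k does not change during step k.

```c
#define MIN(x, y) ((x) < (y) ? (x) : (y))

/* All-pairs shortest paths on an n-by-n distance matrix stored row-major. */
void shortest_paths(float *a, int n)
{
    int i, j, k;
    for (k = 0; k < n; k++) {
        /* i is private automatically because it is the parallel loop index;
           j is an ordinary variable and must be listed explicitly. */
#pragma omp parallel for private(j)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                a[i * n + j] = MIN(a[i * n + j], a[i * n + k] + a[k * n + j]);
    }
}
```

The k loop stays sequential because step k depends on the results of step k-1; only the work inside one step is divided among threads.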
Example
int i;
float *a, *b, *c, tmp;
...
for (i = 0; i < N; i++) {
  tmp = a[i] / b[i];
  c[i] = tmp * tmp;
}
Loop is perfectly parallelizable except for shared variable “tmp”
Solution
int i;
float *a, *b, *c, tmp;
...
#pragma omp parallel for private (tmp)
for (i = 0; i < N; i++) {
  tmp = a[i] / b[i];
  c[i] = tmp * tmp;
}
More about Private Variables
Each thread has its own copy of the private variables
If j is declared private, then inside the for loop no thread can access the “other” j (the j in shared memory)
  No thread can use a previously defined value of j
  No thread can assign a new value to the shared j
Private variables are undefined at loop entry and loop exit; skipping the copy in and out reduces execution time
Clause: firstprivate
The firstprivate clause tells the compiler that the private variable should inherit the value of the shared variable upon loop entry
The value is assigned once per thread, not once per loop iteration
Example
a[0] = 0.0;
for (i = 1; i < N; i++)
  a[i] = alpha (i, a[i-1]);
#pragma omp parallel for firstprivate (a)
for (i = 0; i < N; i++) {
  b[i] = beta (i, a[i]);
  a[i] = gamma (i);
  c[i] = delta (a[i], b[i]);
}
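A smaller case with a scalar may make the inheritance easier to see. The following is a minimal sketch, assuming a hypothetical function add_offset: every thread's private copy of offset starts with the value assigned before the loop. With plain private(offset) the copies would start undefined.

```c
/* Fill out[i] = base + i for i in [0, n). */
void add_offset(int *out, int n, int base)
{
    int i;
    int offset = base;   /* value inherited by each thread's private copy */
#pragma omp parallel for firstprivate(offset)
    for (i = 0; i < n; i++)
        out[i] = offset + i;
}
```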
Clause: lastprivate
The lastprivate clause tells the compiler that the value of the private variable after the sequentially last loop iteration should be assigned to the shared variable upon loop exit
In other words, when the thread responsible for the sequentially last loop iteration exits the loop, its copy of the private variable is copied back to the shared variable
Example
#pragma omp parallel for lastprivate (x)
for (i = 0; i < N; i++) {
  x = foo (i);
  y[i] = bar(i, x);
}
last_x = x;
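The same pattern with concrete arithmetic, as a minimal sketch: squares_last is a hypothetical function standing in for foo and bar, and it assumes n is at least 1 so that a last iteration exists.

```c
/* Store i*i in y[i] and return the value computed by the sequentially
   last iteration (i == n-1). Assumes n >= 1. */
int squares_last(int *y, int n)
{
    int i;
    int x = 0;
#pragma omp parallel for lastprivate(x)
    for (i = 0; i < n; i++) {
        x = i * i;
        y[i] = x;
    }
    return x;   /* copied back from the thread that ran iteration n-1 */
}
```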
Pragma: parallel
In the effort to increase grain size, sometimes the code that should be executed in parallel goes beyond a single for loop
The parallel pragma is used when a block of code should be executed in parallel
Pragma: for
The for pragma is used inside a block of code already marked with the parallel pragma
It indicates a for loop whose iterations should be divided among the active threads
There is a barrier synchronization of the threads at the end of the for loop
Pragma: single
The single pragma is used inside a parallel block of code
It tells the compiler that only a single thread should execute the statement or block of code immediately following
Clause: nowait
The nowait clause tells the compiler that there is no need for a barrier synchronization at the end of a parallel for loop or single block of code
Case: parallel, for, single Pragmas
for (i = 0; i < N; i++)
  a[i] = alpha(i);
if (delta < 0.0) printf ("delta < 0.0\n");
for (i = 0; i < N; i++)
  b[i] = beta (i, delta);
Solution: parallel, for, single Pragma
#pragma omp parallel
{
  #pragma omp for nowait
  for (i = 0; i < N; i++)
    a[i] = alpha(i);
  #pragma omp single nowait
  if (delta < 0.0) printf ("delta < 0.0\n");
  #pragma omp for
  for (i = 0; i < N; i++)
    b[i] = beta (i, delta);
}
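For experimentation, the solution can be wrapped in a function. This is a sketch under stated assumptions: fill_both is a hypothetical name, and the simple expressions 2.0 * i and i + delta stand in for alpha and beta, which the slides do not define.

```c
#include <stdio.h>

void fill_both(double *a, double *b, int n, double delta)
{
    int i;   /* made private automatically inside each omp for loop */
#pragma omp parallel
    {
        /* The two loops write different arrays, so a thread that finishes
           its share of the first loop need not wait for the others. */
#pragma omp for nowait
        for (i = 0; i < n; i++)
            a[i] = 2.0 * i;
        /* Exactly one thread prints the message. */
#pragma omp single nowait
        if (delta < 0.0) printf("delta < 0.0\n");
        /* The implicit barrier here ends the parallel work. */
#pragma omp for
        for (i = 0; i < n; i++)
            b[i] = i + delta;
    }
}
```

There is one fork/join for the whole block instead of one per loop, which is the grain-size gain the parallel pragma is meant to provide.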
Extended Example
for (i = 0; i < m; i++) {
  low = a[i];
  high = b[i];
  if (low > high) {
    printf ("Exiting during iteration %d\n", i);
    break;
  }
  for (j = low; j < high; j++)
    c[j] += alpha (i, j);
}
Extended Example
for (i = 0; i < m; i++) {
  low = a[i];
  high = b[i];
  if (low > high) {
    printf ("Exiting during iteration %d\n", i);
    break;
  }
  #pragma omp parallel for
  for (j = low; j < high; j++)
    c[j] += alpha (i, j);
}
Extended Example
#pragma omp parallel private (i, j, low, high)
for (i = 0; i < m; i++) {
  low = a[i];
  high = b[i];
  if (low > high) {
    printf ("Exiting during iteration %d\n", i);
    break;
  }
  #pragma omp for nowait
  for (j = low; j < high; j++)
    c[j] += alpha (i, j);
}
Extended Example
#pragma omp parallel private (i, j, low, high)
for (i = 0; i < m; i++) {
  low = a[i];
  high = b[i];
  if (low > high) {
    #pragma omp single nowait
    printf ("Exiting during iteration %d\n", i);
    break;
  }
  #pragma omp for nowait
  for (j = low; j < high; j++)
    c[j] += alpha (i, j);
}
Confronting Race Conditions
Potential Pitfall?
double area, pi, x;
int i, n;
...
area = 0.0;
for (i = 0; i < n; i++) {
  x = (i + 0.5)/n;
  area += 4.0/(1.0 + x*x);
}
pi = area / n;
What happens when we make the for loop parallel?
Race Condition
A race condition is nondeterministic behavior caused by the times at which two or more threads access a shared variable
For example, suppose both Thread A and Thread B are executing the statement
area += 4.0 / (1.0 + x*x);
One Timing ⇒ Correct Sum
Value of area    Thread A                  Thread B
11.667
11.667           reads 11.667, + 3.765
15.432           writes 15.432
15.432                                     reads 15.432, + 3.563
18.995                                     writes 18.995
Another Timing ⇒ Incorrect Sum
Value of area    Thread A                  Thread B
11.667
11.667           reads 11.667, + 3.765
11.667                                     reads 11.667
15.432           writes 15.432
15.432                                     + 3.563
15.230                                     writes 15.230
Thread B adds to its stale copy of 11.667, so Thread A's contribution is lost
Another Race Condition Example
struct Node {
  struct Node *next;
  int data; };
struct List {
  struct Node *head; };
void AddHead (struct List *list,
              struct Node *node) {
  node->next = list->head;
  list->head = node;
}
Original Singly-Linked List
[Figure: list.head points to node_a]
Thread 1 after Stmt. 1 of AddHead
[Figure: node_b.next points to node_a; list.head still points to node_a]
Thread 2 Executes AddHead
[Figure: node_c.next points to node_a and list.head points to node_c; node_b.next still points to node_a]
Thread 1 After Stmt. 2 of AddHead
[Figure: list.head points to node_b, whose next is node_a; node_c is no longer reachable from the list]
Why Race Conditions Are Nasty
Programs with race conditions exhibit nondeterministic behavior
  Sometimes give correct result
  Sometimes give erroneous result
Programs often work correctly on trivial data sets and small number of threads
Errors more likely to occur when number of threads and/or execution time increases
Hence debugging race conditions can be difficult
Mutual Exclusion
We can prevent the race conditions described earlier by ensuring that only one thread at a time references and updates a shared variable or data structure
Mutual exclusion refers to a kind of synchronization that allows only a single thread or process at a time to have access to a shared resource
Mutual exclusion is implemented using some form of locking
Do Flags Guarantee Mutual Exclusion?
int flag = 0;
void AddHead (struct List *list,
              struct Node *node) {
  while (flag != 0) /* wait */ ;
  flag = 1;
  node->next = list->head;
  list->head = node;
  flag = 0;
}
Flags Don’t Guarantee Mutual Exclusion
int flag = 0;
void AddHead (struct List *list,
              struct Node *node) {
  while (flag != 0) /* wait */ ;
  flag = 1;
  node->next = list->head;
  list->head = node;
  flag = 0;
}
[Animation: Thread 1 and Thread 2 step through AddHead while the value of flag changes 0, 0, 1, 1, 0, 0. Both threads can test flag while it is still 0, so both leave the wait loop and update the list at the same time]
Locking Mechanism
The previous method failed because checking the value of flag and setting its value were two distinct operations
We need some sort of atomic test-and-set
Operating system provides functions to do this
The generic term “lock” refers to a synchronization mechanism used to control access to shared resources
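To show what an atomic test-and-set changes, here is a minimal sketch that repairs the flag version of AddHead. It is a swapped-in technique, not something from the slides: it uses atomic_flag from the C11 header stdatomic.h rather than an operating system call, and the name list_busy is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct Node { struct Node *next; int data; };
struct List { struct Node *head; };

/* One flag guarding the list, initially clear. */
static atomic_flag list_busy = ATOMIC_FLAG_INIT;

void AddHead(struct List *list, struct Node *node)
{
    /* atomic_flag_test_and_set returns the old value and sets the flag in
       one indivisible step, so only one thread can see "was clear". */
    while (atomic_flag_test_and_set(&list_busy))
        ;   /* spin: another thread holds the flag */
    node->next = list->head;
    list->head = node;
    atomic_flag_clear(&list_busy);
}
```

A spinning loop like this wastes processor time while waiting, which is one reason library locks are usually preferred in practice.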
Critical Sections
A critical section is a portion of code that threads execute in a mutually exclusive fashion
The critical pragma in OpenMP immediately precedes a statement or block representing a critical section
Good news: critical sections eliminate race conditions
Bad news: critical sections are executed sequentially
More bad news: you have to identify critical sections yourself
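Applied to the linked-list example, the critical pragma might look like the sketch below. The helper push_all is hypothetical and exists only to exercise AddHead from a parallel loop and count the nodes that ended up in the list.

```c
#include <stddef.h>

struct Node { struct Node *next; int data; };
struct List { struct Node *head; };

void AddHead(struct List *list, struct Node *node)
{
    /* Only one thread at a time may execute the two pointer updates. */
#pragma omp critical
    {
        node->next = list->head;
        list->head = node;
    }
}

/* Push n caller-supplied nodes from a parallel loop; return the list length. */
int push_all(struct List *list, struct Node *nodes, int n)
{
    int i, count = 0;
    struct Node *p;
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        nodes[i].data = i;
        AddHead(list, &nodes[i]);
    }
    for (p = list->head; p != NULL; p = p->next)
        count++;
    return count;
}
```

Without the critical pragma, some nodes could be lost exactly as node_c is lost in the earlier walkthrough, and the count would sometimes come out below n.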
Reminder: Motivating Example
double area, pi, x;
int i, n;
...
area = 0.0;
for (i = 0; i < n; i++) {
  x = (i + 0.5)/n;
  area += 4.0/(1.0 + x*x);
}
pi = area / n;
Where is the critical section?
Solution #1
double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x)
for (i = 0; i < n; i++) {
  x = (i + 0.5)/n;
  #pragma omp critical
  area += 4.0 / (1.0 + x*x);
}
pi = area / n;
This ensures area will end up with the correct value.
How can we do better?
Solution #2
double area, pi, tmp, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x,tmp)
for (i = 0; i < n; i++) {
  x = (i + 0.5)/n;
  tmp = 4.0/(1.0 + x*x);
  #pragma omp critical
  area += tmp;
}
pi = area / n;
This reduces amount of time spent in critical section.
How can we do better?
Solution #3
double area, pi, tmp, x;
int i, n;
...
area = 0.0;
#pragma omp parallel private(tmp)
{
  tmp = 0.0;
  #pragma omp for private (x)
  for (i = 0; i < n; i++) {
    x = (i + 0.5)/n;
    tmp += 4.0/(1.0 + x*x);
  }
  #pragma omp critical
  area += tmp;
}
pi = area / n;
Why is this better?
Reductions
Given associative binary operator ⊕ the expression
a1 ⊕ a2 ⊕ a3 ⊕ ... ⊕ an
is called a reduction
The π-finding program performs a sum-reduction
OpenMP reduction Clause
Reductions are so common that OpenMP provides a reduction clause for the parallel for pragma
Eliminates need for
  Creating private variable
  Dividing computation into accumulation of local answers that contribute to global result
Solution #4
double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x) \
    reduction(+:area)
for (i = 0; i < n; i++) {
  x = (i + 0.5)/n;
  area += 4.0/(1.0 + x*x);
}
pi = area / n;
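Solution #4 packaged as a function, as a minimal sketch; the name compute_pi is hypothetical. The loop is the midpoint rule for the integral of 4/(1+x*x) over the interval from 0 to 1, whose value is π.

```c
/* Estimate pi with n rectangles. Each thread accumulates into its own
   private copy of area; OpenMP adds the copies together at loop exit. */
double compute_pi(int n)
{
    double area = 0.0, x;
    int i;
#pragma omp parallel for private(x) reduction(+:area)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) / n;
        area += 4.0 / (1.0 + x * x);
    }
    return area / n;
}
```

Because floating-point addition is not exactly associative, the last few digits of the result may differ between runs with different thread counts.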
Important: Lock Data, Not Code
Locks should be associated with data objects
Different data objects should have different locks
Suppose a lock is associated with a critical section of code instead of a data object
  Mutual exclusion can be lost if the same object is manipulated by two different functions
  Performance can be lost if two threads manipulating different objects attempt to execute the same function
Example: Hash Table Creation
[Figure: hash table drawn as an array of bucket pointers, each shown as NULL]
Locking Code: Inefficient
#pragma omp parallel for private (index)
for (i = 0; i < elements; i++) {
  index = hash(element[i]);
  #pragma omp critical
  insert_element (element[i], index);
}
Locking Data: Efficient
Declaration:
/* Static variable */
omp_lock_t hash_lock[HASH_TABLE_SIZE];
Initialization:
/* Inside function 'main' */
for (i = 0; i < HASH_TABLE_SIZE; i++)
  omp_init_lock(&hash_lock[i]);
Use:
void insert_element (ELEMENT e, int i)
{
  omp_set_lock (&hash_lock[i]);
  /* Code to insert element e */
  omp_unset_lock (&hash_lock[i]);
}
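The three pieces can be combined into one compilable unit. This is a sketch under stated assumptions: struct Entry, bucket, build_table, the table size of 8, and the key % HASH_TABLE_SIZE hash are all hypothetical, and the #else branch supplies serial stand-ins for the lock routines so the code also builds without OpenMP.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (not part of OpenMP) used only when OpenMP is disabled. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

#define HASH_TABLE_SIZE 8   /* hypothetical size for the sketch */

struct Entry { struct Entry *next; int key; };

static struct Entry *bucket[HASH_TABLE_SIZE];
static omp_lock_t hash_lock[HASH_TABLE_SIZE];   /* one lock per bucket */

static void insert_element(struct Entry *e, int i)
{
    omp_set_lock(&hash_lock[i]);     /* blocks only threads using bucket i */
    e->next = bucket[i];
    bucket[i] = e;
    omp_unset_lock(&hash_lock[i]);
}

/* Insert n entries in parallel; return how many are in the table. */
int build_table(struct Entry *entries, int n)
{
    int i, count = 0;
    struct Entry *p;

    for (i = 0; i < HASH_TABLE_SIZE; i++) {
        bucket[i] = NULL;
        omp_init_lock(&hash_lock[i]);
    }
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        entries[i].key = i;
        insert_element(&entries[i], entries[i].key % HASH_TABLE_SIZE);
    }
    for (i = 0; i < HASH_TABLE_SIZE; i++) {
        for (p = bucket[i]; p != NULL; p = p->next)
            count++;
        omp_destroy_lock(&hash_lock[i]);
    }
    return count;
}
```

Threads inserting into different buckets never wait for each other, which is the advantage over the single critical section of the previous version.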
Locks Are Dangerous
Suppose a lock is used to guarantee mutually exclusive access to a shared variable
Imagine two threads, each with its own critical section
Thread A              Thread B
a += 5;               b += 5;
b += 7;               a += 7;
a += b;               a += b;
a += 11;              b += 11;
Faulty Implementation
Thread A              Thread B
lock (lock_a);        lock (lock_b);
a += 5;               b += 5;
lock (lock_b);        lock (lock_a);
b += 7;               a += 7;
a += b;               a += b;
unlock (lock_b);      unlock (lock_a);
a += 11;              b += 11;
unlock (lock_a);      unlock (lock_b);
What happens if both threads reach their second lock call at the same time?
Deadlock
A situation involving two or more threads (processes) in which no thread may proceed because each is waiting for a resource held by another
Can be represented by a resource allocation graph
A graph of deadlock contains a cycle
[Figure: resource allocation graph with two threads and the resources sem_a and sem_b; the "wants" and "held by" edges form a cycle]
More on Deadlocks
A program exhibits a global deadlock if every thread is blocked
A program exhibits local deadlock if only some of the threads in the program are blocked
A deadlock is another example of nondeterministic behavior exhibited by a parallel program
Adding debugging output to detect source of deadlock can change timing and reduce chance of deadlock occurring
Four Conditions for Deadlock
Mutually exclusive access to a resource
Threads hold onto resources they have while they wait for additional resources
Resources cannot be taken away from threads
Cycle in resource allocation graph
Deadlock Prevention Strategies
Don’t allow mutually exclusive access to resource
Make resource shareable
Don’t allow threads to wait while holding resources
Only request resources when holding none. That means either hold only one resource at a time or request all resources at once.
Allow resources to be taken away from threads.
Allow preemption. Works for CPU and memory. Doesn’t work for locks.
Ensure no cycle in resource allocation graph.
Rank resources. Threads must acquire resources in order.
Correct Implementation
Thread A Thread B
lock (lock_a); lock (lock_a);
a += 5; lock (lock_b);
lock (lock_b); b += 5;
b += 7; a += 7;
a += b; a += b;
unlock (lock_b); unlock (lock_a);
a += 11; b += 11;
unlock (lock_a); unlock (lock_b);
Threads must lock lock_a before lock_b
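The ranking rule can also be sketched with OpenMP named critical sections, which the slides do not use for this example. The critical-section names `lock_a` and `lock_b` stand in for the two locks, and the function name `run_ordered` is hypothetical. Because critical regions nest, Thread B here releases `lock_b` before `lock_a`, a small departure from the slide. Compiled without OpenMP, the two sections simply run one after the other.

```c
/* Shared variables from the slide's example. */
int a = 0, b = 0;

/* Runs the two "threads" as OpenMP sections. Both enter the critical
   region named lock_a before the one named lock_b, so neither can hold
   lock_b while waiting for lock_a and no cycle can form. */
void run_ordered(void)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        {   /* Thread A */
            #pragma omp critical(lock_a)
            {
                a += 5;
                #pragma omp critical(lock_b)
                {
                    b += 7;
                    a += b;
                }
                a += 11;
            }
        }
        #pragma omp section
        {   /* Thread B */
            #pragma omp critical(lock_a)
            {
                #pragma omp critical(lock_b)
                {
                    b += 5;
                    a += 7;
                    a += b;
                    b += 11;
                }
            }
        }
    }
}
```

The final value of `a` still depends on which thread enters `lock_a` first, so ordering the locks removes the deadlock but not the nondeterminism.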
Another Problem with Locks
Every call to function lock should be matched with a call to unlock, representing the start and the end of the critical section
A program may be syntactically correct (i.e., may compile) without having matching calls
A programmer may forget the unlock call or may pass the wrong argument to unlock
A thread that never releases a shared resource creates a deadlock
Case Study: The N Queens Problem
Is there a way to place N queens on an N-by-N chessboard such that no queen threatens another queen?
A Solution to the 4 Queens Problem
Exhaustive Search
Design #1 for Parallel Search
Create threads to explore different parts of the search tree simultaneously
If a node has children
The thread creates child nodes
The thread explores one child node itself
Thread creates a new thread for every other child node
[Figure: search tree in which Thread W creates New Thread X, New Thread Y, and New Thread Z]
Pros and Cons of Design #1
Pros
Simple design, easy to implement
Balances work among threads
Cons
Too many threads created
Lifetime of threads too short
Overhead costs too high
Design #2 for Parallel Search
One thread created for each subtree rooted at a particular depth
Each thread sequentially explores its subtree
Design #2 in Action
[Figure: search tree with one subtree each assigned to Thread 1, Thread 2, and Thread 3]
Pros and Cons of Design #2
Pros
Thread creation/termination time minimized
Cons
Subtree sizes may vary dramatically
Some threads may finish long before others
Imbalanced workloads lower efficiency
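As a rough sketch of Design #2 (not code from the slides), the loop below treats each first-row placement of the N queens problem as one subtree and hands the subtrees out in fixed blocks with `schedule(static)`. The names `count_design2`, `safe`, `explore`, and `MAX_N` are hypothetical, and the board is a plain array. OpenMP creates one team of threads rather than one thread per subtree, so this only approximates the design, but it shows the same weakness: a thread whose subtrees are small finishes early and sits idle.

```c
#define MAX_N 16   /* assumed upper bound on board size */

/* Returns 1 if the queen in the last filled row attacks no earlier queen.
   places[r] is the column of the queen in row r. */
static int safe(const int *places, int rows)
{
    int last = rows - 1;
    for (int r = 0; r < last; r++) {
        int dc = places[r] - places[last];
        if (dc == 0 || dc == last - r || dc == r - last)
            return 0;
    }
    return 1;
}

/* Sequentially counts the solutions below a partially filled board. */
static int explore(int n, int *places, int rows)
{
    if (rows == n)
        return 1;
    int count = 0;
    for (int col = 0; col < n; col++) {
        places[rows] = col;
        if (safe(places, rows + 1))
            count += explore(n, places, rows + 1);
    }
    return count;
}

/* One subtree per first-row placement. Each subtree is explored by a
   single thread, and the assignment of subtrees to threads is fixed.
   Requires 1 <= n <= MAX_N. */
int count_design2(int n)
{
    int total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int first = 0; first < n; first++) {
        int places[MAX_N];          /* private board for this subtree */
        places[0] = first;
        total += explore(n, places, 1);
    }
    return total;
}
```

Changing the clause to `schedule(dynamic)` would let idle threads pick up remaining subtrees, which moves the sketch toward a work pool.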
Design #3 for Parallel Search
Main thread creates work pool—list of subtrees to explore
Main thread creates finite number of co-worker threads
Each subtree exploration is done by a single thread
Inactive threads go to pool to get more work
Work Pool Analogy
More rows than workers
Each worker takes an unpicked row and picks the crop
After completing a row, the worker takes another unpicked row
Process continues until all rows have been harvested
Design #3 in Action
[Figure: search tree in which Thread 1, Thread 2, and Thread 3 each take a subtree from the work pool; Thread 3 and Thread 1 then take further subtrees]
Pros and Cons of Strategy #3
Pros
Thread creation/termination time minimized
Workload balance better than strategy #2
Cons
Threads need exclusive access to data structure containing work to be done, a sequential component
Workload balance worse than strategy #1
Conclusion
Good compromise between designs 1 and 2
Implementing Strategy #3 for N Queens
Work pool consists of N boards representing N possible placements of queen on first row
Parallel Program Design
One thread creates list of partially filled-in boards
Fork: Create one thread per CPU
Each thread repeatedly gets a board from the list, searches for solutions, and adds to the solution count, until no boards remain on the list
Join: Occurs when list is empty
One thread prints number of solutions found
Search Tree Node Structure
/* The ‘board’ struct contains information about a
node in the search tree; i.e., partially filled-
in board. The work pool is a singly linked
list of ‘board’ structs. */
struct board {
int pieces; /* # of queens on board*/
int places[MAX_N]; /* Queen’s pos in each row */
struct board *next; /* Next search tree node */
};
Key Code in main Function
struct board *stack;
...
stack = NULL;
for (i = 0; i < n; i++) {
initial=(struct board *)malloc(sizeof(struct board));
initial->pieces = 1;
initial->places[0] = i;
initial->next = stack;
stack = initial;
}
num_solutions = 0;
search_for_solutions (n, stack, &num_solutions);
printf ("The %d-queens puzzle has %d solutions\n", n,
num_solutions);
Insertion of OpenMP Code
struct board *stack;
...
stack = NULL;
for (i = 0; i < n; i++) {
  initial=(struct board *)malloc(sizeof(struct board));
  initial->pieces = 1;
  initial->places[0] = i;
  initial->next = stack;
  stack = initial;
}
num_solutions = 0;
omp_set_num_threads (omp_get_num_procs());
#pragma omp parallel
search_for_solutions (n, stack, &num_solutions);
printf ("The %d-queens puzzle has %d solutions\n", n,
num_solutions);
Original C Function to Get Work
void search_for_solutions (int n,
struct board *stack, int *num_solutions)
{
struct board *ptr;
void search (int, struct board *, int *);
while (stack != NULL) {
ptr = stack;
stack = stack->next;
search (n, ptr, num_solutions);
free (ptr);
}
}
C/OpenMP Function to Get Work
void search_for_solutions (int n,
struct board *stack, int *num_solutions)
{
struct board *ptr;
void search (int, struct board *, int *);
while (stack != NULL) {
#pragma omp critical
{ ptr = stack; stack = stack->next; }
search (n, ptr, num_solutions);
free (ptr);
}
}
Original C Search Function
void search (int n, struct board *ptr,
int *num_solutions)
{
  int i;
  int no_threats (struct board *);
  if (ptr->pieces == n) {
    (*num_solutions)++;
  } else {
    ptr->pieces++;
    for (i = 0; i < n; i++) {
      ptr->places[ptr->pieces-1] = i;
      if (no_threats(ptr))
        search (n, ptr, num_solutions);
    }
    ptr->pieces--;
  }
}
C/OpenMP Search Function
void search (int n, struct board *ptr,
int *num_solutions)
{
  int i;
  int no_threats (struct board *);
  if (ptr->pieces == n) {
    #pragma omp critical
    (*num_solutions)++;
  } else {
    ptr->pieces++;
    for (i = 0; i < n; i++) {
      ptr->places[ptr->pieces-1] = i;
      if (no_threats(ptr))
        search (n, ptr, num_solutions);
    }
    ptr->pieces--;
  }
}
Only One Problem: It Doesn’t Work!
OpenMP program throws an exception
Culprit: Variable stack
[Figure: the variable stack pointing into the heap]
Problem Site
int main ()
{
struct board *stack;
...
#pragma omp parallel
search_for_solutions (n, stack, &num_solutions);
...
}
void search_for_solutions (int n,
struct board *stack, int *num_solutions)
{
...
while (stack != NULL) ...
1. Both Threads Point to Top
[Figure: Thread 1 and Thread 2 each have their own copy of stack; both copies point to the top element of the list]
2. Thread 1 Grabs First Element
[Figure: Thread 1's ptr points to the first element and its stack has moved on; Thread 2's stack still points to the first element]
3. Error #1: Thread 2 grabs same element
[Figure: ptr in Thread 1 and ptr in Thread 2 point to the same element]
4. Error #2: Thread 1 deletes element and then Thread 2’s stack ptr dangles
[Figure: Thread 1 has freed the element; Thread 2's pointer refers to freed memory, marked "?"]
Remedy 1: Make stack Static
[Figure: Thread 1 and Thread 2 share a single stack variable]
Remedy 2: Use Indirection
[Figure: Thread 1 and Thread 2 each hold a pointer named stack that refers to one shared stack pointer]
Corrected main Function
struct board *stack;
...
stack = NULL;
for (i = 0; i < n; i++) {
  initial=(struct board *)malloc(sizeof(struct board));
  initial->pieces = 1;
  initial->places[0] = i;
  initial->next = stack;
  stack = initial;
}
num_solutions = 0;
omp_set_num_threads (omp_get_num_procs());
#pragma omp parallel
search_for_solutions (n, &stack, &num_solutions);
printf ("The %d-queens puzzle has %d solutions\n", n,
num_solutions);
Corrected Stack Access Function
void search_for_solutions (int n,
struct board **stack, int *num_solutions)
{
  struct board *ptr;
  void search (int, struct board *, int *);
  for (;;) {
    /* Test for an empty list and pop inside one critical section;
       testing outside it would let another thread empty the list first */
    #pragma omp critical
    {
      ptr = *stack;
      if (ptr != NULL) *stack = ptr->next;
    }
    if (ptr == NULL) break;
    search (n, ptr, num_solutions);
    free (ptr);
  }
}
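Putting the pieces together, the following is a self-contained sketch of the work-pool version, not the course's reference program. The slides never show `no_threats`, so the version here is an assumption, as are `MAX_N`, the wrapper `count_solutions`, and the critical-section name `work_pool`. A per-thread tally combined by `reduction` replaces the critical section around `num_solutions`, purely to reduce contention. Each thread tests for an empty pool and pops under the same critical section.

```c
#include <stdlib.h>

#define MAX_N 16   /* assumed upper bound on board size */

struct board {
    int pieces;              /* # of queens on board */
    int places[MAX_N];       /* Queen's pos in each row */
    struct board *next;      /* Next search tree node */
};

/* Assumed helper: 1 if the most recently placed queen is not attacked. */
static int no_threats(const struct board *ptr)
{
    int last = ptr->pieces - 1;
    for (int r = 0; r < last; r++) {
        int dc = ptr->places[r] - ptr->places[last];
        if (dc == 0 || dc == last - r || dc == r - last)
            return 0;
    }
    return 1;
}

/* Depth-first search below one board; adds solutions to *count. */
static void search(int n, struct board *ptr, int *count)
{
    if (ptr->pieces == n) {
        (*count)++;
        return;
    }
    ptr->pieces++;
    for (int i = 0; i < n; i++) {
        ptr->places[ptr->pieces - 1] = i;
        if (no_threats(ptr))
            search(n, ptr, count);
    }
    ptr->pieces--;
}

/* Builds the pool of n first-row boards, then lets a team drain it.
   Requires 1 <= n <= MAX_N. */
int count_solutions(int n)
{
    struct board *stack = NULL;
    int num_solutions = 0;

    for (int i = 0; i < n; i++) {
        struct board *initial = malloc(sizeof *initial);
        if (initial == NULL)
            abort();
        initial->pieces = 1;
        initial->places[0] = i;
        initial->next = stack;
        stack = initial;
    }

    #pragma omp parallel reduction(+:num_solutions)
    {
        for (;;) {
            struct board *ptr;
            /* Test and pop together so two threads never take one node. */
            #pragma omp critical(work_pool)
            {
                ptr = stack;
                if (ptr != NULL)
                    stack = ptr->next;
            }
            if (ptr == NULL)
                break;
            search(n, ptr, &num_solutions);
            free(ptr);
        }
    }
    return num_solutions;
}
```

Here the pool head needs no extra indirection because the parallel region sits in the function that declares `stack`, so every thread sees the same variable.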
Case Study: Fancy Web Browser
You can see a snapshot of a page before deciding whether to click on the link
C Code
page = retrieve_page (url);
find_links (page, &num_links, &link_url);
for (i = 0; i < num_links; i++)
snapshots[i].image = NULL;
for (i = 0; i < num_links; i++)
generate_preview (&snapshots[i]);
display_page (page);
Pseudocode, Option A
Retrieve page
Identify links
Enter parallel region
Thread gets ID number (id)
If id = 0 draw page
else fetch page & build snapshot image (id-1)
Exit parallel region
Timeline of Option A
[Figure: timeline. Retrieve page and identify links run sequentially; on entering the parallel block, one thread displays the page while each of the others fetches a page and creates its snapshot; a barrier ends the parallel block]
C/OpenMP Code, Option A
page = retrieve_page (url);
find_links (page, &num_links, &link_url);
for (i = 0; i < num_links; i++)
snapshots[i].image = NULL;
omp_set_num_threads (num_links + 1);
#pragma omp parallel private (id)
{
id = omp_get_thread_num();
if (id == 0) display_page (page);
else generate_preview (&snapshots[id-1]);
}
Pseudocode, Option B
Retrieve page
Identify links
Two activities happen in parallel
1. Draw page
2. For all links do in parallel
Fetch page and build snapshot image
Parallel Sections
#pragma omp parallel sections
{
<code block A>
#pragma omp section
<code block B>
#pragma omp section
<code block C>
}
Meaning of #pragma omp parallel sections: the following block contains sub-blocks that may execute in parallel
Each #pragma omp section line is a divider between sections
Each block is executed by one thread
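A small, hypothetical instance of the pattern: two unrelated computations, each placed in its own section. The functions `sum_to` and `factorial` and the wrapper `run_sections` are made up for illustration.

```c
/* Two independent pieces of work (hypothetical examples). */
static long sum_to(int n)
{
    long s = 0;
    for (int i = 1; i <= n; i++)
        s += i;
    return s;
}

static long factorial(int n)
{
    long f = 1;
    for (int i = 2; i <= n; i++)
        f *= i;
    return f;
}

/* Each section is executed once, by one thread of the team. The results
   go to different variables, so no synchronization is needed. */
void run_sections(long *sum, long *fact)
{
    long s = 0, f = 0;
    #pragma omp parallel sections
    {
        #pragma omp section
        s = sum_to(100);

        #pragma omp section
        f = factorial(10);
    }
    /* Implicit barrier: both sections have finished by this point. */
    *sum = s;
    *fact = f;
}
```

If the team has fewer threads than sections, one thread executes more than one section; the result is the same.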
Nested Parallelism
We can use parallel sections to specify two different concurrent activities: drawing the Web page and creating the snapshots
We are using a for loop to create multiple snapshots; number of iterations is known only at run time
We would like to make for loop parallel
OpenMP allows nested parallelism: a parallel region inside another parallel region
A thread entering a parallel region creates a new team of threads to execute it
Timeline of Option B
[Figure: timeline. Retrieve page and identify links run sequentially; on entering parallel sections, one section displays the page while the other enters a parallel for in which each thread fetches a page and creates its snapshot; one barrier ends the parallel for and another ends the parallel sections]
C/OpenMP Code, Option B
page = retrieve_page (url);
find_links (page, &num_links, &link_url);
omp_set_num_threads (2);
#pragma omp parallel sections
{
display_page (page);
#pragma omp section
omp_set_num_threads (num_links);
#pragma omp parallel for
for (i = 0; i < num_links; i++)
generate_preview (&snapshots[i]);
}
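The sketch below shows the same nesting in a form that compiles on its own. `display_page` and `generate_preview` are stand-in stubs, `struct snapshot` and the wrapper `render` are hypothetical, and `num_threads` clauses take the place of the calls to `omp_set_num_threads`. Many OpenMP implementations leave nested parallelism off by default, in which case the inner region runs on a single thread; the guarded call to `omp_set_max_active_levels` (available from OpenMP 3.0) asks for two levels.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

struct snapshot { int image; };   /* stand-in for a real image handle */

int page_displayed = 0;

/* Stubs standing in for the browser's real routines. */
static void display_page(void) { page_displayed = 1; }
static void generate_preview(struct snapshot *s) { s->image = 1; }

void render(struct snapshot *snapshots, int num_links)
{
    if (num_links < 1) {            /* num_threads needs a positive value */
        display_page();
        return;
    }
#ifdef _OPENMP
    omp_set_max_active_levels(2);   /* allow a region inside a region */
#endif
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        display_page();

        #pragma omp section
        {
            /* Inner team: one requested thread per link. */
            #pragma omp parallel for num_threads(num_links)
            for (int i = 0; i < num_links; i++)
                generate_preview(&snapshots[i]);
        }
    }
}
```

The runtime may grant fewer threads than requested at either level. The loop still covers every link, because iterations are divided among whatever threads the inner team receives.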