“Going Parallel with C++11”, SUPERCOMPUTING 2012
Joe Hummel, PhD, UC-Irvine, hummelj@ics.uci.edu
http://www.joehummel.net/downloads.html

Apr 01, 2015


C++11

New standard of C++ has been ratified
◦ “C++0x” ==> “C++11”

Lots of new features
◦ Workshop will focus on concurrency features


Why are we here?

Async programming: better responsiveness…
◦ GUIs (desktop, web, mobile)
◦ Cloud
◦ Windows 8

Parallel programming: better performance…
◦ Financials
◦ Scientific
◦ Big data


Hello World --- of threading

#include <thread>
#include <iostream>

// A simple function for thread to do…
void func()
{
  std::cout << "**Inside thread " << std::this_thread::get_id() << "!" << std::endl;
}

int main()
{
  std::thread t;
  t = std::thread( func );   // create and schedule thread…

  t.join();                  // wait for thread to finish…
  return 0;
}


Demo #1: Hello world…


Avoiding early program termination…

#include <thread>
#include <iostream>

void func()
{
  std::cout << "**Hello world...\n";
}

int main()
{
  std::thread t;
  t = std::thread( func );

  t.join();
  return 0;
}

(1) The thread function must do its own exception handling; an exception that escapes a thread function calls std::terminate, ending the whole program…

void func()
{
  try
  {
    // computation:
  }
  catch(...)
  {
    // do something:
  }
}

(2) Must join before the std::thread object is destroyed; destroying a thread that is still joinable also calls std::terminate… (avoid use of detach( ), difficult to use safely)
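Point (1) keeps an exception from escaping the thread, but the caller usually still wants to know the work failed. One option, sketched here and not taken from the slides, is to catch the exception inside the thread, store it in a std::exception_ptr, and rethrow it on the calling thread after join(). The names runAndRethrow and failureMessage are hypothetical.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper: runs work() on a new thread. Any exception thrown by
// work() is caught inside the thread (so std::terminate is never called),
// stored in an exception_ptr, and rethrown on the calling thread after join().
template <typename Work>
void runAndRethrow(Work work)
{
  std::exception_ptr error;              // stays empty unless work() throws

  std::thread t([&work, &error]() {
    try
    {
      work();
    }
    catch (...)
    {
      error = std::current_exception();  // capture whatever was thrown
    }
  });

  t.join();                              // always join before t is destroyed

  if (error)
    std::rethrow_exception(error);       // report the failure on this thread
}

// Returns the message of the exception raised by a failing worker, or "" if none.
std::string failureMessage(bool fail)
{
  try
  {
    runAndRethrow([fail]() {
      if (fail)
        throw std::runtime_error("worker failed");
    });
  }
  catch (const std::exception& e)
  {
    return e.what();
  }
  return "";
}
```

Reading error after join() is safe because join() synchronizes with the end of the thread.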


Pick your style…

Old school:
◦ distinct thread functions (what we just saw)

New school:
◦ lambda expressions (aka anonymous functions)


Demo #2: A thread that loops until we tell it to stop…

When user presses ENTER, we’ll tell thread to stop…


(1) via thread function

#include <thread>
#include <iostream>
#include <string>
#include <atomic>
#include <chrono>
#include <cstdio>

using namespace std;

// atomic<bool>: main writes the flag while the worker reads it,
// so a plain bool here would be a data race
void loopUntil(atomic<bool> *stop)
{
  auto duration = chrono::seconds(2);

  while (!(*stop))
  {
    cout << "**Inside thread...\n";
    this_thread::sleep_for(duration);
  }
}

int main()
{
  atomic<bool> stop(false);
  thread t(loopUntil, &stop);

  getchar();    // wait for user to press enter:

  stop = true;  // stop thread:
  t.join();
  return 0;
}


(2) via lambda expression

int main()
{
  atomic<bool> stop(false);   // atomic: shared between main and the thread

  thread t( [&] () {
    auto duration = chrono::seconds(2);

    while (!stop)
    {
      cout << "**Inside thread...\n";
      this_thread::sleep_for(duration);
    }
  } );

  getchar();    // wait for user to press enter:

  stop = true;  // stop thread:
  t.join();
  return 0;
}

Anatomy of the lambda expression in thread t( [&] () { … } ):
◦ [&]: closure semantics; [ ]: capture nothing, [&]: capture by reference, [=]: capture by value, …
◦ ( ): lambda arguments


Trade-offs

Lambdas:
◦ Easier and more readable -- code remains inline
◦ Potentially more dangerous ([&] captures everything by reference)

Functions:
◦ Possibly more efficient -- each lambda involves a compiler-generated class and function object, although compilers usually inline these
◦ Potentially safer -- requires explicit variable scoping
◦ More cumbersome and harder to read
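The "[&] captures everything by reference" danger is easiest to see side by side. In this small sketch (function names hypothetical), both lambdas are created while n == 1 and n is then changed to 2 before the thread starts: the by-value capture keeps its own copy, while the by-reference capture sees the caller's variable. Naming captures explicitly, as in [n, &result], recovers some of the explicit scoping that functions give you.

```cpp
#include <cassert>
#include <thread>

// [n, ...]: n is copied when the lambda is created, so later changes are not seen.
int captureByValue()
{
  int n = 1;
  int result = 0;
  auto task = [n, &result]() { result = n * 10; };   // copy of n taken now (1)
  n = 2;                                             // does not affect the copy
  std::thread t(task);
  t.join();
  return result;                                     // 10
}

// [&n, ...]: the lambda refers to the caller's n, so the change to 2 is visible.
int captureByReference()
{
  int n = 1;
  int result = 0;
  auto task = [&n, &result]() { result = n * 10; };  // refers to the caller's n
  n = 2;                                             // seen by the lambda
  std::thread t(task);
  t.join();
  return result;                                     // 20
}
```

A reference capture also becomes dangling if the captured variable goes out of scope while the thread is still running, which is why the caller must join before returning.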


Demo #3: Multiple threads looping…

When user presses ENTER, all threads stop…


Solution

#include <thread>
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdio>

using namespace std;

int main()
{
  cout << "** Main Starting **\n\n";

  atomic<bool> stop(false);   // atomic: written by main, read by every worker

  vector<thread> workers;

  for (int i = 1; i <= 3; i++)
  {
    workers.push_back( thread([i, &stop]()
    {
      while (!stop)
      {
        cout << "**Inside thread " << i << "...\n";
        this_thread::sleep_for(chrono::seconds(i));
      }
    }) );
  }

  getchar();

  cout << "** Main Done **\n\n";

  stop = true;  // stop threads:

  // wait for threads to complete:
  for ( thread& t : workers )
    t.join();

  return 0;
}


A Real Example

Matrix multiply…


Multi-threaded solution

// 1 thread per core:
numthreads = thread::hardware_concurrency();
if (numthreads == 0)   // the count may be unknown (0): fall back to one thread
  numthreads = 1;

int rows  = N / numthreads;
int extra = N % numthreads;
int start = 0;         // each thread does [start..end)
int end   = rows;

vector<thread> workers;

for (int t = 1; t <= numthreads; t++)
{
  if (t == numthreads)  // last thread does extra rows:
    end += extra;

  workers.push_back( thread([start, end, N, &C, &A, &B]()
  {
    for (int i = start; i < end; i++)
      for (int j = 0; j < N; j++)
      {
        C[i][j] = 0.0;
        for (int k = 0; k < N; k++)
          C[i][j] += (A[i][k] * B[k][j]);
      }
  }));

  start = end;
  end   = start + rows;
}

for (thread& t : workers)
  t.join();
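The slide's fragment assumes N, A, B, C and numthreads are declared elsewhere. Below is a self-contained sketch of the same row-partitioning scheme, assuming matrices stored as vector<vector<double>>; the function name parallelMultiply and the Matrix alias are hypothetical. It keeps the fallback for thread::hardware_concurrency(), which is allowed to return 0 when the core count is unknown.

```cpp
#include <cassert>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Computes C = A * B for N x N matrices, splitting the rows of C across threads.
// Each thread writes a disjoint block of rows [start, end), so no locking is needed.
Matrix parallelMultiply(const Matrix& A, const Matrix& B, unsigned numthreads)
{
  int N = static_cast<int>(A.size());
  Matrix C(N, std::vector<double>(N, 0.0));

  if (numthreads == 0)            // hardware_concurrency() may report 0
    numthreads = 1;

  int nt    = static_cast<int>(numthreads);
  int rows  = N / nt;
  int extra = N % nt;
  int start = 0;                  // each thread does [start..end)
  int end   = rows;

  std::vector<std::thread> workers;

  for (int t = 1; t <= nt; t++)
  {
    if (t == nt)                  // last thread does extra rows:
      end += extra;

    workers.push_back(std::thread([start, end, N, &C, &A, &B]() {
      for (int i = start; i < end; i++)
        for (int j = 0; j < N; j++)
        {
          double sum = 0.0;
          for (int k = 0; k < N; k++)
            sum += A[i][k] * B[k][j];
          C[i][j] = sum;
        }
    }));

    start = end;
    end   = start + rows;
  }

  for (std::thread& w : workers)
    w.join();                     // C is complete only after every join

  return C;
}
```

Calling parallelMultiply(A, B, std::thread::hardware_concurrency()) mirrors the slide's one-thread-per-core choice.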


High-Performance Computing

Parallelism alone is not enough…

HPC == Parallelism + Memory Hierarchy - Contention

Expose parallelism

Maximize data locality:
• network
• disk
• RAM
• cache
• core

Minimize interaction:
• false sharing
• locking
• synchronization
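To make "false sharing" concrete: if each thread updates its own counter but the counters sit next to each other in memory, they can land on the same cache line, and every write invalidates that line in the other cores' caches even though no data is actually shared. A common remedy is to give each counter its own cache line. This sketch assumes 64-byte cache lines (typical, but not guaranteed); PaddedCounter and countInParallel are hypothetical names. Note that std::vector is only guaranteed to honour the over-alignment from C++17 on; the counts are correct either way, only the padding benefit depends on it.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines. alignas(64) gives each counter its own line,
// so threads updating neighbouring counters do not falsely share a line.
struct alignas(64) PaddedCounter
{
  long value = 0;
};

// Each thread counts into its own padded slot; the slots are summed after join().
long countInParallel(int numthreads, long perThread)
{
  std::vector<PaddedCounter> counters(numthreads);
  std::vector<std::thread> workers;

  for (int t = 0; t < numthreads; t++)
    workers.push_back(std::thread([&counters, t, perThread]() {
      for (long i = 0; i < perThread; i++)
        counters[t].value++;      // private slot: no lock and no data race
    }));

  for (std::thread& w : workers)
    w.join();

  long total = 0;
  for (const PaddedCounter& c : counters)
    total += c.value;
  return total;
}
```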


Cache-friendly matrix multiply


Cache-friendly solution

Loop interchange is first step… With k moved outside j, the innermost loop walks along a row of B and a row of C (unit stride in row-major storage) instead of down a column of B.

workers.push_back( thread([start, end, N, &C, &A, &B]()
{
  for (int i = start; i < end; i++)
    for (int j = 0; j < N; j++)
      C[i][j] = 0.0;

  for (int i = start; i < end; i++)
    for (int k = 0; k < N; k++)
      for (int j = 0; j < N; j++)
        C[i][j] += (A[i][k] * B[k][j]);
}));

Next step is to block multiply…
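"Block multiply" (loop tiling) splits the i, k and j loops into tiles small enough that the pieces of A, B and C being reused stay in cache. A sequential sketch, assuming a flat row-major vector<double> layout and a hypothetical default block size of 64; the best value depends on the cache sizes of the target machine. In the threaded version each worker would tile only its own [start, end) rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Blocked (tiled) C = A * B for N x N matrices stored row-major in flat vectors.
// blockSize is an assumed tuning parameter, not a value from the slides.
std::vector<double> blockedMultiply(const std::vector<double>& A,
                                    const std::vector<double>& B,
                                    int N, int blockSize = 64)
{
  std::vector<double> C(static_cast<std::size_t>(N) * N, 0.0);

  for (int ii = 0; ii < N; ii += blockSize)
    for (int kk = 0; kk < N; kk += blockSize)
      for (int jj = 0; jj < N; jj += blockSize)
      {
        int iEnd = std::min(ii + blockSize, N);
        int kEnd = std::min(kk + blockSize, N);
        int jEnd = std::min(jj + blockSize, N);

        // Same i-k-j order as the interchanged loops, restricted to one tile.
        for (int i = ii; i < iEnd; i++)
          for (int k = kk; k < kEnd; k++)
          {
            double a = A[i * N + k];
            for (int j = jj; j < jEnd; j++)
              C[i * N + j] += a * B[k * N + j];
          }
      }

  return C;
}
```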


C++11 Features and Status


Compilers…

No compiler as yet fully implements C++11

Visual C++ 2012 has best concurrency support
◦ Part of Visual Studio 2012

gcc 4.7 has best overall support
◦ http://gcc.gnu.org/projects/cxx0x.html

clang 3.1 appears very good as well
◦ I did not test
◦ http://clang.llvm.org/cxx_status.html


Compiling with gcc

# makefile

# threading library: one of these should work
# tlib=thread
tlib=pthread

# gcc 4.6:
ver=c++0x
# gcc 4.7:
# ver=c++11

build:
	g++ -std=$(ver) -Wall main.cpp -l$(tlib)


Executive Summary

Concept          Header                  Summary
Threads          <thread>                Standard, low-level, type-safe; good basis for building high-level systems (futures, tasks, …)
Futures          <future>                Via async function; hides threading, better harvesting of return value & exception handling
Locking          <mutex>                 Standard, low-level locking primitives
Condition Vars   <condition_variable>    Low-level synchronization primitives
Atomics          <atomic>                Predictable, concurrent access without data race
Memory Model                             “Catch Fire” semantics; if a program contains a data race, its behavior is undefined
Thread Local                             Thread-local variables [ problematic => avoid ]


Locking

Use mutex to protect against concurrent access…

#include <mutex>

mutex m;
int   sum;

thread t1([&]()
{
  m.lock();
  sum += compute();
  m.unlock();
});

thread t2([&]()
{
  m.lock();
  sum += compute();
  m.unlock();
});


RAII

“Resource Acquisition Is Initialization”
◦ Advocated by B. Stroustrup for resource management
◦ Uses constructor & destructor to properly manage resources (files, threads, locks, …) in presence of exceptions, etc.

thread t([&]()
{
  m.lock();
  sum += compute();
  m.unlock();      // skipped if compute() throws, leaving m locked
});

should be written as…

thread t([&]()
{
  lock_guard<mutex> lg(m);   // locks m in constructor
  sum += compute();
});                          // unlocks m in destructor
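A runnable sketch of the RAII point. compute(value, fail) is a hypothetical stand-in for the slide's compute() that can be told to throw; lockedSum and unlockedAfterException are likewise made-up names. The first function does every update of sum under a lock_guard; the second checks that the mutex is free again after an exception leaves the guarded scope.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical stand-in for the slide's compute(): returns value, or throws
// when fail is true so the unlock-on-exception behaviour can be observed.
int compute(int value, bool fail)
{
  if (fail)
    throw std::runtime_error("compute failed");
  return value;
}

// Sums 1..n on n threads; every update of sum happens under a lock_guard.
int lockedSum(int n)
{
  std::mutex m;
  int sum = 0;
  std::vector<std::thread> workers;

  for (int i = 1; i <= n; i++)
    workers.push_back(std::thread([i, &m, &sum]() {
      std::lock_guard<std::mutex> lg(m);   // locks m in constructor
      sum += compute(i, false);
    }));                                   // lg destroyed here: unlocks m

  for (std::thread& t : workers)
    t.join();
  return sum;
}

// Returns true if m can be locked again after compute() threw while m was held.
bool unlockedAfterException()
{
  std::mutex m;
  try
  {
    std::lock_guard<std::mutex> lg(m);
    compute(0, true);                      // throws while m is locked
  }
  catch (const std::runtime_error&)
  {
    // lg's destructor already ran during stack unwinding, releasing m
  }

  // try_lock() is allowed to fail spuriously, so give it a few attempts.
  for (int attempt = 0; attempt < 100; attempt++)
    if (m.try_lock())
    {
      m.unlock();
      return true;
    }
  return false;
}
```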


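Atomics

Use atomic to protect shared variables…
◦ Lighter-weight than locking, but much more limited in applicability

#include <atomic>

atomic<int> count;
count = 0;

thread t1([&]() { count++; });

thread t2([&]() { count++; });

thread t3([&]() { count = count + 1; });   // X not safe…

count++ is a single atomic read-modify-write; count = count + 1 is an atomic load followed by a separate atomic store, so another thread's increment can slip in between and be lost.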
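When the new value really must be computed from the old one, the usual lock-free pattern is a compare-and-swap retry loop. A sketch with hypothetical function names: atomicIncrements uses count++ (equivalent to count.fetch_add(1)), and addOne shows how "count = count + 1" can be written safely with compare_exchange_weak, which only stores if the value is still the one that was read.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each thread performs perThread increments with count++, a single atomic
// read-modify-write, so no increment is lost.
int atomicIncrements(int numthreads, int perThread)
{
  std::atomic<int> count(0);
  std::vector<std::thread> workers;

  for (int t = 0; t < numthreads; t++)
    workers.push_back(std::thread([&count, perThread]() {
      for (int i = 0; i < perThread; i++)
        count++;
    }));

  for (std::thread& w : workers)
    w.join();

  return count.load();
}

// Safe form of "count = count + 1": retry until no other thread changed count
// between our read and our write.
void addOne(std::atomic<int>& count)
{
  int expected = count.load();
  while (!count.compare_exchange_weak(expected, expected + 1))
  {
    // on failure, expected now holds the current value; loop and try again
  }
}

int casIncrements(int numthreads, int perThread)
{
  std::atomic<int> count(0);
  std::vector<std::thread> workers;

  for (int t = 0; t < numthreads; t++)
    workers.push_back(std::thread([&count, perThread]() {
      for (int i = 0; i < perThread; i++)
        addOne(count);
    }));

  for (std::thread& w : workers)
    w.join();

  return count.load();
}
```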


Lock-free programming

Atomics enable safe, lock-free programming
◦ “Safe” is a relative word…

done flag:

int x;
atomic<bool> done;
done = false;

thread t1([&]()
{
  x = 42;
  done = true;    // publish: x is written before done is set
});

thread t2([&]()
{
  while (!done)
    ;             // spin until t1 publishes
  assert(x==42);
});

lazy init:

int x;
atomic<bool> initd;
initd = false;

thread t1([&]()
{
  if (!initd)
  {
    lock_guard<mutex> _(m);
    if (!initd)   // re-check under the lock: another thread may already have initialized x
    {
      x = 42;
      initd = true;
    }
  }
  << consume x, … >>
});

thread t2([&]()
{
  if (!initd)
  {
    lock_guard<mutex> _(m);
    if (!initd)
    {
      x = 42;
      initd = true;
    }
  }
  << consume x, … >>
});
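The lazy-init pattern is usually called double-checked locking. Below is a runnable sketch using the slide's names x, initd and m, wrapped in a hypothetical LazyValue struct (the initCount member and lazyInitOnce function are also made up, for checking). The second check of initd after acquiring the lock matters: without it, two threads that both saw initd == false would each write x, and the second write could race with a thread already consuming x.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical lazily initialized value shared by several threads.
struct LazyValue
{
  std::atomic<bool> initd{false};
  std::mutex m;
  int x = 0;
  std::atomic<int> initCount{0};   // how many times initialization ran

  int get()
  {
    if (!initd)                    // fast path: skip the lock once initialized
    {
      std::lock_guard<std::mutex> lg(m);
      if (!initd)                  // re-check while holding the lock
      {
        x = 42;                    // expensive initialization would go here
        initCount++;
        initd = true;              // publish: x is written before initd is set
      }
    }
    return x;                      // safe: seeing initd == true orders after x = 42
  }
};

// Returns true if every thread saw 42 and initialization ran exactly once.
bool lazyInitOnce(int numthreads)
{
  LazyValue v;
  std::vector<int> seen(numthreads, 0);
  std::vector<std::thread> workers;

  for (int t = 0; t < numthreads; t++)
    workers.push_back(std::thread([&v, &seen, t]() { seen[t] = v.get(); }));

  for (std::thread& w : workers)
    w.join();

  for (int s : seen)
    if (s != 42)
      return false;
  return v.initCount.load() == 1;
}
```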


Demo: Prime numbers…


Futures

Futures provide a higher level of abstraction
◦ Starts an asynchronous operation on some thread, await result…

#include <future>
. . .

future<int> fut = async( []() -> int    // return type…
{
  int result = PerformLongRunningOperation();
  return result;
});
. . .

try
{
  int x = fut.get();   // join, harvest result:
  cout << x << endl;
}
catch(exception &e)
{
  cout << "**Exception: " << e.what() << endl;
}
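A self-contained version of the futures pattern, with a hypothetical performLongRunningOperation(n) that returns n * n or throws for negative input, and a made-up runAsync wrapper. The point to notice is that an exception thrown on the worker thread is stored in the future and rethrown by fut.get() inside the caller's try block.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for PerformLongRunningOperation(): returns n * n,
// or throws for negative input to show how exceptions travel through a future.
int performLongRunningOperation(int n)
{
  if (n < 0)
    throw std::invalid_argument("negative input");
  return n * n;
}

// Starts the operation with std::async and harvests the result with get().
// If the task threw, get() rethrows that exception here, on the calling thread.
std::string runAsync(int n)
{
  std::future<int> fut = std::async(std::launch::async,
                                    [n]() -> int { return performLongRunningOperation(n); });
  try
  {
    int x = fut.get();             // waits for the task, then returns its value
    return std::to_string(x);
  }
  catch (const std::exception& e)
  {
    return std::string("**Exception: ") + e.what();
  }
}
```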


Execution of futures

May run on the current thread
May run on a new thread
Often better to let system decide…

// run on the thread that asks for the value (get/wait), and only when it asks (“lazy”):
future<T> fut1 = async( launch::deferred, []() -> ... );

// run on a new thread:
future<T> fut2 = async( launch::async, []() -> ... );

// let system decide:
future<T> fut3 = async( launch::async | launch::deferred, []() -> ... );
future<T> fut4 = async( []() ... );

(launch::sync and launch::any are not part of the final C++11 standard, which defines only launch::async and launch::deferred; omitting the policy lets the system decide.)
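A small sketch that makes the two standard launch policies observable (function names hypothetical). Each task returns the id of the thread it ran on: a launch::deferred task runs inside get() on the calling thread, and does nothing before that; a launch::async task runs on a different thread.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Returns true if a launch::deferred task ran on the thread that called get().
bool deferredRunsOnCaller()
{
  std::future<std::thread::id> fut =
      std::async(std::launch::deferred, []() { return std::this_thread::get_id(); });
  return fut.get() == std::this_thread::get_id();   // the task runs inside get()
}

// Returns true if a launch::async task ran on a thread other than the caller.
bool asyncRunsElsewhere()
{
  std::future<std::thread::id> fut =
      std::async(std::launch::async, []() { return std::this_thread::get_id(); });
  return fut.get() != std::this_thread::get_id();
}

// A deferred task does not run at all until someone asks for the value.
bool deferredIsLazy()
{
  bool ran = false;
  std::future<void> fut = std::async(std::launch::deferred, [&ran]() { ran = true; });
  bool before = ran;       // still false: nothing has executed yet
  fut.get();               // runs the task now, on this thread
  return !before && ran;
}
```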


Demo: Netflix data-mining…

Netflix Movie Reviews (.txt) ==> Netflix Data Mining App

Computes average review for a movie…
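The slides do not show the demo's code. As a hypothetical illustration of how futures could parallelize such a computation, assume the ratings for one movie are already loaded into a vector<int>: the sum is split into chunks, each chunk is summed by its own std::async task, and get() collects the partial sums. The function name averageRating is made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical sketch: average rating of one movie, with the sum split into
// numChunks pieces that are computed concurrently via std::async.
double averageRating(const std::vector<int>& ratings, int numChunks)
{
  if (ratings.empty())
    return 0.0;
  if (numChunks < 1)
    numChunks = 1;

  std::size_t n = ratings.size();
  std::size_t chunk = (n + numChunks - 1) / numChunks;   // ceiling division
  std::vector<std::future<long>> parts;

  for (std::size_t begin = 0; begin < n; begin += chunk)
  {
    std::size_t end = std::min(begin + chunk, n);
    parts.push_back(std::async(std::launch::async, [&ratings, begin, end]() {
      return std::accumulate(ratings.begin() + begin, ratings.begin() + end, 0L);
    }));
  }

  long total = 0;
  for (std::future<long>& part : parts)
    total += part.get();           // wait for each task and add its partial sum

  return static_cast<double>(total) / n;
}
```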


Memory Model

C++ committee thought long and hard on memory model semantics…
◦ “You Don’t Know Jack About Shared Variables or Memory Models”, Boehm and Adve, CACM, Feb 2012

Conclusion:
◦ No suitable definition in presence of race conditions

Solution:
◦ Predictable memory model *only* in data-race-free codes
◦ Computer may “catch fire” in presence of data races


Example…

int x, y, r1, r2;
x = y = r1 = r2 = 0;

thread t1([&]()
{
  x = 1;
  r1 = y;
});

thread t2([&]()
{
  y = 1;
  r2 = x;
});

t1.join();
t2.join();

What can we say about r1 and r2?


Dekker’s example…

If we think in terms of all possible thread interleavings (aka “sequential consistency”), then we know r1 = 1, r2 = 1, or both.

In C++ 11? The program has data races on x and y, so its behavior is undefined: not only are the values of x, y, r1 and r2 undefined, but the program may crash!
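One way to make the example well-defined is to declare x and y as atomic<int>. With the default sequentially consistent ordering the program is data-race-free, so reasoning by interleavings applies again and the outcome r1 == 0 && r2 == 0 is ruled out. The sketch below (function names hypothetical) repeats the experiment and counts how often that outcome appears.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the two-thread example once with x and y made atomic (default
// memory_order_seq_cst). Returns true if both r1 and r2 ended up 0.
bool bothZeroOnce()
{
  std::atomic<int> x(0), y(0);
  int r1 = 0, r2 = 0;              // each written by one thread, read after join

  std::thread t1([&]() { x = 1; r1 = y; });
  std::thread t2([&]() { y = 1; r2 = x; });

  t1.join();
  t2.join();

  return r1 == 0 && r2 == 0;
}

// Repeats the experiment; returns how many runs produced r1 == 0 && r2 == 0.
int countBothZero(int trials)
{
  int count = 0;
  for (int i = 0; i < trials; i++)
    if (bothZeroOnce())
      count++;
  return count;
}
```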


C++ 11 Memory Model

Def: two memory accesses conflict if they
1. access the same scalar object or contiguous sequence of bit fields, and
2. at least one access is a store.

Def: two memory accesses participate in a data race if they
1. conflict, and
2. can occur simultaneously.

A program is data-race-free (DRF) if no sequentially-consistent execution results in a data race. Avoid anything else.
◦ via independent threads, locks, atomics, …


Beyond Threads


Tasks vs. Threads

Tasks are a higher-level abstraction
◦ Idea: developers identify work; the run-time system deals with execution details

Task: a unit of work; an object denoting an ongoing operation or computation.


Example

Microsoft PPL: Parallel Patterns Library

Matrix Multiply

#include <ppl.h>

for (int i = 0; i < N; i++)
  for (int j = 0; j < N; j++)
    C[i][j] = 0.0;

// for (int i = 0; i < N; i++)
Concurrency::parallel_for(0, N, [&](int i)
{
  for (int k = 0; k < N; k++)
    for (int j = 0; j < N; j++)
      C[i][j] += (A[i][k] * B[k][j]);
});
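Concurrency::parallel_for is specific to Microsoft's PPL. For readers on other platforms, here is a minimal sketch of the same loop shape using only std::thread; simpleParallelFor is a hypothetical name, and it is a static partition into one contiguous range per hardware thread, not PPL's task-scheduler design (no work queue, no load balancing, new threads on every call).

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical portable stand-in for a parallel_for(first, last, body):
// splits [first, last) into contiguous ranges, one per hardware thread,
// and calls body(i) for every index. body must be safe to call concurrently.
template <typename Body>
void simpleParallelFor(int first, int last, Body body)
{
  int n = last - first;
  if (n <= 0)
    return;

  int numthreads = static_cast<int>(std::thread::hardware_concurrency());
  if (numthreads == 0)
    numthreads = 2;                      // count unknown: pick a small default
  numthreads = std::min(numthreads, n);

  int chunk = (n + numthreads - 1) / numthreads;
  std::vector<std::thread> workers;

  for (int begin = first; begin < last; begin += chunk)
  {
    int end = std::min(begin + chunk, last);
    workers.push_back(std::thread([begin, end, &body]() {
      for (int i = begin; i < end; i++)
        body(i);
    }));
  }

  for (std::thread& t : workers)
    t.join();
}
```

With it, the matrix-multiply loop above would read simpleParallelFor(0, N, [&](int i) { … }) with the same body.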


Execution Model

[Figure: parallel_for( ... ) creates tasks that go on a global work queue; a thread pool of worker threads inside the Windows process runs them. Layers shown: Parallel Patterns Library, Task Scheduler, Resource Manager, Windows, multicore hardware.]
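The core idea of that execution model, worker threads in a pool pulling tasks off a global work queue, can be sketched with standard C++11 pieces. This is a hypothetical minimal pool (SimplePool and runCountingTasks are made-up names), not how PPL's Task Scheduler is implemented: there is one shared queue guarded by a mutex, a condition variable to wake idle workers, and no work stealing or result futures.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool: workers repeatedly take tasks from one
// global work queue until the pool is destroyed and the queue is empty.
class SimplePool
{
public:
  explicit SimplePool(unsigned numWorkers)
  {
    if (numWorkers == 0)
      numWorkers = 1;
    for (unsigned i = 0; i < numWorkers; i++)
      workers_.push_back(std::thread([this]() { workerLoop(); }));
  }

  ~SimplePool()
  {
    {
      std::lock_guard<std::mutex> lg(m_);
      done_ = true;
    }
    cv_.notify_all();
    for (std::thread& t : workers_)
      t.join();                      // workers drain the queue before exiting
  }

  void submit(std::function<void()> task)
  {
    {
      std::lock_guard<std::mutex> lg(m_);
      queue_.push(std::move(task));
    }
    cv_.notify_one();                // wake one idle worker
  }

private:
  void workerLoop()
  {
    for (;;)
    {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this]() { return done_ || !queue_.empty(); });
        if (queue_.empty())          // done_ is set and nothing is left to run
          return;
        task = std::move(queue_.front());
        queue_.pop();
      }
      task();                        // run the task outside the lock
    }
  }

  std::mutex m_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool done_ = false;
  std::vector<std::thread> workers_;
};

// Submits numTasks tasks that each bump a shared atomic counter and returns
// the count after the pool is destroyed (i.e. after all tasks have run).
int runCountingTasks(int numTasks, unsigned numWorkers)
{
  std::atomic<int> counter(0);
  {
    SimplePool pool(numWorkers);
    for (int i = 0; i < numTasks; i++)
      pool.submit([&counter]() { counter++; });
  }                                  // ~SimplePool waits for the queue to drain
  return counter.load();
}
```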


That’s it!


Presenter: Joe Hummel
◦ Email: hummelj@ics.uci.edu
◦ Materials: http://www.joehummel.net/downloads.html

References:
◦ Book: “C++ Concurrency in Action”, by Anthony Williams
◦ Talks: Bjarne and friends at MSFT’s “Going Native 2012”, http://channel9.msdn.com/Events/GoingNative/GoingNative-2012
◦ Tutorials: really nice series by Bartosz Milewski, http://bartoszmilewski.com/2011/08/29/c11-concurrency-tutorial/
◦ FAQ: Bjarne Stroustrup’s extensive FAQ, http://www.stroustrup.com/C++11FAQ.html

Thank you for attending
