Exercises
Herb Sutter
Software Development Consultant
www.gotw.ca/training
© Herb Sutter, except material otherwise referenced.
Date updated: April 26, 2017
Effective Concurrency:
Exercises
Herb Sutter
Effective Concurrency
Introduction and Fundamentals
Pillar 1: Concurrency For Isolation
Pillar 2: Parallelism For Scalability
Machine Architecture
Scalability and Migration
Pillar 3: Consistency
Locks
Lock-Free
Concluding Thoughts
Group Exercise

Reimplement Calculator::GetResult() to be more usable, and not require the caller to wait until the Calculator object is destroyed.

class Calculator {
public:
    /*…*/ GetResult( /*…*/ ) { a.Send( bind( &Calculator::DoGetResult, this, /*…*/ ) ); }
private:
    /*…*/ DoGetResult( /* … */ ) {
    }
    double result;
    Active a;
    /* etc. */
};
You have 10 minutes.
Calculator::GetResult(), Take 1

One option is to use an asynchronous value (e.g., future<T>). For example:

public:
    future<double> GetResult() {
        promise<double> p;
        future<double> ret = p.get_future();
        a.Send( bind( &Calculator::DoGetResult, this, move(p) ) );
        return ret;
    }
private:
    void DoGetResult( promise<double>&& p ) {
        p.set_value( result );
    }

Sample calling code:

future<double> answer;
Calculator calc;
calc.Add( 10 );
calc.Add( 32 );
answer = calc.GetResult();
// … do concurrent work …
Use( answer.get() ); // waits until available, if necessary
Calculator::GetResult(), Take 2

Another option is to send a message back. For example:

public:
    void GetResult( function<void(double)> d ) {
        a.Send( bind( &Calculator::DoGetResult, this, d ) );
    }
private:
    void DoGetResult( function<void(double)> d ) {
        d( result );
    }

Sample calling code:

void MyActiveClass::DoSomeWork() {
    calc.Add( 10 );
    calc.Add( 32 );
    calc.GetResult( bind( &MyActiveClass::AcceptResult, this, placeholders::_1 ) );
} // result will be pumped as a new message

void MyActiveClass::AcceptResult( double d ) {
    a.Send( [=] {
        DoSomethingWith( d ); // use once it’s available
    } );
}
One More Interface Flaw…

The two options provide these interfaces:

Option 1: Return a future.

public:
    future<double> GetResult();

Calculator calc;
calc.Add( 10 );
calc.Add( 32 );
answer = calc.GetResult();

Option 2: Accept a callback.

public:
    void GetResult( function<void(double)> d );

Calculator calc;
calc.Add( 10 );
calc.Add( 32 );
calc.GetResult( bind( &MyActiveClass::AcceptResult, this, placeholders::_1 ) );

Q: What’s the problem with both?
Calculator::GetResult() and Interface Flaws

The two options provide these interfaces:

Option 1: Return a future.

public:
    future<double> GetResult();

Calculator calc;
calc.Add( 10 );
calc.Add( 32 );
answer = calc.GetResult();

Option 2: Accept a callback.

public:
    void GetResult( function<void(double)> d );

Calculator calc;
calc.Add( 10 );
calc.Add( 32 );
calc.GetResult( bind( &MyActiveClass::AcceptResult, this, placeholders::_1 ) );

Q: What’s the problem with both?
A: There’s a window between Add and Get. If another thread is using this Calculator, we might not Get the intended answer.
Q2: How can we fix this timing-dependent code?
An Improved Interface

In principle, a caller should be able to get a “future” handle to the result of any given asynchronous call.

Calculator calc;
calc.Add( 10 );
future<double> answer = calc.Add( 32 );
// … sometime later …
cout << answer.get();

In this case, that’s also a handy place to return the state as of the end of the call. The caller can easily ask for any result they care about, with a simpler interface and no timing window.
Group Exercise
Implement the pipelined version of SendPackets using an active object for each pipeline stage.
Bonus points for:
Avoiding a distinct class type for each stage.
Leveraging scoped lifetimes in C++/C#/Java.
You have 20 minutes.
Pipeline Stage (C++)

Each stage does just one part.

class Stage {
public:
    Stage( function<void(Buffer*)> w ) : work(w) { }
    void Process( Buffer* buf ) {
        a.Send( [=] { work( buf ); } );
    }
private:
    function<void(Buffer*)> work;
    Active a;
};
Setting Up the Pipeline (C++11)

C++-ish syntax:

void SendPackets( Buffers& bufs ) {
    Stage encryptor ( []  ( Buffer* b ) { Encrypt(b); } );
    Stage compressor( [&]( Buffer* b ) { Compress(b); encryptor.Process(b); } );
    Stage decorator ( [&]( Buffer* b ) { Decorate(b); compressor.Process(b); } );
    for( auto& b : bufs ) {
        decorator.Process( &b );
    }
} // automatically blocks waiting for pipeline to finish
Setting Up the Pipeline (C#)

C#-ish syntax:

public void SendPackets( Buffers bufs ) {
    using( Stage encryptor = new Stage( (Buffer b) => { Encrypt(b); } ) ) {
        using( Stage compressor = new Stage( (Buffer b) => { Compress(b); encryptor.Process(b); } ) ) {
            using( Stage decorator = new Stage( (Buffer b) => { Decorate(b); compressor.Process(b); } ) ) {
                foreach( Buffer b in bufs ) {
                    decorator.Process( b );
                }
            }
        }
    } // automatically blocks waiting for pipeline to finish
}
Setting Up the Pipeline (Java)

Java-ish syntax:

public void SendPackets( Buffers bufs ) {
    Stage encryptor = null;
    Stage compressor = null;
    Stage decorator = null;
    try {
        encryptor  = new Stage( new EncryptRunnable() );
        compressor = new Stage( new CompressRunnable( encryptor ) );
        decorator  = new Stage( new DecorateRunnable( compressor ) );
        for( Buffer b : bufs ) {
            decorator.Process( b );
        }
    }
    finally { // buggy dispose
        if( encryptor != null )  encryptor.dispose();  // automatically block
        if( compressor != null ) compressor.dispose(); // waiting for the
        if( decorator != null )  decorator.dispose();  // pipeline to finish
    }
}
Setting Up the Pipeline (Java)

Java-ish syntax:

public void SendPackets( Buffers bufs ) {
    Stage encryptor = null;
    Stage compressor = null;
    Stage decorator = null;
    try {
        encryptor  = new Stage( new EncryptRunnable() );
        compressor = new Stage( new CompressRunnable( encryptor ) );
        decorator  = new Stage( new DecorateRunnable( compressor ) );
        for( Buffer b : bufs ) {
            decorator.Process( b );
        }
    }
    finally { // correct dispose
        if( decorator != null )  decorator.dispose();  // automatically block
        if( compressor != null ) compressor.dispose(); // waiting for the
        if( encryptor != null )  encryptor.dispose();  // pipeline to finish
    }
}
Post Mortem
What are the advantages and disadvantages of writing the pipeline using:
Explicit messages + queues.
Active objects.
Discuss.
And More Flexibility…

C++-ish syntax:

void SendPackets( Buffers& bufs ) {
    Stage encryptor ( []  ( Buffer* b ) { Encrypt(b); } );
    Stage archiver  ( []  ( Buffer* b ) { Archive(b); } );
    Stage compressor( [&]( Buffer* b ) { Compress(b);
        if( b->something() ) encryptor.Process(b);
        else                 archiver.Process(b);
    } );
    Stage decorator ( [&]( Buffer* b ) { Decorate(b); compressor.Process(b); } );
    for( auto& b : bufs ) {
        decorator.Process( &b );
    }
} // automatically blocks waiting for pipeline to finish
Effective Concurrency
Introduction and Fundamentals
Pillar 1: Concurrency For Isolation
Pillar 2: Parallelism For Scalability
Machine Architecture
Scalability and Migration
Pillar 3: Consistency
Locks
Lock-Free
Concluding Thoughts
Group Exercise

Given:

struct TreeNode {
    double value;
    TreeNode* leftChild;
    TreeNode* rightChild;
};

Implement a ParallelSum function that returns the sum of the values stored in a tree:

double ParallelSum( TreeNode* root ) {
}
You have 10 minutes.
To Be Traversed…
ParallelSum for Trees, Take 1 (Flawed)

Given:

struct TreeNode {
    double value;
    TreeNode* leftChild;
    TreeNode* rightChild;
};

Sum the values in a tree:

double ParallelSum( TreeNode* n ) {
    if( !n ) return 0;
    future<double> f1 = pool.run( [=]{ return ParallelSum( n->leftChild ); } );
    future<double> f2 = pool.run( [=]{ return ParallelSum( n->rightChild ); } );
    return n->value + f1.get() + f2.get(); // join, wait for results
}

Q: What’s wrong with this code?
ParallelSum for Trees, Take 1 (Flawed)

Given:

struct TreeNode {
    double value;
    TreeNode* leftChild;
    TreeNode* rightChild;
};

Sum the values in a tree:

double ParallelSum( TreeNode* n ) {
    if( !n ) return 0;
    future<double> f1 = pool.run( [=]{ return ParallelSum( n->leftChild ); } );
    future<double> f2 = pool.run( [=]{ return ParallelSum( n->rightChild ); } );
    return n->value + f1.get() + f2.get(); // join, wait for results
}

Q: What’s wrong with this code?
A: Today, granularity. Leaf computations are small and not worth shipping to a thread pool. Worse news: leaf computations dominate.

Take 1: Resulting Tasks
(Diagram: null tasks and leaf tasks)
ParallelSum for Trees, Take 2 (Better)

Sum the values in a tree:

double ParallelSum( TreeNode* n ) {
    double result = 0;
    vector<future<double>> v;
    while( n ) {
        result += n->value;
        if( n->leftChild ) { // send someone that way
            v.push_back( pool.run( [=]{ return ParallelSum( n->leftChild ); } ) );
        }
        n = n->rightChild; // while I go this way
    }
    for( auto& f : v ) result += f.get(); // join, wait for results
    return result;
}

Better granularity, requires no knowledge of the tree:
- “Vertical” tasks – if pointer traversal is significant, it’s sequential anyway.
- 50% fewer “leaf” computations.
- 100% fewer null computations (for a nonempty tree).
Take 2: Resulting Tasks
ParallelSum for Trees, Take 3 (Better)

Given:

struct TreeNode {
    double value;
    TreeNode* leftChild;
    TreeNode* rightChild;
};

Sum the values in a tree:

double ParallelSum( TreeNode* n, int depth = 0 ) {
    if( !n ) return 0;
    if( depth > log2(treeSize)-K ) return OtherSum( n );
    future<double> f =
        pool.run( [=]{ return ParallelSum( n->leftChild, depth+1 ); } );
    return n->value + ParallelSum( n->rightChild, depth+1 ) + f.get();
}

Better still, requires “a little” approximate knowledge of the tree:
- The “treeSize” heuristic only needs to be approximate.
- The tree doesn’t need to be perfectly balanced, just not badly unbalanced.
- There might still be a handful of null tasks.
Take 3: Resulting Tasks
RECALL: ParallelSum for Trees, Take 1

Given:

struct TreeNode {
    double value;
    TreeNode* leftChild;
    TreeNode* rightChild;
};

Sum the values in a tree:

double ParallelSum( TreeNode* n ) {
    if( !n ) return 0;
    future<double> f1 = pool.run( [=]{ return ParallelSum( n->leftChild ); } );
    future<double> f2 = pool.run( [=]{ return ParallelSum( n->rightChild ); } );
    return n->value + f1.get() + f2.get(); // join, wait for results
}
Group Exercise

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    /*…*/
};

Implement a ParallelSum function that returns the sum of the values stored in a potentially-cyclic graph:

double ParallelSum( GraphNode* root ) {
}
You have 10 minutes.
ParallelSum for Graphs, Take 1 (Flawed)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    /*…*/
};

Sum the values in a graph:

double ParallelSum( GraphNode* n ) {
    if( !n ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children ) // spin off tasks
        vf.push_back( pool.run( [=]{ return ParallelSum( p ); } ) );
    double result = n->value;
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}

Q: What’s wrong with this code?
ParallelSum for Graphs, Take 1 (Flawed)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    /*…*/
};

Sum the values in a graph:

double ParallelSum( GraphNode* n ) {
    if( !n ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children ) // spin off tasks
        vf.push_back( pool.run( [=]{ return ParallelSum( p ); } ) );
    double result = n->value;
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}

Q: What’s wrong with this code?
A1 (today): Granularity, as before.
ParallelSum for Graphs, Take 1 (Flawed)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    /*…*/
};

Sum the values in a graph:

double ParallelSum( GraphNode* n ) {
    if( !n ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children ) // spin off tasks
        vf.push_back( pool.run( [=]{ return ParallelSum( p ); } ) );
    double result = n->value;
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}

Q: What’s wrong with this code?
A2: Paths and cycles. Double-counting if a node is reachable along multiple paths. Infinite looping if there’s a cycle.
ParallelSum for Graphs, Take 2 (Better)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    atomic<bool> visited; // initially = false
};

Sum the values in a graph:

double ParallelSum( GraphNode* n ) {
    if( !n ) return 0;
    if( n->visited.exchange(true) ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children )
        vf.push_back( pool.run( [=]{ return ParallelSum( p ); } ) );
    double result = n->value;
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}
ParallelSum for Graphs, Take 3 (Better: Allows Concurrency, Addresses Cycles)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
};

Sum the values in a graph:

double ParallelSum() { return ParallelSum( root, new ConcurrentSet() ); }

double ParallelSum( GraphNode* n, ConcurrentSet* ourSet ) {
    if( !n ) return 0;
    if( !ourSet->AtomicUniqueInsert(n) ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children )
        vf.push_back( pool.run( [=]{ return ParallelSum( p, ourSet ); } ) );
    double result = n->value;
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}
ParallelSum for Graphs, Take 4 (Better: Stronger Concurrency, Addresses Cycles)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    atomic<unsigned> turn, visited; // initially = 1
};
atomic<unsigned> nextTurn = 1;

Sum the values in a graph:

double ParallelSum() { return ParallelSum( root, nextTurn++ ); }

double ParallelSum( GraphNode* n, unsigned myTurn ) {
    if( !n ) return 0;
    while( n->turn < myTurn ) { } // let earlier traversers leave
    unsigned expected = myTurn;
    if( !n->visited.compare_exchange_strong( expected, myTurn+1 ) ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children )
        vf.push_back( pool.run( [=]{ return ParallelSum( p, myTurn ); } ) );
    double result = n->value;
    n->turn = myTurn+1; // let next traverser in
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}
ParallelSum for Graphs, Take 5 (Better: Stronger Concurrency, Addresses Cycles)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    atomic<unsigned> turn, visited, subgraphDepth; // exact or approximate
};
atomic<unsigned> nextTurn = 1;

Sum the values in a graph:

double ParallelSum() { return ParallelSum( root, nextTurn++ ); }

double ParallelSum( GraphNode* n, unsigned myTurn ) {
    if( !n ) return 0;
    while( n->turn < myTurn ) { } // let earlier traversers leave
    unsigned expected = myTurn;
    if( !n->visited.compare_exchange_strong( expected, myTurn+1 ) ) return 0;
    if( n->subgraphDepth <= limit ) return OtherSum( n );
    vector<future<double>> vf;
    for( GraphNode* p : n->children )
        vf.push_back( pool.run( [=]{ return ParallelSum( p, myTurn ); } ) );
    double result = n->value;
    n->turn = myTurn+1; // let next traverser in
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}
Group Exercise

Implement a parallel search that regains determinism: return not “any” match, but the “first” match.

template<class RAIter, class T>
RAIter p_find_with_determinism(
    const RAIter first, const RAIter last, const T& value ) {
}
You have 10 minutes.
Deterministic Parallel Search, Take 1 (Flawed)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    vector<RAIter> result(LP,last); // no matches found yet
    for( int i = 0; i < LP; ++i ) pool.run( [=, &result] {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; myFirst != myLast; ++myFirst )
            if( *myFirst == value ) { result[i] = myFirst; break; } // match found
    } );
    pool.join();
    for( int i = 0; i < LP; ++i )
        if( result[i] != last ) return result[i];
    return last;
}

Q: What’s the overhead?
Deterministic Parallel Search, Take 1 (Flawed)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    vector<RAIter> result(LP,last); // no matches found yet
    for( int i = 0; i < LP; ++i ) pool.run( [=, &result] {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; myFirst != myLast; ++myFirst )
            if( *myFirst == value ) { result[i] = myFirst; break; } // match found
    } );
    pool.join();
    for( int i = 0; i < LP; ++i )
        if( result[i] != last ) return result[i];
    return last;
}

Q: What’s the overhead?
A: (Large) No early termination: still the same big-Oh, but a higher constant.
(Small) Sequential reduction step at the end.
Deterministic Parallel Search, Take 2 (Flawed)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    atomic<int> bestFound = LP; // no matches found yet
    vector<RAIter> result(LP+1,last);
    for( int i = 0; i < LP; ++i ) pool.run( [=, &bestFound, &result] {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; bestFound > i && myFirst != myLast; ++myFirst )
            if( *myFirst == value ) { bestFound = i; result[i] = myFirst; break; }
    } );
    pool.join();
    return result[bestFound];
}

Q: What’s the problem?
Deterministic Parallel Search, Take 2 (Flawed)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    atomic<int> bestFound = LP; // no matches found yet
    vector<RAIter> result(LP+1,last);
    for( int i = 0; i < LP; ++i ) pool.run( [=, &bestFound, &result] {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; bestFound > i && myFirst != myLast; ++myFirst )
            if( *myFirst == value ) { bestFound = i; result[i] = myFirst; break; }
    } );
    pool.join();
    return result[bestFound];
}

Q: What’s the problem?
A: Buggy code. Timing: it could fail to record the “correct” bestFound result.
Deterministic Parallel Search, Take 3 (Fixed)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    atomic<int> bestFound = LP; // no matches found yet
    mutex mutWriters;
    RAIter result = last;
    for( int i = 0; i < LP; ++i ) pool.run( [=, &bestFound, &mutWriters, &result] {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; bestFound > i && myFirst != myLast; ++myFirst )
            if( *myFirst == value ) {
                lock_guard<mutex> lock(mutWriters); // acquire lock
                if( bestFound > i ) { result = myFirst; bestFound = i; }
                break;
            } // release lock
    } );
    pool.join();
    return result;
}
Deterministic Parallel Search, Take 4 (Fixed, Lock-Free)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    vector<atomic<bool>> stop(LP,false);
    vector<RAIter> result(LP,last); // no matches found yet
    for( int i = 0; i < LP; ++i ) pool.run( [=, &stop, &result]() mutable {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; !stop[i] && myFirst != myLast; ++myFirst )
            if( *myFirst == value ) {
                result[i] = myFirst;
                while( i < LP ) stop[i++] = true;
                break;
            }
    } );
    pool.join();
    for( int i = 0; i < LP; ++i )
        if( result[i] != last ) return result[i];
    return last;
}

Q: What’s the overhead?
Deterministic Parallel Search, Take 4 (Fixed, Lock-Free)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    vector<atomic<bool>> stop(LP,false);
    vector<RAIter> result(LP,last); // no matches found yet
    for( int i = 0; i < LP; ++i ) pool.run( [=, &stop, &result]() mutable {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; !stop[i] && myFirst != myLast; ++myFirst )
            if( *myFirst == value ) {
                result[i] = myFirst;
                while( i < LP ) stop[i++] = true;
                break;
            }
    } );
    pool.join();
    for( int i = 0; i < LP; ++i )
        if( result[i] != last ) return result[i];
    return last;
}

Q: What’s the overhead?
A: (Medium) Less early termination: same big-Oh, different constant.
(Small) Sequential reduction step at the end.
Regaining Determinism

Q: What does it cost to get determinism back in a parallel search? (Return not “any” match, but the “first” match.) This is important when migrating existing code to parallelism, because some callers might rely on the deterministic semantics.

A: Only a constant factor. We still benefit from parallel searching; we just don’t get to stop all other workers (only some) when one finds a match.
Approach: Workers that are exploring later/worse subranges can stop once a result that precedes their range is found.
Cost: For the one-match case, workers typically have to search their whole subranges instead of, on average, half of them (avg. 50% extra work).
Result: Still O(N/MP), with a higher constant factor.
Effective Concurrency
Introduction and Fundamentals
Pillar 1: Concurrency For Isolation
Pillar 2: Parallelism For Scalability
Machine Architecture
Scalability and Migration
Pillar 3: Consistency
Locks
Lock-Free
Concluding Thoughts