Exercises
Herb Sutter
Software Development Consultant
www.gotw.ca/training
© Herb Sutter, except material otherwise referenced.
Date updated: April 26, 2017
Effective Concurrency:
Exercises
Herb Sutter
Effective Concurrency
Introduction and Fundamentals
Pillar 1: Concurrency For Isolation
Pillar 2: Parallelism For Scalability
Machine Architecture
Scalability and Migration
Pillar 3: Consistency
Locks
Lock-Free
Concluding Thoughts
Group Exercise

Reimplement Calculator::GetResult() to be more usable, and not require the caller to wait until the Calculator object is destroyed.

class Calculator {
public:
    /*…*/ GetResult( /*…*/ ) { a.Send( bind( &Calculator::DoGetResult, this, /*…*/ ) ); }
private:
    /*…*/ DoGetResult( /* … */ ) {
    }
    double result;
    Active a;
    /* etc. */
};
You have 10 minutes.
Calculator::GetResult(), Take 1

One option is to use an asynchronous value (e.g., future<T>). For example:

public:
    future<double> GetResult() {
        promise<double> p;
        future<double> ret = p.get_future();
        a.Send( bind( &Calculator::DoGetResult, this, move(p) ) );
        return ret;
    }
private:
    void DoGetResult( promise<double>&& p ) {
        p.set_value( result );
    }

Sample calling code:

future<double> answer;
Calculator calc;
calc.Add( 10 );
calc.Add( 32 );
answer = calc.GetResult();
// … do concurrent work …
Use( answer.get() ); // waits until available, if necessary
Calculator::GetResult(), Take 2

Another option is to send a message back. For example:

public:
    void GetResult( function<void(double)> d ) {
        a.Send( bind( &Calculator::DoGetResult, this, d ) );
    }
private:
    void DoGetResult( function<void(double)> d ) {
        d( result );
    }

Sample calling code:

void MyActiveClass::DoSomeWork() {
    calc.Add( 10 );
    calc.Add( 32 );
    calc.GetResult( bind( &MyActiveClass::AcceptResult, this, placeholders::_1 ) );
} // result will be pumped as a new message

void MyActiveClass::AcceptResult( double d ) {
    a.Send( [=] {
        DoSomethingWith( d ); // use once it’s available
    } );
}
One More Interface Flaw…

The two options provide these interfaces:

Option 1: Return a future.

public:
    future<double> GetResult();

Calculator calc;
calc.Add( 10 );
calc.Add( 32 );
answer = calc.GetResult();

Option 2: Accept a callback.

public:
    void GetResult( function<void(double)> d );

Calculator calc;
calc.Add( 10 );
calc.Add( 32 );
calc.GetResult( bind( &MyActiveClass::AcceptResult, this, placeholders::_1 ) );

Q: What’s the problem with both?
Calculator::GetResult() and Interface Flaws

The two options provide these interfaces:

Option 1: Return a future.

public:
    future<double> GetResult();

Calculator calc;
calc.Add( 10 );
calc.Add( 32 );
answer = calc.GetResult();

Option 2: Accept a callback.

public:
    void GetResult( function<void(double)> d );

Calculator calc;
calc.Add( 10 );
calc.Add( 32 );
calc.GetResult( bind( &MyActiveClass::AcceptResult, this, placeholders::_1 ) );

Q: What’s the problem with both?
A: There’s a window between Add and Get. If another thread is using this Calculator, we might not Get the intended answer.
Q2: How can we fix this timing-dependent code?
An Improved Interface

In principle, a caller should be able to get a “future” handle to the result of any given asynchronous call.

Calculator calc;
calc.Add( 10 );
future<double> answer = calc.Add( 32 );
// … sometime later …
cout << answer.get();

In this case, that’s also a handy place to return the state as of the end of the call. The caller can easily ask for any result they care about, with a simpler interface and no timing window.
Group Exercise
Implement the pipelined version of SendPackets using an active object for each pipeline stage.
Bonus points for:
Avoiding a distinct class type for each stage.
Leveraging scoped lifetimes in C++/C#/Java.
You have 20 minutes.
Pipeline Stage (C++)

Each stage does just one part.

class Stage {
public:
    Stage( function<void(Buffer*)> w ) : work(w) { }
    void Process( Buffer* buf ) {
        a.Send( [=] { work( buf ); } );
    }
private:
    function<void(Buffer*)> work;
    Active a;
};
Setting Up the Pipeline (C++11)

C++-ish syntax:

void SendPackets( Buffers& bufs ) {
    Stage encryptor ( []  ( Buffer* b ) { Encrypt(b); } );
    Stage compressor( [&]( Buffer* b ) { Compress(b); encryptor.Process(b); } );
    Stage decorator ( [&]( Buffer* b ) { Decorate(b); compressor.Process(b); } );
    for( auto& b : bufs ) {
        decorator.Process( &b );
    }
} // automatically blocks waiting for pipeline to finish
Setting Up the Pipeline (C#)

C#-ish syntax:

public void SendPackets( Buffers bufs ) {
    using( Stage encryptor = new Stage( (Buffer b) => { Encrypt(b); } ) ) {
        using( Stage compressor = new Stage( (Buffer b) => { Compress(b); encryptor.Process(b); } ) ) {
            using( Stage decorator = new Stage( (Buffer b) => { Decorate(b); compressor.Process(b); } ) ) {
                foreach( Buffer b in bufs ) {
                    decorator.Process( b );
                }
            }
        }
    } // automatically blocks waiting for pipeline to finish
}
Setting Up the Pipeline (Java)

Java-ish syntax:

public void SendPackets( Buffers bufs ) {
    Stage encryptor = null;
    Stage compressor = null;
    Stage decorator = null;
    try {
        encryptor  = new Stage( new EncryptRunnable() );
        compressor = new Stage( new CompressRunnable( encryptor ) );
        decorator  = new Stage( new DecorateRunnable( compressor ) );
        for( Buffer b : bufs ) {
            decorator.Process( b );
        }
    }
    finally { // buggy dispose
        if( encryptor != null )  encryptor.dispose();  // automatically block
        if( compressor != null ) compressor.dispose(); // waiting for the
        if( decorator != null )  decorator.dispose();  // pipeline to finish
    }
}
Setting Up the Pipeline (Java)

Java-ish syntax:

public void SendPackets( Buffers bufs ) {
    Stage encryptor = null;
    Stage compressor = null;
    Stage decorator = null;
    try {
        encryptor  = new Stage( new EncryptRunnable() );
        compressor = new Stage( new CompressRunnable( encryptor ) );
        decorator  = new Stage( new DecorateRunnable( compressor ) );
        for( Buffer b : bufs ) {
            decorator.Process( b );
        }
    }
    finally { // correct dispose
        if( decorator != null )  decorator.dispose();  // automatically block
        if( compressor != null ) compressor.dispose(); // waiting for the
        if( encryptor != null )  encryptor.dispose();  // pipeline to finish
    }
}
Post Mortem
What are the advantages and disadvantages of writing the pipeline using:
Explicit messages + queues.
Active objects.
Discuss.
And More Flexibility…

C++-ish syntax:

void SendPackets( Buffers& bufs ) {
    Stage encryptor ( []  ( Buffer* b ) { Encrypt(b); } );
    Stage archiver  ( []  ( Buffer* b ) { Archive(b); } );
    Stage compressor( [&]( Buffer* b ) { Compress(b);
        if( b->something() ) encryptor.Process(b);
        else                 archiver.Process(b);
    } );
    Stage decorator ( [&]( Buffer* b ) { Decorate(b); compressor.Process(b); } );
    for( auto& b : bufs ) {
        decorator.Process( &b );
    }
} // automatically blocks waiting for pipeline to finish
Effective Concurrency
Introduction and Fundamentals
Pillar 1: Concurrency For Isolation
Pillar 2: Parallelism For Scalability
Machine Architecture
Scalability and Migration
Pillar 3: Consistency
Locks
Lock-Free
Concluding Thoughts
Group Exercise

Given:

struct TreeNode {
    double value;
    TreeNode* leftChild;
    TreeNode* rightChild;
};

Implement a ParallelSum function that returns the sum of the values stored in a tree:

double ParallelSum( TreeNode* root ) {
}
You have 10 minutes.
To Be Traversed…
ParallelSum for Trees, Take 1 (Flawed)

Given:

struct TreeNode {
    double value;
    TreeNode* leftChild;
    TreeNode* rightChild;
};

Sum the values in a tree:

double ParallelSum( TreeNode* n ) {
    if( !n ) return 0;
    future<double> f1 = pool.run( [=]{ return ParallelSum( n->leftChild ); } );
    future<double> f2 = pool.run( [=]{ return ParallelSum( n->rightChild ); } );
    return n->value + f1.get() + f2.get(); // join, wait for results
}

Q: What’s wrong with this code?
ParallelSum for Trees, Take 1 (Flawed)

Given:

struct TreeNode {
    double value;
    TreeNode* leftChild;
    TreeNode* rightChild;
};

Sum the values in a tree:

double ParallelSum( TreeNode* n ) {
    if( !n ) return 0;
    future<double> f1 = pool.run( [=]{ return ParallelSum( n->leftChild ); } );
    future<double> f2 = pool.run( [=]{ return ParallelSum( n->rightChild ); } );
    return n->value + f1.get() + f2.get(); // join, wait for results
}

Q: What’s wrong with this code?
A: Today, granularity. Leaf computations are small and not worth shipping to a thread pool. Worse news: leaf computations dominate.

Take 1: Resulting Tasks
(Diagram: null tasks and leaf tasks)
ParallelSum for Trees, Take 2 (Better)

Sum the values in a tree:

double ParallelSum( TreeNode* n ) {
    double result = 0;
    vector<future<double>> v;
    while( n ) {
        result += n->value;
        if( n->leftChild ) { // send someone that way
            v.push_back( pool.run( [=]{ return ParallelSum( n->leftChild ); } ) );
        }
        n = n->rightChild; // while I go this way
    }
    for( auto& f : v ) result += f.get(); // join, wait for results
    return result;
}

Better granularity, requires no knowledge of the tree:
- “Vertical” tasks – if pointer traversal is significant, it’s sequential anyway.
- 50% fewer “leaf” computations.
- 100% fewer null computations (for a nonempty tree).
Take 2: Resulting Tasks
ParallelSum for Trees, Take 3 (Better)

Given:

struct TreeNode {
    double value;
    TreeNode* leftChild;
    TreeNode* rightChild;
};

Sum the values in a tree:

double ParallelSum( TreeNode* n, int depth = 0 ) {
    if( !n ) return 0;
    if( depth > log2(treeSize)-K ) return OtherSum( n );
    future<double> f =
        pool.run( [=]{ return ParallelSum( n->leftChild, depth+1 ); } );
    return n->value + ParallelSum( n->rightChild, depth+1 ) + f.get();
}

Better still, requires “a little” approximate knowledge of the tree:
- The “treeSize” heuristic only needs to be approximate.
- The tree doesn’t need to be perfectly balanced, just not badly unbalanced.
- There might still be a handful of null tasks.
Take 3: Resulting Tasks
RECALL: ParallelSum for Trees, Take 1

Given:

struct TreeNode {
    double value;
    TreeNode* leftChild;
    TreeNode* rightChild;
};

Sum the values in a tree:

double ParallelSum( TreeNode* n ) {
    if( !n ) return 0;
    future<double> f1 = pool.run( [=]{ return ParallelSum( n->leftChild ); } );
    future<double> f2 = pool.run( [=]{ return ParallelSum( n->rightChild ); } );
    return n->value + f1.get() + f2.get(); // join, wait for results
}
Group Exercise

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    /*…*/
};

Implement a ParallelSum function that returns the sum of the values stored in a potentially-cyclic graph:

double ParallelSum( GraphNode* root ) {
}
You have 10 minutes.
ParallelSum for Graphs, Take 1 (Flawed)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    /*…*/
};

Sum the values in a graph:

double ParallelSum( GraphNode* n ) {
    if( !n ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children ) // spin off tasks
        vf.push_back( pool.run( [=]{ return ParallelSum( p ); } ) );
    double result = n->value;
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}

Q: What’s wrong with this code?
ParallelSum for Graphs, Take 1 (Flawed)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    /*…*/
};

Sum the values in a graph:

double ParallelSum( GraphNode* n ) {
    if( !n ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children ) // spin off tasks
        vf.push_back( pool.run( [=]{ return ParallelSum( p ); } ) );
    double result = n->value;
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}

Q: What’s wrong with this code?
A1 (today): Granularity, as before.
ParallelSum for Graphs, Take 1 (Flawed)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    /*…*/
};

Sum the values in a graph:

double ParallelSum( GraphNode* n ) {
    if( !n ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children ) // spin off tasks
        vf.push_back( pool.run( [=]{ return ParallelSum( p ); } ) );
    double result = n->value;
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}

Q: What’s wrong with this code?
A2: Paths and cycles. Double-counting if a node is reachable along multiple paths. Infinite looping if there’s a cycle.
ParallelSum for Graphs, Take 2 (Better)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    atomic<bool> visited; // initially = false
};

Sum the values in a graph:

double ParallelSum( GraphNode* n ) {
    if( !n ) return 0;
    if( n->visited.exchange(true) ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children )
        vf.push_back( pool.run( [=]{ return ParallelSum( p ); } ) );
    double result = n->value;
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}
ParallelSum for Graphs, Take 3 (Better: Allows Concurrency, Addresses Cycles)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
};

Sum the values in a graph:

double ParallelSum() { return ParallelSum( root, new ConcurrentSet() ); }

double ParallelSum( GraphNode* n, ConcurrentSet* ourSet ) {
    if( !n ) return 0;
    if( !ourSet->AtomicUniqueInsert(n) ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children )
        vf.push_back( pool.run( [=]{ return ParallelSum( p, ourSet ); } ) );
    double result = n->value;
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}
ParallelSum for Graphs, Take 4 (Better: Stronger Concurrency, Addresses Cycles)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    atomic<unsigned> turn, visited; // initially = 1
};
atomic<unsigned> nextTurn = 1;

Sum the values in a graph:

double ParallelSum() { return ParallelSum( root, nextTurn++ ); }

double ParallelSum( GraphNode* n, unsigned myTurn ) {
    if( !n ) return 0;
    while( n->turn < myTurn ) { } // let earlier traversers leave
    unsigned expected = myTurn;
    if( !n->visited.compare_exchange_strong( expected, myTurn+1 ) ) return 0;
    vector<future<double>> vf;
    for( GraphNode* p : n->children )
        vf.push_back( pool.run( [=]{ return ParallelSum( p, myTurn ); } ) );
    double result = n->value;
    n->turn = myTurn+1; // let next traverser in
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}
ParallelSum for Graphs, Take 5 (Better: Stronger Concurrency, Addresses Cycles)

Given:

struct GraphNode {
    double value;
    vector<GraphNode*> children;
    atomic<unsigned> turn, visited, subgraphDepth; // exact or approximate
};
atomic<unsigned> nextTurn = 1;

Sum the values in a graph:

double ParallelSum() { return ParallelSum( root, nextTurn++ ); }

double ParallelSum( GraphNode* n, unsigned myTurn ) {
    if( !n ) return 0;
    while( n->turn < myTurn ) { } // let earlier traversers leave
    unsigned expected = myTurn;
    if( !n->visited.compare_exchange_strong( expected, myTurn+1 ) ) return 0;
    if( n->subgraphDepth <= limit ) return OtherSum( n );
    vector<future<double>> vf;
    for( GraphNode* p : n->children )
        vf.push_back( pool.run( [=]{ return ParallelSum( p, myTurn ); } ) );
    double result = n->value;
    n->turn = myTurn+1; // let next traverser in
    for( auto& f : vf ) result += f.get(); // “join” barrier, wait for results
    return result;
}
Group Exercise

Implement a parallel search that regains determinism: return not “any” match, but the “first” match.

template<class RAIter, class T>
RAIter p_find_with_determinism(
    const RAIter first, const RAIter last, const T& value ) {
}
You have 10 minutes.
Deterministic Parallel Search, Take 1 (Flawed)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    vector<RAIter> result(LP,last); // no matches found yet
    for( int i = 0; i < LP; ++i ) pool.run( [=, &result] {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; myFirst != myLast; ++myFirst )
            if( *myFirst == value ) { result[i] = myFirst; break; } // match found
    } );
    pool.join();
    for( int i = 0; i < LP; ++i )
        if( result[i] != last ) return result[i];
    return last;
}

Q: What’s the overhead?
Deterministic Parallel Search, Take 1 (Flawed)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    vector<RAIter> result(LP,last); // no matches found yet
    for( int i = 0; i < LP; ++i ) pool.run( [=, &result] {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; myFirst != myLast; ++myFirst )
            if( *myFirst == value ) { result[i] = myFirst; break; } // match found
    } );
    pool.join();
    for( int i = 0; i < LP; ++i )
        if( result[i] != last ) return result[i];
    return last;
}

Q: What’s the overhead?
A: (Large) No early termination: still the same big-Oh, but a higher constant.
(Small) Sequential reduction step at the end.
Deterministic Parallel Search, Take 2 (Flawed)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    atomic<int> bestFound = LP; // no matches found yet
    vector<RAIter> result(LP+1,last);
    for( int i = 0; i < LP; ++i ) pool.run( [=, &bestFound, &result] {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; bestFound > i && myFirst != myLast; ++myFirst )
            if( *myFirst == value ) { bestFound = i; result[i] = myFirst; break; }
    } );
    pool.join();
    return result[bestFound];
}

Q: What’s the problem?
Deterministic Parallel Search, Take 2 (Flawed)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    atomic<int> bestFound = LP; // no matches found yet
    vector<RAIter> result(LP+1,last);
    for( int i = 0; i < LP; ++i ) pool.run( [=, &bestFound, &result] {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; bestFound > i && myFirst != myLast; ++myFirst )
            if( *myFirst == value ) { bestFound = i; result[i] = myFirst; break; }
    } );
    pool.join();
    return result[bestFound];
}

Q: What’s the problem?
A: Buggy code. Timing: it could fail to record the “correct” bestFound result.
Deterministic Parallel Search, Take 3 (Fixed)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    atomic<int> bestFound = LP; // no matches found yet
    mutex mutWriters;
    RAIter result = last;
    for( int i = 0; i < LP; ++i ) pool.run( [=, &bestFound, &mutWriters, &result] {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; bestFound > i && myFirst != myLast; ++myFirst )
            if( *myFirst == value ) {
                lock_guard<mutex> lock(mutWriters); // acquire lock
                if( bestFound > i ) { result = myFirst; bestFound = i; }
                break;
            } // release lock
    } );
    pool.join();
    return result;
}
Deterministic Parallel Search, Take 4 (Fixed, Lock-Free)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    vector<atomic<bool>> stop(LP,false);
    vector<RAIter> result(LP,last); // no matches found yet
    for( int i = 0; i < LP; ++i ) pool.run( [=, &stop, &result]() mutable {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; !stop[i] && myFirst != myLast; ++myFirst )
            if( *myFirst == value ) {
                result[i] = myFirst;
                while( i < LP ) stop[i++] = true;
                break;
            }
    } );
    pool.join();
    for( int i = 0; i < LP; ++i )
        if( result[i] != last ) return result[i];
    return last;
}

Q: What’s the overhead?
Deterministic Parallel Search, Take 4 (Fixed, Lock-Free)

A simple reduction:

template<class RAIter, class T>
RAIter p_find( const RAIter first, const RAIter last, const T& value ) {
    size_t chunkSize = max( 1, /*something*/ ), LP = (last-first)/chunkSize;
    vector<atomic<bool>> stop(LP,false);
    vector<RAIter> result(LP,last); // no matches found yet
    for( int i = 0; i < LP; ++i ) pool.run( [=, &stop, &result]() mutable {
        RAIter myFirst = min( first + i*chunkSize, last ),
               myLast  = myFirst + min( chunkSize, distance(myFirst,last) );
        for( ; !stop[i] && myFirst != myLast; ++myFirst )
            if( *myFirst == value ) {
                result[i] = myFirst;
                while( i < LP ) stop[i++] = true;
                break;
            }
    } );
    pool.join();
    for( int i = 0; i < LP; ++i )
        if( result[i] != last ) return result[i];
    return last;
}

Q: What’s the overhead?
A: (Medium) Less early termination: same big-Oh, different constant.
(Small) Sequential reduction step at the end.
Regaining Determinism

Q: What does it cost to get determinism back in a parallel search? (Return not “any” match, but the “first” match.) This is important when migrating existing code to parallelism, because some callers might rely on the deterministic semantics.

A: Only a constant factor. We still benefit from parallel searching; we just don’t get to stop all other workers (only some) when one finds a match.
Approach: Workers that are exploring later/worse subranges can stop once a result that precedes their range is found.
Cost: For the one-match case, workers typically have to search their whole subranges instead of, on average, half of them (avg. 50% extra work).
Result: Still O(N/MP), with a higher constant factor.
Effective Concurrency
Introduction and Fundamentals
Pillar 1: Concurrency For Isolation
Pillar 2: Parallelism For Scalability
Machine Architecture
Scalability and Migration
Pillar 3: Consistency
Locks
Lock-Free
Concluding Thoughts