Discovering and Understanding Performance Bottlenecks in Transactional Applications Ferad Zyulkyarov 1,2 , Srdjan Stipic 1,2 , Tim Harris 3 , Osman S. Unsal 1 , Adrián Cristal 1,4 , Ibrahim Hur 1 , Mateo Valero 1,2 1 BSC-Microsoft Research Centre 2 Universitat Politècnica de Catalunya 3 Microsoft Research Cambridge 4 IIIA - Artificial Intelligence Research Institute CSIC - Spanish National Research Council 19th International Conference on Parallel Architectures and Compilation Techniques 11-15 September 2010 – Vienna
Abstract the TM Implementation
Thread 1: for (i = 0; i < N; i++) { atomic { x[i]++; } }
Thread 2: for (i = 0; i < N; i++) { atomic { y[i]++; } }
Accesses to different arrays.
We can observe overheads inherent to the TM implementation.
We are not interested in such bottlenecks.
Abstract the TM Implementation
Thread 1: for (i = 0; i < N; i++) { atomic { x[i]++; } }
Thread 2: for (i = 0; i < N; i++) { atomic { x[i]++; } }
Accesses to the same array.
Contention: a bottleneck common to all implementations of the TM programming model.
We are interested in this kind of bottleneck.
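The difference between the two cases above can be sketched independently of any TM implementation by comparing the sets of locations each thread writes. This is only an illustration under simplifying assumptions: the helper `conflicting_locations` and the `(array, index)` encoding are hypothetical, and the sketch ignores timing (a real conflict also requires the two transactions to overlap in time).

```python
def conflicting_locations(write_set_a, write_set_b):
    """Return the locations written by both threads (hypothetical helper)."""
    return set(write_set_a) & set(write_set_b)

N = 4  # illustrative array length

# Case 1: thread 1 increments x[i], thread 2 increments y[i].
t1_different = {("x", i) for i in range(N)}
t2_different = {("y", i) for i in range(N)}

# Case 2: both threads increment x[i].
t1_same = {("x", i) for i in range(N)}
t2_same = {("x", i) for i in range(N)}

print(len(conflicting_locations(t1_different, t2_different)))  # no shared locations
print(len(conflicting_locations(t1_same, t2_same)))            # every x[i] is shared
```

Only the second case can produce contention, whichever TM system runs the code.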
Can We Find This Kind of Bottleneck?
atomic {
    statement1;
    statement2;
    statement3;
    statement4;
}
Abort rate 80%
Where do aborts happen?
Which variables conflict?
Are there false conflicts?
Can We Find This Kind of Bottleneck?
atomic {
    statement1;
    statement2;
    statement3;
    statement4;
}
counter1 = 0;
counter2 = 0;
counter3 = 0;
counter4 = 0;
Can We Find This Kind of Bottleneck?
atomic {
    statement1;
    statement2;
    statement3;
    statement4;
}
counter1 = 1;
counter2 = 0;
counter3 = 0;
counter4 = 0;
Can We Find This Kind of Bottleneck?
atomic {
    statement1;
    statement2;
    statement3;
    statement4;
}
counter1 = 1;
counter2 = 1;
counter3 = 0;
counter4 = 0;
Conflict between statement2 and statement4.
Goal
Profiling techniques to find bottlenecks (important conflicting locations) and to explain why these conflicts happen.
Outline
Profiling Techniques
Implementation
Case Studies
Profiling Techniques
Visualizing transactions
Conflict point discovery
Identifying conflicting data structures
Transaction Visualizer (Genome)
[Figure: transaction timeline for Genome, with regions labeled "Garbage Collection" and "Wait on barrier" and the annotation "14% Aborts".]
Aborts occur at the first and last atomic blocks in program order.
When do these aborts happen?
Aborts Graph (Bayes)
[Figure: aborts graph for Bayes with nodes AB1 through AB15; annotations: "93% Aborts", "73%", "20%".]
Number of Aborts vs Wasted Work
atomic { counter++ }              Aborts = 9    Wasted Work = 10%
atomic { hashtable.Rehash(); }    Aborts = 1    Wasted Work = 90%
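The point of this slide, that abort counts and wasted work can rank atomic blocks very differently, can be illustrated with a small sketch. The record format `(block, wasted_units)` and the unit costs below are hypothetical, chosen only so that the resulting shares match the 10% and 90% figures above; they are not measurements from the talk.

```python
from collections import defaultdict

def summarize_aborts(abort_records):
    """Map each atomic block to (abort count, share of wasted work in percent).

    abort_records: iterable of (block_name, wasted_units) pairs, one per
    aborted execution (hypothetical record format).
    """
    counts = defaultdict(int)
    wasted = defaultdict(int)
    for block, units in abort_records:
        counts[block] += 1
        wasted[block] += units
    total = sum(wasted.values())
    return {
        block: (counts[block], 100 * wasted[block] / total if total else 0.0)
        for block in counts
    }

# Nine cheap aborts of the counter increment, one expensive abort of Rehash().
records = [("counter++", 10)] * 9 + [("hashtable.Rehash()", 810)]
summary = summarize_aborts(records)
for block, (aborts, share) in summary.items():
    print(f"{block}: aborts={aborts}, wasted work={share:.0f}%")
```

Ranking by abort count alone would point at the counter, while most of the discarded work comes from the single aborted rehash.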
Conflict Point Discovery
File:Line          #Conf.   Method     Line
Hashtable.cs:51    152      Add        if (_container[hashCode]…
Hashtable.cs:48    62       Add        uint hashCode = HashSdbm(…
Hashtable.cs:53    5        Add        _container[hashCode] = n …
Hashtable.cs:83    5        Add        while (entry != null) …
ArrayList.cs:79    3        Contains   for (int i = 0; i < count; i++)
ArrayList.cs:52    1        Add        if (count == capacity - 1) …
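A table like the one above could be produced by grouping conflict events by source location and sorting by frequency. The sketch below assumes a hypothetical event format `(file, line, method)` and uses made-up event counts; it does not reflect how the profiler in the talk records conflicts.

```python
from collections import Counter

def rank_conflict_points(events):
    """Return (location, conflict count) pairs, most frequent first."""
    return Counter(events).most_common()

# Hypothetical conflict log: one entry per detected conflict.
events = (
    [("Hashtable.cs", 51, "Add")] * 3
    + [("Hashtable.cs", 48, "Add")] * 2
    + [("ArrayList.cs", 79, "Contains")]
)

ranked = rank_conflict_points(events)
for (file, line, method), count in ranked:
    print(f"{file}:{line}  {count}  {method}")
```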
Conflicts Context
increment() {
    counter++;
}
probability80() {
    probability = random() % 100;
    if (probability < 80) {
        atomic { increment(); }
    }
}
probability20() {
    probability = random() % 100;
    if (probability >= 80) {
        atomic { increment(); }
    }
}
Thread 1 and Thread 2 (same code):
for (int i = 0; i < 100; i++) {
    probability80();
    probability20();
}
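In the example above, every conflict lands on the same statement, `counter++`, so the source line alone does not tell which caller is responsible. A minimal sketch of attributing accesses to their calling context follows. It makes two simplifying assumptions: it counts calls that reach `increment()` rather than real conflicts, and it replaces `random() % 100` with one pass over every value from 0 to 99 so the result is deterministic. The `call_stack` parameter and `context_counts` table are hypothetical.

```python
from collections import Counter

# Calling context (tuple of function names) -> number of times it reached counter++.
context_counts = Counter()

def increment(call_stack):
    # Record the full context instead of only the conflicting line.
    context_counts[tuple(call_stack + ["increment"])] += 1

def probability80(p):
    if p < 80:
        increment(["probability80"])

def probability20(p):
    if p >= 80:
        increment(["probability20"])

# Stand-in for 100 iterations of random() % 100: each value appears exactly once.
for p in range(100):
    probability80(p)
    probability20(p)

for context, count in context_counts.most_common():
    print(" -> ".join(context), count)
```

Grouping by context separates the frequent path through `probability80` from the rare one through `probability20`, even though both end at the same line.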
Wrapper object for function arguments.
public class FindBestTaskArg {
    public int toId;
    public Learner learnerPtr;
    public Query[] queries;
    public Vector queryVectorPtr;
    public Vector parentQueryVectorPtr;
    public int numTotalParent;
    public float basePenalty;
    public float baseLogLikelihood;
    public Bitmap bitmapPtr;
    public Queue workQueuePtr;
    public Vector aQueryVectorPtr;
    public Vector bQueryVectorPtr;
}
• Design principles
  – Abstract the underlying TM system
  – Report results at the level of source language constructs
  – Low instrumentation probe effect and overhead
• Profiling techniques
  – Visualizing transactions
  – Conflict point discovery
  – Identifying conflicting data structures
PPoPP 2010: Debugging Programs that use Atomic Blocks and Transactional Memory
ICS 2009: QuakeTM: Parallelizing a Complex Serial Application Using Transactional Memory
PPoPP 2008: Atomic Quake: Using Transactional Memory in an Interactive Multiplayer Game Server