ASKALON: A Tool Set for Cluster and Grid Computing
A Special Research Program funded by ASF
Automatic Performance Analysis: Real Tools
T. Fahringer, A. Hofer, A. Jugravu, S. Pllana, R. Prodan, C. Seragiotto, J. Testori, H.-L. Truong, A. Villazon, M. Welzl
Institute for Computer Science, University of Innsbruck
[email protected], informatik.uibk.ac.at/dps
Cracow’03 Grid Workshop, Oct. 2003
Transcript
ASKALON: A Tool Set for Cluster and Grid Computing
A Special Research Program funded by ASF
Automatic Performance Analysis: Real Tools
T. Fahringer, A. Hofer, A. Jugravu, S. Pllana, R. Prodan, C. Seragiotto,
J. Testori, H.-L. Truong, A. Villazon, M. Welzl
Institute for Computer Science, University of Innsbruck
– customizable tools instead of hard-coded analysis
– multi-experiment instead of single-experiment analysis
– online and scalable performance analysis
Aksum: A Tool for Semi-Automatic Multi-Experiment Performance Analysis
• user-provided problem and machine sizes
• automated instrumentation, experiment management, performance interpretation, and search for performance bottlenecks
• performance analysis for single-entry single-exit regions
• performance problems related to the program
• targets OpenMP, MPI, and mixed programs
• customizable (build your own performance tool)
– API for performance overheads
– define performance problems and code regions of interest
– influence the search (strategy, time, code regions)
Specification of Performance Problems with JavaPSL
• JavaPSL is
  – an API for the specification of performance problems
  – a high-level interface for raw performance data
• pre-defined and user-defined JavaPSL problems
• performance problems as values between 0 and 1 (interpretation)

public class SynchronizationOverhead implements Property { ...
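The slide shows only the opening line of such a class. A minimal sketch of what a JavaPSL-style property might look like — the `Property` interface shape and the method names `holds`/`getSeverity` are assumptions for illustration, not the actual JavaPSL API:

```java
// Hypothetical sketch of a JavaPSL-style performance property.
// Interface and method names are assumptions; the real JavaPSL API may differ.
interface Property {
    boolean holds();        // does the performance problem occur at all?
    double getSeverity();   // normalized severity in [0, 1]
}

class SynchronizationOverhead implements Property {
    private final double syncTime;   // time spent in synchronization
    private final double wallTime;   // total execution time of the region

    SynchronizationOverhead(double syncTime, double wallTime) {
        this.syncTime = syncTime;
        this.wallTime = wallTime;
    }

    public boolean holds() {
        return syncTime > 0;
    }

    // Severity as the fraction of execution time lost to synchronization,
    // clamped to [0, 1] as JavaPSL prescribes for property values.
    public double getSeverity() {
        if (wallTime <= 0) return 0.0;
        return Math.min(1.0, syncTime / wallTime);
    }
}
```

A region spending 2 of 10 seconds in synchronization would thus report a severity of 0.2.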
• Defines evaluation order of performance properties
• Predefined hierarchies
  – OpenMP, MPI, mixed mode
• Can be customized
• Each node has:
  – a threshold (property instances with severity less than the threshold are discarded)
  – a reference code region
  – bean properties
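The threshold rule can be sketched as a simple severity filter — the node and instance classes below are hypothetical stand-ins for illustration, not the Aksum API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: discard property instances whose severity falls
// below the threshold attached to their hierarchy node.
class PropertyInstance {
    final String name;
    final double severity;   // normalized to [0, 1]
    PropertyInstance(String name, double severity) {
        this.name = name;
        this.severity = severity;
    }
}

class HierarchyNode {
    final double threshold;
    HierarchyNode(double threshold) { this.threshold = threshold; }

    // Keep only instances severe enough to be worth reporting.
    List<PropertyInstance> filter(List<PropertyInstance> instances) {
        List<PropertyInstance> kept = new ArrayList<>();
        for (PropertyInstance p : instances) {
            if (p.severity >= threshold) kept.add(p);
        }
        return kept;
    }
}
```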
Property hierarchy
Property Hierarchy (first levels)
• Inefficiency
  – ParallelInefficiency: DataMovementOverhead, LoadImbalance, SynchronizationOverhead, ControlOfParallelismOverhead, ...
  – SerialInefficiency: ImperfectFloatingPointBehavior, ImperfectCacheBehavior, ...
Property hierarchy (full diagram)

[Diagram: the complete property hierarchy, refining NonScalability and Inefficiency into dozens of overhead categories — e.g. DataMovementOverhead (communication, local/remote memory, I/O), SynchronizationOverhead (message passing, shared memory, RMA), ControlOfParallelismOverhead (startup, scheduling, reduction, join), LoadImbalance (computation, communication, synchronization, I/O), and imperfect cache/TLB/page behavior — down to leaf properties such as L1CacheMissOverhead, TLBMissOverhead, LateSendOverhead, and ReductionOverhead.]
Application parameters
• Strings to be substituted in some or all of the input files
• Mapped to ZEN directives in the input files
• Basis for experiment generation and execution done by ZENTURIO
Case study: LAPW0 material science code
Case study: LAPW0 (views)
Case study: LAPW0 (charts)
Outline
• Performance Analysis and the Grid
• Automatic Experiment Management
• JavaSymphony: A New Programming Method for the Grid
• Summary
Management of Experiments and Parameter Studies
Currently scientists
– manually create parameter studies
– manage many different sets of input data
– launch large numbers of compilations and executions
– administer result files
– invoke performance analysis tools
– interpret/visualize performance and parameter results, etc.
This is a tedious, error-prone, and time-consuming process.
ZENTURIO: An Automatic Experiment Management Framework for Cluster and Grid Architectures

Support for scientists to semi-automatically conduct large sets of
– parameter studies
  • throughput versus high-performance computing
– performance studies
– software tests
on cluster and Grid architectures.
ZENTURIO: A Web Service Based Architecture

[Diagram: components include the User Portal, Experiment Preparation, Experiment Generator Service, Experiment Executor Service, Experiment Monitor, Experiment Data Repository, Application Data Visualiser, Registry Service, Scheduler, and Instrumentation, connected through middleware between the experiment site (E-Site) and the Grid site (G-Site), where application compilation and execution commands run on the target machine.]
Application Parameters and Value Sets
• Performance and parameter results depend on application parameters and their value sets.
  – machine sizes {x CPUs, y Grid sites, …}
  – problem sizes {x atoms, matrix size, …}
  – program variables {1,2,3,16:110:2}
  – data distributions {block, cyclic, …}
  – loop scheduling strategies {static, guided, …}
  – communication networks {Myrinet, FastEthernet, …}
  – input/output file names, etc.
• An Experiment is defined by its sources with every application parameter replaced by a specific value.
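This definition amounts to taking the cross product of all parameter value sets: one experiment per combination. A minimal sketch of that generation step — the class and method names are illustrative, not the ZENTURIO API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an experiment binds every application parameter
// to one specific value; the experiment set is the cross product of all
// value sets (as ZENTURIO derives from ZEN directives).
class ExperimentGenerator {
    static List<Map<String, String>> crossProduct(Map<String, List<String>> params) {
        List<Map<String, String>> experiments = new ArrayList<>();
        experiments.add(new LinkedHashMap<>());   // start with one empty binding
        for (Map.Entry<String, List<String>> e : params.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : experiments) {
                for (String value : e.getValue()) {
                    Map<String, String> exp = new LinkedHashMap<>(partial);
                    exp.put(e.getKey(), value);   // extend binding by one parameter
                    next.add(exp);
                }
            }
            experiments = next;
        }
        return experiments;
    }
}
```

Three machine sizes combined with two networks, for example, yield six experiments.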
ZEN: A Directive-Based Language for the Specification of Arbitrarily Complex Experiments
• Set of directives to specify value sets of interest for arbitrary application parameters.
• Directives:
  – assignment
  – substitute
  – constraint
  – performance
• Annotation of arbitrary source/input files
  – program files, Makefiles, scripts, input files, etc.
• ZENTURIO generates sources for every experiment based on the ZEN directives.
!ZEN$ CR CR_OMPDO, CR_CALLS PERF WTIME, OSYNC BEGIN
!$OMP DO SCHEDULE(STATIC)
   . . .
!$OMP END DO NOWAIT
!$OMP BARRIER
!ZEN$ END CR
ZEN Performance Behaviour Directive
• request performance data for arbitrary code regions
  – CR_P = entire program
  – CR_L = all loops
  – CR_OMPDO = OpenMP do regions
  – CR_CALLS = procedure calls
  – WTIME = execution time
  – ODATA = data movement
  – OSYNC = synchronisation
• 50 code region mnemonics
• 40 performance metrics
• supported by SCALEA
ExperimentPreparation
ZENTURIO User Portal
Application Data Visualiser (ADV)
Scalability Fast Ethernet
Backward Pricing: Total Price Evolution

[3-D charts: total price (0–14,000) over number of timesteps (5–60) and coupon (0.01–0.09) for delta-t = 1.0; and total price (0–35,000) over number of timesteps and delta-t (0.08–0.72) for coupon = 0.05.]
JavaSymphony (100% Java): a new object-oriented programming paradigm for concurrent and distributed systems
– portability
– higher-level programming
– simple access to resources
– explicit control of locality and parallelism
– performance-oriented

JavaSymphony programming model:
– dynamic virtual architectures (VAs)
– API for system parameters
– single- and multi-threaded remote distributed objects
– distribution/migration of objects and code
– asynchronous and one-sided (remote) method invocation
– synchronization and events (distributed)
And all of that without programming RMI, sockets, and threads!
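To make the model concrete, a toy illustration of the core idea — mapping an object onto a virtual architecture and invoking a method on it asynchronously. All class and method names here are invented for this sketch and are NOT the actual JavaSymphony API; a thread pool merely stands in for a remote Grid node:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of the JavaSymphony programming model
// (virtual architectures, distributed objects, asynchronous invocation).
class VirtualArchitecture {
    // In JavaSymphony a VA describes the topology of Grid resources;
    // here one executor stands in for a single compute node.
    final ExecutorService node = Executors.newSingleThreadExecutor();
}

class RemoteObject<T> {
    private final T target;              // object "mapped" onto the VA
    private final VirtualArchitecture va;

    RemoteObject(T target, VirtualArchitecture va) {
        this.target = target;
        this.va = va;
    }

    // Asynchronous method invocation: returns a handle immediately,
    // the result is fetched later — no explicit thread programming.
    <R> Future<R> invokeAsync(java.util.function.Function<T, R> method) {
        return va.node.submit(() -> method.apply(target));
    }
}
```

The caller obtains a `Future` at once and collects the result when needed, which is the essence of the asynchronous invocation the slide advertises.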
JavaSymphony: High-Level Object-Oriented Programming of Grid Applications
Summary
• Performance analysis for the Grid
  – higher-level analysis, performance interpretation, multi-experiment, automatic, customizable
  – high-level performance instrumentation interface
  – standardization of performance data
• Multi-experiment performance analysis and parameter studies for the Grid
  – request for an arbitrary number of experiments
  – automatic management of experiments
  – fault tolerance, events
  – combine with schedulers and performance tools
• JavaSymphony: a new programming model for Grid applications
  – explicit control of locality, parallelism, and load balancing at a high level
  – dynamic virtual architectures, events, synchronization, migration, multi-threaded objects, asynchronous/synchronous/one-sided remote methods
  – no RMI, socket, or thread programming
A Tool Set for Cluster and Grid Architectures
University of Innsbruck / Institute for Computer Science / T. Fahringer
Architectures: NOWs, PC clusters, SMP clusters, Grid systems, DM/SM systems