Interactive analysis of large distributed systems with scalable topology-based visualization


Lucas Mello Schnorr, Arnaud Legrand, Jean-Marc Vincent
INRIA MESCAL Research Team, CNRS LIG Laboratory, University of Grenoble

Abstract

The performance of parallel and distributed applications is highly dependent on the characteristics of the execution environment. In such environments, the network topology and characteristics determine how fast data can be transmitted and placed on the resources. These are key phenomena for understanding the behavior of such applications and possibly improving it. Unfortunately, few of the visualization techniques available to the analyst are capable of accounting for such phenomena. In this paper, we propose an interactive topology-based visualization technique based on data aggregation that makes it possible to correlate network characteristics, such as bandwidth and topology, with application performance traces. We show that this kind of visualization makes it possible to explore and understand non-trivial behavior that is impossible to grasp with classical visualization techniques. We also show that the combination of multi-scale aggregation and dynamic graph layout allows our visualization technique to scale seamlessly to large distributed systems. These results are validated through a detailed analysis of a high performance computing scenario and of a grid computing scenario.

1. Introduction

To achieve the desired capability, today's parallel or distributed platforms generally consist of hundreds to millions of processing units interconnected by complex hierarchical networks. The performance of large-scale distributed applications is thus highly dependent on the characteristics (bandwidth, latency, topology) of the interconnection network and of the processing nodes. To obtain good performance, application developers need to keep data locality in mind and to organize data movement in both space and time without being harmed by potential congestion arising from the application itself or from other applications competing for resources. To perform such optimizations and understand performance issues in such systems, it is thus crucial to rely on well-suited analysis tools.

Several visualization techniques have been developed to tackle the issue of performance analysis and give a view of the behavior of applications. Besides statistical charts with profiling information, a common technique is the timeline view, based on Gantt charts [39]. It depicts the individual behavior of processes and the interactions among them. Several types of performance problems can be detected this way: slower processes, late senders, critical path identification and evaluation, and so on. When performance issues might be correlated to the network, however, this visualization technique fails to link the application state to the actual cause of the problem. The main reason is that timelines have no way to depict the topology together with application traces. Other visualization techniques share the same problem. Communication matrices and statistical charts, for example, present per-process interactions and global summaries, with no network correlation. Existing solutions [14, 1] for a topological analysis of distributed systems suffer from serious scalability problems.

In this paper, we propose a novel scalable technique for interactive analysis of large-scale distributed systems using a topology-based visualization. The method easily correlates network bandwidth and topology with the application behavior. Although topology-based visualizations are not new, they are generally difficult to configure and use in practice, and they generally scale badly with the amount of information to display. We address these issues with a combination of two techniques:

Multi-scale data aggregation. Some behavior appears only at a particular information scale. Our approach uses a data aggregation policy to reduce and analyze information at different user-defined space and time scales. The space dimension is chosen by the analyst by grouping nodes in the topology-based representation, while the time dimension is configured through the use of time-slices.

Dynamic and interactive graph layout. Dynamic node aggregation requires recomputing the graph layout, which may confuse the analyst if there are too many changes between two successive layouts. Therefore, our topology-based representation relies on a force-directed graph algorithm that enables a smooth evolution of node positions. The user influences the algorithm by interactively moving and aggregating/disaggregating groups of nodes and possibly by adjusting repulsion and attraction parameters.

By giving the analyst complete control over a few key parameters, we let them explore the application trace dynamically and easily, in correlation with the platform topology.

The paper is structured as follows. Section 2 presents the related work and discusses the differences between our approach and existing solutions. Section 3 details the main ingredients of our proposal: how traces are mapped to the graph, the multi-scale data aggregation, and the dynamic and interactive layout. Section 4 presents some implementation details about the interactivity of our approach. Section 5 presents a detailed analysis of two scenarios. The first case study is derived from a high performance computing setting and illustrates non-trivial insights that can be obtained with our visualization and that would be impossible to grasp with more classical representations. The second case study is based on a grid computing setting and illustrates the necessity of multi-scale aggregation capabilities and how they can be used to investigate locality and resource sharing between competing applications.

2. Related work

We classify related research into three areas: analysis methodologies, performance visualization techniques, and graph drawing techniques. The first presents the many data analysis methodologies employed to analyze collected information. The second presents interactive visualization techniques used to analyze traces. The third presents existing graph drawing techniques that could be used in performance analysis. The section ends with a discussion of how our approach combines the techniques of these related research areas to provide a novel way to visualize performance data.

2.1. Data analysis methodologies

Several methodologies exist for performance analysis. The analysis of profiles, traces and statistics of a given execution may be, for example, text- or graphics-based, interactive or automatic, online or postmortem, and so on. The methodology adopted for a given large-scale application highly depends on the nature of the performance issues under investigation. Questions such as the characteristics of the application, whether it is CPU- or network-bound, among others, must be taken into account and used to select the best analysis methodology for each case.

Even if there is no single solution in terms of analysis methodology for performance analysis, a simple approach is to get an overview first, then look for details. The overview can be obtained using profiling techniques, based on either direct or indirect measurement. After solving high-level performance problems, the analyst can obtain a more detailed view of the behavior by tracing the application with well-chosen instrumentation points. Traces from large-scale distributed and parallel applications are usually centralized to allow an interactive analysis of the data. Trace analysis is usually performed with the aid of visualization tools, sometimes called trace browsers. Examples of these tools are Vampir [8], Paje [13] and ViTE [12], among many others [21, 29, 40, 31].

More recently, automatic trace analysis [27, 18] emerged as a solution in which a program searches for previously known performance problems. The development of automatic pattern detection appeared to be a more scalable alternative to the interactive and serial analysis done by a human analyst. Besides intra-process pattern detection [16], its combination with inter-process pattern detection has also been explored [23]. The automatic approach to trace analysis is also used to correlate the application-level communication topology with possible performance issues [36]. Automatic trace analysis is also implemented in Scalasca to look for known performance problems, such as the source of wait states [5]. Although most of these solutions are used in a postmortem fashion, the online approach with automatic tuning [22] is also found in the literature.

Clustering algorithms [25, 20], sometimes hierarchical [2], are also explored in performance analysis. Grouping processes by behavioral similarity is used in tools such as Vampir [8] to decrease the number of processes listed in the time-space view. Although the different automatic trace analysis techniques previously discussed also employ clustering to group process patterns, some argue that a subset of clustering algorithms used at a coarse-grain level are no longer adequate [18]. Another approach tries to define a similarity metric that gives a good trace size reduction while keeping enough data for a correct analysis [28].

2.2. Visualization techniques

Visualization techniques for performance analysis have been present since the earliest days of parallel and distributed application analysis. Different visual representations of traces give the analyst the power to observe outliers and look for performance issues by actually seeing what happened during the execution. We classify visualization techniques according to how data is presented: behavioral, with data depicted along a timeline; structural, showing how monitored elements are interconnected, without a timeline; and statistical, grouping chart and scatter-plot representations.

The best-known and most intuitive example of a behavioral representation is the timeline view, derived from Gantt charts [39]. It lists all the observed entities, sometimes organized as a hierarchy [13], along the vertical axis. Their behavior is represented over time along the horizontal axis: rectangles represent application states, while links represent communications among the observed entities. Examples of tools providing this type of visualization are Vampir [8], Paje [13], Projections [21] and many others [12, 29, 40]. The advantage of timeline views is the emphasis on time and event causality, enabling a fine-grain performance analysis. Timeline views are, however, naturally limited by the size of the screen: only a subset of entities can be observed at the same time. Some tools, especially Vampir, already incorporate techniques to work around these visualization scalability issues. Vampir has a clustering algorithm [8] to reduce the number of entities on the vertical axis, and a method to hide repetitive patterns along the horizontal axis [23].

Structural techniques are a different kind of trace visualization, in which the structure of the observed application or system is primarily depicted. Techniques in this class include the topology-based visualization of ParaGraph [19]; a three-dimensional representation of large-scale grid environments with the network interconnection [30]; the call-graph representation of ParaProf [6] and Virtue [35]; and communication matrices, implemented in Vampir [8] and others. What distinguishes them from behavioral techniques is that they are independent of a timeline: they consider values at a given point in time or time-aggregated data.

Statistical visualization techniques are the most common way of representing data. They consist mainly of statistical charts, scatter-plots and summaries of tracing data. Vampir [8] has a number of charts of this type, such as function, process and message summaries. It also has a call tree that fits into this classification. Bar and pie charts are also part of this category and are present in several visualization tools such as ParaGraph [19], Paradyn [26] and Paje [13]. Kiviat diagrams [19] and statistical 3D representations [6] are other examples of statistical representations.

2.3. Graph Drawing

Graph drawing is concerned with all aspects of the geometric representation of graphs. Several applications make use of graph drawing techniques, such as social network analysis and cartography. Computer network topologies are also drawn with these techniques to create graphical representations of the interconnection. By convention, graphs are frequently drawn as two-dimensional node-link diagrams. Several quality measures are taken into account when drawing a graph, to cite some of them: area used, symmetry, angular resolution (the sharpest angle between edges incident to the same node), and crossing number (the number of edge crossings).
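To make the crossing criterion concrete, the following Python sketch counts edge crossings in a given straight-line drawing. It assumes node positions are 2D points and counts only proper crossings: edges sharing an endpoint, touching segments and collinear overlaps are ignored. The function names are illustrative and not taken from any graph drawing library.

```python
from itertools import combinations


def orient(p, q, r):
    """Twice the signed area of triangle pqr: >0 left turn, <0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single point interior to both."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def count_crossings(pos, edges):
    """Number of edge crossings in a straight-line drawing.

    pos: node -> (x, y); edges: list of (u, v) pairs.
    Edges that share an endpoint are never counted as crossing.
    """
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue  # adjacent edges meet at a node, not a crossing
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            crossings += 1
    return crossings


# A unit square drawn with both diagonals: only the two diagonals cross.
square = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (1.0, 1.0), "d": (0.0, 1.0)}
print(count_crossings(square, [("a", "c"), ("b", "d"), ("a", "b"), ("c", "d")]))  # 1
```

The pairwise loop is quadratic in the number of edges, which is acceptable for illustration but not for very large drawings.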

The definition of node positions is one of the most important parts of graph drawing. There are many methods to define them: circular, radial, hierarchical and force-based, for example. Each of these general methods has several specific algorithms that may be applied depending on the graph configuration and the visualization required. Graphviz [15] and the Open Graph Drawing Framework [10] implement many of these methods, which generally follow a static approach: nodes and edges are given to the algorithm, which generates a layout with node positions. Some layout algorithms are better adapted to dynamic layouts, which have to evolve as the graph changes. Force-directed algorithms [17] iterate until they converge to a stable layout, which makes them well adapted to dynamic graphs.
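The following Python sketch illustrates why force-directed methods suit dynamic graphs. It is a simplified Fruchterman-Reingold-style layout, chosen here only for illustration (the text does not say which algorithm [17] describes): every pair of nodes repels with magnitude k²/d, connected nodes attract with magnitude d²/k, and each move is capped by a linearly decreasing temperature. The `previous` argument is a hypothetical parameter that warm-starts the iteration from an earlier layout, so unchanged nodes begin where they were.

```python
import math
import random


def force_step(pos, edges, k, temperature):
    """One Fruchterman-Reingold-style iteration (illustrative sketch)."""
    disp = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    # Repulsion between every pair of nodes: magnitude k^2 / d.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = k * k / d
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
    # Attraction along edges: magnitude d^2 / k.
    for u, v in edges:
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        d = max(math.hypot(dx, dy), 1e-9)
        f = d * d / k
        disp[u][0] -= dx / d * f
        disp[u][1] -= dy / d * f
        disp[v][0] += dx / d * f
        disp[v][1] += dy / d * f
    # Move each node along its displacement, capped by the temperature.
    new_pos = {}
    for n, (ddx, ddy) in disp.items():
        length = math.hypot(ddx, ddy)
        scale = min(length, temperature) / length if length > 0 else 0.0
        new_pos[n] = (pos[n][0] + ddx * scale, pos[n][1] + ddy * scale)
    return new_pos


def layout(nodes, edges, previous=None, iterations=300, k=1.0, t0=0.1, seed=0):
    """Run the layout; `previous` warm-starts nodes that already had a position."""
    rng = random.Random(seed)
    previous = previous or {}
    pos = {}
    for n in nodes:
        # Keep old positions so the drawing evolves smoothly after a change;
        # only new nodes get a random starting point.
        pos[n] = previous[n] if n in previous else (rng.uniform(-1, 1), rng.uniform(-1, 1))
    for i in range(iterations):
        temperature = t0 * (1 - i / iterations)  # linear cooling
        pos = force_step(pos, edges, k, temperature)
    return pos
```

Reusing `previous` after nodes are added or aggregated is what keeps successive drawings close to each other, whereas a static algorithm recomputes all positions from scratch. The all-pairs repulsion is quadratic; large graphs would need an approximation.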

2.4. Discussion

Our approach proposes a novel interactive graph visualization that takes the network topology into account in the performance analysis. In this comparison with related work, we first contrast our approach with existing visualization techniques and then compare our methodology with other data analysis approaches.

The traditional timeline view is expected by most users of high performance computing, as can be observed from the number of visualization tools that implement it. Although useful and well-suited to behavior analysis in homogeneous environments, such as small-scale clusters where network latency and bandwidth heterogeneity is limited, timeline views lack topological information. Contrasting behavior with topological data is sometimes crucial for understanding application behavior in distributed and heterogeneous systems. Our graph-based representation can be classified as a structural technique, in which the topology, usually that of the logical or physical interconnection, is represented. ParaGraph, Triva and OverView are examples of tools that have topological representations. ParaGraph's approach differs from ours in two respects. First, its graph representation is fixed according to the physical network topology, whereas our method gives the analyst the freedom to choose how elements of the graph are connected in the representation. Second, our graph representation can depict time- and space-aggregated data, while ParaGraph's topological representation [19] only shows instantaneous information. Triva already implemented a three-dimensional visualization with the network topology depicted at the base of the visualization [30], but the technique only shows timestamped events along the vertical time axis, without the possibility of aggregating them in space and time and using these aggregated values to customize the graph. Another aspect of our approach is that we employ a dynamic force-directed layout, improving the analyst's interaction with the monitored elements. OverView [14] has a topological representation of distributed systems, also with a force-directed algorithm for positioning, but it does not support spatially or temporally aggregated data, which turns out to be an essential feature as we will explain later. Two- and three-dimensional visualization techniques that contrast network traffic with application behavior also exist [24, 34], but they are limited to regular topologies such as those found in Blue Gene systems.

Our analysis methodology is interactive and exploratory [37] in time and space. The idea is to let the analyst freely navigate through the different dimensions of performance data to find issues and potential improvements. Automatic analysis is a different approach, where a computer program searches the traces for known and expected problems. We believe that these automatic techniques are limited to the situations they were conceived to detect, possibly missing unexpected behaviors. An automatic methodology should only be applied to guide the analyst through an exploratory and interactive performance analysis. An example of this is Vampir, which applies a clustering algorithm to increase the scalability of its timeline view.

The key idea of our contribution is the use of multi-scale data aggregation with dynamic graph layout to obtain a scalable, interactive graph-based visualization that enables exploratory data analysis. With such a visualization, the analyst can easily navigate through time- and space-aggregated data while keeping the ability to correlate such data with the topology at any time. This feature enables the easy detection of several performance issues, such as bottlenecks and inefficiencies in the hosts or in the network of a parallel and distributed environment. The next section details our approach, followed by implementation details and case studies that show the usefulness of our proposal.

3. The Scalable Topology-based Visualization

Our visualization approach proposes an interactive analysis of large-scale distributed systems that takes topological information into account. It tackles complexity in multiple ways: the diversity of trace data and how to represent it visually; multi-scale data aggregation techniques to address scalability issues; and the dynamic updates of the representation caused by changes of trace data in time and space, along with the interaction of the analyst. The next subsections detail each of these points.

3.1. Mapping trace metrics to the graph

In our approach, the nodes of the graph represent monitored entities, while edges only indicate a relationship between two monitored entities. The visualization is enhanced by mapping the metrics of each monitored entity to the available geometric shapes and properties of its corresponding node. For example, a square can be used to represent a host, with its size given by its computing power; a diamond can represent a network link, with its size given by its bandwidth utilization; and so on.

The key point of a given mapping is how easily the analyst perceives the visual differences among the objects. In our experience, these differences are crucial for perceiving the data in the visualization. For this reason, we have chosen to limit the number of geometric shapes and attributes. Only simple shapes and properties are used: square, diamond and circle as representations; node color and size, and an optional filling of the shapes, as properties.
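A mapping of this kind can be sketched as a small table from entity kind to a shape plus the metrics that drive size and fill. In the Python sketch below, the class `Mapping`, the function `node_attributes` and metric names such as `power` and `power_used` are hypothetical. The fill is assumed to be the ratio of the fill metric to the size metric, clamped to 1. Restricting `shape` to square, diamond and circle mirrors the deliberately small vocabulary described above.

```python
from dataclasses import dataclass
from typing import Optional

# Deliberately small visual vocabulary.
SHAPES = {"square", "diamond", "circle"}


@dataclass(frozen=True)
class Mapping:
    shape: str                          # e.g. "square" for hosts
    size_metric: str                    # trace metric that drives node size
    fill_metric: Optional[str] = None   # optional metric shown as proportional fill
    color: str = "black"

    def __post_init__(self):
        if self.shape not in SHAPES:
            raise ValueError(f"unsupported shape: {self.shape!r}")


def node_attributes(kind, metrics, mappings):
    """Turn one entity's current metric values into drawing attributes."""
    m = mappings[kind]
    size = metrics[m.size_metric]
    fill = 0.0
    if m.fill_metric is not None and size > 0:
        fill = min(metrics[m.fill_metric] / size, 1.0)
    return {"shape": m.shape, "size": size, "fill": fill, "color": m.color}


mappings = {
    "host": Mapping("square", "power", "power_used"),
    "link": Mapping("diamond", "bandwidth", "bandwidth_used"),
}
print(node_attributes("host", {"power": 200.0, "power_used": 50.0}, mappings))
```

Because the mapping is plain data, changing it during an analysis amounts to replacing an entry in the `mappings` dictionary and redrawing.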

Any mapping can be changed dynamically at any point of the analysis. This might be motivated by the needs of the analysis, but also by a different set of available metrics in another part of the trace file that should be represented in a different way. We next detail an example of the mapping from trace to graph.

3.1.1 Example of mapping

This section presents a simplified example to illustrate the concepts, while Section 5 presents more realistic case studies. Traces are composed of timestamped events that represent the available computing power of two hosts and the available bandwidth of one link. The left side of Figure 1 depicts resource availability (solid lines) and utilization (dashed lines) for them. We map hosts to squares and links to diamonds. The available resource capacity defines the size of each geometric shape at a given timestamp; the resource utilization defines the proportional fill inside each of them. The plot on the left of Figure 1 contains three cursors (A, B and C) that define the timestamps used to draw the three graph representations.
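Drawing the graph at a cursor amounts to sampling each trace at the cursor's timestamp. The Python sketch below assumes a hypothetical trace format in which each metric is a list of `(timestamp, value)` events sorted by time, each value holding until the next event. The node size is the capacity and the fill is the utilization divided by the capacity; `value_at` and `node_at` are illustrative names.

```python
from bisect import bisect_right


def value_at(events, t):
    """Value of a piecewise-constant trace at time t.

    events: list of (timestamp, value) sorted by timestamp; each value holds
    until the next event. Returns 0.0 before the first event.
    """
    times = [ts for ts, _ in events]
    i = bisect_right(times, t)  # number of events at or before t
    return events[i - 1][1] if i > 0 else 0.0


def node_at(trace, t):
    """Size and fill of one resource's node when the cursor is at time t."""
    capacity = value_at(trace["capacity"], t)
    used = value_at(trace["utilization"], t)
    fill = used / capacity if capacity > 0 else 0.0
    return {"size": capacity, "fill": fill}


host_a = {"capacity": [(0, 100.0), (5, 50.0)],
          "utilization": [(0, 20.0), (3, 50.0)]}
for cursor in (2, 4, 6):  # three cursors, as with A, B and C
    print(cursor, node_at(host_a, cursor))
```

With these events, a cursor at t = 4 yields size 100.0 and fill 0.5, while a cursor at t = 6 yields size 50.0 and fill 1.0.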

How the monitored entities are connected to each other requires additional information, which is obtained from different sources. One possibility is to use traces of the messages exchanged among processes, using the communication pattern to interconnect processes. A second possibility is to use fixed, previously defined connection data, as in the case when the monitored entities are part of the network topology of a distributed system. Finally, the information can be provided dynamically by the analyst, depending on the needs of the analysis. Independently of the data source, the connection information must be merged or combined with the rest of the traces to provide a single visualization.

Figure 1. From trace metrics (left) to the graph representation (right: squares as hosts, diamonds as links). The left panel plots HostA and HostB (MFlops) and LinkA (Mbits) over time, with cursors A, B and C; the right panel shows the graph HostA, LinkA, HostB drawn at each cursor.
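Merging connection information from the several sources described above can be sketched as a union of undirected edge lists that remembers where each edge came from. In this Python sketch, the source names (`messages`, `topology`, `analyst`) and the function `merge_connections` are hypothetical; self-loops are dropped and edge direction is ignored, an assumption suited to a topology view rather than to a message-level view.

```python
def merge_connections(*sources):
    """Merge undirected edge lists from several sources into one edge map.

    Each source is (name, iterable of (u, v) pairs). The result maps each
    undirected edge, stored as a sorted tuple, to the set of sources that
    provided it.
    """
    merged = {}
    for name, edges in sources:
        for u, v in edges:
            if u == v:
                continue  # ignore self-loops, e.g. a process messaging itself
            key = tuple(sorted((u, v)))
            merged.setdefault(key, set()).add(name)
    return merged


edges = merge_connections(
    ("messages", [("p1", "p0"), ("p0", "p1")]),          # communication pattern
    ("topology", [("HostA", "LinkA"), ("LinkA", "HostB")]),  # fixed platform data
    ("analyst", [("p0", "p1")]),                          # added interactively
)
print(edges)
```

Keeping the provenance of each edge lets the view show or hide connections by source without recomputing the merge.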

Mapping a selected subset of trace values to the topological representation enables the analyst to reduce the analysis complexity. Even if an appropriate subset is chosen, the amount of data can be significant, which limits the analysis of large-scale traces containing thousands of elements and very detailed behavior recorded over time. To tackle complex and large-scale machine and network configurations, our approach aggregates data along both the space and time dimensions. The techniques employed are described in the next subsection.

3.2. Multi-scale data aggregation

The observation of large-scale distributed systems over a long period of time usually leads to large amounts of trace data that are particularly hard to understand. Some behaviors are present only at small time or space scales; others appear only at larger scales. Understanding these different phenomena requires easy navigation across all scales of trace data, especially those related to time and space.

We briefly detail how data aggregation is formally defined in our context. Let us denote by $\mathcal{R}$ the set of resources and by $\mathcal{T}$ the observation period. Assume we have measured a given quantity ρ on each resource:

$$\rho : \begin{cases} \mathcal{R} \times \mathcal{T} \to \mathbb{R} \\ (r, t) \mapsto \rho(r, t) \end{cases}$$

In our context, ρ(r, t) could, for example, represent the computing power availability of resource r at time t. It could also represent the (instantaneous) amount of computing power allocated to a given project on resource r at time t. As shown in Figure 1, we may have to depict several types of information at once to investigate their correlation through a topology-based representation.

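Assume we have a way to define a neighborhood $N_{\Gamma,\Delta}(r, t)$ of $(r, t)$, where Γ represents the size of the spatial neighborhood and ∆ represents the size of the temporal neighborhood. In practice, we could for example choose

$$N_{\Gamma,\Delta}(r, t) = [r - \Gamma/2,\ r + \Gamma/2] \times [t - \Delta/2,\ t + \Delta/2],$$

assuming our resources have been ordered. Then, we can define an approximation $F_{\Gamma,\Delta}$ of ρ at the scales Γ and ∆ as:

$$F_{\Gamma,\Delta} : \begin{cases} \mathcal{R} \times \mathcal{T} \to \mathbb{R} \\ (r, t) \mapsto \displaystyle\iint_{N_{\Gamma,\Delta}(r, t)} \rho(r', t')\, dr'\, dt' \end{cases} \tag{1}$$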

This function aggregates the behavior of ρ over a neighborhood of sizes Γ and ∆; dividing it by the measure of the neighborhood (ΓΔ for the rectangular neighborhood above) gives the average of ρ over that neighborhood. Since both Γ and ∆ can be continuously adjusted by the analyst during an analysis, it is possible to choose a specific set of resources and a specific time-slice. Once a set of resources has been defined, we can follow its evolution through time by shifting the time frame to other intervals.
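As a numerical illustration of Equation 1, the Python sketch below evaluates the aggregate at (r, t) for scales Γ and ∆ on sampled data, under two stated assumptions. First, resources are identified by their index after ordering, so the spatial integral becomes a sum over the indices in [r − Γ/2, r + Γ/2]. Second, ρ is sampled on a regular time grid of step `dt` and is constant within each sample, so the temporal integral is computed exactly from interval overlaps. The function name `aggregate` is illustrative.

```python
def overlap(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def aggregate(rho, dt, r, t, gamma, delta):
    """Integral of rho over [r - gamma/2, r + gamma/2] x [t - delta/2, t + delta/2].

    rho[i][j] is the value of resource i on [j*dt, (j+1)*dt). The spatial
    integral is a sum over resource indices; the temporal integral is exact
    for piecewise-constant samples.
    """
    t0, t1 = t - delta / 2, t + delta / 2
    total = 0.0
    for i, row in enumerate(rho):
        if not (r - gamma / 2 <= i <= r + gamma / 2):
            continue  # resource outside the spatial neighborhood
        for j, value in enumerate(row):
            total += value * overlap(t0, t1, j * dt, (j + 1) * dt)
    return total


rho = [[1.0] * 4, [2.0] * 4, [10.0] * 4]  # three ordered resources, four samples each
print(aggregate(rho, 1.0, r=0.5, t=2.0, gamma=1.0, delta=2.0))  # 6.0
```

Dividing the result by the number of selected resources times ∆ turns this integral into the corresponding average (here 6.0 / (2 × 2.0) = 1.5).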

Now that we have formally defined data aggregation for our approach in both the space and time dimensions, we next detail how it affects the topology-based representation. For each dimension, we detail how the neighborhood is dynamically defined by the analyst and how the aggregated values are mapped to the properties of the graph representation.

3.2.1 Temporal aggregation

Equation 1 shows that data aggregation considers a neighborhood in both the time and space scales. Taking into account only the time dimension, the temporal neighborhood is represented by a time-slice, defined by the analyst according to the analysis. Figure 2 illustrates an example where one monitored entity (HostA) has two trace metrics: computing power capacity and utilization. The time-slice is represented in the figure by the period of time between the two cursors (A1 and A2). Both values associated with the monitored entity are time-integrated, resulting in two values. These values are finally mapped to the graph representation: the size of node HostA (in the right part of the figure) is given by the time-integrated computing power within the time-slice, and the node filling by the time-integrated resource utilization.
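The time-slice computation for HostA can be sketched as exact integration of piecewise-constant traces between the two cursors. The Python sketch assumes a hypothetical event format: each metric is a list of sorted `(timestamp, value)` pairs, each value holds until the next event, and the trace is taken as 0 before its first event. The names `integrate` and `time_slice_node` are illustrative, and the fill is assumed to be the ratio of the two integrals.

```python
def integrate(events, a1, a2):
    """Integral over [a1, a2] of a piecewise-constant trace.

    events: sorted (timestamp, value) pairs; each value holds until the next
    event, and the last one holds until a2. The trace is 0 before the first event.
    """
    total = 0.0
    for k, (ts, value) in enumerate(events):
        end = events[k + 1][0] if k + 1 < len(events) else a2
        lo, hi = max(ts, a1), min(end, a2)  # clip this segment to the slice
        if hi > lo:
            total += value * (hi - lo)
    return total


def time_slice_node(trace, a1, a2):
    """Node size and fill for the time-slice between cursors a1 and a2."""
    power = integrate(trace["capacity"], a1, a2)
    used = integrate(trace["utilization"], a1, a2)
    return {"size": power, "fill": used / power if power > 0 else 0.0}


host_a = {"capacity": [(0, 100.0), (5, 50.0)],
          "utilization": [(0, 20.0), (3, 50.0)]}
print(time_slice_node(host_a, 2, 6))  # size 350.0, fill 170/350
```

Moving both cursors by the same amount slides the slice along the trace, which is how the aggregated graph can be replayed over time.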

Figure 2. Time-aggregated metrics of HostA mapped to its representation in the graph. (Left: HostA computing power available and resource utilization, in MFlops, with the time-slice between cursors A1 and A2; right: graph with HostA, HostB and LinkA.)

The analyst is free to choose different time-slices during an analysis. This feature should be used with care, since aggregation as detailed here attenuates the behavior of events that are shorter than the chosen time interval. Although this may lead to misinterpretation of the topology-based representation, the freedom to choose a time frame usually leads to better detection of anomalies and unexpected behavior [33] by showing information that would otherwise be unavailable without time aggregation.
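The attenuation effect mentioned above is easy to quantify. In the Python sketch below, the function `utilization_fill` is hypothetical: a resource is assumed to be fully used during the given bursts and idle otherwise, and the fill is the busy fraction of the time-slice. A single one-second burst yields a fill of 0.01 in a 100-second slice but 0.5 in a two-second slice around it.

```python
def utilization_fill(bursts, a1, a2):
    """Time-aggregated utilization over [a1, a2] when the resource is fully
    used during each (start, end) burst and idle otherwise."""
    busy = sum(max(0.0, min(end, a2) - max(start, a1)) for start, end in bursts)
    return busy / (a2 - a1)


burst = [(50.0, 51.0)]                          # one saturation burst of 1 s
print(utilization_fill(burst, 0.0, 100.0))      # long slice: burst almost invisible
print(utilization_fill(burst, 49.5, 51.5))      # short slice: burst dominates
```

The same trace thus looks nearly idle or half-saturated depending only on the chosen slice.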

3.2.2 Spatial aggregation

For data aggregation in the space dimension, we consider that the analyst is able to select a proper neighborhood of monitored entities. This neighborhood can consist of monitored entities that are closely interconnected, such as a cluster of hosts, or a pool of workstations in the same physical or virtual location. Depending on the trace characteristics, neighborhoods can be inherited from the traces themselves through the definition of groups, possibly organized hierarchically. The choice of neighborhood may also depend on the analysis, for instance if the analyst wants to group similar entities to focus on outliers.

Figure 3 shows an example of how spatial aggregation affects the topology-based representation. As before, resources are represented by shapes whose size depends on their capacity, and utilization is used as filling. In this example, the time-slice is fixed. On the left of the figure, GroupA indicates the neighborhood taken into account during the first spatial aggregation. All data within this group is space-aggregated following Equation 1. The resulting representation is depicted in the center of the figure, surrounded by the dashed gray line: it combines a square, representing all hosts, and a diamond, representing all links (in this case there is only one link in GroupA). The properties of these two geometric shapes are calculated from the space-aggregated values of the traces, considering all the entities within the group. The example ends with a second spatial aggregation over GroupB, which contains all monitored entities. The right-hand side of the figure contains only one square, representing all the hosts, and one diamond, representing all the links of the initial representation.
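The two aggregation steps of Figure 3 can be sketched in Python. The sketch assumes that space aggregation in Equation 1 sums the time-aggregated capacity and usage of the group members, keeping hosts and links separate so that a group collapses into one square and one diamond; the `Entity` type, the helper names and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    kind: str        # "host" (drawn as a square) or "link" (drawn as a diamond)
    capacity: float  # time-aggregated capacity (MFlops or Mbit/s)
    usage: float     # time-aggregated utilization, same unit as capacity


def space_aggregate(entities):
    """Collapse a group into one (capacity, usage) pair per kind of entity."""
    merged = {}
    for e in entities:
        cap, use = merged.get(e.kind, (0.0, 0.0))
        merged[e.kind] = (cap + e.capacity, use + e.usage)
    return merged


def as_entities(group, merged):
    """Turn an aggregated group back into entities so it can be aggregated again."""
    return [Entity(f"{group}/{kind}", kind, cap, use)
            for kind, (cap, use) in merged.items()]


# Hypothetical values: GroupA is nested inside GroupB, which holds everything.
group_a = [Entity("HostA", "host", 100.0, 50.0),
           Entity("HostB", "host", 100.0, 100.0),
           Entity("LinkA", "link", 1000.0, 250.0)]
rest_of_b = [Entity("HostC", "host", 50.0, 10.0),
             Entity("LinkB", "link", 1000.0, 0.0)]

first = space_aggregate(group_a)                                    # 1st aggregation
second = space_aggregate(as_entities("GroupA", first) + rest_of_b)  # 2nd aggregation
print(first)
print(second)
```

Because sums are associative, aggregating GroupA first and then GroupB gives the same result as aggregating all entities of GroupB at once, which is what allows the representation to be aggregated step by step.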

[Figure 3 panels: GroupA nested inside GroupB, with arrows labeled "1st Space Aggregation" and "2nd Space Aggregation".]

Figure 3. Two spatial-aggregation operations and how they affect the topology-based representation.

Besides improving the quality of the analysis, spatial aggregation also plays a major role in the scalability of the topology-based representation. Graph structures often raise scalability issues: as the number of nodes increases, computing a layout that does not hide correlation patterns becomes more and more difficult. A possible solution is to interactively aggregate parts of the graph, keeping the average behavior of the aggregated parts through spatially aggregated data. The next subsection describes how we dynamically define the position of the nodes of the graph to enable the analysis of large-scale scenarios.

3.3. Dynamic graph layout

Another aspect of a graph view is the placement of nodes and edges in the representation. In our context, the on-screen location of a node depends on two factors:

• First, the analyst needs to interact with the representation to inspect and understand the behavior of the monitored entities. This interaction requires, for instance, that some nodes be grouped using aggregated data, so that a set of nodes is replaced by an aggregated node, which changes their positions.

• Second, the traces we consider record dynamic distributed systems, where monitored elements might be present during one period but not during another. The presence of the different types of events might also change during the observation period, influencing how they are mapped to the graph and, ultimately, their location.

An existing and widely used solution for graph positioning is a static layout, a technique already present in several graph drawing tools, as discussed in Section 2. A static layout tool may offer several graph layout algorithms, selected according to the nature of the graph. Such an algorithm considers a fixed set of nodes and edges and defines their positions in a non-interactive step. If a node has to be added or removed, the whole algorithm must be executed again to take it into account.

Static layout techniques are ill-suited to our approach, mainly because they are non-interactive and cannot cope with the analyst's interventions and needs. Tools such as Graphviz [15], which provides a library for positioning nodes and edges, offer different layout algorithms that usually do not scale well to larger graphs. These reasons, together with the requirements of our approach, indicate that a dynamic and interactive graph layout is a better solution.

We have opted for a force-directed graph drawing algorithm to provide a dynamic and interactive layout. This class of algorithms works by assigning physical forces to nodes and edges; the most common way of doing so is to place a spring between connected nodes and an electrical charge on every node. When a new node is added to the graph, the algorithm keeps iterating, adapting the positions of all nodes to take the new node into account. The basic force-directed algorithm scales poorly: since every pair of nodes repels each other, each iteration costs O(n²), where n is the number of nodes in the graph. In our approach, we adopt the scalable Barnes-Hut algorithm [3], which reduces this cost to O(n log n), combined with the hierarchical information from the traces.
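The gain of Barnes-Hut [3] comes from approximating the repulsion exerted by a distant group of nodes with a single charge placed at the group's charge-weighted center. The Python sketch below shows this on a plain geometric quadtree. It rests on our own assumptions (positive charges, a Coulomb-like force k·q·Q/d², an opening threshold `theta`), the names `Cell`, `build` and `exact_force` are hypothetical, and it does not reproduce how VIVA combines the quadtree with the trace hierarchy.

```python
import math
import random


class Cell:
    """Quadtree cell holding the total charge and charge-weighted center of its bodies."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # center and half-width
        self.charge = 0.0
        self.mx = self.my = 0.0  # sums of q*x and q*y
        self.body = None         # (x, y, q) when this is a non-empty leaf
        self.children = None

    def insert(self, x, y, q):
        self.charge += q
        self.mx += q * x
        self.my += q * y
        if self.children is None:
            if self.body is None:      # empty leaf: store the body here
                self.body = (x, y, q)
                return
            if self.half < 1e-6:       # coincident bodies: merge instead of splitting
                self.body = (self.mx / self.charge, self.my / self.charge, self.charge)
                return
            old, self.body = self.body, None  # occupied leaf: split, push both down
            h = self.half / 2
            self.children = [Cell(self.cx + sx * h, self.cy + sy * h, h)
                             for sx in (-1, 1) for sy in (-1, 1)]
            self._child(old[0], old[1]).insert(*old)
        self._child(x, y).insert(x, y, q)

    def _child(self, x, y):
        return self.children[(2 if x >= self.cx else 0) + (1 if y >= self.cy else 0)]

    def force(self, x, y, q, theta, k=1.0):
        """Approximate repulsion on a body at (x, y) with charge q."""
        if self.charge == 0.0:
            return 0.0, 0.0
        dx = x - self.mx / self.charge
        dy = y - self.my / self.charge
        d = math.hypot(dx, dy)
        # Treat the whole cell as one charge if it is a leaf or far enough away.
        if self.children is None or (d > 0 and 2 * self.half / d < theta):
            if d < 1e-12:              # the body itself: no self-force
                return 0.0, 0.0
            f = k * q * self.charge / (d * d)
            return f * dx / d, f * dy / d
        fx = fy = 0.0
        for c in self.children:
            cfx, cfy = c.force(x, y, q, theta, k)
            fx += cfx
            fy += cfy
        return fx, fy


def build(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    root = Cell((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)
    for p in points:
        root.insert(*p)
    return root


def exact_force(points, i, k=1.0):
    """Reference all-pairs sum: O(n) per body, hence O(n^2) per iteration."""
    x, y, q = points[i]
    fx = fy = 0.0
    for j, (xj, yj, qj) in enumerate(points):
        if j != i:
            dx, dy = x - xj, y - yj
            d = math.hypot(dx, dy)
            f = k * q * qj / (d * d)
            fx += f * dx / d
            fy += f * dy / d
    return fx, fy


random.seed(1)
pts = [(random.uniform(0, 100), random.uniform(0, 100), 1.0) for _ in range(200)]
tree = build(pts)
print(exact_force(pts, 0), tree.force(*pts[0], theta=0.5))
```

With `theta = 0` no cell is ever approximated and the result matches the O(n²) sum; larger values of `theta` open fewer cells, trading accuracy for speed.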

4. Implementing Interactivity

Interactivity plays a major role in our approach because it is the way the analyst inspects and changes the representation according to the analysis. This section presents implementation decisions regarding interactivity: how multiple data scales are handled, and how the force-directed algorithm is configured. The implementation extends an existing open-source visualization tool called VIVA¹.

¹ http://github.com/schnorr/viva/

4.1. Dealing with multiple scales

Traces from distributed and parallel systems, as well as from applications, generally contain several types of information. Each type may have metrics with their own scale: computing power is likely to be measured in Megaflops, network traffic in Megabits/second, and so on. Such differences of scale lead to a topology-based visualization with geometric shapes whose sizes are not comparable. Our implementation therefore defines an independent scale for each kind of metric present in the traces, which determines the pixel size of the geometric shapes in the topological representation. This per-type scaling enables the analysis of monitored entities of different nature. The implementation automatically defines the initial scaling so that the largest object of each type is drawn at the same maximum size.

Figure 4 illustrates how the automatic scaling of our implementation works. The values depicted within the geometric shapes are in Megaflops for hosts (squares) and in Megabits/second for links (diamonds). In schemes A and B, the host and link size scales are left in the middle of their sliders (at the bottom), meaning that the automatic scaling computed by our implementation is used by default. Considering the data aggregation defined by the time-slice (on top) of scheme A, HostA has a value of 100 Megaflops, HostB is four times smaller with 25 Megaflops, and LinkA has a capacity of 10000 Megabits/second. In scheme B, the change of time-slice generates different aggregated values for the hosts: this time HostB has more computing power than HostA. HostB is drawn with the maximum size allowed for objects, meaning that its 40 Megaflops are mapped to the same screen size as the 100 Megaflops of HostA in scheme A. This is expected, since the largest object of each type within a time-slice is always mapped to the maximum pixel size of objects in the representation. Finally, scheme C shows a change in the interactive sliders that configure the independent per-type scaling. We kept the same time-slice as in scheme B, but configured hosts to be bigger and links to be smaller. The analyst can interactively configure these sliders, for example to focus the analysis on one type of object, or to adjust the scaling according to other requirements.
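The per-type scaling can be summarized in a few lines of Python. This sketch assumes that a value is mapped linearly to the side length of its shape, that a slider in its middle position corresponds to a factor of 1, and that 100 pixels is the maximum size; the slider factors used for scheme C (1.5 for hosts, 0.5 for links) are hypothetical, since Figure 4 gives no numeric slider values.

```python
MAX_PX = 100.0  # hypothetical maximum object size, in pixels


def pixel_sizes(values, kinds, sliders=None):
    """Map time-aggregated values to pixel sizes, with one scale per object kind.

    The largest value of each kind in the current time-slice gets MAX_PX,
    then the kind's slider factor (1.0 = middle position) is applied.
    """
    sliders = sliders or {}
    peak = {}
    for name, value in values.items():
        kind = kinds[name]
        peak[kind] = max(peak.get(kind, 0.0), value)
    sizes = {}
    for name, value in values.items():
        kind = kinds[name]
        base = MAX_PX * value / peak[kind] if peak[kind] > 0 else 0.0
        sizes[name] = base * sliders.get(kind, 1.0)
    return sizes


kinds = {"HostA": "host", "HostB": "host", "LinkA": "link"}
scheme_a = pixel_sizes({"HostA": 100.0, "HostB": 25.0, "LinkA": 10000.0}, kinds)
scheme_b = pixel_sizes({"HostA": 10.0, "HostB": 40.0, "LinkA": 10000.0}, kinds)
scheme_c = pixel_sizes({"HostA": 10.0, "HostB": 40.0, "LinkA": 10000.0}, kinds,
                       sliders={"host": 1.5, "link": 0.5})
print(scheme_a)  # {'HostA': 100.0, 'HostB': 25.0, 'LinkA': 100.0}
print(scheme_b)  # {'HostA': 25.0, 'HostB': 100.0, 'LinkA': 100.0}
print(scheme_c)  # {'HostA': 37.5, 'HostB': 150.0, 'LinkA': 50.0}
```

Scheme B reproduces the behavior described above: HostB's 40 Megaflops receive the same 100 pixels as HostA's 100 Megaflops in scheme A, because each kind is scaled against its own maximum within the current time-slice.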

[Figure 4 panels: schemes A, B and C, each with a time-slice on top, a graph of HostA, HostB and LinkA, and "host size scale" and "link size scale" sliders at the bottom. Values shown: A: HostA 100, HostB 25, LinkA 10000; B and C: HostA 10, HostB 40, LinkA 10000.]

Figure 4. Three scenarios showing the definition of per-type scales and the operation of the interactive scaling sliders.

4.2. Force-directed parameters

As discussed in Section 3.3, node positions are defined by the Barnes-Hut force-directed algorithm. This algorithm has three parameters that affect how fast it converges to a good layout:

Charge is a force value, related to screen size, assigned to each node. If a node is an aggregated object, its charge equals the sum of the charges of all the nodes it groups. These values are used in Coulomb's law of repulsion: the higher their value, the more dispersed the nodes are in the view.

Spring is an attraction force present only between two connected nodes. The force-directed algorithm uses its value in Hooke's law of attraction. The value of this parameter does not change when a node is connected to an aggregated node.

Damping is a value applied after the charge and spring forces are calculated, attenuating the resulting force when computing the new position of a node. The analyst can use it to make the algorithm converge faster, or to stop it and freeze the node positions.

These parameters are configured interactively through sliders. Figure 5 depicts how the charge and spring parameters affect the position of the nodes: decreasing the charge brings all nodes closer together, while decreasing the spring length brings only connected nodes closer. The analyst can also move nodes with the mouse while the algorithm runs. This is a very important feature, since the analyst may want to organize nodes according to their locations in the physical world (e.g., machines in the north of the country placed at the top of the screen and those in the south at the bottom), or according to any other convention that makes sense for the situation under investigation. Furthermore, thanks to the dynamic layout algorithm, whenever the analyst moves a node, all its neighbors seamlessly follow it, so the graph always remains well organized.
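The role of the three parameters, and of a node held by the analyst, can be sketched with one iteration of a naive force-directed step in Python. This illustration relies on assumptions: damping is modeled as a factor in [0, 1] applied to each node's velocity, repulsion uses all pairs instead of Barnes-Hut, and `Node`, `step` and `aggregate_charge` are hypothetical names, not VIVA's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    x: float
    y: float
    charge: float = 1.0
    vx: float = 0.0
    vy: float = 0.0
    pinned: bool = False  # True while the analyst holds the node with the mouse


def aggregate_charge(members):
    """An aggregated node carries the sum of the charges of the nodes it groups."""
    return sum(n.charge for n in members)


def step(nodes, edges, charge_k, spring_k, spring_len, damping, dt=0.1):
    """One iteration: Coulomb repulsion, Hooke attraction on edges, then damping."""
    fx = dict.fromkeys(nodes, 0.0)
    fy = dict.fromkeys(nodes, 0.0)
    ids = list(nodes)
    for a, i in enumerate(ids):                 # repulsion between every pair
        for j in ids[a + 1:]:
            dx, dy = nodes[i].x - nodes[j].x, nodes[i].y - nodes[j].y
            d = max(math.hypot(dx, dy), 1e-6)
            f = charge_k * nodes[i].charge * nodes[j].charge / (d * d)
            fx[i] += f * dx / d
            fy[i] += f * dy / d
            fx[j] -= f * dx / d
            fy[j] -= f * dy / d
    for i, j in edges:                          # springs between connected nodes
        dx, dy = nodes[j].x - nodes[i].x, nodes[j].y - nodes[i].y
        d = max(math.hypot(dx, dy), 1e-6)
        f = spring_k * (d - spring_len)         # positive when stretched: pull together
        fx[i] += f * dx / d
        fy[i] += f * dy / d
        fx[j] -= f * dx / d
        fy[j] -= f * dy / d
    for i, n in nodes.items():
        if n.pinned:                            # the analyst decides its position
            n.vx = n.vy = 0.0
            continue
        n.vx = (n.vx + dt * fx[i]) * damping    # damping = 0 freezes the layout
        n.vy = (n.vy + dt * fy[i]) * damping
        n.x += dt * n.vx
        n.y += dt * n.vy


# The analyst drags HostA to x = 10; its neighbor HostB follows over iterations.
nodes = {"HostA": Node(0.0, 0.0, pinned=True), "HostB": Node(1.0, 0.0)}
nodes["HostA"].x = 10.0
for _ in range(200):
    step(nodes, [("HostA", "HostB")], charge_k=1.0, spring_k=1.0,
         spring_len=2.0, damping=0.8)
print(nodes["HostB"].x)
```

HostB settles at the distance where the spring attraction balances the charge repulsion, which is how neighbors "follow" a node moved by the analyst.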

[Figure 5 panels: situations A, B and C, each showing a layout with its "charge" and "spring" sliders.]

Figure 5. Three situations showing how the charge and spring parameters affect the layout of the nodes.

5. Case studies and results

In this section, we present two scenarios that illustrate the potential of our visualization technique. The traces used in these case studies were obtained using SMPI [11] in Section 5.1 and the SimGrid simulation toolkit [9] in Section 5.2.

5.1. NAS-DT Benchmark Analysis

We use the NAS-DT class A benchmark with the White Hole (WH) algorithm to illustrate how the topology-based view can be used to detect performance issues in the network. The top screenshot in Figure 6 shows the topology of the resources allocated for the experiment: two homogeneous, interconnected clusters (Adonis on the left and Griffon on the right), each with eleven hosts. Processes are allocated sequentially, starting on the hosts of the Adonis cluster.

The top screenshot of Figure 6 shows the bandwidth used by DT class A over the whole execution time. The links interconnecting the two clusters are almost saturated, suggesting that they might be limiting the benchmark execution. Further investigation with a smaller time slice (the three smaller screenshots at the bottom) at the beginning, middle, and end of the execution confirms that the interconnecting links are saturated most of the time.

[Figure 6 panels: one topology view for the whole execution (top) and three views for smaller time slices (bottom).]

Figure 6. Four topology-based views showing the network resource utilization in different time slices of the NAS-DT class A White Hole benchmark executed with an ordinary host file.

If we consider more carefully the communication pattern of the White Hole algorithm and the network topology of the resources used in this experiment (Figure 6), we can conclude that sequential process allocation is not the best deployment. In fact, by better exploiting the locality of communications, we can obtain an overall performance improvement. Communication locality is easy to exploit for the NAS white hole algorithm by placing the forwarder processes close to the data sources, which shortens the communication paths and avoids the interconnection between the two clusters.

To confirm the locality assumption through the topology-based visualization, we executed the application again using the new deployment. The top screenshot of Figure 7 already shows a reduced utilization of the links interconnecting the two clusters, which is expected since we now exploit communication locality within the clusters. The small network utilization of the cluster interconnection is caused by the beginning of the white hole algorithm execution (shown by the leftmost screenshot at the bottom of the same figure), when the data for the first levels of the white hole hierarchy are being transmitted. The center and rightmost views at the bottom show the resource utilization in the middle and at the end of the execution. Network contention is now located on the small network links within each cluster.

Using the topology-based view, we reduced the execution time of NAS-DT class A with the white hole algorithm by 20% with the new deployment. In this small and well-known scenario, such an improvement was expected because of the locality characteristics of the white hole algorithm and of the topology used. Still, such a phenomenon is non-trivial, and it is faithfully reflected by the topology-based visualization. The same behavior would be very difficult to detect with other approaches, such as Gantt charts or any other technique that lacks a topological view.

[Figure 7 panels: one topology view for the whole execution (top) and three views for smaller time slices (bottom).]

Figure 7. Four topology-based views showing the network resource utilization in different time slices of the NAS-DT class A White Hole benchmark executed with a host file designed to exploit communication locality.

5.2. Non-Cooperative Master Worker Applications

We propose to study the behavior of two master-worker applications competing for resources on a grid. The platform is a realistic model of Grid5000 [7] (with 2170 computing hosts), and the two application servers use the bandwidth-centric optimal strategy [4]. More precisely, every time a master sends a task to a worker, it evaluates the worker's effective bandwidth and uses this value to prioritize the workers' requests: when several workers request work, the one with the largest bandwidth is served first. Every worker has a prefetch buffer of three tasks that it tries to keep full in order to minimize its idle time. In the situation we investigate, the first application is CPU-bound, while the second has a slightly higher communication-to-computation ratio. Hence, we expect to observe the following phenomena: (1) the first application should achieve an overall better resource usage than the second one; (2) we can expect a form of locality from the second application, since it will send tasks in priority to workers that have a good bandwidth; (3) although the two applications do not originate from the same sites, they may interfere on computing resources.
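The request-serving rule of the masters can be sketched in Python. This is not the SimGrid code used for the experiments: the `Master` class, the one-second probes used to seed the bandwidth estimates and the worker names are hypothetical; only the policy (serve the pending request of the worker with the largest measured bandwidth first) and the three-task prefetch buffer come from the description above. A FIFO policy is included for contrast.

```python
PREFETCH = 3  # tasks each worker tries to keep in its buffer (from the text)


class Master:
    """Answers pending work requests according to a serving policy."""

    def __init__(self, policy):
        self.policy = policy   # "bandwidth" (bandwidth-centric) or "fifo"
        self.pending = []      # outstanding requests, in arrival order
        self.bandwidth = {}    # worker -> last measured effective bandwidth

    def record_transfer(self, worker, megabits, seconds):
        """Update the worker's effective bandwidth after sending it a task."""
        self.bandwidth[worker] = megabits / seconds

    def request(self, worker):
        self.pending.append(worker)

    def serve(self):
        """Remove and return the worker whose request is answered next."""
        if self.policy == "fifo":
            return self.pending.pop(0)
        # Largest bandwidth first; unmeasured workers count as 0, and ties keep
        # arrival order (the earliest index wins because of the -i term).
        best = max(range(len(self.pending)),
                   key=lambda i: (self.bandwidth.get(self.pending[i], 0.0), -i))
        return self.pending.pop(best)


def first_served(policy, bandwidths, arrival_order, count):
    """Order in which workers receive the first `count` tasks."""
    master = Master(policy)
    for worker, bw in bandwidths.items():
        master.record_transfer(worker, bw, 1.0)  # hypothetical 1-second probes
    for worker in arrival_order:
        for _ in range(PREFETCH):                # each worker fills its buffer
            master.request(worker)
    return [master.serve() for _ in range(count)]


bw = {"slow": 10.0, "mid": 100.0, "fast": 1000.0}  # Mbit/s, hypothetical
order = ["slow", "mid", "fast"]
print(first_served("fifo", bw, order, 4))       # ['slow', 'slow', 'slow', 'mid']
print(first_served("bandwidth", bw, order, 4))  # ['fast', 'fast', 'fast', 'mid']
```

With FIFO, the slow worker, which asked first, receives the first three tasks; with the bandwidth-centric rule, the fast worker fills its buffer first. This ordering is the mechanism behind the locality expected in phenomenon (2).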

[Figure 8 panels: the same platform shown at the Hosts, Clusters, Sites and Grid levels.]

Figure 8. Four different levels of spatial aggregation of the Grid5000 platform, correlating host power, the resource usage of both master-worker applications, and the underlying network topology (for a fixed time interval).

Figure 8 illustrates four different levels of spatial aggregation of the Grid5000 platform: no aggregation at all, with 2170 computing hosts, and aggregation of all hosts belonging to the same cluster, to the same site, and to the whole grid. These views represent the behavior of the whole platform for a given time slice of the previous scenario. Although none of the three expected phenomena (resource usage in favor of the CPU-bound application, locality of the application, interference between the two applications) is visible in the host-level representation, they are clearly visible at the cluster and site levels. The site level makes it possible to quantify precisely how much the CPU-bound application performed compared to the network-bound one. This illustrates how essential the multi-scale aggregation technique is to the analysis. The connections we have manually drawn among the different views to show the aggregation illustrate that the layout changes smoothly when aggregating, which prevents the analyst from getting confused when changing scale².
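The four views of Figure 8 differ only in how deep the host hierarchy is cut before aggregating. A minimal Python sketch, assuming each host is identified by a (site, cluster, host) path and that aggregation sums per-application usage; the names `rollup` and `LEVELS` and all site, cluster, host and metric values are hypothetical, not taken from the Grid5000 model.

```python
from collections import defaultdict

LEVELS = {"grid": 0, "site": 1, "cluster": 2, "host": 3}


def rollup(hosts, level):
    """Sum the metrics of all hosts that share the same path prefix at `level`."""
    depth = LEVELS[level]
    totals = defaultdict(lambda: defaultdict(float))
    for path, metrics in hosts.items():
        for name, value in metrics.items():
            totals[path[:depth]][name] += value
    return {group: dict(m) for group, m in totals.items()}


# Hypothetical platform: power, and usage by the CPU-bound (app1) and
# network-bound (app2) applications, in the same unit.
hosts = {
    ("A", "a1", "h1"): {"power": 10.0, "app1": 8.0, "app2": 0.0},
    ("A", "a1", "h2"): {"power": 10.0, "app1": 6.0, "app2": 2.0},
    ("B", "b1", "h3"): {"power": 10.0, "app1": 0.0, "app2": 5.0},
}
for level in ("host", "cluster", "site", "grid"):
    print(level, rollup(hosts, level))
```

A coarser cut turns many small, hard-to-compare host values into a few per-site totals in which each application's share can be read directly, in the spirit of the site-level view of Figure 8.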

[Figure 9 panels: views at the Clusters, Sites and Grid levels for time slices t0, t1, t2 and t3; sites A, B and C are labeled.]

Figure 9. Evolution over time of platform usage at different scales. Although a sequence of screenshots conveys this evolution less well than a real animation, in which each layout evolves from the previous one, the diffusion of the workload over time is visible.

Figure 9 illustrates another feature of the tool: the ability to animate a given view through time to follow the temporal evolution of the workload distribution. Interestingly, even though the first application is computation-bound, its resource usage is not uniform across the whole platform: some sites and clusters are assigned work before others. For example, site B is filled quickly in [t0, t2], whereas site C has to wait until time t2 before starting to receive work units. This is easily explained by the bandwidth-centric strategy of the servers. A strategy that does not take such characteristics into account, such as a simple FIFO mechanism, would not exhibit such locality and would instead lead to an (inefficient) uniform resource usage over time.

6. Conclusion

With the advent of very large scale distributed systems built on complex interconnection networks, phenomena such as locality and resource congestion have become more and more critical to study and understand the performance of parallel applications. Yet, no visualization tool can either handle such workloads in a scalable way or provide deep insight into such issues. We think such a tool should enable interactive and exploratory analysis, adaptable to the variety of situations that can be investigated. In this article, we explain how we have built a graph-based visualization meeting these requirements. We have implemented the technique in an open-source visualization tool called VIVA³, which makes it possible to study the correlation between quantities at both the spatial and temporal levels. The multiscale capability of this visualization allows the analyst to select the adequate level of detail and should be related to what has been done for treemaps [32]. Our new visualization has the same aggregation features but can also display topological information. Such multiscale capability is also essential to achieve a scalable visualization, both in terms of interactivity and of semantic meaning. Yet, it is not sufficient in itself. The ability to dynamically aggregate and disaggregate groups of resources requires adjusting the layout of the graph, which we have done using a dynamic force-directed graph layout mechanism. This algorithm allows the analyst to easily reorganize and readjust the layout in a very efficient way.

² A video demonstrating the ease of interaction and the fluidity of this visualization is available at http://github.com/schnorr/viva.

³ http://github.com/schnorr/viva/

Although we have shown the effectiveness of our proposal on non-trivial examples, some issues still need to be addressed.

• Although the technique we use for aggregating CPU resources is very effective and meaningful, since hosts are independent resources, using the same technique for aggregating links is more questionable. Indeed, communication flows typically span several network links, and summing the usage of non-independent resources leads to values that are hard to explain. Therefore, although locality can be investigated, network saturation and bottlenecks are currently difficult to emphasize in aggregated views.

• Aggregating a large number of values into a single object leads to a significant loss of information, which may be harmful. It would thus be interesting to provide additional information (e.g., statistical indicators such as the variance or the median) that would tell the analyst that particular care should be taken with specific areas deserving further investigation.

• Currently, only a few graphical objects are available in VIVA. More flexible graphical objects (e.g., pie charts, histograms) would make it possible to display other kinds of information, such as process states, hardware counters, or resource energy consumption. The philosophy of tools like ggplot2 [38], which rely on a grammar of graphics to express complex views, is extremely interesting and is being considered in our future directions.
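The first two issues can be made concrete with a small Python example. The route and the utilization values are hypothetical; the point is only that summing per-link usage counts a multi-hop flow once per link it crosses, and that groups with the same mean can have very different dispersion, which indicators such as the median and the variance would expose. This illustrates possible future work, not current VIVA behavior.

```python
import statistics


def link_usage(flows):
    """Per-link load: each flow (rate, route) loads every link on its route."""
    usage = {}
    for rate, route in flows:
        for link in route:
            usage[link] = usage.get(link, 0.0) + rate
    return usage


def summarize(values):
    """Aggregated value plus dispersion indicators for one aggregated object."""
    return {"mean": statistics.mean(values),
            "median": statistics.median(values),
            "variance": statistics.pvariance(values)}


# One 100 Mbit/s flow crossing three links of the same aggregated group.
usage = link_usage([(100.0, ["l1", "l2", "l3"])])
print(sum(usage.values()))  # 300.0: three times the traffic actually carried

balanced = summarize([0.5, 0.5, 0.5, 0.5])  # host utilizations of two groups
skewed = summarize([0.9, 0.9, 0.1, 0.1])
print(balanced, skewed)
```

Both groups have a mean utilization of 0.5, so a single aggregated object would draw them identically; only the variance reveals that the second group mixes saturated and idle hosts.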

Acknowledgments

This work is partially funded by the project entitled SONGS (Simulation of Next Generation Systems) under contract number ANR-11-INFRA-13 of the Agence Nationale de la Recherche (ANR).

References

[1] C. Aguerre, T. Morsellino, and M. Mosbah. Fully-Distributed Debugging and Visualization of Distributed Systems in Anonymous Networks. In Proceedings of the International Conference on Information Visualization Theory and Applications, pages 764–767, 2012.

[2] G. Aguilera, P. Teller, M. Taufer, and F. Wolf. A Systematic Multi-Step Methodology for Performance Analysis of Communication Traces of Distributed Applications based on Hierarchical Clustering. In Parallel and Distributed Processing Symposium (IPDPS), April 2006.

[3] J. Barnes and P. Hut. A Hierarchical O(N log N) Force-Calculation Algorithm. Nature, 324:446–449, 1986.

[4] O. Beaumont, L. Carter, J. Ferrante, A. Legrand, and Y. Robert. Bandwidth-Centric Allocation of Independent Tasks on Heterogeneous Platforms. In International Parallel and Distributed Processing Symposium (IPDPS'2002). IEEE Computer Society Press, 2002.

[5] D. Becker, F. Wolf, W. Frings, M. Geimer, B. Wylie, and B. Mohr. Automatic Trace-Based Performance Analysis of Metacomputing Applications. In IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), pages 1–10, March 2007.

[6] R. Bell, A. Malony, and S. Shende. ParaProf: A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis. In Euro-Par 2003 Parallel Processing, volume 2790 of Lecture Notes in Computer Science, pages 17–26. Springer, 2003.

[7] R. Bolze, F. Cappello, E. Caron, M. Dayde, F. Desprez, E. Jeannot, Y. Jegou, S. Lanteri, J. Leduc, N. Melab, G. Mornet, R. Namyst, P. Primet, B. Quetier, O. Richard, E.-G. Talbi, and I. Touche. Grid'5000: a Large Scale and Highly Reconfigurable Experimental Grid Testbed. International Journal of High Performance Computing Applications, 20(4):481–494, Nov. 2006.

[8] H. Brunst, D. Hackenberg, G. Juckeland, and H. Rohling. Comprehensive Performance Tracking with Vampir 7. In M. S. Muller, M. M. Resch, A. Schulz, and W. E. Nagel, editors, Tools for High Performance Computing 2009, pages 17–29. Springer Berlin Heidelberg, 2010.

[9] H. Casanova, A. Legrand, and M. Quinson. SimGrid: a Generic Framework for Large-Scale Distributed Experiments. In 10th IEEE International Conference on Computer Modeling and Simulation, 2008.

[10] M. Chimani, C. Gutwenger, M. Junger, K. Klein, P. Mutzel, and M. Schulz. The Open Graph Drawing Framework. In 15th International Symposium on Graph Drawing, 2007.

[11] P.-N. Clauss, M. Stillwell, S. Genaud, F. Suter, H. Casanova, and M. Quinson. Single Node On-Line Simulation of MPI Applications with SMPI. In International Parallel & Distributed Processing Symposium, Anchorage (AK), United States, May 2011. IEEE.

[12] K. Coulomb, A. Degomme, M. Faverge, and F. Trahay. An Open-source Tool-chain for Performance Analysis. Tools for High Performance Computing 2011, pages 37–48, 2012.

[13] J. C. de Kergommeaux, B. de Oliveira Stein, and P. E. Bernard. Paje, an Interactive Visualization Tool for Tuning Multi-threaded Parallel Applications. Parallel Computing, 26(10):1253–1274, 2000.

[14] T. Desell, H. Narasimha Iyer, C. Varela, and A. Stephens. OverView: A Framework for Generic Online Visualization of Distributed Systems. Electronic Notes in Theoretical Computer Science, 107:87–101, 2004.

[15] J. Ellson, E. Gansner, L. Koutsofios, S. North, and G. Woodhull. Graphviz – Open Source Graph Drawing Tools. In P. Mutzel, M. Junger, and S. Leipert, editors, Graph Drawing, volume 2265 of Lecture Notes in Computer Science, pages 594–597. Springer, 2002.

[16] F. Freitag, J. Caubet, and J. Labarta. A Trace-Scaling Agent for Parallel Application Tracing. In Conference on Tools with Artificial Intelligence, pages 494–499, 2002.

[17] T. M. J. Fruchterman and E. M. Reingold. Graph Drawing by Force-Directed Placement. Software: Practice and Experience, 21(11):1129–1164, 1991.

[18] J. Gonzalez, J. Gimenez, and J. Labarta. Automatic Detection of Parallel Applications Computation Phases. Parallel and Distributed Processing Symposium, International, 0:1–11, 2009.

[19] M. Heath and J. Etheridge. Visualizing the Performance of Parallel Programs. IEEE Software, 8(5):29–39, 1991.

[20] A. Joshi, A. Phansalkar, L. Eeckhout, and L. K. John. Measuring Benchmark Similarity Using Inherent Program Characteristics. IEEE Transactions on Computers, 55:769–782, 2006.

[21] L. V. Kale, G. Zheng, C. W. Lee, and S. Kumar. Scaling Applications to Massively Parallel Machines using Projections Performance Analysis Tool. Future Generation Computer Systems, 22(3):347–358, 2006.

[22] T. Karcher and V. Pankratius. Run-Time Automatic Performance Tuning for Multicore Applications. In E. Jeannot, R. Namyst, and J. Roman, editors, Euro-Par 2011 Parallel Processing, volume 6852 of Lecture Notes in Computer Science, pages 3–14. Springer, 2011.

[23] A. Knupfer, B. Voigt, W. E. Nagel, and H. Mix. Visualization of Repetitive Patterns in Event Traces. In Proceedings of the 8th International Conference on Applied Parallel Computing: State of the Art in Scientific Computing, pages 430–439. Springer-Verlag, 2007.

[24] A. Landge, J. Levine, A. Bhatele, K. Isaacs, T. Gamblin, M. Schulz, S. Langer, P.-T. Bremer, and V. Pascucci. Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations. IEEE Transactions on Visualization and Computer Graphics, 18(12):2467–2476, Dec. 2012.

[25] C. Lee, C. Mendes, and L. Kale. Towards Scalable Performance Analysis and Visualization Through Data Reduction. In IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pages 1–8. IEEE, 2008.

[26] B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Hollingsworth, R. B. Irvin, K. L. Karavanic, K. Kunchithapadam, and T. Newhall. The Paradyn Parallel Performance Measurement Tool. IEEE Computer, 28(11):37–46, 1995.

[27] B. Mohr and F. Wolf. KOJAK – a Tool Set for Automatic Performance Analysis of Parallel Programs. Lecture Notes in Computer Science, 2790:1301–1304, 2003.

[28] K. Mohror and K. L. Karavanic. Evaluating Similarity-based Trace Reduction Techniques for Scalable Performance Analysis. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pages 55:1–55:12, New York, 2009. ACM.

[29] V. Pillet, J. Labarta, T. Cortes, and S. Girona. PARAVER: A Tool to Visualise and Analyze Parallel Code. In Proceedings of Transputer and occam Developments, volume 44 of Transputer and Occam Engineering, pages 17–31, Amsterdam, 1995. IOS Press.

[30] L. M. Schnorr, G. Huard, and P. O. A. Navaux. Visual Mapping of Program Components to Resources Representation: a 3D Analysis of Grid Parallel Applications. In Proceedings of the 21st Symposium on Computer Architecture and High Performance Computing. IEEE, 2009.

[31] L. M. Schnorr, G. Huard, and P. O. A. Navaux. Triva: Interactive 3D Visualization for Performance Analysis of Parallel Applications. Future Generation Computer Systems, 26(3):348–358, 2010.

[32] L. M. Schnorr, G. Huard, and P. O. A. Navaux. A Hierarchical Aggregation Model to Achieve Visualization Scalability in the Analysis of Parallel Applications. Parallel Computing, 38(3):91–110, 2012.

[33] L. M. Schnorr, A. Legrand, and J.-M. Vincent. Detection and Analysis of Resource Usage Anomalies in Large Distributed Systems through Multi-scale Visualization. Concurrency and Computation: Practice and Experience, 24(15):1792–1816, 2012.

[34] M. Schulz, J. Levine, P. Bremer, T. Gamblin, and V. Pascucci. Interpreting Performance Data Across Intuitive Domains. In International Conference on Parallel Processing (ICPP 2011), pages 206–215. IEEE, 2011.

[35] E. Shaffer, D. A. Reed, S. Whitmore, and B. Schaeffer. Virtue: Performance Visualization of Parallel and Distributed Applications. Computer, 32(12):44–51, 1999.

[36] F. Song, F. Wolf, J. Dongarra, and B. Mohr. Automatic Experimental Analysis of Communication Patterns in Virtual Topologies. In Proceedings of the 2005 International Conference on Parallel Processing, pages 465–472. IEEE Computer Society, 2005.

[37] J. W. Tukey. Exploratory Data Analysis. Pearson, 1977.

[38] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer New York, 2009.

[39] J. M. Wilson. Gantt charts: A Centenary Appreciation. European Journal of Operational Research, 149(2):430–437, September 2003.

[40] O. Zaki, E. Lusk, W. Gropp, and D. Swider. Toward Scalable Performance Visualization with Jumpshot. International Journal of High Performance Computing Applications, 13(3):277–288, 1999.
