UAB Dynamic Monitoring and Tuning in Multicluster Environment Genaro Costa, Anna Morajko, Paola Caymes Scutari, Tomàs Margalef and Emilio Luque Universitat Autònoma de Barcelona Paradyn Week 2006 March 2006

Dec 27, 2015


UAB

Dynamic Monitoring and Tuning in Multicluster Environment

Genaro Costa, Anna Morajko, Paola Caymes Scutari, Tomàs Margalef and Emilio Luque

Universitat Autònoma de Barcelona

Paradyn Week 2006, March 2006


Outline

Introduction
Multicluster Systems
Applications on Wide Systems
MATE
New Requirements
Design
Conclusions


Introduction

System performance

New problems require more computation power. Performance is a key issue.

New wide systems are built over the available resources and the user does not have total control of where the application will run.

It has become more difficult to reach high performance and efficiency on these wide systems.


Introduction (II)

To reach performance goals, users need to find and solve bottlenecks.

Dynamic Monitoring and Tuning is a promising approach.

Because system properties change dynamically, efficient resource use is hard to achieve, even for expert users.


Multicluster Systems

New systems are built from existing resources. Examples are networks of workstations (NOW) and heterogeneous networks of workstations (HNOW) linked by multistage interconnection networks.

Intra-cluster communications have different latencies from inter-cluster communications.

Generally, multiclusters are built from clusters (homogeneous or heterogeneous) interconnected by a WAN.


Multicluster Systems (II)

Each cluster can have its own scheduler and can be exposed either through a head node or by all nodes.

[Figure: a job is submitted to Clusters A, B and C; a cluster is reached through its head node and its local scheduler (Condor/LSF/PBS).]


Applications on Wide Systems

Hierarchical Master/Worker Applications

They raise the possibility of performance bottlenecks:

Load imbalance problems

Inefficient resource use

Non-deterministic inter-cluster bandwidth

[Figure: a Master and a Sub-Master, each serving its own group of Workers, spread over Cluster A and Cluster B. The Sub-Master exploits data locality: common data are transmitted once.]


Applications on Wide Systems (II)

Hierarchical Master/Worker Applications

The master sees a sub-master as a single node with high processing capacity.

Work distribution from the master to a sub-master should be based on:

Available bandwidth

Computing power

These characteristics may change dynamically.
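As a minimal sketch of the idea above, the master could split a batch of tasks among sub-masters in proportion to an estimated service rate that combines available bandwidth and computing power. The names `SubMaster`, `service_rate` and `split_work`, the cost model (transfer time plus compute time per task) and the numbers are illustrative assumptions; the slides do not specify how the distribution is computed.

```python
from dataclasses import dataclass


@dataclass
class SubMaster:
    """Hypothetical description of one cluster as seen by the master."""
    name: str
    bandwidth_mb_s: float    # measured master -> sub-master bandwidth
    compute_tasks_s: float   # aggregate task throughput of the cluster's workers


def service_rate(sm, task_size_mb):
    """Tasks per second, assuming each task is transferred and then computed."""
    seconds_per_task = task_size_mb / sm.bandwidth_mb_s + 1.0 / sm.compute_tasks_s
    return 1.0 / seconds_per_task


def split_work(total_tasks, sub_masters, task_size_mb):
    """Split total_tasks in proportion to each sub-master's service rate."""
    rates = [service_rate(sm, task_size_mb) for sm in sub_masters]
    total_rate = sum(rates)
    shares = [int(total_tasks * r / total_rate) for r in rates]
    # Hand out the tasks lost to rounding, fastest sub-masters first.
    leftover = total_tasks - sum(shares)
    by_rate = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    for i in by_rate[:leftover]:
        shares[i] += 1
    return {sm.name: n for sm, n in zip(sub_masters, shares)}


if __name__ == "__main__":
    clusters = [
        SubMaster("A", bandwidth_mb_s=100.0, compute_tasks_s=50.0),
        SubMaster("B", bandwidth_mb_s=10.0, compute_tasks_s=50.0),
    ]
    # Cluster B has the same computing power but a slower link, so it gets less work.
    print(split_work(1000, clusters, task_size_mb=1.0))
```

Because both inputs may change at run time, a split like this would have to be recomputed whenever new bandwidth or throughput measurements arrive.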


MATE: Monitoring, Analysis and Tuning Environment

Dynamic automatic tuning of parallel/distributed applications.

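[Figure: the MATE tuning loop. The user develops the application from source. During execution, monitoring instruments the application and collects events as performance data; performance analysis identifies a problem and a solution; tuning applies the modifications to the running application. Instrumentation and modifications are performed through DynInst.]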


MATE (II)

Application Controller (AC)

Dynamic Monitoring Library (DMLib)

Analyzer

[Figure: Task1, Task2 and Task3 run across Machines 1 to 3, each with DMLib loaded. The ACs instrument the tasks (instr.) and apply modifications (modif.); events flow to the Analyzer.]


MATE (III)

Each tuning technique is implemented in MATE as a “tunlet”, a C/C++ library dynamically loaded into the Analyzer process. A tunlet defines:

measure points – what events are needed

performance model – how to determine bottlenecks and solutions

tuning actions/points/synchronization - what to change, where, when

[Figure: the Analyzer hosts several tunlets on top of the DTAPI; each tunlet bundles its measure points, its performance model, and its tuning point, action and sync.]
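The three parts of a tunlet can be illustrated with a toy example. The sketch below is written in Python for brevity, whereas real tunlets are C/C++ libraries; every class, field and threshold in it (`Event`, `TuningAction`, `ChunkSizeTunlet`, `chunk_size`, the 25% imbalance limit) is a hypothetical name chosen for illustration and is not part of MATE or the DTAPI.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Event:
    """A monitoring event delivered to the Analyzer (hypothetical layout)."""
    point: str     # measure point that fired
    task: str      # task that produced the event
    value: float   # measured value, here: seconds spent on one work unit


@dataclass
class TuningAction:
    """What to change, where, and when (all names are illustrative)."""
    variable: str    # tuning point: the application variable to modify
    new_value: int   # tuning action: the value to set
    sync_point: str  # synchronization: where it is safe to apply the change


class ChunkSizeTunlet:
    """Toy tunlet: halve the chunk size when the tasks are imbalanced."""

    # Measure points: which events this tunlet needs.
    measure_points = ["worker_compute"]

    def __init__(self, chunk_size, max_imbalance=0.25):
        self.chunk_size = chunk_size
        self.max_imbalance = max_imbalance
        self.times = {}  # task -> list of observed compute times

    def on_event(self, event):
        if event.point in self.measure_points:
            self.times.setdefault(event.task, []).append(event.value)

    def evaluate(self) -> Optional[TuningAction]:
        """Performance model: compare the slowest task against the average."""
        if len(self.times) < 2:
            return None
        per_task = [mean(v) for v in self.times.values()]
        avg = mean(per_task)
        if avg == 0:
            return None
        imbalance = (max(per_task) - avg) / avg
        if imbalance <= self.max_imbalance or self.chunk_size <= 1:
            return None
        self.chunk_size = max(1, self.chunk_size // 2)
        self.times.clear()  # start a fresh observation window
        return TuningAction("chunk_size", self.chunk_size, "before_next_send")
```

In this sketch the Analyzer would call `on_event` for each incoming event and `evaluate` periodically, forwarding any returned action to the AC that controls the affected task.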


New Requirements

Transparent process tracking

The AC should follow application processes to any cluster.

Lower inter-cluster instrumentation communication overhead

Inter-cluster communications generally have higher latency and lower bandwidth.


Design: Transparent process tracking

System service

A machine or cluster can have MATE enabled as a daemon that detects the startup of new processes.

[Figure: on a MATE-enabled machine, the AC detects the startup of Task n, attaches to it and takes control with DMLib loaded, receives the Analyzer information, and subscribes to the Analyzer.]


Design (II): Transparent process tracking

Application plug-in

The AC can be packaged together with the application binary.

[Figure: job submission starts a new 'Task' on a remote machine, where the packaged AC runs next to the task and its DMLib. Figure labels: detects, Dyninst create, control, Analyzer subscription.]


Design (III): Lower communication overhead

Smart event collection

A full application trace may generate too much overhead.

Event aggregation

Remote trace events should be aggregated into trace event abstractions, saving bandwidth.

Inter-cluster trace event routing
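Event aggregation can be sketched as follows: instead of forwarding every raw trace event across the inter-cluster link, a collector groups them and sends one summary record per group. The record layouts (`TraceEvent`, `AbstractEvent`) and the choice of count, total and maximum as the summary are assumptions made for illustration; the slides leave the semantics of aggregation as future work.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    """One raw trace event (hypothetical layout)."""
    task: str
    point: str
    duration_s: float


@dataclass
class AbstractEvent:
    """Summary of all raw events of one (task, point) pair in a window."""
    task: str
    point: str
    count: int
    total_s: float
    max_s: float


def aggregate(events):
    """Collapse raw events into one abstract event per (task, point) pair."""
    groups = defaultdict(list)
    for ev in events:
        groups[(ev.task, ev.point)].append(ev.duration_s)
    return [
        AbstractEvent(task, point, len(d), sum(d), max(d))
        for (task, point), d in sorted(groups.items())
    ]


if __name__ == "__main__":
    window = [
        TraceEvent("T1", "compute", 1.0),
        TraceEvent("T1", "compute", 3.0),
        TraceEvent("T2", "compute", 2.0),
    ]
    for record in aggregate(window):
        print(record)
```

The number of records crossing the inter-cluster link then depends on the number of distinct (task, point) pairs per window rather than on the number of raw events.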


Analyzer Approaches

Centralized

Requires modifying tunlets so that they can distinguish the instrumentation data of local application processes.

Hierarchical

Requires splitting tunlets into local tunlets and global tunlets.

Distributed

Requires tunlet instances located on different Analyzer instances to cooperate to tune an application.


Design (IV): Lower communication overhead (II)

Centralized Analyzer Approach

[Figure: Cluster A (Machines A1 to A3) and Cluster B (Machines B1 to B3) run the tasks under per-machine ACs. A single Analyzer serves both clusters, and an Event Router carries events between the clusters.]


Design (V): Hierarchical Analyzer Approach

[Figure: Cluster A and Cluster B each have a Local Analyzer that performs local performance model analysis on the events from its tasks and ACs. The Local Analyzers send abstract events to a Global Analyzer.]
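The split between local and global analysis can be sketched under simple assumptions: each local analyzer keeps the raw events inside its cluster and periodically emits one abstract record, and the global analyzer decides using those records only. The class names, the `tasks_per_s` metric and the proportional decision rule below are hypothetical and serve only to show the data flow.

```python
class LocalAnalyzer:
    """Runs inside one cluster and sees every raw event of that cluster."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.completed = 0
        self.busy_s = 0.0

    def on_task_done(self, duration_s):
        # Raw events are consumed here and never leave the cluster.
        self.completed += 1
        self.busy_s += duration_s

    def abstract_event(self):
        """Summarise the cluster as one small record, then reset the window."""
        rate = self.completed / self.busy_s if self.busy_s > 0 else 0.0
        record = {"cluster": self.cluster, "tasks_per_s": rate}
        self.completed, self.busy_s = 0, 0.0
        return record


class GlobalAnalyzer:
    """Sees only abstract events and decides the inter-cluster work split."""

    def __init__(self):
        self.rates = {}

    def on_abstract_event(self, record):
        self.rates[record["cluster"]] = record["tasks_per_s"]

    def work_fractions(self):
        total = sum(self.rates.values())
        if total == 0:
            # No measurements yet: fall back to an even split.
            return {c: 1.0 / len(self.rates) for c in self.rates}
        return {c: r / total for c, r in self.rates.items()}


if __name__ == "__main__":
    a, b = LocalAnalyzer("A"), LocalAnalyzer("B")
    for d in (0.5, 0.5):
        a.on_task_done(d)
    b.on_task_done(1.0)
    g = GlobalAnalyzer()
    g.on_abstract_event(a.abstract_event())
    g.on_abstract_event(b.abstract_event())
    print(g.work_fractions())
```

Only one record per cluster and per window crosses the inter-cluster link, at the cost of dividing each tunlet into a local part and a global part.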


Design (VI): Distributed Monitoring, Analysis and Tuning Environment

Distributed Analyzer Approach

[Figure: Cluster A (Machines A1 to A3) and Cluster B (Machines B1 to B3) each run their own Analyzer alongside their tasks and ACs. The tunlet instances on the two Analyzers cooperate.]


Conclusions and future work

Conclusions

Instrumentation traffic should interfere as little as possible with inter-cluster communication.

Process tracking enables MATE for multicluster systems.

The centralized Analyzer approach benefits tunlet developers but does not scale.

The distributed Analyzer approach scales but requires a different model-based analysis.


Conclusions and future work (II)

Future Work

Development of new tunlets for the distributed and hierarchical Analyzer approaches:

Tuning based only on local instrumentation data.

Semantics of aggregation for instrumentation events.

Patterns of distributed tunlet cooperation.

Scenarios of distributed Analyzer cooperation in multiclusters.


Thank you…