PARMON: A Comprehensive Cluster Monitoring System
PARMON Team
Centre for Development of Advanced Computing, Bangalore, India
Contact: Rajkumar Buyya (buyya@computer.org)

Post on 26-Mar-2015



Topics of Discussion

PARMON System Model & Architecture
  PARMON Server
  PARMON Client
PARMON Features and Services
PARMON Installation and its Usage
Monitoring with PARMON
PARMON Integration with other products
Conclusions and Future Directions

Motivations

Workstation clusters have of late become a cost-effective solution for high-performance computing (HPC).

C-DAC’s PARAM OpenFrame is a large cluster of more than 40 Ultra-4 workstations interconnected through low-latency, high-bandwidth communication networks.

Monitoring such large systems is a tedious and challenging task, since typical workstations are designed to work as standalone systems rather than as part of a workstation cluster.

System administrators require tools to effectively monitor such huge systems. PARMON provides the solution to this challenging problem.

C-DAC HPCC Software Architecture

[Figure: layered software stack. Components shown: Cluster Hardware; Solaris; Light Weight Protocols; Message Passing Interfaces (C-MPI, PVM); Parallel File System (C-PFS); Languages (C, F77, F90); Development Tools (F90 IDE, DIVIA); System Management Tools; Applications.]

PARMON - Salient Features

Online creation of Node and Group database
Monitoring of system activities at the Component, Node, Group, or entire Cluster level
Designed using state-of-the-art Java technology
Monitoring of System Components: CPU, Memory, Disk and Network
Monitoring of multiple instances of the same component
Facility for definition of events and automatic notification
Miscellaneous facilities: Message broadcast, Invocation of system management commands (halt, reboot, etc.), System Information & Configuration

PARMON provides a GUI for initiating activities/requests and presents results graphically.
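The event-definition and automatic-notification facility listed above could be modeled roughly as follows. This is a minimal sketch in Python, not PARMON's own code: the `Event` class, the metric names, the threshold values, and the `notify` callback are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """A user-defined condition on one monitored metric (hypothetical model)."""
    name: str
    metric: str        # e.g. "cpu_busy_pct"; metric names are assumptions
    threshold: float   # the event fires when the sampled value exceeds this

    def triggered(self, sample: Dict[str, float]) -> bool:
        # A metric missing from the sample never triggers the event.
        return sample.get(self.metric, float("-inf")) > self.threshold


def check_events(node: str, sample: Dict[str, float], events: List[Event],
                 notify: Callable[[str], None]) -> int:
    """Call notify() once per triggered event and return how many fired."""
    fired = 0
    for event in events:
        if event.triggered(sample):
            notify(f"{node}: {event.name} ({event.metric}={sample[event.metric]})")
            fired += 1
    return fired


if __name__ == "__main__":
    events = [Event("CPU overload", "cpu_busy_pct", 90.0),
              Event("Memory pressure", "mem_used_pct", 95.0)]
    # Illustrative sample values, not measurements from a real node.
    check_events("node01", {"cpu_busy_pct": 97.5, "mem_used_pct": 40.0},
                 events, print)
```

Passing the notifier in as a callback keeps the check itself independent of how the administrator is told (pop-up, log entry, or broadcast message).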

PARMON System Model

[Figure: PARMON system model. Labels: High-Speed Switch; parmond (PARMON Server on Solaris Node); parmon (PARMON Client on JVM).]

PARMON Implementation

Server
  Multithreaded using POSIX and Solaris threads
  Developed in C, as it needs to access system internals
  It is a stateless server

Client
  Developed using Java
  Java features are used extensively
  A new window is created for each client request, which interacts with the server
  Threads are used extensively when creating online resource utilization meters
  Reconfigures itself dynamically when the node database changes
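To illustrate what "multithreaded" and "stateless" mean together, the sketch below shows a server that handles each connection in its own thread and answers every request from scratch, keeping no per-client session. It is written in Python for brevity (the real server is in C), and the one-line text protocol, the metric names `NCPU` and `PID`, and the `read_metric` helper are assumptions made for this example, not PARMON's actual wire format.

```python
import os
import socket
import socketserver
import threading


def read_metric(name: str) -> str:
    """Return a current value for the named metric (stand-in for kernel access)."""
    if name == "NCPU":
        return str(os.cpu_count() or 1)
    if name == "PID":
        return str(os.getpid())
    return "ERR unknown metric"


class StatelessHandler(socketserver.StreamRequestHandler):
    """Answers one self-contained request per connection; keeps no session state."""

    def handle(self) -> None:
        request = self.rfile.readline().decode("ascii", "replace").strip()
        self.wfile.write((read_metric(request) + "\n").encode("ascii"))


class ThreadedServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True   # worker threads do not block interpreter exit


def query(host: str, port: int, metric: str) -> str:
    """Client side: open a connection, send one request, read one reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall((metric + "\n").encode("ascii"))
        return conn.makefile("r", encoding="ascii").readline().strip()


if __name__ == "__main__":
    # Port 0 asks the OS for any free port; a real daemon would use a fixed one.
    with ThreadedServer(("127.0.0.1", 0), StatelessHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        port = server.server_address[1]
        print("NCPU =", query("127.0.0.1", port, "NCPU"))
        server.shutdown()
```

Because every request carries all the information needed to answer it, a client can be restarted, or several clients can poll the same node, without the server tracking who is connected.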

Setting up of PARMON

Server installation & invocation
  Binding to port
  Rights (requires root permission for full functionality)
  parmond or parmond <port-no> (either at boot time or on-line)
  Needs to be loaded on all nodes to be monitored

Client installation & invocation
  Java-based client (client machine can be a PC/workstation supporting a JVM)
  CLASSPATH (pointing to classes.zip, parmon.jar)
  jar file (parmon.jar)
  java parmon or java parmon <port-no>
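Since the server must be loaded on every node to be monitored, it can be handy to verify that before launching the client. The sketch below tries a TCP connection to the daemon's port on each node. It assumes the daemon listens on TCP (the slides say only that it binds to a port), and the host name and port number shown are placeholders, not PARMON defaults.

```python
import socket
from typing import Dict, Iterable


def daemon_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_nodes(hosts: Iterable[str], port: int) -> Dict[str, bool]:
    """Map each host name to whether something is listening on the given port."""
    return {host: daemon_reachable(host, port) for host in hosts}


if __name__ == "__main__":
    # Host name and port number are placeholders, not PARMON defaults.
    for host, ok in check_nodes(["localhost"], 5000).items():
        print(f"{host}: {'reachable' if ok else 'NOT reachable'}")
```

A successful connection shows only that some process is listening on that port; it does not prove the process is parmond.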


Monitoring System Activities and Resource Utilization

PARMON Launcher

Creation of Node Database

Node Deletion

Group Creation

Group Modification/Deletion

Resource Utilization at a Glance

Selection of Nodes/Group

CPU Usage Monitoring

Memory Usage Monitoring

Disk/Network Usage Monitoring

Message Viewer (System logs)

Process activities

Kernel Data Catalog - CPU

Kernel Data Catalog - Memory

Kernel Data Catalog - Disk

Kernel Data Catalog - Network

Catalog of CPU Parameters

Component View - Physical

Component View - Logical

Message Broadcast

System Configuration

System Information

Issuing Commands : halt, shutdown, etc.

Node Diagnostics - Online (SunVTS)

Online Help

PARMON Integration with other Products

PARMON can send resource utilization information to any other product if protocols are made available

[Figure: PARMON integration example. Labels: PARAM online bulletin board; parmond; Node 1 ... Node N.]
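One way to picture "protocols made available" for integration is an agreed message format that the monitoring side writes and the other product reads. The sketch below uses one JSON object per line. This format, the field names, and both function names are assumptions for illustration; the slides do not describe the format PARMON actually uses.

```python
import json
from typing import Dict


def encode_report(node: str, timestamp: int, metrics: Dict[str, float]) -> bytes:
    """Serialize one utilization report as a single newline-terminated JSON line."""
    record = {"node": node, "time": timestamp, "metrics": metrics}
    return (json.dumps(record, sort_keys=True) + "\n").encode("utf-8")


def decode_report(line: bytes) -> dict:
    """Inverse of encode_report, as the receiving product might implement it."""
    return json.loads(line.decode("utf-8"))


if __name__ == "__main__":
    # Values are illustrative only.
    wire = encode_report("node01", 0, {"cpu_busy_pct": 42.0})
    print(wire)
    print(decode_report(wire))
```

A line-oriented format lets the receiver process reports one at a time from a socket or a log file without needing to know how many will follow.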

Conclusions and Future Directions

PARMON has been used successfully to monitor the PARAM OpenFrame supercomputer, a cluster of 48 Ultra-4 workstations running the Sun Solaris operating system.

Portable across platforms supporting Java
Comprehensive monitoring support and GUI
PARMON supports Solaris and Linux clusters; support for NT clusters is planned
Can easily be extended to support web-based monitoring of clusters by creating an interface server (running on a web server) between the client and the PARMON servers running on cluster nodes
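The interface server proposed above would sit between web clients and the per-node monitoring servers, translating HTTP requests into node queries. The following is a minimal sketch of that idea using Python's standard library. The URL layout `/<node>/<metric>`, the `make_gateway` helper, and the `fake_fetch` stand-in (which replaces a real query to a node) are all hypothetical.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def make_gateway(fetch: Callable[[str, str], str]) -> type:
    """Build a handler class that answers GET /<node>/<metric> using fetch()."""

    class Gateway(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            parts = self.path.strip("/").split("/")
            if len(parts) != 2:
                self.send_error(404, "expected /<node>/<metric>")
                return
            node, metric = parts
            body = json.dumps({"node": node, "metric": metric,
                               "value": fetch(node, metric)}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # suppress per-request logging to keep the example quiet

    return Gateway


def fake_fetch(node: str, metric: str) -> str:
    """Stand-in for a query to the monitoring server on `node`."""
    return f"{metric}@{node}"


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), make_gateway(fake_fetch))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    conn.request("GET", "/node01/cpu")
    print(conn.getresponse().read().decode("utf-8"))
    conn.close()
    server.shutdown()
    server.server_close()
```

Injecting `fetch` keeps the web layer separate from the node protocol, so the gateway could be pointed at real node queries without changing the HTTP handling.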

Thank YOU
