PARMON: A Comprehensive Cluster Monitoring System
PARMON Team
Centre for Development of Advanced Computing, Bangalore, India
Contact: Rajkumar Buyya (buyya@computer.org)

Mar 26, 2015

Topics of Discussion

PARMON System Model & Architecture
  PARMON Server
  PARMON Client
PARMON Features and Services
PARMON Installation and its Usage
Monitoring with PARMON
PARMON Integration with other products
Conclusions and Future Directions

Motivations

Workstation clusters have of late become a cost-effective solution for high-performance computing (HPC).

C-DAC's PARAM OpenFrame is a large cluster of more than 40 Ultra-4 workstations interconnected through low-latency, high-bandwidth communication networks.

Monitoring such large systems is a tedious and challenging task, since typical workstations are designed to work as standalone systems rather than as parts of a workstation cluster.

System administrators need tools to monitor such systems effectively. PARMON addresses this problem.

C-DAC HPCC Software Architecture

[Figure: software stack. Components shown:]
Cluster Hardware
Solaris
Light Weight Protocols
Message Passing Interfaces: C-MPI, PVM
System Management Tools
Parallel File System: C-PFS
Languages: C, F77, F90
Development Tools: F90 IDE, DIVIA
Applications

PARMON - Salient Features

Online creation of Node and Group database
Monitoring of system activities at Component, Node, Group, or entire Cluster level
Designed using state-of-the-art Java technology
Monitoring of System Components: CPU, Memory, Disk and Network
Monitoring of multiple instances of the same component
Facility for definition of events and automatic notification
Miscellaneous facilities: message broadcast, invocation of system management commands (halt, reboot, etc.), System Information & Configuration

PARMON provides a GUI for initiating activities/requests and presents the results graphically.
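
The slides do not say how events are defined or delivered. As a rough illustration of the "definition of events and automatic notification" feature, the sketch below assumes an event is a threshold on a component's utilization percentage. The names `EventRule` and `check_events` and the sample values are hypothetical and are not taken from PARMON.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EventRule:
    """Hypothetical event definition: fire when a metric exceeds a threshold."""
    name: str
    metric: str        # e.g. "cpu", "memory", "disk", "network"
    threshold: float   # utilization percentage that triggers the event


def check_events(node: str, sample: Dict[str, float], rules: List[EventRule],
                 notify: Callable[[str], None]) -> int:
    """Call notify() once per rule whose metric exceeds its threshold; return the count."""
    fired = 0
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value > rule.threshold:
            notify(f"{node}: {rule.name} ({rule.metric}={value:.1f}% > {rule.threshold:.1f}%)")
            fired += 1
    return fired


# Made-up rules and a made-up utilization sample for one node.
rules = [EventRule("High CPU load", "cpu", 90.0),
         EventRule("Disk nearly full", "disk", 95.0)]
messages: List[str] = []
fired = check_events("node01", {"cpu": 97.5, "disk": 40.0}, rules, messages.append)
print(fired, messages)
```

Here `notify` is any callable, so the same check could feed a GUI alert, a log, or a message broadcast.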

PARMON System Model

[Figure: cluster nodes connected through a high-speed switch. The PARMON server (parmond) runs on each Solaris node; the PARMON client (parmon) runs on a JVM.]

PARMON Implementation

Server
  Multithreaded, using POSIX and Solaris threads
  Developed in C, as it needs to access system internals
  It is a stateless server
Client
  Developed using Java
  Java features are used extensively
  A new window is created for each client request, which interacts with the server
  Threads are used extensively when creating online resource utilization meters
  Reconfigures itself dynamically when the node database changes
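
To make "stateless" and "multithreaded" concrete, here is a minimal sketch in Python rather than the server's actual C. The `GET <component>` request format, the `READERS` table and its fixed values are assumptions for illustration only; the slides do not describe PARMON's wire protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical metric readers; a real server would query kernel statistics instead.
READERS: Dict[str, Callable[[], float]] = {
    "cpu": lambda: 12.5,
    "memory": lambda: 63.0,
}


def handle_request(request: str) -> str:
    """Answer one request from the request text alone; no per-client state is kept."""
    parts = request.split()
    if len(parts) != 2 or parts[0] != "GET":
        return "ERROR bad request"
    reader = READERS.get(parts[1])
    if reader is None:
        return f"ERROR unknown component {parts[1]}"
    return f"OK {parts[1]} {reader():.1f}"


# Requests are dispatched to worker threads; pool.map returns replies in request order.
requests = ["GET cpu", "GET memory", "GET tape"]
with ThreadPoolExecutor(max_workers=3) as pool:
    replies = list(pool.map(handle_request, requests))
print(replies)
```

Because every reply depends only on the request, a client can reconnect to a restarted server without any session recovery.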

Setting up of PARMON

Server installation & invocation
  Binding to port
  Rights (requires root permission for full functionality)
  parmond or parmond <port-no> (either at boot time or on-line)
  Needs to be loaded on all nodes to be monitored
Client installation & invocation
  Java based client (client machine can be a PC/workstation supporting JVM)
  CLASSPATH (pointing to classes.zip, parmon.jar)
  jar file (parmon.jar)
  java parmon or java parmon <port-no>

Monitoring System Activities and Resource Utilization

PARMON Launcher

Creation of Node Database

Node Deletion

Group Creation

Group Modification/Deletion

Resource Utilization at a Glance

Selection of Nodes/Group

CPU Usage Monitoring

Memory Usage Monitoring

Disk/Network Usage Monitoring

Message Viewer (System logs)

Process Activities

Kernel Data Catalog - CPU

Kernel Data Catalog - Memory

Kernel Data Catalog - Disk

Kernel Data Catalog - Network

Catalog of CPU Parameters

Component View - Physical

Component View - Logical

Message Broadcast

System Configuration

System Information

Issuing Commands: halt, shutdown, etc.

Node Diagnostics - Online (SunVTS)

Online Help

PARMON Integration with other Products

PARMON can send resource utilization information to any other product if protocols are made available

[Figure: parmond running on Node 1 through Node N, shown together with the PARAM online bulletin board.]
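
The slides leave the exchange protocol open ("if protocols are made available"). Purely as an illustration of handing utilization data to another product, the sketch below assumes one JSON line per node. The function `to_bulletin_record`, the record layout and the sample numbers are hypothetical and do not describe PARMON's actual format.

```python
import json
from typing import Dict


def to_bulletin_record(node: str, utilization: Dict[str, float]) -> str:
    """Serialize one node's utilization as a single JSON line (assumed format)."""
    return json.dumps({"node": node, "utilization": utilization}, sort_keys=True)


# Made-up utilization percentages for two nodes.
samples = {"node1": {"cpu": 35.0, "memory": 52.5},
           "nodeN": {"cpu": 80.0, "memory": 71.0}}
records = [to_bulletin_record(node, util) for node, util in samples.items()]
for line in records:
    print(line)
```

A line-oriented text record is only one option; any format the consuming product can parse would serve the same role.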

Conclusions and Future Directions

PARMON has been used successfully to monitor the PARAM OpenFrame supercomputer, a cluster of 48 Ultra-4 workstations running the Sun Solaris operating system.

Portable across platforms supporting Java
Comprehensive monitoring support and GUI
PARMON supports Solaris and Linux clusters; support for NT clusters is planned.
Can easily be extended to support web-based monitoring of clusters by creating an interface server (running on a web server) between the client and the PARMON servers running on the cluster nodes.

Thank You