1 Towards Highly Available, Scalable, and Secure HPC Clusters with HA-OSCAR Dr. Chokchai Box Leangsuksun Louisiana Tech University [email protected] Ibrahim Haddad OSDL [email protected] Dr. Stephen L. Scott Oak Ridge National Laboratory [email protected] IEEE CLUSTER CONFERENCE September 26, 2005 -- Boston, MA, USA
1

Towards Highly Available, Scalable, and Secure HPC Clusters with HA-OSCAR

Dr. Chokchai Box Leangsuksun
Louisiana Tech University
box@latech.edu

Ibrahim Haddad
OSDL
[email protected]

Dr. Stephen L. Scott
Oak Ridge National Laboratory
[email protected]

IEEE CLUSTER CONFERENCE
September 26, 2005 -- Boston, MA, USA

2

Acknowledgments & Collaborations

• Louisiana Tech University
– Chokchai “Box” Leangsuksun
– HA-OSCAR slides prepared by Venkata Kiriti Munganuru

• Oak Ridge National Laboratory
– Stephen L. Scott
– Thomas Naughton
– John Mugler
– Christian Engelmann

• Dell
– Tong Liu

• Intel
– Richard Libby

• Ericsson
– Makan Pourzandi

• OSCAR
– The entire OSCAR team, collaborators, and users.

• Open Source Development Labs
– Ibrahim Haddad

3

Agenda

1:00 – 1:05 Introduction Box

1:05 – 1:50 Clustering and HA Ibrahim

1:50 – 2:30 OSCAR Stephen

2:30 – 3:00 Break, Q&A

3:00 – 4:30 HA-OSCAR & Demo Box

4

Clustering

Definitions

Beowulf

HA Clusters

Challenges

Ibrahim Haddad
OSDL

[email protected]

5

What is Clustering?
Not yet another clustering definition!

• The use of multiple loosely coupled nodes to form what appears to users as a single highly available system.

[Figure: three nodes, each a stack of Processor, Operating System, Middleware, and Application, joined by a reliable and fault-tolerant processor interconnect and backed by reliable and fault-tolerant disk storage (RAID / SAN / …)]

6

The focus is not clustering. Clustering is just a means to an end.

• The focus is on scalability, high availability and reliability.

• Clustering is a technology which can be used to achieve the above.

7

Goals – what do we want to achieve?

• High Availability

– Isolate or reduce the impact of a failure in a machine, resource, or device through redundancy and failover techniques.

• Scalability

– Expand the capacity of servers in terms of processors, memory, storage, or other resources to support business growth

– Linear scalability

• Improved processing speed

• Efficient resource utilization

– Load Balancing and/or Traffic distribution

• Manageability

– Reduce system management costs through appropriate system management facilities

8

High Availability Clusters

• HA-Clusters ensure service availability

– HA clusters can continue serving clients even if one or more server nodes fail and become unavailable

• HA clusters are no longer regarded as being only for traditional mission critical applications

– Business applications,

– Military,

– Bio/Pharma/medical,

– Telecom,

– Etc

9

General HA Clustering Wish-list (1/2)

• Capacity scalability: Scale up any of the components in order to achieve a linear increase

• Better resource utilization: Load balancing and/or traffic distribution: Dynamic mechanisms that detect and react to the unavailability, addition and removal of components in the system

• Availability: provide HA services to its end users

• Operation and maintenance: Performed remotely without affecting the system performance and availability

• Fast response time: Minimize serialized executions

10

General HA Clustering Wish-list (2/2)

• Geographical Diversity

– Spread across several Points of Presence

– Support geographic mirroring

• Provide a single cluster IP Interface: Clients access the server application through a single IP address

• Security: High security requirements depending on deployment scenarios (open vs. closed networks)

11

2 Main Challenges

• System Availability

• System Capacity

12

System Availability – What does it mean?

• Availability defined as:

Availability = MTBF / (MTBF + MTTR)

MTBF: Mean Time Between Failures

MTTR: Mean Time To Repair

• Example:

If a system offered an MTBF of 20,000 hours with an MTTR of 2 hours, then its availability would be 20,000 / 20,002 ≈ 99.99%, “4-nines.”
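
The availability ratio on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the original deck: the function names and the 8,760-hour year (leap years ignored) are assumptions made for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 8,760 hours, leap years ignored


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected unavailable time per year implied by an availability figure."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60


# The slide's example: MTBF of 20,000 hours, MTTR of 2 hours.
a = availability(20_000, 2)
print(f"availability: {a:.4%}")
print(f"downtime: {downtime_minutes_per_year(a):.1f} minutes/year")
```

Lowering MTTR in the call above raises the result in the same way as raising MTBF, which is the point the next slide makes.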

13

Means to achieve higher availability

• Increase MTBF
– Improve “quality” or “robustness”
– Use redundancy / remove single points of failure

• Decrease MTTR
– Streamline and accelerate fail-over
  • Optimize boot / reboot time
  • Respond to fault conditions in real-time
– Make faults more granular in time and scope
  • Better to have many short faults than a few long ones
  • Limit scope of faults to smaller s/w and h/w components

14

Degrees of HA

Source: Gartner

15

System Availability Redundancy Models

[Figure: availability (number of nines, axis ticks 3 to 6) plotted against system redundancy model (1+0, 1+1, N+M)]

16

Beowulf Cluster

17

Beowulf Cluster

• Beowulf is one approach to clustering COTS (commercial off-the-shelf) components to form a supercomputer

• A Beowulf cluster is a collection of COTS computers networked together to deliver high performance computing

• A typical Beowulf cluster has:
– a single head node
– multiple identical client nodes

18

Beowulf Cluster Architecture

Head Node:
– Entry point to the cluster
– Responsible for serving user requests
– Distributes jobs to compute clients via scheduling and queuing software

Compute Clients:
– Dedicated for computation

Communication:
– Using Ethernet network and/or fast connectivity: Myrinet, InfiniBand, etc.

[Figure labels: End Users, Head Node, Communication, Compute Nodes]

19

Beowulf Cluster – Advantages

1. COTS HW and SW components

2. Toward High Performance Computation (HPC)

3. Allows flexible configurations

4. Heterogeneous environment

5. Scalability (add more nodes)

6. Alternative to monolithic supercomputers

7. Good price/performance

20

Beowulf Cluster – Issues

[Figure labels: End Users, Head Node, Communication, Compute Nodes]

• Single head node architecture
– Vulnerable to a single point of failure (SPOF)

• Single communication path architecture
– Vulnerable to SPOF

• Compute nodes are not accessible after either of the above failures occurs, or while cluster services or the OS are being upgraded.

21

HA & Linux Clusters

22

Providing High Availability

• One technique of providing HA is by distributing functionality across multiple nodes

• In response to HW and SW failures, HA systems facilitate the rapid transfer of control from a faulty CPU, peripheral, or software component to a functional one, while preserving operations or transactions in-progress at the time of failure.

23

HA Supporting mechanisms

• HA Systems must support mechanisms for:
– Error Detection
– Damage Containment
– Error Recovery
– Fault Treatment (incl. dynamic reconfiguration)

• Box will discuss how HA-OSCAR supports these mechanisms

• Assumption: We are dealing with systems comprising clusters of processors which share nothing.

24

Redundancy in HA Systems

• Redundancy of key subsystems is important
– Redundant Ethernet to ensure constant networking connections
– Redundant power supplies
– Redundant disks
– etc.

25

Other means to support HA

• Disk mirroring to ensure high levels of data reliability

• Hot swap (hot insert, hot remove, identity maintenance)

• Options for booting compressed and remotely hosted kernel images

• Support of compressed r/w and read-only Flash file systems

• Accelerated boot and daemon start times

• Fast shutdown / reboot

• Eliminating costly file system operations with journaling file systems

• Etc.

26

Uptime

• Example from the telecom industry:

The main operator requirement is

No more than 30 seconds of service interruption per year

– Applies to the overall solution: hardware, software (OS and middleware), and the applications.

– Includes software and hardware upgrade and maintenance.
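
To put the 30-second budget in the same terms as the "number of nines" used earlier, the following sketch converts a yearly downtime budget into an availability figure. The function names and the 365-day year are assumptions for illustration only.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 31,536,000 s, leap years ignored


def availability_from_downtime(downtime_s_per_year: float) -> float:
    """Availability implied by a yearly downtime budget in seconds."""
    if not 0 <= downtime_s_per_year < SECONDS_PER_YEAR:
        raise ValueError("downtime must be within one year")
    return 1.0 - downtime_s_per_year / SECONDS_PER_YEAR


def nines(avail: float) -> float:
    """Number of nines, e.g. 0.999 gives about 3."""
    return -math.log10(1.0 - avail)


budget = availability_from_downtime(30)
print(f"availability: {budget:.7%}, nines: {nines(budget):.2f}")
```

Under these assumptions the 30-second requirement works out to slightly more than six nines, and it has to cover upgrades and maintenance as well as failures.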

27

High Availability of Cluster Hardware

• Hardware availability is very important

• In some cases, the platform may be available but not the application

– Software has bugs; it may cause applications to crash.

– Keeping redundancy in applications and maintaining process state is complex.

• In telecom, the required uptime includes both platform and applications uptime

– End-users don’t care about running platforms when the required application is unavailable

28

Cluster Concurrent Maintenance

• Allow (un)scheduled maintenance to be performed on a node of a cluster while other nodes continue to provide service without noticeable degradation.

• and doing it remotely.

29

Failover

• Failover is the ability to detect problems in a node and to accommodate ongoing processing by routing applications to other nodes.

– This process may be programmed or scripted so that steps are taken automatically without operator/admin intervention

– Fundamental to failover is communication among nodes signaling that they are functioning correctly and reporting problems when they occur
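
The two ingredients named above, nodes signalling that they are alive and scripted rerouting of work, can be sketched as follows. This is an illustrative toy, not how Linux-HA or HA-OSCAR implement failover: `HeartbeatMonitor`, `reassign`, the node names, and the timeout value are all hypothetical.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Records the last heartbeat per node and flags nodes that went silent."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str) -> None:
        """Called whenever a heartbeat message from `node` arrives."""
        self.last_seen[node] = self.clock()

    def failed_nodes(self) -> List[str]:
        """Nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(n for n, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)


def reassign(services: Dict[str, str], failed: List[str],
             healthy: List[str]) -> Dict[str, str]:
    """Return a new service-to-node map with services moved off failed nodes."""
    if not healthy:
        raise RuntimeError("no healthy node left to take over")
    moved = dict(services)
    i = 0
    for svc, node in sorted(services.items()):
        if node in failed:
            moved[svc] = healthy[i % len(healthy)]  # simple round-robin placement
            i += 1
    return moved
```

A real deployment would also need fencing and protection against split-brain, which this sketch deliberately leaves out.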

30

Characteristics of Failover

• Transparency

– Applications and users are automatically and transparently reconnected to another node/system

• Performance

– Depends on hardware configuration, instance recovery time and workload at time of failover

• Robustness

– The cluster should be able to survive multiple failures and still provide mission critical applications

31

Linear Scalability

• We want clusters to support linear growth

– New processors can be added without disturbance

• Capacity grows linearly as processors are added

– Modular addition of HW and SW components happens

• If we double the number of processors, we should expect to almost double the throughput of the system.

[Figure: scalability chart; axis ticks 2, 4, 6, … 18, …]
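
One way to quantify "almost double" is scaling efficiency: measured speedup divided by ideal (linear) speedup. The sketch below is an illustration under assumed numbers; the function name and the throughput figures are hypothetical and not measurements from the talk.

```python
def scaling_efficiency(base_procs: int, base_throughput: float,
                       procs: int, throughput: float) -> float:
    """Measured speedup divided by ideal linear speedup (1.0 means perfectly linear)."""
    if min(base_procs, procs) <= 0 or base_throughput <= 0:
        raise ValueError("processor counts and base throughput must be positive")
    speedup = throughput / base_throughput
    ideal = procs / base_procs
    return speedup / ideal


# Hypothetical measurement: doubling 4 -> 8 processors lifts throughput 1000 -> 1900.
print(f"efficiency: {scaling_efficiency(4, 1000.0, 8, 1900.0):.0%}")
```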

32

Highly Available Storage

• Storage is a critical and necessary part of a HA cluster

• Data should be available to users even when a storage node fails or when errors occur with the distributed file system

• One popular technique is providing RAID support

– Other: NAS, SAN, etc.

33

Online OS and Application Upgrades (required for telecom-grade clusters)

• A requirement in mission critical environments

– Telecom, defense.

• When upgrading software (applications), old and new versions of the same process can coexist

– provide mechanisms to upgrade a running application

– The system will deal with the old and new running versions of the application simultaneously

• Applications will transfer states between old and new static process

34

Manageability

• Single point of control

– Applications

– Software

– Hardware

– Data (Data movement, Security, Backup, Recovery, etc)

• Online configurability to reduce downtime

• Capacity Control

– Overload protection by selectively rejecting jobs/requests when threshold is reached
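
Overload protection of the kind described under Capacity Control can be sketched as a simple admission check. This is the most basic form (reject everything above a fixed threshold); the class name and threshold are hypothetical, and a production system would likely reject selectively by priority.

```python
class AdmissionController:
    """Admits work until a fixed in-flight threshold is reached, then rejects."""

    def __init__(self, max_in_flight: int) -> None:
        if max_in_flight < 1:
            raise ValueError("threshold must be at least 1")
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.rejected = 0

    def try_admit(self) -> bool:
        """Return True and count the request if capacity remains, else reject."""
        if self.in_flight >= self.max_in_flight:
            self.rejected += 1
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Call when an admitted request finishes."""
        if self.in_flight == 0:
            raise RuntimeError("release() without a matching admit")
        self.in_flight -= 1


gate = AdmissionController(max_in_flight=2)
results = [gate.try_admit() for _ in range(3)]  # third request exceeds the threshold
print(results, "rejected:", gate.rejected)
```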

35

Heartbeat Mechanisms

• Linux-HA project

http://www.linux-ha.org

Release 2 is out.

Linux-HA BOF is on Friday

[Figure labels: Router, MasterNode 1, MasterNode 2; links numbered 1 and 2]

36

Non-Stop Operations – Summary (required for telecom-grade clusters)

• No single point of failure

• No scheduled downtime

• In-service upgrade of software and hardware with no disturbance to operation

• HA failover software

• Software Configuration Control

– Automatic restart, on working processors, of processes that originally executed on a faulty processor

37

Fast Recovery of Applications

• Maximizing availability of applications and services is a priority

• If an application dies for some reason, it is very important to restart it ASAP

• Provide automatic failover and recovery capabilities with minimal interruption to the users.

• Monitoring mechanisms to detect failures and trigger actions

38

Redundancy in a HA Linux Cluster

39

Redundancy Levels in a Cluster

[Figure labels: Cluster Virtual Interface; Cluster Interface to the Outside World; Firewall; Network Redundancy; Master Node A, Master Node B (redundancy of nodes providing cluster services); Traffic Node A … Traffic Node N (traffic node redundancy); Storage Node A, Storage Node B (storage redundancy, includes NFS redundancy and RAID 5); callouts numbered 1 to 5]

• Redundant cluster services
• NFS redundancy
• Traffic node redundancy
• Redundant image server
• Network redundancy (routers and NIC)
• Storage redundancy

40

Ethernet Redundancy

Before network adapter swap (Active / Standby):
The active Ethernet adapter provides the connection to the network. The standby Ethernet adapter is hidden from applications and is known only to the Ethernet redundancy daemon.

After network adapter swap (Active / Standby):
The active Ethernet adapter has failed and the Ethernet redundancy daemon designates the former standby Ethernet adapter as the new active adapter.
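
The role swap performed by the Ethernet redundancy daemon can be sketched as a small state machine. This is a toy model under stated assumptions: `Adapter`, `RedundantLink`, and the interface names `eth0`/`eth1` are hypothetical, and link state is set by hand rather than read from the kernel.

```python
from dataclasses import dataclass


@dataclass
class Adapter:
    name: str
    link_up: bool = True


class RedundantLink:
    """Keeps one adapter active and one on standby, swapping roles on failure."""

    def __init__(self, active: Adapter, standby: Adapter) -> None:
        self.active = active
        self.standby = standby

    def check(self) -> str:
        """Promote the standby if the active link is down; return the active name."""
        if not self.active.link_up and self.standby.link_up:
            self.active, self.standby = self.standby, self.active
        return self.active.name


eth0, eth1 = Adapter("eth0"), Adapter("eth1")
link = RedundantLink(active=eth0, standby=eth1)
eth0.link_up = False          # simulate a failure of the active adapter
print("active adapter:", link.check())
```

Applications only ever see whichever adapter `check()` reports as active, which mirrors the slide's point that the standby is hidden from them.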

41

1+1 active/standby

[Figure labels: Clients; Public Network; Active Master Node; Standby Master Node; Heartbeat Messages; Shared Network Storage; Dual Redundant Data Paths, one of them physically connected but not logically in use]

42

1+1 active/standby …

[Figure labels: Clients; Public Network; Active Master Node (Failed Node); Standby Master Node (Now Active Master Node); Heartbeat Messages; Shared Network Storage; one data path physically connected and in use; the other physically connected but not logically in use due to the failure of the master node]

43

1+1 active/active …

[Figure labels: Clients; Public Network; Active Master Node 1; Active Master Node 2; Heartbeat Messages; Shared Network Storage; one data path physically connected and in use, providing a redundant data path for master node 1; the other physically connected and in use, providing a redundant data path for master node 2]

44

N+M

[Figure labels: Node A, Node B, Node C, Node D; Standby Node 1, Standby Node 2; Active Process, Standby Process; HA Shared Storage holding State; steps numbered 1 to 3]

45

N-way

[Figure labels: Node A, Node B, Node C, Node D; HA Shared Storage holding State; steps numbered 1 to 3]

(1) State information is written to shared storage
(2) State information is available on shared storage
(3) Traffic nodes have access to the state information.

46

HA Clusters: Nodes Topology & Redundancy Models

HA Tier
– Master Nodes: 1+1 redundancy model: Active/Hot-standby or Active/Active

Service Scalability Tier
– N Traffic Nodes, where N ≥ 2
– Redundancy is at node level
– Redundancy models: N active, or N active and M standby, where N ≥ 2 and M ≥ 0

Storage Tier
– Specialized nodes: N = 2 (redundant specialized storage nodes), or use a modified NFS implementation
– Redundant storage through the implementation of a HA NFS server

47

Challenges

48

Challenges

• How to automatically build and boot the nodes?
– Installation infrastructure

• Which (HA?) distributed file systems to use in the cluster?
– File Systems

• What types of traffic distribution and load balancing mechanisms to use?

• How to build redundancy and to what extent?
– Redundancy at the Network, File System, Disk and CPU Levels

• How to manage the cluster remotely?
– System management

• How to add/remove nodes without affecting the operations?
– Dynamic reconfiguration

• How to achieve linear scalability?
– Scalability

• How to secure a cluster running on open networks?
– Security

49

Conclusion to the intro!

• From the start, the design of the software architecture of the application should take into account:

– Scalability

– Failure Handling

– Error (software bug) handling

– Future modification

– Hot software upgrade

• A complete HA solution requires close integration of:
– HA hardware,
– HA software solution,
– HA middleware, and
– Application software that can cause failover to redundant systems.

50

QUESTIONS & ANSWERS

Ibrahim Haddad
OSDL
[email protected]
+1 503 906 1914

51

Agenda

1:00 – 1:05 Introduction Box

1:05 – 1:40 Clustering and HA Ibrahim

1:40 – 2:30 OSCAR Stephen

2:30 – 3:00 Break, Q&A

3:00 – 4:30 HA-OSCAR & Demo Box

52

OSCAR (Open Source Cluster Application Resources)

Dr. Stephen L. Scott
[email protected]

53

What is OSCAR?

Open Source Cluster Application Resources

• Framework for cluster installation, configuration and management
• Commonly used cluster tools
• Wizard-based cluster software installation
– Operating system
– Cluster environment
  • Administration
  • Operation
• Automatically configures cluster components
• Increases consistency among cluster builds
• Reduces time to build / install a cluster
• Reduces need for expertise

[Figure: OSCAR installation wizard, Step 1 (Start…) through Step 8 (Done!)]

54

OSCAR - the beginning

55

First cluster “distro”

• Extreme Linux
• May 13, 1998
• $29.95 CD

Oak Ridge National Laboratory -- U.S. Department of Energy

56

OSCAR Background

• Concept first discussed in January 2000

• First organizational meeting in April 2000
– Cluster assembly is time consuming & repetitive
– Nice to offer a toolkit to automate

• First public release in April 2001

• Use “best practices” for HPC clusters
– Leverage wealth of open source components
– Targeted modest size cluster (single network switch)

• Form umbrella organization to oversee cluster efforts
– Open Cluster Group (OCG)

57

Open Cluster Group

• Informal group formed to make cluster computing more practical for HPC research and development

• Membership is open, directed by a steering committee
– Research/Academic
– Industry

• Current active working groups
– OSCAR (core group)
– Thin-OSCAR (Diskless Beowulf)
– HA-OSCAR (High Availability)
– SSS-OSCAR (Scalable Systems Software)
– SSI-OSCAR (Single System Image)
– BIO-OSCAR (Bioinformatics cluster system)

58

OSCAR Core Participants

• Dell
• IBM
• Intel
• Bald Guy Software
• RevolutionLinux
• INRIA
• EDF
• Canada’s Michael Smith Genome Sciences Center
• Indiana University
• NCSA
• Oak Ridge National Laboratory
• Université de Sherbrooke
• Louisiana Tech Univ.
• NEC Europe
• Air Force Research Lab (USA)

November 2004

59

The OSCAR strategy

• OSCAR is a snap-shot of best-known-methods for building, programming and using clusters of a “reasonable” size.

• To bring uniformity to clusters, foster commercial versions of OSCAR, and make clusters more broadly acceptable.

• Consortium of research, academic & industry members cooperating in the spirit of open source.

[Figure labels: Open Source OSCAR with Linux; Commercially supported, value-added instantiations of OSCAR; Offer Variety of Flavors: HA-OSCAR, Thin-OSCAR, SSS-OSCAR, SSI-OSCAR, SSS-OSCAR]

60

Today’s OSCAR

61

OSCAR Components

• Administration/Configuration
– SIS, C3, OPIUM, Kernel-Picker, NTPconfig, cluster services (dhcp, nfs, ...)
– Security: Pfilter, OpenSSH

• HPC Services/Tools
– Parallel Libs: MPICH, LAM/MPI, PVM
– Torque, Maui, OpenPBS
– HDF5
– Ganglia, Clumon, … [monitoring systems]
– Other 3rd party OSCAR Packages

• Core Infrastructure/Management
– System Installation Suite (SIS), Cluster Command & Control (C3), Env-Switcher
– OSCAR DAtabase (ODA), OSCAR Package Downloader (OPD)

62

System Installation Suite (SIS)

Enhanced suite to the SystemImager tool.

Adds SystemInstaller and SystemConfigurator

• SystemInstaller – interface to installation, includes a stand-alone GUI – Tksis. Allows for description based image creation.

• SystemImager – base tool used to construct & distribute machine images.

• SystemConfigurator – extension that allows for on-the-fly style configurations once the install reaches the node, e.g. ‘/etc/modules.conf’.

63

System Installation Suite (SIS)

• Used in OSCAR to install nodes
– partitions disks, formats disks and installs nodes

• Construct “image” of compute node on headnode
– Directory structure of what the node will contain
– This is a “virtual”, chroot-able environment

/var/lib/systemimager/images/oscarimage/etc/

…/usr/

• Use rsync to copy only differences in files, so it can be used for cluster management
– maintain image and sync nodes to image

64

Switcher

• Switcher provides a clean interface to edit the environment without directly tweaking dot files.
– e.g. PATH, MANPATH, path for ‘mpicc’, etc.

• Edit/Set at both system and user level.

• Leverages existing Modules system

• Changes are made to future shells
– To help prevent simple operator errors while making shell edits
– Modules already offers a facility for current shell manipulation, but no persistent changes.

65

OSCAR DAtabase (ODA)

• Used to store OSCAR cluster data

• Currently uses MySQL as DB engine

• User and program friendly interface for database access

• Capability to extend database commands as necessary.

66

OSCAR Package Downloader (OPD)

Tool to download and extract OSCAR Packages.

• Can be used for timely package updates

• Packages that are not included, i.e. “3rd Party”

• Distribute packages with licensing constraints.


67

C3 Power Tools

• Command-line interface for cluster system administration and parallel user tools.

• Parallel execution cexec – Execute across a single cluster or multiple clusters at same time

• Scatter/gather operations cpush/cget – Distribute or fetch files for all node(s)/cluster(s)

• Used throughout OSCAR and as underlying mechanism for tools like OPIUM’s useradd enhancements.


68

C3 Building Blocks

• System administration
– cpushimage - “push” image across cluster
– cshutdown - remote shutdown to reboot or halt cluster

• User & system tools
– cpush - push single file -to- directory
– crm - delete single file -to- directory
– cget - retrieve files from each node
– ckill - kill a process on each node
– cexec - execute arbitrary command on each node
– cexecs - serial mode, useful for debugging
– clist - list each cluster available and its type
– cname - returns a node name from a given node position
– cnum - returns a node position from a given node name


69

C3 Power Tools

Example to run hostname on all nodes of the default cluster:
$ cexec hostname

Example to push an RPM to /tmp on the first 3 nodes:
$ cpush :1-3 helloworld-1.0.i386.rpm /tmp

Example to get a file from node1 and nodes 3-6:
$ cget :1,3-6 /tmp/results.dat /tmp

* The destination can be left off with cget; the source location is then used.
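The node selections used above (`:1-3`, `:1,3-6`) can be read as a comma-separated list of single positions and inclusive ranges. The following Python sketch expands such a string into a list of positions. It is only an illustration of the notation as it appears on this slide, not C3's own parser, and `parse_node_spec` is a hypothetical name.

```python
def parse_node_spec(spec: str) -> list[int]:
    """Expand a node selection such as ':1,3-6' into [1, 3, 4, 5, 6].

    Illustrative only: handles just the two forms shown on the slide
    (single positions and inclusive ranges after a leading colon).
    """
    if not spec.startswith(":"):
        raise ValueError("expected a leading ':' before the node positions")
    positions = []
    for part in spec[1:].split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if end < start:
                raise ValueError(f"descending range: {part!r}")
            positions.extend(range(start, end + 1))  # ranges are inclusive
        else:
            positions.append(int(part))
    return positions


print(parse_node_spec(":1,3-6"))
```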


70

Current OSCAR Release Notes (v4.1)

• Supported Distros:

– Red Hat 9

– Red Hat Enterprise Linux (RHEL) 3

• Supports both x86 and Itanium systems

– Fedora Core 2 support

– Mandrake 10.0 (experimental)

• Torque is included as the default scheduler (OpenPBS can still be downloaded from OPD)

• DepMan / PackMan

– Resolves dependencies during “build node image”

– Used in install/uninstall packages

• APITest now part of OSCAR testing framework

• Versions of key software components:

– Ganglia 2.5.6-1B

– LAM-MPI 7.0.6-1

– MPICH-MPI 1.2.5

– Torque (PBS Replacement) 1.0.1

– MAUI 3.2.5

– SIS 3.3.2


71

OSCAR Installation


72

Server Installation and Configuration

• Install Linux on server machine (cluster head node)
– workstation install w/ software development tools
– 50+ page installation document!
• (quick install available)

• Download copy of OSCAR and unpack on server

• Configure and install OSCAR on server
– readies the wizard install process

• Configure server Ethernet adapters
– public
– private

• Launch OSCAR Installer (wizard)


73

OSCAR Wizard

Demo install

Demo add/delete node

Demo add/delete package

version 4.0


74

OSCAR Wizard


75

Step 0

Enables you to download additional packages

OPD (OSCAR Package Downloader) performs the download

OPDer – GUI front end to OPD


76

OPDer

clumon and PVFS selected for download


77

Alternate repositories, possibly a local machine

OPDer (2)


78

Create your own flavor of cluster distribution

Select OSCAR packages to install.

Step 1


79

Core packages are automatically selected for you and cannot be unselected

Download does not equal installation!

Packages downloaded with OPDer are selected for installation here

Package Selector


80

Configure OSCAR packages that require special configuration tasks

Step 2


81

Environment Switcher does configuration for default MPI use

make selection

Package configuration


82

Install OSCAR Server (cluster head node) specific packages on cluster head node

May take a few minutes

Wait for button…

Step 3


83

success

Install server packages


84

Specify and build system image for client (compute) nodes

Step 4


85

name your image

list of packages

package file location

disk partition file location

static or dynamic

halt, reboot, beep

enable multicast

Build image configure


86

showing progress

Building image


87

success

Building image finished


88

Define client nodes

Step 5


89

specify image name (from step 4 – or other saved image)

client IP domain name

client base name (oscarnodeXXX)

node count

starting index to append to base

padding to client names (3 = oscarnode009)

starting IP address

Subnet Mask

Default Gateway

Define client nodes
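How the fields above combine can be sketched in Python: the base name, starting index, and padding produce the host names, and the starting IP address is incremented once per node. This is an illustration of the naming scheme only; `define_clients` is a hypothetical helper, and the wizard's real handling of reserved or already used addresses is not shown in the slides.

```python
import ipaddress


def define_clients(base_name: str, count: int, start_index: int,
                   padding: int, start_ip: str) -> list[tuple[str, str]]:
    """Return (hostname, ip) pairs for a run of client nodes.

    Hypothetical helper: names are base_name plus a zero-padded index,
    and IP addresses simply count upward from start_ip.
    """
    first = ipaddress.IPv4Address(start_ip)
    clients = []
    for i in range(count):
        name = f"{base_name}{start_index + i:0{padding}d}"
        clients.append((name, str(first + i)))
    return clients


# With padding 3, index 9 becomes "oscarnode009", as on the slide.
for name, ip in define_clients("oscarnode", 2, 8, 3, "10.0.0.2"):
    print(name, ip)
```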


90

success

Define client nodes


91

in one operation – setup networking for all cluster client nodes

for first time in installation process we will “touch” the client nodes

Step 6


92

machines named as specified in prior step 5

IP address as specified in prior step 5

Scan network for MACs or import from file

Setup network – initial window


93

found MAC addresses will show here for network setup

stop collecting when done

Setup network - scanning network


94

found and assigned all MAC addresses

Setup network – initial window


95

reboot on own – “post install action” from step 4

or

manually reboot

Reboot Clients


96

only after ALL clients have rebooted

runs “post install” scripts for packages that have them

cleanup and reinitialize where needed

Step 7


97

success

Complete setup


98

test suite provided to ensure that key cluster components are functioning properly

Step 8


99

All Passed!!!

Test cluster setup


100

Your OSCAR cluster is now ready to use

Quit OSCAR Wizard


101

Demo install

Demo add/delete node

Demo add/delete package

version 4.0

OSCAR Wizard


102

OSCAR Cluster Maintenance

Add Compute Nodes


103

increase the number of compute nodes in the cluster

Add OSCAR Clients


104

Operates in similar manner to steps 5, 6, and 7 in OSCAR installation

Behind-the-scenes action differs somewhat…

step 5

step 6

step 7

compare to standard install process:

Add OSCAR Clients


105

Delete Compute Nodes

OSCAR Cluster Maintenance


106

decrease the number of compute nodes in the cluster

Delete OSCAR Clients


107

ready to select client(s) to delete

Delete OSCAR Clients


108

client selected to delete

Delete OSCAR Clients


109

success

Delete OSCAR Clients


110

Demo install

Demo add/delete node

Demo add/delete package

version 4.0

OSCAR Wizard


111

Install / Uninstall Packages

OSCAR Cluster Maintenance


112

select to install or uninstall an OSCAR package

Install/Uninstall OSCAR packages


113

Install/Uninstall OSCAR packages


114

Open Cluster Group: www.OpenClusterGroup.org/

OSCAR Home Page: oscar.OpenClusterGroup.org/

OSCAR Development site: sourceforge.net/projects/oscar/

Mailing [email protected]@lists.sourceforge.net

OSCAR Research supported by the Mathematics, Information and Computational Sciences Office, Office of Advanced Scientific Computing Research, Office of Science, U. S. Department of Energy, under contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.

More OSCAR Information


115

Agenda

1:00 – 1:05 Introduction Box

1:05 – 1:50 Clustering and HA Ibrahim

1:50 – 2:30 OSCAR Stephen

2:30 – 3:00 Break, Q&A

3:00 – 4:30 HA-OSCAR & Demo Box


116

HA-OSCAR: High Availability - Open Source Cluster Application Resource

Dr. Chokchai Box Leangsuksun, box@latech.edu


117

High Availability Needs

• High availability is no longer needed only by traditional mission-critical applications.

• Environments that need high availability:
– Major shared resource centers

– Critical applications

• Scientific/medical/security, other services

– Long running HPC applications (aggregate performance)

– Telecomm services

– Inventory control and transaction processing systems

• HA-OSCAR can serve as a springboard for many critical applications that demand high availability and high-performance computing


118

Our Goal with HA-OSCAR

• High reliability and availability for HPC clusters
• Serviceability - simplicity
• Transparent - preserves existing investments, no change required, retrofittable
• Production-quality software release
• Robust security and fine-grained access control


119

HA-OSCAR Overview 1/3

• Open-source, production-quality clustering software that aims toward non-stop services in HEC environments

• Combines HA features and HPC capabilities to provide a Beowulf computing environment that is reliable and highly available

• The first field-grade HA Beowulf cluster that provides high-availability and critical self-healing services


120

HA-OSCAR Overview 3/3

• HA-OSCAR enhances serviceability and survivability

– serviceability aims toward effective means with which corrective and preventive maintenance can be performed

– survivability ensures system high availability

• HA-OSCAR alleviates unplanned downtime through component redundancy and Adaptive Self-Healing (ASH) mechanisms

– Replication, proactive monitoring, and recovery are HA-OSCAR’s essence.


121

HA-OSCAR versus Beowulf cluster

[Figure: Beowulf versus HA-OSCAR, availability vs. unavailability]


123

Beowulf Cluster

Head node:
• Entry point to the cluster
• Responsible for serving user requests
• Distributes jobs to compute clients via scheduling and queuing software

Compute clients:
• Dedicated to computation

Communication:
• Ethernet network and/or fast interconnects: Myrinet, InfiniBand, etc.


125

Beowulf Cluster Management Systems (CMS)

• Provide a cluster installation and configuration tool set

• Some include management and monitoring tools

• Main objectives are
– Reduce the need for expertise
– Alleviate cluster installation & configuration complexities

• Widely used CMS packages are

– OSCAR

– ROCKS

– Scyld

– Warewulf


126

Beowulf Cluster – Issues 1/2

HeadNode

Compute Nodes

Communication

What happens in the event of a head node or communication failure?


127

Beowulf Cluster – Issues 2/2

• Unavailability threats
– Beowulf is a single head node architecture
• Vulnerable to a single point of failure
– Beowulf is a single communication path architecture
• Vulnerable to a single point of failure
– Compute nodes behind a firewall/local switch are not accessible after the above threats occur
– When cluster services or OS upgrades take place


128

Beowulf vs. HA-OSCAR Architecture 1/2

Beowulf HA-OSCAR


129

Beowulf vs. HA-OSCAR Architecture 2/2

Beowulf HA-OSCAR


132

HA-OSCAR Architecture

• HA-OSCAR (beta release) is an active/hot-standby architecture with automatic failover

• HA-OSCAR major components:
– Primary server

– Standby server

– Switches

– Multiple clients

Health Detection


133

Head Node Stack

• Redundant H/W platform

• Intelligent sensors (optional)

• HPI wrapper (optional)

• Operating System (OS) hardware Interface

• OS Application Services

• Monitoring and Self-healing Core

• HA-OSCAR Management layer

Application Services

Monitoring & self-healing core


134

Monitoring and Self-Healing

ServiceMonitor

ResourceMonitor

Healthchannel Monitor

Self-Healing Daemon

Monitored Services:

PBS, MAUI, NFS, and

HTTP

Monitored Resources:load_average, disk_usage, and free_memory

Monitored Interfaces: eth0,eth0:1


135

HA-OSCAR Features


136

HA-OSCAR Features

• Eliminates single points of failure -> reduces unplanned downtime

• Provides service level fault tolerance

• Ease of installation with a GUI enabled Installation Wizard

• Self-configuration

• Cost-effective solution using dual head nodes

• HA-enabled HPC services (MAUI, PBS, SGE, etc.)

• Image server powered by SIS for cloning and disaster recovery

• Fast failover and failback mechanisms

– Minimize application service outage

– Significant downtime improvement

• Webmin is used to introduce new services

• Provides customized policy configuration utility


137

GUI Installation Wizard

Step1

Step2: create head image

Step3: clone image

Step4: config & Build Standby

Optional Step5: web admin to add/config more services


138

Multi-Head Builder and Self-Configuration

• SystemImager facilitates upgrades and improves reliability with potential rollback and disaster recovery

• SystemImager is used to clone and build images
• HA-OSCAR installation and deployment mechanism employs SystemImager as a self-build, self-configuration tool to capture and deploy images

• SIS (System Installation Suite) component

• Updating the image on the standby server
• Editing the image itself: the image that has been retrieved earlier will be automatically edited.


139

Service Monitoring

• HA-OSCAR includes a default set of monitoring services to ensure critical services, hardware components and important resources are always available

– XINETD, HTTP, NFS, SNMP, MAUI, and PBS are critical services of a Beowulf cluster; the failure of any of them makes the entire cluster unavailable

– Similarly, system resources such as free_memory, disk space, and cpu_load must stay within threshold levels


140

Service Monitoring – Implementation Notes

• Enhancement based on kernel.org MON
– Contributes to both detection and recovery policies of HA-OSCAR
– Associative and adaptive responses
• local restart
• failover (simple or impersonated)

• net-SNMP
– Contributes to HA-OSCAR detection policy
– net-SNMP hooks in MON to monitor resources and critical services of the system

• Monitoring example
– netsnmp-freespace.monitor => memory available
– netsnmp-loadaverage.monitor => CPU load
– netsnmp-proc.monitor => MAUI, HTTP, PBS


141

Service Monitoring – ASH MON

• The Adaptive Self-Healing (ASH) MON daemon monitors the system for service availability

• MON policy scripts are Perl scripts of actions
– the current release of HA-OSCAR monitors PBS, MAUI, NFS, and HTTP services

• SNMP traps help MON monitor the resources of the system
– the current release of HA-OSCAR monitors load average, free memory, and disk space of the system


142

Configuration Files

• MON conf file
– /etc/mon/mon.cf

• MON alert scripts
– /usr/lib/mon/alert.d/
– ex: /usr/lib/mon/alert.d/servicerestart.alert

• SNMP conf file
– /etc/snmp/snmpd.conf

• Monitoring scripts, e.g. processes
– /usr/lib/mon/mon.d/
– ex: /usr/lib/mon/mon.d/netsnmp-proc.monitor

• Rule of thumb:
– Don’t edit config files if you don’t know what they are.


143

MON/SNMP Configuration

• SNMP proc server monitoring critical processes
– Monitoring & detection

– Triggers alerts on failures

– MON traps these alerts and performs the proper action

– Recovery action

SNMP hooks in MON conf file


144

MON/SNMP Configuration – Resource Watch

• SNMP server watching resource threshold

• Monitoring & Detection

– Trigger alerts at threshold level

– MON traps the alerts and triggers mail alerts to the administrator

– Ex: if CPU load stays above 12% for 1 minute, an alert is triggered

• Recovery action

SNMP hooks in MON conf file
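The "above threshold for 1 minute" rule can be sketched as a check over timestamped samples. This is an illustrative Python model of the idea, not the net-SNMP or MON implementation; `sustained_breach` and its sample format are assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple


def sustained_breach(samples: Iterable[Tuple[float, float]],
                     threshold: float,
                     window_seconds: float) -> Optional[float]:
    """Return the timestamp at which an alert would fire, or None.

    samples are (timestamp_seconds, value) pairs in time order. An alert
    fires once the value has stayed above the threshold for at least
    window_seconds without dipping back. Hypothetical helper.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of a new breach
            if ts - breach_start >= window_seconds:
                return ts
        else:
            breach_start = None  # back under the threshold: reset
    return None


# Load rises above 12 at t=15 and stays there, so the alert fires at t=75.
load = [(0, 10), (15, 13), (30, 14), (45, 13), (60, 15), (75, 16)]
print(sustained_breach(load, threshold=12, window_seconds=60))
```

Requiring the breach to persist avoids mailing the administrator for a short spike.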


145

Mon and net-SNMP

ServiceMonitor

ResourceMonitor

Healthchannel Monitor

Self-Healing Daemon (MON)

Monitored Services:

PBS, MAUI, NFS, and

HTTP

Monitored Resources:load_average, disk_usage, and free_memory

Monitored Interfaces: eth0,eth0:1

net-SNMP


146

Mon configuration (continued)

• Monitoring primary server availability

• Alias IP watch server
• HPC services monitoring process
• Respective recovery actions


147

Snapshot of Mon alert example

• This is the servicerestart.alert file

• The following script is executed when a critical service such as HTTP, MAUI, or PBS dies


148

Service/Resource Detection & Recovery

Working

FAILED

ALERT

Failover

Failed Services Detected

Recovery

Reach Maximum Counter(>3)

Resource Usage Reach Threshold

Standby server take control

Failback

Compare with Maximum Restart Counter (<=3)

Primary ServerPrimary Server

Servicerestart.alert

Shutdown.alert

Primaryserver_down.alert

Primaryserver_up.alert
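The restart-counter decision in the diagram above can be modeled in a few lines of Python: a failed service is restarted locally while the counter is at most 3, and a failover is requested once it exceeds 3. This is a sketch of that rule only, not the HA-OSCAR alert scripts. The class name `ServiceRecovery` is hypothetical, and resetting the counter on a healthy check is an assumption not stated on the slide.

```python
class ServiceRecovery:
    """Toy model of the restart-then-failover policy (hypothetical)."""

    def __init__(self, max_restarts: int = 3):
        self.max_restarts = max_restarts
        self.restart_count = 0

    def on_failure(self) -> str:
        """Record a detected failure and return the action to take."""
        self.restart_count += 1
        if self.restart_count <= self.max_restarts:
            return "restart"   # local recovery on the primary server
        return "failover"      # counter exceeded: standby takes control

    def on_healthy(self) -> None:
        """Assumption: a successful check clears the restart counter."""
        self.restart_count = 0


policy = ServiceRecovery()
print([policy.on_failure() for _ in range(4)])
```

Trying a local restart first keeps short service glitches from triggering a full failover to the standby server.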


149

HA-OSCAR network interface configuration


152

Downloading HA-OSCAR & More info


153

Main HA-OSCAR website: xcr.cenit.latech.edu/ha-oscar


154

Download page


155

Survey page


156

GPL


157

HA-OSCAR tar ball


158

Bug Report


159

Installation Scenario: Provide step-by-step instructions for the HA-OSCAR installation process

• Assumptions & Requirements
• HA-OSCAR Installation
• Installation Walkthrough
• HA-OSCAR Monitoring and Configuration
• Webmin


160

Assumptions and Requirements

• HA-OSCAR 1.0 beta release
– Supports active/hot-standby dual head nodes

• Installation requirements
– Red Hat 9.0 Linux distribution
– OSCAR 3.0

• HA-OSCAR 1.1 release
– Supports OSCAR 4.1


161

Installation Process

• Adopt ease of build (self-build, config w/o OS loaded)

• 30 min – 1.5 hrs installation (retrofit)

• Disaster recovery takes almost the same time

• Webmin for new services

Step1

Step2: create head image

Step3: clone image

Step4: config & Build Standby

Optional Step5: web admin to add/config more services


162

Installation Walkthrough 1/5

• Download HA-OSCAR: http://xcr.cenit.latech.edu/ha-oscar
• Extract the tar file
• Run ‘./haoscar_install eth0’ to launch the following screen
• It takes four simple steps to install HA-OSCAR


163

Installation Walkthrough 2/5

1. Installation of server packages to build an HA-OSCAR base.

2. The second step launches the Fetch Image wizard, which grabs the Primaryserver image and stores it on Primaryserver.

– The user can accept the default values in this window.

– Finally, the user clicks the Fetch Image button and the image is fetched.


164

Installation Walkthrough 3/5

3. The next step configures the standby server.

– The image name from the previous step (Serverimage) is selected for installation on Standbyserver.

– Standbyserver’s local IP, public alias IP, and gateway can be changed to match the site's network addresses.

– After filling in all the fields, click the Add Standby Server button.

(Addresses shown in the screenshot: 10.0.0.3 and 10.0.0.1)
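The address fields on this screen can be sanity-checked before the standby server is added. The following is a minimal sketch only: it assumes a /24 private network and treats 10.0.0.3 as the standby's local IP and 10.0.0.1 as the primary's (the slide does not label the two addresses). `check_standby_config` is a hypothetical helper and not part of HA-OSCAR.

```python
import ipaddress

def check_standby_config(local_ip, primary_ip, netmask="255.255.255.0"):
    """Return a list of problems found in a standby server's address settings."""
    problems = []
    # Build the private network from the standby address and the assumed netmask.
    network = ipaddress.ip_network(f"{local_ip}/{netmask}", strict=False)
    standby = ipaddress.ip_address(local_ip)
    primary = ipaddress.ip_address(primary_ip)
    if standby == primary:
        problems.append("standby and primary share the same address")
    if primary not in network:
        problems.append(f"{primary} is outside {network}")
    if standby in (network.network_address, network.broadcast_address):
        problems.append(f"{standby} is a reserved address in {network}")
    return problems

# Addresses from the screenshot; their roles are assumed for illustration.
print(check_standby_config("10.0.0.3", "10.0.0.1"))
```

An empty list means the two addresses are distinct and share the assumed private subnet.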


Installation Walkthrough 4/5

4. The fourth step sets up the network (for PXE boot) to transfer the cloned image from Primaryserver to the remote Standbyserver.

– First, click Setup Network Boot (A).

– Configure the Standbyserver boot sequence to network boot and reboot the Standbyserver.

– Next, click the Collect MAC Address (B) button to collect the MAC address of Standbyserver.

Note: For the Build Autoinstall Floppy method, refer to appendix 1.


Installation Walkthrough 5/5

After the MAC address is collected, associate it with the IP address of Standbyserver (from the previous step) by clicking Assign MAC to Node (E).

Then click the Configure DHCP Server (F) button to configure DHCP on the primary node so that it assigns the IP address to Standbyserver. Next, click Setup Network Boot (G) and boot the Standbyserver via PXE.

Once the Standbyserver is up, click Complete Installation, the final step, to finish the HA-OSCAR setup.
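The MAC-to-IP association made in this step amounts to a fixed-address DHCP entry. The sketch below renders such an entry in ISC dhcpd host-declaration syntax. That HA-OSCAR writes this exact format is an assumption, since the slides do not show the generated configuration. The host name and MAC address are made up for illustration, and `dhcp_host_entry` is a hypothetical helper.

```python
import re

MAC_PATTERN = re.compile(r"^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")

def dhcp_host_entry(hostname, mac, ip):
    """Render an ISC dhcpd-style host declaration binding a MAC to a fixed IP."""
    if not MAC_PATTERN.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac.lower()};\n"
        f"    fixed-address {ip};\n"
        f"}}\n"
    )

# Hypothetical standby host name and MAC address.
print(dhcp_host_entry("standbyserver", "00:0C:29:AB:CD:EF", "10.0.0.3"))
```

With an entry like this in place, the standby receives the same address on every PXE boot, which the image transfer relies on.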


Monitoring & Configuration with Webmin

• This procedure is optional. It is normally needed only for customization by advanced users.

• HA-OSCAR Webmin is used:
– The default configuration should be sufficient for a standard Beowulf environment
– Only for advanced users
– Available at http://localhost:10000
– For customized detection channel configuration
– To add/edit services for monitoring
– For customized resource management


HA-OSCAR Webmin 1/13

• The user enters the HA-OSCAR configuration and monitoring screen by clicking the HA-OSCAR icon.


HA-OSCAR Webmin 2/13

• The ‘Detection channel configuration’ icon is used to set up and configure both detection channels (ethX) between the Primary server and the Standby server.


HA-OSCAR Webmin 3/13

• The ‘Add/Edit Network interface’ icon is used to add a customized detection channel for Primaryserver.


HA-OSCAR Webmin 4/13

• Clicking the ‘Add a new interface’ link launches a window in which the user can create a new interface.


HA-OSCAR Webmin 5/13

• Example of adding a new private virtual interface (eth0:1)
– Name of the interface
– Virtual IP address
– Activate at boot
– Commit settings


HA-OSCAR Webmin 6/13

• Similarly, add a customized public virtual interface.


HA-OSCAR Webmin 7/13

• Clicking this icon launches the window for configuring the HA-OSCAR (heartbeat) detection channel on the Primary server.


HA-OSCAR Webmin 8/13

• Primary server network and detection channel configuration

– Name of the public virtual IP
– Public virtual IP address
– Name of the private virtual IP
– Private virtual IP address
– Private IP address
– Commit settings


HA-OSCAR Webmin 9/13

• Clicking the ‘HA-OSCAR Service Monitor’ icon launches the HA-OSCAR policy configuration main window.


HA-OSCAR Webmin 10/13

• The ‘Monitoring Lists’ icon lists the monitored services.


HA-OSCAR Webmin 11/13

• ‘process_server’ checks that the critical services running on primary_server (itself) are up and running

• ‘loadaverage_server’ checks that the CPU load stays within the threshold level

• Similarly, ‘freespace_server’ monitors available memory/swap space

• To add a new service
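The kind of check that ‘loadaverage_server’ performs can be sketched as a threshold comparison on the system load average. This is a minimal sketch, not HA-OSCAR's actual monitor. The threshold value and the `evaluate` helper are hypothetical, and `os.getloadavg()` is available on Unix-like systems only.

```python
import os

def load_within_threshold(threshold, load=None):
    """Return True if the 1-minute load average is at or below the threshold."""
    if load is None:
        # os.getloadavg() returns the 1-, 5-, and 15-minute averages (Unix only).
        load = os.getloadavg()[0]
    return load <= threshold

def evaluate(threshold, load=None):
    """Map the check to the ok/alert outcome a monitor would report."""
    return "ok" if load_within_threshold(threshold, load) else "alert"

# Explicit load values keep the example deterministic.
print(evaluate(4.0, load=1.5))
print(evaluate(4.0, load=9.0))
```

Passing `load=None` reads the live value instead, which is how a periodic monitor would call it.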


HA-OSCAR Webmin 12/13

• Adding new services
– Name of the service
– Monitoring time interval
– Monitoring daemon
– Default monitored services
– Append to mon conf file
– Duration in days
– Event
– Alert triggered

• This snapshot shows the ‘process_server’ monitoring policy, with pbs_server, maui, and http as the default monitored processes. The same window opens with no data populated when the add_service link is clicked.


HA-OSCAR Webmin 13/13

• The same procedure is followed on the Standby server to add and configure customized ‘Detection channels’ and ‘services’.


Experiments and Test Results

• Experiments
• Standbyserver Alert History
• Primaryserver Alert History
• HA-OSCAR Measurements


Experiment – HA-OSCAR Stack

• HA-OSCAR was successfully verified on a cluster system running OSCAR release 3.0 and Red Hat 9.0

• Experimental cluster configuration:
– Two dual-Xeon server head nodes
   • 1 GB RAM each
   • 40 GB hard drive with 2 GB of free space
   • Two NICs
– 16 Intel client nodes
   • 512 MB RAM
   • 40 GB hard drive
   • Two NICs
– Network switch


Standby Server Alert History

• Testing failover
– The private Ethernet cable (eth0) of the Primary server is unplugged
– Log in to the Primary server via the public IP and run an MPI program
– The client node gets a 100% response from the failover Standby server

• Testing failback
– The private Ethernet cable (eth0) of the Primary server is plugged back in
– The client node pings the Primary server (eth0) and gets a 100% response

# | Group | Service | Type | Time | Alert | Args
1 | primary_server | Ping | Alert | Mon Sept 29 21:28:07 2003 | server_down.alert | -
2 | primary_server | Ping | upalert | Mon Sept 29 21:36:21 2003 | server_up.alert | -

Table: example of the alert history log on Standbyserver
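The two timestamps in this log give the length of time the standby saw the primary as down during the test. A minimal sketch of that arithmetic follows. It measures the gap between the down alert and the up alert (that is, how long the cable stayed unplugged in this run), not the failover time itself.

```python
from datetime import datetime

# Timestamps copied from the Standbyserver alert history table.
down_alert = datetime(2003, 9, 29, 21, 28, 7)   # server_down.alert
up_alert = datetime(2003, 9, 29, 21, 36, 21)    # server_up.alert

def outage_seconds(down, up):
    """Seconds between a down alert and the matching up alert."""
    return int((up - down).total_seconds())

seconds = outage_seconds(down_alert, up_alert)
print(f"{seconds // 60} min {seconds % 60} s")
```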


Primary Server Alert History

# | Group | Service | Type | Time | Alert | Args
1 | primary_server | ping | alert | Mon Sept 29 21:36:17 2003 | self_down.alert | -
2 | primary_server | ping | upalert | Mon Sept 29 21:36:26 2003 | self_up.alert | -
3 | service_mon | PBS | alert | Mon Sept 29 21:35:16 2003 | PBS.alert | -
4 | service_mon | PBS server | upalert | Mon Sept 29 21:36:26 2003 | mail.alert | -

Table: Primary server alert history


HA-OSCAR Measurements

• 3-5 sec manual failover time
• 2-second polling interval (tunable)
• 0.9% CPU usage at each monitoring interval

[Figure: HA-OSCAR network load in packets/min, measured by TCPtrace (y-axis, 0 to 300), versus HA-OSCAR Mon polling interval in seconds (x-axis: 1, 2, 5, 10, 15, 20, 30, 60)]

Comparison of network usage for different HA-OSCAR polling intervals
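The trade-off behind a tunable polling interval can be illustrated with a simple model in which every poll costs a fixed number of packets, so traffic falls in inverse proportion to the interval. This is an assumption made for illustration, not the measured curve. `PACKETS_PER_POLL` is a hypothetical value, and only the list of intervals comes from the chart's x-axis.

```python
def polls_per_minute(interval_s):
    """Number of monitoring polls issued per minute at a given interval."""
    return 60.0 / interval_s

def estimated_packets_per_minute(interval_s, packets_per_poll):
    """Packets per minute if every poll costs a fixed number of packets."""
    return polls_per_minute(interval_s) * packets_per_poll

# Polling intervals taken from the x-axis of the chart.
INTERVALS = [1, 2, 5, 10, 15, 20, 30, 60]
PACKETS_PER_POLL = 4  # hypothetical value, not a measurement

for interval in INTERVALS:
    rate = estimated_packets_per_minute(interval, PACKETS_PER_POLL)
    print(f"{interval:>2} s -> {rate:6.1f} packets/min")
```

A shorter interval detects failures sooner but generates more monitoring traffic, which is why the interval is left tunable.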


HA-OSCAR Availability Modeling: Present Availability Modeling and Analysis

• Stochastic Petri Net modeling
• Comparison Results


Reality Checks

• Great! We got HA Beowulf!

• But how much improvement?
– The total uptime?
– Performance?

• Analytical model and prediction
– How many 9’s? (downtime per year)
– Stochastic Reward Net using the SPNP package from Duke U.


HA-OSCAR SRN Model

Server sub-model

Switches

Compute nodes


Server Sub-Model

[Diagram labels, primary (P): P server up; P server down; Failover; P server repair; Failback]

[Diagram labels, standby (S): S is up and ready; S takes control; S server down; S repair]


Instantaneous Availability

Steady-state availability (A) = 99.993% (about 36 min of downtime per year) vs.

Beowulf (A) = 99.65% (about 30 hr of downtime per year)
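The downtime figures in parentheses follow directly from the availability percentages when they are read as downtime per year. A minimal sketch of the conversion, assuming a 365-day year (`annual_downtime_hours` is a hypothetical helper name):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def annual_downtime_hours(availability_percent):
    """Expected downtime per year, in hours, for a given availability."""
    unavailability = 1.0 - availability_percent / 100.0
    return unavailability * HOURS_PER_YEAR

# The two availability values quoted on the slide.
print(f"{annual_downtime_hours(99.993) * 60:.1f} min")
print(f"{annual_downtime_hours(99.65):.1f} hr")
```

Both results round to the values quoted on the slide, about 36 minutes and about 30 hours.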


Availability Analysis

HA-OSCAR solution vs. traditional Beowulf
Total availability impacted by service nodes

[Figure: availability versus node-wise mean time to failure (hr); axis ticks run from 90.00% to 100.00% and from 99.950% to 100.000%]

Node-wise MTTF (hr) | 1000 | 2000 | 4000 | 6000 | 8000 | 10000
Beowulf | 0.905797 | 0.915751 | 0.920810 | 0.922509 | 0.923361 | 0.923873
HA-OSCAR | 0.999684 | 0.999896 | 0.999951 | 0.999962 | 0.999966 | 0.999968

Model assumptions:
- Scheduled downtime = 200 hrs
- Nodal MTTR = 24 hrs
- Failover time = 10 s
- During maintenance on the head, the standby node acts as primary

Total availability analysis of HA-OSCAR versus Beowulf architecture


Work in Progress


Different flavors of HA-OSCAR

Monitoring | HA-OSCAR Active-Hot Standby | HA-OSCAR 2+1 Multi-Active (lab grade)

Monitored services listed on the slide: Pbs, maui, nfs, httpd; SGE, NSF, nis, httpd, gmond, gmetad

Heartbeat (3 sec)

CPU fan speed | IPMI option | IPMI option
CPU temperature | IPMI option | IPMI option
CPU status | IPMI option | IPMI option
Memory bit error | IPMI option | IPMI option


HA-OSCAR and GRID (Lab Grade)

• Grid-enabled HA cluster
• High Availability architecture stack for grid
• HA enabled grid architecture


Grid enabled HA Cluster

• Grid computing allows:
– Various organizations to work together to achieve common goals and high performance operations
– Local autonomy
– Distributed computing

• A potential pitfall is the single point of failure at the head node
– Site unavailability
– Reduced number of available resources


HA Architecture Stack for Globus Grid

Operating System Applications

Cluster Software Stack

Grid Layer

HA-OSCAR Service and Job Level Monitoring

HA-OSCAR Policy based recovery mechanism


Critical Service Monitoring & Failover-Failback capability for site-manager

[Diagram labels: GRID; Site A, Site B, Site C, each with HA-OSCAR; Modified Failover Aware Client; Client submits MPI job; Site-Manager; Primary Head Node; Standby HEAD Node; Service Nodes; Compute nodes; Stand-By]

HA-OSCAR fails over if critical services (Gatekeeper, gridFTP, PBS) die


Basic Components for Smart Failover

HA-OSCAR Smart Failover Architecture

[Diagram components:]
– Job Queue Monitor
– Scheduler jobID to Globus assigned jobID mapper
– Backup updater
– Service Monitor
– HW Health Monitor
– Resource Monitor
– Monitoring Core Daemon
– Event Monitoring Core Daemon
– Event Generator
– Scheduler
– Gatekeeper, GridFTP, PBS

[Diagram arrows:]
– Notify on critical event
– Notify system events
– Notify on job addition & completion

The mapping between the GjobID and the SjobID is the key information for transparent recovery.
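The role of the jobID mapper can be sketched as a small table keyed by the Globus-assigned job ID (GjobID) and updated on job addition and completion events. This is an illustrative sketch only. The class, the ID strings, and the idea that the standby issues new scheduler job IDs (SjobIDs) on resubmission are assumptions, not details taken from HA-OSCAR.

```python
class JobIdMapper:
    """Tracks which scheduler job ID (SjobID) backs each Globus job ID (GjobID)."""

    def __init__(self):
        self._g_to_s = {}

    def job_added(self, gjob_id, sjob_id):
        # Called on a job-addition event.
        self._g_to_s[gjob_id] = sjob_id

    def job_completed(self, gjob_id):
        # Called on a job-completion event; unknown IDs are ignored.
        self._g_to_s.pop(gjob_id, None)

    def remap_after_failover(self, resubmitted):
        """Point each surviving GjobID at the SjobID issued by the standby.

        `resubmitted` maps old SjobIDs to the new ones assigned on the standby.
        """
        for gjob_id, old_sjob in self._g_to_s.items():
            self._g_to_s[gjob_id] = resubmitted.get(old_sjob, old_sjob)

    def lookup(self, gjob_id):
        """Return the current SjobID for a GjobID, or None if it is not tracked."""
        return self._g_to_s.get(gjob_id)


mapper = JobIdMapper()
mapper.job_added("g1", "101.primary")
mapper.job_added("g2", "102.primary")
mapper.job_completed("g1")
mapper.remap_after_failover({"102.primary": "7.standby"})
print(mapper.lookup("g2"))
```

Because the client only ever holds the GjobID, updating the mapping is what lets a status query keep working after the scheduler-side ID changes.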


In-progress Work – Security

Integrating DSI components with HA-OSCAR


Security – Different types of attacks

• Denial of Service (DoS)
• Impersonation
• Exploits of misconfiguration
• Exploits of implementation flaws
• Data-driven attacks
• Network infrastructure attacks


Goals

• Design an architecture as a platform to support different security mechanisms for carrier class Internet servers running on a clustered system.

• Requirements:
– Scalable, flexible, no single point of failure, no performance bottleneck, supporting run time changes in security context and policy

• The platform must provide mechanisms to protect the system against:
– External attacks: originating from the Internet
– Internal attacks:
   • Break through a node in the cluster
   • O&M security
   • Attacks originating from the intranet


Functionality

• Access control
– Finer grained, wide range of operations

• Authentication
– Between cluster nodes and processes

• Confidentiality and integrity for communications
– Securing distributed IPCs

• Auditing
– Collection and analysis of alarms and warnings through O&M


DSI Overview

A primary Security Server (SS)

Multiple Security Managers (SM) (one per node)

SS and SMs communicate through an encrypted and authenticated channel

Security policy is enforced at kernel level

[Diagram labels: Primary Security Server Node; Node 1, Node 2, Node 3; SS; SM; DSM; Proc123, Proc978, Proc222; Kernel; Security Broker; Secondary; Data Traffic; Inside the Cluster; Outside the Cluster; Security and O&M/IDS; Authenticated Encrypted Communications]

Legend: SS = Security Server; SM = Security Manager; DSM = Distributed Security Module


DSI and HA-OSCAR

• One of the goals for 2005 is to integrate DSI components into HA-OSCAR to provide advanced security features.


Distributed Security Infrastructure (DSI)

• Developed for cluster environments
– Fine grained, flexible, adaptable

• High level of abstraction for access control
– Separating administrative, network, computation into different security zones

• Process level + user level
– Kernel level module (DSM)
– Real time checks based on the LSM hooks
– SELinux


DSI – components (1/2)

• Security Server
– Central point of management, policy holder

• Security Manager
– Node based enforcement of policy

• Secure Communication Channel
– Encrypted, authenticated. SS <-> SM

• Distributed Security Policy (DSP)
– XML, rules spanning entire cluster


DSP Architecture


DSI-aware HA-OSCAR Architecture

[Diagram labels: Primary Security Server Node; Node 1, Node 2, Node 3; SS; SM; DSM; Proc123, Proc978, Proc222; Kernel; Security Broker; Secondary; Data Traffic; Inside the Cluster; Outside the Cluster; Security and O&M/IDS; Authenticated Encrypted Communications]

Legend: SS = Security Server; SM = Security Manager; DSM = Distributed Security Module


Current Status – DSI+HA-OSCAR

• Porting from DSM + LSM -> SELinux
– 5 major classes

• DSP -> SELinux TE
• Process and network mapping are done

• HA-OSCAR is ready to integrate
• Design and develop security provisioning – tricky


Fault Tolerant Scheduling for Computational Grid/Cluster Environments


Objectives

• To provide fault tolerance for jobs at cluster level.

• Retain the job run sequence in case of failure


Architecture

[Diagram labels]

HA-OSCAR Primary Server: PBS Job Queue; FAM Job event Monitor; HA-OSCAR Monitoring Daemon

HA-OSCAR Backup Server: PBS Job Queue; Get Event; Update on Job ADD/COMPLETE; Monitor Primary Server

Job life cycle: Job Submission; Prologue; Job Run; Epilogue; Job Initialization; Perform Job Cleanup

Job Spec Directory (JobID.Spec files): Ckpt & create & update restart job spec file; Update with Ckpt files and user home dirs; Remove job spec file after completion

Other labels: Leads to; QPS; Update
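The "Update on Job ADD/COMPLETE" path in this architecture can be sketched as a mirror of the primary's queue that keeps jobs in submission order, so the run sequence can be retained after a failure. This is a minimal sketch under assumed names. `BackupJobQueue`, the event strings, and the spec payloads are hypothetical and do not reflect the PBS or FAM interfaces.

```python
from collections import OrderedDict

class BackupJobQueue:
    """Mirror of the primary's job queue, kept in submission order."""

    def __init__(self):
        self._jobs = OrderedDict()

    def apply_event(self, event, job_id, spec=None):
        """Apply one job event forwarded from the primary server."""
        if event == "ADD":
            self._jobs[job_id] = spec
        elif event == "COMPLETE":
            # The job spec is dropped once the job has finished.
            self._jobs.pop(job_id, None)
        else:
            raise ValueError(f"unknown event: {event}")

    def restart_order(self):
        """Job IDs to resubmit after a failover, oldest first."""
        return list(self._jobs)


queue = BackupJobQueue()
queue.apply_event("ADD", "1", spec="job1.spec")
queue.apply_event("ADD", "2", spec="job2.spec")
queue.apply_event("ADD", "3", spec="job3.spec")
queue.apply_event("COMPLETE", "2")
print(queue.restart_order())
```

Keeping insertion order means the standby can resubmit the unfinished jobs in the same sequence in which they were originally queued.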


Demo Steps

• Submit MPI jobs through Torque
• View job queue status on the primary
• Simulate an outage by bringing the network down
• View the job queue status on the standby


Final Thoughts

• It took us a lot of work to arrive at our current results.

• HA-OSCAR is an Open Source project – open for contributions from all

• Several HA-OSCAR-enabled works target mission-critical HPC clusters

• Please give us your feedback on how we can improve HA-OSCAR and make it your preferred open source HA clustering stack

• Participation is open!


Thank you. Feedback? Questions?

This slide show is available for download from: http://xcr.cenit.latech.edu/ha-oscar


Resources

• HA-OSCAR xcr.cenit.latech.edu/ha-oscar

• Open Cluster Group OpenClusterGroup.org

• OSCAR oscar.OpenClusterGroup.org

• OSCAR Development sourceforge.net/projects/oscar

• Latech CENIT cenit.latech.edu

• Open System Lab www.linux.ericsson.ca

Master slide show (LCI) is available at:

http://xcr.cenit.latech.edu/ha-oscar/papers.html