N. Xiong@ GSU Chapter 05 Clustered Systems for Massive Parallelism N. Xiong Georgia State University

Jan 03, 2016


Darren Powers

Slide 1

Chapter 05

Clustered Systems for Massive Parallelism

N. Xiong
Georgia State University

Slide 2

Review and Introduction

Slide 3

Chapter 05 Main Contents

Design Objectives of Clusters and MPPs
Cluster and MPP System Architectures
Design Principles of Clustered Systems
Multiple Job Scheduling and Management
Virtual Clustering and Resource Provisioning
Homework Problems

Slide 4

Design Objectives of Clustered Systems

Scalability
Packaging
Control
Homogeneity
Security

Slide 5

Design Objectives of Clustered Systems

Slide 6

Fundamental Cluster Design Issues

Scalable Performance
Single System Image
Availability Support
Cluster Job Management
Internode Communication
Fault Tolerance and Recovery
Growth of Servers in HPC and HTC Systems

Slide 7

Resource-Sharing in Cluster Systems

Slide 8

An Idealized Cluster Architecture

Conventional databases and OLTP monitors offer users a desktop environment for using the cluster

The cluster supports parallel programming based on standard languages and communication libraries

A user-interface subsystem combines the advantages of the Web interface and the Windows GUI

Slide 9

Node Architectures and System Packaging

Two types of cluster nodes: compute nodes and service nodes

Slide 10

Compute Node Examples

Slide 11

Modular Packaging of IBM BlueGene/L System

Slide 12

Cluster System Interconnects

Slide 13

High-Bandwidth Interconnects

Slide 14

An InfiniBand Cluster Interconnection Network

Slide 15

High-bandwidth Interconnects in Top-500 Systems

Slide 16

Hardware, Software, and Middleware Support

Slide 17

Design Principles of Clusters

Single-System-Image (SSI) Features
Single System
Single Control
Symmetry
Location Transparent

Slide 18

Design Principles of Clusters

Single-System-Image Layers
Application Software Layer
Hardware or Kernel Layer
Middleware Layer

Slide 19

Design Principles of Clusters

Single-System-Image Composition
Single Entry Point
Single File Hierarchy
Single I/O, Networking, and Memory Space
Other Desired SSI Features

Slide 20

Single Entry Point

Slide 21

Single File Hierarchy

It is persistent.
It is fault tolerant to some degree.
Examples: Network File System (NFS) and Andrew File System (AFS).
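To make the idea of a single file hierarchy concrete, here is a minimal sketch of how one global path name can be resolved to the node that actually stores the data, using longest-prefix matching over a mount table. It is only loosely in the spirit of how NFS mounts or AFS stitch remote storage into one tree; the mount table, the node names, and the `resolve` function are hypothetical illustrations, not the interface of either file system.

```python
from typing import Dict, Tuple

# Hypothetical cluster-wide mount table: global prefix -> (server node, exported path).
MOUNTS: Dict[str, Tuple[str, str]] = {
    "/": ("node0", "/export/root"),
    "/home": ("node1", "/export/home"),
    "/home/shared": ("node2", "/export/shared"),
    "/scratch": ("node3", "/local/scratch"),
}


def resolve(path: str, mounts: Dict[str, Tuple[str, str]] = MOUNTS) -> Tuple[str, str]:
    """Map a path in the single global hierarchy to (node, local path)."""
    if not path.startswith("/"):
        raise ValueError("paths in the single hierarchy are absolute")
    best = None
    for prefix in mounts:
        # Match only on whole path components, so "/homework" does not match "/home".
        if prefix == "/" or path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no mount covers {path}")
    node, export = mounts[best]
    # Whatever follows the matched prefix is appended to the exported directory.
    rest = path[len(best):].lstrip("/")
    return (node, f"{export}/{rest}" if rest else export)
```

A program on any node names `/home/alice/data.txt` the same way; only the resolver knows that the file lives on `node1`.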

Slide 22

Single File Hierarchy

Slide 23

Single I/O, Networking, and Memory Space

Single Input/Output
Single Networking
Single Point of Control
Single Memory Space

Slide 24

Single I/O, Networking, and Memory Space

Slide 25

An Example

Slide 26

Other Desired SSI Features

Single Job Management System
Single User Interface
Single Process Space

Slide 27

Middleware Support for SSI Clustering

Slide 28

High Availability Through Redundancy

Reliability
Availability
Serviceability

Slide 29

Availability and Failure Rate
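Availability is conventionally defined from the mean time to failure (MTTF) and the mean time to repair (MTTR) as MTTF / (MTTF + MTTR). The minimal sketch below computes it and, under the simplifying assumption that components fail independently, shows why redundancy raises availability: components in series multiply their availabilities, while redundant copies in parallel multiply their unavailabilities. The function names and example numbers are illustrative, not taken from the slides.

```python
from math import prod
from typing import Sequence


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: fraction of time the component is up."""
    if mttf <= 0 or mttr < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf / (mttf + mttr)


def in_series(avails: Sequence[float]) -> float:
    """System is up only if every component is up (independent failures assumed)."""
    return prod(avails)


def in_parallel(avails: Sequence[float]) -> float:
    """System is up if at least one redundant copy is up (independent failures assumed)."""
    return 1.0 - prod(1.0 - a for a in avails)


def downtime_hours_per_year(a: float) -> float:
    """Expected hours of downtime per 365-day year."""
    return (1.0 - a) * 365 * 24


if __name__ == "__main__":
    node = availability(mttf=990.0, mttr=10.0)    # a single node at 0.99
    print(round(node, 4), round(downtime_hours_per_year(node), 1))
    pair = in_parallel([node, node])              # two redundant nodes
    print(round(pair, 4), round(downtime_hours_per_year(pair), 2))
```

Going from one node at 0.99 to a redundant pair cuts expected downtime from roughly 88 hours to under one hour per year, provided failures really are independent.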

Slide 30

Availability Values of Several Representative Systems

Slide 31

Redundancy Techniques

Slide 32

Fault-Tolerant Cluster Configurations

Hot Standby
Mutual Takeover
Fault-Tolerance
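The difference between hot standby and mutual takeover can be sketched as service reassignment. In hot standby, the standby node runs nothing until the primary fails; in mutual takeover, every node runs its own services and absorbs a failed peer's work. The sketch assumes failures are detected by a heartbeat timeout, which is one common mechanism but not something the slide specifies; all names are hypothetical. The fully fault-tolerant configuration (redundant copies running simultaneously) is not modeled here.

```python
from typing import Dict, List, Set


def alive_nodes(last_heartbeat: Dict[str, float], now: float, timeout: float) -> Set[str]:
    """A node counts as alive if its last heartbeat is no older than `timeout` seconds."""
    return {node for node, t in last_heartbeat.items() if now - t <= timeout}


def take_over(assignment: Dict[str, List[str]], alive: Set[str]) -> Dict[str, List[str]]:
    """Move services from failed nodes to survivors.

    `assignment` maps node -> services it currently runs. In hot standby the
    standby's list starts empty; in mutual takeover every node runs something.
    Each orphaned service goes to the least-loaded survivor (ties broken by name).
    """
    survivors = sorted(n for n in assignment if n in alive)
    if not survivors:
        raise RuntimeError("no surviving node can take over")
    result = {n: list(assignment[n]) for n in survivors}
    for node in sorted(assignment):
        if node in alive:
            continue
        for service in assignment[node]:
            target = min(survivors, key=lambda n: (len(result[n]), n))
            result[target].append(service)
    return result
```

With `{"primary": ["db"], "standby": []}` and only the standby alive, the standby inherits `db` (hot standby); with `{"A": ["web"], "B": ["db"]}` and only `A` alive, `A` ends up running both (mutual takeover).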

Slide 33

Recovery Schemes

Backward recovery
Forward recovery: used in real-time systems
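Backward recovery can be sketched as periodic checkpointing plus rollback: on a failure, the computation restores the last saved state and re-executes the steps since then. The minimal simulation below assumes transient faults injected at chosen step indices (each strikes once) and a checkpoint every `interval` steps; the function names and the failure model are illustrative assumptions. Forward recovery, which avoids rollback, is not modeled.

```python
import copy
from typing import Any, Callable, Dict, Iterable


def run_with_rollback(
    state: Dict[str, Any],
    steps: int,
    step_fn: Callable[[Dict[str, Any], int], None],
    interval: int,
    failures: Iterable[int] = (),
) -> Dict[str, Any]:
    """Backward recovery: checkpoint every `interval` steps, roll back on failure.

    `failures` lists step indices at which a simulated transient fault strikes
    (once each) just before that step would run.
    """
    state = copy.deepcopy(state)              # work on a private copy
    pending = set(failures)
    checkpoint = (0, copy.deepcopy(state))    # initial checkpoint
    i = 0
    while i < steps:
        if i in pending:
            pending.discard(i)
            i, saved = checkpoint             # roll back to the last checkpoint
            state = copy.deepcopy(saved)      # discard work done since then
            continue
        step_fn(state, i)
        i += 1
        if i % interval == 0:
            checkpoint = (i, copy.deepcopy(state))
    return state


def add_step(s: Dict[str, Any], i: int) -> None:
    """Example workload: accumulate the step index."""
    s["total"] += i


if __name__ == "__main__":
    print(run_with_rollback({"total": 0}, 10, add_step, interval=3, failures=[7]))
```

The run with a fault at step 7 still ends with a total of 45, the same as a failure-free run; the cost is re-executing step 6, the only work done since the checkpoint at step 6.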

Slide 34

Checkpointing and Recovery Techniques

Kernel, Library, and Application Levels
Checkpoint Overheads
Choosing an Optimal Checkpoint Interval
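Choosing a checkpoint interval trades checkpoint overhead against lost work: checkpointing often costs time on every interval, while checkpointing rarely means more rework after a failure. As an illustration only, the sketch uses Young's well-known first-order approximation, T ≈ sqrt(2 · C · MTTF), where C is the time to write one checkpoint; this is a substituted textbook formula and not necessarily the one the lecture presents. It follows from minimizing the approximate waste fraction C/T + T/(2 · MTTF), since on average half an interval of work is lost per failure.

```python
import math


def young_interval(checkpoint_cost: float, mttf: float) -> float:
    """First-order optimal checkpoint interval T = sqrt(2 * C * MTTF) (Young's approximation)."""
    if checkpoint_cost <= 0 or mttf <= 0:
        raise ValueError("checkpoint cost and MTTF must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mttf)


def waste_fraction(interval: float, checkpoint_cost: float, mttf: float) -> float:
    """Approximate fraction of time lost: checkpoint overhead C/T plus
    expected rework of about T/2 per failure, i.e. T / (2 * MTTF)."""
    return checkpoint_cost / interval + interval / (2.0 * mttf)


if __name__ == "__main__":
    c, mttf = 50.0, 10_000.0                   # illustrative values, in seconds
    t = young_interval(c, mttf)
    print(t, waste_fraction(t, c, mttf))
```

With C = 50 s and MTTF = 10,000 s the interval comes out at 1,000 s and about 10% of the time is lost; shorter or longer intervals both waste more.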

Slide 35

Checkpointing Parallel Programs

Slide 36

Cluster Job Scheduling and Management

Cluster Job Management Issues
A user server
A job scheduler
A resource manager
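The three parts of a job management system can be sketched as three cooperating objects: the user server accepts, deletes, and reports on jobs; the job scheduler queues jobs and decides when they start; the resource manager allocates nodes and keeps accounting. The strict first-come, first-served policy and every class and method name below are hypothetical choices for illustration, not the interface of any real job management package.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class Job:
    job_id: int
    user: str
    nodes: int  # resource requirement: number of nodes


class ResourceManager:
    """Allocates and monitors nodes and keeps simple per-user accounting."""

    def __init__(self, total_nodes: int) -> None:
        self.free = total_nodes
        self.node_usage: Dict[str, int] = {}

    def allocate(self, job: Job) -> bool:
        if job.nodes > self.free:
            return False
        self.free -= job.nodes
        self.node_usage[job.user] = self.node_usage.get(job.user, 0) + job.nodes
        return True

    def release(self, job: Job) -> None:
        self.free += job.nodes


class JobScheduler:
    """Queues jobs and starts them first-come, first-served (an assumed policy)."""

    def __init__(self, rm: ResourceManager) -> None:
        self.rm = rm
        self.queue: Deque[Job] = deque()
        self.running: Dict[int, Job] = {}

    def schedule(self) -> List[int]:
        started: List[int] = []
        # Strict FCFS: stop at the first job that does not fit.
        while self.queue and self.rm.allocate(self.queue[0]):
            job = self.queue.popleft()
            self.running[job.job_id] = job
            started.append(job.job_id)
        return started

    def finish(self, job_id: int) -> None:
        self.rm.release(self.running.pop(job_id))


class UserServer:
    """Front end for users: submit, delete, and query jobs."""

    def __init__(self, scheduler: JobScheduler) -> None:
        self.scheduler = scheduler
        self.next_id = 1

    def submit(self, user: str, nodes: int) -> int:
        job = Job(self.next_id, user, nodes)
        self.next_id += 1
        self.scheduler.queue.append(job)
        return job.job_id

    def delete(self, job_id: int) -> bool:
        for job in self.scheduler.queue:
            if job.job_id == job_id:
                self.scheduler.queue.remove(job)
                return True  # return at once, so the loop never resumes after mutation
        return False

    def status(self, job_id: int) -> str:
        if job_id in self.scheduler.running:
            return "running"
        if any(j.job_id == job_id for j in self.scheduler.queue):
            return "queued"
        return "unknown"
```

On an 8-node cluster, submitting jobs needing 4, 6, and 2 nodes starts only the first; the 6-node job blocks the queue until the first job finishes, a known drawback of strict FCFS that backfilling policies address.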

Slide 37

Cluster Job Types

Serial jobs
Parallel jobs
Interactive jobs
Batch jobs
Foreign jobs

Slide 38

Multi-Job Scheduling Schemes

Slide 39

Share Cluster Nodes

Dedicated Mode
Space Sharing
Time Sharing
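The three modes differ in how nodes are divided among jobs. In dedicated mode one job holds the whole cluster; in space sharing each job gets its own disjoint partition of nodes; in time sharing jobs share the same nodes and take turns by time slice. The minimal sketch below assumes strict arrival-order allocation for space sharing and plain round-robin for time sharing; the function names are hypothetical.

```python
from typing import Dict, List


def space_share(requests: Dict[str, int], total_nodes: int) -> Dict[str, List[int]]:
    """Give each job a disjoint partition of nodes, in arrival order.

    Jobs that do not fit wait (strict first-come, first-served).
    """
    partitions: Dict[str, List[int]] = {}
    next_node = 0
    for job, need in requests.items():
        if next_node + need > total_nodes:
            break
        partitions[job] = list(range(next_node, next_node + need))
        next_node += need
    return partitions


def time_share(jobs: List[str], slices: int) -> List[str]:
    """Round-robin: all jobs share the same nodes, taking turns by time slice."""
    if not jobs:
        return []
    return [jobs[t % len(jobs)] for t in range(slices)]
```

Dedicated mode is the special case of `space_share` where a single job requests every node.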

Slide 40

Migration Scheme Issues

Node Availability
Migration Overhead
Recruitment Threshold: the amount of time a workstation stays unused before the cluster considers it an idle node
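The recruitment threshold and migration overhead interact when guest jobs run on borrowed workstations. A minimal sketch, assuming each workstation reports the time of its owner's last activity: nodes idle for at least the threshold are recruited, and when an owner returns, the guest job on that node is migrated to another free idle node (paying migration overhead) or suspended if none is free. All names and the eviction policy are hypothetical illustrations.

```python
from typing import Dict, List, Optional


def idle_nodes(last_activity: Dict[str, float], now: float,
               recruitment_threshold: float) -> List[str]:
    """Workstations unused for at least the recruitment threshold (seconds)."""
    return sorted(node for node, t in last_activity.items()
                  if now - t >= recruitment_threshold)


def evict_and_migrate(placement: Dict[str, str],
                      idle: List[str]) -> Dict[str, Optional[str]]:
    """`placement` maps guest job -> workstation it runs on.

    Jobs on workstations that are no longer idle (owner returned) move to a
    free idle workstation, or get None (suspended) if no idle node is free.
    """
    occupied = {node for node in placement.values() if node in idle}
    free = [n for n in idle if n not in occupied]
    new_placement: Dict[str, Optional[str]] = {}
    for job, node in sorted(placement.items()):
        if node in idle:
            new_placement[job] = node          # still idle: job stays put
        elif free:
            new_placement[job] = free.pop(0)   # migrate (incurs migration overhead)
        else:
            new_placement[job] = None          # no idle node available: suspend
    return new_placement
```

A larger threshold avoids recruiting a workstation whose owner has only stepped away briefly, at the cost of leaving idle cycles unused for longer.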

Slide 41

Virtual Clustering and Resource Provisioning

Slide 42

Five Virtual Cluster Research Projects

Slide 43

Live VM Migration and Cluster Management

Slide 44

Effect of Live Migration

Slide 45

Dynamic Virtual Resource Provisioning

Slide 46

Autonomic Adaptation of Virtual Environments

Slide 47

Some References and Further Reading

Slide 48

Homework Problems

Slide 49

Homework Problems