
Ohio Supercomputer Center Cluster Computing Overview Summer Institute for Advanced Computing August 22, 2000 Doug Johnson, OSC.

Page 1: Cluster Computing Overview

Ohio Supercomputer Center

Summer Institute for Advanced Computing
August 22, 2000

Doug Johnson, OSC

Page 2: Overview

- What is Cluster Computing
- Why Cluster Computing
- How Clusters Fit with OSC Mission
- When Did It All Start
- OSC 128 Processor SGI/Linux Cluster
- Clusters for Production HPC Environments

Page 3: What is Cluster Computing?

A cluster is a collection of interconnected whole computers used as a single, unified computer.

Cluster computing is many things:
- High performance computing
  - Run programs with parallel algorithms
- High throughput computing
  - Parametric studies (same program run many times with different parameters)
- High availability computing
  - Fail-over redundancy

Both scientific and commercial applications!

[Figure labels: Common Resources: CPU(s), Memory, Hard Drive, Network Card; NETWORK]
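A parametric study of the kind listed under high throughput computing can be sketched in a few lines. This is a minimal illustration, not anything from the original slides: `run_case`, its toy formula, and the parameter names are hypothetical, and a thread pool stands in for the independent cluster nodes that a batch system would normally provide.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_case(params):
    """Hypothetical stand-in for one run of the study program."""
    velocity, angle = params
    # Toy model only; a real study would launch the actual application here.
    return {"velocity": velocity, "angle": angle, "score": velocity * angle}


def parametric_study(velocities, angles, workers=4):
    """Run the same program once per parameter combination.

    Every case is independent of the others, which is what makes the
    workload "high throughput" rather than tightly coupled parallel work.
    """
    cases = list(product(velocities, angles))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input cases.
        return list(pool.map(run_case, cases))


results = parametric_study([1.0, 2.0], [10, 20, 30])
for row in results:
    print(row)
```

Because no case depends on another, the same pattern maps directly onto separate batch jobs, one per parameter combination.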

Page 4: Brief History of Cluster Computing at OSC

- Beowulf project at Center of Excellence in Space Data and Information Sciences (CESDIS) installs 1st cluster: 16 Intel 486 DX4 processors @ 100 MHz, 16 Mbytes RAM per processor, 10 Mbit Ethernet interconnect (3 per node)
- OSC installs “Beaker” systems, a dual purpose workstation cluster: 12 DEC Alpha EV4 processors with Full Duplex FDDI interconnect
- OSC installs “Trout” system, a dual purpose workstation cluster: 14 SGI O2 workstations, R10000 processors @ 150 MHz, ATM interconnect
- OSC 10 processor IA32 Linux Cluster: Pentium II 400 MHz processors, Myrinet interconnect, 4.5 Gbyte RAM
- OSC SGI/Linux 128 Processor Cluster: Pentium III Xeon 550 MHz processors, 66 Gbyte RAM, Myrinet and 100 Mbit Ethernet interconnect

Page 5: Why Parallel Computing

- Parallel computing is a strong presence at the national level and is the future of High Performance Computing (HPC)
- Parallel computing platforms are a vital element in our infrastructure
- Parallel systems have traditionally not been an accessible resource, compared to single processor systems
  - Higher cost (due mostly to high performance interconnect)
  - Less refined user interface
  - Non-traditional programming techniques with little training available

OSC Mission Statement

OSC provides a reliable high performance computing and communications infrastructure for a diverse, statewide/regional community including education, academic research, industry, and state government.

Page 6: Why Cluster Computing

- OSC evaluates new and emerging information technologies
  - Cluster computing is one of the hottest fields in high performance computing
- Potential benefits of clusters over traditional parallel systems
  - High performance interconnect technology is approaching commodity availability
  - Performance of commodity systems is increasing at an aggressive rate due to the commercial market of home/office workstations

OSC Mission Statement

...

In collaboration with this community, OSC evaluates, implements, and supports new and emerging information technologies.

...

Page 7: Why Cluster Computing (cont)

- Potential benefits of clusters over traditional parallel systems (cont)
  - Operating system gives users the same environment on their desk that they have on the parallel system
- Other differences
  - System administration implications
    - No single system image: OS and software upgrades must be applied to all nodes
    - Cluster design lends itself to more frequent hardware upgrades
  - Performance implications
  - Accounting/funding implications

Page 8: How Clusters Fit With OSC Mission

- OSC evaluates new and emerging information technologies
  - Multiple software packages have been evaluated to provide the most robust system
  - Four different network interconnects have been installed to evaluate performance
  - Three different processors and operating systems were investigated
- OSC implements new and emerging information technologies
  - A cluster under OSC administration has been available to users since March, 1999
  - OSC partnered with Portland Group to bring Cluster Development Kit to OSC users
- OSC supports new and emerging information technologies
  - OSC 128 processor cluster in production status
  - Training classes on how to build and use a cluster
  - Staff available to Ohio faculty to help answer questions and troubleshoot problems

OSC Mission Statement

...

In collaboration with this community, OSC evaluates, implements, and supports new and emerging information technologies.

...

Page 9: To Summarize

- Develop cluster technology so that it can be rolled out to university research labs
  - Provide a hardware and software configuration that will allow labs to construct a working cluster with minimal effort
  - Experienced OSC staff can provide technical assistance
- Evaluate software and hardware configurations to assist researchers in defining a system that will best suit their needs
  - Let the researchers focus on science
  - Based on user applications, provide performance analysis showing the optimal hardware and software configuration
- OSC wants to encourage parallel programming
  - Parallel programming is the future of high performance computing
  - Clusters provide increased access to parallel systems

Page 10: When Did It All Start?

December, 1998: OSC management authorizes a dedicated 10 processor cluster for technology evaluation

- 1 front end node
  - 2 Intel Pentium II 400 MHz processors
  - 512 Mbyte RAM, 18 Gbyte disk
- 4 compute nodes
  - 2 Intel Pentium II 400 MHz processors
  - 1 Gbyte RAM, 9 Gbyte disk
- Interconnects: 100 Mbit Ethernet, Dolphinics SCI, Myricom Myrinet
- Linux OS, PBS Batch System, PGI Compiler Suite

April, 1999: Performance evaluation yields promising results and the machine is opened to users

Page 11: OSC/SGI Cluster

- September, 1999: Agreement signed between OSC and SGI
- October, 1999: System powered on
- November, 1999: Machine configured and running applications on floor of Supercomputing 99
- December, 1999: Machine installed at OSC
- February, 2000: Machine opened to friendly users

Page 12: Hardware

- 1 front-end node configured with:
  - Two Gigabytes of RAM
  - Four 550 MHz Intel Pentium III Xeon processors, each with 512 kB of secondary cache
  - 48 Gigabytes, ultra-wide SCSI hard drives
  - Two 100Base-T Ethernet interfaces
  - One HIPPI interface
- 32 compute nodes each configured with:
  - Two Gigabytes of RAM
  - Four 550 MHz Intel Pentium III Xeon processors, each with 512 kB of secondary cache
  - 18 Gigabytes, ultra-wide SCSI hard drives
  - Two Myrinet interfaces
  - One 100Base-T Ethernet interface

All nodes are SGI 1400L servers
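The headline figures quoted for the two OSC Linux clusters can be cross-checked against the per-node configurations. The sketch below is only arithmetic on numbers from the slides; the `NodeGroup` type and `totals` helper are illustrative names, not part of any OSC tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    name: str
    count: int
    cpus_per_node: int
    ram_gb_per_node: float


def totals(groups):
    """Return (total processors, total RAM in Gbytes) for a list of node groups."""
    cpus = sum(g.count * g.cpus_per_node for g in groups)
    ram = sum(g.count * g.ram_gb_per_node for g in groups)
    return cpus, ram


# SGI/Linux cluster: 1 front-end node plus 32 compute nodes, 4 CPUs and 2 GB each.
sgi_cluster = [NodeGroup("front-end", 1, 4, 2.0), NodeGroup("compute", 32, 4, 2.0)]
# Evaluation cluster: 1 front end (512 MB) plus 4 compute nodes (1 GB), 2 CPUs each.
eval_cluster = [NodeGroup("front-end", 1, 2, 0.5), NodeGroup("compute", 4, 2, 1.0)]

sgi_cpus, sgi_ram = totals(sgi_cluster)
compute_cpus, _ = totals(sgi_cluster[1:])
eval_cpus, eval_ram = totals(eval_cluster)

print(f"SGI cluster: {compute_cpus} compute CPUs, {sgi_cpus} CPUs in all, {sgi_ram} GB RAM")
print(f"Evaluation cluster: {eval_cpus} CPUs, {eval_ram} GB RAM")
```

The "128 processor" figure matches the 32 compute nodes alone (32 x 4), while the 66 Gbyte RAM figure matches all 33 nodes including the front end. The earlier evaluation cluster comes to 10 processors and 4.5 Gbyte with its front end counted in both totals.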

Page 13: Software and Configuration

- Hardware originally assembled in Mountain View, CA by SGI Professional Services
- OS and software environment installed and configured by OSC staff
  - Linux operating system
  - Portable Batch System (PBS)
  - Portland Group Compiler Suite
  - Myrinet MPICH-GM interface

Page 14: Clusters for Production HPC Environment

- There are two significant efforts in building clusters
  - Building a cluster and making it operational
  - Making the cluster a production system
    - Ability to host multiple users simultaneously
    - Ability to schedule system resources
    - Ability to function without constant intervention
- The OSC cluster has the following attributes that make it a true HPC production system
  - Connection to a Mass Storage System (MSS)
  - Integrated into OSC account database system
  - Job accounting
  - Good utilization
  - High availability

Page 15: Mass Storage Support

[Figure: mass storage diagram. Labels: HIPPI; 100 Mbit (private); 100 Mbit Switch; Origin 2000; Data Migration Facility (DMF); 1 Terabyte disk storage; IBM 3494; 30 Terabyte tape storage]

Page 16: User Accounts and Accounting

- User Accounts
  - Cluster is integrated into the Center’s database system for automatic account generation and maintenance
- Job Accounting
  - Accounting that tracks users' CPU usage has been configured into the environment
  - CPU usage is converted with a charging algorithm and deducted from a Principal Investigator's account
  - Users can view accounting history with a text command from the Linux command prompt
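The slides do not give the charging algorithm itself, so the following is only a sketch of the general idea: CPU usage is multiplied by a rate and the result is deducted from the project balance held by a Principal Investigator. The rate, the project name, and every function name here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    user: str
    project: str      # project owned by a Principal Investigator (hypothetical id)
    cpu_hours: float  # CPU time recorded by the batch system


CHARGE_RATE = 0.1  # hypothetical: resource units charged per CPU-hour


def charge_jobs(balances, jobs, rate=CHARGE_RATE):
    """Return new project balances after deducting each job's charge.

    The input dictionary is not modified.
    """
    updated = dict(balances)
    for job in jobs:
        updated[job.project] = updated.get(job.project, 0.0) - job.cpu_hours * rate
    return updated


def history(jobs, user, rate=CHARGE_RATE):
    """Return (cpu_hours, charge) pairs for one user's jobs."""
    return [(j.cpu_hours, j.cpu_hours * rate) for j in jobs if j.user == user]


jobs = [Job("alice", "PROJ-A", 40.0), Job("bob", "PROJ-A", 10.0)]
balances = charge_jobs({"PROJ-A": 100.0}, jobs)
print(balances)
print(history(jobs, "alice"))
```

A linear rate is the simplest possible choice; a real charging algorithm could also weight by machine, queue, or memory use.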

Page 17: Utilization and Availability

- Utilization
  - System utilization is recorded and accessible via a web link
  - For parallel systems, utilization is expected to be around 50 to 70%
  - Current utilization is about 70% parallel and 30% serial
- Availability
  - Good availability has been achieved through significant uptime and minimal system problems
  - Downtime is scheduled every 4 weeks for software upgrades, hardware modifications and general system maintenance
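One common way to compute the utilization figures above is to divide consumed CPU-hours by available CPU-hours over a reporting period. The sketch below assumes that definition, and assumes that "70% parallel and 30% serial" refers to shares of consumed CPU time; neither is confirmed by the slides, and the job list is invented for illustration.

```python
def utilization(jobs, total_cpus, period_hours):
    """Fraction of available CPU-hours consumed by jobs.

    jobs is a list of (cpus, wall_hours) pairs.
    """
    used = sum(cpus * hours for cpus, hours in jobs)
    return used / (total_cpus * period_hours)


def parallel_share(jobs):
    """Fraction of consumed CPU-hours that came from multi-processor jobs."""
    used = sum(cpus * hours for cpus, hours in jobs)
    parallel = sum(cpus * hours for cpus, hours in jobs if cpus > 1)
    return parallel / used


# Invented one-day workload on 128 processors:
# two parallel jobs plus 24 single-processor jobs that each run all day.
jobs = [(64, 12.0), (32, 18.0)] + [(1, 24.0)] * 24

print(f"utilization:    {utilization(jobs, total_cpus=128, period_hours=24.0):.1%}")
print(f"parallel share: {parallel_share(jobs):.1%}")
```

With this workload the machine is 62.5% utilized, inside the 50 to 70% range quoted for parallel systems, and 70% of the consumed CPU time is parallel.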

Page 18: TCP Stream Performance

[Figure: TCP Stream Performance. Throughput in Megabits/Second (axis 0 to 350) versus Block Size in bytes (axis 0 to 9,000,000) for HIPPI, Gigabit Ethernet, and Fast Ethernet]

Page 19: TCP Stream Performance

[Figure: TCP Stream Performance. Throughput in Megabits/second (axis 0 to 300) versus Block Size in bytes (axis 0 to 250,000) for HIPPI, Gigabit Ethernet, and Fast Ethernet]

Page 20: UDP Stream Performance

./netperf -l 60 -H fe.ovl.osc.edu -i 10,2 -I 99,10 -t UDP_STREAM -- -m 1472 -s 32768 -S 32768

UDP UNIDIRECTIONAL SEND TEST to fe.ovl.osc.edu : +/-5.0% @ 99% conf.

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

131070    1472   59.99     3229909      0     634.03
524288           59.99     2169706            425.91
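The throughput column can be reproduced from the other columns as messages x message size x 8 bits / elapsed time. The sketch below does that arithmetic; it reads the first row as the sender's view and the second as the receiver's view, which is how netperf normally reports a UDP_STREAM test, so the gap between the two is data sent but not received.

```python
def throughput_mbps(messages, message_bytes, elapsed_secs):
    """Throughput in 10^6 bits/sec, the unit netperf prints."""
    return messages * message_bytes * 8 / elapsed_secs / 1e6


MESSAGE_BYTES = 1472  # the -m value from the command line
ELAPSED_SECS = 59.99

sent = throughput_mbps(3229909, MESSAGE_BYTES, ELAPSED_SECS)
received = throughput_mbps(2169706, MESSAGE_BYTES, ELAPSED_SECS)

# Fraction of sent messages that the receiver did not count.
loss_fraction = 1 - 2169706 / 3229909

print(f"sent:     {sent:.2f} Mbit/s")
print(f"received: {received:.2f} Mbit/s")
print(f"lost:     {loss_fraction:.1%} of messages")
```

Because UDP has no flow control, the receive-side figure is the more meaningful measure of deliverable bandwidth.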