Introduction to Grid & Cluster Computing
Sriram Krishnan, Ph.D., sriram@sdsc.edu
Post on 12-Jan-2016
Transcript
Introduction to Grid & Cluster Computing
Sriram Krishnan, Ph.D., sriram@sdsc.edu
Motivation: NBCR Example
[Architecture diagram: web portals and rich clients (PMV, ADT, Vision, Continuity, APBS Command, QMView) reach a set of biomedical applications (APBS, Continuity, Gtomo2, TxBR, Autodock, GAMESS) through web services and workflow middleware, running on cyber-infrastructure resources, including the Telescience Portal]
Cluster Resources
• “A computer cluster is a group of tightly coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer.” [Wikipedia]
• Typically built using commodity off-the-shelf hardware (processors, networking, etc.)
– Differs from traditional “supercomputers”
– Clusters now make up more than 70% of deployed Top500 machines
• Useful for: high availability, load balancing, scalability, visualization, and high performance
Grid Computing
• “Coordinated resource sharing and problem solving in dynamic multi-institutional virtual organizations.” [Foster, Kesselman, Tuecke]
– Coordinated: multiple resources working in concert, e.g. disk & CPU, or instruments & databases
– Resources: compute cycles, databases, files, application services, instruments
– Problem solving: focus on solving scientific problems
– Dynamic: environments that are changing in unpredictable ways
– Virtual organization: resources spanning multiple organizations and administrative, security, and technical domains
Grids are not the same as Clusters!
• Foster’s 3-point checklist:
– Resources not subject to centralized control
– Use of standard, open, general-purpose protocols and interfaces
– Delivery of non-trivial qualities of service
• Grids are typically made up of multiple clusters
Popular Misconception
• Misconception: Grids are all about CPU cycles
– CPU cycles are just one aspect; others are:
• Data: For publishing and accessing large collections of data, e.g. Geosciences Network (GEON) Grid
• Collaboration: For sharing access to instruments (e.g. TeleScience Grid), and collaboration tools (e.g. Global MMCS at IU)
SETI@Home
• Uses 1000s of Internet-connected PCs to help in the search for extraterrestrial intelligence
• When the computer is idle, the software downloads a ~1/2 MB chunk of data for analysis
• Results of the analysis are sent back to the SETI team and combined with those of 1000s of other participants
• Largest distributed computation project in existence (statistics from 2006):
– Total CPU time: 2,433,979.781 years
– Users: 5,436,301
NCMIR TeleScience Grid
[Diagram: imaging instruments, computational resources, large-scale databases, data acquisition & analysis, and advanced visualization linked through the TeleScience Grid]
* Slide courtesy TeleScience folks
NBCR Grid
[Diagram: rich clients (PMV/Vision, Kepler, Gemstone) invoke application services, security services (GAMA), and state management, which use Globus to reach a Condor pool, an SGE cluster, and a PBS cluster]
Day 1 - Using Grids and Clusters: Job Submission
• Scenario 1 - Clusters:
– Upload data to the remote cluster using scp
– Log on to that cluster using ssh
– Submit the job via the command line to a scheduler such as Condor or the Sun Grid Engine (SGE)
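The cluster scenario can be sketched as a short shell session. The job name, application binary, and hostname below are placeholders, and site-specific details (queues, parallel environments) will differ:

```shell
# Write a minimal SGE submit script locally (names are placeholders)
cat > job.sge <<'EOF'
#!/bin/bash
#$ -N apbs_run        # job name shown by qstat
#$ -cwd               # run the job in the submission directory
./apbs input.dat > output.dat
EOF

# Typical session against a remote cluster (hostname is a placeholder):
#   scp input.dat job.sge user@cluster.example.org:~/run/
#   ssh user@cluster.example.org
#   qsub run/job.sge      # SGE prints the assigned job id
#   qstat -u $USER        # poll the queue until the job finishes
```

Condor uses the same pattern with a different job-description syntax (`condor_submit` on a submit file instead of `qsub` on a `#$`-annotated script).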
• Scenario 2 - Grids:
– Upload data to the Grid resource using GridFTP
– Submit the job via Globus command-line tools (e.g. globusrun) to remote resources
• Globus services communicate with the resource-specific schedulers
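The Grid scenario can be sketched with GT2-era tools. The RSL attributes below are standard, but the hostname, paths, and jobmanager name are placeholders:

```shell
# Write a minimal Globus RSL job description (paths are placeholders)
cat > job.rsl <<'EOF'
&(executable=/usr/local/bin/apbs)
 (arguments=input.dat)
 (stdout=output.dat)
EOF

# Typical session (hostname is a placeholder):
#   grid-proxy-init                                    # obtain a short-lived proxy credential
#   globus-url-copy file://$PWD/input.dat \
#       gsiftp://cluster.example.org/~/run/input.dat   # GridFTP transfer
#   globusrun -r cluster.example.org/jobmanager-pbs -f job.rsl
```

The `jobmanager-pbs` suffix is what lets Globus hand the job to the resource-specific scheduler (here PBS) on the remote side.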
Day 1 - Using Grids & Clusters: Security
Day 1 - Using Grids & Clusters: User Interfaces
Day 2 - Managing Cluster Environments
• Clusters are great price/performance computational engines
– Can be hard to manage without experience
– Failure rate increases with cluster size
• Not cost-effective if maintenance is more expensive than the cluster itself
– System administrators can cost more than the clusters they manage (a 1 Tflops cluster costs under $100,000)
Day 2 - Rocks (Open Source Clustering Distribution)
• Technology transfer of commodity clustering to application scientists
– Making clusters easy
– Scientists can build their own supercomputers
• The Rocks distribution is a set of CDs:
– Red Hat Enterprise Linux
– Clustering software (PBS, SGE, Ganglia, Globus)
– Highly programmatic software configuration management
• http://www.rocksclusters.org
Day 2 - Rocks Rolls
Day 3 - Advanced Usage Scenarios: Workflows
• Scientific workflows emerged as an answer to the need to combine multiple cyberinfrastructure components in automated process networks
• A combination of:
– Data integration, analysis, and visualization steps
– An automated “scientific process”
• Promotes scientific discovery
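The "automated process network" idea can be illustrated with a toy three-stage pipeline; the stages and data below are invented stand-ins for real integration, analysis, and visualization components:

```shell
set -e
printf '3\n1\n2\n' > raw.dat                  # stand-in for acquired data
sort -n raw.dat > integrated.dat              # data-integration step
awk '{s += $1} END {print "sum:", s}' integrated.dat > analysis.txt   # analysis step
cat analysis.txt                              # stand-in for a visualization step; prints "sum: 6"
```

A workflow system such as Kepler captures exactly this kind of dependency chain graphically, with each step modeled as an actor, so the "scientific process" is automated and repeatable rather than run by hand.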
Day 3 - The Big Picture: Scientific Workflows
• Example: John Blondin (NC State), Astrophysics, Terascale Supernova Initiative (SciDAC, DOE)
• From “napkin drawings” (conceptual SWF) to executable workflows (executable SWF)
• Source: Mladen Vouk (NCSU)
Day 3 - Kepler Workflows: A Closer Look
Day 3 - Advanced Usage Scenarios: MetaScheduling
• Local schedulers are responsible for load balancing and resource sharing within each local administrative domain
• Meta-schedulers are responsible for querying, negotiating access to, and managing resources that exist within different administrative domains in Grid systems
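As a toy illustration of the meta-scheduling decision, the snippet below picks the least-loaded of three clusters. In a real system the load figures would come from each site's information services (e.g. WS-MDS) rather than a hard-coded file, and the cluster names are invented:

```shell
# Hard-coded stand-in for load data gathered from each site's local scheduler
printf 'clusterA 0.80\nclusterB 0.35\nclusterC 0.60\n' > loads.txt

# Pick the site with the lowest load; a real meta-scheduler would now
# forward the job to that site's local scheduler (e.g. via GRAM)
target=$(sort -k2 -n loads.txt | head -1 | cut -d' ' -f1)
echo "submit to: $target"    # prints "submit to: clusterB"
```

The hard part a real meta-scheduler adds on top of this selection step is negotiating access across administrative domains, which is why systems like CSF4 build on Grid security and GRAM rather than plain ssh.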
Day 3 - MetaSchedulers: CSF4
• What is the CSF Meta-Scheduler?
– Community Scheduler Framework
– CSF4 is a group of Grid services hosted inside the Globus Toolkit (GT4)
– CSF4 is fully WSRF-compliant
– An open-source project, available at http://sourceforge.net/projects/gcsf
– The CSF4 development team is from Jilin University, PRC
Day 3 - CSF4 Architecture
[Architecture diagram: CSF4 services (queuing service, job service, reservation service, resource manager factory service, and resource manager services for LSF and GRAM) dispatch jobs through WS-GRAM (GT4) and a GT2 GateKeeper; GRAM adapters (GramPBS, GramSGE, GramCondor, GramLSF, GramFork, gabd) hand jobs to the local schedulers (PBS, SGE, Condor, LSF) on each machine, while WS-MDS supplies meta-information about the Grid environment]
Day 4 - Accessing TeraScale Resources
• I need more resources! What are my options?
– TeraGrid: “With 20 petabytes of storage, and more than 280 teraflops of computing power, TeraGrid combines the processing power of supercomputers across the continent”
– PRAGMA: “To establish sustained collaborations and advance the use of grid technologies in applications among a community of investigators working with leading institutions around the Pacific Rim”
Day 4 - TeraGrid
• TeraGrid is a “top-down”, planned Grid
• Extensible Terascale Facility
• Members: IU, ORNL, NCSA, PSC, Purdue, SDSC, TACC, ANL, NCAR
• 280 Tflops of computing capability
• 30 PB of distributed storage
• High performance networking between partner sites
• Linux-based software environment, uniform administration
• Focus is a national, production Grid
PRAGMA Grid Member Institutions
31 institutions in 15 countries/regions (+ 7 in preparation)
– Switzerland: UZurich
– Thailand: NECTEC, ThaiGrid
– India: UoHyd
– Malaysia: MIMOS, USM
– Hong Kong: CUHK
– Taiwan: ASGC, NCHC
– Vietnam: HCMUT, IOIT-HCM
– Japan: AIST, OsakaU, UTsukuba, TITech
– Singapore: BII, IHPC, NGO, NTU
– Australia: MU, APAC, QUT
– Korea: KISTI
– China: JLU, CNIC, GUCAS, LZU
– USA: SDSC, UUtah, NCSA, BU
– Mexico: CICESE, UNAM
– Chile: UCN, UChile
– Costa Rica: ITCR
– New Zealand: BESTGrid
– Puerto Rico: UPRM
Track 1: Agenda (9AM-12PM at PFBH 161)
• Tues, July 31: Basic Cluster and Grid Computing Environment
• Wed, Aug 1: Rocks Clusters and Application Deployment
• Thurs, Aug 2: Workflow Management and MetaScheduling
• Fri, Aug 3: Accessing National and International TeraScale Resources