Introduction to Grid & Cluster Computing
Sriram Krishnan, Ph.D., sriram@sdsc.edu
Post on 12-Jan-2016
Transcript
Introduction to Grid & Cluster Computing
Sriram Krishnan, Ph.D., sriram@sdsc.edu
Motivation: NBCR Example
[Architecture diagram: web portals and rich clients (PMV, ADT, Vision, Continuity, APBS Command, QMView) reach a set of biomedical applications (APBS, Continuity, Gtomo2, TxBR, Autodock, GAMESS) through web services and workflow middleware, running on cyber-infrastructure resources, including the Telescience Portal]
Cluster Resources
• “A computer cluster is a group of tightly coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer.” [Wikipedia]
• Typically built using commodity off-the-shelf hardware (processors, networking, etc.)
– Differs from traditional “supercomputers”
– Clusters now make up more than 70% of deployed Top500 machines
• Useful for: high availability, load balancing, scalability, visualization, and high performance
Grid Computing
• “Coordinated resource sharing and problem solving in dynamic multi-institutional virtual organizations.” [Foster, Kesselman, Tuecke]
– Coordinated: multiple resources working in concert, e.g. disk & CPU, or instruments & databases
– Resources: compute cycles, databases, files, application services, instruments
– Problem solving: focus on solving scientific problems
– Dynamic: environments that are changing in unpredictable ways
– Virtual organization: resources spanning multiple organizations and administrative, security, and technical domains
Grids are not the same as Clusters!
• Foster’s 3-point checklist:
– Resources not subject to centralized control
– Use of standard, open, general-purpose protocols and interfaces
– Delivery of non-trivial qualities of service
• Grids are typically made up of multiple clusters
Popular Misconception
• Misconception: Grids are all about CPU cycles
– CPU cycles are just one aspect; others are:
• Data: For publishing and accessing large collections of data, e.g. Geosciences Network (GEON) Grid
• Collaboration: For sharing access to instruments (e.g. TeleScience Grid), and collaboration tools (e.g. Global MMCS at IU)
SETI@Home
• Uses 1000s of Internet-connected PCs to help in the search for extraterrestrial intelligence
• When the computer is idle, the software downloads a ~1/2 MB chunk of data for analysis
• Results of the analysis are sent back to the SETI team and combined with those of 1000s of other participants
• Largest distributed computation project in existence (statistics from 2006):
– Total CPU time: 2,433,979.781 years
– Users: 5,436,301
NCMIR TeleScience Grid
[Diagram: imaging instruments, computational resources, large-scale databases, data acquisition & analysis, and advanced visualization linked through the TeleScience Grid]
* Slide courtesy TeleScience folks
NBCR Grid
[Diagram: rich clients (PMV/Vision, Kepler, Gemstone) invoke application services, security services (GAMA), and state management, which use Globus to reach a Condor pool, an SGE cluster, and a PBS cluster]
Day 1 - Using Grids and Clusters: Job Submission
• Scenario 1 - Clusters:
– Upload data to the remote cluster using scp
– Log on to that cluster using ssh
– Submit the job via the command line to a scheduler such as Condor or the Sun Grid Engine (SGE)
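The cluster scenario can be sketched as a short shell session. The job name, application binary, and hostname below are placeholders, and site-specific details (queues, parallel environments) will differ:

```shell
# Write a minimal SGE submit script locally (names are placeholders)
cat > job.sge <<'EOF'
#!/bin/bash
#$ -N apbs_run        # job name shown by qstat
#$ -cwd               # run the job in the submission directory
./apbs input.dat > output.dat
EOF

# Typical session against a remote cluster (hostname is a placeholder):
#   scp input.dat job.sge user@cluster.example.org:~/run/
#   ssh user@cluster.example.org
#   qsub run/job.sge      # SGE prints the assigned job id
#   qstat -u $USER        # poll the queue until the job finishes
```

Condor uses the same pattern with a different job-description syntax (`condor_submit` on a submit file instead of `qsub` on a `#$`-annotated script).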
• Scenario 2 - Grids:
– Upload data to the Grid resource using GridFTP
– Submit the job via Globus command-line tools (e.g. globusrun) to remote resources
• Globus services communicate with the resource-specific schedulers
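The Grid scenario can be sketched with GT2-era tools. The RSL attributes below are standard, but the hostname, paths, and jobmanager name are placeholders:

```shell
# Write a minimal Globus RSL job description (paths are placeholders)
cat > job.rsl <<'EOF'
&(executable=/usr/local/bin/apbs)
 (arguments=input.dat)
 (stdout=output.dat)
EOF

# Typical session (hostname is a placeholder):
#   grid-proxy-init                                    # obtain a short-lived proxy credential
#   globus-url-copy file://$PWD/input.dat \
#       gsiftp://cluster.example.org/~/run/input.dat   # GridFTP transfer
#   globusrun -r cluster.example.org/jobmanager-pbs -f job.rsl
```

The `jobmanager-pbs` suffix is what lets Globus hand the job to the resource-specific scheduler (here PBS) on the remote side.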
Day 1 - Using Grids & Clusters: Security
Day 1 - Using Grids & Clusters: User Interfaces
Day 2 - Managing Cluster Environments
• Clusters are great price/performance computational engines
– Can be hard to manage without experience
– Failure rate increases with cluster size
• Not cost-effective if maintenance is more expensive than the cluster itself
– System administrators can cost more than the clusters they manage (a 1 Tflops cluster costs under $100,000)
Day 2 - Rocks (Open Source Clustering Distribution)
• Technology transfer of commodity clustering to application scientists
– Making clusters easy
– Scientists can build their own supercomputers
• The Rocks distribution is a set of CDs:
– Red Hat Enterprise Linux
– Clustering software (PBS, SGE, Ganglia, Globus)
– Highly programmatic software configuration management
• http://www.rocksclusters.org
Day 2 - Rocks Rolls
Day 3 - Advanced Usage Scenarios: Workflows
• Scientific workflows emerged as an answer to the need to combine multiple cyberinfrastructure components in automated process networks
• A combination of:
– Data integration, analysis, and visualization steps
– An automated “scientific process”
• Promotes scientific discovery
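The "automated process network" idea can be illustrated with a toy three-stage pipeline; the stages and data below are invented stand-ins for real integration, analysis, and visualization components:

```shell
set -e
printf '3\n1\n2\n' > raw.dat                  # stand-in for acquired data
sort -n raw.dat > integrated.dat              # data-integration step
awk '{s += $1} END {print "sum:", s}' integrated.dat > analysis.txt   # analysis step
cat analysis.txt                              # stand-in for a visualization step; prints "sum: 6"
```

A workflow system such as Kepler captures exactly this kind of dependency chain graphically, with each step modeled as an actor, so the "scientific process" is automated and repeatable rather than run by hand.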
Day 3 - The Big Picture: Scientific Workflows
• Example: John Blondin (NC State), Astrophysics, Terascale Supernova Initiative (SciDAC, DOE)
• From “napkin drawings” (conceptual SWF) to executable workflows (executable SWF)
• Source: Mladen Vouk (NCSU)
Day 3 - Kepler Workflows: A Closer Look
Day 3 - Advanced Usage Scenarios: MetaScheduling
• Local schedulers are responsible for load balancing and resource sharing within each local administrative domain
• Meta-schedulers are responsible for querying, negotiating access to, and managing resources that exist within different administrative domains in Grid systems
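As a toy illustration of the meta-scheduling decision, the snippet below picks the least-loaded of three clusters. In a real system the load figures would come from each site's information services (e.g. WS-MDS) rather than a hard-coded file, and the cluster names are invented:

```shell
# Hard-coded stand-in for load data gathered from each site's local scheduler
printf 'clusterA 0.80\nclusterB 0.35\nclusterC 0.60\n' > loads.txt

# Pick the site with the lowest load; a real meta-scheduler would now
# forward the job to that site's local scheduler (e.g. via GRAM)
target=$(sort -k2 -n loads.txt | head -1 | cut -d' ' -f1)
echo "submit to: $target"    # prints "submit to: clusterB"
```

The hard part a real meta-scheduler adds on top of this selection step is negotiating access across administrative domains, which is why systems like CSF4 build on Grid security and GRAM rather than plain ssh.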
Day 3 - MetaSchedulers: CSF4
• What is the CSF Meta-Scheduler?
– Community Scheduler Framework
– CSF4 is a group of Grid services hosted inside the Globus Toolkit (GT4)
– CSF4 is fully WSRF-compliant
– An open-source project, available at http://sourceforge.net/projects/gcsf
– The CSF4 development team is from Jilin University, PRC
Day 3 - CSF4 Architecture
[Architecture diagram: CSF4 services (queuing service, job service, reservation service, resource manager factory service, and resource manager services for LSF and GRAM) dispatch jobs through WS-GRAM (GT4) and a GT2 GateKeeper; GRAM adapters (GramPBS, GramSGE, GramCondor, GramLSF, GramFork, gabd) hand jobs to the local schedulers (PBS, SGE, Condor, LSF) on each machine, while WS-MDS supplies meta-information about the Grid environment]
Day 4 - Accessing TeraScale Resources
• I need more resources! What are my options?
– TeraGrid: “With 20 petabytes of storage, and more than 280 teraflops of computing power, TeraGrid combines the processing power of supercomputers across the continent”
– PRAGMA: “To establish sustained collaborations and advance the use of grid technologies in applications among a community of investigators working with leading institutions around the Pacific Rim”
Day 4 - TeraGrid
• TeraGrid is a “top-down”, planned Grid
• Extensible Terascale Facility
• Members: IU, ORNL, NCSA, PSC, Purdue, SDSC, TACC, ANL, NCAR
• 280 Tflops of computing capability
• 30 PB of distributed storage
• High performance networking between partner sites
• Linux-based software environment, uniform administration
• Focus is a national, production Grid
PRAGMA Grid Member Institutions
31 institutions in 15 countries/regions (+ 7 in preparation)
– Switzerland: UZurich
– Thailand: NECTEC, ThaiGrid
– India: UoHyd
– Malaysia: MIMOS, USM
– Hong Kong: CUHK
– Taiwan: ASGC, NCHC
– Vietnam: HCMUT, IOIT-HCM
– Japan: AIST, OsakaU, UTsukuba, TITech
– Singapore: BII, IHPC, NGO, NTU
– Australia: MU, APAC, QUT
– Korea: KISTI
– China: JLU, CNIC, GUCAS, LZU
– USA: SDSC, UUtah, NCSA, BU
– Mexico: CICESE, UNAM
– Chile: UCN, UChile
– Costa Rica: ITCR
– New Zealand: BESTGrid
– Puerto Rico: UPRM
Track 1: Agenda (9AM-12PM at PFBH 161)
• Tues, July 31: Basic Cluster and Grid Computing Environment
• Wed, Aug 1: Rocks Clusters and Application Deployment
• Thurs, Aug 2: Workflow Management and MetaScheduling
• Fri, Aug 3: Accessing National and International TeraScale Resources