Grid Computing
Chip Watson, Jefferson Lab
Hall B Collaboration Meeting, 1-Nov-2001

Transcript
Page 1: Grid Computing

Chip Watson, Jefferson Lab

Hall B Collaboration Meeting, 1-Nov-2001

Page 2: What is the Grid?

Some wax philosophical and say it is unlimited capacity for computing: like the power grid, you just plug in and use it, without caring who provides it.

Difficulty: “metering” your use of resources, and charging for them. We aren’t there yet.

Simpler view: it is a large computer center, with a geographically distributed file system and batch system.

This view assumes you have the right to use each piece of the distributed system, subject perhaps to local accounting constraints.

Page 3: Key Aspects of the Grid

Data Grid: Location-independent file system. If you know the “logical name” of a data set, you can find it (normal access controls apply). Files can migrate around the grid to optimize usage, and may exist in multiple locations.

Computational Grid: Submit a job to “the grid”. You describe the requirements of your job, and grid middleware finds an appropriate place to run it. Jobs can be batch, or even interactive.

Page 4: Other Important Aspects

Single Sign-On: You “log on” to the grid once, and you can use the distributed resources for a certain period of time (sort of like the AFS file system).

Analog: an all-day metro ticket.

Page 5: Distributed Computing Model

In the “old” model, a lab has a large computer center, provisioned for all demanding data storage, analysis, and simulation requirements.

In the “current” model, only a fraction resides at the lab. This model is already widely used in HEP experiments; a large experiment may enlist a major computing partner site, e.g. IN2P3 for BaBar.

In the “new” model, many sites, large and small, participate. Some sites may be special based upon capacity or specialized capabilities (such as robotic storage). LHC will use a 3-tier model, with a large central facility (tier 0) distributing data to moderately large national centers (tier 1), which in turn service small nodes (tier 2).

What is a reasonable distribution for Hall D?

Page 6: Why desert a working model?

Easier to get additional funds
– State matching funds
– Also: NSF or other funding agencies

Easier to involve students
– A room full of computers is more attractive than an account on a machine 1000 km away

Opportunity for innovation
– Easier to play with a local machine than to get root access on a machine 1000 km away

Page 7: Case Study: The Lattice Portal

A prototype virtual computer center for Jefferson Lab (under development)

Page 8: Contents

Components of the virtual computer center
1. Data management
2. Batch system
3. Interactive system

Architectural components
1. Information Services using XML (Java servlets)
   – Replica Catalog, Data Grid Server (file cache & transfer agent), Batch Server
2. Authentication using X.509, SSL
3. Java client packages

Page 9: A Virtual Computer Center: Data Management

Global Logical File System (possibly constrained to a project)
1. Logical names for files (location independent)
2. Active files possibly cached at multiple sites
3. Inactive files in off-line storage (tape silo, multi-site)

Data Grid Node
– Manages a cache of logical files, perhaps using multiple file servers; NFS-exports files locally
– Maps logical name to local (physical) file name
– Supports file transfers between managed and unmanaged storage, and between grid nodes (queuing of transfer requests)

Replica Catalog
– Tracks which logical files exist at which data grid nodes
– Contains some file meta-data to allow file selection by attributes as well as by name
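
As an illustration of the logical-to-physical mapping a data grid node maintains, here is a minimal Java sketch; the class and method names are hypothetical, not the actual JASMine or Lattice Portal API.

    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;

    // Minimal sketch of a data grid node's local cache: it maps
    // location-independent logical names onto physical files in the
    // locally managed disk cache.
    public class DataGridNodeCache {
        private final Map<String, File> logicalToPhysical = new HashMap<String, File>();

        // Register a logical file that has been cached locally.
        public void addReplica(String logicalName, File physicalFile) {
            logicalToPhysical.put(logicalName, physicalFile);
        }

        // Resolve a logical name; returns null if the file is not cached
        // here (the client would then consult the replica catalog for
        // another node, or request a stage from off-line storage).
        public File locate(String logicalName) {
            return logicalToPhysical.get(logicalName);
        }
    }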

Page 10: In picture form…

[Diagram: a client program (with client library) talks to a MetaData Catalog host, a Replica Catalog host, and a Data Grid Server / file server host, in the following sequence:]

1. Get file names from meta-data (energy, target, magnet settings).
2. Contact the replica catalog to locate the desired file; get a referral to a Data Grid Server.
3. From the Data Grid Server, get the file state (on disk), additional info, and a referral to a transfer agent.
4. Get the file (parallel streams).
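
A client-side Java sketch of this sequence follows. All of the service interfaces (MetaDataCatalog, ReplicaCatalog, DataGridServer, FileTransfer) and the attribute query string are hypothetical stand-ins for the portal's web services, shown only to make the flow of referrals concrete.

    import java.util.List;

    // Hypothetical client-side interfaces mirroring the steps in the diagram.
    interface MetaDataCatalog {
        // Step 1: select logical file names by attributes
        // (energy, target, magnet settings, ...).
        List<String> findLogicalNames(String attributeQuery);
    }

    interface ReplicaCatalog {
        // Step 2: locate the desired file and return a referral
        // to a data grid server holding a replica.
        DataGridServer findDataGridServer(String logicalName);
    }

    interface DataGridServer {
        // Step 3: file state (on disk?) and a referral to a transfer agent.
        boolean isOnDisk(String logicalName);
        String transferAgentUrl(String logicalName);
    }

    // Hypothetical transfer client (e.g. wrapping a parallel-stream protocol).
    interface FileTransfer {
        void get(String agentUrl, String logicalName);
    }

    public class FetchExample {
        public static void fetch(MetaDataCatalog mdc, ReplicaCatalog rc, FileTransfer transfer) {
            for (String logical : mdc.findLogicalNames("energy=5.7GeV target=LH2")) {
                DataGridServer dgs = rc.findDataGridServer(logical); // step 2: referral
                if (dgs.isOnDisk(logical)) {                         // step 3: file state
                    // Step 4: fetch the file via the transfer agent (parallel streams).
                    transfer.get(dgs.transferAgentUrl(logical), logical);
                }
            }
        }
    }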

Page 11: A Virtual Computer Center: Batch System

Global Queue(s)

A user submits a batch job, perhaps using a web interface, to the virtual computer center (a.k.a. meta-facility). Based upon the locations of the executable, the input data files, and the necessary compute resources, the job is assigned to a particular compute grid node (cluster of machines).

Compute Grid Node: Set of co-located compute resources managed by a batch system. Typically co-located with a data grid node, e.g. Jefferson Lab’s Computer Center.
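
To make “describing the requirements of your job” concrete, here is a hedged sketch of the kind of XML job description a client might build before posting it to the batch web service; the element names and paths are illustrative, not the actual Lattice Portal schema.

    // Illustrative only: builds an XML job description of the kind the
    // batch web service might accept. Element names and paths are hypothetical.
    public class JobRequestExample {
        public static String buildRequest() {
            StringBuilder xml = new StringBuilder();
            xml.append("<jobRequest>\n");
            xml.append("  <executable logicalName=\"/project/bin/simulate\"/>\n");
            xml.append("  <input logicalName=\"/project/raw/run12345.evt\"/>\n");
            xml.append("  <output logicalName=\"/project/sim/run12345.out\"/>\n");
            xml.append("  <requirements cpuHours=\"24\" memoryMB=\"512\" diskGB=\"10\"/>\n");
            xml.append("</jobRequest>\n");
            return xml.toString();
        }
    }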

Page 12: Virtual Computer Center: Interactive

Conventional remote login is expected to be less common, as all capabilities are remotely accessible. Nevertheless…

Interactive Services
1. ssh login to a machine of the desired architecture and operating system
2. interactive access to small clusters for serial and parallel jobs (or fast turnaround on a local batch system)

Page 13: Implementation?

As with any distributed system, there are many ways to construct a meta-facility or grid:

– CORBA (distributed object system)
– DCOM (Windows only)
– Custom protocols over TCP/IP or UDP/IP
– Grid middleware
  – Globus (from ANL)
  – Legion (UVA)
– Web Services

… or some combination of the above

Page 14: What are Web Services?

Web Services are functions or methods that can be accessed across the web.

Think of this as a “better” RPC (remote procedure call) system. Why better?
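
As a flavor of what “RPC across the web” looks like in practice, here is a minimal Java sketch that calls a web service by POSTing a hand-built SOAP envelope with plain HttpURLConnection. The endpoint URL, namespace, and method name are made-up examples; a real client would more likely use a SOAP library such as Apache SOAP (mentioned later).

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Minimal sketch: invoke a web service method by POSTing a SOAP envelope.
    // Endpoint, namespace, and method name are hypothetical examples.
    public class SoapCallExample {
        public static void main(String[] args) throws Exception {
            String envelope =
                "<?xml version=\"1.0\"?>\n" +
                "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\">\n" +
                "  <SOAP-ENV:Body>\n" +
                "    <listDirectory xmlns=\"urn:example-datagrid\">\n" +
                "      <path>/example/raw</path>\n" +
                "    </listDirectory>\n" +
                "  </SOAP-ENV:Body>\n" +
                "</SOAP-ENV:Envelope>\n";

            URL url = new URL("https://portal.example.org/datagrid");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
            conn.setRequestProperty("SOAPAction", "\"urn:example-datagrid#listDirectory\"");

            OutputStream out = conn.getOutputStream();
            out.write(envelope.getBytes("UTF-8"));
            out.close();

            // Read the XML reply (here, a directory listing) and print it.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
            in.close();
        }
    }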

Page 15: Why Web Services?

– Use of industry standards: HTTP, HTTPS, XML, SOAP, WSDL, UDDI, …
– Support for many languages, compiled and scripted
– Self-describing protocols: easier management of versioning and evolution
– Support for authentication
– Strong industry support: Microsoft’s .NET initiative, Sun’s ONE (Open Net Environment), IBM contributions to Apache SOAP

Page 16: A three tier web services architecture

[Diagram: a web browser connects over authenticated connections to a web server (portal) hosting an XML-to-HTML servlet and several web services. The portal's web services and grid service drive local backend services (batch, file, etc.), and also connect to a web service on a remote web server fronting its own storage system, batch system, and grid resources, e.g. Condor. Caption: A three tier web services architecture.]

Page 17: Web Services Details: Data Grid

Replica Catalog & Data Grid Node
– List – contents of a directory
– Navigate – to another directory, or follow a soft link
– Mkdir – make a new directory
– Link – make a new link
– Delete – a logical file, directory, or link
– Properties – set / retrieve properties of a file, directory, or link (including protection / access control)

Replica Catalog specific
– Create – a new logical file
– Add/Remove/Select/Delete replica – manipulate references to where a file is stored

Data Grid Node specific
– Allocate – space for an incoming file
– Copy – a file to/from unmanaged space, or another grid node
– Locate – get a reference to the physical file for transfer or local access
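
One way to picture this operation set is as a pair of Java client interfaces; the names and signatures below are an illustrative sketch of the list above, not the portal's actual API.

    import java.util.List;
    import java.util.Map;

    // Illustrative sketch only; names and signatures are hypothetical.
    interface GridNamespace {                         // common to replica catalog & data grid node
        List<String> list(String directory);          // contents of a directory
        String navigate(String path);                 // to another directory, or follow a soft link
        void mkdir(String directory);                 // make a new directory
        void link(String linkName, String target);    // make a new link
        void delete(String path);                     // logical file, directory, or link
        Map<String, String> getProperties(String path);
        void setProperties(String path, Map<String, String> props); // incl. access control
    }

    interface ReplicaCatalogService extends GridNamespace {
        void create(String logicalName);                            // a new logical file
        void addReplica(String logicalName, String dataGridNode);   // manipulate references
        void removeReplica(String logicalName, String dataGridNode);
        List<String> selectReplicas(String logicalName);            // where the file is stored
    }

    interface DataGridNodeService extends GridNamespace {
        void allocate(String logicalName, long bytes);  // space for an incoming file
        void copy(String source, String destination);   // to/from unmanaged space, or another node
        String locate(String logicalName);              // reference to the physical file
    }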

Page 18: Web Services Details: Batch System

User Job Operations
– Submit
  – Resource requirements (CPU, memory, disk, net, …)
  – Dependencies on other jobs / events
  – Executables, libraries, etc., input files, output files
– Cancel
– Suspend / Resume
– List – by queue, owner, site, …
– View allocation, usage

Operator Operations
– On systems, queues, jobs
– On quota / allocation system
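
The same exercise for the batch system operations, again as a hypothetical Java sketch rather than the real interface:

    import java.util.List;

    // Illustrative sketch of the batch web service operations listed above;
    // all names, parameters, and types are hypothetical.
    interface BatchService {
        String submit(String jobDescriptionXml);            // returns a job id
        void cancel(String jobId);
        void suspend(String jobId);
        void resume(String jobId);
        List<String> list(String queue, String owner, String site);
        String viewAllocation(String owner);                // allocation and usage
    }

    interface BatchOperatorService extends BatchService {
        void disableQueue(String queue);                    // example action on systems/queues/jobs
        void setQuota(String owner, double cpuHours);       // quota / allocation system
    }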

Page 19: Technology Choice: XML + …

Advantages
– Self-describing data (contains meta-data)
– Facilitates heterogeneous systems
– Robust against evolution (no fragile versioning of the kind distributed object systems encounter)
  – A new server generates additional tags, which are ignored by an old client
  – A new client detects the absence of new tags and knows it is talking to an old server (and/or supplies defaults)
– Capable of defining all key concepts and operations for both client-server and client-portal communications

Technologies
– XML – eXtensible Markup Language
– SOAP – Simple Object Access Protocol (~modern RPC system)
– WSDL – Web Services Description Language (~IDL)
– UDDI – Universal Description, Discovery and Integration
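
A small Java sketch of the evolution point: a client that reads an XML reply with DOM simply ignores elements it does not recognize, and supplies a default when an expected element is absent. The element names are illustrative.

    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    // Sketch: tolerant parsing of a (hypothetical) file-status reply.
    // Unknown elements added by a newer server are silently ignored;
    // an element missing from an older server's reply gets a default.
    public class TolerantClientExample {
        public static void main(String[] args) throws Exception {
            String reply =
                "<fileStatus>" +
                "  <logicalName>/example/raw/run12345.evt</logicalName>" +
                "  <onDisk>true</onDisk>" +
                "  <checksum>ab12cd34</checksum>" +  // newer tag: an old client ignores it
                "</fileStatus>";

            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(reply)));

            String name = text(doc, "logicalName", "(unknown)");
            boolean onDisk = Boolean.parseBoolean(text(doc, "onDisk", "false"));
            // "pinned" is absent here (an older server): the default applies.
            boolean pinned = Boolean.parseBoolean(text(doc, "pinned", "false"));

            System.out.println(name + " onDisk=" + onDisk + " pinned=" + pinned);
        }

        // Text of the first element with this tag name, or a default if absent.
        private static String text(Document doc, String tag, String dflt) {
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : dflt;
        }
    }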

Page 20: Technology Choices: Java Servlets

Java Advantages
1. Rapid code development
2. No memory leaks
3. Easy-to-use interfaces to SQL databases, XML libraries
4. Rich library of low-level components (containers, etc.)

Web + Servlet Advantages
1. Java (see above)
2. Scalability (see e-commerce)
3. Modular web services: one servlet can invoke another, e.g. to translate XML to HTML

Minor Web Inconvenience
1. Asynchronous notification of clients of web services
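
For readers unfamiliar with servlets, here is a minimal sketch of an information service of the kind listed on the Contents slide: a servlet that answers an HTTP GET with a small XML document. The class, URL, and element names are illustrative, not the Lattice Portal's code.

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Minimal information-service sketch: answers GET /status?file=<logical name>
    // with an XML reply. Runs inside a servlet engine such as Tomcat.
    public class FileStatusServlet extends HttpServlet {
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws IOException {
            String logicalName = request.getParameter("file");

            response.setContentType("text/xml");
            PrintWriter out = response.getWriter();
            out.println("<?xml version=\"1.0\"?>");
            out.println("<fileStatus>");
            out.println("  <logicalName>" + logicalName + "</logicalName>");
            out.println("  <onDisk>" + isCached(logicalName) + "</onDisk>");
            out.println("</fileStatus>");
        }

        // Placeholder for a lookup against the local disk cache manager.
        private boolean isCached(String logicalName) {
            return false;
        }
    }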

Page 21: PPDG Collaboration: JLAB/SRB

– Web services as a general portal to a variety of back-end storage systems (JLAB, SRB, …), and other services, e.g. batch
– The project should define the abstractions at the web services level; define all metadata for interacting with a storage system
– Define XML to describe digital objects and collections/directories (ALL)
  – Metadata to describe the logical namespace of the grid (SRB, JLAB, GridFTP attributes, …)
  – Standard structure for organizing as XML
– Define (WSDL?) operations to browse, query, manage (ALL)
  – Listing files available through the interface; caching, replication, pinning, staging, resource allocation, etc.
– Back-end implementations: JASMine (JLAB), SRB (SDSC), (SRM, Globus)
– Implement a demonstration web services client (JLAB); web services clients should be able to interact with any of these

Page 22: JLAB mss - JASMine

JASMine-managed mass storage sub-systems:
– Tape storage system
  • 12000-slot STK silos
  • 8 Redwood, 10 9940, 10 9840 drives
  • 7 data movers, ~300 GB buffer each
  • Software – JASMine
– 15 TB experiment cache pools
– 2 TB farm cache
– 0.5 TB LQCD cache pool

Features
– Stand-alone cache manager
– Pluggable policies
– Implemented in Java
– Distributed, scalable
– Pluggable security
  – Authentication & authorization
  – To be integrated with GSI
– Scheduling of drives
– Can manage tape, tape and disk, or disk alone

Page 23: Example – demo client

Similar to a graphical ftp client, but each half can attach to a grid node:
– Cache – managed filesystem
– User’s home directory
– Other file systems at the web server
– Replica catalog
– Local mss, if it is separate from the replica system

Can move files in/out of the managed store.

Negotiates compatible protocols between grid nodes, e.g. http, SRB, gridFTP, ftp, bbftp, JASMine, etc.

Page 24: Technologies Employed

Apache web server

Tomcat servlet engine, SOAP libraries

Java Server Pages (JSP)

XML data format

XSL style sheets for presentation

X.509 certificate authentication
– Web interface to a simple certificate authority to issue certificates valid within the meta-facility (signed by Jefferson Lab)
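
Since XSL style sheets drive the presentation (the XML-to-HTML step in the three-tier figure), here is a minimal JAXP sketch of that transformation; the file names are placeholders.

    import java.io.File;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    // Sketch: render an XML reply into HTML with an XSL style sheet, the
    // way an XML-to-HTML servlet would for a web browser. File names are
    // placeholders.
    public class XmlToHtmlExample {
        public static void main(String[] args) throws Exception {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new File("filestatus.xsl")));
            transformer.transform(new StreamSource(new File("filestatus.xml")),
                                  new StreamResult(new File("filestatus.html")));
        }
    }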

Page 25: Data Grid

Capabilities planned:
– Replicated data (multi-site), global tree-structured name space (like the Unix file system)
– Replica catalog, replicated multi-site, using MySQL as the back end, probably using MySQL’s ability to replicate the catalog (fault tolerance)
– Browse by attributes as well as by name
– Parallel file transfers (bbftp, gridftp, …)
  – Jpars – 100% Java parallel file transfers (with 3rd party transfers, authentication)
– Drag-and-drop between sites
– Policy-based replication (auto-migrate between sites)

Page 26: Status

Prototype
– Browse the contents of a prototype disk cache / tape storage file system
– Move files between managed and unmanaged storage on the data node
– Move files (including entire directories) between desktop and data node
– Displays whether a file is currently in the disk cache
– Can request a move from tape to disk (not released)

Soon
– 3rd party file transfers (between 2 servers)

Page 27: Near Term

– Convert from raw XML to SOAP (this month)
– Deploy the disk cache manager to FSU & MIT (4Q01)
– Abstract the disk-to-tape migration of the current system to use WAN site-to-site migration of files, wrapping e.g. gridftp or another parallel transfer (1Q02)

Page 28: Conclusions

– Grid capabilities are starting to emerge
– Jefferson Lab will have a functioning data grid in FY02
– Jefferson Lab will have a functioning meta-facility in FY03