Massive High-Performance Global File Systems for Grid Computing - By Phil Andrews, Patricia Kovatch, Christopher Jordan - Presented by Han S Kim
Slide 1
Massive High-Performance Global File Systems for Grid Computing
- By Phil Andrews, Patricia Kovatch, Christopher Jordan
- Presented by Han S Kim
Slide 2
Outline
I. Introduction
II. GFS via Hardware Assist: SC02
III. Native WAN-GFS: SC03
IV. True Grid Prototype: SC04
V. Production Facility: 2005
VI. Future Work
Slide 3
I. Introduction
Slide 4
1. Introduction - The Original Mode of Operation for Grid Computing
Submit the user's job to the ubiquitous grid.
The job would run on the most appropriate computational platform available.
Any data required for the computation would be moved to the chosen compute facility's local disk.
Output data would be written to the same disk.
The normal utility used for the data transfer would be GridFTP, as in the sketch below.
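As an illustration of this stage-in/compute/stage-out pattern, here is a hedged Python sketch that drives GridFTP transfers with the standard globus-url-copy client; the endpoint hostnames and paths are hypothetical placeholders, not systems from the paper.

```python
import subprocess

# Hypothetical endpoints: replace with real GridFTP servers and paths.
REMOTE_INPUT = "gsiftp://data.example-grid.org/archive/input.dat"
LOCAL_SCRATCH = "file:///scratch/job123/input.dat"
LOCAL_OUTPUT = "file:///scratch/job123/output.dat"
REMOTE_OUTPUT = "gsiftp://data.example-grid.org/archive/output.dat"

def gridftp_copy(src: str, dst: str) -> None:
    """Copy one URL to another with globus-url-copy (parallel streams, fast mode)."""
    subprocess.run(["globus-url-copy", "-p", "4", "-fast", src, dst], check=True)

# 1. Stage input data to the chosen compute facility's local disk.
gridftp_copy(REMOTE_INPUT, LOCAL_SCRATCH)

# 2. Run the computation against the local copy (details omitted).
# run_simulation("/scratch/job123/input.dat", "/scratch/job123/output.dat")

# 3. Stage the output data back out over the WAN.
gridftp_copy(LOCAL_OUTPUT, REMOTE_OUTPUT)
```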
Slide 5
1. Introduction - In Grid Supercomputing
The data sets used are very large.
The National Virtual Observatory dataset, approximately 50 Terabytes, is used as input by several applications.
Some applications write very large amounts of data: the Southern California Earthquake Center simulation writes close to 250 Terabytes in a single run.
Other applications require extremely high I/O rates: the Enzo application, an AMR cosmological simulation code, routinely writes and reads multiple Terabytes per hour.
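To put those rates in perspective, here is a short, hedged arithmetic sketch; the per-hour figures are round illustrative numbers, not measurements from the paper.

```python
# Convert a sustained output rate in Terabytes per hour into Gigabits per second.
def tb_per_hour_to_gbps(tb_per_hour: float) -> float:
    bits = tb_per_hour * 1e12 * 8   # Terabytes -> bits (decimal units)
    return bits / 3600 / 1e9        # per hour -> per second, bits -> Gigabits

print(round(tb_per_hour_to_gbps(1.0), 1))   # ~2.2 Gb/s sustained for 1 TB/hour
print(round(tb_per_hour_to_gbps(4.0), 1))   # ~8.9 Gb/s sustained for 4 TB/hour
```

Even a few Terabytes per hour therefore demands a sustained multi-Gb/s WAN path, which motivates the wide-area file system work in the rest of the talk.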
Slide 6
1. Introduction - Concerns about Grid Supercomputing
The normal approach of moving data back and forth may not translate well to a supercomputing grid, mostly because of the very large size of the data sets used.
These sizes and the required transfer rates are not conducive to routine migration of wholesale input and output data between grid sites.
The compute system may not have enough room for a required dataset or for the output data.
The necessary transfer rates may not be achievable.
Slide 7
1. Introduction - In This Paper
The authors show how a Global File System, where direct file I/O operations can be performed across a WAN, can obviate these concerns, through a series of large-scale demonstrations.
Slide 8
II. GFS via Hardware Assist: SC02
Slide 9
2. GFS via Hardware Assist: SC02 - At That Time
Global File Systems were still in the concept stage.
Two concerns:
The latencies involved in a widespread network such as the TeraGrid.
The file systems did not yet have the capability of being exported across a WAN.
Slide 10
2. GFS via Hardware Assist: SC02 - Approach
Used hardware capable of encoding Fibre Channel frames within IP packets (FCIP).
FCIP is an Internet Protocol-based storage networking technology developed by the IETF.
FCIP mechanisms enable the transmission of Fibre Channel information by tunneling data between storage area network facilities over IP networks.
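As a rough illustration of the tunneling idea only, the hedged Python sketch below wraps opaque Fibre Channel frame bytes in a minimal length-prefixed header before sending them over a TCP connection; this is a simplified stand-in, not the actual FCIP wire format defined in RFC 3821.

```python
import socket
import struct

def fcip_like_send(sock: socket.socket, fc_frame: bytes) -> None:
    """Tunnel an opaque Fibre Channel frame over TCP with a simple length prefix.

    The real FCIP encapsulation header (RFC 3821) is richer; this only
    illustrates the frame-in-IP idea behind the SC02 hardware assist.
    """
    header = struct.pack("!I", len(fc_frame))   # 4-byte big-endian length
    sock.sendall(header + fc_frame)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the stream, or fail if the tunnel closes."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("tunnel closed mid-frame")
        buf += chunk
    return buf

def fcip_like_recv(sock: socket.socket) -> bytes:
    """Reassemble one tunneled frame on the far side of the IP network."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)
```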
Slide 11
2. GFS via Hardware Assist: SC02 - The Goal of This Demo
That year, the annual Supercomputing conference was in Baltimore.
The distance between the show floor and San Diego is greater than any within the TeraGrid.
It was the perfect opportunity to demonstrate whether latency effects would eliminate any chance of a successful GFS at that distance.
Slide 12
2. GFS via Hardware Assist: SC02 - Hardware Configuration between San Diego and Baltimore
At each site: a Force 10 GbE switch, a Nishan 4000, and a Brocade 12000 Fibre Channel switch. The Nishan 4000s encoded and decoded Fibre Channel frames into IP packets for transmission and reception.
The sites were linked by two 4GbE channels over the TeraGrid backbone and the SCinet 10 Gb/s WAN.
Also shown in the diagram: a Sun SF6800, a 17 TB FC disk cache, and 6 PB of silos and tape drives.
Slide 13
2. GFS via Hardware Assist: SC02 - SC02 GFS Performance between SDSC and Baltimore
720 MB/s over the 80 ms round-trip SDSC-Baltimore path.
Demonstrated that a GFS could provide some of the most efficient data transfers possible over TCP/IP.
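One way to appreciate that number is the bandwidth-delay product: at 80 ms round trip, sustaining 720 MB/s means keeping tens of megabytes in flight. A hedged back-of-the-envelope check in Python (the 64 KB figure is a classical default TCP window used purely for contrast):

```python
# Bandwidth-delay product for the SC02 SDSC-Baltimore path.
bandwidth_mb_s = 720   # observed transfer rate, MB/s
rtt_s = 0.080          # round-trip time, seconds

in_flight_mb = bandwidth_mb_s * rtt_s
print(f"Data in flight: {in_flight_mb:.1f} MB")   # ~57.6 MB

# A single stream limited to a 64 KB window could carry only about
# window / RTT, far below the observed rate:
print(f"64 KB window limit: {64 / 1024 / rtt_s:.1f} MB/s")   # ~0.8 MB/s
```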
Slide 14
III. Native WAN-GFS: SC03
Slide 15
3. Native WAN-GFS: SC03 - Issue and Approach
Issue: whether Global File Systems were possible without hardware FCIP encoding.
SC03 was the chance to use pre-release software from IBM's General Parallel File System (GPFS), a true wide-area-enabled file system.
Shared-disk architecture: files are striped across all disks in the file system, with parallel access to file data and metadata (a striping sketch follows below).
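To make the shared-disk striping idea concrete, here is a hedged, highly simplified Python sketch of round-robin block placement; the block size and disk count are arbitrary illustrative values, and GPFS's real allocation is far more sophisticated.

```python
BLOCK_SIZE = 256 * 1024   # illustrative stripe/block size in bytes
NUM_DISKS = 8             # illustrative number of shared disks

def block_location(file_offset: int) -> tuple[int, int]:
    """Map a byte offset to (disk index, block index on that disk), round-robin."""
    block = file_offset // BLOCK_SIZE
    return block % NUM_DISKS, block // NUM_DISKS

# Consecutive blocks of one file land on different disks, so many clients
# (or one large sequential read) can drive all disks in parallel.
for offset in range(0, 8 * BLOCK_SIZE, BLOCK_SIZE):
    print(offset, block_location(offset))
```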
Slide 16
3. Native WAN-GFS: SC03 - WAN-GPFS Demonstration
The central GFS: 40 two-processor IA64 nodes, which provided sufficient bandwidth to saturate the 10GbE link to the TeraGrid (see the check below).
Each server had a single FC HBA and GbE connectors.
The file system was served across the WAN to SDSC and NCSA.
The mode of operation was to copy data produced at SDSC across the WAN to the disk systems on the show floor, and to visualize it at both SDSC and NCSA.
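A hedged back-of-the-envelope check of why 40 single-GbE servers comfortably saturate a 10GbE WAN link, assuming each server can contribute roughly its GbE line rate:

```python
servers = 40            # two-processor IA64 server nodes on the show floor
gbe_per_server = 1.0    # Gb/s contributed per server's single GbE interface

aggregate_gb_s = servers * gbe_per_server
wan_link_gb_s = 10.0

print(f"{aggregate_gb_s} Gb/s aggregate server bandwidth "
      f"vs {wan_link_gb_s} Gb/s WAN link")
# 40.0 Gb/s aggregate server bandwidth vs 10.0 Gb/s WAN link
```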
Slide 17
3. Native WAN-GFS: SC03 - Bandwidth Results at SC03
The visualization application terminated normally as it ran out of data and was restarted.
Slide 18
3. Native WAN-GFS: SC03 - Bandwidth Results at SC03
Over a link with a maximum bandwidth of 10 Gb/s, the peak transfer rate was almost 9 Gb/s, and over 1 GB/s was easily sustained.
Slide 19
IV. True Grid Prototype: SC04
Slide 20
4. True Grid Prototype: SC04 - The Goal of This Demonstration
To implement a true grid prototype of what a GFS node on the TeraGrid would look like.
A possible dominant mode of operation for grid supercomputing: the output of a very large dataset to a central GFS repository, followed by its examination and visualization at several sites, some of which may not have the resources to ingest the dataset whole.
The Enzo application writes on the order of a Terabyte per hour, enough for the 30Gb/s TeraGrid connection. It ran at SDSC, writing its output directly to the GPFS disks in Pittsburgh.
With the post-processing visualization, they could check how quickly the GFS could provide data in this scenario.
Slide 21
4. True Grid Prototype: SC04 - Prototype Grid Supercomputing at SC04
(Diagram of the prototype grid configuration at SC04, with 30 Gb/s and 40 Gb/s links labeled.)
Slide 22
4. True Grid Prototype: SC04 - Transfer Rates
There were three 10Gb/s connections between the show floor and the TeraGrid backbone.
The aggregate performance was 24 Gb/s, with a momentary peak of over 27 Gb/s.
The rates were remarkably constant.
Slide 23
V. Production Facility: 2005
Slide 24
5. Production Facility: 2005 - The Need for Large Disk
By this time, the size of datasets had become large: the NVO dataset was 50 Terabytes per location, which was a noticeable strain on storage resources.
If a single, central site could maintain the dataset, this would be extremely helpful to all the sites that could access it in an efficient manner.
Therefore, a very large amount of spinning disk would be required: approximately 0.5 Petabytes of Serial ATA disk drives was acquired by SDSC.
Slide 25
5. Production Facility: 2005 - Network Organization
0.5 Petabytes of FastT100 disk, served to NCSA and ANL.
The Network Shared Disk servers: 64 two-way IBM IA64 systems, each with a single GbE interface and a 2Gb/s Fibre Channel Host Bus Adapter.
The disks: 32 IBM FastT100 DS4100 RAID systems with 67 250GB drives each.
The total raw storage: 32 x 67 x 250GB = 536 TB.
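A quick, hedged sanity check of that raw-capacity figure in Python, using decimal GB/TB as the slide does:

```python
arrays = 32            # IBM FastT100 DS4100 RAID systems
drives_per_array = 67  # SATA drives per array
drive_gb = 250         # GB per drive

raw_gb = arrays * drives_per_array * drive_gb
print(raw_gb, "GB ->", raw_gb / 1000, "TB")   # 536000 GB -> 536.0 TB
```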
Slide 26
5. Production Facility: 2005 - Serial ATA Disk Arrangement
2 Gb/s FC connections; 8+P RAID configuration.
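Assuming "8+P" means RAID groups of eight data drives plus one parity drive (an assumption about the exact grouping, not stated on the slide), here is a hedged Python sketch of usable versus raw capacity per group:

```python
# Assumption: each RAID group is 8 data + 1 parity (8+P) 250 GB drives.
data_drives = 8
parity_drives = 1
drive_gb = 250

group_raw_gb = (data_drives + parity_drives) * drive_gb
group_usable_gb = data_drives * drive_gb
efficiency = group_usable_gb / group_raw_gb

print(f"Per 8+P group: {group_raw_gb} GB raw, {group_usable_gb} GB usable "
      f"({efficiency:.0%} efficiency)")   # 2250 GB raw, 2000 GB usable (89%)
```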
Slide 27
5. Production Facility: 2005 - Performance Scaling
(Graph of throughput against the number of remote nodes.)
Maximum of almost 6 GB/s out of a theoretical maximum of 8 GB/s.
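The 8 GB/s theoretical ceiling is consistent with the 64 single-GbE NSD servers described on the earlier slide; that link is my inference, not stated here. A hedged check:

```python
nsd_servers = 64    # NSD servers, each with one GbE interface
gbe_gbit_s = 1.0    # line rate per GbE link, Gb/s

theoretical_gb_s = nsd_servers * gbe_gbit_s / 8   # Gb/s -> GB/s
observed_gb_s = 6.0

print(theoretical_gb_s, "GB/s theoretical")                        # 8.0 GB/s
print(f"{observed_gb_s / theoretical_gb_s:.0%} of theoretical")    # 75%
```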
Slide 28
5. Production Facility: 2005 - Performance Scaling
The observed discrepancy between read and write rates is not yet understood.
However, the dominant usage of the GFS is expected to be remote reads.
Slide 29
VI. Future Work
Slide 30
6. Future Work
Next year (2006), the authors hope to connect to the DEISA computational Grid in Europe, which is planning a similar approach to Grid computing, allowing them to unite the TeraGrid and DEISA Global File Systems in a multi-continent system.
The key contribution of this approach is a paradigm: at least in the supercomputing regime, data movement and access mechanisms will be the most important delivered capability of Grid computing, outweighing even the sharing or combination of compute resources.
Slide 31
Thank you!