Welcome to the PVFS BOF! Rob Ross, Rob Latham, Neill Miller Argonne National Laboratory Walt Ligon, Phil Carns Clemson University

Feb 05, 2016

Transcript
Page 1: Welcome to the PVFS BOF!

Welcome to the PVFS BOF!

Rob Ross, Rob Latham, Neill Miller

Argonne National Laboratory

Walt Ligon, Phil Carns

Clemson University

Page 2

An interesting year for PFSs

• At least three Linux parallel file systems out there and in use now
  – PVFS
  – GPFS
  – Lustre

• Many “cluster file systems” also available
• The verdict is still out on which of these are useful for what applications
• Now we’re going to complicate things by adding another option :)

Page 3

Goals for PVFS and PVFS2

• Focusing on parallel, scientific applications
• Expect use of MPI-IO and high-level libraries
• Providing our solution to a wide audience
  – Ease of install, configuration, administration
• Handling and surviving faults is a new area of effort for us

[Diagram: software stack, top to bottom: Application, High-level I/O Library, MPI-IO Library, Parallel File System, I/O Hardware]
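The parallel file system layer in the stack above gets its performance by striping each file's bytes round-robin across multiple I/O servers, so a large read or write touches many disks at once. The sketch below illustrates the general round-robin mapping; the server count and strip size are illustrative values, not PVFS's actual distribution parameters, which are configurable per file system.

```python
# Sketch: round-robin striping, as used by parallel file systems like PVFS.
# A file is cut into fixed-size "strips"; consecutive strips go to
# consecutive I/O servers, wrapping around after the last server.

STRIP_SIZE = 64 * 1024      # bytes handed to one server before moving on
NUM_SERVERS = 4             # number of I/O nodes (illustrative)

def locate(offset):
    """Map a logical file offset to (server index, offset in that server's data)."""
    strip = offset // STRIP_SIZE            # which strip the byte falls in
    server = strip % NUM_SERVERS            # strips rotate across servers
    local_strip = strip // NUM_SERVERS      # strips this server already holds
    return server, local_strip * STRIP_SIZE + offset % STRIP_SIZE
```

For example, `locate(0)` maps to server 0, the next strip to server 1, and after `NUM_SERVERS` strips the rotation wraps back to server 0, so a contiguous request spanning several strips is serviced by several servers in parallel.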

Page 4

Outline

• Shorter discussion of PVFS (PVFS1)
  – Status
  – Future
• Longer discussion of PVFS2
  – Goals
  – Architecture
  – Status
  – Future
• Leave a lot of time for questions!

Page 5

PVFS1 Status

• Version 1.6.1 released in the last week
  – Experimental support for symlinks (finally!)
  – Bug fixes (of course)
  – Performance optimizations in the stat path
    - Faster ls
    - Thanks to the Acxiom guys for this and other patches

Page 6

PVFS1 Future

• Community support has been amazing
• We will continue to support PVFS1
  – Bug fixes
  – Integrating patches
  – Dealing with Red Hat’s changes
• We won’t be making any more radical changes to PVFS1
  – It is already stable
  – Maintain it as a viable option for production

Page 7

Fault Tolerance!

• Yes, we do actually care about this
• No, it’s not easy to just do RAID between servers
  – If you want to talk about this, please ask me offline
• Instead we’ve been working with Dell to implement failover solutions
  – Requires more expensive hardware
  – Maintains high performance

Page 8

Dell Approach

• Failover protection of the PVFS meta server, and/or
• Failover protection of I/O nodes
• Using the RH AS 2.1 Cluster Manager framework
  – Floating IP address
  – Mechanism for failure detection (heartbeats)
  – Shared storage and quorum
  – Mechanism for I/O fencing (power switches, watchdog timers)
  – Hooks for custom service restart scripts
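The heartbeat mechanism listed above can be sketched in a few lines: the standby node records when it last heard from the active server and takes over the floating service once the silence exceeds a timeout. This is a minimal illustration of the concept, not the RH Cluster Manager implementation; the class, method names, and timeout value are all assumptions, and the real framework also checks shared-storage quorum and fences the failed node before taking over.

```python
import time

HEARTBEAT_TIMEOUT = 10.0    # seconds of silence before declaring failure (illustrative)

class StandbyMonitor:
    """Standby-side failure detector for a heartbeat-based HA service."""

    def __init__(self, now=time.monotonic):
        self.now = now               # injectable clock, eases testing
        self.last_beat = now()       # when we last heard the active server
        self.active = False          # are we currently running the service?

    def heartbeat(self):
        """Called whenever a heartbeat arrives from the active server."""
        self.last_beat = self.now()

    def check(self):
        """Poll: take over the service if the active server has gone silent."""
        if not self.active and self.now() - self.last_beat > HEARTBEAT_TIMEOUT:
            # A real HA framework would fence the peer (power switch or
            # watchdog), claim the floating IP, and run the service
            # restart script at this point.
            self.active = True
        return self.active
```

The fencing step matters: without it, a slow-but-alive active server and the standby could both write to the shared storage, which is exactly the split-brain scenario quorum and I/O fencing exist to prevent.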

Page 9

Metadata Server Protection

[Diagram: an active meta server and a standby meta server (each a Dell PowerEdge 2650) are joined by a heartbeat channel and shared storage (Dell PowerVault 220); the HA service floats the Meta Server IP between them. Clients and the IO nodes (Dell PE 2650) connect over the communication network.]

Page 10

Metadata and I/O Failover

[Diagram: active and standby meta servers (each a Dell PE 2650) share a heartbeat channel and shared FC storage; the active IO nodes (Dell PE 2650) run HA services; both attach over Fibre Channel (FC) to a Dell|EMC CX600 array, with clients on the communication network.]

Page 11

The Last Failover Slide

• These solutions are more expensive than local disks

• Many users have backup solutions that allow the PFS to be a scratch space

• We’ll leave it to users (or procurers?) to decide what is necessary at a site

• We’ll be continuing to work with Dell on this to document the process and configuration

Page 12

PVFS in the Field

• Many sites have PVFS in deployment
  – Largest known deployment is the Caltech TeraGrid installation
    - 70+ terabyte file system
  – Also used in industry (e.g. Acxiom)
• Lots of papers lately too!
  – CCGrid, Cluster 2003
  – Researchers modifying and augmenting PVFS, comparing to PVFS
• You can buy PVFS systems from vendors
  – Dell, HP, Linux Networx, others

Page 13

End of PVFS1 Discussion

Questions on PVFS1?