Stork: State of the Art

Transcript
  • Stork: State of the Art

    Tevfik Kosar
    Computer Sciences Department
    University of Wisconsin-Madison
    [email protected]
    http://www.cs.wisc.edu/condor/stork

  • The Imminent Data "Deluge"

    Exponential growth of scientific data:
    2000: ~0.5 Petabytes
    2005: ~10 Petabytes
    2010: ~100 Petabytes
    2015: ~1,000 Petabytes

    "I am terrified by terabytes" -- Anonymous
    "I am petrified by petabytes" -- Jim Gray

    Moore's Law is outpaced by the growth of scientific data!

  • [Figure: annual data volumes of example applications: 500 TB/year, 2-3 PB/year, 11 PB/year, 20 TB - 1 PB/year]

  • How to access and process distributed data?

    [Figure: terabyte- and petabyte-scale data sets distributed across sites]

  • I/O Management in History

    [Figure: the I/O subsystem evolving from the operating-system level to the distributed-systems level]

  • Individual Jobs

    Each job handles its own data placement:
    allocate space for input & output data -> get input -> run JOB j -> put output -> release input space -> release output space

  • Separation of Jobs

    DAG specification:

    Data A A.stork
    Data B B.stork
    Job C C.condor
    ...
    Parent A child B
    Parent B child C
    Parent C child D, E
    ...
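    For illustration, a data placement node such as A.stork could contain a single transfer request in the submit syntax shown on the later slides (a minimal sketch; the hosts and paths are hypothetical):

    [
      dap_type = transfer;
      src_url  = gsiftp://remote.example.edu/data/inputA.dat;
      dest_url = gsiftp://local.example.edu/scratch/inputA.dat;
    ]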

  • Stork: Data Placement Scheduler

    First scheduler specialized for data movement/placement.
    De-couples data placement from computation.
    Understands the characteristics and semantics of data placement jobs.
    Can make smart scheduling decisions for reliable and efficient data placement.
    A prototype is already implemented and deployed at several sites.
    Now distributed with the Condor Developers Release v6.7.6.
    http://www.cs.wisc.edu/condor/stork
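    Assuming the Stork command-line tools distributed with the Condor release, a job like the sketch above would be submitted and monitored roughly like this (an illustration, not a complete walkthrough):

    stork_submit A.stork      # submit the data placement job
    stork_status <job_id>     # query the job's progress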

  • Support for Heterogeneity

    Provides uniform access to different data storage systems and transfer protocols.
    Acts as an IOCS for distributed systems.
    Multilevel policy support.

    Protocol translation:
    using the Stork Disk Cache or the Stork Memory Buffer [ICDCS04]
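    For example, a cross-protocol transfer can be requested by mixing protocols in the source and destination URLs, with Stork translating between them through its disk cache or memory buffer. (A sketch using protocols named elsewhere in this talk; the hosts and paths are hypothetical.)

    [
      dap_type = transfer;
      src_url  = gsiftp://hostA.example.edu/data/test.dat;
      dest_url = nest://hostB.example.edu/data/test.dat;
    ]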

  • Dynamic Protocol Selection

    [
      dap_type = transfer;
      src_url  = drouter://slic04.sdsc.edu/tmp/test.dat;
      dest_url = drouter://quest2.ncsa.uiuc.edu/tmp/test.dat;
      alt_protocols = gsiftp-gsiftp, nest-nest;
    ]

    or:

    [
      src_url  = any://slic04.sdsc.edu/tmp/test.dat;
      dest_url = any://quest2.ncsa.uiuc.edu/tmp/test.dat;
    ]

    [Figure: transfer rate over time; when DiskRouter crashes, Stork switches to an alternative protocol, and switches back when DiskRouter resumes]
    Traditional scheduler: 48 Mb/s. Using Stork: 72 Mb/s. [ICDCS04]

  • Run-time Auto-tuning

    [
      link     = slic04.sdsc.edu quest2.ncsa.uiuc.edu;
      protocol = gsiftp;
      bs       = 1024KB;   // I/O block size
      tcp_bs   = 1024KB;   // TCP buffer size
      p        = 4;        // number of parallel streams
    ]

    Traditional scheduler (without tuning): 0.5 MB/s. Using Stork (with tuning): 10 MB/s. [AGridM03]

  • Controlling Throughput

    Increasing concurrency/parallelism does not always increase the transfer rate.
    The effect on local area and wide area networks is different.
    Concurrency and parallelism have slightly different impacts on transfer rate.

    [Figure: transfer rate vs. concurrency/parallelism for wide area and local area transfers] [Europar04]

  • Controlling CPU Utilization

    Concurrency and parallelism have totally opposite impacts on CPU utilization at the server side.

    [Figure: CPU utilization vs. concurrency/parallelism at the client and at the server] [Europar04]

  • Detecting and Classifying Failures

    A chain of checks locates the cause of a failed transfer and classifies it as transient or permanent:
    Check DNS -> DNS server error (transient) or no DNS entry (permanent)
    Check Network -> network outage (transient)
    Check Host -> host down (transient)
    Check Protocol -> protocol unavailable (transient)
    Check Credentials -> not authenticated (permanent)
    Check File -> source file does not exist (permanent)
    Test Transfer -> transfer failed
    [Grid04]

  • Detecting Hanging Transfers

    Collect job execution time statistics.
    Fit a distribution.
    Detect and avoid black holes (hanging transfers).
    E.g., for a normal distribution, 99.7% of job execution times should lie within [avg - 3*stdev, avg + 3*stdev].
    [Cluster04]
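    As a worked example (hypothetical numbers): if past transfers average 100 seconds with a standard deviation of 10 seconds, a transfer still running after 100 + 3*10 = 130 seconds falls outside the 99.7% interval and would be flagged as hanging.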

  • Stork can also:

    Allocate/de-allocate (optical) network links.
    Allocate/de-allocate storage space (see the sketch after this list).
    Register/un-register files with the Meta Data Catalog.
    Locate the physical location of a logical file name.
    Control concurrency levels on storage servers.

    For details, refer to [ICDCS04], [JPDC05], [AGridM03].
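    For illustration, a storage space allocation request might look like the following. This is a hypothetical sketch: this talk shows only the transfer job type, so the job type and field names below are assumptions, not confirmed Stork syntax.

    [
      dap_type  = allocate;               // assumed job type; only "transfer" is shown in this talk
      dest_host = quest2.ncsa.uiuc.edu;   // storage server on which to reserve space
      size      = 1GB;                    // hypothetical field: amount of space requested
    ]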

  • Applying Stork to Real-Life Applications

  • DPOSS Astronomy Pipeline

  • Failure Recovery

    [Figure: pipeline progress over time, annotated with automatically recovered failures: UniTree not responding; DiskRouter reconfigured and restarted; SDSC cache reboot & UW CS network outage; software problem]

  • End-to-end Processing of 3 TB of DPOSS Astronomy Data

    Traditional scheduler: 2 weeks
    Using Stork: 6 days

  • Summary

    Stork provides solutions for the data placement needs of the Grid community.
    It is ready to fly!
    Now distributed with the Condor developers release v6.7.6.
    All the basic features you will need are included in the initial release.
    More features are coming in future releases.

  • Thank you for listening.

    Questions?