INDIGO DataCloud DESY Cloud, The Scientific Data Cloud Managed Shared Storage At the “ownCloud Connects Business” workshop Dr. Patrick Fuhrmann Quirin Buchholz Tigran Mkrtchyan Peter van der Reest Lusine Yakovleva
June 1, 2016, Frankfurt, Patrick Fuhrmann et al. The Scientific Data Cloud @ ownCloud Connects Business
Content
• Storage @ DESY?
• Sync’n Share at DESY
  • Motivation
  • Requirements
  • Implementation
  • Setup
• Requirements from Science Communities.
• dCache for Dummies.
• The ownCloud – dCache Hybrid system
• Summary and outlook.
Storage @ DESY
• Petra III [Tier 0] (2012 …)
  • Synchrotron Radiation
  • 14 Beamlines
  • Beamline Guest Scientists
  • 1 PB/year – 5 PB/year
• European XFEL [Tier 0] (2017 …)
  • 3.4 km (Linear)
  • 2017 (First beamline)
  • Beamline Guest Scientists
  • 10 – 100 PB/year
• HERA [Tier 0] (1992 – 2007)
  • Particle accelerator (Proton – Electron)
  • 6.3 km (Ring)
  • Some hundred scientists
  • 5 PB in total
• LCG [WLCG Tier 2] (2008, 2009 …)
  • Particle accelerator (Proton – Proton)
  • 26.7 km (Ring)
  • About 10,000 scientists
  • 15 PB/year
[Timeline: 1992 … 2020, 100 PBytes]
More storage at DESY
• The DESY data management team has quite some experience in managing huge amounts of data.
• In collaboration with other ‘big data’ sites, we are providing a data management system, ‘dCache’, deployed at 70 sites around the world.
  • See later.
• So, why are we running ownCloud?
Motivation
• DESY has no experience in sophisticated data sharing.
  • Data sharing was done in the traditional way with ACLs and ‘group’ directories.
• However: Young scientists start their careers at universities and labs with Sync’n Share in their blood (Dropbox generation).
• Public IT departments, for a very long time, didn’t regard Sync’n Share as being their problem, as many commercial solutions were around.
• It essentially became an issue after Snowden.
  • Legal requirement: Data had to be stored ‘on site’ or at least in Germany.
• Consequence: CC needed to provide Sync’n Share-like mechanisms.
Requirements
• Fine-grained sharing of files and directories with individuals and groups.
• Sharing via intuitive Web 2.0 mechanisms (Apps or Browser)
• Sharing with ‘the public’ with or without password protection
• Sharing of space to upload data (protected)
• Expiration of shares
• Automatic bidirectional synchronization of data between mobile devices and central repository.
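The sharing requirements above (individual or public shares, optional password, upload-only space, expiration) can be pictured as one small record plus an access check. This is a minimal sketch only: the `Share` class, its field names, and `is_accessible` are hypothetical illustrations, not ownCloud's data model or API, and a real system would store a password hash rather than the plain text used here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Share:
    """Hypothetical share record; field names are assumptions for illustration."""
    path: str
    recipient: Optional[str] = None        # None means a public link
    password: Optional[str] = None         # plain text only to keep the sketch short
    upload_only: bool = False              # "space to upload data"
    expires_at: Optional[datetime] = None  # None means the share never expires

    def is_accessible(self, now: datetime, user: Optional[str] = None,
                      password: Optional[str] = None) -> bool:
        # An expired share is closed for everyone.
        if self.expires_at is not None and now >= self.expires_at:
            return False
        # A share with a named recipient is closed for everyone else.
        if self.recipient is not None and user != self.recipient:
            return False
        # A password-protected share needs the matching password.
        if self.password is not None and password != self.password:
            return False
        return True


now = datetime(2016, 6, 1)
public_link = Share(path="/data/run1", password="secret",
                    expires_at=now + timedelta(days=7))
print(public_link.is_accessible(now, password="secret"))
```

Keeping expiry, recipient, and password as independent optional fields lets one record type cover all the sharing variants listed on the slide.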
Typical Application
[Figure labels: Your Cloud Space; Sync; Sync; File up and download]
Steps taken by DESY
• Evaluated possible solutions in 2013.
• Decided to go for ownCloud
  • Provides most of the features needed.
  • Open Source
  • Was in use by many institutes and universities in Germany
  • Used by colleagues at SURFsara (Amsterdam) and CERN
• Evaluation showed:
  • Very good Sync’n Share feature set
  • Very good in planning ahead (roadmap)
  • Plans for cross-site federated access (now in place).
  • A bit weak in data management
• Started prototype installation at DESY beginning of 2014
What should the DESY setup look like?
(What it will actually look like in July)
The Infrastructure
• Authentication: Kerberos
• User Management, Registry: LDAP
• Monitoring
• Local and Wide Area Network, Load Balancing, Firewalls
• Virtualization
• Accounting
• ∞ Unlimited Persistent Storage
Infrastructure Integration
• F5 Load Balancer
• ownCloud (×4)
• Postgres DB
• Automatic Failover
More Integration
• DESY Kerberos
• DESY LDAP
• ownCloud
• ∞ Unlimited Central Storage
• Data Life Cycle Engine
Horizontally Scaling Backend
• Web Load Balancer (F5)
• ownCloud (×4)
• NFS 4.1/pNFS
• Pool Node (×6)
• 200 TBytes RAID6 (×3)
Some Statistics
[Plot: Files in/out in 7 days; axis: files in/out per hour, 10,000 to 70,000]
Users Total:      490
Users Active:     277
Space Available:  567 TBytes
Space Used:       2 × 30 TBytes
Files:            10 million
Current default quality: two replicas on different storage nodes.
Is that sufficient for scientists?
Typical Workflow
[Figure labels: Raw; Derived; Publication; Sharing]
Data Categories
Amount         Category      Typical Application
1 – 100 PB     Raw           LHC detector data, raw X-ray images, brain scans
10 – 100 TB    Derived       Reconstructed (Ntuples), purified images, brain maps
1 TB           Publication   Papers, presentations, histograms
What do we need to support ‘science workflows’?
More Requirements
• Storage must be manageable: defined QoS and data lifecycle
• Different types of data must have different QoS attached, regarding access latency (performance) and data durability (how safe is my data?)
  • Spinning disk for streaming
  • SSD for fast random access
  • Tape for archive
  • Multiple copies in different locations on different media for long-term data preservation
• Moving data between different QoS types has to be performed
  • w/o service interruption
  • transparently to the user
  • w/o changes in the namespace
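The requirement that a QoS change must not alter the namespace can be illustrated with a small model in which a file's path and its storage media are separate attributes. This is a sketch under stated assumptions: the class names, the four QoS class labels, and the choice of disk plus tape for "preservation" are hypothetical and are not taken from dCache.

```python
from dataclasses import dataclass
from enum import Enum


class Medium(Enum):
    SSD = "ssd"
    DISK = "spinning disk"
    TAPE = "tape"


# Hypothetical QoS classes, loosely following the media named on the slide.
QOS_CLASSES = {
    "streaming": frozenset({Medium.DISK}),
    "random_access": frozenset({Medium.SSD}),
    "archive": frozenset({Medium.TAPE}),
    "preservation": frozenset({Medium.DISK, Medium.TAPE}),  # assumption: two media
}


@dataclass(frozen=True)
class FileEntry:
    path: str         # namespace path as seen by the user
    media: frozenset  # media currently holding copies of the data


def transition(entry: FileEntry, target_class: str) -> FileEntry:
    """Return the entry after a QoS transition; only the media set changes."""
    return FileEntry(path=entry.path, media=QOS_CLASSES[target_class])


raw = FileEntry("/home/patrick/run1.raw", QOS_CLASSES["streaming"])
archived = transition(raw, "archive")
print(archived.path, sorted(m.value for m in archived.media))
```

Because `transition` never touches `path`, every protocol that addresses the file by name keeps working across the change.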
Quality of Service
Raw
Long Term Preservation (Legal Requirement)
Derived
SSD
Low Latency (HPC, Analysis)
Publication
SSD
Fast, Multi Stream Access
Even more Requirements
• Different access protocols for different applications
  • POSIX mounted FS (NFS 4.1/pNFS) for fast analysis
  • FTP dialects (GridFTP) for wide-area transfers with Globus, WLCG FTS
  • HTTP/WebDAV, mostly for browser-based applications, visualization, ...
• Different authentication mechanisms must be available.
  • Username/password for web applications
  • SAML to support traditional IdPs
  • OpenID Connect for Google/Facebook-like IdPs
  • Certificates for HTTPS or Grid applications
• Different credentials must be mappable to the same identity.
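Mapping several credentials to one identity amounts to a lookup keyed by credential type and subject. The sketch below is illustrative only: `IdentityMap`, the credential type labels, and the example subjects are hypothetical and do not describe dCache's actual login mechanism.

```python
from typing import Dict, Optional, Tuple


class IdentityMap:
    """Hypothetical registry resolving (credential type, subject) to one user."""

    def __init__(self) -> None:
        self._by_credential: Dict[Tuple[str, str], str] = {}

    def register(self, kind: str, subject: str, identity: str) -> None:
        # Several credentials may point to the same identity.
        self._by_credential[(kind, subject)] = identity

    def resolve(self, kind: str, subject: str) -> Optional[str]:
        # Unknown credentials resolve to None instead of raising.
        return self._by_credential.get((kind, subject))


ids = IdentityMap()
ids.register("password", "alice", "alice")
ids.register("x509", "/C=DE/O=Example/CN=Alice Example", "alice")
ids.register("oidc", "https://idp.example.org#12345", "alice")
print(ids.resolve("x509", "/C=DE/O=Example/CN=Alice Example"))
```

With such a mapping, a file written over WebDAV with a password and read over GridFTP with a certificate is owned by, and accessible to, the same user.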
Scientific Data Cloud
• High Speed Data Ingest
• Fast Analysis: NFS 4.1/pNFS
• Wide Area Transfers (Globus Online, FTS) by GridFTP
• Sync’ing and Sharing with ownCloud
What would that look like from the user’s perspective?
My DESY XXL Home: QoS support
[Screenshot: Patrick’s home]
My DESY XXL Home: Protocol Support
• Multi Protocol: NFS 4.1/pNFS, GridFTP, WebDAV, SRM
• My ownCloud Home: Sync, Share
• Web 2.0: ownCloud
How do we achieve those goals?
OR
Choosing dCache as the storage backend for ownCloud!
The scientific data cloud
Side Track
What’s dCache?
dCache in a nutshell (cont.)
• Started in 2000
• International collaboration (DESY, Fermilab, NDGF)
• About 10 members: developers, deployment, support, management
• Software deployed at about 70 sites in Europe, US, Asia, Russia
• Largest deployments in the order of 20 PBytes on tape and disk.
• Total storage close to 200 PBytes.
• Geographically largest installation spans 4 countries.
• Largely funded by INDIGO-DataCloud, DESY, Fermilab and NDGF
dCache Design
• Protocol and Authentication Engines: GridFTP, NFS/pNFS, HTTP, WebDAV
• Virtual file-system namespace layer
• Media Transfer Engine and Pool Management
• Automatic and manual media transition
• Media: SSDs, spinning disks, tape, Blu-ray, …
Namespace Design
• Name Space: Name
• Location Manager: Disk 1, Disk 2, Tape 1
• Physical Storage: Disk, Tape, External System
Design Consequence
• Files are stored as objects on various data back-ends
  • Random-access devices: hard disk, SSD
  • Removable media: tape
  • Object stores: Ceph
• Back-ends can be highly distributed (even across countries).
• The file namespace engine is independent of the data storage itself.
• Internal and external services can move data around w/o service interruption.
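The separation of namespace and physical storage described above can be sketched as two lookups: path to object id, and object id to the set of back-ends holding a copy. This is a toy model, not dCache code; the class `Namespace`, its methods, and the back-end names are hypothetical.

```python
class Namespace:
    """Toy model: paths map to object ids, object ids map to back-end locations."""

    def __init__(self):
        self._ids = {}        # path -> object id
        self._locations = {}  # object id -> set of back-end names
        self._next_id = 0

    def create(self, path, backend):
        object_id = self._next_id
        self._next_id += 1
        self._ids[path] = object_id
        self._locations[object_id] = {backend}
        return object_id

    def locations(self, path):
        # Return a copy so callers cannot modify the internal state.
        return set(self._locations[self._ids[path]])

    def migrate(self, path, source, target):
        replicas = self._locations[self._ids[path]]
        if source not in replicas:
            raise ValueError(f"{path} has no copy on {source}")
        replicas.add(target)      # the new copy is registered first
        replicas.discard(source)  # only then is the old copy dropped

    def rename(self, old_path, new_path):
        # Renaming touches the path table only; the data stays where it is.
        self._ids[new_path] = self._ids.pop(old_path)


ns = Namespace()
ns.create("/home/patrick/run1.dat", "disk1")
ns.migrate("/home/patrick/run1.dat", "disk1", "tape1")
print(ns.locations("/home/patrick/run1.dat"))
```

Because `migrate` changes only the location table and `rename` changes only the path table, data can move between back-ends without clients noticing, which is the consequence the slide draws.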
dCache features supporting our idea of a scientific data cloud
• Multi-protocol support (transfer and authentication)
  • Transfer protocols: NFS/pNFS, HTTP, WebDAV
  • Multi authentication credential support (OpenID Connect, Kerberos, passwd)
• Sophisticated data management
  • Multi-media support (tape, spinning disk, SSD, …)
  • Automatic and manual media transitions
  • Adding and removing data nodes w/o service interruption
  • Automatic replica management
    • Enforces n < x < m copies of data files.
  • External storage support (e.g. tape systems: TSM, HPSS, OSM, DMF)
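Keeping the number of copies of a file within bounds can be sketched as a planning step that compares the current replica locations with the allowed range. This is not dCache's replica manager; `plan_replicas` and the pool names are hypothetical, and the bounds are treated as inclusive here (the slide's n < x < m corresponds to `min_copies = n + 1` and `max_copies = m - 1`).

```python
def plan_replicas(locations, candidates, min_copies, max_copies):
    """Return (pools_to_copy_to, pools_to_remove_from) for one file.

    locations:  pools currently holding a copy
    candidates: pools that may receive a new copy, in order of preference
    """
    if min_copies > max_copies:
        raise ValueError("min_copies must not exceed max_copies")
    current = sorted(set(locations))
    if len(current) < min_copies:
        # Too few copies: pick target pools that do not hold the file yet.
        free = [pool for pool in candidates if pool not in current]
        return free[:min_copies - len(current)], []
    if len(current) > max_copies:
        # Too many copies: drop the surplus, keeping the first max_copies pools.
        return [], current[max_copies:]
    return [], []


pools = ["pool1", "pool2", "pool3", "pool4"]
print(plan_replicas(["pool1"], pools, 2, 3))
```

Running such a check whenever a pool joins or leaves would be one way to allow nodes to be added or removed without service interruption, since a lost copy is recreated elsewhere before it is missed.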
In particular: The QoS Interface
dCache QoS Interfaces
[Figure labels: Web Service; CDMI Service; Cloud; RESTful; QoS Module; dCache]
The QoS Web Interface
[Screenshot labels: DISK; TAPE]
Click to get the file back from tape.
Putting pieces together
The Data Path
• Web Load Balancer (F5)
• ownCloud (×4)
• NFS 4.1/pNFS
• dCache
• Spinning disks, SSDs, tape
Future Work: The Namespace Path
[Figure labels: Namespace; Namespace dCache; Sharing DB; Share API; Namespace, Proxy]
dCache – ownCloud hybrid
• Data path is the easiest part. Works nicely.
• Namespace synchronization is/was very difficult
  • Important to let all protocols see a synchronized namespace.
  • ownCloud didn’t expect the underlying storage system to change the namespace tree.
  • Manually triggered synchronization took too long.
  • ownCloud 9 provides a first attempt at an API for external namespaces.
• Exposing ‘shares’ to external components is not yet in ownCloud.
  • Important to allow all protocols to use ownCloud-defined shares.
  • Prerequisites:
    • ownCloud: needs an API to expose ‘shares’
    • dCache: needs to have a ‘share’ object implemented.
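The namespace synchronization problem can be pictured as comparing two listings: the paths that exist in the storage back-end and the paths the sync front-end currently knows about. The sketch below is a generic full-scan comparison, not the ownCloud 9 API; `diff_namespaces` and the example paths are hypothetical. A full scan like this grows with the total number of files, which is one plausible reason a manually triggered synchronization takes long on a large tree.

```python
def diff_namespaces(storage_paths, known_paths):
    """Return (paths_to_add, paths_to_remove) for the front-end's view."""
    storage, known = set(storage_paths), set(known_paths)
    # Files written through another protocol are missing from the front-end.
    to_add = sorted(storage - known)
    # Files deleted through another protocol are stale in the front-end.
    to_remove = sorted(known - storage)
    return to_add, to_remove


storage = ["/home/patrick/a.dat", "/home/patrick/b.dat", "/home/patrick/c.dat"]
known = ["/home/patrick/a.dat", "/home/patrick/old.dat"]
print(diff_namespaces(storage, known))
```

An API through which the storage system reports namespace changes would remove the need for this kind of full comparison.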
ownCloud and QoS
[Figure labels: ownCloud GUI (Web); QoS Plugin (Server Side App); I/O (NFS); dCache Namespace API; Share API; QoS Module; REST Services]
Summary
• An ownCloud - dCache hybrid is a perfect system for providing managed shared storage to scientists.
  • Sync’n Share is provided by ownCloud.
  • Access protocols and authentication mechanisms used in science are provided by dCache.
  • Unlimited storage spaces (via removable media, e.g. tape)
  • Quality of Service support
    • Automatic and manual media transitions
    • Automatic replica management resulting in high availability and data durability.
  • Reduced downtimes due to transparent data migration.
Outlook
• The current version of the ownCloud-dCache hybrid satisfies the need for
  • Sync’n Share
  • Highly scalable and manageable back-end storage
• For a full integration
  • The namespaces of the two systems need to be synchronized (OC9)
  • The ownCloud ‘shares’ need to be exposed to have them visible in all protocols (NFS, GridFTP, …)
  • We need to provide an ownCloud plugin (server-side app) to make the dCache QoS storage types visible in ownCloud.