Page 1
www.chameleoncloud.org
FEBRUARY 5, 2016
CHAMELEON: BUILDING A RECONFIGURABLE EXPERIMENTAL TESTBED FOR LARGE-SCALE CLOUD RESEARCH
Pierre Riteau, Chameleon Lead DevOps Engineer [email protected]
Grid’5000 Winter School 2016, February 5, 2016, Grenoble, France
Page 2
TO AVOID ANY MISUNDERSTANDINGS
Page 3
CHAMELEON DESIGN STRATEGY
- Large-scale: “Big Data, Big Compute, Big Instrument research”
  - ~650 nodes (~14,500 cores), 5 PB disk over two sites, 2 sites connected with 100G network
- Reconfigurable: “As close as possible to having it in your lab”
  - Bare metal reconfiguration, operated as a single instrument
  - Support for repeatable and reproducible experiments
- Connected: “One stop shopping for experimental needs”
  - Workload and Trace Archive
  - Partnerships with production clouds: CERN, OSDC, Rackspace, Google, and others
  - Partnerships with users
- Complementary: “Can’t do everything ourselves”
  - Complementing GENI, Grid’5000, and other experimental testbeds
- Sustainable: “Easy to maintain, easy to share”
Page 4
CHAMELEON HARDWARE
[Figure: Chameleon hardware at the two sites, Chicago and Austin]
- Standard Cloud Unit: switch, 42 compute, 4 storage (x10 at one site, x2 at the other)
- SCUs connect to core and are fully connected to each other
- Heterogeneous Cloud Units: alternate processors and networks
- Core Services: 3.6 PB central file systems, front end and data movers; front end and data mover nodes
- Chameleon Core Network: 100 Gbps uplink public network (each site); to UTSA, GENI, future partners
- Totals: 504 x86 compute servers, 48 dist. storage servers, 102 heterogeneous servers, 16 mgmt and storage nodes
Page 5
CHAMELEON HARDWARE
- Standard Cloud Units (SCU) (deployed)
  - Each of the 12 Standard Cloud Units is a single 48U rack
  - 42 Dell R630 compute servers, each with dual-socket Intel Xeon (Haswell) processors (12 cores, 24 threads) and 128 GB of RAM
  - 4 Dell FX2 storage servers, each with a connected JBOD array of 16 2 TB drives (total of 128 TB per SCU), 2x10 cores, and 64 GB of RAM
  - Allocations can be an entire SCU, multiple SCUs, or within a single SCU, or across SCUs (e.g., storage servers for Hadoop configurations)
  - 48 port Force10 S6000 OpenFlow-enabled switches, 10 Gb to hosts, 40 Gb uplinks to Chameleon core network
  - Connectx3 Infiniband network in one rack at TACC
- Shared infrastructure (deployed)
  - 3.6 PB global storage, 100 Gb Internet connection between sites
- Heterogeneous Cloud Units (to be procured in Y2)
  - ARM microservers, Atom microservers, SSDs, GPUs, FPGAs
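The per-SCU figures above can be cross-checked with a little arithmetic. The sketch below uses only numbers stated on the slides; the variable names are illustrative, and it assumes decimal units (1 PB = 1000 TB).

```python
# Back-of-the-envelope totals derived from the per-SCU figures on the slide.
# Variable names are illustrative; the numbers come from the slide text.
SCU_COUNT = 12
COMPUTE_PER_SCU = 42
STORAGE_PER_SCU = 4
DRIVES_PER_STORAGE_SERVER = 16
DRIVE_TB = 2
GLOBAL_STORAGE_PB = 3.6

compute_servers = SCU_COUNT * COMPUTE_PER_SCU
storage_servers = SCU_COUNT * STORAGE_PER_SCU
tb_per_scu = STORAGE_PER_SCU * DRIVES_PER_STORAGE_SERVER * DRIVE_TB
# Assumption: decimal units, 1 PB = 1000 TB.
total_pb = GLOBAL_STORAGE_PB + SCU_COUNT * tb_per_scu / 1000

print(compute_servers, storage_servers, tb_per_scu, round(total_pb, 1))
```

This gives 504 compute servers, 48 storage servers, 128 TB per SCU, and about 5.1 PB of disk overall, in line with the server totals and the "5 PB disk" figure quoted on the earlier slides.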
Page 6
CAPABILITIES AND SUPPORTED RESEARCH
Virtualization technology (e.g., SR-IOV, accelerators), systems, networking, infrastructure-level resource management, etc.
Repeatable experiments in new models, algorithms, platforms, auto-scaling, high-availability, cloud federation, etc.
Development of new models, algorithms, platforms, auto-scaling HA, etc., innovative application and educational uses
Isolated partition, full bare metal reconfiguration
Isolated partition, Chameleon Appliances
Persistent, reliable, shared clouds
Page 7
IMPLEMENTING THE EXPERIMENTAL WORKFLOW
discover resources
- Fine-grained
- Complete
- Up-to-date
- Versioned
- Verifiable
provision resources
- Advance reservations & on-demand
- Fine-grained allocations
- Isolation
configure and interact
- Bare metal
- Deeply reconfigurable
- Multiple appliances to a lease
- Snapshotting
- Complex Appliances
monitor
- Hardware metrics
- Fine-grained information
- Aggregate and archive
Page 8
Page 9
BUILDING A TESTBED FROM SCRATCH
- Requirements (proposal stage)
- Architecture (project start)
- Technology Evaluation and Risk Analysis
  - Many options: G5K, Nimbus, LosF, OpenStack
  - Sustainability as design criterion: can a CS testbed be built from commodity components?
  - Technology evaluation: Grid’5000 and OpenStack
  - Architecture-based analysis and implementation proposals
- CHI = OpenStack + Grid’5000 + special sauce
Page 10
CHI: DISCOVERING AND VERIFYING RESOURCES
- Fine-grained, up-to-date, and complete representation
- Both machine parsable and user friendly representations
- Testbed versioning
  - “What was the drive on the nodes I used 6 months ago?”
- Dynamically verifiable
  - Does reality correspond to description? (e.g., failure handling)
- Grid’5000 registry toolkit + Chameleon portal UI
  - Automated resource description, automated export to RM/Blazar
- g5k-checks (renamed cc-checks for consistency)
  - Can be run after boot, acquires information and compares it with resource catalog description
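The verification step can be pictured as a diff between what a node reports after boot and what the resource catalog says about it. The following is a minimal sketch of that idea only; it is not the g5k-checks or cc-checks implementation, and the function name, dictionary layout, and sample values are all hypothetical.

```python
def find_mismatches(catalog, observed, prefix=""):
    """Return a list of (path, expected, actual) differences, sorted by key."""
    mismatches = []
    for key in sorted(set(catalog) | set(observed)):
        path = f"{prefix}{key}"
        expected = catalog.get(key)
        actual = observed.get(key)
        if isinstance(expected, dict) and isinstance(actual, dict):
            # Recurse into nested sections such as "cpu" or "disk".
            mismatches.extend(find_mismatches(expected, actual, path + "."))
        elif expected != actual:
            mismatches.append((path, expected, actual))
    return mismatches


# Hypothetical catalog entry and post-boot report for one node.
catalog_entry = {"memory_gb": 128, "cpu": {"sockets": 2}, "disk": {"model": "disk-model-A"}}
node_report = {"memory_gb": 96, "cpu": {"sockets": 2}, "disk": {"model": "disk-model-A"}}

for path, expected, actual in find_mismatches(catalog_entry, node_report):
    print(f"{path}: catalog says {expected!r}, node reports {actual!r}")
```

In this made-up case the node reports less memory than the catalog describes, which is the kind of hardware failure a post-boot check is meant to surface.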
Page 11
v1
Page 12
v1
v2
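The versioning question raised earlier ("What was the drive on the nodes I used 6 months ago?") amounts to looking up which version of a node description was in effect on a given date. A minimal sketch, assuming each version is stored with the date it took effect; the history, dates, and field names below are hypothetical and do not reflect how the Grid’5000 registry stores versions.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical version history for one node: (effective date, description),
# kept sorted by effective date.
history = [
    (date(2015, 6, 1), {"version": "v1", "disk_model": "disk-model-A"}),
    (date(2015, 11, 15), {"version": "v2", "disk_model": "disk-model-B"}),
]


def description_at(history, when):
    """Return the description in effect on `when`, or None if it predates all versions."""
    dates = [effective for effective, _ in history]
    index = bisect_right(dates, when)
    return history[index - 1][1] if index else None


print(description_at(history, date(2015, 8, 5))["version"])
```

With this made-up history, an experiment run in August 2015 resolves to v1, while one run in February 2016 would resolve to v2.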
Page 13
CHI: PROVISIONING RESOURCES
- Resource leases
- Advance reservations (AR) and on-demand
  - AR facilitates allocating at large scale
- Fine-grain allocation of a range of resources
  - Different node types, switches, etc.
- Isolation between experiments
- Future extensions: matchmaking, testbed allocation management
- OpenStack Nova/Blazar, contributions to Blazar
  - Extensions to support Gantt chart displays and other features
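At its core, an advance reservation is an interval booked on a resource, and a new lease can be granted only if it does not overlap an existing one on the same node. The sketch below illustrates that check under simplified assumptions; the `Lease` class and the `reserve` function are hypothetical and are not Blazar's API or scheduling algorithm.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Lease:
    node: str
    start: datetime
    end: datetime  # exclusive


def conflicts(existing, requested):
    """True if `requested` overlaps an existing lease on the same node."""
    return any(
        lease.node == requested.node
        and lease.start < requested.end
        and requested.start < lease.end
        for lease in existing
    )


def reserve(existing, requested):
    """Record the lease and return True, or return False if it conflicts."""
    if requested.start >= requested.end:
        raise ValueError("lease must end after it starts")
    if conflicts(existing, requested):
        return False
    existing.append(requested)
    return True


booked = []
morning = Lease("node-1", datetime(2016, 2, 8, 9), datetime(2016, 2, 8, 12))
overlap = Lease("node-1", datetime(2016, 2, 8, 11), datetime(2016, 2, 8, 14))
afternoon = Lease("node-1", datetime(2016, 2, 8, 12), datetime(2016, 2, 8, 15))

print(reserve(booked, morning), reserve(booked, overlap), reserve(booked, afternoon))
```

Treating the end time as exclusive lets back-to-back leases share a boundary, so the afternoon lease is accepted while the overlapping one is refused.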
Page 14
CHI: CONFIGURE AND INTERACT
- Bare Metal
- Allow deep reconfigurability (access to console)
- Map multiple appliances to a lease
- Snapshotting for image sharing
- Efficient appliance deployment
- Handle complex appliances
  - Virtual clusters, cloud installations, etc.
- Interact: shape experimental conditions
- OpenStack Ironic, Glance, and user-data/meta-data
Page 15
CHI: INSTRUMENTATION AND MONITORING
- Enables users to understand what happens during the experiment
- Instrumentation: high-resolution metrics
- Types of monitoring:
  - Infrastructure monitoring (e.g., PDUs)
  - User resource monitoring
  - Custom user metrics
- Aggregation and Archival
  - Easily export data for specific experiments
- OpenStack Ceilometer + custom metrics
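Aggregation of high-resolution metrics can be illustrated by averaging raw samples over fixed time windows before archiving them. This is a generic sketch, not Ceilometer's API; the `aggregate` function and the sample power readings are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, window_s=60):
    """Average (timestamp_s, value) samples over fixed windows of `window_s` seconds."""
    buckets = defaultdict(list)
    for timestamp_s, value in samples:
        # Key each sample by the start time of the window it falls into.
        buckets[timestamp_s - timestamp_s % window_s].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical power readings in watts (e.g., from a PDU) for one experiment.
power_w = [(0, 210.0), (20, 230.0), (40, 220.0), (60, 300.0), (80, 310.0)]
print(aggregate(power_w))
```

Five raw samples collapse into two one-minute averages, which is the sort of reduced series that is practical to archive and export per experiment.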
Page 16
CHI: OVERALL ARCHITECTURE
[Figure: CHI architecture diagram]
- Components: Portal, Identity Management, Resource discovery (Grid’5000 Reference API), Reservation Service (Blazar), Horizon, Keystone, Nova, Ironic, Neutron, Ceilometer, Glance
- Legend: special sauce, Custom development, OpenStack
Page 17
HOW DOES IT WORK INTERNALLY?
[Figure: reservation workflow]
- Chameleon user to Blazar: reserve resources (reservations R1, R2)
- Blazar to Nova: create dedicated resource pool (host aggregate); resource pools P1, P2, and freepool
Page 18
HOW DOES IT WORK INTERNALLY?
[Figure: reservation and deployment workflow]
- Chameleon user to Blazar: reserve resources (reservations R1, R2)
- Blazar to Nova: create dedicated resource pool (host aggregate); resource pools P1, P2, and freepool
- Chameleon user to Nova: launch bare-metal instances in reservation
- Nova to Ironic: schedule, then request bare-metal deployment
- Ironic to cluster: control & provision (IPMI / PXE / iSCSI)
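The workflow in this diagram can be mimicked with a toy in-memory model: reserving moves hosts out of the free pool into a pool dedicated to the reservation, and instances may only be launched on hosts in that pool. This is a sketch of the idea only; the `PoolModel` class and its methods are hypothetical and do not correspond to Blazar, Nova, or Ironic code.

```python
class PoolModel:
    """Toy model: hosts move from a shared free pool into per-reservation pools."""

    def __init__(self, hosts):
        self.freepool = set(hosts)
        self.pools = {}      # reservation id -> set of reserved hosts
        self.instances = {}  # host -> image deployed on it

    def reserve(self, reservation_id, count):
        """Create a dedicated pool holding `count` hosts taken from the free pool."""
        if reservation_id in self.pools:
            raise ValueError("reservation id already in use")
        if count > len(self.freepool):
            raise RuntimeError("not enough free hosts")
        chosen = set(sorted(self.freepool)[:count])
        self.freepool -= chosen
        self.pools[reservation_id] = chosen
        return chosen

    def launch(self, reservation_id, image):
        """Deploy `image` on an idle host that belongs to the reservation."""
        idle = sorted(h for h in self.pools[reservation_id] if h not in self.instances)
        if not idle:
            raise RuntimeError("no idle host left in this reservation")
        self.instances[idle[0]] = image
        return idle[0]


model = PoolModel(["node-1", "node-2", "node-3"])
model.reserve("R1", 2)
print(model.launch("R1", "my-appliance"), sorted(model.freepool))
```

Because a launch only ever looks inside the reservation's own pool, hosts left in the free pool or held by another reservation cannot be taken by accident, which mirrors the isolation that a dedicated host aggregate provides.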
Page 19
DEVELOPED IN THE OPEN
- https://github.com/ChameleonCloud
  - OpenStack patches, Grid’5000 g5k-checks patches
  - User portal, resource discovery, Horizon extensions
  - Testbed configuration with Puppet (not yet open)
- Aim is to provide a Chameleon-in-a-box!
Page 20
CHAMELEON TIMELINE AND STATUS
- 10/2014: Project starts
- 12/2014: FutureGrid@Chameleon (OpenStack KVM)
- 04/2015: Chameleon Technology Preview on FutureGrid hardware
- 06/2015: Chameleon Early User on new hardware
- 07/2015: Chameleon Public availability (bare metal)
- 09/2015: Chameleon KVM OpenStack cloud available
- 10/2015: Interoperability with GENI (1st phase)
- Today: 600+ users / 150+ projects
- 2016: Heterogeneous hardware available
Page 21
IN THE PIPELINE…
- Y1 theme was “making things possible”: focus on infrastructure
- Y2 theme is “from possible to easy”: focus on users
- Outreach: webinars, tutorials, user stories
- Experiment management
  - Appliances: snapshotting, sharing, appliance marketplace, community
  - Experiment Blueprint: automation and preservation
- Functionality: from possible to easy
  - Better reconfiguration capabilities
  - Better networking capabilities
  - Better infrastructure monitoring (PDUs, etc.)
  - And others
Page 22
Page 23
OPENSTACK: LESSONS LEARNED
- Operating OpenStack can be difficult
  - Forget about traditional UNIX admin: even bare metal needs OVS and IP namespaces
  - Thousands of configuration switches, many with little documentation
  - Must read the code!
  - Inter-dependent components → check all logs with debug enabled
- Upstream development mostly done on KVM
  - Less testing of Ironic → bugs
- Lots of experimental projects with little upstream support
  - We were lucky: the community was interested in reviving Blazar
- Do not put too much hope in blueprints
  - Many abandoned or delayed for multiple releases
- Where to find help and possible fixes?
  - bugs.launchpad.net (bug reports) / review.openstack.org (patches)
  - Most developers available on IRC
Page 24
VIRTUALIZATION OR CONTAINERIZATION?
- Yuyu Zhou, University of Pittsburgh
- Research: lightweight virtualization
- Testbed requirements:
  - Bare metal reconfiguration
  - Boot from custom kernel
  - Console access
  - Up-to-date hardware
  - Large scale experiments
SC15 Poster: “Comparison of Virtualization and Containerization Techniques for HPC”
Page 25
TEACHING CLOUD COMPUTING
- Nirav Merchant and Eric Lyons, University of Arizona
- ACIC 2015: project-based learning course
  - Data mining to find exoplanets
  - Scaled analysis pipeline by Jared Males
  - Develop a VM/workflow management appliance and best practice that can be shared with broader community
- Testbed requirements:
  - Easy to use IaaS/KVM installation
  - Minimal startup time
  - Support distributed workers
  - Block store: make copies of many 100 GB datasets
Page 26
DEFENDING COMPUTING RESOURCES
- Led by Jessie Walker, University of Arkansas at Pine Bluff
- Working on detecting cyber attacks
  - Model and visualize multi-stage intrusion attacks (MAS)
  - Create custom Snort rules to monitor traffic and detect attacks
- Complex and expensive to buy and use their own hardware
- Limited by permissions needed to run cybersecurity attacks inside campuses
- Testbed requirements:
  - Virtual machines to simulate attacks in the cloud and run intrusion detection systems
Page 27
PARTING THOUGHTS
- From vision to reality with Express Delivery
  - Built from scratch within a year on a shoestring
  - Thanks to experience from other testbeds, esp. Grid’5000
  - Thanks to open-source code from other projects, esp. OpenStack and Grid’5000
- Operational testbed: 600+ users / 150+ projects
- Federation
  - Ongoing efforts with GENI
  - Grid’5000 too?
Page 28
CHAMELEON TEAM
- Kate Keahey: Chameleon PI, Science Director, Architect, University of Chicago
- Joe Mambretti: Programmable networks, Federation activities, Northwestern University
- Dan Stanzione: Facilities Director, TACC
- Pierre Riteau: DevOps Lead, University of Chicago
- Paul Rad: Industry Liaison, Education and training, UTSA
- DK Panda: High-perf networking, Ohio State University
Page 29
COME AND WORK WITH US!
- As a collaborator
  - Generalizing results: what would Kameleon or DISTEM look like in the Chameleon context?
  - Also projects in resource management for HPC & Cloud, elastic scaling platform
  - Summer internship opportunities
- As a co-worker
  - Programming postdoc or researching programmer