Clouds at CERN
Tim Bell [email protected]
Academic Cloud Experiences, 29th April 2013
May 25, 2015
CERN was founded in 1954 by 12 European States: “Science for Peace”
Today: 20 Member States
Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, the Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom
Candidate for Accession: Romania
Associate Members in Pre-Stage to Membership: Israel, Serbia
Applicant States for Membership or Associate Membership: Brazil, Cyprus (awaiting ratification), Pakistan, Russia, Slovenia, Turkey, Ukraine
Observers to Council: India, Japan, Russia, Turkey, United States of America; European Commission and UNESCO
~2300 staff
~1000 other paid personnel
>11000 users
Budget (2013): ~1000 MCHF
Is the Higgs boson the source of mass of our fundamental particles?
Why is the universe made of matter and not equal amounts of matter/antimatter?
Dark Matter and Dark Energy?
We do not know the composition of 95% of the universe
Temperature of the universe (WMAP satellite)
Blue tubes contain the two beam pipes and magnets at 1.8 degrees Kelvin
ATLAS detector during construction in 2005
Number of candidates (vertical axis)
Mass of the candidates (horizontal axis)
We observe an excess of candidates with a mass of 125 proton masses
Search for Higgs decays to 4 “leptons” (electrons or muons)
Also observed in the CMS experiment
July 4, 2012
The Worldwide LHC Computing Grid
Tier-0 (CERN): data recording, reconstruction and distribution
Tier-1: permanent storage, re-processing, analysis
Tier-2: simulation, end-user analysis
> 2 million jobs/day
~250’000 cores
173 PB of storage
nearly 160 sites, 35 countries
10 Gb links
WLCG: an international collaboration to distribute and analyse LHC data
Integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible by all LHC physicists
IT Infrastructure Challenges
Staff numbers fixed
Materials budget decreasing
Increasing users of CERN’s facilities
Legacy tools are high maintenance and brittle
Additional data centre in Budapest now online
  doubling potential capacity and 200 Gbit/s network
How do we scale from our current 11,000 servers within these constraints?
Approach
Remodel IT services on Cloud layered models
  IaaS, PaaS, SaaS
Move to commonly used open source tools
  Puppet, OpenStack, Foreman, Koji, Oz, Kibana, …
Implement clouds at scale
  IT aims for 15,000 hypervisors with 150,000 VMs by 2015
Exploit ecosystem solutions such as LBaaS, DBaaS, MQaaS rather than build our own
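As a rough check on the scale target above, the average VM density per hypervisor can be worked out directly. This is a minimal sketch: only the two target figures come from the slide, and the function and variable names are illustrative.

```python
# Target figures quoted on the slide (IT aims for these by 2015).
TARGET_HYPERVISORS = 15_000
TARGET_VMS = 150_000


def vms_per_hypervisor(vms: int, hypervisors: int) -> float:
    """Average number of VMs each hypervisor would host."""
    if hypervisors <= 0:
        raise ValueError("hypervisors must be positive")
    return vms / hypervisors


density = vms_per_hypervisor(TARGET_VMS, TARGET_HYPERVISORS)
print(f"Average VMs per hypervisor at target scale: {density:.0f}")
```

The result is an average only; real hypervisors would host more or fewer VMs depending on their size and on the VM flavours requested.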
Clouds in High Energy Physics
Long-term preservation of software and data of HEP experiments
Utilize special computing resources attached to the detectors
Simplify the management of heterogeneous in-house resources
Use commercial clouds for exceptional computing demands
Distributed cloud computing using HEP and non-HEP clouds
Service Models
Pets are given names like pussinboots.cern.ch
They are unique, lovingly hand raised and cared for
When they get ill, you nurse them back to health
Cattle are given numbers like vm0042.cern.ch
They are almost identical to other cattle
When they get ill, you get another one
Future application architectures tend towards Cattle but Pet support is needed for some specific zones of the cloud
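The pets and cattle distinction above can be sketched in a few lines. This is an illustration only: it assumes cattle follow the "vm" plus digits naming scheme of the slide's example, and the function names are hypothetical rather than part of any CERN tool.

```python
import re

# Assumption: cattle follow a "vm" + digits naming scheme, as in the
# slide's example vm0042.cern.ch; any other host name is treated as a pet.
CATTLE_PATTERN = re.compile(r"^vm\d+\.cern\.ch$")


def service_model(hostname: str) -> str:
    """Return "cattle" for numbered hosts and "pet" for named ones."""
    return "cattle" if CATTLE_PATTERN.match(hostname) else "pet"


def on_failure(hostname: str) -> str:
    """Describe the recovery action the slide associates with each model."""
    if service_model(hostname) == "cattle":
        return f"replace {hostname} with a fresh instance"
    return f"repair {hostname} in place"


for host in ("pussinboots.cern.ch", "vm0042.cern.ch"):
    print(host, "->", service_model(host), "->", on_failure(host))
```

In practice the service model would more likely be recorded as metadata on the VM than inferred from its name; the naming rule is used here only because it is the one the slide gives.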
Refine Service Levels ?
Hippos are cattle with bulk storage. Useful where Cassandra or MongoDB ensures redundancy
Canaries are cattle at high risk to give early warning of failures. Deploy early, fail fast and fix
Infrastructure Overview
[Diagram: OpenStack components (Horizon, Keystone, Glance, Cinder, Nova, Network, Compute, Scheduler) alongside CERN services (Microsoft Active Directory, CERN DB on Demand, CERN Network Database, account management system, CERN block storage provider)]
Dashboard using Horizon
Timelines
Deploy as stable release becomes available in EPEL
Keep up to date but not too close
Benefit from continuous integration testing of other companies
[Timeline figure, late 2012 to late 2013]
Folsom: Sep 27, 2012
Ibex: Feb, 2013
Grizzly: Apr 4, 2013
Today
Grizzly Service: May, 2013
Havana: Oct, 2013
Havana Service: Nov/Dec, 2013
Status
CERN IT OpenStack Cloud
  Running Folsom
  Around 500 hypervisors on KVM and Hyper-V
  High availability using load balancing
  75 users creating around 50 new VMs/day
Experiment farms
  CMS currently running 1,300 hypervisors with 50,000 cores using Essex
  ATLAS starting to ramp up to a similar size
Other HEP sites moving to private cloud
  Brookhaven, IN2P3, FutureGrid, NeCTAR, IHEP, …
Next Steps (I)
Move to Grizzly
  Target end May 2013
Enable Kerberos and X.509 authentication
  Avoids users having to enter passwords
Recycle existing hardware and scale using cells
  Can recycle around 100 batch machines to hypervisors/week
Cells
We’re not alone …
Already 6 sites running more than 10,000 hypervisors according to the latest OpenStack user survey
Next Steps (II)
Block Storage for Hippos and Pets
  Cinder with Ceph, NetApp or GlusterFS
Heat for Orchestration and auto-scaling
Load Balancing as a Service
Bare-Metal to bring all servers under OpenStack
Move ceilometer into production
  Accounting by project
  Move to wall-clock, vCPU metering
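Per-project accounting on wall-clock, vCPU metering can be illustrated with a small calculation. This is a sketch under stated assumptions: the usage records and the function name are hypothetical, and it does not use the ceilometer API.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical usage records: (project, vCPUs, VM start, VM end).
RECORDS = [
    ("atlas", 4, datetime(2013, 4, 1, 0, 0), datetime(2013, 4, 1, 12, 0)),
    ("cms", 8, datetime(2013, 4, 1, 6, 0), datetime(2013, 4, 1, 9, 0)),
    ("atlas", 2, datetime(2013, 4, 2, 0, 0), datetime(2013, 4, 2, 1, 30)),
]


def vcpu_hours_by_project(records):
    """Sum vCPUs x wall-clock hours for each project."""
    totals = defaultdict(float)
    for project, vcpus, start, end in records:
        wall_clock_hours = (end - start).total_seconds() / 3600
        totals[project] += vcpus * wall_clock_hours
    return dict(totals)


usage = vcpu_hours_by_project(RECORDS)
for project, hours in sorted(usage.items()):
    print(f"{project}: {hours:.1f} vCPU-hours")
```

Wall-clock metering charges a VM for the time it exists, whether or not its vCPUs are busy, which is why only the start and end times are needed here.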
Cost Model
CERN computing is funded from CERN central budgets, no billing but quotas
IT resource manager
Experiment resource managers
Project Management
Quota Management
What to do when quota is exceeded?
  No credit card
If capacity is not used?
  Spot market on low SLA conditions
Fair share across the cloud?
  Worked for supercomputers but heavy for clouds at scale
Bursting to public clouds an option?
  IT provisioned or experiment decision
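The quota questions above can be made concrete with a small model. This is a minimal sketch, assuming quotas are counted in vCPUs per project and that a request exceeding the quota is simply refused; the class and method names are hypothetical and are not OpenStack's quota API.

```python
from dataclasses import dataclass


@dataclass
class ProjectQuota:
    """Hypothetical per-project quota, counted in vCPUs."""

    name: str
    vcpu_limit: int
    vcpus_used: int = 0

    def request(self, vcpus: int) -> bool:
        """Grant the request only if it fits within the remaining quota."""
        if vcpus <= 0:
            raise ValueError("vcpus must be positive")
        if self.vcpus_used + vcpus > self.vcpu_limit:
            return False  # no billing, so the request is simply refused
        self.vcpus_used += vcpus
        return True

    def idle_capacity(self) -> int:
        """Unused quota that a low-SLA spot market could offer to others."""
        return self.vcpu_limit - self.vcpus_used


quota = ProjectQuota(name="cms", vcpu_limit=100)
print(quota.request(80))      # fits within the quota
print(quota.request(30))      # would exceed the quota
print(quota.idle_capacity())  # capacity left unused
```

A hard refusal is only one possible policy; the slide leaves open whether excess demand should instead be queued under fair share or burst to a public cloud.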
Cloud of clouds: the next big step
What is required to get to a cloud of clouds?
  Federated identity
  Image conversion and sharing
  API standardisation
  SLAs
  Security models
Many initiatives investigating this at different levels
  Public/Private bursting
  Private/Private sharing (as the grid)
  Homogeneous and Heterogeneous
We will see intensive efforts in this area over the coming year
Conclusions
Clouds provide a framework for re-engineering how IT is delivering responsive services to the physicists
OpenStack and the ecosystem provide a suitable solution with flexibility and opportunity to contribute as well as benefit from work of others
Migration via re-cycling bare-metal to hypervisors provides a smooth transition
Cloud of clouds has potential to replace grid computing models in the future
Questions?
BACKUP SLIDES
Job Opportunities
Science is getting more and more global
CERN: x staff, x fellows