Increased Resiliency - Improved Operational Efficiencies - Reduced Energy Costs
Data Centre Infrastructure Management
DCIM: An Integral Part of the Software Defined Data Centre
Michael Rudgyard, CTO & Founder, Concurrent Thinking
Who we are and what we do…
Company profile
• Young, dynamic company, formed in 2010
– Based in Birmingham, UK
– Private; funded by venture capital
• We develop an intuitive, end-to-end DCIM solution
• Company Vision
– To establish Concurrent COMMAND as the DCIM of choice for both the technical and commercial management of data centres
USPs / Key Differentiators
• An end-to-end approach
– Integration across all systems delivers a complete view of data centre performance
• Vendor neutral
– Supports all of your existing and future infrastructure
• Architected to scale
– Meets the needs of small, large and multi-site data centres
• An intuitive and highly dynamic GUI
– Key Performance Indicators drive efficiency
– Critical alerting, technical fault-finding and planning
• An open framework
– Ensures that managers can customise the product to meet their precise requirements
So what is DCIM ?
The Evolution of DCIM
Building & Facilities Management
Facilities IT Systems
DCIM arguably started as Data Centre Facilities Management …
Power Chain / Energy Monitoring
Then energy management…
Environmental Monitoring
With an initial focus on cooling…
Server health monitoring
Vendors then realised that servers provided lots of power & environmental data themselves…
OS/VM monitoring
While reports claimed that the biggest waste of energy in the data centre was due to underutilised servers…
Asset Management
Which required knowing what and where these servers were…
Cable Management
Capacity Planning
Knowing what is where, as well as space, power, cooling and network requirements, allows you to plan…
Network & storage monitoring
Application monitoring
Knowing the power, and CPU, network, IO (and even application) usage, allows you to truly understand end-to-end efficiency…
Asset Management
Cable Management
Environmental Monitoring
Power Chain / Energy Monitoring
Building & Facilities Management
Capacity Planning
Network & storage monitoring
Application monitoring
OS/VM monitoring
Facilities IT Systems
Server health monitoring
VM migration
While active control of cooling, power distribution and IT resources can bring even greater savings …
The business case for DCIM
Business Drivers
Improve resilience and reduce risk
Increase operational efficiencies
Drive energy savings
Reducing Risk
Improving Operational Efficiencies
Driving Energy Savings
Data Centre Efficiency
What defines an efficient Data Centre ?
It’s all about virtualisation
It’s all about cooling
It’s all about planning
It’s about staff efficiency
Energy
“Most data centers, by design, consume vast amounts of energy in an incongruously wasteful manner”
The New York Times, “Power, Pollution and the Internet”, Sep 22nd, 2012
• The Energy Problem:
– Energy is a critical issue for the fast-growing data centre industry
– Cost of energy is substantial and growing fast (data centres account for 1.5-2% of global electricity use)
– Significant political pressure to reduce carbon emissions
Reducing ‘Facility’ overheads; PUE
• The data centre industry initially focussed on reducing cooling (and other) overheads
• A measure of this is the Power Usage Effectiveness (PUE):
PUE = (Total power used by the data centre) / (Power used by IT equipment)
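As a minimal sketch of the ratio above, the function below divides total facility power by IT power. The function name and the meter readings are illustrative assumptions, not figures from the presentation.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Hypothetical readings: 1,500 kW at the utility meter, 1,000 kW at the IT load.
print(pue(1500.0, 1000.0))  # 1.5
```

A PUE of 1.0 would mean that every watt drawn by the site reaches the IT equipment; anything above that is cooling, power distribution losses and other overheads.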
Virtualisation
• Virtualisation shifts the focus to server rationalisation
• However, virtualisation often takes place:
– With few changes to the power and cooling infrastructure (so the PUE increases as the IT load falls!)
– With little historical knowledge of server utilisation pre-virtualisation
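A rough worked example of why the PUE can rise after virtualisation: if consolidation halves the IT load while the facility overhead stays fixed, the ratio gets worse even though total consumption drops. The load figures and the assumption of a completely fixed overhead are hypothetical simplifications.

```python
def pue_before_and_after(it_kw: float, overhead_kw: float,
                         it_reduction: float) -> tuple[float, float]:
    """Return (PUE before, PUE after) when IT load falls but overhead is unchanged."""
    if not 0.0 <= it_reduction < 1.0:
        raise ValueError("it_reduction must be in the range [0, 1)")
    before = (it_kw + overhead_kw) / it_kw
    reduced_it_kw = it_kw * (1.0 - it_reduction)
    after = (reduced_it_kw + overhead_kw) / reduced_it_kw
    return before, after


# Hypothetical site: 1,000 kW of IT load, 500 kW of cooling and other overheads,
# and a virtualisation project that removes half of the IT load.
before, after = pue_before_and_after(it_kw=1000.0, overhead_kw=500.0, it_reduction=0.5)
print(before, after)  # 1.5 2.0
```

In this sketch the site draws 1,000 kW instead of 1,500 kW after consolidation, yet its PUE is worse, which is one reason to treat PUE with care as an operational metric.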
Design vs. Operational Efficiency
• Most new data centres are currently designed against PUE targets
– For a given IT hardware capacity, PUE is a good planning metric
• But what if the servers are not doing any useful work in practice ?
• PUE is actually a very poor operational metric
• We really need a measure of IT Usage Effectiveness
– ie. how effectively power is used to deliver the necessary IT services
– Against which optimisation can be performed for maximum effect
Data Centre KPIs
[Chart: Compute Utilisation Effectiveness, Storage Utilisation Effectiveness and Network Utilisation Effectiveness, each on a scale from 0 to 1]
• The industry has struggled to define ‘standard’ metrics that are meaningful (eg. PUE, ITUE, ITEE, FVER..)
• DCIM is a tool that should enable a customer to use any standard, and even define his own KPIs
Optimising for change
DCIM provides hard evidence for making business decisions
• Which servers should be replaced, virtualised or retired ?
– Compare utilisation across the estate
• Which servers are better at delivering a particular service ?
– Provides useful procurement information
• When should equipment be retired ?
– Sweating IT and cooling assets is often a very bad idea indeed !
– DCIM can combine power, utilisation, and asset information (eg. depreciation) and provide solid CAPEX vs. OPEX arguments for replacing/upgrading assets
• Do I really need to invest in new equipment or a new data centre ?
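One way to frame the CAPEX vs. OPEX argument is a simple energy payback calculation. The sketch below is an illustration only: the function names, the prices and the power figures are all hypothetical, and it assumes the old and new equipment deliver the same service and ignores depreciation, maintenance and licensing.

```python
HOURS_PER_YEAR = 8760


def annual_energy_cost(it_kw: float, pue: float, price_per_kwh: float) -> float:
    """Yearly electricity cost of an IT load, scaled by PUE to include overheads."""
    return it_kw * pue * HOURS_PER_YEAR * price_per_kwh


def payback_years(capex: float, old_it_kw: float, new_it_kw: float,
                  pue: float, price_per_kwh: float) -> float:
    """Years for the energy saving (OPEX) to repay the purchase price (CAPEX)."""
    saving = (annual_energy_cost(old_it_kw, pue, price_per_kwh)
              - annual_energy_cost(new_it_kw, pue, price_per_kwh))
    if saving <= 0:
        return float("inf")  # never repaid on energy savings alone
    return capex / saving


# Hypothetical figures: 10 kW of old servers replaced by 4 kW of new ones,
# at a PUE of 2.0 and an electricity price of 0.10 per kWh.
years = payback_years(capex=21_024.0, old_it_kw=10.0, new_it_kw=4.0,
                      pue=2.0, price_per_kwh=0.10)
print(round(years, 2))  # 2.0
```

With measured power and utilisation data in place of these invented numbers, the same arithmetic indicates whether sweating an asset costs more in energy than replacing it.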
The Challenges for DCIM
Commercial Challenges
• Facilities and IT are often managed independently
– Complicated sell for end-to-end solution, mitigated by having a modular application
– However, future data centres are likely to follow a more unified management approach (cf. Google, Yahoo, Facebook, etc…)
• Little ‘C’-level visibility of data centre risks, costs & efficiency
– Power is not ‘charged’ to IT; CAPEX decisions are made without evidence; etc…
– Ironically, providing this visibility is what DCIM sets out to achieve !
• Staff do not have the time to implement DCIM
– Ironically, DCIM relieves them of many manual tasks once it is in place
• Co-location providers have different needs to their customers
– Need for unified DCIM solutions that target both users as this sector grows
Technical Challenges – Product Breadth
• The ever-increasing scope of DCIM
– Likely to leave non-specialist (eg. smaller hardware-focussed) providers behind
– Requires a well-thought-out product strategy
• Need to support diverse equipment from multiple vendors
– Drives a standards-based, agentless approach: eg. SNMP, Modbus, BACnet, 1-wire, IPMI, WMI etc.
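A vendor-neutral, agentless design can be pictured as one small interface with a collector per protocol behind it. The sketch below is a hypothetical illustration, not the product's architecture: the class names are invented and the collectors are stubs that return canned values rather than performing real SNMP or Modbus I/O.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Reading:
    device: str   # e.g. "pdu-01"
    metric: str   # e.g. "power_w"
    value: float


class Collector(Protocol):
    """Anything that can poll a device over some protocol and return readings."""

    def poll(self) -> list[Reading]: ...


class StubSnmpCollector:
    """Stand-in for an SNMP poller; returns a canned value, no network I/O."""

    def __init__(self, device: str) -> None:
        self.device = device

    def poll(self) -> list[Reading]:
        return [Reading(self.device, "power_w", 350.0)]


class StubModbusCollector:
    """Stand-in for a Modbus poller; returns a canned value, no serial I/O."""

    def __init__(self, device: str) -> None:
        self.device = device

    def poll(self) -> list[Reading]:
        return [Reading(self.device, "supply_temp_c", 18.5)]


def poll_all(collectors: list[Collector]) -> list[Reading]:
    """Gather readings from every collector, regardless of protocol."""
    readings: list[Reading] = []
    for collector in collectors:
        readings.extend(collector.poll())
    return readings


readings = poll_all([StubSnmpCollector("pdu-01"), StubModbusCollector("crac-03")])
for reading in readings:
    print(reading.device, reading.metric, reading.value)
```

The point of the design is that everything downstream of `poll_all` sees the same `Reading` shape, so supporting a new vendor or protocol means adding a collector rather than changing the rest of the system.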
Manage everything, from anywhere
BACnet network
Modbus RS485
‘IT’ Ethernet network
Management Ethernet network
1-wire sensor network
Facilities management: BMS, branch circuits, UPS systems, generators, CRAC units, environmental sensors etc.
PDU & sensor management
Server health management
OS, VM and application monitoring
Protocol converter
Rack cooling & access management
Environmental monitoring: low-cost, 1-wire sensors
Technical Challenges – Scale Out
• Imagine a high-density data centre with ‘just’ 10,000 servers
– ie. 300-500 racks and a similar number of PDUs and sensors
– and up to (say) 16 VMs per server
• You might want to monitor (derive reports from etc..)
– 300-1500 environmental sensors
– 20-30 data-points per server (IPMI, Power) = 200k-300k points
– 20-100 data-points per OS/VM (eg. SNMP, WMI) = 3.2M-16M points
– … as well as user and application data.
• That’s a lot of information if you sample every 10 to 60s!
– But ‘scale-out’ data centres can be ten times this size…
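The arithmetic behind these figures can be checked with a few lines. The server, VM, sensor and data-point counts come from the slide; the helper names and the per-day totals are additions for illustration.

```python
SERVERS = 10_000
VMS_PER_SERVER = 16
SECONDS_PER_DAY = 86_400


def total_points(sensors: int, points_per_server: int, points_per_vm: int) -> int:
    """Number of monitored data points across sensors, servers and VMs."""
    vms = SERVERS * VMS_PER_SERVER
    return sensors + SERVERS * points_per_server + vms * points_per_vm


def samples_per_day(points: int, interval_s: int) -> int:
    """Samples collected per day when every point is read once per interval."""
    return points * SECONDS_PER_DAY // interval_s


low = total_points(sensors=300, points_per_server=20, points_per_vm=20)
high = total_points(sensors=1500, points_per_server=30, points_per_vm=100)

print(low, high)                  # 3400300 16301500
print(samples_per_day(low, 60))   # 4896432000
print(samples_per_day(high, 10))  # 140844960000
```

So the range runs from roughly 5 billion samples per day (fewest points, 60s interval) to about 140 billion (most points, 10s interval), before any user or application data is counted.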
Things that won’t work (and that we don’t do !):
• Using a ‘single-instance’ software architecture
– Information will need to be processed in a distributed manner
• Putting unrefined data in a standard SQL database
– or you’ll need a data centre to store, process & retrieve this data !
• Expecting simple GUIs (eg. lists and trees) to be effective
– Visualisation becomes a key aspect of usability
– Increased need for automation and data consolidation
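Data consolidation can be as simple as rolling raw samples up into fixed time windows before they are stored. The sketch below shows one common approach (min/max/mean per window); it is an assumption about what consolidation might look like, not a description of how any particular DCIM product does it, and the sample values are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollup:
    window_start: int  # start of the window, in seconds
    minimum: float
    maximum: float
    mean: float
    count: int


def consolidate(samples: list[tuple[int, float]], window_s: int) -> list[Rollup]:
    """Group (timestamp_s, value) samples into fixed windows and summarise each."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp - timestamp % window_s].append(value)
    return [
        Rollup(start, min(values), max(values), sum(values) / len(values), len(values))
        for start, values in sorted(buckets.items())
    ]


# Hypothetical power readings (watts) taken every 10s, rolled up into 5-minute windows.
raw = [(0, 200.0), (10, 220.0), (20, 240.0), (300, 210.0)]
for rollup in consolidate(raw, window_s=300):
    print(rollup)
```

Keeping the minimum and maximum alongside the mean preserves short spikes that an average alone would hide, while a 5-minute window over 10s samples stores one row in place of thirty.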
Technical Challenges – Scale Across
• Co-location / cloud providers are interested in providing their customers with portals for managing their own systems
• This provides further challenges for DCIM providers:
– Providing customers with relevant ‘facilities’ data
– Granular monitoring of data (eg. power usage) in highly dynamic clouds
– Systems that are capable of scaling across many users
The future of DCIM
Where DCIM Meets Cloud
Facilities IT Systems
DCIM & Cloud Infrastructure Management are likely to merge…
DCIM
Cloud Infrastructure Management
DCIM & the software-defined data centre
• In the future, we will move to the autonomous data centre
– Emphasis moves from monitoring to automated management by software
– Potential for very significant operational and energy savings…
• Real-time optimisation of complete service delivery
– Migration of virtual machines based on usage; active power control; localised cooling etc…
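To make usage-based VM migration concrete, the sketch below packs VMs onto as few hosts as a first-fit-decreasing heuristic manages, so that the remaining hosts could be powered down. This is a textbook bin-packing heuristic chosen for illustration, not a method from the presentation; the VM names and CPU figures are hypothetical, and a real placement would also have to respect memory, network, storage and QoS constraints.

```python
def consolidate_vms(vm_cpu_percent: dict[str, int], host_capacity: int = 100) -> list[list[str]]:
    """Place VMs on hosts using first-fit decreasing on measured CPU usage."""
    hosts: list[list[str]] = []  # VM names placed on each host
    free: list[int] = []         # remaining CPU capacity of each host
    ordered = sorted(vm_cpu_percent.items(), key=lambda item: item[1], reverse=True)
    for vm, load in ordered:
        if load > host_capacity:
            raise ValueError(f"{vm} needs more CPU than a single host provides")
        for index, remaining in enumerate(free):
            if load <= remaining:
                hosts[index].append(vm)
                free[index] -= load
                break
        else:
            # No existing host has room, so bring another host into use.
            hosts.append([vm])
            free.append(host_capacity - load)
    return hosts


# Hypothetical measured CPU usage per VM, as a percentage of one host.
usage = {"vm-a": 60, "vm-b": 50, "vm-c": 40, "vm-d": 30, "vm-e": 20}
print(consolidate_vms(usage))  # [['vm-a', 'vm-c'], ['vm-b', 'vm-d', 'vm-e']]
```

Here five VMs fit on two fully loaded hosts; in practice some headroom would be kept on each host to absorb load spikes.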
• There are numerous potential issues:
– Software platforms will need to talk to even more diverse systems
– The software will need to be scalable and able to deal with highly dynamic environments
– The control mechanisms will need to be defined by the data centre managers and IT team, using simple interfaces that abstract complexity
– There are many optimisation constraints relating to: physical issues; the IT, network and storage infrastructure; the required QoS etc.
• But one of the biggest issues is perceived risk…
– Data centres are ‘mission critical’ and highly conservative
Questions ?
Visit our website http://www.concurrent-thinking.com/ for more information
Or fill in our contact form with any enquiries and we will endeavour to reply as quickly as possible.