When Downtime Is Not An Option
Joe Felisky, Global Director – Technology Solutions, SUSE
Jul 12, 2015
Survey Results: More and More Systems Are Considered Critical
Source: Forrester Research, Inc.
Workloads that can least tolerate downtime:
Mission/Business-Critical Workloads
• ERP applications, databases, transactional workloads, and more
• Workloads that may impact a large number of users
• Workloads that directly impact revenue
High-Density Virtualization
• As data centers are consolidated onto fewer servers, highly virtualized hosts require additional protection
• Downtime of guests and hosts can critically impact your IT services
Downtime is More Than Just Lost Revenue
Planned (Scheduled) Downtime
• What causes planned downtime?
– Scheduled software patches and updates that require a system reboot
– Scheduled hardware maintenance
– Data migration
• What can IT do to mitigate it?
– Schedule service windows to minimize business impact (increasingly difficult in a globalized, mobile era)
– Optimize the maintenance process
Unplanned Downtime
• What causes unplanned downtime?
– It comes as a surprise, with little or no warning
– Hardware failure, software bug, malicious attack, or operational mistake
– Environmental failure, such as a natural disaster
• What can IT do to mitigate it?
– Reliable systems, High Availability clustering, Geo clustering, and proactive patch management
– Improve processes through best practices and training
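A common way to reason about these mitigation levers is the steady-state approximation A = MTBF / (MTBF + MTTR): more reliable systems raise the mean time between failures (MTBF), while clustering, better processes and training shorten the mean time to repair or recover (MTTR). The minimal sketch below assumes steady-state behavior, and the hour figures are hypothetical, chosen only to show how much faster recovery matters.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical figures: one failure every 2,000 hours of operation.
baseline = availability(2000, 4)      # 4-hour manual recovery
faster_fix = availability(2000, 0.5)  # 30-minute recovery after process improvements
print(f"{baseline:.5f} -> {faster_fix:.5f}")  # 0.99800 -> 0.99975
```

With the failure rate unchanged, cutting recovery time alone moves availability from about 99.8% to about 99.98%.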
Unplanned Downtime: Top Causes
Source: Forrester Research, Inc.
Exploring Open Source Solutions That Minimize Downtime
How Do You Choose The Right Technologies?
SUSE's Three Steps Toward Zero Downtime:
1. Prevent Hardware Downtime
2. Maximize Service Availability
3. Minimize Human Mistakes
Step 1: Prevent Hardware Downtime
• Choose the right platform to fit your IT needs.
• By working with hardware partners, SUSE is bringing reliability to the next level.
Hardware architecture, hardware benefits, and SUSE value-add:
• IBM System z
– Hardware benefits: 99.999% availability
– SUSE value-add: specially designed tools and technologies; HA/Geo included
• INTEL64
– Hardware benefits: cost-effective; widely accepted for mission-critical workloads
– SUSE value-add: cooperates with the hardware to fully exploit its RAS (reliability, availability, serviceability) features, e.g. MCE (Machine Check Exception) handling
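To put the 99.999% ("five nines") figure in perspective, the sketch below converts an availability target into an annual downtime budget. It assumes an average year of 365.25 days and counts all downtime, planned or unplanned, against the budget; real SLAs may define availability differently (for example, excluding scheduled maintenance).

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days

def downtime_budget_minutes(availability: float) -> float:
    """Maximum minutes of downtime per year allowed by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR

for nines, target in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    print(f"{nines} nines ({target:.3%}): {downtime_budget_minutes(target):.1f} min/year")
```

Five nines leaves a budget of roughly 5.3 minutes per year, which is why a single reboot for patching can consume it entirely.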
Baldor
• Baldor Electric is a global manufacturing company headquartered in Arkansas, USA. It has 26 plants in the US, Canada, Mexico and China, and sells industrial electric products in more than 70 countries.
• Baldor could not guarantee the 24x7 availability it needed to serve customers and employees well and to eliminate the risk of business disruption. At the time, outages occurred five to eight times a year, costing hundreds of thousands of dollars.
• By moving workloads to virtual machines on IBM System z running SUSE Linux Enterprise Server for System z, Baldor Electric Company reduced IT costs from two percent of sales to less than one percent, while improving response time, uptime and productivity.
• 30% lower hardware and software costs
• 90% consolidation of servers (resulting in lower space, cooling and power costs)
• 90% improvement in uptime (reducing disaster recovery losses)
“We believe SUSE has the best Linux distribution with the best support. The platform goes beyond ‘five 9s,’ and the new environment results in considerable time savings for planned downtime.”
Eric Breuer, Manager, Large Systems, Baldor
Step 2: Maximize Service Availability
• SUSE Linux Enterprise High Availability Extension
– Industry-leading open source HA clustering solution*
– Set up clusters among physical hosts or virtual guests
– Easy-to-use setup and management tools
– Resource agents for third-party apps such as SAP, PostgreSQL, Oracle, JBoss, Tomcat, WebSphere and more
• Metro and Geo Clustering
– Combat regional disasters such as power outages or floods
– Ensure business continuity with metro clustering (up to 25 km) or geo clustering (any distance worldwide)
*Founder of the Linux Foundation HA Working Group
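Clustering raises service availability because the service survives as long as at least one node can run it. Assuming node failures are independent and failover is instantaneous (both optimistic simplifications: shared power, shared storage or a regional disaster break independence, which is what metro and geo clustering address), n nodes that are each available a fraction a of the time give a combined availability of 1 - (1 - a)^n. A minimal sketch:

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Availability of a service that runs if at least one of `nodes` nodes is up.

    Assumes independent node failures and instantaneous failover.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    # The service is down only if every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes

for n in (1, 2, 3):
    print(n, f"{cluster_availability(0.99, n):.6f}")
```

Under these assumptions, two 99% nodes give roughly 99.99%; in practice, failover time, fencing of failed nodes and correlated failures pull the real figure lower.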
Maximize service availability in your cloud or storage:
• SUSE Cloud
– Make the control nodes of your OpenStack (SUSE Cloud) deployment highly available
– Use OpenStack Heat to scale applications on demand for increased service availability
• SUSE Storage (beta): software-defined storage based on Ceph
– Scalable, reliable and highly available through redundancy
– Automated management and repair of underlying storage
Step 2: Maximize Service Availability (Cont.)
• SUSE Linux Enterprise Live Patching (new!), codenamed “kGraft”
– Live kernel patching without a reboot
– Apply urgent security patches before the next service window, reducing the need for planned downtime
• Unique advantages
– Integrates smoothly into existing package and patch management solutions, because patches are delivered as standard RPM packages
– Unlike some other technologies, it does not need to halt the Linux kernel, even briefly, while a patch is applied
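The last advantage comes from how a live patch switches running code over. As publicly described, kGraft avoids a global stop by migrating each task to the patched code individually, at a point where the task is known not to be inside the patched functions (for example, when it returns from the kernel to user space). The Python model below is a deliberately simplified illustration of that per-task switch, not kernel code; `Task`, `LivePatch`, `old_impl` and `new_impl` are hypothetical names.

```python
from dataclasses import dataclass

def old_impl(x: int) -> int:
    return x + 1  # hypothetical unpatched function

def new_impl(x: int) -> int:
    return x + 2  # hypothetical patched replacement

@dataclass
class Task:
    name: str
    migrated: bool = False  # has this task switched to the patched code?

@dataclass
class LivePatch:
    tasks: list[Task]
    complete: bool = False

    def call(self, task: Task, x: int) -> int:
        # Redirection point: each task consistently sees either the old or
        # the new function, never a mix, so no global stop is required.
        if self.complete or task.migrated:
            return new_impl(x)
        return old_impl(x)

    def safe_point(self, task: Task) -> None:
        # Called when a task leaves the kernel (e.g. returns to user space).
        task.migrated = True
        if all(t.migrated for t in self.tasks):
            self.complete = True  # every task now runs the patched code

a, b = Task("a"), Task("b")
patch = LivePatch([a, b])
patch.safe_point(a)
print(patch.call(a, 1), patch.call(b, 1), patch.complete)  # 3 2 False
patch.safe_point(b)
print(patch.call(b, 1), patch.complete)  # 3 True
```

While the transition is in progress, old and new code coexist, but each individual task always runs a consistent version.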
Step 3: Minimize Human Mistakes
• Full system rollback
– Built on copy-on-write Btrfs and the efficient Snapper tool
– Now capable of full-system rollback, including kernel files
– Minimizes the risk of operational mistakes
• YaST and AutoYaST
– The most efficient single-system management framework, with a consistent UI
• SUSE Manager
– Open source one-to-many system management
– Reduces errors through proactive and automated patching
– Full Service Pack upgrade automation, with no reinstallation of the server OS
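Full-system rollback relies on copy-on-write: a snapshot shares data with the live system and is cheap to take, because changed data is written to new blocks instead of overwriting old ones. The toy Python model below illustrates only that idea; it is not how Btrfs or Snapper are implemented, and `CowVolume` and its methods are hypothetical.

```python
class CowVolume:
    """Toy copy-on-write volume: paths point at immutable data blocks."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}  # block id -> data, never overwritten
        self.live: dict[str, int] = {}      # path -> block id for the running system
        self.snapshots: dict[str, dict[str, int]] = {}
        self._next_id = 0

    def write(self, path: str, data: bytes) -> None:
        # Copy-on-write: new data goes to a fresh block; old blocks stay
        # intact for any snapshot that still references them.
        self.blocks[self._next_id] = data
        self.live[path] = self._next_id
        self._next_id += 1

    def read(self, path: str) -> bytes:
        return self.blocks[self.live[path]]

    def snapshot(self, name: str) -> None:
        # A snapshot copies only the small path -> block map, not the data.
        self.snapshots[name] = dict(self.live)

    def rollback(self, name: str) -> None:
        # Restore the live map from the snapshot; no data blocks are copied.
        self.live = dict(self.snapshots[name])

vol = CowVolume()
vol.write("/boot/vmlinuz", b"kernel-1")
vol.write("/etc/app.conf", b"good config")
vol.snapshot("pre-update")  # e.g. taken before an update
vol.write("/boot/vmlinuz", b"kernel-2")
vol.write("/etc/app.conf", b"broken config")
vol.rollback("pre-update")
print(vol.read("/boot/vmlinuz"), vol.read("/etc/app.conf"))
```

Because the kernel file is part of the snapshot, rolling back restores it together with the configuration, mirroring the "including kernel files" point above.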
Three Steps For Customers To Take Towards Zero Downtime:
1. Prevent Hardware Downtime
2. Maximize Service Availability
3. Minimize Human Mistakes
Thank you.
For more information: www.suse.com/zerodowntime
+49 911 740 53 0 (Worldwide) www.suse.com
Corporate Headquarters Maxfeldstrasse 5 90409 Nuremberg Germany
Join us on: www.opensuse.org
Unpublished Work of SUSE. All Rights Reserved. This work is an unpublished work and contains confidential, proprietary, and trade secret information of SUSE. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability. General Disclaimer This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.
257-000020-001