Avoiding Storage Service Disruptions with Availability Intelligence Brent Phillips, Managing Director, Americas Brett Allison, Director of Technical Services www.intellimagic.com

Jun 01, 2020


Avoiding Storage Service Disruptions with Availability Intelligence

Brent Phillips, Managing Director, Americas
Brett Allison, Director of Technical Services

www.intellimagic.com


Today’s Agenda

1. “Availability Intelligence” for the storage infrastructure
‒ For EMC, IBM, HP, HDS block storage, and Cisco/Brocade Fabric

2. Modeling storage performance

3. Avoiding and resolving problems

4. Availability Intelligence as a Service


1. Availability Intelligence for Data Storage


We are inspired by creating intelligence that illuminates the risks hiding inside your IT infrastructure.

“Any sufficiently advanced technology is indistinguishable from magic”

Arthur C. Clarke, 1962


Storage Availability Today: Either Good or Bad

[Figure: chart of Brain Status against Stress & Focus; recoverable labels: Little, Full, Engaged, Panic, Scattered, Available]


The Missing Stage: About to Be Bad

[Figure: chart of Brain Status against Stress & Focus; recoverable labels: Little, Healthy, Engaged, Panic, Scattered, Available]


Seeing Threats to Continuous Availability

• Question: Which has better intelligence to avoid outages:
‒ A 20 thousand dollar automobile; or
‒ A SAN storage infrastructure costing millions of dollars?


Incidents Leading to Application Unavailability

[Pie chart: Predictable versus Unpredictable incidents]

Response for Unpredictable:

• Find the problem quicker

• Accelerate the problem fix

Response for Predictable:

• Avoid incident with proactive action


Increasing the Predictable Portion

[Pie chart: Predictable versus Unpredictable incidents]

What would be the impact on:
1. Your IT staff?
2. Your employees?
3. Your customers?

© IntelliMagic 2014

IT Infrastructure Availability Monitoring Today

[Figure: Response Time plotted over Time, with the SLA Performance level marked]

Callout: Your existing monitors look at symptoms here, only after users experience problems.

Easy metric to get, but is an effect, not a cause


Monitoring with Availability Intelligence

[Figure: Response Time and Sub-component Saturation plotted over Time, with the SLA Performance level marked]

Callout: Availability Intelligence identifies risk here, before response time suffers.

Requires evaluating every data point with expert domain knowledge about every component

Easy metric to get, but is an effect, not a cause
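Why saturation can serve as an earlier signal than response time can be illustrated with a textbook single-server queueing approximation (M/M/1), where response time is R = S / (1 - utilization). This is a simplification chosen for illustration, not the model behind the slides, and the 5 ms service time is a hypothetical value.

```python
def response_time_ms(service_time_ms: float, utilization: float) -> float:
    """M/M/1 approximation: R = S / (1 - rho).

    Assumption: a single queue in front of one component. Real storage
    arrays have many interacting queues, so treat this as a sketch only.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


SERVICE_TIME_MS = 5.0  # hypothetical service time of one component

# Response time stays nearly flat at low utilization and climbs steeply
# near saturation, so utilization moves well before users notice anything.
for rho in (0.30, 0.50, 0.70, 0.90):
    r = response_time_ms(SERVICE_TIME_MS, rho)
    print(f"utilization {rho:.0%}: response time {r:.1f} ms")
```

Under these assumptions, the jump from 70% to 90% utilization costs far more response time than the jump from 30% to 50%, which is why the saturation curve is the one worth watching.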


Changing the Outcome - Avoiding Disruptions

[Figure: Response Time and Sub-component Saturation plotted over Time, with the SLA Performance level marked]

Callout: Most infrastructure “fires” can be prevented by intervening here.


What is Availability Intelligence?

What: Foreknowledge about hidden threats to availability

Why: To better protect continuous availability at primary site by
1. Avoiding incidents (make more of them predictable)
2. Accelerating the resolution (reduce MTTR)

How: Use built-in expert domain knowledge in automatic analysis of the performance and configuration data


What Availability Intelligence Requires

• It is not enough to only have:
‒ Easier, nicer graphs, visualizations
‒ Statistical analysis (as common w/ ITOA - IT Operations Analytics)

• Rather, understanding what the data means for risk requires:
‒ HW component knowledge (as gained from performance modeling)
‒ Judging whether a value is good or bad, and rating the risk of unavailability
‒ How to derive new, meaningful metrics out of the raw data
‒ Best practices to configure, manage infrastructure
‒ How to visualize the risk and problems in the infrastructure


Illuminating Threats Inside the Storage Arrays

[Diagram labels]

Lag Measure: Storage Array Response Times

Lead Measures:
• Imbalance? (Within Array, Between Arrays)
• Application Workloads
• Config or Failure Changes?
• Disk Device Loads
• FW Bypass, etc.
• Back-end, Cache
• Adapter Utilization
• Fibre Switch Errors
• Front-end


7 Areas to Apply Expert Domain Knowledge

Benefits:
1. Avoid incidents
2. Accelerate fixes

Sample actions:
• Rebalance work
• Fix lost redundancy
• Isolate change
• Correct error
• Hardware upgrade

[Diagram labels: Machine-Generated Data; Expert Storage HW Knowledge; + Availability Intelligence; Automation & Visualization]


The Power of Knowing Constantly with Automation

• Assessing risk every interval, for every device, in every data center

• The ITIL v3 definition of Capacity Management is not achievable without automation:
‒ The Process responsible for ensuring that the Capacity of IT Services and the IT Infrastructure is able to deliver agreed Service Level Targets in a Cost Effective and timely manner… considers all Resources required to deliver the IT Service...


Data Center Rollups of KRIs - Key Risk Indicators

[Dashboard figure labels: Disk Storage Systems; Performance Metrics; Key Risk Indicators; Highest Rating for this Dashboard]

Consolidate individual ratings on infrastructure resources into data center views to see risk across enterprise at a glance
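A rollup of this kind can be sketched as taking the worst individual rating in each group. The snippet below is a minimal illustration, assuming four rating levels (none, good, early warning, exception) and a nested dictionary of hypothetical resource names; it is not the product's actual data model.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Assumed rating scale, ordered so that a higher value means more risk."""
    NONE = 0
    GOOD = 1
    WARNING = 2
    EXCEPTION = 3


def rollup(ratings):
    """Return the highest (worst) rating, or NONE when nothing is rated."""
    return max(ratings, default=Rating.NONE)


def datacenter_view(resource_ratings):
    """Collapse {datacenter: {resource: Rating}} into {datacenter: Rating}."""
    return {dc: rollup(res.values()) for dc, res in resource_ratings.items()}


if __name__ == "__main__":
    # Hypothetical data centers and resources, for illustration only.
    ratings = {
        "dc-east": {"array-01": Rating.GOOD, "array-02": Rating.EXCEPTION},
        "dc-west": {"array-03": Rating.GOOD, "switch-01": Rating.WARNING},
    }
    for dc, worst in datacenter_view(ratings).items():
        print(dc, worst.name)
```

Using the maximum means a single red resource turns the whole data center red, which matches the "highest rating" idea but hides how many resources are affected; a count per level would be a natural addition.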


Visualizing Risk to Continuous Availability

What does the data mean for your infrastructure availability?

Automatic rating of key metrics according to built-in expert knowledge, to obtain intelligence about threats you can use to protect availability

• No Border: No Rating
• Green Border: Good
• Yellow Border: Early Warning
• Red Border: Performance Exceptions


Rating the Risk using Expert Domain Knowledge

Based on straight thresholds where appropriate (like hardware limits)

Based on dynamic thresholds where the limits also depend on workload characteristics
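The difference between the two kinds of threshold can be sketched as follows. All numeric limits and the formula for the expected response time are invented placeholders for illustration; the deck does not publish the actual rules.

```python
def rate(value: float, warning: float, exception: float) -> str:
    """Map a metric value to a border colour using two thresholds."""
    if value >= exception:
        return "red"      # performance exception
    if value >= warning:
        return "yellow"   # early warning
    return "green"        # good


def static_rating(utilization_pct: float) -> str:
    """Straight threshold: a fixed limit, e.g. a hardware utilization ceiling.

    The 60/80 percent limits are hypothetical.
    """
    return rate(utilization_pct, warning=60.0, exception=80.0)


def dynamic_rating(response_ms: float, read_fraction: float, transfer_kb: float) -> str:
    """Dynamic threshold: the acceptable response time depends on the workload.

    Hypothetical rule: larger transfers and a higher write share are
    allowed more time before the rating degrades.
    """
    expected_ms = 2.0 + 0.05 * transfer_kb + 3.0 * (1.0 - read_fraction)
    return rate(response_ms, warning=1.5 * expected_ms, exception=3.0 * expected_ms)


if __name__ == "__main__":
    print(static_rating(85.0))
    # The same 10 ms response time, judged against two different workloads.
    print(dynamic_rating(10.0, read_fraction=1.0, transfer_kb=8.0))
    print(dynamic_rating(10.0, read_fraction=1.0, transfer_kb=256.0))
```

With these placeholder rules, a 10 ms response time is rated as an exception for small 8 KB reads but as good for 256 KB reads, which is the point of making the limit depend on workload characteristics.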


2. Modeling Storage Performance


IntelliMagic Direction

• Predictive model for exact storage configurations
• What we model:
‒ Separate hardware and workload
‒ Predict what happens on new storage system hardware
‒ Predict what happens when workload changes
• What can we do (may require services too):
‒ Model to other SAN box, other drive technology
‒ Model Cache size
‒ Model Automatic tiering, Help assess drive mix (based on volume data)
‒ Model Compression impact (IBM SVC)


Storage Performance Modeling Concepts

Essentially the goal is to solve the following equation:

[Equation graphic relating Configuration, Workload, and Performance]


Abstraction of the Configuration


Abstraction of the Workload


IntelliMagic Direction: Under the Covers


Model Merge and Migrate Options

• Using different configuration options
‒ For example, VMAX 40K, HDS VSP, DS8870 (16 core)


Predict scalability

• Project the growth different configurations can safely handle
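As a rough illustration of what projecting safe growth can mean, the sketch below estimates how much a workload could grow before a component reaches a utilization limit. It assumes utilization scales linearly with workload and uses a hypothetical 70% limit; a real performance model would account for cache, tiering, and workload mix.

```python
import math


def max_growth_factor(current_utilization: float, utilization_limit: float = 0.7) -> float:
    """Factor by which the workload can grow before hitting the limit.

    Assumption: utilization is proportional to workload.
    """
    if current_utilization <= 0.0:
        raise ValueError("current_utilization must be positive")
    return utilization_limit / current_utilization


def years_of_headroom(current_utilization: float, annual_growth: float,
                      utilization_limit: float = 0.7) -> float:
    """Years until the limit is reached at a compound annual growth rate."""
    if annual_growth <= 0.0:
        raise ValueError("annual_growth must be positive")
    factor = max_growth_factor(current_utilization, utilization_limit)
    if factor <= 1.0:
        return 0.0  # already at or over the limit
    return math.log(factor) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Hypothetical candidate configurations and their modeled utilization.
    for name, util in {"config A": 0.35, "config B": 0.55}.items():
        print(name, round(max_growth_factor(util), 2),
              round(years_of_headroom(util, annual_growth=0.25), 1))
```

Comparing the headroom of candidate configurations side by side is one simple way to turn a modeled utilization figure into a sizing decision.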


3. Avoiding and Resolving Problems


3.1 Case Study: “Run Away Query”


Front-end Dashboard – Warning High Front-end Read Response Time


Drill Down to the Multi-Charts


VOLUME-000119, VOLUME-000118, VOLUME-000063, VOLUME-000196 doing ~100 MB/sec


Who is Doing the Work and is it Necessary?


Storage Pool Front-end Dashboard After an Index was Added


Run-Away Volumes are Gone!


3.2 Case Study: “Fabric Contention”


How Do You Quickly Identify Strained Fabric Ports?


3.3 Case Study: “Auto-tier Confusion”


HP 3PAR – Adaptive Optimization

Common Provisioning Groups (CPGs): Groupings of similar LDs for provisioning

• Tier 0 CPG: CPG_DB_SSD_R5_3plus1_AO
• Tier 1 CPG: CPG_DB_450gb_R5_3plus1_AO
• Tier 2 CPG: CPG_900gb_R6_6plus2_AO

Performance: Biases moving data to the fastest tier
Balanced: Balances between performance and cost


Distribution of IOPS Across AO/CPGs


Auto-tiering – How well balanced is it?


HP 3PAR Case Study Summary

Finding:
1. Too much workload on 450s
2. Not enough workload on 900s or SSDs

Recommendations:
1. Enable the Tier 1 Warning Limit and set it to a lower amount of space than is currently in use for the Tier 1 CPG. This should force capacity from the 450 GB 10K RPM drives to the 900 GB 10K RPM drives.
2. Set Cost mode for BASE_ESX_AO
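One way to put numbers behind an imbalance like this is to compare each tier's measured IOPS with what its drives could sustain. The sketch below uses entirely hypothetical drive counts, IOPS figures, and per-drive capabilities; none of them come from the case study.

```python
# Hypothetical tier inventory: drive count, measured IOPS, and an assumed
# sustainable IOPS per drive. All values are illustrative only.
TIERS = {
    "SSD":       {"drives": 16, "iops": 8000, "max_iops_per_drive": 5000},
    "450GB 10K": {"drives": 64, "iops": 9600, "max_iops_per_drive": 150},
    "900GB 10K": {"drives": 64, "iops": 1280, "max_iops_per_drive": 150},
}


def tier_load(tiers):
    """Return measured IOPS as a fraction of each tier's assumed capability."""
    return {
        name: t["iops"] / (t["drives"] * t["max_iops_per_drive"])
        for name, t in tiers.items()
    }


def busiest_tier(tiers):
    """Name of the tier running closest to its assumed limit."""
    loads = tier_load(tiers)
    return max(loads, key=loads.get)


if __name__ == "__main__":
    for name, load in tier_load(TIERS).items():
        print(f"{name}: {load:.0%} of assumed capability")
    print("busiest:", busiest_tier(TIERS))
```

With these made-up inputs the middle tier runs at its assumed limit while the other two sit mostly idle, the same shape of imbalance the findings describe.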


4. Availability Intelligence as a Service


Availability Intelligence as a Service

• Incorporates frequently updated hardware knowledge
• Very quick time to results (~24 hours)
• Okay for security - no PII in infrastructure measurement data
• Easy dissemination of intelligence visualizations
• Easy access to expert consultants


IntelliMagic

• Creating the world’s best intelligence about performance and availability risk in your infrastructure

• 20+ year history of delivering solutions for deep infrastructure analysis

• Privately held, financially independent

• Customer centric, responsive

• Solutions used daily in some of the world’s largest data centers


Example Customer – Schaeffler

76,000 Employees; 180 Locations; 50 Countries


Conclusion

Outsmart Unavailability with the world’s best intelligence about the current levels of risk hiding in your infrastructure.

A new layer of protection to better maintain continuous availability.

Easily accessible via SaaS.

For questions/more details, contact: [email protected]

“Any sufficiently advanced technology is indistinguishable from magic”

Arthur C. Clarke, 1962


Join us in La Jolla, for the 2016 CMG Conference!

November 7th to 10th 2016 at Hyatt Regency in La Jolla, CA


IntelliMagic Vision as a Service Architecture