Optimizing IT Costs using Virtualization, Green and Cloud Computing
David Royer, SNIA Brasil, Chairman
Rio Info 2009, Rio de Janeiro, Brazil
Nov 01, 2014

Page 1: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Optimizing IT Costs using Virtualization, Green and Cloud Computing

David Royer

SNIA Brasil, Chairman

Rio Info 2009

Rio de Janeiro, Brazil

Page 2: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

SNIA At A Glance

Voice of the storage industry representing approximately $50-60B in worldwide revenue for hardware and software

Founded in 1997 as a non-profit trade association

Worldwide headquarters in San Francisco USA

Global presence in A/NZ, Canada, China, EMEA, India, Japan and South-Asia

Technology Center activities in Colorado, Beijing, Tokyo, and Bangalore

Focus on education, conferences, specifications / standards, software, industry alliances, best practices, plugfests, and conformance testing for SNIA specifications

Co-owner of Storage Networking World (SNW) conference with Computerworld/IDG Enterprise

a collaborative environment and serve as global contributors toward the advancement of standards, education, and innovation in the storage and information management industry

Page 3: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Storage Outlook and Growth

Page 4: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

YoY Growth by Segment

[Chart: Worldwide Disk Storage Systems and Branded Tape Storage Segment Factory Revenue Growth. Year-over-year growth by quarter, 2008 Q1 to 2008 Q4; vertical axis from -70% to 30%. Segments: Tape - Entry Level, Tape - Midrange, Tape - High End, Int Disk - Entry, Int Disk - Midrange, Ext Disk - Entry, Ext Disk - Midrange, Ext Disk - High End.]

• Entry-level and midrange external DSS are the only segments showing flat or positive YoY growth in 4Q 2008. This can be attributed to customers deferring purchases of larger, more expensive storage systems in favor of lower-cost, more modular systems, and to the emergence of technologies, such as iSCSI, that offer enterprise-level features at a lower price point than traditional FC SAN systems.

Source IDC Doc # 218274

Page 5: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Storage Hardware 2009 Outlook

Tape will continue to decline as disk-based archival and backup technologies emerge

Internal storage is closely tied to the server market, which is expected to be weaker in the coming quarters than the external disk market

The external disk storage systems market will feel further impact from the economic crisis. Weakness is seen in higher-end systems, specifically mainframes and FC SAN.

Healthier segments include:
– iSCSI SAN: specifically in the upper entry-level and midrange market
– Verticals such as Healthcare, Video Surveillance, and Government
– Midrange product offerings: customers are fulfilling their enterprise storage needs with midrange products
– Enterprise VTL: will augment midrange and enterprise tape drives, especially in tape libraries and automation

Source IDC Doc # 218274

Page 6: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Storage Software Growth – Average 7%

Data Protection, growth rate through 2013, 6.2%

Archiving Software, growth rate through 2013, 10.4%

Storage Device Management Software, growth rate through 2013, 2.8%

Storage Management Software, growth rate through 2013, 5.6%

Storage Infrastructure, growth rate through 2013, 5.9%

Storage Replication, growth rate through 2013, 7.6%

File System, growth rate through 2013, 7.1%

Source IDC Doc # 217529

Page 7: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

E-Discovery Growth

Combination of software and hardware

Software: storage infrastructure, e-discovery, collaboration, ECM, data management, and security

Hardware:
– Storage: spending growth was underpinned by data volume and requirements to store, manage, index, archive, and preserve data
– Servers

Source IDC Doc # 218259

Page 8: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Focus on a Few

Industry Storage Trends

Green IT

Cloud Computing

Virtualization

Page 9: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Abstract

Best Practices in Managing Virtualized Environments

Today, data center environments are increasingly complex with virtualization at all layers of the IT stack, including network, server, SAN and storage. IT professionals are often challenged in diagnosing application performance issues, optimizing infrastructure resource utilization, and planning for future changes. The best practices for managing complex data center environments include cross domain management orientation, watching the infrastructure response time for cross-domain performance, looking for application contention and contention-based latency in the storage layer, best fit analysis of workloads to storage resources, and working toward infrastructure performance SLAs. Key requirements for this new breed of management software include agent-less discovery and SMI-S support.

Page 10: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Virtualization is Everywhere

[Diagram labels: Client Network, Server Virtualization, App Servers, Web Servers, Security, Storage Network, SAN, Array Virtualization]

Tremendous Benefits
– Pooling of resources
– Rapidly deploy new applications
– Increase resource utilization
– Over-subscribe resources
– Lower acquisition cost and TCO

Traditional system management practices may no longer work

Page 11: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

What’s “Real” about Virtualization?

Like the Emperor’s new (virtualized) clothes:
– A logical interface presenting a normalized “resource” that isn’t “all there”
– Built over physical and other virtual layers that do not look at all like the presented logical resource

We will discuss two major IT virtualization initiatives
– Storage Virtualization
– Server Virtualization
(and the combination of the two!)

Check out SNIA Tutorial: Virtualization 1 - What, Why, Where, and How

Page 12: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Virtualization Pools Resources

[Diagram: Physical Infrastructure Model compared with Virtual Infrastructure Model. Labels: Client Network, Server Pool, Storage Network, SAN, Storage Pool (Tier 1, Tier 2, Archive).]

Page 13: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Managing Virtualized Environments

Managing through Virtualization is Challenging
– Diagnosing Performance Problems
– Optimizing Resource Utilization
– Planning for Future Changes

Virtualization Feature | “New” Admin Challenge
Clients Reserve and Share Resource Capacity | Resource Performance still Degrades Non-linearly with Load
Dynamic Infrastructure | Finding Transitional Bottlenecks
Increased Resource Utilization | Optimal Resource Deployment
Easy to provision new VMs | Predicting if the next VM fits

Page 14: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

The Bottom Line…

Applications share resources

Poor performance is caused by:
– Hard-to-find I/O bottlenecks and resource contention
– Misalignment between layers of virtualization
– Under-provisioning shared resources

Over-provisioning of shared resources as insurance negates ROI

Inhibitors to success:
– Virtualized data center complexity
– Lack of cross-domain management
– Lack of cross-domain communication

Page 15: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Best Practices in Managing Virtualized Environments

Solving Old Problems in a New Environment

Recommended Best Practices -

1. Cross Domain Analysis and Shared Resource Contention

2. Adopt an Application View of Performance

3. Use Automation Wisely

4. “Effective Capacity” Management

5. Model-based Optimization and Planning

Page 16: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

1. Cross Domain Analysis

Virtualization Management is “Cross-Domain”

Create a Cross-Domain Baseline (discover and collect)
– Mapping from multiple layers (app, server, storage, physical & virtual)
– Aim for agent-less and “on-line”
– Standards like SMI-S are essential for heterogeneous environments

Check Configuration First
– Don’t optimize or “plan a baseline” from a poorly configured system
– Checklist vendor configuration best practices
– Newer technologies (Thin-wide arrays, 10 GbE networks, SSDs) move performance bottlenecks elsewhere

Check out SNIA Tutorial: Solving Business-Oriented Goals with SMI-S

Page 17: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

I/O Paths Through Virtualization

[Diagram: I/O paths through Applications and Servers, Virtual Server Hosts, Virtual Storage, and Storage Arrays]

Page 18: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Find Shared Resource Contention

Stepping Through a Virtual Looking Glass

Need to Map through Virtualization Layers
– Map relationships at every level
– Exponential problem of server virtualization over storage virtualization
– Sum up the loads from every client that shares each resource

Quantify Application Contention due to Sharing
– Calculate performance impact back to each application

Root cause is mostly figuring out What’s Changed when Capacity runs out
– If Load changed, was it aberrant behavior or growth?
– If Configuration changed, does it violate policy or show thrashing?
– If Contention arose, who is new to the pool?
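
The "sum up the loads from every client that shares each resource" step can be sketched in a few lines of Python. This is only an illustration: the client names, resource names, and IOPS figures are hypothetical, and a real tool would discover the mapping (for example through SMI-S) rather than hard-code it.

```python
from collections import defaultdict

# Hypothetical map: each client (application or VM) -> the shared resources
# on its I/O path, with the load (IOPS) it places on each of them.
client_loads = {
    "app_a": {"host_1": 400, "virtual_volume_1": 400, "array_1": 400},
    "app_b": {"host_1": 250, "virtual_volume_2": 250, "array_1": 250},
    "app_c": {"host_2": 300, "virtual_volume_2": 300, "array_1": 300},
}

def total_load_per_resource(loads):
    """Sum the load from every client that shares each resource."""
    totals = defaultdict(int)
    for per_resource in loads.values():
        for resource, iops in per_resource.items():
            totals[resource] += iops
    return dict(totals)

def sharers(loads, resource):
    """List the clients contending for one resource, heaviest first."""
    users = [(client, r[resource]) for client, r in loads.items() if resource in r]
    return sorted(users, key=lambda pair: pair[1], reverse=True)

totals = total_load_per_resource(client_loads)
print(totals["array_1"])                          # combined load on the array
print(sharers(client_loads, "virtual_volume_2"))  # who shares this volume
```

Comparing the list returned by `sharers` before and after a change is one simple way to answer "who is new to the pool?".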

Page 19: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer


Application Contention

Cross Domain visibility is naturally “foggy”

Domain specific management has limited view

Virtualization makes it harder

Management requires end-to-end picture

Page 20: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Cross-Domain: Navigating the Virtualized Environment

Long data path from application to array…
– Need a map through all the indirection
– A common map helps different domain admins communicate
– Sharing can be dynamic – maps must be too

Page 21: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

2. Adopt Application View of Performance

The Customer is Always Right

Application Infrastructure Performance
– How long does it take an I/O to complete from the application point of view (Response Time)?
– Some applications ($$$) are more loved than others

Manage to this “Service” Performance
– Element utilizations are interesting, but service performance is the goal

Look for Abnormal “Service” Behavior
– Not just default rule-of-thumb thresholds on utilizations

Page 22: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Service Layer Metrics

Customer Resource Throughput @ Response Time

[Chart: Response Time (sec, 0 to 40) versus Throughput (transactions / sec, 0 to 1400), with the Service Level Agreement, Optimal Throughput, and Maximum Throughput marked]

Page 23: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Look for Abnormal Behavior

Check for Abnormal Behavior
– Calculate baseline: a statistical analysis of variance of performance over time
– Compare data to baseline

Shared Resources tend to average out peaks that will show in dedicated resources
– Helps Justify Virtualization

[Chart label: Acceptable Variance]
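
A minimal sketch of the baseline-and-compare idea, assuming response-time samples in milliseconds and a simple mean plus or minus k standard deviations band as the "acceptable variance". The threshold k = 3 and the sample values are illustrative assumptions, not figures from the presentation.

```python
from statistics import mean, stdev

def baseline(history):
    """Return (mean, sample standard deviation) of historical measurements."""
    return mean(history), stdev(history)

def abnormal(samples, history, k=3.0):
    """Return the samples that fall outside mean +/- k * stdev of the baseline."""
    mu, sigma = baseline(history)
    return [x for x in samples if abs(x - mu) > k * sigma]

# Hypothetical response times (ms) collected during normal operation.
history_ms = [10, 11, 9, 10, 12, 10, 9, 11]
# New measurements to compare against the baseline.
new_ms = [10, 11, 25, 9]

print(abnormal(new_ms, history_ms))  # samples outside the acceptable variance
```

Measuring against the service's own history, rather than a fixed utilization threshold, is what distinguishes this from the rule-of-thumb approach the slides caution against.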

Page 24: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

4. “Effective Capacity” Management

Capacity Management Isn’t Just “Enough GBs”
– Storage has both space and time constraints (server folk have it easy!)

Manage to the total “Effective Capacity”
– Maximum utilization that gives good performance
– Not to total actual utilization (aka “saturation”)

Build in Automation for Scalability
– Virtualized environments tend to sprawl
– And they can change dynamically

Check out SNIA Tutorial: Storage Virtualization II – Effective Use of Virtualization

Page 25: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Effective Capacity = Optimal Usage

[Chart: Response Time (sec, 0 to 40) versus Throughput (transactions / sec, 0 to 1400), with the Service Level Agreement, Optimal Throughput, and Maximum Throughput marked]
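
One way to put a number on "effective capacity" is to ask for the highest throughput at which response time still meets the SLA. The sketch below assumes the resource behaves like a single M/M/1 queue, where response time R = S / (1 - U) for service time S and utilization U. That model is an assumption made for illustration, not something the slides specify, and the function name and figures are hypothetical.

```python
def effective_capacity(service_time_s, sla_response_s):
    """Max throughput (ops/sec) at which an M/M/1 resource still meets the SLA.

    From R = S / (1 - U) <= R_sla it follows that U <= 1 - S / R_sla,
    and throughput = U / S.
    """
    if sla_response_s <= service_time_s:
        return 0.0  # the SLA cannot be met even on an idle resource
    max_utilization = 1.0 - service_time_s / sla_response_s
    return max_utilization / service_time_s

# Hypothetical resource: 1 ms per operation, so it saturates at 1000 ops/sec.
saturation = 1.0 / 0.001
usable = effective_capacity(service_time_s=0.001, sla_response_s=0.005)
print(saturation, usable)
```

Under these assumed numbers, the usable throughput sits well below the saturation point, which is the gap between "Optimal Throughput" and "Maximum Throughput" on the chart.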

Page 26: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

3. Use Automation Wisely

Build in Automation for Scalability
– Virtualized environments tend to sprawl
– And they can change dynamically

Almost everything can be automated
– Event Monitoring
– Performance collection and reporting
– Analysis of Performance and Configuration: correlation of events with performance, first and second order analysis
– Provisioning, Reconfiguration and Migration

Don’t forget to leave an audit trail

Feedback loop
– What were the effects of the change?

Check out SNIA Tutorial: Storage Virtualization II – Effective Use of Virtualization

Page 27: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

5. Model-based Optimization and Planning

Moving Towards a Real-Time Datacenter

Constantly Increase Operational Efficiency
– Most working infrastructure is sub-optimized
– Dedicated resources
– “If it ain’t broke, don’t fix it” attitudes (or capabilities)
– However, when everything is shared, everyone goes down together…

Real-er Time Capacity Planning
– Utilizations are related to Response Time through Queuing Theory
– Need to predict performance degradation under future application load changes
– Need to predict performance improvements from possible architectural/technology changes
– Planning and tuning will go from large cyclical events to smaller, more dynamic perturbations

Page 28: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Queuing Theory to The Rescue…

Queuing Models create Response Time curves
– Based on established mathematics (Buzen, et al.; see www.cmg.org)
– Useful analytically (historically) as well as predictively
– For a simple example think of a check-out line at the grocery store

Complex Queuing Network Models can represent nested and virtualized IT domains
– Advanced cross-domain solutions model IT virtualization
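
The check-out line example corresponds to the simplest queuing model, M/M/1: one server, random arrivals, random service times. A minimal sketch of the response time curve it produces follows. The 1 ms service time and the arrival rates are made-up values, and real virtualized stacks need the queuing network models mentioned above rather than a single queue.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue: R = S / (1 - U), with U = rate * S."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("resource is saturated; the queue grows without bound")
    return service_time / (1.0 - utilization)

# Hypothetical resource with a 1 ms service time, at increasing load (ops/sec).
service_time = 0.001
for rate in (100, 500, 800, 900, 990):
    r_ms = 1000.0 * mm1_response_time(rate, service_time)
    print(f"{rate:4d} ops/sec -> {r_ms:6.2f} ms")
```

The curve is nearly flat at low load and climbs steeply near saturation, which is why response time degrades non-linearly with load.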

Page 29: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Best Practices in Managing Virtualized Environments

In Summary

1. Cross Domain Analysis and Shared Resource Contention
Virtualization is about sharing across IT domains, and that’s often the problem

2. Adopt an Application View of Performance
Manage to customer service levels

3. Use Automation Wisely
Doing more with less time and fewer errors

4. “Effective Capacity” Management
Shared resources still obey the laws of physics

5. Model-based Optimization and Planning
Leverage Prediction to Improve your Future

Page 30: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Green IT and

Storage, Energy and the Industry

Storage is a notable contributor to Data Center energy consumption

Data storage is projected to increase 6-fold between 2007 and 2011 (1)

“Building the Green Data Center”

© 2008 SNIA All Rights Reserved

Industry Concerns today
– Fear of ‘Green Washing’: lack of industry-wide comparison tools
– Inappropriate comparisons of technologies: Apples to Oranges
– New technologies being introduced: how will they affect energy usage?
– Benefit of product features vs. bigger picture of data management

(1) IDC White Paper, “The Diverse and Exploding Digital Universe,” March 2008.

Page 31: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Energy Cost of Data Storage

[Chart: Installed # of Petabytes (Capacity in PBs, axis 0 to 50,000) and Cost to Power and Cool ($M, axis 0 to 3,000), by year from 1999 to 2011. CAGR labels on the chart: 57% and 19% (2006-2011).]

IDC #212714, “The Real Costs to Power and Cool All the World's External Storage” – June 2008 Dave Reinsel

Chart used by permission of IDC
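
To see what the two compound annual growth rates labelled on the chart (57% and 19% over 2006-2011) imply over five years, the standard CAGR formula can be applied directly. The helper names below are hypothetical, and the starting value of 1.0 is a normalized placeholder rather than IDC data.

```python
def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1.0 + rate) ** years

def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Growth multiples over the five years from 2006 to 2011, normalized to 1.0.
for rate in (0.57, 0.19):
    print(f"{rate:.0%} CAGR -> x{project(1.0, rate, 5):.2f} after 5 years")
```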

Page 32: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

What Impacts Energy

Consumption for Data Storage

Storage capacity / usage efficiency
– increasing data → larger capacity → more disks
– redundant copies magnify capacity needs
– variability in usage and utilization → inefficient allocation of space
– What is valuable data? What is the retention policy?

Data transfer rate / access speed
– high I/O bandwidth → higher rotational speed; striping across many drives
– low access times → faster actuators; higher rotational speeds; caches
– How fast and immediate must data be available? (time-to-data)

Data integrity
– 25% of “digital universe” is unique, but 75% are replicas / duplicates
– partly to ensure data integrity and survivability; partly wasteful

Data availability / system reliability
– RAID uses extra drives, plus redundant power supplies, fans, controllers,
– How valuable is data? How likely are failures? How fast must data be available?

Page 33: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Potential Paths to “Green” Storage

Improve usage efficiency
– De-duplication
– Thin provisioning

Minimize energy consumption
– Improved component designs: high-efficiency power supplies, advanced & flexible drives
– Variants of MAID: idle and spin-down

New technologies
– Solid state storage
– Alternative + hybrid system designs (opportunity to rethink)

Must be driven by metrics / standards / guidelines

Page 34: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Anatomy of a Storage System

[Diagram labels: Disk Arrays, Appliances, Switches, Apps Software, UPSs (Uninterruptible Power Supply), PDUs (Power Distribution Unit), Power Supplies, Fans, Hard drives, Controllers]

System design, complexity and redundancy vary depending on applications & usage

Component designs, software features, and workload affect power consumption and efficiency

Page 35: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Storage – Power Supply Efficiency

[Diagram labels: Fans, Hard drives, Controllers, Power Supplies]

1 - Redundant power supplies are standard, except in the smallest systems

2 - Significant mechanical components require dual-output power supplies (12V, 5V)

3 - Power supplies often custom-designed for reliability

(for servers)*

*presented by EPA at ENERGY STAR Computer Server Stakeholder Meetings; July 2008

Page 36: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Idle Power versus Active Power

Idle Mode for a Storage Array

storage system is protecting data, ready to process IOs

background maintenance & optimization tasks on-going

factors: time-to-data, overhead electronics, fan, maintenance

systems are idle large fractions of the time

Active Mode for a Storage Array

storage system is carrying out IOs

background tasks continue in parallel

factors: workload (seq/random), response time, throughput

evaluate a variety of workloads, plus sustained peak power

Page 37: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

HDD Capacity versus High Performance

Capacity
– focused on GB/watt at rest
– 1 TB SATA: 15W
– 4 x 250 GB FC: 64W
– also tend to have better $/GB

NOTE: power use is quadratic with respect to rotational speed
Use the slowest drives that will fit your needs

Performance
– focused on seek time
– 1 TB SATA: 12 – 15 ms
– 300 GB FC: 3 – 4 ms
– also designed for higher RAS * environments

* RAS = Reliability, Availability, Serviceability
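
The GB/watt comparison can be worked out from the two figures on the slide. The sketch assumes that 64W is the combined draw of all four 250 GB FC drives and that 1 TB is counted as 1000 GB; both are readings of the slide rather than stated facts.

```python
# Figures from the slide. Treating 64W as the total for the four FC drives
# is an assumption about how the slide should be read.
drive_sets = {
    "1 x 1 TB SATA": {"capacity_gb": 1000, "watts": 15.0},
    "4 x 250 GB FC": {"capacity_gb": 4 * 250, "watts": 64.0},
}

def gb_per_watt(capacity_gb, watts):
    """Capacity efficiency at rest: gigabytes stored per watt drawn."""
    return capacity_gb / watts

efficiency = {name: gb_per_watt(**spec) for name, spec in drive_sets.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.1f} GB/W")
```

For the same terabyte of capacity, the single SATA drive draws about a quarter of the power, which is the trade-off against the FC drives' shorter seek times.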

Page 38: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

SSD vs HDD

Power Value - Significant Power Savings

           | 15k RPM HDD | Enterprise SSD
Idle Power | 6.8W        | 0.5W
Load Power | 10.1W       | 0.9W

Temperature readings: 85°F (idle), 94°F (load)

~38% Less Heat, ~90% Less Power

SSDs reduce energy cost to operate and cool the data center
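
The "~90% Less Power" claim can be checked against the wattage figures on the slide. The sketch assumes that 6.8W and 10.1W belong to the 15k RPM HDD and that 0.5W and 0.9W belong to the enterprise SSD, which is the only pairing consistent with the stated savings.

```python
def percent_less(baseline_watts, new_watts):
    """Percentage reduction in power relative to the baseline device."""
    return 100.0 * (baseline_watts - new_watts) / baseline_watts

# Assumed pairing of the slide's figures: (HDD watts, SSD watts).
readings = {"idle": (6.8, 0.5), "load": (10.1, 0.9)}

for state, (hdd_w, ssd_w) in readings.items():
    print(f"{state}: SSD draws {percent_less(hdd_w, ssd_w):.1f}% less power")
```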

Page 39: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Storage Taxonomy for Energy Measurement

Need a taxonomy (product classification) to enable fair comparisons among similar storage products
– e.g. for motor vehicles: motorcycles, cars, trucks

Similar green metrics may apply to all product categories, but different values establish best-in-class

Unique considerations apply to special categories
– e.g. amphibious cars, skid steer loaders, tanks

Clear taxonomy will simplify comparisons and aid regulatory efforts

Page 40: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

SNIA Measurement Standard - Draft

Storage taxonomy

Measurement conditions

Idle metric

Active metric(s)

Reporting results

Page 41: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

1) Storage Taxonomy (1 of 2)

Storage Taxonomy Summary

Online Storage: Prime storage, able to serve random as well as sequential workloads with minimal delay

Near Online Storage: Intended as second tier storage behind Online Storage. Able to service random and sequential workloads, but perhaps with noticeable delay in time to 1st data access.

Maximum Capacity Guidance Note: Maximum Capacity Guidance reflects the maximum capacity a given offering can be purchased with and/or field upgraded to. It is intended to be used as a guideline as opposed to an absolute value. There will be cases where a device may have greater or smaller capabilities, but otherwise is an appropriate match for a given classification due to other criteria, e.g. redundancy capabilities.

Max Storage Devices by group (Online Storage | Near Online Storage)

Group 1) SoHo & Consumer
Max Storage Devices: Up to 4 Devices
Storage which is designed primarily for home (consumer) or home / small office usage.
– Often Direct Connected (USB, IP, etc.)
– No option for redundancy (will contain SPOFs)

Group 2) Entry, DAS, or JBOD
Max Storage Devices: More than 4 Devices | Up to 4 Devices
Storage which is dedicated to one or at most a very limited number of servers. Often will not include any integrated controller, but rely on server host for that functionality.
– Often Direct Connected (SATA, IP, etc.)
– May optionally offer limited number of redundancy features

Group 3) Entry / Midrange
Max Storage Devices: More than 20 Devices | More than 4 Devices
SAN or NAS connected storage which places a higher emphasis on value than scalability and performance. This is often referred to as ‘Entry Level’ storage.
– Network connected (IP, SAN, etc.)
– Has options for redundancy features

Group 4) Midrange / Enterprise
Max Storage Devices: More than 100 Devices | More than 100 Devices
SAN or NAS connected storage which delivers a balance of performance and features. Offers higher level of management as well as scalability and reliability capabilities.
– Network connected (IP, SAN, etc.)
– Has options for and often delivered with full redundancy (no SPOF)

Group 5) Enterprise / Mainframe
Max Storage Devices: More than 1000 Devices
Storage which exhibits large scalability and extreme robustness associated with Mainframe deployments, though is not restricted to Mainframe-only deployments.
– Mainframe connectivity with optional network connection (IP, SAN, etc.)
– Always delivered with full redundancy (no SPOF)
– Often capable of non-disruptive serviceability

See: Green Storage Power Measurement Specification for complete details

Page 42: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

1) Storage Taxonomy (Continued: 2 of 2)

Storage Taxonomy Summary (Continued)

Removable Media Libraries: Archival storage used in a sequential access mode. A typical example would be tape-based archival, both stand-alone and robotically assisted libraries.

Virtual Media Libraries: Storage which simulates Removable Media Libraries. Will typically use non-tape-based storage and as such are able to respond to data requests more quickly.

Infrastructure Appliances: Devices placed in the storage SAN or network adding value through one or more dedicated storage enhancements. Examples include: SAN Virtualization, Compression, De-duplication, etc.

Infrastructure Interconnect: Devices which enable a SAN or other Storage Network data switching or routing.

Maximum Capacity Guidance Note: Maximum Capacity Guidance reflects the maximum capacity a given offering can be purchased with and/or field upgraded to. It is intended to be used as a guideline as opposed to an absolute value. There will be cases where a device may have greater or smaller capabilities, but otherwise is an appropriate match for a given classification due to other criteria, e.g. redundancy capabilities.

Note: * Infrastructure Appliances by definition have no intrinsic storage, other than what is used for local processing and/or local caching of data. Storage Devices Supported in this case refers to the number of storage devices controllable downstream of the Appliance.

Guidance by group (Max Tape Drives | Max Storage Devices | Storage Devices Supported* | Max Port Count)

Group 1) SoHo & Consumer
Max Tape Drives: Stand Alone Drive (No Robotics)
Storage which is designed primarily for home (consumer) or home / small office usage.
– Often Direct Connected (USB, IP, etc.)
– No option for redundancy (will contain SPOFs)

Group 2) Entry, DAS, or JBOD
Max Tape Drives: Up to 4 Drives
Max Port Count: Up to 32
Storage which is dedicated to one or at most a very limited number of servers. Often will not include any integrated controller, but rely on server host for that functionality.
– Often Direct Connected (SATA, IP, etc.)
– May optionally offer limited number of redundancy features

Group 3) Entry / Midrange
Max Tape Drives: More than 4 Drives
Max Storage Devices: Up to 100 Devices
Storage Devices Supported*: Support for up to 20 Devices
Max Port Count: Up to 128
SAN or NAS connected storage which places a higher emphasis on value than scalability and performance. This is often referred to as ‘Entry Level’ storage.
– Network connected (IP, SAN, etc.)
– Has options for redundancy features

Group 4) Midrange / Enterprise
Max Tape Drives: More than 24 Drives
Max Storage Devices: More than 100 Devices
Storage Devices Supported*: Support for more than 20 Devices
Max Port Count: More than 128
SAN or NAS connected storage which delivers a balance of performance and features. Offers higher level of management as well as scalability and reliability capabilities.
– Network connected (IP, SAN, etc.)
– Has options for and often delivered with full redundancy (no SPOF)

Group 5) Enterprise / Mainframe
Max Tape Drives: More than 11 Drives
Max Storage Devices: More than 100 Devices
Storage Devices Supported*: Support for more than 100 Devices
Storage which exhibits large scalability and extreme robustness associated with Mainframe deployments, though is not restricted to Mainframe-only deployments.
– Mainframe connectivity with optional network connection (IP, SAN, etc.)
– Always delivered with full redundancy (no SPOF)
– Often capable of non-disruptive serviceability

© SNIA 2009

Storage Taxonomy Summary(Continued)

Removable Media

Libraries

Virtual Media

Libraries

Infrastructure

Appliances

Infrastructure

Interconnect

Archival storage used in a

sequential access mode. A

Typical example would be Tape

based archival, both Stand Along

and Robotically assisted libraries.

Storage which simulates

removable Media Libraries.

Will typically use non tape

based storage and as such are

able to respond to data

requests more quickly

Devices placed in the storage SAN

or network adding value through

one or more dedicated Storage

enhancements. Examples include:

SAN Virtualization, Compression,

De-duplication, etc.

Devices which enable a SAN or

other Storage Network data

switching or routing.

Maximum Capacity Guidance Note:

Maximum Capacity Guidance reflects the maximum capacity a given offering can be purchased with

and/or field upgraded to. It is intended to be used as a guideline as apposed to an absolute value. There

will be case where a device may have greater or small capabilities, but otherwise is an appropriate match

for a given classification due to other criteria, e.g.: redundancy capabilities

Max Tape DrivesMax Storage Devices

Supported*Max Port Count

Group 1) SoHo & Consumer

Stand Alone Drive (No Robotics)

Note: * Infrastructure Appliances by definition have no intrinsic storage, other than what is used for local processing and/or local caching of data. Storage Devices Supported in this case refers to the number of storage devices controllable downstream of the Appliance.

Storage which is designed primarily for home (consumer) or home / small office usage.

–Often Direct Connected (USB, IP, etc.)

–No option for redundancy (will contain SPOFs)

Group 2) Entry, DAS, or JBOD

Up to 4 Drives

Up to 32

Storage which is dedicated to one or at most a very limited number of servers. Often will not include any integrated controller, but will rely on the server host for that functionality.

–Often Direct Connected (SATA, IP, etc.)

–May optionally offer a limited number of redundancy features

Group 3) Entry / Midrange

More than 4 Drives

Up to 100 Devices

Support for up to 20 Devices

Up to 128

SAN or NAS connected storage which places a higher emphasis on value than scalability and performance. This is often referred to as “Entry Level” storage.

–Network connected (IP, SAN, etc.)

–Has options for redundancy features

Group 4) Midrange / Enterprise

More than 24 Drives

More than 100 Devices

Support for more than 20 Devices

More than 128

SAN or NAS connected storage which delivers a balance of performance and features. Offers higher level of management as well as scalability and reliability capabilities.

–Network connected (IP, SAN, etc.)

–Has options for and often delivered with full redundancy (no SPOF)

Group 5) Enterprise / Mainframe

More than 11 Drives

More than 100 Devices

Support for more than 100 Devices

Storage which exhibits large scalability and extreme robustness associated with Mainframe deployments, though it is not restricted to Mainframe-only deployments.

–Mainframe connectivity with optional network connection (IP, SAN, etc.)

–Always delivered with full redundancy (no SPOF)

–Often capable of non-disruptive serviceability

See: Green Storage Power Measurement Specification for complete details

Page 43: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Desired Storage Metric – “Productivity”

• “typical workload”, with levels

• “four corners”, maximum performance, maximum power

[Figure: the four corners: random read, random write, sequential write, sequential read]

Standard Performance Evaluation Corporation

• The Green Grid Productivity Proxy Proposals; example – Proxy #4 – bits/kilowatt-hour

• detailed performance benchmark – results/W

Many possible definitions – must balance simplicity against applicability
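The two candidate metrics named on this slide, bits/kilowatt-hour and results/W, can be sketched as simple ratios of useful work to energy or power. The function names, the measurement layout, and any sample figures are assumptions made for illustration; they do not come from The Green Grid, SPEC, or SNIA.

```python
# Hypothetical helpers; names and inputs are illustrative, not from any specification.

def bits_per_kwh(bytes_transferred: float, avg_power_watts: float,
                 duration_seconds: float) -> float:
    """Useful work per unit of energy: bits moved per kilowatt-hour."""
    energy_kwh = avg_power_watts * duration_seconds / 3_600_000  # watt-seconds to kWh
    return bytes_transferred * 8 / energy_kwh


def results_per_watt(results_per_second: float, avg_power_watts: float) -> float:
    """Benchmark results (for example I/O operations per second) per watt drawn."""
    return results_per_second / avg_power_watts


def four_corners_efficiency(measurements: dict) -> dict:
    """Apply results/W to each corner.

    `measurements` maps a corner name (e.g. "random read") to a tuple of
    (operations per second, average power in watts).
    """
    return {corner: results_per_watt(ops, watts)
            for corner, (ops, watts) in measurements.items()}
```

A single number is simple to compare, while a per-corner breakdown is more applicable to a specific workload, which is the trade-off the slide points at.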

Page 44: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Complications

[Figure: single disk drive power profile (IBM Haifa Research Labs)]

[Figure: SPECweb 2005 (banking) + storage, showing storage power and server power]

• Significant whole-system considerations

• Max power ≠ max performance

“Storage Modeling for Power Estimation”, Miriam Allalouf, Yuriy Arbitman, Michael Factor, Ronen I. Kat, Kalman Meth, and Dalit Naor; IBM Haifa Research Labs; manuscript; March 2009

“The Next Frontier for Power/Performance Benchmarking: Energy Efficiency of Storage Subsystems”, Klaus-Dieter Lange; SPEC Benchmark Workshop 2009; January 2009

Page 45: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Need for Data Redundancy

RAID 10 – protect against multiple disk failures

DR Mirror – protect against whole-site disasters

Backups – protect against failures and unintentional deletions/changes

Compliance archive – protect against heavy fines

Test/dev copies – protect live data from mutilation by unbaked code

Overprovisioning – protect against application crashes caused by volumes running out of space

Snapshots – quicker and more efficient backups

Page 46: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Result of Redundancy

[Figure: stacked bar with scale marks at 1 TB, 5 TB, and 10 TB, showing App Data growing to roughly 10x its original footprint as RAID 10 overhead, “Growth” overprovisioning, snapshots, DR mirror, disk backup, compliance archive, backup archive, and test/dev copies are added]

- Power consumption is roughly linear in the number of naïve (full) copies

Page 47: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Positive Effect of Green Storage Technologies

[Figure: the same redundancy stack, with scale marks at 1 TB, 5 TB, and 10 TB, shrinking as RAID 5/6 (shown as RAID DP), thin provisioning, virtual clones, dedupe & compression, and multi-use backups are applied]

- Green storage technologies use less raw capacity to store and use the same data set

- Power consumption falls accordingly

Page 48: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Green Storage Technologies

Enabling technologies

Storage virtualization

Storage capacity planning

Green software

Compression

Snapshots

Virtual (writeable) clones

Thin provisioning

Non-mirrored RAID

Deduplication and SIS

Resizeable volumes

Page 49: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Typical Savings

Thin provisioning

40 - 60%

Based on moving from 30% average utilization to 80% utilization

RAID 6

35%

For 14-disk RAID 6 set, compared to RAID 1/10

Deduplication

40 – 95%, depending on dataset and time interval

~ 40 – 50% average over time

Resizeable volumes

20 – 50%
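As a back-of-envelope check on where savings of this kind come from, the sketch below computes the raw capacity needed to hold the same usable data under two layouts and reports the relative saving. It assumes an idealized model (RAID 10 stores every block twice, a 14-disk RAID 6 set gives up two disks to parity, no spares or metadata overhead), so its results need not coincide with the typical field figures quoted on the slide. All names are hypothetical.

```python
# Idealized capacity arithmetic; function and variable names are illustrative.

def raw_needed(usable_tb: float, usable_fraction: float) -> float:
    """Raw capacity required to deliver `usable_tb` at a given usable fraction."""
    return usable_tb / usable_fraction


def saving(before: float, after: float) -> float:
    """Fraction of raw capacity saved when moving from `before` to `after`."""
    return 1 - after / before


# RAID 10 mirrors everything (50% usable); 14-disk RAID 6 keeps 12 of 14 disks for data.
raid10_raw = raw_needed(1.0, 0.5)
raid6_raw = raw_needed(1.0, 12 / 14)
raid_saving = saving(raid10_raw, raid6_raw)    # roughly 0.42 in this idealized model

# Same data held at 30% average utilization versus 80% utilization.
thin_saving = saving(raw_needed(1.0, 0.30), raw_needed(1.0, 0.80))    # 0.625 in this model
```

Because power consumption is roughly linear in raw capacity, a saving in raw terabytes translates into a comparable saving in power under this model.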

Page 50: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Green Storage Technologies

(cont.)

Other storage technologies and power saving techniques

Capacity vs. high performance drives

ILM / HSM

MAID

SSDs

Power supply and fan efficiencies

Facilities-side technologies

Hot aisle/cold aisle

Water & natural cooling

Flywheel UPSs

Page 51: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Savings Matrix

C SS VC TP R DD RV

Compression (C)

Snapshots (SS)

Virtual Clones (VC)

Thin Provisioning (TP)

RAID (R)

Deduplication (DD)

Resizeable Vols (RV)

Savings can multiply in combinations with checkboxes
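One way to read "savings can multiply" is that each technology leaves a fraction of the capacity behind, and those remaining fractions multiply. The sketch below assumes the individual savings are independent of one another, which is a simplification; the function name is hypothetical.

```python
# Hypothetical helper: combines fractional savings under an independence assumption.

def combined_saving(savings):
    """Return the overall fractional saving for a list of individual savings.

    Each entry is a fraction in [0, 1]; the remaining fractions (1 - s) multiply.
    """
    remaining = 1.0
    for s in savings:
        if not 0.0 <= s <= 1.0:
            raise ValueError("each saving must be a fraction between 0 and 1")
        remaining *= 1.0 - s
    return 1.0 - remaining
```

For example, a 50% saving combined with a 40% saving leaves 0.5 × 0.6 = 0.3 of the original capacity, a 70% overall saving rather than 90%.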

Page 52: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

SNIA Green Efforts

SNIA Green Storage Initiative (GSI) and SNIA Green Storage Technical Work Group (TWG)

on-going efforts to develop data-driven green standards & metrics

power measurements at multi-vendor “unplugged” fests

alliances with other active green organizations

(The Green Grid, 80PLUS/Climate Savers, DMTF, SPEC, SPC)

collaboration with EPA on the ENERGY STAR for Storage program

Whitepapers / workshops

four tutorials at SNW; online tutorials available

(www.snia.org/education/tutorials)

white papers from GSI

Page 53: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Cloud Computing and Storage

Page 54: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

IDC: Worldwide IT Cloud Services Spending*/**

2008 ($16.2 billion): Business Applications 57%, Infrastructure Software 18%, App Dev & Deployment 11%, Server 9%, Storage 5%

2012 ($42.3 billion): Business Applications 52%, Infrastructure Software 18%, App Dev & Deployment 9%, Server 8%, Storage 13% ($5.5 billion)

* by Product/Service Type, 2008 & 2012

** Includes enterprise IT spending on Business Applications, Systems Infrastructure Software, Application Development & Deployment Software, Servers and Storage

Source: IDC - IT Cloud Services Forecast - 2008, 2012: A Key Driver of New Growth

Page 55: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Some basic cloud storage attributes

Pay as you go

Self service provisioning

Scalable, Elastic

Rich application interfaces

No need for consumers to directly manage their own storage resource

By offloading the Storage Management, data owners can focus more on the management of data requirements ...

Page 56: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Cloud Computing Perceived Benefits and Demand Drivers

Cloud computing’s “nirvana-like” promise drives higher service level expectations among business entities and individual users

Which in turn puts pressure on the enterprise data center to deliver higher service quality (at lower cost)

IT Providers

Key Benefit: Competitiveness

Lower TCO

Faster Time to Market

Higher Customer Retention

Service quality

Resource optimization

Resiliency

Flexibility

Efficiency

“Green”

Enhanced chargeback

Business Entities

Key Benefit: Innovation

Faster, easier innovation

New business models

New products and services

Faster time to market

Lower IT cost

Lower IT risk (brand protection)

Improved IT user productivity

Improved Client Satisfaction

Improved Disaster Recovery

IT Users

Key Benefit: Quality of Experience

Speed of access

Ease of access (anywhere, anytime)

Ease of use

Minimal software requirements on access device

No long-term commitments

Page 57: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

What is Cloud Storage?

Cloud Storage can be contrasted with SAN/NAS storage

Both are “Storage Networking”

Provisioning may be different (some interfaces do not require this)

How you pay for it may be different

One primary difference is that essential management tasks for storage resources are performed by the Cloud operator and not the storage user

Public Storage Clouds

Latency may be an issue for most enterprise applications

Primarily aimed at web-facing applications that already serve data over the web

Importance of SLA Management

Private Storage Clouds

Can be either web-facing or used for enterprise applications

Can be operated by internal IT departments – driving costs down and achieving better utilization

Importance of SLA Management

Hybrid use of public and private clouds (including existing data centers)

This is not only about capacity provisioning

Data Assurance, Security, Delivery, Migration…

Leverage Virtualized and Self*/Automated Management Environments

Also part of Virtual Data Centers

Page 58: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Some Examples of Cloud Interfaces

De facto and proprietary interfaces

Amazon S3 (http://aws.amazon.com/s3) “As simple as possible, but no simpler”

GoGrid (http://wiki.gogrid.com/wiki/index.php/Cloud_Storage)

Some offer standard data path APIs, but allocation and provisioning are behind “storefronts” or proprietary APIs

SAMBA, RSYNC, SCP – “standard” open source

Microsoft Azure Interface

De jure APIs

WebDAV (http://www.ietf.org/rfc/rfc2518.txt)

iSCSI (http://www.ietf.org/rfc/rfc3720.txt)

NFS (http://www.ietf.org/rfc/rfc3530.txt)

FTP (http://www.ietf.org/rfc/rfc959.txt)

But very few of these interfaces support the use of metadata on individual data elements

Page 59: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Cloud Storage: Use Cases and Requirements

Store my file and give me back a URL (e.g., Amazon S3)

Best Effort Quality of Service?

Provision a filesystem and mount it (e.g., WebDAV)

Quality of Service specification via provisioning interface

Give me Filesystems/LUNs for my Cloud Computing

NAS box in the cloud…

Store my backup files until I need them back

Maybe offer me a local cache as well

Archive my files in the Cloud for Preservation/Compliance

Maybe offer me eDiscovery services, “tape in the mail” retrieval

Store all my files, allowing me to set the Data Requirements, let me cache and distribute geographically

Policy driven Data Services based on Data System Metadata markings
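The first use case above ("store my file and give me back a URL") and the last one (policy driven by metadata markings) can be sketched with a toy in-memory store. This is not the Amazon S3 API or any vendor's interface: the class name, the placeholder base URL, and the content-hash key scheme are all assumptions made for illustration.

```python
import hashlib


class ToyObjectStore:
    """Hypothetical in-memory object store: put bytes, get back a URL."""

    def __init__(self, base_url="https://storage.example.invalid"):
        self.base_url = base_url          # placeholder host, not a real service
        self._objects = {}                # key -> (data, metadata)

    def put(self, data: bytes, metadata=None) -> str:
        """Store `data` with optional metadata markings and return its URL."""
        key = hashlib.sha256(data).hexdigest()   # illustrative choice of key
        self._objects[key] = (data, dict(metadata or {}))
        return f"{self.base_url}/{key}"

    def get(self, url: str) -> bytes:
        """Fetch the bytes previously stored under `url`."""
        return self._objects[url.rsplit("/", 1)[-1]][0]

    def metadata(self, url: str) -> dict:
        """Return the metadata markings a policy engine could act on."""
        return dict(self._objects[url.rsplit("/", 1)[-1]][1])


store = ToyObjectStore()
url = store.put(b"quarterly report", {"retention_days": "2555"})
```

The caller never provisions or manages capacity; it only holds the returned URL, and any data requirements travel with the object as metadata.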

Page 60: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Types of APIs

Besides the “Data Path” APIs (previous slide), there are other interfaces that Cloud Storage may require

E.g. Storage Provisioning

For certain types of data storage interfaces (block, file) from the cloud you will need to provision/allocate storage before you can use it

This provisioning can be done via a UI or an API

Existing standards can be leveraged (e.g. SNIA SMI-S)

E.g. Storage Metering

Since the cloud storage paradigm is “pay as you go”, you need to know what your bill will be at the end of the billing cycle

What operations affect my bill?

UI typical, but an API standard would enable interoperability and better automation

Telecom Industry Practice – every transaction has a “Call Detail Record” that is aggregated for billing
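The call-detail-record idea can be sketched as one usage record per storage transaction, aggregated per account at the end of the billing cycle. The record fields, operation names, and prices below are hypothetical and are not taken from any provider's price list or metering standard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical per-transaction record, analogous to a call detail record."""
    account: str
    operation: str     # e.g. "PUT" or "GET"; names are illustrative
    quantity: float    # number of requests, gigabytes, etc.


def bill(records, prices):
    """Aggregate usage records into a total charge per account."""
    totals = defaultdict(float)
    for record in records:
        totals[record.account] += record.quantity * prices[record.operation]
    return dict(totals)


# Illustrative unit prices, in arbitrary currency units.
prices = {"PUT": 0.5, "GET": 0.25}
records = [
    UsageRecord("acme", "PUT", 4),
    UsageRecord("acme", "GET", 8),
    UsageRecord("beta", "GET", 2),
]
invoice = bill(records, prices)
```

Exposing the same records through an API, rather than only a UI, is what would let a customer answer "what operations affect my bill?" automatically.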

Page 61: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Some Example Data Storage Interfaces

Block Interfaces

SCSI, ATA, IDE

Local File Interfaces

POSIX, NTFS

Network File Interfaces

NFS, CIFS, SMB2, AppleTalk, Novell, AFS

Object Based

OSD, XAM

Database

JDBC, ODBC

Not all of these make sense for the Cloud

Page 62: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Cloud API to the Resource Domain Model

Cloud interfaces with all 3 domains (Information, Data, Storage)

Integration of services with different types of Clouds (Compute, Applications...)

Federation of Clouds

Cloud Exchange, Cloudbursting…

Data Movement

Migration, Delivery, Regulations

Page 63: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

XAM API: an example Data Storage Interface

XAM is the first interface to standardize system metadata for retention of data

Given this we can see that XAM is a data storage interface that is used by both Storage and Data Services (functions)

XAM implements the basic capability to Read and Write Data (through XStreams)

XAM has the ability to locate any XSet with a query or by supplying the XUID

XAM allows Metadata to be added to the data and keeps both in an XSet object

XAM uses and produces system metadata for each XSet

For example Access and Commit times (Storage System Metadata)

But it also uniquely specifies Data System Metadata for Retention Data Services

XAM User metadata is un-interpretable by the system, but stored with the other data and is available for use in queries
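The concepts on this slide (data streams and metadata kept together in one object, lookup by identifier or by query, system metadata produced by the store) can be illustrated with a toy model. This is not the XAM API: every class, method, and field name below is hypothetical, and real XAM libraries define their own calls and metadata field names.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass
class ToyXSet:
    """Hypothetical stand-in for an XSet: data and metadata kept together."""
    xuid: str
    streams: dict            # stream name -> bytes
    user_metadata: dict      # opaque to the store, but searchable
    system_metadata: dict    # produced by the store itself


class ToyXSystem:
    """Hypothetical store that commits, opens, and queries ToyXSet objects."""

    def __init__(self):
        self._xsets = {}

    def commit(self, streams, user_metadata, retention_days):
        xuid = uuid.uuid4().hex
        system_metadata = {
            "commit_time": time.time(),         # storage system metadata
            "retention_days": retention_days,   # data system metadata for retention
        }
        self._xsets[xuid] = ToyXSet(xuid, dict(streams),
                                    dict(user_metadata), system_metadata)
        return xuid

    def open(self, xuid):
        """Locate an object by its identifier."""
        return self._xsets[xuid]

    def query(self, **criteria):
        """Locate objects whose user metadata matches every criterion."""
        return [x.xuid for x in self._xsets.values()
                if all(x.user_metadata.get(k) == v for k, v in criteria.items())]


xsystem = ToyXSystem()
legal_id = xsystem.commit({"body": b"contract"}, {"dept": "legal"}, retention_days=365)
hr_id = xsystem.commit({"body": b"policy"}, {"dept": "hr"}, retention_days=30)
```

The store never interprets the user metadata; it only matches it in queries, while retention is driven by the system metadata it records at commit time.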

Page 64: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Standards for Cloud Storage

Service access interfaces

Storage service interfaces

Provisioning

QOS

Performance management

Chargeback accounting

Data protection

Storage Security

Storage infrastructure management interfaces (SMI-S)

[Diagram labels: Cloud Service User; Service Management; SOA; Application; Middleware; Virtualized Infrastructure; Server / Storage / Network; Virtual Image Management; Compute]

Page 65: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

SNIA Cloud Technical Work Group

www.snia.org/cloud

Engaging the industry

http://groups.google.com/group/snia-cloud

Alliances

Education & Whitepapers

Use Cases & Taxonomy

Interface Specification

And coming soon to Brazil! Cloud Storage Brasil

http://groups.google.com/group/snia-cloud-br?hl=pt-br

Page 66: Rio Info 2009 - Optimizing IT Costs using Virtualization, Green and Cloud Computing - David Royer

Thank You

Muito Obrigado!

www.snia.org

www.snia.com.br