Page 1: Storage

Storage


Page 2: Storage

Storage 101

What is a storage array?

Page 3: Storage

Agenda

• Definitions
• What is a SAN Fabric
• What is a storage array
• Front-end connections
• Controllers
• Back-end connections
• Physical Disks
• Management
• Performance
• Future – Distributed storage

Page 4: Storage

Definitions

• SAN – Storage Area Network
– This is generally used as a catch-all term for all of the following definitions
– For storage personnel, SAN does NOT equal storage array
• LUN – Logical Unit Number, also known as a volume
• WWN – World Wide Name
– The equivalent of a MAC address for storage networks
• Fabric – Network that connects hosts to storage
• iSCSI – Internet SCSI
• SCSI – Small Computer System Interface
• FC – Fibre Channel
• FCoE – Fibre Channel over Ethernet
• FCIP – Fibre Channel over IP
• Storage Array – Storage device that provides block-level access to volumes
• DAS/DASD – Direct Attached Storage
– Storage directly attached to a server without any network
• NAS – Network Attached Storage
– Storage device that provides file-level access to volumes
• RAID – Redundant Array of Independent Disks
– A way to combine multiple physical disks into a logical entity providing different performance and protection characteristics.
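
As a rough illustration of how RAID levels trade capacity for protection, here is a minimal Python sketch using the standard textbook formulas; the function and numbers are illustrative only and do not come from the slides.

```python
# Illustrative only: usable capacity of a RAID set built from identical disks.
# These are the standard textbook formulas, not any particular vendor's math.
def usable_capacity_gb(disks: int, disk_gb: int, raid_level: str) -> float:
    if raid_level == "raid1":      # mirroring: half the raw capacity survives
        return disks * disk_gb / 2
    if raid_level == "raid5":      # single parity: lose one disk's worth
        return (disks - 1) * disk_gb
    if raid_level == "raid6":      # double parity: lose two disks' worth
        return (disks - 2) * disk_gb
    raise ValueError(f"unhandled RAID level: {raid_level}")

print(usable_capacity_gb(8, 600, "raid6"))   # 3600.0 GB usable from 8 x 600GB disks
```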

Page 5: Storage


What is a SAN fabric

• A network comprising hosts, the storage arrays they access, and the storage switches that provide the network connectivity

Page 6: Storage


Page 7: Storage


Page 8: Storage


SAN Fabric Details

• A SAN Fabric has hosts that connect to the network
– Each host has a physical connection and some logical addresses
 pWWN (Port WWN) is the equivalent of a MAC address for the port on the host that is connected to the network
 FCID is a dynamic address that represents the connection as well
– Only HP-UX 11v2 and below use this
– Typically hosts connect into some storage switch
 These look like traditional network switches in many ways and operate the same way.
 These switches will contain both host ports and storage ports, or in the storage world, initiators and targets
– Storage arrays that provide storage also connect into these switches to provide the full network
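
To make the initiator/target and zoning vocabulary concrete, here is a toy Python model of a zone; the class, field names, and WWN values are invented for illustration and do not correspond to any real switch API.

```python
# Toy model of a fabric zone: a named grouping of initiator (host) and target
# (storage array) port WWNs. The WWN strings below are made up for illustration.
from dataclasses import dataclass, field

@dataclass
class Zone:
    name: str
    initiators: set = field(default_factory=set)   # host pWWNs
    targets: set = field(default_factory=set)      # storage array pWWNs

    def allows(self, initiator: str, target: str) -> bool:
        """A host port may talk to an array port only if both sit in the zone."""
        return initiator in self.initiators and target in self.targets

zone = Zone("esx01_to_array01",
            initiators={"10:00:00:00:c9:12:34:56"},
            targets={"50:06:01:60:3b:a0:11:22"})
print(zone.allows("10:00:00:00:c9:12:34:56", "50:06:01:60:3b:a0:11:22"))  # True
```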

Page 9: Storage

What is a storage array?

• A storage array is a system that consists of components that provide storage available for consumption
– The components are front-end ports, controllers, back-end ports, and physical disk drives

Page 10: Storage


Page 11: Storage

Front-end connections

• Front-end connections are used for individual hosts to connect to the storage array and utilize the volumes available
– This can be directly connected in a small or medium size SAN, or in a DAS environment
• The physical transport mechanism can be fibre or copper
• The logical transport protocols can be block-level protocols such as iSCSI, FC, or FCoE
– Some arrays also support file-level protocols, such as NAS devices
• The larger arrays tend to have more front-end connections to aggregate bandwidth and provide load balancing
• Volumes are typically presented via one or more front-end connections to hosts

Page 12: Storage

Controllers

• Controllers are the brains that translate the requests from the front-end ports and determine how to fulfill them
• Controllers run code optimized for moving data and performing the mathematical calculations needed to support RAID levels
• Controllers also have a certain amount of on-board memory, or cache, to help reduce the amount of data that has to come from spinning disks.
– Many arrays perform some level of read-ahead caching and write caching to optimize performance
• They also have diagnostics routines and management functions to support the operations of the array.
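
The read-ahead idea mentioned above can be sketched in a few lines of Python; this is a toy illustration of prefetching sequential blocks into an LRU cache, not any vendor's actual caching algorithm, and all names here are invented.

```python
# Toy read-ahead cache: on a miss, fetch the requested block plus the next few
# sequential blocks so that later sequential reads are served from memory.
from collections import OrderedDict

class ReadAheadCache:
    def __init__(self, capacity=1024, read_ahead=4):
        self.capacity = capacity
        self.read_ahead = read_ahead
        self.cache = OrderedDict()                  # block number -> data, LRU order

    def read(self, block, read_from_disk):
        if block in self.cache:                     # cache hit: no disk I/O
            self.cache.move_to_end(block)
            return self.cache[block]
        for b in range(block, block + 1 + self.read_ahead):   # miss: prefetch ahead
            self.cache[b] = read_from_disk(b)
            self.cache.move_to_end(b)
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)      # evict least recently used
        return self.cache[block]

cache = ReadAheadCache()
cache.read(100, lambda b: f"data-{b}")   # misses, prefetches blocks 100-104
cache.read(101, lambda b: f"data-{b}")   # served from cache, no disk read
```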

Page 13: Storage

Back-end connections

• From the controllers to the physical disk shelves or disks there are back-end connections.
– These send actual commands to the disks, instructing them to retrieve or write blocks of data.
– These connections are usually transparent to all but the most sophisticated storage engineers.
 Often these have specific fan-out ratios, where each disk shelf may have two or four connections and splits the available bandwidth in some way.
– Back-end connections are rarely a bottleneck

Page 14: Storage

Physical Disks

• These days physical disks come in all shapes and sizes
– Spinning drives come in capacities anywhere from 146GB to 3TB, with capacity increasing year over year (though not performance)
 These drives also come in various rotational speeds, anywhere from 5400 RPM in a laptop drive to 15000 RPM in an enterprise-class drive, which directly affects performance
– Non-spinning drives, also known as SSDs, come in capacities that don't yet match spinning drives, though there are SSD cards that have up to 960GB of storage space available.
– These physical disks directly impact the performance of the storage array system, and are usually the bottleneck for most enterprise-class storage systems.
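
The link between rotational speed and performance can be shown with a rough back-of-the-envelope estimate; the seek-time figures below are typical published values rather than numbers from the slides.

```python
# Rough estimate of a spinning drive's random IOPS from rotational speed and
# average seek time: one random I/O costs roughly one seek plus half a revolution.
def estimated_iops(rpm: int, avg_seek_ms: float) -> float:
    rotational_latency_ms = (60_000 / rpm) / 2      # half a revolution, on average
    return 1000 / (avg_seek_ms + rotational_latency_ms)

print(round(estimated_iops(15_000, 3.5)))   # ~180 IOPS, in line with the ~200 used later
print(round(estimated_iops(7_200, 8.5)))    # ~80 IOPS for a 7.2k RPM nearline drive
print(round(estimated_iops(5_400, 12.0)))   # ~60 IOPS for a laptop drive
```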

Page 15: Storage


Provisioning

• Provisioning storage is a multi-step process (sketched in code below)
– Configure the host with any software, including multi-path support
– Alias the host port WWN
– Zone the host port alias to a storage array WWN
– Activate the updated zone information
– Create the host representation on the storage array
– Create the volume on the storage array
– Present/LUN mask the volume to the correct host
– Format the volume for use
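
The same workflow, expressed as a Python sketch; every object and function name here is a hypothetical placeholder standing in for a vendor-specific CLI or API call, not a real command.

```python
# Hypothetical provisioning workflow following the steps above. None of these
# calls exist in any real product; they only name the steps in order.
def provision_volume(host, array, fabric, size_gb):
    host.install_multipath_software()                              # host software, multi-path support
    fabric.create_alias(host.pwwn, f"{host.name}_hba1")            # alias the host port WWN
    fabric.create_zone(f"{host.name}_to_{array.name}",
                       members=[f"{host.name}_hba1", array.pwwn])  # zone alias to the array WWN
    fabric.activate_zoneset()                                      # push the updated zone information
    array.register_host(host.name, host.pwwn)                      # host representation on the array
    volume = array.create_volume(size_gb)                          # create the volume
    array.lun_mask(volume, host.name)                              # present / LUN mask to the host
    host.rescan_and_format(volume)                                 # discover and format for use
    return volume
```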

Page 16: Storage

Performance

• There are many statistics you can use to monitor your storage devices; however, two key ones directly impact performance more than most.
• IOPS – Input/Output Operations Per Second
– This is based on the number of disks that support the volume being used and the RAID level of the volume
 15k RPM disks provide roughly 200 IOPS raw, without any RAID write penalty
 RAID 1 has a 1:2 ratio for writes: for every 1 write command sent to the array, 2 commands are sent to the disks
 RAID 5 has a 1:4 ratio, while RAID 6 has a 1:6 ratio
– RAID 6 write breakdown: read existing data block, read parity 1, read parity 2, calculate XOR (parity, which is not I/O), write data, write parity 1, write parity 2
 Read commands are always 1:1
 For an application that requires 10,000 IOPS with a 50/50 read-to-write ratio on a RAID 6 volume (see the worked sketch after this slide):
– 5,000 read IOPS, translating into 25 physical disks
– 5,000 write IOPS, translating into 30,000 back-end operations, requiring 150 physical disks
– The total requirement is 175 physical disks just to support the performance needed!
• Bandwidth
– This is based on the speed of the connections from the host to the array as well as how much oversubscription is taking place within the SAN Fabric.
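
Here is the 175-disk example worked through in Python, using only the figures quoted on the slide (roughly 200 IOPS per 15k disk and the RAID write penalties); the function itself is just a convenience for the arithmetic.

```python
# Back-of-the-envelope spindle count for an IOPS requirement, using the slide's figures.
RAID_WRITE_PENALTY = {"raid1": 2, "raid5": 4, "raid6": 6}   # back-end I/Os per host write

def disks_required(total_iops, read_fraction, raid_level, iops_per_disk=200):
    reads = total_iops * read_fraction                        # reads hit the back end 1:1
    writes = total_iops * (1 - read_fraction)
    backend_ops = reads + writes * RAID_WRITE_PENALTY[raid_level]
    return backend_ops / iops_per_disk

# 10,000 IOPS at 50/50 read/write on RAID 6 with 15k RPM disks:
# 5,000 reads -> 25 disks; 5,000 writes -> 30,000 back-end ops -> 150 disks.
print(disks_required(10_000, 0.5, "raid6"))   # 175.0 disks
```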

Page 17: Storage

Performance

• Bandwidth
– This is based on the speed of the connections from the host to the array as well as how much oversubscription is taking place within the SAN Fabric.
• Fibre Channel currently supports 16Gb full duplex, though 8Gb is more common
– That's roughly 3200 MBps full duplex: about 1.6GB of data each second in one direction, or about 3.2GB bi-directionally
• FCoE currently supports 10Gb, though the roadmap includes 40Gb and 100Gb
– 10Gb is roughly 2400 MBps full duplex, while 100Gb is roughly 24000 MBps, about 23.4GB per second
• Besides speed, there is the matter of oversubscription
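
A naive conversion from link speed to throughput is easy to sketch; the figures quoted above are somewhat lower than this raw math because they account for encoding and protocol overhead, so treat the helper below as illustrative arithmetic only.

```python
# Naive conversion of a link speed in Gbit/s to MB/s, ignoring encoding and
# protocol overhead (which is why quoted FC/FCoE figures are somewhat lower).
def gbps_to_mb_per_s(gbps: float, full_duplex: bool = True) -> float:
    one_direction = gbps * 1000 / 8           # Gbit/s -> MB/s (decimal megabytes)
    return one_direction * (2 if full_duplex else 1)

for speed in (8, 10, 16, 100):
    print(f"{speed}Gb -> {gbps_to_mb_per_s(speed):.0f} MB/s full duplex")
```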

Page 18: Storage

Performance

• Oversubscription – The practice of providing less aggregate bandwidth than the environment could theoretically demand
• In an environment with 100 servers, each with dual 8Gb FC connections, we'd have a total of 1600Gb directed at a storage array via some SAN switch
• The storage array may only have a total of eight 8Gb FC connections, for 64Gb of aggregate bandwidth
• We have a ratio of 1600:64, or 25:1.
– This is done in networking all the time and is now standard in the storage world.
– The assumption is that there will never be a need for all 100 hosts to transmit at their full bandwidth 100% of the time
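
The 25:1 figure falls straight out of the numbers on the slide; a small helper makes the arithmetic explicit (the function is purely illustrative).

```python
# Oversubscription ratio: host-facing bandwidth divided by array front-end bandwidth.
def oversubscription_ratio(hosts, links_per_host, link_gb, array_ports, array_port_gb):
    host_bw = hosts * links_per_host * link_gb    # 100 * 2 * 8Gb = 1600Gb
    array_bw = array_ports * array_port_gb        # 8 * 8Gb = 64Gb
    return host_bw / array_bw

print(oversubscription_ratio(100, 2, 8, 8, 8))    # 25.0 -> a 25:1 ratio
```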

Page 19: Storage


Storage Futures

• Converged Infrastructure
– Datacenters designed today talk about converged infrastructures
 One HP blade enclosure can encompass servers, networking, and storage components that need to be configured in a holistic manner
 Virtualization has helped speed this convergence up, though organizational design is usually still far behind.
– Storage arrays are beginning to support target-based zoning
 The goal is to reduce the administration needed to configure a host-to-storage mapping, letting the storage array do more intelligent administration without human intervention

Page 20: Storage


Storage Futures

• Over the last few years storage has begun transitioning from "big old iron" to distributed systems where data is spread across multiple nodes for capacity and performance.
– EMC Isilon
– HP Ibrix
– Nutanix
– VMware VSAN
– Nexenta
• As always in IT, the pendulum is swinging back to distributed platforms for storage, where each node hosts a small amount of data instead of a big platform hosting all of the data.

Page 21: Storage


Storage Futures

• Data protection is maturing beyond traditional RAID levels such as 1, 1+0, 5, 6, etc.
– RAID levels do offer additional protection, but most of the time they don't protect against corruption
– RAID levels also have performance implications that are usually negative for the applications residing upon them
• These days the solution is to create multiple copies of files or blocks based upon some rules
– Most of the large public cloud providers use this approach, including Amazon S3, the Simple Storage Service
 It just so happens that, by default, anything stored in S3 has three copies!
• The 'utopia' world is a place where each application has some metadata that controls what protection level and performance characteristics are required
– This would enable these applications to run internally or externally yet provide the same experience regardless.
– This is the essence of SDDC, the Software Defined Data Center: application requirements will define where applications run, without any intervention.