
Data storage management and retrieval

Jun 04, 2018


sadafScribd
  • 8/13/2019 Data storage management and retrieval


    Section Objective

    Upon completion of this section, you will be able to:

      Understand the concept of information availability and its measurement
      Describe the backup/recovery purposes and considerations
      Discuss architecture and different backup/recovery topologies
      Describe local replication technologies and their operation
      Describe remote replication technologies and their operation

    Introduction to Business Continuity - 2


    Chapter Objective

    After completing this chapter, you will be able to:

      Define Business Continuity and Information Availability
      Detail impact of information unavailability
      Define BC measurement and terminologies
      Describe BC planning process
      Detail BC technology solutions


    What is Business Continuity

      Business Continuity (BC) is preparing for, responding to, and recovering from an application outage that adversely affects business operations
      Business Continuity solutions address unavailability and degraded application performance
      BC is an integrated, enterprise-wide process and set of activities to ensure information availability


    What is Information Availability (IA)

      IA refers to the ability of an infrastructure to function according to business expectations during its specified time of operation
      IA can be defined in terms of three parameters:
        Accessibility: information should be accessible at the right place and to the right user
        Reliability: information should be reliable and correct
        Timeliness: information must be available whenever required


    Causes of Information Unavailability

    Disaster (

    Impact of Downtime

      Know the downtime costs (per hour, day, two days...)

      Lost Productivity
        Number of employees impacted (x hours out * hourly rate)
      Lost Revenue
        Direct loss
        Compensatory payments
        Lost future revenue
        Billing losses
        Investment losses
      Damaged Reputation
        Customers
        Suppliers
        Financial markets
        Banks
        Business partners
      Financial Performance
        Revenue recognition
        Cash flow
        Lost discounts (A/P)
        Payment guarantees
        Credit rating
        Stock price
      Other Expenses
        Temporary employees, equipment rental, overtime costs, extra shipping costs, travel expenses...
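The lost-productivity term on this slide (employees impacted, hours out, hourly rate) is a simple product. The sketch below only illustrates that arithmetic; the function name and the sample figures are assumptions made for the example, not values from the slides.

```python
def lost_productivity_cost(employees_impacted: int, hours_out: float, hourly_rate: float) -> float:
    """Cost of idle staff: employees impacted * hours out * hourly rate."""
    return employees_impacted * hours_out * hourly_rate


# Hypothetical outage: 200 employees idle for 4 hours at 35.0 per hour.
print(lost_productivity_cost(employees_impacted=200, hours_out=4, hourly_rate=35.0))
```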


    Measuring Information Availability

      MTBF (Mean Time Between Failures): average time available for a system or component to perform its normal operations between failures
      MTTR (Mean Time To Repair): average time required to repair a failed component

      IA = MTBF / (MTBF + MTTR)    or    IA = uptime / (uptime + downtime)

      [Figure: timeline from one incident to the next, with the stages Detection, Diagnosis, Repair, Recovery and Restoration and the intervals detection elapsed time, response time, repair time and recovery time. MTTR is the time to repair, or downtime; MTBF is the time between failures, or uptime.]
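The availability formula can be checked with a few lines of Python. This is a minimal sketch of IA = MTBF / (MTBF + MTTR); the function name and the sample values (999 hours between failures, 1 hour to repair) are assumptions chosen for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Information availability as the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical component: fails on average every 999 hours, takes 1 hour to repair.
ia = availability(mtbf_hours=999.0, mttr_hours=1.0)
print(f"{ia:.3%}")  # prints 99.900%
```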


    Availability Measurement: Levels of 9s Availability

      % Uptime    % Downtime   Downtime per Year   Downtime per Week
      98%         2%           7.3 days            3 hrs 22 min
      99%         1%           3.65 days           1 hr 41 min
      99.8%       0.2%         17 hrs 31 min       20 min 10 sec
      99.9%       0.1%         8 hrs 45 min        10 min 5 sec
      99.99%      0.01%        52.5 min            1 min
      99.999%     0.001%       5.25 min            6 sec
      99.9999%    0.0001%      31.5 sec            0.6 sec
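The rows of the table follow directly from the downtime percentage multiplied by the length of the period. The sketch below reproduces that calculation, assuming a 365-day year and a 7-day week; the function and constant names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years
HOURS_PER_WEEK = 7 * 24    # 168 hours


def downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Permitted downtime in hours for a given uptime percentage and period."""
    return (100.0 - uptime_percent) / 100.0 * period_hours


for pct in (98.0, 99.0, 99.9, 99.999):
    per_year = downtime_hours(pct, HOURS_PER_YEAR)
    per_week_min = downtime_hours(pct, HOURS_PER_WEEK) * 60
    print(f"{pct}% uptime: {per_year:.2f} h/year, {per_week_min:.2f} min/week")
```

For 99% uptime this gives 87.6 hours per year, which is the 3.65 days shown in the table.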


    BC Terminologies

      Disaster recovery
        Coordinated process of restoring systems, data, and infrastructure required to support ongoing business operations in the event of a disaster
        Restoring previous copy of data and applying logs to that copy to bring it to a known point of consistency
        Generally implies use of backup technology
      Disaster restart
        Process of restarting from disaster using mirrored consistent copies of data and applications
        Generally implies use of replication technologies


    BC Terminologies (Cont.)

      Recovery Point Objective (RPO)
        Point in time to which systems and data must be recovered after an outage
        Amount of data loss that a business can endure
      Recovery Time Objective (RTO)
        Time within which systems, applications, or functions must be recovered after an outage
        Amount of downtime that a business can endure and survive

      [Figure: two scales, each running from weeks down through days, hours and minutes to seconds. Recovery-point objective: Tape Backup, Periodic Replication, Asynchronous Replication, Synchronous Replication. Recovery-time objective: Tape Restore, Disk Restore, Manual Migration, Global Cluster.]
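One way to read the two objectives: a protection scheme satisfies the RPO if the gap between successive copies is no longer than the tolerable data loss, and satisfies the RTO if the service can be brought back within the tolerable downtime. The sketch below encodes that check. The class, the field names and all figures are hypothetical, and treating the copy interval as the worst-case data loss is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class ProtectionPlan:
    name: str
    copy_interval_hours: float  # time between copies, taken as worst-case data loss
    restore_hours: float        # estimated time to bring the service back


def meets_objectives(plan: ProtectionPlan, rpo_hours: float, rto_hours: float) -> bool:
    """True if the plan loses no more data than the RPO and recovers within the RTO."""
    return plan.copy_interval_hours <= rpo_hours and plan.restore_hours <= rto_hours


# Hypothetical plans checked against an RPO of 1 hour and an RTO of 4 hours.
nightly_tape = ProtectionPlan("nightly tape backup", copy_interval_hours=24, restore_hours=12)
periodic_copy = ProtectionPlan("periodic replication", copy_interval_hours=0.5, restore_hours=2)

for plan in (nightly_tape, periodic_copy):
    print(plan.name, meets_objectives(plan, rpo_hours=1, rto_hours=4))
```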


    Business Continuity Planning (BCP) Process

      Identifying the critical business functions
      Collecting data on various business processes within those functions
      Business Impact Analysis (BIA)
      Risk Analysis
      Assessing, prioritizing, mitigating, and managing risk
      Designing and developing contingency plans and disaster recovery plan (DR Plan)
      Testing, training and maintenance


    BC Technology Solutions

      Following are the solutions and supporting technologies that enable business continuity and uninterrupted data availability:
        Resolving single points of failure
        Multi-pathing software
        Backup and replication
          Backup and recovery
          Local replication
          Remote replication


    Resolving Single Points of Failure

      [Figure: data center configured without single points of failure. Labels: Client, IP, Redundant Network, Clustered Servers, Heartbeat Connection, Redundant Ports, Redundant Paths, Redundant FC Switches, FC Switches, Storage Array, Redundant Arrays, Remote Site.]


    Multi-pathing Software

      Configuring multiple paths increases data availability
      Even with multiple paths, if a path fails, I/O will not reroute unless the system recognizes that it has an alternate path
      Multi-pathing software helps to recognize and utilize an alternate I/O path to data
      Multi-pathing software also provides load balancing
      Load balancing improves I/O performance and data path utilization
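The two roles described above, rerouting I/O around a failed path and balancing load across healthy ones, can be sketched in a few lines. This is an illustrative model only, not how any particular multi-pathing product works; the class, method and path names are all assumptions, and round-robin is just one possible load-balancing policy.

```python
class MultiPathDevice:
    """Toy model of a device reachable over several I/O paths."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.healthy = {path: True for path in self.paths}
        self._next = 0  # index of the next path to try (round-robin)

    def mark_failed(self, path):
        self.healthy[path] = False

    def mark_restored(self, path):
        self.healthy[path] = True

    def select_path(self):
        """Return the next healthy path in round-robin order, skipping failed ones."""
        for _ in range(len(self.paths)):
            candidate = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy path to device")


# Hypothetical host with two HBAs, each providing one path to the same LUN.
lun = MultiPathDevice(["hba0:portA", "hba1:portB"])
print([lun.select_path() for _ in range(4)])  # I/O alternates between both paths
lun.mark_failed("hba0:portA")
print([lun.select_path() for _ in range(2)])  # I/O continues on the surviving path
```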


    Backup and Replication

      Local Replication
        Data from the production devices is copied to replica devices within the same array
        The replicas can then be used for restore operations in the event of data corruption or other events
      Remote Replication
        Data from the production devices is copied to replica devices on a remote array
        In the event of a failure, applications can continue to run from the target device
      Backup/Restore
        Backup to tape has been a predominant method to ensure business continuity
        Frequency of backup depends on RPO/RTO requirements


    Chapter Summary

    Key points covered in this chapter:

      Importance of Business Continuity
      Types of outages and their impact to businesses
      Information availability measurements
      Definitions of disaster recovery and restart, RPO and RTO
      Business Continuity technology solutions overview


    Concept in Practice: EMC PowerPath

      PowerPath
        Host-based software
        Resides between application and SCSI device driver
        Provides intelligent I/O path management
        Transparent to the application
        Automatic detection and recovery from host-to-array path failures

      [Figure: a server running the host application(s) and PowerPath above several SCSI drivers and SCSI controllers, connected through the storage network to the LUNs on the storage.]


    Check Your Knowledge

      Which concerns do business continuity solutions address?
      Availability is expressed in terms of 9s. Explain the relevance of the use of 9s for availability, using examples.
      What is the difference between RPO and RTO?
      What is the difference between Disaster Recovery and Disaster Restart?
      Provide examples of planned and unplanned downtime in the context of storage infrastructure operations.
      What are some of the Single Points of Failure in a typical data center environment?

    System Management

    Management systems in storage networks

    Five basic services

    Different service interface

    System Management

    Requirements

    User related

    Component related

    Architecture related

    System Management

    Requirements

      User related
        Network administrator
          Data transport functions properly
          Transmission capacity and protocols
        Storage administrator
          Allocation of LUNs to the server
          RAID configuration
        Industrial economist
          Wear and tear of devices
          Related cost
        Balanced management:
          Conception of the network
          Implementation of the storage network
          Easier management

    System Management

    Requirements

      Component related
        Applications: These include all software that processes data in a storage network.
        Data: Data is the term used for all information that is processed by the applications, transported over the network and stored on storage resources.
        Resources: The resources include all the hardware that is required for the storage and the transport of the data and the operation of applications.
        Network: The term network is used to mean the connections between the individual resources.
        Individual component requirements: monitoring, availability, performance or scalability.

    System Management

    Requirements

      Architecture related
        Servers and storage devices are decoupled by multiple virtualization layers
        Assignment of storage capacity to servers
        Applications will be impacted by maintenance
        Host bus adapters, hubs, switches and gateways can each affect the data flow
      Solution: Central management system

    System Management

    Basic Services

    Discovery

    Monitoring

    Central configuration

    Analysis

    Data management

    System Management

    The discovery component automatically detects the applications and resources used in the storage network.

    It collects information about the properties, the current configuration and the status of resources. The status comprises performance and error statistics.

    It correlates and evaluates all gathered information and supplies the data for the representation of the network topology.

    System Management

    The monitoring component continuously compares the current state of applications and resources with their target state.

    In the event of an application crash or the failure of a resource, it must take appropriate measures to raise an alert based upon the severity of the error that has occurred.

    The monitoring component performs error isolation: when part of the storage network fails, it tries to find the actual cause of the fault.
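The comparison of current state against target state, with an alert whose level depends on severity, can be sketched as follows. The resource names, the state strings and the rule that resources in a `critical` set raise a higher severity are all assumptions made for this illustration.

```python
from enum import Enum


class Severity(Enum):
    WARNING = 1
    CRITICAL = 2


def check_states(current, target, critical):
    """Compare current and target state; return one alert tuple per deviation.

    Each alert is (resource, actual state, target state, severity).
    """
    alerts = []
    for resource, wanted in target.items():
        actual = current.get(resource, "unknown")  # a missing report is also a deviation
        if actual != wanted:
            severity = Severity.CRITICAL if resource in critical else Severity.WARNING
            alerts.append((resource, actual, wanted, severity))
    return alerts


# Hypothetical snapshot of a small storage network.
target = {"db-app": "running", "lun-7": "online", "switch-b": "online"}
current = {"db-app": "running", "lun-7": "failed"}

for alert in check_states(current, target, critical={"lun-7"}):
    print(alert)
```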

    System Management

    The central configuration component significantly simplifies the configuration of all components.

    For instance, the zoning of a switch and the LUN masking of a disk subsystem for the setup of a new server can be configured centrally, where in the past isolated tools were required.

    Only a central management system can help the administrator to coordinate and validate the individual steps.

    Furthermore, it is desirable to simulate the effects of potential configuration changes in advance, before the real changes are executed.

    System Management

    The analysis component continuously collects current performance statistics, error statistics and configuration parameters and stores them in a data warehouse.

    This historical data enables trend analysis to determine capacity limits in advance, so that necessary expansions can be planned in time. This supports operational as well as economic conclusions.

    A further aspect is the spotting of error-prone components and the detection of single points of failure.
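As an illustration of trend analysis on historical data, the sketch below fits a straight line to past capacity readings and estimates when the capacity limit would be reached. A linear least-squares fit is an assumption chosen for simplicity (the slides do not name a method), and the function name and sample readings are hypothetical.

```python
def days_until_full(samples, capacity_tb):
    """Estimate days from the last sample until usage reaches capacity.

    samples is a list of (day, used_tb) readings. Returns None when the
    fitted usage is not growing or the readings do not allow a fit.
    """
    n = len(samples)
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    sxx = sum((day - mean_day) ** 2 for day, _ in samples)
    if sxx == 0:
        return None  # all readings taken on the same day
    sxy = sum((day - mean_day) * (used - mean_used) for day, used in samples)
    slope = sxy / sxx  # growth in TB per day
    if slope <= 0:
        return None
    intercept = mean_used - slope * mean_day
    full_day = (capacity_tb - intercept) / slope
    return full_day - samples[-1][0]


# Hypothetical monthly readings from the data warehouse: 3 TB growth per 30 days.
history = [(0, 40.0), (30, 43.0), (60, 46.0), (90, 49.0)]
print(days_until_full(history, capacity_tb=100.0))
```

With these readings the fitted growth is 0.1 TB per day, so a 100 TB limit would be reached roughly 510 days after the last reading.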

    System Management

    The data management component covers all aspects regarding the data, such as performance, backup, archiving and migration, and controls the efficient utilization and availability of data and resources.

    The administrator can define policies to control the placement and the flow of the data automatically.

    System Management

    Characteristics of management interfaces:

      There are two main types of device in the storage network:
        Connection devices
        End-point devices

      Types of interfaces:
        In-band
        Out-band
        Standardized
        Proprietary

    System Management

    In-band

    Interfaces for the management of end-point devices. Management functions for discovery, monitoring and configuration of connection devices and end-point devices are made available on this interface.

    Out-band

    Most end-point devices have one or more interfaces that are not directly connected to the storage network, but are available on a second, separate channel. In general, these are LAN connections and serial cables. This channel is not intended for data transport, but is provided exclusively for management purposes. This interface is therefore called out-band.

    Standardized

    Proprietary

    System Management

    Standardized: The standardisation and development of in-band management takes place at two levels.

      In-band transport levels: The management interfaces for Fibre Channel, TCP/IP and InfiniBand are defined on the in-band transport levels.

      In-band upper layer protocols (ULP): Primarily SCSI variants such as Fibre Channel FCP and iSCSI are used as an upper layer protocol. SCSI has its own mechanisms for requesting device and status information: the so-called SCSI Enclosure Services (SES). In addition to the management functions on transport levels, a management system can also use these upper layer protocol operations in order to identify an end device and request status information.

    Proprietary: Proprietary interfaces are differentiated as application programming interfaces (APIs), Telnet and Secure Shell (SSH) based interfaces, and element managers.

      APIs: A great number of devices have an API over which special management functions can be invoked. These are usually out-band, but can also be in-band, implementations.

      Element manager: An element manager is a device-specific management interface. It is frequently found in the form of a graphical user interface (GUI) on a further device or in the form of a web user interface (WUI) implemented over a web server integrated in the device itself. Since the communication between element manager and device generally takes place via a separate channel next to the data channel, element managers are classified amongst the out-band management interfaces.

    System Management

    In-band Management:

    In-band management runs over the same interface as the one that connects devices to the storage network and over which normal data transfer takes place. This interface is thus available to every end device node and every connection node within the storage network. The management functions are implemented as services that are provided by the protocol in question via the nodes.

    Two types of services:

      Operational services: Operational services serve to fulfil the actual tasks of the storage network, such as making the connection and data transfer.

      Management-specific services: Management-specific services supply the functions for discovery, monitoring and the configuration of devices.

    In order to be able to use in-band services, a so-called management agent is normally needed, installed in the form of software on a server connected to the storage network. This agent communicates with the local host bus adapter over an API in order to call up appropriate in-band management functions from an in-band management service.

    System Management

    In-band Management:

    In-band management runs through the same interface that connects devices to the storage network and via which the normal data transfer takes place. A management agent accesses the in-band management services via the HBA API.

    System Management

    In-band Management: Fibre Channel SAN

    Services for management: Each service defines one or more so-called servers. Servers are split into individual components and implemented in distributed form, with the individual components connected through the Fibre Channel SAN.

      Directory services
      Management service

    Types of servers

      Name server: The name server is defined by the directory service. It is an example of an operational service. Its benefit for a management system is that it reads out connection information and the Fibre Channel specific properties of a port (node name, port type).

      Configuration server: The configuration server belongs to the class of management-specific services. It is provided by the management service. It allows a management system to detect the topology of a Fibre Channel SAN.

      Zone server: The zone server performs both an operational and an administrative task. It permits the zones of a Fibre Channel SAN fabric to be configured (operational) and detected (management-specific).

    System Management

    In-band Management: Fibre Channel SAN

    Discovery

    The configuration server is used to identify devices in the Fibre Channel SAN and to recognise the topology. The function Request Node Identification Data (RNID) is also available to the management agent via its host bus adapter API; the agent can use it to request identification information from a device in the Fibre Channel SAN. The function Request Topology Information (RTIN) allows information to be called up about connected devices.

    Suitable chaining of these two functions finally permits a management system to discover the entire topology of the Fibre Channel SAN and to identify all devices and properties. If, for example, a device is also reachable out-band via a LAN connection, then its IP address can be requested in-band in the form of a so-called management address. This can then be used by the software for subsequent out-band management.
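The chaining of the two functions amounts to a graph traversal: identify a device, ask which devices are connected to it, and repeat for every device not yet seen. The sketch below shows that idea as a breadth-first search. The functions `rnid` and `rtin` are hypothetical stand-ins backed by an in-memory dictionary, not real HBA API calls, and the fabric layout is invented for the example.

```python
from collections import deque

# Hypothetical fabric used in place of a real SAN: port -> identity and neighbours.
FABRIC = {
    "hba0": {"id": "server-1", "neighbours": ["switch-a"]},
    "switch-a": {"id": "fc-switch-A", "neighbours": ["hba0", "array-p0", "switch-b"]},
    "array-p0": {"id": "disk-array-1", "neighbours": ["switch-a"]},
    "switch-b": {"id": "fc-switch-B", "neighbours": ["switch-a", "tape0"]},
    "tape0": {"id": "tape-library-1", "neighbours": ["switch-b"]},
}


def rnid(port):
    """Stand-in for an identification request: who is behind this port?"""
    return FABRIC[port]["id"]


def rtin(port):
    """Stand-in for a topology request: which ports are connected to this one?"""
    return list(FABRIC[port]["neighbours"])


def discover(start_port):
    """Breadth-first traversal that chains the two requests to map the fabric."""
    identities = {}
    links = set()
    queue = deque([start_port])
    while queue:
        port = queue.popleft()
        if port in identities:
            continue  # already identified via another link
        identities[port] = rnid(port)
        for neighbour in rtin(port):
            links.add(frozenset((port, neighbour)))
            if neighbour not in identities:
                queue.append(neighbour)
    return identities, links


identities, links = discover("hba0")
print(identities)
print(len(links), "links found")
```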

    System Management

    In-band Management: Fibre Channel SAN

    Monitoring

    Since in-band access always facilitates communication with each node in a Fibre Channel SAN, it is simple to also request link and port state information. Performance data can also be determined in this manner. For example, a management agent can send a request to a node in the Fibre Channel SAN so that the node transmits its counters for errors, retries and traffic. With the aid of this information, the performance and usage profile of the Fibre Channel SAN can be derived. This type of monitoring requires no additional management entity on the nodes in question and also requires no out-band access to them. The FC-GS-4 standard also defines extended functions that make it possible to call up state information and error statistics of other nodes. Two commands that realise the collection of port statistics are Read Port Status Block (RPS) and Read Link Status Block (RLS).
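Deriving a performance and usage profile from such counters usually means taking two readings and dividing the difference by the time between them. The sketch below shows that step only. The counter names, the sample values and the assumption that counters never wrap or reset between readings are all illustrative.

```python
def port_rates(previous, current, interval_seconds):
    """Per-second rates from two readings of cumulative port counters.

    Assumes the counters only increase between the two readings.
    """
    return {
        name: (current[name] - previous[name]) / interval_seconds
        for name in current
    }


# Hypothetical counters read from one port, 60 seconds apart.
earlier = {"errors": 10, "retries": 4, "frames": 1_000_000}
later = {"errors": 13, "retries": 4, "frames": 1_600_000}

print(port_rates(earlier, later, interval_seconds=60))
```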

    System Management

    In-band Management: Fibre Channel SAN

    Messaging

    In addition to the passive management functions described above, the Fibre Channel protocol also possesses active mechanisms such as the sending of messages, so-called events. Events are sent via the storage network in order to notify the other nodes of status changes of an individual node or a link. For example, when a link at a switch fails, a so-called Registered State Change Notification (RSCN) is sent as an event to all nodes that have registered for this service. This event can be received by a registered management agent and then transmitted to the management system.
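The registration and notification flow described above follows a publish/subscribe pattern. The sketch below models it with plain Python callbacks: nodes register with the fabric, a state change is delivered to every registered node, and a management agent forwards what it receives to the management system. The class names, the event dictionary and the port label are assumptions for illustration and do not reflect the actual Fibre Channel frame format.

```python
class Fabric:
    """Toy fabric that delivers state-change events to registered nodes."""

    def __init__(self):
        self._registered = []

    def register(self, callback):
        """Register a node's callback for state-change notifications."""
        self._registered.append(callback)

    def link_state_changed(self, port, state):
        """Send an RSCN-style event to every registered node."""
        event = {"type": "RSCN", "port": port, "state": state}
        for callback in self._registered:
            callback(event)


class ManagementAgent:
    """Receives events in-band and passes them on to the management system."""

    def __init__(self, forward):
        self._forward = forward

    def on_event(self, event):
        self._forward(event)


# The management system is represented here by a simple list of received events.
management_system_inbox = []
fabric = Fabric()
agent = ManagementAgent(forward=management_system_inbox.append)
fabric.register(agent.on_event)

fabric.link_state_changed(port="switch-a/port3", state="offline")
print(management_system_inbox)
```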

    System Management

    In-band Management: Fibre Channel SAN

    Zoning Problem:


    System Management

    In-band Management: Fibre Channel SAN

    Services for management: Two services

    Directory services

    Management service
