Page 1: High-Availability Network Attached Storage

The Administrator Shortcut Guide To™

High-Availability Network Attached Storage

John Vacca

Introduction

By Sean Daily, Series Editor

Welcome to The Administrator Shortcut Guide to High-Availability Network Attached Storage!

The book you are about to read represents an entirely new modality of book publishing and a major first in the publishing industry. The founding concept behind Realtimepublishers.com is the idea of providing readers with high-quality books about today’s most critical IT topics—at no cost to the reader. Although this may sound like a somewhat impossible feat to achieve, it is made possible through the vision and generosity of corporate sponsors who agree to bear the book’s production expenses and host the book on their Web sites for the benefit of their Web site visitors.

It should be pointed out that the free nature of these books does not in any way diminish their quality. Without reservation, I can tell you that this book is the equivalent of any similar printed book you might find at your local bookstore (with the notable exception that it won’t cost you $30 to $80). In addition to the free nature of the books, this publishing model provides other significant benefits. For example, the electronic nature of this eBook makes events such as chapter updates and additions, or the release of a new edition of the book, possible in a far shorter timeframe than is achievable with printed books. Because we publish our titles in “real-time”—that is, as chapters are written or revised by the author—you benefit from receiving the information immediately rather than having to wait months or years to receive a complete product.

Finally, I’d like to note that although it is true that the sponsor’s Web site is the exclusive online location of the book, this book is by no means a paid advertisement. Realtimepublishers is an independent publishing company and maintains, by written agreement with the sponsor, 100% editorial control over the content of our titles. However, by hosting this information, the sponsor has set itself apart from its competitors by providing real value to its customers and transforming its site into a true technical resource library—not just a place to learn about its company and products. It is my opinion that this system of content delivery is not only of immeasurable value to readers, but represents the future of book publishing.

As series editor, it is my raison d’être to locate and work only with the industry’s leading authors and editors, and publish books that help IT personnel, IT managers, and users to do their everyday jobs. To that end, I encourage and welcome your feedback on this or any other book in the Realtimepublishers.com series. If you would like to submit a comment, question, or suggestion, please do so by sending an email to [email protected], leaving feedback on our Web site at www.realtimepublishers.com, or calling us at (707) 539-5280.

Thanks for reading, and enjoy!

Sean Daily

Series Editor

Table of Contents

Introduction...................................................................................................................................... i

Chapter 1: Overview of High-Availability NAS Technology Solutions .........................................1

High-Availability NAS Fundamentals.............................................................................................2

Defining High-Availability NAS.........................................................................................2

Normal Availability Systems...................................................................................3

High-Availability Systems.......................................................................................3

Disk and CPU Fault Tolerance ................................................................................3

Hot Sparing ..............................................................................................................4

Clustering NAS Architectures .............................................................................................4

Clustering Scalable High-Availability NAS........................................................................8

Increasing the Scalability of High-Volume Servers ................................................8

Enabling Heterogeneous Computation ....................................................................9

Increasing the I/O Bandwidth to Requesters ...........................................................9

Improving Performance ...........................................................................................9

Managing SANs.................................................................................................................10

Making SANs as Self-Managing as Possible...........................................10

Supporting Attribute-Based Management .............................................................10

Supporting Quality-of-Service Management.........................................................11

Facilitating the Dynamic Reconfiguring and Expanding of SANs........................11

Presenting a Single System View to Users ............................................................11

Presenting a Single System View for Management Purposes ...............................12

Utilizing High-Availability NAS from an Application’s Perspective ...............................12

Describing a High-Availability NAS System....................................................................13

Operating Within a High-Availability NAS Environment ................................................14

Accessing High-Availability NAS.........................................................................14

Data Sharing and Concurrent Update ................................................................................16

Choosing a High-Availability NAS Solution ................................................................................17

Benefiting from a High-Availability NAS Solution ..........................................................20

Streamlining Architecture......................................................................................20

Reducing Server I/O Bottlenecks...........................................................................21

Increasing Reliability and Data Availability..........................................................21

Allocating Efficient Use of Resources...................................................................21

Simplifying Using a High-Availability NAS Solution ..........................................22

Increasing Productivity ..........................................................................................22

Lowering TCO.......................................................................................................22

Comparing High-Availability NAS Appliances with General Purpose Servers ...............23

Comparing High-Availability NAS with SAN..................................................................23

Strategizing Your Storage Solutions..............................................................................................25

Trying to Standardize DAS................................................................................................26

Standardizing High-Availability NAS...............................................................................27

Speeding Networks ............................................................................................................27

Shifting Storage .................................................................................................................28

Confusing High-Availability NAS and SAN.....................................................................28

Designing Storage Subsystems with Parallel Processing ..................................................30

Case Study .....................................................................................................................................30

Hosting Service at a Higher Level .....................................................................................31

Tasking the Storage Server Architecture ...........................................................................32

Improving Response Time and Availability ......................................................................32

Summary ........................................................................................................................................33

Chapter 2: Designing High-Availability NAS Solutions...............................................................34

Fibre Channel Topologies..............................................................................................................35

Point-to-Point Topology ....................................................................................................35

Arbitrated Loops ................................................................................................................36

Switched Fabric .................................................................................................................36

Fibre Channel Switching Hub............................................................................................38

Customer Selection of Fibre Channel Products .............................................................................39

Designing High-Availability NAS.................................................................................................41

LAN-Free Backups ............................................................................................................42

Serverless Backups ............................................................................................................42

Clustering...........................................................................................................................43

High Availability ...............................................................................................................43

Centralizing Storage...........................................................................................................44

Video Editing .....................................................................................................................44

Disaster Recovery ..............................................................................................................44

Backups..............................................................................................................................44

SAN/NAS Server Component Redundancy ......................................................................44

High-Availability NAS Design Options ........................................................................................46

Distributed Server Aggregation Layer...............................................................................46

Test Results for Network Design Option 1...........................................48

Redundant Server Aggregation Layer ...............................................................................52

Test Results for Network Design Option 2............................................................53

Cost Justification and Considerations............................................................................................55

Lack of NAS Standards .................................................................................................................57

Architectural Design Considerations .............................................................................................58

High-Availability NAS Solution Architecture...................................................................58

Availability: Filer Cluster ......................................................................................59

Scalability: Network Storage .................................................................................60

Flexibility: Wide Sharing and Local Sharing Deployments ..................................61

System Availability................................................................................................62

Data Availability....................................................................................................66

The Snapshot Feature: Instantaneous File Recovery .............................................67

Instantaneous File System Recovery .....................................................................68

Cost-Effective Automated File System Replication ..............................................69

Case Studies ...................................................................................................................................71

Case Study 1 ......................................................................................................................71

Description of Systems ..........................................................................................71

Citrix MetaFrame...................................................................................................72

Lunar Flare NAS....................................................................................................73

Integrated MetaFrame and Lunar Flare NAS System............................................75

Technical Benefits .................................................................................................76

High Availability ...................................................................................................76

Application Processing Scalability ........................................................................78

Lunar Flare NAS Scalable Storage Capacity.........................................................78

Scalable Bandwidth Between MetaFrame Servers and Lunar Flare NAS ............79

Case Study 2 ......................................................................................................................79

E7000 Overview ....................................................................................................79

Best Practices .........................................................................................................80

Security as the Highest Priority .............................................................................80

Performance as the Highest Priority ......................................................................80

Security and Performance as Equal Priorities........................................................80

Supported Antivirus Software Packages................................................................81

Computer Associates eTrust InoculateIT...............................................................81

Network Associates NetShield ..............................................................................81

Symantec Norton AntiVirus Corporate Edition.....................................................81

Trend Micro ServerProtect ....................................................................................81

Technical Benefits .................................................................................................81

Summary ........................................................................................................................................82

Chapter 3: Planning for High-Availability NAS Solutions ...........................................................83

Identify Your Business Requirements ...........................................................................................83

Determine Your Requirements ..........................................................................................83

Identify Your Top Priority .................................................................................................85

Inventory and Analyze Your Environment....................................................................................85

Enlist the Help of High-Availability NAS Vendors ..........................................................85

Begin Your Inventory with a Broad List ...........................................................................86

Analyze Your Inventory Information ................................................................................86

Determine Your High-Availability NAS Components..................................................................87

Select Your Components ...................................................................................................87

Fibre-Channel Switches .........................................................................................88

Storage Devices .....................................................................................................89

Bridges ...................................................................................................................89

HBAs......................................................................................................................89

Cabling...................................................................................................................89

Cable Connectors ...................................................................................................90

GBICs ....................................................................................................................90

Validate Compatibility.......................................................................................................90

Calculate Your Needed Port Count....................................................................................90

Develop Your High-Availability NAS Design Plan......................................................................91

Keep Your Plan Simple to Start.........................................................................................91

Core-to-Edge: The Ideal Fabric Design Plan.....................................................................91

Pre-Tested High-Availability NAS Design Plans..............................................................92

Purchase Your High-Availability NAS Solution ................................................92

Planning High-Availability NAS Technical Solutions Customization..........................................92

The Need for Storage Networking.....................................................................................93

High-Availability NAS on the Edge ......................................................................94

Storage Virtualization ............................................................................................94

iSCSI ......................................................................................................................96

NDMP....................................................................................................................96

Cisco and Network Appliance: An Integrated Approach ..................................................97

Deployment Scenarios .......................................................................................................98

Internet and E-Business Applications....................................................................98

Business Applications in the Data Center............................................................101

Workgroup Collaboration ....................................................................................104

Distributed Enterprise Storage with Secure WAN Connectivity.........................106

Case Study ...................................................................................................................................108

Tarantella .........................................................................................................................108

Findings................................................................................................................109

Arrays vs. Multiprocessor Machines ...................................................................110

Tricord Value Proposition for Tarantella Server Arrays .....................................111

Performing Additional Capacity Planning Tests .................................................111

Future Expansion .................................................................................................113

Summary ......................................................................................................................................113

Chapter 4: Installing and Deploying High-Availability NAS Solutions .....................................114

Installation and Deployment Steps ..............................................................................................114

Establish an Implementation Plan....................................................................................115

Naming Plan.........................................................................................................115

Prototype and Testing Plan ..................................................................................115

Production Deployment Plan ...............................................................................116

Create a Prototype and Test Your High-Availability NAS Solution...............................116

Switches First, Edge Devices Second, Hosts Last...............................................116

Create a Baseline Logical and Physical Diagram................................................116

Testing Scenarios .................................................................................................117

Running an I/O Load ...........................................................................................117

Transition and Release to Production ..............................................................................117

Create Documentation About Your New High-Availability NAS ......................118

Case Studies .................................................................................................................................119

Case Study 1: Expanding Storage the High-Availability NAS Way...............................119

Case Study 2: A Packaged Solution ................................................................................120

Case Study 3: Using HSM Technology...........................................................................121

Summary ......................................................................................................................................122

Copyright Statement

© 2004 Realtimepublishers.com, Inc. All rights reserved. This site contains materials that have been created, developed, or commissioned by, and published with the permission of, Realtimepublishers.com, Inc. (the “Materials”) and this site and any such Materials are protected by international copyright and trademark laws.

THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice and do not represent a commitment on the part of Realtimepublishers.com, Inc or its web site sponsors. In no event shall Realtimepublishers.com, Inc. or its web site sponsors be held liable for technical or editorial errors or omissions contained in the Materials, including without limitation, for any direct, indirect, incidental, special, exemplary or consequential damages whatsoever resulting from the use of any information contained in the Materials.

The Materials (including but not limited to the text, images, audio, and/or video) may not be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any way, in whole or in part, except that one copy may be downloaded for your personal, non-commercial use on a single computer. In connection with such use, you may not modify or obscure any copyright or other proprietary notice.

The Materials may contain trademarks, services marks and logos that are the property of third parties. You are not permitted to use these trademarks, services marks or logos without prior written consent of such third parties.

Realtimepublishers.com and the Realtimepublishers logo are registered in the US Patent & Trademark Office. All other product or service names are the property of their respective owners.

If you have any questions about these terms, or if you would like information about licensing materials from Realtimepublishers.com, please contact us via e-mail at [email protected].

Chapter 1: Overview of High-Availability NAS Technology Solutions

The torrents of information storming through today’s businesses could hardly have been foreseen when the first computer systems achieved desktop status. Those units came equipped with the storage capacity of a goldfish bowl by today’s standards. Building on this early direct-attached storage (DAS) architecture, Information Technology (IT) departments soon answered increasing information demands with general-purpose servers and DAS, typically attached via a high-speed SCSI interface. Now these processing and storage initiatives are hard-pressed to support and direct the monumental data requirements of enterprise resource planning (ERP), IT, and data warehousing for today’s companies.

Thanks in large part to the Internet, today’s information influx does not stop. Data is created, transmitted, stored, and delivered around the clock, and both internal and external customers are becoming more dependent on rapid, reliable access to company data. Companies that are not yet Internet-operational feel the pressure to get there fast. This scenario also leaves Internet service providers, application service providers (ASPs), and dot-com organizations scrambling for reliable, scalable solutions. Overall, businesses need to meet skyrocketing storage needs, and they’d like to do so without an exponential increase in talented IT professionals, who are difficult to find and expensive to hire. High-availability Network Attached Storage (NAS) is the answer.

Enterprises face considerable challenges brought by the rapid adoption of emerging Internet business applications and the associated storage infrastructure requirements. Storage networking aims to deliver enterprise-class storage solutions that significantly mitigate these challenges and lower the total cost of ownership (TCO), reducing the demands on costly IT staff. TCO includes the original cost of a computer and software, hardware and software upgrades, maintenance, technical support, and training. For example, Network Appliance and Cisco Systems are collaborating to deliver networked enterprise-class storage solutions that help enterprises meet these challenges. These solutions deliver key business benefits such as scalability, performance, simplified management, increased availability, and security, while leveraging existing investments and expertise in Internet Protocol (IP) networks—drastically reducing TCO.

This introductory chapter sets the stage for the rest of the book by presenting in-depth coverage of applications that utilize high-availability NAS. Let’s start with a discussion of high-availability NAS fundamentals.

High-Availability NAS Fundamentals

High-availability NAS is designed to separate storage resources from network and application servers to simplify storage management and improve the reliability, performance, and efficiency of the network, thus increasing the overall productivity of the organization. Furthermore, NAS servers are self-contained, intelligent devices that attach directly to your existing LAN. A file system is located and managed on the NAS device, and data is transferred to clients over industry-standard network protocols (TCP/IP or Internetwork Packet Exchange—IPX) using industry-standard file-sharing protocols, such as Server Message Block (SMB), Common Internet File System (CIFS), NetWare Core Protocol (NCP), Network File System (NFS), AppleTalk Filing Protocol (AFP), or Hypertext Transfer Protocol (HTTP). This intelligence on the NAS device enables true data sharing among heterogeneous network clients.
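Because the NAS device owns the file system and exposes it over standard protocols, client applications simply perform ordinary file operations against the mounted share. The short Python sketch below illustrates that transparency; the mount path in the usage comment is a hypothetical example, not a path from this guide:

```python
import os

def write_then_read(share_path, name, data):
    """Write a file to a mounted NAS share, then read it back.

    The call pattern is identical whether share_path is a local
    directory or an NFS/CIFS mount; the file-sharing protocol is
    transparent to the client application.
    """
    target = os.path.join(share_path, name)
    with open(target, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the write through to stable storage
    with open(target, "rb") as f:
        return f.read()

# Hypothetical NFS or CIFS mount point:
# write_then_read("/mnt/nas-share", "report.txt", b"quarterly data")
```

The same code serves Windows, UNIX, and Macintosh clients alike; only the mount mechanism differs per platform.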

Defining High-Availability NAS

High-availability NAS is a concept of shared storage on a network. It communicates using NFS for UNIX environments, CIFS for Microsoft Windows environments, File Transfer Protocol (FTP), HTTP, and other networking protocols. High-availability NAS brings platform independence and increased performance to a network.

NAS devices are often referred to as appliances because many can simply be plugged in to the network with little or no configuration.

A NAS device is typically a dedicated, single-purpose machine or component built for high-performance, high-speed communication. NAS devices are optimized to stand alone and serve specific storage needs with their own OSs and integrated hardware and software. Think of NAS devices as a type of Plug and Play (PnP) appliance, except that NAS devices have the purpose of serving your storage requirements. The systems are simplified to address specific needs as quickly as possible—in real time. NAS devices are well suited to networks that have a mix of clients, servers, and operations, and they may handle such tasks as Web cache and proxy, firewall, audio-video streaming, tape backup, and data storage with file serving.

High availability of critical information services is affected by both scheduled and unscheduled system downtime. Although scheduled downtime for system maintenance and upgrades is inevitable, it is fatal to information services that are considered non-interruptible. In addition, unscheduled downtime is unpredictable and should be avoided. Human errors, OS failures, computer hardware failures, and network failures cause most unscheduled downtime.
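Availability targets are commonly stated as an uptime percentage ("nines"), and each target implies a hard annual downtime budget. As a quick illustration (a sketch, not a formula from this guide), the following converts a target into allowable minutes of downtime per year:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes

def annual_downtime_minutes(availability_pct):
    """Maximum downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

# 99.9% ("three nines") allows roughly 526 minutes (about 8.8 hours) a year;
# each additional nine shrinks the budget tenfold.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% availability -> {annual_downtime_minutes(pct):.1f} min/year")
```

Note that the budget covers scheduled and unscheduled downtime combined, which is why non-interruptible services cannot tolerate even planned maintenance windows.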

Before I describe the methods for preventing these failures, you must first understand the definitions of the various levels of availability: normal availability, high availability, and disk and CPU fault tolerance.

Normal Availability Systems

Normal availability systems are defined as general-purpose computer hardware and software systems that have no hardware redundancy or software enhancement to provide fault-processing recovery. They require manual human intervention to identify and repair the failed component(s) and restart the system before normal operations can resume.

High-Availability Systems

High-availability systems are defined as loosely coupled NAS with redundant hardware components managed by software that provides fault detection and correction procedures to maximize the availability of the critical services and applications provided by that system. These systems require no manual, human intervention to identify a failed component, execute a procedure to avert a system failure, and provide notification of the averted failure. This configuration minimizes the possibility of immediate data loss and service interruption. There are two distinct high-availability models for client-server architectures: the Replicated Services Model and the Failover Model.

• Replicated Services Model—The Replicated Services Model utilizes distributed applications and distributed databases on multiple servers in a LAN/WAN environment in which the data is replicated to some or all the servers. When a server failure occurs, the data and applications are accessible from an alternative server.

• Failover Model—The Failover Model utilizes duplicate server hardware configurations in which one server has the role of an active server for data and application services and the other is a backup server that monitors the state of the active server. When the backup server detects a hardware or software failure that has occurred on the active server, it takes over the role and identity of the active server.

The backup server is also known as the "passive node" or "passive server."
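The Failover Model's monitoring loop can be sketched as a heartbeat watcher: the passive node records each heartbeat it receives from the active server and promotes itself when the heartbeat goes stale. This is a minimal illustration, not any vendor's implementation; the class name, heartbeat interval, and three-missed-beats timeout are all assumptions:

```python
import time

# Minimal sketch of the Failover Model: a passive node watches the active
# server's heartbeat and assumes the active role when the heartbeat goes
# stale. The 3-missed-beats timeout is an illustrative choice.

class PassiveNode:
    def __init__(self, heartbeat_interval: float, missed_beats: int = 3):
        self.timeout = heartbeat_interval * missed_beats
        self.last_heartbeat = time.monotonic()
        self.role = "passive"

    def on_heartbeat(self) -> None:
        """Called whenever a heartbeat arrives from the active server."""
        self.last_heartbeat = time.monotonic()

    def check(self, now=None) -> str:
        """Promote this node if the active server has gone silent."""
        now = time.monotonic() if now is None else now
        if self.role == "passive" and now - self.last_heartbeat > self.timeout:
            self.role = "active"   # take over the active server's role
        return self.role

node = PassiveNode(heartbeat_interval=1.0)
node.on_heartbeat()
print(node.check())                                  # "passive": heartbeat fresh
print(node.check(now=node.last_heartbeat + 10.0))    # "active": heartbeat stale
```

A production failover system must also take over the active server's network identity and guard against both nodes believing they are active; this sketch shows only the detection step.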

Disk and CPU Fault Tolerance

Disk and CPU fault-tolerant systems consist of proprietary, expensive, tightly coupled duplicated systems. Fault-handling capabilities are integrated into and become a function of the OS. These systems respond to system failures spontaneously and fully automatically, and they provide uninterrupted services.

Hot Sparing

Hot sparing is a fault-tolerant configuration that reserves a disk for use when something goes wrong with another disk in the array. When you assign a disk to be a hot spare, that disk is put aside and is not used for storing data. Its purpose is to replace a failed member of a fault-tolerant set. The advantage of using a hot spare is that disk failures can be resolved without the need to shut down the system or interrupt data access.

For example, Tricord’s Automatic Hot Spare Support ensures high availability in the event of appliance failure by automatically rebuilding data onto the spare appliance. It also affords the administrator the flexibility to use the spare appliance as additional capacity when necessary.
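The hot-sparing behavior can be sketched in a few lines. This is an illustrative model of a mirrored set with one reserved spare, not Tricord's actual implementation; the disk names and class are hypothetical:

```python
# Sketch of hot sparing: a reserved disk sits idle until a member of the
# fault-tolerant set fails, then the set is rebuilt onto the spare with
# no downtime. Disk names are illustrative.

class MirroredSet:
    def __init__(self, members, hot_spare):
        self.members = list(members)   # active, data-bearing disks
        self.hot_spare = hot_spare     # reserved disk, holds no data

    def on_disk_failure(self, failed) -> bool:
        """Swap the spare in for the failed member and rebuild onto it."""
        if failed not in self.members or self.hot_spare is None:
            return False
        self.members[self.members.index(failed)] = self.hot_spare
        self.hot_spare = None          # the spare is now a full member
        # A real array would copy or parity-rebuild data onto the new
        # member here, while continuing to serve I/O from survivors.
        return True

array = MirroredSet(members=["disk0", "disk1"], hot_spare="disk2")
array.on_disk_failure("disk1")
print(array.members)      # ['disk0', 'disk2']
print(array.hot_spare)    # None: the array is unprotected until a new spare is added
```

The key property this models is that no administrator action and no service interruption occur between the failure and the rebuild.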

The next part of the chapter lays the foundation for developing high-availability NAS requirements by positing a system architecture in which NAS solves fundamental problems or offers significant functional advantages over traditional architectures. In a high-availability NAS context, server clustering generally refers to the grouping together of servers for the purpose of enhancing their performance and/or providing failover protection in the event that a member server malfunctions. Uninterrupted and seamless availability of data and applications during and after a server failure is a primary benefit of a server cluster architecture within high-availability NAS.

One must be careful with the word “uninterrupted” here. Although NLB clusters can approach 100 percent uptime, an application can be unavailable for anywhere from a few seconds to several minutes during a server cluster failover.

Clustering NAS Architectures

Though servers can be clustered together outside of a Storage Area Network (SAN) environment, as Figure 1.1 shows, there are many benefits associated with clustering them together as part of high-availability NAS, as Figure 1.2 shows. These benefits include shared access to resilient disk and tape backup systems, higher performance data replication options, improved storage scalability, and enhanced resource availability through the inherent advantages of high-availability NAS–based storage systems.

A SAN is a discrete network of servers and storage devices (RAID, tape libraries, and so on) attached together via a high speed I/O interconnect, such as Fibre Channel. Data is transferred via serial I/O rather than network protocols, and raw data requests are made directly to disk and not over the LAN. All storage transactions are processed on a separate network with dedicated bandwidth for data.

Figure 1.1: Server clustering.

Figure 1.2: Server clustering: physical disks residing in a dedicated SAN fabric.

The traditional progression for an end user is to purchase a system and when additional processing capability is needed, to replace the system with a bigger system. This method has been the prevailing method for mainframes in particular, as Figure 1.3 shows. At multiple points in this sequence, traumatic discontinuities occur. If the user outgrows the architecture that the user started with, the user may have to convert from one OS to another or even from one vendor’s proprietary architecture to another’s. Converting entails huge costs for the organization in both dollars and employee time, such that these conversions are avoided if at all possible. Another disadvantage with this model is the poor residual value of computer equipment. A system replacement often results in the invested capital being lost when the old system goes out the door. Moreover, larger systems tend to be sold in lower volume than smaller ones, which results in each new system having a higher cost of computing.

Figure 1.3: Traditional mainframe computing model.

You should consider the alternative architecture shown in Figure 1.4. Replacing a mainframe computer with a cluster of smaller, standards-based servers offers compelling advantages to the user. Because the cluster can start off as a single system, the threshold to entry is lower. These smaller systems are sold in higher volume, making the cost of computing less. With no dependence on proprietary architectures, the availability of equipment from multiple sources lets the user choose the best alternative with each purchase.

Figure 1.4: Clustered high-availability NAS computing model.

There are more advantages. The upgrade cost can be controlled more precisely by adding only the amount of additional resources required. If a different vendor has leapfrogged the original supplier in terms of offering what the user needs at the time of upgrade, the user would be free to choose the best value without concern about migration or conversion to a new architecture. With the right architecture, there might never be a need for conversions from one OS to another, retraining of staff, or developing new procedures—all of which have plagued mainframe customers in the past.

It would be nice if the clustered high-availability NAS computing model were as easy to achieve as it is to draw. There are fundamental architectural roadblocks to be overcome if the cluster model is to realize its potential.

Although it is perhaps too simplistic to tie the continued importance of mainframes in the systems market to only two characteristics, there are two significant ones that prevent clusters of open-architecture systems from pervasively replacing mainframes. One is the inability of clustered systems to share data in a way that lets the cluster take on the workload that a single mainframe can. The other arises from the extensive experience the mainframe world has in managing storage and data. This experience has evolved into management software that as yet has no equivalent in the open systems markets. If clusters of systems are to replace mainframes, these two deficiencies will have to be remedied. Enter high-availability NAS.

Clustering Scalable High-Availability NAS

High-availability NAS offers the prospect of solving both open system shortcomings mentioned earlier. The ability to scale processors and storage independently and linearly is a fundamental goal of high-availability NAS. The high-availability NAS concept physically disassociates storage from processors. Devices are no longer peripheral components of a processor, but a separate and equal architectural entity. Storage can be managed, changed, and expanded with no impact on the processor configuration or operations—if high-availability NAS is done right. It should also liberate the processors in the same way. This element is the easy part of scalability.

For high-availability NAS to deliver fully on its scaling promise, there must be no overhead that increases appreciably faster than the rate of scaling. For instance, in today’s loosely coupled, shared-processor systems, interprocessor-communication activity goes up exponentially as the number of processors increases. This kind of overhead severely limits scalability, and high-availability NAS research aims to eliminate this barrier to scaling.

Making a cluster of servers process the same workload a mainframe accepts is easy, unless it requires that all the servers process transactions on the same data. Unless a cluster can run enterprise applications such as a Fortune 500 manufacturing system, an airline’s reservation system, or a financial institution’s complete inventory of transactions, a cluster cannot replace a mainframe.

If, however, each processor in the cluster can accept any transaction that would have arrived at the mainframe and process it, there is hope for the cluster model. The challenge to the cluster’s ability to do so is giving the servers the ability to share data, including having concurrent update rights in a way that increasing the number of processors does not destroy performance. With this, any server in the cluster could process any transaction, and processors could be added as needed with no fall off in performance. This ability is what a cluster model must demonstrate if it is to replace mainframes. Specifically, high-availability NAS aims to do the following:

• Increase scalability of high-volume servers

• Enable heterogeneous computation

• Increase the I/O bandwidth to requesters

• Improve performance

Increasing the Scalability of High-Volume Servers

High-availability NAS increases the scalability of high-volume servers by applying clusters of them to large, monolithic applications that traditionally have been serviced only by mainframes or tightly coupled processors. In other words, high-availability NAS will not only provide sufficient connectivity for a large group of servers to have access to all the storage they require (in terms of both distance and number), but also provide a mechanism so that many servers sharing update access to common storage can do so without the excessive overhead that accompanies clusters today.

Enabling Heterogeneous Computation

When high-availability NAS enables heterogeneous computation, systems running different OSs are allowed to access a common pool of storage and shared data. Perhaps an end user with a cluster of servers in a data mining application finds that a supercomputer doing statistical analysis on the data would be of value. If the end user could just wheel up such a system and have it act upon the data already in place, it would be much more cost effective than purchasing a completely separate system, including a duplicate set of storage devices. Given the size of many data warehouses, the latter would not be possible; the data movement time alone would be prohibitive. Nevertheless, the fact that high-availability NAS makes the storage data organization independent of the OS should make it possible for a server running any OS to access the data.

There is the problem of the internal data structures of a file, which must be reconciled at the application level. In other words, the key to fully understanding such complex topics as enabling heterogeneous computation is to become intimately familiar with the various internal data structures that reside within the confines of the kernel. Direct access to some of these structures is necessary for even the most primitive operations, while others reside at a much lower level.

Increasing the I/O Bandwidth to Requesters

High-availability NAS also makes storage available to requesters at network speeds without “channeling” it through a server. This ability means increased performance for high-availability data applications.

Improving Performance

High-availability NAS improves performance by eliminating the necessity for a server to translate a request into physical device accesses. When several servers are accessing the same data, the metadata need travel over the interconnect only once.

Metadata is data about data. Metadata describes how, when, and by whom a particular set of data was collected, and how the data is formatted. Metadata is essential for understanding information stored in data warehouses and SANs.

Though there are a lot of factors affecting how caching is done and how it is affected by the object abstraction, there are some clear opportunities for benefit. A common cache of the metadata serves all requesters with no coherency issues. The number of I/O operations is correspondingly reduced, as a single metadata retrieval can service requests from multiple requesters. Furthermore, high-availability NAS understands quite precisely which objects are in use and which are not; thus, the cache space can be more effectively utilized. Typically, even system caches are not tied closely to the file system and do not reflect what the OS knows about files being open or closed.

A file can be closed and then be needed quite soon thereafter. A cache that is not as closely tied to the file system may happen to fortuitously retain data that otherwise would have been discarded. Even this condition can be accounted for in the high-availability NAS environment, with an indication that an object being closed will be soon accessed again.

Managing SANs

The second fundamental problem is that of SAN management. The NAS view is to make all components of the SAN architecture participate in the management. By breaking down management from a huge CPU task to many small activities assigned to the lowest possible level, and directing those activities by means of simple attributes expressed by the user, a SAN takes on the responsibility for managing itself. High-availability NAS should make it possible to

• Make SANs be as self-managing as possible

• Support attribute-based management

• Support quality-of-service management

• Facilitate the dynamic reconfiguring and expanding of SANs

• Present as nearly as possible a single system view to users

• Present as nearly as possible a single system view for management purposes

Making SANs as Self-Managing as Possible

Making SANs as self-managing as possible will eliminate the associated drudgery now imposed on the OS by such requirements as space management. It should also make scalability more linear by increasing SAN management capability at the same rate as the number of storage devices increases. Today, an OS assumes the responsibility for allocating space on a disk drive, reclaiming space from deleted files, and (in some cases) deallocating bad sectors. Doing so for one drive is not difficult, but a server with dozens of drives could find this work consuming quite a good portion of its processing cycles. High-availability NAS would take over space management, eliminating any increase in OS overhead associated with the number of drives on a system. A server with dozens of disk drives would get the benefit of all those devices contributing horsepower for managing the SAN resource.
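Device-side space management of the kind described above might look like the following sketch, in which the disk itself allocates space, reclaims it on delete, and retires bad sectors with no OS involvement. The block-set representation and names are illustrative assumptions:

```python
# Sketch of a self-managing storage device: allocation, reclamation, and
# bad-sector retirement happen on the device, not in the server OS.
# The flat block-set model is illustrative.

class SelfManagingDisk:
    def __init__(self, blocks: int):
        self.free = set(range(blocks))
        self.allocated = {}            # object id -> list of blocks
        self.bad = set()

    def allocate(self, oid, nblocks: int):
        """Device picks the blocks; the OS never sees a free-space map."""
        blocks = sorted(self.free)[:nblocks]
        if len(blocks) < nblocks:
            raise IOError("device full")
        self.free -= set(blocks)
        self.allocated[oid] = blocks
        return blocks

    def delete(self, oid):
        """Reclaim space with no work done by the server OS."""
        self.free |= set(self.allocated.pop(oid))

    def retire_bad_sector(self, block: int):
        """Take a failing block out of circulation permanently."""
        self.bad.add(block)
        self.free.discard(block)

disk = SelfManagingDisk(blocks=8)
disk.allocate("obj1", 3)
disk.retire_bad_sector(7)
disk.delete("obj1")
print(len(disk.free))    # 7: all 8 blocks back except the retired one
```

Because each device manages only its own blocks, adding drives adds management capacity at the same rate, which is the linear-scalability property the text describes.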

Supporting Attribute-Based Management

High-availability NAS should make it possible to support attribute-based management by having the devices take action based on the properties assigned to a given object or set of objects. The many engines (each storage device having some usable processing power to apply to this task) available would be put to additional use by helping with more than just space management. They could contribute to breaking the task of data management into many simple, small functions performed concurrently. For instance, an attribute could be set for an object that called for that object to be versioned. Every time the object was closed after an update, the storage device could automatically keep the old version of the object while giving the new one a separate identifier.
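The versioning attribute just described can be sketched as follows. The attribute name ("versioned") and the version-identifier scheme are hypothetical, chosen only to illustrate the behavior:

```python
# Sketch of attribute-based management: if an object carries the
# "versioned" attribute, the device keeps the old copy under a new
# identifier each time the object is closed after an update.
# Attribute and ID naming are illustrative.

class ObjectDevice:
    def __init__(self):
        self.objects = {}   # object id -> bytes
        self.attrs = {}     # object id -> set of attribute names

    def write(self, oid, data, attrs=()):
        self.objects[oid] = data
        self.attrs.setdefault(oid, set()).update(attrs)

    def close_after_update(self, oid, new_data):
        """On close, version the old copy if the attribute calls for it."""
        if "versioned" in self.attrs.get(oid, set()):
            # Count existing versions to pick the next version identifier.
            n = sum(1 for k in self.objects if str(k).startswith(f"{oid}@"))
            self.objects[f"{oid}@v{n + 1}"] = self.objects[oid]
        self.objects[oid] = new_data

dev = ObjectDevice()
dev.write("report", b"draft 1", attrs={"versioned"})
dev.close_after_update("report", b"draft 2")
print(sorted(dev.objects))   # ['report', 'report@v1']
```

The point of the sketch is that the application processor does nothing: the policy is an attribute, and the device enforces it at close time.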

Similarly, an attribute might be set to indicate that an object should be exported after it is updated. This setting could cause the device to start a sequence of actions independent of the application processor, and these actions would result in a copy of that object being sent to another device. Extrapolating this very simple process, an entire storage complex could participate in a timelier and less intrusive backup function. If the devices knew enough about what work was going on, they could make sure that an export operation only took place when an object was in a consistent state. Doing so is a little more complicated for some data; complex data structures such as databases may require that multiple NAS appliances take coordinated action to set them up. However, even this requirement can be handled by a straightforward extension of the basic capability.

For example, Tricord’s Lunar Flare NAS clusters scale multiple Lunar Flare NAS appliances into a single storage pool for Windows and AppleTalk clients.

Supporting Quality-of-Service Management

High-availability NAS supports quality-of-service management by having the storage devices be as knowledgeable as possible about their own conditions, and by informing the appropriate service of those conditions or acting in response to them as guided by policy assignments. Suppose a disk drive has several levels of transfer-rate performance to offer a requester. The device could allocate an object to whatever zone is most appropriate given the user’s interest in cost versus performance. This ability could as easily apply to high-availability NAS that has combinations of mirrored, RAID 5, and unprotected storage, with the user selecting the residence for particular data depending on the user’s requirements for resiliency. A quality-of-service management system could interpret user choices into attributes associated with particular objects, leaving the device to act on those attributes and use its resources accordingly.
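Interpreting user choices into placement attributes might look like the following sketch. The policy table and storage-class names are illustrative assumptions, not any real product's QoS model:

```python
# Sketch of QoS management: user-facing choices are interpreted into a
# storage class, and the device places each object accordingly.
# The policy table and class names are illustrative.

POLICY = {
    # (resiliency, performance) -> storage class
    ("high", "high"): "mirrored",
    ("high", "low"):  "raid5",
    ("low",  "high"): "unprotected-fast-zone",
    ("low",  "low"):  "unprotected",
}

def place_object(resiliency: str, performance: str) -> str:
    """Pick a residence for an object from the user's QoS choices."""
    return POLICY[(resiliency, performance)]

# The user asks for resilient but inexpensive storage:
print(place_object("high", "low"))    # raid5
# The user asks for fast scratch space with no protection:
print(place_object("low", "high"))    # unprotected-fast-zone
```

Once the choice is recorded as an attribute on the object, the device, not the server, is responsible for honoring it.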

Facilitating the Dynamic Reconfiguring and Expanding of SANs

High-availability NAS facilitates the dynamic reconfiguring and expanding of SANs. Whenever additional space is required, there is a central authority to which any server in the SAN can turn to find additional space or to find all storage devices available to it. This could be the basis for OSs being more dynamic and flexible about what hardware they are operating with at any point; it need not be the peripheral set that was present at system generation time or even power-up time.

Presenting a Single System View to Users

High-availability NAS presents as nearly as possible a single system view to users. If all requesters accessing a high-availability NAS configuration (and all users contacting the requesters) get a single view of the data regardless of which member of a cluster they are connected to, clusters of systems can look and feel much more like a single system than otherwise. If clusters of high-volume servers are to replace large monolithic systems, this view will be key.

Presenting a Single System View for Management Purposes

Finally, high-availability NAS presents as nearly as possible a single system view for management purposes. If clusters are to work as well as mainframes, the management tools must let the user control the configuration as if it were a single, coherent computing facility, even if there are multiple vendors and OSs represented.

For example, Tricord’s Illumina software makes deployment and management of a cluster as simple as managing a single appliance.

Utilizing High-Availability NAS from an Application’s Perspective

As data requirements increase at a rapid pace, end users demand reliable, high-performance, and global access to information from anywhere, at any time. IT managers constantly seek more affordable, manageable storage solutions that meet these user expectations. High-availability NAS has led the way for the mainstream deployment of storage solutions that facilitate data consolidation and sharing. By leveraging well-understood technologies (such as IP, Gigabit Ethernet, NFS, and CIFS), high-availability NAS enables a flexible, robust storage solution that is easily managed and scaled, and contributes to a scalable and reliable network and SAN infrastructure.

For example, IP/Gigabit Ethernet networking technology combined with storage appliances is one possible solution for delivering high-performance applications and features such as scalability, availability, and security. It addresses customer requirements for high-availability NAS in several dynamic application areas:

• Internet e-business applications—High-performance data-sharing and scalable-networked storage infrastructures for e-businesses.

• Business applications in the data center—Superior data availability and recoverability for enterprise business applications within a data center.

• Workgroup collaboration—High-performance data sharing across heterogeneous OS environments.

• Distributed storage over secure WAN—Collaboration among distributed sites with centralized administration and disaster recovery. (For an example of a product that accomplishes this goal, see the sidebar “Tricord’s Cluster-to-Cluster Replication.”)

The preceding applications, as well as others, are discussed in depth in Chapter 2 through Chapter 6 and are beyond the scope of this introductory chapter. The deployment of highly available NAS infrastructures for all environments is thoroughly discussed in Chapter 4. The best design practices that integrate network and storage to enable the creation of highly available NAS solutions are extensively covered in Chapter 2. Chapter 6 covers advanced application solutions.

Tricord’s Cluster-to-Cluster Replication

Recently, Tricord, developer of Illumina clustering software and Lunar Flare clustered-server high-availability NAS appliances, released its enhanced protocol support and general availability of a replication feature in Illumina that allows customers to use Lunar Flare appliances for business continuance and online backup. Tricord replication enables customers to maximize their business continuity by distributing their data between two Tricord clusters. In the event of a disaster, user access can be switched to the mirrored cluster within minutes, allowing business operations to continue almost uninterrupted. Cluster-to-cluster replication can also be used for online backup. Lunar Flare clusters are priced affordably, making it realistic to support disk-based backup as the first stage of a multi-tiered backup environment.

Tricord’s near real-time mirroring is supported between clusters separated by a LAN or WAN. Replication performance is optimized to take advantage of the highly parallel nature of the Tricord clustered architecture. The feature is built upon an enhanced version of the Internet-standard rsync protocol. The current cluster-to-cluster replication offering represents the initial phase of Tricord’s comprehensive disaster recovery/business continuance efforts; a subsequent release will support cross-platform replication to Tricord clusters.

Describing a High-Availability NAS System

A high-availability NAS-based system can be constructed of equipment and software from many different vendors. Its constituents collaborate in a way that should make it appear to the world outside as one large computer system. Four components constitute a high-availability NAS architecture or configuration: object-oriented devices (OODs), requesters, a file server, and the interconnect (see Figure 1.5).

Figure 1.5: High-availability NAS configuration.

The OODs are the storage components of the system. They include disk drives, RAID subsystems, tape drives, tape libraries, optical drives, jukeboxes, and any other storage devices to be shared. They must have an I/O channel attachment to the requesters that will access them. The requesters are the servers or clients sharing and directly accessing the OODs.

A file server performs management and security functions such as request authentication and resource location. It is key to the self-management capability of NAS. The OODs depend on it for management direction, while the requesters are relieved of SAN management to the extent that the file server assumes that responsibility. In smaller systems, a dedicated file server might not be feasible; a requester may take on the responsibility for overseeing the operation of the high-availability NAS environment. In environments in which the security and flexibility that the file server brings are not desired or an overriding need for performance calls for the cluster of requesters to talk directly with (and only with) the OODs, a file server might not be present.

The interconnect is the physical infrastructure through which all high-availability NAS components communicate. It must have properties of both networks and channels, have distance and addressability to adequately connect all components in the networks, and have the low latency and flow control properties of a channel. Also, it must have the manageability features of mainframe-class peripherals.

Operating Within a High-Availability NAS Environment

When high-availability NAS is powered up, all devices must identify themselves either to each other or to a common point of reference, such as the file server or interconnect. The interconnect offers network-management techniques to be used for this identification. For instance, in a Fibre Channel-based high-availability NAS environment, the OODs and requesters would log on to the fabric. Any component wanting to determine the operating configuration could use fabric services to identify all other components. From the file server, the requesters learn of the existence of the storage devices they could have access to, while the OODs learn where to go when they need to locate another device or invoke a management service such as backup. Similarly, the file server can learn of the existence of OODs from the fabric services.

Depending on the security practice of a particular installation, a requester may be denied access to some equipment. From the set of accessible storage devices, it can then identify the files, databases, and free space available.

At the same time, each high-availability NAS component can identify to the file server any special considerations it would like known. Any device-level service attributes could be communicated once to the file server, from which all other components could learn of them. For instance, a requester may want to be informed of the introduction of additional storage subsequent to startup, which would be triggered by an attribute set when the requester logs on to the file server. The file server could do this automatically whenever new OODs are added to the configuration, including conveying important characteristics, such as whether the new storage is RAID 5, mirrored, and so on.

Accessing High-Availability NAS

When a requester must open a file, it may be able to go directly to the OODs or it may have to go to the file server for permission and location information. To what extent the file server controls access to storage is a function of the security requirements of the installation. First, let’s consider the case in which the installation is physically secure. That is, there is no requirement to protect the transmission of commands and data between a requester and the OODs. There might still be a file server present for management functions, but one that does not oversee the requester I/O. In this case, a requester is in a position to access and create objects directly on an OOD. It can open, read, write, and close objects just as if they were natively attached to the requester.

A typical sequence might go something like this: The requester reads from an OOD one or more well-known objects that reveal the logical volumes or partitions of the device (more about this later) and how to start looking at objects. The requester then opens and reads an object, which might be a root directory (see Figure 1.6). From this, it is straightforward to find other objects, based on the contents of the directory. The requester repeats the process until the desired data is located. This can be accessed just like any file on an OOD, the difference being that the data is referenced by the object ID and a displacement within the object, not a Logical Block Address (LBA) whose address is relative to the start of the storage device.

Figure 1.6: Read object sequence.
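A minimal sketch of this read sequence follows. The object layout, the well-known root object ID, and the path syntax are all illustrative assumptions:

```python
# Sketch of the read sequence: start at a well-known root object, follow
# directory entries by object ID until the path is resolved, then read by
# (object ID, displacement) rather than by device LBA.
# The object layout below is illustrative.

ROOT_OID = 0   # well-known object: the root directory

# object id -> either a directory (name -> object id) or file bytes
store = {
    0: {"home": 1},
    1: {"data.txt": 2},
    2: b"hello from an object-oriented device",
}

def resolve(path: str) -> int:
    """Walk directory objects from the root to find a file's object ID."""
    oid = ROOT_OID
    for name in path.strip("/").split("/"):
        oid = store[oid][name]   # open + read the next directory object
    return oid

def read(oid: int, displacement: int, length: int) -> bytes:
    """Read within an object by displacement, not by a device-relative LBA."""
    return store[oid][displacement:displacement + length]

oid = resolve("/home/data.txt")
print(oid)               # 2
print(read(oid, 0, 5))   # b'hello'
```

The essential difference from block storage is visible in `read`: the address is (object ID, displacement), so the device, not the requester's OS, owns the mapping to physical blocks.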

In the second case, in which security is required, the file server interposes itself into the I/O chain to the degree necessary for the desired level of protection. A requester must first go to the file server for permission to perform a set of I/O operations. The file server accredits the request by returning sufficient information to allow the requester to communicate directly with an OOD.

The file server may also withhold the OODs’ location information from the requesters at initialization time in support of the security requirement.

The OODs will also be informed of the installation security policy when they log on to the file server. Based on this policy, they will not allow an I/O request unless it is properly constructed, including being encoded with a valid permission. All high-availability NAS elements collaborate to enforce the security of the system. Security requirements may demand both protecting data while in flight (using encryption) and validating requests against security criteria at the OODs before servicing.
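The permission mechanism described here resembles a capability scheme. The following sketch uses an HMAC-signed token as a stand-in for the accreditation the file server returns; the shared-key provisioning and token format are assumptions, not a real NAS protocol:

```python
import hashlib
import hmac

# Sketch of the secure case: the file server accredits a request by
# returning a signed permission, and the OOD refuses any command that
# lacks a valid permission. Key handling and token format are
# illustrative assumptions.

SHARED_KEY = b"provisioned-to-file-server-and-oods"   # illustrative

def grant_permission(requester: str, oid: str, op: str) -> bytes:
    """File server: sign the (requester, object, operation) triple."""
    msg = f"{requester}:{oid}:{op}".encode()
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).digest()

def ood_accepts(requester: str, oid: str, op: str, permission: bytes) -> bool:
    """OOD: service the I/O only if the permission checks out."""
    expected = grant_permission(requester, oid, op)
    return hmac.compare_digest(expected, permission)

token = grant_permission("server-a", "obj-42", "read")
print(ood_accepts("server-a", "obj-42", "read", token))    # True
print(ood_accepts("server-b", "obj-42", "read", token))    # False: token not his
```

Because the token binds the requester, object, and operation together, a stolen token cannot be replayed by a different requester or for a different operation, which is the "properly constructed" check the text describes.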

Though the sequence of operations seems quite similar to the first case, the two cases are quite different, especially in terms of the payloads associated with each command (see Figure 1.7). In the secure case, both commands and data may be encrypted. In addition, the permission information must be added to the command parameters.

Figure 1.7: Read object sequence using the file server.

Data Sharing and Concurrent Update

High-availability NAS would solve the data-sharing problem by moving to an optimistic control scheme in which requesters can act as though there is no conflict unless there are actually simultaneous attempts by multiple requesters to access the same records. They learn of such conflicts from the OODs themselves when attempting to commit updates. A requester can set an attribute to establish that it intends to update certain data. The attribute is reset after completion of the writes. Only if another requester attempts to update data for which the attribute is set is there a need to resolve a conflict. This approach should greatly reduce the inter-system traffic that database servers generate to maintain data consistency.

When the cluster starts up, all requesters can open the shared database objects and are ready to update them. Attribute bits can be used to indicate the appropriate granularity of control. Virtual asynchronous device–compliant host interfaces will even make it possible for applications to directly establish integrity control.
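The intent-attribute scheme described above can be sketched as follows. The `SharedObject` class and its method names are illustrative only; real OODs would track the attribute at whatever granularity the attribute bits define (record, block, or object).

```python
class SharedObject:
    """Toy object-store entry with an 'intent to update' attribute bit."""

    def __init__(self):
        self.intent_holder = None   # requester that set the attribute, if any

    def set_intent(self, requester: str) -> bool:
        # Optimistic: succeeds unless another requester already signalled intent.
        if self.intent_holder is None or self.intent_holder == requester:
            self.intent_holder = requester
            return True
        return False                # conflict: must be resolved between requesters

    def commit(self, requester: str) -> None:
        assert self.intent_holder == requester
        self.intent_holder = None   # attribute reset after the writes complete

obj = SharedObject()
print(obj.set_intent("A"))  # True: no conflict, A may proceed
print(obj.set_intent("B"))  # False: A holds the update intent
obj.commit("A")
print(obj.set_intent("B"))  # True: attribute was reset, B may proceed
```

Note that no lock traffic flows between requesters in the common case; the conflict is discovered only at the object, which is the source of the traffic reduction the text describes.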

So, with the preceding in mind, why would you choose high-availability NAS? If you are the owner or operator of a small to medium-sized business, do you really need NAS? Let's take a look.


Choosing a High-Availability NAS Solution

If you are an owner or operator of a small to medium-sized business, there are unique challenges and concerns that you address on an everyday basis. In addition to being in charge of daily operations, you probably have a hand in everything from client service to human resources to accounting. These are traditional business functions that most of us are familiar with, so executing them shouldn't pose too great a challenge, right? But what about IT?

IT has grown tremendously in importance as a business function over the past decade, yet it is still the one that average businesspeople know the least about. IT is often touted as an enterprise-level concept—a sophisticated, expensive, high-tech business advantage that should only be of concern to Fortune 1000 companies. But its importance to small and medium-sized businesses cannot be overstated. Every document ever created or received by your staff, whether it be an office memo, customer database, inventory log, or client correspondence, must be safely stored and easily accessible at all times. Imagine not having access to your data. What would the impact be on overall productivity? Or imagine losing that data entirely. The importance of implementing a network infrastructure that supports your business 24×7 becomes increasingly clear. (For an example of how a small business benefited from a high-availability NAS solution, see the sidebar “Research and Development.”)

Research and Development

Simple Technology is a small company that designs, manufactures, and markets a comprehensive line of more than 3600 memory, storage, and connectivity products used in high-performance computing, networking and communications, consumer electronics, and industrial applications. The company is also expanding its presence in the new and emerging high-growth memory and FLASH storage products for applications such as digital cameras, MP3 digital audio players, PDAs, and various Internet devices.

Designing such products requires the creation of thousands of digitized images and complex graphics and necessitates a tremendous amount of storage space. With increased product demand for higher density with smaller form factors, lower power consumption, and higher speeds at lower cost, the company requires a fault-tolerant storage system that provides quick access to its design data and complex graphics 24×7. A reliable storage system is also needed to meet the increasingly short product lead time demanded by its customers, as time-to-market and time-to-volume dynamics are becoming more critical factors for its customers’ product adoption and success.

After a deliberate process of evaluating available technologies and products, including those of the current system storage providers, Simple Technology selected Procom’s NetFORCE 1500 product to expand its storage capabilities and address certain issues that were not being appropriately handled by Simple Technology’s current solution. NetFORCE’s scalable and flexible architecture not only enabled the company to easily and quickly access the exact information it required, but also allowed the company to minimize the network traffic over its WAN by having smaller high-availability NAS solutions per separate buildings and locations instead of one large central NAS solution. The NetFORCE 1500 also enabled Simple Technology to aggregate design data and graphics from multiple disparate sources and provide its engineers with immediate data access 24×7. NetFORCE offered the best performance-to-cost storage system and enabled the company to manage all data with relative ease and minimal administration and maintenance. The implementation of NetFORCE helped Simple Technology by increasing the productivity of design engineers, while significantly decreasing costs associated with storage.


So you’re smart enough to know that data, and the successful management of that data, is critical to the success of your business. But for those of you who don’t have a technical background or specialized training, the task of building and maintaining a network infrastructure can seem rather daunting. Smaller companies also cannot afford the luxury of outsourcing the task to an IT specialist. So what is the solution for small to medium-sized businesses that require a simple, reliable, and cost-effective way to manage their data? A high-availability NAS system is the answer. Whether you have 10 or 100 employees, a high-availability NAS solution will keep your mission-critical data safe, and enable your staff to share files quickly and reliably for maximum levels of productivity.

To understand the value of a high-availability NAS system, it is first important to understand what it's designed to address: data storage. Data storage is probably the most ubiquitous concept in the world of technology. As a businessperson, you deal with data storage directly or indirectly every day: when you access a file, run a software application, email a colleague, or draft a memo. As your business grows, so does the amount of data you generate. If you don't have enough storage capacity to handle that data, you will inevitably slow down your network and, as a result, lower your productivity. In the business world, seconds tick by like hours, and time is money. Chances are that you've experienced the frustration of waiting for a file to open. It's also likely that you run a variety of software applications as part of your business, and that you've experienced slowdowns with them as well. These problems usually occur for two reasons: a lot of data is being shared from workstation to workstation, and your server lacks storage space.

By installing a high-availability NAS system, you can drastically increase the speed of your network so that you no longer experience the downtime, frustration, and lost productivity and profitability that result from insufficient data storage. High-availability NAS systems are completely dedicated to storage, making them the best solution for improving the speed and functionality of your network. NAS relocates storage onto its own independent platform, effectively separating file sharing from application serving. Because applications and storage are no longer running on the same system, file server bandwidth is freed and overhead is reduced on existing application servers. The result is that applications are processed more quickly and efficiently, and your staff has fast and reliable access to data, both to the benefit of your bottom line.

In addition to accessibility, a high-availability NAS solution offers high levels of reliability. Almost all high-availability NAS systems incorporate a feature called RAID. A system with RAID capability can protect data and provide immediate access to it despite a single disk failure (or, in some configurations, concurrent disk failures). Different levels of RAID offer different levels of protection. With RAID 0, data is striped across all physical drives to improve access times; however, RAID 0 offers no protection. With RAID 1, a second set of drives duplicates the information from the first set for maximum data protection. RAID 5 distributes data and parity across all drives and can tolerate the loss of one drive while preserving data integrity.
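Assuming equal-size drives, the capacity trade-offs among these RAID levels can be made concrete with a small calculation (the function below is illustrative, not part of any product):

```python
def usable_capacity(level: int, drives: int, size_gb: float) -> float:
    """Usable capacity for the RAID levels described above (equal-size drives)."""
    if level == 0:
        return drives * size_gb          # striping only, no redundancy
    if level == 1:
        return (drives // 2) * size_gb   # mirrored pairs: half the raw capacity
    if level == 5:
        return (drives - 1) * size_gb    # one drive's worth of space holds parity
    raise ValueError("unsupported RAID level")

# Four 250 GB drives under each level:
for level in (0, 1, 5):
    print(level, usable_capacity(level, 4, 250.0))
```

With four 250 GB drives, RAID 0 yields 1000 GB with no protection, RAID 1 yields 500 GB with full mirroring, and RAID 5 yields 750 GB while tolerating one drive failure, which is why RAID 5 is often the middle-ground choice.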


Cross-platform file sharing is also an area of importance to you as a small to medium-sized business owner. Your current infrastructure, like those of many businesses, may contain a mix of Windows, Windows NT, Apple Macintosh, Novell NetWare, UNIX, and Linux platforms. Traditionally, sharing data across these different platforms has been both challenging and expensive. The good news is that with a high-availability NAS system, cross-platform sharing becomes quite simple. On the network, a high-availability NAS system can appear as a native file server to each of its different clients. That means files are saved to, and retrieved from, the high-availability NAS system in their native file formats. So you don't have to worry about converting your entire office to a single platform or losing your initial investment in your desktops and servers.

Another key benefit of NAS lies in its simplicity. High-availability NAS systems are incredibly easy to install (see Figure 1.8), which is great news for today's small to medium-sized businesses, which have neither the budget nor the daily need for an IT manager. Technological enhancements to your network should not be difficult or require significant time or effort from your staff. Today's high-availability NAS systems are plug-and-play out of the box. They are up and running in minutes, about the same time it takes to program a VCR. In addition, installation does not require high levels of technical skill or a background in computer science. Any user, regardless of experience level, can create networked storage in about 5 minutes and a couple of mouse clicks. Not only are high-availability NAS systems easy to set up, but they are also easy to use. Intuitive software programs guide you in managing your network and getting the most out of your high-availability NAS system. Again, they are designed with simplicity in mind.

Figure 1.8: Two ways to install a simple high-availability NAS solution.



A high-availability NAS solution is also an attractive option for small to medium-sized businesses due to its cost. In any business, it is important that dollars spent result in dollars earned. The proposed benefits of implementing a new technology must be carefully evaluated to determine whether the investment will justify itself in the long run. Expanding your servers is not a cost-effective way to increase storage capacity—implementing a high-availability NAS system is. A high-availability NAS solution, by virtue of being a single repository completely dedicated to storage, is simply the smartest investment for ensuring the integrity, reliability, and accessibility of your data. For a few thousand dollars, today’s high-availability NAS systems offer the same performance, reliability, and feature sets that enterprises pay $10,000 or more for.

The preceding factors present a strong case to small to medium-sized businesses that are looking at a simple and cost-effective way to experience what today’s global enterprises are benefiting from: sophisticated technology; fast, reliable network access; productivity increases; and the peace of mind that comes with knowing that your mission-critical data is safe. No matter how you look at it, high-availability NAS systems are simply the best way for you to safeguard, manage, and leverage the information that is the foundation of your business.

But, what are the benefits of a high-availability NAS solution for large businesses? Are they the same as for small to medium-sized businesses? Let’s take a look.

Benefiting from a High-Availability NAS Solution

The answer to the preceding questions is a resounding yes! Large businesses realize the following benefits from high-availability NAS:

• Streamlined architecture

• Reduced server I/O bottlenecks

• Increased reliability and data availability

• Efficient allocation and use of resources

• Simplicity

• Increased productivity

• Lower TCO

Streamlining Architecture

High-availability NAS appliances have a streamlined architecture designed for one function: to serve data files to clients in heterogeneous network environments. Because they are powered by an OS optimized for file I/O activity, their file-serving performance is greater than that of a general-purpose server, which is designed to perform a multitude of functions. A modern multi-tasking OS can have 7 million lines of code to provide multiple general-purpose functions; a specialized high-availability NAS OS is a fraction of that size and runs much faster and more efficiently. The result is improved data access times for network clients.


Reducing Server I/O Bottlenecks

The largest source of network and application server degradation is file service. Carnegie Mellon University studies show that the server processor spends, on average, 25 percent of its time serving file I/O requests. This percentage increases as simultaneous requests increase. Separating storage from the server reduces file-serving activity and I/O bottlenecks and increases server bandwidth. CPU cycles can then be dedicated to handling application requests, resulting in improved client response time.
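Taking the 25 percent figure at face value, a rough back-of-the-envelope calculation shows how much application headroom offloading file I/O could free up:

```python
# Rough headroom arithmetic based on the 25 percent figure cited above.
io_share = 0.25                       # fraction of CPU time spent on file I/O
app_share_before = 1 - io_share       # fraction left for applications today
capacity_gain = 1 / app_share_before  # relative application capacity once I/O moves to NAS
print(f"{capacity_gain:.2f}x application headroom")
```

That is, reclaiming a quarter of the CPU gives applications roughly a third more cycles than before, and the gain grows as the I/O share grows under simultaneous requests.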

Increasing Reliability and Data Availability

The architecture of a thin server appliance is designed around a specific function. All components, both hardware and firmware, are tightly integrated to perform that single function. This “closed box” architecture provides for extremely high reliability. According to Gartner Dataquest, more than 60 percent of server failures are caused by storage-related problems. Network downtime resulting from server failure costs organizations thousands of dollars per hour. Separating storage resources from the server decreases both the number of components and the amount of file I/O activity, reducing the probability of server downtime and increasing the reliability of the network and application servers. A more reliable and efficient network saves your organization time and money.

Most networks experience server downtime at some point, whether it's for planned maintenance or due to unexpected crashes or outages. Because high-availability NAS servers operate independently of network servers and communicate directly with the client, files remain available in the event of network server downtime.

Allocating Resources Efficiently

A high-availability NAS solution provides a common pool of storage that can be shared by multiple servers and clients, regardless of their file system or OS. This ability enables you to efficiently allocate storage and alleviates the problem of one server running out of storage while another has more than it needs.

In addition, a high-availability NAS solution enables you to locate storage where it’s needed on the network and provide clients with direct, server independent communication to storage resources. Localizing file I/O traffic provides for a more efficient use of network resources.

As I mentioned earlier, a high-availability NAS appliance (such as Tricord’s Lunar Flare NAS) connects directly to your existing LAN and transfers data over standard network access protocols (such as TCP/IP or IPX) using standard file-sharing protocols (such as SMB, CIFS, NCP, NFS, FTP, or HTTP). No additional software or client licenses are required for clients to access storage, so you can implement a storage solution and leverage your existing network investments.


Simplifying Storage with a High-Availability NAS Solution

The traditional methods of adding storage are too cumbersome for today's network environments. A high-availability NAS solution enables you to add storage anywhere on your network in minutes simply by plugging in a network cable, applying power, and configuring a few settings. There is no server reconfiguration and no network downtime. In addition, you can do all of this during normal working hours.

Management of high-availability NAS appliances can be performed from anywhere on your network or over the Internet using a standard Web browser or alternative management tools. Product enhancements and problem fixes are performed with a simple 3-minute flash upgrade. A thin-server appliance is so simple that there is no need to understand or learn a complex OS; anyone can administer one.

Increasing Productivity

High-availability NAS appliances provide increased productivity for your whole organization. Network clients benefit from the ability to share storage resources with clients from another network. They also benefit from reduced data access times and improved application server response times. And, in the unlikely event of network server downtime, network clients can still access work files, maintaining their productivity. Network administrators enjoy the luxury of simple installation and management and fewer storage-related problems.

Lowering TCO

Although disk drive costs have dropped drastically, the average company spends roughly $3.50 per megabyte each year in administrative and lost productivity costs to manage its current storage. A high-availability NAS solution, with its many benefits, features a lower TCO than other methods of adding storage to your network.


Comparing High-Availability NAS Appliances with General Purpose Servers

A high-availability NAS appliance is characterized by a streamlined architecture designed and optimized for performing one function: data delivery. This “closed box” approach results in more efficient performance, higher reliability, easier installation, simpler management and use, and lower TCO compared with a general-purpose server, as Table 1.1 shows.

Performance
  NAS appliance: OS and hardware platform designed and optimized to perform a specific function very efficiently; low overhead.
  General-purpose server: OS and hardware platform designed for serving applications and multiple general-purpose functions; high overhead.

Reliability and Data Availability
  NAS appliance: streamlined architecture with a specialized OS results in high reliability and data availability.
  General-purpose server: a greater number of non-embedded components and a complex, general-purpose OS mean a higher chance of failure and downtime.

Administration
  NAS appliance: simple administration of a specialized OS.
  General-purpose server: high administration overhead of a complex NOS.

Connectivity
  NAS appliance: network OS independent; multi-protocol client support.
  General-purpose server: network OS dependent; the client must meet the server's interface and protocol requirements.

Maintenance
  NAS appliance: low.
  General-purpose server: high.

Costs
  NAS appliance: streamlined costs; all hardware and software components serve a specific function (data I/O); unlimited users, no license required.
  General-purpose server: unnecessary costs; more components than needed for file services; client licenses required.

TCO
  NAS appliance: low.
  General-purpose server: high.

Table 1.1: High-availability NAS appliances versus general purpose servers.

Comparing High-Availability NAS with SAN

Some people confuse high-availability NAS with SAN; after all, NAS is SAN spelled backward. The technologies also share a number of common attributes. Both provide optimal consolidation, centralized data storage, and efficient file access. Both allow you to share storage among a number of hosts, support multiple OSs at the same time, and separate storage from the application server. In addition, both can provide high data availability and can ensure integrity with redundant components and RAID.

Some view high-availability NAS as competitive to SAN. However, they can work quite well in tandem. Their differences? High-availability NAS and SAN represent different storage technologies, and they attach to your network in very different places (see Table 1.2). NAS is a defined product that sits between your application server and your file system (see Figure 1.9 and the sidebar “Defining Filer”). SAN is a defined architecture that sits between your file system and your underlying physical storage (see Figure 1.10). A SAN is its own network, connecting all storage and all servers. For these reasons, each lends itself to supporting the storage needs of different areas of your business.


Figure 1.9: A basic configuration for a high-availability NAS filer on a LAN.

Figure 1.10: A typical SAN configuration.

Network wires
  NAS: TCP/IP or IPX over Ethernet, Fast Ethernet, Token Ring, FDDI, and ATM.
  SAN: Fibre Channel.

Protocols
  NAS: industry-standard file-sharing protocols (SMB, CIFS, NCP, AFP, NFS, and HTTP).
  SAN: raw data requests sent directly to the disk drive.

File system
  NAS: the file system is located at the storage.
  SAN: the file system is located at the application server.

Data sharing
  NAS: true data sharing between heterogeneous clients, because the file system is on the storage side and data is transferred to clients using industry-standard file-sharing protocols.
  SAN: software is required on all SAN nodes for heterogeneous nodes to share files.

Environment
  NAS: workgroup to enterprise.
  SAN: enterprise.

Installation
  NAS: PnP into an existing network.
  SAN: difficult and expensive; requires Fibre Channel-based hubs and switches to channel traffic, routers to interconnect data devices, and servers and software to link them all.

Technology
  NAS: based on industry-standard technologies.
  SAN: based on newer, immature technologies.

Table 1.2: High-availability NAS vs. SAN.



SANs offer many benefits in the enterprise, including a more easily scalable backup and recovery architecture.

Defining Filer

High-availability NAS devices known as filers focus all their processing power solely on file service and file storage. As integrated storage devices, filers are optimized for use as dedicated file servers. They are attached directly to a network, usually to a LAN, to provide file-level access to data. Filers help you keep administrative costs down because they are easy to set up and manage, and they are platform-independent.

High-availability NAS filers can be located anywhere on a network, so you have the freedom to place them close to where their storage services are needed. One of the chief benefits of filers is that they relieve your more expensive general-purpose servers of many file-management operations. General-purpose servers often get bogged down with CPU-intensive activities, and thus can’t handle file-management tasks as efficiently as filers. High-availability NAS filers not only improve file-serving performance but also leave your general-purpose servers with more bandwidth to handle critical business operations.

Analysts at International Data Corporation (IDC) recommend high-availability NAS to help IT managers handle storage capacity demand, which the analysts expect will increase more than 10 times by 2003. High-availability NAS is the preferred implementation of file serving for any organization currently using or planning to deploy general-purpose file servers. Users report that better performance, significantly lower operational costs, and improved client/user satisfaction typically result from installing and using specialized high-availability NAS appliance platforms.

So what’s the best way of connecting the storage within your computer systems? The answer to this question used to be simple, but new technologies have created solutions that are more complex but can be managed more easily. Therefore, how important is an effective storage strategy?

Strategizing Your Storage Solutions

With network content expected to continue expanding at explosive rates over the next 5 years, analysts predict that enterprise storage will account for 75 percent of all computer hardware expenditures. Such expansion makes it increasingly important for IT professionals to develop comprehensive strategies designed to optimize network infrastructure with storage solutions that will enable scalability, reliability, performance, availability, affordability, and manageability.

Storage standards are weak standards that are driven by component considerations. Network standards are strong standards that are driven by system considerations.

Compounding the challenge of explosive content demands, there are two major technology shifts IT professionals must consider when developing an enterprise storage strategy: first, the impact of networking technology on storage architecture and content management; and second, the impact of parallel processing on the design of storage products. These two technology shifts have produced three mutually coexistent methods for connecting storage to computing platforms: Direct Attached Storage (DAS), high-availability NAS, and SAN, as Figure 1.11 illustrates.


Figure 1.11: DAS, NAS, and SAN topologies.

Trying to Standardize DAS

More than 95 percent of all computer storage devices, such as disk drives, disk arrays, and RAID systems, are directly attached to a client computer through various adapters with standardized software protocols such as SCSI, Fibre Channel, and others. This type of storage is alternatively called captive storage, server-attached storage, or DAS, as Figure 1.12 shows.

DAS evolved from the server industry in which server vendors have traditionally sold storage as an add-on. DAS is an appropriate choice for very low-end PC applications, very high-end high-performance mainframe applications, and certain computer-intensive and high-performance online transaction processing (OLTP) database applications.

Figure 1.12: DAS topology.



The committees that established these standards, however, allowed such wide flexibility in interoperability that there are many variations of SCSI and Fibre Channel for the many available UNIX and NT systems. For example, there are seven variations of SCSI, and most UNIX vendors implement Fibre Channel differently. This is because storage was local to a specific server when these standards were defined, and server vendors implemented variations that were not compatible. Storage standards, therefore, are weak standards that are driven by component considerations. In other words, the problem with storage standards is that there seem to be so many of them.

As a result of weak storage standards, third-party DAS vendors, such as EMC and Compaq, need to re-qualify their products with each revision of a server's OS. This often leads to long lists of supported OSs for SCSI or Fibre Channel interconnects to different hosts. Each interconnect often requires special host software, special firmware, and complicated installation procedures.

Standardizing High-Availability NAS

In contrast, network standards are strong standards that are driven by system considerations. There are two true network standards for accessing remote data, and they have been broadly implemented by virtually all UNIX and NT system vendors. Developed and put into the public domain by Sun Microsystems, NFS is the de facto standard for UNIX. Developed by IBM and Microsoft, CIFS is the standard for all flavors of the Windows OS.

As a result of these broadly accepted standards for network data access, storage devices that serve data directly over a network (called high-availability NAS devices) are far easier to connect and manage than DAS devices. Also, high-availability NAS devices support true file sharing between NFS and CIFS computers, which together account for the vast majority of all computers sold (see Figure 1.13).

Figure 1.13: High-availability NAS topology.

Speeding Networks

During the past few years, the transfer rate for leading-edge DAS interconnects has increased fivefold, from 20MB per second for F/W SCSI-2 to 100MB per second for Fibre Channel. Over the same period, however, the transfer rate for leading-edge networking interconnects has increased tenfold, from 12.5MB per second for 100BaseT Ethernet to 128MB per second for Gigabit Ethernet.
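The cited rates can be checked with a couple of lines of arithmetic (figures taken directly from the text):

```python
# Transfer rates in MB per second, as cited in the text
das = {"F/W SCSI-2": 20, "Fibre Channel": 100}
net = {"100BaseT Ethernet": 12.5, "Gigabit Ethernet": 128}

das_gain = das["Fibre Channel"] / das["F/W SCSI-2"]            # 5.0x for DAS
net_gain = net["Gigabit Ethernet"] / net["100BaseT Ethernet"]  # 10.24x for networking
print(das_gain, net_gain)
print(net["Gigabit Ethernet"] > das["Fibre Channel"])  # the network now outruns DAS
```

The comparison confirms the point of the section: networking bandwidth grew roughly twice as fast as DAS bandwidth over the period, and Gigabit Ethernet's 128MB per second now exceeds Fibre Channel's 100MB per second.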



In other words, network data rates have not only caught up with DAS but surpassed it; they are no longer two times slower, as they were 5 years ago. This has shifted the bottleneck from the network to the server and its direct attached storage.

Shifting Storage

Analysts predict a major shift from DAS to SAN and high-availability NAS. While DAS vendors were occupied with the never-ending task of supporting all flavors of UNIX, NT, SCSI, and Fibre Channel for their storage products, both Gartner Dataquest and IDC began projecting explosive growth for high-availability NAS and SAN products as a percentage of the total storage market. These projections are based on four key factors:

1. Strong standards for high-availability NAS result in simpler installation and lower management cost.

2. Increased network speed has closed the performance gap that used to exist between high-availability NAS and DAS for many applications.

3. True data sharing between heterogeneous clients is possible with high-availability NAS and not with DAS.

4. There is a trend to recentralize storage to reduce management costs.

Gartner Dataquest has predicted that DAS's commanding 95 percent storage market share of today will be eclipsed by high-availability NAS over the next 5 years, and IDC projects that purpose-built NAS products will grow from $5 billion in 2002 to $9 billion by 2006. Faster networks are increasing deployment of high-availability NAS.

Confusing High-Availability NAS and SAN

Server vendors have implemented a variety of specialized hardware and software schemes to encourage the sale of storage with their processors. General-purpose DAS vendors have followed the same strategy. Not wanting to support high-availability NAS, where the clear NFS/CIFS standards would make it easier for competitors to make inroads, general-purpose server vendors and general-purpose storage vendors have developed their own proprietary visions of network storage. These visions are alternatively called Storage Networks (SNs) or SANs.


The vendors developed these proprietary visions to bring the benefits of high-availability NAS to their users without losing control of the storage and networking sale to high-availability NAS vendors. The SAN initiative is a loose configuration of vendors attempting to promulgate the weak standards of the past while talking about bringing the benefits of networking to storage architecture. The following benefits are available with high-availability NAS and are considered the future for SAN as well:

• LAN and server-free backup. However, many NAS boxes may mean more libraries—whereas it is easier to share libraries and drives dynamically in a SAN and still never have to send backup data over the LAN.

• Storage resource pooling/sharing.

• Easy storage resource management.

• Data sharing.

• Interoperability of heterogeneous servers and storage.

Although SAN is currently a hot topic, there are pitfalls to its deployment. Instead of putting the storage directly on the network, the emerging SAN concept puts a network between the storage subsystems and the server (as Figure 1.14 shows). This means that SAN actually adds network latency to the DAS storage model. SAN standards are in formative stages and may not be established for years, and leading storage vendors have announced proprietary SANs that are still largely visions.

Figure 1.14: SAN topology is still largely a vision.



EMC recently announced a proprietary Enterprise Storage Network (ESN), and Compaq recently announced a proprietary Enterprise Network Storage Architecture (ENSA). As with UNIX and SCSI, SAN is likely to become a variety of similar architectures that are not based on strong standards. This may create major roadblocks to successful integration and data sharing between heterogeneous platforms. Both high-availability NAS and SAN are valid technologies and serve important roles with different objectives. However, because of the complexity arising from the many varieties of SCSI, UNIX, and proprietary SANs, only a small percentage of storage today is actually connected to SANs. In a recent ITCentrix survey of UNIX and NT sites with more than 5,000 employees, only 7 percent of enterprises had actually implemented SAN in production, compared with about 48 percent that had implemented high-availability NAS.

Designing Storage Subsystems with Parallel Processing

Equal in importance to the impact of networking technology on storage architecture and management is the shift to parallel-processing architectures in storage subsystem design. Experts have noted that the semiconductor industry finds it increasingly difficult to achieve faster processing speeds. To counteract this situation, storage subsystem providers are taking advantage of parallel-processing designs in two ways: designing computer nodes with multiple CPUs and linking multiple nodes together to act as one system.

Finally, continuous data access to a high-availability NAS solution involves two main elements: a redundant and fault-tolerant network infrastructure and a highly available storage solution with built-in data protection features. The solutions presented in this chapter focus on integrating the two elements into a high-availability NAS solution with no single point of failure. With that in mind, let’s now focus on a case study in which high-availability NAS helped a European company deliver fast and reliable Web hosting.

Case Study

The implementation of high-availability NAS servers has helped Internet Fr (http://www.internet-fr.net/) improve the speed and reliability of its Web hosting services. With more than 20,000 professional sites hosted, the company is the leader in the Web hosting business in France. In the past, using server-attached storage, it was difficult to provide the level of availability required by customers with mission-critical e-commerce sites. The switch to high-availability NAS provides dramatic performance improvements because dedicated processors provide separation and optimization of functions, allowing parallel processing of both network and storage tasks. It has helped the company deliver a new, higher-level service package dedicated to the hosting of critical applications such as e-commerce sites, online communities, workflow applications, and others. The storage server has helped Internet Fr offer consistently faster response times than its competitors. It has also delivered zero downtime since it was installed, which helps increase the company's availability levels as well.


With its host center located in Paris, the company has its own backbone and multiple peering agreements that provide direct connections with leading French Internet service providers. Internet Fr advises its customers on the choice of hardware and software architecture for their Web sites. Servers are monitored on a 24×7 basis, and trained technicians respond immediately to each alarm. The company provides a full range of reporting services that include weekly or monthly traffic and service-level reporting. Before a system is brought online, its performance and reliability are thoroughly tested. The company offers ten e-commerce solutions and has prestigious customers such as Century 21, Kenzo, Tati, and Matra Aerospace. The company recently announced that its sales have reached $5 million. It has also opened subsidiaries in Italy and Spain.

Hosting Service at a Higher Level

Recently, the company made the decision to introduce its Tenors level of service, designed to provide the ultimate performance for critical e-business applications. Tenors integrates a range of sophisticated services, including load balancing, security, redundant failover, backup, and application maintenance, and guarantees that each application will be hosted on a dedicated server. In developing this new service level, Internet Fr technical management recognized that the company needed to improve its storage systems. The server-attached storage that the company had used in the past was sufficient for some applications but couldn't meet the requirements of the most demanding companies that Internet Fr was hoping to attract to its Tenors service. The problem was that the storage-server workload conflicted with the application load, causing a reduced quality of service and longer response times.

Internet Fr technical managers began investigating the high-availability NAS approach. They found that high-availability NAS servers typically offer higher reliability and performance than general-purpose servers because they are optimized to move file data as efficiently as possible from disk to network and vice versa. After examining a number of different high-availability NAS alternatives, Internet Fr managers selected two Auspex 4Front NetServer 2000 data servers—each with one processing node, 1GB of data cache RAM, and 512GB of storage capacity that can easily be scaled to 9TB. The key reasons for selecting the Auspex units were that their ultimate storage capacity is higher than competitors' and that they support both UNIX and NT file systems. Each storage server has two Gigabit NICs. The first is used for communications with the various UNIX and NT servers operating at the site. The second provides intelligent and protected replication between the two storage servers using TurboCopy software, as well as backup of data to a Quantum ATL P1000 robotic tape library.


Tasking the Storage Server Architecture

The I/O node is the fundamental building block of Auspex's architecture. Each node contains a dual-processor Intel motherboard on which processing functions are logically separated. The network processor handles network protocols and manages the associated caches. The file and storage processor is dedicated to managing the file systems and associated storage hardware. The result is a dramatic improvement in reliability and performance compared with general-purpose servers. The Auspex NS2000 is designed to provide continuous availability in mission-critical, data-intensive settings. Flexible, hardware-assisted RAID is incorporated into the system design. Redundant copies of data are maintained on the file servers, which employ RAID 5. The database server load is balanced dynamically across the file servers, and mounts can be moved dynamically between servers. Redirection can be run from the command line, from scripts, or via Simple Network Management Protocol (SNMP) traps without affecting users connected to the Web clients. This feature also allows rolling upgrades without service interruptions. Batch clients can update multiple file servers simultaneously to maintain synchronization of data.

The NS2000 incorporates a call-home monitoring system, which can detect host, I/O node, and component failures and provide accurate problem-determination data to Auspex personnel for serviceability. It supports a full Solaris management environment which, along with the Auspex Control Point browser-based management tool and industry-standard SNMP and Management Information Bases (MIBs), provides the user with an easy-to-manage system that can be readily integrated into an existing operating environment. Furthermore, for automated backup and recovery, the NS2000 provides Network Data Management Protocol (NDMP) support that allows leading third-party, NDMP-compliant data-management tools to execute backup and restore operations via a remote console.

An important requirement of this application was support for the NT and UNIX OSs, which are both used by various Internet Fr clients. Auspex's NeTservices software provides seamless file sharing between NFS and NT clients. NeTservices optimizes the CIFS protocol for NT to give users the same speed of access and reliability as UNIX clients. NeTservices allows the Auspex NS2000 to act as a primary or backup domain controller in an NT domain, eliminating the need for a separate NT server. It also controls access to files and directories and lets a user manage NT or UNIX data from the same console. NeTservices has demonstrated the industry's fastest ZD NetBench 5.01 performance at 58MB per second of throughput. That result was more than double the fastest NetBench test result posted by any other high-availability NAS vendor.

Improving Response Time and Availability

Since Internet Fr installed the storage servers, the company has experienced a dramatic improvement in response time and availability. Benchmarks consistently show that Internet Fr delivers page downloads several times faster than its main competitors and significantly faster than it was able to achieve in the past with server-attached storage. The zero-downtime performance provided by the storage system has also helped the company deliver a 99.997 percent availability guarantee to its customers. Installing the storage servers has helped Internet Fr's marketing efforts as well. Many of Internet Fr's potential customers are familiar with the Auspex system, which has helped build their confidence in Internet Fr's hosting capabilities.
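The 99.997 percent figure translates into a concrete downtime budget. A minimal sketch of that arithmetic (the guarantee value comes from the text; the minutes-per-year conversion is standard availability math):

```python
# Convert an availability guarantee into an annual downtime budget.

def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of allowed downtime per (365-day) year."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return (1.0 - availability) * minutes_per_year

# The 99.997 percent guarantee allows under 16 minutes of downtime per
# year; by comparison, "three nines" (99.9 percent) allows almost 9 hours.
print(f"99.997%: {downtime_minutes_per_year(0.99997):.1f} min/year")
print(f"99.9%:   {downtime_minutes_per_year(0.999):.1f} min/year")
```

In other words, the zero-downtime record reported above is well inside the roughly 16 minutes per year the guarantee permits.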


All in all, this system provides the power and flexibility Internet Fr needs to meet the expectations of its blue chip clients. Redundant configuration eliminates worry about storage system failures while allowing the company to scale storage capacity when it needs to in order to accommodate growth. The net result is that Internet Fr is able to deliver higher levels of performance than its competitors—probably higher than any other French hosting site. You can't overestimate the importance of data storage to a Web hosting company. Internet Fr's ability to quickly and reliably deliver information from these servers has helped the new Tenors service succeed beyond the company's expectations; the service is already generating a significant proportion of the company's revenue. The Auspex NS2000 has proven to be the ideal platform to support Internet Fr's growth.

Summary

Chapter 1 has identified trends in computing that affect high-availability NAS strategy decisions, introduced the main storage architecture options, and identified issues involved in selecting an appropriate enterprise storage solution. The chapter also defined the different types of storage architecture, comparing benefits, technologies, and applications, and provided guidelines on the best use for each storage architecture. It compared the subsystem architectures used by several major enterprise storage vendors and discussed the benefits of implementing a parallel hardware and software design in an enterprise storage strategy. Finally, the chapter brought the pieces together in a conclusion and case study designed to help CIOs and network administrators identify the best total solution for their specific enterprise storage needs.

The best total solution goes beyond selecting just the right storage architecture and product—it includes determining which vendor offers the most advanced technology, has a proven history of providing innovative enterprise storage systems and solutions, and provides value-added service, consulting, and support. In the next five chapters, we will thoroughly discuss the topics introduced in this chapter and much more. Have a good read and enjoy!


Chapter 2: Designing High-Availability NAS Solutions

As data requirements increase at a rapid pace, end users demand reliable, high-performance, and global access to information from anywhere at anytime. IT managers constantly seek more affordable, manageable storage solutions that meet these user expectations. High-availability NAS has led the way for the mainstream deployment of storage solutions that facilitate data consolidation and sharing. By leveraging well-understood technologies (such as IP, Gigabit Ethernet, NFS, and CIFS), high-availability NAS enables a flexible, robust storage solution that is easily managed and scaled and contributes to a scalable and reliable network and storage infrastructure.

For example, Tricord Systems' Lunar Flare high-availability NAS (clustered server appliance) product provides a manageable data storage solution for companies and organizations that require scalability to support rapid growth. Enabled by Tricord's Illumina software, multiple Lunar Flare high-availability NAS appliances aggregate to create a cluster, managed as a single resource. This storage solution lets users add appliances into a single, manageable pool with almost no administration—and in small, affordable increments as needed. Clusters of up to 16 Lunar Flare NAS units provide seamless scalability from 129GB to 2TB.
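The pay-as-you-grow model described above can be sketched numerically. This is a simplification that assumes each appliance contributes the 129GB baseline capacity given in the text:

```python
# Sketch of incremental cluster scaling for a clustered NAS appliance,
# assuming ~129GB per appliance (the chapter's baseline figure) and the
# stated 16-unit cluster limit. Real configurations may vary.

GB_PER_APPLIANCE = 129
MAX_APPLIANCES = 16

def cluster_capacity_gb(appliances: int) -> int:
    """Total pooled capacity for a cluster of the given size."""
    if not 1 <= appliances <= MAX_APPLIANCES:
        raise ValueError(f"cluster size must be 1..{MAX_APPLIANCES}")
    return appliances * GB_PER_APPLIANCE

for n in (1, 4, 16):
    print(f"{n:2d} appliances -> {cluster_capacity_gb(n)}GB")
# A full 16-unit cluster reaches 2,064GB, i.e. roughly the 2TB ceiling.
```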

See Case Study 1 later in the chapter for more detailed information about Lunar Flare and Illumina. High-availability NAS is the common storage system behind the Citrix MetaFrame for Windows application server farm, which I discuss in Case Study 2.

With the preceding in mind, the solutions discussed in this chapter address the customer requirements for high-availability NAS in several dynamic application areas:

• Internet e-business applications—High-performance data sharing and scalable, networked storage infrastructures for e-businesses.

• Business applications in the data center—Superior data availability and recoverability for enterprise business applications within a data center.

• Workgroup collaboration—High-performance data sharing across heterogeneous OS environments.

• Distributed storage over secure WAN—Collaboration among distributed sites with centralized administration and disaster recovery.

This chapter also discusses the deployment of highly available NAS infrastructures for campus-type environments. Based on these integrated architectures, the chapter then discusses how two network design configurations were tested and evaluated. The configurations presented in this chapter are based on the requirements of a typical large enterprise. The test results validate combined configurations for deploying highly available NAS solutions.


Nevertheless, key design features must be observed to design highly available NAS solutions. Today, the idea of building a high-availability NAS solution across numerous sites (each interconnected and backing up terabytes of data every night) will not work, as NAS technology is still maturing and has not evolved enough to justify the massive expenditure involved. Furthermore, a high-availability NAS should be designed with the following considerations in mind:

• Start by building small highly available NAS islands.

• Design your high-availability NAS solution to be scalable.

• Design your high-availability NAS solution specifically for your requirements.

• Plan your high-availability NAS carefully.

• Consult an expert if you’re unsure about your requirements.

• Don’t go overboard with the technology—buy technology that is flexible.

You will save money, time, and effort by observing these simple rules. Now, let’s look at Fibre Channel topologies. The main reason for using Fibre Channel is to be able to share high-availability NAS storage devices among several computers.

Fibre Channel Topologies

Fibre Channel offers three design topologies for high-availability NAS solutions, and these topologies can be combined to fit any needed configuration. The simplest is a point-to-point connection between two nodes. A more advanced topology is available by using an arbitrated loop; this configuration allows as many as 126 devices to share one loop. Finally, Fibre-Channel switches are available that can form fabrics with thousands of nodes.

Point-to-Point Topology

Point-to-point is the first and simplest Fibre Channel topology. It connects two nodes, whether two host machines, one host machine and one high-availability NAS storage device, or one host and a switch. The benefits of this setup include no need for arbitration and no addressing problems: any frame received must have come from the other end. In addition, the devices have access to the full bandwidth of the link. However, there are arguments against this topology, one being that communication is restricted to the two nodes. This kind of topology may be useful for connecting two machines that are physically separated by some distance, but constant access to the full 100MBps is usually not needed.


Arbitrated Loops

The arbitrated loop topology was actually added to the Fibre-Channel specifications after the original point-to-point and switched topologies, when it was realized that there would be a need for a topology between the two. The arbitrated loop combines some of the features of point-to-point with some of the features of the fabric.

One of the main arguments against the pure fabric is that the price per fabric port is too high if each fabric port is connected to only one node. Although this setup offers each device full connection to the fabric, most nodes do not constantly need the full bandwidth of Fibre Channel. Arbitrated loop offers (as the name suggests) a loop topology. The loop can contain as many as 126 ports and one fabric port with all the nodes arbitrating for usage of the loop.

In the arbitrated loop topology, all nodes connect their outgoing connector to the incoming connector of the downstream node. This way, all nodes on the loop act as repeaters for all the frames running around the loop.

At any given time, only one port can be sending frames, with one exception: if both of the ports that want to communicate support full duplex, they may agree to set up such a connection, fully utilizing the link in both directions. An arbitration procedure selects the loop master, which thereafter owns the loop. Because a node must win arbitration before being allowed to use the loop, performance degrades as more nodes are added. A loop that contains 127 active ports is likely to operate with less-than-ideal performance.
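The effect of loop size on performance can be illustrated with a back-of-the-envelope model. This sketch assumes perfectly fair arbitration over a 100MBps loop (an idealization; real loops also lose time to the arbitration protocol itself and to repeater latency):

```python
# Idealized model of per-node bandwidth on a Fibre Channel arbitrated
# loop: all active ports share one loop, so each node's fair share
# shrinks as ports are added. Real loops do worse, because arbitration
# overhead also grows with loop size.

LOOP_BANDWIDTH_MBPS = 100  # ~100MBps for 1Gbps Fibre Channel

def fair_share_mbps(active_ports: int) -> float:
    """Per-node bandwidth share under perfectly fair arbitration."""
    if not 1 <= active_ports <= 127:
        raise ValueError("a loop supports at most 127 ports")
    return LOOP_BANDWIDTH_MBPS / active_ports

for ports in (2, 8, 32, 127):
    print(f"{ports:3d} active ports -> {fair_share_mbps(ports):6.2f} MBps each")
```

At 127 active ports, each node's share falls below 1MBps, which is why a full loop is described above as operating with degraded performance.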

Switched Fabric

A switched-fabric I/O architecture eliminates many of the problems associated with a multidrop-bus scheme. With the switched-fabric architecture, data paths between computing nodes may change dynamically to support multiple simultaneous data transfers. Designers use the term fabric to represent this architecture because you can connect any node to any other node through data paths that resemble the interwoven threads in cloth. A major benefit of switched fabric is that each connection is a direct point-to-point data path. This setup yields better electrical characteristics, allowing higher frequencies and bandwidth than bus architectures. A typical switched-fabric architecture uses multiple stages of switches to route transactions between a source and a target. A sophisticated switched-fabric system can also increase system availability by routing around defective paths or nodes.


Figure 2.1 represents a simple system based on switched-fabric interconnects. The six nodes designated as A through F are connected to each other through 3 × 3-port crossbar switches. Any path through a crossbar switch may support serial or parallel data on a copper or an optical medium, depending on the physical device. You can route data from any node to any other node over several possible paths. As many as three data streams may be active at the same time, creating an aggregate bandwidth that is three times larger than the bandwidth of any single path. If a crossbar switch fails, you can route data around the failure, although the maximum aggregate data rate will suffer. By increasing the crossbar size, adding more switches, and installing multiple stages, designers can create complex switched-fabric interconnects that can support large systems that are scalable and fault tolerant.

Figure 2.1: Switched fabric replaces the traditional shared-bus system and supports multiple data streams and tolerates path failures.
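The fault-tolerant routing described for Figure 2.1 can be sketched as a graph search. The topology below is a hypothetical stand-in for the figure (six nodes, two redundant crossbar switches), not any vendor's actual fabric:

```python
# Sketch of switched-fabric routing: model nodes and crossbar switches
# as a graph and use breadth-first search to find a data path. When a
# switch fails, the search simply routes around it over surviving links.
from collections import deque

# Hypothetical fabric: six nodes, each attached to two crossbar
# switches for redundancy.
FABRIC = {
    "A": ["S1", "S2"], "B": ["S1", "S2"], "C": ["S1", "S2"],
    "D": ["S1", "S2"], "E": ["S1", "S2"], "F": ["S1", "S2"],
    "S1": ["A", "B", "C", "D", "E", "F"],
    "S2": ["A", "B", "C", "D", "E", "F"],
}

def route(src, dst, failed=()):
    """Return a shortest src->dst path avoiding failed elements, or None."""
    queue, seen = deque([[src]]), {src, *failed}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in FABRIC[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no surviving path

print(route("A", "F"))                 # ['A', 'S1', 'F']
print(route("A", "F", failed=["S1"]))  # ['A', 'S2', 'F'] -- rerouted around the failure
```

Marking a switch as failed reroutes traffic automatically, mirroring the figure's point that aggregate bandwidth drops but connectivity survives.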

A Fibre Channel switch typically provides 8 to 16 ports, with full gigabit speeds available at each port (see the sidebar “Innovative Switch”). Following the model previously established by Ethernet switches, a Fibre Channel switch port may be configured to support a single node or a shared segment of multiple nodes (a loop). Because a switch requires more processing power (memory and microcode at each port to route frames properly), switch per-port costs usually run as much as six times Arbitrated Loop hub per-port costs.



Innovative Switch

If you think Fibre Channel SANs are so costly and complex that they will eventually lose out to alternative technologies such as iSCSI and Infiniband, well, you’re not entirely wrong. Fibre Channel SANs do present a steep degree of difficulty. But, fortunately for the Fibre Channel industry, less-traditional Fibre Channel vendors such as Vixel and BlueArc are bringing much-needed simplicity to Fibre Channel SANs.

Case in point: a new family of switches from Vixel based on a technology with a promising name, InSpeed. On a single application-specific integrated circuit (ASIC) chip, InSpeed delivers what is essentially a 12-port Fibre Channel switch featuring both 1Gbps and 2Gbps full Fibre Channel connectivity and fabric service transparency. New SAN deployments can utilize InSpeed technology as the core fabric of an entry-level SAN installation.

Perhaps even more intriguing is InSpeed’s ability to extend switching capabilities to collective storage devices such as RAID and Just a Bunch of Disks (JBOD) disk arrays and even tape libraries. Typically, a JBOD connects to the SAN through a Fibre Channel arbitrated loop, which is essentially a daisy-chain that limits the performance and diagnostics of the storage array. By contrast, an InSpeed embedded switch installed on the JBOD can independently address and monitor each disk device in the loop, which translates to improved system performance by as much as threefold, according to Vixel. InSpeed technology also gives the system the ability to detect each disk drive’s error rate and traffic congestion symptoms.

The range of features associated with InSpeed has captured the attention and interest not only of manufacturers of disk and tape devices such as Quantum, but also of major high-availability NAS vendors such as BlueArc and Network Appliance, which plan to include InSpeed technology in their products. As InSpeed and technologies like it begin to penetrate the high-availability NAS market, the convergence of SAN and NAS will likely be accelerated, fostering even greater simplicity for storage administrators. The best example of this convergence is already upon us in the form of BlueArc’s SiliconServer, which joins high-availability Ethernet NAS technology with Fibre Channel SAN technology in a working marriage that yields an enormous degree of simplicity for users.

With an impressive, scalable storage capacity of more than 250TB, SiliconServer is a 72-inch-tall model of the future of storage networks, in which SAN and high-availability NAS models interact seamlessly. Adding InSpeed will have a major impact for both BlueArc and its customers. For Fibre Channel vendors, InSpeed could represent much-needed relief from the threat of simpler, emerging technologies such as iSCSI and InfiniBand. For customers, it could finally fulfill the promise of more affordable, easy-to-manage, high-performance Fibre Channel SANs.

Fibre Channel Switching Hub

Fibre-Channel switches allow for large Fibre-Channel topologies. With hubs, all nodes have to share the bandwidth of one hub, giving limited performance when more than a few nodes are connected. Switches are the logical choice when a larger high-availability NAS solution is to be constructed. Several vendors offer Fibre-Channel switching hubs. Be aware that switches from different vendors may not be able to communicate, meaning they cannot be part of the same fabric.

Switches come with different numbers of ports (8 and 16 ports are common). Each port can connect to an arbitrated loop (attaching as many as 126 nodes to that port), connect a single node, or connect to another switch. By connecting several switches, large numbers of nodes can be connected. For a switch to be generally usable, make sure that all the ports are G_Ports. On some switches, all the ports are G_Ports; on others, the port type must be changed by swapping daughterboards inside the switch.


Customer Selection of Fibre Channel Products

Customers quickly discover that Fibre-Channel products are built upon the concepts and protocols they know well. Fibre Channel delivers the same type of functions as SCSI and legacy networks, only Fibre Channel is faster, easier, more scalable, and much more reliable. Fibre Channel products expand the flexibility of IT organizations with the products’ inherent ability to run SCSI and IP protocols on the same high-availability NAS solution. High-availability NAS brings new levels of capability and performance. Take a look at what goes into a high-availability NAS product; you will find it quite familiar.

Fibre Channel products are built without restrictions. Virtually any topology that an IT organization requires is possible. The basic building blocks are point-to-point dedicated bandwidth, loop shared bandwidth, and switched scaled bandwidth. Switches and hubs are stackable.

A Fibre Channel high-availability NAS solution is built from products that are very familiar to IT professionals. The following list describes these products:

• Copper cables—Four types of copper cables are defined in the Fibre-Channel standard. The most popular implementations are twin-ax using DB-9 or HSSD connectors.

• Disk enclosures—Fibre-Channel disk enclosures utilize a backplane with a built-in Fibre-Channel loop. At each disk location in the backplane loop is a port bypass circuit that permits hot swapping of disks. If a disk is not present, the circuit automatically closes the loop. When a disk is inserted, the loop is opened to accommodate the disk.

• Drivers—If software drivers from the host bus adapter (HBA) vendor are not resident in your server or workstation, they are installed into the OS using standard procedures for that OS. Fibre-Channel drivers support multiple protocols (typically SCSI and IP). Most popular OSs are supported (including Windows NT and Windows XP, AIX, Solaris, IRIX, and HP-UX).

• Extenders—Extenders are used to provide longer cable distances. Most optical interfaces are multimode cable; extenders convert the multimode interface to single mode and boost the power of the laser. Typically, an extender provides a single-mode cable distance of 30km (about 18 miles).

• Fibre-Channel disks—Fibre-Channel disks have the highest capacity and transfer capability available. Typically, these disks have a capacity of 9GB and support redundant Fibre-Channel loop interfaces.

• Fibre optic cable connector—The SC connector is the standard connector for Fibre-Channel fiber optic cables. It is a push-pull connector and is favored over the ST connector because, if the cable is pulled, the tip of the cable in the connector does not move out of position (which would result in a loss of signal quality).

• Multimode cable—Multimode cable is dominant for short distances of 2km or less. Multimode has an inner diameter of 62.5 or 50 microns, allowing light to enter the cable in multiple modes, including straight and at different angles. The many light beams tend to lose shape as they move down the cable. This loss of shape is called dispersion and limits the distance for multimode cable. Cable quality is measured by bandwidth and distance. Existing 62.5-micron FDDI cable is usually rated at 100 or 200MHz per kilometer, providing gigabit communications for as far as 100 or 200 meters.


• Single-mode cable—Single-mode cable is used for long distance cable runs. Its distance is limited by the power of the laser at the transmitter and by the sensitivity of the receiver. Single-mode cable has an inner diameter of 7 or 9 microns and allows only a single ray of light to enter the cable. Therefore, with single-mode cables there is no dispersion.

• Connector—Fibre Channel has recently adopted a new connector. It reduces the size of the connector by 50 percent, doubling the connector density for hubs and switches.

• Gigabit Interface Converters (GBICs)—Distances in a data center are supported with twin-ax copper circuits and, therefore, hubs, disks, and many HBAs come standard with a copper interface. GBICs and media interface converters plug into the copper interface and convert it to an optical interface. GBICs use an HSSD connector for the copper interface and media interface converters use the DB-9 copper interface. The benefit is a low-cost copper link and optics for longer distance when required.

• Gigabit Link Modules—GLMs are pluggable modules providing either a copper or fiber optic interface. GLMs include the Serializer/Deserializer (SERDES) and have a media-independent parallel interface to the HBA. Users can easily change the media interface from copper to fiber optics.

• HBAs—Fibre-Channel HBAs are similar to SCSI HBAs and network interface cards (NICs). Fibre-Channel HBAs are available for copper and optical media. A typical Fibre-Channel PCI HBA is half-length and utilizes a highly integrated Fibre-Channel application-specific integrated circuit (ASIC) for processing the Fibre-Channel protocol and managing the I/O with the host. Adapters are also available for SBus, PCI, MCA, EISA, GIO, HIO, PMC, and Compact PCI.

• Hubs—Fibre-Channel hubs are used to connect nodes in a loop. Logically, the hub is similar to a Token Ring hub with “ring in” and “ring out.” Each port on a hub contains a Port Bypass Circuit (PBC) to automatically open and close the loop. Hubs support hot insertion and removal from the loop. If an attached node is not operational, a hub will detect this condition and bypass the node. Typically, a hub has 7 to 10 ports and can be stacked to the maximum loop size of 127 ports.

• Link analyzers—Fibre-Channel link analyzers capture cause and effect of data errors. Specific frame headers can be monitored and captured for analysis.

• Routers/LAN Switches—Routers/LAN switches interface Fibre Channel with legacy LANs. These are layer 2 (L2) and/or L3 devices that use Fibre Channel for a reliable, gigabit backbone.

• SCSI bridges—Fibre Channel provides the ability to link existing SCSI-based storage and peripherals using a SCSI bridge. SCSI-based peripherals appear to the server or workstation as if they were connected directly on Fibre Channel.

• SNA gateways—SNA gateways interface Fibre Channel to SNA. Fibre-Channel HBAs are integrated into standard products such as the Novell SAA and Microsoft SNA gateways.


• Static switches—Static switches, also called link switches, provide point-to-point connections and are externally controlled. They offer a low-cost option for applications not requiring the fast, dynamic switching capability inherent in the Fibre-Channel protocol.

• Switches—Fibre-Channel switches are among the highest-performing switches available for high-bandwidth and low-latency communications. The secret is in the Fibre-Channel protocol, designed specifically by the computer industry to remove the performance barriers of legacy channels and networks. Today, a Fibre-Channel switch provides connection and connectionless service (Classes 1, 2, and 3) or only connectionless service (Classes 2 and 3). Typical connection setup or frame-switching time is less than 1 microsecond. Switches are stackable to meet the most demanding application requirements. The number of addresses available is 2^24, or more than 16 million. Switch options provide high-availability features.

• Switch WAN extender—Fibre-Channel switches can be connected over WANs using an Interworking Unit (IWU). Expansion ports on switches are linked using either ATM or STM services. Because Fibre Channel may be faster than a single ATM or STM interface, multiple WAN channels can be used for full Fibre-Channel bandwidth.

Designing High-Availability NAS There are a number of high-availability NAS backup design considerations that affect the initial selection of products and their deployment:

• LAN-free backups

• Serverless backups

• Clustering

• High availability

• Centralizing storage

• Video editing

• Disaster recovery

• Backup

All the preceding factors are interrelated and should be balanced for optimum use and cost savings.


LAN-Free Backups Traditional backup software transfers data from application and file servers over the LAN to a backup server and from there to tape. Thus, backups are scheduled overnight when the network is quiet. For 24 × 7 operations, there is often no quiet time, so backups degrade performance or another solution is needed.

LAN-free backups take the backup traffic off the LAN and put it onto a high-availability NAS solution. There are four benefits of LAN-free backup:

• Lower cost—Because the many-to-many connectivity of Fibre Channel allows a tape library to be shared by multiple servers, LAN-free backup amortizes the cost of that resource over multiple servers, making it much more affordable for mid-range UNIX or NT servers to have direct access to a library.

• Less disruption—Removing backup traffic from the production LAN and placing it on the high-availability NAS solution avoids saturating the client-server LAN with backup traffic and allows normal LAN operation to continue.

• Higher performance—The backup window shrinks because data is backed up and restored via a 2Gbps Fibre Channel–based high-availability NAS solution rather than across a 10/100Mbps Ethernet network.

• Better drive utilization—Because tape resources can be dynamically allocated to backup sessions on each server, intelligent scheduling can optimize the use of shared drives.

With the advent of new backup technologies such as Mammoth 2, LTO Ultrium, and Super DLT, there is scope for backup speeds to increase; however, speed is often limited by network bandwidth. Backups can only go as fast as data can be transferred to the backup server, and there is little point backing up to high-speed drives over a low-speed network. Using high-availability NAS for LAN-free backup greatly improves the data transfer rate: each link runs at 200MBps and moves data more efficiently than Ethernet of similar speed.
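The bandwidth argument can be made concrete with a back-of-the-envelope calculation. The data size and effective throughputs below are illustrative assumptions, not measured figures:

```python
# Rough backup-window comparison: LAN-based vs. Fibre Channel LAN-free.
# The 500GB backup set and the throughput numbers are assumptions for
# illustration only (12.5 MB/s is the theoretical peak of 100Mbps
# Ethernet; 200 MB/s is the nominal rate of a 2Gbps Fibre-Channel link).

def backup_hours(data_gb, throughput_mbps):
    """Hours needed to move data_gb gigabytes at throughput_mbps MB/s."""
    seconds = (data_gb * 1024) / throughput_mbps
    return seconds / 3600

DATA_GB = 500  # hypothetical nightly backup set

lan_hours = backup_hours(DATA_GB, 12.5)   # 100Mbps Ethernet
fc_hours = backup_hours(DATA_GB, 200.0)   # 2Gbps Fibre Channel

print(f"100Mbps LAN:   {lan_hours:.1f} h")
print(f"2Gbps FC link: {fc_hours:.1f} h")
```

Under these assumptions the Fibre-Channel path is 16 times faster, which is exactly the kind of reduction that turns an overnight backup window into an any-time operation.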

Each server holding data is effectively a mini backup server attached to the tape device via high-availability NAS. A main backup server still exists, controlling the backup schedules and maintaining the backup catalogues and databases. This catalogue data is transferred via the LAN to the backup server with only a very small impact on the LAN.

For high-availability–NAS–based storage, data is transferred to a server via the high-availability NAS solution, then to the tape device, again via high-availability NAS. Why doesn’t the data go straight from the disk to the tape drive? That would be serverless backup.

Serverless Backups Serverless backups are a great step forward from LAN-based backups. Disks and tape drives are both on the high-availability NAS solution, so why send all the data via a server? Why not send it straight from the disk to the tape drive? This is possible using SCSI extended copy commands, but for a useful backup, you need to record somewhere which files were backed up, and you need a method for restoring individual files.

Serverless backup makes use of special software installed on the Fibre-Channel switch to move data between disk and tape and extract the catalogue information. Serverless backups no longer deal in file-by-file backups but in data blocks. The backup software instructs the RAID system to move a block of data directly from disk to tape via the Fibre-Channel switch with a built-in “data mover.” This data mover interprets the blocks of data and moves them to the correct location.
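The data-mover concept can be sketched in a few lines. All classes and names below are hypothetical illustrations, not any vendor's API: extents of blocks are copied directly from disk to tape, and only a block-level catalogue is retained for later restores.

```python
# Conceptual sketch of a serverless-backup data mover (hypothetical
# classes, not a real product API). The mover copies extents block by
# block from a disk LUN to tape, the way a switch-resident Extended Copy
# agent would, without routing the data through a backup server.

class BlockDevice:
    def __init__(self, blocks):
        self.blocks = blocks            # list of fixed-size data blocks

    def read(self, lba):
        return self.blocks[lba]

class TapeDevice:
    def __init__(self):
        self.written = []

    def write(self, block):
        self.written.append(block)

def data_mover(disk, tape, extents):
    """Copy each (start_lba, block_count) extent directly disk -> tape."""
    catalogue = []                      # block-level catalogue for restores
    for start, count in extents:
        for lba in range(start, start + count):
            tape.write(disk.read(lba))
        catalogue.append({"start": start, "count": count})
    return catalogue

disk = BlockDevice([f"blk{i}" for i in range(8)])
tape = TapeDevice()
cat = data_mover(disk, tape, [(0, 2), (4, 3)])
print(tape.written)   # blocks 0,1 then 4,5,6 land on tape
print(cat)            # the catalogue needed to restore those extents
```

Note that the catalogue records block extents, not file names, which is exactly why block-level serverless backup needs extra software support to restore individual files.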


The main performance advantage of serverless backups is the virtual elimination of the backup server. With all other backup methods, either a dedicated server is used or time has to be set aside on application servers, effectively imposing backup window limitations. Serverless backup allows for true, high-speed, windowless backups.

For serverless backup to work, you need both the correct hardware and software. Many backup software vendors will be releasing software in 2003 that provides serverless backup capabilities. It is important to purchase high-availability NAS hardware that includes the correct serverless backup agents.

Clustering Clustering is a concept that lets an aggregation of servers achieve a specific goal, such as availability, scalability, or performance. By combining a number of servers with the right network and software components, you can build a high-performance server cluster. To build high-performance storage to connect to the high-performance cluster, a high-availability NAS solution is employed. Just as a high-performance cluster uses multiple paths to increase performance, a high-availability NAS provides multiple high-performance paths for storage. It is now possible to build high-performance clusters with low-cost Windows-based software and Intel processors, bringing new price and performance to the market.

The differences between a cluster built for performance and a cluster built for availability are significant. The high-availability NAS cluster provides redundancy of all key components and data paths. For instance, there are two high-availability NAS switches, so in the event of failure, the data is always available via an alternative path to the servers. Usually, the server OS or clustering application will be ready to failover to another server in the event of a server failure, so the application is always available.

The external storage also has dual pathways to allow access to the data, which is usually protected against disk drive failure either through mirroring or a RAID-5 protection mechanism. For performance, parallelism creates higher aggregate bandwidth by combining the capacity of multiple pathways. This capability usually means a more expensive configuration than that of a high-availability NAS configuration. Instead of just redundant components, there are several parallel components to create multiple parallel pathways to reduce data performance constraints. Because mechanical storage devices are so much slower than the electronics in the server, a high-availability NAS solution can provide performance benefits by providing the servers with multiple paths to multiple storage devices to reduce the storage bottleneck. To provide the appropriate storage infrastructure, a low-overhead high-availability NAS switching environment is required. Additionally, a non-blocking or unconstrained pathway for the data is necessary. These pathways in a high-availability NAS solution are provided with Fibre-Channel links of 100MBps through a high-availability NAS switch.

High Availability Certain applications require high availability, which involves the duplication of all key components within the NAS solution. High-availability NAS can be used to improve data availability by providing alternative resilient paths.


Centralizing Storage Traditionally, storage has been purchased along with the server. These servers took up valuable space and periodically needed to be upgraded or replaced. For example, upgrading involves adding new hard disks, memory, or additional processors, all of which require scheduled server downtime. Every 2 to 3 years, a server would need to be replaced because of aging processors or a lack of storage expandability. To replace a server, a full backup had to be performed, then restored to the new server. This process takes time and can cause problems when the new server is brought back online. Today, processors have far more processing power than many applications currently require; therefore, servers do not need to be replaced as often.

Video Editing Current video, audio, and graphics editing systems are geared to increasing productivity through speed, collaboration, and multitasking. Broadcasters, effects professionals, film editors, and other rich media providers depend upon switches as the high-availability NAS interconnect for the world’s most advanced digital video and film editing systems. These systems demand high-performance out of the switching subsystem to allow creators to focus on content, not their storage system.

Disaster Recovery A key feature of high-availability NAS is the ability to easily set up a disaster recovery site. The distance can be extended to more than 100km by using special Gigabit Interface Converters (GBICs) or repeaters.

Backups Centralizing backups through an automated tape library has helped ensure greater data availability while freeing IT staff to focus on more strategic management activities. If an overnight backup is not successful, an additional backup can be performed during operational hours because the data is not being backed up across the corporate IP network. This ability provides faster recovery and an additional level of protection.

SAN/NAS Server Component Redundancy The high-availability NAS components are configured to be physically replicated to separate locations. Failover and redundancy are provided between the high-availability NAS units. Each high-availability NAS component can access any storage Logical Unit Number (LUN) from any device on the SAN as easily as any host.

Other vendors’ high-availability NAS file servers use captive, vendor-supplied closed-system storage or, worse yet, treat storage as an afterthought to the entire environment. High-availability NAS is a further extension of the environment into NAS applications. Legacy or existing storage can be accessed from the SAN just as it can from a host computer. SAN benefits are now extended to the high-availability NAS framework.


In Figure 2.2, each oval designates a disk storage subsystem, each circle designates a disk LUN, and each color represents an OS (UNIX, NT, Solaris, and so on). Physically, this configuration is laid out as described in the preceding text and figures. All disk and storage management is performed at the SAN level. Administrators need only decide which storage each host can access.

Figure 2.2: The ANY-to-ANY framework.

Each high-availability NAS component can access any storage LUN within the SAN, just as a host computer can. High-availability NAS can use all of the SAN’s ANY-to-ANY dynamic allocation of storage LUNs at any time. As Figure 2.2 shows, high-availability NAS servers are configured with redundant access to LUNs on disparate storage devices.

Administrators can allocate any LUN of storage within the SAN to any host or high-availability NAS head at any time, at their discretion. Administrators now have unprecedented storage-allocation and resource flexibility. Any LUN within the SAN can be allocated to any host for storage-attached applications or assigned to high-availability NAS file serving.
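As a toy illustration of this ANY-to-ANY flexibility (the data structures are invented for the example; real SAN management tools differ), LUN allocation reduces to a mapping that the administrator can change at any time:

```python
# Toy model of SAN-level ANY-to-ANY LUN allocation. The class and names
# are illustrative only, not a vendor management API: any LUN can be
# assigned to any host or NAS head, and reassigned at any time.

class SanAllocator:
    def __init__(self, luns):
        self.assignment = {lun: None for lun in luns}   # lun -> host

    def allocate(self, lun, host):
        self.assignment[lun] = host     # reassignment is just an update

    def luns_for(self, host):
        return sorted(l for l, h in self.assignment.items() if h == host)

san = SanAllocator(["lun0", "lun1", "lun2", "lun3"])
san.allocate("lun0", "oracle-host")     # storage-attached application
san.allocate("lun1", "nas-head-a")      # NAS file serving
san.allocate("lun1", "nas-head-b")      # reassigned at admin's discretion
print(san.luns_for("nas-head-b"))       # ['lun1']
```

The point of the sketch is that the mapping lives at the SAN level, so moving a LUN between a host and a NAS head never requires touching the storage itself.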

Finally, the need to serve nonvolatile data to millions of clients worldwide continues to drive the design and deployment of high-availability NAS. Next, let’s take a look at a couple of design options.


High-Availability NAS Design Options The integration and testing of high-availability NAS solutions requires that failure scenarios for each of the redundant network and storage building-block elements be validated. Tests cover failures of filer cluster units, network links, and switches. A network ping test from Win2K, Windows XP, or UNIX clients can be used to determine the average failover and recovery times.

Where cluster recovery times are concerned, the Event Log is much more valuable because it records when services go offline and come back online. Application-related events, combined with SNMP, provide far more insight than simple pinging does.

Failover time is measured for each test scenario that involves removing an element from the test network. Recovery time is measured when the failed element is later reinserted into the test network. An NFS file copy can also be used to test data access continuity during each failure and recovery scenario. The test results should show how the high-availability NAS designs react to various network link and device failures and should validate the interoperability of high-availability NAS solutions.
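The ping-based measurement can be sketched as follows. The probe function is injectable here so that the timing logic stands alone; a real test would wrap an ICMP ping of each filer cluster partner:

```python
# Sketch of failover-time measurement from periodic probes (the role
# that the 1-second ping plays in the tests described above). probe()
# would wrap a real ICMP ping; it is injectable here so the logic can
# be demonstrated without network access.
import time

def measure_failover(probe, interval=1.0, max_probes=60):
    """Return seconds of lost reachability observed over max_probes probes."""
    lost = 0
    for _ in range(max_probes):
        start = time.monotonic()
        if not probe():
            lost += interval        # each missed probe counts as one interval
        # sleep out the remainder of the interval before the next probe
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)
    return lost

# Simulated link that is down for probes 4..6, i.e. a 3-interval outage
state = iter([True, True, True, False, False, False] + [True] * 54)
print(measure_failover(lambda: next(state), interval=0.01))  # roughly 0.03
```

The resolution of this method is one probe interval, which is why 1-second pinging can only report failover times to the nearest second; the Event Log approach above fills in the rest.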

For example, two design approaches to building a highly available NAS infrastructure were recently tested by Cisco Systems. These two approaches include a network design with multiple distributed switches and another configuration with redundant hardware in a single switch. Each topology applies to a different user scenario and addresses different levels of high-availability NAS requirements. The two tested configurations are referred to as:

• Distributed server aggregation layer (network design option 1)

• Redundant server aggregation layer (network design option 2)

Distributed Server Aggregation Layer The design that Figure 2.3 shows uses a highly redundant switch in the distribution layer and redundant switches in the access/server aggregation layer. The distribution and access/server aggregation layers are connected with Gigabit Ethernet.


Figure 2.3: High-availability NAS network design option 1.

Each dual-homed clustered filer is connected to the two switches at the access/server aggregation layer. Both switches are configured with Cisco’s Spanning-Tree Protocol enabled, the PortFast feature enabled for the ports connecting to filers, and the UpLinkFast feature enabled for the uplink ports connecting to the distribution layer. Both switches also have the Spanning-Tree Protocol and the BackBoneFast feature enabled. To overlap Layers 2 and 3 for fast and predictable network convergence, one of the switches is configured with Spanning-Tree Root [Primary] and Hot Standby Router Protocol (HSRP) [Active]. (For more information about the architecture’s layers, see the sidebar “High-Availability NAS Architecture Layers.”)
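The Spanning Tree and HSRP features named above map onto a handful of switch configuration commands. The following IOS-style fragment is illustrative only; the interface names, VLAN numbers, and addresses are invented, and the tested switches may have used a different (for example, CatOS) syntax:

```
! Access/server-aggregation switch (names and numbers are examples)
spanning-tree uplinkfast             ! fast failover to the backup uplink
interface GigabitEthernet0/1
 description filer-facing port
 spanning-tree portfast              ! skip listening/learning for end nodes

! Distribution switch intended as the Layer-2/Layer-3 active node
spanning-tree backbonefast           ! faster recovery from indirect failures
spanning-tree vlan 10 root primary   ! make this switch the Layer-2 root ...
interface Vlan10
 ip address 10.1.10.2 255.255.255.0
 standby 10 ip 10.1.10.1             ! ... and the HSRP active gateway
 standby 10 priority 110
 standby 10 preempt
```

Placing the Spanning-Tree root and the HSRP active gateway on the same switch is what the text calls overlapping Layers 2 and 3: traffic follows one predictable path in both the bridging and routing topologies.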


High-Availability NAS Architecture Layers

The access/server aggregation layer has two functions: end-user connectivity (such as workstation and printer access) and connectivity for servers and high-availability NAS (server aggregation). Scalability can be achieved at this layer with Virtual LAN (VLAN) technology. VLAN technology provides good segmentation and management techniques for addressing. It permits effective control of the broadcast domains. Using Inter-Switch Link (ISL) or Institute of Electrical and Electronics Engineers (IEEE) 802.1Q VLAN trunking protocols, multiple VLANs can be carried over a single network path. For example, the Cisco Spanning-Tree Protocol implementations (Per VLAN Spanning Tree—PVST—and PVST+) allow redundant Layer-2 network paths between the server aggregation layer and the distribution layer. Cisco’s EtherChannel technology allows network bandwidth to grow incrementally.

A VLAN can be spanned within the server aggregation layer. This spanning allows dual-homed servers to be connected to the server aggregation layer in a more homogeneous way, providing increased redundancy to the application servers at the network edge. Using Cisco’s EtherChannel technology, aggregate bandwidth can be scaled to as much as 8Gbps when connecting to servers and high-availability NAS filers. EtherChannel also allows traffic to be load balanced based on media access control (MAC) address, source/destination IP address, and Layer-4 port numbers.
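The flow-based load balancing works by hashing frame fields onto a member link. Cisco's actual hash is implementation specific, so the XOR scheme below is only a sketch of the idea:

```python
# Illustrative sketch of how an EtherChannel-style bundle picks a member
# link by hashing frame fields. The real Cisco hash is proprietary; this
# XOR scheme only demonstrates the principle: each flow maps to exactly
# one physical link, so frame order within a flow is preserved.

def choose_link(links, src_mac, dst_mac, src_ip=0, dst_ip=0, l4_port=0):
    """Deterministically map a flow onto one physical link in the bundle."""
    key = src_mac ^ dst_mac ^ src_ip ^ dst_ip ^ l4_port
    return links[key % len(links)]

bundle = ["Gi0/1", "Gi0/2", "Gi0/3", "Gi0/4"]

# The same flow always hashes to the same link:
a = choose_link(bundle, 0x00AA, 0x00BB, l4_port=2049)   # NFS traffic
b = choose_link(bundle, 0x00AA, 0x00BB, l4_port=2049)
print(a == b)   # True
```

Including Layer-3 and Layer-4 fields in the hash matters when many flows share the same pair of MAC addresses (for example, many clients behind one router), because MAC-only hashing would pin all of that traffic to a single member link.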

The distribution layer aggregates Layer-2 traffic from multiple access layer switches, terminates VLANs and subnets, and directs traffic as needed to the core. The distribution layer provides the mechanism for first-hop redundancy to the clients/servers by configuring HSRP. Additionally, HSRP optional features such as interface tracking and preempt delay ensure increased availability. The network can quickly converge after a connectivity failure toward the core layer, and Layer 3 reconverges when failed distribution switches are brought back up.

In the event of a failure in the distribution layer, the overall convergence time can be minimized by overlaying Layer 2 (Spanning-Tree Root [Primary]) and Layer 3 (HSRP [Primary]) on the same distribution switch. Scalable routing protocols such as Open Shortest Path First (OSPF) and Enhanced Interior Gateway Routing Protocol (EIGRP) can be fine-tuned to achieve the best possible convergence time and to optimize load balancing between the distribution and core layers.

Typically, the core layer is a Layer-3 high-speed backbone for the campus network with low latency and high packet throughput. It acts as an access point for intranet connectivity and aggregates traffic from multiple distribution layers.

The Network Appliance dual-homing method makes use of Layer-2 connectivity. In this configuration, only one interface is active and the second is in standby mode. In the event that an active interface/link fails, the standby interface takes over the active role with minimum network interruption. The filer’s software monitors the active link and handles the failover. The dual-homed filer configuration has been tested and validated with both Fast Ethernet and Gigabit Ethernet connectivity.

Test Results For Network Design Option 1 The results indicate the time the network takes to converge after a failure and to recover after reinserting the failed link or device. The network convergence time is measured from the Win2K or Windows XP client, with the client continuously pinging both filer cluster partners at 1-second intervals. Different failure scenarios were simulated, failing one element or segment during each test. During all test scenarios, a series of NFS file-copy operations was performed to validate that continuous data access was maintained during failure and recovery phases. Network design option 1 passed all failure and recovery tests.


Table 2.1 details the 17 test scenarios. The table indexes, 1a through 8b, correspond to the physical locations depicted in Figure 2.3. The tabular results apply only to the test configurations; variations in equipment and configuration will affect the failover times.

| Location | Type of Failure and Recovery | Convergence Time (in seconds) | Feature Responsible for Convergence |
|----------|------------------------------|-------------------------------|-------------------------------------|
| 1a | Fail the forwarding uplink between access and distribution (Layers 2 and 3 active) switches | 3 | Spanning Tree; UpLinkFast |
|    | Restore the failed uplink between access and distribution (Layers 2 and 3 active) switches | 0 | Spanning Tree |
| 1b | Fail the blocking uplink between access and distribution (Layers 2 and 3 standby) switches | 0 | Spanning Tree |
|    | Restore the failed uplink between access and distribution (Layers 2 and 3 standby) switches | 0 | Spanning Tree |
| 2a | Fail the active supervisor on the distribution (Layers 2 and 3 root) switch | 1 | HSRP |
|    | Restore the failed supervisor on the distribution (Layers 2 and 3 root) switch | 0 | Supervisor HA Protocol |
| 2b | Fail the router on the distribution (Layers 2 and 3 root) switch | 1 | HSRP |
|    | Restore the failed router on the distribution (Layer 2 root) switch | 1 | HSRP |
| 2c | Fail the distribution (Layers 2 and 3 root) switch | 1 | HSRP |
|    | Restore the failed distribution switch | 8 | Spanning Tree; HSRP |
| 3a | Fail one of the links in the EtherChannel between distribution switches | 0 to 1 | EtherChannel |
|    | Restore the failed link in the EtherChannel between distribution switches | 0 to 1 | EtherChannel |
| 3b | Fail both links in the EtherChannel between distribution switches | 4 | Spanning Tree; BackBoneFast |
|    | Restore both failed links in the EtherChannel between distribution switches | 0 | Spanning Tree |
| 4a | Fail the active supervisor on the distribution (Layers 2 and 3 standby) switch | 0 | Supervisor HA Protocol |
|    | Restore the failed supervisor on the distribution (Layers 2 and 3 standby) switch | 0 | Supervisor HA Protocol |
| 4b | Fail the router on the distribution (Layers 2 and 3 standby) switch | 0 | HSRP |
|    | Restore the failed router on the distribution (Layers 2 and 3 standby) switch | 0 | HSRP |
| 4c | Fail the distribution (Layers 2 and 3 standby) switch | 0 | Spanning Tree; HSRP |
|    | Restore the failed distribution (Layers 2 and 3 standby) switch | 0 | Spanning Tree; HSRP |
| 5a | Fail the forwarding uplink between server aggregation and distribution (Layers 2 and 3 active) switches | 3 | Spanning Tree; UpLinkFast |
|    | Restore the failed uplink between server aggregation and distribution (Layers 2 and 3 active) switches | 0 | Spanning Tree |
| 5b | Fail the blocking uplink between server aggregation and distribution (Layers 2 and 3 standby) switches | 0 | Spanning Tree |
|    | Restore the failed uplink between server aggregation and distribution (Layers 2 and 3 standby) switches | 0 | Spanning Tree |
| 6  | Fail the server aggregation switch | 3 | Layer 2 [Single-Mode VIF] |
|    | Restore the failed server aggregation switch | 0 | Spanning Tree; PortFast |
| 7a | Disconnect the standby link between filer and server aggregation switch | 0 | Layer 2 Protocol [Single-Mode VIF] |
|    | Restore the disconnected link between filer and server aggregation switch | 0 | Spanning Tree; PortFast |
| 7b | Disconnect the active link between filer and server aggregation switch | 1 to 2 | Layer 2 Protocol [Single-Mode VIF] |
|    | Restore the disconnected link between filer and server aggregation switch | 0 | Spanning Tree; PortFast |
| 8a | Fail the filer head [switch off the filer] | 45 | Cisco Cluster Monitor |
|    | Restore the filer head [switch on the filer] | 75 | Cisco Cluster Monitor |
| 8b | Disconnect the link between filer and disk array | 120 to 180 | Cisco Cluster Monitor |
|    | Reconnect the link between filer and disk array | 0 | Cisco Cluster Monitor |

Table 2.1: Option 1 test scenarios.


Redundant Server Aggregation Layer The design shown in Figure 2.4 makes use of highly redundant switches in the distribution and server aggregation layers, with redundant switches in the access layer. The distribution and access/server aggregation layers are connected with Gigabit Ethernet.

Figure 2.4: High-availability NAS network design option 2.

For example, the filer cluster connects to highly redundant switches with dual supervisor (high-availability option enabled) through Cisco’s Gigabit EtherChannel. The Gigabit Ethernet ports connecting to the filers are bundled across separate blades using EtherChannel, providing physical redundancy and load balancing. The uplink ports connecting to the switches in the distribution layer are configured with the UpLinkFast feature enabled, and the ports connecting to filers are configured with the PortFast option enabled for fast convergence in the event of a link failure.

To take advantage of redundancy and load balancing, all switches have Spanning-Tree Protocol enabled. The switches in the distribution layer are configured with the BackBoneFast feature, which provides fast Layer-2 convergence due to indirect failure. To overlap Layers 2 and 3 for fast and predictable network convergence, one of the switches in the distribution layer is configured with Spanning-Tree Root [Primary] and HSRP [Active].


Test Results for Network Design Option 2 The results indicate the time taken for the network to converge after a failure and for recovery after the failed link or device is reinserted. The network convergence time is measured from the Win2K or Windows XP client, with the client continuously pinging both filer cluster partners at 1-second intervals. Different failure scenarios were simulated, failing one element or segment during each test. During all test scenarios, a series of NFS file-copy operations was performed to validate that continuous data access was maintained during failure and recovery phases. Network design option 2 passed all failure and recovery tests.

Table 2.2 details the 16 test scenarios. The table indexes, 1a through 8b, correspond to the physical locations depicted in Figure 2.4. The tabular results apply only to the test configurations; variations in equipment and configuration will affect the failover times.

| Location | Type of Failure and Recovery | Convergence Time (in seconds) | Feature Responsible for Convergence |
|----------|------------------------------|-------------------------------|-------------------------------------|
| 1a | Fail the forwarding uplink between access and distribution (Layers 2 and 3 active) switches | 3 | Spanning Tree; UpLinkFast |
|    | Restore the failed uplink between access and distribution (Layers 2 and 3 active) switches | 0 | Spanning Tree |
| 1b | Fail the blocking uplink between access and distribution (Layers 2 and 3 standby) switches | 0 | Spanning Tree |
|    | Restore the failed uplink between access and distribution (Layers 2 and 3 standby) switches | 0 | Spanning Tree |
| 2a | Fail the active supervisor on the distribution (Layers 2 and 3 root) switch | 1 | HSRP |
|    | Restore the failed supervisor on the distribution (Layers 2 and 3 root) switch | 0 | Supervisor HA Protocol |
| 2b | Fail the router on the distribution (Layers 2 and 3 active) switch | 1 | HSRP |
|    | Restore the failed router on the distribution (Layer 2 active) switch | 1 | HSRP |
| 2c | Fail the distribution (Layers 2 and 3 active) switch | 1 | HSRP |
|    | Restore the failed distribution (Layers 2 and 3 active) switch | 8 | Spanning Tree; HSRP |
| 3a | Fail one of the links in the EtherChannel between distribution switches | 0 to 1 | EtherChannel |
|    | Restore the failed link in the EtherChannel between distribution switches | 0 to 1 | EtherChannel |
| 3b | Fail both links in the EtherChannel between distribution switches | 4 | Spanning Tree; BackBoneFast |
|    | Restore both failed links in the EtherChannel between distribution switches | 0 | Spanning Tree |
| 4a | Fail the active supervisor on the distribution (Layers 2 and 3 standby) switch | 0 | Supervisor HA Protocol |
|    | Restore the failed supervisor on the distribution (Layers 2 and 3 standby) switch | 0 | Supervisor HA Protocol |
| 4b | Fail the router on the distribution (Layers 2 and 3 standby) switch | 0 | HSRP |
|    | Restore the failed router on the distribution (Layers 2 and 3 standby) switch | 0 | HSRP |
| 4c | Fail the distribution (Layers 2 and 3 standby) switch | 0 | Spanning Tree; HSRP |
|    | Restore the failed distribution (Layers 2 and 3 standby) switch | 0 | Spanning Tree; HSRP |
| 5a | Fail the forwarding uplink between server aggregation and distribution (Layers 2 and 3 active) switches | 3 | Spanning Tree; UpLinkFast |
|    | Restore the failed uplink between server aggregation and distribution (Layers 2 and 3 active) switches | 0 | Spanning Tree |
| 5b | Fail the blocking uplink between server aggregation and distribution (Layers 2 and 3 standby) switches | 0 | Spanning Tree |
|    | Restore the failed uplink between server aggregation and distribution (Layers 2 and 3 standby) switches | 0 | Spanning Tree |
| 6  | Fail the active supervisor on the server aggregation switch | 2 to 3 | Supervisor HA Protocol |
|    | Restore the failed supervisor on the server aggregation switch | 0 | Supervisor HA Protocol |
| 7  | Fail one of the links in the EtherChannel between filer and server aggregation switch | 0 to 4 | EtherChannel |
|    | Restore the failed link in the EtherChannel between filer and server aggregation switch | 0 | EtherChannel |
| 8a | Fail the filer head [switch off the filer] | 45 | Cisco Cluster Monitor |
|    | Restore the filer head [switch on the filer] | 75 | Cisco Cluster Monitor |
| 8b | Disconnect the link between filer and disk array | 120 to 180 | Cisco Cluster Monitor |
|    | Reconnect the link between filer and disk array | 0 | Cisco Cluster Monitor |

Table 2.2: Option 2 test scenarios.

Next, let’s look at why lowering high-availability NAS costs is good, but not when it puts even less crucial data at risk. In other words, what are the high-availability NAS cost justifications and considerations?

Cost Justification and Considerations Another way organizations can cut costs is by switching from direct-attached storage to high-availability NAS. For example, Continental Airlines’ Houston-based scheduling, cargo, and passenger revenue management and pricing division took this approach when it upgraded its storage systems to high-availability NAS. The division maintains a set of 12 Oracle databases containing the schedules for the airline’s more than 3000 daily flights, the passenger tallies for each flight, and competitors’ schedules and prices. If the data goes down, the customers go elsewhere.

As previously explained, high-availability NAS uses shared storage on the network, typically through dedicated, high-performance servers specifically designed for file storage. These devices, as you know, are called filers. They provide high-speed data access while freeing up the application servers for more CPU-intensive tasks.


Continental went with two Network Appliance F840 filers connected through a Gigabit Ethernet switch to two Sun Microsystems Enterprise 10000 servers running eight applications in separate domains. To enhance reliability, the filers include built-in RAID 4, fast online replication, and point-in-time copies.

The filers also take a snapshot of each data volume seven times a day, with the final snapshot being backed up on tape every night. Under the old system, backups could only be done once a week on Saturdays because the process took 6 hours to complete. Now, daily backups are done in 5 minutes per volume.

Although these reliability features are enough to make some firms switch to high-availability NAS, there are also the bottom-line gains. In Continental’s case, 10TB of storage came at a cost of $300,000. Quotes for a comparable amount of direct-attached storage ran as high as $2 million. Finally, if an enterprise is already thinking of consolidating data sites to reduce costs (see the sidebar “Lower High-Availability Hierarchical NAS Management Costs”), high-availability NAS should be considered to further leverage those efforts and provide additional savings.

Lower High-Availability Hierarchical NAS Management Costs

One thing everyone can agree on regarding hard times: They make you appreciate what you have. Gone is the expansionist mentality of years past, when the easiest way to solve a storage problem was to simply throw more capacity at it. Instead, companies are now taking on the air of conservationists concerned about their storage resources and determined to make the most of them. Even tape storage is quickly returning to vogue as a component of a not-so-new idea called high-availability hierarchical NAS management. Furthermore, this idea is just another flavor of the traditional Hierarchical Storage Management (HSM) concept.

Unlike the every-byte-is-sacred mantra that closed a slew of high-dollar storage array sales during the past few years, high-availability hierarchical NAS management operates under the assumption that not all stored data needs to reside on a tier-one storage system. With high-availability hierarchical NAS management, infrequently used data is taken off the pricey system and dropped onto less expensive disk arrays. Data that is rarely touched is stored on tape. The true physical location of the data is transparent to the user. This setup works like traditional HSM, in which the file name of a migrated file remains on the magnetic storage with a pointer referencing the true location of the data. As far as the user is concerned, the data is right there on the disk; only its access time is a little slow. High-availability hierarchical NAS management does not change a company's backup, restore, or data-mirroring policies; it simply melts away a residual dot-com era belief that throwing money at a storage problem is a sure fix.
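The stub-and-pointer migration described above can be sketched in a few lines. The class and tier names here are hypothetical illustrations, not any vendor's API:

```python
# Hypothetical sketch of HSM-style transparent migration: a migrated file's
# name stays in the primary tier as a stub pointing at the data's true location.

class TieredStore:
    def __init__(self):
        self.disk = {}   # primary (expensive) tier: name -> bytes or stub
        self.tape = {}   # archival tier: location -> bytes

    def write(self, name, data):
        self.disk[name] = data

    def migrate(self, name):
        """Move cold data to tape, leaving only a pointer on disk."""
        loc = f"tape:{name}"
        self.tape[loc] = self.disk[name]
        self.disk[name] = ("STUB", loc)   # stub keeps the file name visible

    def read(self, name):
        entry = self.disk[name]
        if isinstance(entry, tuple) and entry[0] == "STUB":
            return self.tape[entry[1]]    # transparent (but slower) recall
        return entry

store = TieredStore()
store.write("xray-1999.img", b"scan data")
store.migrate("xray-1999.img")
print(store.read("xray-1999.img"))  # data still appears to be "on disk"
```

To the caller, `read` behaves identically before and after migration; only latency changes, which is exactly the user experience the text describes.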

But alas, the key word in the term “storage business” is still “business.” How is a storage buyer supposed to know he or she has the option of cheaper storage systems next to the expensive ones if his or her sales representative never reveals this information? Times may be getting tougher for storage managers, thanks to the march of legal regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) of 1996, and the expected fallout from the recent document-shredder scandals. The fear that some legal audit trail may lead to archived data that has degraded on a 4-year-old tape cartridge takes us back to the idea of high-availability hierarchical NAS management, with renewed emphasis on the basic question: What data within an organization can ever again be considered “less important”?

Inexpensively archived data can be corrupted by as much as 20 percent over a long period; therein lies the hidden danger of high-availability hierarchical NAS management. If you’re being sued for malpractice, and you retrieve an old X-ray file that suddenly shows a dark spot on the lung, that’s big trouble.

Next, let’s very briefly look at high-availability NAS standards design issues. Why briefly, you ask? Because there are no standards.


Lack of NAS Standards

Storage managers cried out for interoperability, and behold, the high-availability NAS management software platform was born! The concept of high-availability NAS management heterogeneity is gaining traction in the vendor community courtesy of companies such as EMC, TrueSAN, and Sun Microsystems. But some unfinished business remains that threatens to trap you in more dark proprietary corners. If you survey the current crop of high-availability NAS management software platforms, it becomes clear that the unfinished business is a real commitment to standards, but more on that in a moment.

First up is AutoIS, EMC’s management effort, which includes a set of middleware applications called WideSky that are designed to manage both EMC and competing storage hardware. Theoretically free from platform bias, WideSky supports an impressive list of OSs and storage hardware, although Fujitsu is curiously not included, and you can only manage IBM and StorageTek tape drives.

Next, TrueSAN recently revealed it had celestial inspiration in launching Cloudbreak, arguably one of the first storage operating systems to tie storage virtualization, network management, device management, and resource management together in one platform.

Then there’s Sun Microsystems, which is trying to push its StorEdge suite, part of the storage Open Net Environment (ONE) architecture, as a solution for managing high-availability multiplatform NAS environments. The trouble is, by Sun’s own admission, these solutions are optimized for the Solaris platform and servers.

The fundamental technical problem confronting all three of these options is significant. For an application to access data on high-availability multiplatform NAS, it must cross several software and hardware layers, each from a different vendor with its own proprietary and generally uncooperative management interface. Consequently, changing how data is accessed or how high-availability NAS performs normally requires configuring each layer separately, which is more than an annoyance for vendors trying to automate high-availability NAS management. So, while single-vendor solutions such as EMC’s Symmetrix, XIOtech’s MAGNITUDE, and Compaq’s (now HP’s) StorageWorks offer a good set of proprietary high-availability NAS management solutions, crossing vendor borders (as WideSky, Cloudbreak, and Storage ONE attempt to do) means entering uncharted territory. Without open standards governing the interoperability of the various hardware and software layers in high-availability NAS systems, even multiplatform high-availability NAS management solutions remain essentially proprietary. In other words, a lack of connectivity standards continues to leave vendors free to implement overlapping and mostly self-centered solutions. It’s time for a real multivendor initiative that does not attempt to sell one vendor’s proprietary dream.

Now, let’s discuss how high-availability NAS has brought networking technology to the data storage environment. In other words, let’s take a look at how high-availability NAS provides better availability and reliability, performance, and more configuration flexibility.


Architectural Design Considerations

The growing market for high-availability NAS is a result of the exploding demand for storage capacity in an increasingly Internet-dependent world with a tight labor market. High-availability NAS is a proven approach to networking storage. Technically, the inclusion of a file system in the storage subsystem differentiates NAS, which has one, from SAN, which doesn’t. In practice, however, it is often high-availability NAS’s close association with Ethernet network hardware, and SAN’s with Fibre Channel network hardware, that has the greater effect on a user’s purchasing decisions. This part of the chapter focuses on emerging technology in high-availability NAS architecture, keeping in mind that such technology may blur the network-centric distinction between NAS and SAN. For example, the decreasing specialization of SAN protocols promises SAN-like devices on Ethernet network hardware. Alternatively, the increasing specialization of high-availability NAS systems may embed much of the file system into storage devices. For users, it is increasingly worthwhile to investigate the high-availability NAS storage core and emerging architectural technologies.

High-Availability NAS Solution Architecture

A high-availability NAS architecture consists of the following components:

• Availability—Filer cluster

• Scalability—Network storage

• Flexibility and manageability—Wide sharing and local sharing deployments

• System availability

• Data availability

A high-availability NAS solution incorporates a network-layered architecture with a high-availability NAS architecture. The integrated solution enables reliable, high-performance, universal access to shared, consolidated information across a medium to a large campus-type network. The combined architecture provides the following benefits:

• Availability—Redundant hardware components in combination with multiple distributed devices contribute to a highly available NAS infrastructure and data storage solution. The failure of any single hardware device or network link will not prevent access to data storage. Software features in both the network and storage layers provide for automatic failover and continuous data access.

• Scalability—The modularity of the architecture allows for more accuracy in capacity planning at each layer. The solution is scalable at the different layers, permitting the enterprise to scale not only bandwidth, but also the number of users and storage utilization. Load balancing can be carried out between different redundant devices and paths, efficiently managing traffic and optimizing link utilization.

• Flexibility and manageability—Simple but deterministic Layer 2 and Layer 3 paths make it easier to manage the network and deploy storage. Optional redundancy at each layer can be provided without breaking or disrupting the entire network. The flexibility of the architecture allows the addition of storage appliances to the network infrastructure based on the solution or application that the storage will support. Typical deployments include enterprise-wide sharing of data among numerous campus clients or local sharing between data center application servers and local storage appliances.


Availability: Filer Cluster

Two filers can be configured as a cluster to provide increased protection against hardware failures. Clustered filers are connected through an interconnect adapter and cables and are configured so that both filers share access to the same set of Fibre Channel disks and networks. Each filer uses the cluster interconnect to continually monitor the availability of the partner filer. The interconnect is also used to mirror each filer’s nonvolatile RAM (NVRAM) log data and to synchronize the time of the clustered partners. Fibre Channel loops connect each filer to its own disks, and a separate Fibre Channel loop provides a connection to its partner’s disks (see Figure 2.5).

NVRAM is a type of computer memory that retains data in the event of a loss of power. In filers, NVRAM is used for logging incoming write data and requests.

Figure 2.5: High-availability NAS solution architecture.



Each filer has primary responsibility for a subset of the disks and both can operate independently. The cluster architecture is an active/active configuration. During normal operation both filers are operating and serving data from their individual disk arrays. If a system failure occurs on one filer in a cluster, the partner filer will perform a takeover of the failed filer functions and provide client access to the data on the failed filer’s disk arrays. The partner filer maintains its own network identity and its own primary functions but also assumes the network identity of the failed machine and handles the added functionality through a virtual filer. Redundant disk, fan, or power supply failures are handled independently in the same manner as with a standalone filer; these failures do not trigger failovers. In addition to an automatic takeover, a manual takeover can be forced at any time. This ability lets you complete scheduled filer maintenance tasks without interrupting data services.
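The active/active takeover behavior described above can be sketched as follows. The class names and the heartbeat mechanism are simplified assumptions for illustration, not Data ONTAP's actual implementation:

```python
# Minimal sketch of active/active cluster takeover: each filer serves its own
# disks; on partner failure it also assumes the partner's network identity
# (a "virtual filer") and serves the partner's disk arrays.

class Filer:
    def __init__(self, name, disks):
        self.name = name
        self.disks = disks          # this filer's own disk array
        self.identities = {name}    # network identities currently served
        self.partner = None
        self.alive = True

    def heartbeat_check(self):
        """Poll the partner over the cluster interconnect; take over if down."""
        if self.partner and not self.partner.alive:
            self.takeover()

    def takeover(self):
        self.identities.add(self.partner.name)       # virtual filer identity
        self.disks = self.disks | self.partner.disks  # serve partner's disks

a = Filer("filer-a", {"vol_a"})
b = Filer("filer-b", {"vol_b"})
a.partner, b.partner = b, a

b.alive = False       # simulated system failure on filer-b
a.heartbeat_check()
print(sorted(a.identities))  # ['filer-a', 'filer-b']
```

Note that `filer-a` keeps its own identity and workload throughout; the takeover is additive, which is why clients of the surviving filer see no interruption.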

Clustered partners protect against a failure of a filer system unit, not a single network interface failure. Network connection problems are better handled at the interface level.

Network Appliance’s Data ONTAP software allows filer NICs to be configured in multihomed configurations, with an active interface that fails over to a standby interface whenever a loss of network link occurs. Multiple interfaces can also be bundled to form an EtherChannel. In this configuration, all interfaces are active, and the switch controls link failover and load balancing.

Scalability: Network Storage

High-availability NAS appliance storage provides a very flexible configuration for building a scalable storage network. Filers connect directly to the network and are accessed by using the industry-standard network file system protocols: NFS and CIFS. By separating the data storage and file system operations from clients and application servers, filers let the high-availability NAS farm transparently scale in capacity and file system processing power.

When more storage is needed, additional disks can be added to one or multiple filers. There is no need to reconfigure application server hardware or pre-allocate storage for a specific client OS or file system type. Data is immediately available to all network clients and servers. In most cases, this storage can be added on the fly without disrupting applications. A single filer can scale to multiple terabytes. Multiple filers can be used to scale storage into very large storage farms.

A filer provides a complete file store and handles all the necessary processing and management of data. File system operations, RAID protection, quota management, and multiprotocol access are all handled by the storage appliance. Application servers and clients can devote their resources to running applications. When additional application servers are needed, they can be transparently added. If more file system operations and storage processing power are needed, additional filer processor units can be added to the storage farm. A storage farm architecture can be scaled almost without limit for most applications and networks.


Flexibility: Wide Sharing and Local Sharing Deployments

The flexibility of a high-availability NAS allows for deployments in a variety of different applications and network configurations. Enterprise, or wide area, sharing enables information stored on a filer to be accessible from numerous locations and clients. Different types of clients and applications can all share data stored in home directories and project directories. The client machines communicate directly with the filer. The client desktop machines can be in the same building or on opposite sides of the world from the filer (see Figure 2.5).

The high-reliability and data management features of filers make them suitable storage solutions for mission-critical server applications such as database, email, and Web servers. In a local sharing environment, the filer and application server are located in the same data center and communicate on a dedicated local server farm network. Clients accessing the application server use separate wide-sharing network connections to send their requests to the application server. All filer-to-server communications are carried out over the dedicated storage network. The dedicated network connections between the application servers and the filers provide high reliability and high performance.

Now, let’s shift the discussion to system availability. As you know, information is a critical business asset, and customers today require continuous availability of data. Enterprise storage solutions must furnish a high degree of protection for corporate data, provide near-continuous data access, and incorporate procedures to correct problems with minimal business impact. Most vendors today focus only on system availability. Although system availability is extremely important, this focus doesn’t address data availability emergencies, such as

• Data corruption occurring within an application

• A UNIX user simultaneously editing a file that is being read or written by a Windows user, causing file corruption or a crashed Windows application

• Major software upgrades failing or corrupting data

• Critical files being accidentally deleted or incorrectly modified

• Natural or man-made disasters

The normal recovery mechanism for such emergencies is to recover a previous instance of the data set by reloading from tape. This method means that data is unavailable during the recovery period—an unacceptable situation in today’s business environment. Thus, this part of the chapter discusses both system and data availability.


System Availability

Availability is typically measured as a percentage of total uptime over the course of a year. For example, a 99.99 percent availability requirement translates into 53 minutes of downtime per year, whereas a 99.9 percent availability requirement means 8.8 hours of downtime per year, as Table 2.3 shows. Downtime includes both planned maintenance and unscheduled outages.

Availability Classification                     Level of Availability (%)   Annual Downtime
Continuous Processing                           100                         0 minutes
Fault Tolerant                                  99.999                      5 minutes
Fault Resilient                                 99.99                       53 minutes
High Availability                               99.9                        8.8 hours
Normal Commercial Availability (single node)    99 to 99.5                  87.6 to 43.8 hours

Table 2.3: System availability classifications.
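The downtime figures in Table 2.3 follow directly from the availability percentages; a quick check:

```python
# Worked check of Table 2.3: annual downtime implied by an availability level.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def annual_downtime_minutes(availability_pct):
    """Minutes per year the system may be down at a given availability."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR

for pct in (99.999, 99.99, 99.9, 99.0):
    print(f"{pct}% -> {annual_downtime_minutes(pct):,.1f} min/yr")
```

Running this reproduces the table: 99.999 percent allows about 5 minutes, 99.99 percent about 53 minutes, 99.9 percent about 8.8 hours, and 99 percent about 87.6 hours per year.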

By the preceding standards, the high-availability NAS appliance enterprise storage architecture provides a fault-resilient level of system availability, as demonstrated by average system availability statistics measured across an installed base of more than 10,000 storage appliances (an average of 99.99 percent, as Figure 2.6 shows).

Figure 2.6: Average system availability measured across a high-availability NAS appliance.



To better understand how a high-availability NAS appliance achieves such high average system availability, it is important to understand the major causes of system failures. IT managers indicate the major causes of system failure are, in order of frequency:

• Software defects/failures

• Planned administrative downtime

• Operator error

• Hardware outage/maintenance

• Building/site disaster

• Metropolitan disaster

A high-availability NAS appliance effectively achieves fault-resilient system availability by excelling in each of the areas previously listed. The appliance approach improves reliability because it performs a single function very well. A general purpose computer has many features and applications that make it impossible to test all possible usage patterns. High-availability NAS appliances can be tested much more thoroughly because they do only one thing.

The high-availability NAS appliance architecture is driven by a robust, tightly coupled, multitasking, real-time microkernel. This pre-tuned compact kernel minimizes complexity and improves reliability. For example, Data ONTAP software is less than 2 percent of the total size of general purpose operating systems.

The high-availability NAS appliance approach also helps improve overall application availability: file system operations that normally run on general purpose application file servers are offloaded to the appliance, improving application server availability. This differentiation is clear when the approach is compared with conventional storage subsystems, in which the odds of application server downtime increase as a result of the 100 percent dependency on the application server’s OS and file system software for all I/O operations. This setup contrasts significantly with high-availability NAS appliance deployment options, which allow for multiple application servers, such that the failure of any one of those application servers does not preclude the other application servers from accessing the data. This benefit isn’t even captured in the appliance’s fault-resilient availability figures.

High-availability NAS appliances utilize proven high-volume, industry-standard hardware components, which help drive high hardware reliability. The most common components to fail are disk drives followed by power supplies and fans. High-availability NAS appliances utilize redundant disks (RAID), power supplies, and fans for system units and shelves to protect customers against these common component failures.

For example, the Data ONTAP kernel utilizes the robust Write Anywhere File Layout (WAFL) file system. WAFL and RAID were designed together to avoid the performance problems that most file systems experience with RAID and to ensure the highest level of reliability. RAID is integrated into the WAFL file system to eliminate operator errors, OS and application software release mismatches, patch level mismatches, and so on.


High-availability NAS appliances use RAID-4 parity protection for all data stored in the disk subsystem. In the event that any disk drive fails, the data on the failed drive is reconstructed to a global hot spare disk drive. While reconstruction occurs, requests for data from the failed disk are served by reconstructing the data on the fly with no interruption in file service. RAID-4 provides the benefit of dynamic file system and RAID group expansion with just a single command.
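The single-parity reconstruction that RAID-4 relies on can be illustrated with XOR arithmetic. This is a generic sketch of the technique, not filer code:

```python
# RAID-4 keeps a dedicated parity drive holding the XOR of the data drives,
# so any single failed drive can be rebuilt by XOR-ing all the survivors.

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data_drives = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_blocks(data_drives)            # written to the parity drive

# Drive 1 fails; reconstruct it from the remaining data drives plus parity.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_drives[1])  # True
```

The same arithmetic explains on-the-fly service during reconstruction: a read of the failed drive is answered by XOR-ing the surviving members of the group.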

Some vendors claim that running RAID-4, RAID-5, and RAID-S is unsafe because a double-disk failure within a single RAID group will cause data loss. This claim is usually made for one of two reasons: the vendor’s RAID-5 or RAID-S performance is significantly slower than high-availability NAS appliance RAID-4 performance, or the vendor wants to sell RAID 0+1 (striping plus mirroring), which requires twice the number of disks. Although it is true that a double-disk failure within a single RAID group will cause data loss, one must look at the probability of such an event happening.

Let’s examine four cases using 84 18GB disks (1.512TB raw capacity) connected to Network Appliance’s high-availability NAS high-end F760 storage appliance (see Table 2.4):

• F760, 84 18GB disks, 1 volume, twelve 7-drive RAID groups

• F760, 84 18GB disks, 1 volume, six 14-drive RAID groups

• F760, 84 18GB disks, 3 volumes, four 7-drive RAID groups per volume

• F760, 84 18GB disks, 3 volumes, two 14-drive RAID groups per volume

Number of    Number of RAID       Number of Disks    MTTDL in Years    MTTDL in Years for
Volumes      Groups per Volume    per RAID Group     per Volume        Entire Storage Appliance
1            12                   7                  348,460           348,460
1            6                    14                 99,560            99,560
3            4                    7                  1,045,378         348,460
3            2                    14                 298,679           99,560

Table 2.4: MTTDL for four high-availability NAS appliance setups.
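The qualitative shape of Table 2.4 can be reproduced with a common first-order MTTDL model for single-parity RAID. The drive MTTF and rebuild-time (MTTR) values below are illustrative assumptions, not the inputs Network Appliance used, so the absolute numbers will not match the table, but the ordering does:

```python
# First-order model: a group loses data when a second disk fails during a
# rebuild, giving MTTDL(group) = MTTF^2 / (n * (n-1) * MTTR); any group losing
# two disks loses data, so the appliance MTTDL divides by the group count.

HOURS_PER_YEAR = 24 * 365

def mttdl_group_years(n_disks, mttf_hours=500_000, mttr_hours=12):
    """MTTDL for one single-parity RAID group, in years (assumed inputs)."""
    return mttf_hours ** 2 / (n_disks * (n_disks - 1) * mttr_hours) / HOURS_PER_YEAR

def mttdl_appliance_years(n_groups, n_disks):
    """MTTDL for an appliance built from n_groups identical groups."""
    return mttdl_group_years(n_disks) / n_groups

# Same shape as Table 2.4: twelve 7-drive groups outlast six 14-drive groups,
# even though both configurations hold the same 84 disks.
print(mttdl_appliance_years(12, 7) > mttdl_appliance_years(6, 14))  # True
```

The model also explains why splitting 84 disks into three volumes raises the per-volume MTTDL by a factor of three while leaving the whole-appliance MTTDL unchanged: only the number of groups per unit changes.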

Based on these results, one can see that a double-disk failure is extremely unlikely: with RAID protection, the chance of a double-disk failure in which data might be lost is measured in tens of thousands of years. After months or years of use, a few blocks on a disk will go bad, so a few media read errors on disk blocks over time are normal for a disk.


For example, the Data ONTAP kernel will retry reading a disk if there is a media error. If the read error persists, ONTAP will recalculate the data by reading the other disks in the RAID group, remap the bad block to another area of the disk, and store the correct data in the remapped block. As files are read and re-read over time, ONTAP automatically remaps blocks as necessary. However, some files are not read for months at a time or are never re-read, so the areas of the disk in which these files reside may never be accessed. ONTAP has a special feature, RAID scrubbing, which forces a read on every disk block; even if a user never reads a given file, RAID scrubbing ensures that its blocks are read. If a media error is detected, the block is remapped. This behavior avoids the situation in which a disk block could go bad over a period of months or years and never be repaired.
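The scrub-and-remap cycle can be sketched as follows. The `Disk` class and its error handling are hypothetical simplifications of the behavior described above:

```python
# Sketch of RAID scrubbing: force a read of every block; on a media error,
# recompute the block (in a real filer, from the rest of the RAID group)
# and rewrite it, clearing the latent error before a second failure occurs.

class Disk:
    def __init__(self, nblocks):
        self.blocks = {i: f"data{i}" for i in range(nblocks)}
        self.bad = set()   # blocks that currently return media errors

    def read(self, i):
        if i in self.bad:
            raise IOError(f"media error at block {i}")
        return self.blocks[i]

def scrub(disk, recompute):
    """Read every block; repair and remap any block with a media error."""
    repaired = []
    for i in list(disk.blocks):
        try:
            disk.read(i)
        except IOError:
            disk.blocks[i] = recompute(i)   # data rebuilt from parity
            disk.bad.discard(i)             # block remapped; error cleared
            repaired.append(i)
    return repaired

d = Disk(8)
d.bad = {3, 6}                              # latent errors in never-read files
print(scrub(d, lambda i: f"data{i}"))       # [3, 6]
```

The point of the sketch is the systematic sweep: blocks 3 and 6 would never have been repaired by normal reads, but the scrub finds and fixes them anyway.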

To protect against this scenario, high-availability NAS appliances routinely verify all data stored in the file system using RAID scrubbing. By default, this verification occurs once per week, early on Sunday morning, although you can reschedule it or suppress this behavior altogether. During this process, all data blocks are read in parallel. If a media error is encountered, the bad block is recomputed and the data is rewritten to a spare block. The WAFL file system uses NVRAM to keep a log of NFS requests it has processed since the last consistency point. If a power failure were to occur, it would force an unclean shutdown.

NVRAM is special memory with batteries that allow it to store data even when system power is off.

When the storage appliance reboots following a power failure, it finds the most recent consistent state on the disks and replays the outstanding requests from the NVRAM log. The file system can safely ignore any writes that were in progress when the storage appliance lost power because it knows the written blocks are unallocated in the last consistent image.
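The consistency-point-plus-log recovery scheme can be sketched like this (a toy model, not WAFL's actual on-disk format):

```python
# Sketch of log-based crash recovery: writes since the last consistency point
# live only in an NVRAM log; after a crash, recovery starts from the
# consistent on-disk image and replays the surviving log entries.

class LoggedFS:
    def __init__(self):
        self.disk = {}        # last consistent on-disk image
        self.nvram_log = []   # requests since the last consistency point

    def write(self, key, value):
        self.nvram_log.append((key, value))  # logged before acknowledgment

    def consistency_point(self):
        """Commit all logged writes to the consistent on-disk image."""
        for key, value in self.nvram_log:
            self.disk[key] = value
        self.nvram_log.clear()

    def recover_after_power_failure(self):
        # NVRAM survives the outage, so no acknowledged write is lost:
        # replaying the log is equivalent to taking the missed consistency point.
        self.consistency_point()

f = LoggedFS()
f.write("a", 1)
f.consistency_point()
f.write("b", 2)                    # power fails before the next consistency point
f.recover_after_power_failure()
print(f.disk)  # {'a': 1, 'b': 2}
```

Because the disk image is always a complete, consistent state, recovery needs no file system check; it only replays a short log, which is why reboot is fast.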

The high-availability NAS appliance hardware provides several features to enhance system and data availability. For example, it has a watchdog timer to detect certain software failures, environmental monitoring, low mean time to repair, and redundancy in failure-prone components such as main memory (ECC), disks (RAID), power supplies, and fans.

The robust Data ONTAP software is based on a simple, message-passing kernel that has fewer failure modes than general purpose OSs. These features combine to demonstrate average system availability greater than 99.99 percent.

Although the measured availability on average is fault resilient, the storage appliance does not tolerate failures of main system components, such as the system board. To eliminate a single point of failure, high-availability NAS appliances offer cluster failover as an option. Cluster failover provides hardware redundancy without adding complexity.

Data availability is also affected by planned downtime. Planned downtime typically occurs at predetermined times, such as once a month or quarter. With conventional file servers and storage subsystems, time must be planned for activities that include backup, software maintenance, hardware maintenance, application/database upgrades, OS upgrades, and so on. The storage appliance architecture minimizes, and almost completely eliminates, the need for planned downtime. Let’s walk through some causes of planned downtime and examine how the high-availability NAS appliance enterprise storage architecture addresses each.


First, the average annual aggregate user productivity loss due to disk grooming (for the purpose of increasing file system size on server storage) is 3129 hours. Conventional storage subsystems require complex reconfiguration or scheduled downtime to accomplish file system and RAID group expansion. The high-availability NAS appliance addresses this requirement by allowing systems administrators to add disk storage and dynamically expand file systems and RAID groups with a single command and with zero downtime. Also, logical partitions and shares within file systems can be dynamically expanded with zero downtime.

Second, data management tasks, such as the spreading of data across spindles to get better performance, are automatically managed by the high-availability NAS appliance. It can be thought of as a self-tuning automobile that requires minimal intervention, if any.

Third, OS upgrades normally require hours of downtime. With a high-availability NAS appliance, upgrades are installed while the system is operational and serving data. A simple reboot of the storage appliance is then scheduled at your preferred time, and the reboot takes only about 90 seconds.

Finally, in many cases, data must be taken offline to ensure a safe, consistent backup, or a particular application must be put into hot backup mode, which affects overall system performance while the backup is occurring. With a high-availability NAS appliance, a safe, consistent backup can be achieved from a snapshot copy of the file system with zero downtime (snapshots are discussed further later in the chapter). Specific applications, such as an RDBMS, can be put in hot backup mode for a few seconds while the snapshot copy is taken to ensure 100 percent data availability. Many competitive offerings require that the data be replicated prior to being backed up.

Operator error often translates into unplanned downtime, which is more disruptive than planned downtime. The high-availability NAS appliance enterprise storage architecture greatly minimizes the chance of operator error given that there are simply few tasks that ever need to be done. For example, RAID is built into the file system, so no setup or ongoing configuration is required. No data management involvement, other than expanding file systems, need be done. User authentication is done via an NT Primary Domain Controller (PDC) or NIS server. For those few administrative tasks, an easy-to-use Web-based interface is provided. For Windows administrators, the storage appliance takes advantage of NT User and Server Manager.

Data Availability

The high-availability NAS appliance enterprise storage architecture is designed to provide a level of data availability never before seen in the industry. Most vendors focus only on system availability, which is not satisfactory given today’s business environment. Increased data availability translates into higher revenue, profit, and productivity. Historically, organizations have paid significant premiums for hardware and software to obtain high system availability, which often does not produce increased data availability.


The Snapshot Feature: Instantaneous File Recovery

Accidental deletion of a critical file usually results in a user calling an IT Help Desk and requesting that a systems administrator restore the file from tape. This process results in productivity loss for the user until the file is restored, and it requires a systems administrator to spend valuable time going to the data center, loading a tape, and retrieving the file from tape. This situation is common. A recent study by Strategic Research (http://www.srinstitute.com/home.cfm) indicates that the average site restores files from tape 144 times per year. The snapshot technology enables users to instantaneously recover accidentally deleted files without having to call an IT Help Desk (see Figure 2.7). This ability results in productivity gains for the user and less strain on already stressed IT staff.

Figure 2.7: Example of file recovery utilizing snapshot technology.
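A minimal sketch of the idea, assuming snapshots are simply frozen copies of the namespace (real snapshots share unchanged blocks rather than copying them):

```python
# User-driven snapshot recovery: a snapshot is a read-only, point-in-time
# view of the file system; a deleted file can be copied back from it
# without involving an administrator or a tape.
import copy

class Volume:
    def __init__(self):
        self.files = {}
        self.snapshots = {}   # name -> frozen copy of the namespace

    def snapshot(self, name):
        self.snapshots[name] = copy.deepcopy(self.files)

    def restore_file(self, snap, path):
        """Copy one file back from a snapshot into the live file system."""
        self.files[path] = self.snapshots[snap][path]

vol = Volume()
vol.files["report.doc"] = "v1"
vol.snapshot("hourly.0")
del vol.files["report.doc"]              # accidental deletion
vol.restore_file("hourly.0", "report.doc")
print(vol.files["report.doc"])  # v1
```

On a real filer the user would simply copy the file out of a `.snapshot` directory; the sketch shows why no Help Desk call is needed: the old version is still online.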



Instantaneous File System Recovery

Time to recovery has become an important measurement in many IT organizations. Some dire situations, such as the discovery of corruption in a database, may require the full restoration of a previously saved state, leaving data unavailable for an extended period of time. According to Strategic Research, the average site does two full file system restorations per year. Instantaneous file system recovery software lets a file system revert to a previous point in time, providing a level of data availability never before seen.

An instantaneous file system recovery feature allows a file system to be frozen in time. The software allows a file system to revert to the state and contents of a previous snapshot. The systems administrator may select any of up to 20 existing snapshot copies to revert the file system to. For example, if data corruption within a database is discovered, normal recovery mechanisms require restoring the damaged portion of the database from tape. Instantaneous file system recovery software eliminates this time-consuming step by enabling the IT administrator to quickly revert the file system to a previous state in which the database was consistent. The log files are then replayed, and users are again accessing data (see Figure 2.8). Time to recovery is now the 3 minutes needed to revert the file system plus the log replay time. Compare this with the time required to reload the entire damaged portion of the database from tape, which could take a day or longer.

Figure 2.8: How instantaneous file system recovery software works.
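A sketch of the revert operation under the same frozen-copy assumption (a real implementation such as SnapRestore swaps block maps rather than copying data, which is why it takes minutes regardless of volume size):

```python
# Whole-file-system revert: roll the live namespace back to a prior
# snapshot in one step, then (in a database scenario) replay redo logs.

class FileSystem:
    def __init__(self):
        self.files = {}
        self.snapshots = {}   # name -> frozen point-in-time image

    def snapshot(self, name):
        self.snapshots[name] = dict(self.files)

    def revert(self, name):
        """Revert the entire file system to a prior snapshot in one step."""
        self.files = dict(self.snapshots[name])

fs = FileSystem()
fs.files = {"db.dat": "consistent", "redo.log": "entries"}
fs.snapshot("nightly.0")
fs.files["db.dat"] = "corrupt"     # failed upgrade corrupts the database
fs.revert("nightly.0")             # minutes, versus a day-long tape reload
print(fs.files["db.dat"])  # consistent
```

Note that `revert` replaces the whole namespace at once; that is the difference from the per-file recovery shown earlier, and it is what makes the operation suitable for database-wide corruption.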

Many high-availability NAS appliance customers store application binaries on storage appliances so that upgrades and patch updates are done in one place, as opposed to installing the binaries on each individual application server and upgrading and patching each one separately. With instantaneous file system recovery software, IT organizations now see an even greater benefit in utilizing high-availability NAS appliances.

For example, let’s assume an IT organization is doing a major application software upgrade and something goes wrong. Upon completing the upgrade, the data conversion does not work, causing data corruption. With high-availability NAS software, such as Network Appliance’s SnapRestore, the previous environment can quickly be restored within 3 minutes without having to reinstall the previous release of software and data from tape. This level of data availability translates into higher revenue, profit, and productivity.

Page 78: High-Availability Network Attached Storage

Chapter 2

Cost-Effective Automated File System Replication

Most IT organizations have business continuance teams in place to help plan for disasters such as floods, fires, and earthquakes. Most IT organizations today archive data to tape and send the tapes to an offsite location. However, time to recovery in the case of a disaster is measured in days, which is unacceptable in a world in which data must be accessible 7 × 24 × 365. Thus, many companies are planning for and deploying real-time data replication technology.

Instantaneous file system recovery software from Cisco and Network Appliance leverages the WAFL snapshot capability to provide an automated file system replication facility. Using instantaneous file system recovery technology, a storage appliance can replicate one or more file systems to a partner storage appliance, keeping the target file system synchronized with snapshot copies that are created automatically on the source file system. The target of an instantaneous file system recovery replication scenario can be located almost any distance from the source—it can be in the same building as the source storage appliance or on the other side of the world. Instantaneous file system recovery software offers two key advantages over conventional replication products.

First, because instantaneous file system recovery software leverages the WAFL file system design, only changed 4KB blocks are sent from the source to the destination. This approach has significantly less overhead than offerings that must replicate an entire disk track because they have no knowledge of the file system.
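The block-level delta idea can be sketched as follows. The 4KB granularity matches the WAFL design described above, but the comparison loop is a simplified illustration, not the actual replication engine.

```python
# Simplified sketch of block-level replication: compare two point-in-time
# copies at 4KB granularity and ship only the blocks that differ.
# Illustrative only -- not the actual replication engine.

BLOCK = 4096  # 4KB, as in the WAFL design described above

def split_blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def changed_blocks(old: bytes, new: bytes):
    """Return the (index, block) pairs that must be sent to the target."""
    old_b = split_blocks(old)
    delta = []
    for i, blk in enumerate(split_blocks(new)):
        if i >= len(old_b) or old_b[i] != blk:
            delta.append((i, blk))
    return delta

old = b"A" * BLOCK * 3                   # three identical blocks
new = b"A" * BLOCK * 2 + b"B" * BLOCK    # only the last block changed
print(len(changed_blocks(old, new)))     # -> 1 (one block sent, not three)
```

Shipping one changed block instead of the whole file (or an entire disk track) is what keeps WAN replication traffic proportional to the rate of change rather than to total capacity.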

Second, there is no impact on performance. Write operations reaching the source appliance are acknowledged immediately, providing sub-10ms response time to end users, as Figure 2.9 illustrates.

Figure 2.9: Immediate write application acknowledgement.

Because time to recovery has become such an important measurement, many IT organizations are looking at different application data sets within their enterprise and determining the minimum time-to-recovery requirements for each data set. For example, lack of access to ERP data at the end of a quarter could easily cost a company $200,000 or more per hour in revenue, whereas the impact of an individual being unable to reach a non-critical file in his or her home directory would be minimal. Another example is an e-commerce site that ties company revenue to online transactions. Lack of data access for online transactions could result in millions of dollars of lost revenue per hour; in contrast, losing access to the individual customer buying-habit data stored in a data warehouse for marketing research would not.
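This kind of triage reduces to simple arithmetic: rank data sets by hourly downtime cost and replicate the expensive ones. In the sketch below, the dollar figures echo the examples above, and the threshold is an arbitrary value chosen for illustration.

```python
# Illustration: rank data sets by hourly downtime cost to decide which
# justify real-time replication. Dollar figures echo the examples in the
# text; the threshold is an arbitrary value chosen for the sketch.

hourly_downtime_cost = {
    "e-commerce transactions":  2_000_000,
    "ERP (quarter close)":        200_000,
    "marketing data warehouse":     1_000,
    "user home directories":          100,
}

REPLICATION_THRESHOLD = 50_000   # replicate if downtime costs more than this

replicate = sorted(
    (name for name, cost in hourly_downtime_cost.items()
     if cost > REPLICATION_THRESHOLD),
    key=lambda name: -hourly_downtime_cost[name],
)
print(replicate)   # -> ['e-commerce transactions', 'ERP (quarter close)']
```

The cheaper data sets still get protected, but by conventional tape backup rather than by consuming replication bandwidth and a second appliance's capacity.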

Many IT organizations are choosing to deploy instantaneous file system recovery technology to automatically replicate their most critical data and keep it online. Figure 2.10 illustrates a data replication solution that uses instantaneous file system recovery software to mirror the most critical data from one high-availability NAS appliance to a second storage appliance.

Figure 2.10: Network Appliance’s Filer using the Setup Wizard for high-availability NAS services.

Finally, in keeping with the theme of designing high-availability NAS solutions, let’s look at a couple of product-specific case studies.

Case Studies

The first case study concerns a multi-server load-balanced configuration, the MetaFrame for Windows application server from Citrix Systems. This application server provides a highly available and scalable program execution environment for Microsoft Windows-based applications under Windows NT Terminal Server Edition (WTS), Win2K, and Windows XP. This case study also presents Tricord Systems' Lunar Flare NAS with patented Illumina clustering software as the common storage system behind a Citrix MetaFrame for Windows application server farm. A highly available and scalable storage system extends the high-availability features of the MetaFrame application server farm to the common storage repository for user data. Tricord's Lunar Flare high-availability NAS solution features redundancy, high availability, scalability, load balancing, automatic client session failover, common administration, and a low initial investment.

The second case study is a brief look at Compaq’s (now part of Hewlett-Packard) StorageWorks NAS Executor E7000. This discussion provides best practices for deploying antivirus software packages on the NAS Executor E7000. In addition, it provides brief descriptions of four antivirus software packages qualified by Compaq for the StorageWorks high-availability NAS Executor E7000.

Case Study 1

The emphasis of this case study is to demonstrate how Lunar Flare NAS with Illumina clustering software from Tricord Systems complements a load-balanced Citrix MetaFrame application server farm as the common storage system behind the server farm.

Description of Systems

To better understand the complementary features of each product and demonstrate how Lunar Flare NAS enhances the solution, a brief description of a basic implementation of both MetaFrame and Lunar Flare NAS, as well as of the combined implementation, is necessary. As with any IT solution, there are numerous methods of implementation, and this case study seeks only to provide an understanding of the basic concepts of the solution.

Citrix MetaFrame

MetaFrame takes advantage of the multi-user kernel installed when WTS, Win2K, or Windows XP is configured for application server mode. Multiple users can simultaneously execute programs on the server or servers, and the user interface to those programs, delivered via the Citrix ICA protocol, is presented on the user's workstation or terminal as a familiar Windows application, which by all appearances could be executing locally rather than on the server. Each user is able to customize the program interface and operating characteristics to personal preference, just as if the application were executing on the user's personal workstation. These application settings are stored in the user's profile and home directory and must be available when the user logs onto the system. If multiple load-balanced Citrix servers exist in an environment (the emphasis of this case study), those individual user application configurations must be available regardless of which server the user connects to. The user will be connected to the least busy server based on the configuration of the Citrix load-balancing software, so storage for user profiles, home directories, and data must be available and common to all servers in the load-balanced server farm.
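The placement logic can be pictured as follows. This is a simplified stand-in for Citrix's load-balancing algorithm, whose actual tuning parameters are not described here: connect the user to the least busy server that is up, and rely on common storage for the user's settings. The server names and session counts are hypothetical.

```python
# Simplified stand-in for load-balanced session placement: connect the
# user to the least busy server that is up. Server names and session
# counts are hypothetical; this is not Citrix's actual algorithm.

servers = {
    "farm-01": {"up": True,  "sessions": 42},
    "farm-02": {"up": True,  "sessions": 17},
    "farm-03": {"up": False, "sessions": 0},   # failed or in maintenance
}

def pick_server(servers):
    live = {name: s for name, s in servers.items() if s["up"]}
    return min(live, key=lambda name: live[name]["sessions"])

target = pick_server(servers)
print(target)   # -> farm-02 (up and least busy)
# Because profiles live on common storage, the choice of server does not
# affect which settings and home directory the user sees.
```

The key observation is that the placement decision is only safe because profile and home-directory storage is common to the farm; if settings lived on individual servers, any nontrivial placement policy would break the user experience.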

User familiarity with the Windows interface and applications contributes to making the system usable. Citrix load-balancing software helps make the system available and scalable by connecting users to servers that are both up and least busy. If additional program execution capacity is needed, additional servers can be added to the farm, and user connections are then spread across them. Using the many tools available from Citrix, the entire server farm can be administered as one system. By consolidating the execution environment, efficiency can be gained in areas such as administrative overhead, the application update and upgrade process, the amount of idle program execution resources (such as CPU cycles and memory), and system and data security. These computing resources can then be made available to users on internal LANs and WANs, via dial-up connections, and securely via the Internet using the Citrix Program Neighborhood client and/or the Citrix NFuse application portal.

The key to retaining and seamlessly presenting the individual user environment each time a user connects to a multi-server application farm is the central, common data storage from which all servers can access each user’s application environment. This storage component of the system is vital to maintaining the integrity of the system and Lunar Flare high-availability NAS has the features necessary to play this vital role.

For more information about Citrix MetaFrame, see The Definitive Guide to Citrix MetaFrame (Realtimepublishers.com) at http://www.metaframebook.com.

Lunar Flare NAS

Lunar Flare NAS is a unique storage solution in that individual storage appliances are combined to form an appliance cluster that functions as a single storage system. This configuration is made possible by Tricord's Illumina software, a true distributed file system that distributes all data across all the appliances in the cluster to form a single pool of storage.

Lunar Flare NAS uses two separate network interfaces (also known as channels)—the network channel, which connects each appliance in the cluster to the user or server network, and the back channel, which is the communications channel for Illumina (see Figure 2.11).

Figure 2.11: Lunar Flare NAS architecture.

Because each appliance added to the cluster includes its own processor, memory, network connections, storage, and so on, adding storage capacity does not add constraints to other components of the system. Instead, adding an appliance to the cluster adds to all of the cluster’s resource needs and actually allows performance to increase as capacity is added (see Figure 2.12).

Figure 2.12: Lunar Flare NAS cluster throughput.

Because data is distributed across all appliances in the cluster, users see the same view of the data regardless of which node in the cluster they connect to, making the system very user friendly. No remapping of drives is necessary simply because a connection is made to a different appliance, because drive mappings and UNC paths can be made to the cluster instead of to individual nodes. Files are striped across all appliances in the cluster, and the inherent RAID characteristics of Illumina provide fault tolerance in the case of an appliance failure. In addition, the load-balancing and failover features of the Lunar Flare cluster ensure that connections to the cluster are balanced and efficient; should a node fail, connections to that appliance are automatically and transparently moved to other nodes. A hot spare appliance can also be configured as part of the system to replace any node that has failed, further enhancing the high-availability features of the system. Adding storage capacity to the cluster is as simple as plugging in another appliance, configuring an IP address, and adding the node to the cluster. All data is then automatically redistributed evenly across all nodes using the back channel network. This redistribution is done on the fly and is transparent to users, which means storage capacity can be added without downtime.
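The redistribution idea can be modeled simply: blocks are spread evenly across nodes, and adding a node changes the layout so every existing node carries less. The sketch below is a toy round-robin model, not Illumina's actual placement or RAID logic.

```python
# Toy round-robin model of even data distribution across cluster nodes.
# Illumina's real placement and RAID logic are more sophisticated; this
# only shows why adding a node lowers the load on every existing node.

def distribute(block_ids, nodes):
    layout = {n: [] for n in nodes}
    for i, b in enumerate(block_ids):
        layout[nodes[i % len(nodes)]].append(b)
    return layout

blocks = list(range(12))
before = distribute(blocks, ["node1", "node2", "node3"])
after = distribute(blocks, ["node1", "node2", "node3", "node4"])

print({n: len(b) for n, b in before.items()})  # -> 4 blocks per node
print({n: len(b) for n, b in after.items()})   # -> 3 blocks per node
```

A real system must migrate data between layouts while serving requests, which is why the dedicated back channel matters: migration traffic never competes with client traffic on the user-facing network.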

Storage capacity is not the only thing scalable about this system. With most systems, as more storage capacity is added and more and more connections are made to that system, the network connection to the data can become saturated. With the Lunar Flare cluster, adding storage capacity also adds another network connection path for data flow to users and network nodes, thus eliminating the network as a potential bottleneck in the system. As mentioned earlier, adding appliances also adds CPU capacity to support the feature set of the system, and adds memory used to cache read and write operations to the file system.

After initial appliance addressing and naming using the Windows-based Lunar Flare Configuration Utility, the system is administered through a standard Web browser executing a password-protected Java application. The Lunar Flare Admin Utility can be accessed by pointing the browser to any appliance in the cluster by IP address or by name, to the cluster name as defined in DNS, or to the cluster’s NetBIOS name via WINS. All cluster administrative operations are accessible via this interface, including integration of cluster shares with Microsoft domain security, SNMP configuration, UPS support, system event notification via email, load balancing, failover, and hot spare configuration.

Lunar Flare NAS is a 1U rack-mount or stackable system currently available in two capacities, 135GB and 240GB, with a current maximum cluster size of 3.8TB. The complete feature set is available in a 2-node cluster, which can be scaled as needed by simply adding an appliance. The real value, though, when combined with Citrix MetaFrame, is how the two products complement each other to provide an integrated, highly available, and scalable system.

Integrated MetaFrame and Lunar Flare NAS System

When Lunar Flare NAS is combined with a MetaFrame application server farm, the integrated system takes on the features previously described, but offers more than simply the sum of the two feature lists. The features of Lunar Flare NAS extend the usability, availability, and scalability requirements typically considered when implementing a MetaFrame application server farm. Extending these requirements beyond the application execution environment to the data storage system common to the server farm, as well as to the network connectivity between the two, results in a highly available and completely scalable system in which component failure is transparent or has negligible effect on the user environment.

Figure 2.13 shows an implementation that combines access for both external network/Internet users and internal network users and that mirrors the test environment used to formulate the conclusions drawn in this case study. Two load-balanced servers running Win2K or Windows XP Server with MetaFrame 1.8 and MetaFrame Feature Release 1 (FR1) are connected to a 4-node Lunar Flare NAS cluster (cluster name=TRCDLF) with three appliances active in the cluster and one designated as a hot spare. A single server running Win2K or Windows XP Server, IIS 5.0, and NFuse 1.6 provides the application portal for Web-enabled Windows applications. User profiles and home directories are mapped to hidden shares on the Lunar Flare cluster using UNC paths in the Win2K or Windows XP user properties, with variables in the configuration of a template user that can then be copied for each additional user created in the system. For example, the user profile path would be \\trcdlf\profiles$\%username%; the profile directory for that particular user would then be created automatically, and, in the case of a user home directory, a drive letter could be automatically mapped to the folder created under the share on the Lunar Flare NAS. Internal users could map drives to the Lunar Flare cluster from both their personal workstations and application sessions within MetaFrame.
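The template-user pattern amounts to simple string substitution. In this sketch, the trcdlf cluster name and profiles$ share come from the example above, while the home$ share name is a hypothetical placeholder introduced for illustration.

```python
# Sketch of the template-user pattern: per-user UNC paths are produced by
# substituting the username into a template. The trcdlf cluster name and
# profiles$ share come from the example; home$ is a hypothetical share.

def expand(template: str, username: str) -> str:
    return template.replace("%username%", username)

profile_template = r"\\trcdlf\profiles$\%username%"
home_template = r"\\trcdlf\home$\%username%"     # hypothetical share name

print(expand(profile_template, "jsmith"))   # -> \\trcdlf\profiles$\jsmith
print(expand(home_template, "jsmith"))      # -> \\trcdlf\home$\jsmith
```

Because the paths point at the cluster name rather than at an individual appliance, the same template keeps working no matter which node serves the connection or how many nodes the cluster grows to.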

Figure 2.13: Integrated MetaFrame and Lunar Flare NAS configuration.

Technical Benefits

Lunar Flare NAS provides several technical benefits to the combined system, particularly in the areas of availability and scalability. These benefits are detailed next and, where applicable, illustrated with specific examples from the test environment. They are weighted toward MetaFrame systems that are multi-server in nature, are expected to serve a growing number of users, and are expected to meet the other characteristics of a modern information system as previously described.

High Availability

The availability features of an information system are a cornerstone of its justification, not only for the users but also for those who propose, design, implement, and administer the system. Murphy's Law dictates that the system will go down exactly when the president, CEO, or other decision maker needs access to it most. The load-balanced MetaFrame application server farm, when combined with Lunar Flare NAS, addresses availability as follows.

First, this system is a load-balanced, clustered solution. If one of the servers in the MetaFrame farm fails, users reconnect to the system and their applications simply execute on one of the other servers. Their individual application settings remain available because they were not stored on the failed server. If one of the Lunar Flare NAS appliances fails, the data remains intact due to the RAID characteristics of the Illumina software and file system, and continues to be available to users. There is no single point of failure in either of these subsystems. In the test environment, a MetaFrame server and a Lunar Flare node were shut down at the same time, and users were still able to access and execute applications and save their updated information.

Second, in addition to the data integrity provided by the clustered solution, a user connection to a failed Lunar Flare appliance is automatically and transparently moved to one of the other appliances in the cluster. This functionality was illustrated in the test environment by connecting 10 users (both internal and external) to the load-balanced MetaFrame server farm. Each user had both a Microsoft Word document and an Excel spreadsheet open, and each was saving documents to the user's individual home directory on the Lunar Flare cluster. The load-balancing feature of Illumina spread the connections across the three active nodes in the cluster; connections to each node were verified through the Lunar Flare Administration Utility statistics page for node LF2, as Figure 2.14 shows. Node LF2 was then failed by disconnecting its AC cable, and all updated client documents were then saved, including those attached to the failed node. This functionality was tested three times; automatic reconnection to the user home directories via other nodes took about 10 to 15 seconds to complete after failing the Lunar Flare appliance, and no loss of data occurred in any case. The automatic connection failover feature was also tested by waiting for the application's auto-save feature to save the document after an appliance failure. Saving data in this manner was also successful and transparent. Even with a complete loss of power to a Lunar Flare appliance, the system remained fully functional with no impact to users.

Figure 2.14: Lunar Flare Administration Utility’s connection information.

Finally, another high-availability feature of the Lunar Flare NAS cluster is the hot spare capability. An appliance can be defined as a hot spare and will replace a failed node after a user-definable time period. A cluster with a hot spare appears in the Administration Utility status information, as Figure 2.15 shows.

Figure 2.15: Lunar Flare Administration Utility showing the hot spare status.

The combination of availability features of a load-balanced MetaFrame server farm and Lunar Flare NAS makes this combined system one that should meet the availability requirements of almost any organization.

Application Processing Scalability

The load-balancing features of MetaFrame provide a highly scalable application execution environment. Many combinations of applications and servers are possible using the power of Citrix load balancing and application publishing. Specific servers within the farm can be configured to serve several applications or be dedicated to executing a specific application, depending on the needs of the organization and how the applications are used. For instance, if an increasing number of users need to run a Microsoft Access application, a server can be added to the farm and dedicated to that function, relieving some load from the general application servers and dedicating resources to the specific need. The number of servers in the farm is limited only by network addressing space, making the processing environment nearly infinitely scalable.

Lunar Flare NAS Scalable Storage Capacity

Illumina makes Lunar Flare NAS extremely scalable. Because Lunar Flare NAS features a true distributed file system and a separate channel for inter-cluster communications, it can efficiently scale from a 2-node 135GB mirrored system to its current 3.8TB limit simply by plugging in additional appliances on an as-needed basis. An appliance can easily be added in about 5 minutes with no system downtime. When a new appliance is added to the cluster, all data is redistributed evenly across all appliances in the cluster.

In the test environment, user profiles and the Windows component of the user home directory typically used between 350MB and 450MB of disk space. Allowing an additional 500MB of space for user files makes the space needed per user approximately 1GB. Thus, Lunar Flare NAS could scale to support thousands of users if needed. The uses for the cluster are not limited to the MetaFrame functionality described thus far. It is also an excellent system to store many other types of files and content directed towards all or a portion of the user base, as well as a perfect platform on which to back up servers in the farm or store server images used to build new servers when adding or replacing processing capacity.
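That sizing estimate is easy to verify with back-of-the-envelope arithmetic, using the roughly 1GB-per-user figure and the 3.8TB cluster ceiling from the text (a binary GB-per-TB conversion is assumed):

```python
# Back-of-the-envelope capacity check using the figures above: roughly
# 1GB per user against the cluster's current 3.8TB ceiling. Binary
# GB-per-TB conversion is assumed.

GB_PER_TB = 1024

per_user_gb = 1.0                    # profile + home directory + user files
cluster_limit_gb = 3.8 * GB_PER_TB   # current maximum cluster size

max_users = int(cluster_limit_gb / per_user_gb)
print(max_users)   # -> 3891, i.e. thousands of users per cluster
```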

Scalable Bandwidth Between MetaFrame Servers and Lunar Flare NAS

Finally, another important aspect of Lunar Flare NAS scalability is the bandwidth available to access the data. Because this solution is clustered, network connections are added with each appliance, creating additional paths to the data. Adding servers to the MetaFrame farm typically means more users are being added as well, which in turn increases the need for system storage. These new servers have more than a single network connection to the storage system, and the load-balancing features of Illumina help ensure that the load is spread evenly across all the network connections to the storage nodes. Lunar Flare NAS eliminates the typical scenario of a growing number of MetaFrame servers on one end of the network connection, increased storage capacity on the other end, and an increasingly saturated network connection in between. Bandwidth scalability means better performance and response times for users and improved system availability.

Case Study 2

The StorageWorks NAS Executor E7000 can store and share large quantities of data among multiple users. Many organizations deploy the E7000 to share valuable data with users and business partners in an environment in which users have access to valuable company information, mission-critical applications, and user-specific information. In this shared-resource environment, each user represents a potential source of viruses and poses a threat of infection to all central file server users. Any virus infection on the E7000 can have a devastating impact on an organization. To protect shared data, companies must have a viable virus detection and eradication strategy.

This case study provides best practice guidelines to help administrators select and deploy an antivirus solution as a method to reduce the high risk of viruses. To help organizations meet the need for a complete E7000 antivirus solution, Compaq tested four industry-leading antivirus software packages with the E7000:

• Computer Associates eTrust InoculateIT

• Network Associates NetShield

• Symantec Norton AntiVirus Corporate Edition

• Trend Micro ServerProtect

E7000 Overview

The E7000 offers administrators the simplicity of a heterogeneous high-availability NAS device with the scalability and fault tolerance of a SAN. By fusing high-availability NAS functionality with SAN infrastructure, the E7000 provides administrators with highly available storage for application and user data and enables universal network storage. The E7000 can be used for heterogeneous file serving, media and video streaming, and server and storage consolidation. With no single point of failure in the optional clustered configuration, the E7000 delivers high performance and reliability. In short, the E7000 is designed for Compaq SAN customers who want to leverage their existing or planned storage infrastructures to enable file serving and who want to avoid deploying multiple incompatible storage strategies.

Best Practices

Administrators must balance security and performance when deploying an antivirus solution. Each environment has unique virus protection and system needs that often require tradeoffs in either security or performance. Optimal installation and configuration of the antivirus software can drastically improve both the security and performance of a server. Three typical scenarios include:

• Security as the highest priority

• Performance as the highest priority

• Security and performance as equally important components

Security as the Highest Priority

When security is the highest priority, the typical configuration involves installing the antivirus software on the E7000 and configuring the software to scan all files. Because the software resides on the server, protection is centralized and the server is not dependent on the clients to maintain antivirus security. This method is by far the easiest to deploy because clients connecting to the server do not have to install and maintain current antivirus software, reducing software licensing fees and administration time. The server protects itself from any virus threats in the environment by scanning each file as it moves from a client to the server. The downside to this configuration is that it requires a great deal of server resources and overhead. Resources used for virus scanning are not available for file serving. Therefore, this configuration typically provides the lowest performance of the three typical antivirus configurations.

Performance as the Highest Priority

When high-availability NAS server performance is the highest priority, the typical configuration involves installing and configuring the antivirus software on the clients only. In this scenario, all the E7000 resources are available for file serving. This configuration typically results in the best file-serving performance. However, the server will be unable to protect itself because the antivirus software is not installed directly on the server. The server is completely dependent on the client machines to keep the environment clear of viruses. All connecting clients must install and maintain antivirus software for protection to be effective. As a result, this configuration provides the weakest security of the three typical antivirus configurations.

Security and Performance as Equal Priorities

When both security and performance are equally important, the typical configuration involves installing the antivirus software on the E7000 and configuring the software to scan selected files only, such as executable or batch files. The administrator may also install the antivirus software on the client machines. This configuration typically provides a good balance of security and performance for the E7000. The antivirus software consumes some of the file server resources, but the majority of the resources are still available for file serving. Typically, this configuration minimizes any major security or performance tradeoffs. The server will be capable of protecting itself and maintaining performance.
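The "scan selected files only" policy can be pictured as a simple extension filter. The extension set below is an example chosen for the sketch, not a vendor-recommended list.

```python
# Illustrative scan-scope filter for the balanced configuration: scan only
# file types likely to carry executable content. The extension set is an
# example for the sketch, not a vendor-recommended list.

SCAN_EXTENSIONS = {".exe", ".bat", ".com", ".dll", ".vbs"}

def should_scan(filename: str) -> bool:
    dot = filename.rfind(".")
    return dot != -1 and filename[dot:].lower() in SCAN_EXTENSIONS

print(should_scan("setup.EXE"))    # -> True: executable content, scan it
print(should_scan("report.doc"))   # -> False: skipped, saving server cycles
```

The narrower the extension set, the more server cycles remain for file serving; the tradeoff is that any file type excluded from the set is protected only by whatever antivirus software the clients run.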

Supported Antivirus Software Packages

Compaq recognizes that organizations with file-serving environments demand the highest levels of antivirus protection. To address this concern, Compaq tested and qualified four industry-leading antivirus software packages with the E7000. This part of the case study provides a brief description of each of the antivirus solutions tested and a table indicating the vendor and the product version used for testing. Each of the packages discussed in this part of the case study has been tested by Compaq and is supported for use with the E7000.

Computer Associates eTrust InoculateIT

The eTrust InoculateIT antivirus solution from Computer Associates is designed for Windows products. eTrust InoculateIT provides protection against the latest viruses and includes the following features: easy manageability, multiple scanning engine support, automated signature distribution, real-time detection with system cure, centralized event logging and alerting, and virus quarantine.

Network Associates NetShield

NetShield is the antivirus solution from Network Associates for file servers. The NetShield central management console lets users monitor and configure the protected server from any workstation. NetShield includes the following features: terminal services support, centralized management, remote manageability, alert management, and multi-tier active virus defense.

Symantec Norton AntiVirus Corporate Edition

Norton AntiVirus Corporate Edition is the Symantec enterprise antivirus solution for file servers. Norton AntiVirus provides centralized management from a single console that lets administrators configure and protect servers. Norton AntiVirus Corporate Edition includes the following features: single management console protection and monitoring, antivirus policy management across multiple platforms, quick deployment and automatic virus protection through closed loop automation, and terminal services support.

Trend Micro ServerProtect

ServerProtect is the Trend Micro enterprise antivirus solution for file serving and storage. ServerProtect is designed to safeguard file servers from virus infection and can be installed and managed from a single console. ServerProtect includes the following features: centralized domain management, three-tier remote management, real-time scanning, virus pattern updates, Trojan system cleaner, and the NetworkTrap tool.

Technical Benefits

Finally, integrating industry-leading antivirus software with the E7000 produces a robust solution for protecting valuable data from corruption or data loss caused by viruses. Choosing and deploying an antivirus configuration that meets the performance and virus protection requirements of an organization is vital in maintaining optimal performance from the E7000 and in protecting vital organizational data and the computing environment from the threat of viruses.

Summary

In this chapter, I've described how continuous data access to a high-availability storage solution involves two main elements: a redundant and fault-tolerant network infrastructure and a highly available NAS solution with built-in data protection features. The testing outlined in this chapter focused on integrating the two elements into a high-availability NAS solution with no single point of failure. The solutions integrate high-availability NAS designs for campus environments with NAS storage clusters. The test results validate the effectiveness of a high-availability NAS data-access and storage solution.

The mission of the high-availability NAS appliance is to enable continuous data access throughout the enterprise. It accomplishes this mission in two ways: system availability of greater than 99.99 percent and unparalleled data availability and recoverability.
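
To put “greater than 99.99 percent” in perspective, the following sketch (a generic planning calculation, not tied to any particular appliance) converts an availability percentage into a yearly downtime budget:

```python
# Convert an availability percentage into allowed downtime per year.
# 99.99 percent ("four nines") is the figure cited for the NAS appliance above.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def allowed_downtime_minutes(availability_percent: float) -> float:
    """Maximum unplanned downtime (minutes per year) at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% availability -> {allowed_downtime_minutes(nines):.1f} minutes/year")
```

At 99.99 percent, the budget works out to roughly 53 minutes of unplanned downtime per year, which is why redundant paths and clustering matter so much.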

Finally, data availability (especially for disaster recovery or rapid return to an uncorrupted database condition) can be vastly enhanced by the use of high-availability NAS storage appliances within an enterprise. Given the high-availability NAS appliance enterprise storage architecture, traditional data-management practices should be re-examined. In many cases, they may no longer be necessary.

Chapter 3: Planning for High-Availability NAS Solutions

After you’ve designed your high-availability NAS solution to meet your business challenges, you’ll need to move on to the planning phase: determining your business requirements and planning a high-availability NAS solution that meets those requirements. With the preceding in mind, this chapter shows you how to plan for your high-availability NAS solution by using the following four steps:

1. Identify your business requirements—Learn how to determine which requirements you should target with your high-availability NAS solution. You should also answer thought-provoking questions, which I’ll discuss later, to help you define your business requirements.

2. Inventory and analyze your environment—Use an inventory template and practical advice to determine whether the components in your existing infrastructure are suitable for redeployment in your high-availability NAS solution.

3. Determine your high-availability NAS components—Leverage the inventory template that you started in Step 2 to create a list of the new hardware and software components required for your high-availability NAS solution.

4. Develop your high-availability NAS plan—View pre-tested high-availability NAS plans that demonstrate real-world configurations, and draw on experienced guidance and multimedia presentations to help you plan your solution.

This chapter discusses each of the preceding steps in the planning phase and provides an overview of each as well as a list of additional tools and information that you can browse for more depth. Simply skip ahead to the step that interests you, or start from Step 1 and read through each step sequentially.

Identify Your Business Requirements

Clearly determine your business and IT requirements. This is the first and most important step in planning for a high-availability NAS solution. These requirements should ultimately drive your plan and component decisions.

Determine Your Requirements

Figure 3.1 shows a form that you can fill out to help you identify and define your requirements.

Business Requirement Questions Checklist Form

Date: ______________

The following are questions to help you identify and define your business requirements (Check all questions answered and completed):

_____1. Availability—Which critical business applications do you need to safeguard against unplanned outages?

______What tasks do you need to perform (such as adding or redeploying storage resources) without disrupting your business operations?

_____2. Scalability—How much faster are your storage requirements growing than your server requirements?

_____What amount of storage capacity do you estimate needing in the upcoming year?

_____What new applications (ERP, CRM, supply chain, and so on) do you anticipate needing within the next year?

_____3. Performance—What are the peak performance needs of your key applications?

_____4. Management—Which management tasks are currently difficult due to lack of resources or skills?

_____What will these be in one year?

_____Would you prefer to integrate your storage management into your existing management infrastructure?

_____5. Data backup—By what percentage do you need to reduce your backup window?

_____Do you need to relieve your LANs from backup congestion so that your applications can run faster?

_____6. Disaster recovery—In an emergency, what applications do you need to quickly and efficiently failover to your alternative data center?

_____Do you have an alternative data center? If so, how far away is it from your primary data center?

_____Do you need to clone or snapshot mission-critical data at peak production times?

_____7. Server and storage utilization—How much available but inaccessible storage could be utilized by your data-intensive applications if it were shared across your enterprise?

_____Do you have physical environment restrictions in your data center? If so, what are they?

_____Which older, smaller-capacity storage devices could you consolidate into fewer, larger-capacity devices for easier management and to reduce service contracts?

_____8. Budget—How much more storage could you utilize from current capital expenditures (allowing you to postpone new expenditures), assuming you could reclaim at least 30 percent of that under-utilized storage space?

Figure 3.1: An example business requirement questions checklist form

Identify Your Top Priority

Most successful high-availability NAS deployments begin by targeting a single requirement. Take your list of business requirements and determine which is the most critical, then focus on designing, installing, and deploying a solution that will address that requirement. This approach will help you deploy your high-availability NAS solution quickly, cost-effectively, and easily. You can use your results to demonstrate ROI and provide business justification for deploying more high-availability NAS solutions to address other important business requirements, or to expand your initial high-availability NAS solution after you have a successful solution in production.

Inventory and Analyze Your Environment

You likely already have a storage infrastructure in place that is not meeting your business needs. Fortunately, you might be able to redeploy some of this existing infrastructure in your high-availability NAS solution. To form a solution that meets your business requirements, a high-availability NAS solution can interconnect your existing components with new components. Alternatively, you might choose to simply buy all new components.

Inventorying and analyzing your current environment will help you determine which existing components still meet your business needs and are capable of connecting to a high-availability NAS solution. In addition, inventorying your physical environment will help you assess size limitations and cabling distance requirements even if your approach is to use all new components.

Enlist the Help of High-Availability NAS Vendors

If you haven’t already found high-availability NAS vendors you feel comfortable working with, now would be a good time to look. The vendors you choose can play an important and helpful role in evaluating, designing, implementing, and managing your high-availability NAS solution. You should ensure that the high-availability NAS vendor you choose:

• Understands your business requirements

• Understands high-availability NAS

• Has a proven high-availability NAS track record

• Has the ability to put together a total high-availability NAS solution including hardware, software, education, professional services, and support

• Is familiar with the existing components you plan to use and is able to integrate them with new ones to form your high-availability NAS solution

Begin Your Inventory with a Broad List

Start your inventory by creating a list of the existing storage devices, servers, and software in your current storage infrastructure. You can use a custom-made high-availability NAS components inventory worksheet to help you record this information. (For tips about creating this worksheet, see the sidebar “Gather Detailed Information About Each Component.”) Your list should include:

• Storage devices

• Hosts

• Subcomponents such as HBAs, bridges, and so on

• Locations and geographical considerations such as size limitations and distances

• Applications—including their traffic patterns, performance, and availability requirements

Gather Detailed Information About Each Component

You should fill in your custom-made high-availability NAS components inventory worksheets with specific information about each of the components. Details should include but not be limited to the following categories.

For each host: OS, HBA count and drivers (include driver version level), type of connections supported (loop or fabric), applications list, initial and projected storage requirements, and dimensions and weight.

For each storage device: make, model, and firmware version; type of connections supported (loop or fabric); number of hosts it can serve; ports and number of hosts per port; capacity (used or free); number and type of Fibre-Channel interfaces; number and type of Ethernet interfaces; dimensions and weight.

If you have a large number of existing components, you might want to consider using one of several available software programs to help automate the inventory process.
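
If you do automate the inventory, the worksheet categories above translate naturally into simple records. This sketch uses hypothetical field names (none of them are mandated by any particular worksheet; adapt them to your own template):

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    """One row of the host worksheet (illustrative fields only)."""
    name: str
    os: str
    hba_count: int
    hba_driver_version: str
    connection_type: str               # "loop" or "fabric"
    applications: list = field(default_factory=list)
    storage_req_gb: int = 0            # initial requirement
    projected_req_gb: int = 0          # projected requirement

@dataclass
class StorageDevice:
    """One row of the storage-device worksheet (illustrative fields only)."""
    make: str
    model: str
    firmware: str
    connection_type: str               # "loop" or "fabric"
    fc_interfaces: int
    ethernet_interfaces: int
    capacity_used_gb: int
    capacity_free_gb: int

# Hypothetical entries, just to show the shape of the records.
inventory = [
    Host("db01", "Solaris 9", 2, "4.2", "fabric", ["Oracle"], 500, 900),
    StorageDevice("Acme", "FS-100", "1.3", "fabric", 2, 1, 300, 700),
]
print(f"{len(inventory)} components inventoried")
```

Keeping the records structured this way makes the later analysis questions (age, capacity, connection type) simple filters over the list.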

Analyze Your Inventory Information

You should analyze each component in your inventory to determine whether it meets your business requirements and is capable of connecting to a high-availability NAS solution. You can record this analysis in your custom-made high-availability NAS components inventory worksheet. The following questions will help in your analysis:

• Is the component Fibre Channel ready?

• How old is the equipment?

• Is your equipment nearing the end of its lifecycle or can it still meet your business requirements?

• If it’s a storage device, does it meet your capacity needs now and in the future?

• Does it meet your availability and reliability requirements?

• Does it meet your performance requirements?

• Is this a proprietary attachment or does it meet current industry standards?

• Is this a legacy device (SCSI)?

• What are your space considerations?

• Can you fit this device as well as your other high-availability NAS components in the available space?

• Are the applications high-availability NAS aware?

When this inventory and analysis is complete, you should have an idea of which existing components you may want to use in your high-availability NAS solution and which physical environment requirements you will want to consider as you choose new components.

Determine Your High-Availability NAS Components

An inventory and analysis of your current environment will help you determine which, if any, hosts and storage devices from your existing storage infrastructure you would like to redeploy in your high-availability NAS solution and which physical environment requirements will affect your choice of new components. After you’ve done so, you can form your total high-availability NAS solution by:

• Determining which new components are needed

• Validating compatibility across your new and existing components

• Calculating the number of switch ports needed to interconnect all hosts and devices

To record your new component selections, you can use your high-availability NAS components inventory worksheet. You may have already started filling these worksheets out in the previous step.

Select Your Components

Your business requirements should drive your choice of components for your high-availability NAS solution. Those components include Fibre-Channel switches, storage devices, bridges, HBAs, cabling, cable connectors, and GBICs.

Fibre-Channel Switches

Because switches form the intelligent foundation of your high-availability NAS solution, it’s important to select the right ones for your business requirements. You should consider the following:

• Hardware redundancy—Look for switches with dual, hot-swappable power supplies and hot-swappable cooling fans to ensure optimal uptime.

• Port count—Switch product lines should offer a wide range of port count choices from entry level (8 to 16 ports) to enterprise level (64 to 128 ports) to best meet changing business requirements. You can use switches alone, or link them to form a network of larger port-count fabrics. The number of ports you need will depend on your choice of single or dual-attached hosts and storage devices.

• Management and monitoring software—Switches should offer a range of easy-to-use, Web-based management and monitoring tools to meet a variety of needs.

• Security and access control software—Switches should offer software that protects your high-availability NAS solution from security breaches and enables secure sharing of storage resources across the solution.

• Integration with third-party applications—For your high-availability NAS solution to be able to tie into existing management infrastructures and high-availability NAS management toolsets, the switches should offer an API. Third-party vendors can use the API to integrate their products with your switch software and hardware.

• Speed—Switches are available that support both 1Gbps and 2Gbps throughput per port. Many offer auto-sensing and speed-matching capabilities for both existing and next-generation devices.

• Budget—Balance your budget realities with your needs. With a high-availability NAS solution, you can easily “pay-as-you-grow,” so purchase the best features you can afford now and expand when your business demands it.

• Standards involvement—Be certain that switches adhere to industry standards. The manufacturer should also be actively involved in standards bodies to promote interoperability and provide customers with investment protection and flexibility to choose the best-in-class high-availability NAS components.

• Forward and backward compatibility—Switches should be forward and backward compatible with other switches in the product line so that you can migrate from 1Gbps to 2Gbps high-availability NAS environments and deploy a highly scalable core-to-edge storage networking infrastructure.

Storage Devices

There are many types of disk and tape storage devices available that meet a wide variety of storage requirements. Make sure that the devices are Fibre Channel ready. Integrating a variety of devices into your high-availability NAS solution lets you allocate storage based upon cost, availability, and performance criteria. If critical application data is stored on a device, you’ll want to have that device dual-attached to your fabric to ensure high availability. Such a setup will require twice the number of Fibre-Channel connections on that device.

Bridges

In communications networks, bridges are devices that link or route signals from one ring or bus to another or from one network to another. Bridges are especially useful if you intend to use some existing SCSI devices, such as tape libraries, in your new high-availability NAS infrastructure. Bridges can also extend the distance span and capacity of a single LAN system. They operate at the data-link layer of the OSI reference model (Layer 2), make no modification to packets or messages, and read each packet’s address so that they forward only traffic destined for a different segment, filtering out packets addressed to stations on the same segment as the originating user.

HBAs

The requirements of the applications that will run on your high-availability NAS solution should help drive what type of HBAs you need on your hosts. From the available 1Gbps, 2Gbps, and auto-sensing speeds, choose the speed needed to match your current and future data throughput requirements. If critical applications are running on a host, you’ll want to have that host dual-attached to your fabric to ensure high availability. This setup requires twice the number of HBAs for that host.

Cabling

Copper and optical are the two primary types of media used for the physical cabling between the components of your high-availability NAS solution. Copper is less expensive, and optical provides a reliable signal over a longer distance. Copper has distance limitations such that it is typically only used within a rack. The type of connections made within your high-availability NAS solution will often be driven by your device connection needs. Table 3.1 shows your cabling options.

Cabling Type           Cost    Optimal Distance (meters)

1Gbps high-availability NAS solution cabling options:
Copper (STP)           $       less than 25
Multimode Optical      $$      2 to 500
Single-mode Optical    $$$     2 to 10,000

2Gbps high-availability NAS solution cabling options:
Multimode Optical      $$      2 to 300
Single-mode Optical    $$$     2 to 5,000

Table 3.1: Cabling options for high-availability NAS solutions.

Cable Connectors

With optical cables, the SC connector is the most widely used; however, next-generation high-density small-form-factor connectors such as LC and MT-RJ (the LC connector is used with Small Form-Factor Pluggable, or SFP, transceivers) are becoming more popular because their small size allows more connections in tight spaces. DB-9 is the standard copper connector, although many organizations are switching to HSSDC connectors because of their improved reliability and smaller size.

GBICs

Removable GBICs convert between electrical and optical signals. Your speed and distance requirements will determine which type you need: at 1Gbps, short-wave (up to 500 meters) or long-wave (up to 10,000 meters); at 2Gbps, short-wave (up to 300 meters) or long-wave (up to 5,000 meters).
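
Taken together, Table 3.1 and the GBIC figures suggest a simple selection rule: choose the least expensive optical media whose reach covers the link. A sketch of that rule, with the limits simply restating the numbers above:

```python
# Optical reach limits (meters) by speed, cheapest option first,
# taken from Table 3.1 and the GBIC discussion above.
OPTICAL_LIMITS = {
    1: [("short-wave (multimode)", 500), ("long-wave (single-mode)", 10_000)],
    2: [("short-wave (multimode)", 300), ("long-wave (single-mode)", 5_000)],
}

def pick_optics(speed_gbps: int, distance_m: int) -> str:
    """Return the cheapest optical option that reaches the given distance."""
    for name, max_m in OPTICAL_LIMITS[speed_gbps]:
        if distance_m <= max_m:
            return name
    raise ValueError("distance exceeds single-mode reach at this speed")

# A 450 m run fits multimode at 1Gbps (500 m limit) but needs
# single-mode at 2Gbps (300 m multimode limit).
print(pick_optics(1, 450))
print(pick_optics(2, 450))
```

Note how the same physical distance can require different media at different speeds, which matters when you plan a migration from 1Gbps to 2Gbps.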

Validate Compatibility

An important step in finalizing your component choices is to validate compatibility of all components within your high-availability NAS solution. To do so, check compatibility lists provided by high-availability NAS vendors. In addition, update your high-availability NAS components inventory worksheets to reflect the compatibility status.

Calculate Your Needed Port Count

Total the number of Fibre-Channel network connections needed in your high-availability NAS solution to identify the number of switch ports you need. These include the number of HBAs on your hosts and the number of Fibre-Channel connections on your storage devices.

If you want dual-attached hosts and devices to ensure high availability, you need to double your port count for those hosts and devices. If you need the highest uptime possible, implementing dual fabrics is also recommended. (I’ll cover dual fabrics and availability in more detail in a moment.) In this step, you should also update your high-availability NAS components inventory worksheets to reflect any changes in port count.
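
The port-count arithmetic above can be sketched as a small helper (illustrative only; note that dual fabrics split the doubled connections across two separate fabrics rather than changing the total):

```python
def switch_ports_needed(host_hbas: int, storage_fc_connections: int,
                        dual_attached: bool = False) -> int:
    """One switch port per HBA plus one per storage-device Fibre-Channel
    connection, doubled when hosts and devices are dual-attached."""
    total = host_hbas + storage_fc_connections
    return total * 2 if dual_attached else total

# Example: 8 host HBAs and 4 storage connections need 12 ports;
# dual-attaching everything doubles that to 24.
print(switch_ports_needed(8, 4))
print(switch_ports_needed(8, 4, dual_attached=True))
```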

Develop Your High-Availability NAS Design Plan

Just as your business requirements drive your choice of high-availability NAS components, so should they drive your high-availability NAS design plan. Availability, scalability, and performance are the key requirements to think about. There are many ways you can arrange and connect your high-availability NAS components; however, experience has shown that thinking strategically with an eye on the future, yet starting simple, yields the greatest success.

Keep Your Plan Simple to Start

A high-availability NAS solution is extremely flexible, easy to scale, and provides great investment protection. As an organization’s needs grow, servers, storage devices, and switches can be easily added. This “pay-as-you-grow” technology makes it possible to keep your initial high-availability NAS design plan simple, then easily scale when business needs demand it. You will enjoy faster deployment and ROI by not trying to move your entire environment onto a high-availability NAS solution all at once.

Core-to-Edge: The Ideal Fabric Design Plan

Even if you start with a simple design plan, always think strategically about how it can evolve to meet your long-term requirements. As you scale and add more switches, the ideal design plan to build toward is core-to-edge. Although there are many possible fabric design plans, core-to-edge has proven to be the most flexible and effective for meeting availability, scalability, and performance requirements.

A core-to-edge fabric has two or more “core” switches in the center of the fabric, which interconnect several “edge” switches. Hosts, storage, and other devices connect to the free ports on the edge switches or directly to the core switches. Core-to-edge design plans have a number of advantages, including:

• Easy to scale without downtime

• Able to grow from hundreds to thousands of ports

• Favorable cost/performance ratio

• Simple and easy to understand

• Outstanding, flexible performance

• Good “resilience” with no single point of failure

• Works well with a wide variety of applications

• Proven and tested to be reliable

The core-to-edge design plan includes redundant paths between switches, providing automatic real-time rerouting of traffic in case a switch is accidentally disabled. This resilience ensures high availability.

In addition, the core-to-edge fabrics are highly scalable, allowing you to easily add switches, devices, and hosts to your high-availability NAS solution without having to disrupt service or do extensive cabling. One common technique is to replace the lower port-count switches in your core with higher port-count switches and redeploy the replaced switches as edge switches to protect your initial investment.

Core-to-edge design plans also let you increase performance as you grow. There are several ways to do so, including adding Inter-Switch Links (ISLs) and taking advantage of more advanced performance features such as ISL trunking.
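
To make the scaling arithmetic concrete, the following sketch estimates how many device-facing ports a core-to-edge fabric offers, under the simplifying assumption that every edge switch dedicates a fixed number of ports to ISLs into the cores (core-to-core links are ignored):

```python
def device_ports(edge_switches: int, ports_per_edge: int, isls_per_edge: int,
                 core_switches: int, ports_per_core: int) -> int:
    """Ports left for hosts and storage after ISL ports are subtracted."""
    edge_free = edge_switches * (ports_per_edge - isls_per_edge)
    # Every edge-to-core ISL consumes a core port as well.
    core_free = core_switches * ports_per_core - edge_switches * isls_per_edge
    return edge_free + core_free

# Eight 16-port edge switches (2 ISLs each) around two 16-port cores:
print(device_ports(8, 16, 2, 2, 16))  # 112 edge ports + 16 core ports = 128
```

Replaying the calculation with higher port-count cores shows why swapping the core switches (and redeploying the old ones as edges) is such an effective upgrade path.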

Pre-Tested High-Availability NAS Design Plans

There are many documented, proven, pre-tested high-availability NAS design plans available for you to learn from. These design plans include many different heterogeneous hardware, software, and application configurations addressing the following needs:

• Backup and restore

• Business continuance

• High availability

• Server and storage consolidation

Purchase Your High-Availability NAS Solution

After you’ve completed your design plan, you can revisit your inventory worksheet of existing and new high-availability NAS components to ensure that your selections are final. You can then purchase those components from your trusted high-availability NAS vendor.

Now, let’s look at how to plan for customized high-availability NAS technical solutions; in other words, how emerging high-availability NAS technology can solve perennial storage limitations in capacity, performance, and flexibility.

Planning High-Availability NAS Technical Solutions Customization

As enterprises experience the widespread adoption of Internet business applications, the explosion in data is driving a continual proliferation of, and dependence on, data storage infrastructure. Driving this growth is the increasing deployment of solutions such as e-commerce, customer care, workforce automation, and e-learning. This rapid growth, along with the escalating management costs associated with the storage infrastructure, has resulted in significant interest in moving from a direct-attached storage model to a more scalable and manageable high-availability NAS model.

IDC predicts that high-availability NAS will grow from $8 billion (US) in 2001 to approximately $54 billion in 2006, while direct-attached storage will shrink from $12 billion in 2001 to approximately $7 billion in 2006.

High-availability NAS enables companies to extend their storage networks beyond isolated islands in the data center to campus, metropolitan, and wide-area environments. Storage networking has emerged as an increasingly strategic component of the IT infrastructure, addressing the need to:

• Improve data availability and integrity

• Scale, share, and optimize storage resources

• Simplify storage management

• Minimize TCO for storage

The goal of a consolidated storage network is to provide a framework for uniting multiple storage architectures—including DAS, high-availability NAS, and Fibre-Channel SANs—into a single, well-managed, scalable, and extensible storage infrastructure. This part of the chapter highlights key market drivers for IP-connected high-availability NAS and elaborates on some common deployment scenarios in customer environments. High-availability NAS has led the way for the mainstream deployment of IP-based storage consolidation and file sharing. Using well-understood technologies, such as IP, Gigabit Ethernet, NFS, and CIFS, high-availability NAS provides a flexible storage solution that is easily scaled and managed across large enterprise environments.

For example, when coupled with Gigabit Ethernet, IP multilayer switching, and routing platforms from Cisco Systems, high-availability NAS leverages the already deployed and trusted IP infrastructure. This setup helps deliver an exceptionally low TCO by reducing associated operational costs, complexity, and deployment time. High-availability NAS solutions address several customer requirements:

• Internet e-business applications—High-performance data sharing and scalable high-availability NAS infrastructure for e-businesses.

• Business applications in the data center—Superior data availability and recoverability for enterprise business applications within a data center.

• Workgroup collaboration—High-performance data sharing across heterogeneous OS environments.

• Distributed storage over secure WAN—Collaboration among distributed sites with centralized administration and disaster recovery.

The Need for Storage Networking

The globalization of business as well as the nature of e-business initiatives demand around-the-clock (24 × 7) operation. Any downtime results not just in productivity loss but also in significant revenue impact. Data availability has therefore emerged as a critical corporate requirement for e-business IT infrastructure.

To address escalating storage demands, most corporations keep enterprise data captive in servers or within storage subsystems directly attached to a server. As the volume of storage data increases, this server-centric architecture has proven difficult to scale and manage and unable to deliver 24 × 7 availability.

High-availability NAS promises to reduce the cost and complexity associated with delivering highly available and scalable storage services. This network-centric model can be described as the software and hardware that enable storage to be consolidated, shared, accessed, and managed over a networked infrastructure.

High-availability NAS focuses on accelerating the convergence of storage and networks based on open architecture and industry standards. High-availability NAS is an implementation model of storage networking that delivers convergence of storage with IP-based networks.

High-Availability NAS on the Edge

Every company uses network-shared storage, whether it’s implemented as a ring of trusty old file servers, a speedy high-availability NAS, or an expensive SAN. As workloads increase (both the frequency of transactions and the quantity of data to move and store), IT managers are learning that their older storage systems have trouble keeping up. Perhaps the systems don’t meet performance requirements, creating a bottleneck for applications. They may be too difficult to manage, imposing excessive downtime for maintenance and upgrades. Or maybe the problem is finding the right way to protect the stored data.

Solutions for these and other common storage problems are on the horizon in the form of emerging technologies: storage virtualization, iSCSI (SCSI encapsulated in IP), and the Network Data Management Protocol (NDMP). These technologies exist today but are not yet widely deployed by NAS vendors. Each addresses specific weaknesses in current high-availability NAS solutions, so when laying out long-range plans for your company’s shared storage, you should consider making room for one or more of them.

Storage Virtualization

One of the limitations of current storage solutions is that software (OSs and applications) uses some very old rules to figure out where to store its data. It must still identify storage locations with great specificity, usually involving a combination of network ID and hierarchical path. A company can have a vast quantity of storage on its network, but the storage is often split into discrete pools, each of which is managed and accessed separately.

Storage virtualization merges these storage pools in ways that best meet application requirements. It also makes it easier to reallocate storage as needed, even across multiple file servers or SANs. With storage virtualization, you size your storage for the needs of the entire network, not the needs of each class of application (see the sidebar “High-Availability Virtual Storage Emerges”).
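
As a toy illustration of the idea (the pool names and sizes are invented), a virtualization layer maps one logical volume onto extents drawn from several separately managed pools, so capacity is sized for the whole network rather than per pool:

```python
# Free capacity (GB) in three hypothetical storage pools.
pools = {"filer-a": 400, "filer-b": 250, "san-1": 600}

def allocate(pools: dict, needed_gb: int) -> dict:
    """Greedily carve a logical volume out of extents from multiple pools."""
    extents, remaining = {}, needed_gb
    for name, free_gb in pools.items():
        if remaining == 0:
            break
        take = min(free_gb, remaining)
        extents[name] = take
        remaining -= take
    if remaining:
        raise ValueError("not enough aggregate free space")
    return extents

# A 900GB volume spans all of filer-a and filer-b plus part of san-1:
print(allocate(pools, 900))  # {'filer-a': 400, 'filer-b': 250, 'san-1': 250}
```

No single pool here could satisfy the 900GB request, which is exactly the situation virtualization is meant to solve.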

High-Availability Virtual Storage Emerges

Momentum is gathering in storage circles as vendors roll out new initiatives designed to alleviate critical enterprise concerns about storage management. Following in the wake of EMC’s recently announced AutoIS enterprise storage-management strategy, Hitachi Data Systems (HDS) and Fujitsu have developed competing enterprise storage-management initiatives.

Fujitsu recently launched its Eternus strategy just one week after HDS launched its TrueNorth initiative. Like AutoIS, TrueNorth and Eternus are wide-reaching efforts aimed at partnering with storage software developers and hardware companies, such as switch vendors, to deliver heterogeneous virtualization of storage resources. The separate programs also offer a level of policy-based automation that finally permits storage administrators to manage multiple tasks, such as SRM and high-availability NAS monitoring, from a single console.

Customers have been hitting the wall in terms of storage management. The reason there is so much energy focused on storage management is that customers haven’t had good policy-based tools and automation; they’ve had resident function, but no framework.

TrueNorth, Eternus, and AutoIS seek to provide just such a storage-management framework using three components common to each: storage software, storage hardware, and aggressive collaboration with third-party storage vendors to create as open a storage framework as possible. HDS’s HiCommand storage software will provide the management framework for TrueNorth, and HDS launched a new Lightning 9900a V-Series storage server to fortify the hardware component. The third component is a set of API partnerships to be formed with a variety of storage component vendors to deliver the maximum amount of heterogeneous functionality to HDS high-availability NAS. Key to the company’s strategy is the ability to collaborate with all of its partners, whether ISVs, independent hardware vendors (IHVs), or customers, allowing users to integrate solutions as part of its open framework.

For example, Hitachi has an install base built through traditional Hitachi sales efforts and, more importantly, through Hewlett-Packard (HP) and Sun Microsystems. This base leaves HDS with many storage clients that now face management problems. With EMC already promoting AutoIS, it would have been impossible for Hitachi to continue as a viable vendor unless the company put in place a comprehensive storage-management strategy; customers were demanding TrueNorth.

Similar to TrueNorth and AutoIS, Fujitsu’s Eternus storage-management initiative leverages software, hardware, and third-party product interoperability to solve the high-availability NAS management dilemmas. However, unlike HDS, Fujitsu does not face the challenge of supporting a large hardware install base while fending off advances from competitors such as EMC. Instead, Fujitsu is poised to take advantage of what experts see as a huge opportunity for hardware-independent storage-management software.

Eternus is armed with a suite of storage-management software from Fujitsu's independent software subsidiary, Fujitsu Softek. Although Fujitsu Softek is still trying to gain traction in the industry, its storage software is one of only two storage-management suites that offer virtualization, data management, SRM, and high-availability NAS monitoring under a single, intuitive management GUI. Fujitsu says that there is an enormous opportunity for hardware-independent vendors to create comprehensive storage-management solutions.

With partners such as Brocade, Microsoft, and Oracle already on board, the Eternus storage framework will continue to expand the number of third-party systems it can control. The AutoIS, TrueNorth and Eternus announcements, as well as initiatives from IBM and Veritas’ recent Powered storage virtualization program, are all good news for IT executives managing storage.

Page 105: High-Availability Network Attached Storage

Chapter 3

96

For more information about SRM, see The Definitive Guide to Storage Resource Management, a link to which is available at http://www.realtimepublishers.com.

The benefits are clear, yet storage virtualization is new to most IT leaders. Only a handful of products exist now, and vendors haven’t done a very good job of describing the technology’s benefits to prospective customers.

iSCSI

Performance is a critical consideration in storage design, but it isn't the only criterion. The SCSI parallel interface blasts data at very high speeds over short distances on behalf of a single server. Multigigabit Fibre-Channel SAN switches help address SCSI's distance and single-server limitations while keeping performance high. But neither approach works over very long distances or extends access to systems that aren't directly connected to the SCSI bus or switch.

iSCSI solves SCSI’s accessibility and distance problems by leveraging existing TCP/IP infrastructures. An iSCSI host adapter turns a server’s SCSI commands and data into network packets and transmits them across the company’s IP network. The advantage of iSCSI is that the OS doesn’t know a network is involved—iSCSI looks like a local storage device.
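The encapsulation idea can be sketched in a few lines of Python. This is a toy illustration of wrapping a SCSI command block in a header for transport over TCP/IP; the field layout is deliberately simplified and is not the actual iSCSI wire format.

```python
import struct

def wrap_scsi_command(opcode: int, lun: int, task_tag: int, cdb: bytes) -> bytes:
    """Wrap a SCSI command descriptor block (CDB) in a simplified,
    iSCSI-like header for transport over an IP network.
    Field layout is illustrative only, not the standardized format."""
    cdb = cdb.ljust(16, b"\x00")[:16]   # pad the CDB to a fixed 16 bytes
    header = struct.pack(
        ">B3xQI",     # opcode, 3 pad bytes, 8-byte LUN, 4-byte task tag
        opcode, lun, task_tag,
    )
    return header + cdb

# A READ(10)-style CDB destined for LUN 0, tagged so a reply can be matched
packet = wrap_scsi_command(0x01, lun=0, task_tag=0xCAFE, cdb=b"\x28\x00")
assert len(packet) == 32   # 16-byte header + 16-byte padded CDB
```

A real initiator would also maintain session state, sequence numbers, and digests; the point here is only that ordinary block-device commands survive the trip across an ordinary IP network, which is why the OS sees what looks like a local disk.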

Also, iSCSI is emerging as an alternative to Fibre Channel for SANs, allowing companies to deploy SANs using their existing Ethernet cabling. The Internet Engineering Task Force (IETF) has not yet ratified the iSCSI standard, but several vendors are shipping equipment designed against the working draft (a fairly typical order of events for a SCSI standard).

Of these three technologies, iSCSI has gained the most attention. iSCSI's transparency, plus the quick adoption of Gigabit Ethernet, will drive its prominence in the enterprise.

NDMP

High-availability NAS appliances bring the convenience of plug-and-play (PnP) to networked storage. But each high-availability NAS unit is an island, so managing several units, particularly performing bandwidth-intensive backups, is a challenge. NDMP is a cross-vendor standard for enterprise data backups. The standards group, led by Network Appliance and Legato Systems, intends to get all high-availability NAS units and tape devices speaking the same language. In this model, the backup software orchestrates a network connection between an NDMP-equipped high-availability NAS appliance and an NDMP tape library or backup server. The appliance uses NDMP to stream its data to the backup device, making efficient use of network resources. NDMP is a voluntary standard, so total product coverage is unlikely, but a critical mass of hardware and software does seem likely at this point.
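The three-way model can be sketched as follows. This is a hypothetical Python mock-up (the class names are invented, not an NDMP API): the backup application holds only control connections, while the bulk data streams directly from the appliance to the tape device.

```python
class NdmpDevice:
    """Toy stand-in for an NDMP-speaking appliance or tape server."""
    def __init__(self, name):
        self.name = name
        self.received = []

class NasAppliance(NdmpDevice):
    def stream_volume(self, volume: dict, target: "TapeServer") -> int:
        """Send volume data straight to the tape server (the data connection)."""
        for path, blob in volume.items():
            target.received.append((path, blob))
        return len(volume)

class TapeServer(NdmpDevice):
    pass

class BackupApp:
    """Holds only control connections; bulk data bypasses it entirely."""
    def run_backup(self, nas: NasAppliance, tape: TapeServer, volume: dict):
        return nas.stream_volume(volume, tape)   # orchestrate, don't relay

volume = {"/vol/home/a.txt": b"alpha", "/vol/home/b.txt": b"beta"}
tape = TapeServer("lib1")
moved = BackupApp().run_backup(NasAppliance("filer1"), tape, volume)
assert moved == 2 and len(tape.received) == 2
```

The efficiency claim in the text comes from exactly this shape: because the backup application never sits in the data path, backup traffic does not cross the management network twice.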


Some tape libraries with NDMP are already available. NDMP is quite new and its advantages aren’t yet widely recognized. However, NDMP can be used to back up data to almost any tape library, including legacy standalone drives. So its lack of recognition and use is not a compatibility issue. With most backup applications that support NDMP, the problem is more one of granularity. NDMP is just beginning to support object-level restores, but still insists on volume-level backup (depending on the version used). Many organizations don’t always want volume-level backup and are forced to organize NAS volumes based around backup strategy (so as not to waste media) instead of around organizational requirements.

The progress of storage technology is linked to advances in networks and I/O buses. Rising technologies such as 10-Gigabit Ethernet, PCI-X, and InfiniBand will bring about new approaches to high-availability NAS just as Fibre Channel, Gigabit Ethernet, and 64-bit PCI did. Capacity, performance, and connectivity demands will always rise, but technology can keep up.

Cisco and Network Appliance: An Integrated Approach

Successful high-availability NAS deployments hinge upon the integration of storage appliances with a scalable and intelligent network infrastructure. High-availability NAS solutions provide high-performance data sharing and administration for workgroup, departmental, and enterprise environments. These solutions leverage open standards such as NFS, CIFS, and NDMP to deliver advanced capabilities such as clustering and remote replication. Using an IP network infrastructure to provide universal access and interconnections for high-availability NAS allows storage access, replication, and backup over enterprise-wide networks and provides the following benefits to customers:

• Scalable storage infrastructure—Ethernet switching in the LAN and routing across enterprise networks and storage appliances deliver a high-availability NAS architecture that is highly scalable not only within the data center but also across campus networks and WANs. In addition, this architecture allows storage and server farms to scale independently of each other.

• Improved business continuance and data protection—A redundant or clustered NAS architecture ensures high availability of data. A consolidated storage network, combined with software capabilities, enables global management and remote backup and restore and disaster recovery implementation for high-availability NAS environments.

• Consolidated network design, administration, and support—Customers can leverage their existing IP network architectures and expertise instead of building separate high-availability NAS solutions based on technologies that need a separate administration and management framework.

• Lower cost of ownership—Consolidation, sharing, and access of storage over a high-availability NAS reduces IT infrastructure management costs, while using storage resources more efficiently.

• Intelligent IP services—Various intelligent services at the IP layer ensure continuous data availability, protection, and scalability for the solution architectures. Bandwidth can be scaled while leveraging intelligent network services such as Layer-3 switching, Quality of Service (QoS), caching, server load balancing, and security.


Deployment Scenarios

This part of the chapter describes typical customer deployments for a combined high-availability NAS solution, including solution architecture, components, and design considerations:

• Internet and e-business applications

• Business applications in the data center

• Workgroup collaboration

• Distributed storage over secure WAN

These deployment examples can be used as a baseline reference and can be tailored to suit specific customer environments.

Internet and E-Business Applications

E-business applications require highly scalable and available NAS infrastructures that deliver exceptional performance for the Web, application, and database tiers. Examples include:

• An Internet portal that delivers Web-based applications such as email, online calendar, and personalized multimedia content for a user community.

• A business-to-business (B2B) application that connects suppliers and customers to an enterprise.

• Intranet applications, such as workforce automation, for enhanced productivity.

The solution architecture that Figure 3.2 illustrates is a typical Internet infrastructure deployment in an e-business environment. It highlights the high-availability NAS components along with some of their associated design considerations.


Figure 3.2: Internet e-business application example.

In a tiered architecture, as shown in the diagram, Web servers at the Web tier can be deployed with either of the following: DAS on each Web server to store all Web content, or a consolidated high-availability NAS connected to the Web servers.

In the DAS configuration, a redundant copy of the content is stored on each Web server, resulting in wasted storage resources. More importantly, as content changes, updating and managing content across these Web servers becomes unmanageable, leading to efficiency bottlenecks. This approach to Web content storage is inefficient in terms of scalability of the Web server farm because customers must often upgrade servers to increase storage capacity.

An alternative approach to the Web tier configuration is to consolidate multiple copies of content from Web servers onto filers. In this case, the Web servers can leverage this shared resource and there is no need for data replication on the individual servers. This approach significantly reduces the content-management burden and makes the Web servers much easier to replace in case of failures or maintenance issues because all data is stored on filers.


E-business applications must be available 24 × 7. Downtime is not an option because potential revenue, as well as market perception, is negatively impacted.

The configuration in Figure 3.2 shows a highly available, fault-tolerant architecture. At the Web tier, a clustered pair of filers provides a highly resilient storage platform for the Web content accessed by Microsoft Windows, UNIX, or Linux-based Web servers through CIFS or NFS. In case of filer head failure, data service is migrated rapidly and seamlessly to the second filer. The Web servers are aggregated at this tier using switches and are connected through the Gigabit Ethernet ports on the switches. To ensure that there is no single point of failure, the switches are configured with redundant network modules, power supplies, and so forth.

The back-end filers, which store the application and database content, are also configured in a cluster. Highly resilient network connectivity for the application and database tiers is achieved using redundant switches through Gigabit Ethernet connections. For example, the switches are configured with Cisco Systems’ PortFast feature on the ports that connect the filers and the UplinkFast feature to connect to the front-end switches. This setup provides rapid convergence of the spanning tree if one of the network links goes down. The network switches are configured to run Hot Standby Routing Protocol (HSRP) between them. If a switch breaks down, the other switch takes over the load of the failed switch in a few seconds without any observable interruptions at the application layer.

For disaster recovery, a mirroring software feature called SnapMirror from Network Appliance is used to replicate databases over a secure WAN connection to a remote location. I'll discuss this topic in more detail later in this chapter.

A typical Internet/e-business architecture involves other elements in the Web tier such as firewalls, caching solutions, and local or global load-balancing devices. However, this part of the chapter focuses on the high-availability NAS applications and implementation. Details for these other technologies, although important, are beyond the scope of this chapter.

Data storage requirements for Internet and e-business applications grow at a faster pace than for any other environment. As demand escalates, the high-availability NAS infrastructure must scale rapidly with no downtime.

The architecture that Figure 3.2 shows is scalable both vertically and horizontally. The storage capacity on the filers can scale vertically up to 12TB by adding disks or shelves of disks as needed without filer downtime or reconfiguration. Because the filer storage is deployed in a flexible switched fabric, total storage capacity can be scaled indefinitely by adding more filers to the fabric (horizontal scaling).

Faster online response for users is one of the key drivers for customer retention and loyalty in an e-business environment. For example, the “8-second rule” states that if a Web page does not completely load within 8 seconds, customers might not return to the Web site, leading to lost revenue. Although performance is typically addressed at several levels in the infrastructure design, high-availability NAS plays an important role in meeting overall performance goals.


A filer can deliver high performance over the network because it is dedicated to reading and writing data efficiently and does not have the overhead associated with a general-purpose OS. Also, the filers offload I/O activity from application servers, improving overall system performance. Because the RAID and Write Anywhere File Layout (WAFL) subsystems are tightly integrated, the WAFL write-anywhere design allows scheduling of multiple writes to the same RAID stripe whenever possible, for faster operation.

The filers in this scenario can be remotely managed and monitored by using a systems-management framework that supports a standard Simple Network Management Protocol (SNMP) MIB interface such as HP OpenView. Data backup from filers to a locally attached tape drive can be remotely administered using NDMP.

This solution offers an exceptionally low TCO by significantly reducing operational costs, complexity, and deployment time. For example, snapshot technology can reduce the backup window of production databases dramatically from hours to minutes. Snapshots are frozen images of production data. Snapshots of live databases can be taken in seconds, then backed up to tape. Moreover, filers eliminate the need for database administrators to perform common administrative tasks—such as disk layout tuning and retuning—allowing them to focus on value-added tasks such as delivering additional features and fine-tuning applications. Because this architecture is based on IP-centric appliances, deployment complexity is considerably reduced, enabling e-businesses to achieve faster time-to-market for their product and service offerings.
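Why a snapshot takes seconds rather than hours can be shown with a toy model, loosely inspired by write-anywhere designs such as WAFL (all names here are illustrative): a snapshot copies only the map from file paths to blocks, never the blocks themselves, and later writes allocate fresh blocks, so the old blocks remain intact for the snapshot.

```python
class ToyVolume:
    """Block-map volume with O(1) snapshots: a snapshot is a copy of the
    map, not of the data. Writes allocate new blocks (write-anywhere style)."""
    def __init__(self):
        self.blocks = {}          # block_id -> bytes
        self.active = {}          # path -> block_id
        self.snapshots = {}       # snapshot name -> frozen path->block_id map
        self._next = 0

    def write(self, path, data):
        self.blocks[self._next] = data
        self.active[path] = self._next    # old block stays for snapshots
        self._next += 1

    def snapshot(self, name):
        self.snapshots[name] = dict(self.active)   # instant: map copy only

    def read(self, path, snap=None):
        table = self.snapshots[snap] if snap else self.active
        return self.blocks[table[path]]

    def revert(self, name):
        self.active = dict(self.snapshots[name])

vol = ToyVolume()
vol.write("/db/table1", b"v1")
vol.snapshot("nightly")
vol.write("/db/table1", b"v2-corrupted")
assert vol.read("/db/table1") == b"v2-corrupted"
vol.revert("nightly")                # minutes-not-hours restore, in miniature
assert vol.read("/db/table1") == b"v1"
```

The cost of taking the snapshot is independent of the volume size, which is why a live terabyte database can be frozen in seconds and then streamed to tape at leisure.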

Business Applications in the Data Center

Common enterprise business applications, such as enterprise resource planning (ERP), customer relationship management (CRM), and supply chain management (SCM), that are deployed in enterprise data centers require a highly available NAS infrastructure. As businesses implement Internet strategies for enhanced productivity, these applications evolve into Web-enabled solutions. As a result, the architecture in this scenario can be extended with other features such as firewall security, load balancing, and so forth, to be equally applicable in an Internet-centric environment.

The primary focus of an ERP environment is high application data availability. The content associated with such enterprise business applications represents a key enterprise asset, and downtime for mission-critical applications results in significant negative business impact. The desired high availability is achieved by building redundancy into the server, network, and storage designs. Figure 3.3 shows a data center that deploys clustered filers and various servers in a Gigabit Ethernet network infrastructure to support an ERP application.


Figure 3.3: Business applications in the data center.

The ERP application in this example is hosted on clustered application servers. The application is layered over a database, which is hosted on a second (database) server cluster. A clustered filer configuration provides a highly available NAS for the ERP application database. The filers, as well as each of the servers, are connected to two switches for network link redundancy. This server access layer connects to the core/distribution layer that provides connectivity to the enterprise network infrastructure.

To support application availability, each application server cluster node is configured to take over the ERP application from its peer in case of server or network failure. Similarly, the database server cluster is configured so that any server can take over database services from its peers in case of server failures.

Each server cluster node is connected to two switches through Gigabit Ethernet links to provide a redundant network path in case of link failure. HSRP is configured on the two switches to provide automatic failover if one of the switches fails. The clustered pair of filers can take over each other’s data services in case the individual filer heads fail. This setup provides full, active-active, application-level redundancy to the overall architecture.

Additional redundancy is built into the high-availability NAS architecture by using dual-homing for the filer cluster nodes. Dual-homing (also known as virtual interfaces—VIFs) provides additional availability and scalability to the high-availability NAS design.

In standard dual-homing mode (single-mode VIF), the interfaces are in an active/standby configuration. If the active link goes down, the standby interface takes over servicing network requests. The filer's Data ONTAP appliance OS monitors the active link and handles the failover.


Multimode VIFs can be configured to use the EtherChannel capabilities supported by the switches, which lets all interfaces be active. The interfaces then act as a single VIF with the combined bandwidth of the individual links. The switch monitors the status of the links and handles link failures.
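The difference between the two VIF modes comes down to how many links carry traffic at once. A minimal sketch, with an invented helper function:

```python
def vif_bandwidth(links, mode):
    """links: list of (is_up, mbps) tuples. Single-mode: one active link at
    a time with failover to a standby; multimode: every healthy link
    carries traffic, EtherChannel style."""
    healthy = [mbps for up, mbps in links if up]
    if not healthy:
        return 0
    return healthy[0] if mode == "single" else sum(healthy)

links = [(True, 1000), (True, 1000)]           # two Gigabit Ethernet ports
assert vif_bandwidth(links, "single") == 1000  # active/standby
assert vif_bandwidth(links, "multi") == 2000   # aggregated bandwidth
links[0] = (False, 1000)                       # active link fails
assert vif_bandwidth(links, "single") == 1000  # standby takes over
assert vif_bandwidth(links, "multi") == 1000   # degrades gracefully
```

Either mode survives a single link failure; only multimode also converts the redundant link into usable bandwidth during normal operation.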

The PortFast feature is enabled on the switches on the ports that connect the filers, which allows faster convergence of the spanning tree and switch-over to alternative links for network traffic. The switches can also be configured to join ports on two different modules into an EtherChannel configuration. This configuration eliminates downtime if one of the links or modules fails.

For efficient use of network bandwidth, servers are connected to separate virtual LANs (VLANs) configured on the switch to limit traffic on the Ethernet subnets. The filers are in their own VLAN on the back end, and the switch routes traffic intended for them to that subnet. Gigabit EtherChannel trunking provides the higher aggregate bandwidth to networked storage that some ERP databases require.

Data represented in a business application must be easily and quickly recoverable. According to a recent Gartner study, roughly 41 percent of unplanned downtime is caused by data corrupted by application errors. To reduce administration overhead, the ERP applications and databases can be structured so that development, test, and production environments are hosted on separate volumes.

For a stable production environment, testing is often required before database applications are introduced to production. Snapshots can be used to test changes to the production database. In this scenario, snapshots of database table spaces can be taken and tests run against them. If a test fails, the database can be restored to a "known good" state within minutes, without reverting to tape archives.

Periodically scheduled snapshots can help recover from corruption caused by application errors in a production database. Multiple snapshots of ERP data are available online, and the entire ERP data set can be reverted in minutes to a previous version. Snapshots dramatically reduce the backup window of databases from hours to minutes and increase application availability. Snapshots of terabyte-sized volumes containing ERP database table spaces can be created in seconds and backed up to tape without interrupting the production database.

Databases can be remotely replicated over a WAN for disaster recovery, as I’ll discuss later in this chapter. These features dramatically improve application availability and data recoverability in the solution architecture.

Another key benefit offered by this solution is that storage can be scaled independently of the database or application servers without any downtime to the application. Shelves of disk space can be added as needed to a cluster to scale vertically to 12TB. Therefore, storage capacity can be effortlessly added to the volume supporting an ERP database while the database and filer are in operation. To accommodate growth, high-availability NAS can be scaled horizontally by deploying additional filers. Volumes can be moved between filers for load balancing.

Finally, this architecture, which is based on high-availability NAS for ERP databases, dramatically reduces operational cost and complexity and simplifies deployment. This approach significantly reduces acquisition and maintenance costs, and therefore the TCO of the ERP application.


Workgroup Collaboration

Engineering environments such as software development or engineering design groups demand an infrastructure that enables data sharing over a high-performance LAN with low management overhead. Applications with these requirements include simulators and synthesis tools, and tools for CAD, CAM, CAE, source code management, version control, and so forth.

In addition, there is a critical need within an enterprise to consolidate multiple files from Windows NT user desktops onto a data-consolidation platform such as high-availability NAS appliances to enable data sharing among users (termed NT consolidation). The architecture in Figure 3.4 effectively addresses the requirements in the workgroup collaboration environment.

Figure 3.4: Workgroup collaboration.

Various types of clients are aggregated at the client-access tier using redundantly configured switches through 10/100 Fast Ethernet ports. Redundant Gigabit Ethernet uplinks connecting to the distribution layer switches ensure no single point of failure by providing alternative paths, fast convergence in case of link failures, and load balancing across these redundant switches.

A clustered pair of filers ensures that data service is automatically migrated over to the second head, in the unlikely case of a filer head failure. The filers are connected through Gigabit Ethernet ports on the switches at the distribution tier, which also provides connectivity to the application servers in case the applications used by the workgroup require them at this tier. This setup creates a private dedicated high-availability NAS for connecting the application servers that run engineering applications to the filers.

To extend high availability, these distribution layer switches are also deployed in a redundant configuration. They are configured with HSRP, which allows one switch to assume the load of a failed switch within seconds without any interruption to data connectivity at the higher layer. The application servers—in the case of the engineering environment—are also connected to these switches to allow data sharing with other organizations.


Software code and engineering designs are the core assets of any engineering organization. Data corruption leads to rework, which negatively affects time-to-market. The solution architecture described in this part of the chapter allows seamless data recoverability. Periodic snapshots can be automatically scheduled at the desired frequency. Snapshots of terabyte volumes can be taken in seconds and provide as many as 31 read-only images of data online for immediate recovery.

Snapshots help individual users restore personal data files when they need to go back to a previous version, without resorting to help from their IT organization (a common scenario in engineering organizations). Users (engineers) want to be able to revert designs to older versions on their own. Before this type of implementation, engineers would often redesign rather than request a restore of an old design, to save face; the result was lost productivity. In other words, if the application data is corrupted, complete volumes can be restored to a specific snapshot, which avoids hours or days of data restoration from tape.
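The limit of 31 online images mentioned above behaves like a bounded rotation: when a scheduled snapshot arrives past the cap, the oldest is recycled. A toy sketch (the class and naming scheme are invented):

```python
from collections import deque

class SnapshotSchedule:
    """Keep at most `limit` snapshot names online; the oldest is recycled
    when a new scheduled snapshot arrives."""
    def __init__(self, limit=31):
        self.online = deque(maxlen=limit)   # deque drops the oldest for us

    def take(self, name):
        self.online.append(name)

sched = SnapshotSchedule(limit=31)
for hour in range(40):                 # 40 hourly snapshots against a 31 cap
    sched.take(f"hourly.{hour}")
assert len(sched.online) == 31
assert sched.online[0] == "hourly.9"   # the oldest nine were recycled
```

A user who needs last Tuesday's version simply reads from the corresponding online image, no tape mount and no IT ticket required.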

Engineering environments tend to be complex to administer. Simplification of data management is key to smooth operation. Core administrative requirements include:

• Ability to back up without impacting the production system

• Simple processes for system upgrades

• Ability to recover data effortlessly during a data corruption or data loss event

• Simplified disaster recovery implementation

For backup and quick disaster recovery, data can be remotely replicated over a WAN. The data is replicated in asynchronous mode, with no performance penalty and without monopolizing the WAN link. I'll cover this topic in more detail later in this chapter.
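The "no performance penalty" property of asynchronous replication rests on decoupling the local write from the WAN transfer. A hypothetical sketch, not any vendor's implementation:

```python
class AsyncMirror:
    """Local writes return immediately; a later transfer step drains the
    queue of changed files to the remote copy (asynchronous mirroring,
    greatly simplified)."""
    def __init__(self):
        self.local = {}
        self.remote = {}
        self.pending = []

    def write(self, path, data):
        self.local[path] = data      # acknowledged without touching the WAN
        self.pending.append(path)

    def transfer(self):
        """Scheduled WAN transfer (e.g. every few minutes)."""
        for path in self.pending:
            self.remote[path] = self.local[path]
        self.pending.clear()

m = AsyncMirror()
m.write("/vol/eng/design.v2", b"...")
assert m.remote == {}                # WAN not yet involved in the write path
m.transfer()
assert m.remote["/vol/eng/design.v2"] == b"..."
```

The trade-off, of course, is a recovery point equal to the transfer interval: anything written since the last transfer exists only at the primary site.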

Data from filers can be backed up to a central tape library that supports NDMP over the IP network. Backups are performed from snapshots so that the production volume can remain online, dramatically reducing the backup window.

Filers enable multiple OS platforms to share the storage resources. Such multiprotocol OS support allows true data sharing in which a single copy of data can be shared between Windows and UNIX clients. Therefore, the clients in this environment can be a combination of UNIX or NT machines running engineering applications that are connected through the network access layer.

Filers deliver high performance over the network because they leverage an appliance architecture that is dedicated to reading and writing data efficiently and does not have the overhead associated with a general-purpose OS. Also, the filers offload I/O activity from application servers, improving overall system performance. Because the RAID and WAFL subsystems are tightly integrated, WAFL knows the current load and usage of every disk in the array. This allows WAFL to automatically tune its performance for a given workload.

On the network layer, the switched VLAN topology allows full-duplex 10/100 Ethernet bandwidth between the clients and the switch as well as a dedicated Gigabit Ethernet connection to the filers for storage access through the distribution tier.


This solution provides scalability options including adding more clients at the network access layer through additional switches connecting to the distribution layer. The storage capacity on the filers can scale vertically up to 12TB as needed with no downtime or reconfiguration. The architecture can also scale horizontally by adding filers to the network in a few minutes.

The overall distributed storage environment can be managed centrally using a Web-based GUI. The network components and filers can also be integrated into major systems-management platforms with built-in SNMP support.

Distributed Enterprise Storage with Secure WAN Connectivity

Organizations that have geographically distributed sites, such as offshore development offices connected to a central headquarters or to regional offices, demand distributed high-availability NAS for effective collaboration among geographically dispersed groups as well as for disaster recovery. These remote sites are typically connected to each other and to the central site through a hub-and-spoke topology and therefore require a secure WAN environment for data exchange across the links. The high-availability NAS solution must be designed for safe and efficient WAN bandwidth use. Figure 3.5 shows an example of how an enterprise can deploy distributed storage environments to address these requirements.

This scenario extends the previous engineering workgroup scenario to include an offshore development group that needs to work collaboratively with the core engineering team at the headquarters site. The typical applications include sharing files with the local team as well as having read-only access to the central engineering folder.

Figure 3.5: Distributed storage over secure WAN.


The WAN link between an engineering site at a central location and remote offices must be secure before any sensitive data files are exchanged or copied across it. The WAN link must also be efficiently utilized for high ROI from the cost of the link. The WAN connectivity in this scenario can be achieved through a dedicated link (using WAN services such as Frame Relay, ATM, and so forth) connecting to the WAN module of the central site infrastructure. An alternative approach is to deploy secure VPN connectivity over the data links that are already in place.

As Figure 3.5 shows, the routers provide various WAN modules, firewall security, and hardware-accelerated encryption for virtual private network (VPN) connectivity. These security features ensure that the WAN link bandwidth is used appropriately by storage-related traffic and does not drain resources from other mission-critical application data.

Collaboration among geographically dispersed groups demands data sharing among different locations. The engineering group at a central location in this example might need to locally access an offshore group’s design data. This access must be achieved without compromising the data-sharing requirement within the offshore workgroup environment.

The network connectivity at the remote site is typically addressed using LAN switches. Similarly, switches connect the server farm with back-end filer storage using Gigabit Ethernet connectivity at the central site. The switches are configured with features such as VLANs and EtherChannel and high-availability features including HSRP, fast routing protocol convergence, and redundant switch modules.

More remote sites can easily be added for WAN and VPN connectivity to the routers. For increased data protection, dedicated firewalls are deployed to offload the security functionality from firewall-enabled routers. The modular switch platform allows various modules to be added to the same chassis, enabling the user to avoid fork-lift upgrades at the network layer.

The solution architecture shown in the example provides seamless scalability. To add more storage at the central site, customers can easily add filers or disk storage to the existing infrastructure. The filers can support as much as 12TB of storage and support clustered configurations. This ability enables the solution architecture to provide much higher availability of storage resources.

As in the other solutions described in this chapter, a Web-based GUI provides centralized management for all filers in this architecture. Similarly, SNMP support enables the high-availability NAS components and filers to be integrated into major management platforms. Finally, in keeping with the theme of planning for high-availability NAS solutions, let's look at a product-specific case study.

Page 117: High-Availability Network Attached Storage

Chapter 3

Case Study

This case study contains information useful for capacity planning Tarantella Enterprise 3 servers (http://www.tarantella.com) for high-availability NAS. Tests were performed by Tarantella engineers under controlled conditions at the Sun Microsystems Performance Centre. The case study describes the testing methodologies, results, analyses, and sizing guidelines. The tests were done with Sun SPARC hardware and the Solaris 8 OS using Tarantella Enterprise 3 version 3.10 software. The case study also briefly covers Tricord’s value proposition for Tarantella server arrays and the benefits of Tricord’s approach for Tarantella resellers.

Tarantella

Tarantella Enterprise 3 software leverages existing IT investment without the cost of re-engineering. It provides a non-intrusive solution that allows IS departments to regain control of their IT systems and save costs. As Figure 3.6 shows, Tarantella Enterprise 3 provides fast, secure access to Windows, Web-based, Java, mainframe, AS/400, Linux, and UNIX systems and high-availability NAS applications from client devices anywhere in the world. This proven Web-based solution’s centralized management reduces complexity and scales to accommodate rapid corporate change, technological advancement, and expanding remote access needs.

Figure 3.6: Tarantella Enterprise 3 server solutions.

In a server-based computing environment, high-availability NAS application execution and data processing occur on centralized servers. These servers can be any combination of Windows, mainframe, AS/400, Linux, or UNIX servers. Tarantella Enterprise 3 software integrates and manages these diverse server environments and provides a framework for deploying high-availability NAS applications to users.

To successfully deploy high-availability NAS applications, it is necessary to determine requirements for those servers. These can be determined by answering two questions: What hardware is needed to support n users, and how many users will a specific piece of hardware support?

The results and analyses contained within this case study address these considerations, although they should not be interpreted in isolation. There are many factors to consider in a complex, server-based computing model. Sizing and capacity planning need to apply to all parts of the model, from the Tarantella server to the high-availability NAS application servers being used.

Findings

The number of users that a server configuration can support varies depending on criteria such as processor type, memory, hard disk, network configuration, and user type (typing speed, applications used, frequency, application screen size, and so on). Table 3.2 indicates the typical number of user sessions that a Sun Enterprise 4500 can support (users were running StarOffice 5.2 on Solaris).

System: Sun Enterprise 4500
Processor type: SPARC
Processor speed: 400MHz
Processors: 12
OS: Solaris 8
Maximum user sessions tested: 850
Recommended user sessions: 1600 to 2000

Table 3.2: Server capacity.

The recommendations outlined in Table 3.2 show the server capacity requirements for Tarantella Enterprise 3 software and exclude the processor requirements of the OS, which are minimal. When running multiple applications, additional processing power may be needed. The amount depends on the high-availability NAS applications being used. If two high-availability NAS applications are being used and both are equally active, perhaps both generating reports, then twice the processing requirement is needed. However, if one high-availability NAS application is being used while the other sits idle (as is more typical), then the only processing requirement is for the active application.

The tests show that the Tarantella server CPU was around 70 percent idle with 850 users running StarOffice. In general, a requirement of approximately 3MHz of CPU on a RISC system should be adequate for a user actively running a single application at a time.

Although disk capacity is not critical, Tarantella recommends selecting a large enough drive to handle the swap space needed (as determined by the amount of memory). The company also recommends the use of fast hard drives to minimize any delays caused by swapping.

When running multiple high-availability NAS applications, the only impact on disk requirements is that each active but idle application is likely to end up in swap space. Thus, the system swap space requirements may be higher.

Enough memory must be allocated for the OS as well as for the Tarantella Enterprise 3 software. As a general rule, memory requirements for Tarantella Enterprise 3 software are approximately 128MB base, plus 4MB per active user session.

When running multiple application sessions more memory is required. However, some memory will be shared, and application usage patterns show that not all sessions are likely to be active at the same time. The biggest impact on memory usage is the high-availability NAS application display resolution.
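
The rules of thumb above can be collected into a small calculator. The following Python sketch is illustrative only: the 128MB-plus-4MB-per-session memory figure and the roughly 3MHz-per-active-user CPU figure come from the case study, while the function names and defaults are hypothetical, not part of any Tarantella tool.

```python
# Rough sizing calculator based on the case study's rules of thumb:
# ~128MB base memory plus ~4MB per active session for the Tarantella
# software (excluding the OS), and ~3MHz of RISC CPU per user actively
# running a single application. Helper names are illustrative only.

def required_memory_mb(active_sessions, base_mb=128, per_session_mb=4):
    """Approximate memory needed by Tarantella Enterprise 3 itself."""
    return base_mb + per_session_mb * active_sessions

def required_cpu_mhz(active_users, mhz_per_user=3.0):
    """Approximate CPU needed when each user runs one active application."""
    return mhz_per_user * active_users

# The 850-user StarOffice test load from Table 3.2:
print(required_memory_mb(850))  # 3528 (MB)
print(required_cpu_mhz(850))    # 2550.0 (MHz)
```

Remember that these figures exclude OS overhead and assume one active application per user; multiple simultaneously active applications raise the per-user requirement accordingly.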

A Tarantella server array is a collection of connected Tarantella servers sharing a common set of configuration information, administered from a single point. An array acts as a single entity for administration and application deployment. The servers within an array can automatically distribute users across the array to balance the load using a variety of load-balancing algorithms. Adding more machines to a Tarantella server array will increase the number of users that can be supported.

Arrays vs. Multiprocessor Machines

There are two ways to increase the number of Tarantella Enterprise 3 users supported: by adding machines to an array and by adding processors. Both approaches have advantages, but the greatest factor when selecting the right combination is typically the cost of a machine with more processors versus the cost of more, simpler machines:

• The use of multiprocessor machines can minimize the number of machines that must be purchased. Using fewer, large machines has several benefits, such as less administrative overhead and lower cost of software.

• The use of multi-machine arrays can result in less expensive, less complex hardware that is easier to replace, should failure occur. Tarantella server arrays have several benefits including automatic distribution of users across an array to balance the load and an added level of resilience should systems fail. Multiple machines can also offer better network capabilities as a result of the additional network interfaces. However, there will be a small amount of additional network traffic within the array.

• An array of multiprocessor machines is recommended to gain the benefits of the array while keeping the network traffic to a minimum.

Tricord Value Proposition for Tarantella Server Arrays

Tricord Lunar Flare clusters provide the ability for a Tarantella array to maintain high-availability NAS application connectivity even during degraded conditions, including a failure of a Tarantella array node, a communication interruption, a failure of the Web client box, and a failure of the Web server. Of course, this setup does not protect against an application server failure. The following is a value proposition summary for Tarantella server arrays:

• Ease of use for Tarantella array application delivery—Preloaded Tarantella on Tricord nodes, no need for complex Beowulf clusters, successful installation out-of-box.

• Fault tolerance and high availability—Any Tricord node can fail, session continuance to the application server, some applications can run on Tricord (session proofing).

• Improved performance and scalability—Tarantella runs faster on Tricord clusters, can utilize Tricord for user application data, and can add more nodes without system interruption.

So, why does a Tarantella array run faster on Tricord? The answer is that a Beowulf cluster communicates among its nodes through the front-channel NICs, whereas Tricord uses a dedicated back-channel for inter-node communication, as Figure 3.7 shows.

Figure 3.7: Tricord uses a “back-channel” for inter-node communication.

Finally, what are the benefits of Tricord’s approach for the Tarantella resellers?

• High availability for both the Tarantella server array and for the common storage subsystem—Any individual appliance (node) can fail without a loss or interruption of data availability.

• High scalability (as many as 16 nodes per cluster)—Adding a new node requires only four clicks of the mouse; the new node installs automatically with no downtime.

• Simple management (lowers management costs)—The cluster is managed as one entity, with automatic load balancing across the cluster.

Performing Additional Capacity Planning Tests

The purpose of this case study has been to provide a good starting point from which to make Tarantella server decisions. It may be more effective to go straight into pilot mode, rather than spend large amounts of resources analyzing users’ work habits and capturing these actions into a simulated script.

After a server configuration is chosen as a starting point (based on this case study’s findings), users can gradually be added to determine the maximum number that a system configuration can support. If more server resources are required, it is always possible to add a server to a Tarantella array. As an aid to understanding the various factors involved when deploying high-availability NAS applications using Tarantella Enterprise 3 software, the following should also be taken into consideration.

• Determining application suitability

• Characterization of users

• Network utilization

Some high-availability NAS applications, such as those that make very extensive use of graphics or multimedia capabilities, may not be suited for running in a server-centric model. If the high-availability NAS application is suitable, it should preferably be run alone through the Tarantella server rather than starting an entire desktop session. This setup can save significant amounts of resources on the application server and reduce resource requirements on the Tarantella server, thus allowing more users to log on simultaneously.

Usage patterns need to be considered as they can have a significant performance impact on the application server as well as on the Tarantella server. For example, all users logging on at the same time of day will have an impact on the overall system responsiveness.

A Tarantella server can simply be added to an existing high-availability NAS solution to Web-enable applications; however, an understanding of the high-availability NAS topology will always yield improvements in performance and scalability. Both network latency (the time it takes a packet to reach the other end of the network) and bandwidth (the amount of data that can travel over the network within a given period of time) are important factors. Because everything users see on their screens is server-based, both latency and bandwidth affect how the Tarantella server functions.

Tarantella Enterprise 3 software makes use of the Adaptive Internet Protocol (AIP) to greatly reduce any problems that may be introduced by the network. AIP uses several methods to compress data and to remove redundant requests, which significantly help with low-bandwidth network connections. This feature makes the Tarantella server ideal for access from remote sites or for deployment of high-availability NAS applications over the Internet, perhaps by an application service provider (ASP). Connecting over a low-bandwidth connection has no significant impact on Tarantella server capacity or scaling.

High latency or load problems in other areas of the network, however, can have a negative effect on the responsiveness of any network. There is no way of solving these types of problems, and although they do not occur too often, they are unfortunately still encountered on the Internet, especially from more remote locations. Although the Internet is an ideal and cost-effective network for B2B or remote site application deployment, Tarantella recommends that a dedicated network still be considered in situations in which the Internet causes too many latency or node stability problems.

When positioning Tarantella within a server environment, some placement factors should be considered. Because AIP is designed to work over unknown and variable networks, the link from the Tarantella server to the clients should be the longest, while all connections between the Tarantella server and the high-availability NAS application servers are minimized. So, placing the Tarantella servers in the same room as the high-availability NAS application servers they are connecting to is recommended.

Also, Tarantella Web-enabling software makes use of the standard, native protocols that the high-availability NAS application servers use, such as X11 and RDP. These protocols may not be designed to work over low-bandwidth networks or may not automatically adapt to the various network conditions that may be encountered. Therefore, a faster network between the Tarantella server and the application servers can yield better capacity and scalability. For example, increasing the network from 10Mbps to 100Mbps between these servers can result in improved performance and scalability.

It is not necessary to also increase the network connections between the Tarantella server and the clients because AIP is quite capable of working over low-bandwidth connections. This can be a very cost-effective way to upgrade part of the network rather than upgrading both server and client connections all at once.

Future Expansion

Finally, although it is good to plan ahead to allow for the addition of extra processors and memory to systems running Tarantella Enterprise 3 software, physical machines can always be added to the Tarantella server array at any time. Adding new machines can be done online, with no system downtime. This ability allows for expansion at a rate controlled by the administrator and lessens the requirement to spend time and effort “up front” determining the size of machines, thus avoiding the necessity of purchasing new, large machines from day one. In addition, older machines don’t have to be discarded as new ones are added, as Tarantella server arrays may be built from a mix of system architectures.

Summary

This chapter showed you how to determine which requirements you should target with your high-availability NAS. It also showed you how to determine whether the components in your existing infrastructure are suitable for redeployment in your high-availability NAS solution. In addition, the chapter showed you how to create a list of the new hardware and software components required for your high-availability NAS solution. To help you plan your high-availability NAS solution, seek experienced guidance and review presentations from various vendors.

Finally, this chapter described how enterprises face considerable challenges brought by the rapid adoption of emerging Internet business applications and the associated high-availability NAS infrastructure requirements. High-availability NAS aims to deliver solutions to significantly mitigate these challenges. These solutions deliver key business benefits such as scalability, performance, simplified management, availability, and security, while leveraging existing investments and expertise in IP networks and drastically reducing TCO.

Chapter 4: Installing and Deploying High-Availability NAS Solutions

Today’s computer systems must provide reliable and timely information and services to assist personnel in making informed decisions crucial to the daily operations of the modern enterprise. Despite the rapid evolution in all aspects of computer technology, both computer hardware and software are prone to numerous failure conditions. Employing methods to minimize exposure to as many failure conditions as possible will significantly increase the effective use of a company’s resources and is a direct contributor to successful business operations.

Enterprises today are demanding their new computer systems be “right-sized” for quicker installation and deployment, lower cost of ownership, decreased support and maintenance expenses, and increased support for both homogeneous and heterogeneous distributed client/server environments. The concepts of data warehouses and replication services provide the mechanisms to ensure that the correct high-availability NAS information is always available to the appropriate personnel so that they can make important business decisions in time-critical situations. This new form of information service must be constantly monitored and tuned to provide reliable and accurate information delivery. Hardware failure of these central repositories could prove harmful in today’s competitive environment.

Installation and Deployment Steps

After you’ve determined the design and planning for your high-availability NAS implementation and have purchased the necessary components, you are ready to install and deploy it. This part of the chapter will walk you through the steps of building a prototype, testing your high-availability NAS solution, and deploying your solution to production. In the installation and deployment phase, you should perform the following three steps:

1. Establish an implementation plan—Learn the key steps that are necessary to make the critical implementation phase of your high-availability NAS go smoothly.

2. Create a prototype and test your high-availability NAS solution—Build and test a working prototype of your high-availability NAS implementation to validate functionality and to ensure that your solution can recover gracefully from faults.

3. Transition and release to production—Orchestrate a smooth and successful high-availability NAS deployment by using an incremental deployment approach, detailed documentation, and a technical support checklist.

Each of the following sections explores these steps, providing an overview summary on the topic as well as a list of additional tools and information that you can browse for more in-depth information.

Establish an Implementation Plan

A comprehensive implementation plan helps you get started and guides you through successful testing and deployment of your high-availability NAS solution. You should include the following as part of your implementation plan:

• Naming plan

• Prototype and testing plan

• Production deployment plan

Naming Plan

It is a good idea to give each switch a unique name and set the IP address, and possibly the gateway and subnet mask, before you begin cabling. An effective naming convention helps make it easy to identify components during testing and troubleshooting. Also, the naming convention should allow for significant growth across your enterprise so that you don’t need to change the name of every switch at a later date. Switch names can be as long as 19 characters and can include letters, digits, and underscore characters but no spaces (for example, MOAB_C1_A_B4_544_R1 = Project Moab, Core Switch 1, Fabric A, Building 4, Room 544, Rack 1). Consider incorporating meaningful naming elements that include:

• Organization or project ID

• Switch type (such as core or edge)

• Fabric name (if redundant fabrics are being used)

• Site or building location

• Floor and room location

• Rack location
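
The convention above can be enforced with a short script before any names are committed to switches. The sketch below assumes Python; the 19-character limit and the letters/digits/underscores rule come from the text, while the function and constant names are purely illustrative.

```python
import re

# Sketch of the naming convention above, e.g. MOAB_C1_A_B4_544_R1 =
# Project Moab, Core Switch 1, Fabric A, Building 4, Room 544, Rack 1.
# The function and constant names are illustrative, not a vendor API.

MAX_NAME_LEN = 19  # switch names can be as long as 19 characters
VALID_NAME = re.compile(r"^[A-Za-z0-9_]+$")  # letters, digits, underscores; no spaces

def build_switch_name(project, switch, fabric, building, room, rack):
    name = "_".join([project, switch, fabric, building, room, rack])
    if len(name) > MAX_NAME_LEN:
        raise ValueError(f"{name!r} exceeds {MAX_NAME_LEN} characters")
    if not VALID_NAME.match(name):
        raise ValueError(f"{name!r} contains invalid characters")
    return name

print(build_switch_name("MOAB", "C1", "A", "B4", "544", "R1"))  # MOAB_C1_A_B4_544_R1
```

Validating names up front avoids renaming switches after cabling, when a name change would ripple through documentation and zoning records.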

Prototype and Testing Plan

It is recommended that you test and validate your high-availability NAS solution before transitioning it into production. Create a high-availability NAS prototype during the test phase to learn the best way to build an implementation for your environment and to use it for testing. This approach makes it easier to identify and isolate problems as they arise. To ensure that comprehensive testing is performed, a testing plan is important; it should include high-availability NAS design validation as well as failure and failover scenarios. Failure and failover scenarios will test your high-availability NAS to ensure that the fabric successfully reroutes traffic; once the failure has been corrected, the fabric should re-establish connectivity and revert to normal routing. To help ensure that testing goes quickly and to give you direct access to experts on detailed storage and server characteristics, include your storage and server vendors in this part of the implementation.

Production Deployment Plan

It is best to deploy a high-availability NAS solution incrementally. A best practice is to deploy one or two servers of each OS type, or one or two application platforms, at a time. That way, you can feel comfortable that each is working correctly before rolling over the remaining servers or application platforms.

Create a Prototype and Test Your High-Availability NAS Solution

Building a high-availability NAS prototype is often done from the center out, starting with the switches. Storage devices are then added around the center of the high-availability NAS, followed by the hosts, in an incremental and iterative manner. This approach will help you quickly locate and isolate problems. Once your prototype is working, you can run testing scenarios to ensure that the high-availability NAS responds and recovers as expected.

Switches First, Edge Devices Second, Hosts Last

There are five key steps in building your prototype:

1. Build your fabric—Connect each switch one by one and validate that each can see the others.

2. Incrementally add and validate your edge storage devices, ensuring that each is visible to the switches.

3. Add and validate your hosts incrementally, ensuring that each is visible to the switches.

4. Validate that your hosts and storage devices can see one another.

5. Once you’ve connected and validated visibility between your fabric, devices, and hosts, it’s a best practice to disable your unused ports.
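
The five steps above can be outlined in code. This Python sketch is purely illustrative: the validate and disable callables are hypothetical placeholders for whatever checks your switch vendor's CLI or management GUI provides, and the function captures only the incremental ordering, not any real commands.

```python
# Illustrative outline of the five prototype-building steps. The
# validate_* and disable_port callables are hypothetical placeholders
# for vendor-specific checks; this captures only the incremental order.

def build_prototype(switches, storage_devices, hosts,
                    validate_fabric, validate_device, validate_path,
                    disable_port, unused_ports):
    # Step 1: build the fabric, connecting switches one by one and
    # validating that each can see the others.
    fabric = []
    for switch in switches:
        fabric.append(switch)
        validate_fabric(fabric)
    # Step 2: incrementally add edge storage devices, ensuring each is
    # visible to the switches.
    for device in storage_devices:
        validate_device(device, fabric)
    # Step 3: incrementally add hosts, ensuring each is visible too.
    for host in hosts:
        validate_device(host, fabric)
    # Step 4: validate that hosts and storage devices can see one another.
    for host in hosts:
        for device in storage_devices:
            validate_path(host, device)
    # Step 5: disable unused ports once everything is validated.
    for port in unused_ports:
        disable_port(port)
```

Working one component at a time means that when a validation step fails, the most recently added switch, device, or host is the obvious suspect.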

Create a Baseline Logical and Physical Diagram

A helpful aid in testing and troubleshooting is a logical and physical diagram of your high-availability NAS. A physical diagram includes the physical components of the high-availability NAS and how they are wired, while a logical diagram illustrates the relationships between high-availability NAS components, such as zones. These diagrams will provide a reference baseline during connectivity testing and help set expectations for testing results.

Testing Scenarios

To help you discover marginal connections and malfunctioning equipment, create failure scenarios such as power cycling and resetting devices. Testing should include the fabric, hosts, and storage devices, all of which should recover gracefully from failures. The following should be tested:

• Failure scenarios

• Failover scenarios for dual fabrics

• Cable disconnect and reconnect scenarios

• Loss of inter-switch connectivity scenarios

The fabric should be checked to ensure it reroutes traffic successfully after failures are injected; and, once the failure has been corrected, re-establishes connectivity and reverts to normal routing.
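
A minimal harness for these checks might look as follows. This is a sketch in Python; the inject, repair, and traffic_flows callables are placeholders for site-specific procedures such as power cycling a switch, pulling a cable, or confirming that I/O still reaches storage over the surviving path.

```python
# Minimal sketch of a failure/failover scenario check. The inject,
# repair, and traffic_flows callables are placeholders for site-specific
# procedures and monitoring checks.

def run_failure_scenario(name, inject, repair, traffic_flows):
    inject()  # introduce the fault
    # the fabric should reroute traffic while the fault is present
    assert traffic_flows(), f"{name}: fabric failed to reroute traffic"
    repair()  # correct the fault
    # the fabric should re-establish connectivity and revert to normal routing
    assert traffic_flows(), f"{name}: fabric did not revert to normal routing"
    print(f"{name}: PASS")
```

Running each scenario from a script also leaves a record of exactly which faults were injected and in what order, which is useful evidence when reviewing the test phase.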

Running an I/O Load

After failure scenario testing has been successfully completed, you can test performance by running an I/O load and analyzing the outcome. Using your planned application for load testing is optimal; however, doing so is often difficult if not impossible. If you cannot use your planned application, you can use an I/O generator to run a simulated load. OS manufacturers can also provide recommendations for I/O-generating tools.

Another option during I/O testing is to repeat your failure scenarios to determine how they affect performance. Once all testing has been completed successfully, the time is right to transition your high-availability NAS into production.
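
If no dedicated I/O generator is at hand, even a simple script can produce a sequential write load for a first sanity check. The Python sketch below is illustrative only; purpose-built tools, including those recommended by your OS manufacturer, give far more control over block sizes, queue depths, and read/write mixes.

```python
import os
import time

# A deliberately simple sequential-write load generator; the path and
# sizes are illustrative. Prefer purpose-built I/O tools for real
# capacity testing.

def generate_write_load(path, total_mb=100, block_kb=64):
    """Write total_mb of zeroes in block_kb chunks; return MB/s achieved."""
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually hit the disk
    elapsed = max(time.time() - start, 1e-6)
    return total_mb / elapsed
```

Running the same script before and after injecting a failure gives a crude but repeatable way to compare throughput on the normal and failed-over paths.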

Transition and Release to Production

To help you quickly release your high-availability NAS into production, use an incremental and iterative approach to deployment, and document your new high-availability NAS environment as you go. The most successful high-availability NAS rollouts are done incrementally. The best practice is to move one application platform or one server OS onto the high-availability NAS at a time, until you feel comfortable that each is working correctly. Incremental deployment helps to quickly identify what action causes problems, and limits the scope to one server or application platform if a problem does arise.

Create Documentation About Your New High-Availability NAS

Detailed documentation will facilitate a quick deployment and will be valuable in managing your production high-availability NAS afterward. A final logical and physical diagram of your high-availability NAS illustrates the logical relationships between components and the physical wiring of the components, and provides a baseline for later testing. Neatly organized and labeled cables as well as easy access to product manuals, software, and support contract information will all help in troubleshooting and managing your high-availability NAS in production. The goal of documentation is to provide enough information for someone else to recreate your high-availability NAS. Such documentation should include:

• Diagrams—Logical and physical high-availability NAS diagrams, switch topology, host and storage connections

• Firmware—A listing of each device and firmware, plus saved copies of each device’s firmware

• Switch information—Each switch’s configuration and other information

• Zoning and LUN information—Your zoning and LUN configurations

• Scripts—Scripts you created and use

• Change log—Details of any additions, changes, or deletions

To aid in troubleshooting problems, it is a good idea to create a subset of your documentation that can be sent to your technical support vendor. You should protect your company by removing sensitive information such as server names, customer information, and so on. This documentation subset should include your high-availability NAS topology with OS version and patch level, HBA type and firmware/driver versions, storage types, and so on—basically everything you need to know before calling technical support.
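
Scrubbing the support documentation subset can be partially automated. The sketch below, assuming Python, redacts example patterns for server names and IP addresses; the patterns shown are hypothetical and would need to be replaced with expressions matching your own naming conventions and sensitive identifiers.

```python
import re

# Example redaction pass over a documentation subset. The patterns are
# hypothetical; substitute expressions that match your own server naming
# conventions and any other sensitive identifiers.

SENSITIVE_PATTERNS = [
    (re.compile(r"\b(?:srv|host)-[a-z0-9]+\b", re.IGNORECASE), "<server>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip-address>"),
]

def scrub(text):
    """Replace sensitive identifiers while keeping technical detail intact."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

For example, `scrub("HBA firmware 3.2.1 on srv-web01 at 10.0.4.17")` keeps the firmware version, which support needs, while redacting the host name and address, which they do not.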

With the preceding considerations in mind, the groundwork has been laid for the installation and deployment of high-availability NAS. Now, in keeping with the theme of installing and deploying high-availability NAS solutions, let’s look at three product-specific case studies.

Case Studies

The introduction of digital technology has propelled document printing from a mechanical to a computerized function, bringing opportunities for new imaging techniques, customized documents specific to each ad campaign or promotion, and online order processing. These new functions, however, take up vast amounts of storage space. The original computer systems purchased to handle office applications were not intended to accommodate the huge imaging files that digital printing requires, nor did they have backup systems in place to protect 24-hour online transaction data. Print vendors are scrambling to find cost-effective ways to increase storage and backup capabilities yet stay competitive in a very tight market. The following three case studies highlight organizations that are very happy with their completely distinct solutions to the same basic storage problem.

Case Study 1: Expanding Storage the High-Availability NAS Way

The printing industry’s data storage needs have surged since the onset of Print-On-Demand (POD) order fulfillment. The introduction of digital printing technology brought the ability to quickly change and customize printed documents on demand. Businesses no longer need to inventory printed in-stock brochures now that they’ve discovered how easily customized tradeshow handouts, menus, and promotional pieces can be obtained. The cost effectiveness of POD digital printing allows companies to do smaller runs of specific materials instead of ordering large amounts of generic brochures (50 percent of which lose relevancy before being used).

For example, MediaFlex of Campbell, California, provides the e-commerce infrastructure for a number of print vendors and their customers. A user can now log on to his or her print vendor’s e-business Web site, connect to a Web page of his or her previously ordered print templates, select a product, make changes to his or her company’s documents right on the screen, and place an order. The user can check price quotes, access order confirmation, and view a proof copy in real-time, 24 × 7.

To provide this level of ease for the customer, the vendor must employ a data storage methodology with 100 percent file availability, large storage capacity (a single typical document template is 25MB to 50MB), scalability for a growing client base, and complete security for online server data. MediaFlex chose a switched-fabric high-availability NAS to meet its needs. Four existing enterprise servers were integrated with a new RAID device in a high-availability NAS Fibre Channel switching environment. Redundant hardware throughout the file delivery subsystem over a dual-switched, multiple server-to-RAID storage channel provides automatic failover protection and fault isolation, guaranteeing continual operation. In that way, the configuration assures network availability, allows centralized backup functions with quick failover disaster recovery capabilities, permits file sharing between servers, and provides scalability without online interruptions.

The high-availability NAS contains four Sun Microsystems 400MHz E450 enterprise servers running version 7 of the Solaris OS. Each server is equipped with two Fabric cards, 124MB of RAM, and 9GB of hard disk space. Web traffic is funneled to two servers, and the other two are reserved for network files and applications. All four are active, but a failure in one will prompt another to take over in approximately 15 seconds.

A Hitachi 5846 full Fibre Channel RAID, with 140GB of total usable file storage space, is connected directly to the Sun servers through two Brocade Fibre Channel switches. The RAID device is equipped with two controllers and 10 drives and is used for all storage needs, freeing up the servers’ hard disks for applications. The controllers connect to each other and to the servers’ host bus adapter (HBA) Fabric cards via redundant Fibre Channel bus paths. If one controller fails, the RAID automatically and transparently switches to the remaining controller.

In addition, a tape library allows near-online secondary storage, conserving online storage space by migrating rarely used files from the RAID through storage zoning and a continual automated backup process. VERITAS Volume Manager software handles all storage management and backup tasks.

The high-availability NAS storage solution was installed and running in 2 days. To date, MediaFlex, their print vendors, and their customers continue to benefit from the system’s speed (the high-availability NAS and RAID can read a 25MB file in 1 second), enhanced file accessibility (client files must be available to several applications and both Web servers), state-of-the-art online scalability, and complete reliability (system downtime is virtually eliminated).

Case Study 2: A Packaged Solution

Dallas-based Blanks Color Imaging, which specializes in printing, also found its storage capacity inadequate for its expanding business. Founded in 1940 as a letterpress shop, Blanks expanded and diversified by combining prepress, sheet-fed printing, and digital photography services into a turnkey print delivery format. The company operates 7 days a week to produce brochures, advertising supplements, posters, and other marketing materials for direct mail businesses, department stores, and large national printing companies. Blanks’ data storage capacity could not meet the company’s growing workload.

The company’s needs were two-fold. First, it was still relying on 2GB digital audio tape (DAT) and a 1.2GB optical disk; with the large, image-based files generated by the turnkey format, that setup could not keep up. Second, too much time was spent shuttling 100MB to 150MB high-resolution files over the 10Mbps Ethernet network. Initially, the company attempted to save time by placing active files on a 2GB removable hard drive that was hand-carried between imaging workstations as needed, but that method proved very inefficient.

After researching other options, Blanks resolved its system needs with the purchase of a Scitex Server System package. The company’s 100 nodes now reside on a 100Mbps Ethernet LAN. The new server is a Scitex Ripro 5000 AIX with a 640GB RAID disk array, backed up by a Breece Hill automated tape library with four Quantum Digital Linear Tape (DLT) 7000 drives. Each cartridge holds as much as 70GB of compressed data, and the library currently holds 4.2TB of data in near-online storage.

Legato NetWorker software electronically manages all files, migrating them at predetermined intervals from the RAID device to the tape library. That archiving and backup software, linked with the Scitex Timna database, manages 150 DLT cartridges that replace the original 1,800 DAT tapes. The cartridges provide ready availability to 7TB of recent and current images, layouts, and advertisements (representing $6 million in business). Designers waited 3 to 6 hours for that same data when it was stored on DAT. Seventy-five DLT cartridges are reserved for RAID array backup; 50 are used to archive past jobs and image files.
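Legato NetWorker’s policy engine is proprietary, but the interval-based migration it performs, moving files from the RAID device to the tape library once they go unused, can be sketched roughly as follows. The 30-day threshold and the file records are illustrative assumptions, not Blanks’ actual policy settings.

```python
from datetime import datetime, timedelta

MIGRATION_AGE = timedelta(days=30)  # assumed "rarely used" threshold

def plan_migration(files, now):
    """Return the names of files whose last access is older than the
    threshold; in an HSM setup these would be moved from the RAID
    device to DLT cartridges in the tape library.

    `files` maps file name -> last-access datetime.
    """
    return sorted(name for name, last_access in files.items()
                  if now - last_access > MIGRATION_AGE)
```

A scheduler would run such a check at the predetermined intervals, leaving hot files on RAID while candidate files drain to near-online tape.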

Incremental backups are performed twice a day. A full system backup, which requires 8 to 9 hours, is performed weekly, and files are safeguarded by storing tape copies both onsite and offsite. The company maintains three sets of all files, which are rotated continuously: at any one time, one set is in the RAID device, one is in the tape library, and one is offsite. This setup provides excellent protection for client files; should a major disaster occur, no more than 6 hours of work would be lost.
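The worst-case loss figure quoted above follows directly from how the backups are spaced across the day: work done since the most recent backup is the only work at risk. The chapter does not give Blanks’ exact backup times, so the schedules below are purely illustrative, but a small helper makes the relationship explicit.

```python
def max_hours_at_risk(backup_hours):
    """Worst-case hours of unprotected work for a daily backup schedule.

    `backup_hours` lists the hours (0-23) at which backups run; the
    schedule repeats every day, so the gap that wraps past midnight
    also counts.
    """
    hours = sorted(backup_hours)
    gaps = [later - earlier for earlier, later in zip(hours, hours[1:])]
    gaps.append(24 - hours[-1] + hours[0])  # wrap-around gap
    return max(gaps)
```

For example, backups at 06:00 and 18:00 leave a 12-hour worst-case window, while four evenly spaced backups bring it down to 6 hours; a 6-hour bound with two backups a day implies the backups bracket a concentrated production shift.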

With the Legato and Scitex software, employees are able to track archived files. The automation software, faster network, and DLT library system have improved efficiency, increasing throughput by 30 percent to 40 percent and dramatically improving turnaround time to clients.

Case Study 3: Using HSM Technology

Banta Digital Group’s expertise involves customized digital imaging and content management services. Customers’ desktops are connected directly to Banta’s WAN, from which they can manage the entire creative design, digital prepress, and digital printing process for their own documents. The company guarantees its customers 100 percent file availability, complete data protection, and ample storage capacity. More than 200 users can be logged on to the Banta network at any one time, resulting in an enormous number of files being updated and saved continuously. The company needed a system tailored specifically to its unique and extremely high-volume data storage and availability requirements.

Banta chose to install an HSM data protection and availability system consisting of an enterprise server connected to three levels of storage. The HSM software automatically, seamlessly, and transparently manages the network’s three-level storage hierarchy. New and frequently used files are saved on a RAID device (online storage), while less frequently used files are migrated to a large-capacity, near-online magneto-optical (MO) jukebox. Rarely used files are archived to a high-capacity 34TB DLT library.

As an added protection, the HSM software package is configured to automatically mirror RAID data onto the MO library within minutes of being received. In the case of extremely large customer files, RAID space is conserved by saving only 40GB of the file on the RAID device, automatically offloading the overage to MO. A second copy of the entire file is immediately saved with the archived files on tape.
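The per-file placement policy described above, at most 40GB of a file on RAID, any overage offloaded to MO, and a full second copy archived to tape, can be expressed as a simple function. This is a sketch of the policy as stated in the text, not Banta’s actual HSM implementation; the function name and return structure are illustrative.

```python
RAID_CAP_BYTES = 40 * 10**9  # per-file RAID allowance described above

def place_file(size_bytes):
    """Split one file across the three storage tiers.

    Up to 40GB stays on the RAID device (online); any overage goes to
    the MO jukebox (near-online); a complete second copy is archived
    to tape alongside the other archived files.
    """
    on_raid = min(size_bytes, RAID_CAP_BYTES)
    on_mo = size_bytes - on_raid
    on_tape = size_bytes  # full archive copy
    return {"raid": on_raid, "mo": on_mo, "tape": on_tape}
```

A 55GB customer file, for instance, would keep 40GB on RAID, push 15GB to MO, and land in full on tape.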

Banta’s existing LAN has been upgraded to Fast-Ethernet/ATM. The connections coming from external nodes into the network were upgraded from T1 and frame relay to higher-capacity DS3 lines. To ensure complete file availability, redundant hardware was installed throughout the file delivery subsystem (access paths, controllers, and so on).

The enterprise server chosen by Banta is a Sun Microsystems 336MHz E6500 with 10GB of RAM, running Solaris 2.6. The E6500 has 10 CPUs, so a single CPU malfunction will not result in system failure or downtime.

The first tier of online data storage is a StorageTek CBNT-CO1 Fibre Channel RAID device with a 1TB capacity and 100MBps access speed. Banta currently uses sixty 18GB drives, but the RAID device is scalable to 120 drives for each of its two controllers. Because each controller has its own Fibre Channel bus path, if one fails, the other (failover) controller takes over automatically.

The second storage tier is a SCSI-attached El050 read/write MO jukebox. The jukebox contains 1,000 platters (5TB of data storage) and 16 disc drives; it’s scalable to 32 drives to accommodate future needs. Offloading RAID online storage onto the near-online MO jukebox is a low-cost way of improving the performance and access speed of the RAID device while keeping less frequently used files accessible.

The final tier of the storage hierarchy is a StorageTek 9710 tape library. Banta uses 6 of the possible 10 DLT 7000 drives; the library has a total data capacity of 28TB. Though rarely used files are archived to this tier, all data in the HSM-controlled tape library remains available to the user. In contrast with typical tape archives, the HSM system eliminates the need to search for and reload offline tapes.
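The property that distinguishes HSM from an ordinary tape archive is transparent recall: a request for an archived file stages it back to online storage automatically, with no operator hunting for tapes. The sketch below illustrates that behavior under assumed names; the catalog structure and tier labels are hypothetical simplifications, not the actual HSM software interface.

```python
def open_file(name, catalog):
    """Transparent HSM recall (sketch).

    `catalog` maps file name -> current tier ("raid", "mo", or "tape").
    If the file has been migrated off online storage, stage it back to
    the RAID tier before reporting where it can be read; the caller
    never needs to know the file had been archived.
    """
    tier = catalog[name]
    if tier in ("mo", "tape"):
        catalog[name] = "raid"  # staged back to online storage
    return catalog[name]
```

From the user’s point of view, opening a five-year-old archived layout looks exactly like opening yesterday’s file, just slower on first access while the recall completes.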

The bulk of Banta Digital Group’s customers are catalogers, advertisers, and creative designers, all of whom have benefited from the new HSM system. Besides enjoying 100 percent guaranteed file access and the disaster recovery safety net provided by the backup hierarchy and redundant hardware, they’ve seen new product introduction time drop from 32 days to 19 days and the prepress production cycle shorten from 3 days to 1 day. In a change transparent to its clients, Banta has also been able to reduce staffing requirements by eight people, gaining another advantage in a competitive industry.

Summary

As the case studies illustrated, each company’s unique situation dictates the storage solution that best fits its environment; no single solution will work in all cases. The three storage approaches this chapter explored, though quite different from one another, all incorporate cutting-edge technology and state-of-the-art hardware and software as well as advanced file availability, storage capacity, and data protection.

This chapter explored the installation and deployment of high-availability NAS, with examples of how real-world companies are implementing NAS solutions. These implementations can serve as inspiration for your organization. Although these organizations deployed different NAS solutions, each company ended up with a solution that worked for its organization and benefited everyone involved.