
iSAN - An intelligent Storage Area Network Architecture (drona.csa.iisc.ernet.in/~gopi/docs/iSAN-hipc04.pdf), Ganesh Narayan, K Gopinath

Apr 14, 2018


iSAN - An intelligent Storage Area Network Architecture

Ganesh Narayan, K Gopinath
Dept. of Computer Science and Automation

Indian Institute of Science

Abstract

This work describes the motivation, architecture and implementation of iSAN, an "intelligent" storage area network. The main contributions of this work are: (1) showing how to architect an intelligent SAN that understands its storage-consumers¹ in order to serve them better; (2) showing how to realise this abstract architecture using existing technologies; and (3) demonstrating the benefits that such an intelligent SAN would offer. Our results, drawn from three case studies, show that the iSAN approach yields better results than conventional SANs: iSAN facilitates true storage sharing, has a provably correct security architecture and offers better throughput guarantees. We also argue that the iSAN approach is generic enough to capture a wide range of other SAN requirements.

1 Introduction

The Internet revolution drives a relentless demand for data to match the accelerating growth in users, digital content and network bandwidth availability ([52], [33]). Until recently, however, these storage services stayed integrated with the computing systems that supported content and context in the data center.

¹ A storage-consumer is the software layer that builds storage abstractions from the block-level storage provided by SANs. This layer typically includes, but is not limited to, Volume Managers, File Systems and Database Management Systems.

The need to scale storage independently has been the primary catalyst for the emergence of a storage tier that provides logical as well as physical separation of storage from the other services in the data center. The result is the advent of I/O architectures in which conventional storage devices and high-speed networks were combined to form I/O networks: both the storage and the storage-consumers are connected to a high-speed network and communicate using SCSI commands. These I/O networks, called Storage Area Networks (SANs), provide better scalability and throughput guarantees than traditional captive storage architectures.

Yet scalability and throughput are not the only requirements imposed on I/O subsystems. A multitude of application domains, from Content Distribution Networks to storage service provision, demand a wide range of properties that a successful I/O architecture should support. Unfortunately, SANs based on both FC [48] and iSCSI [43] do not export sufficient functionality of direct use to storage-consumers. This is because SANs have traditionally been seen merely as a replacement for the parallel SCSI bus. But as a distributed shared storage system, a SAN is more than an extended SCSI bus: SAN-based systems demand functionality that is not needed in parallel-SCSI-based systems.

For instance, consider storage sharing: coordinating processes that access shared storage was not a problem in captive storage systems, as every access to the storage is arbitrated by the storage-consumer to which the storage is physically connected; essentially, there is no direct sharing of storage. But in a shared storage system like a SAN, where storage is directly accessible from multiple storage-consumers, sharing becomes a critical issue. Thus, to ensure effective storage sharing, a SAN should provide efficient, scalable concurrency control mechanisms.

Similarly, in a captive storage system the storage subsystem is trusted for the most part: it remains physically connected to the storage-consumer and is regarded as being as safe as the storage-consumer. But in a distributed storage system like a SAN, where the storage could reside in a distant location, the security of data, both in transit and in store, becomes an important concern. Hence, a SAN should provide some means of protecting the data it serves.

Besides, contemporary SANs are generally unaware of the exact requirements of storage-consumers. For instance, different storage-consumers expect different security guarantees from the SAN depending on the threat model they foresee. It is therefore beneficial to enforce security properties judiciously, especially when different security guarantees exhibit significantly different cost/performance profiles: for a storage-consumer built on the assumption of Byzantine storage, blind block-level encryption done in the SAN may not be of much use. Similarly, since different concurrency control schemes provide varying degrees of consistency at varying costs [29], one may want to selectively enforce the particular scheme that is economical and best suits the storage-consumer's needs. Hence, SANs should be intelligent enough to provide the set of guarantees that best serves each storage-consumer.

In this paper, we propose a novel SAN architecture called iSAN. iSAN identifies and provides services that are of direct use to storage-consumers. In providing these services, iSAN also "understands" the service semantics sought by the storage-consumers and provides the needed service in the way that best suits the storage-consumer's requirements. The proposed iSAN architecture is extensible and generic.

The rest of the paper is organised as follows. Section 2 enumerates the requirements that an I/O architecture should satisfy. In Section 3 we describe the architecture of iSAN and revisit the design criteria, whose validity follows directly from the architecture. Section 4 describes the implementation of the proposed architecture using the Linux kernel and Ensemble [21]. In Section 5 we discuss how three important SAN services (concurrency control, security and stable storage) are realised in iSAN. We compare the performance of the suggested solutions with that of existing solutions in Section 6. We conclude the paper in Section 8.

2 Design

The iSAN design is founded on a number of requirements; some of them, like throughput and scalability, are inherited from the basic SAN architecture with little or no enhancement. This section explains each of these requirements and discusses its applicability to state-of-the-art SAN realisations.

2.1 Interoperability

In a SAN, a path from any storage-consumer to any storage device may include various combinations of host bus adaptors, hubs or switches and SCSI peripherals. Not all combinations are feasible even if all the subsystems were built around the same network technology, let alone with dissimilar technologies. Thus a critical and essential feature of any SAN architecture is ensuring the interoperability of components within the SAN.

Fibre Channel SANs suffer from interoperability problems because the higher-level FC standards have been loosely adhered to in the past: zoning implementation is very much vendor dependent; the FSPF protocol, which ensures switch-to-switch interoperability, is yet to be put on firm ground; and public and private loop devices still have problems with fabric logins [15]. While network technologies like TCP/IP and Ethernet have a strong tradition of multi-vendor interoperability, FC devices do not. This poses serious problems in the design, procurement and operation of SANs, and interoperability is thus becoming an increasingly important requirement.

2.2 Throughput

With high-speed network and disk interfaces, SANs are expected to move data as fast as possible. However, as carrier bandwidth increases, transport protocol inefficiencies at the end-points degrade the effective delay and throughput. For instance, without considerable hardware support from the NIC, the effective throughput of TCP over a Gbps network is only a fraction of the realisable throughput [20]. Even with zero-copy and checksum support from the NIC, TCP still has other problems on gigabit networks [36]. Many modifications have been proposed to handle these issues, but few NIC products implement these extensions properly; for instance, a correct implementation of the Echo Timestamp feature [13] is lacking in many systems. The complexity of TCP must also be considered. Moreover, with multiple independent TCP connections between the SCSI end-points, iSCSI handles issues like congestion control less effectively. Importantly, failure detection becomes non-trivial, as different TCP connections break at different time instants depending on their respective past activity. Each of these shortcomings affects the observable throughput in iSCSI SANs.

Though FC SANs provide better throughput guarantees than iSCSI SANs in LAN environments, their throughput edge is considerably smaller over WAN links due to the credit-based flow control scheme used in FC SANs [47]. Given these observations, a SAN should leverage a SCSI transport protocol that is fast, efficient and simple, and that has better flow control support.
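Why credit-based flow control hurts on long links can be seen with a back-of-the-envelope model. This is a simplified sketch, not taken from the paper: it assumes that with `credits` buffer-to-buffer credits at most `credits` frames are in flight and that each credit returns after about one round trip, so throughput is capped at credits × frame size / RTT. The link rate, frame size, credit count and RTTs below are illustrative assumptions.

```python
import math

# Simplified model (an assumption, not from the paper): with `credits`
# buffer-to-buffer credits, at most `credits` frames can be in flight, and
# a credit is returned only after roughly one round trip.


def credit_limited_throughput(link_bps: float, credits: int,
                              frame_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput (bit/s) under credit-based flow control."""
    window_bits = credits * frame_bytes * 8
    return min(link_bps, window_bits / rtt_s)


def credits_needed(link_bps: float, frame_bytes: int, rtt_s: float) -> int:
    """Credits needed to keep the link busy: bandwidth-delay product / frame."""
    bdp_bits = link_bps * rtt_s
    return math.ceil(bdp_bits / (frame_bytes * 8))


# 1 Gbit/s link, 2 KB frames, 16 credits (illustrative values only).
lan = credit_limited_throughput(1e9, 16, 2048, rtt_s=10e-6)  # ~1 km each way
wan = credit_limited_throughput(1e9, 16, 2048, rtt_s=1e-3)   # ~100 km of fibre
print(f"LAN: {lan / 1e6:.0f} Mbit/s, WAN: {wan / 1e6:.0f} Mbit/s")
print("credits to fill the WAN link:", credits_needed(1e9, 2048, 1e-3))
```

Under these assumptions the LAN case runs at full link rate, while the WAN case is capped at about 262 Mbit/s and would need 62 credits to fill the link.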

2.3 Availability

Today, faced with the critical need to ensure availability and continuous operation in spite of isolated failures of disks, switches and links, or the catastrophic loss of computing/communication facilities, SANs need to be highly available. While FC SANs provide sub-second reconfiguration in case of a component failure, the traditional Spanning Tree Protocol (STP) employed in Ethernet takes tens of seconds to converge; note that the extended Ethernet LAN is frozen during the reconfiguration phase. STP also has other problems: inefficient bandwidth usage, link blockage, and a lack of awareness of Virtual LANs (VLANs). A combination of the Rapid Spanning Tree Protocol (RSTP, 802.1w) and link aggregation (802.3ad) could reduce reconfiguration stalls to tens of milliseconds. However, RSTP does not use the bandwidth effectively and still has link blockage problems.

Apart from ensuring the availability of the SAN infrastructure, a SAN should also provide primitives with which storage-consumers can ensure data availability: multicast is one such primitive, which helps provide high availability through replication. However, the properties provided by traditional hardware multicast are not sufficient to ensure the mutual consistency of replicated copies; often the network multicast is combined with protocols providing stronger guarantees, like atomicity and message ordering, to handle problems like replication more effectively. If the SAN provides stronger multicast guarantees, it will ease the effort needed to provide transparent, block-level replication.
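To see why ordering matters for replication, consider two writes to the same block that reach two replicas in different orders. Below is a minimal sketch using a fixed sequencer, one well-known way to obtain total order. It is not the protocol iSAN or Ensemble uses, and it shows ordering only, not atomicity (all-or-none delivery). The names `Sequencer` and `Replica` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Msg = tuple[int, int, str]  # (global sequence number, block number, data)


@dataclass
class Sequencer:
    """Assigns a global sequence number to every multicast write."""
    next_seq: int = 0

    def stamp(self, block: int, data: str) -> Msg:
        msg = (self.next_seq, block, data)
        self.next_seq += 1
        return msg


@dataclass
class Replica:
    """Applies writes strictly in sequence-number order, buffering gaps."""
    blocks: dict[int, str] = field(default_factory=dict)
    expected: int = 0
    pending: dict[int, tuple[int, str]] = field(default_factory=dict)

    def receive(self, msg: Msg) -> None:
        seq, block, data = msg
        self.pending[seq] = (block, data)
        # Deliver every buffered message whose predecessors were all delivered.
        while self.expected in self.pending:
            b, d = self.pending.pop(self.expected)
            self.blocks[b] = d
            self.expected += 1


seq = Sequencer()
w1, w2 = seq.stamp(7, "A"), seq.stamp(7, "B")
r1, r2 = Replica(), Replica()
r1.receive(w1); r1.receive(w2)
r2.receive(w2); r2.receive(w1)   # the network reordered the two writes
```

Both replicas end with block 7 holding "B": the second replica buffers the early write until the gap is filled, so replica state does not depend on arrival order.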


2.4 Storage Sharing

An I/O architecture that permits sharing enables seamless fail-over of the storage-consumers that share data sets. Storage sharing is thus crucial to providing uninterrupted service. Shared storage architectures also provide better scalability, since storage capacity and processing power can be added to the pool dynamically by adding more storage-consumers and storage. Additionally, data sharing gives high flexibility for dynamic load balancing, since the data is uniformly accessible from any storage-consumer. Sharing also facilitates storage consolidation, which reduces management and operation costs while increasing security and system utilisation.

But to achieve effective data sharing, a SAN may need to assure certain properties at the network level. For instance, for a storage-consumer to fail over correctly, the SAN may need to provide mechanisms such as I/O fencing. In fact, many commercial parallel database clusters expect the underlying cluster transport to provide I/O fencing. Also, concurrent access to shared storage has to be mediated through some concurrency control mechanism. Neither FC nor iSCSI provides any concurrency control primitives, nor do they provide I/O fencing.
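I/O fencing can be illustrated with a membership-view check at the storage end. The sketch below is hypothetical and is not iSAN's mechanism. A target accepts a write only if the sender belongs to the current view and tagged the write with that view's id, so a consumer evicted by a view change (for example, one suspected of having failed) can no longer modify shared data. `FencedTarget` and its methods are illustrative names.

```python
from __future__ import annotations


class FencedTarget:
    """Hypothetical target-side fence: only current-view members may write."""

    def __init__(self, view_id: int, members: set[str]) -> None:
        self.view_id = view_id
        self.members = set(members)
        self.blocks: dict[int, bytes] = {}

    def install_view(self, view_id: int, members: set[str]) -> None:
        # A new view (e.g. after a failure is detected) fences out departed consumers.
        if view_id <= self.view_id:
            raise ValueError("views must advance monotonically")
        self.view_id = view_id
        self.members = set(members)

    def write(self, sender: str, view_id: int, block: int, data: bytes) -> bool:
        # Reject I/O from a non-member, or I/O issued under an older view.
        if sender not in self.members or view_id != self.view_id:
            return False
        self.blocks[block] = data
        return True


t = FencedTarget(1, {"db1", "db2"})
ok_before = t.write("db1", 1, 0, b"x")
t.install_view(2, {"db2"})                 # db1 evicted, e.g. suspected failed
stale = t.write("db1", 1, 0, b"stale")     # late write from the evicted node
ok_after = t.write("db2", 2, 0, b"y")
```

The late write from `db1` is refused, so the surviving consumer can take over block 0 without being overwritten.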

2.5 Security

Traditional, captive-storage-based storage-consumers are built on the assumption that the connected storage is inherently secure. But in a distributed storage system like a SAN, such assumptions no longer hold. To provide an easy migration path for legacy systems that were built assuming the physical security of storage, a SAN should enforce the needed security by other means, say, cryptographically. Present-day FC SANs are built around relatively secure fibre transport and are not yet well equipped to enforce cryptographic security. On the other hand, iSCSI SANs, which are built around insecure IP networks, are bound to enforce the necessary security using cryptographic protocols [1], and IPSec/IKE were chosen as the cryptographic infrastructure for iSCSI SANs.

But IPSec has many problems that are yet to be resolved [17]; so does IKE ([35], [44]). Also, the proposed iSCSI-level CRC mechanism and the TCP checksum do not coexist harmoniously owing to strict layering restrictions; with multiple TCP connections and SCSI command ordering in place, handling CRC-error-induced resynchronizations efficiently could become problematic. Moreover, comprehensive multicast security, including key management, is still very much in its infancy; of the many suggested key management protocols, only SKIP [4] discusses security in multicast communications explicitly. However, the problem with SKIP, and with IP multicast in general, is that they are membership unaware, a potentially inappropriate design for a restricted environment like a SAN.

Thus iSAN should have a simple and well-understood security infrastructure for both unicast and multicast traffic. Since security is one place where it does not pay to be almost correct, one would prefer the security infrastructure of iSAN to be provably correct.

2.6 Intelligence

Traditional storage-consumers interact with storage using standard storage protocols like SCSI. The storage-consumers are unaware of the underlying storage technology: in general, a storage-consumer running on a hardware RAID5 is the same as one that runs over a single SCSI disk. This enables an easy migration path from captive storage systems to SAN-based storage. However, the converse, that the SAN is unaware of the storage-consumers, can result in poor performance. For instance, in a shared storage system like a SAN, the correctness criterion that determines permissible interleavings is very much dependent on the storage-consumer/application.


If the SAN were to default to a strong consistency model like linearizability [30], it would be grossly inefficient, since many storage-consumers could subsist with much weaker consistency guarantees. Thus a SAN should be intelligent enough to deploy the concurrency control mechanism that suffices for the storage-consumer's requirements; this is especially important as different consistency mechanisms may incur different cost/performance tradeoffs [3]. In general, the data access path of a particular storage-consumer should be efficiently tailored to its exact needs, and SANs should have provisions for doing so.

Besides, though the Keep It Simple, Stupid! [41] approach works well in networks, it may not always be efficient: there are certain critical problems that do not permit efficient solutions in systems designed strictly according to the end-to-end argument. QoS, multicast and VPNs are such problems, whose efficient solutions demand a certain amount of intelligence in the intermediate nodes or switches. SANs should have enough intelligence in the switches to handle these critical problems efficiently.

3 Architecture

iSAN uses Logical Link Control (LLC) [22] for transport, a Group Communication System (GCS) for membership services and VLANs for grouping; VLANs in iSAN provide efficient application/SCSI-level routing. The edge switches that are part of a VLAN form a group; Fig. 1 shows how this is done in iSAN. The group end-point housed in an edge switch, called a Sanlet, acts as a SCSI Target emulator: SCSI commands sent by storage-consumers are received by the Sanlet and dispatched to the appropriate physical storage device. Thus a Sanlet can be seen as a thin Volume Manager, residing in the edge switch, that manages the virtualized storage². Fig. 2 depicts the flow of data through iSAN. The rest of this section discusses each of the architectural components in detail.

² A Sanlet need not always be a SCSI proxy. Sanlets can act on behalf of an entity at any layer, as discussed

3.1 Logical Link Control

The LLC sublayer is the upper portion of the data link layer. LLC is defined as an interoperability layer and supports medium-independent data link functions. LLC-level multiplexing is done using Source/Destination Service Access Points (SSAP/DSAP), which are akin to ports in TCP. LLC supports three types of service: connectionless, connection-oriented and acknowledged connectionless. iSAN employs connectionless LLC (Type I) as the SAN transport, because Type I LLC is simpler, faster and more flexible: it merely provides LLC-level addressing and an interoperable datagram service.
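A Type I PDU carries only a 3-byte IEEE 802.2 header (DSAP, SSAP and a control byte of 0x03 for an Unnumbered Information frame) in front of the payload. The sketch below shows that header and DSAP-based demultiplexing, the "SAPs as ports" analogy above. The MAC header in front of the LLC header is omitted. `SANLET_SAP = 0x70` is an arbitrary value chosen for illustration, not a SAP assigned to iSAN, and the helper names are hypothetical.

```python
from __future__ import annotations

import struct

LLC_UI = 0x03       # control field of an Unnumbered Information (Type 1) PDU
SANLET_SAP = 0x70   # hypothetical SAP for Sanlet traffic, for illustration only


def llc_ui_encode(dsap: int, ssap: int, payload: bytes) -> bytes:
    """Prefix payload with a 3-byte IEEE 802.2 LLC header for a UI PDU."""
    # The low bit of DSAP is the I/G bit and the low bit of SSAP the C/R bit;
    # this sketch only sends to individual SAPs, as command PDUs.
    if dsap & 1 or ssap & 1:
        raise ValueError("expected an individual DSAP and a command SSAP")
    return struct.pack("!BBB", dsap, ssap, LLC_UI) + payload


def llc_dispatch(frame: bytes, handlers: dict) -> object:
    """Demultiplex on DSAP, much as a TCP stack demultiplexes on port."""
    dsap, ssap, ctrl = struct.unpack("!BBB", frame[:3])
    if ctrl != LLC_UI:
        raise ValueError("only Type 1 UI PDUs are handled here")
    return handlers[dsap & 0xFE](ssap & 0xFE, frame[3:])


frame = llc_ui_encode(SANLET_SAP, SANLET_SAP, b"cmd")
result = llc_dispatch(frame, {SANLET_SAP: lambda src, data: (src, data)})
```

There is no connection setup, sequencing or acknowledgement at this layer. That is why Type I is light, and why reliability and ordering are left to the group communication layers above it.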

3.2 Virtual LAN

VLANs form the building blocks of iSAN: both storage and storage-consumers are part of at least one VLAN. A VLAN is a logical overlay that can span multiple switches: it shares all the properties of a physical LAN, except that it is a logical entity that exists only through the special VLAN switching state maintained in the switches it spans. The VLANs used in iSAN are pruned using the MAC addresses of the storage-consumers. Storage-consumers with similar needs are connected together using a VLAN; VLANs are the unit of service commissioning and discrimination.
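Assuming standard IEEE 802.1Q tagging, a switch learns a frame's VLAN from a 4-byte tag that follows the source MAC address: the TPID 0x8100, then a TCI whose low 12 bits are the VLAN ID. The sketch below extracts that ID, which is the value a Sanlet could use to decide which group and service a frame belongs to. The helper names are hypothetical.

```python
from __future__ import annotations

import struct

TPID_8021Q = 0x8100  # Tag Protocol Identifier of an IEEE 802.1Q tag


def vlan_id(frame: bytes) -> int | None:
    """Return the 802.1Q VLAN ID of an Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 0-11 are the destination and source MAC addresses; in a tagged
    # frame the 4-byte tag (TPID + TCI) follows immediately.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # TCI = 3-bit priority | 1-bit DEI | 12-bit VLAN ID


def tagged(vid: int, priority: int = 0) -> bytes:
    """Build a minimal tagged header with zero MACs, for testing only."""
    return bytes(12) + struct.pack("!HH", TPID_8021Q, (priority << 13) | vid)
```

For example, `vlan_id(tagged(42, priority=5))` yields 42. An untagged IPv4 frame, whose EtherType 0x0800 sits at the same offset, yields `None`.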

3.3 GCS

Cooperating Sanlets, the ones that are part of the same VLAN, form a process group. This, we believe, is a natural confederation: both VLANs and process groups detect failures, deliver views and have the notion of membership. In iSAN, VLAN/Sanlet group membership provides strong virtual synchrony [18].

² (cont.) in iSAN's suggested PFS implementation (Section 5.1.1).

Virtual synchrony [8] is a powerful paradigm for developing asynchronous distributed systems, since it simulates a reliable-delivery, fail-stop model for the application: it creates the illusion that the application runs in an environment in which crashed processes are always detected, and in which a process suspected of having crashed really has crashed. This is done by presenting processes with views, each consisting of the set of currently reachable and operational processes. The system then guarantees that between every two consecutive views v1 and v2, no message sent from a process not in v1 can be delivered, and that all processes that appear in both v1 and v2 see the same set of messages. Strong virtual synchrony further strengthens these semantics by requiring that a message be delivered in the same view in which it was sent. This property is very useful, since it helps minimise the amount of context information that needs to be sent with messages and the amount of computation required to process a message in the Sanlet.
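The guarantees above can be made concrete as checks over recorded delivery logs. The following is a minimal property checker written from the definitions in this paragraph, not Ensemble's implementation. It assumes view ids are consecutive integers. It flags deliveries of messages whose sender was not in the sending view, deliveries in a view other than the sending view (the "strong" condition), and survivors of a view change that delivered different message sets in the old view. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def check_strong_vs(views, sent, delivered):
    """Check strong virtual synchrony on recorded delivery logs.

    views:     {view_id: set of members}; consecutive views have consecutive ids
    sent:      {msg: (sender, view_id in which it was sent)}
    delivered: {process: [(view_id, msg), ...]} in delivery order
    Returns a list of human-readable violations (empty if none).
    """
    errors = []
    per_view = defaultdict(dict)  # view_id -> {process: set of msgs delivered}
    for proc, events in delivered.items():
        for view_id, msg in events:
            sender, sent_view = sent[msg]
            if sender not in views[sent_view]:
                errors.append(f"{msg}: sender {sender} not in view {sent_view}")
            if sent_view != view_id:  # strong VS: same view for send and delivery
                errors.append(f"{proc}: {msg} sent in view {sent_view}, "
                              f"delivered in view {view_id}")
            per_view[view_id].setdefault(proc, set()).add(msg)
    ids = sorted(views)
    for v1, v2 in zip(ids, ids[1:]):
        survivors = views[v1] & views[v2]
        seen = {frozenset(per_view[v1].get(p, set())) for p in survivors}
        if len(seen) > 1:
            errors.append(f"members surviving view {v1} delivered different sets")
    return errors


views = {1: {"s1", "s2", "s3"}, 2: {"s1", "s2"}}   # s3 fails after view 1
sent = {"m1": ("s1", 1), "m2": ("s3", 1)}
good = {"s1": [(1, "m1"), (1, "m2")], "s2": [(1, "m2"), (1, "m1")]}
bad = {"s1": [(1, "m1"), (1, "m2")], "s2": [(1, "m1")]}
```

In `good`, both survivors delivered the failed member's last message `m2` before moving to view 2, so no violation is reported. In `bad`, only `s1` did, which is exactly the disagreement virtual synchrony rules out.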

3.4 Protocol Composition

In iSAN, service discrimination is done by allowing individual groups to have a tailored protocol stack that serves their storage-consumers better. We argue that this approach, called protocol composition, provides enough flexibility that storage-consumers with very different goals can potentially agree to share a common infrastructure, within which their commonality is captured by the layers they share and their differences are reflected by layers built specifically for their needs.

Since a VLAN, by construction, houses storage-consumers with similar requirements, it is natural to use the VLAN tag to choose among the various built-in protocol stacks.

Presently all iSAN stacks share a common set of lower layers, formed by the collection of microprotocols that assure strong virtual synchrony; any further service specialisation is built on this foundation of virtual synchrony.
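Protocol composition keyed on the VLAN tag can be pictured as follows. This is a minimal sketch under stated assumptions, not Ensemble's layer API. Each layer is a function applied on the send path, every stack ends with the shared virtual-synchrony layers, and the VLAN ID selects which service-specific layers sit on top. The layer names, the VLAN IDs 10 and 20, and the `compose` helper are all hypothetical.

```python
from __future__ import annotations

from typing import Callable

Layer = Callable[[dict], dict]


def vsync(msg: dict) -> dict:
    """Stand-in for the shared virtual-synchrony layers: stamp the view id."""
    return {**msg, "view": msg.get("view", 1)}


def total_order(msg: dict) -> dict:
    """Hypothetical ordering layer, e.g. for consumers sharing a database."""
    return {**msg, "ordered": True}


def encrypt(msg: dict) -> dict:
    """Hypothetical confidentiality layer (marks the message; no real crypto)."""
    return {**msg, "encrypted": True}


COMMON: list[Layer] = [vsync]          # shared by every stack
STACKS: dict[int, list[Layer]] = {     # VLAN ID -> service-specific layers
    10: [total_order],
    20: [encrypt, total_order],
}


def compose(vid: int) -> Layer:
    """Build the send path for a VLAN: specific layers on top of COMMON."""
    layers = STACKS.get(vid, []) + COMMON  # listed top of stack first

    def send(msg: dict) -> dict:
        for layer in layers:
            msg = layer(msg)
        return msg

    return send
```

A VLAN with no entry in `STACKS` still receives the common virtual-synchrony layers. This mirrors the idea that consumers share the lower layers and differ only in what is stacked on top.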

3.5 Design Criteria Revisited

The proposed iSAN architecture is interoperable and is likely to provide better throughput and latency guarantees, as it is based on LLC and VLANs. By providing virtual synchrony and stronger multicast guarantees (atomicity and ordering), iSAN effectively facilitates replication-based high-availability schemes. However, the high availability of the iSAN infrastructure itself does not directly follow from the proposed architecture. We believe an enhanced RSTP-like mechanism could be used to address this issue; we do not discuss this aspect of iSAN further in this paper. Being based on the strong virtual synchrony model, iSAN automatically provides basic sharing protection like I/O fencing. Other critical aspects of sharing, like concurrency control, are discussed in Section 5.1. As discussed in Section 5.2, iSAN deploys the Ensemble-based fortress model of security, which is provably correct. Above all, the protocol composition mechanism gives iSAN an efficient means of distinguishing and servicing different storage-consumers.

4 Implementation

iSAN is implemented using the Ensemble group communication system [21] and Linux kernel v2.4. We chose Ensemble for several reasons. First, it is a mature, well-understood GCS toolkit with a vast array of functionality. Second, Ensemble is the only group communication system we are aware of that provides protocol composability³. Third, Ensemble has a powerful, multicast-aware security infrastructure [39]. Besides, Ensemble is event driven and is superior to thread-based systems like Horus. Last but not least, Ensemble is well documented and has an exceptionally helpful developer team.

³ Horus [37], the predecessor of Ensemble, also provides composability. Since Ensemble is an improved and optimised version of Horus, we opted for Ensemble.

Ensemble is implemented in OCaml and is provided as a user-level library that can be linked into an Ensemble application. For our purpose, however, we needed a version of Ensemble written in C so that we could port it to the Linux kernel, and C-Ensemble serves this purpose precisely. However, C-Ensemble (v0.10) includes only a subset of the functionality provided by Ensemble and is accessible only as a user-level library; we ported the C-Ensemble distribution to the Linux kernel with minimal changes. This was achieved by wrapping system calls to provide a libc-like interface accessible from inside the kernel and by rewriting part of the event handling code. We also ported Ensemble's total and causal ordering protocols to the C-Ensemble Linux kernel port, and added a simple application-level credit-based flow control protocol to C-Ensemble. We modified both Ensemble and C-Ensemble to include the LLC transport provided by the native Linux LLC implementation. However, we have not yet ported Ensemble's security protocols to C-Ensemble. Hence, iSAN currently uses Ensemble/user (v1.33) for the security experiments, while the other experiments are done with C-Ensemble/kernel. The Sanlet uses a simple Target Emulator, written from scratch to be Ensemble friendly, to emulate the SCSI target.

4.1 SCSI Encapsulation

In an earlier implementation of iSAN, the I/O access commands exchanged between the Initiator and the emulated Target [32] were encapsulated using iSCSI. However, this approach (Fig. 3) had serious problems. The Target midlevel (kernel) thread, which otherwise handles the commands, now has to send them to the local Sanlet/Ensemble stack using another TCP/LLC socket. This setup incurs unnecessary copy/computation overhead that could have been avoided with Ensemble-compatible Target code that is event driven and tightly integrated with Ensemble.

In the later version of iSAN, the emulated Target is integrated with the Ensemble protocol stack (Fig. 3b). Upon initialisation, the Sanlet creates a TCP socket through which the Initiator can talk to the Sanlet directly; the connected socket thus created is added to Ensemble's socket pool. Basic target processing is handled in the socket handler for that socket. Ideally this would have been an LLC socket, but TCP was chosen to ease the implementation. All I/O command transmission is handled with the same encapsulation protocol (Fig. 4), carried both over TCP (between Initiator and Target/Sanlet) and over LLC (between Sanlets).
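Using one encapsulation over both a byte stream (TCP) and a datagram service (LLC) means each PDU must carry its own length, so that it can be recovered from arbitrary TCP segment boundaries. The encapsulation format of Fig. 4 is not reproduced here, so the header below (payload length, opcode, tag) is entirely hypothetical. The sketch shows only the framing idea. On the LLC leg each datagram would carry one PDU and could be fed to the same decoder.

```python
from __future__ import annotations

import struct

HDR = struct.Struct("!IBQ")  # hypothetical header: payload length, opcode, tag


def encapsulate(opcode: int, tag: int, payload: bytes) -> bytes:
    """Frame one command PDU; the same bytes go over TCP or into an LLC datagram."""
    return HDR.pack(len(payload), opcode, tag) + payload


class StreamDecoder:
    """Recovers encapsulated PDUs from a byte stream (the TCP leg)."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def feed(self, data: bytes) -> list[tuple[int, int, bytes]]:
        self.buf += data
        out = []
        while len(self.buf) >= HDR.size:
            length, opcode, tag = HDR.unpack_from(self.buf)
            end = HDR.size + length
            if len(self.buf) < end:
                break  # wait for the rest of this PDU
            out.append((opcode, tag, bytes(self.buf[HDR.size:end])))
            del self.buf[:end]
        return out


pdu = encapsulate(0x2A, 7, b"READ")
dec = StreamDecoder()
first = dec.feed(pdu[:5])                        # a TCP segment split mid-header
rest = dec.feed(pdu[5:] + encapsulate(1, 8, b""))
```

The first call returns nothing because the header is incomplete. The second returns both PDUs, including one with an empty payload that arrived in the same segment.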

4.2 iSAN Configuration

A Sanlet in a VLAN can be configured in two ways. A relatively static configuration can be realised with an iSAN Sanlet configuration script that the Sanlet uses to initialise itself; in a more dynamic configuration, the storage-consumer can instruct the Sanlet at runtime using a well-defined set of in-band control messages. When commissioned, the Sanlet reads the configuration script and initialises itself with the necessary parameters. The iSAN configuration script also defines whether the Sanlet should accept in-band control messages. Sanlet scripts in iSAN come in two syntactic flavours: the first is expressed as a sequence of [parameter = value] pairs (Fig. 8.1); the second has a C structure/union-like syntax (Fig. 8.2).
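The first flavour, [parameter = value] pairs, can be read with a few lines of code. Below is a minimal parser sketch. The actual parameter names appear only in Fig. 8.1 (not reproduced), so `vlan_id`, `target_device` and `accept_inband` are hypothetical, and so is the `#` comment syntax.

```python
from __future__ import annotations


def parse_sanlet_script(text: str) -> dict[str, str]:
    """Parse 'parameter = value' lines into a dict.

    Blank lines and '#' comments are skipped (comment syntax is an assumption).
    """
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected 'parameter = value'")
        params[name.strip()] = value.strip()
    return params


script = """
# hypothetical Sanlet script; parameter names are illustrative only
vlan_id       = 10
target_device = /dev/sdb
accept_inband = yes
"""
cfg = parse_sanlet_script(script)
# Whether in-band control messages are honoured is itself a script parameter.
accept_inband = cfg.get("accept_inband", "no") == "yes"
```

Values are kept as strings here. A real Sanlet would validate and convert each known parameter and reject unknown ones.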


5 Case Studies

This section discusses the implementation of three critical storage services in iSAN and shows how the iSAN approach compares favourably with traditional approaches. The services implemented are concurrency control, storage security and stable storage. We demonstrate that these three critical services, though seemingly dissimilar, can be implemented efficiently in iSAN. We believe that this diversity speaks for the generality and expressiveness of iSAN.

5.1 Concurrency Control

In a distributed, shared storage system like a SAN, the logical view of the storage seen by the storage-consumer can be very different from the physical view. For example, what the storage-consumers see as contiguous blocks of storage may not be contiguous at all; worse, they may not even come from a single storage device. This is because the underlying storage system may transparently offer functionality like striping and virtualization. In the case of striping, the logical byte/block range given by the storage-consumer may need to be split into a number of stripes, and the I/O for each stripe may need to be done separately. Also, storage systems might impose hidden relationships among the stored data, for example in the form of shared parity blocks, which need careful interleaving of I/O accesses. Thus, unless proper care is taken to resolve concurrent stripe accesses, storage-consumers may see inconsistent data even if they themselves orchestrate some concurrency control mechanism at higher levels [2].
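How one logical request fans out into several device I/Os can be shown with simple striping arithmetic. This is a sketch assuming plain RAID-0-style striping: `unit` blocks per stripe unit, with units assigned round-robin to `ndev` devices. Parity and virtualization layers are ignored, and the function name is hypothetical.

```python
def split_into_stripes(start: int, count: int, unit: int, ndev: int):
    """Split logical blocks [start, start + count) into per-device extents.

    Assumes RAID-0-style striping: `unit` blocks per stripe unit, units
    assigned round-robin to `ndev` devices.
    Returns (device, device_block, length) tuples in logical order.
    """
    extents = []
    block, end = start, start + count
    while block < end:
        chunk, within = divmod(block, unit)
        length = min(unit - within, end - block)  # stay inside this stripe unit
        device = chunk % ndev
        device_block = (chunk // ndev) * unit + within
        extents.append((device, device_block, length))
        block += length
    return extents


# A 10-block write starting at logical block 6, 4-block units, 3 devices.
extents = split_into_stripes(6, 10, unit=4, ndev=3)
```

Here `extents` is `[(1, 2, 2), (2, 0, 4), (0, 4, 4)]`: one logical write becomes three independent device I/Os. Another consumer's I/Os can interleave between any two of them, which is why concurrency control is needed below the stripe level.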

Concurrency control is the activity of coordinating the actions of processes that operate in parallel, access shared data, and therefore potentially interfere with each other [6]. The unit of concurrent access, called a transaction, consists of several lower-level operations that are expected to be executed atomically.

There are four types of concurrency controlschemes that are prevalent in the literature –locking, timestamp ordering(TO)4, optimisticand hybrid.5 Detailed discussions concerningthese protocols can be found in [6] and [46].

Of the four aforementioned schemes, TO emerges as the optimal mechanism for shared storage ([2], [46]). However, there are at least three problems associated with TO. First, it requires synchronized clocks. High-speed networks like SANs may not improve synchronization accuracy dramatically, for at least two reasons: synchronization accuracy is bounded by message delay variance and not by the absolute delay ([28], [7]); also, clock synchronization messages, being a few tens of bytes long, may not see significant latency reduction even in gigabit networks. Yet high-speed networks will need to handle a higher number of active transactions in a given time slice and hence require better clocks. Second, if transactions arrive in an order wildly different from the original issue order, TO will reject many of them. Studies like [12] show that the probability of such an occurrence could be high. The magnitude of network reordering depends on the existence of redundant links, their configuration and network load, not all of which are completely controllable. Third, the order of transaction execution as governed by the TO scheduler may not conform to certain expected orderings like causal order [24], depending on the granularity of clock synchronization and the prevailing delay characteristics. Fig. 5 depicts how a causal violation could happen in a master-slave clock synchronization setting. Thus, if one needs a deterministic ordering of messages, TO cannot be used to enforce such ordering reliably.
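The second problem can be seen in a minimal Basic Timestamp Ordering scheduler. This is a textbook sketch over single data items, under the simplifying assumption that commit, abort and restart are not modelled; the class and helper names are hypothetical.

```python
from typing import Dict, List


class BasicTOScheduler:
    """Basic Timestamp Ordering over individual data items (a sketch).

    Each item remembers the largest timestamps of accepted reads and
    writes; an operation that arrives "too late" is rejected, and its
    transaction would have to restart with a new timestamp.
    """

    def __init__(self) -> None:
        self.read_ts: Dict[str, int] = {}
        self.write_ts: Dict[str, int] = {}

    def read(self, ts: int, item: str) -> bool:
        if ts < self.write_ts.get(item, 0):
            return False                  # a younger write is already applied
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts: int, item: str) -> bool:
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False                  # would invalidate a younger operation
        self.write_ts[item] = ts
        return True


def rejected_writes(arrival_order: List[int]) -> int:
    """Count rejections when writes to one block arrive in the given order."""
    sched = BasicTOScheduler()
    return sum(not sched.write(ts, "blk0") for ts in arrival_order)


if __name__ == "__main__":
    print(rejected_writes([1, 2, 3, 4, 5]))   # → 0 (arrival = issue order)
    print(rejected_writes([5, 1, 2, 3, 4]))   # → 4 (one write overtakes the rest)
```

A single delayed message is enough to make every older write that arrives after it fail the timestamp test, which is why heavy network reordering translates into many rejected transactions under TO.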

iSAN uses message ordering protocols for concurrency control.⁶ Different ordering mechanisms, namely FIFO, causal, causally constrained total order and unconstrained total order⁷, guarantee different consistency semantics, with cost and strictness increasing in that order. Being storage-consumer aware, iSAN deploys the message ordering that suffices for each storage-consumer's requirements. To this end, we have considered five classes of storage-consumers and mapped their requirements to particular message ordering protocols.

⁴ The timestamp ordering discussed here and elsewhere in the paper is Basic Timestamp Ordering.

⁵ We assume strict schedules.

⁶ We assume that the storage device commits operations in the issue order. This is not an unreasonable assumption, as most commercial RAID systems guarantee this.

⁷ A total order that respects causal order is referred to as a causally constrained total order, while a total order that may not respect causal order is referred to as an unconstrained total order.

5.1.1 Parallel File Systems

Parallel File Systems (PFS) cater to the I/O requirements of multiprocessor/multicomputer systems, which exhibit a significantly different I/O access profile from traditional filesystems. Traditionally, a PFS is organised as a set of clients, where the applications run, and servers, which serve the storage. Most PFSs do not favour client-side caching, and PFS clients can tolerate minor inconsistencies in the shared state when conflicts occur. The clients may cooperate in accessing a shared (sub)file, and any conflict resolution done at the subfile level thus becomes redundant. For concreteness, we consider Sun PFS [34], a Parallel File System implementation from Sun. In Sun PFS the servers are called I/O Daemons (IODs). In iSAN, the IODs reside in the edge switches adjoining the actual storage. File sharing occurs at byte ranges. PFS clients do not expect IODs to order conflicting writes; all an IOD has to ensure is the atomicity of individual stripe accesses. In a PFS VLAN, iSAN therefore provides no ordering except FIFO ordering of stripe updates between a client and the IOD. This provides the expected behaviour at little or no additional cost. Deploying TO or strict 2PL would very likely incur high overhead, as they provide stricter consistency guarantees than are actually needed. Thus, by exploiting the semantics of PFS-like filesystems, iSAN increases the amount of concurrency available for shared accesses, leading to better performance.
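Per-client FIFO delivery of stripe updates, the only ordering a PFS VLAN needs, can be obtained with per-sender sequence numbers and a hold-back queue. The sketch below assumes that each client numbers its updates from 0 and that the IOD-side Sanlet is the receiver; it illustrates the technique and is not the Sanlet implementation.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class FifoDelivery:
    """Per-sender FIFO delivery using sequence numbers (a sketch).

    A message is delivered only after all earlier messages from the same
    sender; no ordering at all is imposed across different senders.
    """

    def __init__(self) -> None:
        self.next_seq: DefaultDict[str, int] = defaultdict(int)
        self.held: DefaultDict[str, Dict[int, str]] = defaultdict(dict)

    def receive(self, sender: str, seq: int, payload: str) -> List[Tuple[str, str]]:
        """Accept one message; return the messages that become deliverable."""
        if seq < self.next_seq[sender]:
            return []                          # duplicate of a delivered update
        self.held[sender][seq] = payload
        delivered: List[Tuple[str, str]] = []
        while self.next_seq[sender] in self.held[sender]:
            s = self.next_seq[sender]
            delivered.append((sender, self.held[sender].pop(s)))
            self.next_seq[sender] = s + 1
        return delivered


if __name__ == "__main__":
    iod = FifoDelivery()
    print(iod.receive("c1", 1, "stripe-w1"))  # → [] (w0 not yet seen)
    print(iod.receive("c2", 0, "stripe-x0"))  # → [('c2', 'stripe-x0')]
    print(iod.receive("c1", 0, "stripe-w0"))  # → [('c1', 'stripe-w0'), ('c1', 'stripe-w1')]
```

Updates from client c2 are never held back by c1's gap, which is the extra concurrency that stricter schemes would give away.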

5.1.2 Database Systems

Database systems do not favour one serial schedule over another: all strict executions are taken to be equally correct. Yet, in order to avoid distributed deadlocks, a DB may want to derive a total order over conflicting transactions, as in TO. The overhead of synchronizing clocks can, however, be averted by using an unconstrained total ordering protocol in place of clocks. The Sanlet connecting the DB to storage thus needs to enforce the unconstrained total order, and this, as a side effect, solves the problem of misordered transactions. [46] provides a total order based concurrency control protocol that is readily deployable in iSAN.
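One standard way to obtain an unconstrained total order without synchronized clocks is a fixed sequencer that stamps every request with a global sequence number. The sketch uses that technique purely for illustration; it is not the protocol of [46], and the class names are hypothetical.

```python
import itertools
from typing import Dict, List, Tuple


class Sequencer:
    """Assigns a global sequence number to every request; no clocks needed."""

    def __init__(self) -> None:
        self._counter = itertools.count()

    def order(self, request: str) -> Tuple[int, str]:
        return next(self._counter), request


class TotalOrderReceiver:
    """Delivers requests strictly in global sequence order, holding back any
    request that arrives before its predecessors."""

    def __init__(self) -> None:
        self.next_gseq = 0
        self.pending: Dict[int, str] = {}

    def receive(self, gseq: int, request: str) -> List[str]:
        self.pending[gseq] = request
        out: List[str] = []
        while self.next_gseq in self.pending:
            out.append(self.pending.pop(self.next_gseq))
            self.next_gseq += 1
        return out


if __name__ == "__main__":
    seq = Sequencer()
    m1 = seq.order("T1: write blk7")
    m2 = seq.order("T2: write blk7")
    disk_a, disk_b = TotalOrderReceiver(), TotalOrderReceiver()
    log_a = disk_a.receive(*m1) + disk_a.receive(*m2)
    log_b = disk_b.receive(*m2) + disk_b.receive(*m1)   # network reordered
    print(log_a == log_b)  # → True
```

Both receivers apply T1 before T2 even though the network delivered the requests in different orders; a single sequencer is a bottleneck and a single point of failure, which a production protocol would have to address.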

5.1.3 Replication

Replication is of interest to both filesystems and databases, and hence to iSAN. Replication protocols⁸ come in a variety of forms, differing in aspects like models, assumptions, mechanisms, guarantees provided, and implementation. There have been efforts to classify them, and the reader may find [50] and [51] useful. Many replication schemes, notably the lazy schemes, require a certain ordering of updates; in iSAN this ordering is effected using causally constrained or unconstrained total ordering.

⁸ The ensuing discussion assumes block level replication.

5.1.4 Hybrid Storage

In many conventional SAN based architectures, the SAN is hidden behind fileservers or database servers, and the effective throughput seen by the clients is thus still limited by the rate at which these storage-consumers are able to serve requests. One way to solve the problem is to let clients access the storage directly while the storage-consumer maintains the necessary metadata, including the block map information; once a client gets the metadata from the storage-consumer, say after a file open, it can access the storage directly. Any metadata update still involves the storage-consumer, while data is accessed directly from the storage. We call this architecture Hybrid Storage.

Given that the metadata server and the client may both access the storage simultaneously, one needs to ensure correct interleaving of these operations; Fig. 6 depicts one problem case that would otherwise arise. A closer examination of Fig. 6 reveals that the problem is indeed due to a causal violation: the delete F1 message should have been delivered after the write F1 message, as the latter causally precedes the former. So, in hybrid storage, Sanlets enforce causal ordering of requests. Such a mechanism would improve the asynchrony of the system while adding very little overhead.
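Causal delivery in the Fig. 6 scenario can be sketched with vector clocks, using the classic Birman-Schiper-Stephenson delivery rule. The sketch assumes that requests relayed by Sanlets are treated as broadcasts within the group and that the metadata server had already seen the client's write when it issued the delete; the member names are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, int]


class CausalReceiver:
    """Causal delivery with vector clocks (Birman-Schiper-Stephenson rule)."""

    def __init__(self, members: List[str]) -> None:
        self.delivered: Vector = {m: 0 for m in members}  # delivered per sender
        self.pending: List[Tuple[str, Vector, str]] = []

    def _deliverable(self, sender: str, vc: Vector) -> bool:
        # It must be the next message from its sender, and every message it
        # depends on from other senders must already be delivered here.
        if vc[sender] != self.delivered[sender] + 1:
            return False
        return all(vc[k] <= self.delivered[k] for k in vc if k != sender)

    def receive(self, sender: str, vc: Vector, payload: str) -> List[str]:
        """Buffer one message and return everything that becomes deliverable."""
        self.pending.append((sender, vc, payload))
        out: List[str] = []
        progress = True
        while progress:
            progress = False
            for msg in list(self.pending):
                s, v, p = msg
                if self._deliverable(s, v):
                    self.pending.remove(msg)
                    self.delivered[s] += 1
                    out.append(p)
                    progress = True
        return out


if __name__ == "__main__":
    storage = CausalReceiver(["client", "mds"])
    write_f1 = ("client", {"client": 1, "mds": 0}, "write F1")
    # The MDS issues the delete after it has seen the client's write.
    delete_f1 = ("mds", {"client": 1, "mds": 1}, "delete F1")
    print(storage.receive(*delete_f1))  # → [] (held back)
    print(storage.receive(*write_f1))   # → ['write F1', 'delete F1']
```

The delete that overtook the write in the network is held back until its causal predecessor has been applied, which removes the Fig. 6 anomaly without any global clock.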

5.1.5 Log Enhanced Filesystems

Ordering of I/O requests is beneficial not only when the storage is accessed concurrently by two or more storage-consumers; it is useful even when the storage remains accessible to only one storage-consumer, as with log-enhanced filesystems. For instance, VxFS, a log-enhanced filesystem, needs to commit metadata changes to the log before it starts making changes to the on-disk filesystem structure. To ensure that the log writes reach the disk before the filesystem updates, the log is usually written synchronously; the actual filesystem updates are scheduled after their corresponding log record is stable. Thus, for every metadata change, the filesystem suffers a synchronous log write. If, however, the underlying SCSI layer were to provide FIFO ordering of commands, the filesystem would not have to write the log records synchronously; it would only have to queue the log write before the corresponding metadata update. Since the SCSI layer assures FIFO delivery of commands, by the time the metadata updates reach the disk, the previously scheduled log writes would have reached the disk too. Thus, providing FIFO ordering at the SCSI layer would improve the observed filesystem throughput. Similarly, in iSAN, the Sanlet can provide FIFO ordering of I/O requests to avoid the reliability-induced synchronous writes in a journaling filesystem.
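The argument can be checked with a small model of the disk queue: queue each log write immediately before its metadata update, then examine every possible crash point. The sketch assumes a device that commits in arrival order (the FIFO case) and, for contrast, one that reorders by block number; the journal and home block numbers are made up for the example.

```python
from typing import List, Tuple

Command = Tuple[str, int, int]          # (kind, txn_id, block_no)


def queue_without_sync(txns: List[Tuple[int, int, int]]) -> List[Command]:
    """Queue each txn's log write right before its metadata update,
    without waiting for the log write to complete."""
    cmds: List[Command] = []
    for txn, log_block, meta_block in txns:
        cmds.append(("log", txn, log_block))
        cmds.append(("meta", txn, meta_block))
    return cmds


def wal_holds(committed: List[Command]) -> bool:
    """Write-ahead rule: no metadata update is on disk before its log record."""
    logged = set()
    for kind, txn, _ in committed:
        if kind == "log":
            logged.add(txn)
        elif txn not in logged:
            return False
    return True


def safe_at_every_crash(commit_order: List[Command]) -> bool:
    """Check every prefix of the commit order, i.e. every possible crash point."""
    return all(wal_holds(commit_order[:k]) for k in range(len(commit_order) + 1))


if __name__ == "__main__":
    txns = [(1, 100, 5), (2, 101, 9)]             # (txn, journal block, home block)
    queued = queue_without_sync(txns)
    fifo = list(queued)                           # device honours FIFO
    elevator = sorted(queued, key=lambda c: c[2]) # device reorders by block
    print(safe_at_every_crash(fifo))      # → True
    print(safe_at_every_crash(elevator))  # → False
```

Under FIFO no crash point leaves a metadata update on disk without its log record, so the synchronous wait is unnecessary; once the device is free to reorder, the write-ahead rule can be violated.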

5.2 Storage Security

Organizations increasingly depend on their storage infrastructure for storing critical information. Thus the I/O subsystem should understand the sensitivity of the data it is serving and should ensure confidentiality, integrity, and availability of the data both in storage and in transit. The following sections discuss the security features deployed in iSAN.

5.2.1 Zoning

Traditional SANs permit logical grouping of resources, both computing and storage, using zoning. Zoning splits the SAN into subnetworks, and the resources present in one zone are usually inaccessible from outside. Grouping of resources can be done using ports or endpoint addresses. The former type of zoning, called hard zoning, is more secure but less manageable in a large setting; the latter, called soft zoning, is vulnerable to spoofing-based attacks but more easily manageable. FC provides both types of zoning, and the implementations are highly proprietary: different vendors' zoning implementations may not interoperate. iSCSI provides only soft zoning. In iSAN, VLANs naturally provide efficient, highly interoperable zoning. VLANs can be pruned using ports, MAC addresses or higher level protocol identifiers; this also makes VLANs a more flexible zoning mechanism. Presently, iSAN provides either soft or hard zoning, depending on the sensitivity of the zones.
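The difference between the two zoning styles can be sketched as two admission checks: hard zoning keys membership on the physical ingress port, soft zoning on the address the frame claims. Port numbers, VLAN ids and the shortened MAC strings are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Frame:
    ingress_port: int    # physical switch port the frame arrived on
    src_mac: str         # source address claimed by the sender (spoofable)
    vlan: int            # zone (VLAN) the frame wants to reach


def hard_zone_allows(frame: Frame, port_vlans: Dict[int, Set[int]]) -> bool:
    """Hard zoning: membership is decided by the physical ingress port."""
    return frame.vlan in port_vlans.get(frame.ingress_port, set())


def soft_zone_allows(frame: Frame, mac_vlans: Dict[str, Set[int]]) -> bool:
    """Soft zoning: membership is decided by the claimed endpoint address."""
    return frame.vlan in mac_vlans.get(frame.src_mac, set())


if __name__ == "__main__":
    port_vlans = {1: {10}, 2: {20}}
    mac_vlans = {"aa:aa": {10}, "bb:bb": {20}}
    # A host on port 2 (zone 20) spoofs the address of a zone-10 member.
    spoofed = Frame(ingress_port=2, src_mac="aa:aa", vlan=10)
    print(soft_zone_allows(spoofed, mac_vlans))   # → True  (spoof succeeds)
    print(hard_zone_allows(spoofed, port_vlans))  # → False (wrong port)
```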

5.2.2 LUN Masking

With LUN masking, storage-consumers are restricted to accessing only those logical storage devices which are assigned to them. It can also be used to assign appropriately sized pieces of storage from a common storage pool to various storage-consumers; it also often governs the fail-over process when a component in the storage path fails. Masking offers finer control of resources at the expense of more complex administration. In iSAN, LUN masking is implemented at the ingress Sanlet, which filters out noncomplying accesses from the storage-consumer to the storage. This method is independent of the storage-consumer and storage used, and is scalable. Also, in our approach, only the Sanlets are responsible for enforcing correct masking behaviour.
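A minimal sketch of the ingress filtering, assuming the masking policy is a table from initiator name to the set of LUNs assigned to it; the initiator names and classes are hypothetical.

```python
from typing import Dict, FrozenSet, NamedTuple


class ScsiRequest(NamedTuple):
    initiator: str     # initiator name (illustrative)
    lun: int
    op: str            # e.g. "READ" or "WRITE"


class IngressMask:
    """LUN masking at the ingress Sanlet: requests for LUNs not assigned to
    the initiator are dropped before they reach the storage."""

    def __init__(self, assignments: Dict[str, FrozenSet[int]]) -> None:
        self.assignments = assignments
        self.dropped = 0

    def admit(self, req: ScsiRequest) -> bool:
        allowed = req.lun in self.assignments.get(req.initiator, frozenset())
        if not allowed:
            self.dropped += 1
        return allowed


if __name__ == "__main__":
    mask = IngressMask({"host-a": frozenset({0, 1}), "host-b": frozenset({2})})
    print(mask.admit(ScsiRequest("host-a", 1, "READ")))    # → True
    print(mask.admit(ScsiRequest("host-b", 0, "WRITE")))   # → False
    print(mask.admit(ScsiRequest("host-c", 2, "READ")))    # → False (unknown initiator)
    print(mask.dropped)                                    # → 2
```

Because the check runs in the Sanlet, neither the storage-consumer nor the storage device has to be trusted to apply the mask, matching the trust assumption in the text.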

5.2.3 Cryptographic Security

The security mechanisms mentioned above facilitate segregation and/or aggregation of storage resources. They provide basic security, but both in-transit and in-store data are still seen as plain text and are liable to many attacks. The common attacks on such SANs are spoofing, snooping, corruption, and denial of service. Given these threats, the basic security primitives required to build a secure SAN are message confidentiality, peer authentication, data integrity and availability.

In iSAN, the confidentiality of the data is provided by encrypting the data in transit and in store. The edge Sanlet acts as a security gateway; inter-Sanlet transmission is cryptographically protected using 3DES and MD5. The shared key for this purpose is arrived at using Ensemble's well-understood security infrastructure⁹. Group members authenticate each other using Pretty Good Privacy. Ensemble's security infrastructure [38] is both membership and multicast aware, and provides both forward and backward secrecy.

Sanlets in a VLAN follow the fortress model of security: only authenticated Sanlets can enter the VLAN, and once admitted, a Sanlet is trusted for its lifetime inside the VLAN. Though there are "better" security models, like the Byzantine model, that are arguably more realistic, such models may not scale well. Also, if one were to assume the Sanlets/switches to be Byzantine, then even basic primitives like routing would need to be handled on a similar footing. We believe that the fortress model is sufficient for most practical purposes, since a SAN is a controlled environment. It should also be noted that in iSAN the storage-consumers trust the Sanlets while the Sanlets do not trust the storage-consumers; this asymmetry provides a measure of deterrence should a storage-consumer be compromised. We assume that the switches are configurable only through a trusted management console.

5.2.4 Smart Security

The distinguishing aspect of iSAN security is the ability to enforce security policies selectively and dynamically. We consider two case studies that demonstrate these abilities. The first is about protecting different data streams selectively. It has long been understood that protecting stored data cryptographically is intrinsically different from protecting data in transit [42]: modes like Cipher Block Chaining (CBC), though more secure than Electronic Code Book (ECB), cannot in general be used to protect stored data, because storage access can be random while an efficient implementation of CBC requires sequential access. Using CBC would also introduce complex data dependencies that would require costly handling mechanisms, like copy-on-write and indexing, for correctness. Moreover, CBC implementations are not as readily parallelizable as ECB.

⁹ The shared key thus arrived at is used to exchange the keys that protect persistent (on-store) data. The shared key itself is used to encrypt the data stream when only on-wire data protection is needed.

However, it has been observed that storage accesses are, to a large extent, sequential ([5], [31]). Thus CBC could still be used to selectively protect that part of the storage-consumer's data that is accessed sequentially. As a preliminary case study, we have implemented such selective treatment for the journal records of a log-enhanced filesystem. Journal records are critical for filesystem security and contain all the metadata changes that occur in a filesystem. Besides, journal records are highly structured, which makes them especially vulnerable to dictionary attacks, the kind of attack that CBC handles better. In most contemporary journal based filesystems like VxFS, a contiguous chunk of storage is statically allocated for the journal, usually during filesystem creation. Thus, given the block number in transit, a Sanlet can unambiguously discern log records and treat them differently: in our case 3DES CBC is used for log blocks while other blocks are treated with 3DES ECB. This selective treatment does not incur any significant overhead; the parallelizability edge of ECB is of no avail, since journals are written synchronously and, at any point in time, only very few journal blocks are in transit. Thus one would expect little or no extra overhead in providing this extra level of security. The start and end of the journal must be specified in the iSAN configuration script so that the Sanlet can tell the journal blocks apart.
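The selective treatment of journal blocks can be sketched as follows. Because the Python standard library has no 3DES, the sketch substitutes a toy 4-round Feistel cipher built from HMAC-SHA256 with a 64-bit block (the 3DES block size); it is for illustration only and must not be used to protect real data. The journal extent and the fixed all-zero IV are likewise assumptions made for the demo.

```python
import hashlib
import hmac
from typing import List, Tuple

BLOCK = 8  # bytes; 64-bit blocks as in 3DES


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Feistel round function: truncated HMAC-SHA256 (toy construction).
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def enc_block(key: bytes, blk: bytes) -> bytes:
    """TOY 4-round Feistel cipher standing in for 3DES. Not secure."""
    left, right = blk[:BLOCK // 2], blk[BLOCK // 2:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def dec_block(key: bytes, blk: bytes) -> bytes:
    left, right = blk[:BLOCK // 2], blk[BLOCK // 2:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def _blocks(data: bytes) -> List[bytes]:
    assert len(data) % BLOCK == 0, "data must be block-aligned"
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    # Blocks are independent: random access and parallelism are easy.
    return b"".join(enc_block(key, b) for b in _blocks(data))


def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Each block depends on the previous ciphertext: inherently sequential.
    out, prev = [], iv
    for b in _blocks(data):
        prev = enc_block(key, _xor(b, prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = [], iv
    for c in _blocks(data):
        out.append(_xor(dec_block(key, c), prev))
        prev = c
    return b"".join(out)


def protect(key: bytes, iv: bytes, block_no: int, data: bytes,
            journal_start: int, journal_end: int) -> Tuple[str, bytes]:
    """Choose the mode from the block number, as a Sanlet could once the
    journal extent [journal_start, journal_end] is known from configuration."""
    if journal_start <= block_no <= journal_end:
        return "CBC", cbc_encrypt(key, iv, data)
    return "ECB", ecb_encrypt(key, data)


if __name__ == "__main__":
    key, iv = b"demo-key", bytes(BLOCK)       # fixed IV only for the demo
    record = b"INODE#42" * 2                  # repetitive, structured record
    mode_j, ct_j = protect(key, iv, 1005, record, 1000, 1999)
    mode_d, ct_d = protect(key, iv, 50, record, 1000, 1999)
    print(mode_j, ct_j[:8] == ct_j[8:])       # CBC hides the repetition
    print(mode_d, ct_d[:8] == ct_d[8:])       # ECB leaks it
    print(cbc_decrypt(key, iv, ct_j) == record)
```

The repetitive record yields two identical ciphertext blocks under ECB but not under CBC, which is the structure leak that makes ECB a poor fit for journal records; the CBC loop also shows why each journal block has to be processed after its predecessor.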

Dynamic in-band messaging can be used to selectively treat data that does not need confidentiality guarantees; for such data iSAN provides an in-transit integrity guarantee alone. Data for which confidentiality makes little sense includes public domain files such as RFCs, open/free software files, globally accessible executables, and other miscellaneous files like font files. Since this file level information is dynamic, the iSAN script cannot be used to discern the to-be-secured and to-be-plain flows. However, the driver on the storage-consumer side, when handed a buffer for transmission, can trace the filename and its attributes from the buffer header. In our case, this information is sufficient for the driver to drive the Sanlet appropriately; a rudimentary in-band messaging protocol was developed and is used to instruct the Sanlet to protect the content dynamically: depending on the hint provided, the Sanlet can choose to encrypt the data or pass it unaffected. Since cryptographic security is costly, one would expect a significant performance improvement by doing so.
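A sketch of the hint-driven path, under several assumptions: the `Hint` message, the path-prefix classification and `PUBLIC_PREFIXES` are hypothetical; HMAC-SHA256 is used for the integrity tag (the prototype described above uses MD5); and the encryption routine is supplied by the caller. The demo's `fake_encrypt` is only a stand-in so the example runs; it is not a cipher.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable

# Hypothetical driver-side list of locations holding public-domain data.
PUBLIC_PREFIXES = ("/usr/share/fonts/", "/usr/share/doc/rfc/")


@dataclass(frozen=True)
class Hint:
    """Hypothetical in-band message sent by the driver ahead of a buffer."""
    confidential: bool


@dataclass(frozen=True)
class Frame:
    encrypted: bool
    body: bytes
    tag: str            # hex HMAC over body


def driver_hint(path: str) -> Hint:
    # The driver recovers the filename from the buffer header (per the text);
    # here the classification is a simple prefix match.
    return Hint(confidential=not path.startswith(PUBLIC_PREFIXES))


def sanlet_forward(payload: bytes, hint: Hint, mac_key: bytes,
                   encrypt: Callable[[bytes], bytes]) -> Frame:
    """Always attach an integrity tag; pay for encryption only when asked."""
    body = encrypt(payload) if hint.confidential else payload
    tag = hmac.new(mac_key, body, hashlib.sha256).hexdigest()
    return Frame(hint.confidential, body, tag)


def receiver_verify(frame: Frame, mac_key: bytes) -> bool:
    expected = hmac.new(mac_key, frame.body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, frame.tag)


if __name__ == "__main__":
    encrypted_sizes = []

    def fake_encrypt(data: bytes) -> bytes:
        # Stand-in so the demo runs; NOT encryption.
        encrypted_sizes.append(len(data))
        return data[::-1]

    key = b"integrity-key"
    f1 = sanlet_forward(b"glyph data", driver_hint("/usr/share/fonts/a.ttf"), key, fake_encrypt)
    f2 = sanlet_forward(b"payroll", driver_hint("/home/hr/payroll.db"), key, fake_encrypt)
    print(f1.encrypted, f2.encrypted, len(encrypted_sizes))    # → False True 1
    print(receiver_verify(f1, key), receiver_verify(f2, key))  # → True True
```

Public data skips the expensive cipher entirely yet still carries a tag, so tampering in transit remains detectable.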

5.3 Stable Storage

There is a growing number of systems for which the cost of unpredictable, potentially hazardous failures or of system service unavailability can be very significant. However, understanding and designing systems that tolerate failures is notoriously difficult: one has to stay in control not only of the standard system activities when all components are well, but also of the complex situations which can occur when some components fail [16]. The complexity of this task would be reduced if the subsystems from which the system is composed were failure resilient; the system architect would then be left with fewer failure cases to handle. If the subsystem to be hardened is storage, the corresponding failure resilient, idealized version of storage is called stable storage. This part of the paper discusses how it is implemented in iSAN.

5.3.1 Transient Stable Storage

Most research in stable storage concentrates on improving the reliability of persistent storage devices by augmenting them with a software or hardware layer that protects the underlying raw storage against a suitable class of failures [25]. Though this approximation provides sufficient reliability guarantees, it could be inefficient for certain classes of data. For instance, consider making transient data, which has only temporal validity, stable by writing it to persistent stable storage. Such data need not be written to persistent storage at all: the stability of the written data can as well be provided by replicating it at multiple remote memory locations whose decay sets are essentially non-overlapping. If the remote machines' main memory is used to store the data, one would expect a low-latency, high-throughput stable storage implementation; in this paper, stable storage founded on main memory is called transient stable storage. This idea was demonstrated in the stable storage context by [14].
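To make the "non-overlapping decay sets" argument slightly more concrete: if we assume, as the text later does for Sanlets, that failures are uncorrelated, with each in-memory replica lost independently with probability p during the period the data must survive, then data held in k memories is lost only if all k copies are lost. The numbers below are purely illustrative, not measurements.

$$
P_{\text{loss}} = p^{k}, \qquad \text{e.g. } p = 10^{-3},\; k = 2f + 1 = 3 \;\Rightarrow\; P_{\text{loss}} = 10^{-9}.
$$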

However, we are aware neither of any stable storage implementation using virtual synchrony nor of any work that distinguishes between transient and enduring data sets when providing stable storage. In contrast, iSAN offers two types of stable storage to the storage-consumers, and both provide a good approximation of the expected stable storage semantics. The type of stable storage used depends on the data being stored: the first type is traditional stable storage based on RAID 5; the second is based on replicated main memory. In the latter, the remote memory of the Sanlets that are part of the virtually synchronous group is used for this purpose.

As a concrete case, VxFS log records are entrusted to transient stable storage, since journal data ceases to have any value once the home location update is complete. Such preferential treatment of log records is beneficial especially for metadata-intensive workloads and synchronous-write-intensive workloads like an NFS server. Since the journal is of fixed size, the amount of memory that needs to be committed in the Sanlets is bounded; once the log wraps around, the Sanlets can start reusing the space at the head of the log. The wrap-around event is clearly identifiable because successive journal block numbers are guaranteed to be monotonically increasing, and any break in monotonicity is due to log wrap-around.
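The bounded buffer and the wrap-around rule can be sketched as below, assuming the journal extent (start block and length) is known from the iSAN script; the class and field names are hypothetical.

```python
from typing import Dict, Optional


class JournalMirror:
    """Bounded in-memory copy of a fixed-size journal (a sketch).

    Relies on the property stated in the text: successive journal block
    numbers increase monotonically until the log wraps around.
    """

    def __init__(self, journal_start: int, journal_len: int) -> None:
        self.start = journal_start
        self.length = journal_len
        self.blocks: Dict[int, bytes] = {}   # at most journal_len entries
        self.last: Optional[int] = None
        self.wraps = 0

    def store(self, block_no: int, data: bytes) -> bool:
        """Keep one journal block; return True if this write starts a new lap."""
        if not self.start <= block_no < self.start + self.length:
            raise ValueError("block is outside the configured journal extent")
        wrapped = self.last is not None and block_no <= self.last
        if wrapped:
            self.wraps += 1
        self.blocks[block_no] = data         # reuses the slot from the previous lap
        self.last = block_no
        return wrapped


if __name__ == "__main__":
    mirror = JournalMirror(journal_start=1000, journal_len=4)
    laps = [mirror.store(b, b"rec") for b in (1000, 1001, 1002, 1003, 1000, 1001)]
    print(laps)                # → [False, False, False, False, True, False]
    print(len(mirror.blocks))  # → 4
```

Memory use never exceeds the journal size, because a wrapped write simply overwrites the slot that held the same block number on the previous lap.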

The Sanlet, upon receiving a stable write request for journal blocks from a storage-consumer, multicasts the data to all the connected Sanlets. The receiving Sanlets store the received message in a volatile buffer, addressable using the sender id, before acknowledging the multicast¹⁰. For stable read requests, the read request is multicast to the Sanlets and the first reply is returned to the storage-consumer. Sanlet failures are assumed to be uncorrelated. The FIFO ordered multicast provides the atomicity of write operations, and virtual synchrony provides a consistent ordering of failures to the members of the group. The atomic multicast employed is dynamically non-uniform, which, we believe, is powerful enough to provide mutual consistency of the processes that are correct.
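The write and read paths can be sketched with an in-process model. This is a simplification under stated assumptions: the loop over group members stands in for the FIFO-ordered, virtually synchronous multicast (ordering, membership changes and failure detection are not modelled), and `needed_acks` is a hypothetical parameter, since the text does not fix an acknowledgement quorum.

```python
from typing import Dict, List, Optional, Tuple


class Sanlet:
    """A group member holding volatile copies of other Sanlets' journal writes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        # Volatile buffer addressed by (sender id, block number).
        self.buffer: Dict[Tuple[str, int], bytes] = {}

    def on_write(self, sender: str, block_no: int, data: bytes) -> bool:
        if not self.alive:
            return False                  # a crashed member sends no ack
        self.buffer[(sender, block_no)] = data
        return True                       # ack (the text notes acks are optional)

    def on_read(self, sender: str, block_no: int) -> Optional[bytes]:
        if not self.alive:
            return None
        return self.buffer.get((sender, block_no))


def stable_write(sender: str, group: List[Sanlet], block_no: int,
                 data: bytes, needed_acks: int) -> bool:
    """'Multicast' the write; succeed once enough members acknowledge."""
    acks = sum(m.on_write(sender, block_no, data) for m in group)
    return acks >= needed_acks


def stable_read(sender: str, group: List[Sanlet], block_no: int) -> Optional[bytes]:
    """Ask the members; return the first reply that carries the block."""
    for m in group:
        reply = m.on_read(sender, block_no)
        if reply is not None:
            return reply
    return None


if __name__ == "__main__":
    group = [Sanlet("s1"), Sanlet("s2"), Sanlet("s3")]
    print(stable_write("edge0", group, 7, b"log-7", needed_acks=3))  # → True
    group[0].alive = False                                           # s1 crashes
    print(stable_read("edge0", group, 7))                            # → b'log-7'
    print(stable_write("edge0", group, 8, b"log-8", needed_acks=3))  # → False
```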

The problem of network partitions is handled by maintaining 2f + 1 copies of data to tolerate f Sanlet failures [6]. f is configurable dynamically and defaults to one. Depending on the number of machines that remain operational/connected after a failure, iSAN either switches to persistent stable storage or continues with transient stable storage. The latter is the case when the active partition, the one that houses the storage-consumer using the stable storage, has the required number of connected/available Sanlets. When a sufficient number of Sanlets is not available, the log entries whose updates have not reached the home location are flushed to a persistent storage device, and the system continues operation with persistent stable storage.

¹⁰ This explicit acknowledgement can be turned on or off.
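The switch between the two kinds of stable storage can be sketched as a decision taken on each membership change. The sketch assumes that `reachable` counts the Sanlets in the active partition that can hold a copy, and that the required number is the 2f + 1 copies mentioned above; the function names are hypothetical.

```python
from typing import Dict


def required_replicas(f: int) -> int:
    """Copies kept to tolerate f Sanlet failures: 2f + 1, as stated above."""
    return 2 * f + 1


def choose_mode(reachable: int, f: int = 1) -> str:
    """Transient (memory) stable storage only while enough Sanlets remain."""
    return "transient" if reachable >= required_replicas(f) else "persistent"


def on_membership_change(reachable: int, unapplied_log: Dict[int, bytes],
                         disk: Dict[int, bytes], f: int = 1) -> str:
    """Flush log entries whose home-location update is still pending when
    falling back to persistent stable storage."""
    mode = choose_mode(reachable, f)
    if mode == "persistent":
        disk.update(unapplied_log)    # stands in for writes to a persistent device
        unapplied_log.clear()
    return mode


if __name__ == "__main__":
    print(choose_mode(3), choose_mode(2))    # → transient persistent (f = 1)
    pending, disk = {1001: b"rec"}, {}
    print(on_membership_change(2, pending, disk), disk, pending)
```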


6 Performance

This section describes the experimental setup and the results of the experiments conducted. The setup consists of Intel machines (≥ 700 MHz) running Linux (v2.4) acting as "edge switches"; these edge switches are connected using 100 Mbps Ethernet. The Target and Initiator are housed in the same switch to reduce network interference; we observed beforehand that doing so does not significantly change the performance profile. An FC JBOD, organized as a RAID 5 with three disks, is connected to one of the "edge switches" and is used as the data store¹¹. Block level traces for VxFS are generated on an UltraSPARC machine running Solaris by running four benchmarks: ssh, ssl, gcc and postmark [23]. The other traces used are the HP traces [40].

In all the experiments, the iSAN script contains the needed information, like the start and end of the journal blocks, the protocol stack that needs to be commissioned and the list of storage-consumers that span the VLAN. In the selective encryption case, the needed policy is enforced using in-band messages. The following tables depict the results of the experiments conducted; all of them show the percentage of throughput improvement achieved.

Table 1 shows that the relative overhead of different ordering protocols (with FIFO as the base) is indeed significant; the rows are indexed by the cardinality of the group, and the traces used are HP traces. Since we did not have any shared traces, we used the three disk streams in the HP traces to emulate shared access. The results signify that storage-consumers indeed benefit from the selective deployment of ordering mechanisms. For instance, causally constrained total order is 30% costlier than FIFO for a group size of 3. However, increasing the group cardinality reduces this performance disparity. This is because 100 Mbps Ethernet does not have efficient flow control: the lack of proper flow control at the lower layer significantly penalises the low-overhead, high-throughput FIFO ordering, which explains the reduction in relative overhead. Table 2 compares the performance of ordered log writes and synchronous log writes. This supports our argument that ordering of I/O commands helps even in a non-distributed setting.

¹¹ The JBOD is connected to the edge switch using copper media over which SCSI commands are exchanged using FCP.

For the security experiments, the throughput difference between ECB-for-all and CBC-for-journal-alone is observed to be within the 3-5% range. iSAN thus achieves increased security at almost negligible cost. The results of experiments in which the storage-consumer/driver controls the Sanlet dynamically using in-band messages are depicted in Table 3. The experiment was conducted by allowing roughly 7%¹² of the block accesses to be transmitted in plain text; a fixed fraction is used because the block level traces generated do not carry file names and their attributes. The 7% of block accesses transmitted in plain text are uniformly distributed across the total accesses. The results show that the observed throughput improvement is indeed significant.

The stable storage experiments were conducted by varying f; it is observed that the throughput improvements are not significantly affected by the cardinality of the replication group that serves as the transient stable store. The results are depicted in Table 4.

In addition to the summarised performance numbers reported in the tables, we have added a separate, more elaborate section on graphs; please consult Section ?? for further details.

¹² We arrived at this number after observing the amount of public domain data found on our lab machines.

7 Related Work

Intelligence in iSAN is achieved using dynamic protocol composition, which stands comfortably midway between the Turing-complete Active Networks [49] and the quasi-static Programmable Networks [10]. Composition, unlike Programmable Networks, provides nontrivial specialization, yet, unlike Active Networks, can be very efficient and secure. A work very similar in spirit to the proposed architecture is that of Virtual Overlay Networks (VON) [9]. However, our work is novel for several reasons. First, the architecture proposed in [9] is generic, while our architecture is tuned to the requirements of SANs. Second, [9] is abstract and leaves open many engineering issues, like the selection of the minimal overlay network, the vantage points where the code stubs are to be deployed, and the mode of contacting the code stubs. Our architecture is more concrete, pinning down these crucial design parameters. Third, to our knowledge, ours is the first application/realization of [9]. Collectively, our work can be seen as a harmonious integration of mechanisms like Differentiated Services [11], Application Level Active Networks [19] and VON [9].

Facade [27] aims at providing virtualized storage that guarantees QoS for competing storage-consumers. Facade does so by intercepting every I/O request and suitably throttling it. As an instance of attribute based storage, Facade provides only statistical QoS guarantees, while the attributes handled by iSAN are concurrency control, security and stable storage. Besides, iSAN can naturally be extended with little effort to provide better isolation guarantees, as it handles application/SCSI level routing using VLANs.

Petal [26] consists of a collection of connected servers that cooperatively manage a pool of physical disks. The storage thus garnered is exported as highly available virtual disks. However, unlike iSAN, Petal passes nearly all data in plain text and does not discriminate between storage-consumers.

SSD [45] acts as a pseudo block device that interposes on the read/write requests between the filesystem and the disk. The access trace thus obtained is used to fingerprint the filesystem and to infer filesystem structures. In iSAN, the Sanlet script serves this purpose. Also, the optimizations employed in iSAN are very generic and can be easily ported to a large set of filesystems. iSAN handles distributed filesystems as well.

8 Conclusions

In this paper we have designed and implemented an intelligent SAN architecture, demonstrated how this architecture can be used to efficiently solve some of the critical problems associated with conventional SANs, and evaluated the suitability of the solutions. We conclude that the iSAN approach to architecting SANs shows great promise as a means of constructing efficient, yet flexible, SANs.

We are presently planning to port iSAN to gigabit Ethernet. In the long run, however, we would like to extend the iSAN design to investigate the following aspects. First, we are planning to add a scalable and manageable virtualization architecture to iSAN. The idea is to balance the virtualization overhead with the consumer-awareness of iSAN to arrive at a near zero cost virtualization scheme. We are also planning to investigate how improving asynchrony/concurrency at lower levels would improve performance in the higher layers, and to suggest such a scheme as a design principle. Finally, through iSAN research, we are attempting to understand the synergy between virtual synchrony and filesystems.


References

[1] B. Aboba, Joshua Tseng, Jesse Walker, Venkat Rangan, and Franco Travostino. Securing block storage protocols over IP, December 2002.

[2] Khalil Amiri, Garth Gibson, and Richard Golding. Highly concurrent shared storage. In ICDCS, April 2000.

[3] H. Attiya and J. Welch. Sequential consistency versus linearizability. ACM TOCS, 12(2), May 1994.

[4] Ashar Aziz, Tom Markson, and Hemma Prafullchandra. Simple key management for internet protocols (http://www.skip.org/), 1998.

[5] M. Baker, J. Hartman, M. Kupfer, K. Shirriff, and J. Ousterhout. Measurements of a distributed file system. In 13th ACM SOSP, December 1991.

[6] Philip A. Bernstein, Vassos Hadzilacos, and Nathan Goodman. Concurrency Control and Recovery in Database Systems. 1987.

[7] S. Biaz and J. L. Welch. Closed form bounds for clock synchronization under simple uncertainty assumption. Information Processing Letters, 2001.

[8] K. Birman and T. Joseph. Exploiting virtual synchrony in distributed systems. In 11th ACM SIGOPS SOSP, November 1987.

[9] Ken Birman. Technology requirements for virtual overlay networks. IEEE Systems, Man and Cybernetics: Special issue on Information Assurance, Vol. 31, No. 4, July 2001.

[10] Jit Biswas, Jean-Francois Huard, Aurel Lazar, Koonseng Lim, Semir Mahjoub, Louis-Francois Pau, Masaaki Suzuki, Soren Torstensson, Wang Weiguo, and Steve Weinstein. Application programming interfaces for networks - IEEE P1520, January 1999.

[11] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss. An architecture for differentiated services - RFC 2475, December 1998.

[12] C. Bouras and P. Spirakis. Performance modeling of distributed timestamp ordering: Perfect and imperfect clocks. In Performance Evaluation Journal, Elsevier Science, April 1996.

[13] R. Braden. TIME-WAIT assassination hazards in TCP, 1992.

[14] F. V. Brasileiro, W. Cirne, E. B. Passos, and T. S. Stanchi. Efficient stable storage through data replication. Technical report, UFPB/CCT/DSC/LSD, 2001.

[15] Thomas Clark. A Guide to iSCSI, iFCP, and FCIP Protocols for Storage Area Networks. Addison-Wesley, 2002.

[16] Flaviu Cristian. Understanding fault-tolerant distributed systems. Communications of the ACM, 1991.

[17] Niels Ferguson and Bruce Schneier. A cryptographic evaluation of IPSec, February 1999.

[18] R. Friedman and R. van Renesse. Strong and weak virtual synchrony in Horus. In 15th IEEE SRDS, 1996.

[19] M. Fry and A. Ghosh. Application level active networking. Computer Networks, 31(7), July 1999.

[20] Andrew Gallatin, Jeff Chase, and Ken Yocum. Trapeze/IP: TCP/IP at near-gigabit speeds. In USENIX Technical Conference, June 1999.

[21] M. Hayden. The Ensemble System. PhD thesis, Department of Computer Science, Cornell University, 1998.

[22] ISO. Logical link control - ISO/IEC 8802-2.

[23] J. Katcher. PostMark: A new file system benchmark. Technical Report TR3022, Network Appliance Inc., October 1997.

[24] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, vol. 21, no. 7, July 1978.

[25] B. Lampson and H. Sturgis. Crash recovery in a distributed system. Technical report, Computer Science Laboratory, Xerox Palo Alto Research Center, 1976.

Page 17: iSAN - An intelligent Storage Area Network Architecturedrona.csa.iisc.ernet.in/~gopi/docs/iSAN-hipc04.pdfiSAN - An intelligent Storage Area Network Architecture Ganesh Narayan K Gopinath

[26] Edward K. Lee and Chandramohan A. Thekkath. Petal: Distributed virtual disks. In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, pages 84–92, Cambridge, MA, 1996.

[27] Christopher Lumb, Arif Merchant, and Guillermo Alvarez. Facade: Virtual storage devices with performance guarantees. In File and Storage Technologies, March 2003.

[28] J. Lundelius and N. Lynch. An upper and lower bound for clock synchronization. Information and Control, Vol. 62, Nos. 2/3, September 1984.

[29] David Mosberger. Memory consistency models. Operating Systems Review, 1993.

[30] M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3), October 1990.

[31] J. Ousterhout, H. Da Costa, D. Harrison, J. Kunze, M. Kupfer, and J. Thompson. A trace driven analysis of the UNIX 4.2 BSD file system. In 10th ACM SIGOPS SOSP, December 1985.

[32] A. Palekar, N. Ganapathy, A. Chadda, and R. D. Russel. Design and implementation of a Linux SCSI target for storage area networks. In 5th Annual Linux Showcase & Conference, November 2001.

[33] G. Papadopoulos. Moore's law ain't good enough. In Keynote address at Hot Chips X, August 1998.

[34] A Sun White Paper. Sun™ parallel file system, June 1999.

[35] Radia Perlman and Charlie Kaufman. Analysis of the IPsec key exchange standard. IEEE Internet Computing 4(6), November 2000.

[36] Stephen Pink. TCP/IP on Gigabit Networks. Kluwer Academic Publishers, October 1993.

[37] R. van Renesse, Ken Birman, B. Glade, K. Guo, M. Hayden, T. Hickey, D. Malki, A. Vaysburd, and W. Vogels. Horus: A flexible group communications system, 1995.

[38] Ohad Rodeh, Ken Birman, and Danny Dolev. The architecture and performance of security protocols in the Ensemble group communication system. Technical Report TR2000-1822, Department of Computer Science, Cornell University, October 2001.

[39] Ohad Rodeh, Kenneth P. Birman, Mark Hayden, Zhen Xiao, and Danny Dolev. Ensemble security. Technical Report TR98-1703, Department of Computer Science, Cornell University, September 1998.

[40] Chris Ruemmler and John Wilkes. UNIX disk access patterns. Technical report, HP Labs, December 1992.

[41] J. H. Saltzer, D. P. Reed, and D. D. Clark. End-to-end arguments in system design. ACM TOCS 2(4), November 1984.

[42] Jerome H. Saltzer. Hazards of file encryption, May 1981.

[43] Julian Satran, Kalman Meth, Costa Sapuntzakis, Mallikarjun Chadalapaka, and Efri Zeidner. iSCSI, 2001.

[44] W. A. Simpson. IKE/ISAKMP considered dangerous, June 1999.

[45] Muthian Sivathanu, Vijayan Prabhakaran, Florentina I. Popovici, Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Semantically-smart disk systems. In Second USENIX Conference on File and Storage Technologies (FAST 2003), March 2003.

[46] Rashmi Srinivasa. Network-Aided Concurrency Control in Distributed Databases. PhD thesis, University of Virginia, January 2002.

[47] Nishan Systems. Data storage anywhere, any time - metro and wide area storage networking with Nishan Systems IP storage switches, 2000.

[48] ANSI NCITS T10/1144D. Fibre Channel protocol for SCSI, second version (FCP-2), revision 5, November 2000.

[49] David L. Tennenhouse, Jonathan M. Smith, W. David Sincoskie, David J. Wetherall, and Gary J. Minden. A survey of active network research. IEEE Communications Magazine Vol. 35, No. 1, January 1997.


[50] M. Wiesmann, F. Pedone, A. Schiper, B. Kemme, and G. Alonso. Database replication techniques: a three parameter classification. In 19th IEEE SRDS, October 2000.

[51] M. Wiesmann, F. Pedone, A. Schiper, B. Kemme, and G. Alonso. Understanding replication in databases and distributed systems. In ICDCS, April 2000.

[52] Richard Winter and Kathy Auerbach. The big time. In Winter VLDB Survey, 1998.


List of Figures

[Figure 1 diagram; labels: iSAN, Storage, VxFS, VLAN, Oracle DB Server]

Figure 1: Usage of VLANs in iSAN

[Figure 2 diagram; labels: SC/PS, S/PS, T, Sanlet, LLC/VLAN. Legend: SC: Storage Consumer; S: Storage; PS: Protocol Stack; T: Target]

Figure 2: Flow of data in iSAN


[Figure 3 diagram, panels (a) and (b); labels: SC/PS, TCP, Sanlet, T, T/PS; panel title: Sanlet: Target and Protocol Stack Synergism]

Figure 3: Flow of Data Inside a Sanlet

[Figure 4 header layout; bit offsets 0, 8, 16, 24; fields: OPERATION CODE, LOGICAL BLOCK ADDRESS, TRANSACTION ID, LOGICAL UNIT NUMBER, TRANSFER LENGTH, FRAGMENT ID / FLAG]

Figure 4: SCSI encapsulation in iSAN

[Figure 5 diagram; processes P0, P1, P2, P3; clock readings T0, T1, T2; correction δ and drift ρ]

Figure 5: Master P1 broadcasts the recent clock reading δ, which includes the drift ρ, to the slaves. Ti represents the clock reading prefixed to the message mi sent. Since these two events, clock correction and message exchange, happen essentially independently, causality violation may occur.


[Figure 6 diagram; labels: Client, Metadata Server, Data Store; requests on F1 and F2]

Figure 6: Hybrid System Request Ordering: after sending the write F1 to the data store, the application deletes the file (delete F1) and creates a new file using create F2. The metadata server allocates the blocks deallocated from F1 to F2. Thus a delayed write F1 can corrupt F2's freshly written data unless proper ordering is ensured.


Sample Sanlet Scripts

8.1 Parameter = Value Format

FS = VxFS
# VxFS params
VxFS = Varda; LOG_DSK = Manwe; FS_DSK = Manwe; JSTART = 17; JEND = 1040;
SECSIZE = 512; DEV = raw1;
# Trace params
NAME = ssh; COMPRESS = 10; SMIN = 10; RAID = 3r5; TXN_STEP = 1000; FPORT = 7777; MSECCNT = 512;
# Netw params
HRTBT = 0.001; CREDITS = 64;
# VLAN params
ID = 3; SEC = NO; STACK = [VSync]; NCLIENTS = 2; MACH0 = Varda; MACH1 = Manwe; MACH2 = Yavanna;

8.2 C Structure Format

params sanlet = {"VxFS", 1, VxFS_ID,
    /* fs info */    {{"Varda", "Manwe", "Manwe", 17, 1040}},
    /* dev info */   {512, "/dev/raw/raw1"},
    /* trace info */ {"ssh", 10, 10, "3r5", 1000, 7777, 512},
    /* net info */   {1000, 64},
    /* vlan info */  {3, 0, "VSync", 2, {"Varda", "Manwe", "Yavanna"}},
};


List of Tables

Cardinality   fifo (%)   causal (%)   total (%)   total+causal (%)
#3            0          7.36         17.43       30.06
#4            0          3.64         16.04       20.23

Table 1: CCTRL - Relative cost of different ordering protocols (with FIFO as base)
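Table 1's entries are relative to the FIFO run. Assuming each entry is the extra completion time of ordering X over FIFO for the same trace (the source does not state the formula), the relative cost can be written as below. For example, with a hypothetical FIFO completion time of 1000 s, the 30.06% entry for total+causal at cardinality #3 would correspond to about 1000 × 1.3006 = 1300.6 s.

```latex
C_X = \frac{T_X - T_{\mathrm{fifo}}}{T_{\mathrm{fifo}}} \times 100\,\%,
\qquad X \in \{\mathrm{causal},\ \mathrm{total},\ \mathrm{total{+}causal}\}
```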

                  ssl     ssh     gcc     postmark
Improvement (%)   11.46   20.90   13.64   15.05

Table 2: CCTRL - Ordered Vs Sync Writes - throughput improvement observed

                  ssl     ssh     gcc     postmark
Improvement (%)   28.90   29.67   16.67   11.29

Table 3: SEC - Throughput improvement for 7% plain data


f   ssl (%)   ssh (%)   gcc (%)   postmark (%)
1   21.30     14.45     11.43     11.71
2   21.44     13.91     11.53     11.73

Table 4: STABLE - Transient Vs Persistent Stable Storage


List of Graphs

The experimental setup consists of four machines, Nienna, Varda, Manwe and Ulmo (700 MHz to 1.2 GHz processors, 256 MB to 1 GB of RAM), acting as switches and thus housing the Sanlets. The Sanlets form an Ensemble group, equipped with appropriate protocol stacks. The machines are connected by a 100 Mbps shared Ethernet. Ulmo is physically connected to the disk, and the disk is shared across the other machines, which send the I/O access commands read from the trace file to the shared disk through iSAN. The trace is fed to the Sanlet over an external TCP connection. Though the trace feeder can reside anywhere in the network, it is deployed on the same machine as the Sanlet to minimize network interference; we call this embedded trace feeding. It was observed beforehand that the load on a system with an embedded trace feeder did not differ much from that of an ideal machine, nor from that under the synthetic trace generator used to test the setup for implementation bugs (see footnote 13). Thus co-locating the trace feeder and the Sanlet did not increase the system load significantly. Yet, as a precaution, the machines with higher computing power and fast PCI/IDE are used as trace feeders wherever applicable.

8.3 Concurrency Control

Fig. 7, 8, 9 & 10 show the results of the experiments conducted. The first set of experiments was repeated while varying the cardinality of the Sanlet group membership and the protocol stack. Each Sanlet is fed with one particular disk's operations from the HP trace; in the cardinality-3 run the trace of one of the disks is held back, while all the trace data is used for the cardinality-4 run; machine Nienna is not used in the cardinality-3 experiment. The results are plotted as individual graphs (Fig. 7 & 8) ranked by cardinality; each graph depicts the results of experiments conducted with four different message orderings: causally constrained total order, unconstrained total order, causal order, and FIFO order.

The benefits of providing FIFO ordering at the block level are demonstrated in the second set of graphs (Fig. 9 & 10). The graphs show how FIFO ordering enforced at the Sanlet improves throughput, especially for metadata-intensive workloads. Four different traces, ssh, ssl, gcc and postmark, are used, and the observed throughput for each is plotted as a separate graph.

13 The precursor tests were conducted with synthetically generated workloads with varying parameters; since a synthetic trace feeder consumes more CPU cycles than one replaying pre-collected traces, the trace feeder's impact on the CPU can only decrease as we move from synthetic to pre-collected traces.


[Figure 7 plot: Ordering Costs; x-axis: No of Accesses; y-axis: Time Taken (secs); series: fifo, causal, total, total+causal]

Figure 7: Throughput Comparison of Different Message Orderings – #3

[Figure 8 plot: Ordering Costs; x-axis: No of Accesses; y-axis: Time Taken (secs); series: fifo, causal, total, total+causal]

Figure 8: Throughput Comparison of Different Message Orderings – #4


[Figure 9 plots, panels (a) ssh and (b) ssl; x-axis: Number of Accesses; y-axis: Time Taken (secs); series: sync, fifo]

Figure 9: Ordered Writes Vs Sync Writes

[Figure 10 plots, panels (c) gcc and (d) postmark; x-axis: Number of Accesses; y-axis: Time Taken (secs); series: sync, fifo]

Figure 10: Ordered Writes Vs Sync Writes


8.4 Storage Security

The experimental setup consists of 3 machines, Ulmo, Varda and Manwe, with Varda acting as the switch connecting the storage-consumer and Ulmo playing the role of the switch attached to the actual storage. Adding another switch to the group makes the setting more realistic and captures the communication overhead incurred by a non-trivial group configuration. The trace feeder, which does embedded trace feeding, is housed in Varda.

The necessary shared DES key and the IV are established using the secure channel provided by Ensemble; the group members are authenticated using PGP, and the messages are signed with MD5. The Sanlet was configured (using the iSAN configuration script) with the start and number of sectors of the journal, which it uses to distinguish log data from other data. This is an example of the static security configuration mentioned earlier.

The first set of graphs (Fig. 11 & 12) compares the performance of application-aware security in iSAN with plain security: in one case the journal traffic is encrypted in CBC mode while all remaining data is encrypted in ECB mode; in the other case ECB is used for all transmitted data. The measurements were taken with both LUN masking and zoning enabled. Each graph depicts the behaviour of one particular trace.

The next set of graphs (Fig. 13 & 14) shows how performance can be improved if iSAN selectively encrypts the data blocks. The author has gathered information about how much of the data present in the local system is public, i.e. data whose confidentiality need not be enforced using cryptographic methods. The public data includes stored documents of publicly available standards such as RFCs, local documentation, C header files, LaTeX and system font files. The amount of public data varies from 10% to 15% across different machines. Thus, in the traces used, roughly 7% of the accesses (a conservative estimate) are treated as public data and are not cryptographically protected. The remaining accesses are protected using 3DES in ECB mode. The public accesses are randomly generated, and the Sanlet is instructed through in-band information which security mode to apply; this is an example of how a Sanlet could apply different security transformations dynamically using in-band information.


[Figure 11 plots, panels (a) ssh and (b) ssl; x-axis: No of Accesses; y-axis: Time Taken (sec); series: cbc, ecb]

Figure 11: Overhead of CBC for journal

[Figure 12 plots, panels (a) gcc and (b) postmark; x-axis: No of Accesses; y-axis: Time Taken (sec); series: cbc, ecb]

Figure 12: Overhead of CBC for journal


[Figure 13 plots, panels (a) ssh and (b) ssl; x-axis: No of Accesses; y-axis: Time Taken (sec); series: enc = 100%, enc = 93%]

Figure 13: 100% encryption Vs 93% encryption

[Figure 14 plots, panels (a) gcc and (b) postmark; x-axis: No of Accesses; y-axis: Time Taken (sec); series: enc = 100%, enc = 93%]

Figure 14: 100% encryption Vs 93% encryption


8.5 Stable Storage

The test setup consists of 6 machines acting as switches: Ulmo, Varda, Nienna, Yavanna and Manwe. For the persistent stable storage experiments, the disk connected to Ulmo was used. The transient stable storage experiments were conducted by multicasting the log writes to all the members of the Sanlet group. Group membership is tracked using the virtual synchrony mechanism. The graphs (Fig. 15, 16, 17 & 18) are indexed by the cardinality of the group; each graph compares the persistent and transient stable storage measurements for a different trace.


[Figure 15 plots, panels (a) gcc and (b) ssl; x-axis: No of Accesses; y-axis: Time Taken (sec); series: persistent, transient]

Figure 15: Transient Vs Persistent Stable Storage [Cardinality #3] – gcc & ssl

[Figure 16 plots, panels (a) ssh and (b) postmark; x-axis: No of Accesses; y-axis: Time Taken (sec); series: persistent, transient]

Figure 16: Transient Vs Persistent Stable Storage [Cardinality #3] – ssh & postmark


[Figure 17 plots, panels (a) gcc and (b) ssl; x-axis: No of Accesses; y-axis: Time Taken (sec); series: persistent, transient]

Figure 17: Transient Vs Persistent Stable Storage [Cardinality #5] – gcc & ssl

[Figure 18 plots, panels (a) ssh and (b) postmark; x-axis: No of Accesses; y-axis: Time Taken (sec); series: persistent, transient]

Figure 18: Transient Vs Persistent Stable Storage [Cardinality #5] – ssh & postmark