Elastically Replicated Information Services:
Sustaining the Availability of Distributed Storage Across Dynamic Topological Changes
Sponsored by the Program for Research in Computing and Information Sciences and Engineering (PRECISE), NSF-EIA Grant 99-77071
Jose Torres-Berrocal, Dr. Bienvenido Velez-Rivera
Research in Progress
Research Objective
Develop a method or algorithm that dynamically sustains the availability of a distributed storage system above a desired threshold value while its topology changes.
Availability Definition
Availability generally refers to the probability (P) that a system is operating correctly at any given moment.
[State diagram: two states, Available and Failed, labeled with probabilities P and 1 - P]
A distributed storage cluster (DSC) comprises two or more storage nodes which function in a coordinated fashion as a single storage system.
Example of a DSC Failure
When a node fails, the objects it contains become unavailable; thus, the system becomes unavailable.
[Figure: DSC with No Redundancy. Objects X1 and X2 are stored on nodes 1 and 2; after one node fails, the system fails due to the missing object.]
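The failure argument above can be made quantitative. A minimal sketch, assuming each node is independently up with probability `p` and every node holds at least one object stored nowhere else: the system is up only when all `n` nodes are up, so its availability is `p ** n`. The function name `non_redundant_availability` is hypothetical.

```python
def non_redundant_availability(p: float, n: int) -> float:
    """Availability of an n-node DSC with no redundancy.

    Assumes each node is independently up with probability p and that every
    node holds at least one object stored nowhere else, so the system is up
    only if all n nodes are up.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 1:
        raise ValueError("a DSC needs at least one node")
    return p ** n


# Two nodes (X1 on node 1, X2 on node 2), as in the figure, then larger clusters.
for n in (2, 4, 8, 16):
    print(n, round(non_redundant_availability(0.9, n), 4))
```

Because the exponent is the node count, every node added without redundancy multiplies availability by another factor of `p`.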
Using Replication to Tolerate Failures on a DSC
[Figure: DSC with Redundancy. Objects X1 and X2 each have a replica on another node, so an object in a failed node remains available at another node.]
This is what RAIDs do.
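To compare the non-redundant and replicated layouts on equal terms, the sketch below computes availability exactly for any object placement by enumerating every up/down combination of the nodes. It assumes independent node failures with a common per-node availability `p`; `system_availability` and the placement dictionaries are illustrative names, and nodes are indexed from 0 rather than 1 as in the figures. Enumeration visits 2^n states, so it only suits small clusters.

```python
from itertools import product


def system_availability(placement: dict[str, list[int]], n_nodes: int, p: float) -> float:
    """Exact probability that every object has at least one replica on an up node.

    placement maps each object name to the node indices (0-based) holding a copy.
    Assumes nodes fail independently, each up with probability p, and enumerates
    all 2**n_nodes up/down combinations.
    """
    total = 0.0
    for states in product((True, False), repeat=n_nodes):
        # Probability of this particular combination of up/down nodes.
        prob = 1.0
        for up in states:
            prob *= p if up else 1.0 - p
        # The system is up only if no object has lost all of its replicas.
        if all(any(states[node] for node in nodes) for nodes in placement.values()):
            total += prob
    return total


no_redundancy = {"X1": [0], "X2": [1]}   # one copy per object
mirrored = {"X1": [0, 1], "X2": [0, 1]}  # each object replicated on both nodes
print(system_availability(no_redundancy, n_nodes=2, p=0.9))
print(system_availability(mirrored, n_nodes=2, p=0.9))
```

With `p = 0.9`, the non-redundant layout evaluates to about 0.81 (both nodes must be up) and the mirrored layout to about 0.99 (at least one node must be up).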
Storage Systems Must Adapt to Changes
[Figure: an Internet store with 24/7 operation, dynamic changes, and unattended operation]
Availability as nodes are added, compared to the desired threshold
Adding nodes changes the topology. Topology changes can occur at any time and affect availability.
Works with write-intensive as well as read-intensive contexts
Manages dynamic changes due to the addition of nodes
Minimum human intervention
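The effect of adding nodes can be illustrated with a simple model. The sketch assumes independent node failures with per-node availability `p`, a constant replication factor `r`, and a worst-case "maximum utilization" placement in which objects are spread so widely that losing any `r` nodes at once loses some object. Under those assumptions the system is up exactly when fewer than `r` nodes are down, which is a binomial tail. `availability_max_spread`, `THRESHOLD`, and the sample values are hypothetical.

```python
from math import comb


def availability_max_spread(n: int, r: int, p: float) -> float:
    """Modeled availability of an n-node DSC with r replicas per object.

    Worst-case assumption: objects are spread so widely that any r nodes failing
    together hold every replica of some object. With independent failures, the
    system is therefore up exactly when fewer than r nodes are down.
    """
    if not 1 <= r <= n:
        raise ValueError("need 1 <= r <= n")
    q = 1.0 - p  # probability that a single node is down
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(r))


THRESHOLD = 0.999  # hypothetical desired availability
R = 3              # constant replication factor
P_NODE = 0.95      # hypothetical per-node availability

for n in (3, 5, 10, 20, 40):
    a = availability_max_spread(n, R, P_NODE)
    status = "meets threshold" if a >= THRESHOLD else "below threshold"
    print(f"n={n:2d}  availability={a:.6f}  {status}")
```

For a fixed `r`, the modeled availability can only fall as `n` grows, since more nodes means more ways for `r` of them to be down at the same moment.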
Preliminary Conclusions
Availability decreases rapidly as nodes are added when the system uses a constant replication value and maximum utilization.
An ERIS-type method is needed.
System utilization is the counterpart of availability: as utilization increases, availability decreases.
What actually makes the system vulnerable as utilization grows is that the more places objects can be located, the more opportunities there are to lose one.
The region or group of nodes holding the fewest replicas is the predominant point of failure of the system (a chain breaks at its weakest link).
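The slides do not specify the ERIS method itself. As a hypothetical illustration of the idea only, under the same worst-case model (independent failures, per-node availability `p`, objects spread so that any `r` simultaneous failures lose data), one elastic rule is to recompute, whenever nodes are added, the smallest replication factor whose modeled availability meets the threshold. `min_replicas_for_threshold` is an illustrative name, not the authors' algorithm.

```python
from math import comb
from typing import Optional


def min_replicas_for_threshold(n: int, p: float, threshold: float) -> Optional[int]:
    """Smallest replication factor r whose modeled availability meets threshold.

    Uses a worst-case model: independent node failures, per-node availability p,
    and the system up exactly when fewer than r nodes are down.
    Returns None if even r = n (a replica on every node) falls short.
    """
    q = 1.0 - p
    availability = 0.0
    for r in range(1, n + 1):
        # Going from r - 1 to r replicas adds the case "exactly r - 1 nodes down".
        k = r - 1
        availability += comb(n, k) * q**k * p**(n - k)
        if availability >= threshold:
            return r
    return None


# Elastic behavior: recompute r each time the cluster grows.
for n in (3, 5, 10, 20, 40):
    print(n, min_replicas_for_threshold(n, p=0.95, threshold=0.999))
```

Under this model, with `p = 0.95` and a 0.999 threshold, three nodes need `r = 3` while five nodes already need `r = 4`, so a constant replication value chosen for the smaller cluster falls below the threshold once nodes are added. A `None` result means that even full replication cannot reach the threshold at that `p`.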
Current Methods Characteristics: Pre-Dynamic Methods
Fit characteristics
  Distributed storage
  Controlled redundancy
Partial fit characteristics
  Works with write-intensive as well as read-intensive contexts: depends on preconfigured parameters chosen from a priori studies
Unfit characteristics
  24/7 operation: operation must stop to allow changes to the preconfigured parameters
  Do not manage dynamic, incidental changes to any number of nodes
  Not fully automatic
Consensus-Based Characteristics