Highly Available Services and Transactions with Replicated Data
Jason Lenthe
Jan 19, 2016
Highly Available Services (1 of 17)
● What is availability?
– The percentage of time that a service is “up”
● What is highly available?
– Availability close to 100%
– With reasonable response times
– May not conform to sequential consistency
Highly Available Services (2 of 17)
● The Gossip Architecture
– A framework for implementing highly available services
– Replica Managers periodically “gossip” with each other to convey the updates they have received
[Diagram: Clients send requests to Front Ends, which communicate with Replica Managers (RMs); the RMs exchange gossip messages among themselves.]
Highly Available Services (3 of 17)
● The Gossip Architecture (cont'd)
– Outline for Processing Queries and Updates (see the sketch after this list):
● 1) Request – The front end sends the request to a replica manager
● 2) Update Response – If the request is an update, the replica manager replies as soon as it has received the request
● 3) Coordination – Replica managers “gossip” (send gossip messages)
● 4) Execution – The replica manager executes the request
● 5) Query Response – If the request is a query, the replica manager responds
● 6) Agreement – More gossip messages may be sent out; generally, a lazy approach is taken
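A minimal sketch of this outline in Python, assuming a toy in-memory replica manager; the class and method names are illustrative, not from the textbook:

class ReplicaManager:
    def __init__(self, rm_id, num_rms):
        self.rm_id = rm_id
        self.ts = [0] * num_rms   # vector timestamp of updates seen here
        self.log = []             # updates received, possibly not yet applied

    def handle_update(self, update):
        # 2) Update Response: reply as soon as the update is logged,
        # before it has reached the other replica managers.
        self.ts[self.rm_id] += 1
        self.log.append((list(self.ts), update))
        return list(self.ts)      # returned to the front end

    def gossip_to(self, other):
        # 3) Coordination / 6) Agreement: lazily forward logged updates
        # that the other replica manager has not seen yet.
        for entry in self.log:
            if entry not in other.log:
                other.log.append(entry)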
Highly Available Services (4 of 17)
● The Gossip Architecture (cont'd)
– Each Front End maintains a vector timestamp for each value that it has accessed
● Contains the last update seen from each replica manager
● Is sent as part of each query/update
– Each Replica Manager uses the received vector timestamp to find out whether it is up-to-date (sketched below)
● If it is not, it can wait for updates or request them explicitly
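A minimal sketch of the timestamp check in Python, assuming vector timestamps are plain lists indexed by replica manager; the function names are illustrative:

def is_up_to_date(rm_ts, frontend_ts):
    # The replica manager may answer only if its timestamp dominates
    # (is componentwise >=) the one the front end sent.
    return all(r >= f for r, f in zip(rm_ts, frontend_ts))

def merge(ts_a, ts_b):
    # Front ends merge timestamps componentwise after each reply.
    return [max(a, b) for a, b in zip(ts_a, ts_b)]

# e.g. is_up_to_date([2, 4, 1], [2, 3, 1]) -> True
#      is_up_to_date([2, 4, 1], [3, 3, 1]) -> False: wait or fetch updates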
Highly Available Services (5 of 17)
● The Gossip Architecture (cont'd)
– Examples of the Gossip architecture?
● The textbook is skimpy on examples in this section, but...
● It suggests a bulletin board service
– Clients may have a different view of the bulletin board at any time, if the network is partitioned
– All messages will eventually be propagated to each replica manager
Highly Available Services (6 of 17)
● The Gossip Architecture – Conclusions
– Clients can operate when the network is partitioned (as long as at least 1 replica manager is accessible)
– The lazy approach makes it inappropriate for near-real-time collaboration
– Not particularly scalable
● 2 + (R – 1)/G is the number of messages transmitted per update, where R = number of replica managers and G = number of updates packed into a gossip message
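– For example, with R = 10 and G = 3, each update costs 2 + (10 – 1)/3 = 5 messages; at R = 100 that grows to 35 messages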
Highly Available Services (7 of 17)
● The Bayou System
– Another framework for providing highly available services
– Uses Operational Transformation
● Allows domain-specific conflict detection and conflict resolution
Highly Available Services (8 of 17)
● The Bayou System (cont'd)
– Updates have two states:
● Tentative – may be undone or reapplied as the system becomes consistent
● Committed – cannot be undone
Highly Available Services (9 of 17)
● The Bayou System (cont'd)
– Uses application-specific dependency checks and merge procedures (sketched below)
● Dependency checks determine whether a new update conflicts with an update that has already been applied
● A merge procedure produces a new update that does not conflict with the previously applied update
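A minimal sketch in Python, using the meeting-room booking scenario commonly used to illustrate Bayou; the data layout and function names are illustrative:

def conflicts(update, db):
    # Dependency check: does the requested (room, slot) clash with an
    # already-applied booking?
    return (update["room"], update["slot"]) in db

def merge_procedure(update, db, alternate_slots):
    # Produce a non-conflicting update, here by trying alternate slots.
    for slot in alternate_slots:
        if (update["room"], slot) not in db:
            return {**update, "slot": slot}
    return None  # no resolution found; the update is rejected

def apply_update(update, db, alternate_slots):
    if conflicts(update, db):
        update = merge_procedure(update, db, alternate_slots)
    if update is not None:
        db[(update["room"], update["slot"])] = update["owner"]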
Highly Available Services (10 of 17)
● The Bayou System – Conclusions
– Uses application-specific logic to produce an eventually sequentially consistent state
– Complicated for the application programmer and the user
● Programmer needs to provide dependency check and merge procedures
● User needs to deal with tentative data
– Generally limited to applications where
● Conflicts are rare
● Data semantics are simple
Highly Available Services (11 of 17)
● The Coda File System
– Coda is basically a highly available version of AFS
– Aims to provide constant data availability
● Good for mobile environments
● Follows an optimistic strategy – assumes conflicts are unlikely
Highly Available Services (12 of 17)
● The Coda File System (cont'd)
– Architecture
● Venus – the client process
● Vice – the server process
● Volume Storage Group (VSG) – the set of servers that have a copy of a particular file volume
● Available Volume Storage Group (AVSG) – the subset of the VSG that is currently accessible
Highly Available Services (13 of 17)
● The Coda File System (cont'd)
– Basic Operation (sketched below)
● On open:
– Venus gets the file from its local cache, or
– Determines which server in the AVSG has the most recent version (the preferred server) and gets the file (and callback promises) from there
● On close (after modification):
– Venus sends the updated file to everyone in the AVSG using multicast RPC
– But some servers might not be in this client's AVSG, so they can miss the update...
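A minimal sketch of this open/close path in Python; the Server class and its methods are toy stand-ins, not the real Coda interfaces, and the AVSG is assumed non-empty (an empty AVSG means disconnected operation):

class Server:
    # Toy stand-in for a Vice server holding one volume.
    def __init__(self):
        self.files = {}   # path -> (version, data)

    def version_of(self, path):
        return self.files.get(path, (0, None))[0]

    def fetch(self, path):
        return self.files[path][1]

    def store(self, path, data):
        version, _ = self.files.get(path, (0, None))
        self.files[path] = (version + 1, data)

def open_file(path, cache, avsg):
    # Serve from the local cache if possible (callback promises keep
    # the cached copy valid in real Coda).
    if path in cache:
        return cache[path]
    # Otherwise pick the server in the AVSG holding the most recent
    # version: the "preferred server".
    preferred = max(avsg, key=lambda s: s.version_of(path))
    data = preferred.fetch(path)
    cache[path] = data
    return data

def close_file(path, data, cache, avsg):
    # After modification, push the file to the whole AVSG (a single
    # multicast RPC in real Coda, a loop here).
    cache[path] = data
    for server in avsg:
        server.store(path, data)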
Highly Available Services (14 of 17)
● The Coda File System (cont'd)
– Venus periodically sends out a probe for each file in its cache
– This determines the AVSG for each file
– Each server responds with a CVV (Coda Version Vector)
● Contains a summary of all files in the volume
● Mismatches are detected (sketched below)
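A minimal sketch of the mismatch check in Python, modelling a CVV as a plain list of per-server update counts, which is a simplification of Coda's version vectors:

def detect_mismatch(cvvs):
    # cvvs maps each server in the AVSG to the CVV it returned from
    # the probe; any disagreement means some server missed updates.
    distinct = {tuple(v) for v in cvvs.values()}
    return len(distinct) > 1

# e.g. detect_mismatch({"s1": [3, 3, 2], "s2": [3, 3, 2]}) -> False
#      detect_mismatch({"s1": [3, 3, 2], "s2": [3, 2, 2]}) -> True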
Highly Available Services (15 of 17)
● The Coda File System (cont'd)
– Disconnected operation is supported (when the AVSG is empty)
– The user specifies which files Venus should make available during periods of disconnectivity
– When connectivity is restored, the reintegration process begins
– Conflicts are detected and files are flagged for manual resolution
Highly Available Services (16 of 17)
● The Coda File System (cont'd)
– Performance: Coda vs. AFS
● With no replication: about the same
● With three-fold replication:
– For 5 users, Coda increases benchmark time by 5%
– At 50 users, Coda increases benchmark time by 70%, while AFS increases it by 16%
Highly Available Services (17 of 17)
● The Coda File System – Summary
– Coda FS provides a highly available filesystem that works during periods of disconnectivity
– Requires some user interaction
● Identifying files to be available during disconnectivity
● Manually resolving occasional update conflicts
– Does not perform as well as AFS
Transactions with Replicated Data (1 of 6)
● The goal of normal distributed transactions is serial equivalence
● When replicated data is involved, one-copy serializability is needed
● Which means the effect of the transactions is the same as if they were
– Performed one at a time
– On a single set of objects
Transactions with Replicated Data (2 of 6)
● Architectural Issues
– Eager vs. Lazy Update Propagation
● Eager – propagate updates to the replica managers during the transaction (before commit)
● Lazy – commit the transaction and propagate updates later
– The Two-Phase Commit Protocol is needed
– Primary Copy Replication (sketched below)
● Only one replica manager at a time can interact with front ends
● All other replica managers are backups (one could become the primary if the current one fails)
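A minimal sketch of primary copy replication with eager propagation in Python; failure handling and two-phase commit are omitted, and all names are illustrative:

class Backup:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class Primary:
    def __init__(self, backups):
        self.data = {}
        self.backups = backups   # one of these takes over if the primary fails

    def write(self, key, value):
        # Eager propagation: every backup receives the update before
        # the commit is acknowledged (a real system would run the
        # two-phase commit protocol here).
        self.data[key] = value
        for backup in self.backups:
            backup.apply(key, value)
        return "committed"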
Transactions with Replicated Data (3 of 6)
● Schemes for Dealing with Network Partitions
– Available copies with validation
– Quorum consensus
– Virtual partition
Transactions with Replicated Data (4 of 6)
● Available Copies with Validation Method
– Reads are serviced by any available replica manager
– Updates must be performed by all available replica managers (some replica managers may be unavailable)
– When the network is partitioned, each partition can carry out transactions
– When the network is repaired, conflicts may have arisen
● Conflicts are eliminated by aborting one of the conflicting transactions
Transactions with Replicated Data (5 of 6)
● Quorum Consensus Method
– Only one of the network partitions has the right to carry on with transactions
– When the network is repaired, replica managers are brought up to date with those in the quorum
– The quorum is determined by a voting algorithm that is applied to each operation request (sketched below)
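A minimal sketch of the quorum arithmetic in Python, assuming Gifford-style weighted voting with one vote per replica; the vote counts and thresholds are illustrative:

def has_quorum(votes_reachable, threshold):
    # An operation may proceed only if the reachable replicas hold
    # enough votes to meet the configured threshold.
    return sum(votes_reachable) >= threshold

# With N = 5 single-vote replicas, choose a read quorum R and a write
# quorum W so that R + W > N and W > N/2, e.g. R = 3, W = 3:
N, R, W = 5, 3, 3
assert R + W > N    # every read quorum overlaps every write quorum
assert W > N / 2    # no two partitions can both form a write quorum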
Transactions with Replicated Data (6 of 6)
● Virtual Partition Method
– Combines the available copies method with the quorum consensus method
– A new virtual partition is created on write failure
– If a virtual partition has a quorum, transactions can proceed