Highly Available Services and Transactions with Replicated Data
Jason Lenthe
Jan 19, 2016
Highly Available Services (1 of 17)
● What is availability?
– The percentage of time that a service is “up”
● What is highly available?
– Availability close to 100%
– With reasonable response times
– May not conform to sequential consistency
Highly Available Services (2 of 17)
● The Gossip Architecture
– A framework for implementing highly available services
– Replica Managers periodically “gossip” with each other to convey the updates they have received
[Diagram: Clients send requests to Front Ends, which communicate with Replica Managers (RMs); the RMs exchange gossip messages among themselves.]
Highly Available Services (3 of 17)
● The Gossip Architecture (cont'd)
– Outline for Processing Queries and Updates (see the sketch after this list):
● 1) Request – The front end sends the request to a replica manager
● 2) Update Response – If the request is an update, the replica manager replies as soon as it has received the request
● 3) Coordination – Replica managers “gossip” (send gossip messages)
● 4) Execution – The replica manager executes the request
● 5) Query Response – If the request is a query, the replica manager responds
● 6) Agreement – More gossip messages may be sent out; generally, a lazy approach is taken
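A minimal sketch of this outline in Python, assuming a toy in-memory replica manager; the class and method names are illustrative, not from the textbook:

class ReplicaManager:
    def __init__(self, rm_id, num_rms):
        self.rm_id = rm_id
        self.ts = [0] * num_rms   # vector timestamp of updates seen here
        self.log = []             # updates received, possibly not yet applied

    def handle_update(self, update):
        # 2) Update Response: reply as soon as the update is logged,
        # before it has reached the other replica managers.
        self.ts[self.rm_id] += 1
        self.log.append((list(self.ts), update))
        return list(self.ts)      # returned to the front end

    def gossip_to(self, other):
        # 3) Coordination / 6) Agreement: lazily forward logged updates
        # that the other replica manager has not seen yet.
        for entry in self.log:
            if entry not in other.log:
                other.log.append(entry)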
Highly Available Services (4 of 17)
● The Gossip Architecture (cont'd)
– Each Front End maintains a vector timestamp for each value that it has accessed
● Contains the last update seen from each replica manager
● Is sent as part of each query/update
– Each Replica Manager uses the received vector timestamp to find out whether it is up-to-date (sketched below)
● If it is not, it can wait for updates or request them explicitly
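A minimal sketch of the timestamp check in Python, assuming vector timestamps are plain lists indexed by replica manager; the function names are illustrative:

def is_up_to_date(rm_ts, frontend_ts):
    # The replica manager may answer only if its timestamp dominates
    # (is componentwise >=) the one the front end sent.
    return all(r >= f for r, f in zip(rm_ts, frontend_ts))

def merge(ts_a, ts_b):
    # Front ends merge timestamps componentwise after each reply.
    return [max(a, b) for a, b in zip(ts_a, ts_b)]

# e.g. is_up_to_date([2, 4, 1], [2, 3, 1]) -> True
#      is_up_to_date([2, 4, 1], [3, 3, 1]) -> False: wait or fetch updates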
Highly Available Services (5 of 17)
● The Gossip Architecture (cont'd)
– Examples of the Gossip architecture?
● The textbook is skimpy on examples in this section, but...
● It suggests a bulletin board service
– Clients may have a different view of the bulletin board at any time, if the network is partitioned
– All messages will eventually be propagated to each replica manager
Highly Available Services (6 of 17)
● The Gossip Architecture – Conclusions
– Clients can operate when the network is partitioned (as long as at least 1 replica manager is accessible)
– The lazy approach makes it inappropriate for near-real-time collaboration
– Not particularly scalable
● 2 + (R – 1)/G is the number of messages transmitted per update, where R = number of replica managers and G = number of updates packed into a gossip message
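– For example, with R = 10 and G = 3, each update costs 2 + (10 – 1)/3 = 5 messages; at R = 100 that grows to 35 messages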
Highly Available Services (7 of 17)
● The Bayou System
– Another framework for providing highly available services
– Uses Operational Transformation
● Allows domain-specific conflict detection and conflict resolution
Highly Available Services (8 of 17)
● The Bayou System (cont'd)
– Updates have two states:
● Tentative – may be undone or reapplied as the system becomes consistent
● Committed – cannot be undone
Highly Available Services (9 of 17)
● The Bayou System (cont'd)
– Uses application-specific dependency checks and merge procedures (sketched below)
● Dependency checks determine whether a new update conflicts with an update that has already been applied
● A merge procedure produces a new update that does not conflict with the previously applied update
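A minimal sketch in Python, using the meeting-room booking scenario commonly used to illustrate Bayou; the data layout and function names are illustrative:

def conflicts(update, db):
    # Dependency check: does the requested (room, slot) clash with an
    # already-applied booking?
    return (update["room"], update["slot"]) in db

def merge_procedure(update, db, alternate_slots):
    # Produce a non-conflicting update, here by trying alternate slots.
    for slot in alternate_slots:
        if (update["room"], slot) not in db:
            return {**update, "slot": slot}
    return None  # no resolution found; the update is rejected

def apply_update(update, db, alternate_slots):
    if conflicts(update, db):
        update = merge_procedure(update, db, alternate_slots)
    if update is not None:
        db[(update["room"], update["slot"])] = update["owner"]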
Highly Available Services (10 of 17)
● The Bayou System – Conclusions
– Uses application-specific logic to produce an eventually sequentially consistent state
– Complicated for the application programmer and the user
● Programmer needs to provide dependency check and merge procedures
● User needs to deal with tentative data
– Generally limited to applications where
● Conflicts are rare
● Data semantics are simple
Highly Available Services (11 of 17)
● The Coda File System
– Coda is basically a highly available version of AFS
– Aims to provide constant data availability
● Good for mobile environments
● Follows an optimistic strategy – assumes conflicts are unlikely
Highly Available Services (12 of 17)
● The Coda File System (cont'd)
– Architecture
● Venus – the client process
● Vice – the server process
● Volume Storage Group (VSG) – the set of servers that have a copy of a particular file volume
● Available Volume Storage Group (AVSG) – the subset of the VSG that is currently accessible
Highly Available Services (13 of 17)
● The Coda File System (cont'd)
– Basic Operation (sketched below)
● On open:
– Venus gets the file from its local cache, or
– Determines which server in the AVSG has the most recent version (the preferred server) and gets the file (and callback promises) from there
● On close (after modification):
– Venus sends the updated file to everyone in the AVSG using multicast RPC
– But some servers might not be in this client's AVSG, so they can miss the update...
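A minimal sketch of this open/close path in Python; the Server class and its methods are toy stand-ins, not the real Coda interfaces, and the AVSG is assumed non-empty (an empty AVSG means disconnected operation):

class Server:
    # Toy stand-in for a Vice server holding one volume.
    def __init__(self):
        self.files = {}   # path -> (version, data)

    def version_of(self, path):
        return self.files.get(path, (0, None))[0]

    def fetch(self, path):
        return self.files[path][1]

    def store(self, path, data):
        version, _ = self.files.get(path, (0, None))
        self.files[path] = (version + 1, data)

def open_file(path, cache, avsg):
    # Serve from the local cache if possible (callback promises keep
    # the cached copy valid in real Coda).
    if path in cache:
        return cache[path]
    # Otherwise pick the server in the AVSG holding the most recent
    # version: the "preferred server".
    preferred = max(avsg, key=lambda s: s.version_of(path))
    data = preferred.fetch(path)
    cache[path] = data
    return data

def close_file(path, data, cache, avsg):
    # After modification, push the file to the whole AVSG (a single
    # multicast RPC in real Coda, a loop here).
    cache[path] = data
    for server in avsg:
        server.store(path, data)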
Highly Available Services (14 of 17)
● The Coda File System (cont'd)
– Venus periodically sends out a probe for each file in its cache
– This determines the AVSG for each file
– Each server responds with a CVV (Coda Version Vector)
● Contains a summary of all files in the volume
● Mismatches are detected (sketched below)
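A minimal sketch of the mismatch check in Python, modelling a CVV as a plain list of per-server update counts, which is a simplification of Coda's version vectors:

def detect_mismatch(cvvs):
    # cvvs maps each server in the AVSG to the CVV it returned from
    # the probe; any disagreement means some server missed updates.
    distinct = {tuple(v) for v in cvvs.values()}
    return len(distinct) > 1

# e.g. detect_mismatch({"s1": [3, 3, 2], "s2": [3, 3, 2]}) -> False
#      detect_mismatch({"s1": [3, 3, 2], "s2": [3, 2, 2]}) -> True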
Highly Available Services (15 of 17)
● The Coda File System (cont'd)
– Disconnected operation is supported (when the AVSG is empty)
– The user specifies which files Venus should make available during periods of disconnectivity
– When connectivity is restored, the reintegration process begins
– Conflicts are detected and files are flagged for manual resolution
Highly Available Services (16 of 17)
● The Coda File System (cont'd)
– Performance: Coda vs. AFS
● With no replication: about the same
● With three-fold replication:
– For 5 users, Coda increases benchmark time by 5%
– At 50 users, Coda increases benchmark time by 70%, while AFS increases it by 16%
Highly Available Services (17 of 17)
● The Coda File System – Summary
– Coda FS provides a highly available filesystem that works during periods of disconnectivity
– Requires some user interaction
● Identifying files to be available during disconnectivity
● Manually resolving occasional update conflicts
– Does not perform as well as AFS
Transactions with Replicated Data (1 of 6)
● The goal of normal distributed transactions is serial equivalence
● When replicated data is involved, one-copy serializability is needed
● Which means the effect of the transactions is the same as if they were
– Performed one at a time
– On a single set of objects
Transactions with Replicated Data (2 of 6)
● Architectural Issues
– Eager vs. Lazy Update Propagation
● Eager – propagate updates to the replica managers during the transaction (before commit)
● Lazy – commit the transaction and propagate updates later
– The Two-Phase Commit Protocol is needed
– Primary Copy Replication (sketched below)
● Only one replica manager at a time can interact with front ends
● All other replica managers are backups (one could become the primary if the current one fails)
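A minimal sketch of primary copy replication with eager propagation in Python; failure handling and two-phase commit are omitted, and all names are illustrative:

class Backup:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class Primary:
    def __init__(self, backups):
        self.data = {}
        self.backups = backups   # one of these takes over if the primary fails

    def write(self, key, value):
        # Eager propagation: every backup receives the update before
        # the commit is acknowledged (a real system would run the
        # two-phase commit protocol here).
        self.data[key] = value
        for backup in self.backups:
            backup.apply(key, value)
        return "committed"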
Transactions with Replicated Data (3 of 6)
● Schemes for Dealing with Network Partitions
– Available copies with validation
– Quorum consensus
– Virtual partition
Transactions with Replicated Data (4 of 6)
● Available Copies with Validation Method
– Reads are serviced by any available replica manager
– Updates must be performed by all available replica managers (some replica managers may be unavailable)
– When the network is partitioned, each partition can carry out transactions
– When the network is repaired, conflicts may have arisen
● Conflicts are eliminated by aborting one of the conflicting transactions
Transactions with Replicated Data (5 of 6)
● Quorum Consensus Method
– Only one of the network partitions has the right to carry on with transactions
– When the network is repaired, replica managers are brought up to date with those in the quorum
– The quorum is determined by a voting algorithm that is applied to each operation request (sketched below)
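A minimal sketch of the quorum arithmetic in Python, assuming Gifford-style weighted voting with one vote per replica; the vote counts and thresholds are illustrative:

def has_quorum(votes_reachable, threshold):
    # An operation may proceed only if the reachable replicas hold
    # enough votes to meet the configured threshold.
    return sum(votes_reachable) >= threshold

# With N = 5 single-vote replicas, choose a read quorum R and a write
# quorum W so that R + W > N and W > N/2, e.g. R = 3, W = 3:
N, R, W = 5, 3, 3
assert R + W > N    # every read quorum overlaps every write quorum
assert W > N / 2    # no two partitions can both form a write quorum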
Transactions with Replicated Data (6 of 6)
● Virtual Partition Method
– Combines the available copies method with the quorum consensus method
– A new virtual partition is created on write failure
– If a virtual partition has a quorum, transactions can proceed