Appeared in the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’10)

SPORC: Group Collaboration using Untrusted Cloud Resources

Ariel J. Feldman, William P. Zeller, Michael J. Freedman, and Edward W. Felten
Princeton University

Abstract

Cloud-based services are an attractive deployment model for user-facing applications like word processing and calendaring. Unlike desktop applications, cloud services allow multiple users to edit shared state concurrently and in real-time, while being scalable, highly available, and globally accessible. Unfortunately, these benefits come at the cost of fully trusting cloud providers with potentially sensitive and important data.

To overcome this strict tradeoff, we present SPORC, a generic framework for building a wide variety of collaborative applications with untrusted servers. In SPORC, a server observes only encrypted data and cannot deviate from correct execution without being detected. SPORC allows concurrent, low-latency editing of shared state, permits disconnected operation, and supports dynamic access control even in the presence of concurrency. We demonstrate SPORC’s flexibility through two prototype applications: a causally-consistent key-value store and a browser-based collaborative text editor.

Conceptually, SPORC illustrates the complementary benefits of operational transformation (OT) and fork* consistency. The former allows SPORC clients to execute concurrent operations without locking and to resolve any resulting conflicts automatically. The latter prevents a misbehaving server from equivocating about the order of operations unless it is willing to fork clients into disjoint sets. Notably, unlike previous systems, SPORC can automatically recover from such malicious forks by leveraging OT’s conflict resolution mechanism.

1 Introduction

An emerging class of cloud-based collaborative services, such as online document processing and calendaring, provides users with anywhere-available, real-time, and concurrent access to shared state. Their deployments on managed cloud platforms enjoy global accessibility, high availability, fault tolerance, and elastic resource allocation and scaling. Yet these benefits have come at the cost of having a fully trusted server, creating a risk of privacy problems due to server-side information leaks. The history of such services is one rife with unplanned data disclosures and malicious break-ins [24]. Indeed, the very centralization of information makes cloud providers high-value targets for attack. Further, the behavior of service providers themselves is a source of users’ privacy angst, as privacy policies may be weakened due to market expediencies. Finally, cloud providers face pressure from government agencies worldwide to release information on demand [15].

This paper challenges the belief that applications must sacrifice strong security and privacy to enjoy the benefits of cloud deployment. We present a system, SPORC, that offers managed cloud-based deployment for group collaboration services, yet does not require users to trust the cloud provider to maintain data privacy or even to operate correctly. SPORC’s cloud servers see only encrypted data, and clients will detect any deviation from correct operation (e.g., adding, modifying, dropping, or reordering operations) and will recover from the error. Much like SUNDR [24], SPORC bases its security and privacy guarantees on the security of users’ cryptographic keys, and not on the cloud provider’s good intentions nor on some threshold-like protocol between servers [9] that is susceptible to administrative or software attacks.

SPORC provides a generic collaboration service in which users can create a document, modify its access control list, edit it concurrently, experience fully automated merging of updates, and even perform these operations while disconnected. The SPORC framework supports a broad range of collaborative applications. Data updates are encrypted before being sent to a cloud-hosted server. The server assigns a total order to all operations and redistributes the ordered updates to clients. If a malicious server drops or reorders updates, the SPORC clients can detect the server’s misbehavior, switch to a new server, restore a consistent state, and continue. The same mechanism that allows SPORC to merge correct concurrent operations also enables it to transparently recover from attacks that fork clients’ views.

From a conceptual distributed systems perspective, SPORC demonstrates the benefit of combining operational transformation [11] and fork* consistency protocols [23]. Operational transformation (OT) defines a framework for executing lock-free concurrent operations that both preserves causal consistency and converges to a common shared state. It does so by transforming operations so they can be applied commutatively by different clients, resulting in the same final state. While OT originated with decentralized applications using pairwise reconciliation [11, 18], recent systems like Google Wave [44] have used OT with a trusted central server that orders and transforms clients’ operations. Fork* consistency, on the other hand, was introduced as a consistency model for interacting with an untrusted server: If the server causes the views of two clients to diverge, the clients must either never see each other’s subsequent updates or else identify the server as faulty.

Recovering from a malicious fork is similar to reconciling concurrent operations in the OT framework. Upon detecting a fork, SPORC clients use OT mechanisms to replay and transform forked operations, restoring a consistent state. Previous applications of fork* consistency [23] could only detect forks, but not resolve them.

This paper makes the following contributions:

§2 We identify and explore the conceptual connection between operational transformation protocols and the fork* consistency model, and use this connection to motivate SPORC’s design.

§3 We describe SPORC’s framework and protocols for real-time collaboration. SPORC provides security and privacy against both an untrusted server that mediates communication and other clients that lack access control permissions.

§4 We demonstrate how to support dynamic access control, which is challenging because SPORC supports concurrent operations and offline editing.

§5 We describe how clients can detect and recover from maliciously-instigated forks. We also present a checkpoint mechanism that reduces saved client state and minimizes the join overhead for new clients.

§6 We illustrate the extensibility of SPORC’s pluggable data model by building both a key-value store and a browser-based collaborative text editor. We implement these services as both stand-alone applications and web services; the latter run in a browser, execute in JavaScript (compiled from Java via GWT [12]), and require no prior installation.

We evaluate SPORC’s performance in Section 7 before discussing related work and concluding.

2 System Model

The purpose of SPORC is to allow a group of users who trust each other to collaboratively edit some shared state, which we call the document, with the help of an untrusted server. SPORC comprises a set of client devices that modify the document on behalf of particular users, and a potentially-malicious server whose main role is to impose a global order on those modifications. The server receives updates from individual clients, orders them, and then broadcasts them to the other clients. Access to the document is limited to a set of authorized users, but each user may be logged into arbitrarily many clients simultaneously (e.g., her desktop, laptop, and mobile phone). Each client, even if it is controlled by the same user as another client, has its own local view of the document that must be synchronized with all other clients.

2.1 Goals

We designed SPORC with the following goals in mind:

Flexible framework for a broad class of collaborative services. Because SPORC uses an untrusted server which does not see application-level content, the server is generic and can handle a broad class of applications. On the client side, SPORC provides a library suitable for use by a range of desktop and web-based applications.

Propagate modifications quickly. When a client is connected to the network, its changes to the shared state should propagate quickly to all other clients so that clients’ views are nearly identical. This property makes SPORC suitable for building collaborative applications requiring nearly real-time updates, such as collaborative text editing and instant messaging.

Tolerate slow or disconnected networks. To allow clients to edit the document while offline or while experiencing high network latency, clients in SPORC update the document optimistically. Every time a client generates a modification, the client applies it immediately to its local state, and only later sends it to the server for redistribution. As a result, clients’ local views of the document will invariably diverge, and SPORC must be able to resolve these divergences automatically.

Keep data confidential from the server and unauthorized users. Since the server is untrusted, document updates must be encrypted before being sent to the server. For efficiency, the system should use symmetric-key encryption. SPORC must provide a way to distribute this symmetric key to every client of authorized users. When a document’s access control list changes, SPORC must ensure that newly added users can decrypt the entire document, and that removed users cannot decrypt any updates subsequent to their expulsion.

Detect a misbehaving server. Even without access to document plaintext, a malicious server could still do significant damage by deviating from its assigned role. It could attempt to add, drop, alter, or delay clients’ (encrypted) updates, or it could show different clients inconsistent views of the document. SPORC must give clients a means to quickly detect these kinds of misbehavior.

Recover from malicious server behavior. If clients detect that the server is misbehaving, clients should be able to fail over to a new server and resume execution. Since a malicious server could cause clients to have inconsistent local state, SPORC must provide a mechanism for automatically resolving these inconsistencies.

To achieve these goals, SPORC builds on two conceptual frameworks: operational transformation and fork* consistency.

2.2 Operational Transformation

Operational Transformation (OT) [11] provides a general model for synchronizing shared state, while allowing each client to apply local updates optimistically. In OT, the application defines a set of operations from which all modifications to the document are constructed. When clients generate new operations, they apply them locally before sending them to others. To deal with the conflicts that these optimistic updates inevitably incur, each client transforms the operations it receives from others before applying them to its local state. If all clients transform incoming operations appropriately, OT guarantees that they will eventually converge to a consistent state.

Central to OT is an application-specific transformation function $T(\cdot)$ that allows two clients whose states have diverged by a single pair of conflicting operations to return to a consistent, reasonable state. $T(op_1, op_2)$ takes two conflicting operations as input and returns a pair of transformed operations $(op'_1, op'_2)$, such that if the party that initially did $op_1$ now applies $op'_2$, and the party that did $op_2$ now applies $op'_1$, the conflict will be resolved.

To use the example from Nichols et al. [30], suppose Alice and Bob both begin with the same local state “ABCDE”, and then Alice applies $op_1$ = ‘del 4’ locally to get “ABCE”, while Bob performs $op_2$ = ‘del 2’ to get “ACDE”. If Alice and Bob exchanged operations and executed each other’s naively, then they would end up in inconsistent states (Alice would get “ACE” and Bob “ACD”). To avoid this problem, the application supplies the following transformation function that adjusts the offsets of concurrent delete operations:

$$
T(\text{del } x, \text{del } y) =
\begin{cases}
(\text{del } x - 1,\ \text{del } y) & \text{if } x > y \\
(\text{del } x,\ \text{del } y - 1) & \text{if } x < y \\
(\text{no-op},\ \text{no-op}) & \text{if } x = y
\end{cases}
$$

Thus, after computing $T(op_1, op_2)$, Alice will apply $op'_2$ = ‘del 2’ as before but Bob will apply $op'_1$ = ‘del 3’, leaving both in the consistent state “ACE”.

Given this pair-wise transformation function, clients that diverge in arbitrarily many operations can return to a consistent state by applying the transformation function repeatedly. For example, suppose that Alice has optimistically applied $op_1$ and $op_2$ to her local state, but has yet to send them to other clients. If she receives a new operation $op_{new}$, Alice must transform it with respect to both $op_1$ and $op_2$: She first computes $(op'_{new}, op'_1) \leftarrow T(op_{new}, op_1)$, and then $(op''_{new}, op'_2) \leftarrow T(op'_{new}, op_2)$. This process yields $op''_{new}$, an operation that Alice has “transformed past” her two local operations and can now apply to her local state.
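The delete-offset transformation above can be sketched in a few lines of Python. This is only an illustration, not SPORC's implementation: the names `transform` and `apply_op`, and the encoding of ‘del k’ as the integer `k` (with `None` standing for no-op), are assumptions made for this example.

```python
from typing import Optional, Tuple

# Assumed encoding: 'del k' is the 1-based integer k; None is a no-op.
Op = Optional[int]


def transform(op1: Op, op2: Op) -> Tuple[Op, Op]:
    """Pairwise T(del x, del y); returns (op1', op2')."""
    if op1 is None or op2 is None:
        return op1, op2            # a no-op conflicts with nothing
    if op1 > op2:
        return op1 - 1, op2        # x > y: shift x left by one
    if op1 < op2:
        return op1, op2 - 1        # x < y: shift y left by one
    return None, None              # x = y: both deleted the same character


def apply_op(doc: str, op: Op) -> str:
    """Apply a single-character delete (or a no-op) to a string."""
    return doc if op is None else doc[:op - 1] + doc[op:]


# Alice and Bob diverge from the same state.
alice = apply_op("ABCDE", 4)       # "ABCE"
bob = apply_op("ABCDE", 2)         # "ACDE"

# Each applies the other's transformed operation.
op1_t, op2_t = transform(4, 2)     # ('del 3', 'del 2')
alice_final = apply_op(alice, op2_t)
bob_final = apply_op(bob, op1_t)
```

Both `alice_final` and `bob_final` end up as “ACE”, matching the example in the text.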

Throughout this paper, we use the notation $op' \leftarrow T(op, \langle op_1, \ldots, op_n \rangle)$ to denote transforming $op$ past a sequence of operations $\langle op_1, \ldots, op_n \rangle$ by iteratively applying the transformation function.¹ Similarly, we define $\langle op'_1, \ldots, op'_n \rangle \leftarrow T(\langle op_1, \ldots, op_n \rangle, op)$ to represent transforming a sequence of operations past a single operation.
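As a rough sketch of this notation, the helper below folds the pairwise function over a sequence and returns both results at once: the operation transformed past the sequence, and the sequence transformed past the operation. The helper name `transform_past`, the single-character delete operations, and their integer encoding are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

Op = Optional[int]  # assumed encoding: 'del k' as 1-based k; None is a no-op


def transform(a: Op, b: Op) -> Tuple[Op, Op]:
    """Pairwise T for single-character deletes; returns (a', b')."""
    if a is None or b is None:
        return a, b
    if a > b:
        return a - 1, b
    if a < b:
        return a, b - 1
    return None, None


def transform_past(op: Op, seq: List[Op]) -> Tuple[Op, List[Op]]:
    """Return (op', seq'): op transformed past seq, and seq transformed past op."""
    seq_out = []
    for other in seq:
        op, other = transform(op, other)   # op advances past one element
        seq_out.append(other)
    return op, seq_out


def apply_all(doc: str, ops: List[Op]) -> str:
    for op in ops:
        if op is not None:
            doc = doc[:op - 1] + doc[op:]
    return doc


pending = [1, 1]   # Alice deleted "A", then "B": "ABCDE" -> "CDE"
op_new = 5         # a remote client deleted "E": "ABCDE" -> "ABCD"
op_new_t, pending_t = transform_past(op_new, pending)

alice_doc = apply_all("ABCDE", pending + [op_new_t])
remote_doc = apply_all("ABCDE", [op_new] + pending_t)
```

Here the incoming ‘del 5’ becomes ‘del 3’ once transformed past Alice's two local deletes, and both orders of application converge on “CD”.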

Operational transformation can be applied in a wide variety of settings, as operations, and the transforms on them, can be tailored to each application’s requirements. For a collaborative text editor, operations may contain inserts and deletes of character ranges at specific cursor offsets, while for a causally-consistent key-value store, operations may contain lists of keys to update or remove. In fact, we have implemented both such systems on top of SPORC, which we describe further in Section 6.

For many applications, with a carefully-chosen transformation function, OT is able to automatically return divergent clients to a state that is not only consistent, but semantically reasonable as well. But for some applications, such as source-code version control, semantic conflicts must be resolved manually. OT can support such applications through the choice of a transformation function that does not try to resolve the conflict, but instead inserts an explicit conflict marker into the history of operations. A human can later examine the marker and resolve the conflict by issuing new writes. These write operations will supersede the conflicting operations, provided that the system preserves the global order of committed operations and the partial order of each client’s operations. Section 3 describes how SPORC provides these properties.

While OT was originally proposed for decentralized n-way synchronization between clients, many prominent OT implementations are server-centric, including Jupiter [30] and Google Wave [44]. They rely on the server to resolve conflicts and to maintain consistency, and are architecturally better suited for web services. On the flip side, a misbehaving server can compromise the confidentiality, integrity, and consistency of the shared state.

Later, we describe how SPORC adapts these server-based OT architectures to provide security against a misbehaving server. At a high level, SPORC has each client simulate the transformations that would have been applied by a trusted OT server, using the server only for ordering. But we still need to protect against inconsistent orderings, for which we leverage fork* consistency techniques [23].

¹ Strictly speaking, $T$ always returns a pair of operations. For simplicity, however, we sometimes write $T$ as returning a single operation, especially when the other is unchanged, as in our “delete char” example.

2.3 Fork* Consistency

To prevent a malicious server from forging or modifying clients’ operations, clients in SPORC digitally sign all their operations with their user’s private key. This is not sufficient for correctness, however: a misbehaving server could still equivocate and present different clients with divergent views of the history of operations.

To defend against server equivocation, SPORC clients enforce fork* consistency [23].² In fork*-consistent systems, clients share information about their individual views of the history by embedding it in every operation they send. As a result, if clients to whom the server has equivocated ever communicate, they will discover the server’s misbehavior. The server can still divide its clients into disjoint groups and only tell each client about operations by others in its group. But, once the server has forked two groups in this way, it cannot tell a member of one group about an operation submitted by another group’s members without risking detection.

As in BFT2F [23], each SPORC client enforces fork* consistency by maintaining a hash chain over its view of the committed history. In this context, a hash chain is a method of incrementally computing the hash of a list of elements. More specifically, if $op_1, \ldots, op_n$ are the operations in the history, $h_0$ is a constant initial value, and $h_i$ is the value of the hash chain over the history up to $op_i$, then $h_i = H(h_{i-1} \,\|\, H(op_i))$, where $H(\cdot)$ is a cryptographic hash function and $\|$ denotes concatenation. When a client with history up to $op_n$ submits a new operation, it includes $h_n$ in its message. On receiving the operation, another client can check whether the included $h_n$ matches its own hash chain computation over its local history up to $op_n$. If they do not match, the client knows that the server has equivocated.
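The hash chain recurrence can be illustrated with a minimal Python sketch. The choice of SHA-256 for $H$, the all-zero initial value for $h_0$, the byte-string encoding of operations, and the function names are all assumptions made here; the text does not fix any of them.

```python
import hashlib
from typing import List

H0 = b"\x00" * 32   # assumed constant initial value h_0


def H(data: bytes) -> bytes:
    """Cryptographic hash function (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).digest()


def extend(h_prev: bytes, op: bytes) -> bytes:
    """One step of the chain: h_i = H(h_{i-1} || H(op_i))."""
    return H(h_prev + H(op))


def hash_chain(ops: List[bytes]) -> bytes:
    """Hash chain value over a whole history, starting from h_0."""
    h = H0
    for op in ops:
        h = extend(h, op)
    return h


def sender_history_matches(local_history: List[bytes], n: int,
                           h_n_from_sender: bytes) -> bool:
    """Compare the sender's h_n with our own chain over op_1..op_n."""
    return hash_chain(local_history[:n]) == h_n_from_sender


# Two clients to whom the server showed different second operations.
view_a = [b"op1", b"op2"]
view_b = [b"op1", b"op2-forked"]
h_a = hash_chain(view_a)                       # embedded in A's next operation
fork_detected = not sender_history_matches(view_b, 2, h_a)
```

Because each value depends on every earlier operation, a single mismatch anywhere in the shared prefix changes all later chain values, which is what lets one comparison expose the fork.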

² Fork* consistency is a weaker variant of an earlier model called fork consistency [27]. They differ in that under fork consistency, a pair of clients only needs to exchange one message to detect server equivocation, whereas under fork* consistency, they may need to exchange two. For OT systems like ours, this distinction makes little difference because clients constantly exchange small messages. On the other hand, fork* consistency permits a one-round protocol to submit operations, rather than two. Beyond efficiency, this also ensures that a crashed client cannot prevent the system from making progress.

2.4 The Benefits of Having a Server

SPORC uses a central untrusted server, but the server’s sole purpose is to order and store client-generated operations. This limited role may lead one to ask whether the server should be removed, leading to a completely peer-to-peer design. Indeed, many group collaboration systems, such as Bayou [43] and Network Text Editor [17], employ decentralized architectures. Decentralized designs are a poor fit, however, for applications in which a user needs a timely notification that her operation has been committed and will not be overridden by another’s (not yet received) operation. For example, to schedule a meeting room, an online user should be able to quickly determine whether her reservation succeeded, without worrying if an offline client’s request will override hers. Yet this is difficult to achieve without waiting to hear from all (or at least a quorum of) other clients, which poses a problem when clients are regularly offline. In reaction, Bayou delegates commits to a (statically) designated, trusted “primary” peer, which is little different from having a server.

SPORC, on the other hand, only requires an untrusted server for globally ordering operations. Thus, it can leverage the benefits of a cloud deployment (high availability and global accessibility) to achieve timely commits. We show in Section 4.2 how SPORC’s centralized server also helps support dynamic access control and key rotation, even in the face of concurrent membership changes.

2.5 Deployment and Threat Model

Deployment Assumptions. While most of the paper discusses the SPORC protocol in terms of a single server and a single document, we assume that a cloud-based SPORC deployment would manage large numbers of users and documents by replicating functionality and partitioning state over many servers. Each document in SPORC can be managed independently, leading naturally to the shared-nothing architectures [36] already common to scalable cloud services.

For a client to recover from a misbehaving server, we assume there exists some alternative (untrusted) server to switch to after a client detects faulty behavior. These backup servers may belong to the same or different administrative domains as the original, depending upon the type of faults that a SPORC deployment expects to encounter.

Note that even if malicious (Byzantine) behavior among cloud servers is not a primary concern, this strong threat model also covers weaker non-crash failures related to server misconfiguration, Heisenbugs, or “split-brain” partitioned behavior. In all cases, failover and recovery is client driven. Crash failures, unlike Byzantine failures, would not result in forks and could be handled by traditional fault-tolerance techniques (e.g., primary/backup replication) already employed in cloud services.

Threat Model. SPORC makes the following security assumptions:

Server: The server is potentially malicious, and a misbehaving server may be able to prevent progress, but it must not be able to corrupt the clients’ shared state. A server may fork clients’ states, but only within the confines of the fork* consistency model. If clients are able to communicate either in-band or out-of-band, server equivocation will be detected promptly by at least one client.

The server may be able to learn which users and clients are sharing a document, but it must not learn what is in the document or even the contents of the individual operations that the clients submit. Since the server has access to the size and timing of clients’ operations, it may be able to glean some information about the document via traffic analysis. Traffic analysis is made more difficult by the fact that encrypted operations do not even reveal which portions of the shared state they modify. Nevertheless, the complete mitigation of traffic analysis is beyond the scope of this work; it would likely involve padding the length of operations and introducing cover traffic.

To attack availability, the server may arbitrarily erase or refuse to return any of the encrypted data that it stores. To mitigate this threat, the encrypted data could be replicated on servers in other administrative domains. Moreover, each client could replicate its own local state on cloud servers other than the main SPORC server. Notably, SPORC cannot guarantee recovery from every possible fork, unless every client stores every operation that it has seen either locally or remotely.

Clients: If a client is logged in as a particular user, that client is trusted to exercise the privileges granted to that user (e.g., to see the state, modify it, or modify access privileges). Otherwise, clients are untrusted, and they should not be able to see the document, or to modify the document or its access control list, even if they collude with each other or with the server.

User authentication and keys: We assume that each user has a secure public/private key pair, and that clients have a secure way to verify the public key of other users.

Application code: We assume the presence of a code authentication infrastructure that can verify that the application code run by clients is genuine. This mechanism might rely on code signing or on HTTPS connections to a trusted server (different from the untrusted server used as part of SPORC’s protocols).

3 System Design

This section describes SPORC’s design in more detail, including its synchronization mechanisms and the measures that clients implement to detect a malicious server that may modify, reorder, duplicate, or drop operations. This section assumes that the set of users and clients editing a given document is fixed; we consider dynamic membership in Section 4.

3.1 System Overview

The parties and stages involved with SPORC operations are shown in Figure 1. At a high level, the local state of a SPORC application is synchronized between multiple clients, using a server to collect updates from clients, order them, then redistribute the client updates to others. There are four types of state in the system.

(1) The local state is a compact representation of the client’s current view of the document (e.g., the most recent version of a collaboratively edited text).

(2) The encrypted history is the set of operations stored at and ordered by the server. The payloads of operations that change the contents of the document are encrypted to preserve confidentiality. The server orders the operations oblivious to their payloads but aware of the previous operations on which they causally depend.

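Figure 1: SPORC architecture and synchronization steps. (Diagram not reproduced. It depicts the client application and the client library with its local state, pending queue, and committed history; the verify (V) and decrypt (D) stages; the server; synchronization steps 1 to 7; and an operation op with the fields username, clntID, clntSeqNo, prevSeqNo, prevHC, payload, signature, and seqNo.)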

(3) The committed history is the official set of (plaintext) operations shared among all clients, as ordered by the server. Clients derive this committed history from the server’s encrypted history by transforming operations’ payloads to reflect any changes that the server’s ordering might have caused.

(4) A client’s pending queue is an ordered list of the client’s local operations that have already been applied to its local state, but that have yet to be committed (i.e., assigned a sequence number by the server and added to the client’s committed history).

SPORC synchronizes clients’ local state for a particular document using the following steps, also shown in Figure 1. This section restricts its consideration to interactions with a static membership and a well-behaved server; we relax these restrictions in the next two sections, respectively. The flow of local operations to the server is illustrated by dashed blue arrows; the flow of operations received from the server is shown by solid red arrows.

1. A client application generates an operation, applies it to its local state immediately, and then places it at the end of the client’s pending queue.

2. If the client does not currently have any operations under submission, it takes its oldest queued operation yet to be sent, op, assigns it a client sequence number (clntSeqNo), embeds in it the global sequence number of the last committed operation (prevSeqNo) along with the corresponding hash chain value (prevHC), encrypts its payload, digitally signs it, and transmits it to the server. (As an optimization, if the client has multiple operations in its pending queue, it can submit them as a single batched operation.)

3. The server adds the client-submitted op to its encrypted history, assigning it the next available global sequence number (seqNo). The server forwards op with this seqNo to all the clients participating in the document.

4. Upon receiving an encrypted operation op, the client verifies its signature (V) and checks that its clntSeqNo, seqNo, and prevHC fields have the expected values. If these checks succeed, the client decrypts the payload (D) for further processing. If they fail, the client concludes that the server is malicious.

5. Before adding op to its committed history, the client must transform it past any other operations that had been committed since op was generated (i.e., all those with global sequence numbers greater than op’s prevSeqNo). Once op has been transformed, the client appends op to the end of the committed history.

6. If the incoming operation op was one that the client had initially sent, the client dequeues the oldest element in the pending queue (which will be the uncommitted version of op) and prepares to send its next operation. Otherwise, the client transforms op past all its pending operations and, conversely, transforms those operations with respect to op.

7. The client returns the transformed version of the incoming operation op to the application. The application then applies op to its local state.
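The steps above can be traced in a small in-memory simulation. The sketch below is deliberately simplified and is not SPORC's client library: it omits encryption, signatures, and the checks of step 4, assumes an honest server, reuses the single-character delete operations of Section 2.2 (encoded as integers, an assumption of this example), and starts both clients from a shared non-empty document for brevity. The `Client` class and its method names are hypothetical.

```python
from typing import List, Optional, Tuple

Op = Optional[int]  # assumed encoding: 'del k' as 1-based k; None is a no-op


def transform(a: Op, b: Op) -> Tuple[Op, Op]:
    """Pairwise T for single-character deletes; returns (a', b')."""
    if a is None or b is None:
        return a, b
    if a > b:
        return a - 1, b
    if a < b:
        return a, b - 1
    return None, None


def apply_op(doc: str, op: Op) -> str:
    return doc if op is None else doc[:op - 1] + doc[op:]


class Client:
    def __init__(self, clnt_id: str, doc: str) -> None:
        self.clnt_id = clnt_id
        self.doc = doc                   # local state
        self.committed: List[Op] = []    # committed history (plaintext)
        self.pending: List[Op] = []      # applied locally, not yet committed
        self.in_flight = False

    def local_edit(self, op: Op) -> None:
        # Step 1: apply immediately, then append to the pending queue.
        self.doc = apply_op(self.doc, op)
        self.pending.append(op)

    def submit(self) -> Optional[dict]:
        # Step 2 (without crypto): send the head of the pending queue.
        if self.in_flight or not self.pending:
            return None
        self.in_flight = True
        return {"clnt_id": self.clnt_id, "op": self.pending[0],
                "prev_seq_no": len(self.committed)}

    def receive(self, msg: dict) -> None:
        # Step 5: transform past operations committed after prev_seq_no.
        op = msg["op"]
        for earlier in self.committed[msg["prev_seq_no"]:]:
            op, _ = transform(op, earlier)
        self.committed.append(op)
        if msg["clnt_id"] == self.clnt_id:
            # Step 6, own operation: dequeue it and allow the next submit.
            self.pending.pop(0)
            self.in_flight = False
            return
        # Step 6, remote operation: transform it past the pending queue and
        # transform the pending operations with respect to it.
        new_pending = []
        for p in self.pending:
            op, p = transform(op, p)
            new_pending.append(p)
        self.pending = new_pending
        # Step 7: the application applies the transformed operation.
        self.doc = apply_op(self.doc, op)


alice, bob = Client("alice", "ABCDE"), Client("bob", "ABCDE")
alice.local_edit(4)    # Alice now sees "ABCE"
bob.local_edit(2)      # Bob now sees "ACDE"

# Step 3: the server orders Alice's operation first, then Bob's, and
# forwards both to every client in that order.
ordered = [alice.submit(), bob.submit()]
for msg in ordered:
    for client in (alice, bob):
        client.receive(msg)
```

After the loop, both clients hold the document “ACE” and the same committed history, and their pending queues are empty.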

SPORC maintains the following invariants with respect to the system’s state:

Local Coherence: A client’s local state is equivalent to the state it would be in if, starting with an initial empty document, it applied, in order, all of the operations in its committed history followed by all of the operations in its pending queue.

Fork* Consistency: If the server is well behaved, all clients’ committed histories are linearizable (i.e., for every pair of clients, one client’s committed history is equal to or a prefix of the other client’s committed history). If the server is faulty, however, clients’ committed histories may be forked [23].

Client-Order Preservation: The order that a non-malicious server assigns to operations originating from a given client must be consistent with the order that the client assigned to those operations.

3.2 Operations

SPORC clients exchange two types of operations: document operations, which represent changes to the content of the document, and meta-operations, which represent changes to document metadata such as the document’s access control list. Meta-operations are sent to the server in the clear, but the payloads of document operations are encrypted under a symmetric key that is shared among all of the clients but is unknown to the server. (See Section 4.1 for a description of how this key is chosen and distributed.) In addition, every operation is labeled with the name of the user that created it and is digitally signed by that user’s private key. All operations also contain a unique client ID (clntID) that identifies from which of the user’s client machines it came.

3.3 The Server’s Limited Role

Because the SPORC server is untrusted, its role is limited to ordering and storing the operations that clients submit, most of which are encrypted. The server stores the operations in its encrypted history so that new clients joining the document or existing clients that have been disconnected can request from the server the operations that they are missing. This storage function is not essential, however, and in principle it could be handled by a different party.

Notably, since the server does not have access to the plaintext of document operations, the same generic server implementation can be used for any application that uses our protocol regardless of the kind of document being synchronized.

3.4 Sequence Numbers and Hash Chains

SPORC clients use sequence numbers and a hash chain to ensure that operations are properly serialized and that the server is well behaved. Every operation has two sequence numbers: a client sequence number (clntSeqNo) which is assigned by the client that submitted the operation, and a global sequence number (seqNo) which is assigned by the server. On receiving an operation, a client verifies that the operation’s clntSeqNo is one greater than the last clntSeqNo seen from the submitting client, and that the operation’s seqNo is one greater than the last seqNo that the receiving client saw. These sequence number checks enforce the “client order preservation” invariant and ensure that there are no gaps in the sequence of operations.
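The two sequence number checks might look roughly like the following. The class name `SeqChecker`, the use of string client IDs, and the assumption that both counters start at 1 are illustrative choices, not details taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SeqChecker:
    """Tracks the last sequence numbers a receiving client has accepted."""
    last_seq_no: int = 0                                   # last global seqNo
    last_clnt_seq_no: Dict[str, int] = field(default_factory=dict)

    def check(self, clnt_id: str, clnt_seq_no: int, seq_no: int) -> bool:
        """Accept an operation only if both numbers advance by exactly one."""
        expected_clnt = self.last_clnt_seq_no.get(clnt_id, 0) + 1
        expected_global = self.last_seq_no + 1
        if clnt_seq_no != expected_clnt or seq_no != expected_global:
            return False          # gap, replay, or reordering
        self.last_clnt_seq_no[clnt_id] = clnt_seq_no
        self.last_seq_no = seq_no
        return True


checker = SeqChecker()
results = [
    checker.check("alice-laptop", 1, 1),   # first operation overall
    checker.check("bob-phone", 1, 2),      # first from a second client
    checker.check("alice-laptop", 3, 3),   # rejected: clntSeqNo 2 was skipped
    checker.check("alice-laptop", 2, 4),   # rejected: global seqNo 3 was skipped
    checker.check("alice-laptop", 2, 3),   # accepted
]
```

A rejected operation leaves the tracked state untouched, so a client can treat any `False` result as evidence of server misbehavior without corrupting its own counters.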

When a client uploads an operation opnew to the server,the client sets opnew’s prevSeqNo field to the global se-quence number of the last committed operation, opn, thatthe client knows about. The client also sets opnew’s pre-vHC field to the value of the client’s hash chain over thecommitted history up to opn. A client who receives opnewcompares its prevHC with the client’s own hash chain com-putation up to opn. If they match, the recipient knows thatits committed history is identical to the sender’s committedhistory up to opn, thereby guaranteeing fork* consistency.

A misbehaving server cannot modify the prevSeqNo orprevHC fields, because they are covered by the submittingclient’s signature on the operation. The server can tryto tell two clients different global sequence numbers forthe same operation, but this will cause the two clients’histories—and hence their future hash chain values—todiverge, and it will eventually be detected.

To simplify the design, each SPORC client has at most one operation “in flight” at any time: only the operation at the head of a client’s pending queue can be sent to the server. Among other benefits, this rule ensures that operations’ prevSeqNo and prevHC values will always refer to operations that are in the committed history, and not to other operations that are “in flight.” This restriction could be relaxed, but only at considerable cost in complexity. For similar reasons, other OT-based systems such as Google Wave adopt the same rule [44].

Prohibiting more than one in-flight operation per client is less restrictive than it might seem, as operations can be combined or batched. Like Wave, SPORC includes an application-specific composition function, which consolidates two operations into one. This can be used iteratively to combine a sequence of operations into a single one. Further, it is straightforward to batch multiple operations into a single logical operation, which is then submitted as a unit. Because operations can be composed or batched, a client can empty its pending queue every time it gets an opportunity to submit an operation to the server.

3.5 Resolving Conflicts with OT

Once a client has validated an operation received from the server, the client must use OT to resolve the conflicts that may exist between the new operation and other operations in the committed history and pending queue. These conflicts might have arisen for two reasons. First, the server may have committed additional operations since the new operation was generated. Second, the receiving client’s local state might reflect uncommitted operations that reside on the client’s pending queue but that other clients do not yet know about.

Before a client appends an incoming operation op_new to its committed history, it compares op_new’s prevSeqNo value with the global sequence number of the last committed operation. The prevSeqNo field indicates the last committed operation that the submitting client knew about when it uploaded op_new. Thus, if the values match, the client knows that no additional operations have been added to its committed history since op_new was generated, and the new operation can be appended directly to the committed history. But if they do not match, then other operations were committed since op_new was sent, and op_new needs to be transformed past each of them. For example, if op_new has a prevSeqNo of 10, but was assigned global sequence number 14 by the server, then the client must compute op′_new ← T(op_new, 〈op_11, op_12, op_13〉), where 〈op_11, op_12, op_13〉 are the intervening committed operations. Only then can the resulting transformed operation op′_new be appended to the committed history. After appending the operation, the client updates the hash chain computed over the committed history so that future incoming operations can be validated.
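The example with prevSeqNo 10 and global sequence number 14 can be worked through with a toy transformation function. The sketch below assumes a deliberately simple operation type (string insertion, with ties broken by a client number); `Insert`, `transform`, and `transform_past` are illustrative names, not part of SPORC's API, and real applications supply their own transformation function.

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    client: int  # used only to break ties between inserts at the same position


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Adjust `op` so it applies after `other`; both were made against the same state."""
    if other.pos < op.pos or (other.pos == op.pos and other.client < op.client):
        return replace(op, pos=op.pos + len(other.text))
    return op


def transform_past(op_new: Insert, prev_seq_no: int, seq_no: int, committed: dict) -> Insert:
    """Fold op_new past committed operations prev_seq_no+1 .. seq_no-1."""
    intervening = [committed[s] for s in range(prev_seq_no + 1, seq_no)]
    return reduce(transform, intervening, op_new)


# Assume the state after op_10 is "hello world"; op_new was generated against it.
committed = {
    11: Insert(0, ">> ", client=2),
    12: Insert(14, "!", client=3),
    13: Insert(3, "oh ", client=2),
}
op_new = Insert(5, ",", client=1)
op_new_prime = transform_past(op_new, prev_seq_no=10, seq_no=14, committed=committed)

doc = "hello world"
for s in (11, 12, 13):
    doc = apply(doc, committed[s])
doc = apply(doc, op_new_prime)
print(op_new_prime.pos, repr(doc))  # 11 '>> oh hello, world!'
```

The comma still lands directly after "hello", even though three other operations shifted the text between the time op_new was generated and the time it was committed.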

At this point, if op′_new is one of the receiving client’s own operations that it had previously uploaded to the server (or a transformed version of it), it will necessarily match the operation at the head of the pending queue. Since op′_new has now been committed, its uncommitted version can be retired from the pending queue, and the next pending operation can be submitted to the server. Furthermore, since the client has already optimistically applied the operation to its local state even before sending it to the server, the client does not need to apply op′_new again, and nothing more needs to be done.

If op′_new is not one of the client’s own operations, however, the client must perform additional transformations in order to reestablish the “local coherence” invariant, which states that the client’s local state is equal to the in-order application of its committed history followed by its pending queue. First, in order to obtain a version of op′_new that it can apply to its local state, the client must transform op′_new past all of the operations in its pending queue. This step is necessary because the pending queue contains operations that the client has already applied locally, but that have not yet been committed and, therefore, were unknown to the sender of op′_new.

Second, the client must transform the entire pending queue with respect to op′_new to account for the fact that op′_new was appended to the committed history. More specifically, the client computes 〈op′_1, ..., op′_m〉 ← T(〈op_1, ..., op_m〉, op′_new), where 〈op_1, ..., op_m〉 is the pending queue. This transformation has the effect of pushing the pending queue forward by one operation to make room for the newly extended committed history. The operations on the pending queue need to stay ahead of the committed history because they will receive higher global sequence numbers than any of the currently committed operations. Furthermore, by transforming its unsent operations in response to updates to the document, the client reduces the amount of transformation that other clients will need to do when they eventually receive its operations.
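The two transformation steps for a remote operation can be carried out in a single pass over the pending queue. The following sketch again assumes a toy insert-only operation type with a client-number tie-break; `receive_remote` is a hypothetical helper name, and the code illustrates the idea rather than SPORC's actual client library.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    client: int  # tie-break for inserts at the same position


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Adjust `op` so it applies after `other`; both were made against the same state."""
    if other.pos < op.pos or (other.pos == op.pos and other.client < op.client):
        return replace(op, pos=op.pos + len(other.text))
    return op


def receive_remote(op_new: Insert, pending: List[Insert]) -> Tuple[Insert, List[Insert]]:
    """Return (version of op_new to apply locally, transformed pending queue)."""
    new_pending = []
    for p in pending:
        new_pending.append(transform(p, op_new))  # p now follows op_new
        op_new = transform(op_new, p)             # op_new now follows p
    return op_new, new_pending


committed_state = "abc"
pending = [Insert(3, "d", client=2), Insert(4, "e", client=2)]
local_state = "abcde"  # committed state with both pending operations applied

remote = Insert(3, "X", client=1)  # newly committed operation from another client
local_op, pending = receive_remote(remote, pending)
local_state = apply(local_state, local_op)

# Local coherence: committed history followed by the pending queue.
replayed = apply(committed_state, remote)
for p in pending:
    replayed = apply(replayed, p)
print(local_state, replayed)  # abcXde abcXde
```

After the pass, applying the committed history followed by the transformed pending queue yields the same document as the client's local state, which is the local coherence invariant described above.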

4 Membership Management

Document membership in SPORC is controlled at the level of users, each of which is associated with a public-private key pair. When a document is first created, only the user that created it has access. Subsequently, privileged users can change the document’s access control list (ACL) by submitting ModifyUserOp meta-operations, which get added to the document’s history (covered by its hash chain), much like normal operations.

A user can be given one of three privilege levels: reader, which entitles the user to decrypt the document but not to submit new operations; editor, which entitles the user to read the document and to submit new operations (except those that change the ACL); and administrator, which grants the user full access, including the ability to invite new users and remove existing users. Because ModifyUserOps are not encrypted, a non-malicious server will immediately reject operations from users with insufficient privileges. But because the server is untrusted, every client maintains its own copy of the ACL, based on the history’s ModifyUserOps, and refuses to apply operations that came from unauthorized users.
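A client-side ACL of this kind might look like the sketch below. The `Privilege` and `Acl` names and the method signatures are assumptions for illustration; the paper does not specify the data structure, and signature checks on the operations are left out.

```python
from enum import IntEnum
from typing import Dict, Optional


class Privilege(IntEnum):
    READER = 1   # may decrypt, may not submit operations
    EDITOR = 2   # may submit document operations
    ADMIN = 3    # may also change the ACL


class Acl:
    def __init__(self, creator: str) -> None:
        # Only the creating user has access at first.
        self.members: Dict[str, Privilege] = {creator: Privilege.ADMIN}

    def may_submit(self, user: str, is_modify_user_op: bool) -> bool:
        level = self.members.get(user)
        if level is None:
            return False
        required = Privilege.ADMIN if is_modify_user_op else Privilege.EDITOR
        return level >= required

    def apply_modify_user_op(self, author: str, target: str,
                             new_level: Optional[Privilege]) -> bool:
        """Apply a ModifyUserOp from the history; None removes the target user."""
        if not self.may_submit(author, is_modify_user_op=True):
            return False  # refuse operations from unauthorized users
        if new_level is None:
            self.members.pop(target, None)
        else:
            self.members[target] = new_level
        return True
```

Each client would replay the history's ModifyUserOps through something like `apply_modify_user_op` and call `may_submit` before applying any other operation, so that a misbehaving server cannot slip in operations from unauthorized users.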

4.1 Encrypting Document Operations

To prevent eavesdropping by the server or unapproved users, the payloads of document operations are encrypted under a symmetric key known only to the document’s current members. More specifically, to create a new document, the creator generates a random AES key, encrypts it under her own public key, and then writes the encrypted key to the document’s initial create meta-operation. To add new users, an administrator submits a ModifyUserOp that includes the document’s AES key encrypted under each of the new users’ public keys.

If users are removed, the AES key must be changed so that the removed users will not be able to decrypt subsequent operations. To do so, an administrator picks a new random AES key, encrypts it under the public keys of all the remaining participants, and then submits the encrypted keys as part of the ModifyUserOp (see footnote 3). This meta-operation also includes an encryption of the old AES key under the new AES key. This enables later users to learn earlier keys and thus decrypt old operations, without requiring the operations to be re-encrypted.
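The chain of old keys encrypted under new ones can be illustrated with a short sketch. Note the loud assumption: `toy_encrypt` below is an insecure XOR keystream built from SHA-256 that merely stands in for AES so that the example runs with the standard library alone; `rotate` and `recover_all_keys` are hypothetical helper names, and the per-user public-key encryption of the new key is omitted.

```python
import hashlib
import os
from typing import List, Tuple


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. NOT secure; a stand-in for AES."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


toy_decrypt = toy_encrypt  # XOR with the same keystream undoes the encryption


def rotate(old_key: bytes) -> Tuple[bytes, bytes]:
    """Pick a new key and wrap the old key under it, as a removal would."""
    new_key = os.urandom(32)
    return new_key, toy_encrypt(new_key, old_key)


def recover_all_keys(current_key: bytes, wrapped_old_keys: List[bytes]) -> List[bytes]:
    """Walk the chain backwards; wrapped_old_keys is ordered oldest rotation first."""
    keys = [current_key]
    for wrapped in reversed(wrapped_old_keys):
        keys.append(toy_decrypt(keys[-1], wrapped))
    return keys  # newest key first, original key last


k0 = os.urandom(32)
k1, w1 = rotate(k0)   # first removal
k2, w2 = rotate(k1)   # second removal
recovered = recover_all_keys(k2, [w1, w2])
```

A user who joins after both removals receives only k2, yet can recover k1 and k0 from the wrapped keys stored in the history and so decrypt operations from every earlier epoch.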

SPORC’s model ensures proper access control over operations, based on how it tracks potential causality through prevSeqNo dependencies. Operations concurrent to a ModifyUserOp removal may be ordered before it and remain accessible to the removed user. However, once a client sees the removal meta-operation in its committed history, any subsequent operation the client submits will be inaccessible to the removed user.

4.2 Barrier Operations

Concurrency also poses a challenge to membership management. Consider the situation when two clients concurrently issue ModifyUserOps that both attempt to change the current symmetric key. If the server naively scheduled one after the other, then the continuous chain of old keys encrypted under new ones would be broken.

To address situations like this, we introduce a primitive called a barrier operation. When the server receives an operation that is marked “barrier” and assigns it global sequence number b, the server requires that every subsequent operation have a prevSeqNo ≥ b. Subsequent operations that do not are rejected and must be revised and resubmitted with a later prevSeqNo. In this way, the server can force all future operations to depend on the barrier operation (see footnote 4).

Footnote 3: In our current implementation, the size of a ModifyUserOp may be linear in the number of users participating in the document, because the operation may contain the current AES key encrypted under each of the users’ RSA public keys. An optimization to achieve constant-sized ModifyUserOps could instead use a space-efficient broadcast encryption scheme [6].

Let us reconsider the example of two concurrent ModifyUserOps, op_1 and op_2, that are marked as barriers. Suppose that the server received op_1 first and assigned it sequence number b. Since the operations were submitted concurrently, op_2’s prevSeqNo will necessarily be less than b, and op_2 will be rejected. The client attempting to send op_2 must wait until it receives op_1, at which time it will adjust op_2 to depend on this operation before resubmitting (i.e., encrypt op_1’s key under its new key, and set op_2’s prevSeqNo ≥ b). As a result, the chain of old keys encrypted under new ones will be preserved.
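The server-side barrier rule in this example reduces to a single comparison. The sketch below is a minimal model under stated assumptions: the `Server` class and its `submit` method are hypothetical, operations are represented only by their prevSeqNo and barrier flag, and signatures and storage are ignored.

```python
from typing import Optional


class Server:
    def __init__(self) -> None:
        self.next_seq_no = 1
        self.last_barrier = 0  # global sequence number b of the latest barrier

    def submit(self, prev_seq_no: int, is_barrier: bool) -> Optional[int]:
        """Return the assigned global sequence number, or None if rejected."""
        if prev_seq_no < self.last_barrier:
            return None  # operation does not depend on the latest barrier
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        if is_barrier:
            self.last_barrier = seq_no
        return seq_no


server = Server()
first = server.submit(prev_seq_no=0, is_barrier=False)    # ordinary operation
b = server.submit(prev_seq_no=first, is_barrier=True)     # op_1 arrives first
rejected = server.submit(prev_seq_no=first, is_barrier=True)  # concurrent op_2
retried = server.submit(prev_seq_no=b, is_barrier=True)   # op_2 after seeing op_1
print(first, b, rejected, retried)  # 1 2 None 3
```

In the protocol, clients also verify this rule themselves, since the barrier flag is covered by the operation's signature and the server is not trusted to enforce it.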

Barrier operations have uses beyond membership management. For example, as described next, they are useful in implementing checkpoints on the history.

5 Extensions

This section describes extensions to the basic SPORC protocols: supporting checkpoints to reduce the size requirements for storing the committed history (Section 5.1), detecting forks through out-of-band communication (Section 5.2), and recovering from forks by replaying and possibly transforming forked operations (Section 5.3). Our current prototype does not yet implement these extensions, however.

5.1 Checkpoints

In order to reach a document’s latest state, a new client in our current implementation must download and apply the entire history of committed operations. It would be more efficient for a new client to instead download a checkpoint of operations (a compact representation of the document’s state, akin to each client’s local state) and then only apply individual committed operations since the last checkpoint. Much as SPORC servers cannot transform operations, they similarly cannot perform checkpoints; SPORC once again has individual clients play this role.

To support checkpoints, each client maintains a compacted version of the committed history up to the most recent barrier operation. When a client is ready to upload a checkpoint to the server, it encrypts this compacted history under the current document key. It then creates a new CheckpointOp meta-operation containing the hash of the encrypted checkpoint data and submits it into the history. Requiring the checkpoint data to end in a barrier operation ensures that clients that later use the checkpoint will be able to ignore the history before the barrier without having to worry that they will need to perform OT transformations involving that old history. After all, no operation after a barrier can depend on an operation before it. If the most recent barrier is too old, the client can submit a new null barrier operation before creating the checkpoint (see footnote 5).

Footnote 4: To prevent a malicious server from violating the rules governing barrier operations, an operation’s “barrier” flag is covered by the operation’s signature, and all clients verify that the server is handling barrier operations correctly.

Checkpoints raise new security challenges, however. A client that lacks the full history cannot verify the hash chain all the way back to the document’s creation. It can verify that the operations it has chain together correctly, but the first operation in its history (i.e., the barrier operation) is “dangling,” and its prevHC value cannot be verified. This is not a problem if the client knows in advance that the CheckpointOp is part of the valid history, but this is difficult to verify. The CheckpointOp will be signed by a user, and users who have access to the document are assumed to be trusted; but there must be a way to verify that the signing user had permission to access the document at the time the checkpoint was created. Unfortunately, without access to a verifiable history of individual ModifyUserOps going back to the beginning of the document, a client deciding whether to accept a checkpoint has no way to be certain of which users were actually members of the document at any given time.

To address these issues, we propose that the server and clients maintain a meta-history, alongside the committed history, that is composed solely of meta-operations. Meta-operations are included in the committed history as before, but each one also has a prevMetaSeqNo pointer to a prior element of the meta-history along with a corresponding prevMetaHC field. Each client maintains a separate hash chain over the meta-history and performs the same consistency checks on the meta-history that it performs on the committed history.

When a client joins, before it downloads a checkpoint, it requests the entire meta-history from the server. The meta-history provides the client with a fork* consistent view of the sequence of ModifyUserOps and CheckpointOps that indicates whether the checkpoint’s creator was an authorized user when the checkpoint was created. Moreover, the cost of downloading the entire meta-history is likely to be low because meta-operations are rare relative to document operations.

5.2 Checking for Forks Out-of-Band

Fork* consistency does not prevent a server from forking clients’ state, as long as the server never tells any member of one fork about any operation done by a member of another fork. To detect such forks, clients can exchange state information out-of-band, for example, by direct socket connections, email, instant messaging, or posting on a shared server or DHT service.

Footnote 5: Having the checkpoint data end in an earlier barrier operation is better than making CheckpointOps into barriers themselves. If CheckpointOps were barriers, then either the client making the checkpoint would have to “lock” the history to prevent new operations from being admitted before the checkpoint was uploaded, or the system would have to reject checkpoints that did not reflect the latest state, which could potentially lead to livelock.

Clients can exchange messages of the form 〈c, d, s, h_s〉, asserting that in client c’s view of document d, the hash chain value as of sequence number s is equal to h_s. On receiving such a message, a client compares its own hash chain value at sequence number s with h_s, and if the values differ, it knows a fork has occurred. If the recipient does not yet have operations up to sequence number s, it requests them from the server; a well behaved server will always be able to supply the missing operations.
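Handling such a message amounts to a lookup and a comparison. The sketch below assumes the recipient keeps its hash chain values in a dictionary keyed by sequence number; `Claim`, `Verdict`, and `check_claim` are illustrative names, and the signing, encryption, and MAC protection of the message are omitted.

```python
from enum import Enum
from typing import Dict, NamedTuple


class Claim(NamedTuple):
    client: str        # c: the asserting client
    doc: str           # d: the document identifier
    seq_no: int        # s: global sequence number
    hash_chain: bytes  # h_s: the sender's hash chain value at s


class Verdict(Enum):
    CONSISTENT = "consistent"
    FORK = "fork"
    NEED_OPS = "need_ops"    # ask the server for operations up to s, then retry
    OTHER_DOC = "other_doc"  # message concerns a different document


def check_claim(claim: Claim, doc: str, my_chain: Dict[int, bytes]) -> Verdict:
    """Compare a peer's claimed hash chain value with the local one."""
    if claim.doc != doc:
        return Verdict.OTHER_DOC
    mine = my_chain.get(claim.seq_no)
    if mine is None:
        return Verdict.NEED_OPS
    return Verdict.CONSISTENT if mine == claim.hash_chain else Verdict.FORK
```

A `NEED_OPS` result is not yet evidence of misbehavior; it only becomes suspicious if the server then fails to supply the missing operations.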

These out-of-band messages should be digitally signed to prevent forgery. To prevent out-of-band messages from leaking information about which clients are collaborating on a document, and to prevent a client from falsely claiming that it was invited into the document by a forked client, the out-of-band messages should be encrypted and MACed with a separate set of symmetric keys that are known only to nodes that have been part of the document (see footnote 6). These keys might be conveyed in the first operation of the document’s history.

5.3 Recovering from a Fork

A benefit of combining OT and fork* consistency is that we can use OT to recover from forks. OT is well suited to this task because, in normal operation, OT clients are essentially creating small forks whenever they optimistically apply operations locally, and resolving these forks when they transform operations to restore consistency. In this section, we sketch an algorithm that a pair of forked clients can use to merge their divergent histories into a consistent whole. This pairwise algorithm can be repeated as necessary to resolve forks involving multiple clients, or multi-way forks.

The basic idea of the algorithm is that the two clients will abandon the malicious server and agree on a new one. Both clients will roll back their histories to their last common point before the fork, and one of them will upload the common history, up to the fork point, to the new server. Finally, each client will resubmit the operations that it saw after the fork. OT will ensure that these resubmitted operations are merged safely so that both nodes end up in the same state.

The situation becomes more complicated if the same operation appears in both histories. We cannot just remove the duplicate because later operations in the sequence may depend on it. Instead, we must cancel it out. To make this possible, we require that all operations be invertible: we must be able to construct an inverse operation op⁻¹ such that applying op followed by op⁻¹ results in a no-op. This is often easy to do in practice by having each operation store enough information about the prior state to determine what the inverse should be. For example, a delete operation can store the information that was deleted, enabling the creation of an insert operation as the inverse.

Footnote 6: A client falsely claiming to have been invited into the document in another fork will eventually be detected when the other clients try to recover from the (false) fork. However, this is expensive so we would prefer to avoid it. By protecting the out-of-band messages with symmetric keys known only to clients who have been in the document at some point, we reduce the set of potential liars substantially.

To cancel each duplicate, we cannot simply splice its inverse into the history right after it for the same reason that we cannot just remove the duplicate. Instead, we compute the inverse operation and then transform it past all of the operations following the duplicate. This process results in an operation that has the effect of canceling out the duplicate when appended to the end of the sequence.
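As a rough illustration of canceling a duplicate, the sketch below uses single-character insert and delete operations, where a delete stores the removed character so that it can be inverted. The operation types, the tie-breaking choices in `transform`, and the `cancel` helper are all assumptions made for this example; a real application would use its own invertible operations and transformation function.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    ch: str


@dataclass(frozen=True)
class Delete:
    pos: int
    ch: str  # the deleted character, stored so the operation is invertible


Op = Union[Insert, Delete]


def apply(doc: str, op: Op) -> str:
    if isinstance(op, Insert):
        return doc[:op.pos] + op.ch + doc[op.pos:]
    assert doc[op.pos] == op.ch, "delete does not match the document"
    return doc[:op.pos] + doc[op.pos + 1:]


def invert(op: Op) -> Op:
    return Delete(op.pos, op.ch) if isinstance(op, Insert) else Insert(op.pos, op.ch)


def transform(op: Op, other: Op) -> Optional[Op]:
    """Adjust `op` to apply after `other` (same starting state); None means no-op."""
    if isinstance(other, Insert):
        return replace(op, pos=op.pos + 1) if other.pos <= op.pos else op
    if other.pos < op.pos:
        return replace(op, pos=op.pos - 1)
    if other.pos == op.pos and isinstance(op, Delete):
        return None  # the character is already gone
    return op


def cancel(history: List[Op], index: int) -> Optional[Op]:
    """Build an operation that undoes history[index] when appended at the end."""
    inverse: Optional[Op] = invert(history[index])
    for later in history[index + 1:]:
        if inverse is None:
            break
        inverse = transform(inverse, later)
    return inverse


history = [Insert(1, "X"), Insert(0, "Z"), Insert(5, "!")]
doc = "abc"
for op in history:
    doc = apply(doc, op)          # "ZaXbc!"
undo = cancel(history, 0)         # inverse of the first insert, moved to the end
result = apply(doc, undo)
print(undo, result)               # Delete(pos=2, ch='X') Zabc!
```

The inverse starts out as a delete at position 1, but by the time it has been transformed past the two later inserts it targets position 2, where the duplicated character now sits.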

6 Implementation

SPORC provides a framework for building collaborative applications that need to synchronize different kinds of state between clients. It consists of a generic server implementation and client-side libraries that implement the SPORC protocol, including the sending, receiving, encryption, and transformation of operations, as well as the necessary consistency checks and document membership management. To build applications within the SPORC framework, a developer only needs to implement client-side functionality that (i) defines a data type for SPORC operations, (ii) defines how to transform a pair of operations, and (iii) defines how to combine multiple document operations into a single one. The server need not be modified, as it always deals with operations on encrypted data.

6.1 Variants

We implemented two variants of SPORC: a command-line version in which both client and server are stand-alone applications, and a web-based version with a browser-based client and a Java servlet. The command-line version, which we use for later microbenchmarks, is written in approximately 5500 lines of Java code (per SLOCCount [46]) and, for network communication, uses the socket-based RPC library in the open-source release of Google Wave [16]. Because the server’s role is limited to ordering and storing client-supplied operations, its basic implementation is simple and only requires approximately 300 lines of code.

The web-based version shares the majority of its code with the command-line variant. The server just encapsulates the command-line server functionality in a Java servlet. The client consists almost entirely of JavaScript code that was automatically generated using the Java-to-JavaScript compiler included with the Google Web Toolkit (GWT) [12]. Network communication uses a combination of the GWT RPC framework, which wraps browser XmlHttpRequests, and the GWTEventService [37], which allows the server to push messages to the browser asynchronously through a long-lived HTTP connection (the so-called “Comet” style of web programming). This prototype could be extended with HTML5’s offline storage to provide disconnected operation.

The client’s cryptographic module was its only component that could not be translated to JavaScript. JavaScript remains too slow to implement public key cryptography efficiently, and browsers lack both secure storage for cryptographic keys and a secure pseudorandom number generator for key generation. To work around these limitations, we encapsulate our cryptographic module in a Java applet and implement JavaScript-to-Java communication using the LiveConnect API [28] (a strategy employed in [2, 47]). Our experience suggests it would be beneficial for browsers to provide a JavaScript API that supported basic cryptographic primitives.

6.2 Building SPORC Applications

To demonstrate the usefulness of our framework, we built two prototype applications: a causally-consistent key-value store and a web-based collaborative text editor. The key-value store keeps a simple dictionary, mapping strings to strings, synchronized across a set of participating clients. To implement it, we defined a data type that represents a list of keys to update or remove. We wrote a simple transformation function that implements a “last writer wins” policy, as well as a composition function that merges two lists of key updates in a straightforward manner. Overall, the application-specific portion of the key-value store only required 280 lines of code.
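One plausible reading of the key-value store's three application-specific pieces is sketched below. The representation (a dictionary from key to new value, with `None` meaning removal) and the exact form of the “last writer wins” rule are assumptions; the paper does not give the actual code, and the prototype itself was written in Java.

```python
from typing import Dict, Optional

KvOp = Dict[str, Optional[str]]  # key -> new value, or None to remove the key


def apply(store: Dict[str, str], op: KvOp) -> Dict[str, str]:
    out = dict(store)
    for key, value in op.items():
        if value is None:
            out.pop(key, None)
        else:
            out[key] = value
    return out


def compose(first: KvOp, second: KvOp) -> KvOp:
    """Merge two updates into one; entries in `second` override `first`."""
    return {**first, **second}


def transform(op: KvOp, other: KvOp, op_is_ordered_last: bool) -> KvOp:
    """Last writer wins: the operation ordered later keeps all of its writes,
    while the earlier one drops any key that the later one also touches."""
    if op_is_ordered_last:
        return dict(op)
    return {k: v for k, v in op.items() if k not in other}


base = {"a": "1"}
committed = {"a": "2", "b": "3"}  # ordered first by the server
pending = {"a": "9"}              # local, will receive a higher sequence number

# Local path: pending already applied, then the transformed committed operation.
local = apply(apply(base, pending), transform(committed, pending, False))
# Remote path: committed applied, then the transformed pending operation.
remote = apply(apply(base, committed), transform(pending, committed, True))
print(local, remote)
```

Both paths end with key "a" holding the pending writer's value and key "b" holding the committed one, and the transformation is effectively a no-op whenever the two operations touch disjoint keys.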

The collaborative editor allows multiple users to modify a text document simultaneously via their web browsers and see each other’s changes in near real-time. It provides a user experience similar to Google Docs [14] and EtherPad [13], but, unlike those services, it does not require a trusted server. To implement it, we were able to reuse the data types and the transformation and composition functions from the open-source release of Google Wave. Although Wave is a server-centric OT system without SPORC’s level of security and privacy, we were able to adapt its components for our framework with only 550 lines of wrapper code.

7 Experimental Evaluation

The user-facing collaborative applications for which SPORC was designed (e.g., word processing, calendaring, and instant messaging) require latency that is low enough for human users to see each other’s updates in real-time. But unlike file or storage systems, their primary goal is not high throughput. In this section, we present the results of several microbenchmarks of our Java-based command-line version, to demonstrate SPORC’s usefulness for this class of applications.

Figure 2: Latency of SPORC with a single client writer. Panels: (a) Unloaded key-value store; (b) Unloaded text editor.

We performed our experiments on a cluster of five commodity machines, each with eight 2.3 GHz AMD Opteron cores and 8 GB of RAM, that were connected by gigabit switched Ethernet. In each of our experiments, we ran a single server instance on its own machine, along with varying numbers of client instances. To scale our system to moderate numbers of clients, in many of our experiments, we ran multiple client instances on each machine. We ran all the experiments under the OpenJDK Java VM (version IcedTea6 1.6). For RSA signatures, however, we used the Network Security Services for Java (JSS) library from the Mozilla Project [29] because, unlike Java’s default cryptography library, it is implemented in native code and offers considerably better performance.

Latency. To measure SPORC’s latency, we conducted three-minute runs with between one and sixteen clients for both key-value and text editor operations. We tested our system under both low-load conditions, where only one of the clients submitted new operations (once every 200 ms), and high-load conditions, where all of the clients were writers. We measured latency by computing the mean time that an operation was “in flight”: from the time that it was generated by the sender’s application-level code, until the time it was delivered to the recipient’s application.

Figure 3: Latency of SPORC with all clients issuing writes. Panels: (a) Loaded key-value store; (b) Loaded text editor.

Under low-load conditions with only one writer, we would expect the load on each client to remain constant as the number of clients increases, because each additional client does not add to the total number of operations in flight. We would, however, expect to see server latency increase modestly, as the server has to send operations to increasing numbers of clients. Indeed, as shown in Figure 2, the latency due to server processing increased from under 1 ms with one client to over 3 ms with sixteen clients, while overall latency increased modestly from approximately 19 ms to approximately 25 ms (see footnote 7).

On the other hand, when every client is a writer, we would expect the load on each client to increase with the number of clients. As expected, Figure 3 shows that with sixteen clients under loaded conditions, overall latency is higher: approximately 26 ms for key-value operations and 33 ms for the more expensive text-editor operations. The biggest contributor to this increase is client queueing, which is primarily the time that a client’s received operations spend in its incoming queue before being processed. Queueing delay begins at around 3 ms for one client and then increases steadily until it levels off at approximately 8 ms for the key-value application and 14 ms for the text editor. Despite this increase, Figure 3 demonstrates that SPORC successfully supports real-time collaboration for moderately-sized groups, even under load. As these experiments were performed on a local-area network, a wide-area deployment of SPORC would see an increase in latency that reflects the correspondingly higher network round-trip-time.

Footnote 7: Figure 2 also shows small increases in the latency of client processing and queuing when the number of clients was greater than four. These increases are most likely due to the fact that, when we conducted experiments with more than four clients, we ran multiple client instances per machine.

Figure 4: Server throughput as a function of payload size. Axes: payload size (KB) against operations per second and throughput (MB/s).

Figures 2 and 3 also show that client-side cryptographic operations account for a large share of overall latency. This occurs because SPORC performs a 2048-bit RSA signature on every outgoing operation and because Mozilla JSS, while better than Java’s built-in cryptography library, still requires about 10 ms to compute a single signature. Using an optimized implementation of a more efficient signature scheme, such as ESIGN, could improve the latency of signatures by nearly two orders of magnitude [24].

Server throughput. We measured the server’s maximum throughput by saturating the server with operations using 100 clients. These particular clients were modified to allow them to have more than one operation in flight at a time. Figure 4 shows server throughput as a function of payload size, measured in terms of both operations per second and MB per second. Each data point was computed by performing a three-minute run of the system and then taking the median of the mean throughput of each one-second interval. The error bars represent the 5th and 95th percentiles. The figure shows that, as expected, when payload size increases, the number of operations per second decreases, because each operation requires more time to process. But, at the same time, data throughput (MB/s) increases, because the processing overhead per byte decreases.

Client time-to-join. Because our current implementation lacks the checkpoints of Section 5.1, when a client joins the document, it must first download each individual operation in the committed history. To evaluate the cost of joining an existing document, we first filled the history with varying numbers of operations. Then, we measured the time it took for a new client to receive the shared decryption key and download and process all of the committed operations. We performed two kinds of experiments: one where the client started with an empty local state, and a second in which the client had 2000 pending operations that had yet to be submitted to the server. The purpose of the second test was to measure how long it would take for a client that had been working offline for some length of time to synchronize with the current state of the document. Synchronization requires the client to transform its pending operations past the committed operations that the client has not seen; thus, it is more costly than joining a document with an empty local state. Notably, since the fork recovery algorithm sketched in Section 5.3 relies on the same mechanism that is used to synchronize clients that have been offline (it treats operations after the fork as if they were pending uncommitted operations), this test also sheds light on the cost of recovering from a fork.

Figure 5: Client time-to-join given a variable length history. Axes: number of committed operations against client time-to-join (s). Series: Text Editor (w/ pending), Key-Value (w/ pending), Text Editor, Key-Value.

Figure 5 shows time-to-join as a function of history size. Each data point represents the median of ten runs, and the error bars correspond to the 10th and 90th percentiles. We find that time-to-join is linear in the number of committed operations. It takes a client with an empty local state approximately one additional second to join a document for every additional 1000 committed operations.

In addition, the figure shows that the time-to-join with a significant number of pending operations varies greatly by application. In the key-value application, the transformation function is cheap, because it is effectively a no-op if the given operations do not affect the same keys. As a result, the cost of transforming 2000 operations adds little to the time-to-join. By contrast, the text editor’s more complex transformation function adds a non-trivial, although still acceptable, amount of overhead.

8 Related Work

Real-time “groupware” collaboration systems have adapted classic distributed systems techniques for timestamping and ordering (e.g., [4, 5, 20]), but have also introduced novel techniques to automatically resolve conflicts between concurrent operations in an intention-preserving manner (e.g., [11, 18, 33, 38, 39, 40, 41, 42]). These techniques form the basis of SPORC’s client synchronization mechanism and allow it to support slow or disconnected networks. Several systems also use OT to implement undo functionality (e.g., [32, 33]), and SPORC’s fork recovery algorithm draws upon these approaches. Furthermore, as an alternative to OT, Bayou [43] allows applications to specify conflict detection and merge protocols to reconcile concurrent operations. Most of these protocols focus on decentralized settings and use n-way reconciliation, but several well-known systems use a central server to simplify synchronization between clients (including Jupiter [30] and Google Wave [44]). SPORC also uses a central server for ordering and storage, but allows the server to be untrusted. Secure Spread [3] presents several efficient message encryption and key distribution architectures for such client-server group collaboration settings. But unlike SPORC, it relies on trusted servers that can generate keys and re-encrypt messages as needed.

Traditionally, distributed systems have defended against potentially malicious servers by replicating functionality and storage over multiple servers. Protocols, such as Byzantine fault tolerant (BFT) replicated state machines [9, 21, 48] or quorum systems [1, 26], can then guarantee safety and liveness, provided that some fraction of these servers remain non-faulty. Modern approaches optimize performance by, for example, concurrently executing independent operations [19], permitting client-side speculation [45], or supporting eventual consistency [35]. BFT protocols face criticism, however, because when the number of correct servers falls below a certain threshold (typically two-thirds), they cannot make progress.

Subsequently, variants of fork consistency protocols (e.g., [7, 27, 31]) have addressed the question of how much safety one can achieve with a single untrusted server. These works demonstrate that server equivocation can always be detected unless the server permanently forks the clients into groups that cannot communicate with each other. SUNDR [24] and FAUST [8] use these fork consistency techniques to implement storage protocols on top of untrusted servers. Other systems, such as A2M [10] and TrInc [22], rely on trusted hardware to detect server equivocation. BFT2F [23] combines techniques from BFT replication and SUNDR to achieve fork* consistency with higher fractions of faulty nodes than BFT can resist. SPORC borrows from the design of BFT2F in its use of hash chains to limit equivocation, but unlike BFT2F or any of these other systems, SPORC allows disconnected operation and enables clients to recover from server equivocation, not just detect it.
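
The role of hash chains in limiting equivocation can be sketched as follows. Each client folds every committed operation into a running digest, so two clients that were shown different histories end up with different chain heads for the same sequence number. This is only an illustrative sketch: the use of SHA-256, the field layout, the all-zero starting value, and the function names are assumptions, not the wire format used by SPORC or BFT2F.

```python
import hashlib
from typing import Sequence

GENESIS = b"\x00" * 32  # assumed starting value for an empty history


def extend_chain(prev_head: bytes, seq_no: int, op: bytes) -> bytes:
    """Return the chain head after committing `op` at `seq_no`."""
    h = hashlib.sha256()
    h.update(prev_head)
    h.update(seq_no.to_bytes(8, "big"))
    h.update(op)
    return h.digest()


def chain_head(ops: Sequence[bytes]) -> bytes:
    """Fold a whole committed history into a single digest."""
    head = GENESIS
    for seq_no, op in enumerate(ops, start=1):
        head = extend_chain(head, seq_no, op)
    return head


def views_consistent(ops_a: Sequence[bytes], ops_b: Sequence[bytes]) -> bool:
    """Check that one history is a prefix of the other.

    The heads are compared at the length of the shorter history.
    """
    n = min(len(ops_a), len(ops_b))
    return chain_head(ops_a[:n]) == chain_head(ops_b[:n])


honest = [b"insert a", b"insert b", b"delete a"]
lagging = honest[:2]  # a client that has not yet seen the last op
forked = [b"insert a", b"insert X", b"delete a"]  # server equivocated

ok = views_consistent(honest, lagging)
fork_detected = not views_consistent(honest, forked)
```

In a deployed system clients would exchange only a sequence number and the corresponding chain head rather than full histories. The sketch recomputes heads from the histories purely to keep the example self-contained.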

Like SPORC, two very recent systems, Venus [34] and Depot [25], allow clients to use a cloud resource without having to trust it, and they also support some degree of disconnected operation. Venus provides strong consistency in the face of a potentially malicious server, but does not support applications other than key-value storage. Furthermore, unlike SPORC, it requires the majority of a “core set” of clients to be online in order to achieve most of its consistency guarantees. In addition, although members may be added dynamically to the group editing the shared state, it does not allow access to be revoked, nor does it provide a mechanism for distributing encryption keys. Depot, on the other hand, does not rely on the availability of a “core set” of clients and supports varied applications. Moreover, similar to SPORC, it allows clients to recover from malicious forks using the same mechanism that it uses to keep clients synchronized. But rather than providing a means for reconciling conflicting operations as SPORC does with OT, Depot relies on the application for conflict resolution. Because Depot treats clients and servers identically, it can also tolerate faulty clients, in addition to faulty servers. Unlike SPORC, however, Depot does not consider dynamic access control or confidentiality.

9 Conclusion

Our original goal for SPORC was to design a general framework for web-based group collaboration that could leverage cloud resources, but not be beholden to them for privacy guarantees. This goal led to a design in which servers only store encrypted data, and each client maintains its own local copy of the shared state. But when each client has its own copy of the state, the system must keep them synchronized, and operational transformation provides a way to do so. OT enables optimistic updates and automatically reconciles clients’ conflicting states.

Supporting applications that need timely commits requires a central server. But if we do not trust the server to preserve data privacy, we should not trust it to commit operations correctly either. This requirement led us to employ fork* consistency techniques to allow clients to detect server equivocation about the order of committed operations. But beyond the benefits that each provides independently, this work shows that OT and fork* consistency complement each other well. Whereas prior systems that enforced fork* consistency alone were only able to detect malicious forks, by combining fork* consistency with OT, SPORC can recover from them using the same mechanism that keeps clients synchronized.

In addition to these conceptual contributions, we present a membership management architecture that provides dynamic access control and key distribution with an untrusted server, even in the face of concurrency. Finally, we also demonstrate the flexibility of our design by implementing two applications: a causally-consistent key-value store and a browser-based collaborative text editor.

Acknowledgments. We thank Siddhartha Sen, Jinyuan Li, Alma Whitten, Alexander Shraer, and Christian Cachin for their insights. We also thank our shepherd, Lidong Zhou, and the anonymous reviewers for their helpful comments. This research was supported by funding from Google and the NSF CAREER grant CNS-0953197.

References

[1] M. Abd-El-Malek, G. Ganger, G. Goodson, M. Reiter, and J. Wylie. Fault-scalable Byzantine fault-tolerant services. In Proc. SOSP, Oct. 2005.

[2] B. Adida. Helios: Web-based open-audit voting. In Proc. USENIX Security, Aug. 2008.

[3] Y. Amir, C. Nita-Rotaru, J. Stanton, and G. Tsudik. Secure Spread: An integrated architecture for secure group communication. IEEE Trans. Dependable and Secure Computing, 2:248–261, 2005.

[4] P. Bernstein, N. Goodman, and V. Hadzilacos. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.

[5] K. Birman, A. Schiper, and P. Stephenson. Lightweight causal and atomic group multicast. ACM Trans. Comp. Systems, 9(3):272–314, Aug. 1991.

[6] D. Boneh, C. Gentry, and B. Waters. Collusion resistant broadcast encryption with short ciphertexts and private keys. In Advances in Cryptology – CRYPTO, Aug. 2005.

[7] C. Cachin, A. Shelat, and A. Shraer. Efficient fork-linearizable access to untrusted shared memory. In Proc. PODC, Aug. 2007.

[8] C. Cachin, I. Keidar, and A. Shraer. Fail-aware untrusted storage. In Proc. Dependable Systems and Networks (DSN), June 2009.

[9] M. Castro and B. Liskov. Practical Byzantine fault tolerance. In Proc. OSDI, Feb. 1999.

[10] B.-G. Chun, P. Maniatis, S. Shenker, and J. Kubiatowicz. Attested append-only memory: Making adversaries stick to their word. In Proc. SOSP, Oct. 2007.

[11] C. Ellis and S. Gibbs. Concurrency control in groupware systems. ACM SIGMOD Record, 18(2):399–407, 1989.

[12] Google. Google Web Toolkit (GWT). http://code.google.com/webtoolkit/, 2010.

[13] Google. EtherPad. http://etherpad.com/, 2010.

[14] Google. Google Docs. http://docs.google.com/, 2010.

[15] Google. Government requests directed to Google and YouTube. http://www.google.com/governmentrequests/, 2010.

[16] Google. Google Wave federation protocol. http://code.google.com/p/wave-protocol/, 2010.

[17] M. Handley and J. Crowcroft. Network text editor (NTE): A scalable shared text editor for MBone. In Proc. SIGCOMM, Oct. 1997.

[18] A. Karsenty and M. Beaudouin-Lafon. An algorithm for distributed groupware applications. In Proc. ICDCS, May 1993.

[19] R. Kotla and M. Dahlin. High-throughput Byzantine fault tolerance. In Proc. Dependable Systems and Networks (DSN), June 2004.

[20] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Comm. ACM, 21(7):558–565, 1978.

[21] L. Lamport, R. Shostak, and M. Pease. The Byzantine generals problem. ACM Trans. Programming Language Systems, 4(3), 1982.

[22] D. Levin, J. R. Douceur, J. R. Lorch, and T. Moscibroda. TrInc: Small trusted hardware for large distributed systems. In Proc. NSDI, Apr. 2009.

[23] J. Li and D. Mazières. Beyond one-third faulty replicas in Byzantine fault tolerant systems. In Proc. NSDI, Apr. 2007.

[24] J. Li, M. N. Krohn, D. Mazières, and D. Shasha. Secure untrusted data repository (SUNDR). In Proc. OSDI, Dec. 2004.

[25] P. Mahajan, S. Setty, S. Lee, A. Clement, L. Alvisi, M. Dahlin, and M. Walfish. Depot: Cloud storage with minimal trust. In Proc. OSDI, Oct. 2010.

[26] D. Malkhi and M. Reiter. Byzantine quorum systems. In Proc. STOC, May 1997.

[27] D. Mazières and D. Shasha. Building secure file systems out of Byzantine storage. In Proc. PODC, July 2002.

[28] Mozilla Project. LiveConnect. https://developer.mozilla.org/en/LiveConnect, 2010.

[29] Mozilla Project. Network security services for Java (JSS). https://developer.mozilla.org/En/JSS, 2010.

[30] D. A. Nichols, P. Curtis, M. Dixon, and J. Lamping. High-latency, low-bandwidth windowing in the Jupiter collaboration system. In Proc. UIST, Nov. 1995.

[31] A. Oprea and M. K. Reiter. On consistency of encrypted files. In Proc. Symposium on Distributed Computing (DISC), Sept. 2006.

[32] A. Prakash and M. Knister. A framework for undoing actions in collaborative systems. ACM Trans. Computer-Human Interaction, 1(4):295–330, Dec. 1994.

[33] M. Ressel, D. Nitsche-Ruhland, and R. Gunzenhäuser. An integrating, transformation-oriented approach to concurrency control and undo in group editors. In Proc. CSCW, Nov. 1996.

[34] A. Shraer, C. Cachin, A. Cidon, I. Keidar, Y. Michalevsky, and D. Shaket. Venus: Verification for untrusted cloud storage. In Proc. ACM CCSW, Oct. 2010.

[35] A. Singh, P. Fonseca, P. Kuznetsov, R. Rodrigues, and P. Maniatis. Zeno: Eventually consistent Byzantine-fault tolerance. In Proc. NSDI, Apr. 2009.

[36] M. Stonebraker. The case for shared nothing. IEEE Database Engineering Bulletin, 9(1):4–9, 1986.

[37] S. Strohschein. GWTEventService. http://code.google.com/p/gwteventservice/, 2010.

[38] M. Suleiman, M. Cart, and J. Ferrié. Serialization of concurrent operations in distributed collaborative environment. In Proc. Conf. Supporting Group Work (GROUP), Nov. 1997.

[39] C. Sun and C. Ellis. Operational transformation in real-time group editors: Issues, algorithms, and achievements. In Proc. CSCW, Nov. 1998.

[40] C. Sun, X. Jia, Y. Yang, and Y. Zhang. A generic operation transformation schema for consistency maintenance in realtime cooperative editing systems. In Proc. Conf. Supporting Group Work (GROUP), Nov. 1997.

[41] C. Sun, X. Jia, Y. Zhang, Y. Yang, and D. Chen. Achieving convergence, causality preservation, and intention preservation in real-time cooperative editing systems. ACM Trans. Computer-Human Interaction, 5(1):64–108, 1998.

[42] D. Sun, S. Xia, C. Sun, and D. Chen. Operational transformation for collaborative word processing. In Proc. CSCW, Nov. 2004.

[43] D. Terry, M. Theimer, K. Petersen, A. Demers, M. Spreitzer, and C. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proc. SOSP, Dec. 1995.

[44] D. Wang and A. Mah. Google Wave operational transformation. http://www.waveprotocol.org/whitepapers/operational-transform, Apr. 2010.

[45] B. Wester, J. Cowling, E. B. Nightingale, P. M. Chen, J. Flinn, and B. Liskov. Tolerating latency in replicated state machines through client speculation. In Proc. NSDI, Apr. 2009.

[46] D. Wheeler. SLOCCount. http://www.dwheeler.com/sloccount/, 2010.

[47] T. D. Wu. The secure remote password protocol. In Proc. NDSS, Mar. 1998.

[48] J. Yin, J.-P. Martin, A. Venkataramani, L. Alvisi, and M. Dahlin. Separating agreement from execution for Byzantine fault tolerant services. In Proc. SOSP, Oct. 2003.
