Carnegie Mellon
December 2005 SRS Principal Investigator Meeting
Increasing Intrusion Tolerance Via Scalable Redundancy
Mike Reiter ([email protected])
Natassa Ailamaki, Greg Ganger, Priya Narasimhan, Chuck Cranor
Technical Objective
- To design, prototype, and evaluate new protocols for implementing intrusion-tolerant services that scale better
  – Here, "scale" refers to efficiency as the number of servers and the number of failures tolerated grow
- Targeting three types of services:
  – Read-write data objects
  – Custom "flat" object types for particular applications, notably directories for implementing an intrusion-tolerant file system
  – Arbitrary objects that support object nesting
The Problem Space
- Distributed services manage redundant state across servers to tolerate faults
- We consider tolerance to Byzantine faults, as might result from an intrusion into a server or client
  – A faulty server or client may behave arbitrarily
- We also make no timing assumptions in this work: an "asynchronous" system
- Primary existing practice: replicated state machines
  – Offers no load dispersion, requires data replication, and degrades in number of messages as the system scales
- When appropriate, we compare against Castro & Liskov's BFT system
This Talk in Context
- January 2005 PI meeting: focused on the basic R/W protocol
- July 2005 PI meeting: focused on the Q/U protocol for implementing arbitrary "flat" objects
- This meeting:
  – Discuss "lazy verification" extensions to the R/W protocol
  – Discuss the nested objects protocol
Highlights: Read/Write Response Time
- Response time under load is fault-scalable
- 10 clients and up to 26 storage-nodes
- 2.8 GHz Pentium IV machines used as storage-nodes and clients
- 10 clients, each with 2 requests outstanding
- Mixed workload: equal parts reads and writes
- 4 KB data-item size
Highlights: The Q/U Protocol
- Working-set size of experiments fits in server memory
- Tests run for 30 seconds; measurements taken in the middle 10
- Cluster of Pentium 4 2.8 GHz machines, 1 GB RAM
- 1 Gb switched Ethernet, no background traffic
Read/Write Failure Scenarios
- Two types of failures:
  – Incomplete writes: client writes data to a subset of servers
  – Poisonous writes: client writes data inconsistently to servers
    – A subsequent reader observes different values depending on which subset of servers it interacts with
- Replicated data: easy to handle (via hashes)
- Erasure-coded data: more difficult to handle
- Protocols must verify writes to protect against incomplete and poisonous writes
The Nature of Write Operations: Insight for Protocol Design
1) A single data version forces write-time verification
   – Versioning servers remove the destructive nature of writes
2) Obsolescent writes are common in storage systems
   – Read-time verification avoids unnecessary verifications
3) Low concurrency in most workloads
   – Optimistic concurrency control
Original Read/Write Protocol
- Use versioning servers
  – Frees servers from verifying every write at write-time
- Read-time verification performed by clients
  – Better scalability
  – Avoids verification for obsolescent writes
  – Client reads earlier versions in case of incomplete/poisonous writes
- Optimism premised on low faults/concurrency
- Supports erasure codes, Byzantine-tolerant, asynchronous
- Linearizable read/write operations on blocks
Example Write and Read
[Timeline figure: a client performs Write 1, then Write 2, then a Read; servers accumulate versions 1 and 2. The read observes that write 1 completes, then write 2 completes, and returns version 2.]
Tolerating Client Crashes
[Timeline figure: a client crashes partway through Write 2, leaving different versions at different servers. A subsequent reader detects the incomplete write and performs repair, writing version 2 to the rest of the servers.]
Erasure Coding
- Reed-Solomon / information dispersal [Rabin89]
- Each fragment is 1/m of the object size
- Total amount of data written: n/m × object size
- Example: 2-of-5 erasure coding
  – Write: a 64 KB object is encoded into n = 5 fragments (160 KB total)
  – Read: any m = 2 fragments (64 KB) are decoded back into the 64 KB object
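The m-of-n encode/decode steps above can be sketched with a toy Reed-Solomon-style code over a small prime field. This is an illustrative assumption, not the paper's implementation (which uses Rabin's information dispersal): the m data values are treated as polynomial coefficients, and fragment i is the polynomial evaluated at x = i, so any m fragments recover the data by interpolation.

```python
# Toy m-of-n erasure code over GF(257) -- a sketch, not the paper's code.
P = 257

def encode(data, n):
    # fragment i = f(i), where f(x) = data[0] + data[1]*x + ... (mod P)
    return [(i, sum(d * pow(i, j, P) for j, d in enumerate(data)) % P)
            for i in range(1, n + 1)]

def decode(frags, m):
    # Lagrange-interpolate from any m (x, y) points; the recovered
    # polynomial coefficients are the original data values.
    pts = frags[:m]
    coeffs = [0] * m
    for i, (xi, yi) in enumerate(pts):
        basis, denom = [1], 1          # basis polynomial l_i(x), denominator
        for j, (xj, _) in enumerate(pts):
            if j == i:
                continue
            denom = denom * (xi - xj) % P
            new = [0] * (len(basis) + 1)
            for k, b in enumerate(basis):   # multiply basis by (x - xj)
                new[k] = (new[k] - xj * b) % P
                new[k + 1] = (new[k + 1] + b) % P
            basis = new
        scale = yi * pow(denom, P - 2, P) % P   # Fermat inverse of denom
        for k, b in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * b) % P
    return coeffs

# 2-of-5 example: write n = 5 fragments for m = 2 data values
frags = encode([7, 42], 5)
```

Any two of the five fragments decode back to [7, 42]; writing n fragments for m data values gives exactly the n/m storage blow-up described above.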
Tolerating Byzantine Servers: Cross Checksum
- Cross checksum for 2-of-3 erasure coding:
  1. Generate the fragments
  2. Hash each fragment
  3. Concatenate the hashes to form the cross checksum
  4. Append the cross checksum to each fragment
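The four steps can be sketched in a few lines; SHA-256 stands in here for whatever hash function the actual system uses:

```python
import hashlib

def make_cross_checksum(fragments):
    # hash each fragment, concatenate the hashes to form the cross
    # checksum, and append the checksum to every fragment before storage
    cc = b"".join(hashlib.sha256(f).digest() for f in fragments)
    return [(f, cc) for f in fragments]

def fragment_matches(fragment, cc, i):
    # a reader checks fragment i against the i-th hash in the checksum,
    # so a Byzantine server cannot substitute a corrupted fragment
    return hashlib.sha256(fragment).digest() == cc[32 * i:32 * (i + 1)]
```

Because every fragment carries the same cross checksum, a reader that gathers any subset of fragments can check each one against the slot it claims to occupy.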
Tolerating Byzantine Clients
- A "poisonous" 2-of-3 erasure coding of {1,0}: a faulty client generates the parity fragment inconsistently with the data fragments
- The value read depends on the set of fragments decoded: {1,0}, {1,1}, or {0,0}
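The slide's {1,0} example can be reproduced with the simplest 2-of-3 systematic code, where the third fragment is the XOR parity of the two data bits (a stand-in for the real erasure code):

```python
# Fragments of (a, b): index 0 holds a, index 1 holds b, index 2 holds a^b.
def decode(frags):
    # reconstruct (a, b) from any two of the three fragments
    if 0 in frags and 1 in frags:
        return (frags[0], frags[1])
    if 0 in frags:                               # a and parity p: b = a ^ p
        return (frags[0], frags[0] ^ frags[2])
    return (frags[1] ^ frags[2], frags[1])       # b and parity p: a = b ^ p

# A correct client encoding (1, 0) writes parity 1 ^ 0 = 1.
# A poisonous client writes parity 0 instead:
poisonous = {0: 1, 1: 0, 2: 0}
```

Readers decoding different fragment pairs of the poisonous write observe (1,0), (1,1), or (0,0), exactly the divergence shown on the slide.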
Validating Timestamps
- Embed the cross checksum in the logical timestamp
- Each server validates its write fragment
- The client validates the cross checksum on read
[Figure: the write path generates fragments and their cross checksum into the logical timestamp; the read path reads fragments and validates them against the cross checksum, rejecting mismatches.]
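A sketch of the validation on both sides, under the assumption that the timestamp is simply a (logical time, cross checksum) pair and SHA-256 is the hash:

```python
import hashlib

HASH = 32  # bytes per SHA-256 digest (stand-in for the system's hash)

def make_timestamp(logical_time, fragments):
    # the writer embeds the cross checksum in the logical timestamp
    cc = b"".join(hashlib.sha256(f).digest() for f in fragments)
    return (logical_time, cc)

def server_accepts(ts, fragment, i):
    # at write time, server i checks that the fragment it was handed
    # hashes to slot i of the checksum carried in the timestamp
    return hashlib.sha256(fragment).digest() == ts[1][HASH * i:HASH * (i + 1)]

def client_accepts(ts, read_fragments):
    # at read time, the client validates every fragment it gathered
    return all(server_accepts(ts, f, i) for i, f in read_fragments)
```

Binding the checksum into the timestamp means a fragment can be checked against the specific write it claims to belong to, not just against "some" write.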
How Do You Get Rid of Old Versions?
- Two more pieces to complete the picture:
  – Garbage collection (GC)
  – Limiting the otherwise unbounded number of incomplete/poisonous writes
Lazy Verification Overview
- Servers can perform verification lazily, in idle time
  – Shifts verification cost out of the read/write critical path
  – Allows servers to perform GC
- Per-client, per-block limits on unverified writes
  – Limits the number of incomplete/poisonous writes
- Maintains good R/W properties
  – Optimism
  – Verification elimination for obsolescent writes
Basic Garbage Collection
- Periodically, every server scans through all blocks and performs a read (acting like a normal client):
  – Discovers the latest complete write timestamp (LCWTS)
  – Reconstructs the block to check for poisonous writes (this is the verification step)
- Deletes all versions prior to the LCWTS
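A minimal sketch of the per-block GC pass, assuming the server keeps a timestamp-keyed version map and that the LCWTS has already been discovered by the read step above (the read and reconstruction are omitted):

```python
class VersioningServer:
    def __init__(self):
        self.versions = {}           # block_id -> {timestamp: fragment}

    def write(self, block_id, ts, frag):
        # versioning servers keep every write, so writes are non-destructive
        self.versions.setdefault(block_id, {})[ts] = frag

    def garbage_collect(self, block_id, lcwts):
        # keep the latest complete write and anything newer;
        # delete every version prior to the LCWTS
        vs = self.versions[block_id]
        self.versions[block_id] = {t: f for t, f in vs.items() if t >= lcwts}
```

Keeping versions at or after the LCWTS preserves the latest complete write that readers may still need, while bounding the history that incomplete writes can accumulate.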
Limiting Unverified Writes
- The admin can set limits on the number of unverified writes
  – Per-client, per-block, and per-client-per-block
- Limit = 1 ⇒ write-time verification; limit = ∞ ⇒ read-time verification
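A sketch of the admin-set limit, tracked per client here (the per-block and per-client-per-block counters work the same way). A limit of 1 degenerates to write-time verification; an infinite limit to pure read-time verification:

```python
class WriteLimiter:
    def __init__(self, limit):
        self.limit = limit           # admin-set cap on unverified writes
        self.unverified = {}         # client -> count of unverified writes

    def must_verify_first(self, client):
        # True if the server has to verify before accepting another write
        if self.unverified.get(client, 0) >= self.limit:
            return True
        self.unverified[client] = self.unverified.get(client, 0) + 1
        return False

    def on_verified(self, client):
        self.unverified[client] = 0  # lazy verification clears the count
```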
Scheduling
- Background: in idle time (hence the "lazy" in lazy verification)
- On-demand:
  – Verification limits reached
  – Low free space in the history pool (cache or disk)
Block Selection
- If verification is invoked due to an exceeded limit:
  – No choice; verify that client's block
- Else:
  – Verify the block with the most versions, to maximize amortization of the verification cost
  – Prefer to verify blocks in cache: no unnecessary disk write, no read to start verification, no cleaning of on-disk version structures
Server Cooperation
- Simple approach: every server independently verifies every block (~n² messages)
[Figure: each server issues its own read requests and collects read replies.]
Server Cooperation (cont'd)
- Cooperation: b+1 servers perform verification and share the result (~b·n messages)
[Figure, b = 1: the verifying server issues read requests, collects read replies, and distributes a verification hint to the other servers.]
Experimental Setup
- 2.8 GHz Pentium 4 machines, used as servers and clients
- 1 Gb switched Ethernet, no background traffic
- In-cache only (to evaluate protocol cost)
- 16 KB blocks
- Vary the number of server Byzantine failures (b), with n = 4b + 1
- (b+1)-of-n encoding: maximal storage and network efficiency
Response Time Experiment
- 1 client, 1 outstanding request
- Vary b from 1 to 5, to investigate how response times change as we tolerate more server failures
- Alternate between reads and writes
- Idle time: 10 ms between operations, allowing verification to occur in the background
Write Response Time
[Figure: write response time as b varies from 1 to 5.]
Read Response Time
[Figure: read response time as b varies from 1 to 5.]
Write Throughput
- b = 1, n = 5
- 4 clients, 8 outstanding requests each; no idle time
- Server working set: 4096 blocks (64 MB); 100% writes; in-cache only
- A full history pool triggers lazy verification
- Vary the server history pool size to see the effect of delaying verification
Write Throughput (cont'd)
[Figure: write throughput as the history pool size varies.]
Nested Objects
- Goal: support nested method invocations among Byzantine fault-tolerant, replicated objects that are accessed via quorum systems
- Semantics and programmer interface modeled after Java Remote Method Invocation (RMI) [http://java.sun.com/products/jdk/rmi/]
- Distributed objects can be:
  – Passed as parameters to method calls on other distributed objects
  – Returned from method calls on other distributed objects
Java Remote Method Invocation (RMI)
- Standard Java mechanism to invoke methods on objects in other JVMs
- Local interactions are with a handle that implements the interfaces of the remote object
[Figure: the local client calls the handle; the handle carries the invocation to the remote object on the remote server and relays the response.]
RMI: Nested Method Invocations
- Handles can be passed as parameters into method invocations on other remote objects
- A method invocation on one remote object can result in method invocations on other remote objects
RMI: Handle Returned
- Handles can be returned from method invocations on other remote objects
Replicated Objects
- Replicas behave as a single logical object
- Can withstand the Byzantine (arbitrary) failure of up to b servers
- Scales linearly with the number of servers
[Figure: a handle invokes (1) object replicas on servers A–D, which respond (2).]
Quorum Systems
- Given a universe of n servers, a quorum system is a set of subsets (quorums) of the universe, every pair of which intersect
- Scales well as a function of n, since quorum size can be significantly smaller than n
- Example: Grid with n = 144; a quorum is 1 row + 1 column
[Figure: two quorums q1 and q2 on a 12×12 grid; q1's row meets q2's column and vice versa.]
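The grid construction is easy to check exhaustively; a 12 × 12 grid gives the n = 144 example:

```python
from itertools import product

SIDE = 12                       # 12 x 12 grid of n = 144 servers

def quorum(r, c):
    # a quorum is one full row plus one full column: 2*SIDE - 1 = 23 servers
    return {(r, j) for j in range(SIDE)} | {(i, c) for i in range(SIDE)}

# every pair of quorums intersects: q1's row crosses q2's column
assert all(quorum(r1, c1) & quorum(r2, c2)
           for r1, c1, r2, c2 in product(range(SIDE), repeat=4))
assert len(quorum(0, 0)) == 23  # far smaller than n = 144
```

Each operation touches only 23 of the 144 servers, which is the load-dispersion advantage over contacting all replicas.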
Byzantine Quorum Systems
- Extend quorum systems to withstand the Byzantine failure of up to b servers
- Every pair of quorums intersects in ≥ 2b + 1 servers (≥ b + 1 correct servers)
- A new quorum must be selected if a response is not received from every server in a quorum
- Example: Grid with n = 144, b = 3; a quorum is 2 rows + 2 columns
[Figure: two quorums q1 and q2 on a 12×12 grid.]
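For b = 3 on the same 12 × 12 grid, quorums of 2 rows + 2 columns suffice: even two quorums with disjoint rows and columns (the worst case) share the 2 × 2 crossings in each direction, i.e. 8 servers ≥ 2b + 1 = 7:

```python
SIDE, B = 12, 3                  # 12 x 12 grid, tolerating b = 3 faults

def bquorum(rows, cols):
    # a Byzantine grid quorum: 2 full rows plus 2 full columns
    return ({(r, j) for r in rows for j in range(SIDE)} |
            {(i, c) for i in range(SIDE) for c in cols})

# worst case: no shared rows or columns, yet the quorums still intersect
# in the 2x2 crossings each way -- 8 servers >= 2b + 1 = 7
q1 = bquorum((0, 1), (0, 1))
q2 = bquorum((10, 11), (10, 11))
assert len(q1 & q2) == 8 >= 2 * B + 1
```

The ≥ 2b + 1 overlap guarantees that, after discounting up to b Byzantine servers, at least b + 1 correct servers are common to any two quorums.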
Byzantine Quorum-Replicated Objects
- Method invocations are sent to a quorum
- ≥ b + 1 identical responses must be correct, since at most b of them can come from faulty servers
[Figure: a handle invokes (1) a quorum of replicas on servers A–D, which respond (2).]
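The client-side vote reduces to a counting rule (a sketch; the real handle also checks that responses are properly authenticated):

```python
from collections import Counter

def vote(responses, b):
    # accept a value only if >= b + 1 servers returned it identically;
    # at most b of those can be Byzantine, so at least one is correct
    value, count = Counter(responses).most_common(1)[0]
    return value if count >= b + 1 else None
```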
Nested Method Invocations
- Handles can be passed as parameters into method invocations on other distributed objects
[Figure: invocation steps 1–3 across replicas on servers A–D.]
Handle Returned
- Handles can be returned from method invocations on other distributed objects
[Figure: invocation steps 1–3 across replicas on servers A–D.]
Necessity of Authorization
- Faulty replicas can invoke unauthorized methods
- Correct replicas might perform duplicate invocations
[Figure: invocation steps 1–3 across replicas on servers A–D.]
Authorization Framework Requirements
- Method invocation authority can be delegated:
  – Explicitly, to other clients
  – Implicitly, to other distributed objects:
    – Handle passed as a parameter to a method invocation on a second object
    – Handle returned to a method invocation from a second object
- Support arbitrary nesting depths
Authorization Framework
[Figure: each replica i holds a private/public key pair, with the private key for i kept at replica i. A certificate states that the object's key says that b+1 of the replicas' keys {1, 2, 3, 4} speak for it.]
Operation Ordering Protocol
- Worst-case 4-round protocol: Get, Suggest, Propose, Commit
- Extends a protocol previously used in Fleet [Chockler et al. 2001]
- Operations are applied in batches, increasing throughput
Operation Ordering Protocol: Client Side
- The fundamental challenge is the absence of a single trusted client
  – A trusted client could order all operations
- Instead, a single untrusted client replica drives the protocol. The driving client:
  – Acts as a point of centralization to distribute authenticated server messages
  – Makes no protocol decisions
  – Is unable to cause correct servers to take conflicting actions
  – Can be unilaterally replaced by another client replica when necessary
Experimental Setup
- Implemented object nesting as an extension of Fleet
- Pentium 4 2.8 GHz processors; 1000 Mbps Ethernet (TCP, not multicast); Linux 2.4.27
- Java HotSpot™ Server VM 1.5.0
- Native Crypto++ library for key generation, signing, and verification [http://www.cryptopp.com/]
Latency for Non-Nested Invocation
[Figure: latency (ms, 0–2500) vs. quorum size (4–55) for ECDSA, DSA, RSA, and HMAC.]
A Real Byzantine Fault
Impediments to Dramatic Increases
- Impossibility results:
  – Load dispersion across quorums
  – Round complexity of protocols
- Strong consistency conditions
  – Weakening consistency is one place to look for big improvements