Definitions
Byzantine fault isolation \'biz-ən-tēn folt ī-sə-'lā-shən\ n (2006) : methodology for designing a distributed system that can, under Byzantine failure, operate with application-defined partial correctness
Farsite \'fär-sīt\ n (2000) : serverless distributed file system developed at Microsoft Research, designed to be scalable, strongly consistent, and secure despite running on an untrusted infrastructure of desktop PCs
Byzantine fault \'biz-ən-tēn folt\ n (1982) : a failure of a system component that produces arbitrary behavior
BFI \bē-ef-'ī\ n (2006) : Byzantine fault isolation
Talk Outline
• Context – Farsite system
• Why BFT doesn’t scale
• Farsite’s use of multiple BFT groups
• The need for isolating Byzantine faults
• Formal system specification
• BFI in Farsite
Farsite System – Metadata
[Figure: users, clients, and a BFT group]
• Using a Byzantine agreement protocol, assign sequence numbers to messages
• Prepare-commit among 2T + 1 servers
  – T = tolerable faults
  – R = count of replicas
  – R > 3T
• Deterministically update metadata
• Reply to client
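The replica bound on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the smallest group that satisfies R > 3T, that is R = 3T + 1, together with the 2T + 1 prepare-commit quorum named on the slide.

```python
def min_replicas(tolerable_faults: int) -> int:
    """Smallest replica count R that satisfies R > 3T."""
    if tolerable_faults < 0:
        raise ValueError("tolerable_faults must be non-negative")
    return 3 * tolerable_faults + 1


def quorum_size(tolerable_faults: int) -> int:
    """Prepare-commit quorum of 2T + 1 servers, as stated on the slide."""
    if tolerable_faults < 0:
        raise ValueError("tolerable_faults must be non-negative")
    return 2 * tolerable_faults + 1


def max_tolerable_faults(replicas: int) -> int:
    """Largest T for which R > 3T still holds."""
    if replicas < 1:
        raise ValueError("replicas must be positive")
    return (replicas - 1) // 3


if __name__ == "__main__":
    for r in (4, 7, 10):
        t = max_tolerable_faults(r)
        print(f"R={r}: T={t}, quorum={quorum_size(t)}")
```

Under these assumptions, groups of 4, 7, and 10 replicas tolerate 1, 2, and 3 faulty members respectively.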
Throughput vs. Scale
[Plot: throughput multiple (0 to 7) vs. machine count (1 to 7). Legend: ideal, typical, flat, BFT]
Tree of BFT Groups
[Figure: namespace tree rooted at /, partitioned among BFT groups. Node labels: users, public, Alice, Bob, docs, code, C++, C#, foo, bar, Proj X, src, bin, cruft, emacs, vi, Outlook]
Delegation to New Group
[Figure: the same namespace tree, illustrating delegation to a new BFT group]
Pathname Resolution
[Figure: the same namespace tree, illustrating resolution of the pathname below]
/users/Alice/code/C#/bar
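A minimal sketch of how a pathname might be resolved across several BFT groups. Everything here is an assumption made for illustration: the group names, the delegation table, and the choice of delegation points are hypothetical and are not read off the slide's figure. The only modeling idea is that a group manages a directory and everything below it until a deeper delegation overrides it.

```python
from typing import Dict, List, Tuple

# Hypothetical delegation table: directory -> BFT group responsible for that
# directory and its descendants, until a deeper entry overrides it.
DELEGATIONS: Dict[str, str] = {
    "/": "group-root",
    "/users/Alice": "group-A",
    "/users/Alice/code/C#": "group-B",
}


def resolve(path: str, delegations: Dict[str, str]) -> List[Tuple[str, str]]:
    """Walk an absolute path from the root; return (prefix, group) per step."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    group = delegations["/"]
    steps = [("/", group)]
    prefix = ""
    for component in (c for c in path.split("/") if c):
        prefix = f"{prefix}/{component}"
        # A delegation at this prefix hands authority to another group.
        group = delegations.get(prefix, group)
        steps.append((prefix, group))
    return steps


def groups_on_path(path: str, delegations: Dict[str, str]) -> List[str]:
    """Distinct groups involved in resolving the path, in first-contact order."""
    seen: List[str] = []
    for _, group in resolve(path, delegations):
        if group not in seen:
            seen.append(group)
    return seen


if __name__ == "__main__":
    for prefix, group in resolve("/users/Alice/code/C#/bar", DELEGATIONS):
        print(f"{prefix:28s} {group}")
```

In this model, resolving /users/Alice/code/C#/bar touches every group that holds a prefix of the path, so the result depends on each of those groups behaving correctly.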
Quantitative Fault Analysis
• Example system
  – File system distributed among interacting BFT groups
• Simplifying assumptions
  – Files are partitioned evenly among BFT groups
  – Machine failures are independent
• Machine fault probability = 0.001
• Evaluate: operational fault rate
  – Probability that an operation on a randomly selected file exhibits a fault
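Under the independence assumption, the probability that a single BFT group fails is a binomial tail: the group fails when more than T of its R machines are faulty. The sketch below assumes T is the largest value with R > 3T; the function name is hypothetical.

```python
from math import comb


def group_fault_probability(replicas: int, machine_fault_prob: float) -> float:
    """Probability that more than T of R machines are faulty.

    Assumes independent machine failures and T = (R - 1) // 3,
    the largest T that satisfies R > 3T.
    """
    tolerable = (replicas - 1) // 3
    p = machine_fault_prob
    return sum(
        comb(replicas, k) * p**k * (1 - p) ** (replicas - k)
        for k in range(tolerable + 1, replicas + 1)
    )


if __name__ == "__main__":
    for r in (4, 7, 10):
        print(f"R={r}: {group_fault_probability(r, 0.001):.3e}")
```

With a machine fault probability of 0.001, a 4-member group fails with probability of roughly 6×10^-6, and larger groups fail less often.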
Operational Faults vs. System Scale
[Plot: operational fault rate (log scale, 10^-7 to 10^0) vs. system scale (count of BFT groups, 1 to 100,000). Series: BFT 4, no BFI; BFT 7, no BFI; BFT 10, no BFI; BFT 4, ideal BFI; BFT 4, tree (4) BFI; BFT 4, tree (16) BFI. Annotated values: 6×10^-6, 0.45, 6×10^-6, 3×10^-5]
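One way to read the annotated values is through a simple model, which is an assumption here rather than something the slide states. Without isolation, a fault in any one group can affect an operation on any file, so the operational fault rate for N groups with per-group fault probability q is 1 − (1 − q)^N. With ideal isolation, only files held by the faulty group are affected, so the rate stays at q regardless of scale. The function names are hypothetical.

```python
def fault_rate_no_bfi(group_fault_prob: float, group_count: int) -> float:
    """Assumed model: any faulty group can taint an operation on any file."""
    return 1.0 - (1.0 - group_fault_prob) ** group_count


def fault_rate_ideal_bfi(group_fault_prob: float, group_count: int) -> float:
    """Assumed model: a fault affects only the files held by the faulty group."""
    return group_fault_prob


if __name__ == "__main__":
    q = 6e-6  # per-group fault probability, taken from the plot annotation
    for n in (1, 10, 100, 1_000, 10_000, 100_000):
        print(f"N={n:>7}: no BFI {fault_rate_no_bfi(q, n):.2e}, "
              f"ideal BFI {fault_rate_ideal_bfi(q, n):.2e}")
```

With q = 6×10^-6 and 100,000 groups, this model gives a rate of about 0.45 without isolation, which agrees with the plot annotation.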
BFI versus no BFI
               4-member BFT groups   10-member BFT groups   throughput
               with BFI              without BFI            reduction
computation    4                     10                     60%
messages       32                    200                    84%
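The two percentages follow from the cost figures on this slide if they are read as relative per-operation costs and throughput is taken to be inversely proportional to cost. That reading is an interpretation, not something the slide states, and the function name is hypothetical.

```python
def throughput_reduction(cost_with_bfi: float, cost_without_bfi: float) -> float:
    """Fractional throughput lost by the costlier configuration.

    Assumes throughput is inversely proportional to per-operation cost.
    """
    if cost_with_bfi <= 0 or cost_without_bfi <= 0:
        raise ValueError("costs must be positive")
    return 1.0 - cost_with_bfi / cost_without_bfi


if __name__ == "__main__":
    print(f"computation: {throughput_reduction(4, 10):.0%}")
    print(f"messages:    {throughput_reduction(32, 200):.0%}")
```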
BFI via Formal Specification
[Figure: a semantic spec (state, actions) related by refinement to a distributed-system spec (state, actions + faults). Callout labels: "NEW", "Improved!"]
Farsite Semantic Spec
[Figure: semantic state. A namespace tree (/, code, tools, C++, emacs, src, bin, a.h, a.cpp, a.exe, a.obj, cl.exe), plus open handles and pending operations (open, read, move)]
Farsite Refinement
[Figure: the namespace tree (/, code, tools, C++, emacs, src, bin, a.h, a.cpp, a.exe, a.obj, cl.exe) with open handles and pending operations (del, read, move)]
Refinement with Byzantine Faults
[Figure: the namespace tree (/, code, tools, C++, emacs, src, bin, a.h, a.cpp, a.exe, a.obj, cl.exe) with open handles and pending operations (del, read, move)]
Refinement with Byzantine Faults
[Figure: the same tree and pending operations (del, read, move), with a second copy of part of the tree (emacs, src, a.h, a.cpp, a.exe, bin, a.obj, code). Sample contents shown: "Hello world", a run of garbage characters, and an ASCII-art creature]
Semantic Fault Specification
• Safety
  – A tainted file may have arbitrary contents and attributes
  – A tainted file may appear not linked into namespace
  – A tainted file may pretend not to have children it actually has
  – A tainted file may pretend to have children that do not exist
  – A tainted file may pretend another tainted file is a child or parent
• Liveness
  – Operations involving a tainted file may not complete
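A weakened semantic spec of this kind can be pictured as a predicate over observations. The sketch below is a hypothetical illustration, not Farsite's specification: it covers only the rule about file contents plus the liveness rule, and it assumes a read either returns a string or returns None when the operation never completes. All names and paths are invented for the example.

```python
from typing import Dict, Optional, Set


def read_is_permitted(
    path: str,
    observed: Optional[str],
    spec_contents: Dict[str, str],
    tainted: Set[str],
) -> bool:
    """Return True if an observed read result is allowed by the weakened spec.

    observed is the returned contents, or None if the read never completed.
    """
    if path in tainted:
        # Safety: a tainted file may have arbitrary contents.
        # Liveness: an operation on a tainted file may not complete.
        return True
    # An untainted file must complete and must match the fault-free spec.
    return observed is not None and observed == spec_contents.get(path)


if __name__ == "__main__":
    spec = {"/code/src/a.h": "int x;", "/code/bin/a.obj": "OBJ"}
    tainted = {"/code/src/a.h"}
    print(read_is_permitted("/code/src/a.h", "%%garbage%%", spec, tainted))
    print(read_is_permitted("/code/bin/a.obj", "%%garbage%%", spec, tainted))
```

The point of the predicate is that arbitrary behavior is accepted only for tainted files, while every other file is still held to the fault-free spec.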
Distributed-System Improvements
• Maintain redundant info across BFT group boundaries
• Augment messages with info that justifies correctness
• Ensure unambiguous chains of authority over data
• Carefully order messages and state updates for operations involving multiple BFT groups
Summary of BFI Methodology
• Formally specify your system
  – Semantic spec: user’s view of system
  – Distributed-system spec: designer’s view of system
  – Refinement interprets distributed-system spec in semantic terms
• Modify distributed-system spec to express Byzantine faults
• Simultaneously
  – Strategically weaken semantic spec to describe faults
  – Improve distributed-system spec to quarantine faults
• Refinement lets you know when you are done
Conclusions
• BFT groups have negative throughput scaling
• Scalable systems can be built from multiple BFT groups
• System scale increases the probability of non-maskable Byzantine faults
• If faults are not isolated, a single faulty group can corrupt the entire system
• BFI is a methodology for isolating Byzantine faults
• BFI uses formal system specification
• Improves fault tolerance without hurting throughput, unlike increasing BFT group size