Byzantine Fault Isolation in the Farsite Distributed File System John R. Douceur and Jon Howell

Dec 18, 2015


Byzantine Fault Isolation in the Farsite Distributed File System

John R. Douceur and Jon Howell

Definitions

Byzantine fault \'biz-ən-tēn folt\ n (1982) : a failure of a system component that produces arbitrary behavior

Byzantine fault isolation \'biz-ən-tēn folt ī-sə-'lā-shən\ n (2006) : methodology for designing a distributed system that can, under Byzantine failure, operate with application-defined partial correctness

BFI \bē-ef-'ī\ n (2006) : Byzantine fault isolation

Farsite \'fär-sīt\ n (2000) : serverless distributed file system developed at Microsoft Research, designed to be scalable, strongly consistent, and secure despite running on an untrusted infrastructure of desktop PCs


Talk Outline

• Context – Farsite system

• Why BFT doesn’t scale

• Farsite’s use of multiple BFT groups

• The need for isolating Byzantine faults

• Formal system specification

• BFI in Farsite

Farsite System

[Figure: machines labeled client and server]

Farsite System – Metadata

[Figure: users, clients, and a BFT group holding metadata]

Farsite System – Metadata

• Using Byzantine agreement protocol, assign sequence numbers to messages
• Prepare-commit among 2T + 1 servers
    – T = tolerable faults
    – R = count of replicas
    – R > 3T
• Deterministically update metadata
• Reply to client

[Figure: users, clients, and a BFT group]
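The replica arithmetic on this slide can be made concrete with a minimal sketch. The function names below are illustrative and not taken from Farsite; the code only restates the slide's relations R > 3T and a prepare-commit quorum of 2T + 1.

```python
def min_replicas(tolerable_faults: int) -> int:
    """Smallest replica count R satisfying R > 3T."""
    return 3 * tolerable_faults + 1


def prepare_commit_quorum(tolerable_faults: int) -> int:
    """Number of servers that take part in prepare-commit: 2T + 1."""
    return 2 * tolerable_faults + 1


def tolerable_faults(replicas: int) -> int:
    """Largest T such that replicas > 3T."""
    return (replicas - 1) // 3


for r in (4, 7, 10):
    t = tolerable_faults(r)
    print(f"R = {r}: tolerates T = {t}, quorum = {prepare_commit_quorum(t)}")
```

With R = 3T + 1, any two quorums of 2T + 1 servers share at least T + 1 members, so they always have at least one non-faulty server in common.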

The Cost of BFT Groups

                   single server    4-member BFT group
computation              1                  4
messages                 2                 32
message delays           2                  5

Throughput vs. Scale

[Figure: throughput multiple (0 to 7) vs. machine count (1 to 7); series: ideal, typical, flat, BFT]

Workload Sharing

[Figure: workload shared between client and server]


BFT at Scale


Multiple BFT Groups

Tree of BFT Groups

[Figure: file-system namespace tree divided among BFT groups; node labels: /, users, cruft, emacs, vi, Outlook, public, Alice, Bob, docs, code, C++, C#, foo, bar, Proj X, src, bin, src, bin]

Delegation to New Group

[Figure: namespace tree of BFT groups, with part of the tree delegated to a new group]

Pathname Resolution

[Figure: namespace tree of BFT groups, showing resolution of the path below]

/users/Alice/code/C#/bar
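As a rough illustration of resolving this path through a tree of BFT groups, the sketch below walks the path one component at a time and switches groups wherever a directory has been delegated. The delegation points and group names are invented for the example; the slides do not say which group manages which directory, and this is not Farsite's actual protocol.

```python
# Hypothetical delegation table: directory prefix -> BFT group managing it.
DELEGATIONS = {
    "/": "group-root",
    "/users/Alice": "group-alice",
    "/users/Alice/code/C#": "group-csharp",
}


def resolve(path: str):
    """Return (hops, owner): the groups visited in order and the final owner."""
    group = DELEGATIONS["/"]
    hops = [("/", group)]
    prefix = ""
    for component in (c for c in path.split("/") if c):
        prefix += "/" + component
        delegate = DELEGATIONS.get(prefix)
        if delegate is not None and delegate != group:
            # Crossing a delegation boundary: a different group takes over.
            group = delegate
            hops.append((prefix, group))
    return hops, group


hops, owner = resolve("/users/Alice/code/C#/bar")
for prefix, group in hops:
    print(f"{prefix} -> {group}")
print("owner:", owner)
```

Every group on the path takes part in the lookup, which is why a fault in a group near the root can affect names far below it unless faults are isolated.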


Machine Failures at Scale


Group Failures at Scale


System Failure at Scale

Quantitative Fault Analysis

• Example system
    – File system distributed among interacting BFT groups
• Simplifying assumptions
    – Files are partitioned evenly among BFT groups
    – Machine failures are independent
• Machine fault probability = 0.001
• Evaluate: operational fault rate
    – Probability that an operation on a randomly selected file exhibits a fault

Operational Faults vs. System Scale

[Figure: operational fault rate (log scale, 10^-7 to 10^0) vs. system scale (count of BFT groups, 1 to 100,000); series: BFT 4, no BFI; BFT 7, no BFI; BFT 10, no BFI; BFT 4, ideal BFI; BFT 4, tree (4) BFI; BFT 4, tree (16) BFI; annotated values: 6×10^-6, 0.45, 6×10^-6, 3×10^-5]
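Two of the annotated values can be reproduced from the stated assumptions (independent machine failures with probability 0.001). The sketch below is one reading of the analysis, not the authors' code. It assumes a group of R machines fails when more than T = (R - 1) // 3 of them are faulty, that without isolation any failed group makes every operation faulty, and that with ideal isolation only files in a failed group are affected.

```python
from math import comb


def group_fault_probability(replicas: int, p: float) -> float:
    """Probability that more than T = (replicas - 1) // 3 machines are faulty."""
    tolerable = (replicas - 1) // 3
    return sum(
        comb(replicas, k) * p**k * (1 - p) ** (replicas - k)
        for k in range(tolerable + 1, replicas + 1)
    )


def fault_rate_without_isolation(group_count: int, q: float) -> float:
    """Assumed model: an operation is faulty if any group in the system failed."""
    return 1 - (1 - q) ** group_count


def fault_rate_with_ideal_isolation(q: float) -> float:
    """Assumed model: only files owned by a failed group are affected."""
    return q


q4 = group_fault_probability(4, 0.001)
print(f"4-member group fault probability: {q4:.1e}")
print(f"no BFI, 100,000 groups: {fault_rate_without_isolation(100_000, q4):.2f}")
print(f"ideal BFI, any scale:   {fault_rate_with_ideal_isolation(q4):.1e}")
```

Under these assumptions the 4-member group fault probability is about 6×10^-6, and the rate without isolation at 100,000 groups is about 0.45, which agrees with the values annotated on the plot. The tree (4) and tree (16) curves depend on how many groups an operation touches and are not modeled here.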

BFI versus no BFI

               4-member BFT groups    10-member BFT groups    throughput
               with BFI               without BFI             reduction
computation    4                      10                      60%
messages       32                     200                     84%
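The two percentages are consistent with a simple ratio of per-operation costs: moving from 4-member to 10-member groups multiplies the work per operation, and throughput falls by one minus the cost ratio. The sketch below assumes throughput is inversely proportional to whichever cost is the bottleneck; that assumption is a reading of the slide's numbers rather than something the slide states.

```python
def throughput_reduction(cost_small: float, cost_large: float) -> float:
    """Fractional throughput loss when per-operation cost grows from small to large."""
    return 1 - cost_small / cost_large


# (4-member groups with BFI, 10-member groups without BFI), from the slide.
COSTS = {"computation": (4, 10), "messages": (32, 200)}

for resource, (with_bfi, without_bfi) in COSTS.items():
    print(f"{resource}: {throughput_reduction(with_bfi, without_bfi):.0%}")
```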

BFI via Formal Specification

[Figure: a semantic spec and a distributed-system spec, each with state and actions, related by refinement; overlay labels: "actions + faults", "New! Improved!"]

Farsite Semantic Spec

[Figure: semantic state consisting of a namespace tree (/, code, tools, C++, emacs, src, bin, a.h, a.cpp, a.obj, a.exe, cl.exe), open handles, and pending operations (open, read, move)]


Farsite Distributed-System Spec

Farsite Refinement

[Figure: namespace tree (/, code, tools, C++, emacs, src, bin, a.h, a.cpp, a.obj, a.exe, cl.exe), open handles, and pending operations (del, read, move)]

Actions are State Transitions

[Figure: state containing the root /, the file a.cpp, open handles, and pending operations]

Proving Refinement Inductively

[Figure: state containing the root /, the file a.cpp, open handles, and pending operations]
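A toy example may help show what an inductive refinement argument checks. In the sketch below, the semantic state is a set of file names, the distributed state splits those files between two hypothetical groups, and the refinement map is the union. The check confirms the base case (the initial states correspond) and the inductive step (every distributed action maps to either a semantic action or no change). The specs here are invented for illustration and are far simpler than the Farsite TLA+ specs.

```python
from collections import deque

NAMES = ("a.h", "a.cpp")  # toy file names
GROUPS = (0, 1)           # two hypothetical BFT groups


def semantic_steps(files):
    """Semantic spec: one step creates or deletes a single file."""
    for name in NAMES:
        yield files ^ {name}


def with_group(state, group, files):
    """Copy of a distributed state with one group's file set replaced."""
    return tuple(files if g == group else state[g] for g in GROUPS)


def distributed_steps(state):
    """Distributed spec: create, delete, or migrate a file between groups."""
    existing = frozenset().union(*state)
    for g in GROUPS:
        for name in NAMES:
            if name in state[g]:
                removed = with_group(state, g, state[g] - {name})
                yield removed  # delete
                other = 1 - g
                yield with_group(removed, other, removed[other] | {name})  # migrate
            elif name not in existing:
                yield with_group(state, g, state[g] | {name})  # create


def refine(state):
    """Refinement map: the semantic namespace is the union over all groups."""
    return frozenset().union(*state)


def check_refinement() -> bool:
    initial = (frozenset(), frozenset())
    if refine(initial) != frozenset():  # base case
        return False
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        before = refine(state)
        for nxt in distributed_steps(state):
            after = refine(nxt)
            # Inductive step: a stutter (no semantic change) or one semantic step.
            if after != before and after not in set(semantic_steps(before)):
                return False
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True


print(check_refinement())
```

Migration changes the distributed state without changing the semantic one, so it refines to a stuttering step; create and delete each refine to the matching semantic action.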

Refinement with Byzantine Faults

[Figure: namespace tree (/, code, tools, C++, emacs, src, bin, a.h, a.cpp, a.obj, a.exe, cl.exe), open handles, and pending operations (del, read, move)]

Semantic Fault Specification

• Safety
    – A tainted file may have arbitrary contents and attributes
    – A tainted file may appear not linked into namespace
    – A tainted file may pretend not to have children it actually has
    – A tainted file may pretend to have children that do not exist
    – A tainted file may pretend another tainted file is a child or parent
• Liveness
    – Operations involving a tainted file may not complete

[Figure: namespace tree (/, code, tools, C++, emacs, src, bin, a.h, a.cpp, a.obj, a.exe, cl.exe, foo, bar) illustrating tainted files, including a file whose "Hello world" contents are replaced by arbitrary data]
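One way to picture the weakened semantic spec is as a predicate over observed results: untainted files must behave exactly as specified, while tainted files may return anything or never complete. The `File` type and `read_is_allowed` function below are hypothetical names for a minimal sketch covering only the contents and liveness rules; the namespace-related rules are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class File:
    contents: bytes
    tainted: bool = False


def read_is_allowed(file: File, observed: Optional[bytes]) -> bool:
    """Return True if `observed` is a permitted outcome of reading `file`.

    `observed` is the data returned, or None if the operation never completed.
    """
    if file.tainted:
        # Safety: arbitrary contents. Liveness: the operation may not complete.
        return True
    # Untainted files keep the full guarantee: exact contents, and completion.
    return observed == file.contents


good = File(b"Hello world")
bad = File(b"Hello world", tainted=True)
print(read_is_allowed(good, b"Hello world"))
print(read_is_allowed(good, b"garbage"))
print(read_is_allowed(bad, b"garbage"))
print(read_is_allowed(bad, None))
```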

Distributed-System Improvements

• Maintain redundant info across BFT group boundaries

• Augment messages with info that justifies correctness

• Ensure unambiguous chains of authority over data

• Carefully order messages and state updates for operations involving multiple BFT groups

Summary of BFI Methodology

• Formally specify your system
    – Semantic spec: user’s view of system
    – Distributed-system spec: designer’s view of system
    – Refinement interprets distributed-system spec in semantic terms
• Modify distributed-system spec to express Byzantine faults
• Simultaneously
    – Strategically weaken semantic spec to describe faults
    – Improve distributed-system spec to quarantine faults
• Refinement lets you know when you are done

Conclusions

• BFT groups have negative throughput scaling
• Scalable systems can be built from multiple BFT groups
• System scale increases the probability of non-maskable Byzantine faults
• If faults are not isolated, a single faulty group can corrupt the entire system
• BFI is a methodology for isolating Byzantine faults
• BFI uses formal system specification
• Improves fault tolerance without hurting throughput, unlike increasing BFT group size


Contact Information

[email protected]

[email protected]

http://research.microsoft.com/farsite


Backup Slides

Farsite Spec Stats

• Semantic specification
    – 1800 lines of TLA+
    – 114 definitions
• Distributed-system specification
    – 11,500 lines of TLA+
    – 775 definitions
• Why so big?
    – Windows file-system semantics are complex
    – Scalability and strong consistency
    – Byzantine fault isolation