Recon: Verifying File System Consistency at Runtime

Daniel Fryer, Jack (Kuei) Sun, Rahat Mahmood, TingHao Cheng, Shaun Benjamin, Angela Demke Brown and Ashvin Goel

University of Toronto

October 4, 2011


Metadata Integrity is Crucial

You don’t know what you’ve got ’til it’s gone…

[Figure: the File System and Block Layer in the kernel, above Storage that holds metadata blocks (M) and data blocks (D)]

File Systems Have Bugs

Why can’t existing solutions handle this problem?

Bugs in Linux Ext3 File System:

• panic/ext3 fs corruption with RHEL4-U6-re20070927.0 (closed 2007-11)
• Re: [2.6.27] filesystem (ext3) corruption (access beyond end) (closed 2008-06)
• linux-2.6: ext3 filesystem corruption (closed 2008-09)
• linux-image-2.6.29-2-amd64: occasional ext3 filesystem corruption (closed 2009-06)
• ENOSPC during fsstress leads to filesystem corruption on ext2, ext3, and ext4 (closed 2010-03)
• ext3: Fix fs corruption when make_indexed_dir() fails (closed 2011-06)
• Data corruption: resume from hibernate always ends up with EXT3 fs errors (not yet closed)

“Solutions”

[Figure: the kernel storage stack (File System, Block Layer, Storage) annotated with candidate protections: RAID? Checksums? Journals?]

None of these protect against bugs in file systems

Existing approaches assume file systems are correct
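The following is an illustrative sketch, not taken from the talk, of why checksums in particular do not help. The file system computes the checksum over whatever it writes. If a bug has already corrupted the block in memory, the checksum covers the corrupt contents and verifies cleanly on every later read. The in-memory `disk` dict and the `write_block`, `read_block` and `verifies` helpers are hypothetical.

```python
import zlib


class ChecksumError(Exception):
    pass


# Hypothetical in-memory disk: block number -> (payload, stored checksum)
disk = {}


def write_block(blockno, payload):
    # The checksum covers the payload exactly as the file system produced it.
    disk[blockno] = (payload, zlib.crc32(payload))


def read_block(blockno):
    payload, stored = disk[blockno]
    if zlib.crc32(payload) != stored:
        raise ChecksumError(f"checksum mismatch on block {blockno}")
    return payload


def verifies(blockno):
    """True if the block passes checksum verification on read."""
    try:
        read_block(blockno)
        return True
    except ChecksumError:
        return False


# A hypothetical bug makes the file system write block pointer 9 instead of 7.
write_block(100, bytes([9, 0, 0, 0]))
after_bug = verifies(100)  # the wrong pointer still passes the checksum

# A single bit flip on the media *after* the write is detected, however.
payload, crc = disk[100]
disk[100] = (bytes([payload[0] ^ 1]) + payload[1:], crc)
after_media_error = verifies(100)
```

Checksums, like RAID, protect against errors below the file system. They trust whatever the file system hands down.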


Offline Checking

• Check consistency offline, e.g., fsck

• Consistency properties necessary for correctness

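• FS1: No double allocation. Each data block (D) is pointed to by at most one metadata block (M).

• FS2: Refcount-based sharing. A data block pointed to by two metadata blocks must record Ref: 2.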
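The two properties can be checked offline with a sketch like the one below. It assumes a toy model: each metadata block is a list of data-block pointers, and reference counts live in a separate table. Both are hypothetical simplifications, not ext3's on-disk format. Both checks must visit every pointer in the file system before they can conclude anything.

```python
from collections import Counter


def count_references(metadata):
    """Full scan: count how many pointers reference each data block.
    `metadata` maps a metadata block id to the data blocks it points to."""
    refs = Counter()
    for pointers in metadata.values():
        refs.update(pointers)
    return refs


def check_no_double_allocation(metadata):
    """FS1: no data block may be pointed to more than once."""
    refs = count_references(metadata)
    return sorted(blk for blk, n in refs.items() if n > 1)


def check_refcounts(metadata, refcount):
    """FS2: a block's stored reference count must equal the number of
    pointers to it, so every instance of sharing is accounted for."""
    refs = count_references(metadata)
    return sorted(blk for blk in set(refs) | set(refcount)
                  if refs[blk] != refcount.get(blk, 0))


# Usage: two metadata blocks sharing data block 20 with a correct refcount.
shared = {"M1": [20], "M2": [20]}
fs2_violations = check_refcounts(shared, {20: 2})  # empty list: FS2 holds
```

Each function returns the offending block numbers, so an empty list means the property holds.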


Problems with Offline Checking

• Slow, getting slower with larger disks

• Requires taking file system offline

• Repair happens after the fact and is error-prone


Outline

• Problem
  • Metadata can be corrupted by bugs, and existing techniques are inadequate
• Our Solution: Recon
  • A system for protecting metadata from bugs
• Key idea
  • Runtime consistency checking
• Design
• Evaluation

Runtime Consistency Checking

• Ensure every update results in a consistent file system
  • Makes repair unnecessary!
  • “What happens in DRAM stays in DRAM”

BUT

• Consistency properties are global
• Global properties require a full scan
• We can’t run fsck at every write

Consistency Invariants

• We transform global consistency properties into fast, local consistency invariants
• Assume an initial consistent state
  • A new file system is clean
  • Use checksums/redundancy to handle errors below the file system
• At runtime, check only what is changing
  • Do so before changes become persistent
  • The resulting new state is consistent

Example: Block Allocation in Ext3

• Ext3 maintains a block bitmap: every allocated block is marked in the bitmap

[Figure: a block bitmap covering blocks 5 to 9, and an inode (size, time, block pointers) that points to Block 7; allocating Block 8 updates both the bitmap and the inode, which gains a pointer to the updated Block 8]

Example: Block Allocation in Ext3

• Consistency invariant: bitmap bit X flips from “0” to “1” if and only if a block pointer is set to X
• The invariant fails if either update is missing
  • Should not mark a block allocated without setting a block pointer to it
  • Should not set a block pointer without marking the block allocated
• Can any consistency property be transformed?
  • File systems should maintain consistency efficiently
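The invariant can be phrased as an equality between two sets gathered from one transaction. The sketch below assumes changes arrive as simple tuples. The function name and tuple layout are hypothetical, 0 is treated as a null pointer, and only the allocation direction stated on the slide is checked.

```python
def check_allocation_invariant(bitmap_changes, pointer_changes):
    """bitmap_changes: (block, old_bit, new_bit) tuples for one transaction.
    pointer_changes: (owner, old_ptr, new_ptr) tuples; 0 means a null pointer.
    Returns a list of violations; an empty list means the invariant holds."""
    # Blocks whose bitmap bit flipped from 0 to 1 in this transaction.
    allocated = {blk for blk, old, new in bitmap_changes if (old, new) == (0, 1)}
    # Blocks that some pointer was newly set to in this transaction.
    pointed_to = {new for _owner, old, new in pointer_changes if new != 0 and new != old}
    # Either direction of the "if and only if" can fail independently.
    violations = [f"block {b} marked allocated, but no pointer set to it"
                  for b in sorted(allocated - pointed_to)]
    violations += [f"pointer set to block {b}, but block not marked allocated"
                   for b in sorted(pointed_to - allocated)]
    return violations


# Usage: allocating block 8 for inode 12 with both updates present, then with
# the pointer update missing.
ok = check_allocation_invariant([(8, 0, 1)], [("inode 12", 0, 8)])
missing_pointer = check_allocation_invariant([(8, 0, 1)], [])
```

The check touches only the records in the transaction, never the rest of the bitmap or the other inodes. This is what makes the invariant local.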


When to Check Invariants

• Invariants involve changes to multiple blocks

• When should they be consistent?

• Transactions are used for crash consistency

• Consistency can be checked at transaction boundaries

[Figure: a transaction moving from memory to disk; the transaction must be checked just before its commit block reaches disk]
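The following is a minimal sketch of where such a check could sit. It assumes a block-layer hook that sees every write of a journaling transaction. The transaction's journal blocks can go to disk early, because journal recovery only replays transactions whose commit block is on disk. Only the commit block is held back until the checker approves the transaction. `CommitGate`, the `is_commit` flag and the checker callback are hypothetical. A real implementation would have to recognize the commit block from its contents rather than from a flag.

```python
class CommitGate:
    """Hypothetical block-layer hook that holds back a transaction's commit
    block until the transaction passes a consistency checker."""

    def __init__(self, checker):
        self.checker = checker  # callable(list of (blockno, data)) -> list of violations
        self.disk = {}          # stand-in for the device: blockno -> data
        self.pending = []       # journal writes of the open transaction

    def write(self, blockno, data, is_commit=False):
        """Returns True if the write reached the disk."""
        if not is_commit:
            # Journal blocks without a commit block are ignored by recovery,
            # so they can be written before the check.
            self.disk[blockno] = data
            self.pending.append((blockno, data))
            return True
        txn, self.pending = self.pending, []
        if self.checker(txn):
            # Violation: withhold the commit block so the transaction
            # never becomes durable.
            return False
        self.disk[blockno] = data
        return True


# Toy checker for the example: flag any block whose contents are b"corrupt".
def toy_checker(txn):
    return [b for b, d in txn if d == b"corrupt"]


gate = CommitGate(toy_checker)
gate.write(10, b"inode update")
good = gate.write(11, b"commit", is_commit=True)  # commit block written
gate.write(12, b"corrupt")
bad = gate.write(13, b"commit", is_commit=True)   # commit block withheld
```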


Outline

• Problem
  • Metadata corruption caused by bugs
• Solution
  • Recon
• Key idea
  • Runtime checking
• Design
  • Metadata interpretation
  • Logical change generation
• Evaluation

The Recon Design

[Figure: the Recon design. Components: File System, Recon, Block Layer and Ye Olde Disk. Recon keeps a Metadata Read Cache and a Metadata Write Cache. File-system-specific modules (Ext3_Recon, Btrfs_Recon) plug in through the FS Recon Interface, which provides metadata interpretation and logical change generation.]

Metadata Interpretation

• To check invariants, we need to determine the type of a block on a read or write
• Take advantage of the tree structure of metadata
  • The superblock is the root of the tree
  • Parents are read before children
    • For example, an inode is read before its indirect blocks
  • So we see the pointer to a block before the block itself, and the pointer within the parent determines the type of the child block
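The sketch below shows type inference from parent pointers. To stay short, it starts the tree at an inode whose type is already known rather than at the superblock. It relies on the ext3 inode layout:

- The first 12 block pointers are direct.
- Pointer 12 refers to a single-indirect block.
- Pointer 13 refers to a double-indirect block.
- Pointer 14 refers to a triple-indirect block.

The `BlockTyper` class and its `on_read` callback are hypothetical. The sketch also ignores that a directory's data blocks are themselves metadata.

```python
N_DIRECT = 12  # ext3 inodes: i_block[0..11] direct, [12] indirect, [13] double, [14] triple

# Type of the child reached through each non-direct inode slot.
INODE_SLOT_TYPES = {12: "indirect", 13: "dindirect", 14: "tindirect"}
# Type of the children reached through an indirect block of each level.
CHILD_TYPE = {"indirect": "data", "dindirect": "indirect", "tindirect": "dindirect"}


class BlockTyper:
    """Hypothetical sketch: learn each block's type from the pointer that
    referenced it, relying on parents being read before their children."""

    def __init__(self, root, root_type):
        # Only the root's type is known up front.
        self.types = {root: root_type}

    def on_read(self, blockno, pointers):
        """Called when `blockno` is read; `pointers` are the block numbers
        parsed from it (0 = unused slot). Returns the block's own type."""
        btype = self.types.get(blockno, "unknown")
        for slot, child in enumerate(pointers):
            if child == 0:
                continue
            if btype == "inode":
                # The slot within the parent decides the child's type.
                child_type = "data" if slot < N_DIRECT else INODE_SLOT_TYPES[slot]
            elif btype in CHILD_TYPE:
                child_type = CHILD_TYPE[btype]
            else:
                continue  # data and unknown blocks carry no pointers we trust
            self.types[child] = child_type
        return btype


# Usage: an inode (block 50) with direct block 100 and single-indirect block 200.
typer = BlockTyper(root=50, root_type="inode")
typer.on_read(50, [100] + [0] * 11 + [200, 0, 0])  # the inode is read first...
typer.on_read(200, [300, 301])                     # ...so 200 is known to be indirect
```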


Logical Change Generation

• Invariants are expressed in terms of logical changes to structures, e.g., bitmaps, pointers
  • For example, “bitmap bit X flips from 0 to 1” or “block pointer set to X”
• Recon generates these changes based on
  • Block types
  • Comparing the blocks in the write cache and the read cache
• Logical changes to metadata structures are represented as a set of change records: [type, id, field, old, new]
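Below is a minimal sketch of change-record generation. It assumes the read-cache and write-cache copies of a block have already been parsed: an inode into a dict of fields, a bitmap into a list of bits. Recon itself must work from raw on-disk blocks. The record layout follows [type, id, field, old, new]. `ChangeRecord`, `diff_inode` and `diff_bitmap` are hypothetical names.

```python
from typing import Any, NamedTuple


class ChangeRecord(NamedTuple):
    type: str   # structure type, e.g. "inode" or "bitmap"
    id: int     # inode number, block number, group number, ...
    field: str  # changed field ("--" when the structure is a single value)
    old: Any
    new: Any


def diff_inode(ino, old, new):
    """Compare the read-cache (old) and write-cache (new) copies of one inode,
    each a dict of field -> value, and emit one record per changed field."""
    return [ChangeRecord("inode", ino, f, old.get(f), new.get(f))
            for f in sorted(old.keys() | new.keys())
            if old.get(f) != new.get(f)]


def diff_bitmap(old_bits, new_bits, first_block=0):
    """Emit one record per bit that differs between two equal-length bitmaps;
    the record id is the block number the bit stands for."""
    return [ChangeRecord("bitmap", first_block + i, "--", o, n)
            for i, (o, n) in enumerate(zip(old_bits, new_bits))
            if o != n]


# Usage: a transaction that appends block 501 to inode 12.
before = {"blockptr[1]": 0, "i_size": 4096, "i_blocks": 8}
after = {"blockptr[1]": 501, "i_size": 8192, "i_blocks": 16}
records = diff_inode(12, before, after) + diff_bitmap([1, 0, 0], [1, 1, 0], first_block=500)
```

Unchanged fields and bits produce no records. The invariants therefore only ever see what the transaction actually modified.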


Checking with Change Records

Transaction appends a new block to inode 12:

type     id    field         oldval   newval
inode    12    blockptr[1]   0        501
inode    12    i_size        4096     8192
inode    12    i_blocks      8        16
Bitmap   501   --            0        1
BGD      0     free_blocks   1500     1499

The block allocation invariant holds: bitmap bit 501 flips from “0” to “1”, and a block pointer (blockptr[1] of inode 12) is set to 501.
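The other records in the table can be checked in the same style. The sketch below verifies two accounting rules over the slide's records:

- The block group's free_blocks count moves opposite to the number of bitmap bits allocated.
- i_blocks, which ext3 counts in 512-byte units, grows by one block's worth (8 for a 4 KiB block) per newly set pointer.

These rules are illustrative assumptions, not the talk's list of 31 Ext3 invariants. They ignore indirect blocks and multiple block groups.

```python
BLOCK_SIZE = 4096
SECTORS_PER_BLOCK = BLOCK_SIZE // 512  # ext3's i_blocks counts 512-byte units

# Change records from the table: (type, id, field, oldval, newval)
records = [
    ("inode", 12, "blockptr[1]", 0, 501),
    ("inode", 12, "i_size", 4096, 8192),
    ("inode", 12, "i_blocks", 8, 16),
    ("Bitmap", 501, "--", 0, 1),
    ("BGD", 0, "free_blocks", 1500, 1499),
]


def check_free_blocks(records):
    """The group's free_blocks must change by (bits freed - bits allocated).
    Simplification: all bitmap changes belong to a single block group."""
    allocated = sum(1 for t, _, _, old, new in records
                    if t == "Bitmap" and (old, new) == (0, 1))
    freed = sum(1 for t, _, _, old, new in records
                if t == "Bitmap" and (old, new) == (1, 0))
    delta = sum(new - old for t, _, f, old, new in records
                if t == "BGD" and f == "free_blocks")
    return delta == freed - allocated


def check_i_blocks(records, ino):
    """i_blocks must grow by SECTORS_PER_BLOCK per newly set block pointer.
    Simplification: direct pointers only, no indirect blocks."""
    ptrs = [(old, new) for t, i, f, old, new in records
            if t == "inode" and i == ino and f.startswith("blockptr")]
    net_new = (sum(1 for old, new in ptrs if old == 0 and new != 0)
               - sum(1 for old, new in ptrs if old != 0 and new == 0))
    delta = sum(new - old for t, i, f, old, new in records
                if t == "inode" and i == ino and f == "i_blocks")
    return delta == net_new * SECTORS_PER_BLOCK
```

Both functions return True for the transaction in the table.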


Outline

• Problem
  • Metadata corruption caused by bugs
• Solution
  • Recon
• Key idea
  • Runtime checking
• Design
• Evaluation
  • Complexity
  • Corruption detection
  • Performance overhead

Complexity

• Much simpler than file system code
  • Only needs to verify the results of file system operations
  • Each invariant can be checked independently
• Code divided into three sections
  • Generic Recon framework: 1.5 kLOC
  • Ext3 metadata interpretation: 1.5 kLOC
  • 31 Ext3 invariants: 800 LOC

Corruption Detection

[Figure: corruptions caught (0% to 100%) for each corrupted structure: inode (stat), inode (blk ptr), inode (others), dir, bgd, bbm, ibm, random; each bar is split into detected by both, e2fsck only, and Recon only]

Recon matches e2fsck

Performance Evaluation

• Used a Linux port of Sun’s FileBench
• Used 5 different emulated workloads
  • webserver, webproxy, varmail, fileserver, ms_nfs
  • ms_nfs configured to match the metadata characteristics from a Microsoft study (FAST’11)
• 3 GHz dual core Xeon CPUs, 2 GB RAM
• 1 TB ext3 file system

Performance Evaluation

[Figure: performance impact of Recon on the webserver, webproxy, varmail, fileserver and ms_nfs workloads; cache size = 128 MB]

For reasonable cache sizes, performance impact is modest

Handling Violations

Several options:

• Prevent all writes, remount read-only
  • Preserves correctness
  • Reduces availability
• Take a snapshot of the file system and continue
  • Minimal availability impact; the snapshot is correct
  • Requires repair afterwards
• Micro-reboot the file system or kernel
  • Transparent to applications
  • Overcomes transient failures

Conclusion

• All consistency properties checked by fsck can be enforced on updates without a full disk scan
• Checking can be done outside the file system, entirely at the block layer
• Preventing corruption from being committed is a huge win over after-the-fact repair!

Thanks!

• To our anonymous reviewers
• To our shepherd, Junfeng Yang
• To the Systems Software Reading Group @ U of T
For their many insightful comments & suggestions!

• To Vivek Lakshmanan
For early insights that helped start the project!

This work was supported by NSERC through the Discovery Grants program