Quo vadis Linux File Systems Ext4 or BTRFS Udo Seidel
Osdc2011.ext4btrfs.talk

May 19, 2015

Udo Seidel

Some technical information on EXT4 and BTRFS (Spring 2011)
Transcript
Page 1: Osdc2011.ext4btrfs.talk

Quo vadis Linux File Systems Ext4 or BTRFS

Udo Seidel

Page 2: Osdc2011.ext4btrfs.talk

OSDC 2011 2

Agenda

● Introduction/motivation
● ext4 – the new member of the extfs family
  ● Facts, specs
  ● Migration
● BTRFS – the newbie .. the hope
  ● Facts, specs
  ● Migration
● Summary

Page 3: Osdc2011.ext4btrfs.talk

Linux file systems

● More than 50 file systems shipped with Linux kernel
  ● Local
  ● Remote
  ● Cluster
  ● ...
● A few as standard for root directory
  ● ext2, ext3
  ● XFS

Page 4: Osdc2011.ext4btrfs.talk

Linux file systems – challenges

● ReiserFS sun-setted
● Limitations of ext3
● Changes in recent Enterprise distributions

Page 5: Osdc2011.ext4btrfs.talk

Linux file systems – new players

● New version of the ext family -> ext4
  ● Marked as stable
  ● Shipped with Enterprise distributions
● New approach with BTRFS
  ● Still experimental
  ● Default in some projects, e.g. MeeGo

Page 6: Osdc2011.ext4btrfs.talk

4th extended file system

● Shipped since 2.6.19
● Stable since 2.6.28
● To overcome limits of ext3
  ● Size
  ● Performance

Page 7: Osdc2011.ext4btrfs.talk

Ext4 - history

● Successor of ext3
● Started as set of patches for ext3
● Later forked
  ● First called ext3dev (sometimes ext4dev)
  ● No impact on ext3 stability
  ● Fewer dependencies on ext3 code
  ● Easier to maintain source code

Page 8: Osdc2011.ext4btrfs.talk

Ext4 - facts

● Max volume size: 1 EByte = 1024 PByte
● Max file size: 16 TByte
● Max length of file name: 255 Bytes
● Support of extended attributes
● No encryption
● Not really compression
● Partially 64bit

Page 9: Osdc2011.ext4btrfs.talk

Ext4 – starting from known

● Known tools
  ● mkfs
  ● fsck
  ● tune2fs
  ● e2label

Page 10: Osdc2011.ext4btrfs.talk

Ext4 – global structure I

● Entry point -> superblock
  ● Block size
  ● Number of blocks and inodes
  ● Number of free blocks and inodes
● Disk divided in block groups
  ● Backup of superblock
  ● Block group description (inode/block bitmaps)

Page 11: Osdc2011.ext4btrfs.talk

Ext4 – global structure II

● Similar to ext3
● Inherits some ext3 limitations
  ● Number of inodes per block group
● 2nd type of block groups => flexible
  ● Flexible placement of bitmaps
● Bigger inodes to store additional information
  ● 256 Bytes
  ● Nano second time stamps

Page 12: Osdc2011.ext4btrfs.talk

Ext4 – from blocks to extents

● Common addressing for modern file systems
● Contiguous area of blocks
● Less management information needed
● Fewer meta data operations
● Less “fragmentation”
● Requires change of on-disk format

Page 13: Osdc2011.ext4btrfs.talk

Ext4 – extent I

● 15 bit for extent size
  ● Block size of 4 KByte => 128 MByte
● 1 bit for extent initialization information

struct ext4_extent {
  __le32  ee_block;    /* first logical block extent covers */
  __le16  ee_len;      /* number of blocks covered by extent */
  __le16  ee_start_hi; /* high 16 bits of physical block */
  __le32  ee_start_lo; /* low 32 bits of physical block */
};

Page 14: Osdc2011.ext4btrfs.talk

Ext4 – extent II

● 32 bit for block addresses inside file
  ● Block size of 4 KByte => 16 TByte
● 48 (!) bit for block addresses of file system
  ● Block size of 4 KByte => 1 EByte

Page 15: Osdc2011.ext4btrfs.talk

Ext4 – extent III

● 60 Byte for extent information
  ● 12 Byte for extent header
  ● 12 Byte for extent structure
    – Up to 4 extents per inode
    – max. 512 MByte direct addressable (ext3: 48 KByte)
    – Different schema for bigger files

Page 16: Osdc2011.ext4btrfs.talk

Ext4 – extent tree I

● For files > 512 MByte (or with more than 4 extents)
● B+ tree
● Extent structure only at leaf nodes
● New element: extent index
  ● Same header structure as data extent
  ● Points to data block
  ● Data block contains either extent index or extent structure

Page 17: Osdc2011.ext4btrfs.talk

Ext4 – extent tree II

Page 18: Osdc2011.ext4btrfs.talk

Ext4 – from extents to blocks

● In the end: block allocation
● New features
  ● Multi-block allocation
  ● Delayed allocation
  ● Persistent allocation

Page 19: Osdc2011.ext4btrfs.talk

Ext4 – multi-block allocation

● Ext3: only one block per call
  ● 12800 calls for 50 MByte file
● Ext4: multiple blocks per call
  ● Less overhead
  ● Contiguous physical location of data

Page 20: Osdc2011.ext4btrfs.talk

Ext4 – delayed allocation

● Ext3
  ● Instant block allocation
  ● Fragmentation due to buffers and caches
● Ext4
  ● Delayed block allocation
  ● Use cache information for placement
  ● Risk of data loss in early versions => improved since 2.6.30

Page 21: Osdc2011.ext4btrfs.talk

Ext4 – “clever” allocation

● Support of system call fallocate()
  ● Application reserves blocks ahead
  ● File system ensures disk space availability
● Allocation information in extent structure
  ● Remember 16th bit

Page 22: Osdc2011.ext4btrfs.talk

Ext4 – consistent status

● New journaling => JBD2
  ● Transactions have checksums
  ● 64 bit ready
  ● Deactivation possible

Page 23: Osdc2011.ext4btrfs.talk

Ext4 – repair

● Improved fsck()
  ● No check of unused blocks
    – Information stored in block group header
    – Information secured via checksums
    – (De)activation possible at any time
● First run as slow as in ext3

Page 24: Osdc2011.ext4btrfs.talk

Ext4 – other news

● Nano second precision time stamps
  ● Unix millennium bug (year 2038 problem) shifted to 2514
● More subdirectories
  ● Up to 65000
  ● More than 65000 ... with limitation

Page 25: Osdc2011.ext4btrfs.talk

Ext4 – general migration paths

● mkfs() and backup/restore
  ● Clean new file system structure
  ● Only way for file systems other than ext2/3
  ● Extended outage
● Conversion via tune2fs
  ● Partial only
  ● Only possible for ext family
  ● Faster/easier

Page 26: Osdc2011.ext4btrfs.talk

Ext4 – background for migration

● 2 kinds of changes compared to ext3
  ● Change of on-disk format:
    – Extents
    – Only enabled for new files via tune2fs
    – Additional tasks needed
  ● On-disk format not relevant:
    – Block allocation
    – Immediately enabled via tune2fs

Page 27: Osdc2011.ext4btrfs.talk

Ext4 – migration via tune2fs

● Results in mix of ext3 and ext4 structure
● Access via ext3 driver impossible
● fsck() needed

parameter    description
extent       Extent based block allocation
flex_bg      Flexible placement of meta data
uninit_bg    Flag uninitialized blocks for faster fsck
dir_nlink    Infinite number of sub directories
extra_isize  Timestamps with nano seconds

Page 28: Osdc2011.ext4btrfs.talk

Ext4 – migration hints

● fsck() recommended
● /boot – booting from ext4 possible?
● Rescue media enabled for ext4?

Page 29: Osdc2011.ext4btrfs.talk

Ext4 – summary

● Good successor of ext3
● Manages larger amounts of data
● Faster
  ● Performance
  ● Recovery
● Safer
● Sufficient migration options from ext2/3

Page 30: Osdc2011.ext4btrfs.talk

Better/b-tree file system

● Shipped since 2.6.29
● Still experimental
● Meant to replace ext3/4
● New storage management approach

Page 31: Osdc2011.ext4btrfs.talk

BTRFS - history

● Basic idea
  ● Shown 2007
  ● Usage of B trees for standard structures
  ● Not new ... see XFS, ReiserFS
● Chris Mason
  ● Worked on ReiserFS for SUSE
  ● Moved to Oracle -> started BTRFS development

Page 32: Osdc2011.ext4btrfs.talk

BTRFS - facts

● Max file/volume size: 16 EByte
● Max length of file name: 255 Bytes
● Support of
  ● Extended attributes
  ● Encryption (planned)
  ● Compression
  ● Snapshot
  ● Copy-on-Write

Page 33: Osdc2011.ext4btrfs.talk

BTRFS – global structure

● Entry point -> superblock
● More than one file system per volume
● Extents
  ● Put together in block groups
  ● No mix of data and meta data

Page 34: Osdc2011.ext4btrfs.talk

BTRFS – internals: the trees

● Consists of B+ trees
  ● Root tree
  ● File system tree
  ● Extent allocation tree
  ● Checksum tree
  ● Log tree
  ● Chunk & device tree
  ● Data relocation tree

Page 35: Osdc2011.ext4btrfs.talk

BTRFS – internals: structures

● 3 structures
  ● Key
    – Index of the tree structure
  ● Block header
    – ID of file system
    – Reference of insert time
    – Level position
  ● Item
    – Different types: inodes, extents, directories

Page 36: Osdc2011.ext4btrfs.talk

BTRFS – internals: the key

● Index of the tree structure
● Size: 136 bit
● First 64 bit: unique object ID
● Next 8 bit: type/item
● Last 64 bit: item dependent
  ● e.g. Hash of directory name
  ● e.g. Number of elements in directory
  ● e.g. object ID of upper layer directory

Page 37: Osdc2011.ext4btrfs.talk

BTRFS – internals: the item

● More than one item per object ID possible

Item         Value
INODE_ITEM   1
XATTR_ITEM   24
DIR_ITEM     84
DIR_INDEX    96
EXTENT_DATA  108
EXTENT_CSUM  128
ROOT_ITEM    132
EXTENT_ITEM  168

Page 38: Osdc2011.ext4btrfs.talk

BTRFS – more about trees

● Highest layer
  ● Root tree
  ● Referenced in superblock
  ● Other trees => object ID in root tree
● Some trees unique
  ● Extent allocation
  ● Data relocation
● Possibly multiple trees
  ● File system

Page 39: Osdc2011.ext4btrfs.talk

BTRFS – file system tree

● Visible part
● Contains:
  ● Inode items
  ● Reference items
● No data of files
  ● See extents
  ● Exception: small files

Page 40: Osdc2011.ext4btrfs.talk

BTRFS – extent allocation tree

● Space management
● Backward reference
  ● File system object
  ● Possibly multiple per extent
  ● Maybe move to extent data reference object

Page 41: Osdc2011.ext4btrfs.talk

BTRFS – other trees

● Log tree
  ● Collects fsync() calls
  ● Journal of this kind of COW calls
● Checksum tree
  ● CRC32C checksums of data and meta data
● Chunk tree
  ● Manage devices: device item and chunk map item
● Device tree
  ● Counterpart of chunk tree

Page 42: Osdc2011.ext4btrfs.talk

BTRFS – device management

● Included volume manager
● Pool concept
● RAID-0 and RAID-1
  ● For data and meta data
  ● Not necessarily identical
● Chunk tree
  ● Abstract from disk block

Page 43: Osdc2011.ext4btrfs.talk

BTRFS – extents, chunks, blocks

Page 44: Osdc2011.ext4btrfs.talk

BTRFS – what else

● Transparent compression via zlib
● Support of POSIX ACLs
● Online grow/shrink
● Online add/removal of disks
● No fsck() tool (yet)
● Management tool evolution (btrfsctl -> btrfs)

Page 45: Osdc2011.ext4btrfs.talk

BTRFS – migration I

● Via tool btrfs-convert
● du/df not fully BTRFS-aware
● In place from ext3/4
  ● Via libe2fs
  ● BTRFS meta data location flexible
  ● Old ext3/4 organized in snapshot
  ● Roll-back possible to date/time of conversion

Page 46: Osdc2011.ext4btrfs.talk

BTRFS – migration II

Page 47: Osdc2011.ext4btrfs.talk

BTRFS summary

● Still experimental
● Meets standard file systems requirements
● Bridges existing gaps
  ● e.g. snapshots
● Easy migration from ext3/4 possible
● New approach to storage management
  ● e.g. included volume manager

Page 48: Osdc2011.ext4btrfs.talk

Summary

● Improvement moving to ext4
● Safe switching to ext4
● In place migration from ext3 possible
● Future is BTRFS
● In place migration from ext3/4 to BTRFS possible

Page 49: Osdc2011.ext4btrfs.talk

References

● http://ext4.wiki.kernel.org
● http://btrfs.wiki.kernel.org

Page 50: Osdc2011.ext4btrfs.talk

Thank you!