
EROFS: A Compression-friendly Readonly File System for Resource-scarce Devices

Xiang Gao#, Mingkai Dong*, Xie Miao#, Wei Du#, Chao Yu#, Haibo Chen*,#

#Huawei Technologies Co., Ltd.
*IPADS, Shanghai Jiao Tong University

USENIX ATC 2019, Renton, WA, USA


System resources in Android

[Chart: /system partition size (MB) by Android version: 2.3.6: 512, 4.3: 654, 5.1: 650, 6.0: 840, 7.0: 2654, 8.0: 3072]

System resources consume significant storage!
Read-only system partitions → compressed read-only file systems

Read-only partitions: /system, /oem, /odm, /vendor


Existing solution

Squashfs is the state-of-the-art compressed read-only file system.
We tried to use Squashfs for system resources in Android:

/system, /oem, /vendor, /odm: each mounted as a Squashfs image

Result: The system lagged and even froze for seconds and then rebooted.


Why does Squashfs fail?

Fixed-sized input compression:
1. Divide the data into fixed-size chunks
2. Compress each chunk
3. Concatenate the compressed chunks
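To make the scheme concrete, here is a minimal userspace sketch of fixed-sized input compression, assuming 128KB chunks and the LZ4 library's LZ4_compress_default(). It is illustrative only, not Squashfs's actual tooling; chunk indexing and error handling are omitted.

```c
/* Fixed-sized input compression sketch: fixed-size input chunks,
 * variable-size compressed output, concatenated back to back. */
#include <lz4.h>
#include <stdio.h>

#define CHUNK_SIZE (128 * 1024)

static void compress_fixed_input(const char *src, long len, FILE *out)
{
    char cbuf[LZ4_COMPRESSBOUND(CHUNK_SIZE)];

    while (len > 0) {
        int in = len < CHUNK_SIZE ? (int)len : CHUNK_SIZE;

        /* Each chunk is compressed independently, so a later 1-byte read
         * must decompress the whole chunk that contains it. */
        int csize = LZ4_compress_default(src, cbuf, in, (int)sizeof(cbuf));
        if (csize <= 0)
            break;

        fwrite(cbuf, 1, csize, out);   /* compressed chunks are concatenated */
        src += in;
        len -= in;
    }
}
```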

Read amplification: serving even a 1-byte read requires fetching the whole compressed chunk that contains it (I/O amplification) and decompressing the entire chunk (decompression amplification). For example, with 128KB chunks, a 1-byte read forces the whole 128KB chunk to be read and decompressed.

Massive memory consumption: the compressed chunk is read into buffer_head pages, decompressed into a separate buffer, and only then copied into the page cache of the requested file, so each read pays for several page allocations plus an extra data copy.

EROFS


Fixed-sized output compression:
1. Prepare a large amount of data
2. Compress it into one fixed-size block
3. Repeat

✓ Reduces read amplification
✓ Better compression ratio
✓ Allows in-place decompression
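For contrast, here is a minimal userspace sketch of fixed-sized output compression, assuming 4KB output blocks and the LZ4 library's LZ4_compress_destSize(), which compresses only as much input as fits into a fixed-size destination. It is illustrative only, not mkfs.erofs; block index metadata and error handling are omitted.

```c
/* Fixed-sized output compression sketch: variable-size input,
 * fixed-size (4KB) compressed output blocks. */
#include <lz4.h>
#include <stdio.h>
#include <string.h>

#define BLK_SIZE 4096

/* Compress `len` bytes from `src`, writing successive 4KB blocks to `out`.
 * Returns the number of output blocks produced. */
static size_t compress_fixed_output(const char *src, int len, FILE *out)
{
    char blk[BLK_SIZE];
    size_t nblocks = 0;

    while (len > 0) {
        int consumed = len;   /* in: bytes available, out: bytes consumed */
        int csize = LZ4_compress_destSize(src, blk, &consumed, BLK_SIZE);

        if (csize <= 0 || consumed <= 0)
            break;            /* compression failed, stop the sketch here */

        /* Pad so every on-disk block is exactly 4KB. */
        memset(blk + csize, 0, BLK_SIZE - csize);
        fwrite(blk, 1, BLK_SIZE, out);

        src += consumed;      /* the next block starts where this one stopped */
        len -= consumed;
        nblocks++;
    }
    return nblocks;
}
```

Because every on-disk block is exactly 4KB, a read only has to touch the blocks that cover the requested range, which is what bounds read amplification.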

Choosing the page for I/O

Cached I/O, for partially decompressed blocks: allocate a page in a dedicated page cache for the I/O.
✓ Following decompression can reuse the cached page.

In-place I/O, for blocks to be fully decompressed: reuse the page allocated by the VFS if possible.
✓ Memory allocation and consumption are reduced.
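A hypothetical sketch of this two-way choice, using the kernel's find_or_create_page(); the function name pick_io_page, the managed_cache mapping, and the arguments are illustrative stand-ins rather than the real EROFS interfaces.

```c
#include <linux/gfp.h>
#include <linux/pagemap.h>

static struct page *pick_io_page(struct address_space *managed_cache,
				 struct page *vfs_page, pgoff_t blkaddr,
				 bool fully_decompressed)
{
	if (!fully_decompressed)
		/* Cached I/O: keep this partially used compressed block in a
		 * dedicated page cache, so later requests can decompress the
		 * rest of it without re-reading from storage. */
		return find_or_create_page(managed_cache, blkaddr, GFP_KERNEL);

	/* In-place I/O: the block will be fully decompressed right away, so
	 * reuse a page the VFS already allocated for the requested file and
	 * avoid an extra allocation. */
	return vfs_page;
}
```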

Decompression

Vmap decompression:
1. Count the data blocks to decompress.
2. Allocate temporary physical pages, or choose pages already allocated by the VFS.
3. Allocate a contiguous VM area via vmap() and map the physical pages into it.
4. For in-place I/O, copy the compressed block to a temporary per-CPU page.
5. Decompress into the VM area.

[Figure: the pages of the requested file's page cache and the pages mapped into the vmap()-ed VM area are the same physical pages.]
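The following is a simplified kernel-style sketch of steps 3 to 5, assuming stock kernel APIs (vmap()/vunmap(), kmap(), and the in-kernel LZ4_decompress_safe()). The function name is made up, error paths are shortened, and a freshly allocated page stands in for the per-CPU page, so this is not the real EROFS decompressor.

```c
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/lz4.h>
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/vmalloc.h>

static int vmap_decompress_sketch(struct page **out_pages, unsigned int nr_out,
				  struct page *in_page, unsigned int in_len,
				  bool inplace_io)
{
	unsigned long tmp = 0;
	void *out, *in;
	int ret;

	/* Step 3: map the output pages (temporary or VFS-allocated) into one
	 * contiguous kernel virtual area. */
	out = vmap(out_pages, nr_out, VM_MAP, PAGE_KERNEL);
	if (!out)
		return -ENOMEM;

	in = kmap(in_page);
	if (inplace_io) {
		/* Step 4: with in-place I/O the compressed block lives in one
		 * of the output pages, so copy it aside (stand-in for the
		 * per-CPU page) before decompression overwrites it. */
		tmp = __get_free_page(GFP_KERNEL);
		if (!tmp) {
			kunmap(in_page);
			vunmap(out);
			return -ENOMEM;
		}
		memcpy((void *)tmp, in, in_len);
		kunmap(in_page);
		in = (void *)tmp;
	}

	/* Step 5: decompress the fixed-size compressed block into the VM area. */
	ret = LZ4_decompress_safe(in, out, in_len, (int)(nr_out * PAGE_SIZE));

	if (tmp)
		free_page(tmp);
	else
		kunmap(in_page);
	vunmap(out);
	return ret < 0 ? -EIO : 0;
}
```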

Decompression

Vmap decompression
✓ Works for all cases
✘ Frequent vmap/vunmap
✘ Unbounded physical page allocations
✘ Data copy for in-place I/O

Buffer decompression: pre-allocate four-page per-CPU buffers
✘ Only for decompressions < 4 pages
✓ No vmap/vunmap
✓ No physical page allocation
✓ No data copy for in-place I/O

Rolling decompression
✓ For decompressions smaller than a pre-allocated VM area
✓ No vmap/vunmap
✓ No physical page allocation
✘ Data copy for in-place I/O

In-place decompression
✓ No data copy for in-place I/O

Details in the paper:
• Decompression policy
• Optimizations

Evaluation Setup

Platform              CPU                                           DRAM   Storage
HiKey 960             Kirin 960 (Cortex-A73 x4 + Cortex-A53 x4)     3 GB   32 GB UFS
Low-end smartphone    MT6765 (Cortex-A53 x8)                        2 GB   32 GB eMMC
High-end smartphone   Kirin 980 (Cortex-A76 x4 + Cortex-A55 x4)     6 GB   64 GB UFS

Micro-benchmarks
• Platform: HiKey 960
• Tool: FIO
• Workload: enwik9, silesia.tar

Application benchmarks
• Platform: smartphones
• Workload: 13 popular applications

Evaluated file systems
• EROFS: LZ4, 4KB fixed-sized output
• Squashfs: LZ4, {4, 8, 16, 128}KB chunk size
• Btrfs: LZO, 128KB chunk size, read-only mode w/o integrity checks
• Ext4: no compression
• F2FS: no compression


More results in the paper

Micro-benchmark: FIO Throughput

[Figure: FIO read throughput (MB/s) for sequential, random, and stride patterns on enwik9, comparing EROFS, Squashfs-4K/8K/16K/128K, Ext4, F2FS, and Btrfs; HiKey 960, A73 at 2362 MHz.]

Btrfs performs worst in all cases.
A larger chunk size brings better performance for Squashfs, if the cached results are used.
EROFS performs comparably to or even better than Ext4.

Read Amplification and Resource Consumption

                  |        I/O (MB)         | Consumption (MB)
                  | Seq.   Random   Stride  | Storage   Memory
Requested/Ext4    | 16     16       16      | 953.67    988.51
Squashfs-4K       | 10.65  26.19    26.23   | 592.43    1597.50
Squashfs-8K       | 9.82   33.52    34.08   | 530.43    1534.09
Squashfs-16K      | 9.05   46.42    48.32   | 479.38    1481.12
Squashfs-128K     | 7.25   165.27   203.91  | 379.76    1379.84
EROFS             | 10.14  26.12    25.93   | 533.67    1036.88

Real-world Application Boot Time

[Figure: boot time of 13 popular applications, relative to Ext4 (lower is better).
Low-end smartphone:  97% 87% 95% 104% 92% 104% 104% 97% 110% 104% 89% 90% 85% (3.2% lower on average)
High-end smartphone: 95% 86% 89% 81% 93% 89% 85% 101% 77% 95% 81% 99% 87% (10.9% lower on average)]

Deployment

✓ Deployed in HUAWEI EMUI 9.1 as a top feature
✓ Upstreamed to Linux 4.19
✓ System storage consumption decreased by >30%
✓ Performance comparable to or even better than Ext4
✓ Running on 10,000,000+ smartphones

Conclusion

EROFS: an Enhanced Read-Only File System with compression support.
Fixed-sized output compression with four decompression approaches:
• Vmap decompression
• Buffer decompression
• Rolling decompression
• In-place decompression

Running on 10,000,000+ smartphones:
✓ Reduces system storage consumption by >30%
✓ Provides performance comparable to or even better than Ext4

Thank you & questions?



Backup Slides


Throughput and space savings

[Figure: FIO throughput (MB/s) vs. space savings (%) for EROFS and Ext4, sequential and random reads; customized workloads generated from enwik9, HiKey 960 A73 at 2362 MHz. Annotations: "Computation > I/O", "I/O > Computation", "Less I/O => Better Throughput".]

Rolling decompression

Observation: during decompression, LZ4 looks backward at most 64KB into already decompressed data.

Pre-allocate a large VM area and 17 physical pages per CPU, and use the 17 physical pages in turn (1, 2, ..., 17, 1, 2, ...): 64KB of history is 16 pages, plus the page currently being written.
✓ For decompressions smaller than the VM area size
✓ No vmap/vunmap
✓ No physical page allocation
✘ Data copy for in-place I/O
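A tiny illustrative calculation of why 17 pages suffice, assuming 4KB pages and LZ4's 64KB match window; the helper names are hypothetical, and the real rolling decompressor remaps these per-CPU pages inside the pre-allocated VM area rather than indexing an array.

```c
#include <assert.h>

#define PAGE_SIZE   4096
#define LZ4_WINDOW  (64 * 1024)
#define NR_ROLL     17            /* 64KB window = 16 pages, plus the page being written */

/* Which of the 17 per-CPU pages backs decompressed page `idx`? */
static inline int roll_page_slot(unsigned long idx)
{
    return (int)(idx % NR_ROLL);
}

/* When page `idx` reuses a slot, the page previously in that slot was
 * idx - 17; it ends at least 64KB before any position inside page idx,
 * and LZ4 offsets are at most 65,535 bytes, so it can no longer be
 * referenced and is safe to recycle. */
static void check_window(unsigned long idx)
{
    if (idx >= NR_ROLL) {
        unsigned long reused_end = (idx - NR_ROLL + 1) * PAGE_SIZE;
        unsigned long write_pos  = idx * PAGE_SIZE;   /* earliest position in page idx */
        assert(write_pos - reused_end >= LZ4_WINDOW);
    }
}
```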

In-place decompression

Applicable when no corruption can happen, i.e., decompression will not overwrite compressed data that still needs to be read.
✓ For decompressions smaller than the VM area size
✓ No vmap/vunmap
✓ No physical page allocation
✓ No data copy for in-place I/O

[Figure: before/after view of in-place decompression; if the compressed sequence A sits where its own decompressed output lands, sequence A would be corrupted, so the in-place path is used only when this cannot occur.]
