Ward: Efficiently Mitigating Transient Execution Attacks using the Unmapped Speculation Contract
Jonathan Behrens, Anton Cao, Cel Skeggs, Adam Belay, M. Frans Kaashoek, Nickolai Zeldovich
May 20, 2022



Transient execution attacks risk leaking information


Linux maintains security using software mitigations


Software mitigations are expensive


LEBench [SOSP ’19] with and without mitigations on Linux


Goal: faster mitigations

Threat model

● Similar security to Linux

Main ideas

● Unmapped Speculation Contract

● Ward kernel design


Transient execution attack example

char array[SIZE];
int secret;
char shared[256 * CACHE_LINE];

// vulnerable system call code
// if sysarg >= SIZE
if (sysarg < SIZE) { // speculate taken
    y = array[sysarg];
    z = shared[y * CACHE_LINE];
}

// userspace attacker code
secret = is_in_cache(&shared[0]);

[Figure: Memory and Cache, showing array, secret, shared, y, and z]
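The attack above can be simulated in software to make the leak concrete. This is a toy model, not a working exploit. `ToyMachine`, `syscall_speculate_taken`, `probe_shared`, and `leak_secret` are hypothetical names. The cache is modeled as a set of touched `shared` lines, and misprediction is modeled by simply running the branch body. The slide's attacker line checks only `shared[0]`, which reveals whether the secret is 0; this sketch probes all 256 lines instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical toy model, for illustration only: `memory` holds `array`
// followed by `secret`, and the "cache" is the set of `shared` line indices
// that have been touched.
constexpr std::size_t SIZE = 16;

struct ToyMachine {
    std::array<unsigned char, SIZE + 1> memory{};  // array[SIZE], then secret
    std::set<unsigned> cached_lines;               // cached lines of `shared`

    // Vulnerable syscall with the bounds check speculated as taken.
    void syscall_speculate_taken(std::size_t sysarg) {
        assert(sysarg <= SIZE);            // keep the model itself in bounds
        unsigned char y = memory[sysarg];  // y = array[sysarg]
        cached_lines.insert(y);            // z = shared[y * CACHE_LINE]
        // The squashed instructions leave no architectural state,
        // but the touched cache line remains.
    }
};

// Attacker: probe every line of `shared` (flush+reload style) and report
// the cached one. A real attack infers this from access timing.
int probe_shared(const ToyMachine& m) {
    for (unsigned i = 0; i < 256; ++i)
        if (m.cached_lines.count(i) != 0) return static_cast<int>(i);
    return -1;
}

// One attack round: sysarg == SIZE reads one past `array`, i.e. the secret.
int leak_secret(unsigned char secret) {
    ToyMachine m;
    m.memory[SIZE] = secret;
    m.syscall_speculate_taken(SIZE);
    return probe_shared(m);
}
```

For example, `leak_secret(42)` returns 42. The out-of-bounds read never becomes architectural, yet its cache footprint still reveals the secret.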


Typical mitigation approach

char array[SIZE];
int secret;
char shared[256 * CACHE_LINE];

// vulnerable system call code
// if sysarg >= SIZE
if (sysarg < SIZE) { // speculate taken
    lfence(); // prevents speculation
    y = array[sysarg];
    z = shared[y * CACHE_LINE];
}

// userspace attacker code
secret = is_in_cache(&shared[0]);

[Figure: Memory and Cache, showing shared, array, and secret]
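In real C++ on x86-64, the slide's `lfence()` call corresponds to the `_mm_lfence()` intrinsic. This is a minimal sketch that assumes x86-64 with GCC or Clang, which define `__x86_64__`. `read_with_fence` is a hypothetical name. On other targets the sketch compiles without any barrier, so it offers no protection there.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__x86_64__)
#include <immintrin.h>  // _mm_lfence
#endif

constexpr std::size_t SIZE = 16;
constexpr std::size_t CACHE_LINE = 64;

char array[SIZE];
char shared[256 * CACHE_LINE];

// Bounds-checked read with a speculation barrier after the check.
// LFENCE keeps later instructions from executing until earlier ones have
// completed, so the loads below cannot run ahead of the branch outcome.
int read_with_fence(std::size_t sysarg) {
    if (sysarg < SIZE) {
#if defined(__x86_64__)
        _mm_lfence();
#endif
        unsigned char y = static_cast<unsigned char>(array[sysarg]);
        volatile char z = shared[y * CACHE_LINE];  // keep the dependent load
        (void)z;
        return y;
    }
    return -1;  // out of bounds: no access at all
}
```

The fence sits on every execution of this path, including the common in-bounds case. That per-call cost is what the following slides try to avoid.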


Ward has a different approach

char array[SIZE];
int secret;
char shared[256 * CACHE_LINE];

// vulnerable system call code
// if sysarg >= SIZE
if (sysarg < SIZE) { // speculate taken
    y = array[sysarg];
    z = shared[y * CACHE_LINE];
}

// userspace attacker code
secret = is_in_cache(&shared[0]);

[Figure: Memory and Cache, showing shared, array, and secret; the speculative access to secret causes a page fault because the secret is not mapped]
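The Ward approach can be modeled by adding a page table to the toy machine. This is a sketch under stated assumptions. `ToyAddressSpace` and `leaked_value` are hypothetical, and each page holds a single byte. The model simply assumes the USC holds, so a speculative load from an unmapped page forwards no value to dependent instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy model, for illustration only. `mapped` is the current
// page table; the USC is assumed, so loads from unmapped pages fault
// without forwarding a value.
struct ToyAddressSpace {
    std::set<std::size_t> mapped;      // pages present in the page table
    std::vector<unsigned char> pages;  // one byte of data per page
    std::set<unsigned> cached_lines;   // lines of `shared` that are cached

    // Speculative: y = mem[page]; z = shared[y * CACHE_LINE];
    void speculative_load(std::size_t page) {
        if (mapped.count(page) == 0) return;  // page fault: nothing forwarded
        unsigned char y = pages.at(page);
        cached_lines.insert(y);
    }
};

// Returns the value visible through the cache, or -1 if nothing leaked.
int leaked_value(bool secret_mapped) {
    ToyAddressSpace space;
    space.pages = {1, 2, 42};  // pages 0-1: array, page 2: secret
    space.mapped = {0, 1};
    if (secret_mapped) space.mapped.insert(2);
    space.speculative_load(2);  // out-of-bounds access reaches the secret
    if (space.cached_lines.empty()) return -1;
    return static_cast<int>(*space.cached_lines.begin());
}
```

Unlike the `lfence` version, the vulnerable code is unchanged. Protection comes from what is absent from the page table, not from an instruction on the hot path.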


Our observation: Unmapped Speculation Contract (USC)

If some memory has never been mapped in the current address space...

CPU state should be unaffected by values stored there
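The contract covers memory that has never been mapped in the address space. A kernel that relies on it therefore has to remember more than the current page table. This is a minimal bookkeeping sketch that assumes pages are identified by number. `AddressSpace`, `usc_protected`, and `unmapping_is_not_enough` are hypothetical names, not Ward's API.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical bookkeeping, for illustration only: records every page that
// was ever mapped, because the USC as stated only covers never-mapped memory.
class AddressSpace {
public:
    void map(std::uint64_t page) {
        present_.insert(page);
        ever_mapped_.insert(page);
    }
    void unmap(std::uint64_t page) { present_.erase(page); }  // history is kept

    bool is_mapped(std::uint64_t page) const { return present_.count(page) != 0; }

    // True only if the page has never been mapped in this address space,
    // which is the condition the USC is stated for.
    bool usc_protected(std::uint64_t page) const {
        return ever_mapped_.count(page) == 0;
    }

private:
    std::unordered_set<std::uint64_t> present_;
    std::unordered_set<std::uint64_t> ever_mapped_;
};

// Unmapping page 7 does not bring it back under the contract; page 8,
// which was never mapped, stays covered.
bool unmapping_is_not_enough() {
    AddressSpace space;
    space.map(7);
    space.unmap(7);
    return !space.is_mapped(7) && !space.usc_protected(7) && space.usc_protected(8);
}
```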


USC is a good hardware-software contract

● Allows most speculation

● Processors seem to be able to provide it:

“AMD processors are designed to not speculate into memory that is not valid in the current virtual address memory range defined by the software defined page tables.”

— “Speculation behavior in AMD micro-architectures” white paper


Design


Split kernel to leverage USC

Ward extends Linux’s PTI:

● K-domain (“kernel domain”) has a page table with all physical memory

[Figure: K-domain address space with User, Kernel Text, and Direct Map regions; address labels 0x000000000000, 0x800000000000, and 0xfffffffffffff]


The Ward kernel is split in half

Ward extends Linux’s PTI:

● K-domain (“kernel domain”) has a page table with all physical memory

● Q-domain (“quasi-visible domain”) has a page table with user mappings and safe kernel mappings.

[Figure: Q-domain address space with User, Kernel Text, and Direct Map regions; address labels 0x000000000000, 0x800000000000, and 0xfffffffffffff]
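The split between the two domains can be summarized as two sets of mappings. This sketch assumes the direct map can be divided into public and secret pages. The `DirectMapPublic`/`DirectMapSecret` split and all names here are hypothetical.

```cpp
#include <cassert>
#include <set>

// Hypothetical model of the two page tables, for illustration only.
enum class Region { User, KernelText, DirectMapPublic, DirectMapSecret };

// K-domain: everything, including all of physical memory.
std::set<Region> k_domain_mappings() {
    return {Region::User, Region::KernelText, Region::DirectMapPublic,
            Region::DirectMapSecret};
}

// Q-domain: user mappings plus safe kernel mappings only.
std::set<Region> q_domain_mappings() {
    return {Region::User, Region::KernelText, Region::DirectMapPublic};
}

// A region is shielded by the USC while in the Q-domain if the kernel can
// reach it (K-domain) but it is never mapped in the Q-domain.
bool usc_hides_from_q(Region r) {
    return k_domain_mappings().count(r) != 0 && q_domain_mappings().count(r) == 0;
}
```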


Syscalls start executing in the Q-domain

● Any syscall or trap handler that doesn’t access any secret data will run entirely in the Q-domain.

● When this happens, we are able to avoid many mitigations:

○ No need for page table swap

○ Don’t have to flush microarchitectural buffers

○ Retpolines are not required

[Figure: Q-domain with User and Kernel Text mapped]
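The cost difference between the two paths can be sketched as a function that maps a syscall to the mitigations it pays. All names are hypothetical, and treating `getpid` as touching no secret data is an illustrative assumption. The three mitigations listed are the ones this slide says the Q-domain avoids.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch, for illustration only.
enum class Mitigation { PageTableSwap, BufferFlush, Retpoline };

struct Syscall {
    const char* name;
    bool touches_secret_data;  // would the handler need the K-domain?
};

std::vector<Mitigation> mitigations_for(const Syscall& s) {
    if (!s.touches_secret_data) return {};  // runs entirely in the Q-domain
    // Otherwise a world switch is needed and the mitigated paths are taken.
    return {Mitigation::PageTableSwap, Mitigation::BufferFlush,
            Mitigation::Retpoline};
}
```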


...but sometimes we must enter the K-domain

[Figure: a world switch from the Q-domain (User, Kernel Text) to the K-domain (User, Kernel Text)]


World switches use two stacks

[Figure: the Q-domain (Q-stack, Q-text) and the K-domain (Q-stack, K-stack, K-text), with steps 1: switch page table, 2: memcpy, 3: resume executing]

Steps in a world switch:

1. Switch to the K-domain page table

2. Copy the Q-stack contents to the K-stack

3. Resume executing
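The three steps of a world switch can be modeled in C++. `Cpu`, `world_switch`, and `demo_world_switch` are hypothetical. The page-table switch is represented by a string rather than a write to CR3. The stack is assumed to grow down, so the live region runs from the stack pointer to the top.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of a world switch, for illustration only.
constexpr std::size_t STACK_SIZE = 4096;

struct Cpu {
    const char* page_table;     // stands in for the page-table base register
    unsigned char* stack_base;  // base of the active kernel stack
    std::size_t sp_offset;      // stack pointer as an offset from the base
};

void world_switch(Cpu& cpu, unsigned char* k_stack) {
    // 1. Switch to the K-domain page table.
    cpu.page_table = "K-domain";
    // 2. Copy the live part of the Q-stack to the same offsets on the K-stack;
    //    matching stack layouts keep the copied frames valid.
    std::memcpy(k_stack + cpu.sp_offset, cpu.stack_base + cpu.sp_offset,
                STACK_SIZE - cpu.sp_offset);
    cpu.stack_base = k_stack;
    // 3. Resume executing. Assumed here: K-text sits at the same instruction
    //    addresses, so execution simply continues in K-text.
}

bool demo_world_switch() {
    static unsigned char q_stack[STACK_SIZE];
    static unsigned char k_stack[STACK_SIZE];
    q_stack[STACK_SIZE - 1] = 0xAB;  // a value pushed on the Q-stack
    Cpu cpu{"Q-domain", q_stack, STACK_SIZE - 16};
    world_switch(cpu, k_stack);
    return cpu.stack_base == k_stack && k_stack[STACK_SIZE - 1] == 0xAB &&
           std::strcmp(cpu.page_table, "K-domain") == 0;
}
```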


Q and K Kernel

● Both code segments are compiled identically

○ Matching instruction addresses and stack layouts

● At runtime, Q-text has mitigations patched out

○ lfence

○ verw

○ retpoline

[Figure: the Q-domain (Q-stack, Q-text) and the K-domain (Q-stack, K-stack, K-text)]
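Patching out mitigations while keeping both text segments the same size can be sketched as overwriting each recorded mitigation site with same-length NOPs. `PatchSite` and `make_q_text` are hypothetical, and 0x90 is the x86 one-byte NOP. NOP padding fits `lfence` and `verw` sites. Replacing a retpoline thunk call with a plain indirect call is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr unsigned char NOP = 0x90;  // x86 one-byte NOP

// A mitigation instruction sequence recorded at build time (hypothetical).
struct PatchSite {
    std::size_t offset;  // byte offset of the sequence in the text segment
    std::size_t length;  // number of bytes it occupies
};

// Builds Q-text from K-text. Every byte keeps its offset, so instruction
// addresses and stack layouts stay identical across the two copies.
std::vector<unsigned char> make_q_text(const std::vector<unsigned char>& k_text,
                                       const std::vector<PatchSite>& sites) {
    std::vector<unsigned char> q_text = k_text;
    for (const PatchSite& s : sites)
        for (std::size_t i = 0; i < s.length; ++i)
            q_text.at(s.offset + i) = NOP;  // same length, so nothing moves
    return q_text;
}
```

For example, with K-text bytes `55 0F AE E8 C3` (push rbp; lfence; ret) and one site covering the three-byte `lfence` at offset 1, the Q-text bytes become `55 90 90 90 C3`.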


Redesigning the kernel to avoid switches

● Kernel data structures may mix secret and non-secret data

struct proc {
    proc* next;
    int pid;
    uint64_t saved_regs[16];
    ...
};

is split into

struct proc_public {
    proc_public* next;
    int pid;
    ...
};

struct proc_private {
    proc_public* pproc;
    uint64_t saved_regs[16];
    ...
};
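After the split, a handler that only needs public fields never dereferences `proc_private`. This sketch uses the slide's struct names with the `...` fields omitted. `sys_getpid`, `saved_reg`, and `demo_split` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Public half: safe to map in the Q-domain.
struct proc_public {
    proc_public* next;
    int pid;
};

// Private half: holds secrets such as saved registers (K-domain only).
struct proc_private {
    proc_public* pproc;  // back-pointer to the public half
    std::uint64_t saved_regs[16];
};

// A getpid-style handler reads only proc_public, so it could run entirely
// in the Q-domain.
int sys_getpid(const proc_public* p) { return p->pid; }

// Reading saved registers needs proc_private, and therefore the K-domain.
std::uint64_t saved_reg(const proc_private* pp, int i) { return pp->saved_regs[i]; }

bool demo_split() {
    proc_public pub{nullptr, 42};
    proc_private priv{&pub, {}};
    priv.saved_regs[0] = 7;
    return sys_getpid(priv.pproc) == 42 && saved_reg(&priv, 0) == 7;
}
```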


Manipulating page tables while in the Q-domain

● The physical memory pages backing the page tables are themselves mapped in the Q-domain

● This powerful capability enables the Q-domain to:

○ Allocate anonymous memory

○ Create temporary mappings

○ Move kernel pages into/out of the Q-domain
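Because the page-table pages are mapped in the Q-domain, creating a temporary mapping reduces to a write to a page-table entry. This simplified model assumes a page table can be represented as a map from virtual page number to physical frame number. Real x86-64 page tables are four- or five-level radix trees. `PageTable`, `with_temporary_mapping`, and `demo_temp_mapping` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical flat page table, for illustration only.
struct PageTable {
    std::map<std::uint64_t, std::uint64_t> entries;  // vpn -> pfn

    void map(std::uint64_t vpn, std::uint64_t pfn) { entries[vpn] = pfn; }
    void unmap(std::uint64_t vpn) { entries.erase(vpn); }
    std::optional<std::uint64_t> translate(std::uint64_t vpn) const {
        auto it = entries.find(vpn);
        if (it == entries.end()) return std::nullopt;
        return it->second;
    }
};

// Create a temporary mapping of `pfn`, use it, then remove it. A real
// kernel would also flush the stale TLB entry after unmapping.
bool with_temporary_mapping(PageTable& q_pt, std::uint64_t scratch_vpn,
                            std::uint64_t pfn) {
    q_pt.map(scratch_vpn, pfn);
    bool mapped_ok = q_pt.translate(scratch_vpn) == pfn;
    // The Q-domain would access the frame through scratch_vpn here.
    q_pt.unmap(scratch_vpn);
    return mapped_ok && !q_pt.translate(scratch_vpn).has_value();
}

bool demo_temp_mapping() {
    PageTable q_pt;
    return with_temporary_mapping(q_pt, 0x1000, 42);
}
```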


Allocating memory without world switches

● A per-core list of zeroed memory pages mapped in the Q-domain

○ Refreshed in batches

● Used for a variety of purposes:

○ Page tables

○ Q-domain kernel data structures

○ Lazy allocation of user memory

[Figure: allocator layers: Buddy Allocator (4KB to 32MB), Per-core free list (4KB), zeroed alloc (4KB), public alloc (4KB), Q-domain alloc (4KB)]
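The per-core list with batched refills can be sketched as follows. `GlobalAllocator` stands in for the buddy allocator, `BATCH = 8` is an arbitrary batch size, and all names are hypothetical. The point of batching is that the slower refill path runs once per `BATCH` allocations rather than once per page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t PAGE_SIZE = 4096;
constexpr std::size_t BATCH = 8;  // hypothetical batch size

using Page = std::unique_ptr<unsigned char[]>;

// Stand-in for the buddy allocator (the slow path in this sketch).
struct GlobalAllocator {
    std::size_t refills = 0;  // number of batch refills requested
    std::vector<Page> alloc_batch(std::size_t n) {
        ++refills;
        std::vector<Page> pages;
        for (std::size_t i = 0; i < n; ++i) {
            Page p(new unsigned char[PAGE_SIZE]);
            std::memset(p.get(), 0, PAGE_SIZE);  // zero before handing out
            pages.push_back(std::move(p));
        }
        return pages;
    }
};

// Per-core list of pre-zeroed pages; refilled in batches when empty.
struct PerCoreZeroedList {
    std::vector<Page> free;

    Page alloc_zeroed(GlobalAllocator& global) {
        if (free.empty()) free = global.alloc_batch(BATCH);
        Page p = std::move(free.back());
        free.pop_back();
        return p;
    }
};

// Allocates `n` pages on one core; returns how many refills were needed.
std::size_t refills_for(std::size_t n) {
    GlobalAllocator global;
    PerCoreZeroedList core;
    for (std::size_t i = 0; i < n; ++i) {
        Page p = core.alloc_zeroed(global);
        assert(p[0] == 0 && p[PAGE_SIZE - 1] == 0);  // pages arrive zeroed
    }
    return global.refills;
}
```

For example, `refills_for(9)` returns 2. The first eight allocations share one refill, and the ninth triggers another.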


Implementation

● Based on sv6 research kernel

○ 34K lines of C++ code, plus libraries

● Supports all relevant mitigations from Linux

○ Focus on Skylake (2015-19) microarchitecture

● Binary compatible with a subset of Linux’s syscall API

○ Can run unmodified binaries!


Results


Does Ward reduce overhead?

Ward configurations:

● Linux-style: Standard mitigations like the ones in Linux

● USC-based: Faster mitigations that rely on the USC

● Baseline: All mitigations disabled

Workloads:

● LEBench

● git


Ward does better on LEBench

[Figure: LEBench results; lower is better]



Git benchmark

● Ward also demonstrates application-level performance improvements

● Runtime for git status on a 100 MB repository:

Configuration   Overhead vs. Baseline
Linux-style     24.6%
USC-based       11.2%
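Assume both percentages are overheads relative to the Baseline runtime T, with all mitigations disabled. Under that assumption the two rows compare as follows:

```latex
\frac{24.6\% - 11.2\%}{24.6\%} = \frac{13.4}{24.6} \approx 0.545
\qquad
\frac{1.112\,T}{1.246\,T} \approx 0.892
```

In other words, the USC-based configuration removes a little more than half of the Linux-style overhead. It runs git status in roughly 89% of the Linux-style time.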


Related Work: Spectrum of defenses


● Pure software defenses like Linux’s PTI, retpoline, etc.

● Hardware-software co-designs like ConTExT [CoRR] and SpecCFI [SP ’20]

● Hardware defenses: Intel/AMD designs, SpecShield [PACT ’19], NDA [MICRO ’19], and Speculative Taint Tracking [MICRO ’19]


Open question: what is the best way to mitigate attacks?

● Intel Cascade Lake (2019) has hardware mitigations for many attacks

○ Eliminates the need for software mitigations for those attacks

○ Toggling mitigations is almost free, but...

● The newer processor is up to 33% slower at executing LEBench syscalls

○ Compared to a 2016 CPU model with the same clock speed and core count

○ With mitigations disabled on both

Can hardware mitigations leverage the USC to get better performance?


Conclusion

● The Unmapped Speculation Contract defines a division of responsibility between hardware and software

● Using USC, Ward reduces the performance cost of mitigations in software


github.com/mit-pdos/ward

Contact: [email protected]