Defense against the AnC Attack on BOOM
Rui Hou, Xiaoxin Li, Wei Song, Dan Meng
Outline
• Security vulnerabilities due to resource sharing
• AnC attack
• Solution
– RCL
– PTE-isolation
Security vulnerabilities due to resource sharing
• Resource sharing is a classic optimization in architecture design
– Register file
– On-chip cache
– SMT
• Unfortunately, it might cause side-channel information leakage
– attacks on crypto
– attacks on SGX and TrustZone
– attacks on ASLR
PTE & Normal data co-location
ASLR⊕Cache (AnC) Attack
• ASLR is widely deployed to mitigate code-reuse attacks
– It chooses a different location for code and data every time a process runs.
• It increases the difficulty of exploitation
• Exploits usually need to know the location of certain data in memory.
• Exploit writers need to find a bug that leaks addresses without crashing the program.
• AnC goal: break ASLR through a cache side-channel attack
• Typical attack scenario
– Run malicious JavaScript in the victim's browser by luring the victim into visiting a malicious website
– Escape the JavaScript sandbox via a memory corruption vulnerability (e.g., a vtable injection attack via use-after-free)
– Since JavaScript has no pointer concept, AnC is used to obtain the VA of the targeted malicious code
Affected systems
Results: tested microarchitectures
CPU Model                   Microarchitecture     Year
Intel Xeon E3-1240 v5       Skylake               2015
Intel Core i7-6700K         Skylake               2015
Intel Celeron N2840         Silvermont            2014
Intel Xeon E5-2658 v2       Ivy Bridge EP         2013
Intel Atom C2750            Silvermont            2013
Intel Core i7-4500U         Haswell               2013
Intel Core i7-3632QM        Ivy Bridge            2012
Intel Core i7-2620QM        Sandy Bridge          2011
Intel Core i5 M480          Westmere              2010
Intel Core i7 920           Nehalem               2008
AMD FX-8350 8-Core          Piledriver            2012
AMD FX-8320 8-Core          Piledriver            2012
AMD FX-8120 8-Core          Bulldozer             2011
AMD Athlon II 640 X4        K10                   2010
AMD E-350                   Bobcat                2010
AMD Phenom 9550 4-Core      K10                   2008
Allwinner A64               ARM Cortex A53        2016
Samsung Exynos 5800         ARM Cortex A15        2014
Samsung Exynos 5800         ARM Cortex A7         2014
Nvidia Tegra K1 CD580M-A1   ARM Cortex A15        2014
Nvidia Tegra K1 CD570M-A1   ARM Cortex A15; LPAE  2014
High-level introduction of AnC attack
By monitoring the page table walk (PTW) access trace through a cache side-channel attack, the attacker obtains three pieces of information to infer the whole VPN:
• the cache indexes of the four related PTEs
• the offset of each PTE inside its cache line
• the mapping between PT offsets and PT levels
Combined with the page offset, these give the attacker the VA.
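The way these pieces combine into a VA can be sketched as follows. This is a minimal illustration, not the attack code: it assumes a 48-bit VA with four page-table levels, 9 index bits per level, 4 KiB pages, 8-byte PTEs, and 64-byte cache lines. The names `split_va` and `rebuild_va` are hypothetical.

```python
# Assumed layout (not stated on the slides): 4 levels, 9 index bits per level,
# 4 KiB pages, 8-byte PTEs, 64-byte cache lines (8 PTEs per line).
LEVELS = 4
INDEX_BITS = 9
PAGE_SHIFT = 12
PTES_PER_LINE = 8


def split_va(va):
    """Return ([(level, line_in_pt_page, slot_in_line), ...], page_offset)."""
    parts = []
    for level in range(LEVELS, 0, -1):
        shift = PAGE_SHIFT + INDEX_BITS * (level - 1)
        index = (va >> shift) & ((1 << INDEX_BITS) - 1)
        # Upper 6 bits select the cache line inside the page-table page,
        # lower 3 bits select the PTE slot inside that line.
        parts.append((level, index // PTES_PER_LINE, index % PTES_PER_LINE))
    return parts, va & ((1 << PAGE_SHIFT) - 1)


def rebuild_va(parts, page_offset):
    """Recombine per-level (line, slot) pairs and the page offset into a VA."""
    va = page_offset
    for level, line, slot in parts:
        index = line * PTES_PER_LINE + slot
        va |= index << (PAGE_SHIFT + INDEX_BITS * (level - 1))
    return va
```

Each tuple corresponds to the three leaked items: the level (the PT-offset-to-level mapping), the cache line, and the slot inside the line.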
How to get cache indexes?
Use a typical EVICT+TIME side-channel attack to get the cache index of a PTE
The attacker traverses the LLC by accessing a large array
In each iteration, the attacker:
– Accesses the targeted object or variable (to fetch the related PTEs into the caches)
– Flushes the TLB, forcing a page table walk on the next access to the target
– Evicts a cache line that may hold the PTE (Evict)
– Accesses the target again and measures its access time (Time)
The longest access time indicates the PTE's location in the cache, which gives the cache index of one PTE.
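The EVICT+TIME loop can be illustrated with a toy simulation. This is only a model of the decision logic, assuming 64 cache sets and made-up latencies; a real attack measures hardware timing and must cope with noise. `timed_access` and `find_pte_set` are hypothetical names.

```python
NUM_SETS = 64       # assumed number of cache sets
HIT_CYCLES = 10     # made-up latency when the PTE is still cached
MISS_CYCLES = 200   # made-up latency when the PTE was evicted


def timed_access(secret_pte_set, evicted_set):
    """Model one iteration: TLB flushed, one set evicted, target re-accessed."""
    # The page table walk is slow only if the evicted set held the PTE.
    return MISS_CYCLES if evicted_set == secret_pte_set else HIT_CYCLES


def find_pte_set(secret_pte_set):
    """Evict every set in turn and return the one with the longest access."""
    timings = [timed_access(secret_pte_set, s) for s in range(NUM_SETS)]
    return max(range(NUM_SETS), key=timings.__getitem__)
```

In this model the set whose eviction produces the slowest access is exactly the set that holds the PTE.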
How to get the remaining information?
• Two remaining pieces of information
– the offset of each PTE inside its cache line
– the mapping between PT offsets and PT levels
• Solution
– Sliding
– Also uses EVICT+TIME
– See the AnC paper for more details
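[Figure: page-table levels 4 to 1, with observed cache line indexes 2, 7, 32, and 56 written as 000 010 ???, 000 111 ???, 100 000 ???, and 111 000 ???; the ??? bits and the level of each index are still unknown]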
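A simplified view of sliding for the level-1 PTE: moving the target one page at a time moves its PTE by one slot, so the number of steps until the observed cache line changes reveals the slot. The sketch below assumes 4 KiB pages, 512-entry page tables, and 8 PTEs per cache line, and replaces EVICT+TIME with a hypothetical oracle `observed_line`. The AnC paper describes the full procedure, including the higher levels.

```python
PTES_PER_LINE = 8   # assumed: 64-byte line / 8-byte PTE
PAGE_SIZE = 4096    # assumed 4 KiB pages
ENTRIES = 512       # assumed 9-bit page-table index


def observed_line(va):
    """Stand-in for EVICT+TIME: cache line (inside the PT page) of the level-1 PTE."""
    index = (va // PAGE_SIZE) % ENTRIES
    return index // PTES_PER_LINE


def slot_by_sliding(va, oracle=observed_line):
    """Slide page by page until the PTE crosses into another cache line."""
    start = oracle(va)
    for step in range(1, PTES_PER_LINE + 1):
        if oracle(va + step * PAGE_SIZE) != start:
            # The PTE needed `step` moves to leave its line, so it started
            # at slot PTES_PER_LINE - step.
            return PTES_PER_LINE - step
    raise RuntimeError("no cache line change observed")
```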
Insights & our idea
Two key HW features enable the AnC attack:
• A direct mapping between addresses and cache indexes.
– It is easy to guess part of an address once its cache location is known.
– Idea: remap the cache layout
• Unified caches that store both data and page table entries.
– The attacker can use data under her control to evict PTEs from the cache.
– Idea: hardware isolation between PTEs and data
Both are software-transparent
Experiment Platform
BOOM SoC
7/3/2018
Original cache layout
[Figure: cache lines of page A and page B in the original layout]
Cacheline index = page offset [11:6]
PTE offset == {Cacheline index, Cacheline offset}
The cacheline index corresponds to part of the page offset.
The index is therefore predictable, and the attacker can use data under her control to evict any specific cacheline.
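To make the predictability concrete, here is a small sketch of how an attacker could pick addresses that all land in one chosen cache set under the original layout. It assumes 64-byte lines and 4 KiB pages; `eviction_addresses`, the buffer base, and the `ways` parameter are hypothetical illustration values.

```python
LINE_SHIFT = 6      # 64-byte cache lines
INDEX_MASK = 0x3F   # 6 index bits, i.e. address bits [11:6]
PAGE_SIZE = 4096


def baseline_index(addr):
    """Original layout: the set index is bits [11:6] of the address."""
    return (addr >> LINE_SHIFT) & INDEX_MASK


def eviction_addresses(buffer_base, target_index, ways):
    """Addresses in an attacker buffer that all map to target_index.

    buffer_base is assumed to be page-aligned; `ways` is a hypothetical
    associativity (one address per way is needed to fill the set).
    """
    return [buffer_base + page * PAGE_SIZE + (target_index << LINE_SHIFT)
            for page in range(ways)]
```

Because the index depends only on the page offset, the attacker needs no knowledge of physical addresses to build such a set.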
Remapping the cache layout
[Figure: cache lines of page A and page B in the remapped layout]
Cacheline index = page offset [11:6] ^ PA [20:15]
PTE offset != {Cacheline index, Cacheline offset}
After remapping, the cache location is not predictable.
The attacker can neither evict a chosen cacheline effectively nor deduce the page offset from cache location information.
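The remapping formula can be checked with a few lines of Python. This sketch only evaluates the index function from the slide; the frame numbers used in the test are arbitrary illustration values, and `rcl_index` is a hypothetical name.

```python
def baseline_index(pa):
    """Original layout: index = page offset [11:6]."""
    return (pa >> 6) & 0x3F


def rcl_index(pa):
    """Remapped layout: index = page offset [11:6] XOR PA [20:15]."""
    return ((pa >> 6) & 0x3F) ^ ((pa >> 15) & 0x3F)


def indexes_for_offset(index_fn, frames, page_offset):
    """Set of cache indexes hit by the same page offset in different frames."""
    return {index_fn((frame << 12) | page_offset) for frame in frames}
```

With the baseline function, the same page offset always yields the same index regardless of the physical frame. With the remapped function, the index also depends on PA[20:15], which an unprivileged attacker does not normally know.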
RCL implementation
[Figure: baseline cache vs. RCL cache]
RCL evaluation
(a) Execution time overhead compared with the baseline.
(b) Cache misses per kilo instructions (MPKI).
On average, the execution time increases by 1.0%.
PTE isolation
• Naïve: Uncache
– Make PTEs uncacheable
– Significant performance overhead
• Enhancement: Uncache-PTC
– Cache PTEs in a dedicated cache
(modify the page walk cache to cache PTEs of all levels, including leaf PTEs)
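The effect of isolation can be illustrated with a toy model: if PTE accesses go only to a dedicated cache, data accesses can no longer evict PTEs. This is a conceptual sketch using small direct-mapped caches, not the BOOM implementation; the class names `DirectMappedCache` and `IsolatedHierarchy` are hypothetical.

```python
class DirectMappedCache:
    """Tiny direct-mapped cache model with 64-byte lines."""

    def __init__(self, num_sets=64):
        self.num_sets = num_sets
        self.lines = [None] * num_sets

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        line_addr = addr >> 6
        index = line_addr % self.num_sets
        hit = self.lines[index] == line_addr
        self.lines[index] = line_addr
        return hit


class IsolatedHierarchy:
    """Route PTE accesses to a dedicated cache, all other accesses to the data cache."""

    def __init__(self):
        self.data_cache = DirectMappedCache()
        self.pte_cache = DirectMappedCache()

    def access(self, addr, is_pte):
        cache = self.pte_cache if is_pte else self.data_cache
        return cache.access(addr)
```

In a unified cache, a conflicting data access evicts the PTE; in the isolated model the PTE stays cached no matter how much conflicting data the attacker touches.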
PTE isolation evaluation
Performance overhead of isolation schemes
The average performance overhead is 39.60% for the fully uncached scheme and decreases significantly to 7.3% for the modified PWC scheme.
Thank You