CHERIvoke: Characterising Pointer Revocation using CHERI Capabilities for Temporal Memory Safety
Hongyan Xia∗†, University of Cambridge
Jonathan Woodruff∗†, University of Cambridge
Sam Ainsworth∗†, University of Cambridge
Nathaniel W. Filardo∗, University of Cambridge
Michael Roe∗, University of Cambridge
Alexander Richardson∗, University of Cambridge
Peter Rugg∗, University of Cambridge
Peter G. Neumann, SRI International
Simon W. Moore∗, University of Cambridge
Robert N. M. Watson∗, University of Cambridge
Timothy M. Jones∗, University of Cambridge
ABSTRACT
A lack of temporal safety in low-level languages has led to an epidemic of use-after-free exploits. These have surpassed in number and severity even the infamous buffer-overflow exploits violating spatial safety. Capability addressing can directly enforce spatial safety for the C language by enforcing bounds on pointers and by rendering pointers unforgeable. Nevertheless, an efficient solution for strong temporal memory safety remains elusive.
CHERI is an architectural extension to provide hardware capability addressing that is seeing significant commercial and open-source interest. We show that CHERI capabilities can be used as a foundation to enable low-cost heap temporal safety by facilitating out-of-date pointer revocation, as capabilities enable precise and efficient identification and invalidation of pointers, even when using unsafe languages such as C. We develop CHERIvoke, a technique for deterministic and fast sweeping revocation to enforce temporal safety on CHERI systems. CHERIvoke quarantines freed data before periodically using a small shadow map to revoke all dangling pointers in a single sweep of memory, and provides a tunable trade-off between performance and heap growth. We evaluate the performance of such a system using high-performance x86 processors, and further analytically examine its primary overheads. When configured with a heap-size overhead of 25%, we find that CHERIvoke achieves an average execution-time overhead of under 5%, far below the overheads associated with traditional garbage collection, revocation, or page-table systems.
∗Email: {firstname.lastname}@cl.cam.ac.uk
†These authors contributed equally to this paper, and are named in reverse alphabetical order.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MICRO-52, October 12–16, 2019, Columbus, OH, USA
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6938-1/19/10...$15.00
https://doi.org/10.1145/3352460.3358288
CCS CONCEPTS
• Computer systems organization → Architectures; • Security and privacy → Systems security; Security in hardware; Software and application security.
KEYWORDS
temporal safety, use-after-free, architecture, security
ACM Reference Format:
Hongyan Xia, Jonathan Woodruff, Sam Ainsworth, Nathaniel W. Filardo, Michael Roe, Alexander Richardson, Peter Rugg, Peter G. Neumann, Simon W. Moore, Robert N. M. Watson, and Timothy M. Jones. 2019. CHERIvoke: Characterising Pointer Revocation using CHERI Capabilities for Temporal Memory Safety. In The 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), October 12–16, 2019, Columbus, OH, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3352460.3358288
1 INTRODUCTION
Large codebases written in low-level languages have been plagued by violations of temporal safety. A typical temporal-safety violation consists of a pointer to a deallocated object being mistakenly reused by the programmer. Such a use-after-free temporal-safety violation, in combination with other program behaviour, can result in a security vulnerability. Temporal-safety vulnerabilities can allow attackers who manipulate data inputs to achieve full control of a program [8, 11], or even the entire system [46]. Indeed, this form of attack was recently found to be more common [26, 30] than buffer-overflow attacks that result from spatial-safety violations. Future computer systems require much stronger enforcement of both spatial and temporal memory safety.
Recent research has shown that spatial safety can be guaranteed at low cost by using architectural extensions, such as CHERI [44, 45], a hardware capability architecture that is influencing the direction of industry [19]. CHERI replaces pointers with unforgeable, architecturally identifiable references (capabilities) that convey not only the current address, but also the full range that is legally accessible through that reference. In this paper, we show that CHERI further allows us to achieve low-cost temporal safety for the heap in low-level languages, such as C and C++, with only minor architectural changes. By contrast, legacy architectures do not allow fine-grained temporal safety of untrusted programs, as they can neither eliminate the possibility of retaining or fabricating references to freed memory nor distinguish dangling pointers from innocuous data.
MICRO-52, October 12–16, 2019, Columbus, OH, USA Xia, Woodruff, Ainsworth, et al.
We have designed CHERIvoke, a technique providing temporal safety for memory allocators on top of hardware capabilities, complete with minor architectural extensions for performance. CHERIvoke delays reallocation of memory, holding manually freed objects in a quarantine buffer until performing a sweep of memory to remove all references to these objects. An implementation of CHERIvoke thus prevents reallocation of memory that may still be addressable by references available to the program. Furthermore, CHERIvoke specifies a data structure for describing the freed memory locations using a shadow map wherein one bit represents a 16-byte allocation granule. This shadow map enables a single sweep to revoke access to an arbitrary number of memory locations, regardless of heap layout.
CHERIvoke provides fast, uncircumventable temporal guarantees: memory cannot be addressed without a capability, and all references to a region can be found as each capability contains full bounds information. In contrast to other temporal-safety systems for C [12, 27, 41], pointers cannot be hidden from the system. CHERIvoke also has a variety of useful performance properties: memory overhead can be capped due to active revocation (vs. garbage collection), and regions with no capabilities can be entirely skipped based on hardware-tag metadata.
We evaluate the performance of CHERIvoke by reproducing its behaviour on a modern x86 system to account for state-of-the-art memory subsystems, and demonstrate overheads far lower than in previous work [12, 27, 41]. CHERIvoke can achieve strong temporal memory safety for the heap at an average of 4.7% runtime overhead (and a maximum of 51%) at the cost of a 25% increase in heap size across SPEC CPU2006 benchmarks. Sweeping-time overhead is determined by the application’s pointer density and the rate at which memory is freed, rather than by more complex factors such as the number of frees, number of loads, or allocation strategy. Further, this cost can be deterministically traded for increased heap overhead. Unlike with pure software approaches, full memory safety for low-level languages is practical and efficient using CHERIvoke, and its overheads are predictable and intuitive to understand. The contributions of this paper are:
• The case for temporal safety built on top of tagged CHERI capability pointers.
• An algorithm for CHERI temporal safety that uses buffered revocation to achieve predictable costs that are substantially lower than previous techniques.
• An evaluation of this algorithm on a state-of-the-art memory subsystem.
• Lightweight CHERI extensions to optimise sweeping memory to identify capabilities.
2 BACKGROUND
2.1 Temporal-Safety Violations
The C memory model presents to the programmer a view of memory consisting of a set of objects. C allows the programmer to manipulate pointers to these objects and perform arithmetic on these pointers. As a result, it is possible for programmers to mistakenly violate the memory model of the language such that a reference to one object may actually reach a different object. Objects in the C memory model are distinct from one another in both space and time. That is, two objects may be distinguished from one another by occupying disparate addresses in memory, or by existing at different times during the program’s execution. C implementations have traditionally allowed violations of both of these boundaries, dubbed spatial- and temporal-safety violations respectively. CHERI can naturally enforce spatial safety by attaching bounds to pointers such that no manipulation of a reference to one object can cause it to reach another object. However, CHERI does not naturally defend against temporal-safety violations that arise from using a pointer after the program has asked for the object to be freed.
Figure 1: A use-after-free attack which overwrites a reallocated vtable pointer to reference attacker function pointers.
Accidental reuse of objects past their point of deallocation is common in low-level languages such as C and C++. These violations of temporal safety can result in security vulnerabilities, whereby an attacker can manipulate memory reached through a dangling pointer, causing it to point to a different object. This routinely allows attackers the flexibility to fully compromise computer systems.
An illustrative temporal-safety violation for C++ is depicted in figure 1. Here, delete is called on an object, which jumps via the object’s vtable to the destructor, which frees the object. Though the object is now notionally deleted, a pointer to the object’s old location in memory is still accessible and now becomes a dangling pointer. This memory is then reallocated by the program to an object that holds external data input that has come from the attacker. An accidental second call to delete on the dangling pointer will now jump to an address of the attacker’s choosing, ceding control over the process and, if the vulnerability is within kernel mode, the entire system.
Besides pointer corruption, data corruption can change program execution [10], for example, by altering administrator checks to gain control of a program.
The above scenarios would commonly be classified as use-after-free vulnerabilities [11], but, more accurately, these are examples of use-after-reallocation attacks. Attacks that take advantage of reallocation are most dangerous from a security perspective as they allow an attacker to take advantage of the mismatch in memory interpretation to gain influence over execution, particularly if one of the interpretations includes user-supplied data. By comparison, use-after-free before reallocation does not allow manipulation of a different object’s data and, while erroneous, rarely results in security vulnerabilities. Preventing use-after-reallocation, rather than enforcing strict use-after-free protection, gives us the flexibility to batch revocations to achieve reasonable performance [27].
CHERIvoke: Characterising Pointer Revocation using CHERI Capabilities for Temporal Memory Safety MICRO-52, October 12–16, 2019, Columbus, OH, USA
Figure 2: Bit representation of a CHERI-128 capability. [Figure: a 128-bit capability with fields perms (15 bits), compressed bounds (46 bits) and address (64 bits).]
There is a class of use-after-free attacks that do not require reallocation, but corrupt allocator metadata that has been stored in freed memory. These vulnerabilities are solved relatively inexpensively by careful placement of metadata, as in BIBOP designs [16, 39, 40], and need not be addressed by revocation.
2.2 CHERI Capabilities
CHERI is an instruction-set extension [45] that requires addressing memory through unforgeable, bounded references called capabilities, after the classic concept from computer science [14]. CHERI capabilities attach protected metadata to each pointer word, typically extending pointers to 128 bits for a 64-bit address space, or 64 bits for a 32-bit address space. As shown in figure 2, protected metadata includes the bounds of the object referenced and permissions granted by this reference. To enforce monotonicity of access rights, capability instructions do not allow the bounds of a capability to be enlarged.¹ To enforce unforgeability of capabilities, each capability word is protected by a 1-bit tag [22] that distinguishes a capability from arbitrary data. This tag is cleared on a non-capability write, preventing that word from being used as a capability. As a result, the virtual addresses accessible to a program are limited to those authorised by capabilities in the register file and reachable capabilities in (transitively) authorised regions of memory.
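The two invariants just described, monotonic bounds and tag clearing on data writes, can be illustrated with a small software model. This is a sketch under stated assumptions, not CHERI hardware behaviour or any CHERI API: the hypothetical `cap_t` keeps its bounds uncompressed and its tag as an ordinary field, a derivation that would enlarge bounds is modelled as producing an untagged result, and a data store is modelled as overwriting the address field.
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one capability word. Real CHERI-128
 * capabilities compress their bounds and keep the tag out of band; here the
 * bounds are uncompressed and the tag is just a field. */
typedef struct {
    bool     tag;  /* validity tag: only tagged words can authorise access */
    uint64_t base; /* lowest accessible address */
    uint64_t top;  /* one past the highest accessible address */
    uint64_t addr; /* current pointer value */
} cap_t;

/* Monotonicity: derive a capability for [base, top) from c. A request that
 * would enlarge c's bounds yields an untagged, unusable result. */
static cap_t cap_set_bounds(cap_t c, uint64_t base, uint64_t top) {
    cap_t r = { .base = base, .top = top, .addr = base };
    r.tag = c.tag && c.base <= base && base <= top && top <= c.top;
    return r;
}

/* Unforgeability: a non-capability store over the word (modelled here as
 * overwriting the address field) clears the tag, so the stored integer can
 * never be used as a pointer. */
static void cap_store_data(cap_t *slot, uint64_t value) {
    slot->addr = value;
    slot->tag = false;
}

/* True if c authorises an access of len bytes at its current address. */
static bool cap_can_access(cap_t c, uint64_t len) {
    return c.tag && c.base <= c.addr && c.addr <= c.top && len <= c.top - c.addr;
}
```
For example, narrowing a capability for [0x1000, 0x2000) to a 16-byte object at 0x1100 permits 16-byte but not 17-byte accesses, any attempt to widen it again yields an untagged word, and overwriting the word with the integer 0x1100 leaves it unable to access anything.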
Prior work has described CheriABI [13], a new application binary interface for C and C++ programs under a CHERI-aware branch of FreeBSD. Programs compiled to CheriABI use capabilities for every reference, achieving spatial safety (against attacks such as buffer overflows) for all references, including the stack, heap, and globals, at an overhead that is typically less than 10%, even for pointer-heavy applications.
A primary benefit of the CHERI architecture is that the set of memory locations accessible to the program is entirely encoded in the memory state. Tags uniquely identify capability pointers, and these capabilities entirely define the range of memory they can reference. This structure facilitates precise pointer identification, eliminating both false negatives and false positives; inspection of memory cannot miss “hidden” pointers and cannot mistake data for a pointer. This visibility at the architectural level enables a temporal-safety system that is both strong and high performance.
¹ At CPU power-on, the register file is initialised with omnipotent capabilities, bearing all permissions to all words of memory. Every capability created during the system’s execution traces its provenance to these; there are no architectural operations that derive a tagged word exclusively from untagged inputs, and, for all derivations, the result bounds are no larger than those of a tagged input.
A secondary benefit of the CHERI architecture is its strong spatial safety, providing object allocators with the ability to bound returned pointers and, thereby, ensure that every object is accessed only within its bounds. For example, cross-object buffer-overflow attacks are impossible in C programs compiled to CheriABI when linked with a correct, bounds-setting allocator. CHERIvoke uses this ability to ensure that each application-held capability with authority to access the heap has authority to exactly one heap object, so that object lifetimes imply capability lifetimes.
Prior work on CHERI has analysed the performance of tag storage [22]. Tag performance can have a major effect on pointer inspection, particularly if tags are read separately from their associated data in order to avoid loading untagged non-pointer data. CHERI prototypes store capability tags in a hierarchical table in conventional DRAM, and introduce a tag cache to reduce additional DRAM traffic. This tag cache achieves very high hit rates, while separation of tags and data facilitates efficient tag inspection without loading all associated data.
2.3 Threat Model
Our threat model assumes a non-malicious programmer who has inadvertently created a local program with a use-after-free vulnerability, and a malicious external attacker able to influence its behaviour, for example, via I/O over a network socket. By manipulating the vulnerable program, the attacker can utilise dangling pointers caused by this use-after-free vulnerability to induce reads and/or writes via both the prior and current pointers to that memory. This allows an attacker broad scope for exploitative data corruption and control-flow attacks [9, 47], particularly where user-supplied data is confused for trusted data or function pointers.
Our aim is to remove dangling pointers to address-space regions before they are reallocated. This strategy addresses a critical set of exploit techniques relating to manipulation of data through different object pointers to the same memory. As with other techniques in the literature [27, 41], CHERIvoke does not address a broader category of temporal-safety violations, such as use of uninitialised data [28] or information leakage between prior and current allocations, and should be used alongside orthogonal low-cost protection mechanisms [29] for this purpose.
CHERIvoke could also be extended to address stronger CHERI threat models, such as software compartmentalisation, in which the local programmer may also be malicious [42]. This requires ensuring that shared memory referenced by two mutually distrusting compartments cannot be improperly freed by either compartment. We do not address the more sophisticated guarantees required by such use cases in this paper.
3 CHERIVOKE
We propose CHERIvoke as a technique to enforce temporal safety using CHERI by revoking access to freed memory before allowing reallocation. To revoke a capability is to remove all copies and all derivatives of that capability from a program. While this could be done on every free, CHERIvoke periodically performs bulk revocation to reduce overhead. This is achieved by holding manually freed heap memory in a quarantine buffer until CHERIvoke has swept through program memory to clear tags on all capability references to quarantined memory, as shown in figure 3.
Figure 3: Deallocations are kept in a quarantine buffer before they are revoked. Revocation is implemented efficiently by using a small shadow map of the heap that marks deallocated regions in quarantine. Memory and registers are swept using this shadow map to identify any dangling pointers. After the sweep, CHERIvoke clears the shadow map and moves quarantined locations into the free list for reallocation.
3.1 Quarantine Buffer
In order to prevent use-after-reallocation attacks, an allocator must not reissue freed address space until it has ensured that there are no remaining references to this memory in memory segments available to its caller. When allocations are freed, our allocator does not immediately return addresses to the reallocatable state, but places them in a quarantine buffer. When this quarantine buffer is full, we sweep all memory that could contain references to the heap and invalidate any capability reference that points to any region in the quarantine buffer. After the sweep, all quarantined addresses are returned to a free, reallocatable state.
In order to maintain a consistent memory overhead, this buffer can be set to a fixed proportion of heap size. For example, we may initiate a revocation sweep when the quarantined data has reached 1/4 the size of the rest of the heap. The quarantine-buffer size can be scaled to trade off memory overhead for runtime overhead, increasing or reducing sweeping frequency.
3.2 Revocation Shadow Map
To achieve reliable, high performance regardless of application, the sweeping procedure should ideally be deterministic and independent of heap layout. We achieve these properties by maintaining revocation metadata in a revocation shadow map. For each allocation granule, which we choose to be 16 bytes of memory to match the default in dlmalloc [25], we allocate 1 bit in a shadow map; this shadow space occupies less than 1% of the heap (one bit per 128 bits, or about 0.8%). Before a sweep, for all allocations in the quarantine buffer, we “paint” the bits of the shadow map corresponding to the allocation granules to indicate that references to this memory should be revoked in the sweep.
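Painting can be sketched as below. The flat layout, the hypothetical `shadow_map_t` type and its field names, and the alignment precondition are our assumptions; the text fixes only the 16-byte granule and one bit per granule.
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SHIFT 4 /* 16-byte allocation granules, as in dlmalloc */

/* Hypothetical flat shadow map: bit g covers heap bytes
 * [heap_base + 16*g, heap_base + 16*(g+1)). */
typedef struct {
    uintptr_t heap_base;
    uint8_t  *bits;
} shadow_map_t;

/* Paint every granule of the quarantined allocation [addr, addr + size).
 * Assumes addr and size are multiples of 16, which a 16-byte-granule
 * allocator provides. */
static void shadow_paint(shadow_map_t *m, uintptr_t addr, size_t size) {
    const uintptr_t mask = ((uintptr_t)1 << GRANULE_SHIFT) - 1;
    assert((addr & mask) == 0 && (size & mask) == 0);
    size_t first = (size_t)((addr - m->heap_base) >> GRANULE_SHIFT);
    size_t count = size >> GRANULE_SHIFT;
    for (size_t g = first; g < first + count; g++)
        m->bits[g >> 3] |= (uint8_t)(1u << (g & 7)); /* byte g/8, bit g%8 */
}
```
Painting a 96-byte allocation that starts three granules into the heap sets bits 3 to 8, which spans two shadow bytes (0xF8 in the first, 0x01 in the second).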
The actual sweeping procedure performs a lookup in the shadow map using the base of each capability to detect if it is pointing into a revoked object.²
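A minimal sketch of that lookup, assuming the shadow map is a flat byte array whose bit g covers the g-th 16-byte granule above a hypothetical `heap_base`. It keys on the capability's base rather than its address because, as section 4.1 explains, the base stays inside the original allocation even when the address wanders out of bounds.
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sweep-time test for one capability: turn its base into a 16-byte granule
 * index, pick the shadow byte (8 granules per byte), then test the bit.
 * The caller must first check that heap_base <= cap_base < end of heap. */
static bool base_is_revoked(const uint8_t *shadow, uintptr_t heap_base,
                            uintptr_t cap_base) {
    uintptr_t granule = (cap_base - heap_base) >> 4;
    uint8_t shadowbyte = shadow[granule >> 3];
    unsigned bit_idx = (unsigned)(granule & 0x7);
    return (shadowbyte >> bit_idx) & 1u;
}
```
If granule 2 is painted, any capability whose base lies in [heap + 32, heap + 48) is revoked, whatever its current address, while a capability based at heap + 48 is not.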
This shadow-map scheme allows fast, flat index lookup for testing each capability reference during a sweep, and is deterministic in its instruction count. As the shadow map is significantly smaller than the heap itself, and accesses to it are highly likely to be both temporally and spatially local, the shadow-map working set will typically fit in the last-level cache, and accesses to it should not limit DRAM bandwidth available to the primary sweep.
Most importantly, this shadow-space strategy allows revocation of all quarantined address space in a single sweep, with the result that sweeping frequency depends purely on the free rate of the application (in MB/s) and the size of the quarantine buffer, and not on heap layout. This ensures predictable and reliable performance for all applications.
3.3 Sweeping Procedure
A revocation sweep must cover all memory that could contain capability references to the heap. This includes the heap itself, the stack, register files, and global segments (such as .data and .bss). This sweep is the primary overhead in CHERIvoke. The sweep needs to be fast, and should aim to fully utilise the DRAM bandwidth of the system, requiring a highly optimised inner loop. While we limit our investigation to the efficiency of software implementations of this loop in the evaluation section, it would be reasonable to extend direct memory access (DMA) engines or digital signal processors (DSPs) in the system to perform this loop at bus speed and without CPU involvement.
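In software, this inner loop consists of the following code:
```c
// Loop bound and increment reconstructed; MAX_ADDR is the end of the swept region.
for (uintptr_t *x = MIN_ADDR; x < MAX_ADDR; x++) {
  // [Lines lost in extraction: capword is derived from the base
  //  of the capability stored at *x.]
  capword >>= 4; // 16-byte alloc granule
  // Get the byte from the shadow space.
  char shadowbyte = shadowbyte_get(capword);
  // Get the bit index.
  int bitIdx = capword & 0x7;
  if (shadowbyte & (1 << bitIdx)) {
    // [Body lost in extraction: revoke the capability at *x
    //  by clearing its tag.]
  }
}
```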
[…] tags, but need not directly dereference the capabilities themselves, since the sweeping loop only looks up the pointer values in the shadow map.
Figure 4: The implementation of CLoadTags requires moving tags from the data banks to tag metadata. [Figure: a memory hierarchy of CPU, I$, D$, L2$, tag cache (Tag$) and DRAM; contrasts cache lines whose tags are stored with the data across banks Bk0 to Bk3 with lines 0 and 1 whose tags are held in a separate per-line tag-metadata block.]
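Because ordinary C on x86 has no tags, the inner loop above cannot run outside a CHERI system. The following portable simulation is a sketch, not the authors' code: each hypothetical `sim_word_t` carries an explicit tag and an uncompressed base, and revocation is modelled by clearing the tag.
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated memory word: plain C has no tags, so each word records whether
 * it holds a capability, plus that capability's base and current address. */
typedef struct {
    bool      tag;
    uintptr_t base;
    uintptr_t addr;
} sim_word_t;

/* Sweep words[0..n): clear the tag of every capability whose base falls in a
 * painted granule of the heap [heap_base, heap_end). Returns the number of
 * capabilities revoked. */
static size_t sweep(sim_word_t *words, size_t n, const uint8_t *shadow,
                    uintptr_t heap_base, uintptr_t heap_end) {
    size_t revoked = 0;
    for (size_t i = 0; i < n; i++) {
        sim_word_t *w = &words[i];
        if (!w->tag)
            continue; /* plain data: never mistaken for a pointer */
        if (w->base < heap_base || w->base >= heap_end)
            continue; /* not a heap capability (e.g. stack or globals) */
        uintptr_t granule = (w->base - heap_base) >> 4; /* 16-byte granule */
        if ((shadow[granule >> 3] >> (granule & 7)) & 1) {
            w->tag = false; /* revoke: the word can no longer be dereferenced */
            revoked++;
        }
    }
    return revoked;
}
```
Untagged words are skipped however pointer-like their contents, and the lookup uses the base, so a dangling capability is revoked even if its address has been moved outside its object.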
3.4 New Hardware Support
A CHERI capability system tracks the presence of capability references in hardware and can therefore facilitate a sweep that inspects only genuine capability pointers. This gives us an opportunity to optimise the sweeping procedure: as capability state is an architectural feature, we can avoid sweeping through entire regions of memory that are pointer free, fundamentally decreasing the amount of work that must be done.
However, to check whether a memory word is tagged (i.e. contains a valid capability), the current CHERI ISA requires a load of the full capability word and tag into a register followed by the CGetTag instruction to query the tag bit. Use of this mechanism requires loading all data into caches, despite (as we measure later in table 2) fewer than a quarter of cache lines holding pointers in many applications. To implement CHERIvoke efficiently, we should directly exploit tag metadata to eliminate non-capability data from the sweep to save DRAM bandwidth and power, and to increase performance. We propose two new architectural assists atop CHERI’s existing, spatial-safety-focused specification [45]. In section 6.3, we show that these significantly reduce DRAM traffic and time consumed by sweeping revocation.
3.4.1 CLoadTags. We introduce a new instruction, CLoadTags, to the CHERI architecture, which directly loads tag bits without loading the data from the given address. If CLoadTags returns a zero, this cache line can be skipped in the sweep because it contains no capabilities, thus avoiding DRAM traffic for this line.
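The effect of CLoadTags on a sweep can be simulated as follows. `load_tags` is a hypothetical stand-in that models the instruction's result as a per-line tag mask, and the four-words-per-line geometry is an assumption; the point of the sketch is that a line whose mask is zero is never read.
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_LINE 4 /* assumption: 64-byte lines of 16-byte capability words */

/* A simulated cache line: per-word tags plus the base of each capability. */
typedef struct {
    bool      tag[WORDS_PER_LINE];
    uintptr_t base[WORDS_PER_LINE];
} sim_line_t;

/* Stand-in for CLoadTags: returns the line's tags as a bit mask. In hardware
 * this request can be answered by the tag controller without fetching data. */
static unsigned load_tags(const sim_line_t *line) {
    unsigned mask = 0;
    for (unsigned i = 0; i < WORDS_PER_LINE; i++)
        mask |= (unsigned)line->tag[i] << i;
    return mask;
}

/* Sweep lines[0..n), reading a line's data only if its tag mask is non-zero.
 * Bases are looked up in a shadow map covering [heap_base, heap_end);
 * *data_fetches counts the lines whose data had to be read. */
static size_t sweep_lines(sim_line_t *lines, size_t n, const uint8_t *shadow,
                          uintptr_t heap_base, uintptr_t heap_end,
                          size_t *data_fetches) {
    size_t revoked = 0;
    *data_fetches = 0;
    for (size_t l = 0; l < n; l++) {
        unsigned mask = load_tags(&lines[l]);
        if (mask == 0)
            continue; /* no capabilities: skip without touching the data */
        (*data_fetches)++;
        for (unsigned i = 0; i < WORDS_PER_LINE; i++) {
            uintptr_t b = lines[l].base[i];
            if (!((mask >> i) & 1u) || b < heap_base || b >= heap_end)
                continue;
            uintptr_t g = (b - heap_base) >> 4; /* 16-byte granule index */
            if ((shadow[g >> 3] >> (g & 7)) & 1) {
                lines[l].tag[i] = false; /* revoke */
                revoked++;
            }
        }
    }
    return revoked;
}
```
In a three-line region where only the middle line holds capabilities, only that line's data is fetched; the other two are dismissed on their tag masks alone.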
The implementation of CLoadTags requires extensive integration with the memory hierarchy. First, we require a new memory-request type that loads only the tags of a cache line. Such requests require support in the L1 and L2 caches, as well as the tag controller. Furthermore, the L1 and L2 caches needed to be modified to be able to report all tags for a cache line in a single lookup. In the CHERI-MIPS implementation [45], cache lines are stored across four banks, so four cycles are required to read the entirety of the cache line. Storing capability tags with data rules out a single-cycle response to a CLoadTags bus request. We therefore implement CHERI caches that store capability tags in a tag metadata block for each line, as shown in figure 4.
Any cache where the line is held will respond to a CLoadTags bus request. If the CLoadTags request misses in all data caches, the tag controller will respond with only the tags of that line without fetching the corresponding data from DRAM. As this response contains only the tags of a cache line, it is inconvenient to cache the result in intervening caches; as a result, we approximate streaming semantics for CLoadTags requests. Conveniently, this instruction is likely to be used only when sweeping memory, and caching its response is unlikely to be helpful. Future microarchitectures might consider prefetching data for a cache line when CLoadTags returns a non-zero result from the tag cache.
3.4.2 Page-table capability dirty (PTE CapDirty) bits. At a coarser scale, we repurpose a flag from the existing CHERI-MIPS page-table entries to avoid sweeping entire pages that do not contain capabilities. This flag is similar to a traditional dirty flag in page-table entries, although it specifically records the presence of valid capability writes in a page.³ If CapDirty indicates that a page is clean, a store of a word tagged as a capability will throw an exception, allowing the operating system to record the presence of capabilities in that page by marking CapDirty in that page-table entry. Clean pages will not contain capabilities and need not be scanned during a sweep. As with the traditional dirty flag in page-table entries, some architectures may maintain PTE CapDirty entirely in hardware. This approach has false positives, as clearing all capabilities in a page will not reset CapDirty, though the page can be marked clean again if found to be without capabilities on the next sweep. However, our preliminary evaluation finds that the false-positive rate is negligible for all the benchmarks we evaluate, as the use of a page generally determines whether it can and does hold capabilities; we rarely encounter pages that alternate between holding capabilities and holding none.
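The CapDirty protocol can be simulated in a few lines. Everything here is a hypothetical model: `sim_page_t` stands in for a page and its PTE flag, `store_word` models both the store and the exception handler that marks the page, and the 256-words-per-page geometry assumes 4 KiB pages of 16-byte capabilities.
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WORDS_PER_PAGE 256 /* assumption: 4 KiB pages of 16-byte capability words */

/* A simulated page: a model of the PTE CapDirty flag plus per-word tags. */
typedef struct {
    bool cap_dirty;
    bool tag[WORDS_PER_PAGE];
} sim_page_t;

/* Store to word i. Storing a tagged word to a clean page would raise an
 * exception; the handler modelled here marks the page CapDirty, as the
 * operating system would, and the store then completes. A data store leaves
 * the word untagged and never dirties the page. */
static void store_word(sim_page_t *p, size_t i, bool is_capability) {
    assert(i < WORDS_PER_PAGE);
    if (is_capability && !p->cap_dirty)
        p->cap_dirty = true;
    p->tag[i] = is_capability;
}

/* Sweep pages[0..n) and return how many pages had to be scanned. Clean pages
 * are skipped; a dirty page found to hold no capabilities is marked clean
 * again, removing the false positive for later sweeps. */
static size_t sweep_pages(sim_page_t *pages, size_t n) {
    size_t scanned = 0;
    for (size_t p = 0; p < n; p++) {
        if (!pages[p].cap_dirty)
            continue;
        scanned++;
        bool any = false;
        for (size_t i = 0; i < WORDS_PER_PAGE; i++) {
            if (pages[p].tag[i])
                any = true; /* per-capability revocation checks would go here */
        }
        if (!any)
            pages[p].cap_dirty = false;
    }
    return scanned;
}
```
A page whose only capability is later overwritten with data stays CapDirty until the next sweep finds it empty and marks it clean, after which it is skipped.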
3.5 Opportunities for Parallelism
Our description of CHERIvoke so far has described sweeping as part of application execution, that is, a program is paused while the sweep occurs. However, sweeping revocation can be made independent of execution and can run alongside the execution of the program. In addition, the sweep procedure itself is embarrassingly parallel. The shared revocation shadow map is read-only during the sweep, and pages to sweep can be distributed between independent threads. For this reason, it is not unreasonable to expect that even a pure software sweeping routine could realistically saturate the full DRAM bandwidth of a system.
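Because the shadow map is read-only during a sweep and each thread works on a disjoint slice of memory, a parallel sweep needs no locks. The sketch below uses POSIX threads (compile with `-pthread`) over a hypothetical simulated-word type, since plain C has no tags; the even contiguous split is our simplification of distributing pages between threads.
```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool      tag;
    uintptr_t base;
} sim_word_t;

/* Work item for one sweeper thread: a disjoint slice of memory plus the
 * shared, read-only shadow map. */
typedef struct {
    sim_word_t    *words;
    size_t         n;
    const uint8_t *shadow;
    uintptr_t      heap_base, heap_end;
    size_t         revoked; /* written only by the owning thread */
} sweep_job_t;

static void *sweep_worker(void *arg) {
    sweep_job_t *job = arg;
    for (size_t i = 0; i < job->n; i++) {
        sim_word_t *w = &job->words[i];
        if (!w->tag || w->base < job->heap_base || w->base >= job->heap_end)
            continue;
        uintptr_t g = (w->base - job->heap_base) >> 4; /* 16-byte granule */
        if ((job->shadow[g >> 3] >> (g & 7)) & 1) {
            w->tag = false; /* revoke */
            job->revoked++;
        }
    }
    return NULL;
}

/* Split words[0..n) into nthreads contiguous slices and sweep them in
 * parallel. Returns the total number of revoked capabilities. This sketch
 * aborts (via assert) if a thread cannot be created. */
static size_t parallel_sweep(sim_word_t *words, size_t n, const uint8_t *shadow,
                             uintptr_t heap_base, uintptr_t heap_end,
                             size_t nthreads) {
    assert(nthreads > 0 && nthreads <= 64);
    pthread_t tids[64];
    sweep_job_t jobs[64];
    size_t chunk = (n + nthreads - 1) / nthreads;
    for (size_t t = 0; t < nthreads; t++) {
        size_t lo = t * chunk < n ? t * chunk : n;
        size_t hi = lo + chunk < n ? lo + chunk : n;
        jobs[t] = (sweep_job_t){ words + lo, hi - lo, shadow, heap_base, heap_end, 0 };
        int rc = pthread_create(&tids[t], NULL, sweep_worker, &jobs[t]);
        assert(rc == 0);
    }
    size_t total = 0;
    for (size_t t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
        total += jobs[t].revoked;
    }
    return total;
}
```
The per-thread counters are summed only after every thread has been joined, so no shared state is written during the sweep itself.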
Shadow-map maintenance also has convenient concurrency properties. Updating the shadow map of different memory chunks in the quarantine buffer may occur in parallel, though care must be taken to prevent race conditions on bit masks within the same word. Painting the shadow map may use vector instructions, but is unlikely to require this level of optimisation, as explored in section 6.1.2.
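The race mentioned above arises when two allocations painted in parallel have bits in the same shadow byte. One remedy, sketched here with C11 atomics (our choice; the text only says care must be taken), is to declare the hypothetical map as `_Atomic uint8_t` and set bits with `atomic_fetch_or`, so that concurrent painters cannot overwrite each other's bits.
```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Paint granules [first, first + count) in a shadow map shared by several
 * painting threads. A plain `bits[i] |= mask` is a separate load and store,
 * so a concurrent update to the same byte could be lost; atomic_fetch_or
 * makes each byte update indivisible. Only the partial bytes at either end of
 * a range can be shared with a neighbouring allocation, but for simplicity
 * this sketch updates every byte atomically. */
static void shadow_paint_atomic(_Atomic uint8_t *bits, size_t first, size_t count) {
    size_t g = first;
    const size_t end = first + count;
    while (g < end) {
        size_t byte = g >> 3;
        unsigned lo = (unsigned)(g & 7);              /* first bit to set in this byte */
        size_t left = end - (byte << 3);             /* granules from byte start to end */
        unsigned hi = left < 8 ? (unsigned)left : 8u; /* one past the last bit to set */
        uint8_t mask = (uint8_t)(((1u << hi) - 1u) & ~((1u << lo) - 1u));
        atomic_fetch_or(&bits[byte], mask);
        g = (byte + 1) << 3;                         /* continue at the next byte */
    }
}
```
Painting granules 3 to 8 and then granules 0 to 2 from different threads touches the first shadow byte twice; with atomic updates the final value is 0xFF regardless of interleaving.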
Our evaluation framework using the x86 architecture does not allow a meaningful measurement of concurrent revocation, so we do not explore the implications of parallelism further in this paper.
³ The capability-store-inhibit bit, S, is only lightly used in existing CHERI software. Its sole use is reflecting static properties of kernel-managed objects, e.g. preventing capability stores to shared memory segments (potentially violating capability provenance within an address space) or direct mappings of file pages (because the file system is not capable of storing tags).
3.6 Role of Allocator
CHERIvoke must invalidate all references to quarantined memory available to the program. Nevertheless, the allocator itself must hold references to heap memory, including quarantined memory, if it is to later reallocate memory to the program. CHERIvoke counts the allocator as part of the trusted computing base (TCB), and relies on the allocator to enforce temporal safety. Indeed, the definition of temporal safety itself is derived from allocator state. In order to preserve allocator references, CHERIvoke must distinguish between pointers held by the allocator and pointers issued to the program. While there are several plausible mechanisms for this preservation, one simple option is for the allocator to always use whole-heap-spanning capabilities whose bases are never quarantined.
3.7 Protection Guarantees
CHERIvoke enforces temporal safety for heap allocations only. Heap allocations have proven to be the most dangerous and common source of temporal-safety exploits [7], and stack exploits can be prevented using other techniques, such as escape analysis [12]. Strictly, CHERIvoke prevents use-after-reallocation rather than use-after-free, as the program still holds references to quarantined memory until a revocation sweep. Nevertheless, CHERIvoke guarantees that an allocated object can be accessed only through references derived from the latest allocation of that memory. While CHERI could facilitate strict use-after-free protection for debugging if a sweep were performed on every free, CHERIvoke is designed to enforce temporal safety for deployed user programs. This subset of heap temporal memory safety provides protection from the vast majority of exploitable bugs while taking advantage of buffering to achieve reasonable performance [27].
3.8 Summary
CHERIvoke is a technique for temporal safety on architectures with CHERI support. Dangling pointers can be revoked by sweeping through an application’s memory to remove references to deallocated locations stored within a quarantine space. We can do this because CHERI uniquely distinguishes pointer capability locations at the architectural level, along with the valid ranges to which a capability can point. Revocation can be implemented efficiently by using a shadow map to indicate invalid capability pointers; with the addition of new hardware support, PTE CapDirty bits and CLoadTags instructions, we can limit the memory that needs to be swept to include only cache lines that contain pointers. CHERIvoke’s quarantine-buffer and shadow-map strategy achieves overheads that we now show are far lower than existing systems in practice and, further, can be easily understood and accounted for.
4 CHERI BENEFITS
CHERIvoke relies on the CHERI capability architecture to provide precise pointer identification, spatial enforcement, and efficient pointer-location metadata. These mechanisms enable the properties discussed below.
4.1 Efficient and Precise Revocation
In programs compiled to CHERI’s pure capability mode, all pointers are tagged as capabilities to distinguish them from data. CHERI therefore eliminates conservative pointer classification that causes integers to be misclassified as pointers in garbage collection [6] and other techniques [12]. Conversely, clearing the tag of a capability on revocation completely prevents its use for referencing memory.
Furthermore, CHERIvoke relies on CHERI bounds enforcement to ensure that capabilities to the heap are easily attributed to exactly one allocation. Specifically, the base of any heap capability must remain within the original allocation, even as the pointer address can wander out of bounds. This relies on the property that the bounds of capabilities cannot be expanded, and that there is no mechanism to “fuse” adjacent objects into one capability, which could then reference multiple allocations with different lifetimes.
As CHERI identifies references with certainty and associates them
uniquely with allocations, even a simple system can correctly
invalidate references to quarantined memory, knowing that doing so
will not affect the behaviour of a correct program.
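The attribution rule can be sketched in C. The struct below is a hypothetical software model of a capability (not CHERI’s compressed hardware encoding, and not the CHERI C intrinsics), and `cap_can_access` and `revoke_if_in` are illustrative names. What it shows: revocation keys on the capability’s base, which stays inside the original allocation, so moving the address out of bounds neither grants access nor evades revocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a CHERI capability: only the fields that
   matter here, not the real compressed 128-bit encoding. */
typedef struct {
    uint64_t base;    /* lower bound, fixed when the allocator derives it */
    uint64_t length;  /* bounds can only shrink, never grow */
    uint64_t addr;    /* cursor; may wander outside [base, base + length) */
    bool     tag;     /* validity tag; a cleared tag makes it unusable */
} cap_model;

/* An access of `size` bytes is allowed only through a tagged capability
   whose cursor lies within bounds. */
static bool cap_can_access(const cap_model *c, uint64_t size) {
    return c->tag && c->addr >= c->base && size <= c->length &&
           c->addr - c->base <= c->length - size;
}

/* Revocation tests the *base*, which always lies in the original
   allocation, so an out-of-bounds cursor cannot escape revocation. */
static void revoke_if_in(cap_model *c, uint64_t alloc_start, uint64_t alloc_len) {
    assert(alloc_len > 0);
    if (c->tag && c->base >= alloc_start && c->base - alloc_start < alloc_len)
        c->tag = false;
}
```

Because the test ignores `addr`, a program cannot hide a dangling reference from the sweep by pointing it somewhere else first.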
4.2 Full Memory Safety
CHERI capabilities provide spatial safety and unforgeability: that is,
all memory accesses must be within the bounds of their allocation, and
capabilities to other allocations cannot be fabricated. Based on these
properties, CHERIvoke can completely prevent access to heap
allocations after revocation, even in the face of adversarial
programs. As capabilities are easily identified and trivially
associated with their original allocation, we can reliably identify
dangling pointers, and pointers can never be hidden from CHERIvoke
without destroying their ability to ever reference memory.
CHERIvoke thus completely prevents even adversarial programs from
accessing deallocated memory after a revocation sweep. Consequently,
the temporal-security guarantees in the presence of capabilities are
significantly stronger than in previous work [26, 27, 41].
4.3 Efficient Pointer Search
CHERI’s architecturally visible capability tags not only enable
identification of pointer words, but can also reveal whether a region
of memory contains any pointers at all. Optimised tag storage in
current CHERI prototypes enables CLoadTags to eliminate non-pointer
data, reducing work by limiting the revocation sweep to regions that
may contain dangling pointers.
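As a minimal model of tag-based filtering, assume 8 capability slots per cache line and a hypothetical `load_tags_model()` that returns one bit per slot, standing in for CLoadTags (whose real instruction encoding and interface are not reproduced here). A sweep then only needs to read lines whose mask is non-zero:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAPS_PER_LINE 8  /* assumed capability slots per cache line */

/* Hypothetical stand-in for CLoadTags: gathers one tag bit per slot of a
   line. Here tags live in a plain byte array; on CHERI they would come
   from tag memory without reading the line's data. */
static unsigned load_tags_model(const uint8_t *tags, size_t line) {
    unsigned mask = 0;
    for (unsigned i = 0; i < CAPS_PER_LINE; i++)
        mask |= (unsigned)(tags[line * CAPS_PER_LINE + i] & 1u) << i;
    return mask;
}

/* Counts how many lines a tag-aware sweep must actually read. */
static size_t lines_to_sweep(const uint8_t *tags, size_t nlines) {
    assert(tags != NULL);
    size_t visited = 0;
    for (size_t line = 0; line < nlines; line++)
        if (load_tags_model(tags, line) != 0)  /* no tags => no pointers => skip */
            visited++;
    return visited;
}
```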
5 EXPERIMENTAL SETUP
5.1 Systems
The systems we use in our evaluation are shown in table 1. In addition
to evaluating our hardware extensions on the CHERI FPGA platform [45],
we have designed experiments to evaluate CHERIvoke revocation on a
modern x86-64 machine, to establish performance expectations for a
wide deployment of mature CHERI implementations. Memory-sweeping
performance depends heavily on the microarchitecture, and these
experiments allow us to characterise revocation using state-of-the-art
memory systems, vector extensions, and out-of-order superscalar
hardware. We simulate the existence of capabilities in these
experiments using conservative pointer estimation, as used by garbage
collectors [6], considering any 64-bit integer that is a valid virtual
address to be a pointer. Evaluating on a mature x86 platform also
provides higher application coverage, as the current CHERI prototype
implements the 64-bit MIPS instruction set, which lacks ports of many
applications and benchmarks. Because the toolset for CHERI is based on
FreeBSD, we also run FreeBSD on our x86 system to ensure uniformity,
though the results apply to any operating system.
CHERIvoke: Characterising Pointer Revocation using CHERI
Capabilities for Temporal Memory Safety MICRO-52, October 12–16,
2019, Columbus, OH, USA
System   Specification
x86-64   Intel Core i7-7820HK CPU, 2.9GHz, 4 cores / 8 threads, 8MiB LLC,
         14–18-stage out-of-order superscalar pipeline, AVX2 support,
         16GiB DDR4-2400, FreeBSD 12.0
CHERI    Stratix IV FPGA, 100MHz, single core, 256KiB LLC,
         6-stage in-order scalar pipeline, 1GiB DDR2
Table 1: System setup for processors used in the evaluation.
To measure the impact of our new hardware additions, we extend a
64-bit CHERI core and cache subsystem to implement the CLoadTags
instruction, and add PTE CapDirty support to our prototype operating
system. We then perform revocation sweeps on the CHERI FPGA
implementation over application memory dumps taken from our x86
system, allowing us to measure data elimination for applications that
are not yet able to execute natively on the CHERI-MIPS architecture.
5.2 dlmalloc_cherivoke
We have implemented dlmalloc_cherivoke as an extension of dlmalloc
[25], a classic allocator that remains in wide use. This modified
allocator maintains a quarantine buffer proportional to heap size,
together with the corresponding shadow map. Calls to free() insert
allocations into a quarantine buffer that uses the dlmalloc
constant-time algorithm for aggregating contiguous allocations. When a
certain proportion of the heap is in quarantine, dlmalloc_cherivoke
logs a simulated sweep event and returns all chunks in the quarantine
buffer to the internal free list. As a result of aggregation, the
number of internal frees may be much smaller than the number of frees
without quarantine.
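The quarantine behaviour can be sketched as follows. This is not the dlmalloc_cherivoke source: the fixed-size table, the linear-scan merge (standing in for dlmalloc’s constant-time coalescing) and the names `quarantine_free` and `quarantine_drain` are assumptions made for illustration. A range that bridges two existing entries is not merged further, to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QMAX 64

/* Hypothetical quarantine: freed [start, start + len) ranges held until
   a sweep. Adjacent ranges are merged so draining needs fewer frees. */
typedef struct {
    uintptr_t start[QMAX];
    size_t    len[QMAX];
    size_t    count;   /* number of (merged) ranges held */
    size_t    bytes;   /* total bytes currently quarantined */
} quarantine;

/* Quarantines [p, p + n); returns 1 when the caller should sweep and drain. */
static int quarantine_free(quarantine *q, uintptr_t p, size_t n,
                           size_t heap_bytes, double fraction) {
    size_t i;
    for (i = 0; i < q->count; i++) {
        if (q->start[i] + q->len[i] == p) { q->len[i] += n; break; }          /* grows upward */
        if (p + n == q->start[i]) { q->start[i] = p; q->len[i] += n; break; } /* grows downward */
    }
    if (i == q->count) {                 /* not adjacent to any held range */
        assert(q->count < QMAX);
        q->start[q->count] = p;
        q->len[q->count++] = n;
    }
    q->bytes += n;
    return (double)q->bytes >= fraction * (double)heap_bytes;
}

/* After a sweep, every range returns to the allocator; the result is the
   number of internal frees this costs. */
static size_t quarantine_drain(quarantine *q) {
    size_t internal_frees = q->count;
    q->count = 0;
    q->bytes = 0;
    return internal_frees;
}
```

With a 1 KiB heap and a 25% threshold, four frees of 32 to 128 bytes that form two contiguous runs trigger one sweep and drain as two internal frees instead of four.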
To implement the shadow map, each mmap() call is accompanied by a
smaller mapping at a fixed transform from the original allocation.
This allows the sweeping procedure to index the shadow map for any
heap allocation by shifting the pointer by a fixed amount and adding
the result to the base of the shadow map. As dlmalloc aligns
allocations to at least 16-byte (128-bit) boundaries, one shadow bit
covers 16 bytes, so each shadow map is 1/128 the size of the primary
allocation. When a region is unmapped, its corresponding shadow map is
also unmapped.
Besides shadow-map allocation, dlmalloc_cherivoke delays shadow-space
operations until a simulated sweep is triggered. Before a sweep event,
we traverse the quarantined chunks in the buffer and set the
shadow-map bits for each. After a sweep event, these bits are cleared.
We have optimised the shadow-map painting procedure such that large,
aligned contiguous regions are painted with byte, half-word, word, and
double-word store instructions when possible, rather than by setting
individual bits.
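A minimal sketch of the painting idea, using a byte-wide `memset` for the aligned middle of a run and single-bit updates only at the ragged ends. The helper name is hypothetical; the half-word, word and double-word stores described above would be a further refinement of the same structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical painter: sets bits [first, first + n) of a shadow bitmap.
   Partial bytes at either end are set bit by bit; the byte-aligned
   middle is filled with whole 0xFF bytes. */
static void shadow_paint(uint8_t *shadow, size_t first, size_t n) {
    assert(shadow != NULL || n == 0);
    size_t end = first + n;
    while (first < end && (first & 7) != 0) {   /* leading partial byte */
        shadow[first >> 3] |= (uint8_t)(1u << (first & 7));
        first++;
    }
    size_t whole = (end - first) >> 3;          /* full bytes in the middle */
    if (whole != 0)
        memset(&shadow[first >> 3], 0xFF, whole);
    first += whole << 3;
    while (first < end) {                       /* trailing partial byte */
        shadow[first >> 3] |= (uint8_t)(1u << (first & 7));
        first++;
    }
}
```

Painting bits 5 to 24, for example, costs three single-bit updates, one two-byte fill and one more single-bit update, rather than twenty separate bit operations.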
5.3 Sweeping Cost
dlmalloc_cherivoke evaluates all overheads besides the revocation
sweep itself. To model a CHERI revocation loop accurately, pointers
must be architecturally visible. Simulating this visibility requires
memory state to be preprocessed, which prevents accurate performance
modelling during execution. To capture memory state, we dump the core
image periodically, whenever the quarantine buffer is full and a sweep
would have been triggered. We preprocess the memory image to identify
all words whose values are virtual addresses lying within regions of
the core dump, and zero all non-pointer words. This allows a test
against zero to simulate testing the capability tag in a true CHERI
system. The core dump also preserves the revocation shadow map, which
is used during the sweep.
We simulate a sweep that uses PTE CapDirty optimisations (section
3.4.2) to eliminate non-pointer data at page granularity, but that
does not use the CLoadTags instruction. While page elimination can be
modelled adequately on a standard microarchitecture, CLoadTags is
difficult to model due to its interaction with tagged memory and a tag
cache. As a result, our performance numbers are a pessimistic estimate
of the full optimisations possible on CHERI. Our sweep procedure
simulates a system API that returns an array of pages that could
contain capabilities⁴ according to PTE CapDirty flags.
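The simulated sweep described above can be sketched as a loop over the pages reported by the CapDirty-style API. Assumptions for illustration: 4 KiB pages of 8-byte words, a single contiguous heap with a one-bit-per-16-byte shadow map, and the hypothetical name `sweep_image`. A non-zero word stands in for a set tag, and zeroing it stands in for clearing the tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_WORDS 512  /* assumed 4 KiB page / 8-byte word */

/* Sweeps only the listed pages of a preprocessed image in which all
   non-pointer words are already zero. Returns the number of revocations. */
static size_t sweep_image(uint64_t *image, const size_t *pages, size_t npages,
                          const uint8_t *shadow, uint64_t heap_base,
                          uint64_t heap_len) {
    assert(image != NULL && (pages != NULL || npages == 0));
    size_t revoked = 0;
    for (size_t p = 0; p < npages; p++) {
        uint64_t *word = image + pages[p] * PAGE_WORDS;
        for (size_t i = 0; i < PAGE_WORDS; i++) {
            uint64_t w = word[i];
            if (w == 0 || w < heap_base || w - heap_base >= heap_len)
                continue;                             /* no "tag", or not a heap pointer */
            uint64_t granule = (w - heap_base) >> 4;  /* 16-byte shadow granules */
            if ((shadow[granule >> 3] >> (granule & 7)) & 1u) {
                word[i] = 0;                          /* revoke: clear the "tag" */
                revoked++;
            }
        }
    }
    return revoked;
}
```

Pages that are not listed are never read, which is exactly the saving the page-granularity filter provides.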
To evaluate the overall cost, we perform revocation sweeps on ten
sample core dumps from across each application’s execution.⁵ We then
multiply the average sweep time by the total number of sweep events to
derive the total sweeping cost for that execution.
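The cost estimate is a simple extrapolation from sampled sweeps. A sketch with a hypothetical helper name, where each per-dump time would itself be an average over repeated sweeps:

```c
#include <assert.h>
#include <stddef.h>

/* Total sweeping cost = mean sampled sweep time x number of sweep events. */
static double total_sweep_cost(const double *sample_secs, size_t nsamples,
                               size_t sweep_events) {
    assert(nsamples > 0);
    double sum = 0.0;
    for (size_t i = 0; i < nsamples; i++)
        sum += sample_secs[i];
    return sum / (double)nsamples * (double)sweep_events;
}
```

For instance, sampled sweeps averaging 20 ms combined with 100 sweep events give an estimated 2 s of total sweeping (illustrative numbers, not measurements).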
5.4 Benchmarks
To evaluate CHERIvoke, we are interested in both worst-case and
average-case overhead. We therefore evaluate on benchmarks taken
mostly from SPEC CPU2006 [21], in line with other papers in the
literature [12, 27, 41]. The subset we evaluate includes the three
most allocation-intensive workloads [41]: dealII, omnetpp, and
xalancbmk. We also include all other SPEC CPU2006 benchmarks that
would compile under the 64-bit FreeBSD setup necessary to use our
current CHERI infrastructure: astar, bzip2, gobmk, h264ref, hmmer,
lbm, libquantum, mcf, milc, povray, sjeng, soplex, and sphinx3. In
each case, we evaluate on the reference input. We further add ffmpeg,
which has a higher allocation throughput than any SPEC benchmark and
helps to account more fully for worst-case application behaviour. We
take the average of 5 runs for each benchmark.
6 EVALUATION
The overall observed overhead of CHERIvoke is shown in figure 5,
compared with other temporal-safety techniques in the literature [6,
12, 27, 41] that do not make use of CHERI capabilities. For a target
25% heap storage overhead in the quarantine buffer, we achieve an
average 4.7% execution-time and 12.5% total memory overhead. This
significantly outperforms every other technique. Further, CHERIvoke
performs far more reliably, with maximum runtime and memory overheads
of only 1.51× and 1.35×. CHERIvoke has significantly more predictable
behaviour regardless of workload, as its sweeping technique suffers
none of the worst cases encountered by more complex temporal-safety
schemes: overheads are proportional to memory freed and pointer
density, rather than to pointer movement, number of frees, loads per
second, or memory layout. In addition, CHERI enforces the strongest
safety guarantees: after revocation, constructing references to freed
memory is impossible by any means.

⁴ A similar API, GetWriteWatch(), is implemented in Windows to return
the list of pages that have been written since the last reset, to
accelerate garbage collection and language runtimes [23].
⁵ Collecting more than ten core dumps per application increased
evaluation time but was not found to improve the accuracy of results.
Sweep time for each core dump is averaged over 20 sweeps.

MICRO-52, October 12–16, 2019, Columbus, OH, USA Xia, Woodruff,
Ainsworth, et al.

[Figure 5: (a) normalised execution time and (b) normalised memory
utilisation per benchmark (astar to xalancbmk, plus geometric mean)
for CHERIvoke, Oscar, pSweeper, DangSan and Boehm-GC; off-scale bars
are labelled with their values.]
(a) Execution Time
(b) Memory. The dashed line shows CHERIvoke’s default quarantine size
at 25% of the heap.
Figure 5: Overheads for CHERIvoke, compared with results reported by
other state-of-the-art techniques.

[Figure 6: normalised execution time per benchmark (ffmpeg, astar to
xalancbmk, plus geometric mean) for three cumulative configurations of
CHERIvoke: quarantine buffer only; + shadow space; + sweeping. One
off-scale bar is labelled 1.51.]
Figure 6: Decomposition of run-time overheads of CHERIvoke, with the
default 25% heap overhead.
6.1 Breakdown of Overheads
Figure 6 shows overheads for successively adding the constituent parts
of CHERIvoke, beginning with quarantining freed memory, then adding
shadow-map maintenance and, finally, full memory sweeps. While memory
sweeping is usually the dominant overhead, we discover notable
exceptions, which are discussed below.
6.1.1 Quarantine buffer. Many existing allocators, including dlmalloc,
attempt to reuse freed memory as quickly as possible to improve cache
performance. dlmalloc_cherivoke, however, introduces a quarantine
buffer where freed memory is detained, missing the opportunity to
reuse cached memory.
The quarantine buffer has negligible impact on most benchmarks. For
xalancbmk, however, the quarantine buffer increases execution time by
22%. Performance counters confirm that instruction count grows by only
3%, but level-2 cache misses grow by 50%. While favourable
deallocation patterns would allow us simply to move to fresh,
unquarantined cache lines, xalancbmk has a combination of small
allocations, a high allocation throughput (nearly 1 million per second
according to table 2), and temporal fragmentation. Temporal
fragmentation occurs when objects with very different lifetimes are
interspersed on the heap, leaving holes of quarantined memory in cache
lines that are still in use. This suggests that a CHERIvoke memory
allocator might attempt to group objects of similar lifetime.
Nevertheless, we find in section 6.4 that increasing the
quarantine-buffer size consistently improves cache performance for
xalancbmk.
In fact, the quarantine buffer improves performance in most of the
benchmarks. One reason for this is the batching and aggregation of
calls to free. DealII, for example, makes 630,000 calls to free per
second (see table 2), which constitute a significant amount of
execution time. dlmalloc_cherivoke quarantines these allocations,
typically in less than half the execution time of a real free. If
these freed regions aggregate well, many fewer free operations will be
performed when the quarantine buffer is drained than would have been
performed on demand. While this effect is minor, many of the
benchmarks that gain an advantage from the quarantine buffer
experience no net overhead for full temporal safety.

Benchmark     Pages with pointers   Free rate (MiB/s)   Frees (thousands/s)
ffmpeg         4%                   1268                  44
astar         62%                     24                  27
bzip2          0%                      0                  ≈ 0
dealII        70%                     40                 498
gobmk         54%                      1                   1
h264ref        9%                      3                   1
hmmer          4%                     17                  12
lbm            0%                      5                  ≈ 0
libquantum     1%                      5                  ≈ 0
mcf           46%                     53                  ≈ 0
milc           3%                    224                  ≈ 0
omnetpp       95%                    175                1027
povray        19%                      1                  17
sjeng         24%                      0                  ≈ 0
soplex        23%                    287                   2
sphinx3       18%                     33                  30
xalancbmk     86%                    371                 811
Table 2: Deallocation metadata from applications.
6.1.2 Shadow-map maintenance. CHERIvoke also requires maintenance of
the revocation shadow map (the second bar in figure 6). While the
shadow map is small compared to the heap itself, and its quarantined
portion is even smaller, the overhead of painting is hard to predict
because it is sensitive to the alignment and size of allocations.
Nevertheless, the net impact of shadow-space maintenance is minor for
all applications benchmarked.
6.1.3 Sweeping overhead. Where CHERIvoke has significant
execution-time overhead, the largest cost is memory sweeping. Of the
four benchmarks in figure 6 that have overheads beyond 5%, dealII,
omnetpp and soplex are dominated by sweeping overhead, and xalancbmk
is a special case, as discussed above. Sweeping cost is predictable
and can be described mathematically.
A memory sweep is initiated when the amount of memory freed reaches
the current size of the quarantine buffer, so the frequency of
sweeping is directly proportional to the FreeRate (in MB/s) and
inversely proportional to the QuarantineSize. This relation allows us
to derive analytically an estimate of the runtime overhead for a
single-threaded implementation:
\[
\text{RuntimeOverhead} \approx
\frac{\text{FreeRate} \cdot \text{PointerDensity}}
     {\text{ScanRate} \cdot \text{QuarantineFraction}}
\]
This equation assumes that we need only sweep the proportion of memory
that contains pointers, PointerDensity, which is measured at page
granularity in this experiment. It also assumes the quarantine-buffer
size to be a fixed proportion of the total memory space (rather than
of the heap), which holds only approximately, and best when the heap
is large. Nevertheless, this equation provides an intuitive model for
the cost of sweeping using CHERIvoke.
The numerator, (FreeRate · PointerDensity), constitutes an
application-specific cost factor. A low throughput of frees, or a low
pointer density, will result in a low sweeping cost for CHERIvoke. In
the denominator, ScanRate is a function of the memory bandwidth of the
system and the efficiency of the sweeping loop, and QuarantineFraction
is a tunable property that balances performance against memory
consumption.
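The model follows from two observations: a sweep occurs each time QuarantineSize bytes have been freed, that is FreeRate/QuarantineSize times per second, and each sweep reads PointerDensity · MemorySize bytes at ScanRate. Multiplying these and substituting QuarantineFraction = QuarantineSize/MemorySize gives the expression above. A direct C transcription, with a hypothetical function name and illustrative inputs rather than measured values:

```c
#include <assert.h>

/* Analytical sweep-overhead model: fraction of extra run time.
   free_rate and scan_rate must use the same unit (e.g. MiB/s);
   pointer_density and quarantine_fraction are fractions in (0, 1]. */
static double runtime_overhead(double free_rate, double pointer_density,
                               double scan_rate, double quarantine_fraction) {
    assert(scan_rate > 0.0 && quarantine_fraction > 0.0);
    return (free_rate * pointer_density) / (scan_rate * quarantine_fraction);
}
```

For example, a FreeRate of 100 MiB/s, a PointerDensity of 0.5, a ScanRate of 8000 MiB/s and a QuarantineFraction of 0.25 give 50/2000 = 0.025, i.e. roughly 2.5% extra run time; doubling the quarantine fraction halves the estimate.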
[Figure 7: DRAM bandwidth (MiB/s, 0–10,000) achieved by the sweep loop
on ffmpeg, astar, dealII, gobmk, h264ref, hmmer, mcf, milc, omnetpp,
povray, soplex, sphinx3, xalancbmk and the geometric mean, for three
implementations: simple loop; unrolling + manual pipelining; AVX2.]
Figure 7: Memory bandwidth achieved for the sweep loop with different
optimisations. The system’s full read bandwidth is 19,405MiB/s.
This analytical model, along with the data in table 2, allows us to
understand the sweeping overheads measured in figure 6. Xalancbmk and
omnetpp have significant free rates and pointer densities over 85%,
followed by dealII and soplex, whose pointer densities are 70% and 23%
respectively. These four are indeed the only benchmarks with over 5%
execution-time overhead, as suggested by the model. Ffmpeg has a very
high free rate but a low pointer density, so its sweeping overhead
stays below 5%.
6.2 Sweeping-Loop Optimisation
The speed of the memory sweep is critical to the performance of
CHERIvoke. In figure 7, we evaluate the performance of several
implementations of our sweeping-procedure kernel on each benchmark
that features significant deallocation. A CHERIvoke sweep might
approach the 19,405MiB/s read bandwidth of the system if the procedure
is not compute bound and if the indirect shadow lookup is entirely
cached. We find that a naïve sweeping loop (presented in red) utilises
only 28% of read bandwidth on average, and that unrolling and manually
pipelining the loop for better scheduling achieves 32%. We were able
to fully vectorise the loop using AVX2 to sweep an entire cache line
in 28 instructions, achieving 39% of the read bandwidth on average,
but this required an unconditional store to possibly clear dangling
pointers, limiting us to memory-copy performance. The performance of
the AVX2 loop is roughly constant at almost 8GiB/s. AVX2 is not always
the fastest: in hmmer and sphinx3, our vectorised implementation
cannot compete with the unrolled loop. Mcf and milc see lower
bandwidth utilisation, as their small, infrequent sweeping loops do
not reach full throughput. Since none of these cases are allocation
intensive, these outliers do not have a significant performance impact
in figure 6.
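For reference, the unrolling idea can be sketched in portable C. This is neither the authors’ manually pipelined kernel nor the AVX2 version: it loads four words per iteration so that an out-of-order core can overlap the loads, skips groups that are entirely zero, and assumes the same hypothetical shadow layout (one bit per 16 bytes) and zero-means-no-pointer convention as the simulated sweep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shadow-bit test for one word; zero words and non-heap values are ignored.
   Unsigned wrap-around also rejects w < heap_base. */
static inline int is_revoked(uint64_t w, const uint8_t *shadow,
                             uint64_t heap_base, uint64_t heap_len) {
    if (w == 0 || w - heap_base >= heap_len)
        return 0;
    uint64_t g = (w - heap_base) >> 4;
    return (shadow[g >> 3] >> (g & 7)) & 1;
}

/* Unrolled-by-four sweep over a contiguous range of words. */
static void sweep_unrolled(uint64_t *mem, size_t nwords, const uint8_t *shadow,
                           uint64_t heap_base, uint64_t heap_len) {
    assert(nwords % 4 == 0);
    for (size_t i = 0; i < nwords; i += 4) {
        uint64_t a = mem[i], b = mem[i + 1], c = mem[i + 2], d = mem[i + 3];
        if ((a | b | c | d) == 0)
            continue;                     /* common case: no pointers in group */
        if (is_revoked(a, shadow, heap_base, heap_len)) mem[i] = 0;
        if (is_revoked(b, shadow, heap_base, heap_len)) mem[i + 1] = 0;
        if (is_revoked(c, shadow, heap_base, heap_len)) mem[i + 2] = 0;
        if (is_revoked(d, shadow, heap_base, heap_len)) mem[i + 3] = 0;
    }
}
```

Unlike the AVX2 loop described above, this version stores only when a word is actually revoked, so it avoids unconditional stores at the cost of more branches.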
6.3 Hardware Optimisations
Because of the new hardware optimisations introduced in section 3.4,
we need not sweep all of memory. Two mechanisms avoid reading segments
of memory without pointers: PTE CapDirty bits (section 3.4.2), which
remove the need to scan through pages without capabilities, and the
more fine-grained CLoadTags instruction (section 3.4.1), which allows
us to skip cache lines with no tag bits set. The results of our
evaluations are shown in figure 8.
[Figure 8: (a) memory sweep proportion (0–1) per benchmark, ffmpeg to
xalancbmk, under PTE CapDirty and CLoadTags; (b) normalised execution
time (0–1) against pointer density at page or cache-line granularity
(0–1), for CLoadTags, PTE dirty, and the idealised line.]
(a) Proportion of memory that needs to be swept for specific
benchmarks, with work reduction both at page-table granularity (PTE
CapDirty) and at cache-line granularity (CLoadTags).
(b) Normalised execution time for sweeping through memory with the
addition of PTE dirty bits to exclude capability-free pages, and
CLoadTags instructions to exclude capability-free cache lines. Each is
plotted against its target granularity: PTE dirty against page
density, and CLoadTags against cache-line density. The dotted line
shows the ideal improvement from each technique.
Figure 8: Impact of the hardware optimisations from section 3.4 on
both the amount of memory that needs to be swept and the resultant
execution time, as measured on CHERI.
Figure 8(a) shows the proportion of memory that must be swept under
each optimisation, derived from the densities of capabilities at both
the cache-line and the page granularities. In most cases, the PTE
CapDirty bits in the page table are sufficient to reach the achievable
reduction in work, though there are several workloads where CLoadTags
instructions allow a significant further reduction. Figure 8(b) shows
how these mechanisms map to performance improvements on our CHERI FPGA
hardware. PTE CapDirty bits get close to an ideal performance
improvement, in that the blue line is very close to the dotted x = y
line, so the effect of not having to walk through pointer-free pages
corresponds directly to a performance improvement. Performance with
CLoadTags (orange) is more complex: though it can capture more
fine-grained density data, and therefore theoretically reduce the
amount of work further, its performance in practice is further from
ideal, and it can even reduce performance. This reflects the larger
amount of work necessary to exploit this more fine-grained
information. To determine whether a cache line of 8 pointers can be
skipped, CLoadTags must query the L1 and L2 caches and reach the tag
cache of the system (around 10 cycles round trip in our FPGA
implementation), and then perform an unpredictable branch. In
contrast, the PTE CapDirty implementation can skip a page of 256
pointers by inspecting page metadata. In practice, both coarse-grained
(PTE CapDirty) and fine-grained (CLoadTags) optimisations are
necessary for optimal work reduction.

[Figure 9: normalised execution time (1–3) against heap overhead
(0–200%) for xalancbmk and omnetpp.]
Figure 9: Normalised execution time for the two workloads with the
highest overheads, at varying heap overhead. The default setup is
shown by the dotted line.
6.4 Sweep-Frequency Trade-Offs
Time and space overheads can be traded off against one another in
CHERIvoke. To see the extent of this, we re-evaluated xalancbmk and
omnetpp, our workloads with the highest overheads at default settings,
with different target heap-space overheads. The results are shown in
figure 9. The higher the heap overhead we are willing to tolerate, the
smaller the performance impact we observe, even on highly
allocation-intensive workloads.
There are two reasons for this. The first is that if we are willing
to tolerate a higher heap overhead, deallocations can be left in
quarantine for longer, and so we sweep proportionately less often.
This accounts for the majority of the performance increase we see with
larger quarantine buffers, as most of the overhead of CHERIvoke comes
from the sweeping procedure. The second is more subtle: for xalancbmk,
by the time we reach 100% heap overhead, the normalised execution time
is actually lower than the non-sweeping costs alone in figure 6. We
found a consistent reduction in non-sweeping overheads, corresponding
to an increase in the observed cache hit rate for the program, as we
moved to larger quarantine buffers. This counterintuitive result is
caused by better allocation-fragmentation properties as we increase
the heap size: under severe temporal fragmentation, it is better to
quarantine memory for longer, allowing cache lines to fall entirely
out of use, than to release small fragments frequently in a severely
fragmented heap.
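To first order, the number of sweeps falls in inverse proportion to the quarantine size, which is the first effect described above. A sketch with a hypothetical helper, assuming a constant heap size and ignoring the cache effects of the second reason:

```c
#include <assert.h>
#include <stdint.h>

/* Sweeps triggered while freeing `freed_mib` MiB, when a sweep fires each
   time the quarantine reaches `quarantine_pct` percent of the heap. */
static uint64_t sweeps_triggered(uint64_t freed_mib, uint64_t heap_mib,
                                 unsigned quarantine_pct) {
    uint64_t quarantine_mib = heap_mib * quarantine_pct / 100;
    assert(quarantine_mib > 0);
    return freed_mib / quarantine_mib;
}
```

Freeing 10,000 MiB from a 400 MiB heap triggers 100 sweeps at a 25% quarantine, 50 at 50% and 25 at 100% (illustrative figures). Since each sweep reads roughly the same amount of memory, total sweep time scales the same way.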
6.5 Sweeping-Traffic Overhead
The results in figure 10 show the extra traffic generated by
sweeping. We use Intel performance counters [43] to report the
“off-core” traffic, which on our benchmark machine indicates the
traffic to the shared L3 cache and above. We find that the extra
traffic is either comparable to (dealII) or significantly lower than
(omnetpp, soplex, xalancbmk) the performance overhead. This is
unsurprising: CHERIvoke only incurs overhead on workloads that are
allocation intensive, and workloads that are allocation intensive tend
to be memory-bandwidth intensive, rendering CHERIvoke’s sweeping
overheads less significant by comparison.

[Figure 10: off-core traffic overhead (%, 0–18) per benchmark, ffmpeg
to xalancbmk.]
Figure 10: Off-core-traffic overhead.
We can use this information to make judgements about the impact
CHERIvoke has both on energy consumption and on performance on
multicores. In effect, energy-consumption overhead should scale
comparably to performance overhead, as the additional factor,
off-core traffic (and thus DRAM traffic), is comparable or smaller.
Similarly, accesses to the shared L3-cache resource outside the core,
which affect the performance of other applications running on a
multicore, are typically minimal, and in allocation-intensive
environments are comparable to, though lower than, the performance
overhead.
6.6 Summary
CHERIvoke significantly outperforms every other system designed to
provide strong temporal-safety guarantees [6, 12, 27, 41], both in the
average case (4.7% runtime and 12.5% overall memory overheads) and in
the worst case (51% performance and 35% memory overheads). These
overheads typically come from the sweeping procedure, a small code
kernel that can be heavily optimised using vector instructions, and
whose cost can be understood analytically in terms of the volume of
freed data and the density of pointers in memory. Our hardware
extensions, CLoadTags and PTE CapDirty, both serve to significantly
reduce the amount of work performed by CHERIvoke. Where performance
overhead is high, memory can be traded to meet the target performance.
7 RELATED WORK
7.1 Revocation Techniques
Revocation techniques that do not make use of hardware capabilities
have been explored. These include DangSan [41], DangNull [26],
FreeSentry [48] and pSweeper [27]. These use the compiler to
disambiguate pointers from data, add code at each pointer creation
that inserts the pointer into a per-allocation list, and nullify all
entries when data is freed. However, this per-allocation list is
highly performance- and storage-intensive, which makes these
techniques infeasible for allocation-heavy workloads. Additionally,
pointers can be hidden, so such techniques cannot guarantee temporal
safety.
With CHERI, we can disambiguate pointers at run time without any
additional software metadata, using only the 1-bit tag [22]. This
means that we can instead sweep through memory to nullify any dangling
pointers, avoiding the large memory and performance overheads
associated with this complex metadata. It also means that the compiler
need not be involved: the only change required is for the free method
to add freed memory to the quarantine list. CHERI also innately
prevents hidden pointers, so it can guarantee temporal safety.
BOGO [49], like CHERIvoke, builds temporal safety on top of spatial
safety, in this case provided by Intel MPX. Due to the lack of a
quarantine buffer for batching and to the complex MPX table structure,
BOGO’s overheads are significantly higher than CHERIvoke’s: on SPEC
CPU2006, CHERIvoke pays 4.7% average overhead and 50% worst case,
whereas BOGO pays 60% average and 1,616% worst case.
7.2 Page-Table Techniques
Dangling pointers can be prevented from being used via protection at
the granularity of the page table, by poisoning regions of memory upon
a free. This is the technique used by Electric Fence [1]. Dhurjati and
Adve [15] extend the technique to allow reuse of the underlying
physical address, reducing overheads by aliasing virtual pages, and
Dang et al. [12] present Oscar, which better supports concurrency and
targets more common workloads.
Page granularity can achieve low overheads when allocations are large.
However, frequent small allocations can cause performance and memory
overheads to increase enormously, as each allocation must be given its
own virtual page; this also increases TLB pressure, causing
significant slowdown.
7.3 Garbage Collection
Garbage collection solves the problem of use-after-free by the inverse
of pointer nullification: it prevents data from being freed until all
references are removed. Examples of garbage collection used for this
purpose include FailSafe-C [35] and CCured [33].
As pointers can be hidden in low-level languages such as C and C++,
safe garbage collection is a challenge [4, 5, 17]: we cannot trade the
security issue of temporal safety for a program-safety issue of
premature deletion of still-needed data. However, in the CHERI
architecture, pointers cannot be hidden, as all memory accesses occur
through unforgeable capabilities that can be distinguished from other
data by tags. This means that CHERI avoids both the safety issue and
the pointer misidentification inherent in conservative
garbage-collection techniques.
In addition, garbage collection suffers from two weaknesses that
CHERIvoke does not. Because references to data may persist long after
the data is no longer being used, garbage collectors can suffer
significant memory overhead even with frequent mark-sweep procedures.
To counteract this, techniques such as the Boehm-Demers-Weiser garbage
collector [6] also allow manual deallocation of objects. This means
that use-after-free and use-after-reallocate violations can still
occur in high-performance garbage collectors.
The second issue is related to performance. A garbage collector’s
marking procedure is significantly slower than CHERIvoke’s sweeping
procedure, as marking involves a complex and memory-irregular graph
search through each allocation, whereas sweeping can be performed at
close to the rate of memory bandwidth via a simple, easy-to-optimise
loop. Further, with CHERIvoke we know precisely how much memory can be
reclaimed by each sweeping procedure, as this is supplied by the
programmer through their manual deallocations. This means we can
optimise by calling sweeping procedures only when there is sufficient
useful work to be done (in our case, when the quarantine buffer is 25%
of the rest of the heap), vastly reducing overheads without increasing
memory usage.
Moreover, for many programmers, the malloc/free model is simply
familiar. The semantics of explicitly managing memory are well
understood and acceptable to a large class of programmers, as
exemplified by the C and C++ communities. Existing codebases,
especially legacy C and C++, need extra care to be ported to a GC
model that functions well with reasonable memory and performance
overhead. By contrast, malloc/free with sweeping revocation provides
temporal safety without perturbing memory-allocation semantics, while
also having much more predictable memory and performance overhead, as
this paper demonstrates.
7.4 Partial Temporal Safety
Techniques to reduce (but not eliminate) temporal-safety bugs have
seen use both in academia and in practice. Cling [2] reduces the
classes of use-after-free bugs that can be exploited by promoting
type-safe reuse of pointers, based on size and call site, to restrict
the provenance of reused memory, along with a more general
delay-of-use to prevent memory exhaustion at the expense of security.
Other techniques that use delayed reuse to make it harder for an
attacker to reallocate data that is falsely freed but still in use
include DieHard [3], DieHarder [34], and FreeGuard [39].
7.5 Detection Methods
Runtime protection can fully guarantee temporal safety. However, some
protection can also be brought about by detection methods designed to
debug applications. An example of this strategy is AddressSanitizer
[37], which poisons deallocated regions to flag any accesses to them.
The resulting performance loss is substantial, so software
AddressSanitizer can only be used in a debug setting. However,
hardware acceleration of memory debugging is implemented in the SPARC
M7’s ADI technique [24, 36] and in Arm MTE [18]. These use a small
number of tag bits on pointers, such that accesses to a region that
has been de- or re-allocated and tagged with a different bit value
will fail. However, the small number of bits in these tags means that
a motivated attacker can exhaust the tag space to reallocate data with
the correct tag. These techniques are therefore only suitable for
runtime fault reporting rather than security. Another detection method
is Undangle [7], which finds dangling pointers within a program at run
time, at the cost of false positives, since dangling pointers
themselves may not result in future use.
7.6 Tagged Memory
CHERI [45] is just one way of using tagged memory to improve the
security or debug properties of a system. Other uses include
annotating address validity, version numbers, object types and
ownership [20]. While CHERI uses one bit per capability-aligned region
to prevent arbitrary changes to capabilities, other techniques use
multiple bits to provide memory versioning. These include SPARC ADI
[36] and Arm MTE [18]. Another tagged-memory debug technique, AArch64
HWASAN, combines memory tagging with a modified compiler toolchain for
a hardware-assisted AddressSanitizer-like scheme [37, 38], using
unused top bits in pointers as memory tags to detect stale references.
CETS [32] uses word-length unique tags for memory accesses, so that
an access fails if the pointer's tag does not match that of the
allocated region. As a result, pointers are as large as CHERI's.
CETS also suffers a high false-positive rate due to pointer hiding,
which is valid in non-CHERI C. Unlike CHERI, CETS cannot guarantee
spatial safety, and because it has no hardware support, it incurs a
significant performance loss. Watchdog [31] uses unique pointer and
allocation identifiers to provide temporal safety in hardware. For
the benchmarks common to Watchdog and CHERIvoke, Watchdog incurs 17%
average overhead, whereas CHERIvoke incurs less than 1%.
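The following sketch models temporal checking with word-length unique
identifiers, in the lock-and-key style associated with CETS and
Watchdog. It is a software simulation with hypothetical names
(`struct checked_ptr`, `checked_malloc`, `checked_valid`,
`checked_free`) and does not reproduce either system's
implementation. Each allocation receives a 64-bit key that is never
reused. Every copy of a pointer carries that key plus the address of
a lock word. Freeing the object invalidates the lock, so every
outstanding copy fails its check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fat pointer: the address plus the metadata that a
   lock-and-key scheme propagates alongside it. */
struct checked_ptr {
    void *addr;      /* the object itself */
    uint64_t key;    /* unique identifier of this allocation */
    uint64_t *lock;  /* lock word holding the key while the object is live */
};

static uint64_t next_key = 1;  /* key 0 is reserved to mean "freed" */

struct checked_ptr checked_malloc(size_t size)
{
    struct checked_ptr p;
    p.addr = malloc(size);
    p.lock = malloc(sizeof *p.lock);
    if (p.addr == NULL || p.lock == NULL)
        abort();                    /* sketch: give up on allocation failure */
    p.key = next_key++;             /* 64-bit counter: no wrap in practice */
    *p.lock = p.key;
    return p;
}

/* Temporal check performed before every dereference. */
int checked_valid(struct checked_ptr p)
{
    return *p.lock == p.key;
}

/* Free the object and invalidate its lock. The lock word is
   deliberately kept here; a real scheme could recycle it safely,
   because keys are never reused. */
void checked_free(struct checked_ptr p)
{
    assert(checked_valid(p));       /* catches double frees */
    free(p.addr);
    *p.lock = 0;
}
```

The per-pointer metadata (a key and a lock address) is what makes
such pointers large. The check before every dereference is the cost
that Watchdog moves into hardware.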
8 CONCLUSION
We have shown that it is possible to enforce temporal safety at low
overhead on modern systems with hardware capability support.
CHERIvoke sweeps through memory to find architecturally visible
capability pointers and uses an efficient revocation shadow map to
identify those that need to be revoked. It can achieve performance
overheads of under 5% for a 25% heap-size increase, and this
overhead can be traded off against heap growth to match system
requirements.
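The two mechanisms named above, a sweep for capabilities and a
revocation shadow map, can be modelled in portable C. This is a
simulation under stated assumptions and not the CHERIvoke
implementation. A hypothetical `struct cap` with an explicit tag bit
stands in for a CHERI capability, and an array of these structures
stands in for all memory that may hold capabilities. The map assumes
one bit per 16-byte granule, which would cost 1/128 (about 0.8%) of
the heap. Quarantined regions are painted into the map, and the sweep
then clears the tag of every capability whose base falls in a painted
granule. The sketch indexes by base rather than by the current
address because the base stays within the original allocation even
if the address has been moved outside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16u                      /* assumed bytes per shadow bit */
#define HEAP_SIZE 4096u                  /* size of the simulated heap */
#define SHADOW_BITS (HEAP_SIZE / GRANULE)

/* Hypothetical model of a capability stored in memory. */
struct cap {
    bool tag;       /* validity tag; clearing it revokes the capability */
    uint32_t base;  /* heap offset of the allocation's base */
    uint32_t addr;  /* current address, which may differ from base */
};

static uint8_t shadow[SHADOW_BITS / 8];  /* revocation shadow map */

/* Paint a quarantined allocation [base, base + len) into the map. */
void shadow_paint(uint32_t base, uint32_t len)
{
    assert(base % GRANULE == 0 && base + len <= HEAP_SIZE);
    for (uint32_t off = base; off < base + len; off += GRANULE) {
        uint32_t bit = off / GRANULE;
        shadow[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Is the granule containing `offset` awaiting revocation? */
static bool shadow_painted(uint32_t offset)
{
    assert(offset < HEAP_SIZE);
    uint32_t bit = offset / GRANULE;
    return (shadow[bit / 8] >> (bit % 8)) & 1u;
}

/* One pass over every capability slot. Untagged words are skipped,
   and tagged capabilities into painted memory are revoked.
   Returns the number of capabilities revoked. */
size_t sweep(struct cap *mem, size_t ncaps)
{
    size_t revoked = 0;
    for (size_t i = 0; i < ncaps; i++) {
        if (mem[i].tag && shadow_painted(mem[i].base)) {
            mem[i].tag = false;
            revoked++;
        }
    }
    return revoked;
}

/* After a sweep, the quarantined memory may be reused. */
void shadow_clear(void)
{
    memset(shadow, 0, sizeof shadow);
}
```

After a sweep completes, no valid capability refers to the painted
regions, so the map can be cleared and the quarantined memory
returned to the allocator.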
Our presentation of CHERIvoke considers only the fundamental
mechanisms necessary for high-performance temporal safety; full
implementations could be optimised further. Several techniques could
be combined with CHERIvoke. These include reuse of physical addresses
for page-size deallocations [12], type-based reuse of allocation data
[2], and delaying revocation by reusing locations over multiple
MTE-style history bits [18]. Together they have the potential to make
strong memory-safety properties cheap enough to become ubiquitous in
future systems.
Acknowledgements
Approved for public release; distribution is unlimited. This work is
part of the CTSRD and ECATS projects sponsored by the Defense
Advanced Research Projects Agency (DARPA) and the Air Force Research
Laboratory (AFRL), under contracts FA8750-10-C-0237 and
HR0011-18-C-0016. The views, opinions, and/or findings contained in
this paper are those of the authors and should not be interpreted
as representing the official views or policies, either expressed or
implied, of the Department of Defense or the U.S. Government. This
work was also supported by the Engineering and Physical Sciences
Research Council (EPSRC), through grant references EP/K026399/1,
EP/P020011/1, and EP/K008528/1, and by Arm Limited and Google, Inc.
We would like to acknowledge the contributions of John Baldwin,
Matthias Boettcher, David Chisnall, Brooks Davis, Lawrence Esswood,
Alexandre Joannou, Lucian Paul-Trifu, Stacey Son, and Hugo Vincent.
Additional data related to this publication is available in the data
repository at https://doi.org/10.17863/CAM.42436.
CHERIvoke: Characterising Pointer Revocation using CHERI
Capabilities for Temporal Memory Safety MICRO-52, October 12–16,
2019, Columbus, OH, USA
REFERENCES
[1] 2015. Electric Fence. https://elinux.org/index.php?title=Electric_Fence
[2] Periklis Akritidis. 2010. Cling: A Memory Allocator to Mitigate Dangling Pointers. In USENIX Security.
[3] Emery D. Berger and Benjamin G. Zorn. 2006. DieHard: Probabilistic Memory Safety for Unsafe Languages. In PLDI.
[4] Hans-J. Boehm. 1996. Simple Garbage-Collector-Safety. In PLDI.
[5] Hans-J. Boehm and David Chase. 1992. A Proposal for Garbage-Collector-Safe C Compilation. Journal of C Language Translation 4, 2 (1992).
[6] Hans-Juergen Boehm and Mark Weiser. 1988. Garbage Collection in an Uncooperative Environment. Softw. Pract. Exper. 18, 9 (1988).
[7] Juan Caballero, Gustavo Grieco, Mark Marron, and Antonio Nappa. 2012. Undangle: Early Detection of Dangling Pointers in Use-after-free and Double-free Vulnerabilities. In ISSTA.
[8] Oliver Chang. 2016. Racing MIDI messages in Chrome. https://googleprojectzero.blogspot.com/2016/02/racing-midi-messages-in-chrome.html
[9] Oliver Chang. 2016. Racing MIDI messages in Chrome. https://googleprojectzero.blogspot.com/2016/02/racing-midi-messages-in-chrome.html
[10] Shuo Chen, Jun Xu, Emre C. Sezer, Prachi Gauriar, and Ravishankar K. Iyer. 2005. Non-control-data Attacks Are Realistic Threats. In SSYM.
[11] The MITRE Corporation. 2018. CWE-416: Use After Free. https://cwe.mitre.org/data/definitions/416.html
[12] Thurston H.Y. Dang, Petros Maniatis, and David Wagner. 2017. Oscar: A Practical Page-Permissions-Based Scheme for Thwarting Dangling Pointers. In USENIX Security.
[13] Brooks Davis, Robert N. M. Watson, Alexander Richardson, Peter G. Neumann, Simon W. Moore, John Baldwin, David Chisnall, James Clarke, Nathaniel Wesley Filardo, Khilan Gudka, Alexandre Joannou, Ben Laurie, A. Theodore Markettos, J. Edward Maste, Alfredo Mazzinghi, Edward Tomasz Napierala, Robert M. Norton, Michael Roe, Peter Sewell, Stacey Son, and Jonathan Woodruff. 2019. CheriABI: Enforcing Valid Pointer Provenance and Minimizing Pointer Privilege in the POSIX C Run-Time Environment. In ASPLOS.
[14] Jack B. Dennis and Earl C. Van Horn. 1966. Programming semantics for multiprogrammed computations. Commun. ACM 9, 3 (1966).
[15] Dinakar Dhurjati and Vikram Adve. 2006. Efficiently Detecting All Dangling Pointer Uses in Production Servers. In DSN.
[16] R. Kent Dybvig, David Eby, and Carl Bruggeman. 1994. Don’t stop the BIBOP: Flexible and Efficient Storage Management for Dynamically-Typed Languages. Technical Report 400. Indiana University School of Informatics, Computing, and Engineering.
[17] John R. Ellis and David L. Detlefs. 1994. Safe, Efficient Garbage Collection for C++. In CTEC.
[18] Matthew Gretton-Dann. 2018. Arm A-Profile Architecture Developments 2018: Armv8.5-A. https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a
[19] Richard Grisenthwaite. 2019. Supporting the UK in becoming a leading global player in cybersecurity. https://community.arm.com/blog/company/b/blog/posts/supporting-the-uk-in-becoming-a-leading-global-player-in-cybersecurity
[20] Richard H. Gumpertz. 1981. Error Detection with Memory Tags. Ph.D. Dissertation. Carnegie Mellon University.
[21] John L. Henning. 2006. SPEC CPU2006 Benchmark Descriptions. SIGARCH Comput. Archit. News 34, 4 (2006).
[22] A. Joannou, J. Woodruff, R. Kovacsics, S. W. Moore, A. Bradbury, H. Xia, R. N. M. Watson, D. Chisnall, M. Roe, B. Davis, E. Napierala, J. Baldwin, K. Gudka, P. G. Neumann, A. Mazzinghi, A. Richardson, S. Son, and A. T. Markettos. 2017. Efficient Tagged Memory. In ICCD.
[23] Piyus Kedia, Manuel Costa, Matthew Parkinson, Kapil Vaswani, Dimitrios Vytiniotis, and Aaron Blankstein. 2017. Simple, Fast, and Safe Manual Memory Management. In PLDI.
[24] G. K. Konstadinidis, H. P. Li, F. Schumacher, V. Krishnaswamy, H. Cho, S. Dash, R. P. Masleid, C. Zheng, Y. D. Lin, P. Loewenstein, H. Park, V. Srinivasan, D. Huang, C. Hwang, W. Hsu, C. McAllister, J. Brooks, H. Pham, S. Turullols, Y. Yanggong, R. Golla, A. P. Smith, and A. Vahidsafa. 2016. SPARC M7: A 20 nm 32-Core 64 MB L3 Cache Processor. IEEE J. of Solid-State Circuits 51, 1 (2016).
[25] Doug Lea. 2000. A Memory Allocator. (2000). http://g.oswego.edu/dl/html/malloc.html
[26] Byoungyoung Lee, Chengyu Song, Yeongjin Jang, Tielei Wang, Taesoo Kim, Long Lu, and Wenke Lee. 2015. Preventing Use-after-free with Dangling Pointers Nullification. In NDSS.
[27] Daiping Liu, Mingwei Zhang, and Haining Wang. 2018. A Robust and Efficient Defense Against Use-after-Free Exploits via Concurrent Pointer Sweeping. In CCS.
[28] Kangjie Lu, Marie-Therese Walter, David Pfaff, Stefan Nuernberger, Wenke Lee, and Michael Backes. 2017. Unleashing Use-Before-Initialization Vulnerabilities in the Linux Kernel Using Targeted Stack Spraying. In NDSS.
[29] Alyssa Milburn, Herbert Bos, and Cristiano Giuffrida. 2017. SafeInit: Comprehensive and Practical Mitigation of Uninitialized Read Vulnerabilities. In NDSS.
[30] S. S. Nagaraju, C. Craioveanu, E. Florio, and M. Miller. 2013. Software vulnerability exploitation trends. Technical Report. Microsoft.
[31] Santosh Nagarakatte, Milo M. K. Martin, and Steve Zdancewic. 2012. Watchdog: Hardware for Safe and Secure Manual Memory Management and Full Memory Safety. In ISCA.
[32] Santosh Nagarakatte, Jianzhou Zhao, Milo M. K. Martin, and Steve Zdancewic. 2010. CETS: Compiler Enforced Temporal Safety for C. In ISMM.
[33] George C. Necula, Jeremy Condit, Matthew Harren, Scott McPeak, and Westley Weimer. 2005. CCured: Type-safe Retrofitting of Legacy Software. ACM Trans. Program. Lang. Syst. 27, 3 (2005).
[34] Gene Novark and Emery D. Berger. 2010. DieHarder: Securing the Heap. In CCS.
[35] Yutaka Oiwa. 2009. Implementation of the Memory-safe Full ANSI-C Compiler. In PLDI.
[36] Oracle. 2016. Oracle’s SPARC T7 and SPARC M7 Server Architecture. Oracle.
[37] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. 2012. AddressSanitizer: A Fast Address Sanity Checker. In USENIX ATC.
[38] Kostya Serebryany, Evgenii Stepanov, Aleksey Shlyapnikov, Vlad Tsyrklevich, and Dmitry Vyukov. 2018. Memory Tagging and how it improves C/C++ memory safety. CoRR abs/1802.09517 (2018).
[39] Sam Silvestro, Hongyu Liu, Corey Crosser, Zhiqiang Lin, and Tongping Liu. 2017. FreeGuard: A Faster Secure Heap Allocator. In CCS.
[40] Guy Lewis Steele Jr. 1977. Data representations in PDP-10 MACLISP. Technical Report AIM-420. MIT.
[41] Erik van der Kouwe, Vinod Nigade, and Cristiano Giuffrida. 2017. DangSan: Scalable Use-after-free Detection. In EuroSys.
[42] Robert N. M. Watson, Jonathan Woodruff, Peter G. Neumann, Simon W. Moore, Jonathan Anderson, David Chisnall, Nirav Dave, Brooks Davis, Khilan Gudka, Ben Laurie, Steven J. Murdoch, Robert Norton, Michael Roe, Stacey Son, and Munraj Vadera. 2015. CHERI: A Hybrid Capability-System Architecture for Scalable Software Compartmentalization. In IEEE S&P.
[43] Thomas Willhalm, Roman Dementiev, and Patrick Fay. 2012. Intel Performance Counter Monitor - A Better Way to Measure CPU Utilization. Intel.
[44] Jonathan Woodruff, Alexandre Joannou, Hongyan Xia, Brooks Davis, Peter G. Neumann, Robert Nicholas Maxwell Watson, Simon Moore, Anthony Fox, Robert Norton, and David Chisnall. 2019. Cheri concentrate: Practical compressed capabilities. IEEE Trans. Comput. (2019).
[45] Jonathan Woodruff, Robert N. M. Watson, David Chisnall, Simon W. Moore, Jonathan Anderson, Brooks Davis, Ben Laurie, Peter G. Neumann, Robert Norton, and Michael Roe. 2014. The CHERI Capability Model: Revisiting RISC in an Age of Risk. In ISCA.
[46] Wen Xu, Juanru Li, Junliang Shu, Wenbo Yang, Tianyi Xie, Yuanyuan Zhang, and Dawu Gu. 2015. From Collision To Exploitation: Unleashing Use-After-Free Vulnerabilities in Linux Kernel. In CCS.
[47] Wen Xu, Juanru Li, Junliang Shu, Wenbo Yang, Tianyi Xie, Yuanyuan Zhang, and Dawu Gu. 2015. From Collision To Exploitation: Unleashing Use-After-Free Vulnerabilities in Linux Kernel. In CCS.
[48] Yves Younan. 2015. FreeSentry: protecting against use-after-free vulnerabilities due to dangling pointers. In NDSS.
[49] Tong Zhang, Dongyoon Lee, and Changhee Jung. 2019. BOGO: Buy Spatial Memory Safety, Get Temporal Memory Safety (Almost) Free. In ASPLOS.