Venerable Variadic Vulnerabilities Vanquished

Priyam Biswas1, Alessandro Di Federico2, Scott A. Carr1, Prabhu Rajasekaran3, Stijn Volckaert3, Yeoul Na3, Michael Franz3, and Mathias Payer1

1Department of Computer Science, Purdue University
{biswas12, carr27}@purdue.edu, [email protected]

2Department of Computer Engineering, Politecnico di Milano
[email protected]

3Department of Computer Science, University of California, Irvine
{rajasekp, stijnv, yeouln, franz}@uci.edu

Abstract

Programming languages such as C and C++ support variadic functions, i.e., functions that accept a variable number of arguments (e.g., printf). While variadic functions are flexible, they are inherently not type-safe. In fact, the semantics and parameters of variadic functions are defined implicitly by their implementation. It is left to the programmer to ensure that the caller and callee follow this implicit specification, without the help of a static type checker. An adversary can take advantage of a mismatch between the argument types used by the caller of a variadic function and the types expected by the callee to violate the language semantics and to tamper with memory. Format string attacks are the most popular example of such a mismatch.

Indirect function calls can be exploited by an adversary to divert execution through illegal paths. CFI restricts call targets according to the function prototype which, for variadic functions, does not include all the actual parameters. However, as shown by our case study, current CFI implementations are mainly limited to non-variadic functions and fail to address this potential attack vector. Defending against such an attack requires a stateful dynamic check.

We present HexVASAN, a compiler-based sanitizer to effectively type-check and thus prevent any attack via variadic functions (when called directly or indirectly). The key idea is to record metadata at the call site and verify parameters and their types at the callee whenever they are used at runtime. Our evaluation shows that HexVASAN is (i) practically deployable as the measured overhead is negligible (0.45%) and (ii) effective as we show in several case studies.

1 Introduction

C and C++ are popular languages in systems programming. This is mainly due to their low overhead abstractions and high degree of control left to the developer. However, these languages guarantee neither type nor memory safety, and bugs may lead to memory corruption. Memory corruption attacks allow adversaries to take control of vulnerable applications or to extract sensitive information.

Modern operating systems and compilers implement several defense mechanisms to combat memory corruption attacks. The most prominent defenses are Address Space Layout Randomization (ASLR) [47], stack canaries [13], and Data Execution Prevention (DEP) [48]. While these defenses raise the bar against exploitation, sophisticated attacks are still feasible. In fact, even the combination of these defenses can be circumvented through information leakage and code-reuse attacks.

Stronger defense mechanisms, such as Control Flow Integrity (CFI) [6], protect applications by restricting their control flow to a predetermined control-flow graph (CFG). While CFI allows the adversary to corrupt non-control data, it will terminate the process whenever the control flow deviates from the predetermined CFG. The strength of any CFI scheme hinges on its ability to statically create a precise CFG for indirect control-flow edges (e.g., calls through function pointers in C or virtual calls in C++). Due to ambiguity and imprecision in the analysis, CFI restricts adversaries to an over-approximation of the possible targets of individual indirect call sites.

We present a new attack against widely deployed mitigations through a frequently used feature in C/C++ that has so far been overlooked: variadic functions. Variadic functions (such as printf) accept a varying number of arguments with varying argument types. To implement variadic functions, the programmer implicitly encodes the argument list in the semantics of the function and has to make sure the caller and callee adhere to this implicit contract. In printf, the expected number of arguments and their types are encoded implicitly in the format string, the first argument to the function. Another frequently used scheme iterates through parameters until a condition is reached (e.g., a parameter is NULL). Listing 1 shows an example of a variadic function. If an adversary can violate the implicit contract between caller and callee, an attack may be possible.

In the general case, it is impossible to enumerate the arguments of a variadic function through static analysis techniques. In fact, their number and types are intrinsic in how the function is defined. This limitation enables (or facilitates) two attack vectors against variadic functions. First, attackers can hijack indirect calls and thereby call variadic functions over control-flow edges that are never taken during any legitimate execution of the program. Variadic functions that are called in this way may interpret the variadic arguments differently than the function for which these arguments were intended, and thus violate the implicit caller-callee contract. CFI countermeasures specifically prevent illegal calls over indirect call edges. However, even the most precise implementations of CFI, which verify the type signature of the targets of indirect calls, are unable to fully stop illegal calls to variadic functions.

A second attack vector involves overwriting a variadic function's arguments directly. Such attacks do not violate the intended control flow of a program and thus bypass all of the widely deployed defense mechanisms. Format string attacks are a prime example of such attacks. If an adversary can control the format string passed to, e.g., printf, she can control how all of the following parameters are interpreted, and can potentially leak information from the stack, or read/write to arbitrary memory locations.

The attack surface exposed by variadic functions is significant. We analyzed popular software packages, such as Firefox, Chromium, Apache, CPython, nginx, OpenSSL, Wireshark, the SPEC CPU2006 benchmarks, and the FreeBSD base system, and found that variadic functions are ubiquitous. We also found that many of the variadic function calls in these packages are indirect. We therefore conclude that both attack vectors are realistic threats. The underlying problem that enables attacks on variadic functions is the lack of type checking. Variadic functions generally do not (and cannot) verify that the number and type of arguments they expect match the number and type of arguments passed by the caller. We present HexVASAN, a compiler-based, dynamic sanitizer that tackles this problem by enforcing type checks for variadic functions at run-time. Each argument that is retrieved in a variadic function is type checked, enforcing a strict contract between caller and callee so that (i) no more arguments can be retrieved than were passed and (ii) the types of the arguments used at the callee are compatible with the types passed by the caller. Our mechanism can be used in two operation modes: as a runtime monitor to protect programs against attacks and as a sanitizer to detect type mismatches during program testing.

We have implemented HexVASAN on top of the LLVM compiler framework, instrumenting the compiled code to record the types of each argument of a variadic function at the call site and to check the types whenever they are retrieved. Our prototype implementation is light-weight, resulting in negligible (0.45%) overhead for SPEC CPU2006. Our approach is general, as we show by recompiling the FreeBSD base system, and effective, as shown through several exploit case studies (e.g., a format string vulnerability in sudo).

We present the following contributions:

• Design and implementation of a variadic function sanitizer on top of LLVM;

• A case study on large programs to show the prevalence of direct and indirect calls to variadic functions;

• Several exploit case studies and CFI bypasses using variadic functions.

2 Background

Variadic functions are used ubiquitously in C/C++ programs. Here we introduce details about their use and implementation on current systems, the attack surface they provide, and how adversaries can abuse them.

#include <stdio.h>
#include <stdarg.h>

int add(int start, ...) {
  int next, total = start;
  va_list list;
  va_start(list, start);
  do {
    next = va_arg(list, int);
    total += next;
  } while (next != 0);
  va_end(list);
  return total;
}

int main(int argc, const char *argv[]) {
  printf("%d\n", add(5, 1, 2, 0));
  return 0;
}

Listing 1: Example of a variadic function in C. The function add takes a non-variadic argument start (to initialize an accumulator variable) and a series of variadic int arguments that are added until the terminator value 0 is met. The final value is returned.

2.1 Variadic functions

Variadic functions (such as the printf function in the C standard library) are used in C to maximize the flexibility in the interface of a function, allowing it to accept a number of arguments unknown at compile-time. These functions accept a variable number of arguments, which do not necessarily have fixed types. An example of a variadic function is shown in Listing 1. The function add accepts one mandatory argument (start) and a varying number of additional arguments, which are marked by the ellipsis (...) in the function definition.

The C standard defines several macros that portable programs may use to access variadic arguments [33]. stdarg.h, the header that declares these macros, defines an opaque type, va_list, which stores all information required to retrieve and iterate through variadic arguments. In our example, the variable list of type va_list is initialized using the va_start macro. The va_arg macro retrieves the next variadic argument from the va_list, updating the va_list to point to the next argument as a side effect. Note that, although the programmer must specify the expected type of the variadic argument in the call to va_arg, the C standard does not require the compiler to verify that the retrieved variable is indeed of that type. va_list variables must be released using a call to the va_end macro so that all of the resources assigned to the list are deallocated.

printf is an example of a more complex variadic function which takes a format string as its first argument. This format string implicitly encodes information about the number of arguments and their type. Implementations of printf scan through this format string several times to identify all format arguments and to recover the necessary space in the output string for the specified types and formats. Interestingly, arguments do not have to be encoded sequentially but format strings allow out-of-order access to arbitrary arguments. This flexibility is often abused in format string attacks to access arbitrary stack locations.
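The implicit contract carried by a format string can be illustrated with a much smaller function than printf. The following is a minimal sketch, not taken from any real library: the name sum_by_spec and its one-character type codes are assumptions made for this example. The callee learns the number and types of its variadic arguments only from the string, so a caller that passes a string that does not match the actual arguments silently breaks the contract.

```c
#include <stdarg.h>

/* Hypothetical helper: sums the variadic arguments described by `spec`.
 * Each character of `spec` names the type of one argument:
 *   'i' -> int, 'd' -> double.
 * Returns -1.0 if `spec` contains any other character. */
double sum_by_spec(const char *spec, ...) {
    va_list ap;
    double total = 0.0;

    va_start(ap, spec);
    for (const char *p = spec; *p != '\0'; ++p) {
        if (*p == 'i') {
            total += va_arg(ap, int);      /* trusts the caller: no check */
        } else if (*p == 'd') {
            total += va_arg(ap, double);   /* trusts the caller: no check */
        } else {
            va_end(ap);
            return -1.0;
        }
    }
    va_end(ap);
    return total;
}
```

A call such as sum_by_spec("id", 2, 0.5) honors the contract; nothing in the language prevents a call whose string and arguments disagree.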

2.2 Variadic functions ABI

The C standard does not define the calling convention for variadic functions, nor the exact representation of the va_list structure. This information is instead part of the ABI of the target platform.

x86-64 ABI. The AMD64 System V ABI [36], which is implemented by x86-64 GNU/Linux platforms, dictates that the caller of a variadic function must adhere to the normal calling conventions when passing arguments. Specifically, the first six non-floating point arguments and the first eight floating point arguments are passed through CPU registers. The remaining arguments, if any, are passed on the stack. If a variadic function accepts five mandatory arguments and a variable number of variadic arguments (none of them floating point), then all but one of these variadic arguments will be passed on the stack. The variadic function itself moves the arguments into a va_list variable using the va_start macro. The va_list type is defined as follows:

typedef struct {
  unsigned int gp_offset;
  unsigned int fp_offset;
  void *overflow_arg_area;
  void *reg_save_area;
} va_list[1];

va_start allocates on the stack a reg_save_area to store copies of all variadic arguments that were passed in registers. va_start initializes the overflow_arg_area field to point to the first variadic argument that was passed on the stack. The gp_offset and fp_offset fields are the offsets into the reg_save_area. They represent the first unused variadic argument that was passed in a general purpose register or floating point register respectively.

The va_arg macro retrieves the first unused variadic argument from either the reg_save_area or the overflow_arg_area, and either it increases the gp_offset/fp_offset field or moves the overflow_arg_area pointer forward, to point to the next variadic argument.

Other architectures. Other architectures may implement variadic functions differently. On 32-bit x86, for example, all variadic arguments must be passed on the stack (pushed right to left), following the cdecl calling convention used on GNU/Linux. The variadic function itself retrieves the first unused variadic argument directly from the stack. This simplifies the implementation of the va_start, va_arg, and va_end macros, but it generally makes it easier for adversaries to overwrite the variadic arguments.

2.3 Variadic attack surface

When calling a variadic function, the compiler statically type checks all non-variadic arguments but does not enforce any restriction on the type or number of variadic arguments. The programmer must follow the implicit contract between caller and callee that is only present in the code but never enforced explicitly. Due to this high flexibility, the compiler cannot check arguments statically. This lack of safety can lead to bugs where an adversary achieves control over the callee by modifying the arguments, thereby influencing the interpretation of the passed variadic arguments.

Modifying the argument or arguments that control the interpretation of variadic arguments allows an adversary to change the behavior of the variadic function, causing the callee to access additional or fewer arguments than specified and to change the interpretation of their types.

An adversary can influence variadic functions in several ways. First, if the programmer forgot to validate the input, the adversary may directly control the arguments to the variadic function that control the interpretation of the remaining arguments. Second, the adversary may use an arbitrary memory corruption elsewhere in the program to influence the argument of a variadic function.

Variadic functions can be called statically or dynamically. Direct calls would, in theory, allow some static checking. Indirect calls (e.g., through a function pointer), where the target function is not known, do not allow any static checking. Therefore, variadic functions can only be protected through some form of runtime checker that considers the constraints of the call site and enforces them at the callee side.

2.4 Format string exploits

Format string exploits are a perfect example of corrupted variadic functions. An adversary that gains control over the format string used in printf can abuse the printf function to leak arbitrary data on the stack or even resort to arbitrary memory corruption (if the pointer to the target location is on the stack). For example, a format string vulnerability in the smbclient utility (CVE-2009-1886) [40] allows an attacker to gain control over the Samba file system by treating a filename as a format string. Also, in PHP 7.x before 7.0.1, an error handling function in zend_execute_API.c allows an attacker to execute arbitrary code by using format string specifiers as a class name (CVE-2015-8617) [1].

Information leaks are simple: an adversary changes the format string to print the desired information that resides somewhere higher up on the stack by employing the desired format string specifiers. For arbitrary memory modification, an adversary must have the target address encoded somewhere on the stack and then reference the target through the %n modifier, writing the number of already written bytes to that memory location.
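The root cause of such exploits is that attacker-controlled text ends up in the format string position. As a minimal sketch of the usual source-level remedy (the helper name render_untrusted is an assumption made for this example), the untrusted text is passed as an argument to a constant "%s" format, so that any specifiers it contains are copied verbatim instead of being interpreted.

```c
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into `out` (capacity `cap`).
 * The format string is the constant "%s", so conversion specifiers inside
 * `untrusted` are treated as plain characters and never interpreted.
 * Returns snprintf's result: the length the complete output would have. */
int render_untrusted(char *out, size_t cap, const char *untrusted) {
    /* Vulnerable alternative (do not use): snprintf(out, cap, untrusted); */
    return snprintf(out, cap, "%s", untrusted);
}
```

This only addresses the case where the programmer forgot to validate the input; it does not help when memory corruption is used to redirect the format string pointer.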

The GNU C standard library (glibc) enforces some protection against format string attacks by checking if a format string is in a writable memory area [29]. For format strings, the glibc printf implementation opens /proc/self/maps and scans for the memory area of the format string to verify correct permissions. Moreover, a check is performed to ensure that all arguments are consumed, so that no out-of-context stack slots can be used in the format string exploit. These defenses stop some attacks but do not mitigate the underlying problem that an adversary can gain control over the format string. Note that this heavyweight check is only used if the format string argument may point to a writable memory area at compile time. An attacker may use memory corruption to redirect the format string pointer to an attacker-controlled area and fall back to a regular format string exploit.

3 Threat model

Programs frequently use variadic functions, either in the program itself or as part of a shared library (e.g., printf in the C standard library). We assume that the program contains an arbitrary memory corruption, allowing the adversary to modify the arguments to a variadic function and/or the target of an indirect function call, targeting a variadic function.

Our target system deploys existing defense mechanisms like DEP, ASLR, and a strong implementation of CFI, protecting the program against code injection and control-flow hijacking. We assume that the adversary cannot modify the metadata of our runtime monitor. Protecting metadata is an orthogonal engineering problem and can be solved through, e.g., masking (and-ing every memory access), segmentation (for x86-32), protecting the memory region [9], or randomizing the location of sensitive data. Our threat model is a realistic scenario for current attacks and defenses.

4 HexVASAN design

HexVASAN monitors calls to variadic functions and checks for type violations. Since the semantics of how arguments should be interpreted by the function are intrinsic in the logic of the function itself, it is, in general, impossible to determine the number and type of arguments a certain variadic function accepts. For this reason, HexVASAN instruments the code generated by the compiler so that a check is performed at runtime. This check ensures that the arguments consumed by the variadic function match those passed by the caller.

The high level idea is the following: HexVASAN records metadata about the supplied argument types at the call site and verifies that the extracted arguments match in the callee. The number of arguments and their types are always known at the call site and can be encoded efficiently. In the callee this information can then be used to verify individual arguments when they are accessed. To implement such a sanitizer, we must design a metadata store, a pass that instruments call sites, a pass that instruments callees, and a runtime library that manages the metadata store and performs the run-time type verification. Our runtime library aborts the program whenever a mismatch is detected and generates detailed information about the call site and the mismatched arguments.

[Figure 1 diagram: source1.c, source2.cpp, and source3.c are each parsed by the C or C++ frontend into IR, passed through the HexVASAN instrumentation, and compiled into object files; the link step combines the object files with hexvasan.a into output.elf.]

Figure 1: Overview of the HexVASAN compilation pipeline. The HexVASAN instrumentation runs right after the C/C++ frontend, while its runtime library, hexvasan.a, is merged into the final executable at link time.

4.1 Analysis and Instrumentation

We designed HexVASAN as a compiler pass to be run in the compilation pipeline right after the C/C++ frontend. The instrumentation collects a set of statically available information about the call sites, encodes it in the LLVM module, and injects calls to our runtime to perform checks during program execution.

Figure 1 provides an overview of the compilation pipeline when HexVASAN is enabled. Source files are first parsed by the C/C++ frontend which generates the intermediate representation on which our instrumentation runs. The normal compilation then proceeds, generating instrumented object files. These object files, along with the HexVASAN runtime library, are then passed to the linker, which creates the instrumented program binary.

4.2 Runtime support

The HexVASAN runtime augments every va_list in the original program with the type information generated by our instrumentation pass, and uses this type information to perform run-time type checking on any variadic argument accessed through va_arg. By managing the type information in a metadata store, and by maintaining a mapping between va_lists and their associated type information, HexVASAN remains fully compatible with the platform ABI. This design also supports interfacing between instrumented programs and non-instrumented libraries.

The HexVASAN runtime manages the type information in two data structures. The core data structure, called the variadic list map (VLM), associates va_list structures with the type information produced by our instrumentation, and with a counter to track the index of the last argument that was read from the list. A second data structure, the variadic call stack (VCS), allows callers of variadic functions to store type information of variadic arguments until the callee initializes the va_list.

Each variadic call site is instrumented with a call to pre_call, which prepares the information about the call site (a variadic call site descriptor or VCSD), and a call to post_call, which cleans it up. For each variadic function, the va_start calls are instrumented with list_init, while va_copy, whose purpose is to clone a va_list, is instrumented through list_copy. The two run-time functions will allocate the necessary data structures to validate individual arguments. Calls to va_end are instrumented through list_free to free up the corresponding data structures.
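The interplay of the VCS, the VLM, and the injected calls can be sketched in plain C. This is a simplified, single-threaded illustration under stated assumptions, not HexVASAN's runtime: the fixed-size tables, the integer type codes, the linear VLM lookup, and the choice to report a violation through a return value instead of aborting are all assumptions made for this example, and the instrumentation is inserted by hand.

```c
#include <stdarg.h>
#include <stddef.h>

/* All names and sizes below are illustrative; bounds checks are omitted. */
enum { TYPE_INT = 1, TYPE_DOUBLE = 2 };
enum { MAX_DEPTH = 16, MAX_LISTS = 16 };

typedef struct { unsigned count; const int *types; } vcsd_t;

/* Variadic call stack (VCS): descriptors of variadic calls in flight. */
static const vcsd_t *vcs[MAX_DEPTH];
static size_t vcs_top;

/* Variadic list map (VLM): va_list address -> descriptor + read index. */
static struct { const void *list; const vcsd_t *desc; unsigned index; }
    vlm[MAX_LISTS];

static void pre_call(const vcsd_t *d) { vcs[vcs_top++] = d; }
static void post_call(void) { vcs_top--; }

static void list_init(const void *list) {
    for (size_t i = 0; i < MAX_LISTS; i++) {
        if (vlm[i].list == NULL) {
            vlm[i].list = list;
            vlm[i].desc = vcs[vcs_top - 1];
            vlm[i].index = 0;
            return;
        }
    }
}

static void list_free(const void *list) {
    for (size_t i = 0; i < MAX_LISTS; i++)
        if (vlm[i].list == list) vlm[i].list = NULL;
}

/* Returns 1 if the next argument exists and has the expected type. */
static int check_arg(const void *list, int type) {
    for (size_t i = 0; i < MAX_LISTS; i++) {
        if (vlm[i].list == list) {
            unsigned idx = vlm[i].index++;
            return idx < vlm[i].desc->count && vlm[i].desc->types[idx] == type;
        }
    }
    return 0;
}

/* Hand-instrumented callee: sums ints until 0; returns -1 on a violation. */
static int add(int start, ...) {
    int next, total = start, ok = 1;
    va_list list;
    va_start(list, start);
    list_init(&list);
    do {
        if (!check_arg(&list, TYPE_INT)) { ok = 0; break; }
        next = va_arg(list, int);
        total += next;
    } while (next != 0);
    va_end(list);
    list_free(&list);
    return ok ? total : -1;
}

/* Hand-instrumented call sites, each with a constant descriptor. */
int call_ok(void) {
    static const int types[] = { TYPE_INT, TYPE_INT, TYPE_INT };
    static const vcsd_t d = { 3, types };
    pre_call(&d);
    int r = add(5, 1, 2, 0);
    post_call();
    return r;
}

int call_missing_terminator(void) {
    static const int types[] = { TYPE_INT, TYPE_INT };
    static const vcsd_t d = { 2, types };
    pre_call(&d);
    int r = add(5, 1, 2);      /* no terminating 0: third read is rejected */
    post_call();
    return r;
}

int call_type_mismatch(void) {
    static const int types[] = { TYPE_DOUBLE, TYPE_INT };
    static const vcsd_t d = { 2, types };
    pre_call(&d);
    int r = add(5, 1.5, 0);    /* double passed where add reads an int */
    post_call();
    return r;
}
```

In both faulty call sites the check fails before va_arg is executed, so the callee never reads an argument that was not passed or that has a different type.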

Algorithm 1 summarizes the two phases of our analysis and instrumentation pass. The first phase identifies all the calls to variadic functions (both direct and indirect). Note that identifying indirect calls to variadic functions is straightforward in a compiler framework since, even if the target function is not statically known, its type is. Then, all the parameters passed by that specific call site are inspected and recorded, along with their type in a dedicated VCSD which is stored in read-only global data. At this point, a call to pre_call is injected before the variadic function call (with the newly created VCSD as a parameter) and, symmetrically, a call to post_call is inserted after the call site.

input: a module m
/* Phase 1 */
foreach function f in module m do
  foreach variadic call c with n arguments in f do
    vcsd.count ← n;
    foreach argument a of type t do
      vcsd.args.push(t);
    end
    emit call to pre_call(vcsd) before c;
    emit call to post_call() after c;
  end
end
/* Phase 2 */
foreach function f in module m do
  foreach call c to va_start(list) do
    emit call to list_init(&list) after c;
  end
  foreach call c to va_copy(dst, src) do
    emit call to list_copy(&dst, &src) after c;
  end
  foreach call c to va_end(list) do
    emit call to list_free(&list) after c;
  end
  foreach call c to va_arg(list, type) do
    emit call to check_arg(&list, type) before c;
  end
end

Algorithm 1: The instrumentation process.

The second phase identifies all calls to va_start and va_copy, and consequently, the va_list variables in the program. Uses of each va_list variable are inspected in an architecture-specific way. Once all uses are identified, we inject a call to check_arg before dereferencing the argument (which always resides in memory).

4.3 Challenges and Discussion

When designing a variadic function call sanitizer, several issues have to be considered. We highlight details about the key challenges we encountered.

Multiple va_lists. Functions are allowed to create multiple va_lists to access the same variadic arguments, either through va_start or va_copy operations. HexVASAN handles this by storing a VLM entry for each individual va_list.
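As a small illustration of why a function may hold several va_lists at once, the following sketch walks the same arguments twice, once through each list. The function count_above_mean is a hypothetical example, not code from the evaluated programs; a sanitizer in the style described here would track a separate read index for each of the two lists.

```c
#include <stdarg.h>

/* Hypothetical example: counts how many of the `n` int arguments exceed
 * their integer mean. Two va_lists iterate over the same arguments. */
int count_above_mean(int n, ...) {
    va_list first, second;
    long sum = 0;
    int above = 0;

    if (n <= 0) return 0;

    va_start(first, n);
    va_copy(second, first);      /* second list starts at the same argument */

    for (int i = 0; i < n; i++) sum += va_arg(first, int);
    va_end(first);

    long mean = sum / n;         /* integer division, rounds toward zero */
    for (int i = 0; i < n; i++)
        if (va_arg(second, int) > mean) above++;
    va_end(second);

    return above;
}
```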

Passing va_lists as function arguments. While uncommon, variadic functions are allowed to pass the va_lists they create as arguments to non-variadic functions. This allows non-variadic functions to access variadic arguments of functions higher in the call stack. Our design takes this into account by maintaining a list map (VLM) and by instrumenting all va_arg operations, regardless of whether or not they are in a variadic function.
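The pattern is the same one the C standard library uses for the printf/vprintf pair. A minimal sketch, with the hypothetical names sum and vsum, shows a va_arg operation that executes inside a non-variadic function and would therefore be missed by a pass that only looked inside variadic functions.

```c
#include <stdarg.h>

/* Non-variadic helper: reads `count` ints from a va_list owned by its caller. */
static int vsum(int count, va_list ap) {
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* va_arg outside any variadic function */
    return total;
}

/* Variadic wrapper: creates the va_list and delegates the actual reads. */
int sum(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = vsum(count, ap);
    va_end(ap);
    return total;
}
```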

Multi-threading support. Multiple threads are supported by storing our per-thread runtime state in a thread-local variable as supported on major operating systems.

Metadata format. We use a constant data structure per variadic call site, the VCSD, to hold the number of arguments and a pointer to an array of integers identifying their type. The check_arg function therefore only performs two memory accesses, the first to load the number of arguments and the second for the type of the argument currently being checked.

To uniquely identify the data types with an integer, we decided to build a hashing function (described in Algorithm 2) using a set of fixed identifiers for primitive data types and hashing them in different ways depending on how they are aggregated (pointers, union, or struct). The last hash acts as a terminator marker for aggregate types, which allows us to, e.g., distinguish between {struct{ int }, int} and {struct {struct{ int, int }}}. Note that an (unlikely) hash collision only results in two different types being accepted as equal. Such a hashing mechanism has the advantage of being deterministic across compilation units, removing the need for keeping a global map of type-unique id pairs. Due to the information loss during the translation from C/C++ to LLVM IR, our type system does not distinguish between signed and unsigned types. The required metadata is static and immutable and we mark it as read-only, protecting it from modification. However, the VCS still needs to be protected through other mechanisms.

input : a type t and an initial hash value h
output: the final hash value h
h = hash(h, typeID(t));
switch typeID(t) do
  case AggregateType
    /* union, struct and pointer */
    foreach c in componentTypes(t) do
      h = hashType(c, h);
    end
  end
  case FunctionType
    h = hashType(returnType(t), h);
    foreach a in argTypes(t) do
      h = hashType(a, h);
    end
  end
endsw
h = hash(h, typeID(t));
return h

Algorithm 2: Algorithm describing the type hashing function hashType. typeID returns a unique identifier for each basic type (e.g., 32-bit integer, double), type of aggregate type (e.g., struct, union...) and functions. hash is a simple hashing function combining two integers. componentTypes returns the components of an aggregate type, returnType the return type of a function prototype and argTypes the type of its arguments.
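The role of the terminator marker can be checked with a small C sketch that follows the shape of Algorithm 2 for primitive and aggregate types. The type descriptor layout, the identifier values, and the polynomial combiner used for hash are assumptions made for this example; they are not the identifiers or the hash function used by HexVASAN, and function types are left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative type identifiers; the concrete values are assumptions. */
enum { ID_INT32 = 1, ID_DOUBLE = 2, ID_STRUCT = 10 };

typedef struct type_desc {
    uint32_t id;                              /* typeID of this type */
    size_t n_members;                         /* 0 for primitive types */
    const struct type_desc *const *members;   /* component types, if any */
} type_desc;

/* Assumed combiner: one polynomial step (wraps modulo 2^32). */
static uint32_t hash(uint32_t h, uint32_t v) { return h * 31u + v; }

/* Hash the type identifier, then all components, then the identifier
 * again as a closing terminator marker. */
uint32_t hash_type(const type_desc *t, uint32_t h) {
    h = hash(h, t->id);
    for (size_t i = 0; i < t->n_members; i++)
        h = hash_type(t->members[i], h);
    return hash(h, t->id);
}

static const type_desc t_int = { ID_INT32, 0, NULL };
static const type_desc *const one_int[] = { &t_int };
static const type_desc *const two_ints[] = { &t_int, &t_int };
static const type_desc s_one = { ID_STRUCT, 1, one_int };    /* struct{int} */
static const type_desc s_two = { ID_STRUCT, 2, two_ints };   /* struct{int,int} */
static const type_desc *const members_a[] = { &s_one, &t_int };
static const type_desc *const members_b[] = { &s_two };

/* The two aggregates from the text: same leaves, different nesting. */
const type_desc type_a = { ID_STRUCT, 2, members_a };  /* {struct{int}, int} */
const type_desc type_b = { ID_STRUCT, 1, members_b };  /* {struct{int,int}} */
```

With the closing marker, type_a and type_b feed different identifier sequences into the combiner and receive different hashes. If the final hash step were dropped, both would reduce to the same sequence (struct, struct, int, int) and would collide.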

Handling floating point arguments. In the x86-64 ABI, floating point and non-floating point arguments are handled differently. In case of floating point arguments, the first eight arguments are passed in the floating point registers whereas in case of non-floating point the first six are passed in general-purpose registers. HexVASAN handles both argument types.

Support for aggregate data types. According to the AMD64 System V ABI, the caller unpacks the fields of the aggregate data types (structs and unions) if the arguments fit into registers. This makes it hard to distinguish between composite types and regular types: if unpacked, they are indistinguishable on the callee side from arguments of these types. HexVASAN supports aggregate data types even if the caller unpacks them.

Attacks preserving number and type of arguments. Our mechanism prevents attacks that change the number of arguments or the types of individual arguments. Format string attacks that only change one modifier can therefore be detected through the type mismatch even if the total number of arguments remains unchanged. Attacks that preserve both the number and the types of the arguments do not trigger these checks.

Non-variadic calls to variadic functions. Consider the following code snippet:

typedef void (*non_variadic)(int, int);

void variadic(int, ...) { /* ... */ }

int main() {
  non_variadic function_ptr = variadic;
  function_ptr(1, 2);
}

In this case, the function call in main to function_ptr appears to the compiler as a non-variadic function call, since the type of the function pointer is not variadic. Therefore, our pass will not instrument the call site, leading to potential errors.

To handle such (rare) situations appropriately, we would have to instrument all non-variadic call sites too, leading to an unjustified overhead. Moreover, the code above represents undefined behavior in C [27, 6.3.2.3p8] and C++ [26, 5.2.10p6], and might not work on certain architectures where the calling conventions for variadic and non-variadic function calls are not compatible. The GNU C compiler emits a warning when a function pointer is cast to a different type; we therefore require the developer to correct the code before applying HexVASAN.
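One way to correct such code, sketched here under the assumption that the callee's prototype can be named at the call site, is to give the function pointer a variadic type that matches the callee. The call through the pointer is then a variadic call from the compiler's point of view, which is what a call-site instrumentation pass needs to see. The names variadic_fn and call_through_pointer are made up for this example.

```c
#include <stdarg.h>

/* Variadic callee: returns `base` plus the first variadic int. */
int variadic(int base, ...) {
    va_list ap;
    va_start(ap, base);
    int first = va_arg(ap, int);
    va_end(ap);
    return base + first;
}

/* The pointer type matches the callee's variadic prototype. */
typedef int (*variadic_fn)(int, ...);

int call_through_pointer(void) {
    variadic_fn function_ptr = variadic;   /* no type mismatch, no warning */
    return function_ptr(1, 2);             /* compiled as a variadic call */
}
```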

Central management of the global state. To allow the HexVASAN runtime to be linked into the base system libraries, such as the C standard library, we made it a static library. Turning the runtime into a shared library is possible, but would prohibit its use during the early process initialization, until the dynamic linker has processed all of the necessary relocations. Our runtime therefore either needs to be added solely to the C standard library (so that it is initialized early in the startup process) or the runtime library must carefully use weak symbols to ensure that each symbol is only defined once if multiple libraries are compiled with our countermeasure.

C++ exceptions and longjmp. If an exception is raised while executing a variadic function (or one of its callees), the variadic function may not get a chance to clean up the metadata for any va_lists it has initialized, nor may the caller of this variadic function get the chance to clean up the type information it has pushed onto the VCS. Other functions manipulating the thread's stack directly, such as longjmp, present similar issues.

C++ exceptions can be handled by modifying the LLVM C++ frontend (i.e., clang) to inject an object with a lifetime spanning from immediately before a variadic function call to immediately after. Such an object would call pre_call in its constructor and post_call in the destructor, leveraging the exception handling mechanism to make HexVASAN exception-safe. Functions like longjmp can be instrumented to purge the portions of HexVASAN's data structures that correspond to the discarded stack area. We did not observe any such calls in practice and leave the implementation of handling exceptions and longjmp across variadic functions as future engineering work.

5 Implementation

We implemented HexVASAN as a sanitizer for the LLVM compiler framework [31], version 3.9.1. We have chosen LLVM for its robust features for analyzing and transforming arbitrary programs as well as extracting reliable type information. The sanitizer can be enabled from the C/C++ frontend (clang) by providing the -fsanitize=vasan parameter at compile-time. No annotations or other source code changes are required for HexVASAN. Our sanitizer does not require visibility of whole source code (see Section 4.3), but works on individual compilation units. Therefore, link-time optimization (LTO) is not required, and HexVASAN fits readily into existing build systems. In addition, HexVASAN also supports signal handlers.

HexVASAN consists of two components: a static instrumentation pass and a runtime library. The static instrumentation pass works on LLVM IR, adding the necessary instrumentation code to all variadic functions and their callees. The support library is statically linked to the program and, at run-time, checks the number and type of variadic arguments as they are used by the program. In the following we describe the two components in detail.

Static instrumentation. The implementation of the static instrumentation pass follows the description in Section 4. We first iterate through all functions, looking for CallInst instructions targeting a variadic function (either directly or indirectly), then we inspect them and create for each one of them a read-only GlobalVariable of type vcsd_t. As shown in Listing 2, vcsd_t is composed of an unsigned integer representing the number of arguments of the considered call site and a pointer to an array (another GlobalVariable) with one integer element of type type_t for each argument. type_t is an integer uniquely identifying a data type obtained using the hashType function presented in Algorithm 2. At this point a call to pre_call is injected before the call site, with the newly created VCSD as a parameter, and a call to post_call is injected after the call site.

```cpp
struct vcsd_t {
  unsigned count;
  type_t *args;
};

thread_local stack<vcsd_t *> vcs;
thread_local map<va_list *,
                 pair<vcsd_t *, unsigned>> vlm;

void pre_call(vcsd_t *arguments) {
  vcs.push(arguments);
}
void post_call() {
  vcs.pop();
}
void list_init(va_list *list_ptr) {
  vlm[list_ptr] = { vcs.top(), 0 };
}

void list_free(va_list *list_ptr) {
  vlm.erase(list_ptr);
}

void check_arg(va_list *list_ptr, type_t type) {
  pair<vcsd_t *, unsigned> &args = vlm[list_ptr];
  unsigned index = args.second++;
  assert(index < args.first->count);
  assert(args.first->args[index] == type);
}

int add(int start, ...) {
  /* ... */
  va_start(list, start);
  list_init(&list);
  do {
    check_arg(&list, typeid(int));
    total += va_arg(list, int);
  } while (next != 0);
  va_end(list);
  list_free(&list);
  /* ... */
}

const vcsd_t main_add_vcsd = {
  .count = 3,
  .args = {typeid(int), typeid(int), typeid(int)}
};

int main(int argc, const char *argv[]) {
  /* ... */
  pre_call(&main_add_vcsd);
  int result = add(5, 1, 2, 0);
  post_call();
  printf("%d\n", result);
  /* ... */
}
```

Listing 2: Simplified C++ representation of the instrumented code for Listing 1.

During the second phase, we first identify all va_start, va_copy, and va_end operations in the program. In the IR code, these operations appear as calls to the LLVM intrinsics llvm.va_start, llvm.va_copy, and llvm.va_end. We instrument the operations with calls to our runtime’s list_init, list_copy, and list_free functions respectively. We then proceed to identify va_arg operations. Although the LLVM IR has a dedicated va_arg instruction, it is not used on any of the platforms we tested. The va_list is instead accessed directly. Our identification of va_arg is therefore platform-specific. On x86-64, our primary target, we identify va_arg by recognizing accesses to the gp_offset and fp_offset fields in the x86-64 version of the va_list structure (see Section 2.2). The fp_offset field is accessed whenever the program attempts to retrieve a floating point argument from the list. The gp_offset field is accessed to retrieve any other types of variadic arguments. We insert a call to our runtime’s check_arg function before the instruction that accesses this field.
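To make the gp_offset/fp_offset distinction concrete, the following sketch writes out the four fields that the System V x86-64 ABI defines for a va_list element and classifies a requested type the way the paragraph above describes. The names sysv_va_list_tag, OffsetField, and field_for are illustrative assumptions; real code uses the compiler-provided va_list, and the sketch deliberately ignores long double and aggregates, which are fetched through overflow_arg_area.

```cpp
#include <cstddef>
#include <type_traits>

// Field layout of a va_list element under the System V x86-64 ABI.
// The struct name is illustrative; programs use the compiler's va_list.
struct sysv_va_list_tag {
  unsigned gp_offset;       // offset of the next general-purpose register slot
  unsigned fp_offset;       // offset of the next floating-point register slot
  void *overflow_arg_area;  // next argument that was passed on the stack
  void *reg_save_area;      // start of the register save area
};

enum class OffsetField { gp_offset, fp_offset };

// Simplified classification following the text: double arguments are
// retrieved via fp_offset, every other supported type via gp_offset.
template <typename T>
constexpr OffsetField field_for() {
  return std::is_same<T, double>::value ? OffsetField::fp_offset
                                        : OffsetField::gp_offset;
}

static_assert(offsetof(sysv_va_list_tag, gp_offset) == 0,
              "gp_offset is the first field");
static_assert(offsetof(sysv_va_list_tag, fp_offset) == sizeof(unsigned),
              "fp_offset directly follows gp_offset");
```

An instrumentation pass that sees a load or store of one of the two offset fields can therefore infer that a va_arg expansion is in progress, which is the hook the paragraph above relies on.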

Listing 2 shows (in simplified C++) how the code in Listing 1 would be instrumented by our sanitizer.

Dynamic variadic type checking. The entire runtime is implemented in plain C code, as this allows it to be linked into the standard C library without introducing a dependency on the standard C++ library. The VCS is implemented as a thread-local stack, and the VLM as a thread-local hash map. The pre_call and post_call functions push and pop type information onto and from the VCS. The list_init function inserts a new entry into the VLM, using the top element on the stack as the entry’s type information and initializing the counter for consumed arguments to 0.

check_arg looks up the type information for the va_list being accessed in the VLM and checks if the requested argument exists (based on the counter of consumed arguments), and if its type matches the one provided by the caller. If either of these checks fails, execution is aborted, and the runtime will generate an error message such as the one shown in Listing 3. As a consequence, the argument is never read or written, since the pointer to it is never dereferenced.

```
Error: Type Mismatch
Index is 1
Callee Type : 43 (32-bit Integer)
Caller Type : 15 (Pointer)
Backtrace:
[0] 0x4019ff <__vasan_backtrace+0x1f> at test
[1] 0x401837 <__vasan_check_arg+0x187> at test
[2] 0x8011b3afa <__vfprintf+0x20fa> at libc.so.7
[3] 0x8011b1816 <vfprintf_l+0x86> at libc.so.7
[4] 0x801200e50 <printf+0xc0> at libc.so.7
[5] 0x4024ae <main+0x3e> at test
[6] 0x4012ff <_start+0x17f> at test
```

Listing 3: Error message reported by HexVASAN
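The checks performed by the runtime can be exercised in isolation with a small model. The sketch below is an assumption-laden simplification rather than the actual runtime: it is written in C++ instead of plain C, keys the VLM by an opaque pointer, and returns a CheckResult value (a hypothetical name) where the real runtime would abort and print a report.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using type_t = std::uint64_t;  // hashed type identifier (assumption)
struct vcsd_t { unsigned count; const type_t *args; };

enum class CheckResult { Ok, IndexOutOfRange, TypeMismatch, UnknownList };

thread_local std::vector<const vcsd_t *> vcs;  // variadic call stack
thread_local std::unordered_map<const void *,
                                std::pair<const vcsd_t *, unsigned>> vlm;

void pre_call(const vcsd_t *d) { vcs.push_back(d); }
void post_call() { vcs.pop_back(); }

// Binds a va_list (identified by its address) to the innermost call site.
void list_init(const void *list) { vlm[list] = std::make_pair(vcs.back(), 0u); }
void list_free(const void *list) { vlm.erase(list); }

// Checks the next argument: it must exist and its type must match.
CheckResult check_arg(const void *list, type_t requested) {
  auto it = vlm.find(list);
  if (it == vlm.end()) return CheckResult::UnknownList;
  const vcsd_t *desc = it->second.first;
  const unsigned index = it->second.second++;  // count consumed arguments
  if (index >= desc->count) return CheckResult::IndexOutOfRange;
  if (desc->args[index] != requested) return CheckResult::TypeMismatch;
  return CheckResult::Ok;
}
```

With a descriptor holding the two type identifiers from Listing 3 (43 for a 32-bit integer, then 15 for a pointer), requesting an integer three times in a row yields Ok, then TypeMismatch, then IndexOutOfRange.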

6 Evaluation

In this section we present a case study on variadic function based attacks against state-of-the-art CFI implementations. Next, we evaluate the effectiveness of HexVASAN as an exploit mitigation technique. Then, we evaluate the overhead introduced by our HexVASAN prototype implementation on the SPEC CPU2006 integer (CINT2006) benchmarks, on Firefox using standard JavaScript benchmarks, and on micro-benchmarks. We also evaluate how widespread the usage of variadic functions is in SPEC CPU2006 and in Firefox 51.0.1, Chromium 58.0.3007.0, Apache 2.4.23, CPython 3.7.0, nginx 1.11.5, OpenSSL 1.1.1, Wireshark 2.2.1, and the FreeBSD 11.0 base system.

Note that, along with testing the aforementioned software, we also developed an internal set of regression tests. Our regression tests allow us to verify that our sanitizer correctly catches problematic variadic function calls, and does not raise false alarms for benign calls. The test suite explores corner cases, including trying to access arguments that have not been passed and trying to access them using a type different from the one used at the call site.

6.1 Case study: CFI effectiveness

One of the attack scenarios we envision is that an attacker controls the target of an indirect call site. If the intended target of the call site was a variadic function, the attacker could illegally call a different variadic function that expects different variadic arguments than the intended target (yet shares the types for all non-variadic arguments). If the intended target of the call site was a non-variadic function, the attacker could call a variadic function that interprets some of the intended target’s arguments as variadic arguments.

All existing CFI mechanisms allow such attacks to some extent. The most precise CFI mechanisms, which rely on function prototypes to classify target sets (e.g., LLVM-CFI, pi-CFI, or VTV) will allow all targets with the same prototype, possibly restricting to the subset of functions whose addresses are taken in the program. This is problematic for variadic functions, as only non-variadic types are known statically. For example, if a function of type int (*)(int, ...) is expected to be called from an indirect call site, then precise CFI schemes allow calls to all other variadic functions of that type, even if those other functions expect different types for the variadic arguments.

A second way to attack variadic functions is to overwrite their arguments directly. This happens, for example, in format string attacks, where an attacker can overwrite the format string to cause misinterpretation of the variadic arguments. HexVASAN detects both of these attacks when the callee attempts to retrieve the variadic arguments using the va_arg macro described in Section 2.1. Checking and enforcing the correct types for variadic functions is only possible at runtime and any sanitizer must resort to run-time checks to do so. CFI mechanisms must therefore be extended with a HexVASAN-like mechanism to detect violations. To show that our tool can complement CFI, we create a test program containing several variadic functions and one non-variadic function. The declarations of these functions are shown below.

```cpp
int sum_ints(int n, ...);
int avg_longs(int n, ...);
int avg_doubles(int n, ...);
void print_longs(int n, ...);
void print_doubles(int n, ...);
int square(int n);
```

This program contains one indirect call site from which only the sum_ints function can be called legally, and one indirect call site from which only the square function can be legally called. We also introduce a memory corruption vulnerability which allows us to override the target of both indirect calls.

We constructed the program such that sum_ints, avg_longs, print_longs, and square are all address-taken functions. The avg_doubles and print_doubles functions are not address-taken.

Functions avg_longs, avg_doubles, print_longs, and print_doubles all expect different variadic argument types than function sum_ints. Functions sum_ints, avg_longs, avg_doubles, and square do, however, all have the same non-variadic prototype (int (*)(int)).
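The following sketch illustrates why a prototype-based check cannot separate these functions. The function bodies are hypothetical (the paper only gives the declarations), and the helper names variadic_fn, call_sum, and call_avg are assumptions introduced for the example. Only well-defined calls are made; the attack would consist of making the call site in call_sum reach avg_longs instead.

```cpp
#include <cstdarg>
#include <type_traits>

// Hypothetical body: sums n variadic int arguments.
int sum_ints(int n, ...) {
  va_list ap;
  va_start(ap, n);
  int total = 0;
  for (int i = 0; i < n; ++i) total += va_arg(ap, int);
  va_end(ap);
  return total;
}

// Hypothetical body: averages n variadic long arguments.
int avg_longs(int n, ...) {
  va_list ap;
  va_start(ap, n);
  long total = 0;
  for (int i = 0; i < n; ++i) total += va_arg(ap, long);
  va_end(ap);
  return n > 0 ? static_cast<int>(total / n) : 0;
}

using variadic_fn = int (*)(int, ...);

// Both functions have the same static type, although they interpret
// their variadic arguments differently.
static_assert(std::is_same<decltype(&sum_ints), decltype(&avg_longs)>::value,
              "identical prototypes");

// Call sites that pass int and long arguments respectively.
int call_sum(variadic_fn f) { return f(3, 1, 2, 3); }
int call_avg(variadic_fn f) { return f(2, 10L, 20L); }
```

Since the static_assert holds, any check keyed on the pointer type alone admits both targets at either call site; only a check at va_arg time sees that the call site in call_sum supplied int values.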

We compiled six versions of the test program, instrumenting them with, respectively, HexVASAN, LLVM 3.9 Forward-Edge CFI [59], Per-Input CFI [44], CCFI [35], GCC 6.2’s VTV [59] and Visual C++ Control Flow Guard [37]. In each version, we first built an attack involving a variadic function, by overriding the indirect call sites with a call to each of the variadic functions described above. We then also tested overwriting the arguments of the sum_ints function, without overwriting the indirect call target. Table 1 shows the detection results.

LLVM Forward-Edge CFI allows calls to avg_longs and avg_doubles from the sum_ints indirect call site because these functions have the same static type signature as the intended call target. This implementation of CFI does not allow calls to variadic functions from non-variadic call sites, however.

CCFI only detects calls to print_doubles, a function that is not address-taken and has a different non-variadic prototype than square, from the square call site. It allows all of the other illegal calls.

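| Intended target | Actual target: Prototype | Actual target: A.T.? | LLVM-CFI | pi-CFI | CCFI | VTV | CFG | HexVASAN |
|---|---|---|---|---|---|---|---|---|
| Variadic | Same | Yes | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Variadic | Same | No | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ |
| Variadic | Different | Yes | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ |
| Variadic | Different | No | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ |
| Non-variadic | Same | Yes | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ |
| Non-variadic | Same | No | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ |
| Non-variadic | Different | Yes | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ |
| Non-variadic | Different | No | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ |
| Original | Overwritten arguments | | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |

Table 1: Detection coverage for several types of illegal calls to variadic functions. ✓ indicates detection, ✗ indicates non-detection. “A.T.” stands for address taken.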

GCC VTV and Visual C++ CFG allow all of the illegal calls, even if the non-variadic type signature does not match that of the intended call target.

pi-CFI allows calls to the avg_longs function from the sum_ints indirect call site. avg_longs is address-taken and it has the same static type signature as the intended call target. pi-CFI does not allow illegal calls to non-address-taken functions or functions with different static type signatures. pi-CFI also does not allow calls to variadic functions from non-variadic call sites.

All implementations of CFI allow direct overwrites of variadic arguments, as long as the original control flow of the program is not violated.

6.2 Exploit Detection

To evaluate the effectiveness of our tool as a real-world exploit detector, we built a HexVASAN-hardened version of sudo 1.8.3. sudo allows authorized users to execute shell commands as another user, often one with a high privilege level on the system. If compromised, sudo can escalate the privileges of non-authorized users, making it a popular target for attackers. Versions 1.8.0 through 1.8.3p1 of sudo contained a format string vulnerability (CVE-2012-0809) that allowed exactly such a compromise. This vulnerability could be exploited by passing a format string as the first argument (argv[0]) of the sudo program. One such exploit was shown to bypass ASLR, DEP, and glibc’s FORTIFY_SOURCE protection [20]. In addition, we were able to verify that GCC 5.4.0 and clang 3.8.0 fail to catch this exploit, even when annotating the vulnerable function with the format function attribute [5] and setting the compiler’s format string checking (-Wformat) to the highest level.

Although it is sudo itself that calls the format string function (fprintf), HexVASAN can only detect the violation on the callee side. We therefore had to build hardened versions of not just the sudo binary itself, but also the C library. We chose to do this on the FreeBSD platform, as its standard C library can be easily built using LLVM, and HexVASAN therefore readily fits into the FreeBSD build process. As expected, HexVASAN does detect any exploit that triggers the vulnerability, producing the error message shown in Listing 4.

```
$ ln -s /usr/bin/sudo %x%x%x%x
$ ./%x%x%x%x -D9 -A
--------------------------
Error: Index greater than Argument Count
Index is 1
Backtrace:
[0] 0x4053bf <__vasan_backtrace+0x1f> at sudo
[1] 0x405094 <__vasan_check_index+0xf4> at sudo
[2] 0x8015dce24 <__vfprintf+0x2174> at libc.so
[3] 0x8015dac52 <vfprintf_l+0x212> at libc.so
[4] 0x8015daab3 <vfprintf_l+0x73> at libc.so
[5] 0x40bdaf <sudo_debug+0xdf> at sudo
[6] 0x40ada3 <main+0x6c3> at sudo
[7] 0x40494f <_start+0x17f> at sudo
```

Listing 4: Exploit detection in sudo.
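The class of bug behind such an exploit can be illustrated generically. The sketch below is not sudo's code; format_program_name is a hypothetical helper that shows the safe pattern, in which attacker-influenced text is consumed by a %s conversion and never serves as the format string itself.

```cpp
#include <cstdio>
#include <string>

// Formats attacker-influenced text as data, never as the format string.
std::string format_program_name(const char *name) {
  char buf[128];
  // Safe: the format string is a literal and `name` is consumed by "%s".
  std::snprintf(buf, sizeof buf, "%s: settings", name);
  // Unsafe pattern (deliberately not executed here):
  //   std::snprintf(buf, sizeof buf, name);
  // Each "%x" in `name` would then read a variadic argument that the
  // caller never passed.
  return std::string(buf);
}
```

With the safe pattern, a program name such as %x%x%x%x is copied verbatim. With the unsafe pattern, the callee would request four arguments from an empty list, which corresponds to the "Index greater than Argument Count" report in Listing 4.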

6.3 Prevalence of variadic functions

To collect variadic function usage in real software, we extended our instrumentation mechanism to collect statistics about variadic functions and their calls. As shown in Table 2, for each program, we collect:

Call sites. The number of function calls targeting variadic functions. We report the total number and how many of them are indirect, since they are of particular interest for an attack scenario where the adversary can override a function pointer.

Variadic functions. The number of variadic functions. We report their total number and how many of them have their address taken, since CFI mechanisms cannot prevent functions with their address taken from being reachable from indirect call sites.

Variadic prototypes. The number of distinct variadic function prototypes in the program.

Functions-per-prototype. The average number of variadic functions sharing the same prototype. This measures how many targets are available, on average, for each indirect call site targeting a specific prototype. In practice, this is the average number of permitted destinations for an indirect call site in the case of a perfect CFI implementation. We report this value both considering all the variadic functions and only those whose address is taken.

| Program | Call sites: Tot. | Call sites: Ind. | Call sites: % | Func.: Tot. | Func.: A.T. | Proto. | Ratio: Tot. | Ratio: A.T. |
|---|---|---|---|---|---|---|---|---|
| Firefox | 30225 | 1664 | 5.5 | 421 | 18 | 241 | 1.75 | 0.07 |
| Chromium | 83792 | 1728 | 2.1 | 794 | 44 | 396 | 2.01 | 0.11 |
| FreeBSD | 189908 | 7508 | 3.9 | 1368 | 197 | 367 | 3.73 | 0.53 |
| Apache | 7121 | 0 | 0 | 94 | 29 | 41 | 2.29 | 0.71 |
| CPython | 4183 | 0 | 0 | 382 | 0 | 38 | 10.05 | 0.00 |
| nginx | 1085 | 0 | 0 | 26 | 0 | 14 | 1.86 | 0.00 |
| OpenSSL | 4072 | 1 | 0.02 | 23 | 0 | 15 | 1.53 | 0.00 |
| Wireshark | 37717 | 0 | 0 | 469 | 1 | 110 | 4.26 | 0.01 |
| perlbench | 1460 | 1 | 0.07 | 60 | 2 | 18 | 3.33 | 0.11 |
| bzip2 | 85 | 0 | 0 | 3 | 0 | 3 | 1.00 | 0.00 |
| gcc | 3615 | 55 | 1.5 | 125 | 0 | 31 | 4.03 | 0.00 |
| mcf | 29 | 0 | 0 | 3 | 0 | 3 | 1.00 | 0.00 |
| milc | 424 | 0 | 0 | 21 | 0 | 8 | 2.63 | 0.00 |
| namd | 485 | 0 | 0 | 24 | 2 | 8 | 3.00 | 0.25 |
| gobmk | 2911 | 0 | 0 | 35 | 0 | 8 | 4.38 | 0.00 |
| soplex | 6 | 0 | 0 | 2 | 1 | 2 | 1.00 | 0.50 |
| povray | 1042 | 40 | 3.8 | 45 | 10 | 16 | 2.81 | 0.63 |
| hmmer | 671 | 7 | 1 | 9 | 1 | 5 | 1.80 | 0.20 |
| sjeng | 253 | 0 | 0 | 4 | 0 | 3 | 1.33 | 0.00 |
| libquantum | 74 | 0 | 0 | 91 | 0 | 7 | 13.00 | 0.00 |
| h264ref | 432 | 0 | 0 | 85 | 5 | 13 | 6.54 | 0.38 |
| lbm | 11 | 0 | 0 | 3 | 0 | 2 | 1.50 | 0.00 |
| omnetpp | 340 | 0 | 0 | 48 | 23 | 19 | 2.53 | 1.21 |
| astar | 42 | 0 | 0 | 4 | 1 | 4 | 1.00 | 0.25 |
| sphinx3 | 731 | 0 | 0 | 20 | 0 | 5 | 4.00 | 0.00 |
| xalancbmk | 19 | 0 | 0 | 4 | 2 | 4 | 1.00 | 0.50 |

Table 2: Statistics of variadic functions for different benchmarks. The second and third columns are variadic call sites broken into “Tot.” (total) and “Ind.” (indirect); % shows the percentage of variadic call sites that are indirect. The fifth and sixth columns are for variadic functions. “A.T.” stands for address taken. “Proto.” is the number of distinct variadic function prototypes. “Ratio” indicates the functions-per-prototype ratio for variadic functions.
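As a worked instance of the functions-per-prototype ratio, the Firefox row of Table 2 (421 variadic functions, 18 of them address-taken, 241 distinct prototypes) gives:

```latex
\[
\text{Ratio}_{\text{Tot.}} = \frac{421}{241} \approx 1.75,
\qquad
\text{Ratio}_{\text{A.T.}} = \frac{18}{241} \approx 0.07
\]
```

Both values match the last two columns reported for Firefox.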

Interestingly, each benchmark we analyzed contains calls to variadic functions and several programs (Firefox, OpenSSL, perlbench, gcc, povray, and hmmer) even contain indirect calls to variadic functions. In addition to calling variadic functions, each benchmark also defines numerous variadic functions (421 for Firefox, 794 for Chromium, 1368 for FreeBSD, 469 for Wireshark, and 382 for CPython). Variadic functions are therefore prevalent and used ubiquitously in software. Adversaries have plenty of opportunities to modify these calls and to attack the implicit contract between caller and callee. The compiler is unable to enforce any static safety guarantees when calling these functions, neither for the number of arguments nor for their types. In addition, many of the benchmarks have variadic functions that are called indirectly, often with their address being taken. Looking at Firefox, a large piece of software, the numbers are even more staggering with 1664 indirect call sites that target variadic functions and 241 different variadic prototypes.

The prevalence of variadic functions leaves a large attack surface for attackers to either redirect variadic calls to alternate locations (even if defense mechanisms like CFI are present) or to modify the arguments so that callees misinterpret the supplied arguments (similar to extended format string attacks).

In addition, the compiler has no insight into these functions and cannot statically check if the programmer supplied the correct parameters. Our sanitizer identified three interesting cases in omnetpp, one of the SPEC CPU2006 benchmarks that implements a discrete event simulator. The benchmark calls a variadic function with a mismatched type, where it expects a char * but receives a NULL, which has type void *. Listing 5 shows the offending code.

```cpp
static sEnumBuilder _EtherMessageKind("EtherMessageKind",
    JAM_SIGNAL, "JAM_SIGNAL",
    ETH_FRAME, "ETH_FRAME",
    ETH_PAUSE, "ETH_PAUSE",
    ETHCTRL_DATA, "ETHCTRL_DATA",
    ETHCTRL_REGISTER_DSAP,
        "ETHCTRL_REGISTER_DSAP",
    ETHCTRL_DEREGISTER_DSAP,
        "ETHCTRL_DEREGISTER_DSAP",
    ETHCTRL_SENDPAUSE, "ETHCTRL_SENDPAUSE",
    0, NULL
);
```

Listing 5: Variadic violation in omnetpp.

We also identified a bug in SPEC CPU2006’s perlbench. This benchmark passes the result of a subtraction of two character pointers as an argument to a variadic function. At the call site, this argument is a machine word-sized integer (i.e., a 64-bit integer on our test platform). The callee truncates this argument to a 32-bit integer by calling va_arg(list, int). HexVASAN reports this (likely unintended) truncation as a violation.
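Both violations come down to the callee requesting a different type than the caller supplied. The sketch below shows matching pairs for the two cases; read_distance and count_names are hypothetical functions written for this illustration and do not appear in perlbench or omnetpp.

```cpp
#include <cstdarg>
#include <cstddef>

// Reads one pointer difference using the same type the caller passed.
std::ptrdiff_t read_distance(int count, ...) {
  va_list ap;
  va_start(ap, count);
  // Matches the caller's argument type; va_arg(ap, int) would truncate
  // on platforms where ptrdiff_t is wider than int.
  std::ptrdiff_t d = va_arg(ap, std::ptrdiff_t);
  va_end(ap);
  return d;
}

// Counts strings up to a terminating null pointer, which the callee
// retrieves as const char *.
int count_names(const char *first, ...) {
  va_list ap;
  va_start(ap, first);
  int n = 0;
  for (const char *s = first; s != nullptr; s = va_arg(ap, const char *)) ++n;
  va_end(ap);
  return n;
}
```

A caller would pass the sentinel with the type the callee requests, for example count_names("a", "b", static_cast<const char *>(nullptr)), instead of a bare NULL.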

6.4 Firefox

We evaluate the performance of HexVASAN by instrumenting Firefox (51.0.1) and using three different browser benchmark suites: Octane, JetStream, and Kraken. Table 3 shows the comparison between the HexVASAN-instrumented Firefox and native Firefox. To reduce variance between individual runs, we averaged fifteen runs for each benchmark (after one warmup run). For each run we started Firefox, ran the benchmark, and closed the browser. HexVASAN incurs only 1.08% and 1.01% overhead for Octane and JetStream respectively, and a speedup of around 0.01% for Kraken. These numbers are indistinguishable from measurement noise. Octane [4] and JetStream measure the time a test takes to complete and then assign a score that is inversely proportional to the runtime, whereas Kraken [3] measures the speed of test cases gathered from different real-world applications and libraries.

| Benchmark | Metric | Native | HexVASAN |
|---|---|---|---|
| Octane | AVERAGE | 31241.80 | 30907.73 |
| Octane | STDDEV | 2449.82 | 2442.82 |
| Octane | OVERHEAD | | -1.08% |
| JetStream | AVERAGE | 200.76 | 198.75 |
| JetStream | STDDEV | 0.66 | 1.68 |
| JetStream | OVERHEAD | | -1.01% |
| Kraken | AVERAGE [ms] | 832.48 | 832.41 |
| Kraken | STDDEV [ms] | 7.41 | 12.71 |
| Kraken | OVERHEAD | | 0.01% |

Table 3: Performance overhead on Firefox benchmarks. For Octane and JetStream higher is better, while for Kraken lower is better.

Figure 2: Run-time overhead of HexVASAN in the SPECint CPU2006 benchmarks, compared to baseline LLVM 3.9.1 performance. (Bar chart with the series Native and HexVASAN; the vertical axis spans 0.9 to 1.1.)

6.5 SPEC CPU2006

We measured HexVASAN’s run-time overhead by running the SPEC CPU2006 integer (CINT2006) benchmarks on an Ubuntu 14.04.5 LTS machine with an Intel Xeon E5-2660 CPU and 64 GiB of RAM. We ran each benchmark program on its reference inputs and measured the average run-time over three runs. Figure 2 shows the results of these tests. We compiled each benchmark with a vanilla clang/LLVM 3.9.1 compiler and optimization level -O3 to establish a baseline. We then compiled the benchmarks with our modified clang/LLVM 3.9.1 compiler to generate the HexVASAN results.

The geometric mean overhead in these benchmarks was just 0.45%, indistinguishable from measurement noise. The only individual benchmark result that stands out is that of libquantum. This benchmark program performed 880M variadic function calls in a run of just 433 seconds.
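For reference, a geometric mean overhead of this kind is typically derived from per-benchmark run-time ratios (instrumented time divided by baseline time). The sketch below shows that computation; the function names are assumptions, and the per-benchmark ratios themselves are not listed in this section, so no measured values are reproduced here.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of run-time ratios (instrumented time / baseline time).
// Assumes a non-empty input in which every ratio is positive.
double geometric_mean(const std::vector<double> &ratios) {
  double log_sum = 0.0;
  for (double r : ratios) log_sum += std::log(r);
  return std::exp(log_sum / static_cast<double>(ratios.size()));
}

// Converts a mean ratio into a percentage overhead.
double overhead_percent(double mean_ratio) {
  return (mean_ratio - 1.0) * 100.0;
}
```

Under this convention, a mean ratio of 1.0045 corresponds to the reported 0.45% overhead.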

6.6 Micro-benchmarks

Besides evaluating large benchmarks, we have also measured HexVASAN’s runtime overhead on a set of micro-benchmarks. We have written test cases for variadic functions with different number of arguments, in which we repeatedly invoke the variadic functions. Table 4 shows the comparison between the native and HexVASAN-instrumented micro-benchmarks. Overall, HexVASAN incurs runtime overheads of 4-6x for variadic function calls due to the additional security checks. In real-world programs, however, variadic functions are invoked rarely, so HexVASAN has little impact on the overall runtime performance.

| Variadic function argument count | # calls | Native [µs] | HexVASAN [µs] |
|---|---|---|---|
| 3 | 1 | 0 | 0 |
| 3 | 100 | 2 | 12 |
| 3 | 1000 | 20 | 125 |
| 12 | 1 | 0 | 0 |
| 12 | 100 | 6 | 22 |
| 12 | 1000 | 55 | 198 |

Table 4: Performance overhead in micro-benchmarks.
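A micro-benchmark of this shape can be sketched as follows. This is an illustrative harness, not the one used for Table 4: the names sum_variadic, Timing, and time_calls are assumptions, and the timings it produces depend on the machine and compiler flags. The checksum exists so that the calls have an observable result.

```cpp
#include <chrono>
#include <cstdarg>

// Variadic function under test: sums n int arguments.
int sum_variadic(int n, ...) {
  va_list ap;
  va_start(ap, n);
  int total = 0;
  for (int i = 0; i < n; ++i) total += va_arg(ap, int);
  va_end(ap);
  return total;
}

struct Timing {
  long long microseconds;  // elapsed wall-clock time
  long long checksum;      // sum of all results
};

// Invokes the three-argument variant `calls` times and measures the time.
Timing time_calls(int calls) {
  using clock = std::chrono::steady_clock;
  long long checksum = 0;
  const auto start = clock::now();
  for (int i = 0; i < calls; ++i) checksum += sum_variadic(3, i, 1, 2);
  const auto stop = clock::now();
  const auto us =
      std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
  return Timing{static_cast<long long>(us.count()), checksum};
}
```

Building the same harness once with and once without the sanitizer and comparing the microseconds field would give one row of a table like Table 4.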

7 Related work

HexVASAN can either be used as an always-on runtime monitor to mitigate exploits or as a sanitizer to detect bugs, sharing similarities with the sanitizers that exist primarily in the LLVM compiler. Similar to HexVASAN, these sanitizers embed run-time checks into a program by instrumenting potentially dangerous program instructions.

AddressSanitizer [54] (ASan) instruments memory accesses and allocation sites to detect spatial memory errors, such as out-of-bounds accesses, as well as temporal memory errors, such as use-after-free bugs. Undefined Behavior Sanitizer [52] (UBSan) instruments various types of instructions to detect operations whose semantics are not strictly defined by the C and C++ standards, e.g., increments that cause signed integers to overflow, or null-pointer dereferences. Thread Sanitizer [55] (TSAN) instruments memory accesses and atomic operations to detect data races, deadlocks, and various misuses of synchronization primitives. Memory Sanitizer [58] (MSAN) detects uses of uninitialized memory.

CaVer [32] is a sanitizer targeted at verifying correctness of downcasts in C++. Downcasting converts a base class pointer to a derived class pointer. This operation may be unsafe as it cannot be statically determined, in general, if the pointed-to object is of the derived class type. TypeSan [25] is a refinement of CaVer that reduces overhead and improves the sanitizer coverage.

UniSan [34] sanitizes information leaks from the kernel. It ensures that data is initialized before leaving the kernel, preventing reads of uninitialized memory.

All of these sanitizers are highly effective at finding specific types of bugs, but, unlike HexVASAN, they do not address misuses of variadic functions. The aforementioned sanitizers also differ from HexVASAN in that they typically incur significant run-time and memory overhead.

Different control-flow hijacking mitigations offer partial protection against variadic function attacks by preventing adversaries from calling variadic functions through control-flow edges that do not appear in legitimate executions of the program. Among these mitigations, we find Code Pointer Integrity (CPI) [30], a mitigation that prevents attackers from overwriting code pointers in the program, and various implementations of Control-Flow Integrity (CFI), a technique that does not prevent code pointer overwrites, but rather verifies the integrity of control-flow transfers in the program [6, 7, 11, 14–16, 21, 22, 28, 35, 37, 38, 41–44, 46, 49–51, 59, 61–66].

Control-flow hijacking mitigations cannot prevent attackers from overwriting variadic arguments directly. At best, they can prevent variadic functions from being called through control-flow edges that do not appear in legitimate executions of the program. We therefore argue that HexVASAN and these mitigations are orthogonal. Moreover, prior research has shown that many of the aforementioned implementations fail to fully prevent control-flow hijacking as they are too imprecise [8, 17, 19, 23], too limited in scope [53, 57], vulnerable to information leakage attacks [18], or vulnerable to spraying attacks [24, 45]. We further showed in Section 6.1 that variadic functions exacerbate CFI’s imprecision problems, allowing additional leeway for adversaries to attack variadic functions.

Defenses that protect against direct overwrites or misuse of variadic arguments have thus far only focused on format string attacks, which are a subset of the possible attacks on variadic functions. LibSafe detects potentially dangerous calls to known format string functions such as printf and sprintf [60]. A call is considered dangerous if a %n specifier is used to overwrite the frame pointer or return address, or if the argument list for the printf function is not contained within a single stack frame. FormatGuard [12] instruments calls to printf and checks if the number of arguments passed to printf matches the number of format specifiers used in the format string.
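The argument-count comparison attributed to FormatGuard above can be illustrated with a small specifier counter. This is a simplified sketch under stated assumptions, not FormatGuard's implementation: count_specifiers and argument_count_matches are hypothetical names, and the counter ignores '*' width and precision fields, which consume additional arguments.

```cpp
#include <cstddef>

// Counts conversion specifications in a printf-style format string.
// "%%" is a literal percent sign and consumes no argument.
std::size_t count_specifiers(const char *fmt) {
  std::size_t n = 0;
  for (std::size_t i = 0; fmt[i] != '\0'; ++i) {
    if (fmt[i] != '%') continue;
    if (fmt[i + 1] == '%') { ++i; continue; }  // escaped percent sign
    if (fmt[i + 1] == '\0') break;             // trailing lone '%'
    ++n;
  }
  return n;
}

// True if the number of supplied variadic arguments matches the format.
bool argument_count_matches(const char *fmt, std::size_t supplied) {
  return count_specifiers(fmt) == supplied;
}
```

A count-only check of this kind accepts a call whose arguments have the wrong types, which is the gap that per-argument type checking addresses.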

Shankar et al. proposed to use static taint analysis to detect calls to format string functions where the format string originates from an untrustworthy source [56]. This approach was later refined by Chen and Wagner [10] and used to analyze thousands of packages in the Debian 3.1 Linux distribution. TaintCheck [39] also detects untrustworthy format strings, but relies on dynamic taint analysis to do so.

FORTIFY_SOURCE of glibc provides some lightweight checks to ensure all the arguments are consumed. However, it can be bypassed [2] and does not check for type mismatches. Hence, none of the aforementioned solutions provide comprehensive protection against variadic argument overwrites or misuse.

8 Conclusions

Variadic functions introduce an implicitly defined contract between the caller and callee. When the programmer fails to enforce this contract correctly, the violation leads to runtime crashes or opens up a vulnerability to an attacker. Current tools, including static type checkers and CFI implementations, do not find variadic function type errors or prevent attackers from exploiting calls to variadic functions. Unfortunately, variadic functions are prevalent. Programs such as SPEC CPU2006, Firefox, Apache, CPython, nginx, Wireshark, and libraries frequently leverage variadic functions to offer flexibility and call these functions abundantly.

We have designed a sanitizer, HexVASAN, that addresses this attack vector. HexVASAN is a lightweight runtime monitor that detects bugs in variadic functions and prevents the bugs from being exploited. It imposes negligible overhead (0.45%) on the SPEC CPU2006 benchmarks and is effective at detecting type violations when calling variadic functions.

9 Acknowledgments

We thank the anonymous reviewers for their insightful comments. We also thank our shepherd Adam Doupé for his informative feedback. This material is based in part upon work supported by the National Science Foundation under awards CNS-1513783, CNS-1657711, and CNS-1619211, by the Defense Advanced Research Projects Agency (DARPA) under contracts FA8750-15-C-0124 and FA8750-15-C-0085, and by Intel Corporation. We also gratefully acknowledge a gift from Oracle Corporation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, the Defense Advanced Research Projects Agency (DARPA) and its Contracting Agents, or any other agency of the U.S. Government.

References

[1] http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-8617.

[2] A eulogy for format strings. http://phrack.org/issues/67/9.html.

[3] Kraken benchmark. https://wiki.mozilla.org/Kraken.

[4] Octane benchmark. https://developers.google.com/octane/faq.

[5] Using the gnu compiler collection (gcc) - function attributes. https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Function-Attributes.html.

[6] ABADI, M., BUDIU, M., ERLINGSSON, U., AND LIGATTI, J. Control-flow integrity. In ACM Conference on Computer and Communications Security (CCS) (2005).

[7] BOUNOV, D., KICI, R., AND LERNER, S. Protecting C++ dynamic dispatch through vtable interleaving. In Symposium on Network and Distributed System Security (NDSS) (2016).

[8] CARLINI, N., BARRESI, A., PAYER, M., WAGNER, D., AND GROSS, T. R. Control-flow bending: On the effectiveness of control-flow integrity. In USENIX Security Symposium (2015).

[9] CASTRO, M., COSTA, M., MARTIN, J.-P., PEINADO, M., AKRITIDIS, P., DONNELLY, A., BARHAM, P., AND BLACK, R. Fast byte-granularity software fault isolation. In ACM Symposium on Operating Systems Principles (SOSP) (2009).

[10] CHEN, K., AND WAGNER, D. Large-scale analysis of format string vulnerabilities in debian linux. In Proceedings of the 2007 workshop on Programming languages and analysis for security (2007).

[11] CHENG, Y., ZHOU, Z., MIAO, Y., DING, X., AND DENG, R. H. ROPecker: A generic and practical approach for defending against ROP attacks. In Symposium on Network and Distributed System Security (NDSS) (2014).

[12] COWAN, C., BARRINGER, M., BEATTIE, S., KROAH-HARTMAN, G., FRANTZEN, M., AND LOKIER, J. Formatguard: Automatic protection from printf format string vulnerabilities. In USENIX Security Symposium (2001).

[13] COWAN, C., PU, C., MAIER, D., WALPOLE, J., BAKKE, P., BEATTIE, S., GRIER, A., WAGLE, P., ZHANG, Q., AND HINTON, H. Stackguard: Automatic adaptive detection and prevention of buffer-overflow attacks. In USENIX Security Symposium (1998).

[14] CRISWELL, J., DAUTENHAHN, N., AND ADVE, V. KCoFI: Complete control-flow integrity for commodity operating system kernels. In IEEE Symposium on Security and Privacy (S&P) (2014).

[15] DAVI, L., DMITRIENKO, A., EGELE, M., FISCHER, T., HOLZ, T., HUND, R., NURNBERGER, S., AND SADEGHI, A.-R. MoCFI: A framework to mitigate control-flow attacks on smartphones. In Symposium on Network and Distributed System Security (NDSS) (2012).

[16] DAVI, L., KOEBERL, P., AND SADEGHI, A.-R. Hardware-assisted fine-grained control-flow integrity: Towards efficient protection of embedded systems against software exploitation. In Annual Design Automation Conference (DAC) (2014).

[17] DAVI, L., SADEGHI, A.-R., LEHMANN, D., AND MONROSE, F. Stitching the gadgets: On the ineffectiveness of coarse-grained control-flow integrity protection. In USENIX Security Symposium (2014).

[18] EVANS, I., FINGERET, S., GONZALEZ, J., OTGONBAATAR, U., TANG, T., SHROBE, H., SIDIROGLOU-DOUSKOS, S., RINARD, M., AND OKHRAVI, H. Missing the point (er): On the effectiveness of code pointer integrity. In IEEE Symposium on Security and Privacy (S&P) (2015).

[19] EVANS, I., LONG, F., OTGONBAATAR, U., SHROBE, H., RINARD, M., OKHRAVI, H., AND SIDIROGLOU-DOUSKOS, S. Control jujutsu: On the weaknesses of fine-grained control flow integrity. In ACM Conference on Computer and Communications Security (CCS) (2015).

[20] EXPLOIT DATABASE. sudo debug privilege escalation. https://www.exploit-db.com/exploits/25134/, 2013.

[21] GAWLIK, R., AND HOLZ, T. Towards Automated Integrity Protection of C++ Virtual Function Tables in Binary Programs. In Annual Computer Security Applications Conference (ACSAC) (2014).

[22] GE, X., TALELE, N., PAYER, M., AND JAEGER, T. Fine-Grained Control-Flow Integrity for Kernel Software. In IEEE European Symp. on Security and Privacy (2016).


[23] GOKTAS, E., ATHANASOPOULOS, E., BOS, H., AND PORTOKALIDIS, G. Out of control: Overcoming control-flow integrity. In IEEE Symposium on Security and Privacy (S&P) (2014).

[24] GOKTAS, E., GAWLIK, R., KOLLENDA, B., ATHANASOPOULOS, E., PORTOKALIDIS, G., GIUFFRIDA, C., AND BOS, H. Undermining information hiding (and what to do about it). In USENIX Security Symposium (2016).

[25] HALLER, I., JEON, Y., PENG, H., PAYER, M., GIUFFRIDA, C., BOS, H., AND VAN DER KOUWE, E. Typesan: Practical type confusion detection. In ACM Conference on Computer and Communications Security (CCS) (2016).

[26] Information technology – Programming languages – C++. Standard, International Organization for Standardization, Geneva, CH, Dec. 2014.

[27] Information technology – Programming languages – C. Standard, International Organization for Standardization, Geneva, CH, Dec. 2011.

[28] JANG, D., TATLOCK, Z., AND LERNER, S. SAFEDISPATCH: Securing C++ virtual calls from memory corruption attacks. In Symposium on Network and Distributed System Security (NDSS) (2014).

[29] JELINEK, J. FORTIFY_SOURCE. https://gcc.gnu.org/ml/gcc-patches/2004-09/msg02055.html, 2004.

[30] KUZNETSOV, V., SZEKERES, L., PAYER, M., CANDEA, G., SEKAR, R., AND SONG, D. Code-pointer integrity. In USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2014).

[31] LATTNER, C., AND ADVE, V. Llvm: A compilation framework for lifelong program analysis & transformation. In IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (2004).

[32] LEE, B., SONG, C., KIM, T., AND LEE, W. Type casting verification: Stopping an emerging attack vector. In USENIX Security Symposium (2015).

[33] LINUX PROGRAMMER’S MANUAL. va_start (3) - Linux Manual Page.

[34] LU, K., SONG, C., KIM, T., AND LEE, W. Unisan: Proactive kernel memory initialization to eliminate data leakages. In ACM Conference on Computer and Communications Security (CCS) (2016).

[35] MASHTIZADEH, A. J., BITTAU, A., BONEH, D., AND MAZIERES, D. Ccfi: cryptographically enforced control flow integrity. In ACM Conference on Computer and Communications Security (CCS) (2015).

[36] MATZ, M., HUBICKA, J., JAEGER, A., AND MITCHELL, M. System v application binary interface. AMD64 Architecture Processor Supplement, Draft v0.99 (2013).

[37] MICROSOFT CORPORATION. Control Flow Guard (Windows). https://msdn.microsoft.com/en-us/library/windows/desktop/mt637065(v=vs.85).aspx, 2016.

[38] MOHAN, V., LARSEN, P., BRUNTHALER, S., HAMLEN, K., AND FRANZ, M. Opaque control-flow integrity. In Symposium on Network and Distributed System Security (NDSS) (2015).

[39] NEWSOME, J., AND SONG, D. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In Symposium on Network and Distributed System Security (NDSS) (2005).

[40] NISSIL, R. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2009-1886.

[41] NIU, B., AND TAN, G. Monitor integrity protection with space efficiency and separate compilation. In ACM Conference on Computer and Communications Security (CCS) (2013).

[42] NIU, B., AND TAN, G. Modular control-flow integrity. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (2014).

[43] NIU, B., AND TAN, G. RockJIT: Securing just-in-time compilation using modular control-flow integrity. In ACM Conference on Computer and Communications Security (CCS) (2014).

[44] NIU, B., AND TAN, G. Per-input control-flow integrity. In ACM Conference on Computer and Communications Security (CCS) (2015).

[45] OIKONOMOPOULOS, A., ATHANASOPOULOS, E., BOS, H., AND GIUFFRIDA, C. Poking holes in information hiding. In USENIX Security Symposium (2016).

[46] PAPPAS, V., POLYCHRONAKIS, M., AND KEROMYTIS, A. D. Transparent ROP exploit mitigation using indirect branch tracing. In USENIX Security Symposium (2013).

[47] PAX TEAM. Pax address space layout randomization (aslr).

[48] PAX TEAM. PaX non-executable pages design & implementation. http://pax.grsecurity.net/docs/noexec.txt, 2004.

[49] PAYER, M., BARRESI, A., AND GROSS, T. R. Fine-grained control-flow integrity through binary hardening. In Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA) (2015).

[50] PEWNY, J., AND HOLZ, T. Control-flow restrictor: Compiler-based CFI for iOS. In Annual Computer Security Applications Conference (ACSAC) (2013).

[51] PRAKASH, A., HU, X., AND YIN, H. vfGuard: Strict Protection for Virtual Function Calls in COTS C++ Binaries. In Symposium on Network and Distributed System Security (NDSS) (2015).

[52] PROJECT, G. C. Undefined behavior sanitizer. https://www.chromium.org/developers/testing/undefinedbehaviorsanitizer.

[53] SCHUSTER, F., TENDYCK, T., LIEBCHEN, C., DAVI, L., SADEGHI, A.-R., AND HOLZ, T. Counterfeit object-oriented programming: On the difficulty of preventing code reuse attacks in c++ applications. In IEEE Symposium on Security and Privacy (S&P) (2015).

[54] SEREBRYANY, K., BRUENING, D., POTAPENKO, A., AND VYUKOV, D. Addresssanitizer: a fast address sanity checker. In USENIX Annual Technical Conference (2012).

[55] SEREBRYANY, K., AND ISKHODZHANOV, T. Threadsanitizer: Data race detection in practice. In Workshop on Binary Instrumentation and Applications (2009).

[56] SHANKAR, U., TALWAR, K., FOSTER, J. S., AND WAGNER, D. Detecting format string vulnerabilities with type qualifiers. In USENIX Security Symposium (2001).

[57] SNOW, K. Z., MONROSE, F., DAVI, L., DMITRIENKO, A., LIEBCHEN, C., AND SADEGHI, A. Just-in-time code reuse: On the effectiveness of fine-grained address space layout randomization. In IEEE Symposium on Security and Privacy (S&P) (2013).

[58] STEPANOV, E., AND SEREBRYANY, K. Memorysanitizer: Fast detector of uninitialized memory use in c++. In IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (2015).

[59] TICE, C., ROEDER, T., COLLINGBOURNE, P., CHECKOWAY, S., ERLINGSSON, U., LOZANO, L., AND PIKE, G. Enforcing forward-edge control-flow integrity in gcc & llvm. In USENIX Security Symposium (2014).

[60] TSAI, T., AND SINGH, N. Libsafe 2.0: Detection of format string vulnerability exploits. white paper, Avaya Labs (2001).

[61] VAN DER VEEN, V., ANDRIESSE, D., GOKTAS, E., GRAS, B., SAMBUC, L., SLOWINSKA, A., BOS, H., AND GIUFFRIDA, C. PathArmor: Practical ROP protection using context-sensitive CFI. In ACM Conference on Computer and Communications Security (CCS) (2015).

[62] WANG, Z., AND JIANG, X. Hypersafe: A lightweight approach to provide lifetime hypervisor control-flow integrity. In IEEE Symposium on Security and Privacy (S&P) (2010).

[63] YUAN, P., ZENG, Q., AND DING, X. Hardware-assisted fine-grained code-reuse attack detection. In International Symposium on Research in Attacks, Intrusions and Defenses (RAID) (2015).

[64] ZHANG, C., SONG, C., CHEN, K. Z., CHEN, Z., AND SONG, D. VTint: Defending virtual function tables’ integrity. In Symposium on Network and Distributed System Security (NDSS) (2015).

[65] ZHANG, C., WEI, T., CHEN, Z., DUAN, L., SZEKERES, L., MCCAMANT, S., SONG, D., AND ZOU, W. Practical control flow integrity and randomization for binary executables. In IEEE Symposium on Security and Privacy (S&P) (2013).

[66] ZHANG, M., AND SEKAR, R. Control flow integrity for cots binaries. In USENIX Security Symposium (2013).
