AUTOMATICALLY ELIMINATING SPECULATIVE LEAKS WITH BLADE

Abstract

We introduce BLADE, a new approach to automatically and efficiently synthesizing provably correct repairs for transient execution vulnerabilities like Spectre. BLADE is built on the insight that to stop speculative execution attacks, it suffices to cut the dataflow from expressions that speculatively introduce secrets (sources) to those that leak them through the cache (sinks), rather than prohibiting speculation altogether. We formalize this insight in a static type system that (1) types each expression as either transient, i.e., possibly containing speculative secrets, or stable, and (2) prohibits speculative leaks by requiring that all sink expressions are stable. We introduce protect, a new abstract primitive for fine-grained speculation control that can be implemented via existing architectural mechanisms, and show how our type system can automatically synthesize a minimal number of protect calls needed to ensure the program is secure. We evaluate BLADE by using it to repair several verified, yet vulnerable WebAssembly implementations of cryptographic primitives. BLADE can fix existing programs that leak via speculation automatically, without user intervention, and efficiently, using two orders of magnitude fewer fences than would be added by existing compilers, thereby ensuring security with minimal performance overhead.

1 Introduction

Implementing secure cryptographic algorithms is hard. The code must not only be functionally correct and memory safe, it must also avoid divulging secrets indirectly through side channels like control-flow, memory-access patterns, or execution time. Consequently, much recent work focuses on how to ensure implementations do not leak secrets, e.g., via type systems [12, 39], verification [4], and program transformations [6].

Unfortunately, these efforts are foiled by speculative execution. Even if secrets are closely controlled via guards and access checks, the processor can simply ignore those checks when executing speculatively. An attacker can exploit this to leak secrets in turn.

In principle, memory fences block speculation, and hence, offer a way to recover the original security guarantees. In practice, however, fences pose a confounding dilemma. Programmers can rely on heuristic approaches for inserting fences [37], but then forgo guarantees about the absence of side channels. Alternatively, they can recover security guarantees by conservatively inserting fences after every load, but endure the huge performance costs.

In this paper, we introduce BLADE, a new approach to automatically, provably, and efficiently eliminating speculation-based leakage. BLADE is based on the key insight that to prevent leaking data via speculative execution, it is unnecessary to stop all speculation as done by traditional memory fences. Instead, it suffices to cut the data flow from expressions (sources) that speculatively introduce secrets to those that leak them through the cache (sinks). We develop this insight into an automatic enforcement algorithm via four contributions.

1. A Semantics for Speculation. Our first contribution is a formal operational semantics for a simple While language that precisely captures how speculation can occur and what an attacker can observe via speculation (§ 3). To prevent leakage, we propose and formalize the semantics of an abstract primitive called protect that does not exist in today's hardware but captures the essence of several primitives proposed in recent work [2, 32]. Furthermore, this primitive can be implemented in software, e.g., via speculative load hardening [30]. Crucially, and in contrast to a regular fence which stops all speculation, protect only stops speculation for a given variable. For example, x := protect(e) ensures that e's value is only assigned to x after e has been assigned its stable, non-speculative value.

2. A Type System for Speculation. Our second contribution is an approach to conservatively approximating the dynamic semantics of speculation via a static type system that types each expression as either transient (T), i.e., expressions that may contain speculative secrets, or stable (S), i.e., those that cannot (§ 4.1). Our system prohibits speculative leaks by requiring that all sink expressions that can influence intrinsic attacker-visible behavior (e.g., cache addresses) are typed as stable. We connect the static and dynamic semantics by proving that well-typed programs are indeed secure, i.e., satisfy a correctness condition called speculative non-interference [17], which states that the program does not leak under speculative execution more than it would under sequential execution.

3. Automatic Protection. Existing programs that are free of protect statements are likely insecure under speculation and will be rejected by our type system. Thus, our third contribution is an algorithm that automatically synthesizes a minimal number of protect statements to ensure that the program satisfies speculative non-interference. To this end, we extend the type checker to construct a def-use graph that captures the data-flow between program expressions. A cut-set in the graph is a set of variables whose removal eliminates all paths from secret-sources to observable-sinks. We show that inserting a protect statement for each variable in a cut-set suffices to yield a program that is well-typed, and hence, secure with respect to speculation (§ 5.3). Happily, finding such cuts is an instance of the classic max-flow/min-cut problem, so existing polynomial time algorithms let us efficiently

Source: goto.ucsd.edu/~gleissen/papers/BLADE.pdf

Jun 26, 2020



Conference’17, July 2017, Washington, DC, USA

1  void SHA2_update_last(int *input_len, ...)
2  {
3    if (! valid(input_len)) { ... }
4    int len = *input_len;
5    int *dst3 = ... + len;
6    _mm_lfence();                         // naive patch (red)
7    int *dst3_safe = protect(.. + len);   // BLADE patch (green)
8    ...
9    *dst3_safe = pad;
10   ...
11 }

Figure 1. Code fragment from the HACL* SHA2 implementation, containing a potential speculative execution vulnerability that leaks explicitly through the cache by writing memory at a secret-tainted address (line 9). A naive patch is shown in red; the patch computed by BLADE is shown in green.

synthesize protect statements that resolve the dilemma of enforcing security with minimal performance overhead.

4. Evaluation. Our final contribution is an implementation of our method in a tool called BLADE, and an evaluation using BLADE to repair verified yet vulnerable (to transient execution attacks) programs: the WebAssembly implementations of the Signal messaging protocol and its respective cryptographic libraries [29], and a number of verified cryptographic algorithms from [38] (§ 6). Our evaluation shows that BLADE can automatically compute fixes for existing programs. Compared to the fully automatic protection implemented in existing compilers (notably Clang), BLADE inserts two orders of magnitude fewer fences and thus imposes negligible performance overhead.

2 Overview

In this section, we present two potential speculative execution vulnerabilities in HACL*, a verified cryptographic library, that were discovered by BLADE and discuss how BLADE repairs the vulnerabilities by inserting protect statements. We then show how BLADE computes the repairs via our minimal fence inference algorithm and finally how BLADE proves that the repairs are indeed correct, via our transient-flow type system.

2.1 Two Speculation Bugs and Their Fixes

Figure 1 shows a code fragment from a function in the implementation of the SHA2 hash in HACL*. Though BLADE operates on WebAssembly, we present equivalent simplified C code for readability. The function takes as input a pointer input_len, validates the input (line 3), loads from memory the public length of the hash (line 4), calculates a target address dst3 (line 5), and finally pads the buffer pointed to by dst3 (line 9).

1. Leaking Through a Memory Write. During normal, sequential execution this code is not a problem: the function validates the input to prevent classic buffer overflow vulnerabilities. However, an attacker can exploit the function to leak sensitive data during speculation. To do this, the attacker first has to modify the value that the pointer input_len holds during speculation. Since input_len is a function parameter, this can be achieved, e.g., by calling the function repeatedly with legitimate addresses, training the branch predictor to predict the next input to be valid. After (mis)training the branch predictor, the attacker manipulates input_len to point to an address containing secret data (e.g., the secret key used by the hash function) and calls the function again, this time with an invalid pointer. As a result of the mistraining, the branch predictor causes the processor to skip validation and erroneously load the secret into len, which in turn is used to calculate pointer dst3. The buffer pointed to by dst3 is then written in line 9, completing the attack. Even though pointer dst3 is incorrect due to misprediction and the write will therefore be squashed, its side-effects persist and remain visible to the attacker. The attacker can then extract the target address, and thereby the secret, via cache timing measurements [16].

Preventing the Attack: Memory Fences. Since the attack exploits the fact that input validation is speculatively skipped, we can prevent it by making sure that the buffer in line 9 is not written until the input has been validated. To mitigate this class of attacks, Intel [19] and AMD [5] recommend inserting a speculation barrier after critical validation check-points. Following this strategy, we would place a memory fence on line 6. This fence stops all speculative execution past the fence, i.e., no statements after the fence are executed until all previous statements (including input validation) have been resolved. While the effects of the fence prevent the attack, they are more restrictive than necessary and incur a high performance cost [33].

Preventing the Attack Efficiently. We propose an alternative way to stop speculation from reaching the write in line 9 through a new primitive called protect. Rather than eliminating all speculation, protect only stops speculation along a particular data-path. We use protect to patch the program in line 7. Instead of assigning pointer dst3 directly as in line 5, the expression that computes the address is guarded by a protect statement. This ensures that the value assigned to dst3_safe is always guaranteed to use len's final, non-speculative value. Therefore, writing to dst3_safe in line 9 prevents any invalid secret-tainted address from speculatively reaching the store, where it could be leaked to the attacker.

The protect primitive offers an abstract interface for fine-grained control of speculation. There are a number of possible implementations for this interface. For example, protect


1  void SHA2_update_last(int *input_len, ...)
2  {
3    if (! valid(input_len)) { ... }
4    int len = *(input_len);
5    ...
6    int len_safe = protect(*input_len);
7    for ( i = 0; i < len_safe + ...)
8      dst2[i] = 0;
9    ...
10 }

Figure 2. SHA2 code fragment containing a potential speculative execution vulnerability that leaks implicitly through a control-flow dependency.

could be implemented in hardware. Unfortunately, today's hardware does not offer an instruction equivalent to protect, although similar functionality has been proposed in recent work [2, 32]. Alternatively, protect can be implemented in software (a similar proposal has been made in [30]). In general, protect can be implemented through a fence instruction. However, better solutions exist for reading arrays. For example, Speculative Load Hardening (SLH), a mitigation deployed in the code generated by Clang [10], stalls individual array reads until the corresponding bounds-check condition gets resolved. We model software implementations of protect through a restricted primitive called safe_read, which can only be applied to array reads. We then formalize an implementation of safe_read via SLH in the supplementary material, and evaluate the number of protect and safe_read statements needed to patch HACL*, and their overhead, in Section 6.
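To give a feel for the software route, the following C sketch shows a branch-free masked array read. It is index masking, a simpler relative of SLH, and not the SLH transformation itself nor BLADE's implementation. The names index_mask and safe_read_sketch are hypothetical, and the mask formula assumes len is at most SIZE_MAX / 2. Note also that an optimizing compiler may fold the mask away once it has seen the bounds check, so production code typically pins the computation with inline assembly or compiler support.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: all-ones when idx < len, zero otherwise,
 * computed with arithmetic only (no branch on idx).
 * Assumption: len <= SIZE_MAX / 2, so the top bit of len is clear. */
static size_t index_mask(size_t idx, size_t len)
{
    assert(len <= SIZE_MAX / 2);
    /* The top bit of (idx | (len - 1 - idx)) is set exactly when idx >= len. */
    size_t oob = (idx | (len - 1 - idx)) >> (sizeof(size_t) * CHAR_BIT - 1);
    return oob - 1; /* oob == 0 gives all-ones, oob == 1 gives zero */
}

/* Hypothetical masked read: even if the bounds check below is
 * mispredicted, the address actually used is arr[0] rather than an
 * attacker-chosen out-of-bounds location. Returns 0 when idx is out of
 * bounds (the paper's semantics would raise fail instead). */
static int32_t safe_read_sketch(const int32_t *arr, size_t len, size_t idx)
{
    if (idx >= len)
        return 0;
    return arr[idx & index_mask(idx, len)];
}
```

The design choice here is to make the loaded address data-dependent on the bounds check rather than control-dependent, which is the property that lets a single read be hardened without fencing the whole pipeline.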

2. Leaking Through a Control-Flow Dependency. Figure 2 shows a code fragment taken from the same function as in Figure 1. The code contains a second potential vulnerability, but in contrast to Figure 1 the vulnerability leaks secrets implicitly, through a control-flow dependency.

The function reads from memory a (public) integer len (line 4), which determines the number of initialization rounds in the condition of the for-loop (line 7). Like the previous vulnerability, the function is harmless under sequential execution, but leaks under speculation. As before, the attacker manipulates the pointer input_len to point to a secret after mistraining the branch predictor to skip validation. But instead of leaking the secret directly through the data cache, the attacker can leak the value indirectly through a control-flow dependency, e.g., via the instruction cache and non-secret dependent lines of the data cache. In particular, the secret determines how often the initialization loop (line 7) is executed during speculation, and therefore an attacker can make secret-dependent observations via instruction- and data-cache timing attacks. Like the previous vulnerability, this vulnerability can be fixed via the protect primitive, as shown in lines 6 and 7.

2.2 Computing Fixes Via Minimal Fence Inference

BLADE automatically infers the placement of these protect statements. We illustrate this process using a simple running example Ex1 shown in Figure 3. The code reads two values from an array (x := a[i1] and y := a[i2]), adds them (z := x+y), and indexes another array with the result (w := b[z]). We assume that all array operations are implicitly bounds-checked and thus no explicit validation code is needed.

Like the examples above, Ex1 contains a speculative execution vulnerability: the array reads may skip their bounds check and so x and y can contain transient secrets (i.e., secrets introduced by misspeculation). This secret data then flows to z, and finally leaks through the data cache by the array read b[z].

Def-Use Graph. To secure the program, we need to cut the dataflow between the array reads, which could introduce transient secret values into the program, and the index in the array read where they are leaked through the cache. For this, we first build a def-use graph whose nodes and directed edges capture the data dependencies between the expressions and variables of a program. For example, consider the def-use graph of program Ex1 in Figure 4. In the graph, the edge x → x+y indicates that x is used to compute x+y.¹ To track how transient values propagate in the def-use graph, we extend the graph with the special circle node T, which represents the source of transient values of the program. Since reading memory creates transient values, we connect the T node to all nodes containing expressions that explicitly read memory, e.g., T → a[i1]. Following the data dependencies along the edges of the def-use graph, we can see that node T is transitively connected to node z, which indicates that z can contain transient data at run-time. To detect insecure uses of transient values, we then extend the graph with the special circle node S, which represents the sink of stable (i.e., non-transient) values of a program. Intuitively, this node draws all the values of a program that must be stable to avoid transient execution attacks. Therefore, we connect all expressions used as array indices in the program to the S node, e.g., z → S. The fact that the graph in Figure 4 contains a path from T to S indicates that transient data flows through data dependencies into (what should be) a stable index expression and thus the program is insecure.

Cutting the Dataflow. In order to make the program safe, we need to cut the data-flow between T and S by introducing as few protect statements as possible. This problem can be equivalently restated as follows: find a minimal cut-set, i.e., a minimal set of variables, such that removing the variables from the graph eliminates all paths from T to S. Each choice of cut-set defines a way to repair the program: simply add a protect statement for each variable in the set. Figure 4

¹ To avoid ambiguities in the graph, we assume that each variable is assigned at most once, i.e., the code is in static single assignment form.


Original          Patched alternative
x := a[i1]        x := protect(a[i1])    (orange)
y := a[i2]        y := protect(a[i2])    (orange)
z := x+y          z := protect(x+y)      (green)
w := b[z]

Figure 3. Ex1: Running Example. The optimal patch computed by BLADE is shown in green. A sub-optimal patch is shown in orange.

[Figure 4 graph: nodes T, a[i1], a[i2], b[z], x, y, x+y, z, i1, i2, S]

Figure 4. Def-use Graph of Ex1. We omit some irrelevant edges for readability. The figure contains two choices of cut-sets, shown as dashed lines. The left cut requires removing two nodes and thus, inserting two protect statements. The right cut shows a minimal solution, which only requires removing a single node.

contains two choices of cut-sets, shown as dotted lines. The cut-set on the left requires two protect statements, for variables x and y respectively, corresponding to the orange patch in Figure 3. The cut-set on the right is minimal: it requires only a single protect, for variable z, and corresponds to the green patch in Figure 3. In general, a minimal cut-set can be computed as a solution to the Min-Cut/Max-Flow problem, for which efficient polynomial-time algorithms exist [1].
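As an illustration of the min-cut formulation, the C sketch below computes a minimum variable cut for the def-use graph of Ex1. It uses the standard node-splitting reduction (each node becomes an in/out pair, variables get capacity 1, everything else is effectively uncuttable) with a simple augmenting-path max-flow. This is a sketch under our own encoding, not BLADE's implementation: the enum names, the constant INF, and the function min_cut_ex1 are hypothetical, and the edges for i1, i2, and b[z] are left out because they do not lie on any path from T to S.

```c
#include <assert.h>
#include <string.h>

enum { T, A_I1, A_I2, X, Y, X_PLUS_Y, Z, S, NODES };
enum { V = 2 * NODES, INF = 1000 };

static int cap[V][V];  /* residual capacities of the split graph */
static int seen[V];

static int in_of(int n)  { return 2 * n; }
static int out_of(int n) { return 2 * n + 1; }

/* Only variables may be cut, so only they get a unit-capacity split edge. */
static void add_node(int n, int is_variable)
{
    cap[in_of(n)][out_of(n)] = is_variable ? 1 : INF;
}

static void add_edge(int from, int to)
{
    cap[out_of(from)][in_of(to)] = INF;
}

/* Depth-first search for one augmenting path; returns the flow pushed. */
static int push(int u, int sink, int limit)
{
    if (u == sink) return limit;
    seen[u] = 1;
    for (int v = 0; v < V; v++) {
        if (!seen[v] && cap[u][v] > 0) {
            int sent = push(v, sink, cap[u][v] < limit ? cap[u][v] : limit);
            if (sent > 0) { cap[u][v] -= sent; cap[v][u] += sent; return sent; }
        }
    }
    return 0;
}

static void mark_reachable(int u)
{
    seen[u] = 1;
    for (int v = 0; v < V; v++)
        if (!seen[v] && cap[u][v] > 0) mark_reachable(v);
}

/* Returns the size of a minimum cut-set and sets in_cut[n] = 1 for each
 * variable n that should receive a protect statement. */
static int min_cut_ex1(int in_cut[NODES])
{
    memset(cap, 0, sizeof cap);
    for (int n = 0; n < NODES; n++)
        add_node(n, n == X || n == Y || n == Z);
    add_edge(T, A_I1);     add_edge(T, A_I2);      /* memory reads are transient */
    add_edge(A_I1, X);     add_edge(A_I2, Y);
    add_edge(X, X_PLUS_Y); add_edge(Y, X_PLUS_Y);
    add_edge(X_PLUS_Y, Z); add_edge(Z, S);         /* z is used as an array index */

    int flow = 0, sent;
    do {
        memset(seen, 0, sizeof seen);
        sent = push(in_of(T), out_of(S), INF);
        flow += sent;
    } while (sent > 0);

    /* A variable is in the cut if its in-half is still reachable from T
     * in the residual graph but its out-half is not. */
    memset(seen, 0, sizeof seen);
    mark_reachable(in_of(T));
    for (int n = 0; n < NODES; n++)
        in_cut[n] = seen[in_of(n)] && !seen[out_of(n)];
    return flow;
}
```

On Ex1 the maximum flow is 1 and the only saturated split edge separating T from S is the one for z, which matches the green patch in Figure 3.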

2.3 Proving Correctness Via Transient-Flow Types

To formalize and verify the correctness of the patch computed by cutting the def-use graph, we define a transient-flow type system and construct the def-use graph for a given program from the type-constraints generated during type inference.

Typing Judgement. The type system statically assigns a transient-flow type to each variable: a variable is typed as transient (written as T), if it can contain transient data (i.e., potential secrets) at run-time, and as stable (written as S), otherwise. Given a typing environment Γ which assigns a transient-flow type to each variable, and a command c, the type system defines a judgement Γ ⊢ c saying that c is free of speculative execution bugs. The type system enforces that transient expressions may not be used in positions that may leak their value by affecting memory reads and writes, e.g., they may not be used as array indices and in loop conditions. Additionally, it requires that transient expressions may not be assigned to stable variables, except through the use of protect. To show that our type system indeed prevents speculative execution attacks, we define a semantics for speculative execution of a while language (Section 3) and prove that well-typed programs do not leak speculatively more than sequentially, that is, by executing their statements in-order and without speculation (see Section 5).

Type Inference. Given an input program, we construct the corresponding def-use graph by collecting the type constraints generated during type inference. Type inference is formalized by a typing-inference judgment Γ, Prot ⊢ c ⇒ k, which extends the typing judgment from above with (1) a set of protected variables Prot (the cut-set), and (2) a set of type-constraints k (the def-use graph). At a high level, type inference has 3 steps: (i) generate a set of constraints under an initial typing environment and protected set that allow any program to type-check, (ii) construct the def-use graph from the constraints and find a cut-set, and (iii) compute the resulting typing environment. To characterize the security of a still unrepaired program after type inference, we define a typing judgment Γ, Prot ⊢ c, where unprotected variables are explicitly accounted for in the Prot set.² Intuitively, the program is secure if we promise to insert a protect statement for each variable in Prot.

To repair programs, we simply honor the promise of inserting protect statements for each variable in the protected set of the typing judgment obtained above. Once repaired, the program type checks under an empty protected set and with the same typing environment.
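The flavor of the typing discipline can be sketched in C for straight-line programs such as Ex1. The checker below is a deliberate simplification, not the paper's type system: it infers types by forward propagation instead of checking against a given Γ, it supports only array reads and additions, and it assumes the input indices i1 and i2 are stable. All names (FlowType, Stmt, well_typed, ex1_accepts) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STABLE = 0, TRANSIENT = 1 } FlowType;
typedef enum { READ, ADD } Op;

/* READ: dst := a[src1] (src2 unused).  ADD: dst := src1 + src2.
 * If protect is set, the right-hand side is wrapped in protect(...). */
typedef struct { Op op; int dst, src1, src2; int protect; } Stmt;

/* Returns 1 if no transient value reaches an array index, else 0.
 * gamma maps variable ids to types and is updated as statements are visited. */
static int well_typed(const Stmt *prog, size_t n, FlowType *gamma)
{
    for (size_t k = 0; k < n; k++) {
        const Stmt *s = &prog[k];
        FlowType rhs;
        if (s->op == READ) {
            if (gamma[s->src1] == TRANSIENT)
                return 0;                  /* transient index: potential leak */
            rhs = TRANSIENT;               /* memory reads yield transient data */
        } else {
            rhs = (gamma[s->src1] == TRANSIENT || gamma[s->src2] == TRANSIENT)
                      ? TRANSIENT : STABLE;
        }
        gamma[s->dst] = s->protect ? STABLE : rhs;  /* protect stabilizes */
    }
    return 1;
}

enum { I1, I2, VX, VY, VZ, VW, NVARS };

/* Checks Ex1 with optional protect statements on x, y, and z. */
static int ex1_accepts(int protect_x, int protect_y, int protect_z)
{
    FlowType gamma[NVARS] = { STABLE, STABLE, STABLE, STABLE, STABLE, STABLE };
    const Stmt ex1[] = {
        { READ, VX, I1, 0,  protect_x },   /* x := a[i1] */
        { READ, VY, I2, 0,  protect_y },   /* y := a[i2] */
        { ADD,  VZ, VX, VY, protect_z },   /* z := x + y */
        { READ, VW, VZ, 0,  0 },           /* w := b[z]  */
    };
    return well_typed(ex1, sizeof ex1 / sizeof ex1[0], gamma);
}
```

Under these assumptions, the unpatched Ex1 is rejected, while both the single protect on z and the pair of protect statements on x and y are accepted, mirroring the two cut-sets of Figure 4.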

2.4 Attacker Model

Before moving to the details of our semantics and transient type system, we discuss the attacker model considered in this work. The attacker runs cryptographic code on a speculative out-of-order processor and, as usual, can choose the values of public inputs and observe public outputs, but may not read secret data (e.g., cryptographic keys) in registers and memory. Additionally, the attacker can influence how programs are speculatively executed through the branch predictor and choose the instruction execution order in the processor pipeline. The effects of these actions are observable through the cache and are otherwise invisible at the ISA level. In particular, while programs run, the attacker can take precise timing measurements through the data- and instruction-cache with a cache-line granularity, which may disclose secret data covertly. These features allow the attacker to mount Spectre-PHT [20, 21] and Spectre-STL [9] attacks and leak data through FLUSH+RELOAD [43] and PRIME+PROBE [34] cache side-channel attacks. We do not consider speculative attacks that rely on the Return Stack Buffer (e.g., Ret2Spec [25] and [22]) or the Branch Target Buffer (Spectre-BTB [21]). We similarly do not consider attacks that do not use the cache to exfiltrate data, e.g., port contention (SMoTherSpectre [7]) and Meltdown attacks [9, 24], since hardware fixes address them.


Value  v ::= n | b | a
Expr.  e ::= v | x | e1 + e2 | e1 ⩽ e2 | length(e) | base(e)
Rhs.   r ::= e | ∗e | e[e]
Cmd.   c ::= skip | x := r | ∗e = e | e1[e2] := e3
           | if e then c1 else c2 | while e do c | fail | c1; c2
           | x := stable_read(e1, e2)
           | x := protect(r)

Figure 5. Surface Syntax.

3 A Semantics for Speculation

We now formalize the concepts presented in the overview. We start by giving a formal semantics for a while language with speculative execution. Figure 5 presents the language's surface syntax. Values consist of Booleans b, pointers n represented as natural numbers, and arrays a. Array length and base address are given by functions length(·) and base(·). In addition to variable assignments, pointer dereferences, array stores, conditionals and loops, our language features two special commands that help prevent transient execution attacks. Command x := protect(r) evaluates r and assigns its value to x, only after the value is stable (i.e., non-transient). Command x := stable_read(e1, e2) is a restricted version of protect(·) that only applies to array reads (see Section 3.4). Lastly, fail triggers a memory violation error (caused by reading or writing an array out-of-bounds) and aborts the program.

Processor Instructions. Our semantics translates the surface syntax into an abstract set of processor instructions shown in Figure 6. Our processor instructions do not contain branching: they represent a single predicted path through the control flow. The prediction choices are represented by a sequence of guard instructions representing pending branch points. Guard instructions have the form guard(e^b, cs, p), which records the branch condition e, its predicted truth value b, and a unique guard identifier p, used in our security analysis (Section 5). Each guard attests the fact that the current execution is valid only if the branch condition gets resolved as predicted. In order to enable a roll-back in case of a misprediction, guards additionally record the set of commands cs along the alternative branch.

Directives and Observations. Instructions do not have to be executed in sequence; they can be executed in any order, enabling out-of-order execution. We use a simple three-stage processor pipeline: the execution of each instruction is split into fetch, exec, and retire. We do not fix the order in which instructions and their individual stages are executed, nor do we supply a model of the branch predictor to decide which control flow path to follow. Instead, we let the attacker supply

² The judgment Γ ⊢ c is just a shorthand for Γ, ∅ ⊢ c.

Instr.          i  ::= nop | x := e | x := load(e) | store(e1, e2)
                     | x := protect(e) | guard(e^b, cs, p) | fail
Dir.            d  ::= fetch | fetch b | exec n | retire
Obs.            o  ::= ϵ | load(n, ps) | store(n, ps) | fail | rollback(p)
Prediction      b  ∈ {true, false}
Guard Id.       p  ∈ ℕ
Reorder Buffer  is ::= i : is | [ ]
Cmd Stack       cs ::= c : cs | [ ]
Memory Store    µ  ∈ ℕ ⇀ Value
Variables Map   ρ  ∈ Var → Value
Configuration   C  ::= ⟨is, cs, µ, ρ⟩

Figure 6. Processor Syntax.

those decisions through a set of directives [11] shown in Fig. 6. For example, directive fetch true fetches the true branch of a conditional and exec n executes the nth instruction in the reorder buffer. Executing an instruction generates an observation (Fig. 6) which records attacker-observable behavior. Observations include speculative memory reads and writes (i.e., load(n, ps) and store(n, ps) issued while guards ps are pending), rollbacks (i.e., rollback(p) due to misspeculation of guard p), and memory violations (fail). Most instructions generate the empty observation ϵ.

Configurations and Reduction Relation. We formally specify our semantics as a reduction relation between processor configurations. A configuration ⟨is, cs, µ, ρ⟩ consists of a queue of in-flight instructions is called the reorder buffer, a stack of commands cs, a memory µ, and a map from variables to values ρ. A reduction step $C \xrightarrow{d}_{o} C'$ denotes that, under directive d, configuration C is transformed into C′ and generates observation o. To execute a program c with initial memory µ and variable map ρ, the processor initializes the configuration with an empty reorder buffer and inserts the program into the command stack, i.e., ⟨[ ], [c], µ, ρ⟩. Then, the execution proceeds until both the reorder buffer and the stack in the configuration are empty, i.e., we reach a configuration of the form ⟨[ ], [ ], µ′, ρ′⟩, for some final memory store µ′ and variable map ρ′.

We now discuss the semantic rules of each execution stage and then those for our security primitives.

3.1 Fetch Stage

The fetch stage flattens the input command into a sequence of instructions which it stores in the reorder buffer. Figure 7 presents selected rules; the remaining rules are in Appendix A. Rule [Fetch-Seq] pops command c1; c2 from the commands stack and pushes the two sub-commands for further processing. [Fetch-Asgn] pops an assignment from the commands stack and appends the corresponding processor instruction


Conference’17, July 2017, Washington, DC, USA

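[Fetch-Seq]
$$\langle is,\ (c_1; c_2) : cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{fetch}}_{\epsilon} \langle is,\ c_1 : c_2 : cs,\ \mu,\ \rho\rangle$$

[Fetch-Asgn]
$$\langle is,\ (x := e) : cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{fetch}}_{\epsilon} \langle is \mathbin{+\!\!+} [x := e],\ cs,\ \mu,\ \rho\rangle$$

[Fetch-Ptr-Load]
$$\langle is,\ (x := {*e}) : cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{fetch}}_{\epsilon} \langle is \mathbin{+\!\!+} [x := \textbf{load}(e)],\ cs,\ \mu,\ \rho\rangle$$

[Fetch-Array-Load]
$$\dfrac{c = (x := e_1[e_2]) \quad e = e_2 < \mathit{length}(e_1) \quad \mathit{fresh}(p) \quad e' = \mathit{base}(e_1) + e_2 \quad c' = \textbf{if}\ e\ \textbf{then}\ x := {*e'}\ \textbf{else}\ \textbf{fail}}{\langle is,\ c : cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{fetch}}_{\epsilon} \langle is,\ c' : cs,\ \mu,\ \rho\rangle}$$

[Fetch-If-True]
$$\dfrac{c = \textbf{if}\ e\ \textbf{then}\ c_1\ \textbf{else}\ c_2 \quad \mathit{fresh}(p) \quad i = \textbf{guard}(e^{\mathit{true}},\ c_2 : cs,\ p)}{\langle is,\ c : cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{fetch}\ \mathit{true}}_{\epsilon} \langle is \mathbin{+\!\!+} [i],\ c_1 : cs,\ \mu,\ \rho\rangle}$$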

Figure 7. Fetch stage (selected rules).

(x := e) at the end of the reorder buffer (see footnote 3). Rule [Fetch-Ptr-Load] is similar and simply translates pointer dereferences to the corresponding load instruction. Arrays provide a memory-safe interface to read and write memory: the processor injects bounds checks when fetching commands that read and write arrays. For example, rule [Fetch-Array-Load] expands command x := e1[e2] into the corresponding pointer dereference, but guards the command with a bounds-check condition. First, the rule generates the condition e = e2 < length(e1) and calculates the address of the indexed element e′ = base(e1) + e2. Then, it replaces the array read on the stack with command if e then x := ∗e′ else fail to abort the program and prevent the buffer overrun if the bounds check fails. Later, we show that speculative out-of-order execution can simply ignore the bounds-check guard and cause the processor to transiently read memory at an invalid address. Rule [Fetch-If-True] fetches a conditional branch from the stack and, following the prediction provided in directive fetch true, speculates that the condition e will evaluate to true. Thus, the processor inserts the corresponding instruction guard(e^true, c2 : cs, p) with a fresh guard identifier p in the reorder buffer and pushes the then-branch c1 onto the stack cs. Importantly, the guard instruction stores the else-branch together with a copy of the current commands stack (i.e., c2 : cs) as a rollback stack to restart the execution in case of misprediction.
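The fetch rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of commands and instructions, the function name `fetch`, and the `fresh` iterator of guard identifiers are all hypothetical, and the cases for a false prediction and for fail are guesses at rules the text defers to Appendix A.

```python
import itertools

def fetch(config, fresh, prediction=None):
    """Apply one fetch step to config = (is_, cs, mu, rho).

    `fresh` is an iterator of unused guard identifiers; `prediction` is the
    Boolean carried by a `fetch b` directive (only used for conditionals).
    """
    is_, cs, mu, rho = config
    c, rest = cs[0], cs[1:]
    kind = c[0]
    if kind == "seq":                       # [Fetch-Seq]
        _, c1, c2 = c
        return (is_, [c1, c2] + rest, mu, rho)
    if kind == "asgn":                      # [Fetch-Asgn]
        _, x, e = c
        return (is_ + [("asgn", x, e)], rest, mu, rho)
    if kind == "ptr_load":                  # [Fetch-Ptr-Load]
        _, x, e = c
        return (is_ + [("load", x, e)], rest, mu, rho)
    if kind == "array_load":                # [Fetch-Array-Load]
        _, x, arr, idx = c
        check = ("<", idx, ("length", arr))       # e  = e2 < length(e1)
        addr = ("+", ("base", arr), idx)          # e' = base(e1) + e2
        guarded = ("if", check, ("ptr_load", x, addr), ("fail",))
        return (is_, [guarded] + rest, mu, rho)
    if kind == "if":                        # [Fetch-If-True]; the false case is assumed symmetric
        if prediction is None:
            raise ValueError("conditionals need a `fetch b` directive")
        _, e, c1, c2 = c
        taken, other = (c1, c2) if prediction else (c2, c1)
        # The guard keeps the alternative branch plus the current stack for rollback.
        guard = ("guard", e, prediction, [other] + rest, next(fresh))
        return (is_ + [guard], [taken] + rest, mu, rho)
    if kind == "fail":                      # assumption: fail is appended as an instruction
        return (is_ + [("fail",)], rest, mu, rho)
    raise ValueError(f"no fetch rule for {kind!r}")

ids = itertools.count(1)
cfg = ([], [("array_load", "x", "a", "i1")], {}, {})
cfg = fetch(cfg, ids)          # inject the bounds check
cfg = fetch(cfg, ids, True)    # predict that the check succeeds
cfg = fetch(cfg, ids)          # fetch the guarded pointer dereference
```

After these three steps the reorder buffer holds a guard followed by a load, which corresponds to the first two rows of the buffer in Figure 10.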

3 Notation [i1, ..., in] represents a list of n elements, is1 ++ is2 denotes list concatenation, and |is| computes the length of the list is.

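[Execute]
$$\dfrac{|is_1| = n - 1 \quad \rho' = \phi(is_1, \rho) \quad \langle is_1, i, is_2, cs\rangle \overset{(\mu,\rho',o)}{\rightsquigarrow} \langle is', cs'\rangle}{\langle is_1 \mathbin{+\!\!+} [i] \mathbin{+\!\!+} is_2,\ cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{exec}\ n}_{o} \langle is',\ cs',\ \mu,\ \rho\rangle}$$

[Exec-Asgn]
$$\dfrac{i = (x := e) \quad v = [\![ e ]\!]_{\rho} \quad i' = (x := v)}{\langle is_1, i, is_2, cs\rangle \overset{(\mu,\rho,\epsilon)}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [i'] \mathbin{+\!\!+} is_2,\ cs\rangle}$$

[Exec-Branch-Ok]
$$\dfrac{i = \textbf{guard}(e^{b}, cs', p) \quad [\![ e ]\!]_{\rho} = b}{\langle is_1, i, is_2, cs\rangle \overset{(\mu,\rho,\epsilon)}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [\textbf{nop}] \mathbin{+\!\!+} is_2,\ cs\rangle}$$

[Exec-Branch-Mispredict]
$$\dfrac{i = \textbf{guard}(e^{b}, cs', p) \quad [\![ e ]\!]_{\rho} \neq b}{\langle is_1, i, is_2, cs\rangle \overset{(\mu,\rho,\textbf{rollback}(p))}{\rightsquigarrow} \langle is_1,\ cs'\rangle}$$

[Exec-Load]
$$\dfrac{i = (x := \textbf{load}(e)) \quad \textbf{store}(\_, \_) \notin is_1 \quad n = [\![ e ]\!]_{\rho} \quad ps = (\!| is_1 |\!) \quad i' = (x := \mu(n))}{\langle is_1, i, is_2, cs\rangle \overset{(\mu,\rho,\textbf{read}(n, ps))}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [i'] \mathbin{+\!\!+} is_2,\ cs\rangle}$$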

Figure 8. Execute stage (selected rules).

3.2 Execute Stage

In the execute stage, the processor evaluates the operands of instructions in the reorder buffer and rolls back the program state whenever it detects a misprediction.

Transient Variable Map. To evaluate operands in the presence of out-of-order execution, we need to take into account how previous, possibly unresolved assignments in the reorder buffer affect the variable map. In particular, we need to ensure that an instruction cannot execute if it depends on a preceding assignment whose value is still unknown. To update variable map ρ with the pending assignments in reorder buffer is, we define a function ϕ(is, ρ), called the transient variable map. The function walks through the reorder buffer, registers each resolved assignment instruction (x := v) in the variable map (through function update ρ[x ↦ v]) and marks variables from pending assignments (i.e., x := e, x := load(e), and x := protect(r)) as undefined (ρ[x ↦ ⊥]), making their respective values unavailable to following instructions.

Execute Rule and Auxiliary Relation. Step rules for the reduction relation are shown in Figure 8. Rule [Execute] executes the n-th instruction in the reorder buffer, following the directive exec n. For this, the rule splits the reorder buffer into prefix is1, n-th instruction i and suffix is2. Next, it computes the transient variable map ϕ(is1, ρ) and executes a transition step under the new map using an auxiliary relation ⇝. Notice


that [Execute] does not update the store or the variable map (the transient map is simply discarded). These changes are performed later in the retire stage.

The rules for the auxiliary relation are shown in Fig. 8. The relation transforms a tuple ⟨is1, i, is2, cs⟩ consisting of prefix, suffix and current instruction i into a tuple ⟨is′, cs′⟩ specifying the reorder buffer and command stack obtained by executing i. For example, rule [Exec-Asgn] evaluates the right-hand side of the assignment x := e, where ⟦e⟧ρ denotes the value of e under ρ. The premise v = ⟦e⟧ρ ensures that the expression is defined, i.e., it does not evaluate to ⊥. Then, the rule substitutes the computed value into the assignment (x := v), and reinserts the instruction back into its original position in the reorder buffer.
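As a rough illustration of the transient variable map ϕ and of rules [Execute] and [Exec-Asgn], consider the following Python sketch. The tuple encoding, the use of `None` for ⊥, and the names `transient_map`, `evaluate`, and `exec_asgn` are assumptions made for this example and are not taken from the formal development.

```python
UNDEF = None  # stands for the undefined value ⊥

def is_value(e):
    """Resolved values are plain integers or Booleans in this sketch."""
    return isinstance(e, (int, bool))

def transient_map(is_, rho):
    """phi(is, rho): overlay the assignments of the reorder buffer on rho."""
    rho = dict(rho)  # never mutate the committed variable map
    for instr in is_:
        kind = instr[0]
        if kind == "asgn":
            _, x, e = instr
            rho[x] = e if is_value(e) else UNDEF  # resolved vs. pending assignment
        elif kind in ("load", "protect"):
            rho[instr[1]] = UNDEF                 # never forwarded
    return rho

def evaluate(e, rho):
    """Evaluate e under rho; UNDEF propagates through operators."""
    if is_value(e):
        return e
    if isinstance(e, str):                        # variable (missing ones count as ⊥ here)
        return rho.get(e, UNDEF)
    op, left, right = e
    lv, rv = evaluate(left, rho), evaluate(right, rho)
    if lv is UNDEF or rv is UNDEF:
        return UNDEF
    return lv + rv if op == "+" else lv < rv      # only + and < are modeled

def exec_asgn(is_, n, rho):
    """[Execute] + [Exec-Asgn] for the n-th (1-based) instruction.

    Returns the new reorder buffer, or None if the rule does not apply.
    """
    is1, instr, is2 = is_[:n - 1], is_[n - 1], is_[n:]
    if instr[0] != "asgn":
        return None
    _, x, e = instr
    v = evaluate(e, transient_map(is1, rho))
    if v is UNDEF:
        return None                               # an operand is still pending
    return is1 + [("asgn", x, v)] + is2

rho = {"i": 1, "y": 2}
buf = [("asgn", "x", ("+", "i", 1)), ("asgn", "z", ("+", "x", "y"))]
blocked = exec_asgn(buf, 2, rho)    # x is pending, so z cannot be computed yet
buf = exec_asgn(buf, 1, rho)        # resolves x
buf = exec_asgn(buf, 2, rho)        # x is now forwarded to z
```

Note that `rho` itself is never updated here, mirroring the remark that committed state only changes in the retire stage.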

Guards and Rollback. Rules [Exec-Branch-Ok] and [Exec-Branch-Mispredict] resolve guard instructions. In rule [Exec-Branch-Ok], the predicted and computed value of the guard expression match, and the processor only has to replace the guard with a nop. In contrast, in rule [Exec-Branch-Mispredict] the predicted and computed value differ (⟦e⟧ρ ≠ b). This causes the processor to revert the program state and issue a rollback observation. For the rollback, the processor discards the instructions past the guard (i.e., is2) and substitutes the current commands stack cs with the rollback stack cs′, which causes execution to revert to the alternative branch.

Loads. Rule [Exec-Load] executes a memory load. The rule computes the address (n = ⟦e⟧ρ), retrieves the value at that address from memory (µ(n)) and rewrites the load into an assignment (x := µ(n)). Inserting the assignment into the reorder buffer allows transiently forwarding the loaded value to later instructions. The premise store(_, _) ∉ is1 prevents the processor from reading stale data from memory: if the load aliases with a preceding (but pending) store, ignoring the store would produce a stale read. To record that the load is issued speculatively, the observation read(n, ps) stores list ps containing the identifiers of the guards still pending in the reorder buffer. Function ⦇is⦈ simply extracts the identifiers of the guard instructions in the buffer is.
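The guard and load rules can be sketched as follows. The sketch is deliberately simplified and rests on assumptions: expressions are restricted to constants and variable names (the hypothetical helper `lookup`), `rho` is taken to be the transient map already computed for the prefix, and the function names and tuple encoding are illustrative only.

```python
def lookup(e, rho):
    """Simplified evaluation: constants evaluate to themselves, variables via rho."""
    return e if isinstance(e, (int, bool)) else rho.get(e)

def exec_guard(is1, guard, is2, cs, rho):
    """[Exec-Branch-Ok] / [Exec-Branch-Mispredict]: returns (buffer, stack, observation)."""
    _, e, predicted, rollback_cs, p = guard
    actual = lookup(e, rho)
    if actual is None:
        return None                                # condition not yet defined
    if actual == predicted:
        return is1 + [("nop",)] + is2, cs, "eps"   # prediction confirmed
    # Misprediction: drop the guard and everything after it, restore the rollback stack.
    return is1, rollback_cs, ("rollback", p)

def exec_load(is1, load, is2, mu, rho):
    """[Exec-Load]: returns (buffer, observation) or None if the rule is blocked."""
    _, x, e = load
    if any(i[0] == "store" for i in is1):
        return None                                # a pending store might alias
    n = lookup(e, rho)
    if n is None:
        return None                                # address not yet defined
    ps = [i[4] for i in is1 if i[0] == "guard"]    # identifiers of pending guards
    return is1 + [("asgn", x, mu[n])] + is2, ("read", n, ps)

guard = ("guard", "c", True, [("fail",)], 1)       # predicted true, rollback stack [fail]
rho, mu = {"c": False, "addr": 3}, {3: 42}
buf, obs = exec_load([guard], ("load", "x", "addr"), [], mu, rho)
rolled = exec_guard([], guard, buf[1:], [], rho)   # the guard is the first instruction
```

In this run the load executes while guard 1 is still pending, so its observation carries the guard identifier; resolving the guard afterwards discards the transient assignment and emits a rollback.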

3.3 Retire Stage

The retire stage removes completed instructions from the reorder buffer and propagates their changes to the variable map and memory store. While instructions are executed out of order, they are retired in order to preserve the illusion of sequential execution to the user. Figure 9 presents the rules for the retire stage. Rule [Retire-Nop] removes nop. Rules [Retire-Asgn] and [Retire-Store] remove the resolved assignment x := v and instruction store(n, v) from the reorder buffer and update the variable map (ρ[x ↦ v]) and the memory store (µ[n ↦ v]), respectively. Rule [Retire-Fail] aborts the program by emptying reorder buffer and command stack and generates a fail observation, simulating a processor raising an exception (e.g., a page fault).
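A minimal sketch of the four retire rules, assuming the same illustrative tuple encoding as before (the name `retire` and the string `"eps"` for the empty observation are hypothetical):

```python
def is_value(e):
    """Resolved values are plain integers or Booleans in this sketch."""
    return isinstance(e, (int, bool))

def retire(config):
    """Retire the instruction at the head of the reorder buffer.

    Returns (new_config, observation), or None if the head is not yet resolved.
    """
    is_, cs, mu, rho = config
    if not is_:
        return None
    head, rest = is_[0], is_[1:]
    kind = head[0]
    if kind == "nop":                                    # [Retire-Nop]
        return (rest, cs, mu, rho), "eps"
    if kind == "asgn" and is_value(head[2]):             # [Retire-Asgn]
        _, x, v = head
        return (rest, cs, mu, {**rho, x: v}), "eps"
    if kind == "store" and is_value(head[1]) and is_value(head[2]):
        _, n, v = head                                   # [Retire-Store]
        return (rest, cs, {**mu, n: v}, rho), "eps"
    if kind == "fail":                                   # [Retire-Fail]
        return ([], [], mu, rho), "fail"
    return None

cfg = ([("nop",), ("asgn", "x", 42), ("store", 3, 7), ("fail",), ("asgn", "y", 1)],
       [("skip",)], {3: 0}, {})
trace = []
for _ in range(4):
    cfg, obs = retire(cfg)
    trace.append(obs)
```

Because only the head of the buffer is inspected, instructions commit in program order even if they were executed out of order.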

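[Retire-Nop]
$$\langle \textbf{nop} : is,\ cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{retire}}_{\epsilon} \langle is,\ cs,\ \mu,\ \rho\rangle$$

[Retire-Asgn]
$$\langle (x := v) : is,\ cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{retire}}_{\epsilon} \langle is,\ cs,\ \mu,\ \rho[x \mapsto v]\rangle$$

[Retire-Store]
$$\dfrac{i = \textbf{store}(n, v)}{\langle i : is,\ cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{retire}}_{\epsilon} \langle is,\ cs,\ \mu[n \mapsto v],\ \rho\rangle}$$

[Retire-Fail]
$$\langle \textbf{fail} : is,\ cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{retire}}_{\textbf{fail}} \langle [\,],\ [\,],\ \mu,\ \rho\rangle$$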

Figure 9. Retire stage.

We demonstrate how the attacker can leak a secret from program Ex1 (Fig. 3) in our model. First, the attacker instructs the processor to fetch all the instructions, supplying prediction true for all bounds-check conditions. Figure 10 shows the resulting buffer and how it evolves after each of the attacker's directives, which instruct the processor to speculatively execute the load instructions and the assignment (but not the guard instructions). Memory µ and variable map ρ are shown on the right. Directive exec 4 transiently reads array a past its bound, at index 2, reading into the memory (µ(3) = 42) of secret array s[0], and generates the corresponding observation. Finally, the processor forwards the values of x and y to compute their sum in the fifth instruction (z := 42), which is then used as an index in the last instruction and leaked to the attacker via observation read(42, [1, 2, 3]).

3.4 Security Primitives

Next, we turn to the rules describing our security primitives.

Protect. Instruction x := protect(r) assigns the value of r only after all previous guard instructions have been executed, i.e., when the value has become stable and no more rollbacks are possible. Figure 11 formalizes this intuition. Rule [Fetch-Protect-Expr] fetches protect commands involving simple expressions (x := protect(e)) and inserts the corresponding protect instruction in the reorder buffer. Rule [Fetch-Protect-Array] piggy-backs on the previous rule by splitting a protect of an array read (x := protect(e1[e2])) into a separate assignment of the array value (x := e1[e2]) and a protect of the variable (x := protect(x)). Rules [Exec-Protect1] and [Exec-Protect2] extend auxiliary relation ⇝. Rule [Exec-Protect1] evaluates the expression (v = ⟦e⟧ρ) and reinserts the instruction in the reorder buffer as if it were a normal assignment. However, the processor leaves the value wrapped inside the protect instruction in the reorder buffer, i.e., x := protect(v), to prevent forwarding the value to the


    Reorder Buffer                              exec 2        exec 4          exec 5     exec 7
1   guard((i1 < length(a))^true, [fail], 1)
2   x := load(base(a) + i1)                     x := µ(2)
3   guard((i2 < length(a))^true, [fail], 2)
4   y := load(base(a) + i2)                                   y := µ(3)
5   z := x + y                                                                z := 42
6   guard((z < length(b))^true, [fail], 3)
7   w := load(base(b) + z)                                                               w := µ(42)

    Observations:                               read(2,[1])   read(3,[1,2])   ϵ          read(42,[1,2,3])

Memory Layout
µ(0) = 0     b[0]
µ(1) = 0     a[0]
µ(2) = 0     a[1]
µ(3) = 42    s[0]
···          ···

Variable Map
ρ(i1) = 1
ρ(i2) = 2

Figure 10. Leaking execution of running example Ex1.

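[Fetch-Protect-Array]
$$\dfrac{c = (x := \textbf{protect}(e_1[e_2])) \quad c_1 = (x := e_1[e_2]) \quad c_2 = (x := \textbf{protect}(x))}{\langle is,\ c : cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{fetch}}_{\epsilon} \langle is,\ c_1 : c_2 : cs,\ \mu,\ \rho\rangle}$$

[Fetch-Protect-Expr]
$$\dfrac{c = (x := \textbf{protect}(e)) \quad i = (x := \textbf{protect}(e))}{\langle is,\ c : cs,\ \mu,\ \rho\rangle \xrightarrow{\textbf{fetch}}_{\epsilon} \langle is \mathbin{+\!\!+} [i],\ cs,\ \mu,\ \rho\rangle}$$

[Exec-Protect1]
$$\dfrac{i = (x := \textbf{protect}(e)) \quad v = [\![ e ]\!]_{\rho} \quad i' = (x := \textbf{protect}(v))}{\langle is_1, i, is_2, cs\rangle \overset{(\mu,\rho,\epsilon)}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [i'] \mathbin{+\!\!+} is_2,\ cs\rangle}$$

[Exec-Protect2]
$$\dfrac{i = (x := \textbf{protect}(v)) \quad \textbf{guard}(\_, \_, \_) \notin is_1 \quad i' = (x := v)}{\langle is_1, i, is_2, cs\rangle \overset{(\mu,\rho,\epsilon)}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [i'] \mathbin{+\!\!+} is_2,\ cs\rangle}$$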

Figure 11. Semantics of protect(·) (selected rules).

later instructions via the transient variable map. When no guards are pending in the reorder buffer (guard(_, _, _) ∉ is1), rule [Exec-Protect2] transforms the instruction into a normal assignment, so that the processor can propagate and commit its value.

Example. Consider again Ex1 and the execution shown in Figure 10. In the repaired program, x + y is wrapped in a protect statement. As a result, directive exec 5 produces value z := protect(42) instead of z := 42, which prevents instruction 7 from executing (as its target address is undefined) until all guards are resolved. This in turn prevents the leaking of the transient value.

Stable Read. Unfortunately, current processors do not provide the means to implement protect in its full generality. Our semantics therefore contains a primitive stable_read(e1, e2) that implements a restricted version of protect(e1[e2]) for array reads. While protect(·) prevents forwarding loaded values until all pending branches are resolved, stable_read(·) stalls memory loads until individual bounds-check conditions have been resolved. stable_read(·) can be implemented using today's hardware, for example through Speculative Load Hardening (SLH) [10], the Spectre mitigation proposed by and deployed in the Clang compiler. We provide formal semantics in Appendix B.

Example. Consider again Ex1. Instead of using protect(·), we can repair the example by inserting stable_read. However, instead of a single protect(·) for expression x + y, we need to insert two stable_read statements, for a[i1] and a[i2], respectively.
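The two execute rules for protect in Figure 11 might be sketched as below. As before, this is only an illustration: the encoding of instructions as tuples, the helper `lookup` (which handles only constants and variable names), and the name `exec_protect` are assumptions, and `rho` stands for the transient map of the prefix.

```python
def is_value(e):
    """Resolved values are plain integers or Booleans in this sketch."""
    return isinstance(e, (int, bool))

def lookup(e, rho):
    """Simplified evaluation: constants evaluate to themselves, variables via rho."""
    return e if is_value(e) else rho.get(e)

def exec_protect(is1, instr, is2, rho):
    """Execute x := protect(e); returns the new buffer or None if blocked."""
    _, x, e = instr
    if not is_value(e):                           # [Exec-Protect1]
        v = lookup(e, rho)
        if v is None:
            return None                           # operand still pending
        # The value stays wrapped, so the transient map keeps x undefined.
        return is1 + [("protect", x, v)] + is2
    if any(i[0] == "guard" for i in is1):
        return None                               # some guard is still pending
    return is1 + [("asgn", x, e)] + is2           # [Exec-Protect2]

guard = ("guard", "c", True, [("fail",)], 1)
rho = {"s": 42}
step1 = exec_protect([guard], ("protect", "z", "s"), [], rho)
step2 = exec_protect([guard], step1[1], [], rho)      # blocked while the guard is pending
step3 = exec_protect([("nop",)], step1[1], [], rho)   # guard resolved to nop
```

The second call returns nothing because the guard has not been resolved; only once it is replaced by a nop does the protected value become an ordinary assignment that later instructions can read.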

4 Type System and Inference

In Section 4.1, we present a transient-flow type system which statically rejects programs that can potentially leak through transient execution attacks. Given an unannotated program, we apply constraint-based type inference [3, 27] to generate its use-def graph and reconstruct type information (Section 4.2). Then, reusing off-the-shelf Max-Flow/Min-Cut algorithms, we analyze the graph and locate potential speculative vulnerabilities in the form of a variable min-cut set. Finally, using a simple program repair algorithm, we patch the program by inserting a minimum number of protect statements so that it does not leak speculatively anymore (Figure 13).

4.1 Type System

Our transient-flow type system prevents programs from leaking transient values via cache timing channels. To this end, the type system assigns a transient-flow type to expressions and tracks how transient values propagate within programs, rejecting programs in which transient values reach commands which may leak them. An expression can either be typed as stable (S), indicating that it cannot contain transient values during execution, or as transient (T), indicating that it can. These types form a 2-point lattice [23], which allows stable expressions to be typed as transient, but not vice versa, i.e., we define a can-flow-to relation ⊑ such that S ⊑ T, but T ̸⊑ S.

Typing Expressions. Given a typing environment for variables Γ ∈ Var → {S, T}, the typing judgement Γ ⊢ r : τ assigns a transient-flow type τ to r. Figure 12 presents selected rules (see Appendix C for the rest). The shaded parts of the rules generate type constraints during type inference and are explained


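[Value]
$$\Gamma \vdash v : \tau \Rightarrow \emptyset$$

[Var]
$$\dfrac{\Gamma(x) = \tau}{\Gamma \vdash x : \tau \Rightarrow x \sqsubseteq \alpha_x}$$

[Bop]
$$\dfrac{\Gamma \vdash e_1 : \tau_1 \Rightarrow k_1 \quad \Gamma \vdash e_2 : \tau_2 \Rightarrow k_2 \quad \tau_1 \sqsubseteq \tau \quad \tau_2 \sqsubseteq \tau}{\Gamma \vdash e_1 \oplus e_2 : \tau \Rightarrow k_1 \cup k_2 \cup (e_1 \sqsubseteq e_1 \oplus e_2) \cup (e_2 \sqsubseteq e_1 \oplus e_2)}$$

[Array-Read]
$$\dfrac{\Gamma \vdash e_1 : \mathrm{S} \Rightarrow k_1 \quad \Gamma \vdash e_2 : \mathrm{S} \Rightarrow k_2}{\Gamma \vdash e_1[e_2] : \mathrm{T} \Rightarrow k_1 \cup k_2 \cup (e_1 \sqsubseteq \mathrm{S}) \cup (e_2 \sqsubseteq \mathrm{S}) \cup (\mathrm{T} \sqsubseteq e_1[e_2])}$$

(a) Typing Rules for Expressions and Arrays.

[Asgn]
$$\dfrac{\Gamma \vdash r : \tau \Rightarrow k \quad \tau \sqsubseteq \Gamma(x)}{\Gamma, \mathrm{Prot} \vdash x := r \Rightarrow k \cup (r \sqsubseteq x)}$$

[Protect]
$$\dfrac{\Gamma \vdash r : \tau \Rightarrow k}{\Gamma, \mathrm{Prot} \vdash x := \textbf{protect}(r) \Rightarrow k}$$

[Asgn-Prot]
$$\dfrac{\Gamma \vdash r : \tau \Rightarrow k \quad x \in \mathrm{Prot}}{\Gamma, \mathrm{Prot} \vdash x := r \Rightarrow k \cup (r \sqsubseteq x)}$$

[Stable-Read]
$$\dfrac{\Gamma \vdash e_1 : \mathrm{S} \Rightarrow k \quad \Gamma \vdash e_2 : \mathrm{S}}{\Gamma, \mathrm{Prot} \vdash x := \textbf{stable\_read}(e_1, e_2) \Rightarrow k \cup (e_1 \sqsubseteq \mathrm{S}) \cup (e_2 \sqsubseteq \mathrm{S})}$$

[If-Then-Else]
$$\dfrac{\Gamma \vdash e : \mathrm{S} \Rightarrow k \quad \Gamma, \mathrm{Prot} \vdash c_1 \Rightarrow k_1 \quad \Gamma, \mathrm{Prot} \vdash c_2 \Rightarrow k_2}{\Gamma, \mathrm{Prot} \vdash \textbf{if}\ e\ \textbf{then}\ c_1\ \textbf{else}\ c_2 \Rightarrow k \cup k_1 \cup k_2 \cup (e \sqsubseteq \mathrm{S})}$$

(b) Typing Rules for Commands.

Figure 12. Transient-flow type system and type constraint generation (selected rules).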

later. Values can assume any type. Variables are assigned their respective type from the environment. Rule [Bop] propagates the type of the operands to the result of binary operators ⊕ ∈ {+, <}. Finally, rule [Array-Read] assigns the transient type to array reads, as the array may potentially be indexed out of bounds during speculation. Importantly, the rule requires the array index to be stable to prevent programs from leaking through the cache.

Typing Commands. Given a set of protected variables Prot, we define a typing judgment Γ, Prot ⊢ c for commands. Intuitively, a command c is well-typed under environment Γ and set Prot if c does not leak, under the assumption that the expressions assigned to all variables in Prot are protected using the protect(·) primitive. Figure 12b shows our typing rules. Rule [Asgn] disallows assignments from transient to

stable variables (as T ̸⊑ S). Rule [Protect] relaxes this policy as long as the right-hand side is explicitly protected. Intuitively, the result of protect(·) is stable and it can thus flow securely to variables of any type. Rule [Asgn-Prot] is similar, but instead of requiring an explicit protect(·) statement, it demands that the variable is accounted for in the protected set Prot. This is secure because all assignments to variables in Prot will eventually be protected through the repair function discussed later in this section. Since primitive x := stable_read(e1, e2) corresponds to the array read e1[e2], rule [Stable-Read] requires the array and the index argument to be stable, as in rule [Array-Read]. Similar to protect(·), the result of stable_read(·) is stable and thus the type of the variable needs no constraints.

Implicit Flows. To prevent programs from leaking data implicitly through their control flow, rule [If-Then-Else] requires the branch condition to be stable. This might seem overly restrictive at first: why can't we accept a program that branches on transient data, as long as it does not perform any attacker-observable operations (e.g., memory reads and writes) along the branches? Indeed, classic information-flow control (IFC) type systems (e.g., [36]) take this approach by keeping track of an explicit program counter label. Unfortunately, such permissiveness is unsound under speculation. Even if a branch does not contain observable behavior, the value of the branch condition can be leaked by the instructions that follow a mispredicted branch. In particular, the rollback caused by a misprediction may cause load and store instructions after the mispredicted branch to be repeated, thus revealing whether the attacker guessed the value of the branch condition.

Example. Consider the following program: if tr then x := 0 else skip; y := a[0]. The program can leak the value of tr during speculative execution. To see that, assume that the processor predicts that tr will evaluate to true. Then, the processor speculatively executes the then-branch (x := 0) and the load instruction (y := a[0]) before resolving the condition. If tr is true, the memory trace of the program contains a single read observation. However, if tr is false, the processor detects a misprediction, restarts the execution from the other branch (skip) and executes the array read, producing a rollback and two read observations. From these observations, an attacker could potentially make inferences about the value of tr. Consequently, if tr is typed as T, our type system rejects the program as unsafe.
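To make the typing rules concrete, here is a small checker sketch in Python. It is not the BLADE implementation: the tuple encoding, the names `type_of`, `check`, and `accepts`, the choice of S as the least type for values, and the treatment of skip and sequencing (whose rules are not among the selected ones) are all assumptions made for illustration.

```python
S, T = "S", "T"

class Leak(Exception):
    """Raised when a command cannot be typed."""

def flows_to(t1, t2):
    return t1 == S or t2 == T                      # S ⊑ S, S ⊑ T, T ⊑ T

def type_of(r, gamma):
    """Least type of a right-hand side r under environment gamma."""
    if isinstance(r, (int, bool)):
        return S                                   # values can assume any type
    if isinstance(r, str):
        return gamma[r]                            # [Var]
    if r[0] == "read":                             # [Array-Read]
        _, arr, idx = r
        if type_of(arr, gamma) != S or type_of(idx, gamma) != S:
            raise Leak("array and index must be stable")
        return T
    _, e1, e2 = r                                  # [Bop]
    return T if T in (type_of(e1, gamma), type_of(e2, gamma)) else S

def check(c, gamma, prot=frozenset()):
    kind = c[0]
    if kind == "skip":                             # assumed trivially well-typed
        return
    if kind == "seq":                              # assumed: check both parts
        check(c[1], gamma, prot)
        check(c[2], gamma, prot)
    elif kind == "asgn":                           # [Asgn] / [Asgn-Prot]
        _, x, r = c
        t = type_of(r, gamma)
        if x not in prot and not flows_to(t, gamma[x]):
            raise Leak(f"transient value flows to stable variable {x}")
    elif kind == "protect":                        # [Protect]
        type_of(c[2], gamma)
    elif kind == "stable_read":                    # [Stable-Read]
        _, x, arr, idx = c
        if type_of(arr, gamma) != S or type_of(idx, gamma) != S:
            raise Leak("array and index must be stable")
    elif kind == "if":                             # [If-Then-Else]
        _, e, c1, c2 = c
        if type_of(e, gamma) != S:
            raise Leak("branch condition must be stable")
        check(c1, gamma, prot)
        check(c2, gamma, prot)
    else:
        raise ValueError(f"unknown command {kind!r}")

def accepts(c, gamma, prot=frozenset()):
    try:
        check(c, gamma, prot)
        return True
    except Leak:
        return False

# if tr then x := 0 else skip; y := a[0]
prog = ("seq", ("if", "tr", ("asgn", "x", 0), ("skip",)), ("asgn", "y", ("read", "a", 0)))
rejected = accepts(prog, {"tr": T, "x": S, "y": T, "a": S})
accepted = accepts(prog, {"tr": S, "x": S, "y": T, "a": S})
```

With tr typed as transient the checker rejects the program, matching the example above; with a stable tr it is accepted.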

4.2 Type Inference

We now present our type inference algorithm.

Constraints. We start by collecting a set of constraints k via typing judgement Γ, Prot ⊢ c ⇒ k. For this, we define a dummy environment Γ∗ and protected set Prot∗, such that Γ∗, Prot∗ ⊢ c ⇒ k holds for any command c (i.e., we let Γ∗ = λx.S and include all variables in the cut-set) and use it to extract


the set of constraints k. The syntax for constraints is shown in Figure 13. The constraints relate atoms, which represent the unknown types of variables, i.e., αx for x, and of expressions, i.e., r. Constraints record can-flow-to relationships between the atoms and lattice values T and S. They are accumulated via operator ∪, where we identify k1 ∪ ··· ∪ kn with the set {k1, ..., kn}.

Solutions and Satisfiability. We define the solution to a set of constraints as a function σ from atoms to flow types, i.e., σ ∈ Atoms ↦ {T, S}, and extend solutions to map T and S to themselves. For a set of constraints k and a solution function σ, we write σ ⊢ k to say that the constraints k are satisfied under solution σ. A solution σ satisfies k if all can-flow-to constraints hold when the atoms are replaced by their values under σ. We say that a set of constraints k is satisfiable if there is a solution σ such that σ ⊢ k.

Def-Use Graph & Paths. The constraints generated by our type system give rise to the def-use graph of the type-checked program. For a set of constraints k, we call a sequence of atoms a1 ... an a path in k if ai ⊑ ai+1 ∈ k for i ∈ {1, ..., n−1}, and say that a1 is the path's entry and an its exit. A T-S path is a path with entry T and exit S. A set of constraints k is satisfiable if and only if there is no T-S path in k, as such a path would correspond to a derivation of false. If k is satisfiable, we can compute a solution σ(k) by letting σ(k)(a) = T if there is a path with entry T and exit a, and S otherwise.

Cuts. If a set of constraints is unsatisfiable, we can make it satisfiable by removing some of the nodes in its graph or, equivalently, protecting some of the variables. A set of atoms A cuts a path a1 ... an if some a ∈ A occurs along the path, i.e., there exist a ∈ A and i ∈ {1, ..., n} such that ai = a. We call A a cut-set for a set of constraints k if A cuts all T-S paths in k. A cut-set A is minimal for k if all other cut-sets A′ contain as many or more atoms than A, i.e., #A ⩽ #A′.

Extracting Types From Cuts. From a set of variables A such that A is a cut-set of constraints k, we can extract a typing environment Γ(k, A) as follows: for an atom αx, we define Γ(k, A)(x) = T if there is a path with entry T and exit αx in k that is not cut by A, and let Γ(k, A)(x) = S otherwise.

Proposition 1 (Type Inference). If Γ∗, Prot∗ ⊢ c ⇒ k and A is a set of variables that cuts k, then Γ(k, A), A ⊢ c.

Remark. To infer a repair using stable_read instead of protect, we can restrict our cut-set to only include variables that are assigned from an array read.

Example. Consider again Ex1 in Figure 3. The graph defined by the constraints k, given by Γ∗, Prot∗ ⊢ Ex1 ⇒ k, is shown in Figure 4, where we have omitted α-nodes. The constraints are not satisfiable, since there are T-S paths. Both {x, y} and {z} are cut-sets, since they cut each T-S path; however, the set {z} contains only one element and is therefore minimal. The typing environment Γ(k, {x, y}) extracted from the sub-optimal cut {x, y} types all variables as S, while the typing extracted from the optimal cut, i.e., Γ(k, {z}), types x and y as T and z, i1

Atom         a ::= αx | r
Constraint   k ::= a ⊑ S | T ⊑ a | a ⊑ a | k ∪ k | ∅
Solution     σ ∈ Atoms ↦ {S, T}

Figure 13. Constraint Syntax.

and i2 as S. By Proposition 1, both Γ(k, {x, y}), {x, y} ⊢ Ex1 and Γ(k, {z}), {z} ⊢ Ex1 hold.
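The cut-set example can be replayed with a short Python sketch. Two caveats apply. First, the edge list below is a hand-written reading of the constraints for Ex1 (with α-nodes omitted, as in the text) and is not copied from Figure 4. Second, the sketch finds a minimal cut by brute-force enumeration of variable subsets, which is a stand-in for the Max-Flow/Min-Cut algorithms mentioned in the text and only suitable for tiny graphs. All function names are hypothetical.

```python
from itertools import combinations

def reachable_from_T(edges, removed=frozenset()):
    """Atoms reachable from T along can-flow-to edges, never entering `removed`."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen, todo = set(), ["T"]
    while todo:
        a = todo.pop()
        for b in succ.get(a, []):
            if b not in seen and b not in removed:
                seen.add(b)
                todo.append(b)
    return seen

def is_cut(edges, atoms):
    """A set of atoms is a cut-set if no T-S path survives its removal."""
    return "S" not in reachable_from_T(edges, set(atoms))

def minimal_cut(edges, variables):
    """Smallest set of variables cutting every T-S path (brute force)."""
    for size in range(len(variables) + 1):
        for candidate in combinations(sorted(variables), size):
            if is_cut(edges, candidate):
                return set(candidate)
    return None

def extract_types(edges, cut, variables):
    """Gamma(k, A): T if an uncut path from T reaches the variable, else S."""
    live = reachable_from_T(edges, set(cut))
    return {x: ("T" if x in live else "S") for x in variables}

# Assumed constraints for: x := a[i1]; y := a[i2]; z := x + y; w := b[z]
edges = [
    ("T", "a[i1]"), ("a[i1]", "x"), ("i1", "S"),
    ("T", "a[i2]"), ("a[i2]", "y"), ("i2", "S"),
    ("x", "x+y"), ("y", "x+y"), ("x+y", "z"),
    ("T", "b[z]"), ("b[z]", "w"), ("z", "S"),
]
best = minimal_cut(edges, ["i1", "i2", "x", "y", "z", "w"])
discussed = ["i1", "i2", "x", "y", "z"]            # the variables named in the example
types_z = extract_types(edges, {"z"}, discussed)
types_xy = extract_types(edges, {"x", "y"}, discussed)
```

Under these assumed constraints the enumeration returns {z}, and the extracted environments agree with the example for the variables it names.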

4.3 Program Repair

As a final step, our repair algorithm repair(c, Prot) traverses program c and inserts a protect(·) statement for each variable in the cut-set Prot. Since we assume that programs are in static single assignment form, there is a single assignment x := r for each variable x ∈ Prot, and our repair algorithm simply replaces it with x := protect(r).
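The traversal is simple enough to sketch directly. The tuple encoding of commands below is an assumption carried over for illustration, and the function is a sketch of the described behavior rather than BLADE's actual code.

```python
def repair(c, prot):
    """Replace x := r by x := protect(r) for every variable x in the cut-set prot."""
    kind = c[0]
    if kind == "asgn" and c[1] in prot:
        _, x, r = c
        return ("protect", x, r)
    if kind == "seq":
        return ("seq", repair(c[1], prot), repair(c[2], prot))
    if kind == "if":
        return ("if", c[1], repair(c[2], prot), repair(c[3], prot))
    if kind == "while":
        return ("while", c[1], repair(c[2], prot))
    return c  # all other commands are left untouched

# x := a[i1]; y := a[i2]; z := x + y; w := b[z]
ex1 = ("seq", ("asgn", "x", ("read", "a", "i1")),
       ("seq", ("asgn", "y", ("read", "a", "i2")),
        ("seq", ("asgn", "z", ("+", "x", "y")),
         ("asgn", "w", ("read", "b", "z")))))
repaired = repair(ex1, {"z"})
```

For the cut-set {z}, only the assignment to z changes; the two array reads and the final load are left as they were.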

5 Consistency and Security

We now present two formal results about our speculative semantics and the security of the type system. Our full definitions and proofs can be found in Appendix D.

Consistency. We write $C \Downarrow^{D}_{O} C'$ for the complete speculative execution of configuration C to final configuration C′, which generates a trace of observations O under list of directives D. Similarly, we write $\langle \mu, \rho\rangle \Downarrow^{c}_{O} \langle \mu', \rho'\rangle$ for the sequential execution of program c with initial memory µ and variable map ρ resulting in final memory µ′ and variable map ρ′. To relate speculative and sequential observations, we define a projection function, written O↓, which removes prediction identifiers, rollbacks, and misspeculated loads and stores.

Theorem 5.1 (Consistency). For all programs c, initial memory stores µ, variable maps ρ, and directives D, if $\langle \mu, \rho\rangle \Downarrow^{c}_{O} \langle \mu', \rho'\rangle$ and $\langle [\,], [c], \mu, \rho\rangle \Downarrow^{D}_{O'} \langle [\,], [\,], \mu'', \rho''\rangle$, then µ′ = µ″, ρ′ = ρ″, and O ≅ O′↓.

The theorem ensures equivalence of the final memory stores, variable maps, and observation traces from the sequential and the speculative execution. Notice that trace equivalence is up to permutation, i.e., O ≅ O′↓, because the processor can execute load and store instructions out of order.

Speculative Non-Interference. Speculative non-interference is parametric in the security policy that specifies which variables and parts of the memory are controlled by the attacker [17]. In the following, we write L for the set of public variables and memory locations that are observable by the attacker. Two variable maps are indistinguishable to the attacker, written ρ1 ≈L ρ2, if and only if ρ1(x) = ρ2(x) for all x ∈ L. Similarly, memory stores are related pointwise, i.e., µ1 ≈L µ2 iff µ1(n) = µ2(n) for all n ∈ L.

Definition 1 (Speculative Non-Interference). A program c satisfies speculative non-interference if and only if, for all directives D, memory stores µ1, µ2 and variable maps ρ1, ρ2 such that µ1 ≈L µ2 and ρ1 ≈L ρ2, letting Ci = ⟨[ ], [c], µi, ρi⟩ for i ∈ {1, 2} with $C_1 \Downarrow^{D}_{O_1} C_1'$ and $C_2 \Downarrow^{D}_{O_2} C_2'$, if O1↓ = O2↓, then O1 = O2.

In the definition above, programs leak by producing different observations starting from memories and variables indistinguishable to the attacker. Speculative non-interference requires showing absence of leaks for the speculative traces (O1 = O2), assuming that the program does not already leak sequentially (O1↓ = O2↓). Notice that here we consider syntactic equivalence for the traces because both executions follow the same list of directives. We now present our soundness theorem: well-typed programs satisfy speculative non-interference.

Theorem 5.2 (Soundness). For all programs c, if Γ ⊢ c then c satisfies speculative non-interference.

We conclude with a corollary that combines all the components of our protection chain (type inference, type checking and automatic repair via our security primitives) and shows that repaired programs satisfy speculative non-interference.

Corollary 5.3. For all programs c, there exists a set of constraints k such that Γ∗, Prot∗ ⊢ c ⇒ k. If A is a set of variables that cuts k, then the repaired program repair(c, A) satisfies speculative non-interference.

6 Implementation and Evaluation

We now describe our implementation and evaluate BLADE on an implementation of the Signal secure messaging protocol and various cryptographic algorithms. Our evaluation shows that BLADE can secure existing software systems against speculative execution attacks automatically. Moreover, BLADE introduces two orders of magnitude fewer fences than a baseline algorithm implemented in Clang. As a result, the repairs computed by BLADE incur only a minimal performance overhead.

6.1 Implementation

We implemented BLADE in 3500 lines of Haskell code. BLADE takes as input a WebAssembly program, computes a repaired program that is safe under speculative execution, and verifies its correctness via type-checking. Internally, BLADE proceeds in three stages. First, BLADE converts the WebAssembly program into an intermediate representation similar to the While language in Figure 5. This simplifies further processing, as WebAssembly is a stack-based language, i.e., arguments are not represented directly, but instead kept on an argument stack. Second, BLADE builds the use-def graph (§4.1) of the input program, infers a minimal cut-set (§4.2), and computes the repair (§4.3). Finally, in the last stage, BLADE extracts a typing environment from the use-def graph and type-checks the repaired program (§4). This independent checking step provides extra confidence that the repaired program indeed does not leak more speculatively than it does sequentially (§5). Source code will be made available under an open source license.

6.2 Evaluation

We evaluate BLADE by answering three questions: (Q1) Can we apply BLADE to secure existing software? (Q2) How many protect statements does BLADE have to insert in order to secure those systems? and (Q3) How do the inserted fences affect performance?

(Q1) Applicability. To evaluate BLADE's applicability, we run it on crypto code, which is already carefully written to eschew cache-timing side channels. Our benchmarks are taken from two main sources: first, a verified implementation [29] of the Signal messaging protocol [15], and second, verified implementations of several crypto primitives taken from [38]. In particular, our benchmarks consist of
▷ The messaging algorithm implemented in module Signal Core and common cryptographic constructions implemented in module Signal Crypto and used in Signal.
▷ The HACL* SHA2 hash, AES block cipher, Curve25519 elliptic curve function, and ED25519 digital signature used in Signal.
▷ The SALSA20 stream cipher, SHA2 hash, and TEA block cipher from [38].
The original implementations of our benchmarks are provably free from cache and timing side channels. However, those proofs considered only a sequential execution model and therefore do not account for the speculative execution vulnerabilities addressed in this work.

Results. Table 1 shows the code size in WebAssembly text format and the runtime of BLADE on each benchmark. The runtime includes translation, repair and type-checking. The results are encouraging: the execution time scales proportionally with the code size and the analysis completes fairly quickly, even for large benchmarks (>60k WASM LOC): the runtime is less than 10s for all of our benchmarks.

(Q2) Number of Fences. Next, we evaluate how many fences the analysis has to insert to make the programs secure. The results are shown in Table 1. Column B contains our baseline, which replaces all non-constant array reads, i.e., reads whose address depends on a variable, with statement stable_read (Section 3.4), implementing an SLH-like mitigation that masks the address with the array bounds-check condition. This is the mitigation proposed in the Clang compiler [10]. Column P shows the number of protect statements inserted by BLADE. All benchmarks are modified by the baseline, except for TEA, which is a simple, toy encryption algorithm (that should not be used in practice). In particular, for five of the nine programs, BLADE does not need to insert any fences. Column P/B shows the ratio of protect statements to baseline read masks in percent. For most benchmarks, our analysis has to insert under 3% of fences compared to the baseline. For the SHA2 implementation of HACL* this rises to 11.5%. Across all benchmarks, the number of fences is two orders of magnitude lower than the baseline. Since protect statements are an idealized primitive that is not available in today's hardware,


Name            B      P    S    P/B    LOC      Time

CRYPTO [29]     92     1    2    1.1    3386     181.0 ms
CORE [29]       47     1    2    2.1    6595     347.8 ms
SHA2 [29]       156    18   34   11.5   7310     286.7 ms
AES [29]        48     0    0    0      6284     28.95 ms
CURVE [29]      2214   0    0    0      59921    5.571 s
ED25519 [29]    2403   6    10   0.2    60308    8.797 s
SALSA 20 [38]   7      0    0    0      529      20.20 ms
SHA 256 [38]    23     0    0    0      334      11.23 ms
TEA [38]        0      0    0    -      112      3.036 ms

Total           4990   26   48   0.5    144779   -

Table 1. (B) contains our baseline, i.e., the number ofstable_read, if every non-constant read is protected;(P) contains the number of protect statements insert byBLADE; (S) contains the number ofstable_read inserted,if stable_read is used to implement protect; (P/B)contains the ratio of protect statments to the baselinefences in %; (LOC) contains the number of lines of WASMcode in text format; (Time) shows the mean timing forfence inference, repair, and typechecking over 15 trials;Experiments were run on a 12” Macbook with 8GB RAM.

we show the number of stable-read primitives that are needed to implement protect in column S. The table shows that using stable reads requires inserting more fences by a factor of 1.8x, which underlines the benefits of a hardware implementation of protect.

(Q3) Performance Impact of Fences. To evaluate the performance impact of our repair, we compared a naive placement of fences, which applies speculative load hardening to every load of a non-constant address, against our approach. We picked the SHA2-512 hash function for this test, and used inputs of size 4KB. Naive fence placement introduced 44 fences while ours introduced only 5. Our measurements showed that while the naive repair algorithm caused 13.9% overhead, the overhead of our minimal fence placement algorithm was only 0.42%. We used a sample size of 500, and found that the relative margin of error of our measurements was less than 0.07%.
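The aggregate figures quoted above can be rechecked directly from the rows of Table 1. The following is a small sketch, not part of the BLADE tooling; the dictionary layout and the helper names `totals` and `percent` are illustrative assumptions.

```python
# Rows of Table 1: benchmark name -> (B, P, S).
TABLE1 = {
    "CRYPTO": (92, 1, 2),
    "CORE": (47, 1, 2),
    "SHA2": (156, 18, 34),
    "AES": (48, 0, 0),
    "CURVE": (2214, 0, 0),
    "ED25519": (2403, 6, 10),
    "SALSA20": (7, 0, 0),
    "SHA256": (23, 0, 0),
    "TEA": (0, 0, 0),
}


def totals(table):
    """Column-wise sums of the B, P and S columns."""
    b = sum(row[0] for row in table.values())
    p = sum(row[1] for row in table.values())
    s = sum(row[2] for row in table.values())
    return b, p, s


def percent(part, whole):
    """Ratio in percent, or None when the baseline is empty (as for TEA)."""
    return None if whole == 0 else 100.0 * part / whole


B, P, S = totals(TABLE1)
print(B, P, S)                  # totals of columns B, P, S
print(round(percent(P, B), 1))  # overall P/B in percent
print(round(S / P, 2))          # stable_read count relative to protect count
```

The last ratio is the factor by which column S exceeds column P, which the text rounds to 1.8x.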

7 Related Work

Transient Execution Attacks. Since Spectre [21] and Meltdown [24] were announced, many transient execution attacks exploiting different microarchitectural components and side channels have been discovered, and new ones come to light at a steady pace. These attacks leak data across arbitrary security boundaries, including SGX enclaves [14, 35], hypervisors and virtual machines [40], and even remotely over a network [31]. We refer to [9] for a comprehensive systematization.

Detection and Repair. Wu and Wang [41] detect cache side channels via abstract interpretation by augmenting the control flow to accommodate speculation. Spectector [17] and Pitchfork [11] use symbolic execution on x86 binaries to detect speculative vulnerabilities. Cheang et al. [13] and Bloem et al. [8] apply bounded model checking to detect potential speculative vulnerabilities, respectively via 4-way self-composition and taint-tracking. Almost all these efforts [8, 11, 13, 17, 41] consider only in-order execution (except Pitchfork [11]) for a fixed speculation bound, and focus on vulnerability detection but do not propose techniques to repair vulnerable programs. In contrast, our type system enforces speculative non-interference even when program instructions are executed out-of-order with unbounded speculation, and automatically synthesizes repairs. Given a set of untrusted input sources, oo7 [37] statically analyzes a binary to detect vulnerable patterns and inserts fences in turn. Our tool, BLADE, not only repairs vulnerable programs without user annotation, but ensures that program patches contain a minimum number of fences. Furthermore, BLADE formally guarantees that repaired programs are free from speculation-based attacks.

Speculative Execution Semantics. There have been several recent proposals for speculative execution semantics [11, 13, 17, 26]. Of those, [11] is closest to ours, and inspired our semantics (e.g., we share the 3-stage pipeline, attacker-supplied directives and the instruction reorder buffer). However, their semantics targets an assembly language with direct jumps, while we reason about speculative execution of imperative programs with structured control flow.

Hardware Mitigations and Secure Design. Both AMD [5] and Intel [19] recommend inserting serializing fence instructions after bounds checks to protect against Spectre v1 attacks, and some compilers followed suit [18, 28]. Unfortunately, these defenses cause significant performance degradation [9]. Taram et al. [32] propose context-sensitive fencing, a hardware-based mitigation that dynamically inserts fences in the instruction stream when dangerous conditions arise. Several secure hardware designs have been studied to remove speculative attacks from future processors. InvisiSpec [42] is a new micro-architecture design that features a special speculative buffer to prevent speculative loads from polluting the cache. STT [2] tracks speculative taints inside the processor micro-architecture and prevents speculative values from reaching instructions that could serve as covert channels. We think our approach could be applied to guide such hardware mitigations by pinpointing the program parts that need to be protected.

References

[1] Flows in Networks. Princeton University Press, 1962.

[2] Speculative taint tracking (stt): A comprehensive protection for speculatively accessed data. In MICRO, 2019.

[3] Alex Aiken. Constraint-based program analysis. In Radhia Cousot and David A. Schmidt, editors, Static Analysis, pages 1–1, Berlin, Heidelberg, 1996. Springer Berlin Heidelberg. ISBN 978-3-540-70674-8.

[4] José Bacelar Almeida, Manuel Barbosa, Gilles Barthe, François Dupressoir, and Michael Emmi. Verifying constant-time implementations. In Usenix Security, 2016.


[5] AMD. Software techniques for managing speculation on AMD processors. https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf, 2018.

[6] Gilles Barthe, Sandrine Blazy, Benjamin Grégoire, Rémi Hutin, Vincent Laporte, David Pichardie, and Alix Trieu. Formal verification of a constant-time preserving c compiler. In POPL, 2020.

[7] Atri Bhattacharyya, Alexandra Sandulescu, Matthias Neugschwandtner, Alessandro Sorniotti, Babak Falsafi, Mathias Payer, and Anil Kurmus. Smotherspectre: Exploiting speculative execution through port contention. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS ’19, pages 785–800, New York, NY, USA, 2019. ACM. ISBN 978-1-4503-6747-9. doi: 10.1145/3319535.3363194. URL http://doi.acm.org/10.1145/3319535.3363194.

[8] Roderick Bloem, Swen Jacobs, and Yakir Vizel. Efficient information-flow verification under speculative execution. In Yu-Fang Chen, Chih-Hong Cheng, and Javier Esparza, editors, Automated Technology for Verification and Analysis, pages 499–514, Cham, 2019. Springer International Publishing. ISBN 978-3-030-31784-3.

[9] Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin Von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, and Daniel Gruss. A systematic evaluation of transient execution attacks and defenses. In Proceedings of the 28th USENIX Conference on Security Symposium, SEC’19, pages 249–266, Berkeley, CA, USA, 2019. USENIX Association. ISBN 978-1-939133-06-9. URL http://dl.acm.org/citation.cfm?id=3361338.3361356.

[10] Chandler Carruth. Speculative load hardening. https://llvm.org/docs/SpeculativeLoadHardening.html, 2019.

[11] Sunjay Cauligi, Craig Disselkoen, Klaus von Gleissenthall, Deian Stefan, Tamara Rezk, and Gilles Barthe. Towards constant-time foundations for the new spectre era. CoRR, abs/1910.01755, 2019. URL http://arxiv.org/abs/1910.01755.

[12] Sunjay Cauligi, Gary Soeller, Brian Johannesmeyer, Fraser Brown, Riad S. Wahby, John Renner, Benjamin Gregoire, Gilles Barthe, Ranjit Jhala, and Deian Stefan. FaCT: A dsl for timing-sensitive computation. In Programming Language Design and Implementation (PLDI). ACM SIGPLAN, June 2019.

[13] Kevin Cheang, Cameron Rasmussen, Sanjit A. Seshia, and Pramod Subramanyan. A formal approach to secure speculation. In Proceedings of the Computer Security Foundations Symposium (CSF), 2019.

[14] Guoxing Chen, Sanchuan Chen, Yuan Xiao, Yinqian Zhang, Zhiqiang Lin, and Ten-Hwang Lai. Sgxpectre attacks: Leaking enclave secrets via speculative execution. CoRR, abs/1802.09085, 2018. URL http://arxiv.org/abs/1802.09085.

[15] Katriel Cohn-Gordon, Cas Cremers, Benjamin Dowling, Luke Garratt, and Douglas Stebila. A formal security analysis of the signal messaging protocol. In EuroS&P, 2017.

[16] Qian Ge, Yuval Yarom, David Cock, and Gernot Heiser. A survey of microarchitectural timing attacks and countermeasures on contemporary hardware. In Journal of Cryptographic Engineering, 2018.

[17] Marco Guarnieri, Boris Koepf, José Francisco Morales, Jan Reineke, and Andrés Sánchez. Spectector: Principled detection of speculative information flows. In Proc. IEEE Symp. on Security and Privacy, SSP ’20, 2020.

[18] Intel. Using intel compilers to mitigate speculative execution side-channel issues. https://software.intel.com/en-us/articles/using-intel-compilers-to-mitigate-speculative-execution-side-channel-issues, 2018.

[19] Intel. Intel analysis of speculative execution side channels. https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf, 2018.

[20] Vladimir Kiriansky and Carl Waldspurger. Speculative buffer overflows: Attacks and defenses. CoRR, abs/1807.03757, 2018. URL http://arxiv.org/abs/1807.03757.

[21] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre attacks: Exploiting speculative execution. In 40th IEEE Symposium on Security and Privacy (S&P’19), 2019.

[22] Esmaeil Mohammadian Koruyeh, Khaled N. Khasawneh, Chengyu Song, and Nael Abu-Ghazaleh. Spectre returns! speculation attacks using the return stack buffer. In Proceedings of the 12th USENIX Conference on Offensive Technologies, WOOT’18, pages 3–3, Berkeley, CA, USA, 2018. USENIX Association. URL http://dl.acm.org/citation.cfm?id=3307423.3307426.

[23] J. Landauer. A lattice of information. In CSFW, 1993.

[24] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown: Reading kernel memory from user space. In 27th USENIX Security Symposium (USENIX Security 18), 2018.

[25] Giorgi Maisuradze and Christian Rossow. Ret2spec: Speculative execution using return stack buffers. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS ’18, pages 2109–2122, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5693-0. doi: 10.1145/3243734.3243761. URL http://doi.acm.org/10.1145/3243734.3243761.

[26] Ross McIlroy, Jaroslav Sevcík, Tobias Tebbi, Ben L. Titzer, and Toon Verwaest. Spectre is here to stay: An analysis of side-channels and speculative execution. CoRR, abs/1902.05178, 2019. URL http://arxiv.org/abs/1902.05178.

[27] Hanne Riis Nielson and Flemming Nielson. Flow logics for constraint based analysis. In Kai Koskimies, editor, Compiler Construction, pages 109–127, Berlin, Heidelberg, 1998. Springer Berlin Heidelberg. ISBN 978-3-540-69724-4.

[28] Andrew Pardoe. Spectre mitigations in msvc. https://devblogs.microsoft.com/cppblog/spectre-mitigations-in-msvc/, 2018.

[29] Jonathan Protzenko, Benjamin Beurdouche, Denis Merigoux, and Karthikeyan Bhargavan. Formally verified cryptographic web applications in webassembly. In Security and Privacy, 2019.

[30] G. Romer and C. Carruth. C++ proposal, 2019. URL http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0928r0.pdf.

[31] Michael Schwarz, Martin Schwarzl, Moritz Lipp, and Daniel Gruss. Netspectre: Read arbitrary memory over network. CoRR, abs/1807.10535, 2018. URL http://arxiv.org/abs/1807.10535.

[32] Mohammadkazem Taram, Ashish Venkat, and Dean Tullsen. Context-sensitive fencing: Securing speculative execution via microcode customization. In ASPLOS’19.

[33] Vadim Tkachenko. 20-30% performance hit from the spectre bug fix on ubuntu. https://www.percona.com/blog/2018/01/23/20-30-performance-hit-spectre-bug-fix-ubuntu/, Jan 2018.

[34] Eran Tromer, Dag Arne Osvik, and Adi Shamir. Efficient cache attacks on aes, and countermeasures. J. Cryptol., 23(1):37–71, January 2010. ISSN 0933-2790. doi: 10.1007/s00145-009-9049-y. URL http://dx.doi.org/10.1007/s00145-009-9049-y.

[35] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and Raoul Strackx. Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution. In Proceedings of the 27th USENIX Security Symposium. USENIX Association, August 2018. See also technical report Foreshadow-NG [40].

[36] D. Volpano, G. Smith, and C. Irvine. A Sound Type System for Secure Flow Analysis. J. Computer Security, 4(3):167–187, 1996.

[37] Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra, and Abhik Roychoudhury. oo7: Low-overhead defense against spectre attacks via binary analysis. CoRR, abs/1807.05843, 2018. URL http://arxiv.org/abs/1807.05843.


[38] Conrad Watt, John Renner, Natalie Popescu, Sunjay Cauligi, and Deian Stefan. Ct-wasm: Type-driven secure cryptography for the web ecosystem. In POPL, 2019.

[39] Conrad Watt, John Renner, Natalie Popescu, Sunjay Cauligi, and Deian Stefan. Ct-wasm: Type-driven secure cryptography for the web ecosystem. Proc. ACM Program. Lang., 3(POPL):77:1–77:29, January 2019. ISSN 2475-1421. doi: 10.1145/3290390. URL http://doi.acm.org/10.1145/3290390.

[40] Ofir Weisse, Jo Van Bulck, Marina Minkin, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Raoul Strackx, Thomas F. Wenisch, and Yuval Yarom. Foreshadow-NG: Breaking the virtual memory abstraction with transient out-of-order execution. Technical report, 2018. See also USENIX Security paper Foreshadow [35].

[41] Meng Wu and Chao Wang. Abstract interpretation under speculative execution. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, pages 802–815, New York, NY, USA, 2019. ACM. ISBN 978-1-4503-6712-7. doi: 10.1145/3314221.3314647. URL http://doi.acm.org/10.1145/3314221.3314647.

[42] Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison, Christopher W. Fletcher, and Josep Torrellas. Invisispec: Making speculative execution invisible in the cache hierarchy. In Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-51, pages 428–441, Piscataway, NJ, USA, 2018. IEEE Press. ISBN 978-1-5386-6240-3. doi: 10.1109/MICRO.2018.00042. URL https://doi.org/10.1109/MICRO.2018.00042.

[43] Yuval Yarom and Katrina Falkner. Flush+reload: A high resolution, low noise, l3 cache side-channel attack. In 23rd USENIX Security Symposium (USENIX Security 14), pages 719–732, San Diego, CA, August 2014. USENIX Association. ISBN 978-1-931971-15-7. URL https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/yarom.


A Full Calculus

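[Fetch-Skip]
\[ \langle is, \mathsf{skip} : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is \mathbin{+\!\!+} [\mathsf{nop}], cs, \mu, \rho \rangle \]

[Fetch-Asgn]
\[ \langle is, x := e : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is \mathbin{+\!\!+} [x := e], cs, \mu, \rho \rangle \]

[Fetch-Seq]
\[ \langle is, c_1; c_2 : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is, c_1 : c_2 : cs, \mu, \rho \rangle \]

[Fetch-Ptr-Load]
\[ \langle is, x := {*e} : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is \mathbin{+\!\!+} [x := \mathsf{load}(e)], cs, \mu, \rho \rangle \]

[Fetch-Ptr-Store]
\[ \langle is, {*e_1} := e_2 : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is \mathbin{+\!\!+} [\mathsf{store}(e_1, e_2)], cs, \mu, \rho \rangle \]

[Fetch-Fail]
\[ \langle is, \mathsf{fail} : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is \mathbin{+\!\!+} [\mathsf{fail}], cs, \mu, \rho \rangle \]

[Fetch-Array-Load]
\[ \frac{c = (x := e_1[e_2]) \quad e = e_2 < \mathsf{length}(e_1) \quad \mathsf{fresh}(p) \quad e' = \mathsf{base}(e_1) + e_2 \quad c' = \mathsf{if}\ e\ \mathsf{then}\ x := {*e'}\ \mathsf{else}\ \mathsf{fail}}{\langle is, c : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is, c' : cs, \mu, \rho \rangle} \]

[Fetch-Array-Store]
\[ \frac{c = (e_1[e_2] := e_3) \quad e = e_2 < \mathsf{length}(e_1) \quad \mathsf{fresh}(p) \quad e' = \mathsf{base}(e_1) + e_2 \quad c' = \mathsf{if}\ e\ \mathsf{then}\ {*e'} := e_3\ \mathsf{else}\ \mathsf{fail}}{\langle is, c : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is, c' : cs, \mu, \rho \rangle} \]

[Fetch-If-True]
\[ \frac{c = \mathsf{if}\ e\ \mathsf{then}\ c_1\ \mathsf{else}\ c_2 \quad \mathsf{fresh}(p) \quad i = \mathsf{guard}(e^{\mathsf{true}}, c_2 : cs, p)}{\langle is, c : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}\ \mathsf{true}}_{\epsilon} \langle is \mathbin{+\!\!+} [i], c_1 : cs, \mu, \rho \rangle} \]

[Fetch-If-False]
\[ \frac{c = \mathsf{if}\ e\ \mathsf{then}\ c_1\ \mathsf{else}\ c_2 \quad \mathsf{fresh}(p) \quad i = \mathsf{guard}(e^{\mathsf{false}}, c_1 : cs, p)}{\langle is, c : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}\ \mathsf{false}}_{\epsilon} \langle is \mathbin{+\!\!+} [i], c_2 : cs, \mu, \rho \rangle} \]

[Fetch-While]
\[ \frac{c_1 = c; \mathsf{while}\ e\ c \quad c_2 = \mathsf{if}\ e\ \mathsf{then}\ c_1\ \mathsf{else}\ \mathsf{skip}}{\langle is, \mathsf{while}\ e\ c : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is, c_2 : cs, \mu, \rho \rangle} \]

Figure 14. Fetch stage.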
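Two of the fetch rules rewrite a command instead of emitting an instruction, and it can help to see them as plain syntax transformations. The sketch below is only an illustration of rules [Fetch-Array-Load] and [Fetch-While]; the tuple encoding of commands and the function names are assumptions made here, not part of the calculus.

```python
def fetch_array_load(x, e1, e2):
    """[Fetch-Array-Load]: x := e1[e2] becomes a bounds-checked pointer load."""
    check = ("<", e2, ("length", e1))      # e  = e2 < length(e1)
    addr = ("+", ("base", e1), e2)         # e' = base(e1) + e2
    # c' = if e then x := *e' else fail
    return ("if", check, ("ptr_load", x, addr), ("fail",))


def fetch_while(e, c):
    """[Fetch-While]: unroll the loop once into a conditional."""
    c1 = ("seq", c, ("while", e, c))       # c1 = c; while e c
    return ("if", e, c1, ("skip",))        # c2 = if e then c1 else skip


print(fetch_array_load("x", "a", "i"))
print(fetch_while("b", ("skip",)))
```

Both functions return a new command that would be pushed back on the command stack, mirroring the fact that these rules leave the instruction buffer is unchanged.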


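[Execute]
\[ \frac{|is_1| = n - 1 \quad \rho' = \phi(\rho, is_1) \quad \langle is_1, i, is_2, cs \rangle \overset{(\mu, \rho', o)}{\rightsquigarrow} \langle is', cs' \rangle}{\langle is_1 \mathbin{+\!\!+} [i] \mathbin{+\!\!+} is_2, cs, \mu, \rho \rangle \xrightarrow{\mathsf{exec}\ n}_{o} \langle is', cs', \mu, \rho \rangle} \]

[Exec-Asgn]
\[ \frac{i = (x := e) \quad v = [\![e]\!]^{\rho} \quad i' = (x := v)}{\langle is_1, i, is_2, cs \rangle \overset{(\mu, \rho, \epsilon)}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [i'] \mathbin{+\!\!+} is_2, cs \rangle} \]

[Exec-Branch-Ok]
\[ \frac{i = \mathsf{guard}(e^{b}, cs', p) \quad [\![e]\!]^{\rho} = b}{\langle is_1, i, is_2, cs \rangle \overset{(\mu, \rho, \epsilon)}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [\mathsf{nop}] \mathbin{+\!\!+} is_2, cs \rangle} \]

[Exec-Branch-Mispredict]
\[ \frac{i = \mathsf{guard}(e^{b}, cs', p) \quad [\![e]\!]^{\rho} \neq b}{\langle is_1, i, is_2, cs \rangle \overset{(\mu, \rho, \mathsf{rollback}(p))}{\rightsquigarrow} \langle is_1, cs' \rangle} \]

[Exec-Load]
\[ \frac{i = (x := \mathsf{load}(e)) \quad \mathsf{store}(\_, \_) \notin is_1 \quad n = [\![e]\!]^{\rho} \quad ps = \llparenthesis is_1 \rrparenthesis \quad i' = (x := \mu(n))}{\langle is_1, i, is_2, cs \rangle \overset{(\mu, \rho, \mathsf{read}(n, ps))}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [i'] \mathbin{+\!\!+} is_2, cs \rangle} \]

[Exec-Store-Addr]
\[ \frac{i = \mathsf{store}(e_1, e_2) \quad n = [\![e_1]\!]^{\rho} \quad i' = \mathsf{store}(n, e_2)}{\langle is_1, i, is_2, cs \rangle \overset{(\mu, \rho, \epsilon)}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [i'] \mathbin{+\!\!+} is_2, cs \rangle} \]

[Exec-Store-Value]
\[ \frac{i = \mathsf{store}(n, e) \quad v = [\![e]\!]^{\rho} \quad ps = \llparenthesis is_1 \rrparenthesis \quad i' = \mathsf{store}(n, v)}{\langle is_1, i, is_2, cs \rangle \overset{(\mu, \rho, \mathsf{write}(n, ps))}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [i'] \mathbin{+\!\!+} is_2, cs \rangle} \]

Figure 15. Execute stage.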

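[Retire-Nop]
\[ \langle \mathsf{nop} : is, cs, \mu, \rho \rangle \xrightarrow{\mathsf{retire}}_{\epsilon} \langle is, cs, \mu, \rho \rangle \]

[Retire-Asgn]
\[ \langle x := v : is, cs, \mu, \rho \rangle \xrightarrow{\mathsf{retire}}_{\epsilon} \langle is, cs, \mu, \rho[x \mapsto v] \rangle \]

[Retire-Store]
\[ \frac{i = \mathsf{store}(n, v)}{\langle i : is, cs, \mu, \rho \rangle \xrightarrow{\mathsf{retire}}_{\epsilon} \langle is, cs, \mu[n \mapsto v], \rho \rangle} \]

[Retire-Fail]
\[ \langle \mathsf{fail} : is, cs, \mu, \rho \rangle \xrightarrow{\mathsf{retire}}_{\mathsf{fail}} \langle [\,], [\,], \mu, \rho \rangle \]

Figure 16. Retire stage.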

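\[ \begin{aligned}
\phi(\rho, [\,]) &= \rho \\
\phi(\rho, (x := v) : is) &= \phi(\rho[x \mapsto v], is) \\
\phi(\rho, (x := e) : is) &= \phi(\rho[x \mapsto \bot], is) \\
\phi(\rho, (x := \mathsf{load}(e)) : is) &= \phi(\rho[x \mapsto \bot], is) \\
\phi(\rho, (x := \mathsf{protect}(e)) : is) &= \phi(\rho[x \mapsto \bot], is) \\
\phi(\rho, i : is) &= \phi(\rho, is)
\end{aligned} \]

(a) Transient Variable Map.

\[ \begin{aligned}
[\![v]\!]^{\rho} &= v \\
[\![x]\!]^{\rho} &= \rho(x) \\
[\![\mathsf{length}(e)]\!]^{\rho} &= \mathsf{length}([\![e]\!]^{\rho}) \\
[\![\mathsf{base}(e)]\!]^{\rho} &= \mathsf{base}([\![e]\!]^{\rho}) \\
[\![e_1 + e_2]\!]^{\rho} &= [\![e_1]\!]^{\rho} + [\![e_2]\!]^{\rho} \\
[\![e_1 \leqslant e_2]\!]^{\rho} &= [\![e_1]\!]^{\rho} \leqslant [\![e_2]\!]^{\rho}
\end{aligned} \]

(b) Evaluation Function.

\[ \begin{aligned}
\llparenthesis [\,] \rrparenthesis &= [\,] \\
\llparenthesis \mathsf{guard}(e^{b}, cs, p) : is \rrparenthesis &= p : \llparenthesis is \rrparenthesis \\
\llparenthesis i : is \rrparenthesis &= \llparenthesis is \rrparenthesis
\end{aligned} \]

(c) Pending Guard Identifiers.

Figure 17. Helper functions.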
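The helper functions of Figure 17 are simple folds over the instruction buffer, which the following sketch makes concrete. The tuple encoding of instructions, the use of Python `None` for the undefined value ⊥, and the choice to let evaluation propagate ⊥ are all assumptions of this illustration; only `+` and `⩽` are modeled, and `length` and `base` are left out for brevity.

```python
BOTTOM = None  # stands for the undefined value ⊥


def phi(rho, instrs):
    """Transient variable map, Figure 17(a): apply pending assignments to rho."""
    rho = dict(rho)
    for ins in instrs:
        if ins[0] == "asgn":                 # x := v (resolved) or x := e (pending)
            _, x, e = ins
            rho[x] = e if isinstance(e, int) else BOTTOM
        elif ins[0] in ("load", "protect"):  # x := load(e), x := protect(e)
            rho[ins[1]] = BOTTOM
        # any other instruction leaves rho unchanged
    return rho


def evaluate(e, rho):
    """Evaluation function, Figure 17(b), restricted to values, variables, + and <=."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return rho[e]
    op, a, b = e
    va, vb = evaluate(a, rho), evaluate(b, rho)
    if va is BOTTOM or vb is BOTTOM:         # assumption: ⊥ propagates
        return BOTTOM
    return va + vb if op == "+" else va <= vb


def pending_guards(instrs):
    """Pending guard identifiers, Figure 17(c); guards are ("guard", e, b, cs, p)."""
    return [ins[4] for ins in instrs if ins[0] == "guard"]


buffer = [("asgn", "x", 3), ("asgn", "y", ("+", "x", 1)),
          ("guard", ("<=", "i", 2), True, [], 7), ("load", "z", "x")]
rho = phi({"i": 1}, buffer)
print(rho, pending_guards(buffer), evaluate(("+", "x", "i"), rho))
```

In this encoding, a variable written by an unresolved assignment, load, or protect is mapped to ⊥ until the corresponding instruction has been executed.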


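[Fetch-Protect-Ptr]
\[ \frac{c = (x := \mathsf{protect}({*e})) \quad c_1 = (x := {*e}) \quad c_2 = (x := \mathsf{protect}(x))}{\langle is, c : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is, c_1 : c_2 : cs, \mu, \rho \rangle} \]

[Fetch-Protect-Array]
\[ \frac{c = (x := \mathsf{protect}(e_1[e_2])) \quad c_1 = (x := e_1[e_2]) \quad c_2 = (x := \mathsf{protect}(x))}{\langle is, c : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is, c_1 : c_2 : cs, \mu, \rho \rangle} \]

[Fetch-Protect-Expr]
\[ \frac{c = (x := \mathsf{protect}(e)) \quad i = (x := \mathsf{protect}(e))}{\langle is, c : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is \mathbin{+\!\!+} [i], cs, \mu, \rho \rangle} \]

[Exec-Protect1]
\[ \frac{i = (x := \mathsf{protect}(e)) \quad v = [\![e]\!]^{\rho} \quad i' = (x := \mathsf{protect}(v))}{\langle is_1, i, is_2, cs \rangle \overset{(\mu, \rho, \epsilon)}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [i'] \mathbin{+\!\!+} is_2, cs \rangle} \]

[Exec-Protect2]
\[ \frac{i = (x := \mathsf{protect}(v)) \quad \mathsf{guard}(\_, \_, \_) \notin is_1 \quad i' = (x := v)}{\langle is_1, i, is_2, cs \rangle \overset{(\mu, \rho, \epsilon)}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [i'] \mathbin{+\!\!+} is_2, cs \rangle} \]

Figure 18. Semantics of protect(·).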
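The two execute rules for protect can be read as a small state machine: the argument may be resolved at any time, but the result is released only once no guard is pending earlier in the buffer. The sketch below illustrates that reading under simplifying assumptions made here: the argument of protect is either an integer value or a single variable name, `rho` is the transient map already computed by ϕ, and returning `None` models "no rule applies, the instruction stalls".

```python
def exec_protect(is1, ins, rho):
    """One execute step for ins = ("protect", x, e), given earlier instructions is1."""
    _, x, e = ins
    if not isinstance(e, int):
        # [Exec-Protect1]: resolve the argument, but keep the protect wrapper.
        v = rho.get(e)
        return None if v is None else ("protect", x, v)
    # [Exec-Protect2]: release the value only if no guard is still pending.
    if any(i[0] == "guard" for i in is1):
        return None
    return ("asgn", x, e)


pending = [("guard", "c", True, [], 1)]
print(exec_protect(pending, ("protect", "x", "y"), {"y": 5}))   # argument resolves
print(exec_protect(pending, ("protect", "x", 5), {}))           # stalls behind the guard
print(exec_protect([("nop",)], ("protect", "x", 5), {}))        # guard resolved, value released
```

This captures why protect is finer grained than a fence: only the protected assignment waits for earlier guards, while unrelated instructions in the buffer remain free to execute.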


B Semantics of Stable Read

Current processors do not provide a protect primitive instruction nor the means to implement it on top of existing instructions in its full generality. However, for array reads, it is possible to replicate the effects of protect by exploiting the same data-dependency tracking capabilities at the core of the processor pipeline. Indeed, Speculative Load Hardening (SLH), a mitigation technique deployed in the code generated by the Clang compiler, relies on data dependencies to secure memory loads automatically [10]. Using our formal model, we give rigorous semantics to SLH and show that it can stop transient execution attacks.

At a high level, SLH injects artificial data dependencies between the conditions used in branch instructions and the addresses loaded in the following instructions to transform control-flow dependencies into data-flow dependencies. Intuitively, these data dependencies validate control-flow decisions at runtime by stalling speculative loads until the processor resolves the conditions. Using branch conditions, SLH masks the address of load instructions in such a way that the processor zeroes out the address if the condition is mispredicted, preventing misloads.

To formalize this mechanism, we extend our processor model as follows. We introduce a new processor instruction x := e ? e1 : e2, which corresponds to the conditional move instruction CMOV on x86 processors. This instruction simply assigns the value of e1 (resp. e2) to variable x if the condition e evaluates to true (resp. false). Importantly, this instruction is not subject to speculation: the processor must first evaluate the condition before it can resolve the assignment. We also extend expressions with the standard bitwise AND operator (&) and write 0 and 1 for bit words consisting of all 0s and all 1s. As usual, bitmasks 0 and 1 are respectively the zero and identity element for &, i.e., ⟦e & 0⟧ρ = 0 and ⟦e & 1⟧ρ = ⟦e⟧ρ.

Figure 19 presents the semantics rules for CMOV and for the stable read command implemented using SLH. Rule [Exec-CMOV] evaluates the condition (b = ⟦e⟧ρ) of the conditional assignment x := e ? e_true : e_false and assigns the corresponding expression (x := e_b). Rule [Fetch-Stable-Read-SLH] fetches command x := stable_read(e1, e2), computes the bounds-check condition and the address of the indexed element, and pushes the following command onto the stack.

    r := e2 < length(e1);
    if r then
        r := r ? 1 : 0;
        x := ∗((base(e1) + e2) & r)
    else fail

The code above is similar to the code generated by a regular array read, but additionally stores the result of the bounds-check condition in reserved variable r. In the then-branch, the condition is then converted into a suitable bitmask using

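[Exec-CMOV]
\[ \frac{i = (x := e \mathbin{?} e_{\mathsf{true}} : e_{\mathsf{false}}) \quad b = [\![e]\!]^{\rho} \quad i' = (x := e_b)}{\langle is_1, i, is_2, cs \rangle \overset{(\mu, \rho, \epsilon)}{\rightsquigarrow} \langle is_1 \mathbin{+\!\!+} [i'] \mathbin{+\!\!+} is_2, cs \rangle} \]

[Fetch-Stable-Read-SLH]
\[ \frac{\begin{array}{c} c = (x := \mathsf{stable\_read}(e_1, e_2)) \quad e = e_2 < \mathsf{length}(e_1) \quad e' = \mathsf{base}(e_1) + e_2 \quad c_1 = (r := e) \\ c_2 = (r := r \mathbin{?} \mathbf{1} : \mathbf{0}) \quad c_3 = (x := {*(e' \mathbin{\&} r)}) \quad c' = c_1; \mathsf{if}\ r\ \mathsf{then}\ c_2; c_3\ \mathsf{else}\ \mathsf{fail} \end{array}}{\langle is, c : cs, \mu, \rho \rangle \xrightarrow{\mathsf{fetch}}_{\epsilon} \langle is, c' : cs, \mu, \rho \rangle} \]

Figure 19. Semantics of x := stable_read(e1, e2).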

the non-speculative CMOV instruction, i.e., r := r ? 1 : 0, which then masks the address loaded, i.e., ∗((base(e1) + e2) & r). As a result, the value of the address remains undefined until the processor evaluates the bounds-check condition. When the condition resolves, if the index is in bounds, then r = 1 and the program reads the correct address, since ⟦e & 1⟧ρ = ⟦e⟧ρ. If the index is out of bounds instead, then r = 0 and the load can only read speculatively from a constant address (x := µ(0)), thus closing the leak.⁴

Revisited Example. Consider again running example Ex1 in Figure 3, where instead of standard array reads, we employ the stable_read(·) primitive from above. After fetching the program, the addresses of the loads are masked with the respective array bounds-check conditions. Assuming the same memory layout and content as in Figure 10 (except for the fact that arrays are shifted by one position since µ(0) = 0 is reserved), the processor resolves the first bounds check and reads the array within its bounds, i.e., x := µ(3) = 0. The second load attempts to read the array out of bounds (y := a[2]), and our countermeasure prevents the buffer overrun by redirecting the load to the dummy value stored at address 0. First, the processor resolves the bounds check, i.e., r := 0, and forwards it to the load y := load((base(a) + i2) & r). Then, the condition zeros out the address and the processor assigns the dummy value to variable y, i.e., y := µ(0). As a result, we always read array b at index z = 0 and close the leak.
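The address masking described above can be mimicked with ordinary integer arithmetic. The sketch below models only what the masked load in the then-branch computes, including when that branch is entered speculatively after a misprediction; the function name, the flat-list memory, and the concrete memory contents are assumptions for illustration and are not the layout of Figure 10.

```python
def masked_load(mem, base, length, index):
    """Load mem[(base + index) & r], where r is all ones iff index is in bounds."""
    in_bounds = index < length           # r := e2 < length(e1)
    mask = -1 if in_bounds else 0        # r := r ? 1 : 0 (all-ones or all-zeros word)
    addr = (base + index) & mask         # (base(e1) + e2) & r
    return mem[addr]


# Address 0 is reserved and holds dummy data, as in footnote 4.
# Array a lives at addresses 1..2; address 3 holds unrelated (secret) data.
memory = [0, 10, 20, 99]
print(masked_load(memory, base=1, length=2, index=1))  # in bounds: reads a[1]
print(masked_load(memory, base=1, length=2, index=2))  # out of bounds: reads address 0
```

In Python, `-1` acts as an all-ones word under `&` for non-negative operands, so an in-bounds access keeps its address while an out-of-bounds access is redirected to the reserved cell instead of the neighbouring secret.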

⁴ We assume that the first memory cell is reserved to the processor, which initializes it with dummy data, e.g., µ(0) = 0.


C Full Type System

Constraints. Our typing judgement Γ, Prot ⊢ s ⇒ c creates a set of constraints c. The syntax for constraints is shown in Figure 21. The constraints relate atoms, which either represent the unknown type of a variable x (αx), or the unknown type of an expression (r). Constraints record can-flow-to relationships between the atoms and lattice values T and S. They are accumulated via operator ∪, where we identify c1 ∪ ··· ∪ cn with the set {c1, ..., cn}.

Solutions and Satisfiability. We define the solution to a set of constraints as a function σ from atoms to flow types, i.e., σ ∈ Atoms ↦ {T, S}, and extend solutions to map T and S to themselves. For a set of constraints c and a solution function σ, we write σ ⊢ c to say that the constraints c are satisfied under solution σ. The definition of σ ⊢ c is shown in the lower part of Figure 21. In short, solution σ satisfies c if all can-flow-to constraints hold when the atoms are replaced by their values under σ. We say that a set of constraints c is satisfiable if there is a solution σ such that σ ⊢ c.

Paths. The constraints generated by our type system give rise to the def-use graph of the type-checked program. For a set of constraints c, we call a sequence of atoms a1 ... an a path in c if ai ⊑ ai+1 ∈ c for i ∈ 1, ..., n−1, and say that a1 is the path's entry and an its exit. A T-S path is a path with entry T and exit S. A set of constraints c is satisfiable if and only if there is no T-S path in c, as such a path would correspond to a derivation of false. If c is satisfiable, we can compute a solution σ(c) by letting σ(c)(a) = T if there is a path with entry T and exit a, and S otherwise.
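The path characterization above amounts to a reachability check on the constraint graph. The following sketch illustrates it under an encoding chosen here, not taken from the paper: each constraint a ⊑ b is a pair `(a, b)`, and the strings `"T"` and `"S"` stand for the two lattice values.

```python
from collections import deque


def reachable(constraints, start):
    """All atoms reachable from start along can-flow-to edges."""
    succ = {}
    for a, b in constraints:
        succ.setdefault(a, []).append(b)
    seen, todo = {start}, deque([start])
    while todo:
        node = todo.popleft()
        for nxt in succ.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def satisfiable(constraints):
    """Satisfiable iff there is no T-S path."""
    return "S" not in reachable(constraints, "T")


def solution(constraints):
    """sigma(c): atoms reachable from T are transient, all others stable."""
    transient = reachable(constraints, "T")
    atoms = {a for pair in constraints for a in pair} - {"T", "S"}
    return {a: ("T" if a in transient else "S") for a in atoms}


ok = [("T", "a[i]"), ("a[i]", "x"), ("y", "S")]
bad = ok + [("x", "S")]
print(satisfiable(ok), satisfiable(bad))
print(solution(ok))
```

The second constraint set adds a flow from x to S, which closes a T-S path and makes the constraints unsatisfiable.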

Cuts. If a set of constraints is unsatisfiable, we can make it satisfiable by removing some of the nodes in its graph or, equivalently, protecting some of the expressions. A set of atoms A cuts a path a1 ... an if some a ∈ A occurs along the path, i.e., there exist a ∈ A and i ∈ 1, ..., n such that ai = a. We call A a cut-set for a set of constraints c if A cuts all T-S paths in c, and say that A is minimal for c if all other cut-sets A′ contain as many or more atoms than A, i.e., #A ⩽ #A′. The problem of finding a minimal cut-set is an instance of the Min-Cut/Max-Flow problem, and we can reuse existing efficient algorithms [1] to compute a solution.
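To make the definition of a minimal cut-set concrete, the sketch below searches for one by brute force, trying candidate sets in order of increasing size. This deliberately swaps the Min-Cut/Max-Flow algorithm mentioned above for exhaustive enumeration, which is only viable for tiny graphs. The edge list is a simplified, assumed reading of the constraints of example Ex3 (introduced later in this appendix) with α-nodes omitted, and may not match Figure 22 edge for edge.

```python
from itertools import combinations


def has_ts_path(constraints, cut):
    """True if some path from T to S avoids every atom in cut."""
    succ = {}
    for a, b in constraints:
        if a not in cut and b not in cut:
            succ.setdefault(a, []).append(b)
    seen, todo = {"T"}, ["T"]
    while todo:
        node = todo.pop()
        for nxt in succ.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return "S" in seen


def minimal_cut(constraints, candidates):
    """Smallest subset of candidates that cuts all T-S paths (brute force)."""
    for size in range(len(candidates) + 1):
        for cut in combinations(sorted(candidates), size):
            if not has_ts_path(constraints, set(cut)):
                return set(cut)
    return None  # no cut-set exists among the candidates


# Assumed constraint edges for Ex3: x := a[i]; b[y] := x; if 0 <= x then z := y else skip
EX3 = [("T", "a[i]"), ("a[i]", "x"), ("i", "S"), ("y", "S"),
       ("x", "S"), ("x", "0<=x"), ("0<=x", "S"), ("y", "z")]
print(minimal_cut(EX3, {"x", "y", "z", "i"}))
```

Restricting the candidates to program variables reflects that the inferred protected set consists of variables; under this encoding the search returns the set containing only x.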

Extracting Types From Cuts. From a set of variables A such that A is a cut-set of constraints c, we can extract a typing environment Γ(c, A) as follows: for an atom αx, we define Γ(c, A)(x) = T if there is a path with entry T and exit αx in c that is not cut by A, and let Γ(c, A)(x) = S otherwise.

Type Inference. To infer typing environment Γ and protected set Prot for a statement s, we first define a dummy environment Γ∗ and protected set Prot∗, such that Γ∗, Prot∗ ⊢ s ⇒ c holds for any statement s, and use it to extract the set of constraints c. For this, we define Γ∗ as the environment that maps all variables to S, and Prot∗ as the set of all variables. We then compute a minimal set of variables A such that A is a cut-set of c, extract environment Γ(c, A), and use A as the protected set. Statement s is then guaranteed to type check under the inferred environment.

Proposition 2 (Type Inference). If Γ∗, Prot∗ ⊢ s ⇒ c and A is a set of variables that cuts c, then Γ(c, A), A ⊢ s.

Remark. To infer a repair using stable_read instead of protect, we can restrict our cut-set to only include variables that are assigned values from an array read.

Example. Consider again Ex1 in Figure 3. The graph defined by the constraints c, given by Γ∗, Prot∗ ⊢ Ex1 ⇒ c, is shown in Figure 4, where we have omitted α-nodes. The constraints are not satisfiable, since there are T-S paths. Both {x, y} and {z} are cut-sets, since they cut each T-S path; however, the set {z} contains only one element and is therefore minimal. Finally, environment Γ(c, {x, y}) types all variables as S, and Γ(c, {z}) types x and y as T and z, i1 and i2 as S, and by Proposition 2 both Γ(c, {x, y}), {x, y} ⊢ Ex1 and Γ(c, {z}), {z} ⊢ Ex1 hold.

Example. Next, consider the following example Ex3.

    x := a[i]; b[y] := x; if 0 ⩽ x then z := y else skip

We show the corresponding graph in Figure 22. As before, the constraints are unsatisfiable due to the path from T to S. The set {x} is a minimal cut-set producing environment Γ(c, {x}), which types all variables as S. Finally, the typing judgement Γ, {x} ⊢ Ex3 holds, indicating that the program is secure, given the promise that x will be protected.

C.1 Examples for Repair

Example. Consider again Ex1 in Figure 3 from Section 2. The cut-set shown on the right in Figure 4 produces the repair shown in the comments of Figure 3.

Example. Consider again Ex3 and its dataflow graph shown in Figure 22. The cut-set {x} produces the repaired program below.

    x := protect(a[i]); b[y] := x; if 0 ⩽ x then z := y else skip

C.2 Type Inference

Our type-inference approach is based on type-constraint satisfaction. Intuitively, type constraints restrict the types that variables and expressions may assume in a program. In the constraints, the possible types of variables and expressions are represented by atoms, i.e., unknown types of (sub-)expressions and type variables that can be instantiated with any type that satisfies the constraints. Solving these constraints requires finding a substitution, i.e., a mapping from atoms to concrete transient-flow types, such that all constraints are satisfied if we instantiate the atoms with their type.

Type inference consists of 3 steps: (i) generate a set of constraints under an initial typing environment and protected set that under-approximates the solution of the constraints,


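[Value]
\[ \Gamma \vdash v : \tau \Rightarrow \emptyset \]

[Var]
\[ \frac{\Gamma(x) = \tau}{\Gamma \vdash x : \tau \Rightarrow x \sqsubseteq \alpha_x} \]

[Uop]
\[ \frac{\mathit{uop} \in \{\mathsf{length}(\cdot), \mathsf{base}(\cdot)\} \quad \Gamma \vdash e : \tau_1 \Rightarrow k \quad \tau_1 \sqsubseteq \tau}{\Gamma \vdash \mathit{uop}(e) : \tau \Rightarrow k \cup (e \sqsubseteq \mathit{uop}(e))} \]

[Bop]
\[ \frac{\oplus \in \{+, \leqslant\} \quad \Gamma \vdash e_1 : \tau_1 \Rightarrow k_1 \quad \Gamma \vdash e_2 : \tau_2 \Rightarrow k_2 \quad \tau_1 \sqsubseteq \tau \quad \tau_2 \sqsubseteq \tau}{\Gamma \vdash e_1 \oplus e_2 : \tau \Rightarrow k_1 \cup k_2 \cup (e_1 \sqsubseteq e_1 \oplus e_2) \cup (e_2 \sqsubseteq e_1 \oplus e_2)} \]

[Ptr-Read]
\[ \frac{\Gamma \vdash e : \mathbf{S} \Rightarrow k}{\Gamma \vdash {*e} : \mathbf{T} \Rightarrow k \cup (e \sqsubseteq \mathbf{S}) \cup (\mathbf{T} \sqsubseteq {*e})} \]

[Array-Read]
\[ \frac{\Gamma \vdash e_1 : \mathbf{S} \Rightarrow k_1 \quad \Gamma \vdash e_2 : \mathbf{S} \Rightarrow k_2}{\Gamma \vdash e_1[e_2] : \mathbf{T} \Rightarrow k_1 \cup k_2 \cup (e_1 \sqsubseteq \mathbf{S}) \cup (e_2 \sqsubseteq \mathbf{S}) \cup (\mathbf{T} \sqsubseteq e_1[e_2])} \]

(a) Typing Rules for Expressions and Arrays.

[Skip]
\[ \Gamma \vdash \mathsf{skip} \Rightarrow \emptyset \]

[Fail]
\[ \Gamma \vdash \mathsf{fail} \Rightarrow \emptyset \]

[Seq]
\[ \frac{\Gamma \vdash c_1 \Rightarrow k_1 \quad \Gamma \vdash c_2 \Rightarrow k_2}{\Gamma \vdash c_1; c_2 \Rightarrow k_1 \cup k_2} \]

[Asgn]
\[ \frac{\Gamma \vdash r : \tau \Rightarrow k \quad \tau \sqsubseteq \Gamma(x)}{\Gamma, \mathit{Prot} \vdash x := r \Rightarrow k \cup (r \sqsubseteq x)} \]

[Ptr-Write]
\[ \frac{\Gamma \vdash e_1 : \mathbf{S} \Rightarrow k_1 \quad \Gamma \vdash e_2 : \mathbf{S} \Rightarrow k_2}{\Gamma, \mathit{Prot} \vdash {*e_1} := e_2 \Rightarrow k_1 \cup k_2 \cup (e_1 \sqsubseteq \mathbf{S}) \cup (e_2 \sqsubseteq \mathbf{S})} \]

[Array-Write]
\[ \frac{\Gamma \vdash e_1 : \mathbf{S} \Rightarrow k_1 \quad \Gamma \vdash e_2 : \mathbf{S} \Rightarrow k_2 \quad \Gamma \vdash e_3 : \mathbf{S} \Rightarrow k_3}{\Gamma, \mathit{Prot} \vdash e_1[e_2] := e_3 \Rightarrow k_1 \cup k_2 \cup k_3 \cup (e_1 \sqsubseteq \mathbf{S}) \cup (e_2 \sqsubseteq \mathbf{S}) \cup (e_3 \sqsubseteq \mathbf{S})} \]

[Protect]
\[ \frac{\Gamma \vdash r : \tau \Rightarrow k}{\Gamma, \mathit{Prot} \vdash x := \mathsf{protect}(r) \Rightarrow k} \]

[Asgn-Prot]
\[ \frac{\Gamma \vdash r : \tau \Rightarrow k \quad x \in \mathit{Prot}}{\Gamma, \mathit{Prot} \vdash x := r \Rightarrow k \cup (r \sqsubseteq x)} \]

[Stable-Read]
\[ \frac{\Gamma \vdash e_1 : \mathbf{S} \Rightarrow k \quad \Gamma \vdash e_2 : \mathbf{S}}{\Gamma, \mathit{Prot} \vdash x := \mathsf{stable\_read}(e_1, e_2) \Rightarrow k \cup (e_1 \sqsubseteq \mathbf{S}) \cup (e_2 \sqsubseteq \mathbf{S})} \]

[If-Then-Else]
\[ \frac{\Gamma \vdash e : \mathbf{S} \Rightarrow k \quad \Gamma, \mathit{Prot} \vdash c_1 \Rightarrow k_1 \quad \Gamma, \mathit{Prot} \vdash c_2 \Rightarrow k_2}{\Gamma, \mathit{Prot} \vdash \mathsf{if}\ e\ \mathsf{then}\ c_1\ \mathsf{else}\ c_2 \Rightarrow k \cup k_1 \cup k_2 \cup (e \sqsubseteq \mathbf{S})} \]

[While]
\[ \frac{\Gamma \vdash e : \mathbf{S} \Rightarrow k_1 \quad \Gamma, \mathit{Prot} \vdash c \Rightarrow k_2}{\Gamma, \mathit{Prot} \vdash \mathsf{while}\ e\ \mathsf{do}\ c \Rightarrow k_1 \cup k_2 \cup (e \sqsubseteq \mathbf{S})} \]

(b) Typing Rules for Commands.

Figure 20. Transient flow type system and type constraints generation.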

(ii) construct the def-use graph from the constraints and finda cut-set, and (iii) cut the transient-to-stable dataflows in thegraph and compute the resulting typing environment.Constraint Generation.We describe the generation of con-straints through the typing judgment from Figure 12.5 Givena typing environment Γ, a protected set Prot, the judgmentΓ,Prot⊢ r⇒k type checks r and generates type constraintsk. The syntax for constraints is shown in Figure 13. Con-straints are sets of can-flow-to relations involving concretetypes (S and T) and atoms, i.e., type variables correspondingto program variables (e.g., αx for x) and unknown types for

5. For space reasons, the constraint generation is reported next to the typing judgment, but we remark that these are two distinct judgments. In particular, the type-constraint generation judgment ignores the set of protected variables Prot, but to avoid confusion we include it in the judgment anyway.

expressions (e.g., r). In rule [Var], constraint x ⊑ αx indicates that the type variable of x should be at least as transient as the unknown type of x. This ensures that, if variable x is transient, then αx can only be instantiated with type T. Rule [Bop] generates constraints e1 ⊑ e1 ⊕ e2 and e2 ⊑ e1 ⊕ e2 to reflect the fact that the unknown type of e1 ⊕ e2 should be at least as transient as the (unknown) types of e1 and e2. Notice that these constraints correspond exactly to the premises τ1 ⊑ τ and τ2 ⊑ τ of the same rule. Similarly, rule [Array-Read] generates constraints e1 ⊑ S and e2 ⊑ S for the unknown types of the array and the index, respectively. In addition, the rule also generates the constraint T ⊑ e1[e2], which forces the type of e1[e2] to be transient. Rules [Asgn] and [Asgn-Prot] generate the same constraint r ⊑ x because we ignore the protected set during constraint generation, as explained in the footnote.
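The expression rules described above can be sketched as a small recursive function. This is an illustrative sketch only, not the authors' implementation: the class names (`Val`, `Var`, `Bop`, `ArrayRead`), the helper `show`, and the encoding of a constraint a ⊑ b as a pair of strings `(a, b)` are all assumptions made for this example. Atoms for expressions are represented by their printed form, and the type variable αx by the string `alpha_x`.

```python
from dataclasses import dataclass

S, T = "S", "T"  # concrete types (assumes no program variable is named S or T)

@dataclass(frozen=True)
class Val:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Bop:
    op: str
    left: object
    right: object

@dataclass(frozen=True)
class ArrayRead:
    array: object
    index: object

def show(e):
    """Printed form of an expression, used as the name of its atom."""
    if isinstance(e, Val):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Bop):
        return f"({show(e.left)} {e.op} {show(e.right)})"
    if isinstance(e, ArrayRead):
        return f"{show(e.array)}[{show(e.index)}]"
    raise TypeError(f"unknown expression: {e!r}")

def constraints(e):
    """Set of can-flow-to constraints (lhs, rhs) generated for expression e."""
    if isinstance(e, Val):        # [Value]: no constraints
        return set()
    if isinstance(e, Var):        # [Var]: x ⊑ alpha_x
        return {(e.name, f"alpha_{e.name}")}
    if isinstance(e, Bop):        # [Bop]: e1 ⊑ e1⊕e2 and e2 ⊑ e1⊕e2
        whole = show(e)
        return (constraints(e.left) | constraints(e.right)
                | {(show(e.left), whole), (show(e.right), whole)})
    if isinstance(e, ArrayRead):  # [Array-Read]: e1 ⊑ S, e2 ⊑ S, T ⊑ e1[e2]
        return (constraints(e.array) | constraints(e.index)
                | {(show(e.array), S), (show(e.index), S), (T, show(e))})
    raise TypeError(f"unknown expression: {e!r}")
```

For instance, `constraints(ArrayRead(Var("a"), Var("i")))` yields the two [Var] constraints for `a` and `i`, the two stability constraints on the array and the index, and the constraint that marks `a[i]` as a transient source.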


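\[
\begin{array}{llcl}
\text{Atom} & a & ::= & \alpha_x \mid r \\
\text{Constraint} & k & ::= & a \sqsubseteq \mathbf{S} \mid \mathbf{T} \sqsubseteq a \mid a \sqsubseteq a \mid k \cup k \mid \emptyset \\
\text{Solution} & \sigma & \in & \mathit{Atoms} \mapsto \{\mathbf{S}, \mathbf{T}\}
\end{array}
\]

\[
\sigma(\mathbf{S}) = \mathbf{S} \qquad \sigma(\mathbf{T}) = \mathbf{T}
\]

\[
\textsc{Sol-Transient}\quad \frac{\mathbf{T} \sqsubseteq \sigma(a_2)}{\sigma \vdash \mathbf{T} \sqsubseteq a_2}
\qquad
\textsc{Sol-Stable}\quad \frac{\sigma(a_1) \sqsubseteq \mathbf{S}}{\sigma \vdash a_1 \sqsubseteq \mathbf{S}}
\]

\[
\textsc{Sol-Flow}\quad \frac{\sigma(a_1) \sqsubseteq \sigma(a_2)}{\sigma \vdash a_1 \sqsubseteq a_2}
\qquad
\textsc{Sol-Set}\quad \frac{\sigma \vdash c_1 \quad \ldots \quad \sigma \vdash c_n}{\sigma \vdash \{c_1, \ldots, c_n\}}
\]

Figure 21. Type Constraints and Satisfiability.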
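The satisfiability judgment σ ⊢ k of Figure 21 can be checked mechanically. The sketch below is only an illustration: it assumes constraints are encoded as pairs of strings `(lhs, rhs)`, that a solution is a Python dictionary from atom names to `"S"` or `"T"`, and that the ordering on concrete types is S ⊑ T with T ⋢ S, which is the ordering rules [Sol-Stable] and [Sol-Transient] presuppose. The function names are hypothetical.

```python
S, T = "S", "T"

def flows(t1, t2):
    """Can-flow-to on concrete types: everything except T ⊑ S holds."""
    return not (t1 == T and t2 == S)

def resolve(sigma, atom):
    """Extend sigma to concrete types: sigma(S) = S and sigma(T) = T."""
    return atom if atom in (S, T) else sigma[atom]

def satisfies(sigma, constraints):
    """sigma ⊢ {c1, ..., cn}: every constraint must hold under sigma ([Sol-Set])."""
    return all(flows(resolve(sigma, lhs), resolve(sigma, rhs))
               for lhs, rhs in constraints)
```

Under this encoding, the chain T ⊑ a[i], a[i] ⊑ x, x ⊑ S has no solution: a[i] must be transient, which forces x to be transient, which contradicts x ⊑ S. Dropping the middle constraint, as rule [Protect] does for a protected assignment, makes the set satisfiable.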

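[Figure 22: graph with nodes T, b[y], a[i], x, 0 ⩽ x, S, y, z, and i; the edges and the cut are not recoverable from the extracted text.]

Figure 22. Dataflow graph of Ex3. The minimal cut-set is shown using a dotted line.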

In contrast, rule [Protect] does not generate the constraint r ⊑ x because r is explicitly protected. In the other rules, the constraints are generated following a similar scheme.

From the set of constraints, we can construct the def-use graph of the program as outlined in Section 2.3. We refer to Appendix C for a formal account of the mathematical construction.

Type Inference. To perform type inference on a program c, we first generate a set of constraints k using the judgment described above, with appropriate initial values for the typing environment and the protected set. Specifically, we start with an environment that types all variables as stable, i.e., Γ* = λx. S, and include all variables in the cut-set, i.e., Prot* = Vars(c), and generate a set of constraints k for c, i.e., Γ*, Prot* ⊢ c ⇒ k. From the constraints k, we construct the def-use graph and compute a cut-set Prot, e.g., by applying the Min-Cut/Max-Flow algorithm. Then, from the cut-set Prot and the program c, we compute a substitution that solves the constraints k, as follows. We remove from the graph all nodes in the cut-set Prot (and their corresponding edges), type all variables reachable from node T as transient, and type all other variables as stable. We update the initial typing environment with these type assignments and obtain the resulting environment Γ. Under the new environment Γ and protected set Prot, the unrepaired program type checks, i.e., Γ, Prot ⊢ c.
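The last two inference steps (choosing a cut-set and typing the remaining variables by reachability from T) can be sketched as follows. This is a sketch under stated assumptions, not the paper's algorithm: instead of Min-Cut/Max-Flow it enumerates candidate cut-sets by increasing size, which is only practical for tiny graphs, and it assumes the def-use graph is given as a list of directed edges between node names, with the special nodes `"T"` and `"S"`. The graph in the usage note is hypothetical and is not the graph of Figure 22.

```python
from itertools import combinations

def reachable(edges, start, removed):
    """Nodes reachable from start once the nodes in removed are deleted."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen or node in removed:
            continue
        seen.add(node)
        stack.extend(dst for src, dst in edges if src == node)
    return seen

def min_cut_set(edges, variables):
    """Smallest set of variables whose removal disconnects T from S.

    Exhaustive search by increasing size; a stand-in for Min-Cut/Max-Flow.
    Returns None if no set of variables separates T from S.
    """
    for size in range(len(variables) + 1):
        for cut in combinations(sorted(variables), size):
            if "S" not in reachable(edges, "T", set(cut)):
                return set(cut)
    return None

def infer(edges, variables):
    """Return (Prot, Gamma): the cut-set and the resulting typing environment."""
    prot = min_cut_set(edges, variables)
    if prot is None:
        raise ValueError("no cut-set over program variables separates T from S")
    transient = reachable(edges, "T", prot)
    gamma = {x: ("T" if x in transient else "S") for x in variables}
    return prot, gamma
```

As a usage example, take the hypothetical edges `T → a[i]`, `a[i] → w`, `a[i] → x`, `x → y`, `x → z`, `y → S`, `z → S` over variables `w`, `x`, `y`, `z`. Cutting at `x` alone separates T from S, so `x` is the only variable that needs protection; `w` stays transient because it is still reachable from T but never flows to a sink.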

D Proofs

D.1 Security

In the following, we write Γ ⊢ C to indicate that the program being executed on the processor is well-typed according to the transient-flow type system.

Non-Speculative Projection of Observations. Function O↓ computes the non-speculative projection of observations O. To do that, it applies function C(o, ps) pointwise. Function C(o, ps) takes as input a single observation o and ps, a set of identifiers of mispredicted guards. The function removes prediction identifiers from correctly speculated observations and replaces mispredicted loads, stores, and rollbacks with the empty observation ϵ. Function R(O) collects the identifiers of rolled-back guards from events rollback(p):

\[
R(O) = \{\, p \mid \mathit{rollback}(p) \in O \,\}
\]

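\[
\begin{aligned}
C(\mathit{load}(n, ps_1), ps_2) &=
  \begin{cases}
    \mathit{load}(n) & \text{if } ps_1 \cap ps_2 \equiv \emptyset \\
    \epsilon & \text{otherwise}
  \end{cases} \\
C(\mathit{store}(n, ps_1), ps_2) &=
  \begin{cases}
    \mathit{store}(n) & \text{if } ps_1 \cap ps_2 \equiv \emptyset \\
    \epsilon & \text{otherwise}
  \end{cases} \\
C(\mathit{rollback}(p), ps) &= \epsilon \\
C(o, \_) &= o \\
O{\downarrow} &= \{\, C(o, R(O)) \mid o \in O \,\}
\end{aligned}
\]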
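The projection can be read as a simple filter over a trace. The following sketch is illustrative only: the observation classes `Load`, `Store`, and `Rollback`, the marker `EPS` for the empty observation ϵ, and the representation of a trace as a Python list (which preserves order, unlike the set notation above) are assumptions of this example.

```python
from dataclasses import dataclass

EPS = "eps"  # stands for the empty observation

@dataclass(frozen=True)
class Load:
    addr: int
    guards: frozenset = frozenset()  # prediction identifiers ps1

@dataclass(frozen=True)
class Store:
    addr: int
    guards: frozenset = frozenset()

@dataclass(frozen=True)
class Rollback:
    guard: int

def rolled_back(trace):
    """R(O): identifiers of guards that are rolled back somewhere in the trace."""
    return {o.guard for o in trace if isinstance(o, Rollback)}

def project_one(o, ps):
    """C(o, ps): erase mispredicted loads/stores and rollbacks, strip identifiers."""
    if isinstance(o, (Load, Store)):
        if o.guards & ps:           # depends on a mispredicted guard
            return EPS
        return type(o)(o.addr)      # correctly speculated: drop the identifiers
    if isinstance(o, Rollback):
        return EPS
    return o                        # any other observation is unchanged

def project(trace):
    """Non-speculative projection: apply C(o, R(O)) pointwise."""
    ps = rolled_back(trace)
    return [project_one(o, ps) for o in trace]
```

In a trace where a load depends on guard 1 and guard 1 is later rolled back, both the load and the rollback event project to the empty observation, while loads and stores that depend only on correctly predicted guards survive without their identifiers.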

Definition 2 (L-equivalence). Two configurations C1 = ⟨is1, cs1, µ1, ρ1⟩ and C2 = ⟨is2, cs2, µ2, ρ2⟩ are L-equivalent if and only if is1 = is2, cs1 = cs2, µ1 ≈L µ2, and ρ1 ≈L ρ2.

Lemma D.1 (L-equivalence 1-step preservation). Let ps be the set of guard identifiers rolled back in the rest of the execution of well-typed configurations Γ ⊢ C1 and Γ ⊢ C2. If C1 ≈L C2, $C_1 \xrightarrow{d}_{o_1} C_1'$, and $C_2 \xrightarrow{d}_{o_2} C_2'$, and if C(o1, ps) = C(o2, ps), then o1 = o2.

Proof. By case analysis on the two small-step reductions. Since C1 ≈L C2, their reorder buffers and command stacks are equal, i.e., is1 = is2 and cs1 = cs2. Thus, the two configurations process the same instruction in the same stage. All the instructions that generate the empty observation ϵ or fail are trivial. This includes all the rules in the fetch and in the retire stage. The interesting cases that can leak occur speculatively and out-of-order, i.e., in the execute stage. By inspecting rule [Execute], we notice that both configurations execute the same n-th instruction from the attacker-supplied directive and with L-equivalent transient variable maps (ϕ(is, ρ1) ≈L ϕ(is, ρ2) from ρ1 ≈L ρ2). Then, we consider the instructions that can leak during the execute stage: guards guard(e^b, cs, p), loads x := load(e), and stores store(n, e). The guard instruction can result in a rollback (rule [Exec-Branch-Mispredict]) or be resolved successfully (rule [Exec-Branch-Ok]). If the two executions differ, the attacker gains information by observing whether a rollback(p) occurs or not (e.g., through the data cache). We show that this cannot happen because the guard expression is typed S.


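\[
\textsc{Skip}\quad \langle\mu,\rho\rangle \Downarrow^{\mathbf{skip}}_{\epsilon} \langle\mu,\rho\rangle
\qquad\qquad
\textsc{Fail}\quad \langle\mu,\rho\rangle \Downarrow^{\mathbf{fail}}_{\mathit{fail}} \langle\mu,\rho\rangle
\]

\[
\textsc{Asgn}\quad \frac{v = [\![e]\!]^{\rho}}{\langle\mu,\rho\rangle \Downarrow^{(x := e)}_{\epsilon} \langle\mu,\rho[x \mapsto v]\rangle}
\]

\[
\textsc{Ptr-Read}\quad \frac{n = [\![e]\!]^{\rho} \qquad v = \mu(n)}{\langle\mu,\rho\rangle \Downarrow^{(x := {*e})}_{\mathit{read}(n)} \langle\mu,\rho[x \mapsto v]\rangle}
\]

\[
\textsc{Array-Read}\quad \frac{a = [\![e_1]\!]^{\rho} \qquad n = [\![e_2]\!]^{\rho} \qquad n \leqslant \mathit{length}(a) \qquad n' = \mathit{base}(a) + n \qquad v = \mu(n')}{\langle\mu,\rho\rangle \Downarrow^{(x := e_1[e_2])}_{\mathit{read}(n')} \langle\mu,\rho[x \mapsto v]\rangle}
\]

\[
\textsc{Array-Read-Fail}\quad \frac{a = [\![e_1]\!]^{\rho} \qquad n = [\![e_2]\!]^{\rho} \qquad n > \mathit{length}(a)}{\langle\mu,\rho\rangle \Downarrow^{(x := e_1[e_2])}_{\mathit{fail}} \langle\mu,\rho\rangle}
\]

\[
\textsc{Ptr-Write}\quad \frac{n = [\![e_1]\!]^{\rho} \qquad v = [\![e_2]\!]^{\rho}}{\langle\mu,\rho\rangle \Downarrow^{({*e_1} := e_2)}_{\mathit{write}(n)} \langle\mu[n \mapsto v],\rho\rangle}
\]

\[
\textsc{Array-Write}\quad \frac{a = [\![e_1]\!]^{\rho} \qquad n = [\![e_2]\!]^{\rho} \qquad v = [\![e_3]\!]^{\rho} \qquad n \leqslant \mathit{length}(a) \qquad n' = \mathit{base}(a) + n}{\langle\mu,\rho\rangle \Downarrow^{(e_1[e_2] := e_3)}_{\mathit{write}(n')} \langle\mu[n' \mapsto v],\rho\rangle}
\]

\[
\textsc{Array-Write-Fail}\quad \frac{a = [\![e_1]\!]^{\rho} \qquad n = [\![e_2]\!]^{\rho} \qquad n > \mathit{length}(a)}{\langle\mu,\rho\rangle \Downarrow^{(e_1[e_2] := e_3)}_{\mathit{fail}} \langle\mu,\rho\rangle}
\]

\[
\textsc{Protect}\quad \frac{\langle\mu,\rho\rangle \Downarrow^{(x := r)}_{O} \langle\mu,\rho'\rangle}{\langle\mu,\rho\rangle \Downarrow^{(x := \mathbf{protect}(r))}_{O} \langle\mu,\rho'\rangle}
\]

\[
\textsc{Stable-Read}\quad \frac{\langle\mu,\rho\rangle \Downarrow^{(x := e_1[e_2])}_{O} \langle\mu,\rho'\rangle}{\langle\mu,\rho\rangle \Downarrow^{(x := \mathbf{stable\_read}(e_1, e_2))}_{O} \langle\mu,\rho'\rangle}
\]

\[
\textsc{If-Then-Else}\quad \frac{b = [\![e]\!]^{\rho} \qquad \langle\mu,\rho\rangle \Downarrow^{c_b}_{O} \langle\mu',\rho'\rangle}{\langle\mu,\rho\rangle \Downarrow^{(\mathbf{if}\ e\ \mathbf{then}\ c_{\mathit{true}}\ \mathbf{else}\ c_{\mathit{false}})}_{O} \langle\mu',\rho'\rangle}
\]

\[
\textsc{While}\quad \frac{c' = \mathbf{if}\ b\ \mathbf{then}\ (\mathbf{while}\ e\ \mathbf{do}\ c)\ \mathbf{else}\ \mathbf{skip} \qquad \langle\mu,\rho\rangle \Downarrow^{c'}_{O} \langle\mu',\rho'\rangle}{\langle\mu,\rho\rangle \Downarrow^{(\mathbf{while}\ e\ \mathbf{do}\ c)}_{O} \langle\mu',\rho'\rangle}
\]

\[
\textsc{Seq}\quad \frac{\langle\mu,\rho\rangle \Downarrow^{c_1}_{O_1} \langle\mu',\rho'\rangle \qquad \mathit{fail} \notin O_1 \qquad \langle\mu',\rho'\rangle \Downarrow^{c_2}_{O_2} \langle\mu'',\rho''\rangle}{\langle\mu,\rho\rangle \Downarrow^{(c_1; c_2)}_{(O_1 \cdot O_2)} \langle\mu'',\rho''\rangle}
\]

\[
\textsc{Seq-Fail}\quad \frac{\langle\mu,\rho\rangle \Downarrow^{c_1}_{O_1} \langle\mu',\rho'\rangle \qquad \mathit{fail} \in O_1}{\langle\mu,\rho\rangle \Downarrow^{(c_1; c_2)}_{O_1} \langle\mu',\rho'\rangle}
\]

Figure 23. Sequential big-step semantics with observations.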
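A few of the rules in Figure 23 can be turned into a small interpreter that returns the observation trace together with the final memory and variable map. This is a minimal sketch covering only [Skip], [Fail], [Asgn], [Array-Read], [Array-Read-Fail], [Seq], and [Seq-Fail]. The tuple encoding of commands, the representation of an array value as a `(base, length)` pair, and the function names are assumptions made for illustration. The bounds check uses n ⩽ length(a), exactly as written in the figure.

```python
def evaluate(e, rho):
    """Expressions: an int literal, a variable name, or ('+', e1, e2)."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return rho[e]
    op, e1, e2 = e
    if op == "+":
        return evaluate(e1, rho) + evaluate(e2, rho)
    raise ValueError(f"unknown operator: {op}")

def run(c, mu, rho):
    """Big-step evaluation of command c; returns (observations, mu', rho')."""
    tag = c[0]
    if tag == "skip":                       # [Skip]
        return [], mu, rho
    if tag == "fail":                       # [Fail]
        return ["fail"], mu, rho
    if tag == "asgn":                       # [Asgn]: no observation
        _, x, e = c
        return [], mu, {**rho, x: evaluate(e, rho)}
    if tag == "aread":                      # x := e1[e2]
        _, x, e1, e2 = c
        base, length = evaluate(e1, rho)    # array value, assumed (base, length)
        n = evaluate(e2, rho)
        if n > length:                      # [Array-Read-Fail]
            return ["fail"], mu, rho
        addr = base + n                     # [Array-Read]: observe read(n')
        return [("read", addr)], mu, {**rho, x: mu[addr]}
    if tag == "seq":
        _, c1, c2 = c
        obs1, mu1, rho1 = run(c1, mu, rho)
        if "fail" in obs1:                  # [Seq-Fail]: stop after a failure
            return obs1, mu1, rho1
        obs2, mu2, rho2 = run(c2, mu1, rho1)
        return obs1 + obs2, mu2, rho2       # [Seq]: concatenate observations
    raise ValueError(f"unknown command: {tag}")
```

Running `x := a[i]; y := x + 1` with an in-bounds index yields a single `read` observation at `base(a) + i`; with an out-of-bounds index the trace is just `fail` and the second command is never executed.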

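\[
\textsc{Done}\quad \langle [\,], [\,], \mu, \rho\rangle \Downarrow^{[\,]}_{\epsilon} \langle [\,], [\,], \mu, \rho\rangle
\]

\[
\textsc{Step}\quad \frac{\langle is, cs, \mu, \rho\rangle \xrightarrow{d}_{o} \langle is', cs', \mu', \rho'\rangle \qquad \langle is', cs', \mu', \rho'\rangle \Downarrow^{D}_{O} \langle is'', cs'', \mu'', \rho''\rangle}{\langle is, cs, \mu, \rho\rangle \Downarrow^{(d:D)}_{(o \cdot O)} \langle is'', cs'', \mu'', \rho''\rangle}
\]

Figure 24. Speculative big-step semantics.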

We need to prove $[\![e]\!]^{\rho_1} = [\![e]\!]^{\rho_2}$, with ρ1 ≈L ρ2. If e contains a secret variable, then $[\![e]\!]^{\rho_1} \neq [\![e]\!]^{\rho_2}$; however, the secret value would then be leaked during sequential execution as well, i.e., this contradicts the hypothesis C(o1, ps) = C(o2, ps). If e contains only public variables, the outcome of the two conditions may still differ. In particular, transient secrets may taint public variables and from there be transmitted to the condition through the transient variable map. However, by the rules of our type system Γ ⊢ e : S, which means that there must be a protect(·) in between the transient source and the stable sink. Since protect(·) forbids value forwarding, the value of the condition is undefined, e(ρ1) = e(ρ2) = ⊥, and this case is void.

The reasoning for rules [Exec-Load] and [Exec-Store-Value] is similar. □

Theorem D.2 (Soundness). For all programs c, if Γ ⊢ c then c satisfies speculative non-interference.

Proof. Let µ1 and µ2 be memories such that µ1 ≈L µ2, and similarly let ρ1 and ρ2 be variable maps such that ρ1 ≈L ρ2. Let Ci = ⟨[ ], [c], µi, ρi⟩ for i ∈ {1,2} and let D be a valid schedule such that $C_1 \Downarrow^{D}_{O_1} C_1'$ and $C_2 \Downarrow^{D}_{O_2} C_2'$. We now assume O1↓ = O2↓ and show that O1 = O2 by induction on the typing judgment. The base case ([Done]) is trivial. In the inductive case, we have two pairs of reductions: a pair of small-step reductions $\langle is_i, cs_i, \mu_i, \rho_i\rangle \xrightarrow{d}_{o_i} \langle is_i', cs_i', \mu_i', \rho_i'\rangle$ and a pair of big-step reductions $\langle is_i', cs_i', \mu_i', \rho_i'\rangle \Downarrow^{D}_{O_i} \langle is_i'', cs_i'', \mu_i'', \rho_i''\rangle$ for i ∈ {1,2}. Assuming that the program does not leak sequentially, we have o1↓ = o2↓ and O1↓ = O2↓. By the induction hypothesis on the big-step reductions we obtain O1 = O2, and we derive o1 = o2 by Lemma D.1 applied to the small-step reductions and the set of mispredicted guard identifiers R(O1).
