-
ARM Processor Architecture (I)
Speaker: Lung-Hao Chang    Advisor: Prof. Andy Wu
Graduate Institute of Electronics Engineering, National Taiwan University
Modified from National Chiao-Tung University IP Core Design course
-
SoC Consortium Course Material, SoC Design Laboratory
03/10/2004
Outline
  Thumb instruction set
  ARM/Thumb interworking
  ARM organization
  Summary
-
Thumb instruction set
-
Thumb-ARM Difference
The Thumb instruction set is a subset of the ARM instruction set, and its instructions operate on a restricted view of the ARM registers
Most Thumb instructions are executed unconditionally
  All ARM instructions are executed conditionally
Many Thumb data processing instructions use a 2-address format, i.e. the destination register is the same as one of the source registers
  ARM data processing instructions, with the exception of the 64-bit multiplies, use a 3-address format
Thumb instruction formats are less regular than ARM instruction formats => dense encoding
-
Registers Access in Thumb
Not all registers are directly accessible in Thumb
  Low registers r0-r7: fully accessible
  High registers r8-r12: only accessible with MOV, ADD, CMP
  SP (Stack Pointer), LR (Link Register) & PC (Program Counter): limited accessibility, certain instructions have implicit access to these
  CPSR: only indirect access
  SPSR: no access
-
Thumb Accessible Registers
[Figure: Thumb-accessible register set. Shaded registers have restricted access]
-
Branches (1/2)
Thumb defines three PC-relative branch instructions, each of which has a different offset range
  The offset range depends upon the number of available bits
Conditional Branches
  B{cond} label
  8-bit offset: range of -128 to 127 instructions (+/-256 bytes)
  The only conditional Thumb instructions
  Format (1) B{cond}:
    bits 15-12   bits 11-8   bits 7-0
    1 1 0 1      cond        8-bit offset
-
Branches (2/2)
Unconditional Branches
  B label
  11-bit offset: range of -1024 to 1023 instructions (+/-2K bytes)
  Format (2) B:
    bits 15-11   bits 10-0
    1 1 1 0 0    11-bit offset
Long Branches with Link
  BL subroutine
  Implemented as a pair of instructions
  22-bit offset: range of -2097152 to 2097151 instructions (+/-4M bytes)
  Format (3) BL:
    bits 15-12   bit 11   bits 10-0
    1 1 1 1      H        11-bit offset
  Format (3a) BLX:
    bits 15-11   bits 10-1       bit 0
    1 1 1 0 1    10-bit offset   0
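The branch ranges quoted on these two slides follow directly from the width of the signed offset field. The sketch below is only an illustration: the function name thumb_branch_range is hypothetical, and it assumes the offset is a two's-complement count of 2-byte Thumb instructions, ignoring the pipeline bias on the PC.

```python
def thumb_branch_range(offset_bits):
    """Return (min_instr, max_instr, min_bytes, max_bytes) for a signed
    offset field of the given width, counted in 2-byte Thumb instructions.
    Assumption: plain two's-complement field, PC bias not modelled."""
    lo = -(1 << (offset_bits - 1))       # most negative encodable offset
    hi = (1 << (offset_bits - 1)) - 1    # most positive encodable offset
    return lo, hi, lo * 2, hi * 2        # each Thumb instruction is 2 bytes


if __name__ == "__main__":
    # 8-bit conditional branch, 11-bit unconditional branch, 22-bit BL pair
    for name, bits in (("B{cond}", 8), ("B", 11), ("BL", 22)):
        print(name, thumb_branch_range(bits))
```

The byte figures on the slides (+/-256, +/-2K, +/-4M) are the rounded forms of these ranges.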
-
Data Processing Instruction
Subset of the ARM data processing instructions
Separate shift instructions (e.g. LSL, ASR, LSR, ROR)
  LSL Rd,Rs,#Imm5   ;Rd := Rs LSL #Imm5
  ASR Rd,Rs         ;Rd := Rd ASR Rs
Two operands for data processing instructions
  Act on low registers
  BIC Rd,Rs         ;Rd := Rd AND NOT Rs
  ADD Rd,#Imm8      ;Rd := Rd + #Imm8
  Also three operand forms of add, subtract and shifts
  ADD Rd,Rs,#Imm3   ;Rd := Rs + #Imm3
Condition codes are always set by low register operations
-
Load or Store Register
Two pre-indexed addressing modes
  Base register + offset register
  Base register + 5-bit offset, where the offset is scaled by
    4 for word accesses (byte offsets 0-124, i.e. 0-31 words): STR Rd,[Rb,#Imm7]
    2 for halfword accesses (byte offsets 0-62, i.e. 0-31 halfwords): LDRH Rd,[Rb,#Imm6]
    1 for byte accesses (byte offsets 0-31): LDRB Rd,[Rb,#Imm5]
Special forms
  Load with PC as base with 1K byte immediate offset (word aligned)
    Used for loading a value from a literal pool
  Load and store with SP as base with 1K byte immediate offset (word aligned)
    Used for accessing local variables on the stack
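To make the scaled 5-bit offset concrete, here is a minimal sketch of how a byte offset maps onto the offset field. The helper name encode_imm5_offset is hypothetical, and the sketch assumes the field simply holds the byte offset divided by the access size.

```python
def encode_imm5_offset(byte_offset, access_size):
    """Return the 5-bit offset field for a base + immediate Thumb access.

    access_size is 1 (byte), 2 (halfword) or 4 (word). Raises ValueError
    when the byte offset cannot be encoded. Assumption: field value is
    byte_offset / access_size, as described on the slide."""
    if access_size not in (1, 2, 4):
        raise ValueError("access size must be 1, 2 or 4 bytes")
    if byte_offset % access_size != 0:
        raise ValueError("offset is not a multiple of the access size")
    field = byte_offset // access_size
    if not 0 <= field <= 31:
        raise ValueError("offset does not fit in 5 bits")
    return field


if __name__ == "__main__":
    print(encode_imm5_offset(124, 4))  # largest word offset
    print(encode_imm5_offset(62, 2))   # largest halfword offset
    print(encode_imm5_offset(31, 1))   # largest byte offset
```

The same field width therefore reaches four times further for words than for bytes, at the cost of only addressing aligned offsets.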
-
Block Data Transfers
Memory copy, incrementing base pointer after transfer
  STMIA Rb!, {Low Reg list}
  LDMIA Rb!, {Low Reg list}
Full descending stack operations
  PUSH {Low Reg list}
  PUSH {Low Reg list, LR}
  POP {Low Reg list}
  POP {Low Reg list, PC}
The optional addition of the LR/PC provides support for subroutine entry/exit
-
Thumb Instruction Entry and Exit
T bit, bit 5 of CPSR
  If T = 1, the processor interprets the instruction stream as 16-bit Thumb instructions
  If T = 0, the processor interprets it as standard ARM instructions
Thumb Entry
  ARM cores start up, after reset, executing ARM instructions
  Executing a Branch and Exchange instruction (BX)
    Sets the T bit if the bottom bit of the specified register was set
    Switches the PC to the address given in the remainder of the register
Thumb Exit
  Executing a Thumb BX instruction
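The effect of BX on the T bit and the PC can be modelled in a few lines. This is a sketch under simple assumptions, not a description of the hardware: the function name bx is hypothetical, registers are treated as unsigned 32-bit integers, and only the T bit of the CPSR is modelled.

```python
T_BIT = 1 << 5  # T bit is bit 5 of the CPSR


def bx(cpsr, rm):
    """Return (new_cpsr, new_pc) after a Branch and Exchange to register rm.

    Bit 0 of rm selects the new state (1 = Thumb, 0 = ARM); the remaining
    bits give the branch target. Assumption: all other CPSR bits are
    left unchanged."""
    if rm & 1:
        cpsr |= T_BIT        # enter (or stay in) Thumb state
    else:
        cpsr &= ~T_BIT       # enter (or stay in) ARM state
    new_pc = rm & 0xFFFFFFFE  # bit 0 is not part of the target address
    return cpsr & 0xFFFFFFFF, new_pc


if __name__ == "__main__":
    print([hex(v) for v in bx(0x10, 0x8001)])  # ARM -> Thumb
    print([hex(v) for v in bx(0x30, 0x9000)])  # Thumb -> ARM
```

The same function covers both Thumb entry and Thumb exit, since the only difference is the value of bit 0 in the register operand.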
-
Miscellaneous
Thumb SWI instruction format
  Same effect as ARM, but SWI number limited to 0-255
  Syntax: SWI number (where number is the SWI number)
  Format:
    bits 15-8          bits 7-0
    1 1 0 1 1 1 1 1    SWI number
Indirect access to CPSR and no access to SPSR, so no MRS or MSR instructions
No coprocessor instruction space
-
ARM Thumb-2 core technology
New instruction set for the ARM architecture
Enhanced levels of performance, energy efficiency, and code density for a wide range of embedded applications
-
Thumb Instruction Set (1/3)
-
Thumb Instruction Set (2/3)
-
Thumb Instruction Set (3/3)
-
Thumb Instruction Format
-
ARM/Thumb interworking
-
The Need for Interworking
The code density of Thumb and its performance from narrow memory make it ideal for the bulk of C code in many systems. However, there is still a need to change between ARM and Thumb state within most applications
  ARM code provides better performance from wide memory
    Therefore ideal for speed-critical parts of an application
  Some functions can only be performed with ARM instructions, e.g.
    Access to CPSR (to enable/disable interrupts & to change mode)
    Access to coprocessors
  Exception handling
    ARM state is automatically entered for exception handling, but the system specification may require usage of Thumb code for the main handler
  Simple standalone Thumb programs will also need an ARM assembler header to change state and call the Thumb routine
-
ARM/Thumb Interworking
Interworking can be carried out using the Branch Exchange instruction
  BX Rn   ;Thumb state Branch Exchange
  BX Rn   ;ARM state Branch Exchange
Can also be used as an absolute branch without a state change
-
Example
            ;start off in ARM state
            CODE32
            ADR r0,Into_Thumb+1   ;generate branch target address
                                  ;& set bit 0, hence arrive in Thumb state
            BX  r0                ;branch exchange to Thumb
            CODE16                ;assemble subsequent code as Thumb
Into_Thumb  ADR r5,Back_to_ARM    ;generate branch target to word-aligned
                                  ;address, hence bit 0 is cleared
            BX  r5                ;branch exchange to ARM
            CODE32                ;assemble subsequent code as ARM
Back_to_ARM
-
ARM organization
-
3-Stage Pipeline ARM Organization
Register Bank
  2 read ports, 1 write port, access any register
  1 additional read port, 1 additional write port for r15 (PC)
Barrel Shifter
  Shift or rotate the operand by any number of bits
ALU
Address register and incrementer
Data Registers
  Hold data passing to and from memory
Instruction Decoder and Control
[Figure: 3-stage pipeline ARM datapath, showing the address register and incrementer, register bank, multiplier, barrel shifter, ALU, data in and data out registers, instruction decode & control, the A, B and ALU buses, A[31:0] and D[31:0]]
-
3-Stage Pipeline (1/2)
Fetch
  The instruction is fetched from memory and placed in the instruction pipeline
Decode
  The instruction is decoded and the datapath control signals prepared for the next cycle
Execute
  The register bank is read, an operand shifted, the ALU result generated and written back into the destination register
-
3-Stage Pipeline (2/2)
At any time slice, 3 different instructions may occupy each of these stages, so the hardware in each stage has to be capable of independent operation
When the processor is executing data processing instructions, the latency = 3 cycles and the throughput = 1 instruction/cycle
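The latency and throughput figures can be turned into a cycle count for a run of instructions. The following is a rough sketch, assuming an ideal pipeline in which every instruction spends exactly one cycle per stage; the function name pipeline_cycles and the stall_cycles parameter are illustrative only.

```python
def pipeline_cycles(n_instructions, n_stages=3, stall_cycles=0):
    """Cycles needed to run n_instructions through an n_stages pipeline.

    The first instruction takes n_stages cycles (the latency); each later
    one completes one cycle after its predecessor (throughput of one per
    cycle). stall_cycles adds any extra cycles lost to hazards."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


if __name__ == "__main__":
    print(pipeline_cycles(1))    # a single instruction: latency only
    print(pipeline_cycles(100))  # a long run approaches 1 cycle/instruction
```

For long instruction sequences the fill cost of the first instruction becomes negligible, which is why the throughput is quoted as one instruction per cycle.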
-
Data Processing Instruction
[Figure: datapath activity for (a) register - register operations and (b) register - immediate operations]
All operations take place in a single clock cycle
-
Data Transfer Instructions
Computes a memory address similar to a data processing instruction
Load instructions follow a similar pattern, except that the data from memory only gets as far as the data in register on the 2nd cycle, and a 3rd cycle is needed to transfer the data from there to the destination register
[Figure: datapath activity for a store: (a) 1st cycle - compute address, (b) 2nd cycle - store data & auto-index]
-
Branch Instructions
[Figure: datapath activity for a branch: (a) 1st cycle - compute branch target, (b) 2nd cycle - save return address]
The third cycle, which is required to complete the pipeline refilling, is also used to make a small correction to the value stored in the link register so that it points directly at the instruction which follows the branch
-
Multi-cycle Instruction
Memory access (fetch, data transfer) in every cycle
Datapath used in every cycle (execute, address calculation, data transfer)
Decode logic generates the control signals for the datapath use in the next cycle (decode, address calculation)
-
Branch Pipeline Example
Breaking the pipeline
Note that the core is executing in the ARM state
-
5-Stage Pipeline ARM Organization
Tprog = Ninst * CPI / fclk
  Tprog: the time taken to execute a given program
  Ninst: the number of ARM instructions executed in the program => compiler dependent
  CPI: average number of clock cycles per instruction => hazards cause pipeline stalls
  fclk: clock frequency
Separate instruction and data memories => 5-stage pipeline
Used in ARM9TDMI
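As a quick worked example of the Tprog formula, the sketch below evaluates it for some made-up numbers. The function name program_time and the instruction count, CPI and clock frequency used here are purely illustrative, not measurements of any ARM core.

```python
def program_time(n_inst, cpi, f_clk_hz):
    """Execution time in seconds: Tprog = Ninst * CPI / fclk."""
    return n_inst * cpi / f_clk_hz


if __name__ == "__main__":
    # Hypothetical program: one million instructions on a 100 MHz clock.
    slow = program_time(1_000_000, cpi=1.5, f_clk_hz=100e6)
    fast = program_time(1_000_000, cpi=1.2, f_clk_hz=100e6)
    print(slow, fast)  # a lower CPI shortens the run at the same clock
```

The formula shows the three levers available to the designer: fewer instructions (compiler), fewer cycles per instruction (fewer stalls) or a faster clock (deeper pipeline).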
-
5-Stage Pipeline Organization (1/2)
Fetch
  The instruction is fetched from memory and placed in the instruction pipeline
Decode
  The instruction is decoded and register operands are read from the register file. There are 3 operand read ports in the register file, so most ARM instructions can source all their operands in one cycle
Execute
  An operand is shifted and the ALU result generated. If the instruction is a load or store, the memory address is computed in the ALU
[Figure: 5-stage pipeline organization, with stages fetch (I-cache), instruction decode (register read), execute (shift, ALU, mul), buffer/data (D-cache) and write-back (register write), plus forwarding paths]
-
5-Stage Pipeline Organization (2/2)
Buffer/Data
  Data memory is accessed if required. Otherwise the ALU result is simply buffered for one cycle
Write back
  The results generated by the instruction are written back to the register file, including any data loaded from memory
[Figure: 5-stage pipeline organization, with stages fetch, instruction decode, execute, buffer/data and write-back, plus forwarding paths]
-
Pipeline Hazards
There are situations, called hazards, that prevent the next instruction in the instruction stream from being executed during its designated clock cycle. Hazards reduce the performance from the ideal speedup gained by pipelining.
There are three classes of hazards:
  Structural Hazards: They arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
  Data Hazards: They arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
  Control Hazards: They arise from the pipelining of branches and other instructions that change the PC.
-
Structural Hazards
When a machine is pipelined, the overlapped execution of instructions requires pipelining of functional units and duplication of resources to allow all possible combinations of instructions in the pipeline.
If some combination of instructions cannot be accommodated because of a resource conflict, the machine is said to have a structural hazard.
-
Example
A machine has a single memory shared by data and instructions. As a result, when an instruction contains a data-memory reference (load), it will conflict with the instruction fetch for a later instruction (Instr 3):
Clock cycle number
Instr     1     2     3     4     5     6     7     8
load      IF    ID    EXE   MEM   WB
Instr 1         IF    ID    EXE   MEM   WB
Instr 2               IF    ID    EXE   MEM   WB
Instr 3                     IF    ID    EXE   MEM   WB
-
Solution (1/2)
To resolve this, we stall the pipeline for one clock cycle when a data-memory access occurs. The effect of the stall is actually to occupy the resources for that instruction slot. The following table shows how the stalls are actually implemented.
Clock cycle number
Instr     1     2     3     4     5     6     7     8     9
load      IF    ID    EXE   MEM   WB
Instr 1         IF    ID    EXE   MEM   WB
Instr 2               IF    ID    EXE   MEM   WB
Instr 3                     stall IF    ID    EXE   MEM   WB
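The stall in the table can be reproduced with a very small model of a single shared memory port. This is only a sketch under stated assumptions: the name fetch_cycles is hypothetical, stages are IF, ID, EXE, MEM, WB with one cycle each, only loads use the memory in MEM, and a fetch that collides with a load's MEM cycle is simply delayed by one cycle.

```python
MEM_STAGE_OFFSET = 3  # MEM happens 3 cycles after IF (IF, ID, EXE, MEM, WB)


def fetch_cycles(is_load_flags):
    """Return the 1-based IF cycle of each instruction when instruction
    fetch and data access share one memory port.

    is_load_flags[i] is True if instruction i accesses data memory."""
    busy = set()      # cycles in which the memory port serves a data access
    cycles = []
    next_fetch = 1
    for is_load in is_load_flags:
        while next_fetch in busy:   # port taken by a load: stall the fetch
            next_fetch += 1
        cycles.append(next_fetch)
        if is_load:
            busy.add(next_fetch + MEM_STAGE_OFFSET)
        next_fetch += 1
    return cycles


if __name__ == "__main__":
    # load, Instr 1, Instr 2, Instr 3 as in the table above
    print(fetch_cycles([True, False, False, False]))
```

With the load fetched in cycle 1, its MEM stage occupies cycle 4, so the fetch of Instr 3 slips from cycle 4 to cycle 5, matching the stalled row of the table.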
-
Solution (2/2)
Another solution is to use separate instruction and data memories
The 5-stage ARM pipeline uses a Harvard architecture (separate instruction and data memories), so it does not have this hazard
-
Data Hazards
Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on the unpipelined machine.
Clock cycle number
                1      2      3      4      5      6      7      8      9
ADD R1,R2,R3    IF     ID     EXE    MEM    WB
SUB R4,R5,R1           IF     IDsub  EXE    MEM    WB
AND R6,R1,R7                  IF     IDand  EXE    MEM    WB
OR  R8,R1,R9                         IF     IDor   EXE    MEM    WB
XOR R10,R11,R1                              IF     IDxor  EXE    MEM    WB
-
Forwarding
The problem with data hazards introduced by this sequence of instructions can be solved with a simple hardware technique called forwarding.
Clock cycle number
                1      2      3      4      5      6      7
ADD R1,R2,R3    IF     ID     EXE    MEM    WB
SUB R4,R5,R1           IF     IDsub  EXE    MEM    WB
AND R6,R1,R7                  IF     IDand  EXE    MEM    WB
-
Forward Data
Clock cycle number
                1      2      3      4      5      6      7
ADD R1,R2,R3    IF     ID     EXEadd MEMadd WB
SUB R4,R5,R1           IF     ID     EXEsub MEM    WB
AND R6,R1,R7                  IF     ID     EXEand MEM    WB
The first forwarding is for the value of R1 from EXEadd to EXEsub.
The second forwarding is also for the value of R1, from MEMadd to EXEand.
This code can now be executed without stalls.
Forwarding can be generalized to include passing the result directly to the functional unit that requires it
  A result is forwarded from the output of one unit to the input of another, rather than just from the result of a unit to the input of the same unit.
-
Forwarding Architecture
Forwarding works as follows:
  The ALU result from the EXE/MEM register is always fed back to the ALU input latches.
  If the forwarding hardware detects that the previous ALU operation has written the register corresponding to the source for the current ALU operation, control logic selects the forwarded result as the ALU input rather than the value read from the register file.
[Figure: 5-stage pipeline organization with the forwarding paths highlighted]
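The selection rule described on this slide amounts to a multiplexer in front of each ALU input. A minimal software sketch of that choice follows; the function and parameter names are hypothetical, the register file is modelled as a dictionary, and only the single EXE/MEM forwarding path is shown.

```python
def alu_operand(src_reg, regfile, ex_mem_dest, ex_mem_result):
    """Pick the value for one ALU input.

    If the previous ALU operation (held in the EXE/MEM register) wrote the
    register this operand reads, use the forwarded result; otherwise use
    the value read from the register file. ex_mem_dest is None when the
    previous instruction wrote no register."""
    if ex_mem_dest is not None and ex_mem_dest == src_reg:
        return ex_mem_result      # forwarded: register file is stale
    return regfile[src_reg]       # no dependency: normal register read


if __name__ == "__main__":
    # ADD R1,R2,R3 has just produced 42 but has not written back yet.
    regfile = {1: 0, 5: 7}
    print(alu_operand(1, regfile, ex_mem_dest=1, ex_mem_result=42))
    print(alu_operand(5, regfile, ex_mem_dest=1, ex_mem_result=42))
```

A real pipeline repeats this comparison for every source operand and for every later pipeline register that may hold a pending result.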
-
Without Forward
Clock cycle number
                1      2      3      4      5      6      7      8      9
ADD R1,R2,R3    IF     ID     EXE    MEM    WB
SUB R4,R5,R1           IF     stall  stall  IDsub  EXE    MEM    WB
AND R6,R1,R7                  stall  stall  IF     IDand  EXE    MEM    WB
-
Data Forwarding
Data dependency arises when an instruction needs to use the result of one of its predecessors before the result has returned to the register file => pipeline hazards
Forwarding paths allow results to be passed between stages as soon as they are available
The 5-stage pipeline requires each of the three source operands to be forwarded from any of the intermediate result registers
Still one load stall
  LDR rN, []
  ADD r2,r1,rN   ;use rN immediately
  One stall
  Compiler rescheduling
-
Stalls are required
Clock cycle number
                1      2      3      4      5      6      7      8
LDR R1,@(R2)    IF     ID     EXE    MEM    WB
SUB R4,R1,R5           IF     ID     EXEsub MEM    WB
AND R6,R1,R7                  IF     ID     EXEand MEM    WB
OR  R8,R1,R9                         IF     ID     EXE    MEM    WB
The load instruction has a delay or latency that cannot be eliminated by forwarding alone.
-
The Pipeline with one Stall
Clock cycle number
                1      2      3      4      5      6      7      8      9
LDR R1,@(R2)    IF     ID     EXE    MEM    WB
SUB R4,R1,R5           IF     ID     stall  EXEsub MEM    WB
AND R6,R1,R7                  IF     stall  ID     EXE    MEM    WB
OR  R8,R1,R9                         stall  IF     ID     EXE    MEM    WB
The only necessary forwarding is done for R1 from MEM to EXEsub.
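The single load-use stall can be counted mechanically. The sketch below is a simplified model, not the actual interlock logic: the function names and the tuple layout (opcode, destination, sources) are hypothetical, full forwarding is assumed, and only a load immediately followed by a user of its destination costs one cycle.

```python
def load_use_stalls(program):
    """Count load-use stalls in a list of (op, dest, sources) tuples.

    Assumption: with forwarding, the only remaining data stall is one
    cycle when the instruction right after an LDR reads the loaded
    register."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LDR" and prev_dest in cur_sources:
            stalls += 1
    return stalls


def total_cycles(program, n_stages=5):
    """Cycles to drain the program through an n_stages pipeline."""
    if not program:
        return 0
    return n_stages + len(program) - 1 + load_use_stalls(program)


if __name__ == "__main__":
    program = [
        ("LDR", "R1", ("R2",)),
        ("SUB", "R4", ("R1", "R5")),   # uses R1 right after the load
        ("AND", "R6", ("R1", "R7")),
        ("OR",  "R8", ("R1", "R9")),
    ]
    print(load_use_stalls(program), total_cycles(program))
```

If the compiler can move an independent instruction between the LDR and the SUB, the same model reports no stall, which is the rescheduling idea mentioned earlier in the deck.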
-
LDR Interlock
In this example, it takes 7 clock cycles to execute 6 instructions, a CPI of about 1.2
An LDR instruction immediately followed by a data operation using the same register causes an interlock
-
Optimal Pipelining
In this example, it takes 6 clock cycles to execute 6 instructions, a CPI of 1
The LDR instruction does not cause the pipeline to interlock
-
LDM Interlock (1/2)
In this example, it takes 8 clock cycles to execute 5 instructions, a CPI of 1.6
During the LDM there are parallel memory and write back cycles
-
LDM Interlock (2/2)
In this example, it takes 9 clock cycles to execute 5 instructions, a CPI of 1.8
The SUB incurs a further cycle of interlock because it uses the highest specified register in the LDM instruction
-
Control hazards (1/2)
Control hazards can cause a greater performance loss for the ARM pipeline than data hazards.
When a branch is executed, it may or may not change the PC (program counter) to something other than its current value plus 4.
The simplest method of dealing with branches is to stall the pipeline as soon as the branch is detected until we reach the EXE stage.
Branch               IF     ID         EXE    MEM    WB
Branch successor            IF(stall)  stall  IF     ID     EXE    MEM    WB
Branch successor+1                                   IF     ID     EXE    MEM    WB
-
Control hazards (2/2)
The number of stall cycles can be reduced by two steps
  Find out whether the branch is taken or not taken earlier in the pipeline
  Compute the taken PC (i.e., the address of the branch target) earlier
We will discuss branch prediction schemes
-
Branch prediction
The simplest branch prediction scheme is to predict the branch as not taken, simply allowing the hardware to continue as if the branch were not executed.
Care must be taken not to change the machine state until the branch outcome is definitely known.
-
Predict Not Taken
The pipeline with this scheme implemented behaves as shown below:
Untaken branch instr   IF     ID     EXE    MEM    WB
Instr i+1                     IF     ID     EXE    MEM    WB
Instr i+2                            IF     ID     EXE    MEM    WB

Taken branch instr     IF     ID     EXE    MEM    WB
Instr i+1                     IF     idle   idle   idle   idle
Branch target                        IF     ID     EXE    MEM    WB
Branch target+1                             IF     ID     EXE    MEM    WB
-
Predict Taken
An alternative scheme is to predict the branch as taken.
ARM employs a static branch prediction mechanism
  Conditional branches that branch backwards are predicted to be taken
  Conditional branches that branch forwards are predicted not to be taken
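The static rule stated on this slide depends only on the direction of the branch, so it can be written as a one-line comparison. The sketch below is an illustration with a hypothetical function name; it assumes both addresses are plain integers and says nothing about how any particular ARM core implements the check.

```python
def predict_taken(branch_addr, target_addr):
    """Static prediction: backward branches (typically loop ends) are
    predicted taken, forward branches are predicted not taken."""
    return target_addr < branch_addr


if __name__ == "__main__":
    print(predict_taken(0x8010, 0x8000))  # backward branch, e.g. a loop
    print(predict_taken(0x8000, 0x8010))  # forward branch, e.g. skipping code
```

The rule works well for loops, where the backward branch at the bottom is taken on every iteration except the last.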
-
Summary
Instruction set
  32-bit ARM instructions
  16-bit Thumb instructions
ARM/Thumb interworking
ARM organization
  3-stage pipeline: Fetch/Decode/Execute
  5-stage pipeline: Fetch/Decode/Execute/Buffer/Write Back
Pipeline hazards
  Structural hazard
  Data hazard
  Control hazard
-
References
[1] http://twins.ee.nctu.edu.tw/courses/ip_core_02/index.html
[2] ARM System-on-Chip Architecture, Second Edition, S. Furber, Addison Wesley Longman: ISBN 0-201-67519-6.
[3] Architecture Reference Manual, Second Edition, edited by D. Seal, Addison Wesley Longman: ISBN 0-201-73719-1.
[4] www.arm.com