Trace Caches
Michele Co, CS 451
Feb 02, 2016
Transcript
Page 1: Trace Caches

Michele Co
CS 451

Page 2: Motivation

High-performance superscalar processors need high instruction throughput to exploit ILP
–Wider dispatch and issue paths
–Execution units designed for high parallelism
  • Many functional units
  • Large issue buffers
  • Many physical registers
Fetch bandwidth becomes the performance bottleneck

Page 3: Fetch Performance Limiters

Cache hit rate
Branch prediction accuracy
Branch throughput
–Need to predict more than one branch per cycle
Non-contiguous instruction alignment
Fetch unit latency

Page 4: Problems with the Traditional Instruction Cache

Contains instructions in compiled (static) order
–Works well for sequential code with little branching, or code with large basic blocks
–Breaks down when taken branches are frequent: each taken branch ends the contiguous fetch
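The fetch-bandwidth limit of a conventional instruction cache can be sketched in a few lines. This is a minimal, hypothetical model (the line size, fetch width, and function names are illustrative assumptions, not from the slides): fetch proceeds contiguously and must stop at a cache-line boundary or the first predicted-taken branch.

```python
# Sketch of one fetch cycle from a traditional i-cache: contiguous
# addresses only, ending at a line boundary or a taken branch.
# LINE_SIZE and the 4-wide fetch are assumed for illustration.

LINE_SIZE = 4  # instructions per cache line (assumed)

def sequential_fetch(pc, taken_branches, width=4):
    """Return the instruction addresses fetchable in one cycle."""
    fetched = []
    line_end = (pc // LINE_SIZE + 1) * LINE_SIZE
    while pc < line_end and len(fetched) < width:
        fetched.append(pc)
        if pc in taken_branches:  # taken branch: fetch must stop here
            break
        pc += 1
    return fetched

# A taken branch at address 1 limits the cycle to 2 instructions,
# even though the machine could consume 4.
print(sequential_fetch(0, taken_branches={1}))   # [0, 1]
print(sequential_fetch(0, taken_branches=set())) # [0, 1, 2, 3]
```

With frequent taken branches, sustained fetch bandwidth falls well below the machine's issue width, which is the bottleneck the trace cache targets.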

Page 5: Suggested Solutions

Multiple branch target address prediction: branch address cache (Yeh, Marr, Patt, 1993)
–Provides quick access to multiple target addresses
–Disadvantages
  • Complex alignment network, additional latency

Page 6: Suggested Solutions (cont’d)

Collapsing buffer: multiple accesses to the BTB (Conte, Mills, Menezes, Patel, 1995)
–Allows fetching non-adjacent cache lines
–Disadvantages
  • Bank conflicts
  • Poor scalability for interblock branches
  • Significant logic added before and after the instruction cache

Fill unit: caches RISC-like instructions derived from a CISC instruction stream (Melvin, Shebanow, Patt, 1988)

Page 7: Problems with Prior Approaches

Need to generate pointers for all non-contiguous instruction blocks BEFORE fetching can begin
–Extra stages, additional latency
–Complex alignment network necessary
Multiple simultaneous accesses to the instruction cache
–Multiporting is expensive
Sequencing
–Additional stages, additional latency

Page 8: Potential Solution – Trace Cache

Rotenberg, Bennett, Smith (1996)
Advantages
–Caches dynamic instruction sequences, fetching past multiple branches
–No additional fetch unit latency
Disadvantages
–Redundant instruction storage
  • Between trace cache and instruction cache
  • Within the trace cache

Page 9: Trace Cache Details

Trace
–Sequence of instructions potentially containing branches and their targets
–Terminates on branches with an indeterminate number of targets (returns, indirect jumps, traps)
Trace identifier
–Start address + branch outcomes
Trace cache line
–Valid bit, tag, branch flags, branch mask, trace fall-through address, trace target address
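The trace cache line fields and the hit check implied by the trace identifier (start address + branch outcomes) can be sketched as follows. The field names follow the slide; the matching logic, field widths, and example values are illustrative assumptions.

```python
# Sketch of a trace cache line and its hit check: a trace hits only if
# the fetch address matches the tag AND the multiple-branch predictor's
# outcomes agree with those embedded in the line (under the branch mask).
from dataclasses import dataclass

@dataclass
class TraceCacheLine:
    valid: bool        # valid bit
    tag: int           # start address of the trace
    branch_mask: int   # number of branches contained in the trace (assumed encoding)
    branch_flags: int  # taken/not-taken outcome of each contained branch
    fall_through: int  # next fetch address if the last branch is not taken
    target: int        # next fetch address if the last branch is taken
    insts: list        # the cached dynamic instruction sequence

def trace_hit(line, fetch_addr, predicted_flags):
    """Check tag and branch outcomes against the predictor's outcomes."""
    mask = (1 << line.branch_mask) - 1
    return (line.valid
            and line.tag == fetch_addr
            and (predicted_flags & mask) == (line.branch_flags & mask))

line = TraceCacheLine(True, 0x400, 2, 0b01, 0x420, 0x800,
                      ["i0", "i1", "br", "i2", "br"])
print(trace_hit(line, 0x400, 0b01))  # same start PC and predicted outcomes: hit
print(trace_hit(line, 0x400, 0b11))  # predicted outcomes differ: miss
```

On a hit, the whole dynamic sequence (spanning multiple branches) is supplied in one cycle; on a miss, fetch falls back to the conventional instruction cache.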

Page 10: (figure)

Page 11: Next Trace Prediction (NTP)

History register + correlating table
–Complex history indexing
Secondary table
–Indexed by the most recently committed trace ID
Index generating function
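The index generating function can be sketched as a hash that folds the history register's recent trace IDs into a correlating-table index. This is an assumption-laden sketch: the shift/XOR folding, the multiplicative mix, and the table size are placeholders, not the exact function from the next-trace-prediction papers.

```python
# Hypothetical NTP index generation: fold recent trace IDs (most recent
# last) into one correlating-table index, giving newer traces more index
# bits than older ones. Hash constants and TABLE_BITS are assumptions.

TABLE_BITS = 14  # correlating table with 2^14 entries (assumed)

def ntp_index(history):
    """Hash a list of recent trace IDs into a table index."""
    idx = 0
    for age, trace_id in enumerate(reversed(history)):
        idx ^= trace_id >> age                   # older IDs contribute fewer bits
        idx = (idx * 2654435761) & 0xFFFFFFFF    # multiplicative mixing step
    return idx & ((1 << TABLE_BITS) - 1)

# Two histories that differ only in the oldest trace still map to
# different entries, capturing correlation along the path.
print(ntp_index([0x10, 0x24, 0x3c]) != ntp_index([0x14, 0x24, 0x3c]))
```

The secondary table from the slide serves as a fallback: when the hashed history mispredicts, a simpler prediction indexed by the last committed trace ID alone can be used instead.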

Page 12: NTP Index Generation (figure)

Page 13: Return History Stack (figure)

Page 14: Trace Cache vs. Existing Techniques (figure)

Page 15: Trace Cache Optimizations

Performance
–Partial matching [Friendly, Patel, Patt (1997)]
–Inactive issue [Friendly, Patel, Patt (1997)]
–Trace preconstruction [Jacobson, Smith (2000)]
Power
–Sequential access trace cache [Hu et al. (2002)]
–Dynamic direction prediction based trace cache [Hu et al. (2003)]
–Micro-operation cache [Solomon et al. (2003)]

Page 16: Trace Processors

Trace processor architecture
–Processing elements (PEs), each with:
  • Trace-sized instruction buffer
  • Multiple dedicated functional units
  • Local register file
  • Copy of the global register file
–Uses hierarchy to distribute execution resources
Addresses superscalar processor issues
–Complexity
  • Simplified multiple branch prediction (next trace prediction)
  • Elimination of local dependence checking (local register file)
  • Decentralized instruction issue and result bypass logic
–Architectural limitations
  • Reduced bandwidth pressure on the global register file (local register files)

Page 17: Trace Processor (figure)

Page 18: Trace Cache Variations

Block-based trace cache (BBTC)
–Black, Rychlik, Shen (1999)
–Needs less storage capacity

Page 19: Trace Table: BBTC Trace Prediction (figure)

Page 20: Block Cache (figure)

Page 21: Rename Table (figure)

Page 22: BBTC Optimization

Completion time multiple branch prediction (Rakvic et al., 2000)
–An improvement over trace table predictions

Page 23: Tree-based Multiple Branch Prediction (figure)

Page 24: Tree-PHT (figure)

Page 25: Tree-PHT Update (figure)

Page 26: Trace Cache Variations (cont’d)

Software trace cache
–Ramirez, Larriba-Pey, Navarro, Torrellas (1999)
–Profile-directed code reordering to maximize sequentiality
  • Convert taken branches to not-taken
  • Move unused basic blocks out of the execution path
  • Inline frequent basic blocks
  • Map the most popular traces to a reserved area of the i-cache
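The core reordering step can be sketched as a greedy chain layout: place each basic block so that its hottest successor falls through, turning frequently taken branches into not-taken ones. This is a simplified sketch in the spirit of the software trace cache; the CFG, the profile, and the greedy heuristic are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch of profile-directed basic-block reordering: chain each block
# with its hottest unplaced successor so the hot path is contiguous.
# The example CFG and edge profile below are made up for illustration.

def layout(cfg, profile, entry):
    """cfg: block -> successor list; profile: (src, dst) -> edge count.
    Returns a block order chaining each block with its hottest successor."""
    order, placed = [], set()
    work = [entry]
    while work:
        b = work.pop()
        while b is not None and b not in placed:
            order.append(b)
            placed.add(b)
            succs = [s for s in cfg.get(b, []) if s not in placed]
            succs.sort(key=lambda s: profile.get((b, s), 0))
            work.extend(succs[:-1])          # colder successors start later chains
            b = succs[-1] if succs else None  # hottest successor falls through
    return order

cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
profile = {("A", "B"): 10, ("A", "C"): 90, ("C", "D"): 90, ("B", "D"): 10}
print(layout(cfg, profile, "A"))  # hot path A, C, D laid out contiguously
```

After this layout the hot path occupies consecutive i-cache lines, so an ordinary instruction cache recovers much of the sequential-fetch benefit a hardware trace cache provides.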