Neural Methods for Dynamic Branch Prediction

Page 1: Neural Methods for Dynamic Branch Prediction

Neural Methods for Dynamic Branch Prediction

Daniel A. Jiménez, Dept. of Computer Science, Rutgers University
Calvin Lin, Dept. of Computer Science, Univ. of Texas at Austin

Presented by: Rohit Mittal

Page 2: Neural Methods for Dynamic Branch Prediction

Overview

Branch prediction background

Applying machine learning to branch prediction

Results and analysis

Future work and conclusions

Page 3: Neural Methods for Dynamic Branch Prediction

Branch Prediction Background

Page 4: Neural Methods for Dynamic Branch Prediction

Outline

What are branches?
Reducing branch penalties
Branch prediction
Why is branch prediction necessary?
Branch prediction basics
Issues which affect accurate branch prediction
Examples of real predictors

Page 5: Neural Methods for Dynamic Branch Prediction

Branches

Instructions which can alter the flow of instruction execution in a program

Conditional, direct: if-then-else, for loops (bez, bnez, etc.)
Unconditional, direct: procedure calls (jal), goto (j)
Unconditional, indirect: return (jr), virtual function lookup

Page 6: Neural Methods for Dynamic Branch Prediction

The Context

How can we exploit program behavior to make it go faster?

Remove control dependences

Increase instruction-level parallelism

Page 7: Neural Methods for Dynamic Branch Prediction

An Example

The inner loop of this code executes two statements each time through the loop.

int foo (int w[], bool v[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (v[i])
            sum += w[i];
        else
            sum += ~w[i];
    }
    return sum;
}

Page 8: Neural Methods for Dynamic Branch Prediction

An Example continued

This C++ code computes the same thing with three statements in the loop.

This version is 55% faster on a Pentium 4; the previous version suffered many mispredicted branch instructions.

int foo2 (int w[], bool v[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int a = w[i];
        int b = -(int) v[i];   // 0 if v[i] is false, all ones (-1) if true
        sum += ~(a ^ b);       // yields w[i] when v[i] is true, ~w[i] otherwise
    }
    return sum;
}

Page 9: Neural Methods for Dynamic Branch Prediction

Branch Prediction

To speed up execution, pipelining overlaps the execution of multiple instructions, exploiting parallelism between them.

Conditional branches create a problem for pipelining: the next instruction can't be fetched until the branch has executed, several stages later.

A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path.

Branch predictors must be highly accurate to avoid mispredictions!

Page 10: Neural Methods for Dynamic Branch Prediction

Why Good Branch Prediction Is Necessary

Branches are frequent: 15-25% of instructions
Today's pipelines are deeper and wider
  Higher performance penalty for stalling
  High misprediction penalty

A lot of cycles can be wasted!

Page 11: Neural Methods for Dynamic Branch Prediction

Branch Predictors Must Improve

The cost of a misprediction is proportional to pipeline depth
As pipelines deepen, we need more accurate branch predictors
  Pentium 4 pipeline has 20 stages
  Future pipelines will have > 32 stages

Simulations with SimpleScalar/Alpha

Deeper pipelines allow higher clock rates by decreasing the delay of each pipeline stage

Decreasing the misprediction rate from 9% to 4% results in a 31% speedup for a 32-stage pipeline

Page 12: Neural Methods for Dynamic Branch Prediction

Branch Prediction

Predicting the outcome of a branch

Direction: taken / not taken (direction predictors)
Target address: PC + offset if taken, PC + 4 if not taken (target address predictors)
  Branch Target Address Cache (BTAC) or Branch Target Buffer (BTB)

Page 13: Neural Methods for Dynamic Branch Prediction

Why do we need branch prediction?

Branch prediction:
  Increases the number of instructions available for the scheduler to issue
  Increases instruction-level parallelism (ILP)
  Allows useful work to be completed while waiting for the branch to resolve

Page 14: Neural Methods for Dynamic Branch Prediction

Branch Prediction Strategies

Static: decided before runtime. Examples:
  Always Not Taken
  Always Taken
  Backwards Taken, Forward Not Taken (BTFNT)
  Profile-driven prediction

Dynamic: prediction decisions may change during the execution of the program

Page 15: Neural Methods for Dynamic Branch Prediction

Dynamic Branch Prediction

Performance = f(accuracy, cost of misprediction)

The Branch History Table (BHT) is the simplest scheme
  Also called a branch-prediction buffer
  The lower bits of the branch address index a table of 1-bit values
  Each bit says whether or not the branch was taken last time
  If the branch was taken last time, predict taken again
  Initially, all bits are set to predict taken
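
A minimal sketch of such a 1-bit BHT in C++ (the 4096-entry size and the simple low-order-bit indexing are assumptions for illustration):

#include <cstdint>
#include <vector>

// 1-bit branch history table: one prediction bit per entry.
class OneBitBHT {
    std::vector<uint8_t> table;
public:
    OneBitBHT() : table(4096, 1) {}      // initially predict all branches taken

    bool predict(uint32_t pc) const {
        return table[(pc >> 2) % table.size()] != 0;     // taken if bit is set
    }
    void update(uint32_t pc, bool taken) {
        table[(pc >> 2) % table.size()] = taken ? 1 : 0; // remember last outcome
    }
};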

Page 16: Neural Methods for Dynamic Branch Prediction

Problems: two branches can have the same low-order bits and share an entry.

In a loop, a 1-bit BHT will cause two mispredictions:
  At the end of the loop, when it exits instead of looping as before
  The first time through the loop on the next execution, when it predicts exit instead of looping

LOOP:  LOAD  R1, 100(R2)
       MUL   R6, R6, R1
       SUBI  R2, R2, #4
       BNEZ  R2, LOOP

1-bit Branch History Table

Page 17: Neural Methods for Dynamic Branch Prediction

2-bit Predictor

[State diagram: two "Predict Taken" states and two "Predict Not Taken" states; a taken branch (T) moves the counter toward strongly taken, a not-taken branch (NT) moves it toward strongly not taken.]

This idea can be extended to n-bit saturating counters:
  Increment the counter when the branch is taken
  Decrement the counter when the branch is not taken
  If the counter >= 2^(n-1), predict taken; else predict not taken

Solution: a 2-bit predictor scheme that changes its prediction only if it mispredicts twice in a row.
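
A sketch of a 2-bit saturating counter in C++ (an illustration, not from the original deck):

#include <cstdint>

// 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken.
struct TwoBitCounter {
    uint8_t state = 2;                            // start weakly taken

    bool predict() const { return state >= 2; }

    void update(bool taken) {
        if (taken) { if (state < 3) state++; }    // saturate at 3
        else       { if (state > 0) state--; }    // saturate at 0
    }
};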

Page 18: Neural Methods for Dynamic Branch Prediction

Correlating Branches

Often the behavior of one branch is correlated with the behavior of other branches. Example C code:

if (aa == 2)     // B1
    aa = 0;
if (bb == 2)     // B2
    bb = 0;
if (aa != bb)    // B3
    cc = 4;

If the first two branches are not taken, the third one will be.

B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2

Page 19: Neural Methods for Dynamic Branch Prediction

Correlating Branches – contd.

Hypothesis: recent branches are correlated; that is, behavior of recently executed branches affects prediction of current branch

Idea: record m most recently executed branches as taken or not taken, and use that pattern to select the proper branch history table

In general, an (m,n) predictor records the last m branches to select among 2^m history tables, each with n-bit counters. The old 2-bit BHT is then a (0,2) predictor.
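
A sketch of a (2,2) predictor in C++ (the table size and index bits are assumptions for illustration):

#include <cstdint>

// (2,2) correlating predictor: a 2-bit global history selects one of
// 2^2 = 4 tables of 2-bit saturating counters.
struct CorrelatingPredictor {
    static const int ENTRIES = 1024;     // per-table entries (assumed)
    uint8_t counters[4][ENTRIES] = {};
    uint8_t ghr = 0;                     // last 2 branch outcomes

    bool predict(uint32_t pc) const {
        return counters[ghr][(pc >> 2) % ENTRIES] >= 2;
    }
    void update(uint32_t pc, bool taken) {
        uint8_t &c = counters[ghr][(pc >> 2) % ENTRIES];
        if (taken) { if (c < 3) c++; } else { if (c > 0) c--; }
        ghr = ((ghr << 1) | (taken ? 1 : 0)) & 0x3;   // shift in the outcome
    }
};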

Page 20: Neural Methods for Dynamic Branch Prediction

Need the target address at the same time as the prediction

Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the target address (if taken)
  Note: must check for a branch address match, since we can't use the wrong branch's target address

Return instruction addresses are predicted with a stack

[BTB diagram: the PC of the instruction in fetch is compared (=?) against the stored branch PCs; on a match, the buffer supplies the predicted PC and a taken / not-taken prediction.]

Page 21: Neural Methods for Dynamic Branch Prediction

Branch Target Buffer

A branch-target buffer (or branch-target cache) stores the predicted target address of branches that are predicted to be taken.

Branches not in the buffer are predicted to be not taken.

The branch-target buffer is accessed during the IF stage, based on the k low-order bits of the branch address.

If the branch target is in the buffer and is predicted correctly, the one-cycle stall is eliminated.
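
A minimal BTB lookup sketch in C++ (the entry count and index bits are assumptions for illustration):

#include <cstdint>

// Direct-mapped branch-target buffer with 2^6 = 64 entries (assumed).
struct BTBEntry {
    uint32_t branch_pc = 0;   // tag: the full PC of the branch
    uint32_t target_pc = 0;   // predicted target if taken
    bool     valid     = false;
};

struct BTB {
    static const int K = 6;   // index with the k = 6 low-order bits
    BTBEntry entries[1 << K];

    // On a hit, returns true and the predicted target; a miss means
    // the branch is predicted not taken.
    bool lookup(uint32_t pc, uint32_t &target) const {
        const BTBEntry &e = entries[(pc >> 2) & ((1 << K) - 1)];
        if (e.valid && e.branch_pc == pc) {   // must match the full branch PC
            target = e.target_pc;
            return true;
        }
        return false;
    }
    void insert(uint32_t pc, uint32_t target) {
        BTBEntry &e = entries[(pc >> 2) & ((1 << K) - 1)];
        e.branch_pc = pc;
        e.target_pc = target;
        e.valid = true;
    }
};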

Page 22: Neural Methods for Dynamic Branch Prediction

Branch Predictor Accuracy

Larger tables and smarter organizations yield better accuracy
Longer histories provide more context for finding correlations
  But table size is exponential in history length
  The cost is increased access delay and chip area

Page 23: Neural Methods for Dynamic Branch Prediction

Alpha 21264

8-stage pipeline, 7-cycle mispredict penalty

64 KB, 2-way instruction cache with line and way prediction bits (fetch)
  Each 4-instruction fetch block contains a prediction for the next fetch block

Hybrid predictor (fetch)
  12-bit GAg (4K-entry PHT, 2-bit counters)
  10-bit PAg (1K-entry BHT, 1K-entry PHT, 3-bit counters)

Page 24: Neural Methods for Dynamic Branch Prediction

Ultra Sparc III

14-stage pipeline, branch prediction accessed in instruction fetch stages 2-3

16K-entry gshare predictor with 2-bit counters
  XORs PC bits (except the 3 low-order bits) with the global history register to reduce aliasing

Miss queue
  Halves the mispredict penalty by providing instructions for immediate use
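
A sketch of the gshare index computation in C++ (the exact bit choices are assumptions; the 14-bit index matches a 16K-entry table):

#include <cstdint>

// Gshare indexing: XOR the PC with the global history register so that
// different branches with similar histories map to different counters.
uint32_t gshare_index(uint32_t pc, uint32_t ghr) {
    const uint32_t MASK = (1u << 14) - 1;   // 16K-entry table
    return ((pc >> 3) ^ ghr) & MASK;        // skip the 3 low-order PC bits
}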

Page 25: Neural Methods for Dynamic Branch Prediction

Pentium III

Dynamic branch prediction
  512-entry BTB predicts direction and target; a 4-bit history is used with the PC to derive the direction
Static branch predictor for BTB misses

Branch penalties:
  Not taken: no penalty
  Correctly predicted taken: 1 cycle
  Mispredicted: at least 9 cycles, as many as 26; average 10-15 cycles

Page 26: Neural Methods for Dynamic Branch Prediction

AMD Athlon K7

10-stage integer pipeline, 15-stage floating-point pipeline; predictor accessed in fetch

2K-entry bimodal predictor, 2K-entry BTB

Branch penalties:
  Correctly predicted taken: 1 cycle
  Mispredict penalty: at least 10 cycles

Page 27: Neural Methods for Dynamic Branch Prediction

Applying Machine Learning to Branch Prediction

Page 28: Neural Methods for Dynamic Branch Prediction

Branch Prediction is a Machine Learning Problem

So why not apply a machine learning algorithm?
  Replace 2-bit counters with a more accurate predictor

Tight constraints on the prediction mechanism
  Must be fast and small enough to work as a component of a microprocessor

Artificial neural networks
  A simple model of the neural networks in brain cells
  Learn to recognize and classify patterns
  Most neural nets are slow and complex relative to tables
  For branch prediction, we need a small and fast neural method

Page 29: Neural Methods for Dynamic Branch Prediction

A Neural Method for Branch Prediction

Several neural methods were investigated

Most were too slow, too big, or not accurate enough

The perceptron [Rosenblatt '62, Block '62]

Very high accuracy for branch prediction

Prediction and update are quick relative to other neural methods

Sound theoretical foundation: the perceptron convergence theorem

Proven to work well for many classification problems

Page 30: Neural Methods for Dynamic Branch Prediction

Branch-Predicting Perceptron

Inputs (x's) are taken from the branch history register
Weights (w's) are small integers learned by on-line training
Output (y) gives the prediction: the dot product of the x's and w's
Training finds correlations between history and outcome
w0 is a bias weight, independent of the history
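
A sketch of the output computation in C++ (the history length and 8-bit weights are assumptions; history bits are treated as +1 for taken and -1 for not taken):

#include <cstdint>

const int H = 32;   // history length (assumed for illustration)

// y = w0 + sum(wi * xi), with each history bit read as +1 or -1.
int perceptron_output(const int8_t w[H + 1], const bool history[H]) {
    int y = w[0];                             // w0: the bias weight
    for (int i = 0; i < H; i++)
        y += history[i] ? w[i + 1] : -w[i + 1];
    return y;                                 // y >= 0 means predict taken
}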

Page 31: Neural Methods for Dynamic Branch Prediction

Training Algorithm
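
The slide's figure is not reproduced in this transcript. Roughly, the published rule adjusts the weights only on a misprediction or when |y| falls below a threshold; a hedged C++ sketch (the threshold formula is the paper's empirical choice, and the 8-bit weight saturation is an assumption):

#include <cstdint>
#include <cstdlib>

const int H = 32;                         // history length (assumed)
const int THETA = (int)(1.93 * H + 14);   // training threshold

// Train on the resolved outcome: t = +1 if taken, -1 if not taken.
void train(int8_t w[H + 1], const bool history[H], int y, bool taken) {
    int t = taken ? 1 : -1;
    if ((y >= 0) != taken || std::abs(y) <= THETA) {
        for (int i = 0; i <= H; i++) {
            int x = (i == 0) ? 1 : (history[i - 1] ? 1 : -1);  // bias input is +1
            int v = w[i] + t * x;                              // wi += t * xi
            if (v > 127) v = 127; else if (v < -128) v = -128; // saturate (assumed)
            w[i] = (int8_t)v;
        }
    }
}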

Page 32: Neural Methods for Dynamic Branch Prediction

Training Perceptrons

W', the new weight vector, might be a worse set of weights for some other training example, so it is not obvious that this is a useful algorithm.

Perceptron Convergence Theorem:

If any set of weights exists that correctly classifies a finite, linearly separable set of training examples, then perceptron learning will arrive at a (possibly different) set of weights that also correctly classifies all of the examples, after a finite number of weight updates.

Page 33: Neural Methods for Dynamic Branch Prediction

Linear Separability

A limitation of perceptrons is that they are only capable of learning linearly separable functions

A boolean function over variables x1..n is linearly separable iff there exist values for w0..n such that all the true instances can be separated from all the false instances by the hyperplane defined by the solution of:

w0 + Σ(i=1..n) xi wi = 0

i.e., if n = 2, the hyperplane is a line.

Page 34: Neural Methods for Dynamic Branch Prediction

Linear Separability – contd.

Example: a perceptron can learn the logical AND of two inputs, but not XOR.

A perceptron can still give good predictions for inseparable functions, but it will not achieve 100% accuracy. In contrast, a two-level PHT (pattern history table) scheme like gshare can learn any boolean function if given enough time.
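
A worked check (not from the original slides): with inputs in {0,1}, the integer weights w0 = -3, w1 = w2 = 2 realize AND, since w0 + x1·w1 + x2·w2 is positive only when both inputs are 1. No weights realize XOR: the constraints w0 < 0, w0 + w1 > 0, w0 + w2 > 0, and w0 + w1 + w2 < 0 are contradictory, because adding the middle two gives 2w0 + w1 + w2 > 0 while adding the first and last gives 2w0 + w1 + w2 < 0.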

Page 35: Neural Methods for Dynamic Branch Prediction

Putting it all together: the perceptron-based predictor

1. The branch address is hashed into the table of perceptrons.

2. The ith perceptron is fetched into a vector register, P0..n, of weights.

3. The value of y is computed as the dot product of P and the global history register.

4. The branch is predicted not taken if y is negative, or taken otherwise.

5. Once the branch is resolved, the outcome is used by the training algorithm to update P.

6. P is written back to the ith entry in the table.
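
Folding the output and training routines into one self-contained C++ sketch of steps 1-6 (table size, history length, hash, and weight width are all assumptions for illustration):

#include <cstdint>
#include <cstdlib>

const int H = 32;                         // history length (assumed)
const int N = 1024;                       // number of perceptrons (assumed)
const int THETA = (int)(1.93 * H + 14);   // training threshold

struct PerceptronPredictor {
    int8_t weights[N][H + 1] = {};        // per-perceptron weights, w0 = bias
    bool   ghr[H] = {};                   // global history register

    int output(int idx) const {          // steps 2-3: fetch P, dot product
        int y = weights[idx][0];
        for (int i = 0; i < H; i++)
            y += ghr[i] ? weights[idx][i + 1] : -weights[idx][i + 1];
        return y;
    }

    bool predict(uint32_t pc, int &idx, int &y) {
        idx = (pc >> 2) % N;              // step 1: hash the branch address
        y   = output(idx);
        return y >= 0;                    // step 4: taken iff y is non-negative
    }

    void update(int idx, int y, bool taken) {     // steps 5-6
        int t = taken ? 1 : -1;
        if ((y >= 0) != taken || std::abs(y) <= THETA) {
            for (int i = 0; i <= H; i++) {
                int x = (i == 0) ? 1 : (ghr[i - 1] ? 1 : -1);
                int v = weights[idx][i] + t * x;
                if (v > 127) v = 127; else if (v < -128) v = -128;
                weights[idx][i] = (int8_t)v;      // write P back to the table
            }
        }
        for (int i = H - 1; i > 0; i--)   // shift the outcome into the history
            ghr[i] = ghr[i - 1];
        ghr[0] = taken;
    }
};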

Page 36: Neural Methods for Dynamic Branch Prediction

Organization of the Perceptron Predictor

Keeps a table of perceptrons, indexed by branch address
Inputs are from the branch history register
Predict taken if the output is >= 0; otherwise predict not taken

Key intuition: table size isn't exponential in history length, so we can consider much longer histories

Page 37: Neural Methods for Dynamic Branch Prediction

Results and Analysis for the Perceptron Predictor

Page 38: Neural Methods for Dynamic Branch Prediction

Results: Predictor Accuracy

The perceptron outperforms a competitive hybrid predictor by 36% at a ~4 KB hardware budget: a 1.71% misprediction rate vs. 2.66%

Page 39: Neural Methods for Dynamic Branch Prediction

Results: Large Hardware Budgets

The multi-component hybrid was the most accurate fully dynamic predictor known in the literature [Evers 2000]

Perceptron predictor is even more accurate

Page 40: Neural Methods for Dynamic Branch Prediction

Results: IPC with high clock rate

Pentium 4-like configuration: 20-cycle misprediction penalty, 1.76 GHz
15.8% higher IPC than gshare, 5.7% higher than the hybrid

Page 41: Neural Methods for Dynamic Branch Prediction

Analysis: History Length

The fixed-length path branch predictor can also use long histories [Stark, Evers & Patt '98]

Page 42: Neural Methods for Dynamic Branch Prediction

Analysis: Training Times

Perceptron “warms up” faster

Page 43: Neural Methods for Dynamic Branch Prediction

Future Work and Conclusions

Page 44: Neural Methods for Dynamic Branch Prediction

Future Work with Perceptron Predictor

Let's make the best predictor even better
  Better representation
  Better training algorithm

Latency is a problem
  How can we eliminate the latency of the perceptron predictor?

Page 45: Neural Methods for Dynamic Branch Prediction

Future Work with Perceptron Predictor

Value prediction
  Predict which set of values is likely to be the result of a load operation, to mitigate memory latency

Indirect branch prediction
  Virtual dispatch
  Switch statements in C

Page 46: Neural Methods for Dynamic Branch Prediction

Future Work: Characterizing Predictability

Branch predictability, value predictability

How can we characterize algorithms in terms of their predictability?

Given an algorithm, how can we transform it so that its branches and values are easier to predict?

How much predictability is inherent in the algorithm, and how much is an artifact of the program structure?

How can we compare different algorithms' predictability?

Page 47: Neural Methods for Dynamic Branch Prediction

Conclusions

Neural predictors can improve performance for deeply pipelined microprocessors

Perceptron learning is well-suited for microarchitectural implementation

There is still a lot of work left to be done on the perceptron predictor in particular and on microarchitectural prediction in general

Page 48: Neural Methods for Dynamic Branch Prediction

The End