Optimized Hybrid Scaled Neural Analog Predictor

  • Optimized Hybrid Scaled Neural Analog Predictor

    Daniel A. Jiménez

    Department of Computer Science, The University of Texas at San Antonio

  • Branch Prediction with Perceptrons

  • Branch Prediction with Perceptrons (cont.)

  • SNP/SNAP [St. Amant et al. 2008]

    A version of piecewise linear neural prediction [Jiménez 2005]
    Based on perceptron prediction
    SNAP is a mixed digital/analog version of SNP
    Uses an analog circuit for the costly dot-product operation
    Enables interesting tricks, e.g. scaling
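
    Below is a minimal C sketch (not from the slides) of the underlying perceptron prediction step this builds on: the bias weight plus the dot product of a per-branch weight row with the global history, encoded as ±1, predicts taken when the sum is non-negative. HIST_LEN, history, and perceptron_predict are illustrative names.

        #include <stdbool.h>

        #define HIST_LEN 128   /* global history length (illustrative) */

        /* history[i] is +1 if the ith most recent branch was taken, -1 otherwise */
        static int history[HIST_LEN];

        /* Basic perceptron prediction: bias weight plus the dot product of the
           weight row selected for this branch with the global history.
           Predict taken when the sum is non-negative. */
        bool perceptron_predict(const int *weights, int *out_sum)
        {
            int sum = weights[0];                  /* bias weight */
            for (int i = 0; i < HIST_LEN; i++)
                sum += weights[i + 1] * history[i];
            *out_sum = sum;                        /* magnitude is reused for training */
            return sum >= 0;
        }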

  • Weight Scaling

    Scaling weights by coefficients
    Different history positions have different importance!

  • The Algorithm: Parameters and Variables

    C: array of scaling coefficients
    h: the global history length
    H: a global history shift register
    A: a global array of previous branch addresses
    W: an n × (GHL + 1) array of small integers
    θ: a threshold to decide when to train
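
    One way these parameters and variables might be laid out in C; this is a sketch, and the names, sizes, and types (snp_state, NUM_ROWS, GHL) are assumptions rather than the authors' implementation.

        #define GHL 128        /* h: global history length (illustrative) */
        #define NUM_ROWS 256   /* number of weight rows n (illustrative) */

        typedef struct {
            double      C[GHL + 1];             /* scaling coefficients, one per history position */
            unsigned    H[GHL];                 /* global history shift register (1 = taken) */
            unsigned    A[GHL];                 /* addresses of the previous branches */
            signed char W[NUM_ROWS][GHL + 1];   /* small integer weights; column 0 is the bias */
            int         theta;                  /* threshold that decides when to train */
        } snp_state;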

  • The Algorithm: Making a Prediction

    Weights are selected based on the current branch and the ith most recent branch
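
    A hedged sketch of the prediction step, assuming the hypothetical snp_state layout above: the weight for history position i is chosen by combining the current branch PC with A[i], and each selected weight is scaled by C[i] before summation. The hash is purely illustrative.

        /* Scaled dot product: select each weight by hashing the current PC with
           the address of the ith most recent branch, then scale it by C[i]. */
        double snp_predict(const snp_state *s, unsigned pc, int *taken)
        {
            double sum = s->C[0] * s->W[pc % NUM_ROWS][0];    /* scaled bias weight */
            for (int i = 1; i <= GHL; i++) {
                unsigned row = (pc ^ s->A[i - 1]) % NUM_ROWS; /* illustrative hash */
                int h = s->H[i - 1] ? 1 : -1;                 /* history bit as +/-1 */
                sum += s->C[i] * s->W[row][i] * h;
            }
            *taken = (sum >= 0);
            return sum;    /* |sum| is compared against theta when training */
        }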

  • The Algorithm: Training

    If the prediction is wrong or |output| ≤ θ, then for the ith correlating weight used to predict this branch:
    Increment it if the branch outcome = outcome of the ith branch in history
    Decrement it otherwise
    Increment the bias weight if the branch is taken
    Decrement it otherwise
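
    A sketch of this training rule, again assuming the hypothetical snp_state layout: train only when the prediction was wrong or |output| did not exceed θ, moving each selected weight toward agreement with the outcome and saturating at the small-integer limits (WMAX and WMIN are assumed bounds).

        #define WMAX  63
        #define WMIN -64

        static void saturating_update(signed char *w, int up)
        {
            if (up) { if (*w < WMAX) (*w)++; }
            else    { if (*w > WMIN) (*w)--; }
        }

        void snp_train(snp_state *s, unsigned pc, double output, int predicted, int taken)
        {
            double mag = output < 0 ? -output : output;
            if (predicted != taken || mag <= s->theta) {
                /* bias weight: move toward the branch outcome */
                saturating_update(&s->W[pc % NUM_ROWS][0], taken);
                for (int i = 1; i <= GHL; i++) {
                    unsigned row = (pc ^ s->A[i - 1]) % NUM_ROWS; /* same selection as prediction */
                    /* increment if the outcome matches the ith history bit, else decrement */
                    saturating_update(&s->W[row][i], (int)s->H[i - 1] == taken);
                }
            }
            /* shifting the new outcome into H and the new address into A is omitted here */
        }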

  • SNP/SNAP Datapath

  • Tricks

    Use alloyed [Skadron 2000] global and per-branch history
    Separate table of local perceptrons
    Output from this stage multiplied by an empirically determined coefficient
    Training coefficient vector(s):
    Multiple vectors initialized to f(i) = 1 / (A + B·i)
    Minimum coefficient value determined empirically
    Indexed by branch PC
    Each vector trained with perceptron-like learning on-line
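
    A small sketch of how a training-coefficients vector could be initialized from the slide's f(i) = 1 / (A + B·i) form, with a floor at an empirically chosen minimum; COEFF_A, COEFF_B, and COEFF_MIN are placeholders, not the tuned values.

        #define COEFF_A   1.0    /* placeholder for the tuned constant A */
        #define COEFF_B   0.05   /* placeholder for the tuned constant B */
        #define COEFF_MIN 0.10   /* placeholder minimum coefficient value */

        /* Newer history positions get larger coefficients; older ones decay
           as 1/(A + B*i) down to the empirically chosen floor. */
        void init_coefficients(double *C, int len)
        {
            for (int i = 0; i < len; i++) {
                double c = 1.0 / (COEFF_A + COEFF_B * (double)i);
                C[i] = (c < COEFF_MIN) ? COEFF_MIN : c;
            }
        }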

  • Tricks (2)

    Branch cache: highly associative cache with entries for branch information
    Each entry contains:
    A partial tag for this branch PC
    The bias weight for this branch
    An "ever taken" bit
    A "never taken" bit
    The ever/never bits avoid needless use of weight resources
    The bias weight is protected from destructive interference
    LRU replacement
    >99% hit rate
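
    A sketch of what one branch-cache entry listed above might look like as a C struct; the field widths and the LRU encoding are illustrative assumptions.

        typedef struct {
            unsigned partial_tag : 12;  /* partial tag of the branch PC (width illustrative) */
            signed   bias_weight : 8;   /* bias weight kept here, protected from interference */
            unsigned ever_taken  : 1;   /* set once the branch has ever been taken */
            unsigned never_taken : 1;   /* set while the branch has never been taken */
            unsigned lru         : 3;   /* replacement state (illustrative) */
        } branch_cache_entry;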

  • Tricks (3)

    Hybrid predictor: when the perceptron output is below some threshold:
    If a 2-bit counter gshare predictor has high confidence, use it
    Else use a 1-bit counter PAs predictor
    Multiple θs indexed by branch PC
    Each θ trained adaptively [Seznec 2005]
    Ragged array: not all rows of the matrix are the same size
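
    A sketch of the hybrid selection described above: when the magnitude of the neural output falls below the threshold, fall back to a confident gshare counter or, failing that, to a PAs-style counter. gshare_predict and pas_predict are hypothetical stand-ins, stubbed out here.

        /* Hypothetical stand-ins for the component predictors (stubs only). */
        static int gshare_predict(unsigned pc, int *high_confidence)
        {
            (void)pc; *high_confidence = 0; return 0;  /* real version: 2-bit counters */
        }
        static int pas_predict(unsigned pc)
        {
            (void)pc; return 0;                        /* real version: 1-bit PAs counters */
        }

        int hybrid_predict(unsigned pc, double neural_output, double threshold)
        {
            double mag = neural_output < 0 ? -neural_output : neural_output;
            if (mag >= threshold)
                return neural_output >= 0;   /* neural prediction is confident enough */

            int confident;
            int g = gshare_predict(pc, &confident);
            if (confident)
                return g;                    /* gshare wins when it is highly confident */
            return pas_predict(pc);          /* otherwise fall back to the PAs predictor */
        }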

  • Benefit of Tricks

    Graph shows the effect of one trick in isolation
    Training coefficients yields the most benefit

  • References

    Jiménez & Lin, HPCA 2001 (perceptron predictor)
    Jiménez & Lin, TOCS 2002 (global/local perceptron)
    Jiménez, ISCA 2005 (piecewise linear branch predictor)
    Skadron, Martonosi & Clark, PACT 2000 (alloyed history)
    Seznec, 2005 (adaptively trained threshold)
    St. Amant, Jiménez & Burger, MICRO 2008 (SNP/SNAP)
    McFarling, 1993 (gshare)
    Yeh & Patt, 1991 (PAs)

  • The End

    Speaker notes (Weight Scaling): The major improvement of the SNP algorithm over previous neural prediction algorithms is that the weights are scaled by coefficients. This improvement is made possible by analog techniques that I'll describe in the next part of the talk.

    We scale each weight by a coefficient because we found that different history positions have different importance. On the x-axis is history position, where 0 corresponds to the bias weight and 1 is the previous branch; the graph shows history positions out to 129. The y-axis shows the correlation coefficients between a branch in a particular history position and the outcome of the current branch.

    We found these correlation coefficients by looking at each history position separately and predicting branches based on only the weight in that particular history position. What we see in this graph is that more recent branches are more highly correlated with the outcome of the current branch. The weights themselves partially capture this correlation, but older branches sometimes introduce noise into the prediction result. Scaling weights based on their history position helps eliminate this noise.

    Speaker notes: Removal of ahead pipelining: because current summation replaces digital addition, ahead pipelining is no longer needed. Each history bit does not contribute equally. Hint at the analog details: transistor sizing in the digital-to-analog converters performs the multiplication by the coefficients.