Combining Statistical and Symbolic Simulation
Mark Oskin, Fred Chong and Matthew Farrens
Dept. of Computer Science, University of California at Davis


Dec 31, 2015


Overview

• HLS is a hybrid performance simulator
– Statistical + Symbolic

• Fast

• Accurate

• Flexible


Motivation

[Figure: sensitivity of IPC (axis range 0.80 to 1.15) to branch prediction accuracy, I-cache hit rate, I-cache miss penalty, branch mis-predict penalty, basic block size, and dispatch bandwidth]


Motivation

• Fast simulation
– seconds instead of hours or days
– ideally interactive

• Abstract simulation
– simulate the performance of designs that do not yet exist
– model application characteristics, not applications


Outline

• Simulation technologies and HLS

• From applications to profiles

• Validation

• Examples

• Issues

• Conclusion


Design Flow with HLS

[Figure: design-flow diagram with boxes for cycle-by-cycle simulation, HLS, profile, several design issues, possible solution, and estimate performance]


Traditional Simulation Techniques

• Cycle-by-cycle (SimpleScalar, SimOS, etc.)

+ accurate

– slow

• Native emulation/basic block models (Atom, Pixie)

+ fast, complex applications

– useful to a point (no low-level modifications)


Statistical / Symbolic Execution

• HLS
+ fast (near interactive)

+ accurate / – within regions

+ permits variation of low-level parameters

+ arbitrary design points / – use carefully


HLS: A Superscalar Statistical and Symbolic Simulator

[Figure: HLS block diagram split into a statistical part and a symbolic part. Components: L1 I-cache, L1 D-cache, L2 cache, main memory, branch predictor, fetch unit, out-of-order dispatch unit, out-of-order execution core, out-of-order completion unit]


Workflow

[Figure: workflow diagram with the labels Code, Binary, sim-stat, sim-outorder, app profile, machine-profile, Stat-binary, HLS, and machine-configuration (R10k)]


Machine Configurations

• Number of functional units: integer, floating point, load/store, and branch (I, F, [L, S], B)

• Functional unit pipeline depths

• Fetch, dispatch, and completion bandwidths

• Memory access latencies

• Mis-speculation penalties
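The configuration knobs listed above fit in a small record that a design-space sweep can vary one field at a time. This is a minimal sketch in Python. The class name `MachineConfig`, its field names, and every default value are hypothetical placeholders. They are not HLS's actual input format and not the R10k's parameters.

```python
from dataclasses import dataclass, field


@dataclass
class MachineConfig:
    """Hypothetical record of the machine-configuration knobs listed above.

    Field names and defaults are illustrative placeholders only.
    """
    # Functional units per class: integer, floating point, load/store, branch.
    units: dict = field(default_factory=lambda: {"I": 2, "F": 2, "LS": 1, "B": 1})
    # Pipeline depth (cycles) of each functional-unit class.
    depths: dict = field(default_factory=lambda: {"I": 1, "F": 3, "LS": 2, "B": 1})
    # Instructions per cycle through fetch, dispatch, and completion.
    fetch_width: int = 4
    dispatch_width: int = 4
    completion_width: int = 4
    # Memory access latencies in cycles.
    l1_latency: int = 1
    l2_latency: int = 6
    memory_latency: int = 34
    # Cycles lost on a branch mis-speculation.
    mispredict_penalty: int = 3

    def validate(self) -> None:
        """Raise ValueError if the configuration is internally inconsistent."""
        if set(self.units) != set(self.depths):
            raise ValueError("every unit class needs a pipeline depth")
        values = [*self.units.values(), *self.depths.values(),
                  self.fetch_width, self.dispatch_width, self.completion_width,
                  self.l1_latency, self.l2_latency, self.memory_latency,
                  self.mispredict_penalty]
        if any(v <= 0 for v in values):
            raise ValueError("counts, depths, widths, and latencies must be positive")
        if not self.l1_latency <= self.l2_latency <= self.memory_latency:
            raise ValueError("latencies should not shrink further from the core")
```

A sweep would then build variants such as `MachineConfig(fetch_width=8, dispatch_width=8)` and call `validate()` on each before simulating it.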


Profiles

• Machine profile:
– cache hit rates => ()
– branch prediction accuracy => ()

• Application profile:
– basic block size => (,)
– instruction mix (% of I, F, L, S, B)
– dynamic instruction distance (histogram)

[Figure: percent of total instructions (0 to 50) by instruction type (integer, floating point, load, store, branch), broken down by dynamic dependence distance (none, 1-19, 20-100)]
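The two profiles above can be held as plain records. This is a minimal sketch in Python, and the class and field names are hypothetical. Basic block size is assumed here to be summarized by a mean and a standard deviation.

The dependence-distance buckets (none, 1-19, 20-100) follow the chart. Sampling picks a bucket in proportion to its weight and then draws a distance uniformly inside that bucket. This is one plausible reading of a histogram profile, not necessarily what HLS does.

```python
import random
from dataclasses import dataclass


@dataclass
class MachineProfile:
    # Measured rates for one machine, each a fraction in [0, 1].
    l1_icache_hit: float
    l2_icache_hit: float
    l1_dcache_hit: float
    l2_dcache_hit: float
    branch_accuracy: float

    def check(self) -> None:
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be a fraction in [0, 1]")


@dataclass
class ApplicationProfile:
    # Basic block size, assumed summarized by mean and standard deviation.
    block_size_mean: float
    block_size_std: float
    # Percent of dynamic instructions per type: "I", "F", "L", "S", "B".
    mix: dict
    # Dependence-distance histogram: (low, high) bucket -> percent of instructions;
    # the key None stands for "no dependence".
    dep_histogram: dict

    def check(self) -> None:
        for name, table in (("mix", self.mix), ("dep_histogram", self.dep_histogram)):
            if abs(sum(table.values()) - 100.0) > 1e-6:
                raise ValueError(f"{name} percentages must sum to 100")

    def sample_dependence(self, rng: random.Random):
        """Pick a bucket by weight, then a distance uniformly inside it.

        Returns None when the sampled bucket is "no dependence".
        """
        buckets = list(self.dep_histogram)
        weights = [self.dep_histogram[b] for b in buckets]
        bucket = rng.choices(buckets, weights=weights)[0]
        if bucket is None:
            return None
        low, high = bucket
        return rng.randint(low, high)
```

For example, a histogram `{None: 30, (1, 19): 60, (20, 100): 10}` (made-up numbers) leaves about 30% of sampled instructions without a dependence.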


Statistical Binary

• 100 basic blocks

• Correlated:
– random instruction mix
– random assignment of dynamic instruction distance
– random distribution of cache and branch behaviors


Statistical Binary

load (l1 i-cache, l2 i-cache, l1 d-cache, l2 d-cache, dependence 0)

integer (l1 i-cache, l2 i-cache, dependence 0, dependence 1)

integer (l1 i-cache, l2 i-cache, dependence 0, dependence 1)

branch (l1 i-cache, l2 i-cache, branch-predictor accuracy, dependence 0, dependence 1)

store (l1 i-cache, l2 i-cache, l1 d-cache, l2 d-cache, dependence 0, dependence 1)

load (l1 i-cache, l2 i-cache, l1 d-cache, l2 d-cache, dependence 0)

[Figure callouts: core functional unit requirements; cache behavior during I-fetch; cache behavior during data access; dynamic instruction distance; branch predictor behavior]
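A statistical binary like the listing above can be generated from profile numbers. This is a minimal sketch in Python, and it rests on several assumptions:

- The names `SymbolicInstruction` and `make_statistical_binary` are hypothetical.
- Every basic block ends in a branch, so block size, not the mix, sets branch frequency.
- Block sizes come from a normal distribution.
- Dependence distances are drawn uniformly from 1 to `max_dep` rather than from a measured histogram.
- Following the example listing, loads carry one source dependence and all other instructions carry two.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SymbolicInstruction:
    kind: str                          # functional-unit class: "I", "F", "L", "S", or "B"
    il1_hit: bool                      # L1 I-cache behavior during fetch
    il2_hit: bool                      # L2 behavior if the L1 I-cache misses
    dl1_hit: Optional[bool] = None     # L1 D-cache behavior (loads and stores only)
    dl2_hit: Optional[bool] = None     # L2 behavior if the L1 D-cache misses
    predicted: Optional[bool] = None   # branch predicted correctly? (branches only)
    deps: list = field(default_factory=list)  # dynamic instruction distances to producers


def make_statistical_binary(mix, block_size, rates, max_dep=20, n_blocks=100, seed=0):
    """Build n_blocks basic blocks of symbolic instructions.

    mix:        percent of dynamic instructions per kind, e.g. {"I": 50, "L": 25, ...}
    block_size: (mean, standard deviation) of the basic block size
    rates:      probabilities keyed "il1", "il2", "dl1", "dl2", "bpred"
    """
    rng = random.Random(seed)
    body_kinds = [k for k in mix if k != "B"]  # the branch slot is fixed at block end
    weights = [mix[k] for k in body_kinds]
    blocks = []
    for _ in range(n_blocks):
        size = max(1, round(rng.gauss(*block_size)))
        kinds = rng.choices(body_kinds, weights=weights, k=size - 1) + ["B"]
        block = []
        for kind in kinds:
            ins = SymbolicInstruction(kind,
                                      il1_hit=rng.random() < rates["il1"],
                                      il2_hit=rng.random() < rates["il2"])
            if kind in ("L", "S"):
                ins.dl1_hit = rng.random() < rates["dl1"]
                ins.dl2_hit = rng.random() < rates["dl2"]
            if kind == "B":
                ins.predicted = rng.random() < rates["bpred"]
            # Loads carry one source dependence; other kinds carry two.
            n_src = 1 if kind == "L" else 2
            ins.deps = [rng.randint(1, max_dep) for _ in range(n_src)]
            block.append(ins)
        blocks.append(block)
    return blocks
```

The default of 100 blocks matches the slide. Drawing all behaviors up front means the simulator only reads pre-decided outcomes, which is one way to keep a run fast.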


HLS Instruction Fetch Stage

integer (...)

branch (...)

store (...)

load (...)

integer (...)

branch (...)

load (...)

integer (...)

Similar to conventional instruction fetch:

- has a PC
- has a fetch window
- interacts with caches
- uses a branch predictor
- passes instructions to dispatch

Differences:

- caches and branch predictor are statistical models

Fetches symbolic instructions and interacts with a statistical memory system and branch predictor model.
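The fetch behavior described above can be sketched as a loop in which cache and branch outcomes are random draws against measured rates rather than lookups on real addresses. This is a simplified sketch in Python, and several of its details are assumptions:

- The function name and parameters are hypothetical.
- Each fetch group is one I-cache access, and a miss costs a fixed L1 miss penalty (the L2 is ignored).
- Every basic block has the same size and ends in a branch that is resolved at fetch.

```python
import random


def fetch_cycles(n_instructions, block_size, fetch_width, il1_hit_rate,
                 l1_miss_penalty, branch_accuracy, mispredict_penalty, seed=0):
    """Count cycles for a simplified statistical fetch stage to deliver n_instructions.

    Each cycle fetches up to fetch_width symbolic instructions, stopping at the
    end of the current basic block.
    """
    if block_size < 1 or fetch_width < 1:
        raise ValueError("block_size and fetch_width must be at least 1")
    rng = random.Random(seed)
    cycles = 0
    fetched = 0
    pc_in_block = 0  # offset of the PC within the current basic block
    while fetched < n_instructions:
        cycles += 1
        if rng.random() >= il1_hit_rate:  # statistical I-cache miss stalls fetch
            cycles += l1_miss_penalty
        # Fill the fetch window up to its width, the block end, or the last instruction.
        group = min(fetch_width, block_size - pc_in_block, n_instructions - fetched)
        fetched += group
        pc_in_block += group
        if pc_in_block == block_size:  # reached the branch ending the block
            pc_in_block = 0
            if rng.random() >= branch_accuracy:  # statistical mispredict
                cycles += mispredict_penalty
    return cycles
```

With perfect caches and prediction, `fetch_cycles(40, 8, 4, 1.0, 10, 1.0, 5)` takes one cycle per fetch group. Lowering either rate adds penalty cycles without any address trace.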


Validation - SimpleScalar vs. HLS

Benchmark   SimpleScalar IPC   HLS IPC   Error
perl        1.27               1.32      4.20%
compress    1.18               1.25      5.50%
gcc         0.92               0.96      3.90%
go          0.94               1.01      6.80%
ijpeg       1.67               1.73      3.90%
li          1.62               1.5       7.20%
m88ksim     1.16               1.14      1.50%
vortex      0.87               0.83      5.10%
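The slides do not define the error column. One common definition, assumed here, is the absolute IPC difference relative to the reference IPC, where the reference is SimpleScalar or the R10k. The IPCs in the validation tables are rounded, so recomputing the column from them need not reproduce it exactly. The function name is hypothetical.

```python
def relative_error(reference_ipc, hls_ipc):
    """Percent error of an HLS IPC estimate against a reference IPC (assumed definition)."""
    if reference_ipc <= 0:
        raise ValueError("reference IPC must be positive")
    return abs(hls_ipc - reference_ipc) / reference_ipc * 100.0
```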


Validation - R10k vs. HLS

Benchmark   R10K IPC   HLS IPC   Error
perl        1.01       1.09      7.00%
compress    0.7        0.69      2.60%
gcc         0.93       0.96      3.80%
go          0.9        0.98      0.90%
ijpeg       1.45       1.4       4.00%
li          0.85       0.9       6.00%
m88ksim     1.15       1.15      0.10%
vortex      0.83       0.82      1.00%


HLS Multi-Value Validation with SimpleScalar (Perl)

[Figure: iso-IPC contours from HLS and SimpleScalar over branch prediction accuracy (0.80 to 1.00) and L1 instruction cache hit rate (0.80 to 1.00)]


HLS Multi-Value Validation with SimpleScalar

[Figure (Xlisp): iso-IPC contours from HLS and SimpleScalar over L1 instruction cache hit rate (0.80 to 1.00) and L1 instruction cache miss penalty (2 to 20)]


Example use of HLS

[Figure (Perl): iso-IPC contours over branch prediction accuracy (0.80 to 1.00) and basic block size (10 to 50)]

An intuitive result: branch prediction accuracy becomes less important as basic block size increases (a sweep across accuracy crosses fewer iso-IPC contour lines).
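The intuition behind this result can be checked with a toy first-order model. This is not HLS but a hypothetical analytic model with three assumptions:

- There is one branch per basic block.
- The base CPI is fixed.
- Every mispredict costs the same number of cycles.

Under these assumptions, branch stall cycles per instruction are (1 - accuracy) × penalty / block size. All parameter values are arbitrary.

```python
def toy_ipc(branch_accuracy, block_size, base_cpi=0.75, mispredict_penalty=6):
    """Hypothetical first-order model: IPC = 1 / (base CPI + branch stall CPI).

    With one branch per basic block, branches per instruction = 1 / block_size.
    """
    branch_cpi = (1.0 - branch_accuracy) * mispredict_penalty / block_size
    return 1.0 / (base_cpi + branch_cpi)


def accuracy_sensitivity(block_size, low=0.80, high=1.00):
    """IPC gained by raising branch accuracy from low to high at a fixed block size."""
    return toy_ipc(high, block_size) - toy_ipc(low, block_size)


# Sweep block sizes over the slide's range.
for size in (10, 20, 30, 40, 50):
    print(size, round(accuracy_sensitivity(size), 3))
```

With these defaults, raising accuracy from 0.80 to 1.00 gains about 0.18 IPC at a block size of 10 but only about 0.04 IPC at a block size of 50. This matches the shrinking number of contour crossings.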


Example use of HLS

[Figure (Perl): iso-IPC contours over basic block size (2 to 20) and dynamic instruction distance (2 to 20)]

Another intuitive result: gains in IPC due to basic block size are front-loaded.

Trade-off between front-end (fetch/dispatch) and back-end (ILP) processor performance
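In a toy first-order model (not HLS), the front-loading follows from the branch-stall term alone. The model assumes one branch per basic block of size b, branch prediction accuracy a, a mispredict penalty of P cycles, and a CPI of CPI_0 without branch stalls:

```latex
\mathrm{CPI}(b) = \mathrm{CPI}_0 + \frac{(1-a)\,P}{b},
\qquad
\mathrm{CPI}(b) - \mathrm{CPI}(b+1)
  = \frac{(1-a)\,P}{b} - \frac{(1-a)\,P}{b+1}
  = \frac{(1-a)\,P}{b\,(b+1)}
```

Growing a block from 2 to 3 instructions removes (1 - a)P/6 from the CPI. Growing it from 10 to 11 removes only (1 - a)P/110, roughly 18 times less. The back-end side of the trade-off (dynamic instruction distance, i.e., ILP) is not captured by this term.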


Example use of HLS

[Figure (Perl): iso-IPC contours over the fraction of value-predicted instructions (0 to 1) and dynamic instruction distance (2 to 20)]

This space intentionally left blank.


Related work

• R. Carl and J. E. Smith. Modeling superscalar processors via statistical simulation. PAID Workshop, June 1998.

• N. Jouppi. The non-uniform distribution of instruction-level and machine parallelism and its effect on performance. IEEE Trans., 1989.

• D. Noonburg and J. Shen. Theoretical modeling of superscalar processor performance. MICRO-27, November 1994.


Questions & Future Directions

• How important are different well-performing benchmarks anyway?
– they are easily summarized
– summaries are not precise, yet they are precise enough
– will the statistical + symbolic technique work for poorly behaved applications?

• Will it extend to deeper pipelines and to other real processors (e.g., Alpha, P6 architecture)?


Conclusion

• HLS: Statistical + Symbolic Execution
– Intuitive design space exploration

• Fast

• Accurate

• Flexible

• Validated against cycle-by-cycle simulation (SimpleScalar) and the R10k

• Future work: deeper pipelines, more hardware validations, additional domains

• Source code at: http://arch.cs.ucdavis.edu/~oskin