2/15/2006 "Software-Hardware Cooperative Memory Disambiguation", Alok Garg, HPCA 2006 1

Software-Hardware Cooperative Memory Disambiguation

Ruke Huang, Alok Garg, and Michael Huang

Department of Electrical & Computer Engineering

University of Rochester

Motivation

- Hiding long latencies
  - Scaling up of many structures
  - Complex, hard to design
  - Consumes more energy
  - Slower
- Inefficiency in hardware
  - Meticulously keep track of all instructions
  - No prior knowledge of out-of-order execution
  - Simply cross-compare all loads and stores

[Figure: results plotted against LQ size, for ROB size: 320, SQ size: 48, LQ size: 48; annotated value: 16%]

Software Assistance

- Global information
- Statically identify non-conflicting memory accesses
- Advantages
  - Reduced resource pressure
  - Energy savings
- Loads not requiring memory disambiguation
  - On average, 43% of dynamic loads in SPEC FP applications

Recent Research

- Chrysos and Emer (ISCA'98)
- Sethumadhavan et al. (MICRO'03)
- Park et al. (MICRO'03)
- Baugh and Zilles (PACC'04)
- Akkary et al. (MICRO'03)
- Gandhi et al. (ISCA'05), etc.

Hardware-only: Provisioning, re-occurring overhead

Cooperative: Consumption, one-time overhead

Outline

- Cooperative Memory Disambiguation
- Framework
- Evaluation
- Conclusion

Cooperative Memory Disambiguation - Resource-Effective Approach

- 90% of dynamic loads do not communicate with in-flight stores
- Many loads do not require memory disambiguation resources
- Safe loads: a software analyzer can identify them
  - Can exploit hardware-specific information
- Hardware resources only for non-safe loads

int A[1000], B[1000];

void VecAdd() {
    for (int i = 0; i < 1000; i++)
        A[i] = A[i] + B[i];
}
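
As a rough illustration of why the loads in VecAdd qualify as safe, the sketch below brute-forces a check that a static analyzer could make symbolically: no load in iteration i reads an address written by a store from the preceding iterations. The function name count_inflight_conflicts and the use of a fixed iteration window as a stand-in for "in flight" are assumptions made for this sketch; they do not come from the presentation.

```c
#include <assert.h>
#include <stddef.h>

#define N 1000
static int A[N], B[N];

/* Count loads in the VecAdd loop that read an address written by a store
 * from one of the previous `window` iterations. The window is a simplified
 * stand-in for "stores still in flight" (an assumption of this sketch). */
size_t count_inflight_conflicts(size_t window)
{
    assert(window > 0);
    size_t conflicts = 0;
    for (size_t i = 0; i < N; i++) {
        /* Iteration i loads A[i] and B[i]. */
        const int *loads[2] = { &A[i], &B[i] };
        size_t first = (i > window) ? i - window : 0;
        /* Older iterations j < i each store to A[j]. */
        for (size_t j = first; j < i; j++) {
            const int *store = &A[j];
            for (int k = 0; k < 2; k++) {
                if (loads[k] == store)
                    conflicts++;
            }
        }
    }
    return conflicts;
}
```

Calling count_inflight_conflicts with any window size returns 0 for this loop, which is the property that lets both loads skip the load queue.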

Cooperative Memory Disambiguation Framework

- Software-hardware interface
  - Decoupled ISA (no compatibility obligations)
- Software support
  - Binary-to-binary translator: alto (Muth et al.)
  - Binary analyzer
    - Identify read-only data loads
    - Identify other general safe loads
- Architectural support
  - Light-weight

[Figure: framework diagram with the labels: source, compiler, compilation, original binary, ISA, hardware-specific translator, translator, extended instruction set, hardware-specific internal binary, hardware]

General Safe Loads

- Scope of parser analysis
  - Steady state loop
  - No internal control flow
- Limited in-flight instructions
  - ROB size, store queue size

[Figure: a simple loop body (... Load, Load ... Store, Branch) and the instruction window during steady-state loop execution, holding the stores of iterations i-2 and i-1 and the load and store of iteration i]

General Safe Loads (Cont.) - Real example from a SPEC FP application

0x120033140: ldl   r31, 256(r3)     ; prefetch
0x120033144: ldt   f21, 0(r3)       ; Ld1
0x120033148: lda   r27, -2(r27)     ; r27 = r27-2
0x12003314c: lda   r3, 16(r3)       ; r3 = r3+16
0x120033150: ldt   f22, -8(r3)      ; Ld2
0x120033154: ldt   f23, 0(r11)      ; Ld3 (labeled Ld2 in the address analysis and the modified code)
0x120033158: cmple r27, 0x1, r1     ;
0x12003315c: lda   r11, 16(r11)     ; r11 = r11+16
0x120033160: ldt   f24, -8(r11)     ; Ld4
0x120033164: lds   f31, 240(r11)    ; prefetch
0x120033168: mult  f20, f21, f21    ;
0x12003316c: mult  f20, f22, f22    ;
0x120033170: addt  f23, f21, f21    ;
0x120033174: addt  f24, f22, f22    ;
0x120033178: stt   f21, -16(r11)    ; St1
0x12003317c: stt   f22, -8(r11)     ; St2
0x120033180: beq   r1, 0x120033140  ;

One loop from galgel

AddrLd1 = _R3 + 16*i

AddrLd2 = _R11 + 16*i

AddrSt1 = _R11 + 16*i
AddrSt2 = _R11 + 16*i + 8

Analysis window: 16 iterations

Address range of in-flight stores = _R11 + (i-16)*16 to _R11 + (i-1)*16 + 8

Ld2 is statically determined to be safe

Ld1 needs run-time evaluation
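
The address-range reasoning above can be sketched in C. This is only an illustration under stated assumptions: 8-byte accesses, the 16-byte stride and 16-iteration window from the slide, and hypothetical names (range_t, inflight_store_range, load_is_safe) that do not appear in the presentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [lo, hi). */
typedef struct { int64_t lo, hi; } range_t;

/* Bytes written by St1 and St2 during the `window` iterations that precede
 * iteration i. r11 stands for _R11, the store base at loop entry. */
range_t inflight_store_range(int64_t r11, int64_t i, int64_t window)
{
    assert(window > 0);
    range_t r;
    r.lo = r11 + (i - window) * 16;   /* first byte of the oldest St1 */
    r.hi = r11 + (i - 1) * 16 + 8 + 8; /* one past the last byte of the newest St2 */
    return r;
}

/* A load is safe when its bytes do not overlap the in-flight store range. */
bool load_is_safe(int64_t load_addr, int64_t load_size, range_t stores)
{
    return load_addr + load_size <= stores.lo || load_addr >= stores.hi;
}
```

With the same base register, Ld2 at _R11 + 16*i starts exactly where the store range ends, so it is safe for every i. Ld1 uses a different base (_R3), so the answer depends on the run-time distance between _R3 and _R11.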

General Safe Loads (Cont.) - Real example from a SPEC FP application

New_entry:
    mark_sq
    if (r3-r11+8>0) or (r3-r11+264<0) then
        cset CR0, 1

0x120033144: sldt  f21, 0(r3), [CR0]       ; Ld1 (safe)

0x12003314c: lda   r3, 16(r3)              ; r3 = r3+16

0x120033154: sldt  f23, 0(r11), [CR_TRUE]  ; Ld2 (safe)
0x120033158: cmple r27, 0x1, r1            ;
0x12003315c: lda   r11, 16(r11)            ; r11 = r11+16

0x120033174: addt  f24, f22, f22           ;
0x120033178: stt   f21, -16(r11)           ; St1
0x12003317c: stt   f22, -8(r11)            ; St2

Modified Code
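
To see what the guard at New_entry computes, the following sketch evaluates the same condition, (r3-r11+8>0) or (r3-r11+264<0), and compares it with a brute-force overlap check over the 16 preceding iterations. The function names are hypothetical, and the 8-byte access size and 8-byte alignment of r3 and r11 are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value the guard would write into CR0 at loop entry (slide's condition). */
bool ld1_safe_at_entry(int64_t r3, int64_t r11)
{
    int64_t d = r3 - r11;
    return (d + 8 > 0) || (d + 264 < 0);
}

/* Brute-force reference: does Ld1 of iteration i overlap any byte stored by
 * St1/St2 in iterations i-16 .. i-1? Each iteration stores 16 bytes. */
bool ld1_overlaps_window(int64_t r3, int64_t r11, int64_t i)
{
    int64_t ld_lo = r3 + 16 * i;
    int64_t ld_hi = ld_lo + 8;
    for (int64_t j = i - 16; j < i; j++) {
        int64_t st_lo = r11 + 16 * j;
        int64_t st_hi = st_lo + 16;
        if (ld_lo < st_hi && ld_hi > st_lo)
            return true;
    }
    return false;
}
```

Because the distance r3-r11 does not change across iterations (both registers advance by 16), evaluating the guard once at loop entry is enough; whenever it holds, the brute-force check finds no overlap. The guard is slightly conservative at the lower boundary.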

Safe stores

- Safe stores
  - A store is safe if it does not communicate with future loads
  - Indirectly discover safe loads
- Un-analyzable store
  - A load is safe if all stores in the SQ are safe
- Summary of safe load detection
  - Simple loop body
  - All stores must be analyzable
  - Address range calculation

[Figure: loop body "... Load (A) ... Store1 (UA) ... Store2 (A) ... Branch" next to the in-flight instructions "... Load (A) ... Store1 (UA) ... Store2 (A) ... Branch ... Load (A) ..."; A and UA presumably mark analyzable and un-analyzable accesses]
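
The rule "a load is safe if all stores in the SQ are safe" can be modeled with a single counter. This is a minimal sketch, not the mechanism from the presentation: the counter-based bookkeeping and every name below (sq_state_t, sq_dispatch_store, sq_retire_store, load_may_skip_disambiguation) are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical store-queue summary: how many non-safe stores are in flight. */
typedef struct { int unsafe_stores_in_sq; } sq_state_t;

/* A store enters the store queue; only non-safe stores are counted. */
void sq_dispatch_store(sq_state_t *sq, bool store_is_safe)
{
    if (!store_is_safe)
        sq->unsafe_stores_in_sq++;
}

/* A store leaves the store queue. */
void sq_retire_store(sq_state_t *sq, bool store_is_safe)
{
    if (!store_is_safe) {
        assert(sq->unsafe_stores_in_sq > 0);
        sq->unsafe_stores_in_sq--;
    }
}

/* The load can bypass disambiguation only while every in-flight store is safe. */
bool load_may_skip_disambiguation(const sq_state_t *sq)
{
    return sq->unsafe_stores_in_sq == 0;
}
```

In this model a single non-safe store in flight forces subsequent loads back onto the normal disambiguation path until that store retires.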

Architectural Support

- Safe loads
  - Boolean condition registers
  - cset (instruction)
- Safe stores
  - Scope marker
- Indirect jumps
  - Flash-reset all condition registers
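
A small software model of the condition registers may help make the three operations concrete: cset writes one register, an indirect jump flash-resets all of them, and a safe load reads the register it names. The register count NUM_CR, the encoding of CR_TRUE as an always-true index, and the function names are assumptions of this sketch; the slides do not specify them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_CR  8       /* hypothetical register count */
#define CR_TRUE NUM_CR  /* hypothetical encoding of the always-true register */

typedef struct { bool cr[NUM_CR]; } cr_file_t;

/* cset: write a Boolean into one condition register. */
void cset(cr_file_t *f, unsigned idx, bool value)
{
    assert(idx < NUM_CR);
    f->cr[idx] = value;
}

/* Flash-reset: clear every condition register, e.g. on an indirect jump. */
void flash_reset(cr_file_t *f)
{
    memset(f->cr, 0, sizeof f->cr);
}

/* A safe load names a condition register; it is treated as safe only if that
 * register is set. CR_TRUE is hardwired to true in this model. */
bool sld_is_safe(const cr_file_t *f, unsigned idx)
{
    if (idx == CR_TRUE)
        return true;
    assert(idx < NUM_CR);
    return f->cr[idx];
}
```

Resetting to false is the conservative direction: after control leaves the analyzed loop in an unexpected way, conditionally safe loads fall back to normal disambiguation.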

Outline

- Cooperative Memory Disambiguation
- Framework
- Evaluation
- Conclusion

Experimental Setup

- Modified SimpleScalar 3.0b simulator
- Wattch to estimate dynamic energy consumption
- SPEC CPU2000 benchmark suite

Breakdown of Safe Loads (FP)

[Figure: breakdown of safe loads for the FP applications; annotated values: 97% and 43%]

Performance Improvement (FP)

[Figure: performance improvement for the FP applications; annotated value: 40/48%]

Breakdown of Safe Loads (INT)

Performance Improvement (INT)

Energy Savings

[Figure: energy savings, with one panel for floating-point applications and one for integer applications]

Conclusions

- Software assistance improves LSQ efficiency
  - Detects 43% of loads as safe on average
  - Average 10% performance gain
- Compiler techniques for optimization of micro-architecture resources
- Future work
  - More powerful static analyzer
  - Manage other micro-architecture resources
    - E.g., register file

Thank you!

Questions?

Support for Coherency

- Hash table: 2-bit
  - Total entries: 512
- Details: http://www.ece.rochester.edu/~mihuang/PAPERS/hpca06tr.pdf

[Figure: Table 1 and Table 2, with the labels "Access bit" and "Invalidation bit"]

Read-Only Data Loads

- Alpha COFF binary header
  - Global pointer (GP)
  - Read-only sections
- Access address calculation
  - Algorithm: extended constant propagation

[Figure: gp = 0x120022000; read-only section with Start: 0x120023000 and End: 0x120024000]
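
Once constant propagation has resolved a load's address to the global pointer plus a known displacement, classifying it reduces to a bounds check against the read-only section. The sketch below uses the gp and section addresses shown above; the type section_t, the function is_read_only_load, and the treatment of End as an exclusive bound are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Read-only section as a half-open range [start, end); treating End as
 * exclusive is an assumption of this sketch. */
typedef struct { uint64_t start, end; } section_t;

/* True when a load of `size` bytes at gp + disp lies entirely inside the
 * read-only section, so no in-flight store can legally write to it. */
bool is_read_only_load(uint64_t gp, int64_t disp, uint64_t size, section_t ro)
{
    assert(size > 0);
    uint64_t addr = gp + (uint64_t)disp;
    return addr >= ro.start && addr + size <= ro.end;
}
```

For example, with gp = 0x120022000, a load at displacement 0x1000 resolves to 0x120023000 and falls inside the section, while a load at displacement 0 does not.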