Instruction and Data Address Trace Compression
Aleksandar Milenković
(collaborative work with Milena Milenković and Martin Burtscher)
Electrical and Computer Engineering Department
The University of Alabama in Huntsville
Email: [email protected]
Web: http://www.ece.uah.edu/~milenka
http://www.ece.uah.edu/~lacasa
External Trace Unit for Storing/Processing (PC or Intelligent Drive)
Outline
Program Execution Traces
Trace Compression
Trace Compression in Hardware
  Stream caches and predictors for instruction address traces
  Data address stride caches for data address traces
Results
Conclusions
Stream Detector + Stream Cache
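[Figure: Stream detector and stream cache. The stream detector subtracts the previous PC (PPC) from the current PC; when the difference is not 4, the current instruction stream ends and its descriptor (SA, SL), starting address and length, is placed in the Instruction Stream Buffer. The Stream Cache (SC) has NSET sets (0 to NSET - 1) and NWAY ways (0 to NWAY - 1). A descriptor S.SA & S.SL taken from the Instruction Stream Buffer selects a set with iSet = F(S.SA, S.SL) and is compared (=?) with the entries of that set to produce Hit/Miss. On a hit, the index (iSet, iWay) goes to the Stream Cache Index Trace (SCIT). On a miss, the reserved index '00…0' goes to the SCIT and the descriptor (SA, SL) goes to the Stream Cache Miss Trace (SCMT).]
Example: the stream 0x020001f4, 0x020001f8, ..., 0x02000214, i.e. (SA, SL) = (0x020001f4, 0x09), executed in iterations 0 to 99.
  it. 0 (miss): SCMT receives (0x020001f4, 0x09), SCIT receives 0x00
  it. 1 to it. 99 (hit): SCIT receives 0x0E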
SC Itrace Compression
Instruction Stream Buffer size
  Sized so as not to stall the processor (e.g., when there are consecutive very short instruction streams)
Stream cache
  Size
  Associativity
  Replacement policy
  Mapping function
Compress instruction stream
1. Get the next instruction stream record (S.SA, S.SL) from the instruction stream buffer;
2. Lookup in the stream cache with iSet = F(S.SA, S.SL);
3. if (hit)
4.   Emit (iSet & iWay) to SCIT;   // set index concatenated with way index
5. else {
6.   Emit reserved value 0 to SCIT;
7.   Emit stream descriptor (S.SA, S.SL) to SCMT;
8.   Select an entry (iWay) in the iSet set to be replaced;
9.   Update stream cache entry: SC[iSet][iWay].Valid = 1
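The stream cache steps can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the hardware design from the slides: the mapping function `_f`, the round-robin replacement policy, the default sizes, and the encoding of a hit index as `i_set * n_ways + i_way` are all hypothetical choices made for illustration. Because index 0 is the reserved miss marker, the sketch never allocates way 0 of set 0.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    valid: bool = False
    sa: int = 0  # stream starting address
    sl: int = 0  # stream length in instructions


class StreamCache:
    def __init__(self, n_sets: int = 64, n_ways: int = 4):
        assert n_ways >= 2, "way 0 of set 0 is sacrificed for the reserved index"
        self.n_sets, self.n_ways = n_sets, n_ways
        self.sets = [[Entry() for _ in range(n_ways)] for _ in range(n_sets)]
        self.victim = [0] * n_sets  # round-robin replacement (assumed policy)
        self.scit = []  # Stream Cache Index Trace
        self.scmt = []  # Stream Cache Miss Trace

    def _f(self, sa: int, sl: int) -> int:
        # Hypothetical mapping function F; the slides leave F as a design decision.
        return ((sa >> 2) ^ sl) % self.n_sets

    def _select_victim(self, i_set: int) -> int:
        way = self.victim[i_set]
        if i_set == 0 and way == 0:
            way = 1  # keep (set 0, way 0) unused so that index 0 can mean "miss"
        self.victim[i_set] = (way + 1) % self.n_ways
        return way

    def compress(self, sa: int, sl: int) -> bool:
        """Process one stream descriptor; return True on a stream cache hit."""
        i_set = self._f(sa, sl)
        for i_way, entry in enumerate(self.sets[i_set]):
            if entry.valid and entry.sa == sa and entry.sl == sl:
                self.scit.append(i_set * self.n_ways + i_way)  # hit: index only
                return True
        self.scit.append(0)          # miss: reserved index ...
        self.scmt.append((sa, sl))   # ... plus the full descriptor
        i_way = self._select_victim(i_set)
        self.sets[i_set][i_way] = Entry(True, sa, sl)
        return False


# The loop from the slide example: one stream executed 100 times.
sc = StreamCache()
for _ in range(100):
    sc.compress(0x020001F4, 0x09)
```

After the loop, the miss trace holds the single descriptor from iteration 0, and the index trace holds one reserved 0 followed by 99 copies of the same nonzero index. The concrete index value depends on the assumed mapping function, so it will not necessarily match the 0x0E shown on the slide.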
Legend:
  CR(SC.I) – compression ratio
  N – number of instructions
  SL.Dyn – average stream length (dynamic)
  SC.Hit(Nset, Nway) – SC hit rate
Assumptions:
  stream length < 256 (1 byte for SL)
  4 bytes for stream starting address
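$$\mathrm{Size(Dinero.I)} = 4 \cdot N \ \text{Bytes}$$

$$\mathrm{Size(SCIT)} = \frac{N}{SL.Dyn} \cdot \frac{\log_2(N_{SET} \cdot N_{WAYS})}{8} \ \text{Bytes}$$

$$\mathrm{Size(SCMT)} = \frac{N}{SL.Dyn} \cdot (1 - SC.Hit) \cdot 5 \ \text{Bytes}$$

$$CR(SC.I) = \frac{\mathrm{Size(Dinero.I)}}{\mathrm{Size(SCIT)} + \mathrm{Size(SCMT)}} = \frac{4 \cdot SL.Dyn}{\frac{1}{8}\log_2(N_{SET} \cdot N_{WAYS}) + 5 \cdot (1 - SC.Hit)}$$

$$\lim_{SC.Hit \to 1} CR(SC.I) = \frac{32 \cdot SL.Dyn}{\log_2(N_{SET} \cdot N_{WAYS})}$$

$$N_{SET} \cdot N_{WAYS} = 64: \quad \lim_{SC.Hit \to 1} CR(SC.I) \approx 5.33 \cdot SL.Dyn$$

$$N_{SET} \cdot N_{WAYS} = 128: \quad \lim_{SC.Hit \to 1} CR(SC.I) \approx 4.57 \cdot SL.Dyn$$

$$N_{SET} \cdot N_{WAYS} = 256: \quad \lim_{SC.Hit \to 1} CR(SC.I) = 4 \cdot SL.Dyn$$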
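The analytical model for the instruction trace can be evaluated numerically. The short sketch below assumes the record sizes stated in the legend (4 bytes per address in the uncompressed trace, a 5-byte descriptor per miss, and an index of log2(NSET x NWAYS) bits per stream); the function name `cr_sc_i` and the sample input SL.Dyn = 10 are hypothetical.

```python
import math


def cr_sc_i(sl_dyn: float, hit: float, n_sets: int, n_ways: int) -> float:
    """Compression ratio CR(SC.I) of the stream cache, per the analytical model."""
    index_bytes = math.log2(n_sets * n_ways) / 8  # SCIT bytes per stream
    miss_bytes = 5 * (1 - hit)                    # expected SCMT bytes per stream
    return 4 * sl_dyn / (index_bytes + miss_bytes)


# Upper bounds for SC.Hit = 1 and SL.Dyn = 1, matching the limits of the model.
limit_64 = cr_sc_i(1, 1.0, 16, 4)    # 32 / 6
limit_128 = cr_sc_i(1, 1.0, 32, 4)   # 32 / 7
limit_256 = cr_sc_i(1, 1.0, 64, 4)   # 32 / 8

# A 98% hit rate with an 8-bit index and an assumed average stream length of 10.
example = cr_sc_i(10, 0.98, 64, 4)   # 40 / 1.1
```

With a 98% hit rate and an 8-bit index, each stream costs 1 byte in the SCIT but only 0.1 byte on average in the SCMT, which is why the index trace dominates the output size.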
2nd Level Itrace Compression
Size(SCIT) >> Size(SCMT)
  HitRate = 98%, 8-bit index => Size(SCIT) = 10*Size(SCMT)
Redundancy in SCIT
  Temporal and spatial locality of instruction streams
Reduce SCIT trace
  Global Predictor
  N-tuple compression using Tuple History Table
  N-tuple compression using SCIT History Buffer
Global Predictor Structure
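[Figure: Global predictor structure. The indices held in the History Buffer are combined by the function F into pindex, which selects one of the Predictor entries (0 to MaxP-1). The selected entry is compared (==?) with next.sid from the SCIT Trace to produce Hit/Miss. A hit emits '1' and a miss emits '0' to the SCIT PRED Trace; on a miss, next.sid is also written to the SCIT PRED Miss Trace.]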
SCIT Compression
Predict SCIT index
1. Get the incoming index, next.sid, from the SCIT trace;
2. Calculate the SCIT predictor index, pindex, using indices in the History Buffer: pindex = F(indices in the History Buffer);
3. Perform lookup in the SCIT Predictor with pindex;
4. if (SCIT.Predictor[pindex] == next.sid)
5.   Emit('1') to SCIT PRED trace;
6. else {
7.   Emit('0') to SCIT PRED trace;
8.   Emit next.sid to SCIT PRED Miss trace;
9.   SCIT.Predictor[pindex] = next.sid; }
10. Shift next.sid into the History Buffer;
Design Decisions:
  Length of history buffer
  Global predictor
    Size
    Mapping function
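The SCIT prediction steps can be sketched as below. This is an illustrative sketch only: the history length of 2, the table size of 256, and the folding hash used for F are assumptions, since the slides list all three as open design decisions.

```python
class ScitPredictor:
    def __init__(self, history_len: int = 2, max_p: int = 256):
        self.max_p = max_p
        self.history = [0] * history_len  # History Buffer of recent SCIT indices
        self.table = [0] * max_p          # Predictor entries 0 .. MaxP-1
        self.pred_trace = []              # SCIT PRED trace (one bit per index)
        self.miss_trace = []              # SCIT PRED Miss trace

    def _f(self) -> int:
        # Hypothetical mapping function F: fold the history into a table index.
        pindex = 0
        for sid in self.history:
            pindex = (pindex * 31 + sid) % self.max_p
        return pindex

    def compress(self, next_sid: int) -> bool:
        """Process one SCIT index; return True if it was predicted correctly."""
        pindex = self._f()
        hit = self.table[pindex] == next_sid
        if hit:
            self.pred_trace.append(1)
        else:
            self.pred_trace.append(0)
            self.miss_trace.append(next_sid)
            self.table[pindex] = next_sid
        # Shift next_sid into the History Buffer, dropping the oldest index.
        self.history = self.history[1:] + [next_sid]
        return hit


# A repeating pattern of three stream indices, as produced by a loop body
# that cycles through three instruction streams.
predictor = ScitPredictor()
for sid in [5, 9, 14] * 20:
    predictor.compress(sid)
```

With this pattern the predictor misses five times while the history and table warm up, then hits on every remaining index, so the output shrinks to one bit per stream plus the five missed indices.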
Redundancy in SCIT Pred Trace
High predictor hit rates and long runs of 0xFF bytes are expected in Predictor Hit Trace
Use a simple FSM to exploit byte repetitions
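[Figure: Byte-repetition FSM for the predictor hit trace. Each byte of the PRED Hit Trace is compared (=?) with Prev.BYTE, and CNT counts the repetitions. Outputs: SCIT PRED Header and SCIT PRED Repetition Trace.]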
// Detect byte repetitions in SCIT pred
1. Get next SCIT Pred byte, Next.BYTE;
2. if (Next.BYTE == Prev.BYTE) CNT++;
3. else {
4.   if (CNT == 0) {
5.     Emit Prev.BYTE to SCIT.REP.Trace;
6.     Emit '0' to SCIT Header;
7.   } else {
8.     Emit (Prev.BYTE, CNT) pair to SCIT.REP.Trace;
9.     Emit '1' to SCIT Header; }
10.  Prev.BYTE = Next.BYTE; CNT = 0; }
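The byte-repetition detection above can be sketched in software as follows. Several details are assumptions not stated on the slide: the counter saturates at 255 so that it fits in one byte, the last pending byte is flushed when the input ends, and the header is kept as a list of bits. A matching decompressor is included to show that the header is enough to parse the repetition trace.

```python
def compress_repetitions(data: bytes):
    """Return (header_bits, rep_trace) for a byte stream."""
    header, rep_trace = [], []

    def emit(byte: int, cnt: int) -> None:
        if cnt == 0:
            rep_trace.append(byte)         # single byte
            header.append(0)
        else:
            rep_trace.extend((byte, cnt))  # (byte, extra repetitions) pair
            header.append(1)

    if not data:
        return header, rep_trace
    prev, cnt = data[0], 0
    for nxt in data[1:]:
        if nxt == prev and cnt < 255:  # assumed one-byte saturating counter
            cnt += 1
        else:
            emit(prev, cnt)
            prev, cnt = nxt, 0         # restart the count for the new byte
    emit(prev, cnt)                    # flush the final pending byte
    return header, rep_trace


def decompress_repetitions(header, rep_trace) -> bytes:
    out = bytearray()
    it = iter(rep_trace)
    for flag in header:
        byte = next(it)
        cnt = next(it) if flag else 0
        out.extend([byte] * (cnt + 1))
    return bytes(out)


sample = b"\xff\xff\xff\xff\x7f\xff"
sample_header, sample_trace = compress_repetitions(sample)
```

For the sample input, the run of four 0xFF bytes becomes one (0xFF, 3) pair flagged with header bit 1, while the two remaining bytes are emitted as singles flagged with header bit 0.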
Data Address Trace Compression
More challenging task
  Data addresses rarely stay constant during program execution
  However, they often have a regular stride
=> Use Data Address Stride Cache (DASC) to exploit locality of memory referencing instructions and regularity in data address strides
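Data Address Stride Cache (DASC)
DASC
  Tagless structure
  Indexed by PC of the corresponding instruction
  Entry fields
    LDA – Last Data Address
    Stride
[Figure: Data Address Stride Cache. index = G(PC) selects one of N entries (0 to N - 1), each holding LDA and Stride. The current stride DA - LDA is compared (==?) with the stored Stride to produce Stride.Hit. A hit emits '1' to the data trace (DT); a miss emits '0' to DT and the data address DA to the Data Miss Trace (DMT).]
Example values shown in the figure: PC 0x020001f8; data addresses 0xbfffbe24, 0xbfffbe20, 0xbfffbe1c; DT bits 0 0 1; DMT entries 0xbfffbe24, 0xbfffbe20.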
DASC Compression
// Compress data address stream
1. Get the next pair (PC, DA) from the data buffers;
2. Lookup in the data address stride cache: iSet = G(PC);
3. cStride = DA - DASC[iSet].LDA;
4. if (cStride == DASC[iSet].Stride) {
5.   Emit('1') to DT;   // 1-bit info
6. } else {
7.   Emit('0') to DT;
8.   Emit DA to DMT;
9.   DASC[iSet].Stride = lsb(cStride); }
10. DASC[iSet].LDA = DA;
Design Decisions:
  Number of entries
  Index function G
  Stride length
  Data address buffer depth
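The DASC compression steps can be sketched as follows. The index function `_g`, the 256-entry table, and the 8-bit stride field are hypothetical choices standing in for the design decisions listed above. The sketch also assumes that lsb(cStride) keeps the low-order bits as a signed value, so that a stored stride is only ever matched by an identical full stride.

```python
class Dasc:
    def __init__(self, n_entries: int = 256, stride_bits: int = 8):
        self.n_entries = n_entries
        self.stride_bits = stride_bits
        self.lda = [0] * n_entries     # Last Data Address per entry
        self.stride = [0] * n_entries  # last stride per entry (signed, truncated)
        self.dt = []                   # data trace: one bit per memory reference
        self.dmt = []                  # Data Miss Trace: full addresses on misses

    def _g(self, pc: int) -> int:
        # Hypothetical index function G; the structure is tagless.
        return (pc >> 2) % self.n_entries

    def _lsb(self, stride: int) -> int:
        # Keep the low stride_bits bits and interpret them as a signed value.
        value = stride & ((1 << self.stride_bits) - 1)
        if value >> (self.stride_bits - 1):
            value -= 1 << self.stride_bits
        return value

    def compress(self, pc: int, da: int) -> bool:
        """Process one (PC, DA) pair; return True on a stride hit."""
        i_set = self._g(pc)
        c_stride = da - self.lda[i_set]
        hit = c_stride == self.stride[i_set]
        if hit:
            self.dt.append(1)
        else:
            self.dt.append(0)
            self.dmt.append(da)
            self.stride[i_set] = self._lsb(c_stride)
        self.lda[i_set] = da
        return hit


# The values from the DASC figure: one instruction walking down the stack.
dasc = Dasc()
for addr in (0xBFFFBE24, 0xBFFFBE20, 0xBFFFBE1C):
    dasc.compress(0x020001F8, addr)
```

The first two references miss (a cold entry, then a newly observed stride of -4), and the third hits because the stride repeats, giving the DT bits 0 0 1 and the two DMT addresses listed with the figure.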
DASC Dtrace Compression: An Analytical Model
Legend:
  CR(SC.D) – compression ratio
  Nmemref – number of memory referencing instructions
  DASC.Hit – DASC hit rate
Assumptions:
  4 bytes for each data address
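$$\mathrm{Size(Dinero.D)} = N_{memref} \cdot 4 \ \text{B}$$

$$\mathrm{Size(DT)} + \mathrm{Size(DMT)} = N_{memref} \cdot [(1 - DASC.Hit) \cdot 4 + 0.125] \ \text{B}$$

$$CR(SC.D) = \frac{\mathrm{Size(Dinero.D)}}{\mathrm{Size(DT)} + \mathrm{Size(DMT)}} = \frac{1}{1.03125 - DASC.Hit}$$

$$\lim_{DASC.Hit \to 1} CR(SC.D) = \frac{1}{0.03125} = 32$$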
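The data trace model can be checked numerically with a few lines. The sketch assumes the sizes used in the model (4 bytes per address in the uncompressed trace, 1 bit per reference in DT, and a 4-byte address per miss in DMT); the function name `cr_sc_d` and the sample hit rate of 0.9 are hypothetical.

```python
def cr_sc_d(hit: float) -> float:
    """Compression ratio CR(SC.D) of the DASC, per the analytical model."""
    bytes_per_ref = (1.0 - hit) * 4.0 + 0.125  # DMT share plus one DT bit
    return 4.0 / bytes_per_ref


upper_bound = cr_sc_d(1.0)  # every reference hits: only the DT bit remains
at_90_percent = cr_sc_d(0.9)
```

The upper bound of 32 reflects the single DT bit that replaces each 32-bit address. Because each miss costs a full address, the ratio falls quickly as the hit rate drops.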
Redundancy in DT Trace
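[Figure: Byte-repetition FSM for the data trace. Each DT byte is compared (=?) with Prev.DT, and CNT counts the repetitions. Outputs: Data Header (DH) and Data Repetition Trace (DRT).]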
// Detect data repetitions
1. Get next DT byte;
2. if (DT == Prev.DT) CNT++;
3. else {
4.   if (CNT == 0) {
5.     Emit Prev.DT to DRT;
6.     Emit '0' to DH;
7.   } else {
8.     Emit (Prev.DT, CNT) pair to DRT;
9.     Emit '1' to DH; }
10.  Prev.DT = DT; CNT = 0; }
High predictor hit rates and long runs of 0xFF bytes are expected in DT Trace
Use a simple FSM to exploit byte repetitions
Experimental Evaluation
Goals
  Assess the effectiveness of the proposed algorithms
  Explore the feasibility of the proposed hardware implementations
  Determine optimal size and organization of HW structures
Data Address Stride Cache + Byte repetition FSM for data traces
Benefits
  Enabling real-time trace compression with high compression ratio
  Low complexity (small structures, small number of external pins)
Analytical & simulation analysis focusing on compression ratio and optimal sizing/organization of the structures as well as real-time trace port bandwidth requirements
Laboratory for Advanced Computer Architectures and Systems
Computer Security is Critical
  Software & physical attacks
Sign & Verify for Guaranteed Integrity and Confidentiality of Code
Improvements
  PMAC (Parallel MACs) for reduced cryptographic latency
  A variation of the one-time-pad for code encryption
  Instruction Verification Buffer for conditional execution before verification
Buffer overflow in MMClient.exe in Indiatimes Messenger 6.0 allows remote attackers to cause a denial of service (application crash) and possibly execute arbitrary code via a long group name argument to the RenameGroup function in the MMClient.MunduMessenger.1 ActiveX object.
Multiple format string vulnerabilities in (1) neon 0.24.4 and earlier, and other products that use neon including (2) Cadaver, (3) Subversion, and (4) OpenOffice, allow remote malicious WebDAV servers to execute arbitrary code.
Buffer overflow in the JPEG (JPG) parsing engine in the Microsoft Graphic Device Interface Plus (GDI+) component, GDIPlus.dll, allows remote attackers to execute arbitrary code via a JPEG image.
Multiple buffer overflows in RealOne Player, RealOne Player 2.0, RealOne Enterprise Desktop, and RealPlayer Enterprise allow remote attackers to execute arbitrary code via malformed (1) .RP, (2) .RT, (3) .RAM, (4) .RPM or (5) .SMIL files.
Multiple heap-based buffer overflows in the imlib BMP image handler allow remote attackers to execute arbitrary code via a crafted BMP file.
Integer overflow in pixbuf_create_from_xpm (io-xpm.c) in the XPM image decoder for gtk+ 2.4.4 (gtk2) and earlier, and gdk-pixbuf before 0.22, allows remote attackers to execute arbitrary code via certain n_col and cpp values that enable a heap-based buffer overflow.
Stack-based buffer overflow in the URL parsing function in Gaim before 1.3.0 allows remote attackers to execute arbitrary code via an instant message (IM) with a large URL.
Buffer overflow in WIDCOMM Bluetooth Connectivity Software, as used in products such as BTStackServer 1.3.2.7 and 1.4.2.10, Windows XP and Windows 98 with MSI Bluetooth Dongles, and HP IPAQ 5450 running WinCE 3.0, allows remote attackers to execute arbitrary code via certain service requests.