Page 1
TDB: A Source-level Debugger for Dynamically Translated Programs
Naveen Kumar, Bruce Childers
Department of Computer Science
University of Pittsburgh
Pittsburgh, Pennsylvania 15260
{naveen, childers}@cs.pitt.edu
Mary Lou Soffa
Department of Computer Science
University of Virginia
Charlottesville, Virginia 22904
Page 2
New execution vehicle
• New execution vehicle for:
  – Software security, bug isolation, simulations, dynamic optimizations, …
• Software Dynamic Translation (SDT)
  – A layer between the application program and the host machine
  – Intercepts and modifies instructions before they execute
[Figure: the Dynamic Translator sits between the Application Binary and the CPU]
• Goal: debug the application while keeping the SDT layer transparent to the user
Page 3
Challenges to debugging
• Static debug information is inconsistent
  – Code is generated and modified during execution
  – Code duplication at run-time
• Transparency of dynamic translation
  – Hide the SDT system
  – Hide the effects of dynamic translation (code modifications) on the translated code
Page 4
Our approach
• A debug architecture for debugging dynamically translated programs
• Dynamic debug mappings
  – Relate untranslated code with translated code
  – Techniques to generate these mappings at run-time for different kinds of translation operations
• Extensibility: support different uses of SDT
Page 5
Outline
• Background
• Debug Architecture
• Debug Mappings
• Implementation
• Experimental Results
• Summary
Page 6
Background
• Software Dynamic Translation (SDT)
• Primary tasks
  – Fetch application instructions
  – Decode
  – Translate (modify/instrument)
  – Emit translated code into a code cache
[Figure: the SDT system reads the Application Binary in a loop (Next PC, Fetch, Decode, Translate, Emit) and emits into the Code Cache, which executes on the Host CPU]
Page 7
SDT Direct Execution & Cache
[Figure: the Translator fetches a fragment from the Program Code, emits it into the Code Cache, and is re-entered when execution reaches a branch trampoline]
Program Code:
  ld [ %o1 ], %o0
  call 0x26a70c
  nop
  …
  mov %g0, %o0
  be 0x26a77c
  …
Code Cache:
  ld [ %o1 ], %o0
  sethi hi(0x50400), %o7
  or %o7, 0x288, %o7
  …
  branch trampoline
Translator loop:
  1. Fetch code fragment until end-of-fragment condition
  2. Execute code fragment until branch trampoline
  3. Re-enter the translator
Translation operations:
• Regular Operation: one instruction translates into exactly one instruction in the code cache
• Many Operation: one instruction results in more than one translated instruction
• Delete Operation: translation of an instruction results in zero instructions
• Trampoline Operation: translation of a branch results in a set of instructions that invoke the translator
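The four operation kinds can be illustrated with a toy translator. This is only a sketch under stated assumptions: `CodeCache`, `translate`, and the rule that picks an operation for each opcode are hypothetical and are not taken from Strata. The `sethi`/`or` pair for `call` follows the figure; the trampoline body is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class CodeCache:
    base: int                                   # address of the first slot
    slots: list = field(default_factory=list)   # emitted instructions, 4 bytes each

    def emit(self, text):
        """Append one instruction and return the address it was placed at."""
        addr = self.base + 4 * len(self.slots)
        self.slots.append(text)
        return addr


def translate(instr):
    """Return (operation kind, instructions to emit). Toy rules, assumed."""
    op = instr.split()[0]
    if op == "nop":
        return "delete", []                     # zero translated instructions
    if op == "call":
        # more than one translated instruction, as in the figure
        return "many", ["sethi hi(0x50400), %o7", "or %o7, 0x288, %o7"]
    if op == "be":
        # placeholder trampoline: instructions that hand control to the translator
        return "trampoline", ["<save context>", "<jump to translator>"]
    return "regular", [instr]                   # copied one-to-one


program = ["ld [ %o1 ], %o0", "call 0x26a70c", "nop", "be 0x26a77c"]
cache = CodeCache(base=0xF1800C8)
kinds = []
for instr in program:
    kind, emitted = translate(instr)
    for text in emitted:
        cache.emit(text)
    kinds.append((kind, len(emitted)))
```

Each entry in `kinds` pairs the operation with the number of instructions it emitted, which is exactly the property that distinguishes the four operations.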
Page 9
Debug Architecture
[Figure: the Debug Engine (Mapping Generator, Mapper, Breakpoint Manager, Mapping Repository, Breakpoint Repository) connects the Native Debugger to the Application, the SDT System, and the Code Cache]
Page 13
Debug Engine
Components:
• Mapping Generator
• Mapper
• Breakpoint Manager
• Mapping Repository
• Breakpoint Repository
Interactions:
• Translation information arrives from the SDT system
• mapAddress or writeValue requests arrive from the Native Debugger
• Read/write into the Code Cache
• Insert or delete breakpoints
• Breakpoint exception: notify the native debugger
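The interactions above can be sketched as a small in-memory model. Everything here is an assumption made for illustration: the class, its method names (other than the `mapAddress` request named on the slide, modeled as `map_address`), and the `"BREAK"` marker are hypothetical and do not reflect TDB's actual code.

```python
class DebugEngine:
    """Toy model of the five components; not TDB's real implementation."""

    def __init__(self, code_cache):
        self.code_cache = code_cache   # translated address -> instruction text
        self.mappings = {}             # Mapping Repository: untranslated -> translated
        self.breakpoints = {}          # Breakpoint Repository: translated -> saved instruction
        self.notified = []             # untranslated PCs reported to the native debugger

    # Mapping Generator: consumes translation information from the SDT system.
    def record_translation(self, untranslated, translated):
        self.mappings[untranslated] = translated

    # Mapper: answers mapAddress requests from the native debugger.
    def map_address(self, untranslated):
        return self.mappings[untranslated]

    # Breakpoint Manager: patch the code cache and remember the original.
    def insert_breakpoint(self, untranslated):
        t = self.map_address(untranslated)
        self.breakpoints[t] = self.code_cache[t]
        self.code_cache[t] = "BREAK"

    def delete_breakpoint(self, untranslated):
        t = self.map_address(untranslated)
        self.code_cache[t] = self.breakpoints.pop(t)

    # Breakpoint exception at a translated PC: report the untranslated PC.
    def on_breakpoint_exception(self, translated_pc):
        for u, t in self.mappings.items():
            if t == translated_pc:
                self.notified.append(u)
                return u
        raise KeyError(translated_pc)


cache = {0xF1800C8: "ld [ %o1 ], %o0"}
engine = DebugEngine(cache)
engine.record_translation(0x50684, 0xF1800C8)
engine.insert_breakpoint(0x50684)
patched = cache[0xF1800C8]
reported_pc = engine.on_breakpoint_exception(0xF1800C8)
engine.delete_breakpoint(0x50684)
```

The native debugger only ever sees untranslated addresses (`0x50684`); the translated address is confined to the engine and the code cache.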
Page 15
Dynamic Debug Mappings
• The debug engine generates and uses debug information in the form of mappings
• Mappings are used to implement debug commands
• Mapping types
  – U-T: relates untranslated code to translated code
  – T-T: relates translated code to translated code
  – T-U: relates translated code to untranslated code
• Mappings are generated according to the kind of translation operation (regular, many, etc.)
Page 16
Regular Operation (copy an instruction to code cache)
Program locations:
  50684: ld [ %o1 ], %o0
  50688: call 0x26a70c
  5068c: nop
  ……
  26a70c: mov %o0, %o1
  26a710: andcc %o1, 3, %o3
  26a714: be 0x26a77c
  26a718: mov %g0, %o0
  ......
Translated locations:
  f1800c8: ld [ %o1 ], %o0
U-T Mappings:
  1. 50684 {f1800c8}
Uses:
  1. Determine code cache location for inserting a breakpoint
  2. Determine untranslated location for PC, when a breakpoint is hit
[Figure: a U-T arrow from untranslated location u to translated location t]
Page 17
Many Operation (translate an instruction into multiple instructions)
Page 19
Many Operation
Program locations:
  50684: ld [ %o1 ], %o0
  50688: call 0x26a70c
  5068c: nop
  ……
  26a70c: mov %o0, %o1
  26a710: andcc %o1, 3, %o3
  26a714: be 0x26a77c
  26a718: mov %g0, %o0
  ......
Translated locations:
  f1800c8: ld [ %o1 ], %o0
  f1800cc: sethi hi(0x50400), %o7
  f1800d0: or %o7, 0x288, %o7
U-T Mappings:
  1. 50684 {f1800c8}
  2. 50688 {f1800cc}
T-T Mappings:
  3. f1800d0 {f1800d4}
Uses:
  “Skip past” the execution of each additional instruction (e.g., t2 and t3 in the figure are never visible to the native debugger)
[Figure: untranslated locations u and u+1; translated locations t1 to t4; two U-T arrows and one T-T arrow]
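One way to picture the "skip past" use of T-T mappings is a step function that never stops on a hidden instruction. This is a minimal sketch, not TDB's stepping logic: `visible_step` is a hypothetical name, and the model assumes straight-line code and fixed 4-byte SPARC instructions. The addresses are the ones shown on the slide.

```python
# T-T mapping 3 from the slide: the extra instruction at f1800d0
# hands off to the next debugger-visible location, f1800d4.
T_T = {0xF1800D0: 0xF1800D4}


def visible_step(translated_pc):
    """Advance one instruction, then keep running while the PC is hidden."""
    pc = translated_pc + 4          # next translated instruction
    while pc in T_T:                # hidden: run through to the mapped target
        pc = T_T[pc]
    return pc


after_ld = visible_step(0xF1800C8)      # ld -> first instruction of the call's translation
after_call = visible_step(0xF1800CC)    # sethi -> (or is hidden) -> f1800d4
```

Stepping from `f1800cc` therefore stops at `f1800d4`, so the debugger never reports a stop at `f1800d0`.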
Page 20
Other Operations
Program locations:
  50684: ld [ %o1 ], %o0
  50688: call 0x26a70c
  5068c: nop
  ……
  26a70c: mov %o0, %o1
  26a710: andcc %o1, 3, %o3
  26a714: be 0x26a77c
  26a718: mov %g0, %o0
  ......
Translated locations:
  f1800c8: ld [ %o1 ], %o0
  f1800cc: sethi hi(0x50400), %o7
  f1800d0: or %o7, 0x288, %o7
  f1800d4: mov %o0, %o1
  f1800d8: andcc %o1, 3, %o3
  f1800dc: be 0xff180104
  f1800e0: mov %g0, %o0
  f1800e4: save %sp, -96, %sp
  ......
U-T Mappings:
  1. 50684 {f1800c8}
  2. 50688 {f1800cc}
  4. 5068c {f1800d4}
  5. 26a70c {f1800d4}
  6. 26a710 {f1800d8}
  7. 26a714 {f1800dc}
  8. 26a718 {f1800e0}
T-T Mappings:
  3. f1800d0 {f1800d4}
T-U Mappings (translated location to program location):
  9. f1800e4 {26a77c}
Other operations include:
  1. Delete, Trampoline
  2. Overhead reduction operations
  3. Dynamic instrumentation
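The mapping table on this slide can be loaded into plain dictionaries to show both directions of lookup. This is a sketch under assumptions: the helper names are hypothetical, each U-T entry is simplified to a single translated address (the slide writes targets as sets), and how TDB picks among several candidate program locations is not stated in the slides.

```python
from collections import defaultdict

# U-T mappings 1, 2, 4-8 from the slide (untranslated -> translated).
U_T = {
    0x50684: 0xF1800C8,
    0x50688: 0xF1800CC,
    0x5068C: 0xF1800D4,
    0x26A70C: 0xF1800D4,
    0x26A710: 0xF1800D8,
    0x26A714: 0xF1800DC,
    0x26A718: 0xF1800E0,
}
# Mapping 9 from the slide (translated -> program location).
T_U = {0xF1800E4: 0x26A77C}

# Reverse index: one translated address may stand for several program addresses.
reverse = defaultdict(list)
for u, t in U_T.items():
    reverse[t].append(u)


def breakpoint_location(untranslated):
    """Where in the code cache a breakpoint for this program address goes."""
    return U_T[untranslated]


def candidates_for_pc(translated_pc):
    """Program addresses that a translated PC could be reported as."""
    if translated_pc in T_U:
        return [T_U[translated_pc]]
    return sorted(reverse.get(translated_pc, []))


shared = candidates_for_pc(0xF1800D4)
```

Here `shared` holds both `0x5068c` and `0x26a70c`, because the `nop` and the first instruction of the callee map to the same code cache location.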
Page 22
TDB
• Reference implementation of Debug Architecture
• GDB as the Native Debugger
  – Supports all source-level commands in GDB
• SDT system: Strata
  – Basic translation operations (regular, many, delete, trampoline)
  – Overhead reduction techniques
  – Dynamic instrumentation
• Also used by Intel for their Pin SDT system
Page 23
Layout of the Debug Engine
[Figure: layout of the Debug Engine across the GDB process space (Native Debugger), the Strata process space (Application + SDT System), and shared memory. Labeled Debug Engine parts: Mapper, Breakpoint Manager, Mapping & Breakpoint Repositories, Mapping Generator]
Page 25
Experiments
• Experiments
  – Measured time to execute one breakpoint
  – Measured memory overhead
• Experimental setup
  – Strata-SPARC, GDB 5.3
    • Security policy on invocation of syscalls
  – Sun Blade 100, SPECint2000 benchmarks
  – Breakpoints set in "hot" functions
    • Programs run until 10,000 breakpoints hit
Page 26
Breakpoint Overhead
• Cost per breakpoint in GDB = 1 (baseline)
• Average cost of a breakpoint in TDB = 1.63
[Figure: bar chart of per-breakpoint slowdown (y-axis 0 to 2.5), gdb vs. tdb, for mcf, gcc, gzip, bzip, twolf, vortex, vpr]
Page 27
Memory Requirements
• Memory requirement ranges from 56 KB to 1.3 MB
  – Average of 501 KB
[Figure: memory in KB (log scale, 1 to 10000) for mcf, gcc, gzip, bzip, twolf, vortex, vpr]
Page 29
Summary
• Proposed a debug architecture
  – Debug mappings
  – Generation and use of mappings
• Available for Strata/GDB and Pin/GDB
  – Supports all source-level commands and queries
• Low overheads: average breakpoint cost of 1.63 relative to GDB; 56 KB to 1.3 MB of memory (501 KB on average)
Page 30
For More Information
Please visit http://www.cs.pitt.edu/coco/tdb
University of Virginia, University of Pittsburgh