Efficient and Flexible Architectural Support for Dynamic Monitoring
YUANYUAN ZHOU, PIN ZHOU, FENG QIN, WEI LIU, & JOSEP TORRELLAS
UIUC
Jan 30, 2016
Outline
Background
iWatcher Functionality
iWatcher Design
Performance
Conclusion
Static or Dynamic Monitoring?
Static Monitoring
– Needs annotations and programmer effort
– Difficult for unsafe languages (C, C++)
Dynamic Monitoring
– Large instrumentation cost
– Significant slowdown
Dynamic monitoring is stronger than static monitoring
– It is based on the actual execution path
Code or Location Controlled Dynamic Monitoring?
Code-Controlled Monitoring
– Monitoring is performed by special instructions
– Assertions and dynamic checkers belong here
– No hardware support needed
Location-Controlled Monitoring
– Monitoring is performed only when the program accesses a watched memory location, by whatever means
– Hardware support is usually required
– iWatcher and hardware-assisted watchpoints belong here
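The difference between the two styles can be sketched in plain C. This is only an emulation under stated assumptions: `explicit_check`, `store_int`, `watched_addr`, and `violations` are hypothetical names, and the helper `store_int` stands in for the hardware that would observe every access to a watched location.

```c
#include <assert.h>
#include <stddef.h>

int x = 1;            /* variable with the invariant x == 1 */
int violations = 0;   /* number of failed checks observed */

/* Code-controlled: the check runs only where the programmer wrote it. */
void explicit_check(void)
{
    if (x != 1)
        violations++;
}

/* Location-controlled (emulated): every store goes through this helper,
 * which plays the role of the hardware and checks the watched address. */
int *watched_addr = NULL;

void store_int(int *addr, int value)
{
    assert(addr != NULL);
    *addr = value;
    if (addr == watched_addr && *addr != 1)
        violations++;
}
```

A store through a stray pointer is caught by `store_int` whenever the pointer happens to alias the watched address, while `explicit_check` notices the damage only at the points where it was called.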
iWatcher Functionality
Flexible and low-overhead dynamic monitoring
With hardware support
– Without expensive exceptions
– The program has its own internal lightweight exception handler: the monitoring function
When a watched memory address is accessed, the monitoring function is automatically executed.
iWatcher Functionality (cont)
If the check performed by the monitoring function fails, then one of:
– Report: simply report the error (non-interactive)
– Break: raise a hardware exception, switching control to the debugger
– Rollback: revert to a safe checkpoint
More than one monitor may watch the same address.
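The three reactions can be sketched as a small dispatch function. The names `ReportMode` and `RollbackMode` are assumptions patterned on the `BreakMode` flag used elsewhere in the deck, and the `Action` values are hypothetical stand-ins for what the hardware and debugger would actually do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed names; only BreakMode appears in the deck's own example. */
typedef enum { ReportMode, BreakMode, RollbackMode } ReactMode;

/* Hypothetical outcomes, standing in for the real hardware reactions. */
typedef enum { CONTINUE, ENTER_DEBUGGER, RESTORE_CHECKPOINT } Action;

/* Decide what happens after a monitoring function has returned. */
Action react(ReactMode mode, bool check_passed, const char *what)
{
    assert(what != NULL);
    if (check_passed)
        return CONTINUE;              /* nothing to do */
    switch (mode) {
    case ReportMode:
        fprintf(stderr, "monitor failed: %s\n", what);
        return CONTINUE;              /* non-interactive: keep running */
    case BreakMode:
        return ENTER_DEBUGGER;        /* would raise a hardware exception */
    case RollbackMode:
        return RESTORE_CHECKPOINT;    /* would revert to a safe checkpoint */
    }
    return CONTINUE;
}
```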
iWatcher – Software Level
int x, *p; /* assume invariant: x = 1 */
iWatcherOn(&x, sizeof(int), READWRITE, BreakMode, &MonitorX, &x, 1);
...
p = foo(); /* a bug: p points to x incorrectly */
*p = 5; /* line A: a triggering access */
z = Array[x]; /* line B: a triggering access */
...
iWatcherOff(&x, sizeof(int), READWRITE, &MonitorX);

bool MonitorX(int *x, int value)
{
    return (*x == value);
}
Modest Hardware Support (?)
How to monitor a location?
When iWatcherOn() is called
– Add the monitoring function to the (software) CheckTable
– If size < LargeRegion → all words are brought into the L2 cache and tagged; L1 is updated if necessary
– If size > LargeRegion → the entire area is tagged in the Range Watch Table (RWT)
  – If the RWT is full, proceed as if size < LargeRegion
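The placement decision above can be sketched as follows. The threshold value, the RWT capacity, and the names `place_watch`, `RwtEntry`, and `Placement` are all assumptions made for illustration; the slide does not say what happens when size equals LargeRegion, so this sketch treats that case as a small region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LARGE_REGION 4096u   /* assumed threshold in bytes */
#define RWT_ENTRIES  4       /* assumed RWT capacity */

typedef struct { uintptr_t start, end; bool valid; } RwtEntry;
typedef enum { TAGGED_IN_CACHE, TAGGED_IN_RWT } Placement;

RwtEntry rwt[RWT_ENTRIES];   /* zero-initialized: all entries invalid */

/* Decide where the watch information for [addr, addr + size) is kept. */
Placement place_watch(uintptr_t addr, size_t size)
{
    assert(size > 0);
    if (size > LARGE_REGION) {
        for (int i = 0; i < RWT_ENTRIES; i++) {
            if (!rwt[i].valid) {
                rwt[i] = (RwtEntry){ addr, addr + size, true };
                return TAGGED_IN_RWT;
            }
        }
        /* RWT full: fall through and treat it like a small region. */
    }
    /* Small region: per-word WatchFlags would be set in L2 (and L1). */
    return TAGGED_IN_CACHE;
}
```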
How to monitor a location? (cont)
If a word is evicted from L2, store its watch bits (if valid) in the Victim WatchFlag Table (VWT)
– If the VWT is full, fall back to O/S support (rare)
When the word is restored, copy the watch bits back from the VWT
When iWatcherOff() is called
– Remove the monitoring function from the CheckTable
– If no monitors are watching this area any longer, update the VWT, RWT, L1 and L2 bits as necessary
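A minimal sketch of the eviction and refill path, assuming a small fixed-size VWT. The capacity, the per-line granularity, and the names `vwt_on_evict` and `vwt_on_refill` are hypothetical. "Valid" watch bits are read here as "at least one bit set", and the sketch leaves an entry in place after a refill, since the slide only says the bits are copied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VWT_ENTRIES 8   /* assumed capacity */

typedef struct { uintptr_t line_addr; uint8_t watch_bits; bool valid; } VwtEntry;

VwtEntry vwt[VWT_ENTRIES];   /* zero-initialized: all entries invalid */

/* On L2 eviction: preserve the watch bits if any are set.
 * Returns false when the VWT is full, i.e. the OS would have to step in. */
bool vwt_on_evict(uintptr_t line_addr, uint8_t watch_bits)
{
    int free_slot = -1;
    if (watch_bits == 0)
        return true;                       /* nothing to preserve */
    for (int i = 0; i < VWT_ENTRIES; i++) {
        if (vwt[i].valid && vwt[i].line_addr == line_addr) {
            vwt[i].watch_bits = watch_bits; /* refresh existing entry */
            return true;
        }
        if (!vwt[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;                      /* VWT full */
    vwt[free_slot] = (VwtEntry){ line_addr, watch_bits, true };
    return true;
}

/* On refill: return the saved watch bits, or 0 if the line was not watched. */
uint8_t vwt_on_refill(uintptr_t line_addr)
{
    for (int i = 0; i < VWT_ENTRIES; i++) {
        if (vwt[i].valid && vwt[i].line_addr == line_addr)
            return vwt[i].watch_bits;
    }
    return 0;
}
```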
How to detect a triggering access?
Out-of-order execution, pipelining →
– Not all instructions will commit
For each load/store
– Check whether a valid entry exists in the RWT
– Bring the word and its WatchFlag from the cache (load), or prefetch the word into the cache and get its WatchFlag (store)
– Store the flags in the ReOrder Buffer (ROB)
– When the instruction retires (if it retires), jump to the monitor if the bits are set
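The key point of the steps above is that the flags are collected when the access executes, but acted on only at retirement. A sketch under assumed names follows: `RobEntry`, `rob_record_flags`, `retire_triggers`, and the `WATCH_READ`/`WATCH_WRITE` bit encoding are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ACCESS_LOAD, ACCESS_STORE } AccessKind;
enum { WATCH_READ = 1, WATCH_WRITE = 2 };   /* assumed flag encoding */

typedef struct {
    uintptr_t  addr;
    AccessKind kind;
    uint8_t    watch_flags;  /* filled in when the access executes */
    bool       squashed;     /* mis-speculated: will never retire */
} RobEntry;

/* Execute stage: merge the flags found in the RWT and in the cache. */
void rob_record_flags(RobEntry *e, uint8_t from_rwt, uint8_t from_cache)
{
    assert(e != NULL);
    e->watch_flags = (uint8_t)(from_rwt | from_cache);
}

/* Retire stage: true when the monitoring function must run.
 * Squashed instructions never reach this point with any effect. */
bool retire_triggers(const RobEntry *e)
{
    assert(e != NULL);
    if (e->squashed)
        return false;
    uint8_t needed = (e->kind == ACCESS_LOAD) ? WATCH_READ : WATCH_WRITE;
    return (e->watch_flags & needed) != 0;
}
```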
How to Trigger Monitoring Functions?
When a triggering access is detected
– Save the processor state and jump to the address in the Main_Check_Function register
– The main check function scans the CheckTable and calls, serially, all monitors that:
  watch this address
  are registered for this access mode
– For performance, the Thread-Level Speculation (TLS) mechanism may be used
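The CheckTable scan can be sketched as a loop over registered monitors. The table layout, the fixed capacity, the `READONLY`/`WRITEONLY` flag names and values, and the generic monitor signature are assumptions for illustration (only `READWRITE` appears in the deck). The sketch runs monitors serially and does not model TLS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { READONLY = 1, WRITEONLY = 2, READWRITE = 3 };  /* assumed encoding */

typedef bool (*MonitorFn)(void *p1, long p2);         /* assumed signature */

typedef struct {
    uintptr_t start;   /* first watched byte */
    size_t    len;     /* watched length in bytes */
    int       mode;    /* which access kinds trigger this monitor */
    MonitorFn fn;
    void     *p1;
    long      p2;
    bool      used;
} CheckEntry;

#define CHECK_TABLE_SIZE 16
CheckEntry check_table[CHECK_TABLE_SIZE];

/* Example monitor: the watched int must equal the expected value. */
bool monitor_equals(void *p, long expected)
{
    return *(int *)p == expected;
}

/* Call every monitor watching addr for this access mode.
 * Returns the number of monitors whose check failed. */
int main_check_function(uintptr_t addr, int access_mode)
{
    int failures = 0;
    for (int i = 0; i < CHECK_TABLE_SIZE; i++) {
        const CheckEntry *e = &check_table[i];
        if (!e->used)
            continue;
        if (addr < e->start || addr >= e->start + e->len)
            continue;                       /* does not watch this address */
        if ((e->mode & access_mode) == 0)
            continue;                       /* not for this access mode */
        assert(e->fn != NULL);
        if (!e->fn(e->p1, e->p2))
            failures++;
    }
    return failures;
}
```

Because several entries may cover the same address, one triggering access can run more than one monitor; the return value here simply counts the failed checks.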
Executing Monitoring Functions
Executing Monitoring Functions (cont)
Comparison to Other Approaches
Performance Compared to Valgrind
4-179% overhead, 25-169x less than Valgrind
Performance with/without TLS
Up to 30% reduction in two cases
Performance varying the fraction of triggering loads and TLS
Performance varying the size of monitoring function and TLS
Above 4 contexts there is no significant improvement
Conclusion
Some hardware changes
<180% overhead if 20% of loads are monitored
Detects most bugs
– Buffer overflow
– Memory leaks
– Access to non-allocated or non-initialized memory
– …