Cantrill, B., Shapiro, M., and Leventhal, A. 2004.
Dynamic instrumentation of production systems. Proceedings of the 2004 Usenix Annual Technical Conference.
Onur Derin
Jan 25, 2007
Outline
    Motivation
    Expectations from a solution
    DTrace Features
    DTrace Architecture
    Instrumentation Providers
    D Language
    Aggregations
    Speculative Tracing
    Future work
    Examples
    Further readings
    Discussion
Motivation
Software Observability Problem
Since software is not physical, the only way to observe it is with more software.
    if (tracing_enabled) printf("we got here");
This creates overhead even when tracing is off (load, compare, branch).
One solution is conditional compilation, but it creates two versions of the software:
    one used in development and test
    another used in production systems
But then how do you identify a problem that occurs while the system is in use?
    The problem has to be reproduced in development.
    Usually, you end up finding a solution to a different problem.
An instrumentation solution should allow observability in production systems.
Motivation
Software abstraction is a good thing (web application, web server, DB server, OS).
It implies that, at higher levels, less code induces more work.
    A smaller misstep induces more unintended consequences.
    Missteps accumulate as you go down the software layers (avalanche effect).
Therefore problems are observed first in the lowest layers.
    e.g. excessive memory demand, excessive I/O activity, excessive network traffic.
Seeing lowest-layer problems, a typical response is to use faster hardware.
    e.g. more RAM, more CPU, more bandwidth
However, the real problem is in the higher layers of software.
    The real solution is fixing the problem at the higher levels.
Therefore identifying the real problem requires a system-wide view in instrumentation rather than a process-centric one.
Expectations from a Solution
Shift from development to production
    Zero disabled-probe effect
        Ship the product fully optimized.
        When it is to be observed, dynamically modify the code.
Shift from programs to systems
    The entire stack should be dynamically instrumentable.
        e.g. operating system, system libraries, high-level languages and environments.
    The kernel is involved, so the observability infrastructure must be absolutely safe.
        Disruptions during production are costly.
        The problematic state of the system is lost in case of a restart.
    The first time software needs to be observed, it is already running in production.
        The solution shouldn't require special compilation options, access to source code, or restarting components.
These expectations formed the design guidelines for DTrace.
DTrace Features
    Dynamic Instrumentation: achieves zero disabled-probe effect.
    Unified Instrumentation: instruments both user-level and kernel-level software.
    Arbitrary-context Kernel Instrumentation: instruments all of the kernel, including the scheduler and synchronization facilities.
    Data Integrity: reports errors in the handling of data during instrumentation.
    Arbitrary Actions: lets the user specify arbitrary actions safely at any instrumentation point.
    Predicates: actions are taken only when the predicate is true, which allows pruning of data at the source.
    A High-level Control Language: for specifying predicates and actions.
    A Scalable Mechanism for Aggregating Data: processes data at low levels.
    Speculative Tracing: defers the decision to commit or discard data to a later time.
    Heterogeneous Instrumentation: a glue framework for different providers, from I/O to scheduler to networking.
    Scalable Architecture: efficient classification and selection of thousands of instrumentation points.
    Virtualized Consumers: allows multiple, concurrent consumers of the framework.
How have these features been enabled by DTrace?
DTrace Architecture
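[Figure: DTrace architecture. In userland, DTrace program source files (a.d, b.d) are passed to DTrace consumers such as dtrace(1M) and intrstat(1M), which are built on libdtrace(3LIB) (annotation: virtualized consumers). libdtrace(3LIB) reaches the in-kernel DTrace framework through the dtrace(7D) driver, crossing the userland/kernel boundary (annotation: scalable architecture). The framework talks through an API to the DTrace providers: sysinfo, vminfo, fasttrap, syscall, profile, fbt (annotation: heterogeneous instrumentation).]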
Providers are loadable kernel modules that carry out the instrumentation task. Providers communicate with DTrace Framework using a well-defined API.
DTraceFramework::determineInstrumentationPoints() {
    for provider in all providers {
        provider.determineInstrumentationPoints(createProbe);
    }
}

Provider::determineInstrumentationPoints() {
    Generate list of all instrumentation points
    for instPoint in all instrumentation points {
        probeID = DTraceFramework.createProbe(instPoint.moduleName, instPoint.funcName, instPoint.semanticName);
        Associate probeID with instrumentation point
    }
}
Internals
libdtrace(3LIB) advertises these probes to consumers.

libdtrace(3LIB)::enableProbe(providerName, moduleName, funcName, name) {
    probe = DTraceFramework.getProbe(providerName, moduleName, funcName, name);
    if (!probe.isEnabled()) {
        provider.enableProbe(probe.ID);
    }
    Create Enabling Control Block (i.e. ECB)
    Create per-CPU buffer associated with ECB
    Associate ECB and probe
}

Provider::enableProbe(probeID) {
    Dynamically modify the instrumentation point such that, when hit, it calls DTraceFramework::probeFired(probeID)
}

ECBs enable virtualized consumers.
A probe is associated with one ECB per enabling consumer.
This association is kept in DTraceFramework.
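The enabling flow above can be modeled in a few lines. The following Python sketch is an illustration only, not DTrace code: the class names Framework, Probe, and ECB, and the instrument callback that stands in for the provider's dynamic code modification, are assumptions made for this example. It shows the provider being asked to instrument a probe only on the first enabling, while every consumer gets its own ECB.

```python
from dataclasses import dataclass, field


@dataclass
class ECB:
    """Enabling control block: one per (probe, consumer) enabling."""
    consumer: str
    buffer: list = field(default_factory=list)  # stands in for the per-CPU buffer


@dataclass
class Probe:
    probe_id: int
    ecbs: list = field(default_factory=list)


class Framework:
    def __init__(self, instrument):
        # instrument(probe_id) is a hypothetical stand-in for Provider::enableProbe
        self.instrument = instrument
        self.probes = {}

    def create_probe(self, probe_id):
        self.probes[probe_id] = Probe(probe_id)

    def enable_probe(self, probe_id, consumer):
        probe = self.probes[probe_id]
        if not probe.ecbs:  # not enabled yet: modify the code exactly once
            self.instrument(probe_id)
        ecb = ECB(consumer)
        probe.ecbs.append(ecb)
        return ecb


instrumented = []
fw = Framework(instrumented.append)
fw.create_probe(1)
ecb_a = fw.enable_probe(1, "consumer A")
ecb_b = fw.enable_probe(1, "consumer B")
```

Because each consumer writes only to the buffer of its own ECB, two consumers can enable the same probe without seeing each other's data.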
Internals
DTraceFramework::probeFired(probeID) {
    Disable interrupts
    for ecb in all ECBs where ecb.probeID == probeID {
        if (ecb.predicate)
            DTraceFramework.execute(ecb.actions);
    }
    Re-enable interrupts
}

ECB
    + predicate
    + actions

ECB actions:
    may store data in the per-CPU buffer associated with the ECB.
    may update D variable state.
    may not store to kernel memory, modify registers, or otherwise change system state.
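As a rough illustration of predicate-gated actions, the Python sketch below runs every ECB attached to one probe against a firing. The tuple layout (predicate, action, buffer) and the context dictionary are assumptions of this sketch, and interrupt disabling is not modeled.

```python
def probe_fired(ecbs, context):
    """Run every ECB attached to a probe against one firing.

    ecbs: list of (predicate, action, buffer) tuples; this layout is an
    assumption of the sketch, not the DTrace data structure.
    """
    for predicate, action, buffer in ecbs:
        if predicate(context):
            # An action may only add to its own buffer, never touch system state.
            buffer.append(action(context))


sshd_only = []
everything = []
ecbs = [
    (lambda ctx: ctx["execname"] == "sshd", lambda ctx: ctx["pid"], sshd_only),
    (lambda ctx: True, lambda ctx: ctx["execname"], everything),
]

probe_fired(ecbs, {"execname": "sshd", "pid": 2680})
probe_fired(ecbs, {"execname": "bash", "pid": 2827})
```

The first ECB records only the sshd firing, which is the sense in which predicates prune data at the source.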
Internals
DTraceFramework::storeDataInPerCPUBuffer(ecb, data) {
    buffer = DTraceFramework.getBuffer(ecb);
    if (buffer.freeSpace() >= ecb.DATA_SIZE)
        buffer.store(data);
    else
        ecb.dropCount++;
}

To minimize dropCount, buffers should be read periodically.
How can buffers be read such that data integrity and wait-free probe processing are assured?
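The store-or-drop policy can be sketched as follows. This is a simplified Python model under stated assumptions: the class name TraceBuffer is invented, the drop counter is kept on the buffer rather than on the ECB, and record size is simply the length of a bytes object.

```python
class TraceBuffer:
    """Fixed-capacity buffer that drops records instead of blocking."""

    def __init__(self, capacity):
        self.capacity = capacity  # in bytes
        self.used = 0
        self.records = []
        self.drop_count = 0

    def store(self, record: bytes):
        if self.capacity - self.used >= len(record):
            self.records.append(record)
            self.used += len(record)
        else:
            self.drop_count += 1  # never wait: count the loss and move on

    def drain(self):
        """Consumer side: take everything recorded so far and free the space."""
        out, self.records, self.used = self.records, [], 0
        return out


buf = TraceBuffer(capacity=8)
for _ in range(3):
    buf.store(b"abcd")       # the third store does not fit
first = buf.drain()
buf.store(b"abcd")           # fits again after the consumer has read
```

Draining more often leaves more free space at each probe firing, which is why periodic reads keep the drop count low.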
Buffers
Since buffer switching and probe processing cannot be interrupted, data integrity is assured.
What if interrupts were not disabled?
[Figure: timeline of a buffer switch. The consumer program on CPU0 initiates a read-buffer operation and issues xcall() to CPU1. On CPU1, interrupts are disabled, the active and inactive buffers (Buffer1 and Buffer2) swap roles, interrupts are re-enabled, and xcall() returns.]
Buffers
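Two inactive buffers, none writable.
[Figure: the same timeline without disabling interrupts. The consumer program on CPU0 initiates a read-buffer operation and issues xcall() to CPU1. On CPU1, the active buffer has been marked inactive but the other buffer has not yet been made active, so both Buffer1 and Buffer2 are inactive.]
A probe interrupts at that moment, and its ECB action wants to store to the buffer.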
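A minimal single-threaded Python model of the active/inactive pair is sketched below. The class BufferPair and its method names are assumptions for illustration. Python cannot model interrupts, so the swap is one assignment with nothing able to run in between, which plays the role that disabling interrupts plays in the kernel.

```python
class BufferPair:
    """Active/inactive buffer pair for one CPU (illustrative model)."""

    def __init__(self):
        self.active = []
        self.inactive = []

    def store(self, record):
        # Probe context: always writes to whichever buffer is active.
        self.active.append(record)

    def switch(self):
        """Swap roles in one step and hand the old active buffer to the reader.

        In the real system this step runs with interrupts disabled, so no
        probe can fire between deactivating one buffer and activating the
        other. In this single-threaded model nothing can run in between.
        """
        self.active, self.inactive = self.inactive, self.active
        full = list(self.inactive)
        self.inactive.clear()
        return full


pair = BufferPair()
pair.store("r1")
pair.store("r2")
snapshot = pair.switch()   # consumer takes r1 and r2
pair.store("r3")           # probes keep writing without waiting
```

At every point there is exactly one active buffer, so a probe always has somewhere to write while the consumer reads the other one.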
DIF
D Intermediate Format
    Instruction set for specifying predicates and actions
    Mainly intended to allow programmable actions to be executed safely in arbitrary contexts.
        DIF code is checked for validity when it is loaded.
        Only forward branches are allowed, to avoid infinite loops.
        Illegal loads (from misaligned addresses, memory-mapped I/O devices, unmapped memory) and division by zero are handled at run-time by returning errors to the consumers.
        Arbitrary stores are not allowed.
        Only defined subroutines can be called at run-time.
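The load-time checks listed above can be illustrated with a small validator. This Python sketch uses a two-field instruction format (opcode, operand) that is invented for the example; real DIF uses a different, binary encoding. The allowed subroutine names are borrowed from the D Language slide purely as sample values.

```python
# Hypothetical instruction format for this sketch: (opcode, operand), where
# "branch" carries a target index and "call" carries a subroutine name.
ALLOWED_SUBROUTINES = {"copyinstr", "strlen", "rand"}


def validate(program):
    """Return a list of load-time errors; an empty list means the program is accepted."""
    errors = []
    for index, (opcode, operand) in enumerate(program):
        if opcode == "branch":
            # A legal target lies strictly after the branch and inside the program.
            if not (index < operand < len(program)):
                errors.append(f"{index}: branch target {operand} is not forward")
        elif opcode == "call":
            if operand not in ALLOWED_SUBROUTINES:
                errors.append(f"{index}: unknown subroutine {operand}")
        elif opcode == "store":
            errors.append(f"{index}: arbitrary stores are not allowed")
    return errors


good = [("load", 0), ("branch", 3), ("call", "strlen"), ("ret", None)]
looping = [("load", 0), ("branch", 0), ("ret", None)]
good_errors = validate(good)
looping_errors = validate(looping)
```

With only forward branches, the program counter can only increase, so a program of n instructions executes at most n of them and must terminate.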
Instrumentation Providers
General properties
    No disabled-probe effect
    Mostly use dynamic code modification
Some examples
    syscall: traces all communication from userland to the kernel
    fbt: entry and return points of kernel functions
    sched: which threads run on which CPU, and for how long
    io: disk I/O requests
    mib: counters for IP, IPv6, etc.
    profile: time-triggered probing at specified intervals
    lockstat: kernel synchronization behaviour
Function Boundary Tracing implementation in SPARC
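[Figure: production software versus instrumented software. The production software contains the instruction "call x". In the instrumented software that instruction is dynamically modified to "ba y", a branch to a code sequence y that prepares the probeID etc. and calls DTrace's probeFired(probeID, ...). On return, the original "call x" is executed in y.]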
D Language
C-like, supports ANSI C operators
Strings exist
No if, no loop
Only integer arithmetic
No need to declare variables
Scalar variables
Associative arrays
    Collection of data elements
    No predefined number of elements
    Like hashes
    name[key] = expression
D Language
Thread-local variables:
    Variables for OS threads, referred to with self->variable-name
Clause-local variables:
    Their storage is reused for each program clause.
    Referred to with this->variable-name
Built-in variables (execname, pid, timestamp, curthread)
External variables
    Variables defined in the kernel and its modules (e.g. kmem_flags)
D Language
General template

probe descriptions
/predicate/
{
    action statements
}

Probe description:
    provider:module:function:name (provider name, module name, function name, semantic name)
Predicate is a D expression.
Actions:
    Recording actions (print(), printa(), trace())
    Destructive actions (disabled by default)
    Special actions (copyinstr(), strlen(), rand() etc.)
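To make the four-part probe description concrete, the Python sketch below splits one into its fields. The names ProbeDescription and parse_probe_description are hypothetical, and only the full four-field form is handled. An empty field, such as the module in syscall::read:entry from the examples, is left empty here; DTrace treats a blank field as matching any value.

```python
from typing import NamedTuple


class ProbeDescription(NamedTuple):
    provider: str
    module: str
    function: str
    name: str


def parse_probe_description(text):
    """Split 'provider:module:function:name'; only the four-field form is handled."""
    fields = text.split(":")
    if len(fields) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(fields)}")
    return ProbeDescription(*fields)


desc = parse_probe_description("syscall::read:entry")
```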
Aggregations (Cherry on the cake)
Aggregate data to look for trends and generate reports
General form
    @name[keys] = aggfunc(args);
Aggregating function: a function f that satisfies
    f(f(x0) U f(x1) U ... U f(xn)) = f(x0 U x1 U ... U xn)
e.g.
    count() min() max() sum() avg() quantize()
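The defining property can be checked directly on small inputs. In this Python sketch the per-CPU value lists are made up, and the helper combine is a hypothetical name; it applies f to each chunk and then applies f again to the partial results, which is the left-hand side of the identity above.

```python
per_cpu = [[4, 9, 2], [7], [1, 8]]          # values seen on three CPUs (made up)
everything = [x for cpu in per_cpu for x in cpu]


def combine(f, chunks):
    """Apply f to each chunk, then apply f again to the partial results."""
    return f([f(chunk) for chunk in chunks])


total = combine(sum, per_cpu)
smallest = combine(min, per_cpu)
largest = combine(max, per_cpu)

# count also keeps one number per CPU, but its partial results are merged by
# adding them up rather than by counting them again.
count = sum(len(chunk) for chunk in per_cpu)
```

Only one small intermediate result per CPU has to be kept, instead of every data point. A function such as the median lacks this property, because the median of per-CPU medians is generally not the overall median.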
Speculative Tracing
Trace data tentatively and decide later whether to commit it to the trace buffer
    When a predicate cannot be used because, at the time a probe fires, it is not yet known whether the event is of interest
    When you have an error event and would like to know the history behind it and why that error occurred
Functions:
    speculation() speculate() commit() discard()
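The commit-or-discard idea can be modeled as below. This is a toy Python model, not the D API: the method names mirror the four functions listed above, but their signatures here are simplified assumptions, and the class name Speculations and the record strings are invented.

```python
class Speculations:
    """Toy model: speculative buffers that are later committed or discarded."""

    def __init__(self):
        self.principal = []      # data the consumer will actually see
        self.pending = {}        # speculation id -> tentatively traced records
        self.next_id = 1

    def speculation(self):
        spec_id = self.next_id
        self.next_id += 1
        self.pending[spec_id] = []
        return spec_id

    def speculate(self, spec_id, record):
        self.pending[spec_id].append(record)

    def commit(self, spec_id):
        self.principal.extend(self.pending.pop(spec_id))

    def discard(self, spec_id):
        self.pending.pop(spec_id)


tracer = Speculations()

ok_call = tracer.speculation()
tracer.speculate(ok_call, "open entry")
tracer.discard(ok_call)                 # call succeeded: history not needed

bad_call = tracer.speculation()
tracer.speculate(bad_call, "open entry")
tracer.speculate(bad_call, "lookup failed")
tracer.commit(bad_call)                 # call failed: keep the history
```

Only the history of the failing call reaches the principal buffer, even though both calls were traced from the start.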
Example D Programs
BEGIN
{
    trace("Hello world");
    exit(0);
}

# dtrace -s helloworld.d
dtrace: script 'helloworld.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN   Hello world
Example D Programs
syscall::read:entry
{
    printf("Process %d", pid);
}

# dtrace -s d2.d
dtrace: script 'd2.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  44129                       read:entry Process 2680
  0  44129                       read:entry Process 2680
  0  44129                       read:entry Process 2827
  0  44129                       read:entry Process 2680
  0  44129                       read:entry Process 2680
  0  44129                       read:entry Process 2827
…
syscall::write:entry
/execname=="sshd"/
{
@[arg0] = quantize(arg2);
}
RESULT:

                4
           value  ------------- Distribution ------------- count
               8 |                                         0
              16 |@                                        1
              32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             24
              64 |@@@@@@                                   5
             128 |@                                        1
             256 |@@@@                                     3
             512 |                                         0
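The rows of the distribution are power-of-two buckets: the row labeled 32 counts values from 32 up to 63. The Python sketch below shows one way to compute such buckets for non-negative integers. The function names and the sample sizes are made up for illustration, and negative values are deliberately not handled.

```python
from collections import Counter


def quantize_bucket(value):
    """Return the lower bound of the power-of-two bucket holding value (value >= 0)."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    if value == 0:
        return 0
    # The highest set bit of value is the largest power of two not above it.
    return 1 << (value.bit_length() - 1)


def quantize(values):
    """Count how many values fall into each power-of-two bucket."""
    return Counter(quantize_bucket(v) for v in values)


sizes = [20, 36, 48, 52, 63, 64, 100, 300]   # made-up write sizes in bytes
histogram = quantize(sizes)
```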
Example D Programs
syscall::write:entry
/execname == "sshd" && arg0 == 5/
{
@[ustack()] = quantize(arg2);
}
RESULT: next slide
Example D Programs
# dtrace -s d4.d
dtrace: script 'd4.d' matched 1 probe
^C

              libc.so.1`_write+0x15
              sshd`altprivsep_start_monitor+0x220
              sshd`main+0xe57
              sshd`0x805bad2

           value  ------------- Distribution ------------- count
               2 |                                         0
               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
               8 |                                         0

              libc.so.1`_write+0x15
              pkcs11_softtoken.so.1`looping_write+0x32
              pkcs11_softtoken.so.1`C_SeedRandom+0xfd
              libpkcs11.so.1`C_SeedRandom+0xed
              mech_krb5.so.1`krb5_c_random_seed+0x3d
              mech_krb5.so.1`init_common+0x121
              mech_krb5.so.1`krb5_init_context+0xd
              mech_krb5.so.1`krb5_gss_get_context+0x3d
              mech_krb5.so.1`_C0095D0A+0x49
              libgss.so.1`__gss_get_mechanism+0xad
              libgss.so.1`gss_add_cred+0x79
              libgss.so.1`gss_acquire_cred+0xfb
              sshd`ssh_gssapi_server_mechs+0x7c
              sshd`ssh_gssapi_server_kex_hook+0x22
              sshd`0x807cc12
              sshd`kex_send_kexinit+0x2a
              sshd`kex_setup+0x74
              sshd`0x805e90f
              sshd`main+0xe05
              sshd`0x805bad2

           value  ------------- Distribution ------------- count
               4 |                                         0
               8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
              16 |                                         0
syscall::open:entry
{
    @files[copyinstr(arg0)] = count();
}
RESULT:
# dtrace -s d5.d
dtrace: script 'd5.d' matched 1 probe
^C
/etc/resolv.conf 1
Example D Programs
RESULT when copyinstr is removed:
# dtrace -s d5.d
dtrace: script 'd5.d' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 44133: syscall::open:entry): invalid address (0x80fbdaf) in action #2 at DIF offset 28
dtrace: error on enabled probe ID 1 (ID 44133: syscall::open:entry): invalid address (0x80fbdaf) in action #2 at DIF offset 28
^C

  /lib/libc.so.1                                                    1
  /proc/2647/psinfo                                                 1
  /proc/2723/psinfo                                                 1
  /proc/2874/psinfo                                                 1
  /proc/4680/psinfo                                                 1
  /proc/4691/psinfo                                                 1
  /proc/4740/psinfo                                                 1
  /var/ld/ld.config                                                 1
  /dev/null                                                         2
  /etc/resolv.conf                                                  2
  /var/adm/utmpx                                                    2
Future Work
    Performance counter provider
    Helper actions: embracing high-level languages and their environments.
    User lock analysis: lock contention analysis of user-level multi-threaded processes.
    Fine-grained user-level providers
    Software visualization
Further Readings to be Googled
    DTrace Guide
    Hidden In Plain Sight, Cantrill B.
    DTrace Toolkit, a repository of D scripts classified by application domain (CPU, Disk, Mem, Kernel, Net etc.)
    DTrace & DTraceToolkit, Stefan Parvu
Discussion
No discussion of how much overhead is introduced when probes are enabled.
Safety is considered only as not crashing or halting the system.
    What about guaranteeing that other requirements of the system, such as real-time properties, are not violated?