Cantrill, B., Shapiro, M., and Leventhal, A. 2004.

Dynamic instrumentation of production systems. Proceedings of the 2004 Usenix Annual Technical Conference.

Onur Derin

Jan 25, 2007

Outline

Motivation
Expectations from a solution
DTrace Features
DTrace Architecture
Instrumentation Providers
D Language
Aggregations
Speculative Tracing
Future work
Examples
Further readings
Discussion

Motivation

The software observability problem: since software is not physical, the only way to observe it is with more software.

if (tracing_enabled) printf("we got here");

This creates overhead even when tracing is off: a load, a compare, and a branch at every such site.

One solution is conditional compilation, but it creates two versions of the software:

One used in development and test
The other used in production systems

But then how do you identify a problem that occurs only while the system is in use? The problem has to be reproduced in development, and usually you end up finding a solution to a different problem.

An instrumentation solution should allow observability in production systems.

Motivation

Software abstraction is a good thing (web application, web server, DB server, OS). It implies that, at higher levels, less code induces more work.

It also means that a smaller misstep induces more unintended consequences. Missteps accumulate as you go down the software layers (avalanche effect), so problems are observed first in the lowest layers.

e.g. excessive memory demand, excessive I/O activity, excessive network traffic. Seeing problems in the lowest layers, a typical response is to use faster hardware.

e.g. more RAM, more CPU, more bandwidth. However, the real problem is in the higher layers of software.

The real solution is to fix the problem at the higher levels.

Therefore, identifying the real problem requires a system-wide view in instrumentation rather than a process-centric one.

Expectations from a Solution

Shift from development to production: zero disabled-probe effect.

Ship the product fully optimized. When it is to be observed, dynamically modify the code.

Shift from programs to systems: the entire stack should be dynamically instrumentable.

e.g. operating system, system libraries, high-level languages and environments. The kernel is involved, so the observability infrastructure must be absolutely safe.

Disruptions during production are costly. The problematic state of the system is lost if it is restarted.

The first time software needs to be observed, it is already running in production. The solution should not require special compilation options, access to source code, or restarting components.

These expectations formed the design guidelines for DTrace.

DTrace Features

Dynamic Instrumentation: achieves zero disabled-probe effect.
Unified Instrumentation: instruments both user-level and kernel-level software.
Arbitrary-context Kernel Instrumentation: instruments all of the kernel, incl. scheduler and synchronization.
Data Integrity: reports errors in the handling of data during instrumentation.
Arbitrary Actions: lets the user specify arbitrary actions safely at any instrumentation point.
Predicates: actions are taken only when the predicate is true. Allows pruning of data at the source.
A High-level Control Language: used to specify predicates and actions.
A Scalable Mechanism for Aggregating Data: processes data at low levels.
Speculative Tracing: defers the decision to commit or not to a later time.
Heterogeneous Instrumentation: a glue framework for different providers, from I/O to scheduler to networking.
Scalable Architecture: efficient classification and selection of thousands of instrumentation points.
Virtualized Consumers: allows multiple, concurrent consumers of the framework.

How have these features been enabled by DTrace?

DTrace Architecture

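[Figure: DTrace architecture. In userland, DTrace program source files (a.d, b.d) are fed to DTrace consumers such as dtrace(1M) and intrstat(1M), which are built on libdtrace(3LIB). libdtrace reaches the in-kernel DTrace framework through the dtrace(7D) device. DTrace providers (sysinfo, vminfo, fasttrap, syscall, profile, fbt) attach to the framework through an API. Annotations on the figure: virtualized consumers (at the consumer side), scalable architecture (at the framework), heterogeneous instrumentation (at the providers).]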

Internals

Providers are loadable kernel modules that carry out the instrumentation task. Providers communicate with DTrace Framework using a well-defined API.

DTraceFramework::determineInstrumentationPoints() {
    for provider in all providers {
        provider.determineInstrumentationPoints(createProbe);
    }
}

Provider::determineInstrumentationPoints() {
    Generate list of all inst. points
    for instPoint in all instrumentation points {
        probeID = DTraceFramework.createProbe(instPoint.moduleName, instPoint.funcName, instPoint.semanticName);
        Associate probeID with instrumentation point
    }
}

Internals

libdtrace(3LIB) advertises these probes to consumers.

libdtrace(3LIB)::enableProbe(providerName, moduleName, funcName, name) {
    probe = DTraceFramework.getProbe(providerName, moduleName, funcName, name);
    if (!probe.isEnabled()) {
        provider.enableProbe(probe.ID);
    }
    Create Enabling Control Block (i.e. ECB)
    Create per-CPU buffer associated with ECB
    Associate ECB and probe
}

Provider::enableProbe(probeID) {
    Dynamically modify inst. point s.t. when hit, it calls DTraceFramework::probeFired(probeID)
}

The ECB is what enables virtualized consumers.

A probe is associated with one ECB per enabling consumer.

This association is kept in DTraceFramework.
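The registration and enabling steps above can be sketched as a toy model. This is only an illustration in Python, not the DTrace implementation: the class names, the dictionary-based probe table, and the `first_enabling` flag are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Probe:
    probe_id: int
    provider: str
    module: str
    function: str
    name: str
    ecbs: list = field(default_factory=list)  # one ECB per enabling consumer


class Framework:
    """Hypothetical stand-in for the framework's probe table."""

    def __init__(self):
        self._ids = count(1)
        self.probes = {}  # (provider, module, function, name) -> Probe

    def create_probe(self, provider, module, function, name):
        # Called by a provider for each instrumentation point it offers.
        probe = Probe(next(self._ids), provider, module, function, name)
        self.probes[(provider, module, function, name)] = probe
        return probe.probe_id

    def enable_probe(self, consumer, provider, module, function, name):
        probe = self.probes[(provider, module, function, name)]
        # Only the first enabling would ask the provider to patch the code.
        first_enabling = not probe.ecbs
        probe.ecbs.append({"consumer": consumer, "buffer": []})
        return first_enabling


fw = Framework()
fw.create_probe("syscall", "", "read", "entry")
patched_for_a = fw.enable_probe("consumer-a", "syscall", "", "read", "entry")
patched_for_b = fw.enable_probe("consumer-b", "syscall", "", "read", "entry")
print(patched_for_a, patched_for_b)
```

Two consumers enabling the same probe end up with two ECBs on one probe, while the instrumentation point itself would be modified only once.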

Internals

DTraceFramework::probeFired(probeID) {
    Disable interrupts
    for ecb in all ECBs where ECB.probeID = probeID {
        if (ecb.predicate)
            DTraceFramework.execute(ecb.actions);
    }
    Re-enable interrupts
}

ECB
+ predicate
+ actions

ECB actions:
may store data in the per-CPU buffer associated with the ECB.
may update D variable state.
may not store to kernel memory, modify registers, or change system state.
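As a rough illustration of the dispatch loop, the following Python sketch walks every ECB attached to a probe and runs its actions only when its predicate holds. The `ECB` class, the `ctx` dictionary, and the probe ID used here are hypothetical; disabling interrupts has no Python equivalent and is only noted in a comment.

```python
class ECB:
    """Enabling control block: a predicate, actions, and a private buffer."""

    def __init__(self, predicate, actions):
        self.predicate = predicate
        self.actions = actions
        self.buffer = []


def probe_fired(ecbs_by_probe, probe_id, ctx):
    # The real framework disables interrupts around this loop; that step
    # cannot be modelled here and is deliberately left out.
    for ecb in ecbs_by_probe.get(probe_id, []):
        if ecb.predicate(ctx):
            for action in ecb.actions:
                # Actions may only record data into the ECB's own buffer.
                ecb.buffer.append(action(ctx))


only_sshd = ECB(lambda c: c["execname"] == "sshd", [lambda c: c["pid"]])
everyone = ECB(lambda c: True, [lambda c: c["execname"]])
ecbs = {44129: [only_sshd, everyone]}

probe_fired(ecbs, 44129, {"execname": "sshd", "pid": 2680})
probe_fired(ecbs, 44129, {"execname": "bash", "pid": 2827})
print(only_sshd.buffer, everyone.buffer)
```

Because each ECB has its own predicate and buffer, one consumer's filter never affects what another consumer records for the same probe.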

Internals

DTraceFramework::storeDataInPerCPUBuffer(ecb, data) {
    buffer = DTraceFramework.getBuffer(ecb);
    if (buffer.freeSpace() >= ecb.DATA_SIZE)
        buffer.store(data);
    else
        ecb.dropCount++;
}

To minimize dropCount, buffers should be read periodically.

How can buffers be read such that data integrity and wait-free probe processing are assured?
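A minimal sketch of the drop-counting behaviour, assuming a fixed-capacity buffer measured in bytes. The `TraceBuffer` class and its `drain` method are names invented for this example; the point is only that a full buffer never blocks the writer, it just counts a drop.

```python
class TraceBuffer:
    """Hypothetical fixed-size trace buffer that drops instead of blocking."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.used = 0
        self.drops = 0

    def store(self, record: bytes):
        if self.capacity - self.used >= len(record):
            self.records.append(record)
            self.used += len(record)
            return True
        self.drops += 1  # no room: count the drop, never wait
        return False

    def drain(self):
        # Stands in for the consumer reading the buffer and freeing space.
        out = self.records
        self.records = []
        self.used = 0
        return out


buf = TraceBuffer(capacity=16)
results = [buf.store(b"12345678") for _ in range(3)]  # the third does not fit
drained = buf.drain()
after_drain = buf.store(b"12345678")
print(results, buf.drops, len(drained), after_drain)
```

Reading the buffer more often keeps free space available, which is why periodic reads reduce the drop count.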

Buffers

Since buffer switching and probe processing cannot be interrupted, data integrity is assured.

[Figure: buffer switch timeline. On CPU0, the consumer program initiates a read-buffer operation and issues an xcall() to CPU1. On CPU1, interrupts are disabled, Buffer1 goes from active to inactive and Buffer2 from inactive to active, and interrupts are re-enabled. The xcall() then returns.]

What if interrupts were not disabled?
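The active/inactive scheme can be sketched as follows. This is an assumption-laden toy in Python: `PerCpuBuffers` and its methods are hypothetical names, and the uninterruptible switch is represented by a single swap statement rather than by actually disabling interrupts.

```python
class PerCpuBuffers:
    """Two buffers per CPU: probes write to one, the consumer reads the other."""

    def __init__(self):
        self.active = []    # probe actions append here
        self.inactive = []  # the consumer drains this one

    def record(self, item):
        self.active.append(item)

    def switch(self):
        # In the scheme described above this swap runs with interrupts
        # disabled, so no probe can fire halfway through it. The single
        # assignment below only stands in for that guarantee.
        self.active, self.inactive = self.inactive, self.active

    def read_inactive(self):
        data = list(self.inactive)
        self.inactive.clear()
        return data


bufs = PerCpuBuffers()
bufs.record("a")
bufs.record("b")
bufs.switch()
bufs.record("c")               # lands in the newly active buffer
first = bufs.read_inactive()   # the consumer sees only the pre-switch data
bufs.switch()
second = bufs.read_inactive()
print(first, second)
```

At every moment exactly one buffer is writable, so a probe that fires while the consumer is reading never waits and never corrupts the data being read.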

Buffers

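[Figure: the same switch without interrupts disabled. On CPU0, the consumer program initiates the read-buffer operation and issues an xcall() to CPU1. On CPU1, Buffer1 has just been switched from active to inactive while Buffer2 is still inactive. At that moment a probe interrupts, and its ECB action wants to store to the buffer.]

Two inactive buffers, none writable.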

DIF

D Intermediate Format
An instruction set for specifying predicates and actions.
It exists mainly to allow programmable actions to be executed safely in arbitrary contexts.
DIF code is checked for validity when it is loaded.
Only forward branches are allowed, which rules out infinite loops.
Illegal loads (from misaligned addresses, memory-mapped I/O devices, unmapped memory) and division by zero are handled at run-time by returning errors to the consumers.
Arbitrary stores are not allowed.
Only defined subroutines can be called at run-time.
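The load-time checks can be illustrated with a small validator. The instruction encoding below (tuples of an opcode name and one argument) and the opcode names are invented for this sketch and do not correspond to real DIF; only the three rules it enforces come from the slide.

```python
# Illustrative subset of callable subroutines, taken from names on the slides.
ALLOWED_SUBROUTINES = {"copyinstr", "strlen", "rand"}


def validate(program):
    """Return a list of rule violations for a hypothetical instruction list."""
    errors = []
    for pc, (op, arg) in enumerate(program):
        if op == "branch":
            # A branch must jump strictly forward and stay inside the program.
            if not (pc < arg < len(program)):
                errors.append(f"{pc}: branch target {arg} is not forward")
        elif op == "call":
            if arg not in ALLOWED_SUBROUTINES:
                errors.append(f"{pc}: unknown subroutine {arg}")
        elif op == "store":
            errors.append(f"{pc}: arbitrary stores are not allowed")
    return errors


good = [("load", "arg0"), ("branch", 3), ("call", "strlen"), ("ret", None)]
bad = [("load", "arg0"), ("branch", 0), ("call", "panic"), ("store", "0x1000")]

print(validate(good))
print(validate(bad))
```

With only forward branches, every instruction executes at most once per probe firing, so execution time is bounded by program length.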

Instrumentation Providers

General properties
No disabled-probe effect
Mostly use dynamic code modification

Some examples
syscall: traces the entire communication from userland to the kernel
fbt: entry and return points of kernel functions
sched: which threads run on which CPU, and for how long
io: disk I/O requests
mib: counters for IP, IPv6 etc.
profile: time-triggered probing at specified intervals
lockstat: kernel synchronization behaviour

Function Boundary Tracing implementation in SPARC

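[Figure: production software vs. instrumented software. In the production software the instruction is "call x". When the probe is enabled, the instruction is dynamically modified to "ba y". At y, the probeID etc. are prepared and DTrace is called: probeFired(probeID, …). On return, the original "call x" is executed in y.]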

D Language

C-like; supports ANSI C operators
Strings exist
No if, no loops
Only integer arithmetic
No need to declare variables
Scalar variables
Associative arrays

A collection of data elements
No predefined number of elements
Like hashes: name[key] = expression

D Language

Thread-local variables: variables for OS threads, referred to with self->variable-name

Clause-local variables: their storage is reused for each program clause. Referred to with this->variable-name

Built-in variables (execname, pid, timestamp, curthread)

External variables: used in kernel modules (kmem_flags)

D Language

General template:

probe descriptions
/predicate/
{
    action statements
}

Probe description: provider name:module name:function name:semantic name

Predicate is a D expression.

Actions:
Recording actions (print(), printa(), trace())
Destructive actions (disabled by default)
Special actions (copyinstr(), strlen(), rand() etc.)

Aggregations (Cherry on the cake)

Aggregate data to look for trends and generate reports.

General form:

@name[keys] = aggfunc(args);

An aggregating function f is one for which:

f(f(x0) ∪ f(x1) ∪ ... ∪ f(xn)) = f(x0 ∪ x1 ∪ ... ∪ xn)

so partial results can be combined without keeping every data point.

e.g. count(), min(), max(), sum(), avg(), quantize()
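The property above can be checked on a small, made-up data set split across three hypothetical CPUs. The values are invented for illustration. The sketch also shows that count and avg are merged from partial sums and counts rather than by reapplying the function literally, and that a median cannot be merged this way at all.

```python
from statistics import median

# Hypothetical per-CPU samples (e.g. write sizes seen on each CPU).
per_cpu_values = {0: [32, 64, 48], 1: [16, 256], 2: [40]}
all_values = [v for vals in per_cpu_values.values() for v in vals]

# Each CPU keeps only a small partial result; the partials are merged later.
max_merged = max(max(vals) for vals in per_cpu_values.values())
min_merged = min(min(vals) for vals in per_cpu_values.values())
sum_merged = sum(sum(vals) for vals in per_cpu_values.values())
count_merged = sum(len(vals) for vals in per_cpu_values.values())
avg_merged = sum_merged / count_merged  # avg carries (sum, count) partials

# A median of per-CPU medians is generally not the overall median.
median_of_medians = median(median(vals) for vals in per_cpu_values.values())
true_median = median(all_values)

print(max_merged, min_merged, sum_merged, count_merged, avg_merged)
print(median_of_medians, true_median)
```

Only the small partial results need to be stored per CPU, which is what makes this kind of aggregation scale.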

Speculative Tracing

Trace data tentatively and decide later whether or not to commit it to a trace buffer.
Useful when a predicate cannot be used because, at the time a probe fires, it is not yet known whether the event is of interest.
Useful when an error event occurs and you would like to know the history behind it and why the error occurred.

Functions: speculation(), speculate(), commit(), discard()
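The four functions can be mimicked with a toy model in Python. The method names mirror the D functions listed above, but the `Speculations` class, its internals, and the example records are assumptions made for this sketch and not how DTrace stores speculative data.

```python
class Speculations:
    """Hypothetical model: data is held aside until committed or discarded."""

    def __init__(self):
        self._next_id = 1
        self._pending = {}   # speculation id -> tentatively traced records
        self.committed = []  # what the consumer eventually sees

    def speculation(self):
        sid = self._next_id
        self._next_id += 1
        self._pending[sid] = []
        return sid

    def speculate(self, sid, record):
        self._pending[sid].append(record)

    def commit(self, sid):
        self.committed.extend(self._pending.pop(sid))

    def discard(self, sid):
        self._pending.pop(sid)


spec = Speculations()

ok_call = spec.speculation()
spec.speculate(ok_call, "open entry: /etc/passwd")
spec.discard(ok_call)           # the call succeeded, history not needed

failed_call = spec.speculation()
spec.speculate(failed_call, "open entry: /no/such/file")
spec.speculate(failed_call, "lookup failed")
spec.commit(failed_call)        # the call failed, keep the history

print(spec.committed)
```

The decision to keep data is made at the return point, after the records leading up to it have already been traced.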

Example D Programs

BEGIN
{
    trace("Hello world");
    exit(0);
}

# dtrace -s helloworld.d
dtrace: script 'helloworld.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN   Hello world

Example D Programs

syscall::read:entry
{
    printf("Process %d", pid);
}

# dtrace -s d2.d
dtrace: script 'd2.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  44129                       read:entry Process 2680
  0  44129                       read:entry Process 2680
  0  44129                       read:entry Process 2827
  0  44129                       read:entry Process 2680
  0  44129                       read:entry Process 2680
  0  44129                       read:entry Process 2827
…

syscall::write:entry
/execname == "sshd"/
{
    @[arg0] = quantize(arg2);
}

RESULT:

        4
           value  ------------- Distribution ------------- count
               8 |                                         0
              16 |@                                        1
              32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             24
              64 |@@@@@@                                   5
             128 |@                                        1
             256 |@@@@                                     3
             512 |                                         0
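The power-of-two bucketing behind a quantize()-style histogram can be sketched as below. This is a simplified model that handles non-negative values only; the sample sizes are invented, and the functions `bucket_of` and `quantize` are hypothetical helpers, not part of DTrace.

```python
from collections import Counter


def bucket_of(value):
    """Largest power of two that is <= value (0 stays in bucket 0)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    return 0 if value == 0 else 1 << (value.bit_length() - 1)


def quantize(values):
    return Counter(bucket_of(v) for v in values)


# Hypothetical write sizes in bytes.
sizes = [20, 40, 48, 52, 100, 300]
histogram = quantize(sizes)

for bucket in sorted(histogram):
    print(f"{bucket:>8} | {'@' * histogram[bucket]:<8} {histogram[bucket]}")
```

Each row of the output above is read the same way: the row labelled 32 counts values from 32 up to 63.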

Example D Programs

syscall::write:entry
/execname == "sshd" && arg0 == 5/
{
    @[ustack()] = quantize(arg2);
}

RESULT: next slide

Example D Programs

# dtrace -s d4.d
dtrace: script 'd4.d' matched 1 probe
^C

              libc.so.1`_write+0x15
              sshd`altprivsep_start_monitor+0x220
              sshd`main+0xe57
              sshd`0x805bad2

           value  ------------- Distribution ------------- count
               2 |                                         0
               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
               8 |                                         0

              libc.so.1`_write+0x15
              pkcs11_softtoken.so.1`looping_write+0x32
              pkcs11_softtoken.so.1`C_SeedRandom+0xfd
              libpkcs11.so.1`C_SeedRandom+0xed
              mech_krb5.so.1`krb5_c_random_seed+0x3d
              mech_krb5.so.1`init_common+0x121
              mech_krb5.so.1`krb5_init_context+0xd
              mech_krb5.so.1`krb5_gss_get_context+0x3d
              mech_krb5.so.1`_C0095D0A+0x49
              libgss.so.1`__gss_get_mechanism+0xad
              libgss.so.1`gss_add_cred+0x79
              libgss.so.1`gss_acquire_cred+0xfb
              sshd`ssh_gssapi_server_mechs+0x7c
              sshd`ssh_gssapi_server_kex_hook+0x22
              sshd`0x807cc12
              sshd`kex_send_kexinit+0x2a
              sshd`kex_setup+0x74
              sshd`0x805e90f
              sshd`main+0xe05
              sshd`0x805bad2

           value  ------------- Distribution ------------- count
               4 |                                         0
               8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
              16 |                                         0

syscall::open:entry
{
    @files[copyinstr(arg0)] = count();
}

RESULT:

# dtrace -s d5.d

dtrace: script 'd5.d' matched 1 probe

^C

/etc/resolv.conf 1

Example D Programs

RESULT when copyinstr is removed:

# dtrace -s d5.d
dtrace: script 'd5.d' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 44133: syscall::open:entry): invalid address (0x80fbdaf) in action #2 at DIF offset 28
dtrace: error on enabled probe ID 1 (ID 44133: syscall::open:entry): invalid address (0x80fbdaf) in action #2 at DIF offset 28
^C

  /lib/libc.so.1                1
  /proc/2647/psinfo             1
  /proc/2723/psinfo             1
  /proc/2874/psinfo             1
  /proc/4680/psinfo             1
  /proc/4691/psinfo             1
  /proc/4740/psinfo             1
  /var/ld/ld.config             1
  /dev/null                     2
  /etc/resolv.conf              2
  /var/adm/utmpx                2

Future Work

Performance counter provider
Helper actions: embracing high-level languages and their environments.
User lock analysis: lock contention analysis of user-level multi-threaded processes.
Fine-grained user-level providers
Software visualization

Further Readings to be Googled

DTrace Guide

Hidden in Plain Sight, Cantrill, B.

DTrace Toolkit, a repository of D scripts classified by application domain (CPU, Disk, Mem, Kernel, Net etc.)

DTrace & DTraceToolkit, Stefan Parvu

Discussion

The paper does not discuss how much overhead is introduced when probes are enabled. Safety is considered only in the sense of not crashing or halting the system.

What about guaranteeing that other requirements of the system, such as real-time properties, are not violated?
