Lecture 2a: Performance Measurement


Goals of Performance Analysis

The goal of performance analysis is to provide quantitative information about the performance of a computer system.

Compare alternatives
• When purchasing a new computer system, to provide quantitative information

Determine the impact of a feature
• In designing a new system or upgrading, to provide a before-and-after comparison

System tuning
• To find the parameters that produce the best overall performance

Identify relative performance
• To quantify the performance relative to previous generations

Performance debugging
• To identify performance problems and correct them

Set expectations
• To determine the expected capabilities of the next generation

Performance Evaluation

Performance Evaluation steps:

1. Measurement / Prediction
   • What to measure? How to measure?
   • Modeling for prediction
      • Simulation
      • Analytical Modeling

2. Analysis & Reporting
   • Performance metrics

Performance Measurement

Interval Timers

• Hardware Timers

• Software Timers


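Hardware Timers

• Counter value is read from a memory location
• Time is calculated as

[Figure: a clock with period Tc drives a counter whose n bits are connected to the processor memory bus]

$\text{Time} = (x_2 - x_1) \times T_c$

where x1 and x2 are the counter values read at the start and at the end of the interval.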


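Software Timers

• Interrupt-based
• When an interrupt occurs, the interrupt-service routine increments the timer value, which is read by a program
• Time is calculated as

[Figure: a clock with period Tc feeds a prescaling counter connected to the processor interrupt input, producing interrupts with period T’c]

$\text{Time} = (x_2 - x_1) \times T'_c$

where x1 and x2 are the timer values read at the start and at the end of the interval.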


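Timer Rollover

Rollover occurs when an n-bit counter makes the transition from its maximum value $2^n - 1$ to zero, so a counter with tick period T’c rolls over every $2^n \times T'_c$.

There is a trade-off between rollover time and accuracy: a shorter T’c gives finer resolution but a shorter time to rollover.

Time to rollover:

T’c      32-bit       64-bit
10 ns    43 s         5850 years
1 µs     1.2 hours    0.58 million years
1 ms     49 days      0.58 × 10^9 years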

Timers

Solutions to rollover:

1. Use a 64-bit integer (rollover after more than half a million years)

2. Have the timer return two values:
   • One represents seconds
   • One represents microseconds since the last second

   With a 32-bit seconds value, rollover takes over 100 years ($2^{32}$ s ≈ 136 years).


Interval Timers

T0 = read current time
event_being_timed();
T1 = read current time

Time for the event: T1 - T0

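Performance Measurement: Timer Overhead

Timeline of one measurement:

Initiate read_time → [T1] → current time is read → [T2] → event begins → [T3] → event ends; initiate read_time → [T4] → current time is read

Measured time:

$T_m = T_2 + T_3 + T_4$

Desired measurement:

$T_e = T_3 = T_m - (T_2 + T_4) = T_m - (T_1 + T_2)$, since $T_1 = T_4$

T1 and T4 are equal because both span the same step: from initiating read_time until the current time is actually read.

Timer overhead:

$T_{ovhd} = T_1 + T_2$

Te should be 100 to 1000 times greater than Tovhd.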

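Performance Measurement: Timer Resolution

Resolution is the smallest change in time that can be detected by an interval timer. If the duration of an event satisfies

$n T'_c < T_e < (n+1) T'_c$

the timer reports either n or n + 1 ticks, depending on where the event starts relative to the clock ticks, so each measurement is uncertain by one tick T’c.

If T’c is large relative to the event being measured, it may be impossible to measure the duration of the event.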

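Performance Measurement: Measuring Short Intervals

Problem: Te < T’c

[Figure: an event shorter than one tick T’c is measured as 1 if a clock tick falls inside it and as 0 otherwise]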


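Solution: Repeat the measurement n times. Each measurement reads either 0 or 1 tick, so the number of 1s approximates a binomial distribution; if the start of the event is uncorrelated with the clock ticks, each measurement reads 1 with probability Te / T’c.

Average execution time: $T'_e = \frac{m}{n} \times T'_c$

where m is the number of 1s measured.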



Solution: Repeat the event n times and measure the total execution time Tt of all n repetitions.

Average execution time: $T'_e = \frac{T_t}{n} - h$

where Tt is the total execution time of the n repetitions and h is the repetition overhead per repetition (for example, loop control).

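[Figure: n back-to-back repetitions of an event of duration Te, timed together as one interval Tt by a timer with tick period T’c]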

Performance Measurement: Time

• Elapsed time / wall-clock time / response time
   • Latency to complete a task, including disk access, memory access, I/O, operating system overhead, and everything else (includes time consumed by other programs in a time-sharing system)
• CPU time
   • The time the CPU spends computing, not including I/O time or waiting time
   • User time / user CPU time
      • CPU time spent in the program
   • System time / system CPU time
      • CPU time spent in the operating system performing tasks requested by the program


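UNIX time command

90.7u 12.9s 2:39 65%

• 90.7u: user time (seconds)
• 12.9s: system time (seconds)
• 2:39: elapsed time (2 minutes 39 seconds, i.e. 159 s)
• 65%: CPU time as a percentage of elapsed time, (90.7 + 12.9) / 159 ≈ 65%

Drawbacks:

• Resolution is in milliseconds
• Different sections of the code cannot be timed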

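Timers

A timer is a function, subroutine, or program that returns the amount of time spent in a section of code.

t0 = timer();
…
< code segment >
…
t1 = timer();
time = t1 - t0;

zero = 0.0;
t0 = timer(&zero);
…
< code segment >
…
t1 = timer(&t0);
time = t1;

In the second form, timer(&x) returns the current time minus x, so timer(&zero) returns an absolute reading and timer(&t0) returns the elapsed time directly.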

Timers

Read Wadleigh and Crawford, pp. 130-136, for time, clock, gettimeofday, etc.

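Timers: Measuring Timer Resolution

#include <stdio.h>
#include <time.h>

/* Assumed timer(): CPU seconds from the standard clock(), minus *t0 */
double timer(double *t0) {
    return (double)clock() / CLOCKS_PER_SEC - *t0;
}

/* Work whose duration grows with n; volatile keeps the loop from being optimized away */
int foo(int n) {
    volatile int i = 0;
    for (int k = 0; k < n; k++)
        i++;
    return i;
}

int main(void) {
    double zero = 0.0, t0, t1 = 0.0;
    int j = 0;
    t0 = timer(&zero);                 /* first call, outside the loop */
    /* Time foo(j) for increasing j until the timer reports a nonzero interval */
    while (t1 == 0.0) {
        j++;
        zero = 0.0;
        t0 = timer(&zero);
        foo(j);
        t1 = timer(&t0);
    }
    printf("It took %d iterations for a nonzero time\n", j);
    if (j == 1)
        printf("timer resolution <= %13.7f seconds\n", t1);
    else
        printf("timer resolution is %13.7f seconds\n", t1);
    return 0;
}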

Timers: Measuring Timer Resolution

Using clock():

Using times():

Using getrusage():

It took 682 iterations for a nonzero time
timer resolution is 0.0200000 seconds



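Timers: Spin Loops

Spin loops are used for code that takes less time to run than the resolution of the timer: the code is repeated many times between two timer calls. The first call to a function may require an inordinate amount of time; therefore, the minimum of all times may be desired.

#include <stdio.h>
#include <time.h>

/* Assumed timer(): CPU seconds from the standard clock(), minus *t0 */
double timer(double *t0) {
    return (double)clock() / CLOCKS_PER_SEC - *t0;
}

void foo(int n) {
    (void)n;
    /* < code segment > */
}

int main(void) {
    double zero = 0.0, t0, t1, t2 = 100000.0;
    int n = 1000000, trial, j;          /* n and the 10 trials are illustrative */
    for (trial = 0; trial < 10; trial++) {
        t0 = timer(&zero);
        for (j = 0; j < n; j++)         /* spin: run the segment n times */
            foo(j);
        t1 = timer(&t0);
        t2 = (t1 < t2) ? t1 : t2;       /* keep the minimum over trials */
    }
    t2 = t2 / n;                        /* time per execution of the segment */
    printf("Minimum time is %13.7e seconds\n", t2);
    return 0;
}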

Profilers

A profiler automatically inserts timing calls into an application to generate timing data.

It is used to identify the portions of the program that consume the largest fraction of the total execution time.

It may also be used to find system-level bottlenecks in a multitasking system.

Profilers may alter the timing of a program’s execution.

Profilers: Data Collection Techniques

• Sampling-based
   • This type of profiler uses a predefined clock; at every multiple of this clock tick, the program is interrupted and its state information is recorded.
   • It gives a statistical profile of the program's behavior.
   • It may miss some important events.
• Event-based
   • Events are defined (e.g., entry into a subroutine), and data about these events are collected.
   • The collected information shows the exact execution frequencies.
   • It has a substantial amount of run-time overhead and memory requirements.

Information kept

• Trace-based: The profiler keeps all the information it collects.

• Reductionist: Only statistical information is collected.

Performance Evaluation

Performance Evaluation steps:

1. Measurement / Prediction
   • What to measure? How to measure?
   • Modeling for prediction
      • Simulation
      • Analytical Modeling
         • Queuing Theory

2. Analysis & Reporting
   • Performance metrics

Predicting Performance

The performance of simple kernels can be predicted to a high degree of accuracy.

Theoretical performance and peak performance must be close.

Preferably, the measured performance is over 80% of the theoretical peak performance.

Homework 1

Write a C program to measure the execution time (elapsed time) of an addition operation (i.e., a = b + c). Run your program on both Windows and Linux systems. Use a timer that has at least µs (microsecond) resolution.

Prepare a one-page report and explain the following:

• Your method to measure time
• Your code
• Specifications of the system on which you ran your code (processor, clock speed, etc.)
• Your measurement results
• Comments on your results
