
DCO1 Performance Measurement and Improvement Lecture 7.

Dec 18, 2015

Page 1: DCO1 Performance Measurement and Improvement Lecture 7.

DCO 1

Performance Measurement and Improvement

Lecture 7

Page 2: DCO1 Performance Measurement and Improvement Lecture 7.

Practical Hints

Practical methods for improving program performance:

Hidden Trouble
Fast Allocation and Free

Page 3: DCO1 Performance Measurement and Improvement Lecture 7.

Hidden Trouble

First, look at memory allocation with malloc():

malloc in printf
malloc
Strings

Page 4: DCO1 Performance Measurement and Improvement Lecture 7.

malloc in printf

In many C library implementations, printf can call malloc internally.

This can add an unexpected cost.

String manipulation is expensive in general, whether it is formatting text as in printf, reading ASCII text and converting it to numbers, or performing string comparisons.

When the output needs no formatting, prefer puts() to printf().
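As a minimal sketch of that advice, the helper below writes a fixed message without going through a format string. The name emit_line is an assumption made for illustration; it is not part of the lecture.

```c
#include <stdio.h>

/* Hypothetical helper: write a fixed message followed by a newline.
 * fputs() copies the bytes as they are; there is no format string to parse. */
int emit_line(FILE *out, const char *msg)
{
    if (fputs(msg, out) == EOF)
        return -1;
    if (fputc('\n', out) == EOF)
        return -1;
    return 0;
}
```

Calling emit_line(stdout, "Processing finished") has the same visible effect as puts("Processing finished"). Some compilers (GCC, for example) already make this substitution for constant strings, so it is worth measuring before and after.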

Page 5: DCO1 Performance Measurement and Improvement Lecture 7.

Malloc

malloc (or new in C++) is expensive.

A common solution is to use static or local variables to avoid allocating memory on the heap.

Another solution is to keep a list of objects that need to be allocated often. Allocation is then just a matter of removing an object from the list, and freeing simply inserts the object back on the list.

Avoid calling malloc() in performance-critical code where you can.
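The object-list idea can be sketched as follows. The node type and the function names are assumptions chosen for illustration, and the sketch is not thread-safe.

```c
#include <stdlib.h>

/* Hypothetical object type that is allocated and freed very often. */
typedef struct node {
    struct node *next;   /* links unused objects together on the free list */
    int value;
} node;

static node *free_list = NULL;   /* recycled objects, ready for reuse */

/* Take an object from the free list; fall back to malloc() if it is empty. */
node *node_alloc(void)
{
    node *n = free_list;
    if (n != NULL)
        free_list = n->next;
    else
        n = malloc(sizeof *n);
    if (n != NULL) {
        n->next = NULL;
        n->value = 0;
    }
    return n;
}

/* "Freeing" just pushes the object back on the list for later reuse. */
void node_release(node *n)
{
    if (n == NULL)
        return;
    n->next = free_list;
    free_list = n;
}
```

After the first few allocations, most calls to node_alloc() are a pointer swap rather than a trip through the general-purpose allocator. The memory is never returned to the system in this sketch, which is acceptable only if the peak number of objects is bounded.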

Page 6: DCO1 Performance Measurement and Improvement Lecture 7.

Strings

The Microsoft Foundation Class (MFC) CString class allocates dynamic memory. This is great if you want to avoid managing memory yourself and you want to avoid nasty bugs caused by writing data beyond the end of allocated string memory. On the other hand, if you find that memory allocation is taking significant time in an inner loop, consider using a fixed-length character array declared as local or static data.
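A plain C sketch of the fixed-length alternative is shown below. The helper name make_label and the "name#id" format are assumptions for illustration; the bounded snprintf() call keeps the protection against writing past the end of the array.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: format "name#id" into a caller-supplied buffer,
 * so no heap allocation is needed.
 * Returns 0 on success, -1 if the text did not fit (output is truncated). */
int make_label(char *buf, size_t size, const char *name, int id)
{
    int n = snprintf(buf, size, "%s#%d", name, id);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Inside a loop the caller would declare char label[32]; once and call make_label(label, sizeof label, "queue", i) on each iteration, reusing the same storage every time.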

Page 7: DCO1 Performance Measurement and Improvement Lecture 7.

Fast Allocation and Free

One way to obtain faster performance is to take one large block of memory from the heap and allocate smaller chunks from it while computing some result.

After the result is obtained, the entire block is freed.

This is fast for two reasons:

Memory is allocated simply by incrementing the "free" pointer by the number of bytes you need to allocate.

There is no need to free each allocated object; you free all objects at once by freeing the entire pool.
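A minimal sketch of such a pool follows. The type and function names, and the 16-byte rounding used for alignment, are assumptions made for illustration rather than anything specified in the lecture.

```c
#include <stdlib.h>
#include <stddef.h>

#define POOL_ALIGN 16u   /* assumed alignment that suits common data types */

/* Hypothetical pool: one big block plus the offset of the next free byte. */
typedef struct {
    unsigned char *base;  /* start of the big block */
    size_t size;          /* total bytes in the block */
    size_t used;          /* the "free" pointer, kept as an offset */
} pool;

int pool_init(pool *p, size_t size)
{
    p->base = malloc(size);
    p->size = (p->base != NULL) ? size : 0;
    p->used = 0;
    return (p->base != NULL) ? 0 : -1;
}

/* Allocate by bumping the offset; returns NULL when the pool is exhausted. */
void *pool_alloc(pool *p, size_t n)
{
    size_t rounded = (n + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);
    void *chunk;
    if (rounded < n || rounded > p->size - p->used)
        return NULL;
    chunk = p->base + p->used;
    p->used += rounded;
    return chunk;
}

/* Free every chunk at once by releasing the whole block. */
void pool_destroy(pool *p)
{
    free(p->base);
    p->base = NULL;
    p->size = 0;
    p->used = 0;
}
```

The trade-off is that individual chunks cannot be returned early, so this suits work that builds up temporary data and then discards all of it together.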

Page 8: DCO1 Performance Measurement and Improvement Lecture 7.

Coding for Speed

Mainly from this web site: http://www.abarnett.demon.co.uk/tutorial.html

There are many ways to speed up a program:

Array Indices
Aliases
Registers
Integers
Loop Jamming
Dynamic Loop Unrolling
Faster for() loops
Switch
Pointers
Early loop breaking
Misc

Page 9: DCO1 Performance Measurement and Improvement Lecture 7.

Array Indices

An example using switch and if-else to set a letter from a small integer value:

switch ( queue )
{
    case 0 :   letter = 'W';   break;
    case 1 :   letter = 'S';   break;
    case 2 :   letter = 'U';   break;
}

or maybe

if ( queue == 0 )
    letter = 'W';
else if ( queue == 1 )
    letter = 'S';
else
    letter = 'U';

Page 10: DCO1 Performance Measurement and Improvement Lecture 7.

Array Indices

A quicker method is to simply use the value as an index into a character array, eg.

static char *classes="WSU";
letter = classes[queue];

In this case, classes[0] is 'W', classes[1] is 'S' and classes[2] is 'U'.
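Unlike the switch, a bare array lookup has no default case, so an out-of-range queue value would read past the end of the string. The sketch below adds a range check; the function name and the '?' fallback are assumptions for illustration.

```c
#include <stddef.h>

/* Map a queue number to its class letter via a lookup table.
 * The '?' result for out-of-range values is an assumption added here. */
char queue_letter(unsigned int queue)
{
    static const char classes[] = "WSU";
    const size_t count = sizeof classes - 1;   /* exclude the trailing '\0' */
    return (queue < count) ? classes[queue] : '?';
}
```

Taking the index as unsigned means a single comparison covers both the negative and the too-large case.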

Page 11: DCO1 Performance Measurement and Improvement Lecture 7.

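Aliases (1)

void func1( int *data )
{
    int i;
    for(i=0; i<10; i++)
    {
        somefunc2( *data, i);
    }
}

Not very good: even if *data never changes, the compiler may be unable to prove it (somefunc2 could modify it through another pointer), so it may reload the value from memory on every iteration.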

Page 12: DCO1 Performance Measurement and Improvement Lecture 7.

Aliases – better change to this

void func1( int *data )
{
    int i;
    int localdata;
    localdata = *data;
    for(i=0; i<10; i++)
    {
        somefunc2( localdata, i);
    }
}

Better way: the value is read once into a local variable before the loop.

Page 13: DCO1 Performance Measurement and Improvement Lecture 7.

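Registers – the compiler is good at register allocation

Use the "register" declaration for heavily used variables, eg.

register float  val;
register double dval;
register int    ival;

This may be faster, but register is only a hint: optimising compilers usually make their own register-allocation decisions and may ignore it. Note that you cannot take the address of a register variable.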

Page 14: DCO1 Performance Measurement and Improvement Lecture 7.

Integers

Use unsigned ints instead of ints if you know the value will never be negative.

unsigned int a; is better than int a;

Some processors can handle unsigned integer arithmetic considerably faster than signed, eg. use

unsigned int i;

instead of

int i;

Integer arithmetic is usually faster than floating-point arithmetic.

Page 15: DCO1 Performance Measurement and Improvement Lecture 7.

Loop Jamming

Never use two loops where one is enough:

for(i=0; i<100; i++)
{
    stuff();
}
for(i=0; i<100; i++)
{
    morestuff();
}

It is better to combine them.

Page 16: DCO1 Performance Measurement and Improvement Lecture 7.

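Loop Jamming

It would be better to do:

for(i=0; i<100; i++)
{
    stuff();
    morestuff();
}

This is only valid when morestuff() does not need stuff() to have finished all of its iterations first.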
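As a concrete illustration of loop jamming, the sketch below computes the sum and the maximum of an array first in two passes and then in one. The function names are assumptions, and both versions assume n is at least 1.

```c
#include <stddef.h>

/* Two passes over the array: one loop per result. Requires n >= 1. */
void stats_two_loops(const int *a, size_t n, long *sum, int *max)
{
    size_t i;
    long s = 0;
    int m = a[0];
    for (i = 0; i < n; i++)
        s += a[i];
    for (i = 1; i < n; i++)
        if (a[i] > m)
            m = a[i];
    *sum = s;
    *max = m;
}

/* One pass: both results are updated in the same loop body. Requires n >= 1. */
void stats_one_loop(const int *a, size_t n, long *sum, int *max)
{
    size_t i;
    long s = a[0];
    int m = a[0];
    for (i = 1; i < n; i++) {
        s += a[i];
        if (a[i] > m)
            m = a[i];
    }
    *sum = s;
    *max = m;
}
```

The combined version pays for the loop counter once and touches each array element a single time, which also tends to be kinder to the cache.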

Page 17: DCO1 Performance Measurement and Improvement Lecture 7.

Example – three loops (0.36 ms)

Page 18: DCO1 Performance Measurement and Improvement Lecture 7.

Example – one loop (0.31 ms)

Page 19: DCO1 Performance Measurement and Improvement Lecture 7.

Loop Unrolling and Dynamic Loop Unrolling

for(i=0; i<3; i++)
{
    something(i);
}

is less efficient than

something(0);
something(1);
something(2);

because the loop version has to check and increment the value of i on every iteration.
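When the iteration count is only known at run time, the loop cannot be written out in full, but it can still be unrolled in blocks. The sketch below handles four elements per loop test and finishes with a short remainder loop; the function name and the unroll factor of 4 are assumptions for illustration.

```c
#include <stddef.h>

/* Sum an array, handling four elements per loop test and increment. */
long sum_unrolled(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Main loop: runs while at least four elements remain. */
    for (; i + 4 <= n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    /* Remainder loop: the 0 to 3 elements left over. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

Many optimising compilers will unroll simple loops on their own at higher optimisation levels, so the hand-written form is worth keeping only if a measurement shows a gain.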

Page 20: DCO1 Performance Measurement and Improvement Lecture 7.

Example – two for loops (0.96 ms)

Page 21: DCO1 Performance Measurement and Improvement Lecture 7.

Example – one for loop (0.52 ms)

Page 22: DCO1 Performance Measurement and Improvement Lecture 7.

Faster for loop

Ordinarily, you would code a simple for() loop like this:

for( i=0;  i<10;  i++){ ... }

i loops through the values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

If you don't care about the order of the loop counter, you can do this instead:

for( i=10; i--; ) { ... }

Inside the loop body, i takes the values 9, 8, 7, ..., 0.

Counting down to zero can be faster because the test against zero comes almost for free with the decrement.

Page 23: DCO1 Performance Measurement and Improvement Lecture 7.

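Faster for loop

The syntax is a little strange, but is perfectly legal. The same number of iterations could also be obtained by coding:

for(i=10; i; i--){ ... }

or (to expand it further)

for(i=10; i!=0; i--){ ... }

In these two forms, i takes the values 10, 9, ..., 1 inside the loop body, so adjust any array indexing accordingly.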
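The sketch below makes the counter values explicit for the for( i=n; i--; ) form and uses the same form to walk an array from the last element to the first. Both function names are assumptions for illustration.

```c
#include <stddef.h>

/* Record the values i takes in the body of for (i = n; i--; ).
 * out must have room for n entries; returns the number of iterations. */
size_t countdown_order(unsigned int n, unsigned int *out)
{
    size_t k = 0;
    unsigned int i;
    for (i = n; i--; )
        out[k++] = i;          /* first value stored is n - 1, last is 0 */
    return k;
}

/* Sum an array with a count-down loop; i is a valid index n-1 ... 0. */
long sum_countdown(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    for (i = n; i--; )
        total += a[i];
    return total;
}
```

Because the condition tests i before decrementing it, the form is safe with unsigned counters, including the case n == 0, where the body never runs.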

Page 24: DCO1 Performance Measurement and Improvement Lecture 7.

Example – int and increment (1.51 ms)

Page 25: DCO1 Performance Measurement and Improvement Lecture 7.

Example – unsigned int, decrement (1.29 ms)

Page 26: DCO1 Performance Measurement and Improvement Lecture 7.

Use switch() instead of if...else...

For large decisions involving if...else...else..., like this:

if( val == 1)
    dostuff1();
else if (val == 2)
    dostuff2();
else if (val == 3)
    dostuff3();

it may be faster to use a switch:

switch( val )
{
    case 1: dostuff1(); break;
    case 2: dostuff2(); break;
    case 3: dostuff3(); break;
}

Consider changing long if...else chains to a switch.

Page 27: DCO1 Performance Measurement and Improvement Lecture 7.

Pointers

Whenever possible, pass structures by reference (ie. pass a pointer to the structure) rather than by value, so that the whole structure is not copied on every call.

void print_data( const bigstruct *data_pointer)
{
    ...printf contents of structure...
}
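To show the difference in a form that compiles, the sketch below defines a stand-in bigstruct and reads one field from it both ways. The field names and the array size are assumptions for illustration; the lecture does not define bigstruct.

```c
/* Hypothetical large structure; the fields are made up for illustration. */
typedef struct {
    int    id;
    double samples[1024];
} bigstruct;

/* By value: the whole structure (typically about 8 KB here) is copied
 * for every call. */
double first_sample_by_value(bigstruct data)
{
    return data.samples[0];
}

/* By pointer: only an address is passed, and const documents that the
 * function will not modify the caller's structure. */
double first_sample_by_pointer(const bigstruct *data_pointer)
{
    return data_pointer->samples[0];
}
```

Both functions return the same result; only the cost of getting the data into the function differs.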

Page 28: DCO1 Performance Measurement and Improvement Lecture 7.

Early loop breaking

This loop searches a list of 10000 numbers to see if there is a -99 in it.

found = FALSE;
for(i=0; i<10000; i++)
{
    if( list[i] == -99 )
    {
        found = TRUE;
    }
}
if( found ) printf("Yes, there is a -99. Hooray!\n");

This works, but it always searches the whole list, even after the value has been found.

Page 29: DCO1 Performance Measurement and Improvement Lecture 7.

Early loop breaking

A better way is to abort the search as soon as the value is found.

found = FALSE;
for(i=0; i<10000; i++)
{
    if( list[i] == -99 )
    {
        found = TRUE;
        break;
    }
}
if( found ) printf("Yes, there is a -99. Hooray!\n");

Page 30: DCO1 Performance Measurement and Improvement Lecture 7.

Suggestion (1)

Avoid using ++ and -- etc. within loop expressions, eg. while(n--){}, as this can sometimes be harder to optimise.

Minimize the use of global variables.

Declare anything within a file (external to functions) as static, unless it is intended to be global.

Use word-size variables if you can, as the machine can work with these better (instead of char, short, double, bitfields etc.).

Page 31: DCO1 Performance Measurement and Improvement Lecture 7.

Suggestion (2)

Avoid recursion in performance-critical code. Recursion can be very elegant and neat, but it creates many more function calls, which can become a large overhead.

Avoid the sqrt() square root function in loops: calculating square roots is very CPU intensive.

Single-dimension arrays are faster than multi-dimensional arrays (a[16] is better than a[4][4]).

Compilers can often optimise a whole file. Avoid splitting closely related functions into separate files; the compiler will do better if it can see both of them together (it might be able to inline the code, for example).
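One common way to keep sqrt() out of a loop is to compare squared values instead. The sketch below assumes the task is a distance test against a non-negative radius; the function name and that task are assumptions for illustration.

```c
/* Return 1 if the point (x, y) lies within distance r of the origin.
 * For r >= 0, comparing squared values gives the same answer as
 * sqrt(x*x + y*y) <= r (up to floating-point rounding) without sqrt(). */
int within_radius(double x, double y, double r)
{
    return x * x + y * y <= r * r;
}
```

If the same radius is tested against many points, r * r can also be computed once before the loop.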

Page 32: DCO1 Performance Measurement and Improvement Lecture 7.

Example - without recursion

Page 33: DCO1 Performance Measurement and Improvement Lecture 7.

Example - with recursion (366 ms), even after the number of recursive calls had been reduced
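The code for these two examples is not part of this transcript. As a stand-in, the sketch below shows the same kind of comparison on a factorial function, which is an assumption chosen for brevity and not the lecture's own example.

```c
/* Recursive version: one function call per step. */
unsigned long factorial_recursive(unsigned int n)
{
    return (n <= 1) ? 1UL : n * factorial_recursive(n - 1);
}

/* Iterative version: the same result from a single call and a simple loop. */
unsigned long factorial_iterative(unsigned int n)
{
    unsigned long result = 1;
    unsigned int i;
    for (i = 2; i <= n; i++)
        result *= i;
    return result;
}
```

Both versions overflow unsigned long for large n, so the comparison is only meaningful for small arguments.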

Page 34: DCO1 Performance Measurement and Improvement Lecture 7.

Suggestion (3)

Single precision maths may be faster than double precision; there is often a compiler switch for this. (float is better than double unless you really need the extra precision.)

Floating point multiplication is often faster than division: use val * 0.5 instead of val / 2.0.

On some processors addition is quicker than multiplication: use val + val + val instead of val * 3.

puts() is quicker than printf(), although less flexible.
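The multiply-instead-of-divide hint extends to loops that divide many values by the same number: compute the reciprocal once and multiply by it. The function name below is an assumption for illustration.

```c
#include <stddef.h>

/* Divide every element by the same value using one division in total:
 * the reciprocal is computed once, then each element is multiplied by it. */
void scale_by_reciprocal(double *v, size_t n, double divisor)
{
    const double inv = 1.0 / divisor;
    size_t i;
    for (i = 0; i < n; i++)
        v[i] *= inv;
}
```

When the reciprocal is not exactly representable (1.0 / 3.0, for example), results can differ from true division in the last bit; for divisors that are powers of two, such as 2.0 or 4.0, the results are identical.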

Page 35: DCO1 Performance Measurement and Improvement Lecture 7.

Example - float (4 bytes) and double (8 bytes)

Page 36: DCO1 Performance Measurement and Improvement Lecture 7.

Suggestion (4)

Use #defined macros instead of commonly used tiny functions. Sometimes the bulk of CPU usage can be tracked down to a small external function being called thousands of times in a tight loop. Replacing it with a macro that performs the same job removes the overhead of all those function calls and allows the compiler to be more aggressive in its optimisation.

Binary/unformatted file access is faster than formatted access, as the machine does not have to convert between human-readable ASCII and machine-readable binary. If you don't actually need to read the data in a file yourself, consider making it a binary file.
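A minimal sketch of the macro approach follows, together with a static inline function as an alternative available since C99. The names SQUARE and square are assumptions for illustration.

```c
/* Macro version: no call overhead, but every use of the argument needs
 * parentheses, and the argument is evaluated twice, so SQUARE(i++)
 * would be a bug. */
#define SQUARE(x) ((x) * (x))

/* Alternative (C99): a static inline function, which many compilers
 * expand in place while keeping type checking and single evaluation. */
static inline int square(int x)
{
    return x * x;
}
```

Without the inner parentheses, SQUARE(3 + 1) would expand to 3 + 1 * 3 + 1 and give 7 instead of 16, which is why the inline function is often the safer choice when the compiler supports it.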

Page 37: DCO1 Performance Measurement and Improvement Lecture 7.

Summary

It is better to write a simple but fast program.

There are many ways to speed up a program.