Dec 19, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: High Performance Programming with C++ Hafiza Rabbia Ibrahim July 25, 2011 1R.Ibrahim (CE Master)

R.Ibrahim (CE Master) 1

High Performance Programming with C++

Hafiza Rabbia Ibrahim July 25, 2011


Outline

• Motivation
• Return Value Optimization (RVO)
• Inlining
• Standard Template Library (STL)
• Constructors and Destructors
• Virtual Functions
• Coding Optimizations


Motivation

Performance
• Space efficiency
• Time efficiency


Return Value Optimization (RVO)


Why ?

• Methods often must return an object.
• To return an object, one must first be created.
• Constructing an object is time consuming.

"RVO is an optimization often performed by compilers to speed up your source code by transforming it and eliminating object creation."


• For instance, let's walk through a simple example using complex numbers.

Without optimization, the compiler-generated code (pseudocode) for Complex_Add() is:

void Complex_Add(Complex& __tempResult, const Complex& c1, const Complex& c2)
{
    struct Complex retVal;
    retVal.Complex::Complex();                // construct retVal
    retVal.real = c1.real + c2.real;
    retVal.imag = c1.imag + c2.imag;
    __tempResult.Complex::Complex(retVal);    // copy-construct __tempResult
    retVal.Complex::~Complex();               // destroy retVal
    return;
}


• The compiler can optimize Complex_Add() by eliminating the local object retVal and replacing it with __tempResult. This is RVO:

void Complex_Add(Complex& __tempResult, const Complex& c1, const Complex& c2)
{
    __tempResult.Complex::Complex();          // construct __tempResult
    __tempResult.real = c1.real + c2.real;
    __tempResult.imag = c1.imag + c2.imag;
    return;
}
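As a rough illustration of the transformation in the pseudocode above, the sketch below defines a small Complex class that counts copy constructions, and writes the addition so that it returns an unnamed temporary, the form compilers elide most readily. The class layout, the copy counter and the function name complex_add are assumptions made for this example; they are not taken from the slides.

```cpp
// Hypothetical Complex class for illustration; the counter exists only
// to make the effect of RVO observable.
struct Complex {
    double real;
    double imag;
    static int copies;   // number of copy constructions so far

    Complex(double r = 0.0, double i = 0.0) : real(r), imag(i) {}
    Complex(const Complex& other) : real(other.real), imag(other.imag) { ++copies; }
};

int Complex::copies = 0;

// Returning an unnamed temporary lets the compiler construct the result
// directly in the caller's storage (the __tempResult of the pseudocode).
inline Complex complex_add(const Complex& c1, const Complex& c2) {
    return Complex(c1.real + c2.real, c1.imag + c2.imag);
}
```

After `Complex c3 = complex_add(c1, c2);`, inspecting Complex::copies shows whether a copy was made; when the return value optimization is applied, the counter stays at 0.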


Figure: Execution time comparison, in seconds. With RVO: 1.3; without RVO: 1.89.


Is it mandatory?

• NO!

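• Whether RVO is applied is at the discretion of the compiler implementation. Consult your compiler documentation, or experiment, to find out if and when RVO is applied. (Since C++17, the copy is guaranteed to be elided when a function returns an unnamed temporary; eliding a named local object remains optional.)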


INLINING

• What are we avoiding? Method invocation costs
• How are we avoiding it? Optimization tricks


What we are avoiding: Method Invocation Costs

REGISTERS
• Register 0, Register 1, Register 2, ..., Register X
• Argument Pointer
• Frame Pointer
• Stack Pointer
• Instruction Pointer

MEMORY
• Variables passed in as arguments to the method
• Registers used by, and therefore saved by, the method
• Memory allocated for the method's automatic variables
• Arguments pushed on the stack in preparation for a call
• Unused memory

• 6 to 8 registers are saved
• Consumption of at least 40 cycles (data movement to and from memory)

Expensive in terms of machine cycles!


Why Inline?
• Inlining is probably the most significant performance enhancement technique available in C++.

Program's Fast Path
• The portion of a program that supports the normal, error-free, common usage cases of the program's execution.
• Typically less than 10% of the program's code lies on this fast path.

Inlining and Fast Path

"Inlining allows us to remove calls from the fast path."


Inlining Performance Story

• Performance gained by avoiding expensive method invocation
• Performance gained from cross-call optimization


Performance gain of Avoiding method invocation

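#include <iostream>

//inline                  // uncomment to inline calc()
int calc(int a, int b)
{
    return a + b;
}

int main()
{
    int x[1000] = {0};    // zero-initialized so the reads below are well defined
    int y[1000] = {0};
    int z[1000] = {0};

    for (int i = 0; i < 1000; ++i) {
        for (int j = 0; j < 1000; ++j) {
            for (int k = 0; k < 1000; ++k) {
                z[i] = calc(y[j], x[k]);
            }
        }
    }
    return 0;
}

When outlined: 62 seconds execution time

When inlined: 8 seconds execution time

Here inlining provided a roughly 8x performance gain. (An optimizing compiler may inline calc() on its own or simplify the loops, so measured numbers will vary.)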


Performance gain of Cross Call Optimization

Cross-call optimizations take the form of doing things at compile time to avoid the necessity of doing them at run time. For instance:

#include <cmath>

enum TrigFuns { SIN, COS, TAN };

//inline
float calc_trig(TrigFuns fun, float val)
{
    switch (fun) {
        case SIN: return sin(val);
        case COS: return cos(val);
        case TAN: return tan(val);
    }
    return 0.0f;    // not reached for valid TrigFuns values
}

//inline
TrigFuns get_trig_fun()
{
    return SIN;
}


Performance gain of Cross Call Optimization (cont.)

//inline
float get_float()
{
    return 90;
}

void calculator()
{
    // ...
    TrigFuns tf = get_trig_fun();
    float value = get_float();
    reg0 = calc_trig(tf, value);
    // ...
}

If inlined: the compiler sees all three bodies in one place, so simple optimizations and calculations across the calls become possible (the whole sequence can reduce to computing sin(90)).

If outlined: no cross-method optimization is possible; only intra-method optimization is possible.


Why not Inline?

If inlining is that good, why don't you inline everything?


Issues with Inlining

• Size of the program's executable code increases

• Storage issues

Each inlined instance is a separate copy with its own address, and each copy takes its own space in the cache. This reduces the effective cache capacity and raises the cache miss rate.

• Degenerative characteristics

Inlining nested calls can lead to exponential code growth.


int D()
{
    ...    // 500 code bytes of functionality
}

int C()
{
    D();
    ...    // 500 code bytes of functionality
    D();
}

int B()
{
    C();
    ...    // 500 code bytes of functionality
    C();
}

int A()
{
    B();
    ...    // 500 code bytes of functionality
    B();
}

int main()
{
    A(); A(); A(); A(); A();
    A(); A(); A(); A(); A();
}

Inlining A, B, C and D increases the code size by more than 70K bytes, i.e. roughly a 37x increase.
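The 37x figure can be checked with a little arithmetic. The sketch below assumes, as the comments in the example state, 500 bytes of own code per function, and it ignores the few bytes taken by the call instructions themselves. All names are made up for this illustration.

```cpp
// Bytes of code in each function's own body (from the example above).
constexpr int kBody = 500;

// Size of a function once its callee has been fully inlined into it:
// its own body plus `calls` copies of the already-inlined callee.
constexpr int inlined_size(int calleeSize, int calls) {
    return kBody + calls * calleeSize;
}

constexpr int kD = kBody;                  // 500
constexpr int kC = inlined_size(kD, 2);    // 500 + 2 * 500  = 1500
constexpr int kB = inlined_size(kC, 2);    // 500 + 2 * 1500 = 3500
constexpr int kA = inlined_size(kB, 2);    // 500 + 2 * 3500 = 7500

constexpr int kMainInlined = 10 * kA;      // ten inlined copies of A = 75000
constexpr int kOutlined = 4 * kBody;       // A, B, C, D stored once each = 2000
```

Under these assumptions the code grows from about 2,000 bytes to 75,000 bytes, an increase of 73,000 bytes and a ratio of 37.5.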


When should you inline?

Method size: Large (more than 20 lines of code), Medium (between 5 and 20 lines of code), Small (less than 5 lines of code)

Dynamic frequency: Low (the bottom 80% of call frequency)
  Large method: Don't inline
  Medium method: Don't inline
  Small method: Inline if you have the time and patience

Dynamic frequency: Medium (the top 5–20% of call frequency)
  Large method: Don't inline
  Medium method: Consider rewriting the method to expose its fast path and then inline
  Small method: Always inline

Dynamic frequency: High (the top 5% of call frequency)
  Large method: Consider rewriting the method to expose its fast path and then inline
  Medium method: Selectively inline the high-frequency static invocation points
  Small method: Always inline


How we are avoiding: Inlining Optimization Tricks

Conditional Inlining

Methods are outlined in the .c file or inlined through the .inl file, depending on whether INLINE is defined.

File x.h:

class X {
    ...
    int y(int a);
};
#if defined(INLINE)
#include "x.inl"
#endif

File x.inl:

#if !defined(INLINE)
#define inline
#endif
inline int X::y(int a)
{
    ....
}

File x.c:

#include "x.h"
#if !defined(INLINE)
#include "x.inl"
#endif

When INLINE is not defined, the .h file does not include the inlined methods. Instead, these methods are included in the .c file, and the inline directive is stripped from the front of each method.


Selective Inlining: inlining a method only at selected call sites

File x.h:

class x {
public:
    int inline_y(int a);
    int y(int a);
};
#include "x.inl"

File x.inl:

inline int x::inline_y(int a)
{
    ....    // original implementation of y
}

File x.c:

#include "x.h"

int x::y(int a)
{
    return inline_y(a);
}

Call inline_y() at the performance-critical call sites and y() everywhere else.


Concluding words about inlining

• Inlining "might" improve performance.

• Inlining may backfire, for example by increasing the size of the code.

Be sure about the real cost of calls on your system before using inlining!


Standard Template Library (STL)


Questions to be answered

Faced with a given computational task, what containers should I use? Are some better than others for a given scenario?

How good is the performance of the STL? Can I do better by rolling my own home-grown containers and algorithms?


Execution time Comparisons

Figure: Inserting at the front, execution time in milliseconds. vector<int>: 800; list<int>: 10.
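The gap between vector and list for front insertion follows from how each container stores its elements. The sketch below shows the two insertion loops side by side together with a small timing helper. The function names, the helper time_ms and the use of std::chrono are assumptions made for this example; the slides do not show the benchmark code that was actually measured.

```cpp
#include <chrono>
#include <cstddef>
#include <list>
#include <vector>

// Insert n integers at the front of a std::vector. Every insertion shifts
// all existing elements one slot to the right, so the loop is O(n^2) overall.
inline std::vector<int> fill_front_vector(std::size_t n) {
    std::vector<int> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.insert(v.begin(), static_cast<int>(i));
    }
    return v;
}

// Insert n integers at the front of a std::list. Each insertion only links
// in one new node, so the loop is O(n) overall.
inline std::list<int> fill_front_list(std::size_t n) {
    std::list<int> l;
    for (std::size_t i = 0; i < n; ++i) {
        l.push_front(static_cast<int>(i));
    }
    return l;
}

// Time one call of f and return the elapsed milliseconds (hypothetical helper).
template <class F>
double time_ms(F f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, `time_ms([] { fill_front_vector(10000); })` can be compared with `time_ms([] { fill_front_list(10000); })`. Absolute numbers depend on the compiler, optimization level and hardware, so they should not be expected to match the figures above.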


Execution time Comparisons (cont.)

Figure: Deleting elements at the front, execution time in milliseconds. vector<int>: 700; list<int>: 7.


Execution time Comparisons (cont.)

Figure: Container traversal speed, execution time in milliseconds. array: 110; vector: 110; list: 2600.


Can I do better?

The task: reverse a character sequence.

STL:

char s[] = "abcde";    // an array, because a string literal must not be modified
reverse(&s[0], &s[5]);

HOME GROWN:

char s[] = "abcde";
char temp;
temp = s[4];    // s[0] <-> s[4]
s[4] = s[0];
s[0] = temp;

temp = s[3];    // s[1] <-> s[3]
s[3] = s[1];
s[1] = temp;


Comparison STL speed to Home-grown code

Figure: Execution time in milliseconds. STL: 55; home-grown: 14.


Conclusions about STL performance

Outperforming the STL is possible.

However, you have to bend over backwards to concoct scenarios in which a home-grown implementation outperforms the STL.

To outperform the STL, a home-grown implementation must have something that the STL does NOT have, for example knowledge specific to your problem.


Constructors and Destructors


Why this analysis?

• The performance of constructors and destructors is often poor due to the fact that an object's constructor (destructor) may call the constructors (destructors) of member objects and parent objects.

• This can result in constructors (destructors) that take a long time to execute, especially with objects in complex hierarchies or objects that contain several member objects.

• Hence a Performance Hit!


Connection between the cost of constructors/destructors and inheritance-based design

• Case study: implementation of thread synchronization constructs

• In multithreaded applications, thread synchronization is needed to restrict concurrent access to shared resources

• Thread synchronization constructs can be any of:

  Semaphore
  Mutex
  Critical Section


Strategy:

• Encapsulate the lock in an object, e.g. a MutexLock object
• Let the constructor obtain the lock
• The destructor will release the lock automatically (as it does for regular objects)
• The compiler inserts a call to the lock destructor prior to each return statement
• So the lock is always released!


Performance comparison of constructor/destructor behaviour with a mutex, in the case of

• a non-inherited object
• an inherited object


Lock class implementation

class Lock {
public:
    Lock(pthread_mutex_t &key) : theKey(key) { pthread_mutex_lock(&theKey); }   // acquire on construction
    ~Lock() { pthread_mutex_unlock(&theKey); }                                  // release on destruction

private:
    pthread_mutex_t &theKey;
};


BaseLock class implementation

class BaseLock {
public:
    BaseLock(pthread_mutex_t &key, const LogSource &lsrc) {}
    virtual ~BaseLock() {}
};

This class is intended as a root class for the various lock classes that are expected to be derived from it.


Subclass of BaseLock: MutexLock class implementation

class MutexLock : public BaseLock {
public:
    MutexLock(pthread_mutex_t &key, const LogSource &lsrc);    // constructor
    ~MutexLock();                                              // destructor

private:
    pthread_mutex_t &theKey;
    const LogSource &src;
};

A LogSource object is meant to capture the file name and source code line where the object was constructed.


MutexLock constructor

MutexLock::MutexLock(pthread_mutex_t &aKey, const LogSource &source)
    : BaseLock(aKey, source), theKey(aKey), src(source)
{
    pthread_mutex_lock(&theKey);

#if defined(DEBUG)
    cout << "MutexLock " << &aKey << " created at " << src.file() << " line " << src.line() << endl;
#endif
}


MutexLock Destructor

MutexLock::~MutexLock()
{
    pthread_mutex_unlock(&theKey);

#if defined(DEBUG)
    // theKey is used here because aKey exists only in the constructor
    cout << "MutexLock " << &theKey << " destroyed at " << src.file() << " line " << src.line() << endl;
#endif
}


Non-inherited Mutex Object

int main()
{
    ...
    // Start timing here
    for (i = 0; i < 1000000; ++i) {
        SimpleMutex m(mutex);    // using the constructor to lock and the destructor to unlock
        sharedCounter++;
    }
    // Stop timing here
    ...
}

SimpleMutex: an object of a class containing acquire() and release() methods


Inherited Mutex Object

int main()
{
    ...
    // Start timing here
    for (i = 0; i < 1000000; i++) {
        DerivedMutex m(mutex);    // using the constructor to lock and the destructor to unlock
        sharedCounter++;
    }
    // Stop timing here
    ...
}

Replace SimpleMutex with DerivedMutex (an object of a class derived from BaseMutex)


Figure: Execution time comparison (inheritance cost), in seconds. Non-inherited case: 1.01; inherited case: 1.62.


Concluding Remarks

• Distinguish between overall computational cost, required cost, and computational penalty.

• Eliminate the cost that is not needed by using some other mechanism.
• Overall cost increases with the size of the derivation tree.


VIRTUAL FUNCTIONS


Impact on performance

• A class with virtual functions gets a virtual function table (vtbl), and each object of that class is assigned a pointer to it (the vptr).

Virtual functions seem to inflict a performance cost in several ways:

• The vptr must be initialized in the constructor.

• Virtual functions (VFs) are called using pointer indirection, resulting in a few extra instructions per method invocation.

• Inlining is a compile-time decision. The compiler cannot inline VFs whose resolution takes place at run time.


Performance Comparison for virtual and Non-virtual methods

class Virtual {
private:
    int mv;

public:
    Virtual() { mv = 0; }
    virtual ~Virtual() {}
    virtual int foo() const { return (mv); }
};

• Creating virtual objects costs more than creating non-virtual objects, because the pointer to the virtual function table (vptr) must be initialized.
• It takes slightly longer to call virtual functions, because of the additional level of indirection.


Performance Comparison for virtual and Non-virtual methods (cont.)

class NonVirtual {
private:
    int mnv;

public:
    NonVirtual() { mnv = 0; }
    ~NonVirtual() {}
    int foo() const { return (mnv); }
};
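The per-object cost of the vptr can be observed with sizeof. The sketch below repeats the two classes under the assumed names WithVirtual and WithoutVirtual so that it stands alone. On mainstream implementations the class with virtual functions is larger by at least one pointer, although the C++ standard does not mandate a particular object layout.

```cpp
#include <cstddef>

// Same shape as the Virtual class above (name is hypothetical).
class WithVirtual {
public:
    WithVirtual() : mv(0) {}
    virtual ~WithVirtual() {}
    virtual int foo() const { return mv; }
private:
    int mv;
};

// Same shape as the NonVirtual class above (name is hypothetical).
class WithoutVirtual {
public:
    WithoutVirtual() : mnv(0) {}
    ~WithoutVirtual() {}
    int foo() const { return mnv; }
private:
    int mnv;
};

// Extra bytes each object carries because of the hidden vptr
// (typically sizeof(void*) plus any alignment padding).
inline std::size_t vptr_overhead() {
    return sizeof(WithVirtual) - sizeof(WithoutVirtual);
}
```

This shows only the space cost; the time cost of initializing the vptr and of the indirect call has to be measured on the target system.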


• Construction/destruction shows the performance penalty of initializing the pointer to the virtual function table.
• Virtual function invocation is slightly more expensive than invoking a function through a function pointer; there is also a memory overhead.

Figure: Relative execution time, virtual versus non-virtual. ctor/dtor: virtual 1, non-virtual 0.73. foo(): virtual 1, non-virtual 0.96.


If a specific virtual function creates a performance problem for you, what are your options?

To eliminate a virtual call, you must allow the compiler to resolve the function binding at compile time.

The design options, illustrated with a thread-safe string class, are:

– hard-coding (derive distinct classes from string, e.g. CriticalSectionString, one per synchronization mechanism)

– inheritance (derive a single ThreadSafeString class that contains a pointer to a Locker object, and use polymorphism to select the particular synchronization mechanism at run time; the virtual calls remain)

– templates (create a template-based string class parameterized by the Locker type)


Hard-Coding (synchronization mechanism example)

• The standard string class serves as a base class

class CriticalSectionString : public string {
public:
    ...
    int length();
private:
    CriticalSectionLock cs;
};

int CriticalSectionString::length()
{
    cs.lock();
    int len = string::length();
    cs.unlock();
    return len;
}

+ Although lock() and unlock() are virtual functions, the compiler can resolve them statically: it can bypass dynamic binding and choose the correct lock() and unlock() to use.
+ This allows the compiler to inline those calls.

- You need to write a separate string class for each synchronization flavour -> poor code reuse!


Inheritance

• Implementing a string class for each synchronization mechanism is a pain, so you can factor out the synchronization choice into a constructor argument.

class ThreadSafeString : public string {
public:
    ThreadSafeString(const char *s, Locker *lockPtr) : string(s), pLock(lockPtr) {}
    ...
    int length();

private:
    Locker *pLock;    // pointer to the Locker object
};


// The length() method is now implemented as follows:
int ThreadSafeString::length()
{
    pLock->lock();
    int len = string::length();
    pLock->unlock();
    return len;
}

+ More compact than the previous design

- The lock() and unlock() virtual calls can only be resolved at execution time and hence cannot be inlined


Templates
• Templates combine the best of both worlds: reuse and efficiency

template <class LOCKER>
class ThreadSafeString : public string {
public:
    ThreadSafeString(const char *s) : string(s) {}
    ...
    int length();
private:
    LOCKER lock;
};

// The length() method implementation is similar to the previous ones:
template <class LOCKER>
inline int ThreadSafeString<LOCKER>::length()
{
    lock.lock();
    int len = string::length();
    lock.unlock();
    return len;
}

+ Provides relief from the virtual function calls to lock() and unlock().

+ Enables the compiler to resolve the virtual calls and inline them.

+ Pushes the type resolution to compile time.
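A self-contained sketch of the template approach follows. To keep it runnable without a threading library, it uses two hypothetical locker types that do not appear in the slides: NullLocker, which does nothing, and CountingLocker, which merely counts calls. It also stores a std::string as a member instead of deriving from string, and adds an accessor named lockerState; both are simplifications for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical locker that does nothing (single-threaded use).
struct NullLocker {
    void lock() {}
    void unlock() {}
};

// Hypothetical locker that counts calls, standing in for a real mutex wrapper.
struct CountingLocker {
    int locks;
    int unlocks;
    CountingLocker() : locks(0), unlocks(0) {}
    void lock() { ++locks; }
    void unlock() { ++unlocks; }
};

// LOCKER is fixed at compile time, so lock() and unlock() are ordinary
// non-virtual calls that the compiler is free to inline.
template <class LOCKER>
class ThreadSafeString {
public:
    explicit ThreadSafeString(const char* s) : text(s) {}

    std::size_t length() {
        locker.lock();
        const std::size_t len = text.length();
        locker.unlock();
        return len;
    }

    // Read-only access to the locker, used here only to inspect the counters.
    const LOCKER& lockerState() const { return locker; }

private:
    std::string text;
    LOCKER locker;
};
```

Choosing `ThreadSafeString<NullLocker>` or `ThreadSafeString<CountingLocker>` selects the synchronization flavour at compile time, so no virtual dispatch is involved in length().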


Coding Optimizations


Caching

• Remembering the results of frequent and costly computations
• So you will not have to perform those computations over and over again
• For instance, evaluating an expression whose value never changes inside a loop is inefficient:

for (...; !done; ...) {
    done = patternMatch(pat1, pat2, isCaseSensitive());
}

// patternMatch() compares two string patterns; its third argument is itself a function call
// isCaseSensitive() is independent of the loop iterations


int isSensitive = isCaseSensitive();

for (...; !done; ...) {
    done = patternMatch(pat1, pat2, isSensitive);
}

Now you compute case sensitivity once, cache it in a local variable, and reuse it!


Useless Computations
• Pointless computations whose results are never used!
• For instance: wasted initialization of a member object

class Student {
public:
    Student(char *nm);
    ...
private:
    string name;
};

// The Student constructor turns the input character pointer into a
// string object representing the student's name:
Student::Student(char *nm)
{
    name = nm;
    ...
}

The compiler first inserts a call to the default string constructor for name; the constructor body then follows with the assignment name = nm;


The assignment in the previous version wipes away the result of the compiler-generated call to the string default constructor. We can eliminate this pointless computation by using an explicit string constructor:

Student::Student(char *nm) : name(nm)    // explicit string constructor
{
    ....
}
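The difference between the two constructors can be made visible with a small stand-in for string that counts which of its operations run. Traced, the two Student variants and the counters are assumptions made for this sketch; they are not part of the slides.

```cpp
// Hypothetical stand-in for string that records which operations run.
struct Traced {
    static int defaultCtors;
    static int convertingCtors;
    static int assignments;

    const char* text;

    Traced() : text("") { ++defaultCtors; }
    Traced(const char* s) : text(s) { ++convertingCtors; }
    Traced& operator=(const char* s) { text = s; ++assignments; return *this; }
};

int Traced::defaultCtors = 0;
int Traced::convertingCtors = 0;
int Traced::assignments = 0;

// Assignment in the body: name is default-constructed first, then overwritten.
struct StudentAssign {
    Traced name;
    explicit StudentAssign(const char* nm) { name = nm; }
};

// Initializer list: name is constructed once, directly from nm.
struct StudentInit {
    Traced name;
    explicit StudentInit(const char* nm) : name(nm) {}
};
```

Constructing a StudentAssign performs one default construction plus one assignment, whereas constructing a StudentInit performs a single converting construction.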


Lazy Evaluation
• You should not perform costly computations "just in case."
• We ought to delay object definition to the scope where it is being used.
• For instance, consider code that routed messages between downstream and upstream communication adapters. One of the objects it used was very expensive to construct and destroy:

int route(Message *msg)
{
    ExpensiveClass upstream(msg);
    if (goingUpstream) {
        ...    // do something with the expensive object
    }
    // upstream object not used here
    return SUCCESS;
}


• The expensive upstream object is used only 50% of the time.
• A better solution would define the expensive upstream object in the scope where it is actually necessary:

int route(Message *msg)
{
    if (goingUpstream) {
        ExpensiveClass upstream(msg);
        // do something with the expensive object
    }
    // upstream object not used here
    return SUCCESS;
}


80-20 Rule: Speed up the common path
• 80% of the execution scenarios will traverse only 20% of your source code, and 80% of the elapsed time will be spent in 20% of the functions encountered on the execution path.

• For instance, consider the evaluation order of sub-expressions:

if (e1 || e2)
{ ... }

• If e1 and e2 are equally likely to evaluate to TRUE, the sub-expression with the smaller computational cost should be placed first.

• If e1 and e2 have equal computational cost, the one most likely to be TRUE should be placed first.

• p1 = conditional probability of e1 being TRUE
• c1 = computational cost of e1


Expected cost of if (e1 || e2): Cost = c1 + (1 - p1) * c2

• e2 evaluates to TRUE 100% of the time: p2 = 1.0
• e1 evaluates to TRUE 90% of the time: p1 = 0.9
• c1 = 10 instructions; c2 = 100 instructions

Cost = 10 + 0.1 * 100 = 20

• If we flip e1 and e2, i.e. if (e2 || e1):
• Cost = c2 + (1 - p2) * c1

Cost = 100 + 0 * 10 = 100

• So, if (e1 || e2) is a better choice than if (e2 || e1).
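The cost formula can be written as a small helper to try other probabilities and costs. The function name expected_or_cost and its parameter names are made up for this sketch.

```cpp
// Expected cost of evaluating (first || second) with short-circuiting:
// `first` is always evaluated; `second` only when `first` is false.
// cFirst and cSecond are costs in instructions; pFirst is the probability
// that `first` is true. (Hypothetical helper for illustration.)
constexpr double expected_or_cost(double cFirst, double pFirst, double cSecond) {
    return cFirst + (1.0 - pFirst) * cSecond;
}
```

With the numbers from the example, `expected_or_cost(10, 0.9, 100)` gives about 20 for if (e1 || e2), and `expected_or_cost(100, 1.0, 10)` gives 100 for if (e2 || e1).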


Concluding words about coding optimizations

• Are you ever going to use the result?

It sounds silly, but it happens. At times we perform computation and never use the results

• Do you need the results now?

Defer a computation to the point where it is actually needed. Premature computations may never be used on some execution flows.

• Do you know the result already?

We sometimes perform costly computations even though their results are already available two lines above. If you already computed something earlier in the execution flow, make the result available for reuse.


Thank you for your attention.

Questions...?