C-Programming-Optimization Techniques Class 4

May 30, 2018

jack_harish
  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    1/87

    1

    Optimization Techniques

    Session-4

    Session Topics

    Compute-bound checking
    Memory-bound checking
    IO-bound checking
    Safe C programs

    Session Objectives

    To know the different optimization techniques for embedded systems design when the compiler is not enough

    To understand how to write safe C code

    When Compiler Is Not Enough

    Locate spots that could be improved
      Don't waste time improving code that is rarely used!
      Profiling tools: gprof(1), gcov (a gcc utility), OProfile, Linux Trace Toolkit (LTT)

    Determine what to concentrate on
      Use time(1) to determine if the program is compute-bound, memory-bound, IO-bound, or not bound at all.

    Compute-Bound

    Compute-Bound

    Choose a Better Algorithm
    Write Clear, Simple Code
    Perspective
    Understand Compiler Options
    Inlining
    Loop Unrolling
    Loop Jamming
    Loop Inversion
    Strength Reduction
    Loop Invariant Computations
    Code for Common Case
    Tail Recursion Elimination
    Table Lookup
    Sorting
    Variables
    Function Calls
    Digestibility
    String Operations
    FP Parallelism
    Get a Better Compiler
    Stack Usage
    Code it in Assembly
    Shared Library Overhead
    Machine-Specific Optimization

    Compute-Bound

    Most compute-bound programs can be translated to memory-bound ones with the use of lookup tables

    When all else fails... rewrite slow code in assembly

    Hand-coded assembly
      Some software modules are best written in assembly language
      This gives the programmer an opportunity to make them as efficient as possible
      Though most C/C++ compilers produce much better machine code than the average programmer, a good programmer can still do better than the average compiler for a given function

    Choose A Better Algorithm

    Also choose an appropriate data structure
      If you'll be doing a lot of insertions and deletions at random places, then a linked list would be good
      If you'll be doing some binary searching, an array would be better.

    Write Clear, Simple Code

    Some of the very things that make code clear and readable to humans also make it clear and readable to compilers

    Complicated expressions are harder to optimize and can cause the compiler to "fall back" to a less intense mode of optimization

    Part of the clarity is making hunks of code into functions when appropriate
      The cost of a function call is extremely small on modern machines, so optimization is NOT a valid excuse for writing ten-page functions.

    Perspective

    A sure sign of misunderstanding is this fragment:

    if (x != 0) x = 0;

    The intent is to save time by not initializing x if it's already zero

    In reality, the test to see whether it's zero or not will take up about as much time as setting it to zero itself would have

    x = 0;

    has the same effect and will be somewhat faster.

    Understand Your Compiler Options

    Some compilers have special #pragmas or keywords (for example, inline) which also affect optimization.

    In-lining of Functions

    Replacing a call to a function with the function's code is called in-lining

    Benefit: reduction in procedure call overheads and opportunity for additional code optimizations

    Danger: code bloat and negative instruction cache effects

    Appropriate when small and/or called from a small number of sites
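
As a minimal sketch of in-lining in C, a small helper can be declared `static inline` so the compiler is free to substitute its body at each call site. The names `square` and `sum_of_squares` are made up for illustration, and the keyword is only a hint: the compiler still makes the final decision.

```c
#include <stddef.h>

/* Small helper: a good in-lining candidate because the body is
 * cheaper than the call overhead it replaces. */
static inline int square(int v)
{
    return v * v;
}

/* The call to square() below can be replaced by v * v in place,
 * which also lets the optimizer work on the whole loop body. */
int sum_of_squares(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += square(values[i]);
    return total;
}
```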

    Loop Unrolling

    This can make a BIG difference. It is well known that unrolling loops can produce considerable savings, e.g.

    for(i=0; i

    Loop Unrolling

    Compilers will often unroll simple loops like this, where a fixed number of iterations is involved, but something like

    for(i=0;i

    Loop Unrolling

    Simplest effect of loop unrolling: fewer test/jump instructions (fatter loop body, less loop overhead)
    Fewer loads per flop
    May lead to threaded code that uses multiple FP units concurrently (instruction-level parallelism)
    How are loops handled that have a trip count which is not a multiple of the unrolling factor?
    Already fat loops hardly benefit from unrolling (instruction cache capacity!)
    Very short loops may suffer from unrolling or benefit strongly

    Loop Unrolling

    Doing multiple iterations of work in each iteration is called loop unrolling

    Benefit: reduction in looping overheads and opportunity for more code optimizations

    Danger: code bloat, negative instruction cache effects, and trip counts that are not an integral multiple of the unrolling factor

    Appropriate when small and/or called from a small number of sites
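
A minimal sketch of manual unrolling by four, assuming a simple array sum (the function name `sum_unrolled` is hypothetical). It also shows one common answer to the trip-count question: a short cleanup loop handles whatever is left when the count is not a multiple of the unrolling factor.

```c
#include <stddef.h>

/* Sum an array with the loop body unrolled four times.
 * A cleanup loop handles the 0..3 elements left over when
 * count is not a multiple of the unrolling factor. */
long sum_unrolled(const int *data, size_t count)
{
    long total = 0;
    size_t i = 0;

    /* Main loop: one test/jump per four elements. */
    for (; i + 4 <= count; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }

    /* Remainder loop. */
    for (; i < count; i++)
        total += data[i];

    return total;
}
```

Whether this beats the compiler's own unrolling depends on the target, so it is worth measuring before keeping the longer version.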

    Loop Unrolling: Making Fatter Loop Bodies

    Loop Unrolling: Improving Flop/Load Ratio

    Analysis of the flop-to-load ratio often unveils another benefit of unrolling:

    do i= 1,N
      do j= 1,M
        y(i)=y(i)+a(j,i)*x(j)
      enddo
    enddo

    Innermost loop: two loads and two flops performed; i.e., we have one load per flop

    Loop Unrolling: Improving Flop/Load Ratio

    Both loops unrolled twice:

    do i= 1,N,2
      t1= 0
      t2= 0
      do j= 1,M,2
        t1= t1+a(j,i)  *x(j) +a(j+1,i)  *x(j+1)
        t2= t2+a(j,i+1)*x(j) +a(j+1,i+1)*x(j+1)
      enddo
      y(i)  = y(i)  +t1
      y(i+1)= y(i+1)+t2
    enddo

    Innermost loop: 6 loads and 8 flops (x(j) and x(j+1) are each loaded once and used twice)
    Exposes instruction-level parallelism
    How about unrolling by 4? Watch register spill!
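
The same transformation can be sketched in C. This is an illustrative translation, not code from the slides: the matrix is assumed to be a flat array with `j` as the fastest-moving index (matching the Fortran `a(j,i)` layout), the function names are hypothetical, and the unrolled version assumes `n` and `m` are even.

```c
#include <stddef.h>

/* Reference version: y[i] += sum over j of a[i*m + j] * x[j]. */
void matvec_plain(size_t n, size_t m, const double *a,
                  const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            y[i] += a[i * m + j] * x[j];
}

/* Both loops unrolled by two. Assumes n and m are even; a real
 * version needs cleanup loops for odd sizes. */
void matvec_unrolled(size_t n, size_t m, const double *a,
                     const double *x, double *y)
{
    for (size_t i = 0; i < n; i += 2) {
        double t1 = 0.0, t2 = 0.0;
        for (size_t j = 0; j < m; j += 2) {
            /* x[j] and x[j+1] are each loaded once and used twice. */
            t1 += a[i * m + j] * x[j]
                + a[i * m + j + 1] * x[j + 1];
            t2 += a[(i + 1) * m + j] * x[j]
                + a[(i + 1) * m + j + 1] * x[j + 1];
        }
        y[i] += t1;
        y[i + 1] += t2;
    }
}
```

Because floating-point addition is not associative, the two versions can differ in the last bits for general inputs; they agree exactly when all intermediate values are exactly representable.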

    Loop Jamming

    Never use two loops where one will suffice:

    for(i=0; i

    Loop Jamming

    It would be better to do:

    for(i=0; i
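
A generic sketch of loop jamming (also called loop fusion), with hypothetical function and array names; it is not the slide's own example. Two loops over the same index range are merged so the loop overhead is paid once.

```c
#include <stddef.h>

/* Two separate passes over the same index range. */
void init_separate(int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = 0;
    for (size_t i = 0; i < n; i++)
        b[i] = (int)i;
}

/* Jammed version: one loop, so the increment, test and jump
 * are paid once per index instead of twice. */
void init_jammed(int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = 0;
        b[i] = (int)i;
    }
}
```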

    Loop Inversion

    Some machines have a special instruction for decrement and compare with 0

    Assuming the loop is insensitive to direction, try this replacement:

    for (i = 1; i
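
A minimal sketch of the idea, using a hypothetical array sum rather than the slide's own loop. The second function counts down to zero, so the loop test is a comparison against zero; the result is the same because the order of the additions does not matter here.

```c
#include <stddef.h>

/* Counting up: each iteration compares i against n. */
long sum_up(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Counting down: the loop ends when the counter reaches zero,
 * which some CPUs can test as part of the decrement. */
long sum_down(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = n; i-- > 0; )
        total += data[i];
    return total;
}
```

Many current compilers make this choice themselves, so the rewrite is only worth keeping if a measurement on the target shows a gain.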

    Strength Reduction

    Strength reduction is the replacement of an expression by a different expression that yields the same value but is cheaper to compute

    Many compilers will do this for you automatically

    The classic examples:

    x = w % 8;
    y = pow(x, 2.0);
    z = y * 33;
    for (i = 0; i < MAX; i++)
    {
        h = 14 * i;
        printf("%d", h);
    }

    Strength Reduction

    It would be better to do:

    x = w & 7; /* bit-and cheaper than remainder (same result only for non-negative w) */

    y = x * x; /* mult is cheaper than power-of */

    z = (y
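
The reductions can be sketched as small, testable functions. The function names are hypothetical, and the shift-and-add form of `y * 33` is the standard identity `33*y = 32*y + y`, not text recovered from the slide. Unsigned types are used so that `w & 7` really equals `w % 8`.

```c
/* Original forms from the classic examples. */
unsigned mod8_slow(unsigned w)    { return w % 8; }
unsigned times33_slow(unsigned y) { return y * 33; }

/* Reduced forms. The bit-and equals the remainder only because
 * w is unsigned; for negative signed values % and & differ. */
unsigned mod8_fast(unsigned w)    { return w & 7; }

/* 33 * y == 32 * y + y == (y << 5) + y */
unsigned times33_fast(unsigned y) { return (y << 5) + y; }

/* Replaces h = 14 * i inside a loop with a running addition;
 * fills out[i] with 14 * i for i in 0..n-1. */
void multiples_of_14(int *out, int n)
{
    int h = 0;
    for (int i = 0; i < n; i++) {
        out[i] = h;
        h += 14;   /* addition instead of multiplication */
    }
}
```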

    Induction Variables and Strength Reduction

    A variable X is called an induction variable of a loop L if every time the variable X changes value, it is incremented or decremented by some constant

    When there are 2 or more induction variables in a loop, it may be possible to get rid of all but one

    It is also frequently possible to perform strength reduction on induction variables
      The strength of an instruction corresponds to its execution cost

    Benefit: fewer and less expensive operations

    Before:

    t4 = 0
    label_XXX
    j = j + 1
    t4 = 4 * j
    t5 = a[t4]
    if (t5 > v) goto label_XXX

    After:

    t4 = 0
    label_XXX
    t4 += 4
    t5 = a[t4]
    if (t5 > v) goto label_XXX

    Loop Invariant Computations

    Any part of a computation that does not depend on the loop variable and which is not subject to side effects can be moved out of the loop entirely

    Try to keep the computations within the loop simple anyway, and be prepared to move invariant computations out yourself: there may be some situations where you know the value won't vary, but the compiler is playing it safe in case of side-effects

    "Computation" here doesn't mean just arithmetic; array indexing, pointer dereferencing, and calls to pure functions are all possible candidates for moving out of the loop.
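
A minimal sketch of hoisting by hand, assuming a hypothetical `struct config` read through a pointer. In the first function the compiler may have to re-read `cfg->scale` and `cfg->offset` on every iteration unless it can prove that writes to `out[]` never touch `*cfg`; the second function removes the doubt.

```c
#include <stddef.h>

struct config { double scale; double offset; };

/* Invariant work inside the loop: two pointer dereferences and
 * one multiplication that do not depend on i. */
void transform_naive(double *out, const double *in, size_t n,
                     const struct config *cfg)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * (cfg->scale * 2.0) + cfg->offset;
}

/* Hoisted by hand: the invariant values are computed once. */
void transform_hoisted(double *out, const double *in, size_t n,
                       const struct config *cfg)
{
    const double factor = cfg->scale * 2.0;
    const double offset = cfg->offset;

    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * factor + offset;
}
```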

    Loop Invariant Computations

    In loops which call other functions, you might be able to get some speedup by ripping the subroutines apart and figuring out which parts of them are loop-invariant for that particular loop in their caller and calling those parts ahead of time

    This is not very easy and seldom leads to much improvement unless you're calling subroutines which open and close files repeatedly or malloc and free large amounts of memory on each call or something else drastic.

    A common but not-always-optimized-away case is the repeated use of an expression in successive statements

    Loop Invariant Computations

    Old code:

    total =
        a->b->c[4]->aardvark +
        a->b->c[4]->baboon +
        a->b->c[4]->cheetah +
        a->b->c[4]->dog;

    New code:

    struct animals * temp = a->b->c[4];
    total =
        temp->aardvark +
        temp->baboon +
        temp->cheetah +
        temp->dog;

    Code For Common Case

    In a section of code which deals with several alternative situations, place at the beginning the tests and the code for the situations which occur most often

    Frequently, this takes the form of a long train of mutually exclusive if-then-else's, of which only one will get executed.

    By placing the most likely one first, fewer if's will need to be performed over the long term.

    But if the conditions are simple things like x == 3, consider using a switch statement

    Tail Recursion Elimination (TRE)

    When a recursive function calls itself, an optimizer can, under some conditions, replace the call with an assembly level equivalent of a "goto" back to the top of the function

    This saves the effort of growing the stack, saving and restoring registers, and any other function call overhead

    For very small recursive functions that make zillions of recursive calls, TRE can result in a substantial speedup

    With proper design, TRE can take a recursive function and turn it into whatever is the fastest form of loop for the machine

    Tail Recursion Elimination (TRE)

    int isemptystr(char * str)
    {
        if (*str == '\0') return 1;
        else if (! isspace(*str)) return 0;
        else return isemptystr(++str);
    }

    The above can have TRE applied to the final return statement because the returned value from this invocation of isemptystr will be exactly that of the n+1th invocation, with no further computation.

    And now a counterexample:

    int factorial(int num)
    {
        if (num == 0) return 1;
        else return num * factorial(num - 1);
    }

    Tail Recursion Elimination (TRE)

    The above cannot have TRE applied because the returned value is not used directly: it is multiplied by num after the call, so the state of that invocation must be maintained until after the return. Even a compiler that supports TRE cannot use it here.

    And now a counter-counterexample, a rewrite of the factorial program to allow TRE optimization.

    int factorial(int num, int factor)
    {
        if (num == 0) return factor;
        else return factorial(num - 1, factor * num);
    }
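
To make the accumulator version usable, the caller has to supply the initial `factor` of 1. The sketch below renames the slide's function to `factorial_acc`, wraps it in a hypothetical `factorial_tail`, and adds a hand-written loop showing roughly what tail recursion elimination turns the helper into.

```c
/* Tail-recursive helper: the recursive call is the last action,
 * so nothing from this frame is needed afterwards. */
static int factorial_acc(int num, int factor)
{
    if (num == 0) return factor;
    else return factorial_acc(num - 1, factor * num);
}

/* Wrapper (hypothetical name) that supplies the initial accumulator. */
int factorial_tail(int num)
{
    return factorial_acc(num, 1);
}

/* What TRE effectively produces: the call becomes a jump back
 * to the top with updated arguments. */
int factorial_loop(int num)
{
    int factor = 1;
    while (num != 0) {
        factor *= num;
        num -= 1;
    }
    return factor;
}
```

Both versions assume a non-negative argument small enough that the result fits in an `int`.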

    Table Lookup

    Consider using lookup tables, especially if a computation is iterative or recursive, e.g. convergent series or factorial. (Calculations that take constant time can often be recomputed faster than they can be retrieved from memory and so do not always benefit from table lookup.)

    Old code:

    long factorial(int i)
    {
        if (i == 0)
            return 1;
        else
            return i * factorial(i - 1);
    }

    New code:

    static long factorial_table[] =
        {1, 1, 2, 6, 24, 120, 720 /* etc */};

    long factorial(int i)
    {
        return factorial_table[i];
    }
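
The table version indexes without checking its argument. A hedged variant with a bounds check might look like the sketch below; the name `factorial_lookup`, the table length (0! through 12!), and the `-1` error value are choices made for this example, not part of the original.

```c
#include <stddef.h>

/* 0! through 12!; 12! = 479001600 is the largest factorial that
 * fits in 32 bits, so every entry fits in a long on any platform. */
static const long factorial_table[] = {
    1L, 1L, 2L, 6L, 24L, 120L, 720L, 5040L, 40320L,
    362880L, 3628800L, 39916800L, 479001600L
};

#define FACTORIAL_TABLE_SIZE \
    (sizeof factorial_table / sizeof factorial_table[0])

/* Returns i! from the table, or -1 if i is outside the table. */
long factorial_lookup(int i)
{
    if (i < 0 || (size_t)i >= FACTORIAL_TABLE_SIZE)
        return -1;
    return factorial_table[i];
}
```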

    Sorting

    For nearly all situations, the library qsort function is speedy enough to make implementation of your own sort algorithm unnecessary

    Often the strcmp optimizations are helpful.
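
For reference, a minimal sketch of sorting an array of C strings with the standard `qsort`. The helper names `compare_strings` and `sort_strings` are made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements; each element is a
 * char *, so the arguments are really pointers to char *. */
static int compare_strings(const void *lhs, const void *rhs)
{
    const char *const *a = lhs;
    const char *const *b = rhs;
    return strcmp(*a, *b);
}

/* Sorts an array of C strings in ascending strcmp order. */
void sort_strings(const char **strings, size_t count)
{
    qsort(strings, count, sizeof strings[0], compare_strings);
}
```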

    Variables

    Avoid referring to global or static variables inside the tightest loops

    Don't use the volatile qualifier unless you really mean it

    Avoid passing addresses of your variables to other functions
      The optimizer has to assume that the called function is capable of stashing a pointer to this variable somewhere and so the variable could get modified as a side effect of calling what seems like a totally unrelated function.

    Variables

    Example:

    a = b();
    c(&d);

    Because d has had its address passed to another function, the compiler can no longer leave it in a register across function calls.

    It can however leave the variable a in a register

    The register keyword can be used to track down problems like this; if d had been declared register the compiler would have to warn that its address had been taken.

    Function Calls

    Function calls interrupt an optimizer's train of thought in a drastic way
      Any references through pointers or to global variables are now "dirty" and need to be saved/restored across the function call
      Local variables which have had their address taken and passed outside the function are also now dirty

    There is some overhead to the function call itself as the stack must be manipulated and the program counter altered by whatever mechanism the CPU uses.

    Function Calls

    If the function being called happens to be paged out, there will be a very long delay before it gets read back in
      For functions called in a loop it's unusual for the called function to be paged out until the loop is finished, but if virtual memory is scarce, calls to other functions in the same loop may demand the space and force the other function out, leading to thrashing
      Most linkers respect the order in which you list object files, so you can try to get functions near each other in hopes that they'll land on the same page.

    Digestibility

    Straight-line code, even with an extra statement or two, will run faster than code full of if's, &&'s, switch's, and goto's

    Pipelining processors are much happier with a steady diet of sequential instructions than a bunch of branches, even if the branches skip some unnecessary sections.

    String Operations

    Most of the C library str* and mem* functions operate in time proportional to the length(s) of the string(s) they are given

    It's quite easy to loop over calls to these and wind up with a significant bottleneck.

    String Operations

    strlen
      Avoid calling strlen() during a loop involving the string itself
      Even if you're modifying the string, it should be possible to rewrite it so that you set x = strlen() before the loop and then x++ or x-- when you add or remove a character.

    strcat
      When building up a large string in memory using strcat, it will scan the full (current) length of the string on each call
      If you've been keeping track of the length anyway (see above) you can index directly to the end of the string and strcpy or memcpy to there.

    strcmp
      You can save a little time by checking the first characters of the strings in question before doing the call
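
The strcat advice can be sketched with a small buffer type that remembers its own length. The `struct strbuf` type and its functions are hypothetical names for this example; the point is that each append copies to `data + length` directly instead of rescanning the string from the start.

```c
#include <stddef.h>
#include <string.h>

/* A growing string with its length tracked alongside it. */
struct strbuf {
    char  *data;      /* caller-provided storage */
    size_t capacity;  /* total bytes available in data */
    size_t length;    /* current string length, excluding '\0' */
};

void strbuf_init(struct strbuf *sb, char *storage, size_t capacity)
{
    sb->data = storage;
    sb->capacity = capacity;
    sb->length = 0;
    if (capacity > 0)
        storage[0] = '\0';
}

/* Appends text at data + length instead of letting strcat rescan
 * the whole buffer. Returns 0 on success, -1 if it would not fit. */
int strbuf_append(struct strbuf *sb, const char *text)
{
    size_t add = strlen(text);

    if (sb->capacity == 0 || add > sb->capacity - 1 - sb->length)
        return -1;

    memcpy(sb->data + sb->length, text, add + 1); /* copies '\0' too */
    sb->length += add;
    return 0;
}
```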

    Stack Usage

    A typical cause of stack-related problems is having large arrays as local variables
      In that case the solution is to rewrite the code so it can use a static or global array, or perhaps allocate it from the heap
      A similar solution applies to functions which have large structs as locals or parameters

    Recursive functions, even ones which have few and small local variables and parameters, can still affect performance

    Stack Usage

    int func1()
    {
        int a, b, c, etc;

        do_stuff(a, b, c);
        if (some_condition)
            return func2();
        else
            return 1;
    }

    Code It In Assembly

    Estimates vary widely, but a competent human writing assembly-level code can produce code which runs about 10% faster than what a compiler with full optimization on would produce from well-written high-level source

    Shared Library Overhead

    Calling a dynamically linked function is slightly slower than calling the same function when it is statically linked

    Machine-specific Optimization

    As with other machine-specific code, you can use #ifdef to set off sections of code which are optimized for a particular machine

    Compilers don't predefine RISC or SLOW_DISK_IO or HAS_VM or VECTORIZING so you'll have to come up with your own and encode them into a makefile or header file.

    Optimizing sorts

    Almost 60% of time spent in strcmp called by insert_sort
      strcmp compares two strings and returns int: 0 if equal, negative if first is "less than" second, positive otherwise

    Replace strcmp(a,b) call with some initial compares

    Optimizing sorts

    if (a[0] < b[0]) {
        result is neg
    }
    if (a[0] == b[0]) {
        if (a[1] < b[1]) {
            result is neg
        }
        if (a[1] == b[1]) {
            if (strcmp(a,b)
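
A minimal sketch of the "initial compares" idea as a drop-in comparison function. The name `fast_strcmp` is hypothetical, and only the first character is checked inline here; the slide's pseudocode goes one character further.

```c
#include <string.h>

/* Compares the first characters inline and only calls strcmp when
 * they match. The casts matter: strcmp compares characters as
 * unsigned char, so the shortcut must do the same. */
int fast_strcmp(const char *a, const char *b)
{
    unsigned char ca = (unsigned char)a[0];
    unsigned char cb = (unsigned char)b[0];

    if (ca != cb)
        return (ca < cb) ? -1 : 1;
    return strcmp(a, b);
}
```

The shortcut pays off when most pairs differ in their first character; on data with long common prefixes it only adds a test in front of every strcmp call.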

    Memory-Bound

    Memory-Bound

    Locality of Reference
    Column-major Accessing
    Don't Copy Large Things
    Split or Merge Arrays
    Reduce Padding
    Increase Padding
    March Forward
    Beware the Power of Two
    Memory Leaks
    Be Stringy
    Hinting
    Fix the Problem in Hardware
    Cache Profilers

    Locality of Reference

    Locality of reference significantly improves memory performance through the use of caches

    Temporal locality
      Most recently used data items or instructions are more likely to be available in cache

    Spatial locality
      Data items or instructions that are close together in memory are more likely to be in cache when needed

    Locality Of Reference: Using The Cache

    Try to keep data as close to the CPU as possible
      Align data structures and data access to cache line boundaries
        __attribute__ ((aligned (L1_CACHE_BYTES) ))
      Place most frequently used structure members first
      Allow the compiler to pad structure members to the CPU's preferred data alignment
      Avoid array sizes and array strides that are integer multiples of the cache size

    The cache line prefetch
      Use the prefetch instruction ahead of a memory access so that the data is likely to be in cache when it's needed (see linux/prefetch.h)
      Using the prefetch instruction is not portable to all platforms

    Locality Of Reference: Using Virtual Memory

    TLB - Translation Lookaside Buffer
      This cache is used to store recent translations between virtual and physical addresses
      The fewer memory pages used, the more effective the utilization of the TLB
      For every miss in this table the kernel is called to make the translation -- this is an expensive operation

    Paging
      Memory pages can be swapped out to disk when physical memory is used up
      Programs should manipulate data in small working sets to minimize page faults

    Locality Of Reference: Better Stack Use

    Reduce function call penalties
      Instead of passing many parameters to a function, use a structure pointer
      Ask the compiler to pass up to X parameters using registers
        __attribute__ ((regparm (X) ))

    Don't Copy Large Things

    Instead of copying strings, arrays, or large structs, consider copying a pointer to them

    ANSI C now requires that structs are pass-by-value like everything else
      If you have extraordinarily large structs, or are making millions of function calls on medium-sized ones, you might consider passing the struct's address instead, after modifying the called function so that it doesn't perturb the contents of the struct.

    Split Or Merge Arrays

    If the parts of your program making heaviest use of memory are doing so by accessing elements in "parallel" arrays you can combine them into an array of structs so that the data for a given index is kept together in memory.

    If you already have an array of structs, but find that the critical part of your program is accessing only a small number of fields in each struct, you can split these fields into a separate array so that the unused fields do not get read into the cache unnecessarily.

    Reduce Padding

    Arrange similarly-typed fields together in a structure with the most restrictively aligned types first - there may still be padding at the end

    Old code:

    /* sizeof = 64 bytes (with a 4-byte long and 8-byte-aligned double) */
    struct foo {
        float a;
        double b;
        float c;
        double d;
        short e;
        long f;
        short g;
        long h;
        char i;
        int j;
        char k;
        int l;
    };

    New code:

    /* sizeof = 48 bytes (same assumptions) */
    struct foo {
        double b;
        double d;
        long f;
        long h;
        float a;
        float c;
        int j;
        int l;
        short e;
        short g;
        char i;
        char k;
    };
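
The exact sizes depend on the platform's type sizes and alignment rules, so it is safer to let the compiler report them. The sketch below renames the two layouts to `foo_old` and `foo_new` (hypothetical names) so both can exist in one file, and adds a helper that returns the bytes saved.

```c
#include <stddef.h>

/* Field order from the old layout: small and large types alternate,
 * so the compiler inserts padding before most of the large ones. */
struct foo_old {
    float a; double b; float c; double d;
    short e; long f;   short g; long h;
    char i;  int j;    char k;  int l;
};

/* Same fields, most restrictively aligned types first. */
struct foo_new {
    double b; double d;
    long f;   long h;
    float a;  float c;
    int j;    int l;
    short e;  short g;
    char i;   char k;
};

/* Bytes saved by reordering on the current platform. */
size_t padding_saved(void)
{
    return sizeof(struct foo_old) - sizeof(struct foo_new);
}
```

With an 8-byte long, as on typical 64-bit Linux, the same two layouts come to 80 and 56 bytes, so the saving is larger than on a 32-bit target.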

    Increase Padding

    Increasing the size and alignment of a data structure to match (or to be an integer fraction or multiple of) the cache line size may increase performance.

    The alignment is harder to control, but usually one of these techniques will work:
      Use malloc instead of a static array. Some mallocs automatically allocate storage suitably aligned for cache lines
      Allocate a block twice as large as you need, then point wherever in it that satisfies the alignment you need.
      Use an alternate allocator (e.g. memalign) which guarantees a minimum alignment.
      Use the linker to assign specific addresses or alignment requirements to symbols.
      Wedge the data into a known position inside another block which is already aligned.
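
The "allocate a larger block and point into it" technique can be sketched as follows. The function name `alloc_aligned` and its interface are assumptions for this example; it over-allocates by `alignment - 1` bytes, which is enough when the alignment is a power of two. Where available, C11 `aligned_alloc` or POSIX `posix_memalign` do the same job without the manual arithmetic.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates size bytes aligned to alignment (a power of two) by
 * over-allocating and rounding the pointer up. The raw pointer is
 * returned through raw_out and is the one to pass to free(). */
void *alloc_aligned(size_t size, size_t alignment, void **raw_out)
{
    void *raw = malloc(size + alignment - 1);
    if (raw == NULL) {
        *raw_out = NULL;
        return NULL;
    }

    uintptr_t address = (uintptr_t)raw;
    uintptr_t aligned =
        (address + alignment - 1) & ~(uintptr_t)(alignment - 1);

    *raw_out = raw;
    return (void *)aligned;
}
```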

    March Forward

    Theoretically, it makes no difference whether you iterate over an array forwards or backwards, but some caches are of a "predictive" type that tries to read in successive cache lines even before you need them

    Because these caches must work quickly, they tend to be fairly dim and rarely have the extra logic for predicting backwards traversal of memory pages.

    Beware The Power Of Two

    Consider a direct-mapped 1MB cache with 128-byte cache lines and a program which uses 16MB of memory, all happening on a machine with 32 bit addresses

    The simplest way for the cache to map the memory into the cache is to mask off the first 12 and the last 7 bits of the address, then shift to the right 7 bits

    What we end up with is a cache with 8192 (2^13) lines that maps any two addresses exactly 1MB (2^20 bytes) apart in main memory to the same cache line

    If the program happens to use an array of 1MB structs, and refers to just one element in each one while processing the whole array, every access will map to the same cache line and force a reload, which is a considerable delay
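
The mapping can be checked with a few lines of arithmetic. In this sketch (constant and function names are hypothetical) the line index is bits 7 to 19 of the address: the low 7 bits select a byte within the 128-byte line, and 1MB / 128 bytes gives 8192 lines, so 13 bits select the line. Addresses that differ by a multiple of the 1MB cache size therefore share a line.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 128u            /* 2^7  */
#define CACHE_BYTES      (1024u * 1024u) /* 2^20 */
#define CACHE_LINES      (CACHE_BYTES / CACHE_LINE_BYTES) /* 2^13 */

/* Direct-mapped index: drop the 7 offset bits, keep the next 13. */
uint32_t cache_line_index(uint32_t address)
{
    return (address / CACHE_LINE_BYTES) % CACHE_LINES;
}
```

A stride equal to the cache size is the worst case, but any large power-of-two stride confines the accesses to a small subset of the lines.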

    Reducing Memory Usage

    Because ROM is usually cheaper than RAM (on a per-byte basis), one acceptable strategy for reducing the amount of global data might be to move constant data into ROM

    This can be done automatically by the compiler if you declare all of your constant data with the keyword const

    Most C/C++ compilers place all of the constant global data they encounter into a special data segment that is recognizable to the locator as ROM-able

    Reducing Memory Usage

    This technique is most valuable if there are lots of strings or table-oriented data that does not change at runtime

    Stack size reductions can also lower a program's RAM requirement
      Be especially conscious of stack space if you are using a real-time operating system
      Most operating systems create a separate stack for each task
      These stacks are used for function calls and interrupt service routines that occur within the context of a task

    Reducing Memory Usage

    To reduce the stack size:
      You can determine the amount of stack required for each task stack: fill the entire memory area reserved for the stack with a special data pattern
      Then, after the software has been running for a while - preferably under both normal and stressful conditions - use a debugger to examine the modified stack
      The part of the stack memory area that still contains your special data pattern has never been overwritten, so it is safe to reduce the size of the stack area by that amount
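
The fill-and-inspect procedure can also be automated instead of read off in a debugger. The sketch below works on a plain byte array standing in for a task's stack area; the marker value, the function names, and the assumption that the stack grows downward (so the untouched part starts at the lowest address) are all choices made for this example.

```c
#include <stddef.h>
#include <string.h>

#define STACK_FILL_PATTERN 0xA5u  /* arbitrary marker byte */

/* Fills a task's stack area with the marker before the task starts. */
void stack_paint(unsigned char *area, size_t size)
{
    memset(area, STACK_FILL_PATTERN, size);
}

/* Counts marker bytes still intact at the low end of the area.
 * Assumes the stack grows downward (from area + size toward area),
 * so the never-used part is a run starting at area[0]. */
size_t stack_unused_bytes(const unsigned char *area, size_t size)
{
    size_t unused = 0;
    while (unused < size && area[unused] == STACK_FILL_PATTERN)
        unused++;
    return unused;
}
```

A task can legitimately write a byte equal to the marker, and a test run may not hit the deepest call path, so it is prudent to keep some margin rather than shrink the stack by the full measured amount.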

    Reducing Memory Usage

    If the heap is too small, your program will not be able to allocate memory when it is needed, so always be sure to compare the result of malloc or new with NULL before dereferencing it

    If you've tried all of these suggestions and your program is still requiring too much memory, you might have no choice but to eliminate the heap altogether

    IO-Bound


    IO-Bound

    Things you can try:
      Sequential Access
      Random Access
      Terminals
      Sockets
      SFIO

    Tune your file-descriptors and sockets
      There are many options you could tweak

    Some IO-bound programs can be translated to memory-bound with the use of mmap(2).

    IO-Bound

    I/O (in Unix) usually puts your process to sleep for a time

    The request for I/O may finish fairly quickly but perhaps some other process is busy using the CPU and your process has to wait

    Because the length of the wait is somewhat arbitrary and depends on what exactly the other processes are doing, I/O bound programs can be tricky to optimize.

    Sequential Access

    Buffered I/O is usually (but not always) faster than unbuffered.

    If you aren't worried about portability you can try using lower level routines like read() and write() with large buffers and compare their performance to fread() and fwrite()
      Using read() or write() in a single-character-at-a-time mode is especially slow on Unix machines because of the system call overhead.

    Sequential Access

    Consider using mmap() if you have it
      This can save effort in several ways
      The data doesn't have to go through stdio which saves a buffer copy
      Depending on the sophistication of the paging hardware, the data need not even be copied into user space; the program can just access an existing copy
      mmap() also lends itself to read-ahead; theoretically the entire file could be read into memory before you even need it
      Lastly, the file can be paged directly off the source disk and doesn't have to use up virtual memory.

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    71/87

    71

    Random Access

    Consider mmap() if you have it.

    If there's a trade-off between being I/O-bound and memory-bound in your program, consider a lazy free of records: when memory gets tight, free unmodified records and write out modified records (to a temporary file if need be), then read them in later.

    Though if you take the disk space you'd use to write the records out and just add it to the paging space instead, you'll save yourself a lot of hassle.

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    72/87

    72

    Asynchronous I/O

    You can set up a file descriptor to be non-blocking (see the ioctl(2) or fcntl(2) man pages) and arrange to have it send your process a signal when the I/O is done; in the meantime your process can get something else done, including perhaps sending off other I/O requests to other devices.

    Significant parallelism may result, at the cost of program complexity.

    Multithreading packages can aid in the construction of programs which utilize asynchronous I/O.
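The non-blocking half of this can be sketched with fcntl(2) as below. The helper names set_nonblocking and try_read and the -2 return code are assumptions made for the example. The sketch polls the descriptor and does not show the signal-driven notification the slide mentions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch fd to non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    assert(fd >= 0);

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Attempt a read without sleeping.
 * Returns the byte count, 0 at end of file, -1 on a real error,
 * or -2 (a made-up code) when no data is available yet. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    assert(buf != NULL);

    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

When try_read() returns -2 the caller is free to do other work and come back later, which is where both the parallelism and the added complexity come from.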

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    73/87

    73

    Terminals

    If your program spews out a lot of data to the screen, it's going to run slowly on a 1200 baud line.

    Waiting for the screen to catch up stops your program.

    This doesn't add to the CPU or disk time as reported for accounting purposes, but it sure seems slow to the user.

    A general solution is to provide a way for the user to squelch irrelevant data.

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    74/87

    74

    Gotchas

    1. Programmers tend to overestimate the usefulness of the programs they write. The approximate value of an optimization is:

    (number of runs × number of users × time savings × user's salary) − (time spent optimizing × programmer's salary)

    Even if the program will be run hundreds of times by thousands of users, an extra day spent saving 40 milliseconds probably isn't going to help.
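The estimate can be written out as a small function. Everything below is an illustrative assumption: the function name, the choice of hours and hourly rates as units, and the sample figures in the usage note.

```c
#include <assert.h>

/* Approximate value of an optimization, in currency units.
 * runs                 - runs per user
 * users                - number of users
 * seconds_saved        - time saved per run, in seconds
 * user_rate            - user's salary per hour
 * hours_optimizing     - programmer time spent, in hours
 * programmer_rate      - programmer's salary per hour
 * A negative result means the optimization cost more than it saved. */
double optimization_value(double runs, double users, double seconds_saved,
                          double user_rate, double hours_optimizing,
                          double programmer_rate)
{
    assert(runs >= 0 && users >= 0 && seconds_saved >= 0);

    double user_hours_saved = runs * users * seconds_saved / 3600.0;
    return user_hours_saved * user_rate
         - hours_optimizing * programmer_rate;
}
```

For example, assuming 100 runs each by 1000 users, 0.040 seconds saved per run, an 8-hour day of optimizing, and the same hourly rate of 50 for users and programmer, the users save about 1.1 hours in total and the result is roughly -344: the day was not worth it.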

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    75/87

    75

    Gotchas

    2. Machines are not created equal. What's fast on one machine may be slow on another.

    3. Don't get into the habit of writing code according to the above rules of optimization. Only apply them after you have discovered exactly which function is the problem. Some of the rules, if applied globally, would make the program even slower.

    4. Spending a week optimizing a program can easily cost thousands of dollars in programmer time. Sometimes it's easier to just buy a faster CPU, more memory, or a faster disk and solve the problem that way.

    5. Novices often assume that writing lots of statements on a single line and removing spaces and tabs will speed things up. While that may be a valid technique for some interpreted languages, it doesn't help at all in C.

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    76/87

    76

    Architectural/Code Optimizations

    Often, it is important to understand the architecture's implementation in order to optimize code effectively.

    This is much more difficult for compilers to do because it requires a different compiler back end for every implementation.

    One example of this is the ARM barrel shifter.

    It can convert Y * Constant into a series of adds and shifts: Y * 9 = Y * 8 + Y * 1.

    Assume R1 holds Y and R2 will hold the result:

    ADD R2, R1, R1, LSL #3 ; R2 = R1 + (R1 << 3); LSL #3 is the same as multiplying by 8

    Another example is the ARM 7500 write buffer specifics.
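The shift-and-add rewrite of Y * 9 shown above looks like this in C. The names mul9 and check_mul9 are invented for the example. Whether a given compiler already emits the single ADD with LSL for a plain `y * 9` is something to check in its assembly output, not to assume.

```c
#include <assert.h>
#include <stdint.h>

/* y * 9 expressed as (y * 8) + y, i.e. one shift and one add.
 * Unsigned arithmetic wraps on overflow, exactly as y * 9u would. */
uint32_t mul9(uint32_t y)
{
    return (y << 3) + y;
}

/* Small self-check comparing the rewrite against ordinary multiplication. */
void check_mul9(void)
{
    for (uint32_t y = 0; y < 1000u; ++y)
        assert(mul9(y) == y * 9u);
}
```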

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    77/87

    Profiling Tools 77

    Safe C

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    78/87

    78

    Safety Violation

    Incorrect type casts

    Dangling-pointer dereferences

    Data races

    Uninitialized memory

    NULL-pointer dereferences

    Array-bounds violations

    Incorrect use of unions

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    79/87

    79

    Type Equality for Parameters

    void f2(void **p, void *x) { *p = x; }

    Type safety requires that p points to a value with the same type as x.

    Without this equality, a use of f2 can violate memory safety:

    int y = 0;
    int *z = &y;
    f2((void **)&z, (void *)0xABC);  /* casts added so a C compiler accepts the call */
    *z = 123;                        /* writes 123 to address 0xABC */

    Other functions with the same type, such as f2ok, could allow &z and 0xABC as arguments because they never store x through p:

    void f2ok(void **p, void *x) { if (*p == x) printf("same"); }

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    80/87

    80

    Dangling Stack Pointers

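    A safety violation occurs when a program dereferences a dangling pointer, i.e., accesses a data object after it has been deallocated. In the code below, a call to g attempts to write 123 to address 0xABC (on typical implementations x and y occupy the same stack slot, so after both calls p1 and p2 refer to the same dead storage).

    int *f1() {
        int x = 0;
        return &x;      /* x is deallocated when f1 returns */
    }

    int **f2() {
        int *y = 0;
        return &y;      /* y is deallocated when f2 returns */
    }

    void g() {
        int *p1 = f1();
        int **p2 = f2();
        *p1 = 0xABC;    /* stores 0xABC into the dead stack slot */
        **p2 = 123;     /* reads that slot as a pointer and writes through it */
    }

    To avoid memory exhaustion, a garbage collector reclaims memory implicitly.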
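Two common ways of avoiding the dangling stack pointer in plain C are sketched below: let the caller own the storage, or allocate it on the heap. The function names init_value and make_value are invented for the example, and in the heap variant the caller is responsible for calling free().

```c
#include <assert.h>
#include <stdlib.h>

/* Variant 1: the caller supplies the storage, so the object outlives
 * this call and no pointer to a dead stack frame escapes. */
void init_value(int *out)
{
    assert(out != NULL);
    *out = 0;
}

/* Variant 2: heap storage outlives the function that created it.
 * Returns NULL if allocation fails; the caller must free() the result. */
int *make_value(void)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = 0;
    return p;
}
```

Both variants make ownership explicit. The heap variant moves the risk to a different place, since using the pointer after free() is again a dangling-pointer dereference.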

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    81/87

    81

    Data Races: Pointer Race Condition

    int g1 = 0;
    int g2 = 0;
    int *gp = &g1;

    void f1(int **x) { *x = &g2; }

    /* spawn is not a standard C function; it is assumed here to start f1 in a new thread */
    int f2() { spawn(f1, &gp, sizeof(int *)); return *gp; }

    If an invocation of f2 reads gp while an invocation of f1 writes gp, the read could produce an unpredictable bit string.
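One way to remove the race on the pointer itself in standard C is a C11 atomic pointer, sketched below. This swaps in <stdatomic.h> in place of the slide's spawn, and the function names publish_g2 and read_current are invented. Thread creation is left out, so the sketch only shows the accesses that would be shared.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

int g1 = 0;
int g2 = 0;

/* Atomic pointer: each load or store happens as one indivisible step,
 * so a reader sees either &g1 or &g2, never a mix of the two. */
int *_Atomic gp = &g1;

/* Writer side (would run in the spawned thread). */
void publish_g2(void)
{
    atomic_store(&gp, &g2);
}

/* Reader side: take one consistent snapshot of the pointer, then use it. */
int read_current(void)
{
    int *p = atomic_load(&gp);
    assert(p != NULL);
    return *p;
}
```

This protects only the pointer. If g1 or g2 were also modified concurrently, those accesses would need their own synchronization.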

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    82/87

    82

    Uninitialized Memory

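    void f() {
        int *p1;                            /* never initialized */
        int **p2 = malloc(sizeof(int *));   /* the int * it points to is never initialized */
        *p1 = 123;                          /* writes through an indeterminate pointer */
        **p2 = 123;                         /* *p2 is indeterminate, so this write is too */
    }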
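For contrast, a version of f in which every pointer is given a valid target before it is dereferenced might look like this. The name f_initialized and the local variable target are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Same shape as f above, but every pointer is initialized before use.
 * Returns 0 on success, -1 if malloc fails. */
int f_initialized(void)
{
    int target = 0;
    int *p1 = &target;                  /* initialized at declaration */
    int **p2 = malloc(sizeof *p2);

    if (p2 == NULL)                     /* malloc can fail */
        return -1;
    *p2 = &target;                      /* initialize the allocated pointer */

    *p1 = 123;
    **p2 = 123;
    assert(target == 123);

    free(p2);
    return 0;
}
```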

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    83/87

    83

    NULL Pointers

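    A compiler that enforces NULL safety needs to insert only one check into this code: p is tested explicitly, and q cannot change between its two dereferences.

    int f(int *p, int *q, int **r) {
        int ans = 0;
        if (p == NULL) return 0;
        ans += *p;      /* no check needed: p was tested above */
        ans += *q;      /* inserted check */
        *r = NULL;
        ans += *q;      /* no check needed: q was already checked */
        return ans;
    }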
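An ordinary C compiler inserts no such checks, so in plain C they have to be written by hand. A minimal sketch, with the function name sum_checked and its error convention chosen only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Adds *p and *q and stores the result in *out.
 * Returns 0 on success, or -1 if either input pointer is NULL.
 * `out` is treated as a required parameter and checked with assert. */
int sum_checked(const int *p, const int *q, int *out)
{
    assert(out != NULL);

    if (p == NULL || q == NULL)
        return -1;

    *out = *p + *q;
    return 0;
}
```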

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    84/87

    84

    Array-Bounds Checking

    void write_v(int v, unsigned sz, int *arr) {
        for (unsigned i = 0; i < sz; ++i)   /* unsigned index matches the type of sz */
            arr[i] = v;
    }

    To violate safety, clients can pass a value for sz greater than the length of arr.
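One way to make the bound checkable in plain C is to pass the array's real length along with the pointer. The int_span type and the write_v_checked function below are hypothetical names for this sketch. The check is only as good as the length the caller stores in the span.

```c
#include <assert.h>
#include <stddef.h>

/* A pointer bundled with the number of elements it really refers to. */
typedef struct {
    int   *data;
    size_t len;
} int_span;

/* Writes v into the first `count` elements of arr.
 * Returns 0 on success, or -1 (writing nothing) if count exceeds the
 * length recorded in the span. */
int write_v_checked(int v, size_t count, int_span arr)
{
    assert(arr.data != NULL);

    if (count > arr.len)
        return -1;

    for (size_t i = 0; i < count; ++i)
        arr.data[i] = v;
    return 0;
}
```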

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    85/87

    85

    Incorrect Use of Unions

    C programs that use the same memory for different types of data need casts or union types, but both are notoriously unsafe.

    It is common to use an int (or enum) field to record the type of data currently in the memory; this field discriminates which variant occupies the memory.

    Programmers must correctly maintain and check the tag.
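A minimal sketch of the tag discipline described above. All type and function names are invented for the example. The constructors are the only code that sets the tag, and the accessor refuses to read a variant the tag does not name.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { VALUE_INT, VALUE_DOUBLE } value_tag;

/* The tag records which member of the union is currently valid. */
typedef struct {
    value_tag tag;
    union {
        int    i;
        double d;
    } as;
} value;

/* Constructors set the tag and the payload together. */
value value_from_int(int i)
{
    value v;
    v.tag = VALUE_INT;
    v.as.i = i;
    return v;
}

value value_from_double(double d)
{
    value v;
    v.tag = VALUE_DOUBLE;
    v.as.d = d;
    return v;
}

/* Reads the int variant only if the tag says it is present.
 * Returns 0 on success, -1 (leaving *out untouched) otherwise. */
int value_get_int(const value *v, int *out)
{
    assert(v != NULL && out != NULL);

    if (v->tag != VALUE_INT)
        return -1;
    *out = v->as.i;
    return 0;
}
```

Nothing in C stops other code from writing v.as.d directly and leaving the tag stale, which is why the slide calls the approach unsafe unless the tag is maintained by convention.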

  • 8/14/2019 C-Programming-Optimization Techniques Class 4

    86/87

    86

    Types of Development Tools

    Compilation and building: make

    Managing files: RCS, SCCS, CVS

    Editors: vi, emacs

    Archiving: tar, cpio, pax, RPM

    Configuration: autoconf

    Debugging: gdb, dbx, prof, strace, purify

    Programming tools: yacc, lex, lint, indent
