EE202C Networked Embedded Systems Design Lecture 12 Multiprocessor Synchronization

Apr 05, 2018


EE202C LECTURE 12: EMBEDDED PLATFORMS AND OPERATING SYSTEMS

MULTIPROCESSOR SYNCHRONIZATION

TABLE OF CONTENTS

1. MULTIPROCESSOR / MULTICORE SYSTEMS: EMBEDDED AND MOBILE PLATFORM PROCESSORS
2. SYMMETRIC MULTIPROCESSING (SMP) KERNEL THREADING
3. SMP KERNEL THREAD CREATION AND MANAGEMENT
4. UNCOORDINATED KERNEL THREAD EXAMPLE
5. LOCK CLASSES, APPLICATIONS, AND IMPLEMENTATION
6. PROCESSOR SUPPORT FOR SYNCHRONIZATION
7. ATOMIC PRIMITIVES FOR MULTIWORD OPERATIONS
8. ARM ATOMIC OPERATIONS
9. ATOMIC PRIMITIVES FOR BIT OPERATIONS: X86
10. ATOMIC PRIMITIVES FOR BIT OPERATIONS: ARM
11. SPIN LOCK: X86
    - Implementation of spinlocks: Setting Locks
    - Spinlock Energy Optimization
    - Controlling synchronization and interrupts
    - Lock operations and load management
    - Implementation of spinlocks: Releasing Locks
12. SPINLOCK SYNCHRONIZED KERNEL THREAD EXAMPLE
13. SPIN LOCK: ARM
14. RW SPINLOCKS
    - Writers (in ARM architecture)
    - Readers (in ARM architecture)
    - Writers and Trylock
15. KERNEL SEMAPHORES
    - Background
    - Implementation
    - Operations
16. SEMAPHORE SYNCHRONIZED KERNEL THREAD EXAMPLE
17. RW SEMAPHORES
18. COMPLETION VARIABLES
19. SEQ LOCK
20. SYNCHRONIZATION AGAINST OUT OF ORDER EXECUTION


1. MULTIPROCESSOR / MULTICORE SYSTEMS: EMBEDDED AND MOBILE PLATFORM PROCESSORS

1. Intel Atom Architecture
   a. Dual Die Processors
      i. Hyperthreading architecture
         1. Shared instruction cache
         2. Shared data cache
         3. Parallel decode and issue of instructions
         4. Parallel register file
         5. Parallel integer and floating point execution units
   b. Pipeline
      i. 16 stages
   c. Two cache levels
      i. L1 compact cache
         1. 32KB I-Cache
         2. 24KB D-Cache
      ii. 512KB L2 cache per core
         1. Dual core architecture provides 1MB total
         2. Each L2 cache shared with both threads
      iii. Includes prefetch units that detect stride lengths and optimize prefetch operations


2. SYMMETRIC MULTIPROCESSING (SMP) KERNEL THREADING

1. SMP architecture benefits
   a. Parallel processing energy and performance benefits
2. Synchronization challenges
   a. Many examples in kernel and applications for independent processes and threads
   b. However, many of the most important applications introduce constraints of data dependence
   c. Most important applications (for example database systems) exhibit severe reduction in throughput as thread count rises above several threads per processor
3. Synchronization requirements
   a. Lock time resolution
      i. Lock acquisition and release may require high time resolution
      ii. Lock testing may require high time resolution
         1. Encouraging a polling method
      iii. Reduced time resolution requirements must be exploited
         1. Sleep timing upon lock detection greater than one clock tick permits sleeping
         2. Sleep timing less than one clock period requires busy wait
   b. Lock footprint
      i. Lock fetch, decode cost
      ii. Lock cache footprint cost
   c. Lock optimization methods
      i. Locking for readers (consumers) and writers (suppliers)
         1. Read lock
         2. Write lock
         3. Read and Write lock


4. Architecture challenges
   a. Managing thread creation and removal
   b. Lock integration with interrupt management
   c. Lock integration with preemption
   d. Lock integration with computational load systems
      i. Management of load bursts due to lock release
   e. Energy efficient lock processor resources


3. SMP KERNEL THREAD CREATION AND MANAGEMENT

Kernel thread descriptors

   o Create information data structure
      - This is managed for the thread by the keventd daemon
      - The started and result fields are populated by the keventd daemon at runtime
      - The result task structure will include the name and arguments associated with the thread function

struct kthread_create_info {
        int (*threadfn)(void *data);
        void *data;
        struct completion started;
        struct task_struct *result;
        struct completion done;
};

   o Stop information data structure
      - Done state written by keventd

    struct kthread_stop_info{

    struct task_struct *k;

    int err;

    struct completion done;

    };


Kernel thread creation

struct task_struct *kthread_create(int (*threadfn)(void *data), void *data,
                                   const char namefmt[], ...)
/* Note: ... indicates a list of variable arguments referenced
 * by the name format (namefmt) field. These may include, for
 * example, the current pid value. These arguments will be displayed
 * in process status information.
 */

    {

    struct kthread_create_info create;

    DECLARE_WORK(work, keventd_create_kthread, &create);

    create.threadfn = threadfn; /* function argument */

    create.data = data; /* function data */

    init_completion(&create.started);

    init_completion(&create.done);

    /*

    * Start the workqueue system below

    */

    if (!helper_wq)

    work.func(work.data);

    else {

    queue_work(helper_wq, &work);

    wait_for_completion(&create.done);

    }

    if (!IS_ERR(create.result)) {

/* following code prepares the process table string */
va_list args;

    va_start(args, namefmt);

    vsnprintf(create.result->comm, sizeof(create.result->comm),

    namefmt, args);

    va_end(args);

    }

    return create.result;

    }

    EXPORT_SYMBOL(kthread_create);


Kernel thread binding to CPU
   o Called after creation and before wakeup

    void kthread_bind(struct task_struct *k, unsigned int cpu)

    {

wait_task_inactive(k); /* wait for task to be unscheduled */
set_task_cpu(k, cpu);

    k->cpus_allowed = cpumask_of_cpu(cpu);

    }

    EXPORT_SYMBOL(kthread_bind);

    static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)

    {

    task_thread_info(p)->cpu = cpu;

    }

Kernel thread wakeup (enqueue task)

int fastcall wake_up_process(task_t *p)

    {

    /* places stopped or sleeping task on run queue */

    return try_to_wake_up(p, TASK_STOPPED | TASK_TRACED |

    TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE, 0);

    }

    EXPORT_SYMBOL(wake_up_process);

    Kernel thread checks for stop status that may be applied by another thread

   o Kernel thread calls kthread_should_stop()

int kthread_should_stop(void)

    {

    return (kthread_stop_info.k == current);

    }

    EXPORT_SYMBOL(kthread_should_stop);
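
For reference, a minimal sketch of a thread function built around this check is shown below; the name worker_fn, the one second sleep, and the omitted work are illustrative assumptions, not part of the lecture code.

/* Sketch only: a kernel thread body that runs until another thread
 * requests a stop via kthread_stop(). Names are illustrative.
 */
static int worker_fn(void *data)
{
        while (!kthread_should_stop()) {
                /* perform one unit of work here */
                set_current_state(TASK_UNINTERRUPTIBLE);
                schedule_timeout(HZ);   /* sleep about one second per cycle */
        }
        return 0;                       /* becomes the return value of kthread_stop() */
}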



A kernel thread may apply stop state for another thread

int kthread_stop(struct task_struct *k)

    {

    return kthread_stop_sem(k, NULL);

    }

    EXPORT_SYMBOL(kthread_stop);

The implementation of kthread_stop_sem is important
   o The mutex_lock ensures that only one CPU may apply the stop condition
   o Also, the thread must receive a signal (as required) in order to initiate its completion

    int kthread_stop_sem(struct task_struct *k, struct semaphore *s)

    {

    int ret;

mutex_lock(&kthread_stop_lock);
get_task_struct(k);

    init_completion(&kthread_stop_info.done);

    smp_wmb();

    kthread_stop_info.k = k; /* sets kthread pointer indicating stop */

    if (s)

    up(s); /* release the semaphore */

    else

    wake_up_process(k); /* start thread to enable completion */

    put_task_struct(k); /* atomic decrement of task usage */

    wait_for_completion(&kthread_stop_info.done);

    kthread_stop_info.k = NULL;

    ret = kthread_stop_info.err;

mutex_unlock(&kthread_stop_lock);
return ret;

    }

    EXPORT_SYMBOL(kthread_stop_sem);
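
The stop handshake above is one concrete use of completion variables (revisited in Section 18). As a hedged, stripped-down sketch of the same wait/complete pairing, with hypothetical names:

/* Sketch of the completion handshake pattern (hypothetical names). */
static DECLARE_COMPLETION(work_done);           /* statically initialized completion */

static int worker(void *data)
{
        /* ... perform the work ... */
        complete(&work_done);                   /* signal the waiting thread */
        return 0;
}

static void wait_for_worker(void)
{
        wait_for_completion(&work_done);        /* block until complete() runs */
}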


4. UNCOORDINATED KERNEL THREAD EXAMPLE

    /** kthread_mod_uncoord.c

    *

    * Demonstration of multiple kernel thread

    * creation and binding on multicore system

    *

    */

/* Header names were stripped in the source scan; the following set is a
 * likely assumption for a module using kthreads, printk, and scheduling. */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/delay.h>

    /* array of pointers to thread task structures */

    #define MAX_CPU 16

    #define LOOP_MAX 10

    #define BASE_PERIOD 200

    #define INCREMENTAL_PERIOD 330

    #define WAKE_UP_DELAY 0

    static struct task_struct *kthread_cycle_[MAX_CPU];

    static int kthread_cycle_state = 0;

    static int num_threads;

    static int cycle_count = 0;

    static int cycle(void *thread_data)

    {

    int delay, residual_delay;

    int this_cpu;

    int loops;

    delay = BASE_PERIOD;

    for (loops = 0; loops < LOOP_MAX; loops++) {

    this_cpu = get_cpu();

    delay = delay + this_cpu*INCREMENTAL_PERIOD;

printk("kthread_mod: no lock pid %i cpu %i delay %i count %i\n",
       current->pid, this_cpu, delay, cycle_count);

    cycle_count++;

    set_current_state(TASK_UNINTERRUPTIBLE);


    residual_delay = schedule_timeout(delay);

    cycle_count--;

printk("kthread_mod: no lock pid %i cpu %i delay %i count %i\n",
       current->pid, this_cpu, delay, cycle_count);

    }

    kthread_cycle_state--;

    /*

    * exit loop poll stop state with sleep cycle

    */

    while (!kthread_should_stop()) {

    delay = 1 * HZ;

    set_current_state(TASK_UNINTERRUPTIBLE);

    residual_delay = schedule_timeout(delay); /* prepare to yield */

printk("kthread_mod: wait for stop pid %i cpu %i \n",
       current->pid, this_cpu);
}
printk("kthread_mod: cycle function: stop state detected for cpu %i\n",
       this_cpu);

    return 0;

    }

    int init_module(void)

    {

    int cpu = 0;

    int count;

    int this_cpu;

    int num_cpu;

    int delay_val;

    int *kthread_arg = 0;

    int residual_delay;

    const char thread_name[] = "cycle_th";

    const char name_format[] = "%s/%d"; /* format name and cpu id */

    num_threads = 0;

    num_cpu = num_online_cpus();

    printk("kthread_mod: number of operating processors: %i\n",

    num_cpu);

this_cpu = get_cpu();
printk("thread_mod: kthread_mod init: current task is %i on cpu %i \n",
       current->pid, this_cpu);

    for (count = 0; count < num_cpu; count++) {

    cpu = count;

    num_threads++;


    kthread_cycle_state++;

    delay_val = WAKE_UP_DELAY;

    set_current_state(TASK_UNINTERRUPTIBLE); /* prepare to yield */

    residual_delay = schedule_timeout(delay_val);

    kthread_cycle_[count]=kthread_create(cycle, (void *) kthread_arg,

    thread_name, name_format, cpu);

    if (kthread_cycle_[count] == NULL) {

    printk("kthread_mod: thread creation error\n");

    }

    kthread_bind(kthread_cycle_[count], cpu); /* sets cpu in task */

    /* struct */

    wake_up_process(kthread_cycle_[count]);

    this_cpu = get_cpu();

printk("kthread_mod: execution after wake_up_process, current task "
       "pid %i on cpu %i\n", current->pid, this_cpu);
printk("kthread_mod: current task is %i on cpu %i creating and "
       "waking next thread after delay of 1s \n", current->pid,
       this_cpu);

    }

    return 0;

    }

    void cleanup_module(void)

    {

    int ret;

    int count;

    int this_cpu;

    /*

    * determine if module removal terminated thread creation cycle early

    *

    * also must determine if cpu is suspended

    */

    printk("kthread_mod: number of threads to stop %i and active %i\n",

    num_threads, kthread_cycle_state);

    this_cpu = get_cpu();

printk("kthread_mod: kthread_stop requests being applied by task %i on "
       "cpu %i \n", current->pid, this_cpu);

    for (count = 0; count < num_threads; count++) {

ret = kthread_stop(kthread_cycle_[count]); /* set done in completion field */
printk("kthread_mod: kthread_stop request for cpu count returned "
       "with value %i \n", ret);

    }

    }

    MODULE_LICENSE("GPL");


Start up

[60937.707450] kthread_mod: number of operating processors: 4
[60937.707486] thread_mod: kthread_mod init: current task is 16243 on cpu 3
[60937.709822] kthread_mod: execution after wake_up_process, current task pid 16243 on cpu 3
[60937.709841] kthread_mod: no lock pid 16244 cpu 0 delay 200 count 0
[60937.709919] kthread_mod: current task is 16243 on cpu 3 creating and waking next thread after delay of 1s
[60937.713666] kthread_mod: execution after wake_up_process, current task pid 16243 on cpu 3
[60937.713678] kthread_mod: no lock pid 16245 cpu 1 delay 530 count 1
[60937.713738] kthread_mod: current task is 16243 on cpu 3 creating and waking next thread after delay of 1s
[60937.717799] kthread_mod: execution after wake_up_process, current task pid 16243 on cpu 3
[60937.717815] kthread_mod: no lock pid 16246 cpu 2 delay 860 count 2
[60937.717893] kthread_mod: current task is 16243 on cpu 3 creating and waking next thread after delay of 1s
[60937.721661] kthread_mod: execution after wake_up_process, current task pid 16243 on cpu 3
[60937.721712] kthread_mod: current task is 16243 on cpu 3 creating and waking next thread after delay of 1s
[60937.721950] kthread_mod: no lock pid 16247 cpu 3 delay 1190 count 3
[60938.508041] kthread_mod: no lock pid 16244 cpu 0 delay 200 count 3
[60938.508084] kthread_mod: no lock pid 16244 cpu 0 delay 200 count 3
[60939.308039] kthread_mod: no lock pid 16244 cpu 0 delay 200 count 3
[60939.832050] kthread_mod: no lock pid 16245 cpu 1 delay 530 count 2
[60939.832086] kthread_mod: no lock pid 16245 cpu 1 delay 860 count 2
[60940.308037] kthread_mod: wait for stop pid 16244 cpu 0
[60941.156046] kthread_mod: no lock pid 16246 cpu 2 delay 860 count 2
[60941.156082] kthread_mod: no lock pid 16246 cpu 2 delay 1520 count 2
[60941.308064] kthread_mod: wait for stop pid 16244 cpu 0
[60942.308038] kthread_mod: wait for stop pid 16244 cpu 0
[60942.480041] kthread_mod: no lock pid 16247 cpu 3 delay 1190 count 2
[60942.480074] kthread_mod: no lock pid 16247 cpu 3 delay 2180 count 2
[60943.272042] kthread_mod: no lock pid 16245 cpu 1 delay 860 count 2

Note the lack of coordination above.

Completion and removal phase

[61085.496049] kthread_mod: wait for stop pid 16247 cpu 3
[61086.236050] kthread_mod: wait for stop pid 16246 cpu 2
[61086.276046] kthread_mod: wait for stop pid 16245 cpu 1
[61086.308040] kthread_mod: wait for stop pid 16244 cpu 0
[61086.496039] kthread_mod: wait for stop pid 16247 cpu 3
[61087.236041] kthread_mod: wait for stop pid 16246 cpu 2
[61087.276048] kthread_mod: wait for stop pid 16245 cpu 1
[61087.308040] kthread_mod: wait for stop pid 16244 cpu 0
[61150.308036] kthread_mod: wait for stop pid 16244 cpu 0
[61150.720049] kthread_mod: wait for stop pid 16247 cpu 3
[61151.134948] kthread_mod: number of threads to stop 4 and active 0


[61151.134984] kthread_mod: kthread_stop requests being applied by task 16332 on cpu 0
[61151.135024] kthread_mod: wait for stop pid 16244 cpu 0
[61151.135049] kthread_mod: cycle function: stop state detected for cpu 0
[61151.135118] kthread_mod: kthread_stop request for cpu count returned with value 0
[61151.135171] kthread_mod: wait for stop pid 16245 cpu 1
[61151.135220] kthread_mod: cycle function: stop state detected for cpu 1
[61151.135267] kthread_mod: kthread_stop request for cpu count returned with value 0
[61151.135331] kthread_mod: wait for stop pid 16246 cpu 2
[61151.135357] kthread_mod: cycle function: stop state detected for cpu 2
[61151.135398] kthread_mod: kthread_stop request for cpu count returned with value 0
[61151.135456] kthread_mod: wait for stop pid 16247 cpu 3
[61151.135499] kthread_mod: cycle function: stop state detected for cpu 3
[61151.135541] kthread_mod: kthread_stop request for cpu count returned with value 0

Computing load: top -d 0.1
   o Upon start of top execution, enter 1 for the per-CPU display

Cpu0 : 0.0%us,  9.1%sy, 0.0%ni, 90.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us,  0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us,  0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us,  0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Note that one task presents a user space task load while all others are sleeping (note the presence of 100% in the idle state).


5. LOCK CLASSES, APPLICATIONS, AND IMPLEMENTATION

Synchronization between tasks is critical to multiprocess and multiprocessor systems
   o A requirement exists for ensuring that operations occur only according to design constraints and are not subject to race conditions

Synchronized access to a resource may be managed by controlling access to the code segment using a variable that may implement a lock
   o Enables a form of communication between processes

Hierarchy
   o Spinlocks
      - Fast acquisition and release
      - Resource intensive for extended lock times
   o RW Spinlocks
      - High efficiency lock favoring readers
      - Often read but rarely written
      - Multiple readers
      - One writer
   o Kernel Semaphore
      - Complex implementation
      - Increased latency
      - Efficient for long delays
   o RW Semaphore
      - Semaphore attributes with reader/writer resolution


   o Seqlock
      - High efficiency lock that favors writers
   o Completion Variables
      - Synchronization against out-of-order execution

Applications
   o List manipulation (memory, tasks)
   o Timer interrupts
   o Interrupt service
   o System call
   o Scheduler operations

Implementation
   o Relies on processor hardware
   o Important optimizations for performance
   o Recent advances in optimization for energy


6. PROCESSOR SUPPORT FOR SYNCHRONIZATION

Implementation of reliable synchronization methods requires the support of processor hardware
   o The presence of unscheduled interrupts implies that any sequence of control is uncertain

Processor architectures may enable the implementation of atomic operations
   o Atomic operations complete without interruption of the sequence of control under all circumstances
   o An example is the increment of a memory register
   o For this to be atomic, the fetch, decode, fetch of operand from memory, increment, and write back must occur contiguously
   o The arrival of an interrupt must not induce an interruption of this sequence of control

A class of Intel IA-32 instructions is always atomic. These include:
   o Byte length read or write from memory
   o 32 bit aligned read or write from memory of a 32 or 64 bit word
   o 64 bit aligned read or write from memory of a 128 bit word
   o Reading or writing to cache
      - The cache is accessible as cache lines of 32 bytes
      - An unaligned read or write falling within this limit will be atomic
   o Memory management
      - Updating segment registers
      - Updating page tables
   o Interrupts
      - The data bus is locked after an interrupt, only allowing a selected APIC to write

Other operations are not atomic
   o Thus other methods must apply


   o Here, one processor may acquire and lock the address/data bus, preventing any other processor (or device) from accessing memory

Assert LOCK prefix
   o Instructions are listed with the identifier lock prepending the instruction
      - The assembler will ensure that object code includes a bus lock operation during execution
      - Will add an opcode modifier 0xF0 to the instruction
      - Only one processor may access memory during the lock

7. ATOMIC PRIMITIVES FOR MULTIWORD OPERATIONS

For IA-32, Linux atomic operations are defined in /include/asm-i386/atomic.h

#ifdef CONFIG_SMP

    #define LOCK "lock ; "

    #else

    #define LOCK ""

    #endif

First, there is an atomic data type

typedef struct { volatile int counter; } atomic_t;

   o One data member: counter

The operations are defined as static inline
   o Standalone object code for the functions may be created by the compiler, if required

Examples of atomic increment and decrement

static __inline__ void atomic_inc(atomic_t *v)
{
        __asm__ __volatile__(

    LOCK "incl %0"

    :"=m" (v->counter)

    :"m" (v->counter));

    }


    static __inline__ void atomic_dec(atomic_t *v)

    {

    __asm__ __volatile__(

        LOCK "decl %0"
        :"=m" (v->counter)
        :"m" (v->counter));
}

   o Atomic add of i to atomic type v
      - The "ir" constraint indicates that an immediate value or register is to be assigned by the compiler to the integer i

    static __inline__ void atomic_add(int i, atomic_t *v)

    {

    __asm__ __volatile__(

    LOCK "addl %1,%0"

    :"=m" (v->counter)

    :"ir" (i), "m" (v->counter));

    }

   o Atomic subtract of i from atomic type v

static __inline__ void atomic_sub(int i, atomic_t *v)

    {

    __asm__ __volatile__(

    LOCK "subl %1,%0"

    :"=m" (v->counter)

    :"ir" (i), "m" (v->counter)); // i indicates 32b for reg

    }

   o Since the variable in question is actually a data structure member, it must be accessed via a read operation

    #define atomic_read(v) ((v)->counter)

   o atomic_set must be used to write
   o This sets the value of v to that of the integer i

    #define atomic_set(v,i) (((v)->counter) = (i))
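
As a usage illustration (not from the lecture code), atomic_t is commonly combined with these primitives to build a lock-free reference counter; the names below are assumptions:

/* Sketch: atomic_t as a simple reference counter (illustrative names). */
static atomic_t refcount = ATOMIC_INIT(1);      /* one initial reference */

static void get_ref(void)
{
        atomic_inc(&refcount);                  /* atomic increment, no lock required */
}

static void put_ref(void)
{
        if (atomic_dec_and_test(&refcount))     /* atomic decrement; true when it reaches zero */
                printk("last reference dropped\n");
}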


8. ARM ATOMIC OPERATIONS

The ARMv6 architecture implements atomic operations using a unique approach
   o The lock method differs in that memory bus locking is not applied

ARM atomic instructions
   o Read operations are inherently atomic
   o Write operations must be protected

Here is an example of atomic_set
   o This sets v equal to i
   o Its functionality as a kernel library function is the same as its i386 counterpart
   o However, due to processor differences between IA-32 and ARM, its underlying implementation is quite different

The instruction Load Exclusive, LDREX R1, [R2], is implemented on ARM
   o This loads R1 with the contents of the memory register addressed by the contents of R2
   o Then, this initializes a monitor
      - The monitor observes any write action on the address-data bus that may occur on the 32b memory block pointed to by the contents of R2
      - This write action may occur due to the operation of another CPU that shares the memory space and address-data bus
      - The occurrence of a write action can then be detected subsequently by STREX

The instruction Store Exclusive, STREX R1, R2, [R3]
   o Stores R2 into the memory register addressed by R3
   o If the write is successful, in the sense that the previously initialized monitor shows no intervening writes, then the data pointed to by [R3] has been written atomically
   o A successful write is returned as a zero value in R1

If a failure is detected, this function continues to attempt to initialize a monitor, store, and verify. An example is atomic_set() for ARM:


   o First read the value of v->counter (sets the monitor)
      - This is a guard instruction
      - The value of counter is not needed
   o Then start the store operation with STREX
   o Check the monitor
   o Loop until success is detected

Note this sequence is inserted inline in the code, not called as a function
   o By declaring it static inline, this code can be included in any kernel function

static inline void atomic_set(atomic_t *v, int i)
{
        unsigned long tmp;

        __asm__ __volatile__("@ atomic_set\n"
"1:     ldrex   %0, [%1]\n"     /* tmp receives the contents at the memory     */
                                /* address containing counter; this            */
                                /* initializes the monitor                     */
"       strex   %0, %2, [%1]\n" /* i (third on the arg list) is stored into    */
                                /* the memory register at the address of       */
                                /* v->counter; %0 receives the success flag    */
"       teq     %0, #0\n"       /* test if reg %0 is cleared (store succeeded) */
"       bne     1b"             /* if not successful, branch back to label 1   */
        : "=&r" (tmp)           /* & requires separation of output and input   */
                                /* register choices by the compiler            */
        : "r" (&v->counter), "r" (i)
        : "cc");                /* sequence may have modified the cpsr; the    */
                                /* compiler must ensure the cpsr is protected  */
}
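
The same LDREX/STREX pattern extends to read-modify-write primitives. Below is a sketch modeled on the mainline ARMv6 atomic_add(): the exclusive load arms the monitor, the add is performed in a register, and the exclusive store succeeds (writes 0 to the status register) only if no other observer wrote the location in the meantime; otherwise the loop retries. Exact register constraints may differ slightly from the kernel source.

static inline void atomic_add(int i, atomic_t *v)
{
        unsigned long tmp;
        int result;

        __asm__ __volatile__("@ atomic_add\n"
"1:     ldrex   %0, [%2]\n"     /* result = v->counter, monitor armed        */
"       add     %0, %0, %3\n"   /* result += i                               */
"       strex   %1, %0, [%2]\n" /* try exclusive store; tmp = 0 on success   */
"       teq     %1, #0\n"       /* did another writer intervene?             */
"       bne     1b"             /* yes: retry the whole sequence             */
        : "=&r" (result), "=&r" (tmp)
        : "r" (&v->counter), "Ir" (i)
        : "cc");
}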


9. ATOMIC PRIMITIVES FOR BIT OPERATIONS: X86

Atomic bit manipulations are defined in include/asm-i386/bitops.h

Aside:
   o Note the use of the volatile type qualifier
   o Consider an example where it is desired to send two words in succession to a memory mapped I/O port
   o volatile informs the compiler that the memory is volatile and must be read from main memory at each referencing instruction
      - Avoids a reference to cache that would create an error in this case, since the memory corresponds to an I/O port
      - Prevents any compiler optimization that may eliminate a read
   o Suppresses code optimization that may appear when these functions are inlined
   o Example: consider the sequence

volatile unsigned long *output_port = memory_mapped_interface_address;

    *output_port = CONTROL_WORD_1; /* set high */

    *output_port = CONTROL_WORD_2; /* set low */

Note that the first write would otherwise be eliminated by the compiler

   o Clear a bit in memory

    static inline void clear_bit(int nr, volatile unsigned long * addr)

    {

    __asm__ __volatile__( LOCK_PREFIX

    "btrl %1,%0" // Bit Test and Reset Long

                                        // %1 is the bit offset, nr
                                        // %0 is the addressed word; the bit
                                        // indicated by nr is cleared
        :"=m" (ADDR)                    // output operand
        :"Ir" (nr));                    // I identifies a constant in the
                                        // range of 0 to 31

    }


    o Change a bit in memory

    static inline void change_bit(int nr, volatile unsigned long * addr)

    {

    __asm__ __volatile__( LOCK_PREFIX

    "btcl %1,%0":"=m" (ADDR)

    :"Ir" (nr));

    }

   o Test and Change bit in memory

static inline int test_and_change_bit(int nr, volatile unsigned long *addr);

   o Test and Clear bit in memory

static inline int test_and_clear_bit(int nr, volatile unsigned long *addr);
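
As a hedged usage sketch (kernel context assumed; the names are illustrative), the test-and-set variant of these primitives can serve as a one-bit busy flag:

/* Sketch: bit 0 of device_flags used as a busy flag (illustrative names). */
static unsigned long device_flags;

static int try_claim_device(void)
{
        if (test_and_set_bit(0, &device_flags)) /* atomically set bit 0, return old value */
                return -EBUSY;                  /* bit was already set: someone owns it */
        return 0;                               /* bit was clear: we now own the flag */
}

static void release_device(void)
{
        clear_bit(0, &device_flags);            /* atomic clear of the busy flag */
}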


10. ATOMIC PRIMITIVES FOR BIT OPERATIONS: ARM

Atomic bit manipulations are defined in include/asm-arm/bitops.h

   o Clear a bit in memory
      - Operates on the word referenced by pointer p
      - If bit > 31, then the pointer is advanced to a following 32b word
      - The low five bits of bit select the location within the word; the remaining bits select the word offset
      - Five lines of code without a conditional or loop
   o Example: bit = 4
      - bit = 0000 0000 0000 0100
      - bit & 31 = 0000 0000 0000 0100 & 0000 0000 0001 1111 = 0000 0000 0000 0100
      - mask = 1UL << (bit & 31) = 0000 0000 0001 0000
      - bit >> 5 = 0, so p = p + 0
      - *p = *p & 1111 1111 1110 1111, which clears the fifth bit (bit 4)

static inline void ____atomic_clear_bit(unsigned int bit, volatile unsigned long *p)
{
        unsigned long flags;
        unsigned long mask = 1UL << (bit & 31); /* bit position within the word */

        p += bit >> 5;                          /* advance to the containing word */

        local_irq_save(flags);
        *p &= ~mask;
        local_irq_restore(flags);
}


   o Example: bit = 34
      - bit = 0000 0000 0010 0010
      - bit & 31 = 2, so mask = 1UL << 2 = 0000 0000 0000 0100
      - bit >> 5 = 1, so p = p + 1, which advances the pointer to the next word
      - *p = *p & 1111 1111 1111 1011, which clears bit 2 in the second word (bit location 34)

   o Set a bit in memory
      - Replace the AND operation with an OR, setting the bit

static inline void ____atomic_set_bit(unsigned int bit, volatile unsigned long *p)
{
        unsigned long flags;
        unsigned long mask = 1UL << (bit & 31);

        p += bit >> 5;

    local_irq_save(flags);

    *p |= mask;

    local_irq_restore(flags);

    }
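
The word-index and mask arithmetic used by both routines can be checked in isolation; the following is a small user-space sketch (not kernel code) that reproduces the bit = 34 example:

#include <stdio.h>

int main(void)
{
        unsigned int bit = 34;
        unsigned long mask = 1UL << (bit & 31); /* position of the bit within its 32-bit word */
        unsigned int word = bit >> 5;           /* index of the 32-bit word holding the bit */

        printf("bit %u -> word %u, mask 0x%lx\n", bit, word, mask);
        /* prints: bit 34 -> word 1, mask 0x4 */
        return 0;
}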


11. SPIN LOCK: X86

Atomic operations are adequate for the example of controlling a word or bit

In general, methods are needed to lock a sequence of control for atomic operation
   o Examples include examination of a list, for example of tasks or timers

In the hierarchy of control sequence locking, the spinlock is the most efficient in its initialization and use, but it also represents the most significant impact on the kernel
   o Design requirements
      - Intended for application to locking where resource hold time is short
      - Fast initialization
      - Fast access and return
      - Small cache footprint
      - Design will tolerate processing overhead

There are two primary alternatives for code sequence locking
   o A process operating on one CPU (in an SMP system) may seek to acquire a resource or enter a sequence of instructions. If the lock is not accessible, the process (a kernel thread) may be designed to be dequeued until the lock is available.
      - If the lock is anticipated to not be available for an extended period, then this is acceptable
      - The process of dequeue and then enqueue incurs latency: a context switch exiting and entering
   o An alternative, for design requirements where it is known in advance that the lock acquisition delay will be short, is the spin lock
      - The kernel thread process is not dequeued during the lock delay
      - Rather, it continues to test the lock


Operations
   o A process that may wish to protect a code sequence sets a spinlock
      - Only one spinlock is available per thread
   o A second process requesting the lock makes repeated attempts to gain the lock. It remains in a busy loop, testing the lock during each period that it is scheduled.
      - If the previous task releases the lock, it will be discovered to be available when the new task seeking the lock is scheduled

Characteristics
   o The spinlock is central in kernel code
   o The acquisition of a spinlock does not disable interrupt operations
      - This results from our having inserted an interruptable NOP loop
      - An example of a potential deadlock failure results from a process having acquired a spinlock and then being interrupted and replaced by an ISR that seeks the same spinlock
   o Thus, we very often observe the use of the spin_lock_irqsave() function (a usage sketch follows the rules below)

Usage rules
   o Spinlocks are appropriate for fast execution (less than the time required for two context switches)
   o Sleep operations should not be started in a sequence of execution after a lock is acquired


      - The probability that an interrupt service routine may require the same lock resource is high
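
A minimal usage sketch of the spin_lock_irqsave() pattern referenced above, assuming a driver-private lock and a counter shared with an interrupt handler (the names are assumptions, not lecture code):

static spinlock_t dev_lock = SPIN_LOCK_UNLOCKED;        /* protects dev_count */
static int dev_count;                                   /* also updated from the ISR */

static void update_from_process_context(void)
{
        unsigned long flags;

        spin_lock_irqsave(&dev_lock, flags);      /* disable local interrupts, saving prior state */
        dev_count++;                              /* critical section shared with the ISR */
        spin_unlock_irqrestore(&dev_lock, flags); /* restore the saved interrupt state */
}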

    IMPLEMENTATION OF SPINLOCKS: SETTING LOCKS

First, setting the lock (in kernel/spinlock.c)

__lockfunc defines fastcall, the directive that the first three function arguments are to be placed in registers as opposed to the stack (which would be the compiler default)
   o FASTCALL macro setting: #define fastcall __attribute__((regparm(3)))
      - Pass up to three parameters via registers, the remainder on the stack

    #define spin_lock(lock) _spin_lock(lock)

    void __lockfunc _spin_lock(spinlock_t *lock)

    {

    preempt_disable();

    _raw_spin_lock(lock);

    }

   o Preemption disabled (from /linux/preempt.h)

#define preempt_disable() \

    do { \

    inc_preempt_count(); \

    barrier(); \

    } while (0)

   o The actual spinlock
      - slock = 1 if the lock is available
      - slock = 0 after the decrement on a successful lock request

    static inline void _raw_spin_lock(spinlock_t *lock)

    {

    __asm__ __volatile__(spin_lock_string

    :"=m" (lock->slock) : : "memory");

    }


    #define spin_lock_string \

    "\n1:\t" \

    "lock ; decb %0\n\t" \

    "jns 3f\n" \

    "2:\t" \

    "rep;nop\n\t" \"cmpb $0,%0\n\t" \

    "jle 2b\n\t" \

    "jmp 1b\n" \

    "3:\n\t"

    o gcc preprocessor will produce

    static inline void _raw_spin_lock(spinlock_t *lock)

    {

    __asm__ __volatile__(

    "1:\t" \

    "lock ; decb %0\n\t" \

    "jns 3f\n" \

    "2:\t" \

    "rep;nop\n\t" \

    "cmpb $0,%0\n\t" \

    "jle 2b\n\t" \

    "jmp 1b\n" \

    "3:\n\t"

    : "=m" (lock->slock) :

    : "memory");

    }

   o decb decrements the spinlock value; note the argument %0 points to lock->slock
      - Tests if the decrement of the lock reaches zero (the lock state was therefore one at the time of access)
   o Note it is not adequate to merely set slock
      - Consider multiple CPUs in a race to set the bit
      - The decrement removes the race condition
         o Each CPU can decrement the lock
         o No CPU can exit the spinlock until the lock becomes set to one and may be decremented
   o Checks the sign flag


      - If the sign flag is not set, then the spinlock was 1 and is now zero, so jump to label 3 and continue
         o The thread executing this sequence now owns the lock
      - If the spinlock value was zero, the decrement yields a negative value
         o The lock was therefore taken previously
         o Then, the system compares the memory register with zero
         o If less than or equal to zero, remain in the loop, since other CPU processes may be decrementing slock
         o If greater than zero, the lock must be set to 1 and is free
   o However, this system does not exit immediately; a race condition with multiple CPUs is in progress
      - Another CPU has set the lock to 1
      - But yet another CPU may now have decremented the lock
   o The test is: can this thread successfully decrement the lock to zero from one
      - If so, this is the only thread that owns the lock
      - If not, this CPU has lost the race
   o Hence, this system performs one more test to ensure that
      - The current thread acquires the lock
      - No other thread has or can acquire the lock
   A C-level sketch of this acquisition loop is given below.
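
The sketch below restates that logic in C-style pseudocode for explanation only; atomic_decrement() is a placeholder for the bus-locked decb, and the real lock is implemented in assembly as shown above, not this way.

/* Pseudocode of the acquisition logic (placeholder helpers, not kernel code). */
void spin_acquire(volatile signed char *slock)
{
        for (;;) {
                if (atomic_decrement(slock) >= 0)  /* "lock; decb": 1 -> 0 means we won */
                        return;                    /* sign flag clear: lock acquired */
                while (*slock <= 0)                /* held or contended: spin (rep;nop) */
                        cpu_relax();
                /* slock observed > 0: lock looks free, race to decrement it again */
        }
}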


    SPINLOCK ENERGY OPTIMIZATION

    Note the rep;nop sequence

static inline void rep_nop(void)
{

    __asm__ __volatile__("rep;nop": : :"memory");

    }

    #define cpu_relax() rep_nop()

Detail point on optimization
   o Analysis has shown that some systems spend a significant fraction of time in the spinlock state
      - The delay may be unavoidable
      - However, it introduces undesired power dissipation
   o Now, the rep;nop sequence introduces a method for signaling the CPU that the current thread is executing and waiting for a spinlock
   o The rep prefix causes a number of NOPs to be introduced equal to the contents of the cx register
      - The assembler introduces the rep opcode modifier byte 0xF2; prepending the instruction causes the instruction to be called repeatedly
      - This applies only to string instructions; it defaults to a single NOP otherwise
   o However, the processor observes the presence of rep nop
   o The processor then may adjust clock frequency and core voltage, and reduce energy usage

The cpu_relax() macro, which includes the rep nop sequence, has appeared in recent kernel versions
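
A hedged sketch of how cpu_relax() is typically used in such a polling loop (the status flag and function name are illustrative):

/* Busy-wait on a hardware status flag while hinting the CPU that this is
 * a spin-wait loop, allowing it to reduce speculation and power.
 */
static void wait_for_ready(volatile int *status)
{
        while (!*status)
                cpu_relax();    /* expands to "rep; nop" (PAUSE) on x86 */
}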


    CONTROLLING SYNCHRONIZATION AND INTERRUPTS

Setting a lock while disabling interrupts
   o _spin_lock_irq(spinlock_t *lock)
   o Interrupts are enabled unconditionally at the time the lock is released
      - This may create an error condition if interrupts were previously disabled

#define _spin_lock_irq(lock) \

    do { \

    local_irq_disable(); \

    preempt_disable(); \

    _raw_spin_lock(lock); \

    } while (0)

Setting a lock while disabling interrupts and storing the interrupt state
   o _spin_lock_irqsave(spinlock_t *lock, unsigned long flags)
   o This enables the state of interrupts to be stored at the time the lock is set
   o Interrupts are enabled at the time the lock is released only if they were initially enabled

#define _spin_lock_irqsave(lock, flags)         \
        do {                                    \
                local_irq_save(flags);          \
                preempt_disable();              \
                _raw_spin_lock(lock);           \
        } while (0)

#define local_irq_save(x)                       \
        __asm__ __volatile__(                   \
                "pushfl ; popl %0 ; cli"        \
                : "=g" (x)                      \
                :                               \
                : "memory")     /* disable interrupts, saving the flags in x */

From /asm-i386/system.h
   o This is called with a flags argument
      - The flags are first saved on the stack (pushfl)


      - The flag value is then popped into a general purpose register; the stack pointer is returned to its initial value
         o Thus, the flags are now saved as a temporary variable
      - The memory keyword informs the compiler that memory has been changed, blocking compiler actions that would reorder the sequence of control

    LOCK OPERATIONS AND LOAD MANAGEMENT

Setting a lock while disabling interrupt bottom halves
   o Critical for many network drivers
   o This permits hardware interrupts to proceed
      - However, the computational demand of bottom halves, which would delay interrupt service routines, is not introduced
      - For example, timer interrupts and other critical events
   o _spin_lock_bh(spinlock_t *lock)
   o This disables local bottom halves at the time the lock is set
   o Bottom halves are re-enabled at the time the lock is released

    #define _spin_lock_bh(lock) \

    do { \

    local_bh_disable(); \

    preempt_disable(); \

    _raw_spin_lock(lock); \

    } while (0)

   o This takes us to interrupts.c
      - Now, softirqs (for networking, for example) will only be allowed if the preemption counter is less than SOFTIRQ_OFFSET
      - If many processes have incremented the preemption counter, the policy is to not add yet another task, but rather allow these to complete
   o Convenient approach: just add SOFTIRQ_OFFSET


      - Now, with the increase in the preempt count, BHs are disabled, since the number of allowed softirqs is pushed above the limit by simply adding the max value to the current value
      - This creates a convenient method for returning and restoring the preemption count while gating BH operations
   o Note the while (0) construct and barrier

#define local_bh_disable() \
        do { add_preempt_count(SOFTIRQ_OFFSET); barrier(); } while (0)

   o And in sched.c (removing debug options)

void fastcall add_preempt_count(int val)

    {

    preempt_count() += val;

    }
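
A usage sketch of the bottom-half variant, assuming a statistic updated from a receive softirq and read from process context (the names are assumptions):

static spinlock_t stats_lock = SPIN_LOCK_UNLOCKED;      /* protects rx_packets */
static unsigned long rx_packets;                        /* incremented by the RX softirq */

static unsigned long read_rx_packets(void)
{
        unsigned long val;

        spin_lock_bh(&stats_lock);      /* block local bottom halves: the softirq cannot
                                         * run on this CPU and deadlock against us */
        val = rx_packets;
        spin_unlock_bh(&stats_lock);    /* re-enable BHs; pending softirqs may now run */
        return val;
}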

    IMPLEMENTATION OF SPINLOCKS: RELEASING LOCKS

Design goals
   o Release the lock resource
   o Enable preemption
   o Evaluate if rescheduling should occur

    Release spin_lock

#define _spin_unlock(lock)              \
        do {                            \
                _raw_spin_unlock(lock); \
                preempt_enable();       \
                __release(lock);        \
        } while (0)

In preempt.h
   o Enable

#define preempt_enable()                        \
        do {                                    \
                preempt_enable_no_resched();    \


                preempt_check_resched();        \
        } while (0)

    Decrement preempt count

#define preempt_enable_no_resched()     \
        do {                            \
                barrier();              \
                dec_preempt_count();    \
        } while (0)

Call reschedule if current is flagged; this return from the spinlock represents an important opportunity to exploit the resched option

#define preempt_check_resched()                                         \
        do {                                                            \
                if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))       \
                        preempt_schedule();                             \
        } while (0)

Release spin_lock_irq
   o Here is the unconditional restore

    #define _spin_unlock_irq(lock) \

    do { \

                _raw_spin_unlock(lock); \
                local_irq_enable();     \

    preempt_enable(); \

    } while (0)

Unlock (the lock argument applies only if debug is enabled; this is removed below)

static inline void _raw_spin_unlock(spinlock_t *lock)

    {

    __asm__ __volatile__(

    spin_unlock_string

    );

    }


Sets the spin lock byte; note the memory barrier

#define spin_unlock_string      \
        "movb $1,%0"            \
        : "=m" (lock->slock) : : "memory"

Release spin_unlock_irqrestore
   o Here is the conditional restore

#define spin_unlock_irqrestore(lock, flags)     \
        do {                                    \
                _raw_spin_unlock(lock);         \
                local_irq_restore(flags);       \
                preempt_enable();               \
        } while (0)

#define local_irq_restore(x)                            \
        do {                                            \
                if ((x & 0x000000f0) != 0x000000f0)     \
                        local_irq_enable();             \
        } while (0)

#define local_irq_enable()              \
        __asm__ __volatile__("sti" : : : "memory")

Finally, releasing with spin_unlock_bh

#define spin_unlock_bh(lock) _spin_unlock_bh(lock)

   o Here is the conditional restore

void __lockfunc _spin_unlock_bh(spinlock_t *lock)

    {

    _raw_spin_unlock(lock);

    preempt_enable();

        local_bh_enable();
}

In softirq.c, local_bh_enable is found
   o Note the recovery using the (SOFTIRQ_OFFSET - 1) subtraction
   o This removes the SOFTIRQ_OFFSET and enables SOFTIRQ threads to be executed by softirqd
   o However, preemption remains disabled (due to the -1 above) if it was disabled previous to this action


   o Note that a check is made that we are not in interrupt context and that there is a pending softirq
      - Then the softirq is actually performed immediately, before any other process that the scheduler may have selected
   o Note that the preemption counter is decremented; preemption will be enabled when this reaches zero
   o Note that resched is called

void local_bh_enable(void)

    {

    sub_preempt_count(SOFTIRQ_OFFSET - 1);

    if (unlikely(!in_interrupt() && local_softirq_pending()))

    do_softirq();

    dec_preempt_count();

    preempt_check_resched();

    }

Lock state testing
   o Lock state can be tested without spinning, to enable flow control
      - For example, spin_trylock(), spin_trylock_bh()
      - Implemented with the atomic xchgl instruction on x86

static inline int __raw_spin_trylock(raw_spinlock_t *lock)
{

    int oldval;

        __asm__ __volatile__(
                "xchgl %0,%1"
                :"=q" (oldval), "=m" (lock->slock)
                :"0" (0) : "memory");   /* "0": input shares operand 0, initialized to 0 */

    return oldval > 0;

    }

   o Implemented in the ARM architecture:
      - Loads the lock value
      - Stores exclusive if equal (if the lock value is 0)
      - Otherwise, exits with the lock value in tmp


    static inline int __raw_spin_trylock(raw_spinlock_t *lock)

    {

    unsigned long tmp;

    __asm__ __volatile__(" ldrex %0, [%1]\n"

    " teq %0, #0\n"

    " strexeq %0, %2, [%1]"

    : "=&r" (tmp)

    : "r" (&lock->lock), "r" (1)

    : "cc");

    if (tmp == 0) {

    smp_mb();

    return 1;

    } else {

    return 0;

        }
}
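
A short flow-control sketch using spin_trylock(), with illustrative names: the update is simply skipped when the lock is busy, rather than spinning for it.

static spinlock_t stat_lock = SPIN_LOCK_UNLOCKED;

static void maybe_update_stats(void)
{
        if (!spin_trylock(&stat_lock))  /* nonzero return means the lock was acquired */
                return;                 /* lock busy: skip this update instead of spinning */
        /* ... update statistics protected by stat_lock ... */
        spin_unlock(&stat_lock);
}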


12. SPINLOCK SYNCHRONIZED KERNEL THREAD EXAMPLE

/*

    * kthread_mod_coord.c

    *

 * Demonstration of multiple kernel thread
 * creation and binding on a multicore system

    *

    * This system includes spinlock synchronization

    *

    */

/* Header names were stripped in the source scan; the following set is a
 * likely assumption for a module using kthreads, spinlocks, and printk. */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/spinlock.h>

    /* array of pointers to thread task structures */

    #define MAX_CPU 16

    #define LOOP_MAX 10

    #define BASE_PERIOD 200

    #define INCREMENTAL_PERIOD 30

    #define WAKE_UP_DELAY 0

    static struct task_struct *kthread_cycle_[MAX_CPU];

    static int kthread_cycle_state = 0;

    static int num_threads;

    static int cycle_count = 0;

    static spinlock_t kt_lock = SPIN_LOCK_UNLOCKED;

    static int cycle(void *thread_data)

    {

    int delay, residual_delay;

    int this_cpu;

    int ret;

    int loops;

    delay = BASE_PERIOD;

    for (loops = 0; loops < LOOP_MAX; loops++) {

    this_cpu = get_cpu();

    delay = delay + this_cpu*INCREMENTAL_PERIOD;

    ret = spin_is_locked(&kt_lock);

    if (ret != 0) {

    printk("kthread_mod: cpu %i start spin cycle\n", this_cpu);


    }

    spin_lock(&kt_lock);

printk("kthread_mod: lock pid %i cpu %i delay %i count %i \n",
       current->pid, this_cpu, delay, cycle_count);

    cycle_count++;

set_current_state(TASK_UNINTERRUPTIBLE);
residual_delay = schedule_timeout(delay);

    cycle_count--;

printk("kthread_mod: unlock pid %i cpu %i delay %i count %i\n",
       current->pid, this_cpu, delay, cycle_count);

    spin_unlock(&kt_lock);

    }

    kthread_cycle_state--;

    /*

    * exit loop

    */

    while (!kthread_should_stop()) {

    delay = 1 * HZ;

    set_current_state(TASK_UNINTERRUPTIBLE);

    residual_delay = schedule_timeout(delay);

printk("kthread_mod: wait for stop pid %i cpu %i \n",
       current->pid, this_cpu);

    }

printk("kthread_mod: cycle function: stop state detected for cpu %i\n",
       this_cpu);

    return 0;

    }

    int init_module(void)

    {

    int cpu = 0;

    int count;

    int this_cpu;

    int num_cpu;

    int delay_val;

    int *kthread_arg = 0;

    int residual_delay;

const char thread_name[] = "cycle_th";
const char name_format[] = "%s/%d"; /* format name and cpu id */

    num_threads = 0;

    num_cpu = num_online_cpus();

    this_cpu = get_cpu();

printk("kthread_mod: init task %i cpu %i of total CPU %i \n",
       current->pid, this_cpu, num_cpu);



    for (count = 0; count < num_cpu; count++) {

    cpu = count;

num_threads++;
kthread_cycle_state++;

    delay_val = WAKE_UP_DELAY;

    set_current_state(TASK_UNINTERRUPTIBLE);

    residual_delay = schedule_timeout(delay_val);

    kthread_cycle_[count] =

    kthread_create(cycle, (void *) kthread_arg,

    thread_name, name_format, cpu);

    if (kthread_cycle_[count] == NULL) {

    printk("kthread_mod: thread creation error\n");

}
kthread_bind(kthread_cycle_[count], cpu);

    wake_up_process(kthread_cycle_[count]);

    this_cpu = get_cpu();

printk("kthread_mod: current task %i cpu %i create/wake next thread\n",
       current->pid, this_cpu);

    }

    return 0;

    }

    void cleanup_module(void)

    {

    int ret;

    int count;

    int this_cpu;

    /*

    * determine if module removal terminated thread creation cycle early

    *

    * also must determine if cpu is suspended

    */

    printk("kthread_mod: number of threads to stop %i and active %i\n",

    num_threads, kthread_cycle_state);

this_cpu = get_cpu();
printk("kthread_mod: kthread_stop requests being applied by task %i "
       "on cpu %i \n", current->pid, this_cpu);

    for (count = 0; count < num_threads; count++) {

ret = kthread_stop(kthread_cycle_[count]); /* sets done state */
printk("kthread_mod: kthread_stop request for cpu count returned "
       "with value %i \n", ret);



    }

    }

    MODULE_LICENSE("GPL");

Start up
   o Note the coordination
   o Note that locking occurs
      - However, locking only occurs when the relationship between delays leads to resource contention

[61888.295386] kthread_mod: init task 17348 cpu 2 of total CPU 4
[61888.297709] kthread_mod: current task 17348 cpu 2 create/wake next thread
[61888.297805] kthread_mod: lock pid 17349 cpu 0 delay 200 count 0
[61888.301142] kthread_mod: current task 17348 cpu 2 create/wake next thread
[61888.301158] kthread_mod: cpu 1 start spin cycle
[61888.309106] kthread_mod: current task 17348 cpu 2 create/wake next thread
[61888.309146] kthread_mod: cpu 2 start spin cycle
[61889.004148] kthread_mod: current task 17348 cpu 0 create/wake next thread
[61889.004161] kthread_mod: cpu 3 start spin cycle
[61889.100033] kthread_mod: unlock pid 17349 cpu 0 delay 200 count 0
[61889.100073] kthread_mod: cpu 0 start spin cycle
[61889.100080] kthread_mod: lock pid 17350 cpu 1 delay 230 count 0
[61890.020530] kthread_mod: unlock pid 17350 cpu 1 delay 230 count 0
[61890.020581] kthread_mod: cpu 1 start spin cycle
[61890.020588] kthread_mod: lock pid 17351 cpu 2 delay 260 count 0
[61891.061032] kthread_mod: unlock pid 17351 cpu 2 delay 260 count 0
[61891.061074] kthread_mod: cpu 2 start spin cycle
[61891.061080] kthread_mod: lock pid 17352 cpu 3 delay 290 count 0
[61892.217531] kthread_mod: unlock pid 17352 cpu 3 delay 290 count 0
[61892.217572] kthread_mod: cpu 3 start spin cycle
[61892.217582] kthread_mod: lock pid 17349 cpu 0 delay 200 count 0
[61893.312070] kthread_mod: unlock pid 17349 cpu 0 delay 200 count 0
[61893.312131] kthread_mod: cpu 0 start spin cycle
[61893.312138] kthread_mod: lock pid 17350 cpu 1 delay 260 count 0
[61895.332564] kthread_mod: unlock pid 17350 cpu 1 delay 260 count 0
[61895.332609] kthread_mod: cpu 1 start spin cycle
[61895.332617] kthread_mod: lock pid 17351 cpu 2 delay 320 count 0
[61897.921049] kthread_mod: unlock pid 17351 cpu 2 delay 320 count 0
[61897.921094] kthread_mod: cpu 2 start spin cycle
[61897.921101] kthread_mod: lock pid 17352 cpu 3 delay 380 count 0
[61899.889564] kthread_mod: unlock pid 17352 cpu 3 delay 380 count 0
[61899.889609] kthread_mod: cpu 3 start spin cycle
[61899.889619] kthread_mod: lock pid 17349 cpu 0 delay 200 count 0
[61902.088074] kthread_mod: unlock pid 17349 cpu 0 delay 200 count 0
[61902.088118] kthread_mod: cpu 0 start spin cycle
[61902.088124] kthread_mod: lock pid 17350 cpu 1 delay 290 count 0
[61903.248533] kthread_mod: unlock pid 17350 cpu 1 delay 290 count 0
[61903.248577] kthread_mod: cpu 1 start spin cycle
[61903.248586] kthread_mod: lock pid 17351 cpu 2 delay 380 count 0


    Cpu0 :  0.0%us, 100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu1 :  0.0%us, 100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu2 :  0.0%us, 100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu3 :  3.7%us,   5.6%sy,  0.0%ni, 90.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

    The top utility, operating in batch mode, may record per-CPU load
    o This load is a direct result of the resource cost associated with the spinlock
    o Note the processor usage at 100% system and 0% user
    o Entries indicate the percentage of time the processor was executing a task other than the idle task during the time since the last screen update

    Note: three processors are running threads spinning at full load
    o One processor, CPU3, runs the thread that currently holds the lock; that thread is sleeping, so CPU3 spends the balance of its time in the idle thread

    Note the behavior above:
    o CPU 0 wins the race to acquire the spinlock and its thread then sleeps, requiring no CPU load
    o CPUs 1, 2, and 3 operate at 100 percent load, waiting for the spinlock resource to become available
    o Again, at t = 1 second, a race to acquire the lock occurs and CPU 1 wins


    Note the behavior above over an extended period
    o CPU 0, 1, and 3 acquire the spinlock
    o CPU 2 does not acquire the lock


    13. SPIN LOCK ARM

    Energy-aware lock operation, new in ARM Linux 2.6.15
    Applied directly to new multiprocessor embedded cores
    o ARM11 MPCore example: embedded control, networking, graphics

    Conventional multiprocessor systems suffer from energy inefficiency because processors wait for, and expend energy polling, spinlocks
    o A significant fraction of processor time may be lost in synchronization
    o Problems include:
       Priority inversion
       Deadlocks
       Convoy behavior
          o Groups of processors executing control sequences in parallel and stalling in synchrony, waiting for the same lock

    Now, energy saving is ensured by placing the processor in a temporary stall state, with the ability to wake the processor within one cycle of the lock being freed
    o Notification via the SCU (Snoop Control Unit) in the multiprocessor core


    Ensures that all caches are coherent
    o The signal propagates to all CPUs (see unlock)
    o wfene instruction: receive (wait for event)
    o sev instruction: notify (send event)

    static inline void __raw_spin_lock(raw_spinlock_t *lock)

    {

    unsigned long tmp;

    __asm__ __volatile__(

    "1: ldrex %0, [%1]\n" ; load lock member of &lock into r

    " teq %0, #0\n" ; test lock value

    " wfene\n" ; wait for notification

    " strexeq %0, %2, [%1]\n" ; attempt to store 1 in r

    " teqeq %0, #0\n" ; test

    " bne 1b" ; loop if unsuccessful

    : "=&r" (tmp)

    : "r" (&lock->lock), "r" (1)

    : "cc");

    smp_mb();

    }

    wmb() and rmb() are both defined as mb() for ARM
    o #define mb() __asm__ __volatile__ ("" : : : "memory")
    o This ensures that any writes or reads of variables being protected by the spinlock are scheduled prior to releasing the lock

    static inline void __raw_spin_unlock(raw_spinlock_t *lock)

    {

    smp_mb();

    __asm__ __volatile__(

    " str %1, [%0]\n" ; release, store 0 in lock member

    " mcr p15, 0, %1, c7, c10, 4\n" ; drain storage buffer

    " sev" ; send signal to waiting CPUs

    :

    : "r" (&lock->lock), "r" (0)

    : "cc"); ; CPSR updated

    }

    Drain Storage Buffer operation
    o Forces synchronization of this stored data into the D-cache of each processor
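    For completeness, here is a minimal sketch of how kernel code would use these primitives through the generic spinlock API rather than calling __raw_spin_lock directly; the names stats_lock, packet_count, and record_packet are illustrative and are not part of the lecture example.

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(stats_lock);     /* maps onto the raw lock shown above on SMP */
    static unsigned long packet_count;      /* shared data protected by the lock */

    void record_packet(void)
    {
        unsigned long flags;

        spin_lock_irqsave(&stats_lock, flags);      /* acquire; spins (wfe) while contended */
        packet_count++;                             /* short critical section */
        spin_unlock_irqrestore(&stats_lock, flags); /* release; sev wakes waiting CPUs */
    }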


    14. RW SPINLOCKS

    A spinlock allows only one sequence of control to enter a sequence of instructions
    An alternative exists for the spinlock
    o The reader/writer lock admits many readers
       The lock prevents access by readers if a writer has taken the lock
    o It permits only one writer
    o A writer may not acquire the lock while any reader or other writer holds it



    RW spinlocks are based on an rwlock_t structure
    o This contains a counter variable equal to the number of readers that currently hold the rwlock

    Without debugging options, this appears as

    typedef struct {
        volatile unsigned int lock;
    } rwlock_t;

    To initialize a lock x of type rwlock_t, assign x = RW_LOCK_UNLOCKED, where
    #define RW_LOCK_UNLOCKED (rwlock_t) { 0 }

    Implementation and usage
    For a sequence of control that the designer intends to use to read a shared memory resource, read_lock(rwlock_t *lock) is used (a usage sketch follows the list of lock variants below)



    o All of the other variants of spin_lock are included:

       read_lock                 write_lock
       read_lock_irq             write_lock_irq
       read_lock_irqsave         write_lock_irqsave
       read_lock_bh              write_lock_bh
       read_unlock               write_unlock
       read_unlock_irq           write_unlock_irq
       read_unlock_irqrestore    write_unlock_irqrestore
       read_unlock_bh            write_unlock_bh
       read_trylock (added in the 2.6 kernel)    write_trylock
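    As referenced above, a minimal usage sketch of the reader and writer paths; the routing-table names (table_lock, route_table, lookup_route, update_route) are illustrative, and the 2.6-era static initializer is assumed.

    #include <linux/spinlock.h>     /* rwlock_t and the read_/write_lock API */

    static rwlock_t table_lock = RW_LOCK_UNLOCKED;  /* 2.6-era static initializer */
    static int route_table[16];

    int lookup_route(int idx)
    {
        int val;

        read_lock(&table_lock);     /* many readers may hold the lock concurrently */
        val = route_table[idx];
        read_unlock(&table_lock);
        return val;
    }

    void update_route(int idx, int val)
    {
        unsigned long flags;

        write_lock_irqsave(&table_lock, flags);     /* exclusive: waits for all readers */
        route_table[idx] = val;
        write_unlock_irqrestore(&table_lock, flags);
    }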

    WRITERS (IN ARM ARCHITECTURE)

    Consider write lock acquisition (called by writers)
    Recall that strex r1, r2, [r3] stores the contents of r2 into memory at the address contained in r3, and places a zero result in r1 if no other writes have occurred to [r3] since the previous ldrex
    Note this can also execute conditionally

    static inline void _raw_write_lock(rwlock_t *rw)

    {

    unsigned long tmp;

    __asm__ __volatile__(

    "1: ldrex %0, [%1]\n" ; load exclusive and monitor lock

    " teq %0, #0\n" ; test if lock zero" strexeq %0, %2, [%1]\n" ; attempt to write LOCK_BIAS if zero

    " teq %0, #0\n" ; - note above is conditional execution

    " bne 1b" ; spin until lock acquired

    : "=&r" (tmp)

    : "r" (&rw->lock), "r" (0x80000000)

    : "cc", "memory");

    Note that this stores the value 2^31 (0x80000000), which is interpreted as a negative value
    Write unlock merely involves clearing the lock (called by writers)


    static inline void _raw_write_unlock(rwlock_t *rw)

    {

    __asm__ __volatile__(

    "str %1, [%0]" ; store zero at address: &rw->lock

    :: "r" (&rw->lock), "r" (0)

    : "cc", "memory");

    }


    READERS (IN ARM ARCHITECTURE)

    This must admit many readers
    It must track the number of readers and prevent any writer from entering a code section until all readers have exited
    o Each reader increments the lock on entry and decrements it on release
    o Writers are permitted to enter only if the lock value is zero

    Here is the operation for read_lock; this is called by a reader attempting to enter a critical section
    o Note: if a writer has taken the lock, its value will be -2^31 (LOCK_BIAS), so the increment result below remains negative for up to 2^31 readers

    If no reader or writer is present, the initial value of the lock variable is zero
    o The lock is incremented by one for each reader acquiring the lock
    o This implementation tests for the presence of a writer and spins in that event
    o Otherwise, readers are admitted: the code atomically increments the lock value by loading it exclusively (setting the monitor) and incrementing the value in a register initially equal to the lock value
    o The incremented value is stored back only if the result is zero or positive (no writer present)
    o If the result is negative, the exclusive store is skipped and the code remains in the busy-wait loop until the writer exits and releases the lock
    o Otherwise the store has occurred and the reader exits the loop

    Note strexpl is Store Exclusive executing on the PL condition (positive or zero result)
    o The result of adding 1 to the register is non-negative only when no writer holds the lock
    o The S suffix on rsbpls updates the condition flags; rsbpls %0, %1, #0 computes zero minus the strexpl result, which is negative if the exclusive store failed
    o In that case the code remains in the loop until the lock can be acquired (a conceptual C sketch follows, before the actual ARM implementation)
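    As noted above, a conceptual C model of this reader path follows; it is illustrative only (not the kernel's code) and uses the GCC built-in __sync_val_compare_and_swap in place of the ldrex/strex pair. The function name model_read_lock is hypothetical.

    /* Conceptual model of the read-lock loop: admit a reader only while no
     * writer (negative LOCK_BIAS value) holds the lock. */
    static void model_read_lock(volatile int *lock)
    {
        for (;;) {
            int old = *lock;
            if (old < 0)            /* a writer holds the lock: keep spinning */
                continue;
            /* atomically add one reader; retry if the value changed underneath us */
            if (__sync_val_compare_and_swap(lock, old, old + 1) == old)
                return;             /* reader admitted */
        }
    }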


    static inline void _raw_read_lock(rwlock_t *rw)

    {

    unsigned long tmp, tmp2; /* will be stored in registers */

    __asm__ __volatile__(

    "1: ldrex %0, [%2]\n" ; load exclusive (&rw->lock) into reg

    " adds %0, %0, #1\n" ; increment lock value (blindly)" strexpl %1, %0, [%2]\n" ; store reg exclusive setting reg (tmp2)

    ; if result is positive indicating no

    ; writers present

    ; But, a value of 1 will appear in

    ; (tmp2) if lock has been modified

    ; Thus, must now decrement to return lock

    ; to its initial value in next

    ; instruction note %1 contains value 1

    ; as a result of this event so,

    ; not required to load 1 immediate

    " rsbpls %0, %1, #0\n" ; decrement lock if lower or same

    " bmi 1b" ; branch if negative since lock value is

    : "=&r" (tmp), "=&r" (tmp2) ; negative and writers are present: "r" (&rw->lock)

    : "cc", "memory");

    }

    Now, unlocking proceeds as follows
    o Readers decrement the lock value on exiting
    o This is performed exclusively (atomically)
    o The lock value is positive while readers are present and decrements to 0 when all readers have exited

    Here is the operation for read_unlock, called by a reader exiting a critical section

    static inline void _raw_read_unlock(rwlock_t *rw)

    {

    unsigned long tmp, tmp2;

    __asm__ __volatile__(

    "1: ldrex %0, [%2]\n" ; load exclusive lock into reg

    " sub %0, %0, #1\n" ; decrement lock value

    " strex %1, %0, [%2]\n" ; store lock value

    " teq %1, #0\n" ; test if successful exclusive operation

    " bne 1b" ; branch if not exclusive

    : "=&r" (tmp), "=&r" (tmp2)

    : "r" (&rw->lock)

    : "cc", "memory");



    WRITERS AND TRYLOCK

    A writer may attempt to set the lock to LOCK_BIAS, or exit immediately if another thread (any reader or writer) already holds the lock

    static inline int _raw_write_trylock(rwlock_t *rw)

    {

    unsigned long tmp;

    __asm__ __volatile__(

    "1: ldrex %0, [%1]\n"

    " teq %0, #0\n"

    " strexeq %0, %2, [%1]" ; store exclusive if equal (successful)

    : "=&r" (tmp)

    : "r" (&rw->lock), "r" (0x80000000)

    : "cc", "memory");

    return tmp == 0; /* 1 if the lock was acquired, 0 otherwise */

    }
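    A typical usage pattern for the trylock variant, attempting an update without blocking; the function name try_update_entry and the -EBUSY fallback are illustrative.

    #include <linux/spinlock.h>
    #include <linux/errno.h>

    /* Attempt an exclusive update; give up immediately if readers or another
     * writer currently hold the lock. */
    static int try_update_entry(rwlock_t *lock, int *entry, int val)
    {
        if (!write_trylock(lock))   /* non-zero return means the lock was acquired */
            return -EBUSY;

        *entry = val;               /* exclusive critical section */
        write_unlock(lock);
        return 0;
    }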


    15. KERNEL SEMAPHORES BACKGROUND

    The semaphore is a unique variable with the following characteristics:
    o The semaphore value can be used to determine whether a process will execute or wait
    o The semaphore may be operated on by wait or post
    o wait
       The wait function causes the semaphore value to be decremented by 1 if the semaphore is non-zero
          o The process calling wait on the semaphore is allowed to continue
          o This operation is atomic in that it completes without interruption by other processes. Thus, if two processes attempt to decrement the semaphore, each decrement takes effect and none is lost. If the semaphore value is 1, only one process will be allowed to continue; the other will block.
       If the semaphore is zero
          o The process calling wait on the semaphore is blocked
          o The process remains blocked until the action of decrementing the semaphore would return zero (as opposed to a negative value)
    o post
       The post function increments the semaphore. This is again atomic in that if two processes both attempt to increment a semaphore of value 0, it is incremented by two. For example, without the atomic semaphore operation, both processes might conclude that the proper value for the semaphore is 1.

    A process may use the semaphore to protect a critical section of code such that its access to shared resources is protected (as if it were the only process operating) during the code sequence. This holds true even if the process is interrupted or taken from running to ready by the operating system.
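    To make the wait/post discipline concrete, the following minimal sketch uses the user-space POSIX semaphore API (sem_wait/sem_post); the kernel-internal down()/up() pair discussed in the rest of this section follows the same pattern. The names worker and shared_counter are illustrative. Compile with -pthread.

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    static sem_t sem;              /* initialized to 1: mutual exclusion */
    static long shared_counter;    /* shared resource protected by the semaphore */

    static void *worker(void *arg)
    {
        for (int i = 0; i < 100000; i++) {
            sem_wait(&sem);        /* "wait": decrement, blocking while the value is 0 */
            shared_counter++;      /* critical section */
            sem_post(&sem);        /* "post": increment, waking one blocked waiter */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        sem_init(&sem, 0, 1);
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", shared_counter);  /* always 200000 */
        sem_destroy(&sem);
        return 0;
    }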

    IMPLEMENTATION

    The next step in the locking hierarchy is the semaphore
    This prevents a process from passing a point in the sequence of control defined by the semaphore


    o However, unlike spinlocks, semaphores cause the process that reaches a taken semaphore to sleep
    o Formally, this means that the process (kernel thread) is dequeued, and a user space process or new kernel thread operates as a result of a context switch
    o This is clearly efficient for designs where the sleep time is long
    o However, scheduler latency must be accounted for

    The semaphore design is considerably more complex
    o Task wait queue management
    o Management of many waiting tasks that may be admitted when the semaphore becomes available

    Again, a data structure is the design foundation
    o struct semaphore, with data members:
       count: an atomic variable with these states
          o Positive: the semaphore is free
          o Zero: the semaphore is acquired, one thread is executing, and no other threads are sleeping while waiting for the semaphore
          o Negative: a number of threads equal to the absolute value of count are waiting for the semaphore
       wait: the wait queue (linked list) of waiting tasks
       sleepers: a flag indicating the presence of queued processes; zero if there are no sleeping processes, 1 otherwise

    Functions
    o An atomic down operation decrements the count variable
       If the semaphore is taken (busy), the task is placed on the wait queue until the semaphore state changes


    OPERATIONS

    Initialization (see /include/asm-i386/semaphore.h)
    o void sema_init(struct semaphore *sem, int val)

    struct semaphore {

    atomic_t count;

    int sleepers;

    wait_queue_head_t wait;

    };

    static inline void sema_init (struct semaphore *sem, int val)

    {

    atomic_set(&sem->count, val);

    sem->sleepers = 0;

    init_waitqueue_head(&sem->wait);

    }

    Mutex
    o Initializing a semaphore to 1 produces a mutex variable
    o This implies that only one lock holder is enabled: one thread at a time can occupy a code sequence

    static inline void init_MUTEX (struct semaphore *sem)

    {

    sema_init(sem, 1);

    }
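    A minimal usage sketch of a semaphore initialized as a mutex protecting a shared buffer in a kernel module; the names buf_sem, shared_buf, and buf_write are illustrative, and the 2.6-era header path is assumed.

    #include <linux/module.h>
    #include <linux/init.h>
    #include <linux/string.h>
    #include <linux/errno.h>
    #include <asm/semaphore.h>      /* struct semaphore, down(), up(), init_MUTEX() */

    static struct semaphore buf_sem;
    static char shared_buf[64];

    int buf_write(const char *msg, size_t len)
    {
        if (len >= sizeof(shared_buf))
            return -EINVAL;
        down(&buf_sem);                 /* may sleep until the mutex is free */
        memcpy(shared_buf, msg, len);   /* critical section */
        shared_buf[len] = '\0';
        up(&buf_sem);                   /* release; wakes one sleeper if present */
        return 0;
    }

    static int __init buf_mod_init(void)
    {
        init_MUTEX(&buf_sem);           /* semaphore count = 1 */
        return 0;
    }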

    Requesting a semaphore: down(struct semaphore *sem)
    o This will place a task that fails to receive the semaphore on the wait queue in the TASK_UNINTERRUPTIBLE state


    Implementation of down()
    o First, note the code structure
       Begins with an atomic decrement of sem->count
       If the decrement yields a negative result (the semaphore was already taken), jump to the slow path at LOCK_SECTION_START
       Otherwise, exit the down() function with the semaphore acquired
    o Optimization: note how access to the semaphore slow path is arranged
       This code is inlined in compilation with other code
       It is emitted as a volatile asm block, so the compiler does not reorder it
       LOCK_SECTION_START is defined to create a subsection for this code, separate from the surrounding text section
       Thus, as this code sequence is included in the inline function, only the decl and js instructions appear in the inline sequence
    o This prevents code in the lock section from being imported into the instruction cache, evicting other instructions more likely to be used

    static inline void down(struct semaphore * sem)

    {

    __asm__ __volatile__(

    LOCK "decl %0\n\t" ; decrement sem->count

    "js 2f\n" ; jump on sign; otherwise exit this

    ; function since next

    ; instructions not included

    "1:\n"

    LOCK_SECTION_START("")

    "2:\t lea %0,%%eax\n\t" ; load addr of sem in eax

    "call __down_failed\n\t" ;

    "jmp 1b\n" ; loop back to

    ; LOCK_SECTION_START label

    LOCK_SECTION_END

    :"=m" (sem->count)

    :

    :"memory","ax");

    }


    o __down_failed prepares the call to __down
    o __down places the current task on a waitqueue
    o Note that this code will be included in the text section (code section) occupied by sched.c

    asm(".section .sched.text\n" ; include in the sched.c text section
        ".align 4\n"
        ".globl __down_failed\n"
        "__down_failed:\n\t"
        "pushl %edx\n\t" ; save state
        "pushl %ecx\n\t"
        "call __down\n\t" ; __down places the task on a waitqueue
        "popl %ecx\n\t" ; restore state
        "popl %edx\n\t"
        "ret"
    );

    Examine __down()
    First, obtain a pointer to the task_struct of the current task
    Create a waitqueue entry for the current task
    The current task was TASK_RUNNING; now set its state member to TASK_UNINTERRUPTIBLE
    Acquire a spinlock with interrupts disabled and with the ability to restore interrupts
    Add the current task to the wait queue associated with this semaphore
    o Mark it as WQ_EXCLUSIVE; this will control the waking process
    o Tasks are added to the tail of the waitqueue
    Increment the sleepers member
    Now, enter a loop
    o First, get the number of sleepers
    o Add (sleepers - 1) to the semaphore count
    o If the result is not negative, set sleepers to zero and break: the semaphore has been acquired

    Consider an example: the semaphore is acquired and no task is waiting on it


    o Upon entry to down(), the semaphore is held (count = 0) and no other task is sleeping on the queue (the only other task of interest is the one that currently holds the semaphore)

    First, the semaphore count is decremented to -1 by down()
    o Then, sleepers is incremented by 1 (in __down)
    o Then, the count value is modified by adding (sleepers - 1), using the original sleeper count
       This yields count + (sleepers - 1) = -1 + 0 = -1 in this case with no other sleepers
       o Note the definition of atomic_add_negative: the result is true if the result of the addition is negative, otherwise false
       This negative result causes the conditional break not to be taken
    o Then, sleepers is set to 1; this indicates the presence of the task requesting the semaphore
    o Call schedule()
       The scheduler observes the TASK_UNINTERRUPTIBLE status and dequeues this task; the task remains on the waitqueue, waiting for an event
    o After the return from schedule()
       The spinlock is retaken
       The task is again marked TASK_UNINTERRUPTIBLE and another check is performed on the semaphore status, with sleepers = 1
       If the semaphore is not available, control remains in the loop: sleepers is set to 1 and schedule() is called again
    o If the semaphore count has been incremented (released) by another task's up()
       Then the break is taken; the task removes itself from the wait queue and wakes the next exclusive waiter

    As the semaphore becomes available
       A call is made to release the spinlock and restore interrupts
       Any process sleeping on the waitqueue will be activated
       o With a set of rules to be seen below
       The task is set to TASK_RUNNING
       o The next time the scheduler function runs, this task is eligible for selection


    fastcall void __sched __down(struct semaphore * sem)

    {

    struct task_struct *tsk = current;

    DECLARE_WAITQUEUE(wait, tsk);

    unsigned long flags;

    tsk->state = TASK_UNINTERRUPTIBLE;
    spin_lock_irqsave(&sem->wait.lock, flags);

    add_wait_queue_exclusive_locked(&sem->wait, &wait);

    sem->sleepers++;

    for (;;) { /* loop will not exit until */

    /* all sleepers exit */

    int sleepers = sem->sleepers;

    if (!atomic_add_negative(sleepers - 1, &sem->count)) {

    sem->sleepers = 0;

    break;

    }

    sem->sleepers = 1; /* this task - see -1 above */

    spin_unlock_irqrestore(&sem->wait.lock, flags);
    schedule(); /* will lead to sleep */

    spin_lock_irqsave(&sem->wait.lock, flags);

    tsk->state = TASK_UNINTERRUPTIBLE;

    }

    remove_wait_queue_locked(&sem->wait, &wait);

    wake_up_locked(&sem->wait);

    spin_unlock_irqrestore(&sem->wait.lock, flags);

    tsk->state = TASK_RUNNING;

    }


    It is important to consider how a list of tasks sleeping on the waitqueue may be activated (set to TASK_RUNNING)
    o Consider the modified example where N tasks occupy the waitqueue
       These all entered the waitqueue through this function
       So, upon being woken, each will execute this loop, entering the control flow immediately after schedule()
       Each task will then execute remove_wait_queue_locked and wake_up_locked; the wake_up_locked call activates the next task in turn

    void fastcall add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)

    {

    unsigned long flags;

    wait->flags |= WQ_FLAG_EXCLUSIVE;

    spin_lock_irqsave(&q->lock, flags);

    __add_wait_queue_tail(q, wait);

    spin_unlock_irqrestore(&q->lock, flags);

    }

    Consider wake_up_locked: this will call __wake_up_common()
    o It will wake up one exclusive task: a process that initially called __down
    o It will place the task on the runqueue marked as TASK_RUNNING

    Semaphore functions

    static inline int down_interruptible(struct semaphore * sem)

    static inline int down_trylock(struct semaphore * sem)

    static inline void up(struct semaphore * sem)

    o An atomic up operation increments the count variable
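    A brief sketch of the common pattern for checking the down_interruptible() return value, so that a signal can abort the wait; the function name guarded_op is illustrative.

    #include <linux/errno.h>
    #include <asm/semaphore.h>

    /* Acquire the semaphore, but allow a signal to interrupt the sleep. */
    static int guarded_op(struct semaphore *sem)
    {
        if (down_interruptible(sem))
            return -ERESTARTSYS;    /* interrupted by a signal; semaphore not held */

        /* ... critical section ... */

        up(sem);                    /* release; wakes one sleeper if present */
        return 0;
    }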


    16. SEMAPHORE SYNCHRONIZED KERNEL THREAD EXAMPLE

    /* kthread_mod_coord_semaphore.c

    *

    * Demonstration of multiple kernel thread

    * creation and binding on multicore system

    * with semaphore synchronization

    *

    */

    /* NOTE: the header names were lost in extraction from the original slides;
     * the following are the likely includes for this 2.6-era module. */
    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>
    #include <linux/sched.h>
    #include <linux/kthread.h>
    #include <asm/semaphore.h>

    #define MAX_CPU 16

    #define LOOP_MAX 20

    #define BASE_PERIOD 200

    #define INCREMENTAL_PERIOD 30

    #define WAKE_UP_DELAY 0

    /* array of pointers to thread task structures */

    static struct task_struct *kthread_cycle_[MAX_CPU];

    static int kthread_cycle_state = 0;

    static int num_threads;

    static int cycle_count = 0;

    static struct semaphore kthread_mod_sem;

    static int cycle(void *thread_data)

    {

    int delay, residual_delay;

    int this_cpu;

    int ret_sem;

    int loops;

    delay = BASE_PERIOD;

    for (loops = 0; loops < LOOP_MAX; loops++) {

    this_cpu = get_cpu();

    delay = delay + this_cpu*INCREMENTAL_PERIOD;

    printk("kthread_mod: cpu %i executing down on kthread_mod_semaphore \n",

    this_cpu);

    down(&kthread_mod_sem);


    printk

    ("kthread_mod: Thread pid %i acquired semaphore executing on cpu

    %i delay %i count %i\n", current->pid, this_cpu, delay,

    cycle_count);

    cycle_count++;

    set_current_state(TASK_UNINTERRUPTIBLE);

    residual_delay = schedule_timeout(delay);
    cycle_count--;

    printk

    ("kthread_mod: Thread pid %i releasing semaphore executing on cpu

    %i delay %i count %i\n", current->pid, this_cpu, delay,

    cycle_count);

    up(&kthread_mod_sem);

    }

    kthread_cycle_state--;

    /*

    * exit loop

    */

    while (!kthread_should_stop()) {

    delay = 1 * HZ;

    set_current_state(TASK_UNINTERRUPTIBLE);

    residual_delay = schedule_timeout(delay);

    printk

    ("kthread_mod: wait for stop pid %i cpu %i \n",

    current->pid, this_cpu);

    }

    printk

    ("kthread_mod: cycle function: stop state detected for cpu %i\n",

    this_cpu);

    return 0;

    }

    int init_module(void)

    {

    int cpu = 0;

    int count;

    int this_cpu;

    int num_cpu;

    int delay_val;

    int *kthread_arg = 0;

    int residual_delay;

    const char thread_name[] = "cycle_th";
    const char name_format[] = "%s/%d"; /* format for name and cpu id */

    num_threads = 0;

    num_cpu = num_online_cpus();

    this_cpu = get_cpu();

    printk

    ("kthread_mod: init task %i cpu %i of total CPU %i \n",

    current->pid, this_cpu, num_cpu);


    init_MUTEX(&kthread_mod_sem);

    for (count = 0; count < num_cpu; count++) {

    cpu = count;

    num_threads++;
    kthread_cycle_state++;

    delay_val = WAKE_UP_DELAY;

    set_current_state(TASK_UNINTERRUPTIBLE);

    residual_delay = schedule_timeout(delay_val);

    kthread_cycle_[count] =

    kthread_create(cycle, (void *) kthread_arg,

    thread_name, name_format, cpu);

    if (IS_ERR(kthread_cycle_[count])) { /* kthread_create returns ERR_PTR on failure */

    printk("kthread_mod: thread creation error\n");

    }

    kthread_bind(kthread_cycle_[count], cpu); /* sets cpu in task struct */

    wake_up_process(kthread_cycle_[count]);

    this_cpu = get_cpu();

    printk

    ("kthread_mod: current task %i cpu %i create/wake next thread \n",

    current->pid, this_cpu);

    }

    return 0;

    }

    void cleanup_module(void)

    {

    int ret;

    int count;

    int this_cpu;

    /*

    * determine if module removal terminated the thread creation cycle early

    *

    * also must determine if cpu is suspended

    */

    printk("kthread_mod: number of threads to stop %i and active %i\n",

    num_threads, kthread_cycle_state);

    this_cpu = get_cpu();
    printk("kthread_mod: kthread_stop requests being applied by task %i on cpu %i\n",
           current->pid, this_cpu);

    for (count = 0; count < num_threads; count++) {

    ret = kthread_stop(kthread_cycle_[count]); /* sets done state */

    printk

    ("kthread_mod: kthread_stop request for cpu count returned with value

    %i \n", ret);


    }

    }

    MODULE_LICENSE("GPL");

    Note the behavior where the same thread on cpu0 reacquires the semaphore
    o As the thread executes up(), it returns and executes down() again
    o Unlike the spinlock example, other competing threads do not observe the availability of the semaphore since their test of the s