OS scheduling and The anatomy of a context switch
Post on 05-May-2017
690 Views
Preview:
Transcript
OS Scheduling and The Anatomy of a context
switchDaniel Ben-Zvi
Multi-tasking
Scheduling in the modern world● Pause a running process in the middle of its execution
● Resume a previously paused process that is ready to
execute
● Do this (tens) of thousands of times per second
But it wasn’t always like this
No Scheduler
Only one process can operate at a time.
A single process can hog the whole system forever.
Digger sleep function, 1982sleep(time)
int time;
{
int a,b;
for(a=0; a<time; a++) {
for(b=0; b<100; b++);
}
}
So this is why pressing Turbo would speed it up! :)
3(source: http://www.digger.org/digsrc_orig.zip)
And then they said.. lets cooperate!
Cooperative scheduler
Each process must “play nice” and yield control to other processes.
… but a single misbehaving process can still hog the whole system forever.
(the sad reality of a kibbutz)
Sleep method designed for MacOS 8 (in C)void wxThread::Sleep(unsigned long milliseconds)
{
UnsignedWide start, now;
Microseconds(&start);
double mssleep = milliseconds * 1000 ;
double msstart, msnow ;
msstart = (start.hi * 4294967296.0 + start.lo) ;
do
{
YieldToAnyThread();
Microseconds(&now);
msnow = (now.hi * 4294967296.0 + now.lo) ;
} while( msnow - msstart < mssleep );
}
// Start a busy loop till time has passed
// Pass control to other threads (in the active process context)
(source: https://github.com/LuaDist/wxwidgets/blob/master/src/mac/classic/thread.cpp#L539)
Non preemptiveA process will eventually misbehave...
Round Robin (Time sharing)
(source: http://www.qnx.com/developers/docs/qnx_4.25_docs/qnx4/sysarch/microkernel.html)
but.. tasks are different in nature!● Some tasks require CPU time to complete
● others require I/O (waiting for user input, network data, disk)
● Some just sleep most of the time
Multilevel feedback queue
And many more..
So, how do you interrupt a running process in the middle of its execution, and then resume the work later on?
Use a timer interrupt!● Ask the processor to interrupt the active process and wake up
the kernel every X interval (in the goold old Linux days, this was defined as CONFIG_HZ):
void do_timer(struct pt_regs *regs) // Linux kernel (some version) interrupt handler{ jiffies_64++;
update_process_times(user_mode(regs)); update_times();}
The “context” in this programbits 16 ; 16 bits real mode
start:
cli ; disable interrupts
mov si, msg ; SI points to RAM address of msg
mov ah, 0x0e ; print char service, with int 0x10
.loop lodsb ; load RAM: AL <- [DS:SI] && SI++
or al, al ; end of string?
jz halt
int 0x10 ; call BIOS print char
jmp .loop ; next char
halt: hlt ; halt
msg: db "Hello, World!", 0
# void swtch(struct context **old, struct context *new);
#
# Save current register context in old
# and then load register context from new.
.globl swtch
swtch:
movl 4(%esp), %eax
movl 8(%esp), %edx
# Save old callee-save registers
pushl %ebp ; Save old process ebp (base pointer) register
pushl %ebx ; Save old process ebx (general) register
pushl %esi ; Save old process esi (source index) register
pushl %edi ; Save old process edi (destination index) register
# Switch stacks
movl %esp, (%eax)
movl %edx, %esp
# Load new callee-save registers
popl %edi ; Load new process edi
popl %esi ; Load new process esi
popl %ebx ; Load new process ebx
popl %ebp ; Load new process ebp
ret
(source: http://samwho.co.uk/blog/2013/06/01/context-switching-on-x86/)
Finding out the context switch rate[us-east-1c] i-91xxxxxx [root:~]$ vmstat -w 10
procs ---------------memory-------------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 3275668 48308 370264 0 0 0 0 12504 8884 1 5 94 0 0
0 0 0 3275492 48308 370264 0 0 0 0 12933 8834 2 5 93 0 0
1 0 0 3275588 48308 370264 0 0 0 0 12490 8825 2 4 94 0 0
Using /usr/bin/time[us-east-1c] i-xxxxxxxx [root:~]$ /usr/bin/time -v find / > /dev/null
Command being timed: "find /"
User time (seconds): 0.12
System time (seconds): 0.30
Percent of CPU this job got: 6%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.79
...
Voluntary context switches: 7362
Involuntary context switches: 60
...
File system inputs: 58888
Voluntary vs Involuntary● Non voluntary context switches occur when the
scheduler interrupts the process (i.e. timeslice expired)
● Voluntary context switches occur when the process yields control to the CPU (i.e. waiting for I/O, sleeping)
Using pidstat[us-east-1c] i-xxxxxxxx [root:~]$ pidstat -w -p 1504
Linux 3.13.0-55-generic (ip-10-xx-xx-x8) 06/21/2015 _x86_64_ (2 CPU)
09:24:23 PM UID PID cswch/s nvcswch/s Command
09:24:23 PM 0 1504 531.40 3.45 haproxy
Some benchmarks● No CPU affinity
○ Intel 5150: ~4300ns/context switch○ Intel E5440: ~3600ns/context switch○ Intel E5520: ~4500ns/context switch○ Intel X5550: ~3000ns/context switch○ Intel L5630: ~3000ns/context switch○ Intel E5-2620: ~3000ns/context switch
● With CPU affinity○ Intel 5150: ~1900ns/process context switch, ~1700ns/thread context switch○ Intel E5440: ~1300ns/process context switch, ~1100ns/thread context switch○ Intel E5520: ~1400ns/process context switch, ~1300ns/thread context switch○ Intel X5550: ~1300ns/process context switch, ~1100ns/thread context switch○ Intel L5630: ~1600ns/process context switch, ~1400ns/thread context switch○ Intel E5-2620: ~1600ns/process context switch, ~1300ns/thread context switch
Performance boost: 5150: 66%, E5440: 65-70%, E5520: 50-54%, X5550: 55%, L5630: 45%, E5-2620: 45%.
(source: http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html)
The hidden cost
● TLB flush
● Potentially screw up L1,L2,L3 CPU cache
● Kernel tries very hard to schedule threads on the same Core/CPU/Numa node, but sometimes it can't.
Without CPU affinity
(source: http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html)
With CPU affinity
(source: http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html)
Without CPU affinity With CPU affinity
(source: http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html)
Full context switch
● Save current process context ● TLB flush ● Potentially screw up L1,L2,L3 CPU cache● Load next process context● Can be done in hardware - but is usually
done in software.
So how much is too much?
10k c.s. @ 5 micro-seconds per c.s. = 50ms CPU time spent on context switching.
If this was 100k...
Some formulas● IO bound tasks - threads = number of cores * (1 + wait time / service time)
● Balanced - N = U * Ncores * (1+ I/O Wait time / CPU time) ^^ same as above without the U factor
● CPU bound tasks - threads = number of CPUs + 1 ^^ same as above but W == 0
see y i like this version of the formula? yes.I think this is a pretty good presentation.. :Pyep, but it looks like we’ll keep adding slides until tomorrow night…. lol Most probably you’re right - but this covers great thingswe have to stop somewhere. also, we need to get drunk ;)
Conclusions● Scheduling is complicated
● Context switches can be very expensive
● Find the right balance when deciding how many workers to use
● Use CPU affinity to reduce cost where applicable
● And most important lesson, don't overload the scheduler with runnable threads/processes
Thank youDaniel Ben-Zvi | daniel.benzvi@gmail.com
Further reading● The Timer Interrupt Handler● Context Switching on X86● CPU Scheduling● How long does it take to make a context switch?● Calculate the Optimum Number of Threads
top related