8/3/2019 Adaptive Partition Scheduling
January 24, 2012
Adaptive Partition Scheduling
Part 1: Why we did it
Cool stuff from QNX
A. Danko
Evolution of schedulers
Timeline                      Yes, but:                                  POSIX policy
priority pre-emptive          System locks up                            SCHED_FIFO
Timeslicing                   Backhoes and Mother's Day                  SCHED_RR
Time-varying priority         Untuneable for more than 1 application.    SCHED_SPORADIC
Really clever time-varying    US Military Satcom
Fair Share scheduling         Hard to manage share interactions.
Adaptive configuration        Not invented until now.
Why?
Evolution: Lessons learned
Numerical priorities are chosen by applications, but system scheduling behavior must be designed globally
Degradation and overload: priorities are not constants. Importance of work depends on circumstances.
> Modes: normal operation, restart, emergency maintenance
Scheduling strategy needs to be based on unit of work, but what we have is communicating threads.
Must measure real-time behavior.
> 0.1% accuracy
Want to specify shares as global percentages
> Applications don't get to pick their importance or shares. System engineers do.
Need to throttle CPU usage without losing real-time latencies.
Why?
What is Partitioning?
General Answer
Separation of work
To isolate:
> CPU usage
> memory usage
> system resource usage
> failures
QNX Answer
POSIX-compatible design which can be applied to existing systems with little or no recoding
A global hard real-time scheduler with overload protection and CPU guarantees
> Separation of work based on working for a common purpose
Runtime typed memory and kernel object guarantees and limits
> With full inheritance and accounting for all children
Persistent storage (file system) guarantees and limits
Process model for fault isolation
Dynamic configuration
Design
Adaptive Partition Scheduling
Principles
Scheduler must not trigger an overload
> Overhead may not increase with # of threads
Real-time during underload
> Same behavior as today
Real-time during overload
> At least for interrupt handling
Must also be a fair-share scheduler
> global scheduler algorithm
> globally configured
Must mesh with current QNX architecture
> Preemptive priority, individual thread scheduling
> Heavy use of message passing
> Easy to drop onto existing applications
> Can't be a bag on the side
Simple enough for customers to use
> Engineerable
> Reconfigure on the fly
[Figure: throughput versus offered load. Slide note: "Insert picture of Juggling Watermelons here"]
Design
Counting time
What does 14% CPU mean?
> CPU usage is calculated over a sliding window.
Accuracy:
> Counting ticks is not enough. Micro-billing is used to track actual CPU utilization even when threads don't use their whole timeslice.
> micro- and nanosecond resolution
> Threads are billed based on real usage, not statistics
Window size is configurable as an argument to the kernel at boot
> Trade off maximum READY-state latency against accuracy of CPU budgeting
> 100 ms window -> 1% accuracy or better.
> Internal arithmetic accurate to 0.5% or better
Partition usage
> ns of CPU time executed during the last sliding window, expressed as a percentage
Partition budget
> Guaranteed percentage of CPU time, balanced over the sliding window
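To make "usage over a sliding window" concrete, here is a minimal sketch in Python. It is an illustration only, not the kernel's implementation: the class name, the method names, and the idea of keeping a list of billing records are all assumptions made for this example (the real scheduler is described later in the deck as history-less). It only shows how nanosecond billing turns into a percentage of a 100 ms window.

```python
from collections import deque

MS = 1_000_000            # nanoseconds per millisecond
WINDOW_NS = 100 * MS      # 100 ms averaging window, the size used on the slide


class PartitionUsage:
    """Hypothetical bookkeeping for one partition (not a QNX API)."""

    def __init__(self, budget_percent):
        self.budget_percent = budget_percent
        self.bills = deque()  # (end_time_ns, run_ns) records, oldest first

    def bill(self, end_time_ns, run_ns):
        """Record that a thread of this partition ran for run_ns, ending at end_time_ns."""
        self.bills.append((end_time_ns, run_ns))

    def usage_percent(self, now_ns):
        """CPU time executed inside the last window, as a percentage of the window."""
        start = now_ns - WINDOW_NS
        # Drop records that ended before the window began.
        while self.bills and self.bills[0][0] <= start:
            self.bills.popleft()
        used = 0
        for end, run in self.bills:
            used += min(run, end - start)  # count only the part inside the window
        return 100.0 * used / WINDOW_NS

    def has_budget(self, now_ns):
        return self.usage_percent(now_ns) < self.budget_percent


p = PartitionUsage(budget_percent=14)
p.bill(end_time_ns=10 * MS, run_ns=4 * MS)    # a thread ran for only 4 ms of its slice
p.bill(end_time_ns=60 * MS, run_ns=10 * MS)
print(p.usage_percent(100 * MS), p.has_budget(100 * MS))
```

Because threads are billed for what they actually ran, a thread that blocks early is charged for 4 ms, not for a whole tick. As the window slides forward, old usage falls out and the partition regains budget.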
Design
[Figure: sliding window from T = -100 ms to T = now]
Who's got time: Partition Inheritance
[Figure: Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Java application), each with CPU budget available, sending messages to the receive threads of a File System process.]
Resource manager threads work on behalf of the sender
Priority and adaptive partition are inherited on receive
> Execution time in the server is billed to the client's partition
This allows proper accounting for shared resources
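The inheritance rule above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the Partition and Thread classes and the receive() and run() helpers are hypothetical names invented for the example, not QNX calls. It shows only the accounting idea, namely that a server thread takes on the sender's priority and partition when it receives a message, so the work it does is billed to the client.

```python
class Partition:
    """Hypothetical partition record that accumulates billed CPU time."""

    def __init__(self, name):
        self.name = name
        self.billed_ns = 0


class Thread:
    def __init__(self, name, priority, partition):
        self.name = name
        self.priority = priority
        self.partition = partition


def receive(server, client):
    """Server picks up a message: it inherits the sender's priority and partition."""
    server.priority = client.priority
    server.partition = client.partition


def run(thread, ns):
    """Bill execution time to whichever partition the thread currently belongs to."""
    thread.partition.billed_ns += ns


multimedia = Partition("Multi-media")
java = Partition("Java application")
system = Partition("System")

player = Thread("player", priority=11, partition=multimedia)
applet = Thread("applet", priority=9, partition=java)
fs_worker = Thread("fs_worker", priority=10, partition=system)

receive(fs_worker, applet)       # file system works on behalf of the Java client
run(fs_worker, 2_000_000)        # 2 ms of file system work, billed to Java
receive(fs_worker, player)       # next request comes from the multimedia client
run(fs_worker, 500_000)          # 0.5 ms, billed to Multi-media

print(java.billed_ns, multimedia.billed_ns, system.billed_ns)
```

The file system's own partition is billed nothing here, which is the point: a shared server cannot be used to escape a client's budget.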
Design
Real time: Behavior under normal load
[Figure: Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Java application), both with CPU budget available; threads shown as Blocked, Running, or Ready.]
Hard real-time scheduler under normal load
Running thread selected as highest priority READY thread
No delay on scheduling if adaptive partition has budget
Design
Out of time: Behavior under overload
[Figure: Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Java application); one partition has CPU budget available, the other has exceeded its CPU budget; threads shown as Blocked, Running, or Ready.]
Highest priority READY thread in a partition with budget runs
No delay on scheduling if adaptive partition has budget
Design
Free Time: Behavior with unused CPU
[Figure: Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Java application), both with CPU budget exceeded, and Adaptive Partition 3 with CPU budget available; threads shown as Blocked or Running.]
If no partitions with remaining budget have READY threads, the highest priority READY thread is selected to run from other partitions
This allows free time to be given based upon priority
> Free time is still accounted for and may have to be paid back (for example, if partition 3 becomes ready within 1 averaging window)
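The three cases on the last three slides (normal load, overload, free time) can be summarized as a two-pass selection. The sketch below is a simplified reading of the slides, not the scheduler's code: the dictionary layout, the partition names, and the priority numbers are made up for illustration, and ties are broken arbitrarily by name.

```python
def select_thread(partitions):
    """Pick (priority, partition_name) of the thread to run, or None if nothing is READY.

    partitions: list of dicts with 'name', 'has_budget', and 'ready'
    (a list of priorities of READY threads). Hypothetical layout.
    """
    def best(candidates):
        ready = [(max(p["ready"]), p["name"]) for p in candidates if p["ready"]]
        return max(ready) if ready else None

    # Pass 1: highest priority READY thread among partitions that still have budget.
    choice = best([p for p in partitions if p["has_budget"]])
    if choice is None:
        # Pass 2 (free time): nobody with budget wants the CPU, so give it
        # to the highest priority READY thread from any partition.
        choice = best(partitions)
    return choice


normal = [
    {"name": "multimedia", "has_budget": True, "ready": [6, 11]},
    {"name": "java", "has_budget": True, "ready": [6, 10]},
]
overload = [
    {"name": "multimedia", "has_budget": True, "ready": [6, 9]},
    {"name": "java", "has_budget": False, "ready": [6, 10]},
]
free_time = [
    {"name": "multimedia", "has_budget": False, "ready": [6, 9]},
    {"name": "java", "has_budget": False, "ready": [6, 10]},
    {"name": "partition3", "has_budget": True, "ready": []},
]

print(select_thread(normal))     # plain priority scheduling
print(select_thread(overload))   # the over-budget partition has to wait
print(select_thread(free_time))  # unused CPU is handed out by priority
```

Under normal load the first pass behaves exactly like a priority scheduler. Only in overload does the budget test change the outcome, and free time goes back to pure priority order.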
Design
Borrowed Time: Critical Threads
[Figure: Adaptive Partition 1 (Multi-media) and Adaptive Partition 2 (Air Bag Control); one partition has CPU budget available, the other has exceeded its CPU budget; threads shown as Blocked, Running, or Ready, one of them marked as a Critical Thread.]
Critical threads still run (based on priority) even if the partition has no budget
Critical threads provide deterministic scheduling even in overload
Critical threads are given a critical budget and can go into short-term debt
> Critical time is accounted for and has to be repaid
> Exceeding the critical budget is considered an error and causes notification/action
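A rough sketch of the critical-budget bookkeeping follows. Everything here is an assumption made for illustration: the CriticalAccount class, its field names, and the overrun flag are hypothetical, and the slide does not say what the notification/action actually is. The sketch only shows the three rules above: critical threads may run without budget, the time they use becomes debt, and going past the critical budget is flagged as an error.

```python
class CriticalAccount:
    """Hypothetical per-partition record of critical time (not a QNX structure)."""

    def __init__(self, critical_budget_ns):
        self.critical_budget_ns = critical_budget_ns
        self.debt_ns = 0
        self.overrun = False   # stands in for the slide's "notification/action"

    def may_run(self, has_budget, is_critical):
        """A thread may run if its partition has budget, or if it is critical."""
        return has_budget or is_critical

    def bill_critical(self, run_ns):
        """Charge time used by a critical thread while the partition was out of budget."""
        self.debt_ns += run_ns
        if self.debt_ns > self.critical_budget_ns:
            self.overrun = True

    def repay(self, ns):
        """Pay debt back out of later budget; debt never goes below zero."""
        self.debt_ns = max(0, self.debt_ns - ns)


air_bag = CriticalAccount(critical_budget_ns=2_000_000)   # 2 ms, an invented figure
print(air_bag.may_run(has_budget=False, is_critical=True))
air_bag.bill_critical(1_500_000)
air_bag.bill_critical(1_000_000)     # total 2.5 ms exceeds the 2 ms critical budget
print(air_bag.debt_ns, air_bag.overrun)
```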
Design
Equal time
How to choose between partitions of equal priority
> Unimportant?
> Many threads run at default priority, therefore equal priority
Possible algorithms:
> - round robin
> - favor partition with most free time
> - favor longest waiter
Requirement:
> Minimize latencies during underload
> WBN: divide free time by % CPU share.
Solution: Interleave partitions by ratio of partition shares
We found a clever way to do that, so it's in the patent.
Design
How it does it
[Figure: uKernel with libmod_aps.a. Components shown: process creation, messaging, per-partition ready queue, scheduler, clock interrupt handler. Entry points shown: ready(), block(), select_thread().]
For all partitions p:
    Def m(p) -> (bud(p) || crit(p), prio(p), run_t / wsize / bud(p))
Then schedule ps:
    Def ps -> rdy(ps) and (m(ps) < m(pi)) for all i != s
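One way to read the merit function m(p) is as a tuple compared field by field. The Python sketch below follows that reading under explicit assumptions: a partition that has budget (or a critical thread) beats one that does not, then higher priority wins, then the partition that has used the smaller fraction of its share wins. The dictionary keys, the simulate() helper, and the 75/25 split are invented for the example, and this is not the patented interleaving method mentioned on the previous slide.

```python
from fractions import Fraction


def merit(p, window_ns):
    """Sort key for a partition; a larger tuple means better merit (assumed ordering)."""
    eligible = p["has_budget"] or p["critical"]                 # bud(p) || crit(p)
    # run_t / wsize / bud(p), kept exact so that ties are real ties.
    # Assumes budget_percent is nonzero.
    relative_usage = Fraction(p["run_ns"] * 100, window_ns * p["budget_percent"])
    return (eligible, p["priority"], -relative_usage)


def select_partition(partitions, window_ns):
    """Return the READY partition with the best merit, or None if none is READY."""
    ready = [p for p in partitions if p["ready"]]
    if not ready:
        return None
    return max(ready, key=lambda p: merit(p, window_ns))


def simulate(partitions, window_ns, tick_ns, ticks):
    """Run the selected partition for one tick at a time and record the order.

    Assumes at least one partition is READY on every tick.
    """
    order = []
    for _ in range(ticks):
        p = select_partition(partitions, window_ns)
        p["run_ns"] += tick_ns
        order.append(p["name"])
    return order


parts = [
    {"name": "A", "budget_percent": 75, "priority": 10, "run_ns": 0,
     "has_budget": True, "critical": False, "ready": True},
    {"name": "B", "budget_percent": 25, "priority": 10, "run_ns": 0,
     "has_budget": True, "critical": False, "ready": True},
]
order = simulate(parts, window_ns=100_000_000, tick_ns=1_000_000, ticks=8)
print("".join(order), order.count("A"), order.count("B"))
```

With two always-ready partitions of equal priority, the third field makes the scheduler alternate so that run time tracks the 75:25 shares (6 ticks to A and 2 to B over 8 ticks). The simulation is short enough that neither partition runs out of budget, so the window is not slid and has_budget is left fixed.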
Overhead: Fancy, but is it fast?
Scheduling overhead increases with:
> - number of partitions
> - number of messages/sec
> - number of clock interrupts/sec, i.e. ClockPeriod()
> * does not increase with number of threads *
Free or almost free operations:
> Inheriting partition as part of message receive
> Joining a thread to a partition
> Dynamically changing budgets
Computational requirements
> 32-bit multiply, 64-bit add
> *no floating point* *no divides* *no address space swapping* *short-circuit calculation of merit function* *no inter-CPU messaging on SMP* *history-less algorithm*
Overhead typically 1% of total CPU
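The "no divides, no floating point" requirement is easy to meet when all the scheduler needs is to compare two usage-to-budget ratios. A common trick, shown below as a sketch, is cross-multiplication. The slides do not say this is how the scheduler does it; the function name and the 32-bit range check are assumptions made for the example.

```python
def uses_less_of_share(run_a, budget_a, run_b, budget_b):
    """True if run_a / budget_a < run_b / budget_b, computed with multiplies only.

    Assumes positive budgets and operands that fit in 32 bits, so that in C
    each product would fit in a 64-bit integer. Python integers do not
    overflow, so the check below only documents that assumption.
    """
    for value in (run_a, budget_a, run_b, budget_b):
        assert 0 <= value < 2**32
    return run_a * budget_b < run_b * budget_a


# Partition A has run 2 ms against a 75% share, B has run 1 ms against 25%.
print(uses_less_of_share(2, 75, 1, 25))
```

Since both budgets are positive, multiplying both sides by them preserves the inequality, so the comparison gives the same answer as dividing would, with no rounding.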
Any Queries????