Adaptive Partition Scheduling

January 24, 2012
Part 1: Why we did it
Cool Stuff from QNX
A. Danko

Evolution of schedulers
Why?
Timeline                      Yes, but:
priority pre-emptive          System locks up
Timeslicing                   Backhoes and Mother's Day
Time-varying priority         Untuneable for more than 1 application.
Really clever time-varying    US Military Satcom
Fair Share scheduling         Hard to manage share interactions.
Adaptive configuration        Not invented until now.
POSIX policies: SCHED_FIFO, SCHED_RR, SCHED_SPORADIC

Evolution: Lessons learned
Numerical priorities are chosen by applications, but system scheduling behavior must be designed globally
Degradation and overload: Priorities are not constants. Importance of work depends on circumstances.
> Modes: normal operation, restart, emergency maintenance
Scheduling strategy needs to be based on unit of work, but what we have is communicating threads.
Must measure real-time behavior.
> 0.1% accuracy
Want to specify shares as global percentages
> Applications don't get to pick their importance or shares. System engineers do.
Need to throttle CPU usage without losing real-time latencies.
Why?

What is Partitioning?
General Answer
Separation of work
To isolate:
> CPU usage
> memory usage
> system resource usage
> failures
QNX Answer
POSIX-compatible design which can be applied to existing systems with little or no recoding
A global hard real-time scheduler with overload protection and CPU guarantees
> Separation of work based on working for common purpose
Runtime typed memory and kernel object guarantees and limits
> With full inheritance and accounting for all children
Persistent storage (file system) guarantees and limits
Process model for fault isolation
Dynamic configuration
Design
Adaptive Partition Scheduling

Principles
Scheduler must not trigger an overload
> Overhead may not increase with # of threads
Real-time during underload
> Same behavior as today
Real-time during overload
> At least for interrupt handling
Must also be a fair-share scheduler
> global scheduler algorithm
> globally configured
Must mesh with current QNX architecture
> Preemptive priority, individual thread scheduling
> Heavy use of message passing
> Easy to drop onto existing applications
> Can't be a bag on the side
Simple enough for customers to use
> Engineerable
> Reconfigure on the fly
[Figure: throughput vs. offered load]
Insert picture of Juggling Watermelons here
Design

Counting time
What does 14% CPU mean?
> CPU usage is calculated over a sliding window.
Accuracy:
> Counting ticks is not enough. Micro-billing is used to track actual CPU utilization even when threads don't use their whole timeslice.
> micro- and nano-second resolution
> Threads are billed based on real usage, not statistics
windowsize is configurable as an argument to kernel at boot
> Tradeoff maximum READY-state latency with accuracy of CPU budgeting
100 ms window -> 1% accuracy or better.
> Internal arithmetic accurate to 0.5% or better
Partition usage
> ns of CPU time executed during last sliding window, expressed as percentage
Partition budget
> Guaranteed percentage of CPU time, balanced over sliding window
Design
[Figure: sliding window from T = -100 ms to T = now]
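The sliding-window accounting on this slide can be sketched roughly as follows. This is a minimal illustration, not the QNX implementation: the class name, method names, and the choice to drop a billing record whole once its end time leaves the window are all assumptions made for the example.

```python
from collections import deque

class SlidingWindowUsage:
    """Hypothetical micro-billing ledger over a sliding window."""

    def __init__(self, window_ns=100_000_000):  # 100 ms window, as on the slide
        self.window_ns = window_ns
        self.records = deque()  # (end_time_ns, partition, ran_ns), oldest first
        self.totals = {}        # partition -> ns billed inside the window

    def bill(self, end_time_ns, partition, ran_ns):
        # Bill the time actually executed, not a whole tick or timeslice.
        self.records.append((end_time_ns, partition, ran_ns))
        self.totals[partition] = self.totals.get(partition, 0) + ran_ns
        self._expire(end_time_ns)

    def _expire(self, now_ns):
        # Simplification: a record is forgotten once its end time is outside the window.
        while self.records and self.records[0][0] <= now_ns - self.window_ns:
            _, partition, ran_ns = self.records.popleft()
            self.totals[partition] -= ran_ns

    def usage_percent(self, now_ns, partition):
        self._expire(now_ns)
        return 100.0 * self.totals.get(partition, 0) / self.window_ns
```

Under these assumptions, a partition that ran for 14 ms inside the last 100 ms reports 14% usage, and the figure falls back to 0% once that run slides out of the window.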

Who's got time: Partition Inheritance
[Figure: Adaptive Partition 1 (Multi-media) and (Java application) partitions, each with CPU budget available; threads send messages to the receive threads of the File System process]
Resource manager threads work on behalf of sender
Priority and adaptive partition are inherited on receive
> Execution time in server billed to client's partition
This allows proper accounting for shared resources
Design
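The inheritance rule on this slide can be illustrated with a small model. It is only a sketch under assumed names (`Thread`, `Message`, `receive`, `bill` are hypothetical and not QNX APIs): the server thread takes on the sender's priority and partition at receive time, so the time it then spends is charged to the client.

```python
from dataclasses import dataclass

@dataclass
class Thread:
    name: str
    prio: int
    partition: str

@dataclass
class Message:
    sender: Thread
    payload: str

def receive(server, msg):
    # On receive, the server thread inherits the sender's priority and partition.
    server.prio = msg.sender.prio
    server.partition = msg.sender.partition

def bill(usage_ns, thread, ran_ns):
    # Execution time is charged to whichever partition the thread is in right now.
    usage_ns[thread.partition] = usage_ns.get(thread.partition, 0) + ran_ns

# Usage: a file system thread serving a Java client is billed to the Java partition.
usage_ns = {}
fs_thread = Thread("fs-receive", prio=10, partition="system")
java_client = Thread("java-main", prio=9, partition="java")
receive(fs_thread, Message(java_client, "read block"))
bill(usage_ns, fs_thread, 2_000_000)
```

After this runs, the 2 ms of server work appears under "java" and nothing is charged to "system", which is the accounting property the slide describes for shared resources.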

Real time: Behavior under normal load
[Figure: (Multi-media) and (Java application) partitions, both with CPU budget available; threads shown as Blocked, Running, or Ready]
Hard real-time scheduler under normal load
Running thread selected as highest priority READY thread
No delay on scheduling if adaptive partition has budget
Design

Out of time: Behavior under overload
[Figure: (Multi-media) and (Java application) partitions, one with CPU budget available and one with CPU budget exceeded; threads shown as Blocked, Running, or Ready]
Highest priority READY thread in Partition with budget runs
No delay on scheduling if adaptive partition has budget
Design
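The selection rule for normal load and overload can be sketched in a few lines. This is an illustrative model only; the tuple layout and the function name `pick_ready_thread` are assumptions, and the real scheduler keeps per-partition ready queues rather than scanning a list.

```python
def pick_ready_thread(ready_threads, has_budget):
    """ready_threads: list of (priority, partition, name) tuples.
    has_budget: dict mapping partition name to True/False."""
    # Only threads whose partition still has budget are eligible.
    eligible = [t for t in ready_threads if has_budget[t[1]]]
    # Among those, the highest-priority READY thread runs.
    return max(eligible, key=lambda t: t[0]) if eligible else None

ready = [(10, "java", "j1"), (9, "multimedia", "m1")]
normal = pick_ready_thread(ready, {"java": True, "multimedia": True})
overload = pick_ready_thread(ready, {"java": False, "multimedia": True})
```

With both partitions in budget, the priority-10 thread runs exactly as under a plain priority scheduler. Once the Java partition has exceeded its budget, the priority-9 multimedia thread is chosen instead.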

Free Time: Behavior with unused CPU
[Figure: (Multi-media) and (Java application) partitions, both with CPU budget exceeded, plus a third partition with CPU budget available; threads shown as Blocked or Running]
If no partitions with remaining budget have READY threads, highest priority READY thread is selected to run from other partitions
This allows free time to be given based upon priority
> Free time is still accounted and may have to be paid back (for example, if partition 3 becomes ready within 1 averaging window)
Design
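The free-time rule can be sketched as a fallback step. This is a simplified model with hypothetical names (`Ready`, `select_with_free_time`), and it bills a fixed slice per selection purely to show that free time is still accounted.

```python
from collections import namedtuple

Ready = namedtuple("Ready", "prio partition name")

def select_with_free_time(ready, has_budget, usage_ns, slice_ns=1_000_000):
    """Return (thread, on_free_time). Usage is billed even on free time."""
    in_budget = [t for t in ready if has_budget[t.partition]]
    # Fall back to every READY thread only when no in-budget partition has one.
    pool = in_budget or ready
    if not pool:
        return None, False
    chosen = max(pool, key=lambda t: t.prio)
    # Free time is still recorded, so it can count against the partition later.
    usage_ns[chosen.partition] = usage_ns.get(chosen.partition, 0) + slice_ns
    return chosen, not in_budget
```

In this sketch, when the only partition with budget has nothing READY, the highest-priority thread from the over-budget partitions runs and its usage keeps growing. That growth is what may have to be paid back if the in-budget partition becomes ready within the averaging window.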

Borrowed Time: Critical Threads
[Figure: (Multi-media) and (Air Bag Control) partitions, one with CPU budget available and one with CPU budget exceeded; threads shown as Blocked, Running, or Ready, with one marked as the Critical Thread]
Critical threads still run (based on priority) even if partition has no budget
Critical threads provide deterministic scheduling even in overload
Critical threads are given critical budget and can go into short-term debt
> Critical time is accounted and has to be repaid
> Exceeding critical budget is considered an error and causes notification/action
Design
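The critical-budget idea can be modeled roughly as below. The field names and the exact billing rule are assumptions for illustration: here time is charged to the critical budget only when a critical thread runs while its partition is already out of normal budget.

```python
from dataclasses import dataclass

@dataclass
class PartitionState:
    budget_ns: int            # normal budget for the current window
    critical_budget_ns: int   # extra allowance for critical threads
    used_ns: int = 0
    critical_used_ns: int = 0

def can_run(p, critical):
    if p.used_ns < p.budget_ns:
        return True
    # Out of budget: only critical threads with critical budget left may run.
    return critical and p.critical_used_ns < p.critical_budget_ns

def bill(p, ran_ns, critical):
    """Bill ran_ns; return True if the critical budget has been exceeded."""
    out_of_budget = p.used_ns >= p.budget_ns
    p.used_ns += ran_ns  # all time is accounted, so the debt is repaid later
    if critical and out_of_budget:
        p.critical_used_ns += ran_ns
    return p.critical_used_ns > p.critical_budget_ns
```

A `True` result from `bill` stands in for the notification or action the slide mentions. What that action is remains a configuration matter and is not modeled here.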

Equal time.
How to choose between partitions of equal priority
> Unimportant?
> Many threads run at default priority, therefore equal priority
Possible algorithms:
> - round robin
> - favor partition with most free time
> - favor longest waiter
Requirement:
> Minimize latencies during underload
> WBN: divide free time by % cpu share.
Solution: Interleave partitions by ratio of partition shares
We found a clever way to do that, so it's in the patent.
Design
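One simple way to interleave equal-priority partitions by share ratio is to always run the partition that has used the smallest fraction of its share. The slides do not disclose the patented method, so the sketch below is an assumed stand-in and not the QNX algorithm; the function name and the tie-break by partition name are hypothetical.

```python
from fractions import Fraction

def interleave(budgets, ticks):
    """budgets: dict of partition name -> share in percent (all > 0).
    Returns the order in which partitions get the next tick."""
    used = {name: 0 for name in budgets}
    order = []
    for _ in range(ticks):
        # Pick the partition with the lowest used/share ratio; ties go to
        # the alphabetically first name because min() keeps the first minimum.
        name = min(sorted(budgets), key=lambda n: Fraction(used[n], budgets[n]))
        used[name] += 1
        order.append(name)
    return order

schedule = "".join(interleave({"A": 70, "B": 30}, 10))
```

For shares of 70% and 30%, ten ticks come out as `ABAABAABAA`: seven for A and three for B, spread out instead of bunched, which keeps the wait for either partition short.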

How it does it
[Figure: block diagram of the uKernel and libmod_aps.a, showing process creation, messaging, the clock intr handler, the scheduler, the per-partition ready Q, and the calls ready(), block(), and select_thread()]
For all partitions p, define the merit
    m(p) -> (bud(p) || crit(p), prio(p), run_t/wsize/bud(p))
Then schedule p_s, defined by
    p_s -> rdy(p_s) and (m(p_s) < m(p_i)) for all i != s
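The merit tuple above can be read as a lexicographic comparison. The sketch below is one interpretation, with assumptions flagged: the field names are hypothetical, the tuple is encoded so that a smaller value is better (in budget or critical first, then higher priority, then lower usage relative to budget), and floating-point division is used for readability even though the slides say the real scheduler avoids it.

```python
from dataclasses import dataclass

WINDOW_NS = 100_000_000  # wsize: assumed 100 ms averaging window

@dataclass
class Partition:
    name: str
    ready: bool       # rdy(p): has at least one READY thread
    has_budget: bool  # bud(p) as a flag: budget remains in this window
    critical: bool    # crit(p): top READY thread is critical with critical budget
    prio: int         # prio(p): priority of the top READY thread
    run_ns: int       # run_t: CPU time used in the current window
    budget_pct: int   # bud(p) as a percentage, assumed > 0

def merit(p):
    used_vs_budget = p.run_ns / WINDOW_NS / (p.budget_pct / 100)
    # Tuples compare element by element, so earlier fields dominate.
    return (not (p.has_budget or p.critical), -p.prio, used_vs_budget)

def select_partition(partitions):
    ready = [p for p in partitions if p.ready]
    return min(ready, key=merit) if ready else None
```

Because tuples compare left to right, the usage ratio only matters between partitions that agree on the first two fields, which matches the equal-priority case discussed earlier in the deck.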

Overhead: Fancy, but is it fast?
Scheduling overhead increases with:
> - number of partitions
> - number of messages/sec
> - number of clock interrupts/sec, i.e. ClockPeriod()
> * does not increase with number of threads *
Free or almost free operations:
> Inheriting partition as part of message receive
> Joining a thread to a partition
> Dynamically changing budgets
Computational requirements
> 32-bit multiply, 64-bit add
> *no floating point*
> *no divides*
> *no address space swapping*
> *short-circuit calculation of merit function*
> *no inter-cpu msging on SMP*
> *history-less algorithm*
Overhead typically 1% of total cpu
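The "no divides, no floating point" requirement is compatible with comparing usage-to-budget ratios, because two ratios can be compared by cross-multiplying. The slides do not say this is how the scheduler does it; the function below is an assumed illustration with a hypothetical name.

```python
def less_loaded(run_a, bud_a, run_b, bud_b):
    """True if run_a/bud_a < run_b/bud_b, using integer multiplies only.

    Assumes both budgets are positive, so cross-multiplying keeps the
    direction of the inequality.
    """
    return run_a * bud_b < run_b * bud_a
```

For example, 30 ms used against a 70% budget is less loaded than 20 ms used against a 30% budget, since 30 * 30 = 900 is smaller than 20 * 70 = 1400.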

Any Queries????
