Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems

Dec 13, 2015

Emil Wells
Transcript
Page 1: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Server Resources

12/9 - 2005

INF5070 – Media Storage and Distribution Systems:

Page 2: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

2005 Carsten Griwodz & Pål Halvorsen

INF5070 – media storage and distribution systems

Overview

Resources, real-time, “continuous” media streams, …

(CPU) Scheduling

Memory management

Page 3: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Resources and Real–Time

Page 4: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Resources

Resource: “A resource is a system entity required by a task for manipulating data” [Steinmetz & Nahrstedt 95]

Characteristics:
- active: provides a service, e.g., CPU, disk or network adapter
- passive: system capabilities required by active resources, e.g., memory
- exclusive: only one process at a time can use it, e.g., CPU
- shared: can be used by several concurrent processes, e.g., memory
- single: exists only once in the system, e.g., loudspeaker
- multiple: several within a system, e.g., CPUs in a multi-processor system

Page 5: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Real–Time

Real-time process: “A process which delivers the results of the processing in a given time-span”

Real-time system: “A system in which the correctness of a computation depends not only on obtaining the result, but also upon providing the result on time”

Many real-time applications, e.g.:
- temperature control in a nuclear/chemical plant
  - driven by interrupts from an external device
  - these interrupts occur irregularly
- defense system on a navy boat
  - driven by interrupts from an external device
  - these interrupts occur irregularly
- control of a flight simulator
  - execution at periodic intervals
  - scheduled by timer-services which the application requests from the OS
- ...

Page 6: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Real–Time

Deadline: “A deadline represents the latest acceptable time for the presentation of the processing result”

Hard deadlines:
- must never be violated (a violation is a system failure)
  - results that arrive too late have no value, e.g., processing weather forecasts
  - a violation means severe (catastrophic) system failure, e.g., processing of an incoming torpedo signal in a navy boat scenario

Soft deadlines:
- in some cases, the deadline might be missed
  - not too frequently
  - not by much time
- result still may have some (but decreasing) value, e.g., a late I-frame in MPEG

Page 7: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Real–Time and Multimedia

Multimedia systems
- have periodic processing requirements (e.g., every 33 ms for a 30 fps video)
- require large bandwidths (e.g., average of 3.5 Mbps for DVD video only)
- typically have soft deadlines (may miss a frame)
- are non-critical (user may be annoyed, but …)
- need predictability (guarantees)
  - adapt real-time mechanisms to continuous media
  - priority-based schemes are of special importance

Page 8: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Admission and Reservation

To prevent overload, admission may be performed:
- schedulability test:
  - “are there enough resources available for a new stream?”
  - “can we find a schedule for the new task without disturbing the existing workload?”
  - a task is allowed if the utilization remains < 1
- yes – allow new task, allocate/reserve resources
- no – reject

Resource reservation is analogous to booking (asking for resources)
- pessimistic
  - avoid resource conflicts by making worst-case reservations
  - potentially under-utilized resources
  - guaranteed QoS
- optimistic
  - reserve according to average load
  - high utilization
  - overload may occur
- perfect
  - must have detailed knowledge about resource requirements of all processes
  - too expensive to make/takes much time
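The admission test above can be sketched as follows. This is a minimal illustration, assuming each stream is described by a period and a per-period processing cost; the `Stream` type, the `admit` function and the example numbers are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    period_ms: float  # how often the stream needs the resource
    cost_ms: float    # resource time needed per period (worst case if pessimistic)


def utilization(streams):
    """Fraction of the resource consumed by the given streams."""
    return sum(s.cost_ms / s.period_ms for s in streams)


def admit(admitted, candidate, limit=1.0):
    """Allow the new stream only if total utilization stays below the limit."""
    return utilization(list(admitted) + [candidate]) < limit


running = [Stream(period_ms=33, cost_ms=10), Stream(period_ms=33, cost_ms=10)]
print(admit(running, Stream(period_ms=33, cost_ms=10)))  # utilization 30/33: accepted
print(admit(running, Stream(period_ms=33, cost_ms=15)))  # utilization 35/33: rejected
```

With a pessimistic reservation, `cost_ms` would hold the worst-case cost; with an optimistic one, the average cost.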

Page 9: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Real–Time and Operating Systems

The operating system manages local resources (CPU, memory, disk, network card, buses, ...)

In a real-time, multimedia scenario, support is needed for:
- real-time processing
- efficient memory management

This also means support for proper …
- scheduling – high priorities for time-restrictive multimedia tasks
- timer support – clock with fine granularity and event scheduling with high accuracy
- kernel preemption – avoid long periods where low priority processes cannot be interrupted
- memory replacement – prevent code for real-time programs from being paged out
- fast switching – both interrupts and context switching should be fast
- ...

Page 10: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Continuous Media Streams

Page 11: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Streaming Data

- Start playback at t1
- Consumed bytes (offset): variable rate or constant rate
- Must start retrieving data earlier
  - Data must arrive before consumption time
  - Data must be sent before arrival time
  - Data must be read from disk before sending time

[Figure: data offset versus time, showing the read, send, arrive and consume functions; playback starts at t1]

Page 12: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Streaming Data

Need buffers to hold data between the functions, e.g., at the client: B(t) = A(t) – C(t), i.e., ∀t : A(t) ≥ C(t)

Latest start of data arrival is given by min[B(t,t0,t1) ; ∀t B(t,t0,t1) ≥ 0], i.e., the buffer must at all times t have more data to consume

[Figure: data offset versus time, showing the arrive function starting at t0 and the consume function starting at t1]
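The buffer condition A(t) ≥ C(t) can be turned into a small calculation of the latest arrival start t0. The sketch below assumes discrete frames consumed at a fixed period starting at t1 and data arriving at a constant rate from t0; the function name, parameters and numbers are illustrative assumptions, not part of the slides.

```python
def latest_arrival_start(frame_sizes, frame_period, arrival_rate, t1=0.0):
    """Latest t0 such that every frame has fully arrived before it is consumed.

    Frame k (0-based) is consumed at t1 + k * frame_period and needs the
    cumulative bytes of frames 0..k, which arrive at t0 + bytes / arrival_rate.
    """
    consumed = 0
    latest = float("inf")
    for k, size in enumerate(frame_sizes):
        consumed += size
        # A(t) >= C(t) at this consumption instant bounds t0 from above.
        latest = min(latest, t1 + k * frame_period - consumed / arrival_rate)
    return latest


# Playback starts at t1 = 10 s, one frame per second, 100 bytes/s arrival rate.
print(latest_arrival_start([100, 50, 50], 1.0, 100.0, t1=10.0))
print(latest_arrival_start([50, 50, 300], 1.0, 100.0, t1=10.0))
```

In the second call the large third frame dominates: a variable-rate stream may force arrival to start well before playback even though the first frames are small.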

Page 13: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Streaming Data

“Continuous Media” and “continuous streams” are ILLUSIONS
- retrieve data in blocks from disk
- transfer blocks from file system to application
- send packets to communication system
- split packets into appropriate MTUs
- ... (intermediate nodes)
- ... (client)

- different optimal sizes
- pseudo-parallel processes (run in time slices)
- need for scheduling (to have timing and appropriate resource allocation)

[Figure: data path through file system, application and communication system]

Page 14: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

(CPU) Scheduling

Page 15: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Scheduling

- A task is a schedulable entity (a process/thread executing a job, e.g., a packet through the communication system or a disk request through the file system)
- In a multi-tasking system, several tasks may wish to use a resource simultaneously
- A scheduler decides which task may use the resource, i.e., determines the order in which requests are serviced, using a scheduling algorithm
- Each active resource (CPU, disk, NIC) needs a scheduler (passive resources are also “scheduled”, but in a slightly different way)

[Figure: requests pass through a scheduler to the resource]

Page 16: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Scheduling

Scheduling algorithm classification:
- dynamic
  - make scheduling decisions at run-time
  - flexible to adapt
  - considers only actual task requests and execution time parameters
  - large run-time overhead finding a schedule
- static
  - make scheduling decisions off-line (also called pre-run-time)
  - generates a dispatching table for the run-time dispatcher at compile time
  - needs complete knowledge of the tasks before compiling
  - small run-time overhead
- preemptive
  - currently executing task may be interrupted (preempted) by higher priority processes
  - preempted process continues later in the same state
  - potentially frequent context switching
  - (almost!?) useless for disk and network cards
- non-preemptive
  - running tasks will be allowed to finish their time-slot (higher priority processes must wait)
  - reasonable for short tasks like sending a packet (used by disk and network cards)
  - less frequent switches

Page 17: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Scheduling

Preemption:
- tasks wait for processing
- scheduler assigns priorities
- task with highest priority will be scheduled first
- preempt current execution if a higher priority (more urgent) task arrives
- real-time and best effort priorities (real-time processes have higher priority: if any exist, they will run)
- two kinds of preemption:
  - preemption points
    o predictable overhead
    o simplified scheduler accounting
  - immediate preemption
    o needed for hard real-time systems
    o needs special timers and fast interrupt and context switch handling

[Figure: requests pass through a scheduler to the resource, with preemption]

Page 18: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Scheduling

Scheduling is difficult and takes time:

[Figure: timelines of processes 1 to N and an RT process request under three policies]
- round-robin: the RT process is delayed behind the other processes
- priority, non-preemptive: the RT process is delayed until the running process is finished
- priority, preemptive: process 1 is interrupted and resumes after the RT process; the only delay is switching and interrupts

Page 19: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Priorities and Multimedia

Multimedia streams need predictable access to resources – high priorities, e.g.:
1. multimedia traffic with guaranteed QoS (may not exist)
2. multimedia traffic with predictive QoS (may not exist)
3. other requests (must not starve)

Within each class one could have a second-level scheduler
- 1 and 2: real-time scheduling and fine grained priorities
- 3: may use traditional approaches such as round-robin

Page 20: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Scheduling in Windows 2000

- Preemptive kernel
- Schedules threads individually
- Time slices given in quantums
  - 3 quantums = 1 clock interval (length of interval may vary)
  - defaults:
    - Win2000 server: 36 quantums
    - Win2000 workstation (professional): 6 quantums
  - may manually be increased between threads (1x, 2x, 4x, 6x)
  - foreground quantum boost (add 0x, 1x, 2x): active window can get longer time slices (assumed to need fast response)

Page 21: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Scheduling in Windows 2000

32 priority levels: Round Robin (RR) within each level

Interactive and throughput-oriented:
- “Real time” – 16 system levels
  - fixed priority
  - may run forever
- Variable – 15 user levels
  - priority may change: thread priority = process priority ± 2
  - uses much → drops
  - user interactions, I/O completions → increase
- Idle/zero-page thread – 1 system level
  - runs whenever there are no other processes to run
  - e.g., clearing memory pages for memory manager

[Figure: priority levels 31 to 16: Real Time (system thread); 15 to 1: Variable (user thread); 0: Idle (system thread)]

Page 22: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Scheduling in Linux

- Preemptive kernel
- Threads and processes used to be equal, but Linux uses (in 2.6) thread scheduling
- SCHED_FIFO
  - may run forever, no timeslices
  - may use its own scheduling algorithm
- SCHED_RR
  - each priority in RR
  - timeslices of 10 ms (quantums)
- SCHED_OTHER
  - ordinary user processes
  - uses “nice”-values: 1 ≤ priority ≤ 40
  - timeslices of 10 ms (quantums)
- Threads with highest goodness are selected first:
  - realtime (FIFO and RR): goodness = 1000 + priority
  - timesharing (OTHER): goodness = (quantum > 0 ? quantum + priority : 0)
- Quantums are reset when no ready process has quantums left (end of epoch): quantum = (quantum/2) + priority

[Figure: priority ranges – SCHED_FIFO 1 to 127, SCHED_RR 1 to 127, SCHED_OTHER nice values -20 to 19, default (20)]
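The goodness rules on this slide can be written out directly. The sketch below only mirrors the three formulas given above; the `Thread` type and the helper names are illustrative assumptions, and it is not kernel code.

```python
from dataclasses import dataclass

REALTIME_POLICIES = {"SCHED_FIFO", "SCHED_RR"}


@dataclass
class Thread:
    name: str
    policy: str
    priority: int
    quantum: int = 0  # remaining quantums in the current epoch


def goodness(thread):
    """Goodness as stated on the slide."""
    if thread.policy in REALTIME_POLICIES:
        return 1000 + thread.priority
    return thread.quantum + thread.priority if thread.quantum > 0 else 0


def pick_next(ready):
    """Select the ready thread with the highest goodness."""
    return max(ready, key=goodness)


def start_new_epoch(threads):
    """Reset quantums at the end of an epoch: quantum = quantum/2 + priority."""
    for thread in threads:
        thread.quantum = thread.quantum // 2 + thread.priority


a = Thread("a", "SCHED_OTHER", priority=20, quantum=5)
b = Thread("b", "SCHED_OTHER", priority=30, quantum=0)
rt = Thread("rt", "SCHED_RR", priority=1)
print(pick_next([a, b, rt]).name)  # the real-time thread wins regardless of quantums
```

A thread that did not use up its quantum keeps half of it into the next epoch, which favours threads that block often.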

Page 23: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Scheduling in AIX

- Similar to Linux, but has always only used thread scheduling
  - SCHED_FIFO
  - SCHED_RR
  - SCHED_OTHER
- BUT, SCHED_OTHER may change “nice” values
  - running long (whole timeslices): penalty – nice increases
  - interrupted (e.g., I/O): gives initial “nice” value back

[Figure: priority ranges – SCHED_FIFO 1 to 127, SCHED_RR 1 to 127, SCHED_OTHER nice values -20 to 19 with a default]

Page 24: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Real–Time Scheduling

Multimedia streams are usually periodic (fixed frame rates and audio sample frequencies)

Time constraints for a periodic task:
- s – starting point (first time the task requires processing)
- e – processing time
- d – deadline
- p – period
- (r – rate (r = 1/p))
- 0 ≤ e ≤ d (often d ≤ p: we’ll use d = p – end of period, but Σd ≤ Σp is enough)
- the kth processing of the task
  - is ready at time s + (k – 1) p
  - must be finished at time s + (k – 1) p + d
- the scheduling algorithm must account for these properties

[Figure: timeline starting at s, showing e, d and p]
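The timing rules for the kth processing of a periodic task can be checked with a few lines of code. This is a minimal sketch; the `PeriodicTask` type and the example values (a 33 ms period with d = p) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicTask:
    s: float  # starting point
    e: float  # processing time
    d: float  # deadline, relative to the ready time
    p: float  # period

    def __post_init__(self):
        if not 0 <= self.e <= self.d:
            raise ValueError("need 0 <= e <= d")

    def ready_time(self, k):
        """Time at which the kth processing (k = 1, 2, ...) becomes ready."""
        return self.s + (k - 1) * self.p

    def finish_by(self, k):
        """Time by which the kth processing must be finished."""
        return self.ready_time(k) + self.d


# Times in ms: a video task starting at 100 ms with d = p = 33 ms.
video = PeriodicTask(s=100, e=10, d=33, p=33)
print(video.ready_time(3), video.finish_by(3))
```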

Page 25: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Real–Time Scheduling

Resource reservation
- QoS can be guaranteed
- relies on knowledge of tasks
- no fairness
- origin: time sharing operating systems
- e.g., earliest deadline first (EDF) and rate monotonic (RM) (AQUA, HeiTS, RT Upcalls, ...)

Proportional share resource allocation
- no guarantees
- requirements are specified by a relative share
- allocation in proportion to competing shares
- size of a share depends on system state and time
- origin: packet switched networks
- e.g., Scheduler for Multimedia And Real-Time (SMART) (Lottery, Stride, Move-to-Rear List, ...)

Page 26: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Earliest Deadline First (EDF)

- Preemptive scheduling based on dynamic task priorities
- Task with closest deadline has highest priority: stream priorities vary with time
- Dispatcher selects the highest priority task
- Assumptions:
  - requests for all tasks with deadlines are periodic
  - the deadline of a task is equal to the end of its period (start of the next)
  - independent tasks (no precedence)
  - run-time for each task is known and constant
  - context switches can be ignored

Page 27: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Earliest Deadline First (EDF)

Example:

[Figure: dispatching timeline for Task A and Task B with their deadlines; the order changes between "priority A > priority B" and "priority A < priority B" as the deadlines move]
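A small simulation can illustrate how EDF priorities change over time. The sketch below follows the stated assumptions (periodic tasks, deadline at the end of the period, known constant run-time, free context switches) and uses unit time steps; the task set and the function name are hypothetical, not the example drawn on the slide.

```python
def edf_schedule(tasks, horizon):
    """Simulate preemptive EDF in unit time steps.

    tasks maps a name to (period, cost); each release must finish by the
    end of its period. Returns the timeline and a list of (task, time) misses.
    """
    remaining = {name: 0 for name in tasks}
    deadline = {name: 0 for name in tasks}
    timeline, misses = [], []
    for t in range(horizon):
        for name, (period, cost) in tasks.items():
            if t % period == 0:
                if remaining[name] > 0:
                    misses.append((name, t))  # previous release not finished
                remaining[name] = cost        # leftover work is dropped
                deadline[name] = t + period
        ready = [name for name in tasks if remaining[name] > 0]
        if ready:
            # Closest deadline wins; ties are broken by name for determinism.
            run = min(ready, key=lambda name: (deadline[name], name))
            remaining[run] -= 1
            timeline.append(run)
        else:
            timeline.append(None)
    return timeline, misses


timeline, misses = edf_schedule({"A": (4, 2), "B": (6, 3)}, horizon=12)
print("".join(timeline), misses)
```

In this run, B preempts nothing at time 4 but is chosen ahead of the newly released A because its deadline (6) is closer than A's (8); the priority order flips again at time 6.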

Page 28: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Rate Monotonic (RM) Scheduling

- Classic algorithm for hard real-time systems with one CPU [Liu & Layland ‘73]
- Pre-emptive scheduling based on static task priorities
- Optimal: no other algorithm with static task priorities can schedule tasks that cannot be scheduled by RM
- Assumptions:
  - requests for all tasks with deadlines are periodic
  - the deadline of a task is equal to the end of its period (start of the next)
  - independent tasks (no precedence)
  - run-time for each task is known and constant
  - context switches can be ignored
  - any non-periodic task has no deadline

Page 29: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Rate Monotonic (RM) Scheduling

Process priority based on task periods
- task with shortest period gets highest static priority
- task with longest period gets lowest static priority
- dispatcher always selects task requests with highest priority

Example:

[Figure: priority versus period length (shortest period, highest priority; longest period, lowest priority) and a dispatching timeline for Task 1 with period p1 and Task 2 with period p2; p1 < p2, so Task 1 has the highest priority]
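The rate monotonic rule can be sketched as a priority assignment plus a fixed-priority dispatcher. This is an illustration in unit time steps under the assumptions listed for RM; the task sets and function names are hypothetical.

```python
def rm_priorities(periods):
    """Return task names ordered from highest to lowest static priority."""
    return sorted(periods, key=lambda name: (periods[name], name))


def rm_schedule(tasks, horizon):
    """Simulate preemptive fixed-priority dispatching with RM priorities.

    tasks maps a name to (period, cost); the deadline is the end of the period.
    Returns the timeline and a list of (task, time) deadline misses.
    """
    order = rm_priorities({name: period for name, (period, _) in tasks.items()})
    remaining = {name: 0 for name in tasks}
    timeline, misses = [], []
    for t in range(horizon):
        for name, (period, cost) in tasks.items():
            if t % period == 0:
                if remaining[name] > 0:
                    misses.append((name, t))  # previous release not finished
                remaining[name] = cost
        # Always run the ready task with the shortest period.
        run = next((name for name in order if remaining[name] > 0), None)
        if run is not None:
            remaining[run] -= 1
        timeline.append(run)
    return timeline, misses


print(rm_priorities({"T2": 6, "T1": 4}))
print(rm_schedule({"T1": (4, 1), "T2": (6, 2)}, horizon=12)[1])  # lightly loaded
print(rm_schedule({"A": (4, 2), "B": (6, 3)}, horizon=12)[1])    # fully loaded
```

The last call uses a task set with total utilization 1.0: the longer-period task B is still unfinished when its period ends at time 6, because the static priority always favours A.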

Page 30: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

EDF Versus RM

It might be impossible to prevent deadline misses in a strict, fixed priority system:

[Figure: timelines for Task A and Task B with their deadlines, comparing fixed priorities (A has priority or B has priority, each with and without dropping), earliest deadline first, and rate monotonic (same as the first case); the fixed-priority and rate monotonic timelines are annotated with deadline misses and waste of time]

RM may give some deadline violations, which are avoided by EDF

Page 31: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

EDF Versus RM

EDF
- dynamic priorities changing in time
- overhead in priority switching
- QoS calculation – maximal throughput: Σi (Ri × ei) ≤ 1 over all streams i, R – rate, e – processing time

RM
- static priorities based on periods
- may map priority onto fixed OS priorities (like Linux)
- QoS calculation: Σi (Ri × ei) ≤ ln(2) over all streams i, R – rate, e – processing time

NOTE: this means that EDF is usually more efficient than RM, i.e., if switches are free and EDF uses resources ≤ 1, then RM may need ≤ ln(2) resources to schedule the same workload
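The two QoS calculations can be compared numerically. The sketch below applies the bounds exactly as given on this slide; the function names and stream values are illustrative. Note that the ln(2) test is a sufficient condition only: it is the limit of the Liu & Layland bound n(2^(1/n) − 1) for many tasks, and a task set that fails it is not necessarily unschedulable under RM.

```python
import math


def utilization(streams):
    """Sum of rate * processing time over all streams.

    Each stream is (rate in 1/s, processing time in s).
    """
    return sum(rate * cost for rate, cost in streams)


def edf_admissible(streams):
    return utilization(streams) <= 1.0


def rm_admissible(streams):
    return utilization(streams) <= math.log(2)  # about 0.693


light = [(15, 0.028)]      # one stream: 15 slots of 28 ms per second
heavy = [(15, 0.021)] * 3  # three streams: 15 slots of 21 ms per second each
for streams in (light, heavy):
    print(round(utilization(streams), 3), edf_admissible(streams), rm_admissible(streams))
```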

Page 32: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

SMART (Scheduler for Multimedia And Real–Time applications)

Designed for multimedia and real-time applications

Principles
- priority – high priority tasks should not suffer degradation due to presence of low priority tasks
- proportional sharing – allocate resources proportionally and distribute unused resources (work conserving)
- tradeoff immediate fairness – real-time and less competitive processes (short-lived, interactive, I/O-bound, ...) get instantaneous higher shares
- graceful transitions – adapt smoothly to resource demand changes
- notification – notify applications of resource changes

Proportional shares – no admission control

Page 33: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

SMART (Scheduler for Multimedia And Real–Time applications)

Tasks have importance and urgency
- urgency – an immediate real-time constraint, short deadline (determines when a task will get resources)
- importance – a priority measure
  - expressed by a tuple: [ priority p , biased virtual finishing time bvft ]
  - p is static: supplied by user or assigned a default value
  - bvft is dynamic:
    o virtual finishing time: degree to which the share was consumed
    o bias: bonus for interactive tasks

Best effort schedule based on urgency and importance
- find most important tasks – compare tuple: T1 > T2 ⇔ (p1 > p2) ∨ (p1 = p2 ∧ bvft1 > bvft2)
- sort after urgency (EDF based sorting)
- iteratively select task from candidate set as long as schedule is feasible
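The tuple comparison used to find the most important tasks can be sketched as below. It implements the comparison rule exactly as written on the slide; the function names and task values are hypothetical, and this is not the SMART implementation itself.

```python
def more_important(t1, t2):
    """Compare two (priority, bvft) tuples as on the slide."""
    p1, bvft1 = t1
    p2, bvft2 = t2
    return p1 > p2 or (p1 == p2 and bvft1 > bvft2)


def most_important(tasks):
    """Pick the task whose (priority, bvft) tuple dominates all others.

    tasks maps a task name to its (priority, bvft) tuple. Python compares
    tuples lexicographically, which matches the rule in more_important.
    """
    return max(tasks, key=lambda name: tasks[name])


tasks = {"video": (2, 0.1), "editor": (1, 0.9), "backup": (1, 0.2)}
print(most_important(tasks))
print(more_important(tasks["editor"], tasks["backup"]))
```

After the importance step, the slide's procedure sorts the candidates by urgency (deadline) and keeps adding tasks while the schedule stays feasible; that part is not shown here.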

Page 34: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Evaluation of a Real–Time Scheduling

Tests performed by IBM (1993) executing tasks with and without EDF on a 57 MHz, 32 MB RAM, AIX Power 1

Video playback program:
- one real-time process
  - read compressed data
  - decompress data
  - present video frames via X server to user
- process requires 15 timeslots of 28 ms each per second: 42 % of the CPU time

Page 35: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Evaluation of a Real–Time Scheduling

[Figure, "3 Load Processes": laxity (remaining time to deadline) in seconds, from -0.05 to 0.05, versus task/event number (0 to 200), without and with real-time scheduling; 3 load processes compete with the video playback]
- several deadline violations by the non-real-time scheduler
- the real-time scheduler reaches all its deadlines

Page 36: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Evaluation of a Real–Time Scheduling

[Figure: laxity (remaining time to deadline), from 0.026 to 0.042, versus task number (0 to 200) for three cases: only video process, 4 other processes, 16 other processes]
- Varied the number of load processes (competing with the video playback)
- NB! The EDF scheduler kept its deadlines

Page 37: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Evaluation of a Real–Time Scheduling

Tests again performed by IBM (1993) on a 57 MHz, 32 MB RAM, AIX Power 1

“Stupid” end system program:
- 3 real-time processes only requesting CPU cycles
- each process requires 15 timeslots of 21 ms each per second
  - 31.5 % of the CPU time each
  - 94.5 % of the CPU time required for real-time tasks
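The CPU shares quoted for the test programs follow from simple arithmetic. The helper below is only an illustration (its name is an assumption); it reproduces the 31.5 % and 94.5 % figures here and the 42 % figure of the video playback test.

```python
def cpu_share_percent(slots_per_second, slot_ms):
    """CPU share of a process that needs the given timeslots every second."""
    return slots_per_second * slot_ms / 10  # ms per second -> percent


per_process = cpu_share_percent(15, 21)
print(per_process)                # share of one real-time process
print(3 * per_process)            # three such processes together
print(cpu_share_percent(15, 28))  # the single video playback process
```

With 94.5 % of the CPU needed by real-time tasks, the load is below the EDF limit of 1 but leaves very little room for the competing load processes.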

Page 38: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Evaluation of a Real–Time Scheduling

[Figure, "1 Load Process": laxity [s], from -0.08 to 0.06, versus event number (0 to 200), without and with real-time scheduling; 1 load process competes with the real-time processes]

[Figure, "16 Load Processes": laxity [s], from 0 to 0.05, versus event number (0 to 200), with real-time scheduling for processes 1, 2 and 3]

- the real-time scheduler reaches all its deadlines

Page 39: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Evaluation of a Real–Time Scheduling

[Figure, "1 Load Process": laxity [s], from -0.08 to 0.06, versus event number (0 to 200), without and with real-time scheduling]

[Figure, "16 Load Processes": laxity [s], from 0 to 0.05, versus event number (0 to 200), with real-time scheduling for processes 1, 2 and 3; 16 load processes compete with the real-time processes]

- Regardless of other load, the EDF scheduler reaches its deadlines (laxity almost equal to the 1 load process scenario)
- NOTE: Processes are scheduled in the same order

Page 40: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Memory Management

Page 41: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Delivery Systems

[Figure: delivery system with network and bus(es)]

Page 42: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Delivery Systems

[Figure: data path over the bus(es) through file system, application and communication system, across kernel space and user space]
- several disk-to-memory transfers
- several in-memory data movements and context switches

Page 43: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Memory Caching

Page 44: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Memory Caching

[Figure: data path from disk through file system, application and communication system to the network card; the disk access is marked as expensive, and caching is possible in a cache between file system and application]

How do we manage a cache?
- how much memory to use?
- how much data to prefetch?
- which data item to replace?
- …

Page 45: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Is Caching Useful in a Multimedia Scenario?

- High rate data may need lots of memory for caching…
- Tradeoff: amount of memory, algorithm complexity, gain, …
- Cache only frequently used data – how? (e.g., first (small) parts of a broadcast partitioning scheme, allow “top-ten” only, …)

Buffer vs. Rate

Buffer | 160 Kbps (e.g., MP3) | 1.4 Mbps (e.g., uncompressed CD) | 3.5 Mbps (e.g., average DVD video) | 100 Mbps (e.g., uncompressed HDTV)
100 MB | 85 min 20 s | 9 min 31 s | 3 min 49 s | 8 s
1 GB | 14 hr 33 min 49 s | 1 hr 37 min 31 s | 39 min 01 s | 1 min 22 s
16 GB | 233 hr 01 min 01 s | 26 hr 00 min 23 s | 10 hr 24 min 09 s | 21 min 51 s
32 GB | 466 hr 02 min 02 s | 52 hr 00 min 46 s | 20 hr 48 min 18 s | 43 min 41 s

32 GB: maximum amount of memory (in total) that a Dell Server can manage in 2004 – and all is NOT used for caching
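The playout time that a buffer can hold is the buffer size in bits divided by the stream rate. The sketch below assumes binary prefixes throughout (1 MB = 2^20 bytes, 1 Kbps = 1024 bit/s), which is an inference that reproduces, for example, the 100 MB and 1 GB rows of the table; the function names are illustrative.

```python
KBPS = 1024       # assumed: 1 Kbps = 1024 bit/s
MBPS = 1024 ** 2  # assumed: 1 Mbps = 2**20 bit/s
MB = 1024 ** 2    # bytes
GB = 1024 ** 3    # bytes


def buffered_seconds(buffer_bytes, rate_bps):
    """Seconds of a stream with the given rate that fit into the buffer."""
    return buffer_bytes * 8 / rate_bps


def hms(seconds):
    """Split a duration, rounded to whole seconds, into (hours, minutes, seconds)."""
    hours, rest = divmod(round(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs


for rate in (160 * KBPS, 1.4 * MBPS, 3.5 * MBPS, 100 * MBPS):
    print(hms(buffered_seconds(100 * MB, rate)), hms(buffered_seconds(1 * GB, rate)))
```

For instance, 100 MB of a 160 Kbps stream lasts 5120 s, i.e. 85 min 20 s, whereas the same buffer holds only 8 s of a 100 Mbps stream.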

Page 46: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Need For Special “Multimedia Algorithms”?

Most existing systems use an LRU-variant
- keep a sorted list
- replace first in list
- insert new data elements at the end
- if a data element is re-accessed (e.g., new client or rewind), move back to the end of the list

Extreme example – video frame playout (LRU buffer, ordered between “longest time since access” and “shortest time since access”):
- play video (7 frames): 1 2 3 4 5 6 7
- rewind and restart playout at 1: 7 6 5 4 3 2 1
- playout 2: 1 7 6 5 4 3 2
- playout 3: 2 1 7 6 5 4 3
- playout 4: 3 2 1 7 6 5 4

In this case, LRU replaces the next needed frame. So the answer is in many cases YES…
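The effect can be reproduced with a small LRU simulation. The sketch below assumes a buffer that holds one frame fewer than the clip, which is a hypothetical setting chosen to make the problem visible; the slide does not state the buffer size.

```python
from collections import OrderedDict


def lru_hits(accesses, capacity):
    """Count cache hits for an access sequence under LRU replacement."""
    cache = OrderedDict()  # oldest entry first, most recently used last
    hits = 0
    for frame in accesses:
        if frame in cache:
            hits += 1
            cache.move_to_end(frame)       # re-accessed: move to the end
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # replace the first in the list
            cache[frame] = True
    return hits


frames = list(range(1, 8))  # a 7-frame video
playout_twice = frames * 2  # play, rewind, play again
print(lru_hits(playout_twice, capacity=6))  # buffer one frame too small
print(lru_hits(playout_twice, capacity=7))  # whole video fits
```

With the smaller buffer, every frame of the second playout has been replaced just before it is needed, so the cache gives no hits at all; a policy that knows the playout order would keep most of the frames.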

Page 47: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

“Classification” of Mechanisms

Block-level caching: consider (possibly unrelated) set of blocks
- each data element is viewed as an independent item
- usually used in “traditional” systems
- e.g., FIFO, LRU, CLOCK, …
- multimedia (video) approaches: Least/Most Relevant for Presentation (L/MRP), …

Stream-dependent caching: consider a stream object as a whole
- related data elements are treated in the same way
- research prototypes in multimedia systems
- e.g., BASIC, DISTANCE, Interval Caching (IC), Generalized Interval Caching (GIC), Split and Merge (SAM), SHR

Page 48: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Least/Most Relevant for Presentation (L/MRP) [Moser et al. 95]

L/MRP is a buffer management mechanism for a single interactive, continuous data stream
- adaptable to individual multimedia applications
- preloads units most relevant for presentation from disk
- replaces units least relevant for presentation
- client pull based architecture

[Figure: the client buffer requests Continuous Presentation Units (COPU), e.g., MJPEG video frames, from the server, which holds a homogeneous stream, e.g., MJPEG video]

Page 49: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Least/Most Relevant for Presentation (L/MRP) [Moser et al. 95]

Relevance values are calculated with respect to the current playout of the multimedia stream
- presentation point (current position in file)
- mode / speed (forward, backward, FF, FB, jump)
- relevance functions are configurable

[Figure: relevance value (0 to 1.0) versus COPU number (10 to 26; COPUs – continuous object presentation units) around the current presentation point, with the playback direction marked and COPUs classified as referenced, history or skipped]

Page 50: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Least/Most Relevant for Presentation (L/MRP) [Moser et al. 95]

Global relevance value
- each COPU can have more than one relevance value
  - bookmark sets (known interaction points)
  - several viewers (clients) of the same
- global relevance value = maximum relevance for each COPU

[Figure: relevance (0 to 1) of the loaded frames, COPUs 89 to 106, for two streams with current presentation points S1 and S2, showing the Bookmark-Set, Referenced-Set and History-Set and the resulting global relevance value]
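The global relevance rule (take the maximum over all relevance values of a COPU) can be sketched as follows. The relevance function here is a made-up linear example, since the slides only say that relevance functions are configurable; all names, window sizes and numbers are assumptions.

```python
def relevance(copu, presentation_point, ahead=10, behind=4):
    """Hypothetical relevance function for forward playout.

    COPUs at or just after the presentation point (referenced set) get a value
    that falls linearly from 1.0; recently shown COPUs (history set) get a
    smaller value; everything else gets 0.
    """
    distance = copu - presentation_point
    if 0 <= distance < ahead:
        return 1.0 - distance / ahead
    if -behind <= distance < 0:
        return 0.5 * (1.0 + distance / behind)
    return 0.0


def global_relevance(copu, presentation_points):
    """Maximum relevance of a COPU over all streams."""
    return max(relevance(copu, point) for point in presentation_points)


def replacement_victim(buffered, presentation_points):
    """The buffered COPU that is least relevant for presentation."""
    return min(buffered, key=lambda copu: global_relevance(copu, presentation_points))


points = [91, 100]  # presentation points of two viewers
print(global_relevance(100, points))
print(replacement_victim([89, 95, 100, 104], points))
```

Recomputing these values for every COPU in each round is what makes the scheme expensive to execute.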

Page 51: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Least/Most Relevant for Presentation (L/MRP)

L/MRP …
- … gives “few” disk accesses (compared to other schemes)
- … supports interactivity
- … supports prefetching
- … targeted for single streams (users)
- … expensive (!) to execute (calculate relevance values for all COPUs each round)

Variations:
- Q-L/MRP – extends L/MRP with multiple streams and changes prefetching mechanism (reduces overhead) [Halvorsen et al. 98]
- MPEG-L/MRP – gives different relevance values for different MPEG frames [Boll et al. 00]

Page 52: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Interval Caching (IC)

Interval caching (IC) is a caching strategy for streaming servers
caches data between requests for the same video stream, based on playout intervals between requests
following requests are thus served from the cache filled by the preceding stream
sort intervals on length; the buffer requirement is the data size of the interval
to maximize cache hit ratio (minimize disk accesses), the shortest intervals are cached first

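[Figure: streams and intervals per video clip. Video clip 1: streams S11, S12, S13 with intervals I11, I12. Video clip 2: streams S21, S22 with interval I21. Video clip 3: streams S31, S32, S33, S34 with intervals I31, I32, I33.]

Intervals sorted on length: I32, I33, I21, I11, I31, I12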
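The "shortest intervals first" rule can be sketched like this. Stream positions are given in blocks, and the function name, data layout and cache size are assumptions for illustration.

```python
def interval_caching(positions_by_clip, cache_blocks):
    """Pick the intervals to cache: shortest first, until the cache is full.

    positions_by_clip -- {clip: [playout position in blocks of each stream]}
    cache_blocks      -- cache size in blocks
    Returns [(clip, follower_position, interval_length)] for cached intervals.
    """
    intervals = []
    for clip, positions in positions_by_clip.items():
        ordered = sorted(positions, reverse=True)        # leading stream first
        for lead, follow in zip(ordered, ordered[1:]):
            intervals.append((lead - follow, clip, follow))
    intervals.sort()                                     # shortest interval first

    cached, used = [], 0
    for length, clip, follower in intervals:
        if used + length > cache_blocks:
            break            # every remaining interval is at least this long
        cached.append((clip, follower, length))
        used += length
    return cached
```

Each cached interval means that one following stream is served from memory instead of from disk, so many short intervals give more cache hits than a few long ones for the same buffer space.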

Page 53: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Generalized Interval Caching (GIC)

Interval caching (IC) does not work for short clips
a frequently accessed short clip will not be cached

GIC generalizes the IC strategy
manages intervals for long video objects as IC
for short clips, extends the interval definition:
keep track of a finished stream for a while after its termination
define the interval for a short stream as the distance between the new stream and the position the old stream would have had if it had been a longer video object
the cache requirement is, however, only the real requirement
cache the shortest intervals as in IC

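[Figure: Video clip 1 (short clip): finished stream S11 and new stream S12, with interval I11 and label C11. Video clip 2: streams S21 and S22 with interval I21.]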
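One possible reading of the extended interval definition, as a sketch: candidates are ranked by interval length (the gap between two consecutive streams of a clip), but a short clip is only charged for its real size. The tuple layout and the name `gic_select` are assumptions, not taken from the GIC literature.

```python
def gic_select(candidates, cache_blocks):
    """Rank by interval length, charge only the real cache requirement.

    candidates -- [(clip, gap, clip_length)], all in blocks; gap is the distance
                  between the new stream and the (possibly finished) old stream
    Returns [(clip, blocks_reserved)] for the intervals that fit in the cache.
    """
    ranked = sorted((gap, min(gap, length), clip) for clip, gap, length in candidates)
    chosen, used = [], 0
    for gap, need, clip in ranked:
        if used + need <= cache_blocks:   # need can be smaller than the interval
            chosen.append((clip, need))
            used += need
    return chosen
```

A clip of 5 blocks requested every 8 blocks has interval 8 but needs only 5 blocks of cache, so it can be cached ahead of a long video whose streams are 12 blocks apart.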

Page 54: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

LRU vs. L/MRP vs. IC Caching

What kind of caching strategy is best (VoD streaming)?
caching effect

[Figure: five streams (S1 to S5) on movie X, with intervals I1 to I4 between them, and the resulting memory contents per strategy (loaded page frames, global relevance values, wasted buffering)]

Memory (LRU): 4 streams from disk, 1 from cache
Memory (L/MRP): 4 streams from disk, 1 from cache
Memory (IC): 2 streams from disk, 3 from cache

Page 55: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

LRU vs. L/MRP vs. IC Caching

What kind of caching strategy is best (VoD streaming)?
caching effect (IC best)
CPU requirement

LRU:
for each I/O request
    reorder LRU chain

L/MRP:
for each I/O request
    for each COPU
        RV = 0
        for each stream
            tmp = r ( COPU, p, mode )
            RV = max ( RV, tmp )

IC:
for each block consumed
    if last part of interval
        release memory element
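The per-request work listed for LRU can be sketched with an ordered dictionary standing in for the LRU chain. The class and method names are hypothetical.

```python
from collections import OrderedDict


class LRUChain:
    """Block cache where every I/O request reorders the LRU chain."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.chain = OrderedDict()      # oldest block first, newest block last

    def access(self, block):
        """Return True on a cache hit, False if the block must come from disk."""
        if block in self.chain:
            self.chain.move_to_end(block)       # reorder: block is now most recent
            return True
        if len(self.chain) >= self.capacity:
            self.chain.popitem(last=False)      # evict the least recently used block
        self.chain[block] = True
        return False
```

The cost per request is constant, in contrast to the nested loops over all COPUs and all streams in the L/MRP pseudocode.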

Page 56: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

In-Memory Copy Operations

Page 57: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

In-Memory Copy Operations

[Figure: data path from disk through the file system, the application and the communication system to the network card; two steps on the path are marked "expensive"]

Page 58: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Cost of Data Transfers

Data copy operations are expensive
consume CPU, memory, hub, bus and interface resources (proportional to size)
profiling shows that ~40% of CPU time is consumed by copying data
the speed gap between memory and CPU increases
different access times to different banks

System calls make a lot of switches between user and kernel space
~450 ns in 2000 on 933 MHz Pentium III
~920 ns in 2005 on 1.7 GHz Pentium IV

[Figure: memcpy() on 1.7 GHz Pentium IV]

Page 59: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Cost of Data Transfers

THUS: data movement costs should be kept small
careful management of contiguous media data
avoid unnecessary physical copy operations
apply appropriate buffer management schemes

reduce overhead by removing physical in-memory copy operations, i.e., ZERO-COPY data paths

Page 60: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Basic Idea of Zero–Copy Data Paths

[Figure: application in user space; file system and communication system in kernel space, each labelled with a data_pointer; bus(es) below]

Page 61: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Zero–Copy (Streaming) Mechanisms

Linux: sendfile()
between two descriptors (file and TCP socket)
bi-directional: disk-network and network-disk
need TCP_CORK

AIX: send_file()
only TCP
uni-directional: disk-network

INSTANCE (MMBUF-based, in NetBSD v1.5): by UniK/IFI (2000)
uni-directional: disk-network (network-disk ongoing work)
stream_read() and stream_send() (zero-copy 1)
stream_rdsnd() (zero-copy 2)

splice(), stream(), IO-Lite, MMBUF, …

Kernel streaming using zero-copy

Application streaming using zero-copy

Page 62: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

INSTANCE Zero–Copy Transfer Rate

Throughput increase of ~2.7 times per stream (can at least double the number of streams)

Zero-copy transfer rate limited by network card and storage system
saturated a 1 Gbps NIC and 32-bit, 33 MHz PCI
reduced processing time by approximately 50 %
huge improvement in number of concurrent streams

[Figure: transfer rate for "read, write, with copy", "read, write, no copy" and "read, automatic write, no copy"; marked levels at approx. 6 Mbps and approx. 12 Mbps]

Page 63: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Existing Linux Data Paths

A lot of research has been performed in this area! BUT, what is the status today of commodity operating systems?

Page 64: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Content Download

[Figure: application in user space; file system and communication system in kernel space; bus(es) below]

Page 65: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Content Download: read / send

[Figure: DMA transfer into the page cache (kernel); read copies the data into the application buffer; send copies it into the socket buffer (kernel); DMA transfer to the network card]

2n copy operations
2n system calls
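A sketch of this data path using Python's wrappers around read() and send(). The function name and the chunk size are assumptions; the copies happen inside the kernel calls and are only marked in the comments.

```python
import socket


def read_send(path, sock, chunk=4096):
    """Send a file with read()/send(); returns (bytes_sent, chunks)."""
    sent = chunks = 0
    with open(path, "rb", buffering=0) as f:    # unbuffered: read() goes to the kernel
        while True:
            data = f.read(chunk)    # copy 1: page cache -> application buffer
            if not data:
                break
            sock.sendall(data)      # copy 2: application buffer -> socket buffer
            sent += len(data)
            chunks += 1
    return sent, chunks
```

For a file split into n chunks, the loop performs n reads and at least n sends, which matches the 2n copy operations on the slide.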

Page 66: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Content Download: mmap / send

[Figure: DMA transfer into the page cache (kernel); mmap makes the page cache visible to the application; send copies the data from the page cache into the socket buffer; DMA transfer to the network card]

n copy operations
1 + n system calls
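The same transfer with mmap, as a sketch under the same assumptions (hypothetical function name and chunk size). The file is mapped once and each send reads directly from the mapping, so the application-side copy disappears.

```python
import mmap
import socket


def mmap_send(path, sock, chunk=4096):
    """Send a file from a read-only memory mapping; returns bytes sent."""
    sent = 0
    with open(path, "rb") as f:
        # One mmap call for the whole file (raises ValueError for an empty file).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            with memoryview(mm) as view:        # slices of a view do not copy
                for off in range(0, len(mm), chunk):
                    part = view[off:off + chunk]
                    sock.sendall(part)          # one copy: page cache -> socket buffer
                    sent += len(part)
                    part.release()
    return sent
```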

Page 67: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Content Download: sendfile

[Figure: DMA transfer into the page cache (kernel); sendfile appends a descriptor to the socket buffer; gather DMA transfer to the network card]

0 copy operations
1 system call
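A sketch of the sendfile path through Python's `os.sendfile()` wrapper. The function name is an assumption, and the loop is only there because a single call may transfer less than requested.

```python
import os
import socket


def sendfile_send(path, sock):
    """Send a whole file with sendfile(); returns bytes sent."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # The kernel moves data from the page cache to the socket;
            # nothing is copied through user space.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break
            sent += n
    return sent
```

In the common case the whole file goes out with one call, which corresponds to the single system call counted on the slide.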

Page 68: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Content Download: Results

Tested transfer of 1 GB file on Linux 2.6
Both UDP (with enhancements) and TCP

[Figure: results for UDP and TCP]

Page 69: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Streaming

[Figure: application in user space; file system and communication system in kernel space; bus(es) below]

Page 70: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Streaming: mmap / send

[Figure: DMA transfer into the page cache (kernel); mmap once, then per packet: cork, send, send, uncork; two copies into the socket buffer, one from the application buffer and one from the page cache; DMA transfer to the network card]

2n copy operations
1 + 4n system calls

Page 71: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Streaming: mmap / writev

[Figure: DMA transfer into the page cache (kernel); mmap once, then one writev per packet; two copies into the socket buffer, one from the application buffer and one from the page cache; DMA transfer to the network card]

2n copy operations
1 + n system calls

three fewer calls per packet than the previous solution
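A sketch of the gather write: one `os.writev()` call per packet hands the kernel an RTP header from the application buffer and a payload slice from the mmap'ed file. The 12-byte header layout follows the fixed RTP header; the SSRC, payload type, payload size and timestamp step are placeholder assumptions, and the socket is assumed to be a connected datagram socket.

```python
import mmap
import os
import socket
import struct


def rtp_header(seq, timestamp, ssrc=0x1234, payload_type=96):
    """12-byte fixed RTP header: version 2, no padding, extension, CSRC or marker."""
    return struct.pack("!BBHII", 0x80, payload_type & 0x7F,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)


def mmap_writev_stream(path, sock, payload=1024):
    """Send a file as header+payload packets, one writev per packet."""
    packets = 0
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        with memoryview(mm) as view:
            for seq, off in enumerate(range(0, len(mm), payload)):
                hdr = rtp_header(seq, timestamp=seq * 3000)   # assumed 90 kHz, 30 fps
                part = view[off:off + payload]
                os.writev(sock.fileno(), [hdr, part])   # header and data in one call
                part.release()
                packets += 1
    return packets
```

Both the header and the payload are still copied into the socket buffer, so the saving compared to cork, send, send, uncork is in system calls, not in copies.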

Page 72: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Streaming: sendfile

[Figure: DMA transfer into the page cache (kernel); per packet: cork, send, sendfile, uncork; send copies from the application buffer into the socket buffer, sendfile appends a descriptor; gather DMA transfer to the network card]

n copy operations
4n system calls

Page 73: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Streaming: Results

Tested streaming of 1 GB file on Linux 2.6
RTP over UDP

[Figure: streaming results, with TCP sendfile (content download) shown for comparison]

Compared to not sending an RTP header over UDP, we get an increase of 29% (additional send call)
More copy operations and system calls required: potential for improvements

Page 74: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Enhanced Streaming Data Paths

Page 75: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Enhanced Streaming: mmap / msend

msend allows sending data from an mmap’ed file without a copy

[Figure: DMA transfer into the page cache (kernel); mmap once, then per packet: cork, send, msend, uncork; send copies from the application buffer into the socket buffer, msend appends a descriptor; gather DMA transfer to the network card]

n copy operations
1 + 4n system calls

previous solution requires one more copy per packet

Page 76: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Enhanced Streaming: mmap / rtpmsend

RTP header copy integrated into the msend system call

[Figure: DMA transfer into the page cache (kernel); mmap once, then one rtpmsend per packet, replacing cork, send, msend, uncork; copy from the application buffer into the socket buffer, descriptor appended; gather DMA transfer to the network card]

n copy operations
1 + n system calls

previous solution requires three more calls per packet

Page 77: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Enhanced Streaming: mmap / krtpmsend

An RTP engine in the kernel adds RTP headers

[Figure: DMA transfer into the page cache (kernel); a single krtpmsend call replaces the per-packet rtpmsend calls; the RTP engine works in the kernel, descriptors are appended to the socket buffer; gather DMA transfer to the network card]

0 copy operations
1 system call

previous solution requires one more call per packet
previous solution requires one more copy per packet

Page 78: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Enhanced Streaming: rtpsendfile

RTP header copy integrated into the sendfile system call

[Figure: DMA transfer into the page cache (kernel); one rtpsendfile per packet, replacing cork, send, sendfile, uncork; copy from the application buffer into the socket buffer, descriptor appended; gather DMA transfer to the network card]

n copy operations
n system calls

existing solution requires three more calls per packet

Page 79: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Enhanced Streaming: krtpsendfile

An RTP engine in the kernel adds RTP headers

[Figure: DMA transfer into the page cache (kernel); a single krtpsendfile call replaces the per-packet rtpsendfile calls; the RTP engine works in the kernel, descriptors are appended to the socket buffer; gather DMA transfer to the network card]

0 copy operations
1 system call

previous solution requires one more call per packet
previous solution requires one more copy per packet

Page 80: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Enhanced Streaming: Results

Tested streaming of 1 GB file on Linux 2.6
RTP over UDP

[Figure: results for TCP sendfile (content download), the existing mechanism (streaming), the mmap based mechanisms and the sendfile based mechanisms; annotated improvements of ~27% and ~25%]

Page 81: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

The End: Summary

Page 82: Server Resources 12/9 - 2005 INF5070 – Media Storage and Distribution Systems:

Summary

All resources need to be scheduled

Scheduling algorithms for multimedia tasks have to…
… consider real-time requirements
… provide good resource utilization
(… be implementable)

Memory management is an important issue
caching
copying is expensive

Rule of thumb: watch out for bottlenecks
copying
data touching operations
frequent context switches (system calls)
scheduling of slow devices (disk)
...