unicorn@cdt.luth.se 1998-09-16 “Large Scale Audio Distribution on the Internet” A technical perspective by Kåre Synnes.


“Large Scale Audio Distribution on the Internet”

A technical perspective by Kåre Synnes


• Born 1969 in Sollefteå, Sweden

• Books, games, sports, food, film, music, company

• Engaged to Maggie


Large Scale Audio Distribution on the Internet

• Techniques for Packet-Loss Repair of Audio Streams

• Layering of Audio Data

• Adaptive Audio Applications


Large Scale Audio Distribution on the Internet

Large Scale = Many receivers

Audio = Prioritized temporal data

Distribution = One-to-Many

Internet = Best-effort (lossy)


Issues at hand

• Distribution needs to be scalable for very large groups - multicast RTP/UDP/IP

• Best-effort IP transport results in:

– delay (~400 ms acceptable)

– delay variation (buffering)

– loss (congestion, jitter, overload, delay variation)


IP Multicast

What’s HOT!

• Minimum traffic load

• Scalable...

• Effective protocols (RTP/UDP/IP)

• Cheap, no special network equipment needed (i.e. MTUs)

What’s NOT!

• By default turned off

• Complex distribution tree management

• No back-off for UDP at congestion

• Lossy

• Few applications


Loss - a generalization

Low loss

• Single packets are lost

• Losses are 'almost' evenly distributed

Medium and High loss

• Packets are lost in twos or threes

• Losses are 'clustered'

Also, given a large group:

• Most receivers will have 2-5% loss

• A small number of receivers will have greater loss

• Each packet is assumed to be lost at least once
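
The assumption that every packet is lost by someone can be checked with a rough calculation. The sketch below assumes, purely for illustration, that every receiver loses packets independently and with the same probability; real multicast losses are correlated along shared links. The class and method names are hypothetical.

```java
// Sketch: probability that a given packet is lost by at least one receiver,
// ASSUMING independent losses with the same loss rate at every receiver.
final class GroupLoss {
    /** Returns 1 - (1 - lossRate)^receivers. */
    static double lostBySomeone(double lossRate, int receivers) {
        return 1.0 - Math.pow(1.0 - lossRate, receivers);
    }

    public static void main(String[] args) {
        // 2% loss per receiver (the low end of the 2-5% range), growing group size
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("%d receivers: %.3f%n", n, lostBySomeone(0.02, n));
        }
    }
}
```

Under these assumptions the probability is already about 87% for 100 receivers at 2% loss, and practically 100% for 1000 receivers.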


Techniques for Packet-Loss Repair of Audio Streams

Receiver-Only Repairs

• Silence Substitution

• Waveform Substitution

– White Noise

– Repetition

– (Predictive) Interpolation

Sender Initiated Repairs

• Piggy-backed Redundancy

• Forward Error Correction

• Parallel Redundancy

Receiver Initiated Repairs

• Semi-Reliable Transmissions


Silence Substitution

• Very simple to implement

• Adequate performance for:

– small packets (<32 ms)

– low loss (<1%)

• Otherwise not very good (clipping)


White Noise

• Also very simple to implement

• Better than Silence Substitution

• Subconscious repairs

– The listener mentally fills in a gap that contains noise, but not one that contains silence

• Tolerance of 5-10% loss


Self-similarity

• Speech waveforms often exhibit a degree of self-similarity.

• Generation of a replacement packet with similar spectral qualities is possible.

• Clips shorter than 30 ms are recommended (phonemes).


Repetition

• Again, very simple to implement

• Significantly improves audio quality at 5-15% loss

• Bad effects if overdone (echo/reverberation)

• Reducing the amplitude (gain) of the repeated clip is good

• Experience: 50% decrease for at most 2 consecutive 40 ms clips


(Predictive) Interpolation

• Interpolation can be done in two ways:

– Use the two surrounding clips (additional delay)

– Use two or more earlier clips (less accurate)

• Not so common due to complexity

• Gives better results than Repetition
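
As an illustration of the first variant (two surrounding clips), here is a minimal sketch that fills a lost clip with a sample-wise cross-fade from the preceding clip to the following one. This is one simple way to interpolate, not necessarily what any particular tool does; the class name, the 16-bit sample format and the linear weighting are assumptions.

```java
// Sketch: repair one lost clip by cross-fading between its two neighbours.
// Needs the NEXT clip, hence the additional delay mentioned above.
final class CrossFadeRepair {
    /** previous and next are the clips around the lost one; equal length is assumed. */
    static short[] interpolate(short[] previous, short[] next) {
        if (previous.length != next.length) {
            throw new IllegalArgumentException("clips must have equal length");
        }
        int n = previous.length;
        short[] repaired = new short[n];
        for (int i = 0; i < n; i++) {
            double w = (i + 1) / (double) (n + 1); // weight of 'next' grows over the clip
            repaired[i] = (short) Math.round((1.0 - w) * previous[i] + w * next[i]);
        }
        return repaired;
    }

    public static void main(String[] args) {
        short[] repaired = interpolate(new short[] {100, 100, 100}, new short[] {200, 200, 200});
        System.out.println(java.util.Arrays.toString(repaired));
    }
}
```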


Interleaving

• Spread the effect of a lost packet over several packets, leaving smaller losses to repair

• Phonemes are ~20 ms

• Additional delay

• No extra BW cost

• Uncertain of the results (intelligibility)
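
A minimal sketch of block interleaving, to make the idea concrete. The audio is assumed to be cut into small units that are numbered in play-out order; the unit size, the class name and the layout rule are illustrative assumptions.

```java
import java.util.Arrays;

// Sketch: block interleaving of audio units across packets.
final class Interleaver {
    /** Packet p carries units p, p + packets, p + 2 * packets, ... (unit indices). */
    static int[][] interleave(int units, int packets) {
        if (packets <= 0 || units % packets != 0) {
            throw new IllegalArgumentException("units must be a multiple of packets");
        }
        int perPacket = units / packets;
        int[][] layout = new int[packets][perPacket];
        for (int p = 0; p < packets; p++) {
            for (int k = 0; k < perPacket; k++) {
                layout[p][k] = p + k * packets;
            }
        }
        return layout;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(interleave(16, 4)));
    }
}
```

With 16 units in 4 packets, losing one packet leaves four isolated one-unit gaps instead of one gap of four consecutive units. The receiver has to collect all four packets before play-out, which is the additional delay.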


Audio Formats

Name    kbps    Load    Comment
PCM     64      1       G.711 µ-law
ADPCM   16-48   12      G.721/G.723/DVI
GSM     13      1200
LPC     4.8     110     One GSM step

Several new codecs have been developed:

• proprietary

• down to 1.2 kbps!


Redundancy

• Synthetic low-quality, low bit-rate encodings can be used as redundant repairs.

• LPC is considered to retain ~60% of a speech signal, while preserving the frequency spectrum.

• GSM is even better, but at more than double the bit rate (13 vs. 4.8 kbps).

• Multiple redundancy is also an option.


Piggy-backed Redundancy

• High tolerance of loss (25-40%).

• A single level of redundancy, with PCM (64 kbps) as the main encoding and GSM (13 kbps) as the redundant one, is common.

• The degree of loss determines the optimal delay.

• Receivers that cannot handle redundancy may be able to skip the redundant encoding(s).
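
The piggy-backing idea can be sketched as follows: packet n carries the main encoding of frame n plus a low bit-rate copy of frame n-1, so a single lost packet can be repaired from its successor. Payloads are plain strings here ("hi:n" for the main encoding, "lo:n" for the redundant one); the class, the field names and the fixed offset of one packet are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: single piggy-backed redundancy with an offset of one packet.
final class PiggybackDemo {
    static final class Packet {
        final int seq;
        final String primary;   // main encoding of frame seq
        final String redundant; // low-rate copy of frame seq - 1, or null for the first packet
        Packet(int seq, String primary, String redundant) {
            this.seq = seq;
            this.primary = primary;
            this.redundant = redundant;
        }
    }

    static List<Packet> packetize(int frames) {
        List<Packet> out = new ArrayList<>();
        for (int n = 0; n < frames; n++) {
            out.add(new Packet(n, "hi:" + n, n > 0 ? "lo:" + (n - 1) : null));
        }
        return out;
    }

    /** Returns what can be played per frame; null marks a gap that could not be repaired. */
    static String[] playout(List<Packet> arrived, int frames) {
        String[] play = new String[frames];
        for (Packet p : arrived) {
            play[p.seq] = p.primary; // the main encoding always wins
            if (p.redundant != null && play[p.seq - 1] == null) {
                play[p.seq - 1] = p.redundant; // fill the gap from the piggy-backed copy
            }
        }
        return play;
    }

    public static void main(String[] args) {
        List<Packet> sent = packetize(5);
        sent.remove(2); // packet 2 is lost in the network
        System.out.println(String.join(" ", playout(sent, 5)));
    }
}
```

With an offset of one packet, two consecutive losses leave the first of the two frames unrepaired, which is why clustered loss calls for a larger offset and therefore more delay.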


Forward Error Correction

• Redundancy is added with XOR methods

• 50% extra overhead in the example, but the redundancy can be recoded

• Other options are possible as well, e.g.:

1. a, f(a,b), b, f(b,c), c, ...

2. a, b, c, x(a,b,c), d, e, f, x(d,e,f), ...

3. a, b, c, x(a,c), d, x(b,d), e, x(c,e), ...

4. x(a,b), x(b,c), x(a,b,c), ...

• Better than simple redundancy, but more CPU expensive
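
Option 2 (a, b, c, x(a,b,c), ...) can be sketched as below, assuming equally sized packets and at most one lost packet per group of three. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Sketch: XOR parity over a group of packets, as in option 2.
final class XorParity {
    /** XOR of equally sized packets, e.g. x(a,b,c). */
    static byte[] xor(byte[]... packets) {
        byte[] parity = new byte[packets[0].length];
        for (byte[] p : packets) {
            if (p.length != parity.length) {
                throw new IllegalArgumentException("equal packet sizes assumed");
            }
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= p[i];
            }
        }
        return parity;
    }

    /** Rebuilds the single missing packet of a group from the parity and the survivors. */
    static byte[] recover(byte[] parity, byte[]... survivors) {
        byte[][] all = new byte[survivors.length + 1][];
        all[0] = parity;
        System.arraycopy(survivors, 0, all, 1, survivors.length);
        return xor(all);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3}, b = {4, 5, 6}, c = {7, 8, 9};
        byte[] parity = xor(a, b, c);
        // b is lost; a, c and the parity packet arrive
        System.out.println(Arrays.toString(recover(parity, a, c)));
    }
}
```

This layout sends one parity packet per three data packets, about a third of extra bandwidth, and the receiver must wait for the end of the group before it can repair.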


Parallel Redundancy

The idea is to use several channels.

• Division of the bandwidth need

– Main transmission in one channel

– Redundancy over another channel

• Can be applied to any scheme

• Receivers can decide how much redundancy, or even which encoding, they prefer

• Additional overhead (headers)
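
The header overhead can be estimated with a small calculation. The sketch assumes 40 ms packets and the fixed header sizes of IPv4 (20 bytes), UDP (8 bytes) and RTP (12 bytes) without options or extensions; the class and method names are hypothetical.

```java
// Sketch: header cost of sending an encoding in its own RTP/UDP/IP channel.
final class ChannelOverhead {
    static final int HEADER_BYTES = 20 + 8 + 12; // IPv4 + UDP + RTP fixed headers

    /** Payload bytes in one packet of the given duration at the given bit rate. */
    static double payloadBytes(double kbps, int packetMs) {
        return kbps * 1000.0 / 8.0 * packetMs / 1000.0;
    }

    /** Header bytes as a fraction of the payload bytes. */
    static double headerOverhead(double kbps, int packetMs) {
        return HEADER_BYTES / payloadBytes(kbps, packetMs);
    }

    public static void main(String[] args) {
        System.out.printf("PCM 64 kbps: %.1f%%%n", 100 * headerOverhead(64, 40));
        System.out.printf("GSM 13 kbps: %.1f%%%n", 100 * headerOverhead(13, 40));
    }
}
```

Under these assumptions a separate 13 kbps GSM channel pays roughly 60% in headers on top of its 65-byte payload, whereas piggy-backing the same data shares the headers of the main packet.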


Semi-Reliable Transmissions

1. The sender transmits a packet

2. A receiver sends a NACK if it is lost

3. The sender retransmits the packet, if it is still in the queue

• A time-limited repair is achieved

• Protocols such as SRRTP can be used.

• This can be used for small groups on networks with low delay

• Otherwise, other redundancy schemes are preferable
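
The sender side of the three steps can be sketched as a retransmission queue with an age limit. This is not SRRTP itself; the class, the method names and the explicit clock parameter are assumptions made so that the example stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: sender-side queue for time-limited, NACK-driven repair.
final class SemiReliableSender {
    private static final class Sent {
        final byte[] payload;
        final long sentAtMs;
        Sent(byte[] payload, long sentAtMs) {
            this.payload = payload;
            this.sentAtMs = sentAtMs;
        }
    }

    private final long maxAgeMs;
    private final Map<Integer, Sent> queue = new LinkedHashMap<>();

    SemiReliableSender(long maxAgeMs) {
        this.maxAgeMs = maxAgeMs;
    }

    /** Step 1: transmit (not shown) and remember the packet for a possible repair. */
    void send(int seq, byte[] payload, long nowMs) {
        expire(nowMs);
        queue.put(seq, new Sent(payload, nowMs));
    }

    /** Step 3: on a NACK, return the payload to retransmit, or null if it has expired. */
    byte[] onNack(int seq, long nowMs) {
        expire(nowMs);
        Sent s = queue.get(seq);
        return s == null ? null : s.payload;
    }

    private void expire(long nowMs) {
        queue.values().removeIf(s -> nowMs - s.sentAtMs > maxAgeMs);
    }

    public static void main(String[] args) {
        SemiReliableSender sender = new SemiReliableSender(200);
        sender.send(1, new byte[] {42}, 0);
        System.out.println(sender.onNack(1, 100) != null); // still in the queue
        System.out.println(sender.onNack(1, 300) != null); // too old to be useful
    }
}
```

The age limit should not exceed the receivers' play-out buffer: a retransmission that arrives after its play-out time only adds load.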


mAudio


mAudio Recovery

    int cnt = 0; // Number of consecutive lost packets

    // Returns the audio to play out for packet n
    byte[] read() {
        if (received(n)) { // main or redundant packet
            decreaseBuffer(); // adaptive buffering
            cnt = 0;
            return recode(n);
        }
        // Packet n is lost!
        increaseBuffer();
        cnt++;
        if (cnt == 1) // Repeat with 50% amplitude
            return amplify(n - 1, 0.5);
        if (cnt == 2) // Repeat with 25% amplitude
            return amplify(n - 2, 0.25);
        if (cnt < 10) // Feed noise with correct amplitude
            return noise(n - cnt);
        return silence; // Feed silence
    }
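
The recovery routine above relies on helpers that are not shown. The following self-contained sketch reproduces the same ladder (repeat at 50%, repeat at 25%, noise, silence) on 16-bit samples. The helper bodies are assumptions, in particular the noise level (a quarter of the last good clip's peak), and the adaptive buffering is left out.

```java
import java.util.Random;

// Sketch of the recovery ladder on 16-bit samples; helper bodies are assumptions.
final class RecoveryLadder {
    private final Random rng = new Random(1);
    private short[] lastGood = new short[0]; // most recently received clip
    private int lost = 0;                    // number of consecutive lost packets

    /** clip is the received packet, or null if the packet is lost. */
    short[] read(short[] clip) {
        if (clip != null) {
            lost = 0;
            lastGood = clip;
            return clip;
        }
        lost++;
        if (lost == 1) return amplify(lastGood, 0.5);  // repeat with 50% amplitude
        if (lost == 2) return amplify(lastGood, 0.25); // repeat with 25% amplitude
        if (lost < 10) return noise(lastGood);         // feed noise
        return new short[lastGood.length];             // feed silence
    }

    static short[] amplify(short[] clip, double gain) {
        short[] out = new short[clip.length];
        for (int i = 0; i < clip.length; i++) {
            out[i] = (short) Math.round(clip[i] * gain);
        }
        return out;
    }

    /** Uniform noise bounded by a quarter of the reference clip's peak (assumed level). */
    short[] noise(short[] reference) {
        int peak = 0;
        for (short s : reference) peak = Math.max(peak, Math.abs(s));
        int bound = peak / 4;
        short[] out = new short[reference.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = (short) (rng.nextInt(2 * bound + 1) - bound);
        }
        return out;
    }

    public static void main(String[] args) {
        RecoveryLadder ladder = new RecoveryLadder();
        ladder.read(new short[] {1000, -1000});
        System.out.println(java.util.Arrays.toString(ladder.read(null)));
    }
}
```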


Layered Encodings

• Allows the receivers to adapt to network conditions

– Main parts are sent over one channel

– Additional parts over other channels

• Example, 6 layers:

– 50%, 25%, 12%, 6%, 4%, 3%

• Can be CPU expensive

• This is tricky for audio, simpler for video
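
A receiver-side sketch of the adaptation, using the six shares from the example: the receiver subscribes to as many leading layers as fit in its available bandwidth. The total bit rate passed in, the class name and the greedy rule are assumptions.

```java
// Sketch: how many layers a receiver can subscribe to, given its bandwidth.
final class LayerSelector {
    /** Share of the total bit rate per layer, most important first (from the example). */
    static final int[] SHARES_PERCENT = {50, 25, 12, 6, 4, 3};

    /** Number of leading layers whose combined bit rate fits in availableKbps. */
    static int layersThatFit(double totalKbps, double availableKbps) {
        double used = 0.0;
        int layers = 0;
        for (int share : SHARES_PERCENT) {
            double layerKbps = totalKbps * share / 100.0;
            if (used + layerKbps > availableKbps + 1e-9) break; // tolerance for rounding
            used += layerKbps;
            layers++;
        }
        return layers;
    }

    public static void main(String[] args) {
        // hypothetical 64 kbps stream, receiver with 50 kbps to spare
        System.out.println(layersThatFit(64, 50));
    }
}
```

For a 64 kbps stream and 50 kbps available, the first two layers (32 + 16 kbps) fit and the third (7.68 kbps) does not.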


Simple Layering

[Figure: amplitude (dB) over time (ms); audio sampled at 32 kHz is divided into 8 kHz layers, which merge into 8, 16, 24 or 32 kHz]

• Audio artifacts when the layers are only merged (frequency overtones)

– ‘tin can’ sound

– reverberation

• Filtering needed
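
A minimal sketch of this kind of layering: a 32 kHz sample stream is dealt out round-robin into four 8 kHz layers, and the receiver merges whatever layers it got. Missing samples are filled by holding the previous value, which is an assumption chosen for brevity; the staircase it produces is one source of the overtones mentioned above, hence the need for filtering.

```java
import java.util.Arrays;

// Sketch: simple layering by dealing samples round-robin into layers.
final class SampleLayering {
    /** Layer j receives every sample i with i % layers == j. */
    static short[][] split(short[] samples, int layers) {
        if (samples.length % layers != 0) {
            throw new IllegalArgumentException("length must be a multiple of layers");
        }
        short[][] out = new short[layers][samples.length / layers];
        for (int i = 0; i < samples.length; i++) {
            out[i % layers][i / layers] = samples[i];
        }
        return out;
    }

    /** Merges the received layers; a null entry marks a layer that was not received. */
    static short[] merge(short[][] layers, int length) {
        short[] out = new short[length];
        short held = 0;
        for (int i = 0; i < length; i++) {
            short[] layer = layers[i % layers.length];
            if (layer != null) held = layer[i / layers.length];
            out[i] = held; // sample-and-hold where a layer is missing
        }
        return out;
    }

    public static void main(String[] args) {
        short[][] layers = split(new short[] {0, 1, 2, 3, 4, 5, 6, 7}, 4);
        // only the base layer arrives
        System.out.println(Arrays.toString(merge(new short[][] {layers[0], null, null, null}, 8)));
    }
}
```

Receiving layers 0 and 2 gives evenly spaced samples at 16 kHz; all four layers reproduce the original 32 kHz stream.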


Wavelet Encoding

[Figure: amplitude (dB) over frequency (Hz), with bands marked at 8, 16, 24 and 32 and the speech range indicated]

• Transform the data to the frequency domain, and divide it there

• Computationally difficult (expensive)

• Longer delays due to buffering

• Very good division
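
To show the principle in a few lines, the sketch below uses the simplest wavelet, one step of an unnormalised Haar transform, to split a signal into a low band (base layer) and a high band (enhancement layer). The deck does not say which wavelet was intended; the choice of Haar and all names are assumptions.

```java
import java.util.Arrays;

// Sketch: one Haar step as a two-layer split (low band + high band).
final class HaarLayers {
    /** Returns {low, high}: pairwise averages and pairwise half-differences. */
    static double[][] analyze(double[] x) {
        if (x.length % 2 != 0) {
            throw new IllegalArgumentException("even length assumed");
        }
        int half = x.length / 2;
        double[] low = new double[half];
        double[] high = new double[half];
        for (int i = 0; i < half; i++) {
            low[i] = (x[2 * i] + x[2 * i + 1]) / 2.0;
            high[i] = (x[2 * i] - x[2 * i + 1]) / 2.0;
        }
        return new double[][] {low, high};
    }

    /** Inverse step; pass high == null if only the base layer was received. */
    static double[] synthesize(double[] low, double[] high) {
        double[] x = new double[2 * low.length];
        for (int i = 0; i < low.length; i++) {
            double d = high == null ? 0.0 : high[i];
            x[2 * i] = low[i] + d;
            x[2 * i + 1] = low[i] - d;
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] bands = analyze(new double[] {4, 2, 6, 8});
        System.out.println(Arrays.toString(synthesize(bands[0], null)));     // base layer only
        System.out.println(Arrays.toString(synthesize(bands[0], bands[1]))); // both layers
    }
}
```

Applying the step again to the low band yields further layers. The transform works on blocks of samples, which is where the extra buffering delay comes from.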


Adaptive Audio Applications

• How can we support heterogeneous environments?

– Network: 56k modem, ISDN, xDSL, Ethernet

– Load: congestion, hardware jitter, delay variation

– Client: Mobile phone, PDA, NC, PC, Workstation

• Allow scaling of Quality

• Do NOT use a least common denominator!

• Senders should adapt slowly while receivers adapt more rapidly, i.e. highly adaptive clients


RTP/RTCP Receiver Reports

The receivers report on:

• Loss rate (long-term congestion)

• Delay variation (short-term congestion)

• Throughput

• Additional (Load, Encoding etc.)

Can be used to change:

• Encoding

• Redundancy

• Layering

How do we do this for many receivers? Voting?
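
One possible answer to the open question, sketched under stated assumptions: the sender looks at the loss rate that a chosen percentage of receivers stays below, so that a few very lossy receivers do not dictate the settings for everyone. The percentile rule, the thresholds (5% and 25%) and all names are hypothetical and not taken from the deck.

```java
import java.util.Arrays;

// Sketch: aggregate reported loss rates and pick a repair level for the group.
final class ReportAggregator {
    enum Repair { RECEIVER_ONLY, SINGLE_REDUNDANCY, MULTIPLE_REDUNDANCY }

    /** Loss rate (0..1) that the given percentage (1..100) of receivers does not exceed. */
    static double lossAtPercentile(double[] reportedLoss, int percent) {
        if (reportedLoss.length == 0) {
            throw new IllegalArgumentException("no receiver reports");
        }
        double[] sorted = reportedLoss.clone();
        Arrays.sort(sorted);
        int rank = (sorted.length * percent + 99) / 100; // ceiling, 1-based
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Hypothetical thresholds: below 5% rely on the receiver, below 25% add one redundant copy. */
    static Repair choose(double[] reportedLoss, int percent) {
        double loss = lossAtPercentile(reportedLoss, percent);
        if (loss < 0.05) return Repair.RECEIVER_ONLY;
        if (loss < 0.25) return Repair.SINGLE_REDUNDANCY;
        return Repair.MULTIPLE_REDUNDANCY;
    }

    public static void main(String[] args) {
        double[] loss = {0.01, 0.02, 0.03, 0.02, 0.50}; // one receiver behind a congested link
        System.out.println(choose(loss, 80));
        System.out.println(choose(loss, 100));
    }
}
```

Serving the 80th percentile here keeps the stream lean for the majority, while the one receiver with 50% loss would have to help itself, for instance by joining a separate redundancy channel.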


Summary

• Receiver-only techniques are good for low loss and small packets

• Loss rates of up to 40% can be repaired intelligibly using redundancy schemes

• There is a trade-off between delay and buffering, which affects response times

• Much can be done to enhance audio quality


Questions?

E-mail: unicorn@cdt.luth.se

http://www.cdt.luth.se/~unicorn/


Future Work

• Use real network statistics to model loss, while studying receiver report effects

• Try different combinations of recovery, to achieve optimal adaptation

• Measure gain (intelligibility) vs. cost (net and CPU load)
