Multimedia Services: Audio Sep-2015 Dani Gutiérrez Porset Associate Professor Communications Engineering Eman ta zabal zazu
Multimedia Services: Audio

Apr 16, 2017

Page 1: Multimedia Services: Audio

Multimedia Services: Audio

Sep-2015

Dani Gutiérrez Porset, Associate Professor

Communications Engineering

Eman ta zabal zazu

Page 2: Multimedia Services: Audio

Thanks, Licences and Tools

● Thanks to the people and organizations who have taken or take part in free software and free knowledge projects, especially the Wikimedia Foundation and KDE
● This presentation is licensed under CC BY-SA 3.0 ES: http://creativecommons.org/licenses/by-sa/3.0/es/
● This presentation was made with KDE, LibreOffice, Inkscape, Gimp, Chromium and Firefox

Page 3: Multimedia Services: Audio

Sources and References

● Images from the Wikimedia Foundation unless another source is referenced. Logos and trademarks belong to their respective organizations
● Texts:
  – Wikipedia pages and referenced articles and material
  – “Guide to Voice and Video over IP” - Sun, Mkwawa, Jammeh, Ifeachor
  – “Video over IP” - Wes Simpson
  – “Computer Networking, a top-down approach” - Kurose, Ross

Page 4: Multimedia Services: Audio

Index

● Introduction
● Codecs
● Speech
● Files, Containers and Formats
● Audio wires and connectors

Page 5: Multimedia Services: Audio

Introduction

Human ear

● Time domain:
  – Cannot hear very short signals (< 1 msec)
  – Loud signals mask quieter signals that are close in time
● Frequency domain:
  – Limited range of audible frequencies
  – Loud signals at one pitch mask quieter signals at a nearby pitch

Page 6: Multimedia Services: Audio

Audio Applications

● Speech, VoIP
● CD, DVD sound
● Digital Audio/Video Broadcasting (DAB, DVB)
● Internet streaming
● Studio/transmitter link
● Theatrical movie presentation
● MIDI (similar to vector images): technical standard for musical instruments that describes the pitch, duration, velocity, etc. of notes. A hardware output device or a piece of software synthesizes the actual audio

Page 7: Multimedia Services: Audio

Audio analog signals

Waveform: for time-domain
Spectrum: for freq-domain

Audacity screenshots. Dani Gutiérrez
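
As an illustration of the two views, the following minimal Python sketch builds a short waveform (time domain) and derives its magnitude spectrum (frequency domain) with a naive discrete Fourier transform. The function names, the 1 kHz test tone and the 8 kHz sampling rate are assumptions chosen for the example; they do not come from the Audacity screenshots.

```python
import cmath
import math

def make_tone(freq_hz, rate_hz, n):
    """Time-domain waveform: n samples of a sine tone."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

def magnitude_spectrum(samples):
    """Naive O(n^2) DFT; returns magnitudes for bins 0 .. n/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags

def dominant_frequency(samples, rate_hz):
    """Frequency (Hz) of the strongest spectral bin."""
    mags = magnitude_spectrum(samples)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * rate_hz / len(samples)

tone = make_tone(1000, 8000, 64)       # hypothetical 1 kHz tone, 64 samples
print(dominant_frequency(tone, 8000))  # 1000.0
```

A real tool would use a fast Fourier transform and a window function; the naive sum is kept here only because it maps directly onto the definition.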

Page 8: Multimedia Services: Audio

Modulation families

Channel                   Analog baseband signal                                Digital signal
Analog bandpass channel   Analog modulation, e.g. AM, FM                        Digital modulation, e.g. PSK, FSK, ASK, QAM
Analog baseband channel   Pulse modulation, analog over analog, e.g. PAM, PWM   Digital baseband modulation, e.g. Unipolar, NRZ, Manchester
Digital channel           Pulse modulation, analog over digital, e.g. PCM       -

Page 9: Multimedia Services: Audio

Analog-over-digital modulations = Digitization

● Pulse-code modulation (PCM)
  – Differential PCM (DPCM)
  – Adaptive DPCM (ADPCM)
● Delta modulation (DM or Δ-modulation)
● Adaptive delta modulation (ADM) or continuously variable slope delta modulation (CVSDM)
● Delta-sigma modulation (ΔΣ)
● Pulse-density modulation (PDM), e.g. used in Super Audio CD (“DSD” trademark from Sony and Philips)
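
To make one of the listed schemes concrete, here is a minimal Python sketch of plain delta modulation: each sample is encoded as a single bit saying whether the running estimate must step up or down. The fixed step size, the zero initial estimate and the function names are assumptions made for the example; adaptive variants (ADM, CVSDM) would change the step over time.

```python
def delta_modulate(samples, step=0.25):
    """Encode each sample as one bit: 1 = step up, 0 = step down."""
    bits = []
    estimate = 0.0  # the encoder tracks the value the decoder will rebuild
    for x in samples:
        bit = 1 if x >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

def delta_demodulate(bits, step=0.25):
    """Rebuild the staircase approximation from the bit stream."""
    estimate = 0.0
    out = []
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out

bits = delta_modulate([0.3, 0.6, 0.9, 0.4])
print(bits)                    # [1, 1, 1, 0]
print(delta_demodulate(bits))  # [0.25, 0.5, 0.75, 0.5]
```

With a step that is too small the staircase cannot follow fast changes (slope overload), and with a step that is too large it oscillates around slow signals (granular noise).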

Page 10: Multimedia Services: Audio

Pulse Code Modulation

1. Sampling (>= 2 x bandwidth of the analog signal)
   Errors depend on the cut-off frequency and clock accuracy
2. Quantization: uniform (LPCM = Linear PCM) or non-uniform (PCMA = A-law, PCMU = μ-law)
   Errors: granularity
3. Coding: number of bits per sample

Mode                   Bandwidth (Hz)   Sampling (kHz)
Narrowband (NB)        300–3400         8
Wideband (WB)          50–7000          16
Super-wideband (SWB)   50–14000         32
Fullband (FB)          20–20000         48
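
The three PCM steps can be sketched in a few lines of Python. This is only an illustration with uniform quantization (LPCM); the function names, the 1 kHz test tone and the choice of 8 bits at 8 kHz (the narrowband row of the table) are assumptions made for the example.

```python
import math

def sample_sine(freq_hz, rate_hz, n):
    """Step 1, sampling: n samples of a sine tone taken at rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

def quantize_uniform(x, bits):
    """Step 2, uniform quantization of x in [-1.0, 1.0] to a code in [0, 2**bits - 1]."""
    levels = 2 ** bits
    x = max(-1.0, min(1.0, x))             # clip out-of-range input
    code = int((x + 1.0) / 2.0 * levels)   # index of the quantization interval
    return min(code, levels - 1)           # x == 1.0 falls into the top interval

def dequantize_uniform(code, bits):
    """Reconstruct the midpoint of the quantization interval."""
    levels = 2 ** bits
    return (code + 0.5) / levels * 2.0 - 1.0

def pcm_bitrate_bps(rate_hz, bits):
    """Step 3, coding: bits per sample times samples per second."""
    return rate_hz * bits

samples = sample_sine(1000, 8000, 8)
codes = [quantize_uniform(s, 8) for s in samples]
worst = max(abs(dequantize_uniform(c, 8) - s) for c, s in zip(codes, samples))
print(worst <= 1 / 256 + 1e-12)   # True: granular error stays within half a step
print(pcm_bitrate_bps(8000, 8))   # 64000
```

Each extra bit per sample halves the step size, which is why the granularity error shrinks as the coding step uses more bits.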

Page 11: Multimedia Services: Audio

Codecs

Audio Codecs

● Aim of a codec: to convert and to compress, for storage and transmission over different media, e.g.
  – AMR-NB: lossy, for speech
  – Dolby Digital: lossy, for cinema and HDTV broadcast
  – Dolby TrueHD: lossless, for home entertainment
● Conversion types:
  – Analog to Digital (+ Digital to Analog)
  – Digital to Digital
● Bitrate (kbit/s) at the codec output

http://en.wikipedia.org/wiki/Analog-to-digital_converter
http://en.wikipedia.org/wiki/Audio_coding_format

Page 12: Multimedia Services: Audio

Classifications of Audio Codecs

Source:
● Nature of source: speech, music, cellular (2G GSM, 3G AMR), ...
● Source signal bandwidth (NB, WB, SWB, FB) and sampling rate

Processing:
● Compression techniques and algorithms (depend on the nature and bandwidth of the source signal): frame-based or sample-based, delay, CBR or VBR, number of channels, ...
● Complexity (computation time)

Result:
● Resulting bitrate (most between 4.8 and 16 kbps)
● One or more bitrates (adaptive)
● Lossless or lossy
● Latency or delay (inherent to each algorithm)
● Quality

Legals & Costs:
● Creator (ITU-T, IETF, ETSI, Skype, ...)
● Licenses
● Costs for encoder and player

http://en.wikipedia.org/wiki/Comparison_of_audio_coding_formats

Page 13: Multimedia Services: Audio

Audio compression

● Based on psychoacoustics:
  – Threshold of hearing (frequencies)
  – Simultaneous masking
● Families of algorithms used for lossy compression:
  – Time domain: linear predictive coding (LPC), mainly for speech: CELP, ACELP, VSELP, LPC, RPE-LTP, ...
  – Frequency domain:
    ● Modified discrete cosine transform (MDCT), e.g. CELT
    ● Applied to the full band or to sub-bands (SBC): break the signal into frequency bands and encode each one independently, e.g. MP3
  – Some codecs combine both, e.g. G.718 uses CELP and MDCT

Page 14: Multimedia Services: Audio

Compression ratio

Digital input stream → Codec → Output stream

● Input stream: f = Sampling freq (kHz), bs = Bits/sample
● Output stream: b = Bitrate (kbps)

Compression ratio (related to input) = (f x bs) / b
Compression ratio (related to 64 kbps) = 64 / b
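
The two ratios can be checked with a short Python sketch that uses the symbols f, bs and b from the slide. The function names are hypothetical, and the worked numbers assume a codec with an 8 kbps output (the nominal rate of G.729) fed with 16-bit linear samples at 8 kHz.

```python
def ratio_vs_input(f_khz, bs, b_kbps):
    """Compression ratio related to the input stream: (f x bs) / b."""
    return f_khz * bs / b_kbps

def ratio_vs_64k(b_kbps):
    """Compression ratio related to the 64 kbps reference: 64 / b."""
    return 64 / b_kbps

# Assumed example: 8 kHz, 16 bits/sample in, 8 kbps out
print(ratio_vs_input(8, 16, 8))  # 16.0
print(ratio_vs_64k(8))           # 8.0
```

The two figures differ because the 64 kbps reference corresponds to 8-bit samples, while the assumed input here uses 16-bit samples.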

Page 15: Multimedia Services: Audio

Frame-based vs Sample-based

● Sample-based: one sample at a time, e.g. PCM and ADPCM
● Frame-based: more than one sample is taken, to study the correlation between nearby samples. Frame length can be fixed or variable, e.g. G.723.1 and G.729

Page 16: Multimedia Services: Audio

Audio Codecs and Delays

● Different delays are more or less appropriate for different types of transmission:
  – Low latency: less compression, higher bitrate, e.g. for real time in VoIP or satellite communications
  – High latency: higher compression, lower bitrate, e.g. for stored media, broadcasting or recording
● Origin of latencies:
  – Processing, depends on hardware
  – Inherent to each algorithm or codec (buffering is needed):
    ● Frame size (msec): related to the number of samples inside the frame
    ● Look-ahead time: when needed to study the correlation between the current and the next frame
● Delay calculations:
  – In sender: Algorithm delay = Frame length + Look-ahead time
  – In both: Codec delay = 2 x Frame length + Look-ahead time
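
The delay formulas above can be applied directly, as in this small Python sketch. The function names are hypothetical, and the frame and look-ahead values used for G.729 (10 ms and 5 ms) and G.723.1 (30 ms and 7.5 ms) are the commonly cited figures for those codecs, not values given in these slides.

```python
def algorithm_delay_ms(frame_ms, lookahead_ms):
    """Sender side: frame length + look-ahead time."""
    return frame_ms + lookahead_ms

def codec_delay_ms(frame_ms, lookahead_ms):
    """Sender and receiver: 2 x frame length + look-ahead time."""
    return 2 * frame_ms + lookahead_ms

# Assumed parameters: (frame length in ms, look-ahead in ms)
codecs = {"G.729": (10, 5), "G.723.1": (30, 7.5)}
for name, (frame, lookahead) in codecs.items():
    print(name, algorithm_delay_ms(frame, lookahead), codec_delay_ms(frame, lookahead))
# G.729 15 25
# G.723.1 37.5 67.5
```

Processing and network delays come on top of these figures, so the values are a lower bound on what a listener experiences.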

Page 17: Multimedia Services: Audio

Audio Codecs: CBR vs VBR

● CBR: constant bitrate. Older
● VBR: variable bitrate:
  – Frames of a file have different bitrates depending on the variability of the information, higher during more complex periods
  – Better quality for a given size, but more complex to encode
  – Typical in lossless compression (e.g. FLAC, Apple Lossless) and in some lossy compression (e.g. MP3, Opus, Vorbis, AAC)
  – Encoding in a single pass (“on the fly”) or multipass (not for real time or live streaming)
  – Input parameter: fixed quality, max/min bitrate, average bitrate, file size

Page 18: Multimedia Services: Audio

Audio Codecs and Channels

● Mix two (stereo) or more channels of similar information, reducing size while keeping high quality, instead of storing and sending independent channels
● Techniques (used in e.g. MP3, AAC, Vorbis) that may be combined for a signal:
  – Simple Stereo (SS): independent channels. No compression
  – Mid-side Stereo (MS):
    ● Middle = (L+R)/2, Side = (L-R)/2
    ● Can benefit if the signal is more “mono-like”, compressing the new “Middle channel”
  – Intensity Stereo:
    ● Based on psychoacoustics, replaces both channels with a single signal plus directional information
    ● Better at low bitrates, worse at high bitrates
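
The mid-side transform above is easy to verify numerically. The Python sketch below applies Middle = (L+R)/2 and Side = (L-R)/2 and then inverts them with L = Middle + Side and R = Middle - Side; the function names and the sample values are assumptions made for the example.

```python
def ms_encode(left, right):
    """Left/right samples to mid/side: M = (L+R)/2, S = (L-R)/2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [0.5, 0.25, -0.5]
right = [0.5, 0.25, -0.25]   # nearly identical to left: a "mono-like" signal
mid, side = ms_encode(left, right)
print(side)                  # [0.0, 0.0, -0.125]
print(ms_decode(mid, side) == (left, right))  # True
```

For a mono-like signal the side channel is close to zero, so an encoder can spend very few bits on it; the transform itself loses nothing.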

Page 19: Multimedia Services: Audio

Audio Codecs comparisons

Source: http://www.opus-codec.org/comparison/

Page 20: Multimedia Services: Audio

Example of Audio Codec: MP3

● Versions: MPEG-1, MPEG-2, MPEG-2.5 Audio Layer III
● The specification defines the decoder more precisely than the encoder. There are different encoder implementations, e.g. LAME
● Different bitrates and sampling rates depending on the version
● Channels: 2 in MPEG-1 mode and up to 5.1 in MPEG-2
● Algorithms: hybrid of subband filtering and MDCT
● Supports CBR and VBR
● Licensing and patent war

Page 21: Multimedia Services: Audio

Other examples of typical Audio Codecs

● AAC (Advanced Audio Coding), from ISO and IEC. Part of MPEG-2 and MPEG-4. Designed to replace MP3. Patented for coding, not for streaming or distributing content
● Vorbis, from the Xiph.Org Foundation: typically inside Ogg or WebM containers. Based on MDCT. Open, royalty-free
● Opus, from IETF. Suitable for interactive real-time use. Based on CELT and SILK. Open, royalty-free

http://en.wikipedia.org/wiki/Category:Audio_codecs

Page 22: Multimedia Services: Audio

Speech

Speech case

● Different from music
● Interactive
● Voiced speech: harmonics (at frequencies that depend on whether the speaker is male or female)
● Unvoiced signal: like white noise

Page 23: Multimedia Services: Audio

Speech Codecs

● Aim: intelligibility and speaker identification
● Specialized codecs, e.g.:
  – Better for music: Vorbis
  – Better for speech: GSM, Speex, ...
● Different times:
  – Speech frame: time to encode a frame of speech
  – RTP packet voice duration: time to packetize and send to the network, e.g. for PCM: 20 msec
● Sometimes a VoIP tool provides several codecs that can be selected manually or automatically, and changed during the conversation
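
The 20 msec packet duration mentioned above fixes how much voice data each RTP packet carries. The Python sketch below works this out from the codec bitrate; the function names are hypothetical, the 64 kbps figure is the PCM (G.711) rate, the 8 kbps figure is assumed as a second example, and RTP/UDP/IP header overhead is deliberately left out.

```python
def voice_payload_bytes(bitrate_kbps, packet_ms):
    """Voice bytes per packet: kbit/s x ms gives bits, divided by 8 for bytes."""
    return bitrate_kbps * packet_ms / 8

def packets_per_second(packet_ms):
    """Number of packets sent each second for a given packet duration."""
    return 1000 / packet_ms

print(voice_payload_bytes(64, 20))  # 160.0 bytes of PCM per packet
print(voice_payload_bytes(8, 20))   # 20.0 bytes for an 8 kbps codec
print(packets_per_second(20))       # 50.0
```

Because the headers have a fixed size per packet, low-bitrate codecs see proportionally more overhead than PCM at the same packet duration.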

Page 24: Multimedia Services: Audio

Speech Codecs: Techniques and Codec Comparison

● Compression: remove short-term correlation (~ 1 msec) and long-term correlation (~ 5 to 10 msec)
● Techniques: Waveform, Parametric (Vocoders for speech), Hybrid

Source: http://www-mobile.ecs.soton.ac.uk/speech_codecs/common_classes.html

Page 25: Multimedia Services: Audio

G.711 Codec

● Reference codec for comparison
● G.711 = “PCM of voice frequencies”
  8k samples/sec x 8 bits/sample = 64 kbps
● Voice quantisation: non-uniform logarithmic quantisation, because of the nature of voice: low-level speech signals have a higher PDF (Probability Density Function) than high-level ones
● Variations:
  – µ-law (North America, Japan): 14 bits to 8 bits
  – A-law (Europe): 13 bits to 8 bits
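
The effect of logarithmic quantisation can be illustrated with the continuous µ-law companding curve, F(x) = sgn(x) ln(1 + µ|x|) / ln(1 + µ) with µ = 255. The Python sketch below is only an illustration of that curve: G.711 itself specifies a piecewise-linear approximation that maps 14-bit samples to 8-bit codes, which is not reproduced here, and the function names are assumptions.

```python
import math

MU = 255  # value used by the mu-law variant

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] through the logarithmic mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

quiet = 0.01
compressed = mu_law_compress(quiet)
print(compressed > 0.2)                                 # True: small amplitudes are stretched
print(abs(mu_law_expand(compressed) - quiet) < 1e-12)   # True: the curve is invertible
print(8000 * 8)                                         # 64000 bit/s, as on the slide
```

Quantising the compressed value uniformly gives fine steps to quiet speech, where most samples fall, and coarse steps to loud speech.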

Page 26: Multimedia Services: Audio

Speech compression: Waveform based technique

● Method: remove redundancy in the waveform and reconstruct it
● Complexity: low
● Results: 16 kbps to 64 kbps
● Examples of codecs:
  – PCM
  – ADPCM (Adaptive Differential PCM), for NB and WB

Page 27: Multimedia Services: Audio

Speech compression: Parametric based technique

● Method:
  – Take segments of short duration (~20 msec) and classify them as voiced or unvoiced
  – The voice parameters of each segment are obtained via speech analysis, encoded and sent
● Complexity: high
● Results: better compression ratios, poor quality
● Examples of codecs:
  – LPC (Linear Prediction Coding): 1.2 to 4.8 kbps. Used for secure wireless communications

Page 28: Multimedia Services: Audio

Speech compression: Hybrid based technique

● “Analysis-by-Synthesis coding”
● Method: combines waveform and parametric techniques
● Examples of codecs:
  – CELP (Code-Excited Linear Prediction): 4.8 to 16 kbps. Mobile/wireless/satellite communications achieving toll quality (MOS over 4.0)
  – Other modern codecs: G.729, G.723.1, AMR, iLBC, SILK

Page 29: Multimedia Services: Audio

Speech codec examples

Source: Cisco. Voice Over IP - Per Call Bandwidth Consumption

Other important speech codecs:
● SILK, from Skype. Based on LPC. Not royalty-free
● iSAC (internet Speech Audio Codec): wideband and super-wideband, open, royalty-free
● iLBC (internet Low Bitrate Codec): narrowband, open, royalty-free for WebRTC
● AMR (Adaptive Multi-Rate) or AMR-NB (Narrowband)
● AMR-WB (Wideband)
● Speex

http://en.wikipedia.org/wiki/Category:Speech_codecs

Page 30: Multimedia Services: Audio

Files, Containers and Formats

Audio files

● No. of channels: one (“mono”), two (“stereo”) or multichannel
● Compression and codecs:
  – No compression: raw PCM, ...
  – Lossless: FLAC, Apple Lossless (.m4a), WMA Lossless, ...
  – Lossy: MP3, Vorbis, AAC, ...

http://en.wikipedia.org/wiki/Audio_file_format

Page 31: Multimedia Services: Audio

Examples of Audio containers

● WAV:
  – Instance of [the more general] RIFF
  – Chunks:
    ● One or more chunks, e.g. 2 channels for stereo
    ● Can contain compressed audio data and non-audio data
  – Metadata for each chunk:
    ● Encoding (typically uncompressed LPCM), no. of channels, bits/channel, sample rate
    ● Labels: artist, comments, ...
● From video: Ogg, MPEG-4 Part 14 or MP4
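
The WAV metadata listed above can be inspected with Python's standard `wave` module. The sketch below writes a tiny silent stereo file into memory and reads its header back; the helper names and the chosen parameters (2 channels, 16-bit samples, 44,100 Hz, 100 frames) are assumptions made for the example.

```python
import io
import wave

def write_silent_wav(buf, channels=2, sample_width=2, rate=44100, frames=100):
    """Write an uncompressed LPCM WAV file containing only silence."""
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)   # bytes per sample
        w.setframerate(rate)
        w.writeframes(bytes(channels * sample_width * frames))

def read_wav_header(buf):
    """Return (channels, bytes per sample, sample rate, frame count)."""
    with wave.open(buf, "rb") as r:
        return r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes()

buf = io.BytesIO()
write_silent_wav(buf)
raw = buf.getvalue()
print(raw[:4], raw[8:12])    # b'RIFF' b'WAVE': the RIFF container signature
buf.seek(0)
print(read_wav_header(buf))  # (2, 2, 44100, 100)
```

The first bytes show that WAV is an instance of RIFF: a "RIFF" identifier, a length field, and the "WAVE" form type, followed by the chunks.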

Page 32: Multimedia Services: Audio

Audio physical formats

● CD:
  – Reference: “Red Book”
  – Digital audio encoding:
    ● 2-channel
    ● Signed 16-bit
    ● Linear PCM
    ● 44,100 Hz
  – Similar to but distinct from WAV: no headers, tracks that match the CD's sector sizes
● Other media with their associated modulations and lossless codecs:

Medium                         Modulation / lossless codec
Super Audio CD (SACD)          Pulse-density modulation + Direct Stream Transfer
DVD-Audio, Blu-ray, (HD DVD)   Meridian Lossless Packing
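
The CD audio parameters listed above determine its raw data rate. This short Python sketch does the arithmetic for uncompressed linear PCM; the function name is hypothetical, and error-correction and subcode overhead on the disc are not included.

```python
def lpcm_bitrate_bps(channels, bits_per_sample, sample_rate_hz):
    """Raw LPCM bitrate: channels x bits per sample x samples per second."""
    return channels * bits_per_sample * sample_rate_hz

cd_bps = lpcm_bitrate_bps(2, 16, 44_100)
print(cd_bps)        # 1411200 bit/s, about 1411 kbps
print(cd_bps // 8)   # 176400 bytes per second of audio
```

Compared with the 64 kbps of G.711 speech, CD audio needs roughly 22 times the bitrate, which is one reason lossy music codecs are so widely used.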

Page 33: Multimedia Services: Audio

Wires and Connectors

Audio hardware: wires and connectors

● Only for audio:
  – Analog: PC (color scheme)
  – Digital: S/PDIF. Supports uncompressed PCM audio and compressed 5.1/7.1 surround sound
● Audio with video:
  – Analog: SCART
  – Digital: HDMI