Lecture 8 audio compression

1

Audio Compression Techniques

Prepared byRazia Nisar Noorani

Lecture 8

2

Introduction

Digital Audio Compression Removal of redundant or otherwise irrelevant

information from audio signal Audio compression algorithms are often referred to as

“audio encoders” Applications

Reduces required storage space Reduces required transmission bandwidth

3

Audio Compression

Audio signal – overview Sampling rate (# of samples per second) Bit rate (# of bits per second). Typically,

uncompressed stereo 16-bit 44.1KHz signal has a 1.4MBps bit rate

Number of channels (mono / stereo / multichannel) Reduction by lowering those values or by data

compression / encoding

4

Audio Data Compression

Redundant information Implicit in the remaining information Ex. oversampled audio signal

oversampling is the process of sampling a signal with a sampling frequency significantly higher than twice the bandwidth or highest frequency of the signal being sampled

Irrelevant information Perceptually insignificant Cannot be recovered from remaining information

5


Lossless Audio CompressionRemoves redundant dataResulting signal is same as original – perfect

reconstruction Lossy Audio Encoding

Removes irrelevant dataResulting signal is similar to original

6


Audio vs. Speech Compression TechniquesSpeech Compression uses a human vocal

tract model to compress signalsAudio Compression does not use this

technique due to larger variety of possible signal variations

7

Generic Audio Encoder

Psychoacoustic ModelPsychoacoustics – study of how sounds are

perceived by humansUses perceptual coding

eliminate information from audio signal that is inaudible to the ear

Detects conditions under which different audio signal components mask each other

8

Psychoacoustic Model

Signal MaskingThreshold cut-offSpectral (Frequency / Simultaneous) MaskingTemporal Masking

Threshold cut-off and spectral masking occur in frequency domain, temporal masking occurs in time domain

9

Signal Masking

Threshold cut-off Hearing threshold

level – a function of frequency

Any frequency components below the threshold will not be perceived by human ear

10

Signal Masking

Spectral Masking A frequency

component can be partly or fully masked by another component that is close to it in frequency

This shifts the hearing threshold

11

Signal Masking

Temporal Masking A quieter sound can

be masked by a louder sound if they are temporally close

Sounds that occur both (shortly) before and after volume increase can be masked

12

Spectral Analysis

a device or algorithm that identifies a frequency domain representation of a time domain signal.

Tasks of Spectral Analysis To derive masking thresholds to determine which

signal components can be eliminated To generate a representation of the signal to which

masking thresholds can be applied

Spectral Analysis is done through transforms or filter banks

13

Spectral Analysis

TransformsFast Fourier Transform (FFT)Discrete Cosine Transform (DCT) - similar to

FFT but uses cosine values onlyModified Discrete Cosine Transform (MDCT)

[used by MPEG-1 Layer-III, MPEG-2 AAC, Dolby AC-3] – overlapped and windowed version of DCT

14

Spectral Analysis

Filter Banks a filter bank is an array of band-pass filters that

separates the input signal into multiple components, each one carrying a single frequency subband of the original signal Time sample blocks are passed through a set of bandpass

filters Masking thresholds are applied to resulting frequency subband

signals Poly-phase and wavelet banks are most popular filter structures

15

Filter Bank Structures

Polyphase Filter Bank [used in all of the MPEG-1 encoders]Signal is separated into subbands, the widths

of which are equal over the entire frequency range

The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during decoding process)

16

Filter Bank Structures

Wavelet Filter Bank [used by Enhanced Perceptual Audio Coder (EPAC) by Lucent] Unlike polyphase filter, the widths of the

subbands are not evenly spaced (narrower for higher frequencies)

This allows for better time resolution (ex. short attacks), but at expense of frequency resolution

17

Noise Allocation

System Task: derive and apply shifted hearing threshold to the input signal Anything below the threshold doesn’t need to be

transmitted Any noise below the threshold is irrelevant

Frequency component quantization Tradeoff between space and noise Encoder saves on space by using just enough bits for

each frequency component to keep noise under the threshold - this is known as noise allocation

18

Noise Allocation

Pre-echo In case a single audio block contains silence followed

by a loud attack, pre-echo error occurs - there will be audible noise in the silent part of the block after decoding

This is avoided by pre-monitoring audio data at encoding stage and separating audio into shorter blocks in potential pre-echo case

This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking)

19

Additional Encoding Techniques

Other encoding techniques techniques are available (alternative or in combination)Predictive CodingCoupling / Delta EncodingHuffman Encoding

20


Predictive Coding Often used in speech and image compression Estimates the expected value for each sample based on

previous sample values Transmits/stores the difference between the expected

and received value Generates an estimate for the next sample and then

adjusts it by the difference stored for the current sample Used for additional compression in MPEG2 AAC

(Advance audio Coding)

21


Coupling / Delta encoding Used in cases where audio signal consists of two or

more channels (stereo or surround sound) Similarities between channels are used for

compression A sum and difference between two channels are

derived; difference is usually some value close to zero and therefore requires less space to encode

This is a case of lossless encoding process

22


Huffman Coding Information-theory-based technique An element of a signal that often reoccurs in the signal

is represented by a simpler symbol, and its value is stored in a look-up table

Implemented using a look-up tables in encoder and in decoder

Provides substantial lossless compression, but requires high computational power and therefore is not very popular

Used by MPEG1 and MPEG2 AAC

23

Encoding - Final Stages

Audio data packed into frames Frames stored or transmitted

24

Questions

Lecture 8 audio compression

Education