Kolesnik Audio Compression

7/27/2019 Kolesnik Audio Compression


Audio Compression Techniques

MUMT 611, January 2005

Assignment 2

Paul Kolesnik



Introduction

Digital Audio Compression

Removal of redundant or otherwise irrelevant information from an audio signal

Audio compression algorithms are often referred to as audio encoders

Applications

Reduces required storage space

Reduces required transmission bandwidth



Audio Compression

Audio signal overview

Sampling rate (# of samples per second)

Bit rate (# of bits per second). Typically, an uncompressed stereo 16-bit 44.1 kHz signal has a bit rate of about 1.4 Mbit/s

Number of channels (mono / stereo / multichannel)

Reduction by lowering those values or by data compression / encoding
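The bit rate figure above follows directly from the three signal parameters. A minimal sketch of the arithmetic, where the function name `pcm_bit_rate` is only an illustrative choice:

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Stereo, 16-bit, 44.1 kHz (the example from the slide).
stereo_rate = pcm_bit_rate(44_100, 16, 2)
print(stereo_rate)               # bits per second
print(stereo_rate / 1_000_000)   # megabits per second

# Lowering any of the three values lowers the bit rate proportionally.
mono_rate = pcm_bit_rate(44_100, 16, 1)
print(mono_rate)
```

Halving the number of channels, the sample size, or the sampling rate each halves the bit rate, which is the "reduction by lowering those values" option; data compression aims to reduce the rate without changing these parameters.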



Audio Data Compression

Redundant information

Implicit in the remaining information

Ex. oversampled audio signal

Irrelevant information

Perceptually insignificant

Cannot be recovered from remaining information




Lossless Audio Compression

Removes redundant data

Resulting signal is the same as the original (perfect reconstruction)

Lossy Audio Encoding

Removes irrelevant data

Resulting signal is similar to the original
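The difference between the two approaches can be illustrated with a toy round trip. In this sketch, `zlib` merely stands in for a lossless coder and dropping the low-order bits stands in for a lossy step; neither is how real audio codecs work, and the sample values are made up for illustration.

```python
import struct
import zlib

# Hypothetical 16-bit PCM sample values.
samples = [0, 1200, -3400, 32767, -32768, 15, -15, 0]
fmt = "<%dh" % len(samples)
raw = struct.pack(fmt, *samples)

# Lossless: decompressing gives back exactly the original samples.
restored = list(struct.unpack(fmt, zlib.decompress(zlib.compress(raw))))


def lossy_roundtrip(values, dropped_bits=8):
    """Discard the low-order bits of each sample, then scale back up."""
    step = 1 << dropped_bits
    return [(v >> dropped_bits) * step for v in values]


# Lossy: the result is close to the original, but not identical.
approx = lossy_roundtrip(samples)
max_error = max(abs(a - b) for a, b in zip(samples, approx))

print(restored == samples)
print(approx == samples, max_error)
```

The lossless path reproduces every sample; the lossy path keeps each sample within one quantization step of the original but cannot recover the discarded bits.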




Audio vs. Speech Compression Techniques

Speech compression uses a human vocal tract model to compress signals

Audio compression does not use this technique due to the larger variety of possible signal variations



Generic Audio Encoder

[Figure: generic audio encoder (image not available)]



Generic Audio Encoder

Psychoacoustic Model

Psychoacoustics: the study of how sounds are perceived by humans

Uses perceptual coding to eliminate information from the audio signal that is inaudible to the ear

Detects conditions under which different audio signal components mask each other



Psychoacoustic Model

Signal Masking

Threshold cut-off

Spectral (Frequency / Simultaneous) Masking

Temporal Masking

Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain



Signal Masking

Threshold cut-off

Hearing threshold level: a function of frequency

Any frequency components below the threshold will not be perceived by the human ear
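The threshold cut-off can be sketched in code. The slide does not give a formula, so this example assumes Terhardt's widely quoted approximation of the threshold in quiet (in dB SPL); the function names are illustrative, and a real encoder would also need to map digital sample levels to sound pressure levels.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in dB SPL (Terhardt's formula, assumed here)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible(freq_hz, level_db):
    """A component below the threshold at its frequency is not perceived."""
    return level_db > threshold_in_quiet_db(freq_hz)


for freq in (100, 1000, 3300, 16000):
    print(freq, round(threshold_in_quiet_db(freq), 1))

# The same 10 dB SPL component is audible at 1 kHz but not at 100 Hz.
print(is_audible(1000, 10.0), is_audible(100, 10.0))
```

The curve is lowest around 3 to 4 kHz, where the ear is most sensitive, and rises steeply toward very low and very high frequencies.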


Signal Masking

Spectral Masking

A frequency component can be partly or fully masked by another component that is close to it in frequency

This shifts the hearing threshold


Signal Masking

Temporal Masking

A quieter sound can be masked by a louder sound if they are temporally close

Sounds that occur both (shortly) before and after a volume increase can be masked


Spectral Analysis

Tasks of Spectral Analysis

To derive masking thresholds to determine which signal components can be eliminated

To generate a representation of the signal to which masking thresholds can be applied

Spectral analysis is done through transforms or filter banks



Spectral Analysis

Transforms

Fast Fourier Transform (FFT)

Discrete Cosine Transform (DCT): similar to FFT but uses cosine values only

Modified Discrete Cosine Transform (MDCT): overlapped and windowed version of DCT [used by MPEG-1 Layer III, MPEG-2 AAC, Dolby AC-3]
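The "overlapped and windowed" property of the MDCT can be demonstrated with a direct (slow) implementation. This is a sketch under stated assumptions: a sine window, a 2/N scaling on the inverse transform, and a toy block size; none of these are taken from the slides or from any particular codec, and real encoders use fast algorithms rather than the direct sums shown here.

```python
import math


def mdct(block):
    """Forward MDCT: 2N windowed time samples -> N coefficients."""
    n_coef = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
                for n in range(2 * n_coef))
            for k in range(n_coef)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time samples (still containing aliasing)."""
    n_coef = len(coeffs)
    return [(2.0 / n_coef) * sum(coeffs[k] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
                                 for k in range(n_coef))
            for n in range(2 * n_coef)]


def sine_window(n_coef):
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coef)) for n in range(2 * n_coef)]


def analyze_and_resynthesize(signal, n_coef):
    """Process 50%-overlapping blocks and overlap-add the results.

    Assumes len(signal) is a multiple of n_coef.
    """
    window = sine_window(n_coef)
    padded = [0.0] * n_coef + list(signal) + [0.0] * n_coef
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_coef, n_coef):
        block = [padded[start + n] * window[n] for n in range(2 * n_coef)]
        rebuilt = imdct(mdct(block))
        for n in range(2 * n_coef):
            out[start + n] += rebuilt[n] * window[n]  # aliasing cancels in the overlap
    return out[n_coef:-n_coef]


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
rebuilt = analyze_and_resynthesize(signal, 8)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

Each block of 2N samples yields only N coefficients, so the transform does not increase the amount of data despite the 50% overlap; the error each block introduces is cancelled by its neighbours when the windowed outputs are added together.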



Spectral Analysis

Filter Banks

Time sample blocks are passed through a set of bandpass filters

Masking thresholds are applied to the resulting frequency subband signals

Polyphase and wavelet banks are the most popular filter structures



Filter Bank Structures

Polyphase Filter Bank [used in all of the MPEG-1 encoders]

Signal is separated into subbands, the widths of which are equal over the entire frequency range

The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during the decoding process)
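The idea of splitting into subbands, downsampling, and reconstructing at the decoder can be shown with the smallest possible case. The sketch below is a two-band Haar filter bank, chosen only because it fits in a few lines; the MPEG-1 polyphase bank uses 32 equal-width subbands and much longer filters, and the function names here are illustrative.

```python
def analyze(signal):
    """Split an even-length signal into two half-length subband signals."""
    pairs = range(0, len(signal) - 1, 2)
    low = [(signal[i] + signal[i + 1]) / 2 for i in pairs]    # low-frequency band
    high = [(signal[i] - signal[i + 1]) / 2 for i in pairs]   # high-frequency band
    return low, high


def synthesize(low, high):
    """Rebuild the full-rate signal from the two downsampled subbands."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


signal = [4, 6, 10, 12, 8, 6, 5, 3]
low, high = analyze(signal)
print(low, high)
print(synthesize(low, high) == signal)
```

Each subband holds half as many samples as the input, so the total amount of data is unchanged, yet the decoder can still rebuild the original signal from the two shorter signals.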



Filter Bank Structures

Wavelet Filter Bank [used by the Enhanced Perceptual Audio Coder (EPAC) by Lucent]

Unlike in the polyphase filter bank, the widths of the subbands are not equal (narrower for lower frequencies, wider for higher frequencies)

This allows for better time resolution (e.g. short attacks), but at the expense of frequency resolution



Noise Allocation

System task: derive and apply the shifted hearing threshold to the input signal

Anything below the threshold doesn't need to be transmitted

Any noise below the threshold is irrelevant

Frequency component quantization

Tradeoff between space and noise

Encoder saves on space by using just enough bits for each frequency component to keep noise under the threshold; this is known as noise allocation
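The "just enough bits" rule can be sketched as follows. This is a simplification that assumes the common rule of thumb that each quantizer bit lowers the quantization noise by about 6.02 dB; the band levels and thresholds are made-up numbers, and real encoders iterate over quantizer step sizes under a bit budget rather than applying a closed-form rule.

```python
import math

DB_PER_BIT = 6.02  # approximate noise reduction per added bit (rule of thumb)


def bits_for_band(signal_db, masking_threshold_db):
    """Fewest bits that push quantization noise below the masking threshold."""
    signal_to_mask_db = signal_db - masking_threshold_db
    if signal_to_mask_db <= 0:
        return 0  # component is below the threshold: nothing to transmit
    return math.ceil(signal_to_mask_db / DB_PER_BIT)


def allocate(bands):
    """bands: list of (signal level in dB, masking threshold in dB) pairs."""
    return [bits_for_band(level, threshold) for level, threshold in bands]


# Hypothetical per-band levels and thresholds.
print(allocate([(60, 30), (20, 30), (40, 34)]))
```

A band whose level is far above its threshold receives many bits, a band just above its threshold receives few, and a fully masked band receives none.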



Noise Allocation

Pre-echo

If a single audio block contains silence followed by a loud attack, a pre-echo error occurs: there will be audible noise in the silent part of the block after decoding

This is avoided by pre-monitoring audio data at the encoding stage and separating audio into shorter blocks in potential pre-echo cases

This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking)
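The pre-monitoring step can be sketched as a simple attack detector that switches to shorter blocks. The detection rule (comparing the energy of the two halves of a block), the ratio of 10, and the function names are all hypothetical choices for illustration; actual encoders use more elaborate transient detection.

```python
def energy(samples):
    return sum(s * s for s in samples)


def has_attack(block, ratio=10.0, floor=1e-12):
    """Flag a block whose second half is much louder than its first half."""
    half = len(block) // 2
    return energy(block[half:]) > ratio * (energy(block[:half]) + floor)


def split_for_pre_echo(block, short_len):
    """Return the block unchanged, or cut it into short blocks if an attack is found."""
    if has_attack(block):
        return [block[i:i + short_len] for i in range(0, len(block), short_len)]
    return [block]


silence_then_attack = [0.0] * 8 + [1.0] * 8
steady_tone = [0.5] * 16

print(len(split_for_pre_echo(silence_then_attack, 4)))
print(len(split_for_pre_echo(steady_tone, 4)))
```

With short blocks, quantization noise from the attack can only spread over the short block that contains it, which keeps any remaining pre-echo brief.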



Pre-echo Effect

[Figure: pre-echo effect (image not available)]




Additional Encoding Techniques

Other encoding techniques are available (as alternatives or in combination)

Predictive Coding

Coupling / Delta Encoding

Huffman Encoding




Predictive Coding

Often used in speech and image compression

Estimates the expected value for each sample based on previous sample values

Transmits/stores the difference between the expected and received value

Generates an estimate for the next sample and then adjusts it by the difference stored for the current sample

Used for additional compression in MPEG-2 AAC
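The encode and decode steps can be sketched with the simplest possible predictor, one that expects each sample to equal the previous one. This first-order predictor is an assumption made for brevity; the slides do not specify a predictor, and practical coders use higher-order or adaptive ones.

```python
def encode_residuals(samples):
    """Store the difference between each sample and its predicted value."""
    residuals = []
    prediction = 0
    for sample in samples:
        residuals.append(sample - prediction)
        prediction = sample  # predict that the next sample repeats this one
    return residuals


def decode_residuals(residuals):
    """Rebuild samples by adding each stored difference to the prediction."""
    samples = []
    prediction = 0
    for residual in residuals:
        sample = prediction + residual
        samples.append(sample)
        prediction = sample
    return samples


samples = [100, 102, 105, 107, 106, 104]
residuals = encode_residuals(samples)
print(residuals)
print(decode_residuals(residuals) == samples)
```

For a slowly varying signal the residuals are much smaller than the samples themselves, so they can be stored with fewer bits.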




Coupling / Delta Encoding

Used in cases where the audio signal consists of two or more channels (stereo or surround sound)

Similarities between channels are used for compression

A sum and a difference between two channels are derived; the difference is usually some value close to zero and therefore requires less space to encode

This is a lossless encoding process
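The sum and difference derivation, and the reason it is lossless, can be sketched for integer samples. The channel values below are made up, and the function names are illustrative.

```python
def to_sum_difference(left, right):
    """Replace a left/right pair of channels by their sum and difference."""
    sums = [l + r for l, r in zip(left, right)]
    diffs = [l - r for l, r in zip(left, right)]
    return sums, diffs


def from_sum_difference(sums, diffs):
    """Recover left/right exactly: sum + diff = 2*left, sum - diff = 2*right."""
    left = [(s + d) // 2 for s, d in zip(sums, diffs)]
    right = [(s - d) // 2 for s, d in zip(sums, diffs)]
    return left, right


# Hypothetical stereo samples with very similar channels.
left = [1000, 1010, -500, 7]
right = [998, 1011, -498, 7]

sums, diffs = to_sum_difference(left, right)
print(diffs)
print(from_sum_difference(sums, diffs) == (left, right))
```

When the channels are similar, the difference signal stays close to zero and needs few bits, while the original channels remain exactly recoverable.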




Huffman Coding

Information-theory-based technique

An element of a signal that often reoccurs in the signal is represented by a shorter code, and its value is stored in a look-up table

Implemented using look-up tables in the encoder and in the decoder

Provides substantial lossless compression, at the cost of additional computation during encoding and decoding

Used by MPEG-1 and MPEG-2 AAC
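The look-up table idea can be sketched by building a Huffman code from symbol counts. This example derives the table from the data itself, which is an assumption made for illustration; the input values are made up, and the function names are illustrative.

```python
import heapq
from collections import Counter


def huffman_table(symbols):
    """Build a symbol -> bit-string table; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        count1, _, table1 = heapq.heappop(heap)
        count2, _, table2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in table1.items()}
        merged.update({sym: "1" + code for sym, code in table2.items()})
        heapq.heappush(heap, (count1 + count2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[sym] for sym in symbols)


def decode(bits, table):
    inverse = {code: sym for sym, code in table.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # codes are prefix-free, so the first match is correct
            decoded.append(inverse[current])
            current = ""
    return decoded


# Hypothetical quantized values in which 0 occurs most often.
data = [0, 0, 0, 0, 0, 1, 1, -1, 2]
table = huffman_table(data)
bits = encode(data, table)
print(table)
print(len(bits), decode(bits, table) == data)
```

With four distinct symbols a fixed-length code would need 2 bits per symbol, or 18 bits for this input; the Huffman table assigns the most frequent value a 1-bit code and needs fewer bits overall.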



Encoding - Final Stages

Audio data packed into frames

Frames stored or transmitted



Conclusion

HTML Bibliography

http://www.music.mcgill.ca/~pkoles

Questions