Audio Compression Techniques
Prepared by Razia Nisar Noorani
Lecture 8

Introduction
- Digital audio compression
  - Removal of redundant or otherwise irrelevant information from an audio signal
  - Audio compression algorithms are often referred to as “audio encoders”
- Applications
  - Reduces required storage space
  - Reduces required transmission bandwidth

Audio Compression
- Audio signal overview
  - Sampling rate (number of samples per second)
  - Bit rate (number of bits per second); an uncompressed 16-bit stereo signal sampled at 44.1 kHz has a bit rate of about 1.4 Mbit/s (1,411.2 kbit/s)
  - Number of channels (mono / stereo / multichannel)
- Reduction is achieved by lowering those values or by data compression / encoding
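The uncompressed bit rate is the product of the three quantities listed above. A minimal sketch of that arithmetic follows; the function name `pcm_bit_rate` is illustrative and not part of the lecture.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)        # bits per second for 16-bit stereo at 44.1 kHz
print(cd_rate // 8)   # the same rate in bytes per second

# Lowering the values reduces the rate: mono at half the sampling rate.
print(pcm_bit_rate(22_050, 16, 1))
```

Halving the sampling rate and dropping to mono cuts the rate to a quarter, which is the "lowering those values" route; data compression aims for similar savings without discarding samples outright.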

Audio Data Compression
- Redundant information
  - Implicit in the remaining information
  - Example: an oversampled audio signal. Oversampling is the process of sampling a signal at a frequency significantly higher than twice the bandwidth or highest frequency of the signal being sampled
- Irrelevant information
  - Perceptually insignificant
  - Cannot be recovered from the remaining information

Audio Data Compression
- Lossless audio compression
  - Removes redundant data
  - Resulting signal is the same as the original (perfect reconstruction)
- Lossy audio encoding
  - Removes irrelevant data
  - Resulting signal is similar to the original
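The difference between the two classes can be seen on a small test signal. The sketch below is only an illustration under stated assumptions: general-purpose DEFLATE (`zlib`) stands in for a real lossless audio coder, and dropping the low 8 bits of each sample stands in for a real lossy encoder. The test tone and all variable names are invented for the example.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit samples at 8 kHz (arbitrary test signal).
samples = [round(10_000 * math.sin(2 * math.pi * 440 * n / 8_000)) for n in range(8_000)]
raw = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: the decoded bytes are identical to the input.
packed = zlib.compress(raw)
restored = zlib.decompress(packed)
lossless_ok = restored == raw

# Lossy (crude stand-in): keep only the 8 most significant bits of each sample.
coarse = [s >> 8 for s in samples]
approx = [c << 8 for c in coarse]
max_error = max(abs(a - b) for a, b in zip(samples, approx))

print(len(raw), len(packed), lossless_ok, max_error)
```

The lossless path reproduces every byte, while the lossy path halves the data but leaves an error of up to 255 quantization steps per sample. A perceptual encoder tries to place such errors where the ear will not notice them.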

Audio Data Compression
- Audio vs. speech compression techniques
  - Speech compression uses a model of the human vocal tract to compress signals
  - Audio compression does not use this technique because audio signals vary far more widely

Generic Audio Encoder
- Psychoacoustic model
  - Psychoacoustics: the study of how sounds are perceived by humans
  - Uses perceptual coding: eliminates information from the audio signal that is inaudible to the ear
  - Detects conditions under which different audio signal components mask each other

Psychoacoustic Model
- Signal masking
  - Threshold cut-off
  - Spectral (frequency / simultaneous) masking
  - Temporal masking
- Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain

Signal Masking
- Threshold cut-off
  - Hearing threshold level: a function of frequency
  - Any frequency component below the threshold will not be perceived by the human ear
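The slide does not give a formula for the hearing threshold. One widely used approximation is Terhardt's threshold-in-quiet curve, which is brought in here as an assumption rather than taken from the lecture. The sketch below evaluates it and tests whether a component would be audible; the function names are illustrative.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate absolute hearing threshold in dB SPL (Terhardt's approximation)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def audible(freq_hz: float, level_db: float) -> bool:
    """True if a component at this level lies above the threshold in quiet."""
    return level_db > threshold_in_quiet_db(freq_hz)


for f in (100, 1_000, 3_300, 16_000):
    print(f, round(threshold_in_quiet_db(f), 1))
```

The curve is lowest around 3 to 4 kHz, where hearing is most sensitive, and rises steeply toward both ends of the audible range. A 10 dB component is therefore discarded at 100 Hz but kept at 3 kHz.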

Signal Masking
- Spectral masking
  - A frequency component can be partly or fully masked by another component that is close to it in frequency
  - The masking component shifts (raises) the hearing threshold around its own frequency
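A toy model can make the shifted threshold concrete. The sketch below assumes a single masker and a triangular masking curve on the Bark (critical-band) scale. The Bark conversion is Zwicker's standard approximation; the offset and slope constants are illustrative values chosen for the example, not figures from the lecture or from any codec standard.

```python
import math


def hz_to_bark(freq_hz: float) -> float:
    """Zwicker's approximation of the Bark critical-band scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def masked_threshold_db(freq_hz, masker_hz, masker_db,
                        offset_db=10.0, lower_slope=25.0, upper_slope=10.0):
    """Toy triangular masking curve around one masker (illustrative constants)."""
    dz = hz_to_bark(freq_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope  # dB per Bark on each side
    return masker_db - offset_db - slope * abs(dz)


def is_masked(freq_hz, level_db, masker_hz, masker_db):
    """True if the component falls under the masker's shifted threshold."""
    return level_db < masked_threshold_db(freq_hz, masker_hz, masker_db)


print(is_masked(1_100, 40.0, 1_000, 70.0))  # close in frequency to the masker
print(is_masked(4_000, 40.0, 1_000, 70.0))  # far from the masker
```

In this model a 40 dB tone at 1.1 kHz disappears under a 70 dB masker at 1 kHz, while the same tone at 4 kHz remains audible because the masking curve has fallen away by then.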

Signal Masking
- Temporal masking
  - A quieter sound can be masked by a louder sound if they are temporally close
  - Quieter sounds that occur shortly before or shortly after the louder sound can both be masked

Spectral Analysis
- Spectral analysis derives a frequency-domain representation of a time-domain signal
- Tasks of spectral analysis
  - To derive masking thresholds that determine which signal components can be eliminated
  - To generate a representation of the signal to which masking thresholds can be applied
- Spectral analysis is done through transforms or filter banks
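As a minimal illustration of moving from the time domain to the frequency domain, the sketch below computes a magnitude spectrum with a direct discrete Fourier transform and locates the dominant component of a test tone. It uses the slow O(N^2) definition for clarity; a real encoder would use an FFT or a filter bank. The sampling rate, block length, and names are assumptions made for the example.

```python
import cmath
import math


def dft_magnitudes(block):
    """Magnitude spectrum of a real block via the direct DFT (bins 0..N/2)."""
    n = len(block)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(block)))
            for k in range(n // 2 + 1)]


sample_rate = 8_000
block_len = 64
tone = [math.sin(2 * math.pi * 1_000 * t / sample_rate) for t in range(block_len)]

mags = dft_magnitudes(tone)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / block_len
print(peak_bin, peak_hz)
```

Each bin covers `sample_rate / block_len` Hz, so the 1 kHz tone lands exactly in bin 8. It is to per-bin values like these that masking thresholds are then compared.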

Spectral Analysis
- Transforms
  - Fast Fourier Transform (FFT)
  - Discrete Cosine Transform (DCT): similar to the FFT but uses cosine values only
  - Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer III, MPEG-2 AAC, Dolby AC-3]: an overlapped and windowed version of the DCT
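The "overlapped and windowed" property of the MDCT can be sketched directly from its textbook definition. The code below is a slow reference form, assuming a sine window and 50% overlap between blocks; the scaling convention and all names are choices made for this example and are not taken from any particular codec.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """2M input samples -> M coefficients (analysis window applied inside)."""
    m = len(block) // 2
    w = sine_window(2 * m)
    return [sum(w[n] * block[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]


def imdct(coeffs):
    """M coefficients -> 2M windowed samples that still contain time aliasing."""
    m = len(coeffs)
    w = sine_window(2 * m)
    return [2.0 / m * w[n] * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                                 for k in range(m))
            for n in range(2 * m)]


def mdct_roundtrip(signal, m):
    """Transform 50%-overlapping blocks, then overlap-add the inverse transforms."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        for n, v in enumerate(imdct(mdct(signal[start:start + 2 * m]))):
            out[start + n] += v
    return out


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
rebuilt = mdct_roundtrip(signal, 8)
print(max(abs(rebuilt[i] - signal[i]) for i in range(8, 56)))
```

Each block of 16 samples yields only 8 coefficients, yet overlap-adding neighbouring blocks cancels the aliasing and recovers the interior samples. The first and last 8 samples are covered by a single block and are not reconstructed in this sketch.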

Spectral Analysis
- Filter banks
  - A filter bank is an array of band-pass filters that separates the input signal into multiple components, each carrying a single frequency subband of the original signal
  - Blocks of time samples are passed through a set of band-pass filters
  - Masking thresholds are applied to the resulting frequency subband signals
  - Polyphase and wavelet banks are the most popular filter structures

Filter Bank Structures
- Polyphase filter bank [used in all of the MPEG-1 encoders]
  - Signal is separated into subbands of equal width over the entire frequency range
  - The resulting subband signals are downsampled to create shorter signals (which are reconstructed later, during decoding)
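The split, downsample, and reconstruct cycle can be shown with the smallest possible equal-width filter bank: two bands built from the Haar filter pair. This is a deliberately simplified stand-in, not the MPEG-1 polyphase filter bank, which uses many more bands and much longer filters. The function names are illustrative.

```python
import math


def analyze_two_band(x):
    """Split an even-length signal into low and high bands, each downsampled by 2."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high


def synthesize_two_band(low, high):
    """Decoder side: rebuild the full-rate signal from the two subbands."""
    s = math.sqrt(2.0)
    x = []
    for l, h in zip(low, high):
        x.extend(((l + h) / s, (l - h) / s))
    return x


x = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
low, high = analyze_two_band(x)
y = synthesize_two_band(low, high)
print(len(low), len(high), y)
```

Each subband holds half as many samples as the input, so the total sample count is unchanged, and the decoder recovers the input up to floating-point rounding.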

Filter Bank Structures
- Wavelet filter bank [used by the Enhanced Perceptual Audio Coder (EPAC) by Lucent]
  - Unlike in the polyphase filter bank, the subbands are not of equal width (narrower at low frequencies, wider at high frequencies)
  - This allows better time resolution at high frequencies (e.g. short attacks), at the expense of frequency resolution
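Unequal subband widths arise naturally when only the low band is split again at every stage. The sketch below assumes a Haar wavelet and a three-level tree; both are choices made for illustration, not a description of EPAC. In this construction each further split halves the lowest band, so the band width shrinks toward low frequencies.

```python
import math


def haar_split(x):
    """One two-band split with downsampling by 2."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high


def wavelet_bands(x, levels):
    """Repeatedly split the low band; bands are returned from highest to lowest frequency."""
    bands = []
    low = list(x)
    for _ in range(levels):
        low, high = haar_split(low)
        bands.append(high)
    bands.append(low)
    return bands


bands = wavelet_bands([float(n % 5) for n in range(32)], levels=3)
band_lengths = [len(b) for b in bands]
print(band_lengths)
```

The number of samples per band is proportional to its bandwidth. The top band covers half the spectrum and has a sample for every two input samples (fine time resolution), while the lowest bands are narrow and sampled sparsely (fine frequency resolution).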

Noise Allocation
- System task: derive the shifted hearing threshold and apply it to the input signal
  - Anything below the threshold does not need to be transmitted
  - Any noise below the threshold is irrelevant
- Frequency component quantization
  - Tradeoff between space and noise
  - The encoder saves space by using just enough bits for each frequency component to keep the quantization noise under the threshold; this is known as noise allocation
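The "just enough bits" rule can be sketched with the common rule of thumb that each additional quantizer bit lowers quantization noise by roughly 6 dB. The per-band levels and thresholds below are made-up numbers, and the simple 6.02 dB-per-bit noise model is an assumption; real encoders iterate against a bit budget.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb: one more bit lowers quantization noise by about 6 dB


def bits_needed(signal_db: float, threshold_db: float) -> int:
    """Smallest bit count that keeps quantization noise under the masking threshold."""
    if signal_db <= threshold_db:
        return 0  # the component itself is inaudible, so nothing is transmitted
    return math.ceil((signal_db - threshold_db) / DB_PER_BIT)


# Hypothetical per-subband signal levels and shifted hearing thresholds, in dB.
levels = [60.0, 45.0, 30.0, 12.0]
thresholds = [20.0, 38.0, 35.0, 10.0]

allocation = [bits_needed(s, t) for s, t in zip(levels, thresholds)]
print(allocation)
```

The first band sits 40 dB above its threshold and needs 7 bits, the third band lies below its threshold and gets none. A uniform allocation would have spent the same number of bits on all four.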

Noise Allocation
- Pre-echo
  - If a single audio block contains silence followed by a loud attack, a pre-echo error occurs: after decoding there is audible noise in the silent part of the block
  - This is avoided by monitoring the audio data ahead of time at the encoding stage and splitting the audio into shorter blocks wherever pre-echo could occur
  - This does not completely eliminate pre-echo, but it can make it short enough to be masked by the attack (temporal masking)
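One simple way to detect a potential pre-echo case is to compare the energy of consecutive sub-blocks and switch to short blocks when the energy jumps. The sketch below is a hypothetical detector: the energy-ratio test, the ratio of 10, and the function names are assumptions made for illustration, not the method of any specific encoder.

```python
def block_energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


def choose_block_sizes(block, short_len, ratio=10.0):
    """Return [len(block)] normally, or a list of short block lengths if an attack is found."""
    parts = [block[i:i + short_len] for i in range(0, len(block), short_len)]
    energies = [block_energy(p) for p in parts]
    for prev, cur in zip(energies, energies[1:]):
        # A sudden rise in energy marks an attack; the floor avoids comparing against zero.
        if cur > ratio * max(prev, 1e-12):
            return [len(p) for p in parts]
    return [len(block)]


quiet_then_attack = [0.001] * 48 + [1.0] * 16
steady = [0.5] * 64

print(choose_block_sizes(quiet_then_attack, 16))
print(choose_block_sizes(steady, 16))
```

With short blocks, the quantization noise caused by the attack is confined to the 16 samples that contain it instead of spreading over the whole 64-sample block.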

Additional Encoding Techniques
- Other encoding techniques are available (as alternatives or in combination)
  - Predictive coding
  - Coupling / delta encoding
  - Huffman encoding

Additional Encoding Techniques
- Predictive coding
  - Often used in speech and image compression
  - The encoder estimates the expected value of each sample from previous sample values
  - It transmits/stores the difference between the expected and the actual value
  - The decoder generates the same estimate for each sample and then adjusts it by the difference stored for that sample
  - Used for additional compression in MPEG-2 AAC (Advanced Audio Coding)
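The simplest instance of this idea predicts each sample to be equal to the previous one and stores only the difference. The sketch below assumes that first-order predictor; practical codecs use more elaborate, often adaptive, predictors. The function names are illustrative.

```python
def dpcm_encode(samples):
    """Store the difference between each sample and its prediction (the previous sample)."""
    residuals = []
    prediction = 0
    for s in samples:
        residuals.append(s - prediction)
        prediction = s
    return residuals


def dpcm_decode(residuals):
    """Rebuild samples by forming the same prediction and adding the stored difference."""
    samples = []
    prediction = 0
    for r in residuals:
        s = prediction + r
        samples.append(s)
        prediction = s
    return samples


pcm = [100, 102, 105, 107, 106, 104]
res = dpcm_encode(pcm)
print(res)
print(dpcm_decode(res) == pcm)
```

After the first value the residuals are small numbers clustered around zero, which need fewer bits than the original samples and suit a subsequent entropy coder.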

Additional Encoding Techniques
- Coupling / delta encoding
  - Used where the audio signal consists of two or more channels (stereo or surround sound)
  - Similarities between channels are used for compression
  - The sum and the difference of two channels are derived; the difference is usually close to zero and therefore requires less space to encode
  - This is a lossless encoding process
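For integer samples the sum/difference transform is exactly invertible, which is why it counts as lossless. A minimal sketch, with illustrative function names and made-up sample values:

```python
def to_sum_diff(left, right):
    """Replace (left, right) with (left + right, left - right)."""
    sums = [l + r for l, r in zip(left, right)]
    diffs = [l - r for l, r in zip(left, right)]
    return sums, diffs


def from_sum_diff(sums, diffs):
    """Invert the transform; s + d and s - d are always even, so // 2 is exact."""
    left = [(s + d) // 2 for s, d in zip(sums, diffs)]
    right = [(s - d) // 2 for s, d in zip(sums, diffs)]
    return left, right


left = [1000, 1010, -500, 20]
right = [998, 1011, -503, 20]

sums, diffs = to_sum_diff(left, right)
print(diffs)
print(from_sum_diff(sums, diffs) == (left, right))
```

When the two channels are similar, the difference channel stays close to zero and can be coded with far fewer bits than either original channel.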

Additional Encoding Techniques
- Huffman coding
  - Information-theory-based technique
  - An element that recurs often in the signal is represented by a shorter code, and the mapping between codes and values is stored in a look-up table
  - Implemented using look-up tables in the encoder and in the decoder
  - Provides substantial lossless compression at the cost of additional computation
  - Used by MPEG-1 Layer III and MPEG-2 AAC
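The look-up table can be built from symbol frequencies with a priority queue. The sketch below constructs the table at run time from the data itself, which is an assumption made for illustration; audio codecs typically ship predefined tables. It also assumes a non-empty input, and all names are illustrative.

```python
import heapq
from collections import Counter


def huffman_table(symbols):
    """Build a symbol -> bit-string look-up table from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    """Walk the bit string; prefix-free codes make the first match the right one."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


data = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2   # e.g. quantized coefficient values
table = huffman_table(data)
bits = encode(data, table)
print(table, len(bits), decode(bits, table) == data)
```

The most frequent value receives a 1-bit code and the rarest values 3-bit codes, so the 16 symbols take 28 bits instead of the 32 bits a fixed 2-bit code would need.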

Encoding - Final Stages
- Audio data is packed into frames
- Frames are stored or transmitted
Questions