Lecture 8 audio compression

Apr 22, 2015

Mr SMAK

 
Transcript
Page 1: Lecture 8 audio compression

Audio Compression Techniques

Prepared by Razia Nisar Noorani

Lecture 8

Page 2: Lecture 8 audio compression

Introduction

- Digital Audio Compression
  - Removal of redundant or otherwise irrelevant information from an audio signal
  - Audio compression algorithms are often referred to as "audio encoders"
- Applications
  - Reduces required storage space
  - Reduces required transmission bandwidth

Page 3: Lecture 8 audio compression

Audio Compression

- Audio signal: overview
  - Sampling rate (number of samples per second)
  - Bit rate (number of bits per second). Typically, an uncompressed stereo 16-bit 44.1 kHz signal has a bit rate of about 1.4 Mbps (megabits per second)
  - Number of channels (mono / stereo / multichannel)
- Reduction by lowering those values or by data compression / encoding
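The 1.4 Mbps figure follows directly from the three values listed above. A minimal sketch of the arithmetic (the function name `pcm_bit_rate` is only illustrative, not from the lecture):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# CD-quality stereo: 44.1 kHz, 16 bits per sample, 2 channels
cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)              # 1411200 bits per second, i.e. about 1.4 Mbps
print(cd_rate / 8 / 1000)   # 176.4 kilobytes per second
```

Halving any one of the three values (for example, going from stereo to mono) halves the bit rate, which is the "reduction by lowering those values" mentioned on the slide.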

Page 4: Lecture 8 audio compression

Audio Data Compression

- Redundant information
  - Implicit in the remaining information
  - Ex. oversampled audio signal
    - Oversampling is the process of sampling a signal with a sampling frequency significantly higher than twice the bandwidth or highest frequency of the signal being sampled
- Irrelevant information
  - Perceptually insignificant
  - Cannot be recovered from remaining information

Page 5: Lecture 8 audio compression

Audio Data Compression

- Lossless Audio Compression
  - Removes redundant data
  - Resulting signal is the same as the original (perfect reconstruction)
- Lossy Audio Encoding
  - Removes irrelevant data
  - Resulting signal is similar to the original

Page 6: Lecture 8 audio compression

Audio Data Compression

- Audio vs. Speech Compression Techniques
  - Speech compression uses a human vocal tract model to compress signals
  - Audio compression does not use this technique due to the larger variety of possible signal variations

Page 7: Lecture 8 audio compression

Generic Audio Encoder

- Psychoacoustic Model
  - Psychoacoustics: the study of how sounds are perceived by humans
  - Uses perceptual coding
    - Eliminates information from the audio signal that is inaudible to the ear
  - Detects conditions under which different audio signal components mask each other

Page 8: Lecture 8 audio compression

Psychoacoustic Model

- Signal Masking
  - Threshold cut-off
  - Spectral (Frequency / Simultaneous) Masking
  - Temporal Masking
- Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain

Page 9: Lecture 8 audio compression

Signal Masking

- Threshold cut-off
  - Hearing threshold level: a function of frequency
  - Any frequency components below the threshold will not be perceived by the human ear
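The slide does not give a formula for the hearing threshold. As a rough illustration only, the sketch below assumes a widely quoted analytic approximation of the threshold in quiet (often attributed to Terhardt), with levels in dB SPL; the function names are hypothetical, and a real encoder would use the tables or model defined by its own standard.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate absolute hearing threshold in dB SPL (assumed formula)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible(freq_hz, level_db_spl):
    """A component is treated as audible only if it lies above the threshold."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


# The same 10 dB SPL tone is above the threshold at 1 kHz but below it at 100 Hz.
print(is_audible(1000, 10))
print(is_audible(100, 10))
```

Under this approximation the ear is most sensitive around 3 to 4 kHz, so a component of a given level may be kept there and discarded at very low or very high frequencies.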

Page 10: Lecture 8 audio compression

Signal Masking

- Spectral Masking
  - A frequency component can be partly or fully masked by another component that is close to it in frequency
  - This shifts the hearing threshold

Page 11: Lecture 8 audio compression

Signal Masking

- Temporal Masking
  - A quieter sound can be masked by a louder sound if they are temporally close
  - Sounds that occur shortly before or shortly after the volume increase can both be masked

Page 12: Lecture 8 audio compression

Spectral Analysis

- A spectral analyzer is a device or algorithm that derives a frequency-domain representation of a time-domain signal
- Tasks of Spectral Analysis
  - To derive masking thresholds to determine which signal components can be eliminated
  - To generate a representation of the signal to which masking thresholds can be applied
- Spectral analysis is done through transforms or filter banks

Page 13: Lecture 8 audio compression

Spectral Analysis

- Transforms
  - Fast Fourier Transform (FFT)
  - Discrete Cosine Transform (DCT): similar to FFT but uses cosine values only
  - Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer-III, MPEG-2 AAC, Dolby AC-3]: overlapped and windowed version of DCT
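To make the idea of a transform concrete, here is a minimal sketch that computes a plain discrete Fourier transform directly from its definition and reads off the strongest frequency. It is a slow O(n^2) illustration of what the FFT computes, not an FFT or MDCT implementation, and the function names are illustrative.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude spectrum of a real signal for bins 0 .. n/2 (naive DFT)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the bin with the largest magnitude."""
    mags = dft_magnitudes(samples)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate / len(samples)


# A 1 kHz sine sampled at 8 kHz, 64 samples (exactly 8 periods)
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
print(dominant_frequency(tone, sample_rate))  # 1000.0
```

The time-domain block of 64 samples becomes a set of frequency bins, and it is on such a frequency-domain representation that masking thresholds can be applied.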

Page 14: Lecture 8 audio compression

Spectral Analysis

- Filter Banks
  - A filter bank is an array of band-pass filters that separates the input signal into multiple components, each one carrying a single frequency subband of the original signal
  - Time sample blocks are passed through a set of band-pass filters
  - Masking thresholds are applied to the resulting frequency subband signals
  - Polyphase and wavelet banks are the most popular filter structures

Page 15: Lecture 8 audio compression

Filter Bank Structures

- Polyphase Filter Bank [used in all of the MPEG-1 encoders]
  - Signal is separated into subbands, the widths of which are equal over the entire frequency range
  - The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during the decoding process)
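The split, downsample, and reconstruct cycle can be shown with the simplest possible case. The sketch below uses a two-band Haar analysis/synthesis pair, chosen only because it is short; it is not the polyphase filter bank used by the MPEG-1 encoders, which has many more bands and much longer filters. Function names are illustrative.

```python
import math

SCALE = 1 / math.sqrt(2)


def analyze(samples):
    """Split an even-length signal into low and high subbands, each half as long."""
    low = [(samples[i] + samples[i + 1]) * SCALE for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) * SCALE for i in range(0, len(samples), 2)]
    return low, high


def synthesize(low, high):
    """Rebuild the full-rate signal from the two downsampled subbands."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) * SCALE)
        out.append((l - h) * SCALE)
    return out


signal = [1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 0.0, 4.0]
low, high = analyze(signal)
rebuilt = synthesize(low, high)
print(len(low), len(high))  # 4 4: two subbands, each half the original length
print(all(abs(a - b) < 1e-9 for a, b in zip(signal, rebuilt)))  # True
```

The two subbands together hold as many samples as the input, so the split by itself saves nothing; the saving comes later, when each subband is quantized according to its masking threshold.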

Page 16: Lecture 8 audio compression

Filter Bank Structures

- Wavelet Filter Bank [used by Enhanced Perceptual Audio Coder (EPAC) by Lucent]
  - Unlike in the polyphase filter bank, the subbands are not of equal width (narrower for lower frequencies, wider for higher frequencies)
  - This allows for better time resolution at higher frequencies (ex. short attacks), but at the expense of frequency resolution

Page 17: Lecture 8 audio compression

Noise Allocation

- System Task: derive and apply the shifted hearing threshold to the input signal
  - Anything below the threshold does not need to be transmitted
  - Any noise below the threshold is irrelevant
- Frequency component quantization
  - Tradeoff between space and noise
  - Encoder saves on space by using just enough bits for each frequency component to keep noise under the threshold; this is known as noise allocation
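One way to picture "just enough bits" is the common rule of thumb that each bit of a uniform quantizer lowers the quantization noise by roughly 6 dB relative to the signal. The sketch below applies that rule per frequency band; the rule, the constant, and the function name are assumptions for illustration, and real encoders use more elaborate iterative allocation loops.

```python
import math

DB_PER_BIT = 6.02  # assumed rule of thumb: about 6 dB of noise reduction per bit


def bits_for_band(signal_db, masking_threshold_db):
    """Smallest bit count that keeps quantization noise under the threshold."""
    margin = signal_db - masking_threshold_db
    if margin <= 0:
        return 0  # the component itself is masked: nothing needs to be sent
    return math.ceil(margin / DB_PER_BIT)


# (signal level, shifted hearing threshold) per band, in dB; made-up values
bands = [(60, 30), (45, 40), (20, 30)]
print([bits_for_band(s, t) for s, t in bands])  # [5, 1, 0]
```

A band whose signal is far above its threshold gets many bits, a band just above it gets few, and a band below it gets none.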

Page 18: Lecture 8 audio compression

Noise Allocation

- Pre-echo
  - If a single audio block contains silence followed by a loud attack, a pre-echo error occurs: after decoding, there will be audible noise in the silent part of the block
  - This is avoided by pre-monitoring the audio data at the encoding stage and separating the audio into shorter blocks when pre-echo is likely
  - This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking)
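The pre-monitoring step can be sketched as a simple attack detector: split the block into short segments and look for a sudden jump in energy. The segment count, the energy ratio, and the function names below are hypothetical choices for illustration; actual encoders typically base this decision on their psychoacoustic model.

```python
def segment_energies(block, parts):
    """Energy (sum of squares) of each of `parts` equal segments of the block."""
    size = len(block) // parts
    return [sum(s * s for s in block[i * size:(i + 1) * size])
            for i in range(parts)]


def needs_short_blocks(block, parts=8, ratio=10.0, floor=1e-9):
    """True if some segment is much louder than the segment just before it."""
    energies = segment_energies(block, parts)
    for previous, current in zip(energies, energies[1:]):
        if current > ratio * (previous + floor):
            return True
    return False


silence_then_attack = [0.0] * 512 + [1.0] * 512
steady_tone = [0.5] * 1024
print(needs_short_blocks(silence_then_attack))  # True
print(needs_short_blocks(steady_tone))          # False
```

When the detector fires, the encoder would code that stretch of audio with shorter blocks so that any quantization noise stays close to the attack.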

Page 19: Lecture 8 audio compression

Additional Encoding Techniques

- Other encoding techniques are available (as alternatives or in combination)
  - Predictive Coding
  - Coupling / Delta Encoding
  - Huffman Encoding

Page 20: Lecture 8 audio compression

Additional Encoding Techniques

- Predictive Coding
  - Often used in speech and image compression
  - Estimates the expected value for each sample based on previous sample values
  - Transmits/stores the difference between the expected and received value
  - Generates an estimate for the next sample and then adjusts it by the difference stored for the current sample
  - Used for additional compression in MPEG-2 AAC (Advanced Audio Coding)
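The estimate, difference, and adjust cycle described above can be sketched with the simplest possible predictor, one that expects each sample to equal the previous one. This is a minimal illustration with hypothetical function names, not the predictor used in MPEG-2 AAC.

```python
def predictive_encode(samples):
    """Store the difference between each sample and its predicted value."""
    residuals = []
    prediction = 0  # assumed initial estimate before any sample is seen
    for sample in samples:
        residuals.append(sample - prediction)
        prediction = sample  # next sample is predicted to equal this one
    return residuals


def predictive_decode(residuals):
    """Rebuild the samples by adjusting each estimate by the stored difference."""
    samples = []
    prediction = 0
    for residual in residuals:
        sample = prediction + residual
        samples.append(sample)
        prediction = sample
    return samples


samples = [100, 102, 105, 107, 106]
residuals = predictive_encode(samples)
print(residuals)                     # [100, 2, 3, 2, -1]
print(predictive_decode(residuals))  # [100, 102, 105, 107, 106]
```

After the first value, the residuals are small numbers, which take fewer bits to store than the samples themselves.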

Page 21: Lecture 8 audio compression

Additional Encoding Techniques

- Coupling / Delta encoding
  - Used in cases where the audio signal consists of two or more channels (stereo or surround sound)
  - Similarities between channels are used for compression
  - A sum and a difference between two channels are derived; the difference is usually some value close to zero and therefore requires less space to encode
  - This is a case of a lossless encoding process
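A minimal sketch of the sum/difference idea for two integer sample streams, showing that the original channels come back exactly. The function names are illustrative, and the sample values are made up.

```python
def to_sum_diff(left, right):
    """Derive sum and difference signals from two channels."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def from_sum_diff(total, diff):
    """Recover both channels; (s + d) and (s - d) are always even, so // is exact."""
    left = [(s + d) // 2 for s, d in zip(total, diff)]
    right = [(s - d) // 2 for s, d in zip(total, diff)]
    return left, right


left = [1000, 1010, 1023, 990]
right = [998, 1011, 1020, 991]
total, diff = to_sum_diff(left, right)
print(diff)                                         # [2, -1, 3, -1]
print(from_sum_diff(total, diff) == (left, right))  # True
```

Because the two channels are similar, the difference signal stays near zero and can be coded with far fewer bits than either channel.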

Page 22: Lecture 8 audio compression

Additional Encoding Techniques

- Huffman Coding
  - Information-theory-based technique
  - An element of a signal that often reoccurs in the signal is represented by a shorter code, and its value is stored in a look-up table
  - Implemented using look-up tables in the encoder and in the decoder
  - Provides substantial lossless compression, at the cost of additional computation
  - Used by MPEG-1 Layer III and MPEG-2 AAC
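The sketch below builds a Huffman code table from symbol counts and uses it to encode and decode a short sequence. It builds the table from the data for illustration; this is an assumption of the example, and the MPEG coders named above rely on code tables fixed in advance by the standard rather than tables built per signal. Function names are illustrative.

```python
import heapq
from collections import Counter


def build_codes(symbols):
    """Return a look-up table {symbol: bit string}; frequent symbols get short codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial code table for that subtree)
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, table1 = heapq.heappop(heap)
        n2, _, table2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in table1.items()}
        merged.update({s: "1" + c for s, c in table2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # prefix-free codes: first match is the symbol
            out.append(lookup[current])
            current = ""
    return out


data = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2  # e.g. quantized values, mostly zero
codes = build_codes(data)
bits = encode(data, codes)
print(len(bits))                    # 28 bits, versus 32 with a fixed 2-bit code
print(decode(bits, codes) == data)  # True
```

The most frequent value receives a one-bit code while the rare values receive three bits each, and decoding restores the sequence exactly.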

Page 23: Lecture 8 audio compression

Encoding - Final Stages

- Audio data packed into frames
- Frames stored or transmitted

Page 24: Lecture 8 audio compression

Questions