Kolesnik Audio Compression

Apr 13, 2018

  • 7/27/2019 Kolesnik Audio Compression

    Audio Compression Techniques

    MUMT 611, January 2005

    Assignment 2

    Paul Kolesnik

    Introduction

    Digital Audio Compression: removal of redundant or otherwise irrelevant information from the audio signal

    Audio compression algorithms are often referred to as audio encoders

    Applications:

        Reduces required storage space

        Reduces required transmission bandwidth

    Audio Compression

    Audio signal overview:

        Sampling rate (# of samples per second)

        Bit rate (# of bits per second). Typically, an uncompressed stereo 16-bit 44.1 kHz signal has a bit rate of about 1.4 Mbit/s

        Number of channels (mono / stereo / multichannel)

    Reduction by lowering those values or by data compression / encoding
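
The bit rate of uncompressed audio is the product of the three values listed above. A minimal sketch of the arithmetic (the function name is only illustrative):

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Stereo, 16-bit, 44.1 kHz audio.
cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)        # 1411200 bits per second
print(cd_rate / 1e6)  # 1.4112 (Mbit/s)
```

Halving any one of the three inputs halves the bit rate, which is the "lowering those values" route; the rest of the slides deal with the encoding route.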

    Audio Data Compression

    Redundant information

        Implicit in the remaining information

        Ex. oversampled audio signal

    Irrelevant information

        Perceptually insignificant

        Cannot be recovered from remaining information

    Audio Data Compression

    Lossless Audio Compression

        Removes redundant data

        Resulting signal is the same as the original (perfect reconstruction)

    Lossy Audio Encoding

        Removes irrelevant data

        Resulting signal is similar to the original

    Audio Data Compression

    Audio vs. Speech Compression Techniques

        Speech Compression uses a human vocal tract model to compress signals

        Audio Compression does not use this technique due to the larger variety of possible signal variations

    Generic Audio Encoder

    [Figure not available in this transcript]

    Generic Audio Encoder

    Psychoacoustic Model

        Psychoacoustics: the study of how sounds are perceived by humans

        Uses perceptual coding: eliminates information from the audio signal that is inaudible to the ear

        Detects conditions under which different audio signal components mask each other

    Psychoacoustic Model

    Signal Masking

        Threshold cut-off

        Spectral (Frequency / Simultaneous) Masking

        Temporal Masking

    Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain

    Signal Masking

    Threshold cut-off

        Hearing threshold level: a function of frequency

        Any frequency components below the threshold will not be perceived by the human ear
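
The threshold cut-off can be sketched in a few lines. The slides do not give a formula for the hearing threshold; the sketch below assumes Terhardt's widely used approximation of the threshold in quiet (in dB SPL), and the component list is made up for illustration.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate hearing threshold in dB SPL (Terhardt's formula, an assumption here)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def audible_components(components):
    """Keep only (freq_hz, level_db) pairs whose level reaches the threshold."""
    return [(freq, level) for freq, level in components
            if level >= threshold_in_quiet_db(freq)]


# Hypothetical spectral components: (frequency in Hz, level in dB SPL).
components = [(100, 10.0), (1000, 10.0), (3300, 0.0), (16000, 30.0)]
print(audible_components(components))  # [(1000, 10.0), (3300, 0.0)]
```

The 100 Hz and 16 kHz components fall below the curve and are dropped, while a much quieter component near 3.3 kHz, where the ear is most sensitive, is kept.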

    Signal Masking

    Spectral Masking

        A frequency component can be partly or fully masked by another component that is close to it in frequency

        This shifts the hearing threshold

    Signal Masking

    Temporal Masking

        A quieter sound can be masked by a louder sound if they are temporally close

        Sounds that occur both (shortly) before and after a volume increase can be masked

    Spectral Analysis

    Tasks of Spectral Analysis

        To derive masking thresholds to determine which signal components can be eliminated

        To generate a representation of the signal to which masking thresholds can be applied

    Spectral Analysis is done through transforms or filter banks

    Spectral Analysis

    Transforms

        Fast Fourier Transform (FFT)

        Discrete Cosine Transform (DCT): similar to FFT but uses cosine values only

        Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer-III, MPEG-2 AAC, Dolby AC-3]: overlapped and windowed version of DCT
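
To make "overlapped and windowed" concrete, here is a small, unoptimized MDCT sketch in plain Python. It is not taken from any of the codecs named above: the sine window, the block size, and the 2/N scaling of the inverse are assumptions chosen so that overlap-adding neighbouring blocks reconstructs the input.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Map 2N windowed samples to N coefficients."""
    n = len(block) // 2
    return [sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for i in range(2 * n)]


def mdct_round_trip(signal, n):
    """Analyse blocks of 2N samples with hop N, then overlap-add the inverses."""
    window = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [signal[start + i] * window[i] for i in range(2 * n)]
        rebuilt = imdct(mdct(block))
        for i in range(2 * n):
            out[start + i] += rebuilt[i] * window[i]
    return out


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
out = mdct_round_trip(signal, 4)
# The first and last N samples are covered by one block only, so compare the interior.
print(max(abs(out[i] - signal[i]) for i in range(4, 28)))
```

Each block of 2N samples yields only N coefficients, yet the interior of the signal is recovered because the aliasing in adjacent blocks cancels when they are overlap-added.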

    Spectral Analysis

    Filter Banks

        Time sample blocks are passed through a set of bandpass filters

        Masking thresholds are applied to the resulting frequency subband signals

        Polyphase and wavelet banks are the most popular filter structures

    Filter Bank Structures

    Polyphase Filter Bank [used in all of the MPEG-1 encoders]

        Signal is separated into subbands, the widths of which are equal over the entire frequency range

        The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during the decoding process)

    Filter Bank Structures

    Wavelet Filter Bank [used by Enhanced Perceptual Audio Coder (EPAC) by Lucent]

        Unlike the polyphase filter bank, the widths of the subbands are not evenly spaced (narrower for lower frequencies, wider for higher frequencies)

        This allows for better time resolution at high frequencies (ex. short attacks), but at the expense of frequency resolution
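
A wavelet filter bank with subbands of unequal width can be sketched with the simplest possible filter pair. The sketch below assumes Haar filters and a three-level split; EPAC's actual filters are not described in these slides.

```python
def haar_split(signal):
    """Split into low and high half-bands, each downsampled by two."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def haar_merge(low, high):
    """Invert haar_split."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


def wavelet_analysis(signal, levels):
    """Repeatedly split the low band; returns bands, highest-frequency band first."""
    bands = []
    low = list(signal)
    for _ in range(levels):
        low, high = haar_split(low)
        bands.append(high)
    bands.append(low)
    return bands


def wavelet_synthesis(bands):
    """Rebuild the signal by merging from the lowest band upwards."""
    low = bands[-1]
    for high in reversed(bands[:-1]):
        low = haar_merge(low, high)
    return low


samples = [3, 5, 4, 8, 9, 7, 2, 1, 0, 2, 6, 6, 5, 3, 1, 4]
bands = wavelet_analysis(samples, levels=3)
print([len(band) for band in bands])  # [8, 4, 2, 2]
print(wavelet_synthesis(bands) == samples)  # True
```

The band lengths show the trade-off: the top band keeps 8 samples per 16 input samples (fine time resolution, wide frequency range), while each lower band keeps fewer samples and covers a narrower range.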

    Noise Allocation

    System Task: derive and apply the shifted hearing threshold to the input signal

        Anything below the threshold doesn't need to be transmitted

        Any noise below the threshold is irrelevant

    Frequency component quantization

        Tradeoff between space and noise

        Encoder saves on space by using just enough bits for each frequency component to keep noise under the threshold; this is known as noise allocation
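
The "just enough bits" idea can be sketched with the common rule of thumb that each additional bit lowers quantization noise by about 6 dB. The rule, the function name, and the band values below are assumptions for illustration, not the allocation procedure of any particular encoder.

```python
import math

DB_PER_BIT = 6.02  # Rule of thumb: one more bit lowers quantization noise by ~6 dB.


def bits_for_component(level_db: float, threshold_db: float) -> int:
    """Smallest bit count that keeps quantization noise under the threshold."""
    headroom = level_db - threshold_db
    if headroom <= 0:
        return 0  # Component is below the threshold: nothing to transmit.
    return math.ceil(headroom / DB_PER_BIT)


# Hypothetical (signal level, shifted threshold) pairs in dB for four components.
bands = [(60.0, 20.0), (45.0, 40.0), (30.0, 35.0), (70.0, 10.0)]
print([bits_for_component(level, thr) for level, thr in bands])  # [7, 1, 0, 10]
```

A strongly masked component (second entry) gets a single bit, and a component below the threshold (third entry) gets none.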

    Noise Allocation

    Pre-echo

        In case a single audio block contains silence followed by a loud attack, a pre-echo error occurs: there will be audible noise in the silent part of the block after decoding

        This is avoided by pre-monitoring audio data at the encoding stage and separating audio into shorter blocks in a potential pre-echo case

        This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking)
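
One simple way to "pre-monitor" a block is to compare the energy of consecutive sub-blocks and flag a sudden jump. This is only a sketch of the idea: the number of sub-blocks and the energy ratio are hypothetical tuning values, and real encoders use their own detection criteria.

```python
def block_energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


def needs_short_blocks(block, parts=4, ratio=10.0):
    """Flag a block whose energy jumps sharply between consecutive sub-blocks."""
    size = len(block) // parts
    energies = [block_energy(block[i * size:(i + 1) * size]) for i in range(parts)]
    floor = 1e-9  # Keeps the comparison meaningful for digital silence.
    return any(later > ratio * max(earlier, floor)
               for earlier, later in zip(energies, energies[1:]))


silence_then_attack = [0.0] * 12 + [1.0] * 4
steady_tone = [0.5] * 16
print(needs_short_blocks(silence_then_attack))  # True
print(needs_short_blocks(steady_tone))          # False
```

When the check fires, the encoder would split the block so that the quantization noise is confined to a short window just before the attack.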

    Pre-echo Effect

    [Figure not available in this transcript]

    Additional Encoding Techniques

    Other encoding techniques are available (as alternatives or in combination)

    Predictive Coding

    Coupling / Delta Encoding

    Huffman Encoding

    Additional Encoding Techniques

    Predictive Coding

        Often used in speech and image compression

        Estimates the expected value for each sample based on previous sample values

        Transmits/stores the difference between the expected and the actual value

        The decoder generates an estimate for the next sample and then adjusts it by the difference stored for that sample

        Used for additional compression in MPEG-2 AAC
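
The encode/decode loop described above can be sketched as follows. The predictor here is a simple linear extrapolation from the two previous samples, chosen for illustration; it is not the predictor used in MPEG-2 AAC.

```python
def predict(history):
    """Estimate the next sample by extrapolating from the previous two."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    if history:
        return history[-1]
    return 0


def encode(samples):
    """Store the difference between each sample and its prediction."""
    return [s - predict(samples[:i]) for i, s in enumerate(samples)]


def decode(residuals):
    """Rebuild samples by correcting each prediction with its stored difference."""
    samples = []
    for r in residuals:
        samples.append(predict(samples) + r)
    return samples


samples = [10, 12, 14, 17, 19, 20]
residuals = encode(samples)
print(residuals)                     # [10, 2, 0, 1, -1, -1]
print(decode(residuals) == samples)  # True
```

After the first couple of values the residuals stay close to zero, so they can be stored with fewer bits than the samples themselves.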

    Additional Encoding Techniques

    Coupling / Delta encoding

        Used in cases where the audio signal consists of two or more channels (stereo or surround sound)

        Similarities between channels are used for compression

        A sum and difference between two channels are derived; the difference is usually some value close to zero and therefore requires less space to encode

        This is a case of a lossless encoding process
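
A minimal sketch of the sum/difference idea on integer samples, with made-up channel data; practical codecs use variations of this scheme.

```python
def encode_sum_diff(left, right):
    """Replace (left, right) with (sum, difference) per sample."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def decode_sum_diff(total, diff):
    """Recover the channels exactly: sum + diff = 2 * left, sum - diff = 2 * right."""
    left = [(s + d) // 2 for s, d in zip(total, diff)]
    right = [(s - d) // 2 for s, d in zip(total, diff)]
    return left, right


# Hypothetical, strongly correlated stereo samples.
left = [100, 102, 98, -50]
right = [101, 100, 97, -52]
total, diff = encode_sum_diff(left, right)
print(diff)                                          # [-1, 2, 1, 2]
print(decode_sum_diff(total, diff) == (left, right))  # True
```

The difference channel stays near zero when the two channels are similar, and the decode step is exact, which is why the process is lossless.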

    Additional Encoding Techniques

    Huffman Coding

        Information-theory-based technique

        An element that often reoccurs in the signal is represented by a simpler (shorter) symbol, and its value is stored in a look-up table

        Implemented using look-up tables in the encoder and in the decoder

        Provides substantial lossless compression, but requires high computational power and therefore is not very popular

        Used by MPEG-1 and MPEG-2 AAC
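
The look-up table idea can be sketched with a textbook Huffman construction. This builds the table from the data itself, which is an assumption made for the example (the MPEG codecs mentioned above are not described in that detail here); the sample values are made up.

```python
import heapq
from collections import Counter


def huffman_table(symbols):
    """Build a symbol -> bit-string table; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial table). The unique tie-breaker
    # means the dictionaries themselves are never compared.
    heap = [(count, index, {symbol: ""})
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_index = len(heap)
    while len(heap) > 1:
        count0, _, table0 = heapq.heappop(heap)
        count1, _, table1 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in table0.items()}
        merged.update({s: "1" + code for s, code in table1.items()})
        heapq.heappush(heap, (count0 + count1, next_index, merged))
        next_index += 1
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    reverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # Codes are prefix-free, so the first match is right.
            out.append(reverse[current])
            current = ""
    return out


data = [0, 0, 0, 0, 1, 1, -1, 2]  # Hypothetical quantized values.
table = huffman_table(data)
bits = encode(data, table)
print(len(bits))                    # 14, versus 16 with a fixed 2-bit code
print(decode(bits, table) == data)  # True
```

The most frequent value (0) gets a 1-bit code and the rare values get 3-bit codes, which is where the saving comes from.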

    Encoding - Final Stages

    Audio data packed into frames

    Frames stored or transmitted

    Conclusion

    HTML Bibliography

    http://www.music.mcgill.ca/~pkoles

    Questions
