Slide 1
Slide 2
Digital Audio Coding
Dr. T. Collins
Outline: Standard MIDI Files; Perceptual Audio Coding; MPEG-1 Layers 1, 2 & 3; MPEG-4
Slide 3
Ancient Audio Coding Methods
Audio coding has actually been around for hundreds of years. Traditionally, composers have recorded their music by writing out the notes in a standard notation. A slightly more modern example is the Victorian piano roll.
(Image: a 200-year-old example of audio coding.)
Slide 4
Standard MIDI Files
A piano roll can be efficiently encoded in digital form by recording the time at which each note begins and ends. This is what a standard MIDI file does. MIDI (Musical Instrument Digital Interface) is an internationally agreed language for electronic musical instruments.
Standard MIDI files encode:
- MIDI events/messages, e.g. note-on, note-off, etc.
- The time delay between successive events
As well as encoding when each note starts and stops, MIDI also allows:
- Up to 16 different instruments to be played at once
- Transmission of parameters such as key velocity, volume, modulation, etc.
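To make the "time delay between events" idea concrete, here is a minimal Python sketch of how one note might be written into a MIDI track. It assumes the standard encoding of delta times as variable-length quantities (7 bits per byte) and of note-on/note-off status bytes as 0x9n/0x8n, where n is the channel number 0-15; those 16 channels are what allow up to 16 instruments at once. The function names and the 480-tick delay are illustrative only, and a complete file would also need the MThd and MTrk header chunks, which are omitted here.

```python
def encode_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity.

    Each byte carries 7 bits of the value, most significant group first;
    every byte except the last has its top (continuation) bit set.
    """
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("delta times must fit in 28 bits")
    groups = [value & 0x7F]                   # last byte: continuation bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: continuation bit set
        value >>= 7
    return bytes(reversed(groups))


def note_event(delta_ticks: int, note_on: bool, channel: int,
               note: int, velocity: int) -> bytes:
    """One track event: delta time, then a note-on (0x9n) or note-off (0x8n) message."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity 0-127")
    status = (0x90 if note_on else 0x80) | channel
    return encode_vlq(delta_ticks) + bytes([status, note, velocity])


# Middle C (note 60) on channel 0, released after 480 ticks (hypothetical timing).
track_data = note_event(0, True, 0, 60, 100) + note_event(480, False, 0, 60, 0)
print(track_data.hex(" "))  # 00 90 3c 64 83 60 80 3c 00
```

Nine bytes describe the whole note, which is why MIDI data rates are so low compared with sampled audio.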
Slide 5
Standard MIDI Limitations
In a MIDI file, it is the instructions to play the notes that are stored, not the audio itself. The quality of the reproduction therefore depends on the synthesiser used for playback.
(Audio examples: the original recording, and playback on other synthesisers/sound cards.)
Slide 6
MIDI vs. Digital Audio
MIDI | Digital audio
Stores instructions to turn notes on and off | Stores the actual sampled audio
Very efficient (typical rate: 1 kbps) | Less efficient (typical rate: 100 kbps)
Playback quality depends on the MIDI device | Playback quality is always the same
Only synthesised instruments can be used | Any sounds (including speech and singing) can be recorded
Slide 7
Sampling
Digital audio represents the continuous analogue audio waveform by a series of discrete samples. The sample rate must be at least double the bandwidth of the audio signal; any frequencies above half the sample rate (Fs/2) are aliased to lower frequencies. Typical hi-fi sample rates are 44.1 kHz (CD audio) and 48 kHz (DAT tape and DAB radio).
(Figure: sound pressure level against frequency, marking Fs/2 and the sample rate Fs.)
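The sampling criterion can be checked numerically. The sketch below (plain Python; the helper names are hypothetical) computes the apparent frequency of a sinusoid after sampling and shows that, at Fs = 48 kHz, a 30 kHz tone produces exactly the same samples as an 18 kHz tone. This is why content above Fs/2 must be filtered out before sampling.

```python
import math


def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a real sinusoid at f_hz appears after sampling at fs_hz."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))


def sample_cosine(f_hz: float, fs_hz: float, n: int) -> list:
    """n samples of a unit-amplitude cosine at f_hz, sampled at fs_hz."""
    return [math.cos(2 * math.pi * f_hz * k / fs_hz) for k in range(n)]


fs = 48000
print(alias_frequency(18000, fs))  # 18000: below Fs/2, unchanged
print(alias_frequency(30000, fs))  # 18000: above Fs/2, folds down
```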
Slide 8
Quantisation Levels
Each sample is quantised so that it can be represented by a binary integer. The number of bits used to represent each sample sets the number of quantisation levels, and the error between the quantised signal and the original audio is the quantisation noise.
The peak signal-to-quantisation-noise ratio using n bits per sample can be estimated as:
$$\mathrm{SNR} \approx 20\log_{10}\left(2^{n}\right) \approx 6.02\,n \ \text{dB}$$
CD audio uses 16-bit resolution, giving a dynamic range of about 96 dB. For the quantisation noise to be audible, the signal would have to be played back at a level close to the threshold of pain!
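The 6 dB-per-bit rule can be checked with a quick simulation. This sketch assumes a uniform quantiser covering [-1, 1) and a sine wave at 99% of full scale; the function names and the test frequency are illustrative choices. For a sine wave, the measured ratio of mean signal power to mean noise power should come out about 1.76 dB above the 6.02 n estimate, which is the standard result for a full-scale sinusoid.

```python
import math


def quantise(x: float, n_bits: int) -> float:
    """Uniform mid-tread quantiser for samples in [-1, 1) using n_bits per sample."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    code = round(x / step)
    code = max(-levels // 2, min(levels // 2 - 1, code))  # clip to the available codes
    return code * step


def measured_snr_db(n_bits: int, n_samples: int = 10000, amplitude: float = 0.99,
                    cycles_per_sample: float = 0.0123456789) -> float:
    """Ratio of mean signal power to mean quantisation-noise power for a sine wave."""
    signal = [amplitude * math.sin(2 * math.pi * cycles_per_sample * k)
              for k in range(n_samples)]
    noise = [quantise(s, n_bits) - s for s in signal]
    p_signal = sum(s * s for s in signal) / n_samples
    p_noise = sum(e * e for e in noise) / n_samples
    return 10 * math.log10(p_signal / p_noise)


for n in (8, 12, 16):
    print(f"{n} bits: measured {measured_snr_db(n):.1f} dB, estimate {6.02 * n:.1f} dB")
```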
Slide 9
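Sub-band Coding
Like the eye, the ear is more sensitive to some frequencies than others. Many audio coding algorithms exploit this using a form of sub-band coding.
(Figure: digital audio in → filters → down-sample → quantise → multiplex → coded audio out.)
Bit rates at each stage, for 16-bit audio at 48 kHz split into three sub-bands:
- Input: 16 × 48000 = 768 kbps
- After filtering into 3 sub-bands: 16 × 3 × 48000 = 2304 kbps
- After down-sampling each band by 3: 16 × 3 × 16000 = 768 kbps
- After quantising each band to 4 bits: 4 × 3 × 16000 = 192 kbps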
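The bit-rate bookkeeping above, and the reason down-sampling after filtering loses no information, can be sketched in Python. The filter bank here is a two-band Haar split. This is a deliberately simple stand-in, chosen for illustration, for the three-band bank in the diagram; MPEG-1 layers 1 and 2 use a 32-band polyphase filter bank. Because each band occupies only half the bandwidth, it can be down-sampled by 2, so the two bands together hold exactly as many samples as the input and can still be recombined perfectly.

```python
def haar_analysis(x):
    """Split x (even length) into a low band and a high band, each down-sampled by 2."""
    pairs = range(0, len(x) - 1, 2)
    low = [(x[i] + x[i + 1]) / 2 for i in pairs]
    high = [(x[i] - x[i + 1]) / 2 for i in pairs]
    return low, high


def haar_synthesis(low, high):
    """Up-sample and recombine the two bands (perfect reconstruction for this filter pair)."""
    out = []
    for lo, hi in zip(low, high):
        out.extend([lo + hi, lo - hi])
    return out


def bit_rate_kbps(bits_per_sample, n_bands, rate_per_band_hz):
    """Total bit rate when every band carries the same bits per sample."""
    return bits_per_sample * n_bands * rate_per_band_hz / 1000


# The slide's three-band example with 16-bit audio at 48 kHz
stages = [
    ("input", bit_rate_kbps(16, 1, 48000)),
    ("after 3 filters", bit_rate_kbps(16, 3, 48000)),
    ("after down-sampling by 3", bit_rate_kbps(16, 3, 16000)),
    ("after 4-bit quantisation", bit_rate_kbps(4, 3, 16000)),
]
for name, kbps in stages:
    print(f"{name}: {kbps:g} kbps")
```

Down-sampling restores the original 768 kbps, and only the quantisation step actually reduces the bit rate.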
Slide 10
Perceptual Coding
A key question when designing a sub-band coder is: what should the quantisation levels of the sub-bands be? Remember that the quantisation process will introduce noise, and that we want the noise to be imperceptible, i.e. just below the threshold of hearing (also known as the Minimum Audible Field, MAF). So the question becomes: what is the MAF in each sub-band? To estimate this, look at the Robinson-Dadson equal-loudness curves.
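Robinson-Dadson curves are measured data, so for a code sketch it is convenient to use an analytical approximation instead. The example below swaps in Terhardt's widely used formula for the threshold of hearing and takes the lowest value within each of 32 equal-width sub-bands at 48 kHz. Both the formula and the choice of the per-band minimum are assumptions for illustration, not part of the original slides.

```python
import math


def threshold_in_quiet_db(f_hz: float) -> float:
    """Terhardt's approximation of the threshold of hearing (dB SPL), for f_hz > 0."""
    f = f_hz / 1000.0  # kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4


def subband_thresholds_db(sample_rate=48000, n_bands=32, points_per_band=16):
    """Lowest threshold in quiet within each equal-width sub-band.

    The minimum is used (an assumption) so that noise spread across a band stays
    below the most sensitive point in that band.
    """
    band_width = (sample_rate / 2) / n_bands
    thresholds = []
    for b in range(n_bands):
        freqs = [b * band_width + (k + 0.5) * band_width / points_per_band
                 for k in range(points_per_band)]
        thresholds.append(min(threshold_in_quiet_db(f) for f in freqs))
    return thresholds


for band, level in enumerate(subband_thresholds_db()[:8]):
    print(f"sub-band {band}: {level:5.1f} dB SPL")
```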
Psychoacoustics
Substantial improvements to our sub-band coder are possible using psychoacoustics, the study of how sound is perceived by the ear-brain combination. Of interest to us is the fact that the threshold of hearing is not constant: it changes constantly due to masking.
Slide 15
Masking
In the presence of the signal, the noise sounds much quieter (almost undetectable). Due to the anatomy of the ear, loud sounds mask quieter sounds at nearby frequencies. Effectively, the threshold of hearing is raised to the masking threshold. The masking threshold can be estimated using a psychoacoustic model and exploited by the coder.
(Audio examples: signal; signal + noise (SNR = 24 dB); noise.)
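A psychoacoustic model can be sketched very crudely by spreading each loud tone's level across neighbouring frequencies on the Bark (critical-band) scale. In this Python sketch the Hz-to-Bark conversion is the Zwicker-Terhardt approximation. The 14 dB offset, the 27 and 10 dB/Bark slopes and the flat 0 dB threshold in quiet are hypothetical values picked for illustration; real models, such as those described in MPEG-1, are considerably more detailed. Combining curves by taking their maximum is also a simplification.

```python
import math

# Hypothetical spreading parameters, chosen for illustration only.
OFFSET_DB = 14.0     # masking curve peak sits this far below the masker level
LOWER_SLOPE = 27.0   # dB per Bark for frequencies below the masker
UPPER_SLOPE = 10.0   # dB per Bark above it (masking spreads further upwards)


def hz_to_bark(f_hz: float) -> float:
    """Zwicker-Terhardt approximation of critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masking_curve_db(masker_hz: float, masker_db: float, probe_hz: float) -> float:
    """Level (dB SPL) below which a quiet sound at probe_hz is masked by one tone."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = LOWER_SLOPE if dz < 0 else UPPER_SLOPE
    return masker_db - OFFSET_DB - slope * abs(dz)


def effective_threshold_db(maskers, probe_hz: float, quiet_db: float = 0.0) -> float:
    """Raised threshold at probe_hz: the highest of the threshold in quiet
    (a placeholder constant here) and each masker's curve."""
    curves = [masking_curve_db(f, level, probe_hz) for f, level in maskers]
    return max([quiet_db] + curves)


maskers = [(1000.0, 70.0)]   # one 70 dB tone at 1 kHz
for f in (500, 800, 1000, 1200, 2000, 4000):
    print(f, round(effective_threshold_db(maskers, f), 1))
```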
Slide 16
The Masking Threshold
(Figure: sound pressure level [dB-SPL], -30 to 80, against frequency [Hz], 0 to 15000, showing the threshold of hearing, the masking threshold and the signal.)
Slide 17
Applying Masking
(Figure: sound pressure level [dB-SPL] against frequency [Hz], showing the threshold of hearing, the masking threshold and the number of bits allocated to each sub-band. The frame used for the example is taken from Space Oddity by David Bowie.)
Bits allocated to the 12 sub-bands shown: 2, 4, 4, 4, 3, 2, 4, 4, 5, 5, 5, 5
Average bits per sample = 47/12 ≈ 3.92
Compression ratio = 16 : 3.92 ≈ 4.1 : 1
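The arithmetic on this slide can be reproduced directly. The sketch also includes a hypothetical allocation rule, roughly one extra bit for every 6.02 dB of signal-to-mask ratio (SMR), as one simple way such an allocation could be produced. The slides do not say how the allocation above was derived.

```python
import math


def bits_for_smr(smr_db: float, db_per_bit: float = 6.02) -> int:
    """Hypothetical rule: enough bits that quantisation noise falls below the mask."""
    return max(0, math.ceil(smr_db / db_per_bit))


allocation = [2, 4, 4, 4, 3, 2, 4, 4, 5, 5, 5, 5]   # bits per sub-band, from the slide
average_bits = sum(allocation) / len(allocation)     # 47 / 12
compression_ratio = 16 / average_bits                # relative to 16-bit PCM

print(f"average = {average_bits:.2f} bits/sample")   # average = 3.92 bits/sample
print(f"ratio = {compression_ratio:.1f}:1")          # ratio = 4.1:1
```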
Slide 18
Additional Side Information
The audio signal is processed in discrete blocks of samples known as frames. Each frame of each sub-band is:
- scaled to normalise the peak signal level
- quantised at a level appropriate for the current signal-to-mask ratio
The receiver needs to know the scale factor and quantisation levels used, so this information must be embedded along with the samples. The resulting overhead is very small compared with the compression gains.
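Here is a minimal sketch of per-frame scaling and quantisation, and of the side information it produces. It assumes a 12-sample frame per sub-band (as in Layer 1) and a symmetric mid-tread quantiser with 2^bits - 1 levels. The scale factor is sent here as a raw peak value, whereas MPEG-1 transmits an index into a fixed scale-factor table, so treat this format as hypothetical.

```python
def encode_frame(samples, bits):
    """Scale one sub-band frame to its peak, then quantise it.

    Returns (scale_factor, bits, codes); the first two values are the side
    information the decoder needs.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a symmetric quantiser")
    scale = max(abs(s) for s in samples) or 1.0   # avoid dividing by zero on silence
    half = 2 ** (bits - 1) - 1                    # codes run from -half to +half
    codes = [round(s / scale * half) for s in samples]
    return scale, bits, codes


def decode_frame(scale, bits, codes):
    """Undo the quantisation and scaling using the transmitted side information."""
    half = 2 ** (bits - 1) - 1
    return [c / half * scale for c in codes]


frame = [0.1, -0.5, 0.25, 0.8, -0.05, 0.0, 0.3, -0.75, 0.6, 0.2, -0.1, 0.4]
scale, bits, codes = encode_frame(frame, bits=4)
restored = decode_frame(scale, bits, codes)
print(scale, bits, codes)
```

Without `scale` and `bits`, the integer codes alone could not be turned back into sample values.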
Slide 19
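Block Diagrams
ENCODER: digital audio in → sub-band filter bank → scale and quantise → multiplex and data format → coded audio out. In parallel, an FFT feeds the psychoacoustic model, whose masking thresholds control the scaling and quantisation; the side information is coded and multiplexed with the samples.
DECODER: coded audio in → de-multiplex → descale and dequantise (using the decoded side information) → inverse filter bank → digital audio out.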
Slide 20
MPEG-1: Layers 1, 2 & 3
Three perceptual coders are available in the MPEG-1 specification. They are known as layers 1, 2 and 3.
Layer 1 (.mp1)
- Similar to the simple coder just described
- 32 sub-bands are used
- Each frame contains 384 samples (32 × 12)
- A version of layer 1 was used in the Digital Compact Cassette (DCC)
Layer 2 (.mp2)
- Slightly more complex than layer 1, but better quality
- Frame length increased to 1152 samples (32 × 36)
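This small calculation shows how long each frame lasts in time. It assumes the three MPEG-1 sample rates of 32, 44.1 and 48 kHz and uses only the frame sizes given above.

```python
def frame_duration_ms(samples_per_frame: int, sample_rate_hz: int) -> float:
    """Time span of one frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


FRAME_SAMPLES = {"Layer 1": 32 * 12, "Layer 2": 32 * 36}   # from the slide

for layer, n in FRAME_SAMPLES.items():
    for fs in (32000, 44100, 48000):                      # MPEG-1 sample rates
        print(f"{layer}: {n} samples at {fs} Hz = {frame_duration_ms(n, fs):.2f} ms")
```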
Slide 21
MPEG-1: Layers 1, 2 & 3 (cont.)
Layer 2 (cont.)
- Data formatting of samples and side information is slightly more efficient
- Used in Digital Audio Broadcasting (DAB)
Layer 3 (.mp3)
- Significantly more complex than layers 1 or 2
- Capable of reasonable quality even at very low data rates
- A combination of sub-band coding and transform coding gives up to 576 frequency bands (compared with 32 for layers 1 & 2)
- Huffman coding is applied to the quantised samples
- MP3 files are now hugely popular with internet and mobile users
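Huffman coding gives shorter codewords to more frequent values, which suits quantised spectral values where zeros and small magnitudes dominate. The sketch below builds a Huffman code from the data itself using Python's `heapq`. This is the generic textbook algorithm, not MP3's actual scheme, which selects from predefined code tables. The sample values are made up for illustration.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free Huffman code (symbol -> bit string) from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:
        (only,) = freq
        return {only: "0"}
    # Heap entries: (count, unique tie-breaker, partial code table)
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)   # two least frequent subtrees
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


values = [0, 0, 0, 1, 0, -1, 0, 0, 2, 0, 0, 1, 0, 0, 0, -1]   # made-up quantised values
code = huffman_code(values)
huffman_bits = sum(len(code[v]) for v in values)
fixed_bits = len(values) * max(1, (len(code) - 1).bit_length())  # fixed-length alternative
print(code)
print(f"Huffman: {huffman_bits} bits, fixed-length: {fixed_bits} bits")  # Huffman: 24 bits, fixed-length: 32 bits
```

This saving comes purely from the statistics of the values (statistical redundancy), on top of whatever the perceptual stages have already removed.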
Slide 22
Other Perceptual Coders
The same principles are applied, in subtly different ways, in most general-purpose audio coders, e.g.:
- RealAudio
- Microsoft's WMA format
- MiniDisc (ATRAC)
Slide 23
MPEG-4
In the latest version of MPEG, MPEG-4, the specification includes:
- General audio coders: similar to MPEG-1 but including multi-channel support
- Parametric coder: HILN (Harmonics, Individual Lines and Noise) for very low bit rates
- Speech coders: HVXC and CELP speech coders
- Structured Audio: similar to MIDI but including instrument models; used for synthetic audio
- Synthesised Speech: allows speech to be coded as text and resynthesised at the decoder
Slide 24
Summary
- Standard MIDI files work by encoding the structure of the music
- MPEG-1 layers 1 & 2 work by removing the perceptual redundancy from digitised audio
- MPEG-1 layer 3 removes perceptual redundancy and statistical redundancy (by entropy coding)
- MPEG-4: the coding method can be chosen to suit the signal source, so perceptual, statistical and structural redundancy can all be exploited