Digital Audio Coding – Dr. T. Collins Standard MIDI Files Perceptual Audio Coding MPEG-1 layers 1, 2 & 3 MPEG-4


Dec 16, 2015



  • Slide 1
  • Slide 2
  • Digital Audio Coding, Dr. T. Collins. Topics: Standard MIDI Files; Perceptual Audio Coding; MPEG-1 Layers 1, 2 & 3; MPEG-4.
  • Slide 3
  • Ancient Audio Coding Methods: Audio coding has actually been around for hundreds of years. Traditionally, composers record their music by writing out the notes in a standard notation. A slightly more modern equivalent is the Victorian piano roll. (Figure: 200-year-old example of audio coding.)
  • Slide 4
  • Standard MIDI Files: A piano roll can be encoded digitally, and efficiently, by recording the time at which each note begins and ends. This is what a standard MIDI file does. MIDI (Musical Instrument Digital Interface) is an internationally agreed language. Standard MIDI files encode two things: MIDI events/messages (e.g. note-on, note-off, etc.) and the time delay between each event. As well as encoding when notes begin and end, the format also allows up to 16 different instruments to be played at once, and the transmission of parameters such as key velocity, volume and modulation.
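The idea of storing events plus the time delay between them can be sketched in a few lines. This is only an illustration of the principle, not the Standard MIDI File byte format: the `(start_tick, end_tick, note_number)` tuples and the function name `notes_to_events` are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical note representation: (start_tick, end_tick, midi_note_number)
Note = Tuple[int, int, int]


def notes_to_events(notes: List[Note]) -> List[Tuple[int, str, int]]:
    """Turn notes into (delta_ticks, message, note_number) events."""
    timeline = []
    for start, end, pitch in notes:
        # The second field makes note-offs sort before note-ons at the same tick.
        timeline.append((start, 1, "note-on", pitch))
        timeline.append((end, 0, "note-off", pitch))
    timeline.sort()

    events = []
    previous_tick = 0
    for tick, _, message, pitch in timeline:
        # Store the delay since the previous event, not the absolute time.
        events.append((tick - previous_tick, message, pitch))
        previous_tick = tick
    return events


# One note followed by a two-note chord.
events = notes_to_events([(0, 480, 60), (480, 960, 64), (480, 960, 67)])
for event in events:
    print(event)
```

Events that happen at the same instant get a delay of zero, which is why a chord costs no extra timing information beyond its first note.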
  • Slide 5
  • Standard MIDI Limitations: In a MIDI file, it is the instructions to play the notes that are stored, not the audio itself. The quality of the reproduction depends on the synthesiser used for playback. (Audio examples: original recording; playback on other synthesisers / sound cards.)
  • Slide 6
  • MIDI vs. Digital Audio: MIDI stores instructions to turn notes on and off; digital audio stores the actual sampled audio. MIDI is very efficient (typical rate: 1 kbps); digital audio is less efficient (typical rate: 100 kbps). With MIDI, playback quality depends on the MIDI device; with digital audio, playback quality is always the same. With MIDI, only synthesised instruments can be used; with digital audio, any sounds (including speech and singing) can be recorded.
  • Slide 7
  • Sampling: Digital audio represents the continuous analogue audio waveform by a series of discrete samples. The sample rate must be at least double the bandwidth of the audio signal. Typical hi-fi sample rates are 44.1 kHz (CD audio) and 48 kHz (DAT and DAB radio). (Figure: sound pressure level against frequency, with Fs/2 and the sample rate Fs marked on the frequency axis.)
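The sample-rate rule can be illustrated with a short calculation. This is a minimal sketch: the 20 kHz audio bandwidth is an assumed figure, not one taken from the slides, and `min_sample_rate` and `alias_frequency` are names invented for the example.

```python
def min_sample_rate(bandwidth_hz: float) -> float:
    """Lowest sample rate that satisfies 'at least double the bandwidth'."""
    return 2.0 * bandwidth_hz


def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a sampled tone, folded into the range 0..Fs/2."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# Assumed 20 kHz audio bandwidth (not stated in the slides).
print(min_sample_rate(20_000))          # 40000.0
# A 10 kHz tone is below Fs/2 at 44.1 kHz, so it is represented correctly.
print(alias_frequency(10_000, 44_100))  # 10000
# A 30 kHz tone is above Fs/2 and would appear at 14.1 kHz instead.
print(alias_frequency(30_000, 44_100))  # 14100
```

Both 44.1 kHz and 48 kHz sit above the 40 kHz minimum for the assumed bandwidth.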
  • Slide 8
  • Quantisation levels: Each sample is quantised so that it can be represented by a binary integer. The number of bits used to represent each sample sets the number of quantisation levels. The error between the quantised signal and the original audio is the quantisation noise. The peak signal-to-quantisation-noise ratio using n bits per sample can be estimated at roughly 6 dB per bit. CD audio uses 16-bit resolution, giving a dynamic range of ~96 dB. To hear the quantisation noise, the signal level would need to be close to the threshold of pain!
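The ~96 dB figure for 16 bits can be checked with a short calculation. The sketch below assumes the estimate takes the common form 20·log10(2^n), which is about 6.02 dB per bit; whether the slide used exactly this expression is an assumption, and the function name `peak_sqnr_db` is made up for illustration.

```python
import math


def peak_sqnr_db(bits: int) -> float:
    """Assumed estimate: ratio of full-scale range to one quantisation step, in dB."""
    return 20 * math.log10(2 ** bits)


for n in (8, 12, 16):
    print(f"{n} bits: {peak_sqnr_db(n):.1f} dB")
# 8 bits: 48.2 dB
# 12 bits: 72.2 dB
# 16 bits: 96.3 dB
```

Each extra bit doubles the number of quantisation levels and so adds about 6 dB.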
  • Slide 9
  • Sub-band Coding: Like the eye, the ear is more sensitive to some frequencies than others. Many audio coding algorithms exploit this using a form of sub-band coding. (Figure: block diagram with the stages Filters, Down-sample, Quantise and Multiplex between "Digital audio in" and "Coded audio out".) Bit rates marked on the diagram: 16 x 48000 = 768 kbps; 16 x 3 x 48000 = 2304 kbps; 16 x 3 x 16000 = 768 kbps; 4 x 3 x 16000 = 192 kbps.
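The bit rates quoted on this slide all follow from bits per sample x number of sub-bands x sample rate. A minimal sketch, reading the four figures as successive stages of the coder (input, after splitting into 3 bands, after down-sampling by 3, after re-quantising to 4 bits); that stage-by-stage reading is an interpretation, and `bit_rate_kbps` is a name chosen for the example.

```python
def bit_rate_kbps(bits_per_sample: int, sample_rate_hz: int, bands: int = 1) -> float:
    """Bit rate in kbps for `bands` streams at the given resolution and rate."""
    return bits_per_sample * bands * sample_rate_hz / 1000


stages = [
    ("digital audio in",        bit_rate_kbps(16, 48_000)),
    ("after 3-band filtering",  bit_rate_kbps(16, 48_000, bands=3)),
    ("after down-sampling by 3", bit_rate_kbps(16, 16_000, bands=3)),
    ("after 4-bit quantising",  bit_rate_kbps(4, 16_000, bands=3)),
]
for name, rate in stages:
    print(f"{name}: {rate:.0f} kbps")
```

Filtering alone triples the data, down-sampling brings it back to the input rate, and only the coarser quantisation produces the actual saving (768 kbps down to 192 kbps).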
  • Slide 10
  • Perceptual Coding: A key question when designing a sub-band coder: what should the quantisation levels of the sub-bands be? Remember that the quantisation process will introduce noise and that we want the noise to be imperceptible. We want the noise to be just below the threshold of hearing (also known as the Minimum Audible Field, MAF). So the question should be: what is the MAF in each sub-band? To estimate this, look at the Robinson-Dadson curves.
  • Slide 11
  • Equal Loudness Curves
  • Slide 12
  • Quantisation Implications (Figure: sound pressure level [dB-SPL], from -30 to 80, against frequency [Hz], up to 15000, showing the peak signal level, the threshold of hearing, and the quantisation noise for 16 bits and for 12 bits.)
  • Slide 13
  • Application to Sub-band Coding (Figure: sound pressure level [dB-SPL], from -30 to 80, against frequency [Hz], up to 15000, showing the peak signal level and the threshold of hearing. Sub-band labels: 9, 9, 10, 10, 10, 9, 10, 11, 12, 11, 12 and 12 bits.)
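One way to arrive at a per-band bit count like those labelled on this slide is to give each sub-band just enough bits to push its quantisation noise below the threshold of hearing in that band. The sketch below assumes about 6.02 dB of range per bit; the peak level and threshold values are hypothetical numbers chosen for illustration, not read from the figure, and `bits_needed` is an invented name.

```python
import math

DB_PER_BIT = 6.02  # assumed: each bit lowers the noise floor by about 6 dB


def bits_needed(peak_db_spl: float, threshold_db_spl: float) -> int:
    """Smallest bit count that puts quantisation noise at or below the threshold."""
    required_range_db = peak_db_spl - threshold_db_spl
    return max(0, math.ceil(required_range_db / DB_PER_BIT))


# Hypothetical values: a 70 dB-SPL peak and one threshold per sub-band.
peak = 70.0
thresholds = [15.0, 10.0, 5.0, 0.0, -2.0]
allocation = [bits_needed(peak, t) for t in thresholds]
print(allocation)  # [10, 10, 11, 12, 12]
```

Bands where the ear is most sensitive (lowest threshold) need the most bits, which matches the general shape of the allocations on the slide.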
  • Slide 14
  • Psychoacoustics: Substantial improvements to our sub-band coder are possible using psychoacoustics. Psychoacoustics is the study of how sound is perceived by the ear-brain combination. Of interest to us is that the threshold of hearing is not constant. In fact, the threshold of hearing constantly changes due to masking.
  • Slide 15
  • Masking: In the presence of the signal, the noise sounds much quieter (almost undetectable). Due to the anatomy of the ear, loud sounds mask quieter sounds at nearby frequencies. Effectively, the threshold of hearing is raised to the masking threshold. The masking threshold can be estimated using a psychoacoustic model and exploited by the coder. (Audio examples: signal; noise; signal + noise, SNR = 24 dB.)
  • Slide 16
  • The Masking Threshold (Figure: sound pressure level [dB-SPL], from -30 to 80, against frequency [Hz], up to 15000, showing the signal, the threshold of hearing and the masking threshold.)
  • Slide 17
  • Applying Masking (Figure: sound pressure level [dB-SPL], from -30 to 80, against frequency [Hz], up to 15000, showing the threshold of hearing and the masking threshold. Sub-band labels: 2, 4, 4, 4, 3, 2, 4, 4, 5, 5, 5 and 5 bits. The frame used for the example is from "Space Oddity", Bowie.) Average bits per sample = 3.92. Compression ratio = 16:3.92 = 4.1:1.
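The average and the compression ratio quoted here can be reproduced from the twelve per-band bit counts on the slide. The sketch assumes that every sub-band carries the same number of samples, so that a plain mean is the right average; the variable names are chosen for the example.

```python
# Bits per sub-band as labelled on the slide.
bits_per_band = [2, 4, 4, 4, 3, 2, 4, 4, 5, 5, 5, 5]

# Assumes equal-width sub-bands, i.e. equal sample counts per band.
average_bits = sum(bits_per_band) / len(bits_per_band)

# The uncompressed audio uses 16 bits per sample.
compression_ratio = 16 / average_bits

print(f"Average bits per sample = {average_bits:.2f}")     # 3.92
print(f"Compression ratio = {compression_ratio:.1f}:1")    # 4.1:1
```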
  • Slide 18
  • Additional Side Information: The audio signal is processed in discrete blocks of samples known as frames. Each frame of each sub-band is scaled to normalise the peak signal level, then quantised at a level appropriate for the current signal-to-mask ratio. The receiver needs to know the scale factor and quantisation levels used, so this information must be embedded along with the samples. The resulting overhead is very small compared with the compression gains.
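The scale-then-quantise step, and the reason the decoder needs the side information, can be sketched as follows. This is a simplified illustration with a plain uniform quantiser, not the MPEG scale-factor tables or bitstream format; `encode_frame` and `decode_frame` are hypothetical names, and `bits` is assumed to be at least 2.

```python
from typing import List, Tuple


def encode_frame(samples: List[float], bits: int) -> Tuple[float, List[int]]:
    """Scale a frame to unit peak, then quantise uniformly to `bits` bits."""
    scale = max(abs(s) for s in samples) or 1.0  # fall back to 1.0 for silence
    levels = 2 ** (bits - 1) - 1                 # symmetric signed integer range
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes


def decode_frame(scale: float, codes: List[int], bits: int) -> List[float]:
    """Undo the quantisation and scaling using the side information."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * scale for c in codes]


frame = [0.5, -0.2, 0.1, 0.0]
scale, codes = encode_frame(frame, bits=4)
print(scale, codes)  # 0.5 [7, -3, 1, 0]
print(decode_frame(scale, codes, bits=4))
```

Without `scale` and `bits` the integer codes cannot be turned back into sample values, which is why both must travel with each frame as side information.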
  • Slide 19
  • Block Diagrams (Figure. ENCODER: digital audio in → sub-band filter bank → scale and quantise → multiplex and data format → coded audio out, with an FFT and psychoacoustic model supplying the masking thresholds, and a "code side info" block. DECODER: coded audio in → de-multiplex → descale & dequantise → inverse filter bank → digital audio out, with a "decode side info" block.)
  • Slide 20
  • MPEG 1: Layers 1, 2 & 3: Three perceptual coders are available in the MPEG 1 specification. They are known as layers 1, 2 & 3. Layer 1 (.mp1): similar to the simple coder just described; 32 sub-bands are used; each frame contains 384 samples (32 x 12); a version of layer 1 was used in the Digital Compact Cassette (DCC). Layer 2 (.mp2): slightly more complex but better quality than layer 1; frame length increased to 1152 samples (32 x 36).
  • Slide 21
  • MPEG 1: Layers 1, 2 & 3 (cont): Layer 2 (cont): data formatting of samples and side information is slightly more efficient; used in Digital Audio Broadcasting (DAB). Layer 3 (.mp3): significantly more complex than layers 1 or 2; capable of reasonable quality even at very low data rates; a combination of sub-band coding and transform coding is used to give up to 576 frequency bands (compared to 32 for layers 1 & 2); Huffman encoding is applied to samples; MP3 files are now hugely popular with internet and mobile users.
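The frame lengths quoted for layers 1 and 2 can be checked, and turned into frame durations, with a short calculation. The sketch assumes a 48 kHz sample rate (one of the hi-fi rates mentioned on the sampling slide); other rates give different durations, and the names used are illustrative.

```python
SUB_BANDS = 32  # both layer 1 and layer 2 use 32 sub-bands


def frame_duration_ms(samples_per_frame: int, sample_rate_hz: int) -> float:
    """Length of one frame in milliseconds."""
    return 1000 * samples_per_frame / sample_rate_hz


# Samples per sub-band in each frame, as given on the slides.
layers = {"Layer 1": 12, "Layer 2": 36}
for name, per_band in layers.items():
    frame_length = SUB_BANDS * per_band
    duration = frame_duration_ms(frame_length, 48_000)  # assumed 48 kHz
    print(f"{name}: {frame_length} samples, {duration:.0f} ms")
# Layer 1: 384 samples, 8 ms
# Layer 2: 1152 samples, 24 ms
```

The longer layer 2 frame spreads the same side information over three times as many samples.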
  • Slide 22
  • Other Perceptual Coders: The same principles are applied in subtly different ways in most general-purpose audio coders, e.g. Real Audio, Microsoft's WMA format and MiniDisc (ATRAC).
  • Slide 23
  • MPEG-4: In the latest version of MPEG, MPEG-4, the specification includes: General audio coders: similar to MPEG 1 but including multi-channel support. Parametric coder: HILN (Harmonics, Individual Lines and Noise) for very low bit rates. Speech coders: HVXC and CELP speech coders. Structured Audio: similar to MIDI but including instrument models; used for synthetic audio. Synthesised Speech: allows speech to be coded as text and resynthesised at the decoder.
  • Slide 24
  • Summary: Standard MIDI files work by encoding the structure of the music. MPEG-1 Layers 1 & 2 work by removing the perceptual redundancy from digitised audio. MPEG-1 Layer 3 removes perceptual redundancy and statistical redundancy (by entropy coding). MPEG-4: the coding method can be chosen to suit the signal source; perceptual, statistical and structural redundancy can be exploited.