CSCD 443/533 Advanced Networks, Fall 2017
Lecture 18
Compression of Video and Audio
Topics
• Compression technology
• Motivation
• Human attributes make it possible
• Audio Compression
• Video Compression
• Performance
Motivation, Why Compress?
Why do we need to compress streaming media? Look at one instance:
– 640 x 480 pixel frames
– 24 bits color/pixel
– 30 frames/sec
– With no compression, it takes over 200 Mbps to transmit just the video
– Do you have a 200 Mbps link? We need massive compression to be able to view streaming video and audio on our current networks
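The 200+ Mbps claim follows directly from the numbers above; a quick back-of-the-envelope check:

```python
# Uncompressed bitrate for 640x480, 24-bit color, 30 frames/sec
width, height = 640, 480
bits_per_pixel = 24
frames_per_sec = 30

bits_per_sec = width * height * bits_per_pixel * frames_per_sec
mbps = bits_per_sec / 1_000_000
print(f"{mbps:.1f} Mbps")  # 221.2 Mbps
```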
Motivation, Why Compress?
What does compression buy us?
Lossless DVD video – 221 Mbps
Compressed DVD video – 4 Mbps
Roughly a 50:1 compression ratio!
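The ratio is easy to verify (221/4 is about 55:1, which the slide rounds down to 50:1):

```python
lossless_mbps = 221   # raw DVD-quality video
compressed_mbps = 4   # typical compressed DVD bitrate
ratio = lossless_mbps / compressed_mbps
print(f"{ratio:.0f}:1")  # 55:1
```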
Why Compress?
In a nutshell:
– To reduce the file size
– To deliver the stream to the user
– To conserve storage space
Choosing a compression rate is a balance: quality of the media vs. available bandwidth
So, Why Compress?
Delivering video over the Web means compromises, mostly trading image quality for lower bit rates
In general, video and audio are compressed, stuffed into a container, and delivered to you via the web
If done well, you won't notice the missing bits or the delivery of the media
We will discuss individual formats, codecs, and tradeoffs
Definitions
• File Format
– A particular way information is stored in a file
– Known as "containers" for streaming media
• Codec
– Codec is an acronym for Compression/Decompression
– A codec is any technology for compressing and decompressing data
• Compression
– Reduces file size by removing audio or video information
– Takes advantage of human perception
Format vs. Codec
• Example
– Flash Video (FLV) is a file format
– H.264, On2 VP6, and Sorenson Spark are codecs for the Flash video file
Container File Formats
Purpose of container formats:
Function as "black boxes" for holding a variety of media formats
Good container formats can handle files compressed with a variety of different codecs
In a perfect world, you could put any codec in any container format … unfortunately, there are some incompatibilities
Examples: MPEG-2, Advanced Systems Format (ASF) from Microsoft, AVI, QuickTime (MOV), MP4, Flash (FLV), RealMedia
Multimedia Container Files
Multimedia file extensions: .mov, .ogg, .wmv, .flv, .mp4, .mpeg
Essentially, videos are packaged into encapsulation containers, or wrapper formats, that contain all the information needed to present the video
You can think of file formats as being containers that hold all this information
Very similar to a .zip, .sit or .rar file
Differences in Containers
Why are certain formats popular?
Popular support
– How widely supported is the format?
File size
– Larger is not better for streaming files
Support for advanced codec functionality
– Older formats such as AVI do not support new codec features like B-frames or VBR audio
Support for advanced content
– Such as chapters, subtitles, meta-tags, user-data
Compression
Two types:
– Lossless – keeps all bits
– Lossy – removes bits
Lossy Compression
• Lossy compression schemes reduce file size by discarding some amount of data during encoding, before it is sent over the Internet
• Once received by the client, the codec attempts to reconstruct the information that was lost or discarded
Video Lossy Compression
• Image Compression
– The image format uses lossy compression to sample an image and discard unnecessary color/contrast information
– Can you really see the difference?
Video Lossy Compression
• Why can you do lossy compression?
• Spatial and temporal redundancy
– Pixel values are not independent; they are correlated with their neighbors, both within the same frame and across frames
• The value of a pixel is predictable given the values of neighboring pixels
• Psychovisual redundancy
– The human eye has limited response to fine spatial detail
• Less sensitive to detail near object edges or around shot-changes
• Impairments introduced by bit-rate reduction should not be visible to the human viewer
Audio Lossy Compression
• Audio compression
– Lossy compression discards frequencies at the high and low ends of the spectrum and attempts to locate and remove unnecessary audio data
Nice description and example programs: http://www.videograbber.net/compress-audio-file.html
Audio Streaming Formats
• Many formats and standards for streaming audio
– RealNetworks' RealAudio, streaming MP3, Macromedia's Flash and Director Shockwave, Microsoft's Windows Media, and Apple's QuickTime
– Also recognized standard formats, including Liquid Audio, MP3, MIDI, WAV, and AU
Audio Lossy Compression
• First, the player decompresses the audio file as it downloads to your computer
• Then it fills in missing information according to the instructions set by the codec
– The compressed file is unintelligible to the listener
– The decompressed file is intelligible, but of lower quality than the original
MP3 Audio Lossy Compression
• Example - MP3
• The MP3 lossy audio data compression algorithm takes advantage of a perceptual limitation of human hearing
– Auditory Masking
• Discovered in the late 1800's: a tone can be rendered inaudible by another tone of lower frequency
• How your brain perceives similar sounds
MP3 Audio Lossy Compression
• Uncompressed audio
– Like CDs, stores more data than your brain can actually process
– For example:
• If two notes are very similar and very close together, your brain may perceive only one of them
• If two sounds are different and one is much louder than the other, your brain may never perceive the quieter signal
MP3 Audio Lossy Compression
• The study of these auditory phenomena is called psychoacoustics
– The phenomena can be accurately described in tables and charts
– Mathematical models represent human hearing patterns
– These can be stored in the codec as reference tables
Article on psychoacoustics http://www.uaudio.com/blog/how-the-ear-works/
MP3 Audio Lossy Compression
• MP3 Encoding Tools
– Analyze the incoming source signal
– Break it down into mathematical patterns, and
– Compare these patterns to psychoacoustic models stored in the encoder itself
• The encoder can then discard most of the data that doesn't match the stored models, keeping the data that does
• This shrinks the file by discarding a great deal of extra data
MP3 Audio Lossy Compression
• MP3 encoding process … a two-pass system
• Step 1
– Run all psychoacoustic models, discarding data
– Then compress what's left to shrink storage space
• Step 2
– Huffman coding, which does not discard any data
– Lets you store what's left in a smaller amount of space
– Uses fewer bits to store the most common symbols
• Step 2a
– Break the resulting audio stream into frames assembled into a bitstream, with header information preceding each data frame
– Headers contain "meta-data" specific to that frame
– Such as an ID, bitrate, audio frequency, padding, type of frame, MPEG-1 or 2
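Step 2's Huffman coding can be sketched in a few lines. This is a generic, minimal Huffman builder for illustration only; a real MP3 encoder uses fixed code tables defined by the standard rather than building trees on the fly.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code table: common symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # heap entries: (frequency, unique tiebreak, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        count += 1
        heapq.heappush(heap, (f1 + f2, count, merged))
    return heap[0][2]

codes = huffman_codes("aaaabbc")
# 'a' (most frequent) gets a shorter code than 'c' (least frequent)
```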
Basic Structure of Audio Encoder
Note: A decoder works in just the opposite manner
Limit values to audible tones
Processes of an Audio Encoder
Mapping Block – divides audio inputs into 32 equal-width frequency subbands (samples)
Psychoacoustic Block – calculates masking threshold for each subband
Bit-Allocation Block (in Quantizer block) – allocates bits using outputs of the Mapping and Psychoacoustic blocks
Quantizer & Coding Block – scales and quantizes (reduces) the samples
Frame Packing Block – formats the samples with headers into an encoded stream
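The blocks above can be sketched as a toy pipeline. Every function body here is a placeholder (a real MPEG audio encoder uses a 32-band polyphase filterbank and standardized psychoacoustic models), but the data flow between the blocks matches the description:

```python
def mapping_block(samples, n_subbands=32):
    """Split the input into n_subbands groups (stand-in for the filterbank)."""
    size = max(1, len(samples) // n_subbands)
    return [samples[i:i + size] for i in range(0, len(samples), size)][:n_subbands]

def psychoacoustic_block(subbands):
    """Fake masking threshold per subband (a real model uses hearing data)."""
    return [0.1 for _ in subbands]

def bit_allocation(subbands, thresholds):
    """Give more bits to subbands whose energy exceeds the masking threshold."""
    bits = []
    for band, thr in zip(subbands, thresholds):
        energy = sum(x * x for x in band) / max(1, len(band))
        bits.append(8 if energy > thr else 2)
    return bits

def quantize_block(subbands, bits):
    """Round each sample to the grid its bit budget allows."""
    out = []
    for band, b in zip(subbands, bits):
        levels = 2 ** b
        out.append([round(x * levels) / levels for x in band])
    return out

pcm = [0.9, -0.8, 0.7, -0.6] + [0.01] * 124   # pretend PCM input: one loud band
bands = mapping_block(pcm)
thresholds = psychoacoustic_block(bands)
bits = bit_allocation(bands, thresholds)
frames = quantize_block(bands, bits)          # frame packing (headers) omitted
```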
Processes of an Audio Encoder
Video Encoding, Standards
MPEG Organization
• Moving Picture Experts Group
• Established in 1988
• Standards under the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC)
• Official name: ISO/IEC JTC1 SC29 WG11
• Responsible for MPEG standards
Evolution of MPEG
MPEG-1
– Initial audio/video compression standard
– Used by VCDs – 1990s
– MP3 = MPEG-1 Audio Layer 3
– Target of 1.5 Mb/s bitrate at 352x240 resolution
– Only supports progressive pictures, no interlaced pictures
Evolution of MPEG
MPEG-2
– Standard still widely used in DVD and digital TV
– Support in current hardware implies that it will be here for a long time
• The transition to HDTV has taken over 10 years and is not finished yet
– Different profiles and levels allow for quality control
Evolution of MPEG
MPEG-3
– Originally developed for HDTV, but abandoned when MPEG-2 was determined to be sufficient
MPEG-4
– Includes support for AV "objects", 3D content, low-bitrate encoding, and DRM
– In practice, provides equal quality to MPEG-2 at a lower bitrate
– MPEG-4 Part 10 is H.264, which is used in HD-DVD and Blu-ray
– H.264 is the encoding widely used in streaming video
MPEG-2 technical specification
Part 1 - Systems - describes synchronization and multiplexing of video and audio
Part 2 - Video - compression codec for interlaced and non-interlaced video signals
Part 3 - Audio - compression codec for perceptual coding of audio signals; a multichannel-enabled extension of MPEG-1 audio
Part 4 - Describes procedures for testing compliance
Part 5 - Describes systems for software simulation
Part 6 - Describes extensions for DSM-CC (Digital Storage Media Command and Control)
Part 7 - Advanced Audio Coding (AAC)
Part 8 - Deleted
Part 9 - Extension for real-time interfaces
Part 10 - Conformance extensions for DSM-CC
MPEG Video spatial domain processing
Spatial domain handled similarly to JPEG
– Convert RGB values to YUV colorspace
• One brightness component and two color representations
• RGB is used in computer graphics; YUV comes from television
• Y represents luminosity; U and V represent color
• Can represent YUV with fewer bits, since the human eye can't tell if color is missing
• We care more about brightness
– Split frame into 8x8 blocks
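The RGB-to-YUV conversion can be illustrated with the classic BT.601 weights (exact coefficients vary between standards; these are the analog-TV ones):

```python
def rgb_to_yuv(r, g, b):
    """BT.601-style conversion: Y is luminance, U and V carry color."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # eye is most sensitive to green
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

# Pure white has full luminance and zero color difference
y, u, v = rgb_to_yuv(1.0, 1.0, 1.0)
print(y, u, v)  # prints values essentially equal to 1, 0, 0
```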
8 x 8 Blocks
MPEG Video spatial domain processing
2-D Discrete Cosine Transform (DCT) on each block
• Similar to a Fourier Transform in signal processing
• Transforms blocks into higher-frequency and lower-frequency values
• Concentrates the most visually significant, low-frequency values in the upper-left corner of the 8x8 block
• For a typical image, most of the visually significant information is concentrated in just a few DCT coefficients
– Quantization of DCT coefficients
• Values that are near zero are converted to zero
• Smaller values are shrunk
• All are represented by integers
A quantization matrix divides each coefficient by a number. The quantization matrix is pre-calculated and defined by the JPEG standard, and it favors the items in the top-left corner of the matrix – the more visually significant, low-frequency terms. Each coefficient has a different weighting.
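A direct (naive) 2-D DCT plus quantization shows both effects described above: energy concentrates in the upper-left coefficient, and dividing by the quantization matrix turns the small coefficients into zeros. The uniform divisor used here is a toy stand-in for JPEG's position-dependent table.

```python
import math

def dct_2d(block):
    """Naive 2-D DCT-II on an 8x8 block (O(N^4); real codecs use fast DCTs)."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for x in range(N) for y in range(N))
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            out[u][v] = cu * cv * s
    return out

# A flat block: after the DCT, all energy sits in the DC coefficient [0][0]
flat = [[100.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat)

# Quantize: divide by a (toy, uniform) quantization matrix and round
quant = [[16] * 8 for _ in range(8)]
q = [[round(coeffs[u][v] / quant[u][v]) for v in range(8)] for u in range(8)]
# q[0][0] is 50 (= 800 / 16); every other entry rounds to 0
```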
Run-length Encoding
The regular JPEG standard run-length encodes the strings of zero coefficients and then applies Huffman coding to the result
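Between quantization and Huffman coding, the 8x8 block is scanned in zig-zag order so the many trailing zeros cluster into long runs; a sketch of that scan and the run-length step:

```python
def zigzag_order(n=8):
    """Index pairs in zig-zag order, low-frequency positions first."""
    return sorted(((x, y) for x in range(n) for y in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length(values):
    """Collapse runs of zeros into (zero_run, value) pairs."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    out.append((run, 0))  # end-of-block style marker
    return out

# Typical quantized block: a few nonzero low-frequency coefficients
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 50, -3, 2
scanned = [block[x][y] for x, y in zigzag_order()]
print(run_length(scanned))  # [(0, 50), (0, -3), (0, 2), (61, 0)]
```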
DCT Transform on Blocks
• Final result … a reduction in the number of bits
– Decompression is the reverse process
– However, because of the lossy steps, we can't quite get back to the original image – there is a loss of information
Nice Examples using Discrete Cosine Transform http://www.dspguide.com/ch27/6.htm
http://datagenetics.com/blog/november32012/index.html
MPEG video time domain processing
Totally new ballgame (this concept doesn’t exist in JPEG)
General idea – Use motion vectors to specify how a 16x16 macroblock translates between the reference frame and the current frame, then code the difference between the reference and the actual block
MPEG video time domain processing
GOP (Group of Pictures)
• GOP is a set of consecutive frames that can be decoded without any other reference frames
• Usually 12 or 15 frames • Starts with I frame
MPEG video time domain processing
Group of Pictures (GOP)
• I-frames
– Can be reconstructed without any reference to other frames, like still pictures
• P-frames
– Forward predicted from the last I-frame or P-frame; code differences such as movement
– Typically two to four frames apart
• B-frames
– Forward and backward predicted
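The frame types above are typically arranged in a short repeating pattern; a sketch (the GOP length and P-frame spacing are encoder settings, not fixed by the standard):

```python
def gop_pattern(length=12, p_spacing=3):
    """Build a GOP like IBBPBBPBB...: an I-frame first, a P-frame every
    p_spacing frames, and B-frames in between."""
    frames = ["I"]
    for i in range(1, length):
        frames.append("P" if i % p_spacing == 0 else "B")
    return "".join(frames)

print(gop_pattern())      # IBBPBBPBBPBB
print(gop_pattern(15))    # IBBPBBPBBPBBPBB
```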
MPEG Processing GOP
MPEG GOP
Final Comments on Prediction
• Only use a motion vector if a "close" match can be found
– Evaluate "closeness" with Mean Squared Error or another metric
– Can't search all possible blocks, so need a smart algorithm
– If no suitable match is found, just code the macroblock as an I-block
– If a scene change is detected, start fresh
• Don't want too many P or B frames in a row
– Predictive error will keep propagating until the next I-frame
– Delay in decoding
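The search described above can be sketched with a Mean Squared Error metric. The frame representation (a dict of candidate positions to flattened 2x2 blocks) and the threshold are illustrative choices, not from any real encoder:

```python
def mse(a, b):
    """Mean squared error between two equal-sized blocks (flattened lists)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def best_match(ref_frame, block, positions, threshold=100.0):
    """Search candidate positions in the reference frame for the closest block.
    Returns (position, error), or None if nothing beats the threshold --
    in that case the encoder would fall back to coding an I-block."""
    best = None
    for pos in positions:
        err = mse(ref_frame[pos], block)
        if best is None or err < best[1]:
            best = (pos, err)
    return best if best and best[1] <= threshold else None

# Toy reference frame: two candidate 2x2 blocks keyed by position
ref = {(0, 0): [10, 10, 10, 10], (0, 2): [50, 52, 49, 51]}
current = [50, 50, 50, 50]
match = best_match(ref, current, ref.keys())
# match == ((0, 2), 1.5) -- the block at (0, 2) is the close one
```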
MPEG Usefulness
Multimedia Communications
Webcasting
Broadcasting
Video on Demand
Interactive Digital Media
Telecommunications
Mobile Communications
References
Overviews of Codecs and Container Formats
http://www.divxland.org/en/article/15/multimedia_container_formats
http://www.pcworld.com/article/213612/all_about_video_codecs_and_containers.html?page=2
Ripping CDs and Encoding Audio
http://www.blog.gartonhill.com/ripping-your-cd-collection-part-1/
http://www.blog.gartonhill.com/ripping-your-cd-collection-part-2-building-your-library/
MP3 Audio
http://oreilly.com/catalog/mp3/chapter/ch02.html
Audio Streaming
http://oreilly.com/catalog/sound/chapter/ch05.html
Summary
Video and audio have become a huge part of our daily interaction with the Internet
New codecs and file formats are being proposed all the time
The number of devices with different needs is driving the push for more efficient ways to compress and deliver streaming media
End
New program is up – last assignment
Notes on the audio encoder blocks:
Mapping – divides the input into 32 subbands, or frequency samples
Psychoacoustic – computes the masking threshold below which noise is imperceptible to the human ear (the Mapping and Psychoacoustic blocks can be done independently)
Bit Allocation – total noise-to-mask ratios can be minimized over all the channels and subbands
Frame Packing – the header includes bit allocation and scaling information (scale factor)
Quantizer & Coding – samples are scaled and quantized according to the bit allocation
Notes on the GOP structure:
The MPEG file consists of compressed video data, called the video stream. The basic unit of the video stream is a "Group of Pictures" (GOP), made up of three picture types, also called frames: I, P, and B.
The I-frames can be reconstructed without any reference to other frames. On average, an I-frame occurs once in every ten to fifteen frames of motion picture. This type of frame contains information only about itself.
P-frames can only be recreated by reference to a previous I-frame or P-frame; it is impossible to construct them without data from another frame.
The B-frames are referred to as bi-directional frames because they can be recreated from forward and backward predictions based on the information in the nearest preceding and following I- or P-frame.