CSCD 443/533 Advanced Networks, Fall 2017
Lecture 18
Compression of Video and Audio
Topics
• Compression technology
• Motivation
• Human attributes make it possible
• Audio Compression
• Video Compression
• Performance
Motivation, Why Compress?
Why do we need to compress streaming media? Look at one instance:
– 640 x 480 pixel frames
– 24 bits color/pixel
– 30 frames/sec
– With no compression, it takes over 200 Mbps to transmit just the video
– Do you have a 200 Mbps link? We need massive compression to be able to view streaming video and audio on our current networks
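The 200+ Mbps claim follows directly from the numbers above; a quick back-of-the-envelope check:

```python
# Uncompressed bitrate for 640x480, 24-bit color, 30 frames/sec
width, height = 640, 480
bits_per_pixel = 24
frames_per_sec = 30

bits_per_sec = width * height * bits_per_pixel * frames_per_sec
mbps = bits_per_sec / 1_000_000
print(f"{mbps:.1f} Mbps")  # 221.2 Mbps
```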
Motivation, Why Compress?
What does compression buy us?
Lossless DVD video – 221 Mbps
Compressed DVD video – 4 Mbps
Roughly a 50:1 compression ratio!
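The ratio is easy to verify (221/4 is about 55:1, which the slide rounds down to 50:1):

```python
lossless_mbps = 221   # raw DVD-quality video
compressed_mbps = 4   # typical compressed DVD bitrate
ratio = lossless_mbps / compressed_mbps
print(f"{ratio:.0f}:1")  # 55:1
```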
Why Compress?
In a nutshell:
– To reduce the file size
– To deliver the stream to the user
– To conserve storage space
Choosing a compression rate is a balance: quality of the media vs. available bandwidth
So, Why Compress?
Delivering video over the Web means compromises, mostly trading image quality for lower bit rates
In general, video and audio are compressed, stuffed into a container, and delivered to you via the web
If done well, you won't notice the missing bits or the delivery of the media
We will discuss individual formats, codecs, and tradeoffs
Definitions
• File Format
– A particular way information is stored in a file
– Known as "containers" for streaming media
• Codec
– Codec is an acronym for Compression/Decompression
– A codec is any technology for compressing and decompressing data
• Compression
– Reduces file size by removing audio or video information
– Takes advantage of human perception
Format vs. Codec
• Example
– Flash Video (FLV) is a file format
– H.264, On2 VP6, and Sorenson Spark are codecs for the Flash video file
Container File Formats
Purpose of container formats:
Function as "black boxes" for holding a variety of media formats
Good container formats can handle files compressed with a variety of different codecs
In a perfect world, you could put any codec in any container format … unfortunately, there are some incompatibilities
Examples: MPEG-2, Advanced Systems Format (ASF) from Microsoft, AVI, QuickTime (MOV), MP4, Flash (FLV), RealMedia
Multimedia Container Files
Multimedia file extensions: .mov, .ogg, .wmv, .flv, .mp4, .mpeg
Essentially, videos are packaged into encapsulation containers, or wrapper formats, that contain all the information needed to present the video
You can think of file formats as being containers that hold all this information
Very similar to a .zip, .sit or .rar file
Differences in Containers
Why are certain formats popular?
Popular support
– How widely supported is the format?
File size
– Larger is not better for streaming files
Support for advanced codec functionality
– Older formats such as AVI do not support new codec features like B-frames or VBR audio
Support for advanced content
– Such as chapters, subtitles, meta-tags, user-data
Compression
Two types:
– Lossless – keeps all bits
– Lossy – removes bits
Lossy Compression
• Lossy compression schemes reduce file size by discarding some amount of data during encoding, before it is sent over the Internet
• Once received by the client, the codec attempts to reconstruct the information that was lost or discarded
Video Lossy Compression
• Image Compression
– The image format uses lossy compression to sample an image and discard unnecessary color/contrast information
– Can you really see the difference?
Video Lossy Compression
• Why can you do lossy compression?
• Spatial and temporal redundancy
– Pixel values are not independent; they are correlated with their neighbors, both within the same frame and across frames
• The value of a pixel is predictable given the values of neighboring pixels
• Psychovisual redundancy
– The human eye has limited response to fine spatial detail
• Less sensitive to detail near object edges or around shot-changes
• Impairments introduced by bit-rate reduction should not be visible to the human viewer
Audio Lossy Compression
• Audio compression
– Lossy compression discards frequencies at the high and low ends of the spectrum and attempts to locate and remove unnecessary audio data
Nice description and example programs: http://www.videograbber.net/compress-audio-file.html
Audio Streaming Formats
• Many formats and standards for streaming audio
– RealNetworks' RealAudio, streaming MP3, Macromedia's Flash and Director Shockwave, Microsoft's Windows Media, and Apple's QuickTime
– Also recognized standard formats, including Liquid Audio, MP3, MIDI, WAV, and AU
Audio Lossy Compression
• First, the player decompresses the audio file as it downloads to your computer
• Then it fills in missing information according to the instructions set by the codec
– The compressed file is unintelligible to the listener
– The decompressed file is intelligible, but of lower quality than the original
MP3 Audio Lossy Compression
• Example - MP3
• The MP3 lossy audio data compression algorithm takes advantage of a perceptual limitation of human hearing
– Auditory Masking
• Discovered in the late 1800's: a tone can be rendered inaudible by another tone of lower frequency
• How your brain perceives similar sounds
MP3 Audio Lossy Compression
• Uncompressed audio
– Like CDs, stores more data than your brain can actually process
– For example:
• If two notes are very similar and very close together, your brain may perceive only one of them
• If two sounds are different and one is much louder than the other, your brain may never perceive the quieter signal
MP3 Audio Lossy Compression
• The study of these auditory phenomena is called psychoacoustics
– The phenomena can be accurately described in tables and charts
– Mathematical models represent human hearing patterns
– These can be stored in the codec as reference tables
Article on psychoacoustics http://www.uaudio.com/blog/how-the-ear-works/
MP3 Audio Lossy Compression
• MP3 Encoding Tools
– Analyze the incoming source signal
– Break it down into mathematical patterns, and
– Compare these patterns to psychoacoustic models stored in the encoder itself
• The encoder can then discard most of the data that doesn't match the stored models, keeping the data that does
• This shrinks the file by discarding a great deal of extra data
MP3 Audio Lossy Compression
• MP3 encoding process … a two-pass system
• Step 1
– Run all psychoacoustic models, discarding data
– Then compress what's left to shrink storage space
• Step 2
– Huffman coding, which does not discard any data
– Lets you store what's left in a smaller amount of space
– Uses fewer bits to store the most common symbols
• Step 2a
– Break the resulting audio stream into frames assembled into a bitstream, with header information preceding each data frame
– Headers contain "meta-data" specific to that frame
– Such as an ID, bitrate, audio frequency, padding, type of frame, MPEG-1 or 2
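Step 2's Huffman coding can be sketched in a few lines. This is a generic, minimal Huffman builder for illustration only; a real MP3 encoder uses fixed code tables defined by the standard rather than building trees on the fly.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code table: common symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # heap entries: (frequency, unique tiebreak, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        count += 1
        heapq.heappush(heap, (f1 + f2, count, merged))
    return heap[0][2]

codes = huffman_codes("aaaabbc")
# 'a' (most frequent) gets a shorter code than 'c' (least frequent)
```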
Basic Structure of Audio Encoder
Note: A decoder works in just the opposite manner
Limit values to audible tones
Processes of an Audio Encoder
Mapping Block – divides audio inputs into 32 equal-width frequency subbands (samples)
Psychoacoustic Block – calculates masking threshold for each subband
Bit-Allocation Block (in Quantizer block) – allocates bits using outputs of the Mapping and Psychoacoustic blocks
Quantizer & Coding Block – scales and quantizes (reduces) the samples
Frame Packing Block – formats the samples with headers into an encoded stream
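The blocks above can be sketched as a toy pipeline. Every function body here is a placeholder (a real MPEG audio encoder uses a 32-band polyphase filterbank and standardized psychoacoustic models), but the data flow between the blocks matches the description:

```python
def mapping_block(samples, n_subbands=32):
    """Split the input into n_subbands groups (stand-in for the filterbank)."""
    size = max(1, len(samples) // n_subbands)
    return [samples[i:i + size] for i in range(0, len(samples), size)][:n_subbands]

def psychoacoustic_block(subbands):
    """Fake masking threshold per subband (a real model uses hearing data)."""
    return [0.1 for _ in subbands]

def bit_allocation(subbands, thresholds):
    """Give more bits to subbands whose energy exceeds the masking threshold."""
    bits = []
    for band, thr in zip(subbands, thresholds):
        energy = sum(x * x for x in band) / max(1, len(band))
        bits.append(8 if energy > thr else 2)
    return bits

def quantize_block(subbands, bits):
    """Round each sample to the grid its bit budget allows."""
    out = []
    for band, b in zip(subbands, bits):
        levels = 2 ** b
        out.append([round(x * levels) / levels for x in band])
    return out

pcm = [0.9, -0.8, 0.7, -0.6] + [0.01] * 124   # pretend PCM input: one loud band
bands = mapping_block(pcm)
thresholds = psychoacoustic_block(bands)
bits = bit_allocation(bands, thresholds)
frames = quantize_block(bands, bits)          # frame packing (headers) omitted
```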
Processes of an Audio Encoder
Video Encoding, Standards
MPEG Organization
• Moving Picture Experts Group
• Established in 1988
• Standards under the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC)
• Official name: ISO/IEC JTC1 SC29 WG11
• Responsible for MPEG standards
Evolution of MPEG
MPEG-1
– Initial audio/video compression standard
– Used by VCDs – 1990s
– MP3 = MPEG-1 Audio Layer 3
– Target of 1.5 Mb/s bitrate at 352x240 resolution
– Only supports progressive pictures, no interlaced pictures
Evolution of MPEG
MPEG-2
– Standard still widely used in DVD and digital TV
– Support in current hardware implies that it will be here for a long time
• The transition to HDTV has taken over 10 years and is not finished yet
– Different profiles and levels allow for quality control
Evolution of MPEG
MPEG-3
– Originally developed for HDTV, but abandoned when MPEG-2 was determined to be sufficient
MPEG-4
– Includes support for AV "objects", 3D content, low-bitrate encoding, and DRM
– In practice, provides equal quality to MPEG-2 at a lower bitrate
– MPEG-4 Part 10 is H.264, which is used in HD-DVD and Blu-ray
– H.264 is the encoding widely used in streaming video
MPEG-2 technical specification
Part 1 - Systems - describes synchronization and multiplexing of video and audio
Part 2 - Video - compression codec for interlaced and non-interlaced video signals
Part 3 - Audio - compression codec for perceptual coding of audio signals; a multichannel-enabled extension of MPEG-1 audio
Part 4 - Describes procedures for testing compliance
Part 5 - Describes systems for software simulation
Part 6 - Describes extensions for DSM-CC (Digital Storage Media Command and Control)
Part 7 - Advanced Audio Coding (AAC)
Part 8 - Deleted
Part 9 - Extension for real-time interfaces
Part 10 - Conformance extensions for DSM-CC
MPEG Video spatial domain processing
Spatial domain handled similarly to JPEG
– Convert RGB values to YUV colorspace
• One brightness component and two color representations
• RGB is used in computer graphics; YUV comes from television
• Y represents luminosity; U and V represent color
• Can represent YUV with fewer bits, since the human eye can't tell if color is missing
• We care more about brightness
– Split frame into 8x8 blocks
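The RGB-to-YUV conversion can be illustrated with the classic BT.601 weights (exact coefficients vary between standards; these are the analog-TV ones):

```python
def rgb_to_yuv(r, g, b):
    """BT.601-style conversion: Y is luminance, U and V carry color."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # eye is most sensitive to green
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

# Pure white has full luminance and zero color difference
y, u, v = rgb_to_yuv(1.0, 1.0, 1.0)
print(y, u, v)  # prints values essentially equal to 1, 0, 0
```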
8 x 8 Blocks
MPEG Video spatial domain processing
2-D Discrete Cosine Transform (DCT) on each block
• Similar to a Fourier Transform in signal processing
• Transforms blocks into higher-frequency and lower-frequency values
• Concentrates the most visually significant, low-frequency values in the upper-left corner of the 8x8 block
• For a typical image, most of the visually significant information is concentrated in just a few DCT coefficients
– Quantization of DCT coefficients
• Values that are near zero are converted to zero
• Smaller values are shrunk
• All are represented by integers
A quantization matrix divides each coefficient by a number. The quantization matrix is pre-calculated and defined by the JPEG standard, and it favors the items in the top-left corner of the matrix – the more visually significant, low-frequency terms. Each coefficient has a different weighting.
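A direct (naive) 2-D DCT plus quantization shows both effects described above: energy concentrates in the upper-left coefficient, and dividing by the quantization matrix turns the small coefficients into zeros. The uniform divisor used here is a toy stand-in for JPEG's position-dependent table.

```python
import math

def dct_2d(block):
    """Naive 2-D DCT-II on an 8x8 block (O(N^4); real codecs use fast DCTs)."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for x in range(N) for y in range(N))
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            out[u][v] = cu * cv * s
    return out

# A flat block: after the DCT, all energy sits in the DC coefficient [0][0]
flat = [[100.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat)

# Quantize: divide by a (toy, uniform) quantization matrix and round
quant = [[16] * 8 for _ in range(8)]
q = [[round(coeffs[u][v] / quant[u][v]) for v in range(8)] for u in range(8)]
# q[0][0] is 50 (= 800 / 16); every other entry rounds to 0
```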
Run-length Encoding
The regular JPEG standard run-length encodes the strings of zero coefficients and then applies Huffman coding to the result
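Between quantization and Huffman coding, the 8x8 block is scanned in zig-zag order so the many trailing zeros cluster into long runs; a sketch of that scan and the run-length step:

```python
def zigzag_order(n=8):
    """Index pairs in zig-zag order, low-frequency positions first."""
    return sorted(((x, y) for x in range(n) for y in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length(values):
    """Collapse runs of zeros into (zero_run, value) pairs."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    out.append((run, 0))  # end-of-block style marker
    return out

# Typical quantized block: a few nonzero low-frequency coefficients
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 50, -3, 2
scanned = [block[x][y] for x, y in zigzag_order()]
print(run_length(scanned))  # [(0, 50), (0, -3), (0, 2), (61, 0)]
```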
DCT Transform on Blocks
• Final result … a reduction in the number of bits
– Decompression is the reverse process
– However, because of the lossy steps, we can't quite get back to the original image – there is a loss of information
Nice Examples using Discrete Cosine Transform http://www.dspguide.com/ch27/6.htm
http://datagenetics.com/blog/november32012/index.html
MPEG video time domain processing
Totally new ballgame (this concept doesn’t exist in JPEG)
General idea – Use motion vectors to specify how a 16x16 macroblock translates between the reference frame and the current frame, then code the difference between the reference and the actual block
MPEG video time domain processing
GOP (Group of Pictures)
• GOP is a set of consecutive frames that can be decoded without any other reference frames
• Usually 12 or 15 frames • Starts with I frame
MPEG video time domain processing
Group of Pictures (GOP)
• I-frames
– Can be reconstructed without any reference to other frames, like still pictures
• P-frames
– Forward predicted from the last I-frame or P-frame; code differences such as movement
– Typically two to four frames apart
• B-frames
– Forward and backward predicted
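The frame types above are typically arranged in a short repeating pattern; a sketch (the GOP length and P-frame spacing are encoder settings, not fixed by the standard):

```python
def gop_pattern(length=12, p_spacing=3):
    """Build a GOP like IBBPBBPBB...: an I-frame first, a P-frame every
    p_spacing frames, and B-frames in between."""
    frames = ["I"]
    for i in range(1, length):
        frames.append("P" if i % p_spacing == 0 else "B")
    return "".join(frames)

print(gop_pattern())      # IBBPBBPBBPBB
print(gop_pattern(15))    # IBBPBBPBBPBBPBB
```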
MPEG Processing GOP
MPEG GOP
Final Comments on Prediction
• Only use a motion vector if a "close" match can be found
– Evaluate "closeness" with Mean Squared Error or another metric
– Can't search all possible blocks, so need a smart algorithm
– If no suitable match is found, just code the macroblock as an I-block
– If a scene change is detected, start fresh
• Don't want too many P or B frames in a row
– Predictive error will keep propagating until the next I-frame
– Delay in decoding
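The search described above can be sketched with a Mean Squared Error metric. The frame representation (a dict of candidate positions to flattened 2x2 blocks) and the threshold are illustrative choices, not from any real encoder:

```python
def mse(a, b):
    """Mean squared error between two equal-sized blocks (flattened lists)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def best_match(ref_frame, block, positions, threshold=100.0):
    """Search candidate positions in the reference frame for the closest block.
    Returns (position, error), or None if nothing beats the threshold --
    in that case the encoder would fall back to coding an I-block."""
    best = None
    for pos in positions:
        err = mse(ref_frame[pos], block)
        if best is None or err < best[1]:
            best = (pos, err)
    return best if best and best[1] <= threshold else None

# Toy reference frame: two candidate 2x2 blocks keyed by position
ref = {(0, 0): [10, 10, 10, 10], (0, 2): [50, 52, 49, 51]}
current = [50, 50, 50, 50]
match = best_match(ref, current, ref.keys())
# match == ((0, 2), 1.5) -- the block at (0, 2) is the close one
```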
MPEG Usefulness
Multimedia Communications
Webcasting
Broadcasting
Video on Demand
Interactive Digital Media
Telecommunications
Mobile Communications
References
Overviews of Codecs and Container Formats
http://www.divxland.org/en/article/15/multimedia_container_formats
http://www.pcworld.com/article/213612/all_about_video_codecs_and_containers.html?page=2
Ripping CDs and Encoding Audio
http://www.blog.gartonhill.com/ripping-your-cd-collection-part-1/
http://www.blog.gartonhill.com/ripping-your-cd-collection-part-2-building-your-library/
MP3 Audio
http://oreilly.com/catalog/mp3/chapter/ch02.html
Audio Streaming
http://oreilly.com/catalog/sound/chapter/ch05.html
Summary
Video and audio have become a huge part of our daily interaction with the Internet
New codecs and file formats are being proposed all the time
The number of devices with different needs is driving the push for more efficient ways to compress and deliver streaming media
End
New program is up – last assignment
Notes on the audio encoder blocks:
Mapping – divides the input into 32 subbands, or frequency samples
Psychoacoustic – computes the masking threshold below which noise is imperceptible to the human ear (the Mapping and Psychoacoustic blocks can be done independently)
Bit Allocation – total noise-to-mask ratios can be minimized over all the channels and subbands
Frame Packing – the header includes bit allocation and scaling information (scale factor)
Quantizer & Coding – samples are scaled and quantized according to the bit allocation
Notes on the GOP structure:
The MPEG file consists of compressed video data, called the video stream. The basic unit of the video stream is a "Group of Pictures" (GOP), made up of three picture types, also called frames: I, P, and B.
The I-frames can be reconstructed without any reference to other frames. On average, an I-frame occurs once in every ten to fifteen frames of motion picture. This type of frame contains information only about itself.
P-frames can only be recreated by reference to a previous I-frame or P-frame; it is impossible to construct them without data from another frame.
The B-frames are referred to as bi-directional frames because they can be recreated from forward and backward predictions based on the information in the nearest preceding and following I- or P-frame.