Image, video and audio coding concepts
Stefan Alfredsson
(based on material by Johan Garcia)
Roadmap
• XML – Data structuring
• Lossless compression (Huffman, LZ77, ...)
• Lossy compression
Rationale
• Compression is about removing redundant information, and decompression is about restoring the "original" state
• For lossless compression, decompressed data is identical to the original data
• But higher compression can be achieved by allowing a quality tradeoff!
– For example: removing all vowels from a text
– Fr xmpl: rmvng ll vwls frm txt
• Obviously not suitable for all kinds of data
– But image/video/audio are good candidates
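The vowel-removal idea above can be sketched in a few lines of Python (the function name is just for illustration):

```python
def drop_vowels(text: str) -> str:
    """Lossy 'compression': vowels are discarded and cannot be restored."""
    return "".join(ch for ch in text if ch.lower() not in "aeiou")

print(drop_vowels("removing all vowels from a text"))
```

The decompressed (here: as-is) output is shorter but the original can never be recovered, which is exactly the lossy tradeoff.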
The case for image compression
• The human is ”impatient and half blind”
• Impatient
– Frustrated by waiting in front of the screen
– (i.e. need compression for faster transfer)
• Half blind
– Human vision does not catch all information
The seeing sense
(image from www.glaucoma.org/learn/eye-anatomy.gif )
• Color vision worse than BW
• Limited resolution
• Different sensitivity to different patterns
• Sharp contrast "hides" weaker contrasts
Compression example
Original Compressed
73 kbyte 11 kbyte
> 6 times faster download!
Compression
• What's the difference?
Common Image Formats
• GIF (Graphics Interchange Format)
– "Lossless", but only in 256 colors
– Uses LZW for compression (patent problems)
• PNG (Portable Network Graphics)
– More flexible replacement for GIF
• JPEG (Joint Photographic Experts Group)
– Complex standard with many modes
– Typically lossy, but has a lossless mode as well
JPEG
• The best compression for photo-like natural images
• Color or grayscale images
• Lossy or lossless compression
• Sequential or progressive display
• Exploits human visual system characteristics
JPEG functional overview
• Color conversion and downsample
• Split in 8x8 pixel blocks
• Do DCT (Discrete Cosine Transformation)
• Quantize
• Difference encode DC and runlength encode AC
• Huffman encode
• Packetize
Color conversion, color split
Split into three components (Y Cb Cr color format)
RGB pixel image =>JPEG
• Image normally represented in RGB format (8 bits per color channel per pixel indicate pixel colors)
• Color split: RGB is converted into the Y Cb Cr color format
• Y – Luminance
• Cb – Chrominance (blue)
• Cr – Chrominance (red)
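As a sketch, the RGB to Y Cb Cr conversion can be written per pixel as follows (coefficients are the common JFIF convention):

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple:
    """Convert one RGB pixel (0-255 per channel) to Y Cb Cr (JFIF convention)."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b          # luminance
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128    # blue chrominance
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128    # red chrominance
    return y, cb, cr
```

For a white pixel (255, 255, 255) this gives Y = 255 and both chroma values at the neutral midpoint 128.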
Color downsampling
Downsample chrominance components
RGB pixel image =>JPEG
• Color split and downsample
• The eye is less sensitive to chrominance (color details) than to luminance (brightness details)
• ==> Downsample the color components
• Downsampling the chroma components saves 33% or 50% of the space taken by the image
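A minimal sketch of 4:2:0-style chroma downsampling, averaging each 2x2 block of a chroma plane (the function name is illustrative):

```python
def downsample_420(chroma):
    """Average each 2x2 block of a chroma plane -> quarter the samples (4:2:0)."""
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1]
              + chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

Quartering both chroma planes leaves 1 + 2·(1/4) = 1.5 of the original 3 planes (a 50% saving); halving them instead (4:2:2) leaves 2 of 3 planes (a 33% saving).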
Splitting into blocks
Split into 8x8 pixel blocks
RGB pixel image =>JPEG
• Color split and downsample
• Split in blocks
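The block split can be sketched as below (assuming plane dimensions divisible by 8):

```python
def split_blocks(plane, n=8):
    """Split a 2D sample array into a flat list of n x n blocks."""
    h, w = len(plane), len(plane[0])
    return [[row[x:x + n] for row in plane[y:y + n]]
            for y in range(0, h, n)
            for x in range(0, w, n)]
```

A 16x16 plane, for example, yields four 8x8 blocks.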
DCT transform
Perform DCT
RGB pixel image =>JPEG
• Color split and downsample
• Split in blocks and do DCT
• The DCT indicates the amount of intensity variation (spatial frequency)
Why DCT? Spatial locality!
Image Frequencies
• An 8x8 pixel block is transformed into 64 spatial frequencies (coefficients)
• Sort of a "2-dimensional Fourier transform"
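A naive, unoptimized sketch of the 8x8 2D DCT-II; real encoders level-shift the samples by -128 first and use fast factorized algorithms:

```python
import math

def dct_8x8(block):
    """Naive 2D DCT-II of an 8x8 block; out[0][0] is the DC coefficient."""
    n = 8
    c = lambda k: math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)  # normalization
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical spatial frequency
        for v in range(n):      # horizontal spatial frequency
            s = sum(block[y][x]
                    * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                    for y in range(n) for x in range(n))
            out[u][v] = c(u) * c(v) * s
    return out
```

A completely flat block produces a single nonzero DC coefficient and 63 zero AC coefficients, which is why smooth image areas compress so well.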
Summing of coefficients
Original image
• By using all 64 coefficients a perfect reconstruction is possible
• Using fewer coefficients gives a good reconstruction and data reduction
Quantization
• The quantization reduces the resolution (i.e. the amount of information) of the coefficients
• The number of zero coefficients is increased
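Quantization is an element-wise division by a quantization table followed by rounding; the table values below are illustrative, not JPEG's standard tables:

```python
def quantize(coeffs, qtable):
    """Divide each DCT coefficient by its quantization step and round."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qtable)]

# Toy 2x2 example: large steps turn small coefficients into zeros.
print(quantize([[150, 30], [20, 5]], [[16, 11], [12, 12]]))   # → [[9, 3], [2, 0]]
```

Larger steps for higher frequencies are what push most AC coefficients to zero.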
Encoding DCT - DC
Differential encode DC coefficient
RGB pixel image =>JPEG
• Color split and downsample
• Split in blocks and do DCT
• Quantize
• Diff code DC
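Differential coding of the DC coefficients across blocks can be sketched as:

```python
def diff_encode_dc(dc_values):
    """Replace each block's DC coefficient with its difference from the previous one."""
    prev, out = 0, []
    for dc in dc_values:
        out.append(dc - prev)   # small differences compress better than raw values
        prev = dc
    return out

print(diff_encode_dc([100, 103, 101]))   # → [100, 3, -2]
```

Neighboring blocks usually have similar average brightness, so the differences are small and cheap to Huffman-code.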
Encoding DCT - AC
RGB pixel image =>JPEG
• Color split and downsample
• Split in blocks and do DCT
• Quantize
• Diff code DC and RLE AC
Run Length Encode AC coefficients
Zig-zag order
• The quantized coefficients are coded in a zig-zag manner
• This leads to longer zero run-lengths for the last coefficients.
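A sketch of zig-zag scanning plus run-length encoding of the AC coefficients, emitting (run, value) pairs with (0, 0) as an end-of-block marker:

```python
def zigzag(block, n=8):
    """Read an n x n block in JPEG zig-zag order (anti-diagonal by anti-diagonal)."""
    order = sorted(((y, x) for y in range(n) for x in range(n)),
                   key=lambda p: (p[0] + p[1],                           # which anti-diagonal
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))  # scan direction
    return [block[y][x] for y, x in order]

def rle_ac(zz):
    """Run-length encode the AC coefficients as (zero_run, value) pairs."""
    out, run = [], 0
    for v in zz[1:]:            # zz[0] is the DC coefficient, coded separately
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    out.append((0, 0))          # end-of-block marker
    return out
```

Because the high-frequency coefficients cluster at the end of the zig-zag order and are mostly zero after quantization, a single end-of-block marker replaces a long tail of zeros.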
Packetization
RGB pixel image =>JPEG
• Color split and downsample
• Split in blocks and do DCT
• Quantize
• Diff code DC and RLE AC
• Huffman encode
• Packetize
Headers
One Minimum Coded Unit
Coder - overview
Decoder - overview
Image format recommendations
• Photo-like images – JPEG
• Photo or drawn images – PNG or GIF
– These use lossless compression
• Text – PNG or GIF
– High contrast, will be blurred by JPEG
• Trim JPEG quality
• Avoid browser scaling
– Adjust the file size beforehand to reduce download time
– Easy to upload a 5 Mpix image, but the image may be displayed at 800x600 (0.48 Mpix) in an album!
Size/quality comparisons
(From: http://www.psychology.nottingham.ac.uk/staff/cr1/graphics.html)
Video coding
Video Coding
• Video coding
– MPEG1, originally for 1x CD players ("VCD")
– MPEG2, digital satellite and cable TV
• Used for example in DVB-T for Boxer terrestrial digital TV
• Video coding for conferencing
– H.261, older standard, used with e.g. ISDN
– H.263, newer standard. Works better at low transmission speeds.
• Netmeeting
• Used for most Flash videos (YouTube, Google Video, Myspace, and others)
• Basis of RealVideo / RealPlayer until RealVideo 8
• New technology
– MPEG 4
– (subsets of MPEG4 are used in DivX, XviD, H.264, and other codecs)
Overview video coding
Video coding concepts
• Spatial Redundancy
– Nearby pixels typically have similar values
– Removed by DCT transform/quantization/Huffman encoding (like in JPEG)
• Temporal Redundancy
– Video frames in sequence typically have similar content
– Removed by macro block coding with motion vectors and difference frames
Macro block coding
The motion vector specifies how much and in which direction the macro block has moved. The difference frame specifies the remaining difference within the macro block between frames.
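A brute-force sketch of how an encoder could find a motion vector by exhaustive block matching, minimizing the sum of absolute differences (SAD); the function name and search range are illustrative:

```python
def best_motion_vector(prev, cur, by, bx, n=8, search=4):
    """Find (dy, dx) so the n x n block at (by, bx) in cur best matches prev."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = by + dy, bx + dx
            if not (0 <= y0 and y0 + n <= h and 0 <= x0 and x0 + n <= w):
                continue        # candidate block falls outside the previous frame
            sad = sum(abs(cur[by + i][bx + j] - prev[y0 + i][x0 + j])
                      for i in range(n) for j in range(n))
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best[1], best[2]
```

A real encoder would then compute the difference block between the current block and the motion-compensated previous block, and code that difference like a JPEG block.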
Macro block coding 2
I, P and B frames
Transmission order
MPEG 4
MPEG 4 is object based, where both image and audio objects may be placed in a 3D coordinate system.
These objects can then be coded and manipulated independently of each other.
The viewer may also interact with the scene, change viewing angles, and so on.
Mesh coding
An area in the picture can be modeled with a mesh. The mesh parameters are then changed when the area is changed. The texture is rescaled according to the mesh changes.
Wireframe models of the face and animation of "wireframe faces" with mapped textures are specified in MPEG 4 ver 1. Later versions have included models of the human body and corresponding motion patterns.
Sprite coding
A picture can be composed with a fixed background and a moving sprite. The background can be larger than what is shown on screen, to facilitate camera panorama without transmitting a new background.
(compare to weather forecasts on TV, where the background is really ”blue” and the background / weather map / is added afterwards)
Audio coding
Audio coding
• Use redundancy in the data
– Similarities between the channels in stereo sound
• Use the weaknesses of the hearing sense (psychoacoustics)
– Hearing thresholds
– Frequency bound masking
– Time bound masking
Hearing threshold
• The hearing sensitivity varies for different frequencies
• What cannot be detected does not need transmission
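The frequency-dependent threshold is often approximated with Terhardt's formula; this is a common approximation used in psychoacoustic modeling, not a normative part of any particular standard:

```python
import math

def hearing_threshold_db(f_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula)."""
    k = f_hz / 1000.0           # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)
```

The ear is most sensitive around 3-4 kHz, so the threshold dips there and rises steeply toward very low and very high frequencies.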
Frequency masking
• A strong tone masks (conceals/hides) nearby tones
• The mask width varies with the frequency. Higher = broader
Time masking
• Strong tones also mask in time
• We cannot sense weaker nearby tones right after a strong tone has ended
Masking
• Frequency and time masking can be modeled as a surface
• Tones below the surface cannot be sensed, and their information does not need coding / transmission
MPEG 2 - Audio Layer 3 (MP3)
Algorithm:
1. Separate the audio into 32 subbands with a filter
2. Compute the mask that every band causes
3. If the audio strength in one subband is masked, do not code that subband
4. Determine quantization with regard to masking and bit rate
5. Huffman encode
6. Format output stream
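A deliberately simplified toy version of steps 2-3: this is not the real MP3 psychoacoustic model, just an illustration of dropping masked subbands; the function name, threshold values, and the fixed masking margin are all invented for the example:

```python
def drop_masked_subbands(levels_db, threshold_db, mask_drop_db=20):
    """Keep a subband only if it is above the hearing threshold and not
    buried more than mask_drop_db below the loudest band (None = not coded)."""
    loudest = max(levels_db)
    return [lvl if lvl >= threshold_db[i] and lvl >= loudest - mask_drop_db else None
            for i, lvl in enumerate(levels_db)]

print(drop_masked_subbands([60, 30, 55], [20, 20, 20]))   # → [60, None, 55]
```

The real encoder computes a frequency-dependent masking curve from the signal itself (step 2) rather than using a fixed margin, and uses the masking headroom to steer quantization (step 4).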
Q?