
Video Compression Techniques

Nov 18, 2014

Anirudha Mhase

This document describes different video compression techniques, such as MPEG-1 and MPEG-2.
SIGMA TRAINERS, AHMEDABAD (INDIA) - www.sigmatrainers.com
Since 21 years - more than 1500 trainers

VIDEO COMPRESSING TECHNIQUES
MODEL-VIDEOCOMP100


INTRODUCTION

This trainer includes theory and software used for different types of video compression techniques.

SPECIFICATIONS

1. Manual: includes more than 200 pages discussing different types of video compression techniques.

2. Video compression formats: to compress AVI, MPEG-1, MPEG-2 and WMV, and to compress to MPEG using VCD, SVCD, or DVD.

3. Video compression software:
   1. Blaze Media Pro
   2. Alparysoft Lossless Video Codec
   3. MSU Lossless Video Codec
   4. DivX Player with DivX Pro Codec (98/Me)
   5. Elecard MPEG-2 Decoder & Streaming Pack


VIDEO COMPRESSING TECHNIQUES - MPEG-2 VIDEO COMPRESSION

Video compression refers to reducing the quantity of data used to represent video content without excessively reducing the quality of the picture. It also reduces the number of bits required to store and/or transmit digital media. Compressed video can be transmitted more economically over a smaller carrier. Digital video requires high data rates - the better the picture, the more data is ordinarily needed. This means powerful hardware and lots of bandwidth when video is transmitted. However, much of the data in video is not necessary for achieving good perceptual quality, because it can be easily predicted - for example, successive frames in a movie rarely change much from one to the next - and this makes data compression work well with video. Video compression can make video files far smaller with little perceptible loss in quality. For example, DVDs use a video coding standard called MPEG-2 that makes the movie 15 to 30 times smaller while still producing picture quality that is generally considered high for standard-definition video. Without proper use of data compression techniques, either the picture would look much worse or one would need more disks per movie.

Theory

Video is basically a three-dimensional array of color pixels. Two dimensions serve as the spatial (horizontal and vertical) directions of the moving pictures, and one dimension represents the time domain. A frame is the set of all pixels that correspond to a single point in time; basically, a frame is the same as a still picture. (Frames are sometimes made up of fields; see interlacing below.) Video data contains spatial and temporal redundancy. Similarities can thus be encoded by merely registering differences within a frame (spatial) and/or between frames (temporal). Spatial encoding takes advantage of the fact that the human eye is unable to distinguish small differences in colour as easily as it can changes in brightness, so very similar areas of colour can be "averaged out" in a similar way to JPEG images (JPEG image compression FAQ, part 1/2). With temporal compression, only the changes from one frame to the next are encoded, as often a large number of the pixels will be the same across a series of frames (About video compression).
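The temporal-redundancy idea above can be sketched in a few lines of Python. This is an illustrative toy (real codecs work on blocks and motion vectors, not individual pixels, and the function names here are made up for the example):

```python
# Hypothetical sketch: encode a frame as only the pixels that changed
# relative to the previous frame (exploiting temporal redundancy).
def frame_delta(prev, curr):
    """Return {pixel_index: new_value} for pixels that changed."""
    return {i: c for i, (p, c) in enumerate(zip(prev, curr)) if p != c}

def apply_delta(prev, delta):
    """Reconstruct the current frame from the previous frame plus the delta."""
    return [delta.get(i, p) for i, p in enumerate(prev)]

frame1 = [10, 10, 10, 200, 200, 10, 10, 10]   # toy 1-D "frame"
frame2 = [10, 10, 10, 10, 200, 200, 10, 10]   # object moved one pixel right

delta = frame_delta(frame1, frame2)           # only 2 of 8 pixels differ
assert apply_delta(frame1, delta) == frame2   # lossless reconstruction
```

Storing the two-entry delta instead of the full second frame is exactly the saving temporal compression aims for.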

Lossless compression

Some forms of data compression are lossless. This means that when the data is decompressed, the result is a bit-for-bit perfect match with the original. While lossless compression of video is possible, it is rarely used. This is because any lossless compression system will sometimes produce a file (or portions of it) that is as large and/or has the same data rate as the uncompressed original. As a result, all hardware in a lossless system would have to be able to run fast enough to handle uncompressed video as well. This eliminates much of the benefit of compressing the data in the first place. For example, digital videotape can't vary its data rate easily, so dealing with short bursts of maximum-data-rate video would be more complicated than something that was fixed at the maximum rate all the time.

Intraframe vs interframe compression

One of the most powerful techniques for compressing video is interframe compression. This works by comparing each frame in the video with the previous one. If the frame contains areas where nothing has moved, the system simply issues a short command that copies that part of the previous frame, bit-for-bit, into the next one. If objects move in a simple manner, the compressor emits a (slightly longer) command that tells the decompressor to shift, rotate, lighten, or darken the copy -- a longer command, but still much shorter than intraframe compression. Interframe compression is best for finished programs that will simply be played back by the viewer. Interframe compression can cause problems if it is used for editing.


Since interframe compression copies data from one frame to another, if the original frame is simply cut out (or lost in transmission), the following frames cannot be reconstructed. Some video formats, such as DV, compress each frame independently, as if they were all unrelated still images (using image compression techniques). This is called intraframe compression. Editing intraframe-compressed video is almost as easy as editing uncompressed video -- one finds the beginning and end of each frame, copies bit-for-bit each frame one wants to keep, and discards the frames one doesn't. Another difference between intraframe and interframe compression is that with intraframe systems, each frame uses a similar amount of data. In interframe systems, certain frames called "I-frames" aren't allowed to copy data from other frames, and so require much more data than other frames nearby. (The "I" stands for intra-coded.) It is possible to build a computer-based video editor that spots problems caused when I-frames are edited out while other frames need them. This has allowed newer formats like HDV to be used for editing. However, this process demands a lot more computing power than editing intraframe-compressed video with the same picture quality.

MPEG (MOVING PICTURE EXPERTS GROUP)

MPEG is a set of standards established for the compression of digital video and audio data. It is the universal standard for digital terrestrial, cable and satellite TV, DVDs and digital video recorders. MPEG uses lossy compression within each frame similar to JPEG, which means pixels from the original images are permanently discarded. It also uses interframe coding, which further compresses the data by encoding only the differences between periodic frames (see interframe coding). MPEG performs the actual compression using the discrete cosine transform (DCT) method (see DCT). MPEG is an asymmetrical system: it takes longer to compress the video than it does to decompress it in the DVD player, PC, set-top box or digital TV set. As a result, in the early days, compression was performed only in the studio. As chips advanced and became less costly, they enabled digital video recorders, such as TiVos, to convert analog TV to MPEG and record it on disk in real time (see DVR).

MPEG-1 (Video CDs)

Although MPEG-1 supports higher resolutions, it is typically coded at 352x240 x 30 fps (NTSC) or 352x288 x 25 fps (PAL/SECAM). Full 704x480 and 704x576 frames (BT.601) were scaled down for encoding and scaled up for playback. MPEG-1 uses the YCbCr color space with 4:2:0 sampling, but did not provide a standard way of handling interlaced video. Data rates were limited to 1.8 Mbps, but often exceeded. See YCbCr sampling.

MPEG-2 (DVD, Digital TV)

MPEG-2 provides broadcast-quality video with resolutions up to 1920x1080. It supports a variety of audio/video formats, including legacy TV, HDTV and five-channel surround sound. MPEG-2 uses the YCbCr color space with 4:2:0, 4:2:2 and 4:4:4 sampling and supports interlaced video. Data rates are from 1.5 to 60 Mbps. See YCbCr sampling.

MPEG-4 (All-Inclusive and Interactive)

MPEG-4 is an extremely comprehensive system for multimedia representation and distribution. Based on a variation of Apple's QuickTime file format, MPEG-4 offers a variety of compression options, including low-bandwidth formats for transmitting to wireless devices as well as high-bandwidth formats for studio processing. See H.264.

MPEG-4 also incorporates AAC, which is a high-quality audio encoder. MPEG-4 AAC is widely used as an audio-only format (see AAC).

A major feature of MPEG-4 is its ability to identify and deal with separate audio and video objects in the frame, which allows separate elements to be compressed more efficiently and dealt with independently. User-controlled interactive sequences that include audio, video, text, 2D and 3D objects and animations are all part of the MPEG-4 framework. For more information, visit the MPEG Industry Forum at www.mpegif.org.


MPEG-7 (Meta-Data)

MPEG-7 is about describing multimedia objects and has nothing to do with compression. It provides a library of core description tools and an XML-based Description Definition Language (DDL) for extending the library with additional multimedia objects. Color, texture, shape and motion are examples of characteristics defined by MPEG-7.

MPEG-21 (Digital Rights Infrastructure)

MPEG-21 provides a comprehensive framework for storing, searching, accessing and protecting the copyrights of multimedia assets. It was designed to provide a standard for digital rights management as well as interoperability. MPEG-21 uses the "Digital Item" as a descriptor for all multimedia objects. Like MPEG-7, it does not deal with compression methods.

The Missing Numbers

MPEG-3 was abandoned after initial development because MPEG-2 was considered sufficient. Because MPEG-7 does not deal with compression, it was felt a higher number was needed to distance it from MPEG-4. MPEG-21 was coined for the 21st century.

MPEG vs. Motion JPEG

Before MPEG, a variety of non-standard Motion JPEG (M-JPEG) methods were used to create consecutive JPEG frames. Motion JPEG did not use interframe coding between frames and was easy to edit, but was not as highly compressed as MPEG. For compatibility, video editors may support one of the Motion JPEG methods. MPEG can also be encoded without interframe compression for faster editing. See MP3, MPEG LA, MPEGIF.

MPEG-2

MPEG-2 is a standard for "the generic coding of moving pictures and associated audio information [1]." It is widely used around the world to specify the format of the digital television signals that are broadcast by terrestrial (over-the-air), cable, and direct broadcast satellite TV systems. It also specifies the format of movies and other programs that are distributed on DVD and similar disks. The standard allows text and other data, e.g., a program guide for TV viewers, to be added to the video and audio data streams. TV stations, TV receivers, DVD players, and other equipment are all designed to this standard. MPEG-2 was the second of several standards developed by the Moving Picture Experts Group (MPEG) and is an international standard (ISO/IEC 13818).

While MPEG-2 is the core of most digital television and DVD formats, it does not completely specify them. Regional institutions adapt it to their needs by restricting and augmenting aspects of the standard. See "Profiles and Levels" below.

MPEG-2 includes a Systems part (part 1) that defines two distinct (but related) container formats. One is Transport Stream, which is designed to carry digital video and audio over somewhat-unreliable media. MPEG-2 Transport Stream is commonly used in broadcast applications, such as ATSC and DVB. MPEG-2 Systems also defines Program Stream, a container format that is designed for reasonably reliable media such as disks. MPEG-2 Program Stream is used in the DVD and SVCD standards.

The Video part (part 2) of MPEG-2 is similar to MPEG-1, but also provides support for interlaced video (the format used by analog broadcast TV systems). MPEG-2 video is not optimized for low bit-rates (less than 1 Mbit/s), but outperforms MPEG-1 at 3 Mbit/s and above. All standards-conforming MPEG-2 Video decoders are fully capable of playing back MPEG-1 Video streams.

With some enhancements, MPEG-2 Video and Systems are also used in most HDTV transmission systems.


The MPEG-2 Audio part (defined in Part 3 of the standard) enhances MPEG-1's audio by allowing the coding of audio programs with more than two channels. Part 3 of the standard allows this to be done in a backwards compatible way, allowing MPEG-1 audio decoders to decode the two main stereo components of the presentation.


Part 7 of the MPEG-2 standard specifies a rather different, non-backwards-compatible audio format. Part 7 is referred to as MPEG-2 AAC. While AAC is more efficient than the previous MPEG audio standards, it is much more complex to implement, and somewhat more powerful hardware is needed for encoding and decoding.

Video coding (simplified)

An HDTV camera generates a raw video stream of more than one billion bits per second. This stream must be compressed if digital TV is to fit in the bandwidth of available TV channels and if movies are to fit on DVDs. Fortunately, video compression is practical because the data in pictures is often redundant in space and time. For example, the sky can be blue across the top of a picture and that blue sky can persist for frame after frame. Also, because of the way the eye works, it is possible to delete some data from video pictures with almost no noticeable degradation in image quality.

TV cameras used in broadcasting usually generate 50 pictures a second (in Europe and elsewhere) or 59.94 pictures a second (in North America and elsewhere). Digital television requires that these pictures be digitized so that they can be processed by computer hardware. Each picture element (a pixel) is then represented by one luminance number and two chrominance numbers. These describe the brightness and the color of the pixel (see YUV). Thus, each digitized picture is initially represented by three rectangular arrays of numbers.

A common (and old) trick to reduce the amount of data that must be processed per second is to separate the picture into two fields: the "top field," which is the odd numbered rows, and the "bottom field," which is the even numbered rows. The two fields are displayed alternately. This is called interlaced video. Two successive fields are called a frame. The typical frame rate is then 25 or 29.97 frames a second. If the video is not interlaced, then it is called progressive video and each picture is a frame. MPEG-2 supports both options.
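The field split described above is easy to state concretely. A minimal sketch (rows are numbered from 1, as in the text; the function names are illustrative, not from any standard API):

```python
# Split a frame (a list of rows) into the two interlaced fields:
# top field = odd-numbered rows, bottom field = even-numbered rows,
# counting rows from 1.
def split_fields(frame):
    top = frame[0::2]      # rows 1, 3, 5, ... (Python indices 0, 2, 4, ...)
    bottom = frame[1::2]   # rows 2, 4, 6, ...
    return top, bottom

def weave(top, bottom):
    """Interleave two fields back into a full frame."""
    frame = []
    for t, b in zip(top, bottom):
        frame += [t, b]
    return frame

frame = ["row1", "row2", "row3", "row4"]
top, bottom = split_fields(frame)
assert top == ["row1", "row3"] and bottom == ["row2", "row4"]
assert weave(top, bottom) == frame     # two successive fields make a frame
```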

Another trick to reduce the data rate is to thin out the two chrominance matrices. In effect, the remaining chrominance values represent the nearby values that are deleted. Thinning works because the eye is more responsive to brightness than to color. The 4:2:2 chrominance format indicates that half the chrominance values have been deleted. The 4:2:0 chrominance format indicates that three quarters of the chrominance values have been deleted. If no chrominance values have been deleted, the chrominance format is 4:4:4. MPEG-2 allows all three options.
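The 4:2:0 "thinning" of the chrominance matrices can be sketched as follows. This is a simplification: it keeps one chroma value per 2x2 pixel block (deleting three quarters of the values) by averaging the block, whereas real encoders may use different filters and site the chroma samples differently:

```python
# Sketch of 4:2:0 chroma subsampling: one averaged chroma value survives
# per 2x2 block of pixels. Luma would be kept at full resolution.
def subsample_420(chroma):            # chroma: 2-D list with even dimensions
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1] +
              chroma[y + 1][x] + chroma[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

cb = [[100, 104, 50, 50],
      [96, 100, 50, 50]]
small = subsample_420(cb)
assert small == [[100, 50]]   # 2 values kept out of 8: three quarters deleted
```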

MPEG-2 specifies that the raw frames be compressed into three kinds of frames: I(ntra-coded)-frames, P(redictive-coded)-frames, and B(idirectionally predictive-coded)-frames.

An I-frame is a compressed version of a single uncompressed (raw) frame. It takes advantage of spatial redundancy and of the inability of the eye to detect certain changes in the image. Unlike P-frames and B-frames, I-frames do not depend on data in the preceding or the following frames. Briefly, the raw frame is divided into 8 pixel by 8 pixel blocks. The data in each block is transformed by a "discrete cosine transform." The result is an 8 by 8 matrix of coefficients. This transform does not change the information in the block; the original block can be recreated exactly by applying the inverse cosine transform. The math is a little esoteric but, roughly, the transform converts spatial variations into frequency variations. The advantage of doing this is that the image can now be simplified by quantizing the coefficients. Many of the coefficients, usually the higher frequency components, will then be zero. The penalty of this step is the loss of some subtle distinctions in brightness and color. If one applies the inverse transform to the matrix after it is quantized, one gets an image that looks very similar to the original image but that is not quite as nuanced. Next, the quantized coefficient matrix is itself compressed. Typically, one corner of the quantized matrix is filled with zeros. By starting in the opposite corner of the matrix, then zigzagging through the matrix to combine the coefficients into a string, then substituting run-length codes for consecutive zeros in that string, and then applying Huffman coding to that result, one reduces the matrix to a smaller array of numbers. It is this array that is broadcast or that is put on DVDs. In the receiver or the player, the whole process is reversed, enabling the receiver to reconstruct, to a close approximation, the original frame.
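The I-frame pipeline just described (DCT, quantize, zigzag scan, run-length code) can be sketched in plain Python. This is an illustrative toy, not MPEG-2-conformant: quantization uses a single step size instead of the standard's quantization matrices, and the final Huffman stage is omitted:

```python
import math

N = 8

def dct2(block):
    """2-D DCT-II of an 8x8 block (the transform used for I-frames)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[y][x] *
                    math.cos((2 * y + 1) * u * math.pi / (2 * N)) *
                    math.cos((2 * x + 1) * v * math.pi / (2 * N))
                    for y in range(N) for x in range(N))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, q=16):
    """Coarsely quantize; most high-frequency coefficients become 0."""
    return [[round(c / q) for c in row] for row in coeffs]

def zigzag(m):
    """Read the matrix along anti-diagonals, low to high frequency."""
    order = sorted(((u, v) for u in range(N) for v in range(N)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [m[u][v] for u, v in order]

def rle_zeros(seq):
    """Replace each run of zeros with a (0, run_length) pair."""
    out, i = [], 0
    while i < len(seq):
        if seq[i] == 0:
            j = i
            while j < len(seq) and seq[j] == 0:
                j += 1
            out.append((0, j - i))
            i = j
        else:
            out.append(seq[i])
            i += 1
    return out

flat = [[128] * N for _ in range(N)]     # a perfectly flat gray block
scan = zigzag(quantize(dct2(flat)))
# Only the DC coefficient survives: 8 * 128 / 16 = 64, then 63 zeros,
# which the run-length step collapses to a single pair.
assert scan[0] == 64 and rle_zeros(scan) == [64, (0, 63)]
```

The flat block is the extreme case of spatial redundancy: 64 pixel values collapse to two symbols before Huffman coding even starts.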


Typically, every 15th frame or so is made into an I-frame. P-frames and B-frames might follow an I-frame like this, IBBPBBPBBPBB(I), to form a Group of Pictures (GOP); however, the standard is flexible about this.


P-frames provide more compression than I-frames because they take advantage of the data in the previous I-frame or P-frame. I-frames and P-frames are called reference frames. To generate a P-frame, the previous reference frame is reconstructed, just as it would be in a TV receiver or DVD player. The frame being compressed is divided into 16 pixel by 16 pixel "macroblocks." Then, for each of those macroblocks, the reconstructed reference frame is searched to find the 16 by 16 macroblock that best matches the macroblock being compressed. The offset is encoded as a "motion vector." Frequently, the offset is zero. But, if something in the picture is moving, the offset might be something like 23 pixels to the right and 4 pixels up. The match between the two macroblocks will often not be perfect. To correct for this, the encoder computes the strings of coefficient values as described above for both macroblocks and then subtracts one from the other. This "residual" is appended to the motion vector, and the result is sent to the receiver or stored on the DVD for each macroblock being compressed. Sometimes no suitable match is found. Then, the macroblock is treated like an I-frame macroblock.
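The macroblock search can be sketched as a minimum sum-of-absolute-differences (SAD) match. This toy uses a 1-D "frame" and a 3-sample "macroblock" instead of real 16x16 blocks, and the function name is made up for the example; real encoders use far more elaborate search strategies:

```python
# Sketch of motion estimation for a P-frame: find the offset into the
# reference frame whose block best matches the block being encoded.
def best_motion_vector(ref, block, pos, search=4):
    """Return (offset, sad) of the best match for `block` near `pos`."""
    best = (0, float("inf"))
    for off in range(-search, search + 1):
        start = pos + off
        if start < 0 or start + len(block) > len(ref):
            continue
        candidate = ref[start:start + len(block)]
        sad = sum(abs(r - b) for r, b in zip(candidate, block))
        if sad < best[1]:
            best = (off, sad)
    return best

ref  = [0, 0, 9, 9, 9, 0, 0, 0]   # previous (reference) frame
curr = [0, 0, 0, 9, 9, 9, 0, 0]   # the object moved one sample right
block = curr[3:6]                  # "macroblock" being encoded at position 3
off, sad = best_motion_vector(ref, block, 3)
assert (off, sad) == (-1, 0)       # perfect match: copy from one sample left
```

Because the match is perfect here, the residual would be all zeros; in general the nonzero residual is coded alongside the vector, as the text describes.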

The processing of B-frames is similar to that of P-frames except that B-frames use the picture in the following reference frame as well as the picture in the preceding reference frame. As a result, B-frames usually provide more compression than P-frames. B-frames are never reference frames.

While the above paragraphs generally describe MPEG-2 video compression, there are many details that are not discussed including details involving fields, chrominance formats, responses to scene changes, special codes that label the parts of the bitstream, and so on. MPEG-2 compression is complicated. TV cameras capture pictures at a regular rate. TV receivers display pictures at a regular rate. In between, all kinds of things are happening. But it works.

Audio encoding

MPEG-2 also introduces new audio encoding methods. These are:

• low bitrate encoding with halved sampling rate (MPEG-1 Layer 1/2/3 LSF)
• multichannel encoding with up to 5.1 channels
• MPEG-2 AAC

Profiles and Levels

MPEG-2 Profiles:

Abbr.  Name             Frames   YUV    Streams  Comment
SP     Simple Profile   P, I     4:2:0  1        no interlacing
MP     Main Profile     P, I, B  4:2:0  1
422P   4:2:2 Profile    P, I, B  4:2:2  1
SNR    SNR Profile      P, I, B  4:2:0  1-2      SNR: Signal to Noise Ratio
SP     Spatial Profile  P, I, B  4:2:0  1-3
HP     High Profile     P, I, B  4:2:2  1-3      low, normal and high quality decoding

MPEG-2 Levels:

Abbr.  Name        Pixels/line  Lines  Framerate (Hz)  Bitrate (Mbit/s)
LL     Low Level   352          288    30              4
ML     Main Level  720          576    30              15
H-14   High 1440   1440         1152   30              60
HL     High Level  1920         1152   30              80


Profile @ Level  Resolution (px)  Max framerate (Hz)  Sampling  Bitrate (Mbit/s)  Example Application
SP@LL            176 × 144        15                  4:2:0     0.096             Wireless handsets
SP@ML            352 × 288        15                  4:2:0     0.384             PDAs
                 320 × 240        24
MP@LL            352 × 288        30                  4:2:0     4                 Set-top boxes (STB)
MP@ML            720 × 480        30                  4:2:0     15 (DVD: 9.8)     DVD, SD-DVB
                 720 × 576        25
MP@H-14          1440 × 1080      30                  4:2:0     60 (HDV: 25)      HDV
                 1280 × 720       30
MP@HL            1920 × 1080      30                  4:2:0     80                ATSC 1080i, 720p60, HD-DVB (HDTV)
                 1280 × 720       60
422P@LL                                               4:2:2
422P@ML          720 × 480        30                  4:2:2     50                Sony IMX using I-frame only, broadcast "contribution" video (I&P only)
                 720 × 576        25
422P@H-14        1440 × 1080      30                  4:2:2     80                Potential future MPEG-2-based HD products from Sony and Panasonic
                 1280 × 720       60
422P@HL          1920 × 1080      30                  4:2:2     300               Potential future MPEG-2-based HD products from Panasonic
                 1280 × 720       60

DVD

The DVD standard uses MPEG-2 video, but imposes some restrictions:

• Allowed resolutions:
  o 720 × 480, 704 × 480, 352 × 480, 352 × 240 pixels (NTSC)
  o 720 × 576, 704 × 576, 352 × 576, 352 × 288 pixels (PAL)
• Allowed aspect ratios (display AR):
  o 4:3
  o 16:9
  o (2.21:1 is often listed as a valid DVD aspect ratio, but is actually just a 16:9 image with the top and bottom of the frame masked in black)
• Allowed frame rates:
  o 29.97 frame/s (NTSC)
  o 25 frame/s (PAL)

Note: By using a pattern of REPEAT_FIRST_FIELD flags in the headers of encoded pictures, pictures can be displayed for either two or three fields, and almost any picture display rate (minimum ⅔ of the frame rate) can be achieved. This is most often used to display 23.976 (approximately film rate) video on NTSC.

• Audio+video bitrate:
  o Video peak: 9.8 Mbit/s
  o Total peak: 10.08 Mbit/s
  o Minimum: 300 kbit/s
• YUV 4:2:0
• Additional subtitles possible
• Closed captioning (NTSC only)
• Audio:
  o Linear Pulse Code Modulation (LPCM): 48 kHz or 96 kHz; 16- or 24-bit; up to six channels (not all combinations possible due to bitrate constraints)
  o MPEG Layer 2 (MP2): 48 kHz, up to 5.1 channels (required in PAL players only)
  o Dolby Digital (DD, also known as AC-3): 48 kHz, 32-448 kbit/s, up to 5.1 channels
  o Digital Theater Systems (DTS): 754 kbit/s or 1510 kbit/s (not required for DVD player compliance)
  o NTSC DVDs must contain at least one LPCM or Dolby Digital audio track.
  o PAL DVDs must contain at least one MPEG Layer 2, LPCM, or Dolby Digital audio track.
  o Players are not required to play back audio with more than two channels, but must be able to downmix multichannel audio to two channels.
• GOP structure:
  o Sequence header must be present at the beginning of every GOP
  o Maximum frames per GOP: 18 (NTSC) / 15 (PAL), i.e. 0.6 seconds in both cases
  o Closed GOP required for multiple-angle DVDs

DVB

Application-specific restrictions on MPEG-2 video in the DVB standard:

Allowed resolutions for SDTV:

• 720, 640, 544, 480 or 352 × 480 pixels, 24/1.001, 24, 30/1.001 or 30 frame/s
• 352 × 240 pixels, 24/1.001, 24, 30/1.001 or 30 frame/s
• 720, 704, 544, 480 or 352 × 576 pixels, 25 frame/s
• 352 × 288 pixels, 25 frame/s

For HDTV:

• 720 × 576 × 50 frame/s progressive (576p50)
• 1280 × 720 × 25 or 50 frame/s progressive (720p50)
• 1440 or 1920 × 1080 × 25 frame/s progressive (1080p25 - film mode)
• 1440 or 1920 × 1080 × 25 frame/s interlaced (1080i25)
• 1920 × 1080 × 50 frame/s progressive (1080p50), a possible future H.264/AVC format

ATSC

Allowed resolutions:

• 1920 × 1080 pixels, 30 frame/s (1080i)
• 1280 × 720 pixels, 60 frame/s (720p)
• 720 × 576 pixels, 25 frame/s (576i, 576p)
• 720 or 640 × 480 pixels, 30 frame/s (480i, 480p)

Note: 1080i is encoded with 1920 × 1088 pixel frames, but the last 8 lines are discarded prior to display.

ISO/IEC 13818

Part 1: Systems - describes synchronization and multiplexing of video and audio.
Part 2: Video - compression codec for interlaced and non-interlaced video signals.
Part 3: Audio - compression codec for perceptual coding of audio signals. A multichannel-enabled extension of MPEG-1 audio.
Part 4: Describes procedures for testing compliance.
Part 5: Describes systems for software simulation.
Part 6: Describes extensions for DSM-CC (Digital Storage Media Command and Control).
Part 7: Advanced Audio Coding (AAC).
Part 9: Extension for real-time interfaces.
Part 10: Conformance extensions for DSM-CC.

(Part 8: 10-bit video extension. Its primary application was studio video; Part 8 has been withdrawn due to lack of interest from industry.)

Current forms

Today, nearly all video compression methods in common use (e.g., those in standards approved by the ITU-T or ISO) apply a discrete cosine transform (DCT) for spatial redundancy reduction. Other methods, such as fractal compression, matching pursuits, and the discrete wavelet transform (DWT), have been the subject of some research, but are typically not used in practical products (except for the use of wavelet coding as still-image coders without motion compensation). Interest in fractal compression seems to be waning, due to theoretical analysis showing a comparative lack of effectiveness of such methods.

The use of most video compression techniques (e.g., DCT or DWT based techniques) involves quantization. The quantization can either be scalar quantization or vector quantization; however, nearly all practical designs use scalar quantization because of its greater simplicity.

In broadcast engineering, digital television (DVB, ATSC and ISDB) is made practical by video compression. TV stations can broadcast not only HDTV, but multiple virtual channels on the same physical channel as well. Compression also conserves precious bandwidth on the radio spectrum. Nearly all digital video broadcast today uses the MPEG-2 standard video compression format, although H.264/MPEG-4 AVC and VC-1 are emerging contenders in that domain.

Multimedia compression formats

Video compression formats:
• ISO/IEC (MPEG): MPEG-1 | MPEG-2 | MPEG-4 | MPEG-4/AVC
• ITU-T: H.261 | H.262 | H.263 | H.264
• Others: AVS | Dirac | Indeo | MJPEG | RealVideo | VC-1 | Theora | VP6 | VP7 | WMV

Audio compression formats:
• ISO/IEC (MPEG): MPEG-1 Layer III (MP3) | MPEG-1 Layer II | AAC | HE-AAC
• ITU-T: G.711 | G.722 | G.722.1 | G.722.2 | G.723 | G.723.1 | G.726 | G.728 | G.729 | G.729.1 | G.729a
• Others: AC3 | ATRAC | FLAC | iLBC | Monkey's Audio | Musepack | RealAudio | SHN | Speex | Vorbis | WavPack | WMA

Image compression formats:
• ISO/IEC/ITU-T: JPEG | JPEG 2000 | JPEG-LS | JBIG | JBIG2
• Others: BMP | GIF | ILBM | PCX | PNG | TGA | TIFF | WMP

Media container formats:
• General: 3GP | ASF | AVI | FLV | Matroska | MP4 | MXF | NUT | Ogg | Ogg Media | QuickTime | RealMedia
• Audio only: AIFF | AU | WAV


Digital Compression

An uncompressed SDI signal outputs 270 Mbit of data every second. In digital broadcasting, compression is essential to squeeze all this data into a 10 MHz RF channel. Many people mistakenly equate the term "bit rate" with picture quality; the bit rate actually refers to how the signal is processed. Thanks to the modular design of all Gigawave digital microwave links, the plug-in encoder and modulator modules can easily be changed on-site or upgraded as new compression techniques evolve.

Compression Techniques used in Telecommunications and Broadcasting:

Standard               Bit Rate (Mb/s)   Delay
ETSI 140               140               0
ETSI 34                34                Negligible
ETSI 17                17
ETSI 8                 8
DigiBeta               120 (approx.)     Negligible
Digital S              50
MPEG 1                 1.5
MPEG 2                 1.5 - 80          2 - 24 frames
Beta SX                18
EBU 24 News            8
MPEG 4                 N/A
Motion JPEG            30 - 100          3 frames
JPEG 2000              N/A
DVC Pro 25/50/100      25/50/100         3 frames
DVCam                  25                3 frames
DV                     25                3 frames
Wavelets               18 - 100          <1 ms
Firewire (IEEE 1394)   100/200/400

Typical Compression Techniques used in IT:

Standard    Bit Rate (Mb/s)   Delay
Media 9     N/A
Ethernet    10, 100, 1000
SCSI        40
SCSI II     160
MPEG 4      N/A



AUDIO COMPRESSION TECHNIQUES

Many different compression techniques exist for various forms of data. Video compression is simpler because many pixels are repeated in groups. Different techniques for still pictures include horizontal repeated-pixel compression (PCX format), data conversion (GIF format), and fractal-path repeated pixels. For motion video, compression is relatively easy because large portions of the screen don't change between frames; therefore, only the changes between images need to be stored. Text compression is extremely simple compared to video and audio. One method counts the probability of each character and then reassigns smaller bit values to the most common characters and larger bit values to the least common characters.
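The character-probability method described for text is essentially Huffman coding. A minimal sketch using Python's heapq (the dict-merging representation here is a simplification of the usual tree construction):

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a prefix code: common characters get shorter bit strings."""
    counts = Counter(text)
    # Heap entries: (frequency, tiebreak, {char: code_so_far}).
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

code = huffman_code("aaaabbc")
# 'a' (most common) gets a shorter code than 'c' (least common).
assert len(code["a"]) < len(code["c"])
```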

However, digital samples of audio data have proven to be very difficult to compress; the techniques above do not work well at all for audio data. The data changes often, and no values are common enough to save sufficient space. Currently, five methods are used to compress audio data, with varying degrees of complexity, compressed audio quality, and amount of data compression.

Sampling Basics

The digital representation of audio data offers many advantages : high noise immunity, stability, and reproducibility. Audio in digital form also allows for efficient implementation of many audio processing functions through the computer.

Converting audio from analog to digital begins by sampling the audio input at regular, discrete intervals of time and quantizing the sampled values into a discrete number of evenly spaced levels. According to the Nyquist theorem, a time-sampled signal can faithfully represent frequencies up to half the sampling rate. Above that threshold, frequencies are aliased and signal noise becomes readily apparent.

The sampling frequencies in use today range from 8 kHz for basic speech to 48 kHz for commercial DAT machines. The number of quantizer levels is typically a power of 2 to make full use of a fixed number of bits per audio sample. The typical range for bits per sample is between 8 and 16 bits. This allows for a range of 256 to 65,536 levels of quantization per sample. With each additional bit of quantizer spacing, the signal to noise ratio increases by roughly 6 decibels (dB). Thus, the dynamic range capability of these representations is from 48 to 96 dB, respectively.

The data rates associated with uncompressed digital audio are substantial. For audio data on a CD, for example, which is sampled at 44.1 kHz with 16 bits per channel for two channels, about 1.4 megabits per second are processed. A clear need exists for some form of compression to enable the more efficient storage and transmission of digital audio data.
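Both figures quoted above (roughly 6 dB of dynamic range per bit, and the CD audio data rate) can be checked with a few lines of arithmetic:

```python
import math

def dynamic_range_db(bits):
    """Approximate dynamic range of a linear quantizer:
    20 * log10(2**bits), i.e. about 6.02 dB per bit of resolution."""
    return 20 * math.log10(2 ** bits)

# 8-bit and 16-bit samples give the 48 dB and 96 dB figures in the text.
assert round(dynamic_range_db(8)) == 48
assert round(dynamic_range_db(16)) == 96

# Uncompressed CD audio: 44.1 kHz sampling, 16 bits per sample, 2 channels.
cd_bitrate = 44_100 * 16 * 2
assert cd_bitrate == 1_411_200   # about 1.4 megabits per second
```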

Voc File Compression

The simplest compression techniques simply remove any silence from the entire sample. Creative Labs introduced this form of compression with its Sound Blaster line of sound cards. This method analyzes the whole sample and then codes the silence into the sample using byte codes. It is very similar to run-length coding.
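The silence-coding idea can be sketched as run-length coding of near-silent samples. This is an illustrative toy, not the actual VOC byte-code format (the threshold and the marker representation are made up for the example):

```python
# Sketch of VOC-style silence compression: replace each run of near-silent
# samples with a (SILENCE, run_length) marker, much like run-length coding.
SILENCE = "S"

def encode_silence(samples, threshold=2):
    out, run = [], 0
    for s in samples:
        if abs(s) <= threshold:          # quiet enough to count as silence
            run += 1
        else:
            if run:
                out.append((SILENCE, run))
                run = 0
            out.append(s)                # loud samples pass through as-is
    if run:
        out.append((SILENCE, run))
    return out

samples = [0, 1, 0, 0, 90, -80, 0, 0, 0]
assert encode_silence(samples) == [("S", 4), 90, -80, ("S", 3)]
```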

Linear Predictive Coding and Code Excited Linear Predictor


This was an early development in audio compression that was used primarily for speech. A Linear Predictive Coding (LPC) encoder compares speech to an analytical model of the vocal tract, then throws away the speech and stores the parameters of the best-fit model. The output quality was poor and was often compared to computer speech and thus is not used much today.


A later development, Code Excited Linear Predictor (CELP), increased the complexity of the speech model further, while allowing for greater compression due to faster computers, and produced much better results. Sound quality improved, while the compression ratio increased. The algorithm compares speech with an analytical model of the vocal tract and computes the errors between the original speech and the model. It transmits both model parameters and a very compressed representation of the errors.

Mu-law and A-law compression

Logarithmic compression is a good method because it matches the way the human ear works: it loses only information that the ear would not hear anyway, and it gives good-quality results for both speech and music. Although the compression ratio is not very high, the method requires very little processing power. It is the international standard telephony encoding format, standardized by the ITU (formerly the CCITT). Mu-law is commonly used in North America and Japan for ISDN 8 kHz sampled, voice-grade digital telephone service, with A-law as its counterpart elsewhere. The encoder packs each 16-bit sample into 8 bits by using a logarithmic table to encode a 13-bit dynamic range, dropping the least significant 3 bits of precision. The quantization levels are dispersed logarithmically instead of linearly to mimic the way the human ear perceives loudness. Unlike linear quantization, the logarithmic step spacings represent low-amplitude samples with greater accuracy than higher-amplitude samples. This method is fast and compresses data into half the size of the original sample, and it is used quite widely due to its universal adoption.
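The logarithmic curve described above can be sketched as follows. This uses the continuous mu-law formula; real codecs use a segmented, table-driven approximation of the same curve, so treat this as an illustration rather than a bit-exact implementation:

```python
import math

MU = 255  # mu-law constant used in North America and Japan

def mulaw_encode(x):
    """Map a sample in [-1.0, 1.0] to an 8-bit value using the mu-law curve."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))   # quantize to 8 bits

# Low-amplitude samples get finer spacing than high-amplitude ones:
print(mulaw_encode(0.01) - mulaw_encode(0.0))   # large step near zero
print(mulaw_encode(1.0) - mulaw_encode(0.99))   # small step near full scale
```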

Adaptive Differential Pulse Code Modulation (ADPCM)

The Interactive Multimedia Association (IMA) is a consortium of computer hardware and software vendors cooperating to develop a standard for multimedia data. Their goal was to select a public-domain audio compression algorithm that is able to provide a good compression ratio while maintaining good audio quality. In addition, the coding had to be simple enough to enable software-only decoding of 44.1 kHz samples on a 20 MHz, 386-class computer.

This process is a simple conversion based on the assumption that the changes between successive samples will not be very large. The first sample value is stored in its entirety, and each successive value describes the change from the previous sample, within +/- 8 levels, using only 4 bits instead of 16. A 4:1 compression ratio is therefore achieved, with less loss as the sampling frequency increases. At 44.1 kHz, the compressed signal is an accurate representation of the uncompressed sample that is difficult to discern from the original. This method is widely used today because of its simplicity, wide acceptance, and high level of compression.
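A toy version of the delta-coding idea might look like this; the fixed step size is an assumption made for simplicity (real IMA ADPCM adapts the step size as it goes):

```python
def delta_encode(samples, step=64):
    """Store the first sample in full, then each change as a 4-bit value
    in [-8, 7] units of `step`. A toy version of the idea behind ADPCM."""
    first = samples[0]
    deltas, prev = [], first
    for s in samples[1:]:
        d = max(-8, min(7, round((s - prev) / step)))
        deltas.append(d)
        prev += d * step          # track what the decoder will reconstruct
    return first, deltas

def delta_decode(first, deltas, step=64):
    out, prev = [first], first
    for d in deltas:
        prev += d * step
        out.append(prev)
    return out

first, deltas = delta_encode([1000, 1050, 1100, 1000, 980])
print(deltas)                        # each entry fits in 4 bits
print(delta_decode(first, deltas))   # close to, but not exactly, the input
```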

MPEG

The Motion Picture Experts Group (MPEG) audio compression algorithm is an International Organization for Standardization (ISO) standard for high-fidelity audio compression. It is one part of a three-part compression standard, the other two parts covering video and systems. MPEG compression is lossy, but it can nonetheless achieve transparent, perceptually lossless compression. MPEG compression is firmly founded in psychoacoustic theory. The premise behind the technique is simple: if a sound cannot be heard by the listener, then it does not need to be coded. Human hearing is quite sensitive, but discerning differences within a collage of sounds is quite difficult. Masking is the phenomenon in which a strong signal "covers" another signal such that the softer one cannot be heard by the human ear. An extension of this is temporal masking, which describes the masking of a soft sound after a louder sound has stopped. The time, measured under scientific conditions, that it takes to hear the softer sound is about 5 ms. Because the sensitivity of the ear is not linear but instead depends on frequency, masking effects differ depending on the frequency of the sounds.


MPEG compression uses masking as the basis for compressing the audio data: those sounds that cannot be heard by the human ear do not need to be encoded. The audio spectrum is divided into 32 frequency bands, because sound masking occurs over a range of frequencies around each loud sound. The volume levels in each band are then measured to detect masking. Masking effects are taken into account, and the signal is then encoded.
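The band-splitting step can be illustrated with a crude stand-in for the real analysis filterbank: a plain FFT whose bins are grouped into 32 equal bands (MPEG's actual polyphase filterbank is considerably more elaborate):

```python
import numpy as np

def band_energies(frame, n_bands=32):
    """Split a frame's spectrum into 32 equal-width bands and return the
    energy in each; a simplified stand-in for MPEG's analysis filterbank."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(spectrum, n_bands)
    return np.array([b.sum() for b in bands])

# A loud 1 kHz tone: its energy lands almost entirely in one low band, and
# the masking model would then decide how coarsely nearby bands can be coded.
t = np.arange(1024) / 48_000.0
frame = np.sin(2 * np.pi * 1_000 * t)
e = band_energies(frame)
print(int(e.argmax()))
```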


In addition to encoding a single signal, MPEG compression supports one or two audio channels in one of four modes:

1) Monophonic
2) Dual monophonic -- two independent channels
3) Stereo -- for stereo channels that share bits, but without joint-stereo coding
4) Joint stereo -- takes advantage of the correlations between stereo channels

The MPEG method allows for a compression ratio of up to 6:1. Under optimal listening conditions, expert listeners could not distinguish the coded audio clips from the originals. Thus, although the technique is lossy, it still produces accurate representations of the original audio signal.

SPEECH COMPRESSION

I. Introduction

The compression of speech signals has many practical applications. One example is in digital cellular technology where many users share the same frequency bandwidth. Compression allows more users to share the system than otherwise possible. Another example is in digital voice storage (e.g. answering machines). For a given memory size, compression allows longer messages to be stored than otherwise.

Historically, digital speech signals are sampled at a rate of 8000 samples/sec. Typically, each sample is represented by 8 bits (using mu-law). This corresponds to an uncompressed rate of 64 kbps (kbits/sec). With current compression techniques (all of which are lossy), it is possible to reduce the rate to 8 kbps with almost no perceptible loss in quality. Further compression is possible at a cost of lower quality. All of the current low-rate speech coders are based on the principle of linear predictive coding (LPC) which is presented in the following sections.

II. LPC Modeling

A. Physical Model:


When you speak:

• Air is pushed from your lungs through your vocal tract, and out of your mouth comes speech.
• For certain voiced sounds, your vocal cords vibrate (open and close). The rate at which the vocal cords vibrate determines the pitch of your voice. Women and young children tend to have high pitch (fast vibration) while adult males tend to have low pitch (slow vibration).
• For certain fricative and plosive (or unvoiced) sounds, your vocal cords do not vibrate but remain constantly open.
• The shape of your vocal tract determines the sound that you make.
• As you speak, your vocal tract changes its shape, producing different sounds.
• The shape of the vocal tract changes relatively slowly (on the scale of 10 msec to 100 msec).
• The amount of air coming from your lungs determines the loudness of your voice.

B. Mathematical Model:

• The above model is often called the LPC model.
• The model says that the digital speech signal is the output of a digital filter (called the LPC filter) whose input is either a train of impulses or a white noise sequence.
• The relationship between the physical and the mathematical models:

Vocal Tract (LPC Filter)

Air (Innovations)

Vocal Cord Vibration (voiced)

Vocal Cord Vibration Period (pitch period)

Fricatives and Plosives (unvoiced)

Air Volume (gain)


• The LPC filter is given by:

  H(z) = G / A(z),  where  A(z) = 1 + a1 z^-1 + a2 z^-2 + ... + a10 z^-10

which is equivalent to saying that the input-output relationship of the filter is given by the linear difference equation:

  s(n) = -a1 s(n-1) - a2 s(n-2) - ... - a10 s(n-10) + G u(n)

where s(n) is the speech signal, u(n) is the excitation (impulse train or white noise), and G is the gain.

• The LPC model can be represented in vector form by its parameter set:

  A = (a1, ..., a10, G, V/UV, P)

where G is the gain, V/UV is the voiced/unvoiced decision, and P is the pitch period.

• The parameter set changes every 20 msec or so. At a sampling rate of 8000 samples/sec, 20 msec is equivalent to 160 samples.

• The digital speech signal is divided into frames of size 20 msec. There are 50 frames/second.
• The model says that each frame of 160 speech samples is equivalent to one set of model parameters. Thus the 160 sample values of a frame are compactly represented by the 13 values of the parameter set (10 filter coefficients, the gain, the voiced/unvoiced decision, and the pitch period).

• There is almost no perceptual difference in the reconstructed speech if:
  o For voiced sounds (V): the impulse train is shifted (the ear is insensitive to phase changes).
  o For unvoiced sounds (UV): a different white noise sequence is used.

• LPC Synthesis: Given the model parameters, generate the speech signal (this is done using standard filtering techniques).

• LPC Analysis: Given a frame of speech, find the model parameters that best fit it (this is described in the next section).

III. LPC Analysis

• Consider one frame of the speech signal, s(n), n = 0, ..., 159:


• The signal s(n) is related to the innovation e(n) through the linear difference equation:

  e(n) = s(n) + a1 s(n-1) + ... + a10 s(n-10)

• The ten LPC parameters are chosen to minimize the energy of the innovation:

  E = sum over n of e(n)^2

• Using standard calculus, we take the derivative of E with respect to each coefficient a_k and set it to zero:

  dE/da_k = 0,  k = 1, ..., 10

• We now have 10 linear equations with 10 unknowns:

  sum over j=1..10 of a_j r(|i-j|) = -r(i),  i = 1, ..., 10

where r(m) is the autocorrelation of the frame:

  r(m) = sum over n of s(n) s(n-m)

• The above matrix equation could be solved using:
  o The Gaussian elimination method.
  o Any matrix inversion method (e.g., in MATLAB).
  o The Levinson-Durbin recursion (described below).
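The Levinson-Durbin recursion mentioned above can be sketched as follows, exploiting the Toeplitz structure of the autocorrelation equations to solve them order by order:

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations from the autocorrelation sequence
    r[0..order] using the Levinson-Durbin recursion. Returns the polynomial
    coefficients a (with a[0] = 1) and the final prediction error."""
    a = [0.0] * (order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err                    # reflection coefficient for order i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1 - k * k)                # prediction error shrinks each order
    return a, err

# Toy autocorrelation values; the predictor recovers s(n) ~ 0.5 s(n-1):
a, err = levinson_durbin([1.0, 0.5], 1)
print(a, err)   # [1.0, -0.5] 0.75
```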


• Levinson-Durbin Recursion: the Toeplitz structure of the autocorrelation equations allows them to be solved recursively, order by order. At each order a reflection coefficient is computed from the previous solution and the prediction error, the coefficients are updated, and the error is reduced. Solve the above for the reflection coefficients, and then set the LPC coefficients from the final recursion step.

• To get the other three parameters (the gain, the voiced/unvoiced decision, and the pitch period), we solve for the innovation:

  e(n) = s(n) + a1 s(n-1) + ... + a10 s(n-10)

• Then calculate the autocorrelation of e(n).

• Then make a decision based on the autocorrelation: a strong periodic peak marks the frame as voiced and its lag gives the pitch period, while the absence of such a peak marks the frame as unvoiced.


IV. 2.4 kbps LPC Vocoder

• The following is a block diagram of a 2.4 kbps LPC Vocoder:

• The LPC coefficients are represented as line spectrum pair (LSP) parameters.
• LSP parameters are mathematically equivalent (one-to-one) to LPC coefficients.
• LSP parameters are more amenable to quantization.
• LSP parameters are calculated from the sum and difference polynomials:

  P(z) = A(z) + z^-11 A(z^-1)
  Q(z) = A(z) - z^-11 A(z^-1)

Factoring the above equations, the roots lie on the unit circle; the angles of the roots are called the LSP parameters.

• LSP parameters are ordered and bounded:

  0 < w1 < w2 < ... < w10 < pi

• LSP parameters are more correlated from one frame to the next than LPC coefficients.
• The frame size is 20 msec. There are 50 frames/sec, so 2400 bps is equivalent to 48 bits/frame. These bits are allocated as follows:


• The 34 bits for the LSP are allocated as follows:

• The gain, G, is encoded using a 7-bit non-uniform scalar quantizer (a 1-dimensional vector quantizer).

• For voiced speech, the value of the pitch period ranges from 20 to 146. The pitch period and the voicing decision are jointly encoded as follows:

V. 4.8 kbps CELP Coder

• CELP = Code-Excited Linear Prediction.
• The principle is similar to the LPC Vocoder except:
  o Frame size is 30 msec (240 samples)
  o The innovation is coded directly
  o More bits are needed
  o It is computationally more complex
  o A pitch prediction filter is included
  o The vector quantization concept is used


• A block diagram of the CELP encoder is shown below:


• The pitch prediction filter is given by:

  P(z) = 1 / (1 - b z^-T)

where the pitch lag T could be an integer or a fraction thereof, and b is the predictor gain.

• The perceptual weighting filter is given by:

  W(z) = A(z/c1) / A(z/c2)

where the constants c1 and c2 (with 0 < c2 < c1 <= 1) have been determined to be good choices for shaping the coding noise so that the speech masks it.

• Each frame is divided into 4 subframes. In each subframe, the codebook contains 512 codevectors.
• The gain is quantized using 5 bits per subframe.
• The LSP parameters are quantized using 34 bits, similar to the LPC Vocoder.
• At 30 msec per frame, 4.8 kbps is equivalent to 144 bits/frame. These 144 bits are allocated as follows:


VI. 8.0 kbps CS-ACELP

• CS-ACELP = Conjugate-Structured Algebraic CELP.

• The principle is similar to the 4.8 kbps CELP Coder except:
  o Frame size is 10 msec (80 samples)
  o There are only two subframes, each of which is 5 msec (40 samples)
  o The LSP parameters are encoded using two-stage vector quantization.
  o The gains are also encoded using vector quantization.

• At 10 msec per frame, 8 kbps is equivalent to 80 bits/frame. These 80 bits are allocated as follows:

VII. Demonstration

This is a demonstration of five different speech compression algorithms (ADPCM, LD-CELP, CS-ACELP, CELP, and LPC10). To use this demo, you need a Sun Audio (.au) player. To distinguish subtle differences in the speech files, high-quality speakers and/or headphones are recommended. Also, it is recommended that you run this demo in a quiet room (with a low level of background noise).

"A lathe is a big tool. Grab every dish of sugar."

• Original (64000 bps) This is the original speech signal sampled at 8000 samples/second and u-law quantized at 8 bits/sample. Approximately 4 seconds of speech.

• ADPCM (32000 bps) This is speech compressed using the Adaptive Differential Pulse Coded Modulation (ADPCM) scheme. The bit rate is 4 bits/sample (compression ratio of 2:1).

• LD-CELP (16000 bps) This is speech compressed using the Low-Delay Code Excited Linear Prediction (LD-CELP) scheme. The bit rate is 2 bits/sample (compression ratio of 4:1).

• CS-ACELP (8000 bps) This is speech compressed using the Conjugate-Structured Algebraic Code Excited Linear Prediction (CS-ACELP) scheme. The bit rate is 1 bit/sample (compression ratio of 8:1).

• CELP (4800 bps) This is speech compressed using the Code Excited Linear Prediction (CELP) scheme. The bit rate is 0.6 bits/sample (compression ratio of 13.3:1).

• LPC10 (2400 bps) This is speech compressed using the Linear Predictive Coding (LPC10) scheme. The bit rate is 0.3 bits/sample (compression ratio of 26.6:1).


IMAGE COMPRESSING TECHNIQUES – JPEG

JPEG Compression

One of the hottest topics in image compression technology today is JPEG. The acronym JPEG stands for the Joint Photographic Experts Group, a standards committee that had its origins within the International Organization for Standardization (ISO). In 1982, the ISO formed the Photographic Experts Group (PEG) to research methods of transmitting video, still images, and text over ISDN (Integrated Services Digital Network) lines. PEG's goal was to produce a set of industry standards for the transmission of graphics and image data over digital communications networks.

In 1986, a subgroup of the CCITT began to research methods of compressing color and gray-scale data for facsimile transmission. The compression methods needed for color facsimile systems were very similar to those being researched by PEG. It was therefore agreed that the two groups should combine their resources and work together toward a single standard.

In 1987, the ISO and CCITT combined their two groups into a joint committee that would research and produce a single standard of image data compression for both organizations to use. This new committee was JPEG.

Although the creators of JPEG might have envisioned a multitude of commercial applications for JPEG technology, a consumer public made hungry by the marketing promises of imaging and multimedia technology are benefiting greatly as well. Most previously developed compression methods do a relatively poor job of compressing continuous-tone image data; that is, images containing hundreds or thousands of colors taken from real-world subjects. And very few file formats can support 24-bit raster images.

GIF, for example, can store only images with a maximum pixel depth of eight bits, for a maximum of 256 colors. And its LZW compression algorithm does not work very well on typical scanned image data. The low-level noise commonly found in such data defeats LZW's ability to recognize repeated patterns.

Both TIFF and BMP are capable of storing 24-bit data, but in their pre-JPEG versions are capable of using only encoding schemes (LZW and RLE, respectively) that do not compress this type of image data very well.

JPEG provides a compression method that is capable of compressing continuous-tone image data with a pixel depth of 6 to 24 bits with reasonable speed and efficiency. And although JPEG itself does not define a standard image file format, several have been invented or modified to fill the needs of JPEG data storage.

JPEG in Perspective

Unlike all of the other compression methods described so far in this chapter, JPEG is not a single algorithm. Instead, it may be thought of as a toolkit of image compression methods that may be altered to fit the needs of the user. JPEG may be adjusted to produce very small, compressed images that are of relatively poor quality in appearance but still suitable for many applications. Conversely, JPEG is capable of producing very high-quality compressed images that are still far smaller than the original uncompressed data.

JPEG is also different in that it is primarily a lossy method of compression. Most popular image format compression schemes, such as RLE, LZW, or the CCITT standards, are lossless compression methods. That is, they do not discard any data during the encoding process. An image compressed using a lossless method is guaranteed to be identical to the original image when uncompressed.


Lossy schemes, on the other hand, throw useless data away during encoding. This is, in fact, how lossy schemes manage to obtain superior compression ratios over most lossless schemes. JPEG was designed specifically to discard information that the human eye cannot easily see. Slight changes in color are not perceived well by the human eye, while slight changes in intensity (light and dark) are. Therefore JPEG's lossy encoding tends to be more frugal with the gray-scale part of an image and to be more frivolous with the color.


JPEG was designed to compress color or gray-scale continuous-tone images of real-world subjects: photographs, video stills, or any complex graphics that resemble natural subjects. Animations, ray tracing, line art, black-and-white documents, and typical vector graphics don't compress very well under JPEG and shouldn't be expected to. And, although JPEG is now used to provide motion video compression, the standard makes no special provision for such an application.

The fact that JPEG is lossy and works only on a select type of image data might make you ask, "Why bother to use it?" It depends upon your needs. JPEG is an excellent way to store 24-bit photographic images, such as those used in imaging and multimedia applications. JPEG 24-bit (16 million color) images are superior in appearance to 8-bit (256 color) images on a VGA display and are at their most spectacular when using 24-bit display hardware (which is now quite inexpensive).

The amount of compression achieved depends upon the content of the image data. A typical photographic-quality image may be compressed from 20:1 to 25:1 without experiencing any noticeable degradation in quality. Higher compression ratios will result in image files that differ noticeably from the original image but still have an overall good image quality. And achieving a 20:1 or better compression ratio in many cases not only saves disk space, but also reduces transmission time across data networks and phone lines.

An end user can "tune" the quality of a JPEG encoder using a parameter sometimes called a quality setting or a Q factor. Although different implementations have varying scales of Q factors, a range of 1 to 100 is typical. A factor of 1 produces the smallest, worst quality images; a factor of 100 produces the largest, best quality images. The optimal Q factor depends on the image content and is therefore different for every image. The art of JPEG compression is finding the lowest Q factor that produces an image that is visibly acceptable, and preferably as close to the original as possible.

The JPEG library supplied by the Independent JPEG Group uses a quality setting scale of 1 to 100. To find the optimal compression for an image using the JPEG library, follow these steps:

1. Encode the image using a quality setting of 75 (-Q 75).
2. If you observe unacceptable defects in the image, increase the value, and re-encode the image.
3. If the image quality is acceptable, decrease the setting until the image quality is barely acceptable. This will be the optimal quality setting for this image.
4. Repeat this process for every image you have (or just encode them all using a quality setting of 75).
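The re-encoding part of this loop can be sketched with the Pillow library (an assumed tool choice; the -Q flag in the steps belongs to the IJG command-line tools). Judging whether defects are "acceptable" still needs a human eye, so this only automates comparing file sizes at different settings:

```python
import io
from PIL import Image

def jpeg_size(img, quality):
    """Encode an image as JPEG in memory and return the compressed size."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.tell()

# A flat-colored stand-in for a real photograph:
img = Image.new("RGB", (64, 64), (200, 30, 30))
for q in (95, 75, 50, 25):
    print(q, jpeg_size(img, q))   # lower quality settings give smaller files
```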

JPEG isn't always an ideal compression solution. There are several reasons:

• As we have said, JPEG doesn't fit every compression need. Images containing large areas of a single color do not compress very well. In fact, JPEG will introduce "artifacts" into such images that are visible against a flat background, making them considerably worse in appearance than if you used a conventional lossless compression method. Images of a "busier" composition contain even worse artifacts, but they are considerably less noticeable against the image's more complex background.

• JPEG can be rather slow when it is implemented only in software. If fast decompression is required, a hardware-based JPEG solution is your best bet, unless you are willing to wait for a faster software-only solution to come along or buy a faster computer.

• JPEG is not trivial to implement. It is not likely you will be able to sit down and write your own JPEG encoder/decoder in a few evenings. We recommend that you obtain a third-party JPEG library, rather than writing your own.

• JPEG is not supported by very many file formats. The formats that do support JPEG are all fairly new and can be expected to be revised at frequent intervals.

Baseline JPEG


The JPEG specification defines a minimal subset of the standard called baseline JPEG, which all JPEG-aware applications are required to support. This baseline uses an encoding scheme based on the Discrete Cosine Transform (DCT) to achieve compression. DCT is a generic name for a class of operations identified and published some years ago. DCT-based algorithms have since made their way into various compression methods.


DCT-based encoding algorithms are always lossy by nature. DCT algorithms are capable of achieving a high degree of compression with only minimal loss of data. This scheme is effective only for compressing continuous-tone images in which the differences between adjacent pixels are usually small. In practice, JPEG works well only on images with depths of at least four or five bits per color channel. The baseline standard actually specifies eight bits per input sample. Data of lesser bit depth can be handled by scaling it up to eight bits per sample, but the results will be bad for low-bit-depth source data, because of the large jumps between adjacent pixel values. For similar reasons, colormapped source data does not work very well, especially if the image has been dithered.

The JPEG compression scheme is divided into the following stages:

1. Transform the image into an optimal color space.
2. Downsample chrominance components by averaging groups of pixels together.
3. Apply a Discrete Cosine Transform (DCT) to blocks of pixels, thus removing redundant image data.
4. Quantize each block of DCT coefficients using weighting functions optimized for the human eye.
5. Encode the resulting coefficients (image data) using a Huffman variable word-length algorithm to remove redundancies in the coefficients.

Figure 9-11 summarizes these steps, and the following subsections look at each of them in turn. Note that JPEG decoding performs the reverse of these steps.

Figure 9-11: JPEG compression and decompression

Transform the image

The JPEG algorithm is capable of encoding images that use any type of color space. JPEG itself encodes each component in a color model separately, and it is completely independent of any color-space model, such as RGB, HSI, or CMY. The best compression ratios result if a luminance/chrominance color space, such as YUV or YCbCr, is used. (See Chapter 2 for a description of these color spaces.)

Most of the visual information to which human eyes are most sensitive is found in the high-frequency, gray-scale, luminance component (Y) of the YCbCr color space. The other two chrominance components (Cb and Cr) contain high-frequency color information to which the human eye is less sensitive. Most of this information can therefore be discarded.


In comparison, the RGB, HSI, and CMY color models spread their useful visual image information evenly across each of their three color components, making the selective discarding of information very difficult. All three color components would need to be encoded at the highest quality, resulting in a poorer compression ratio. Gray-scale images do not have a color space as such and therefore do not require transforming.

Downsample chrominance components

The simplest way of exploiting the eye's lesser sensitivity to chrominance information is simply to use fewer pixels for the chrominance channels. For example, in an image nominally 1000x1000 pixels, we might use a full 1000x1000 luminance pixels but only 500x500 pixels for each chrominance component. In this representation, each chrominance pixel covers the same area as a 2x2 block of luminance pixels. We store a total of six pixel values for each 2x2 block (four luminance values, one each for the two chrominance channels), rather than the twelve values needed if each component is represented at full resolution. Remarkably, this 50 percent reduction in data volume has almost no effect on the perceived quality of most images. Equivalent savings are not possible with conventional color models such as RGB, because in RGB each color channel carries some luminance information and so any loss of resolution is quite visible.

When the uncompressed data is supplied in a conventional format (equal resolution for all channels), a JPEG compressor must reduce the resolution of the chrominance channels by downsampling, or averaging together groups of pixels. The JPEG standard allows several different choices for the sampling ratios, or relative sizes, of the downsampled channels. The luminance channel is always left at full resolution (1:1 sampling). Typically both chrominance channels are downsampled 2:1 horizontally and either 1:1 or 2:1 vertically, meaning that a chrominance pixel covers the same area as either a 2x1 or a 2x2 block of luminance pixels. JPEG refers to these downsampling processes as 2h1v and 2h2v sampling, respectively.

Another notation commonly used is 4:2:2 sampling for 2h1v and 4:2:0 sampling for 2h2v; this notation derives from television customs (color transformation and downsampling have been in use since the beginning of color TV transmission). 2h1v sampling is fairly common because it corresponds to National Television Standards Committee (NTSC) standard TV practice, but it offers less compression than 2h2v sampling, with hardly any gain in perceived quality.
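The 2h2v (4:2:0) downsampling described above amounts to averaging each 2x2 block of a chrominance channel into one value, for example:

```python
import numpy as np

def downsample_2h2v(channel):
    """Average each 2x2 block into one value (JPEG's 2h2v / 4:2:0 sampling).
    Assumes even dimensions for simplicity."""
    h, w = channel.shape
    return channel.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cb = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 chrominance channel
small = downsample_2h2v(cb)
print(small.shape)   # (2, 2): a quarter of the pixels
```

Per 2x2 block this leaves four luminance values plus one value for each chrominance channel (six values instead of twelve), matching the 50 percent reduction described above.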

Apply a Discrete Cosine Transform

The image data is divided up into 8x8 blocks of pixels. (From this point on, each color component is processed independently, so a "pixel" means a single value, even in a color image.) A DCT is applied to each 8x8 block. DCT converts the spatial image representation into a frequency map: the low-order or "DC" term represents the average value in the block, while successive higher-order ("AC") terms represent the strength of more and more rapid changes across the width or height of the block. The highest AC term represents the strength of a cosine wave alternating from maximum to minimum at adjacent pixels.

The DCT calculation is fairly complex; in fact, this is the most costly step in JPEG compression. The point of doing it is that we have now separated out the high- and low-frequency information present in the image. We can discard high-frequency data easily without losing low-frequency information. The DCT step itself is lossless except for roundoff errors.
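The 8x8 DCT can be written out directly from its definition; this naive version is far slower than the factored transforms real encoders use, but it shows the frequency-map idea:

```python
import math

def dct2(block):
    """Naive 8x8 two-dimensional DCT-II, the transform JPEG applies
    to each block of pixel values."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

# A flat block has all its energy in the low-order "DC" term:
flat = [[100] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0]))   # 800, i.e. 8 times the average value
```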

Quantize each block

To discard an appropriate amount of information, the compressor divides each DCT output value by a "quantization coefficient" and rounds the result to an integer. The larger the quantization coefficient, the more data is lost, because the actual DCT value is represented less and less accurately. Each of the 64 positions of the DCT output block has its own quantization coefficient, with the higher-order terms being quantized more heavily than the low-order terms (that is, the higher-order terms have larger quantization coefficients). Furthermore, separate quantization tables are employed for luminance and chrominance data, with the chrominance data being quantized more heavily than the luminance data. This allows JPEG to exploit further the eye's differing sensitivity to luminance and chrominance.
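The quantization step reduces to a per-coefficient divide-and-round; the tiny 2x2 table below is a made-up illustration (real JPEG tables are 8x8, with larger entries for the higher-order terms, as described above):

```python
def quantize(coeffs, qtable):
    """Divide each DCT coefficient by its table entry and round to an integer."""
    return [[round(c / q) for c, q in zip(cr, qr)]
            for cr, qr in zip(coeffs, qtable)]

def dequantize(levels, qtable):
    """What the decoder does: multiply back, recovering only approximations."""
    return [[l * q for l, q in zip(lr, qr)]
            for lr, qr in zip(levels, qtable)]

# Hypothetical 2x2 corner of a table: higher-order terms quantized harder.
qtable = [[16, 40], [40, 120]]
coeffs = [[812.0, 35.0], [-30.0, 14.0]]
levels = quantize(coeffs, qtable)
print(levels)                       # small integers, cheap to entropy-code
print(dequantize(levels, qtable))   # approximation: fine detail is lost
```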


It is this step that is controlled by the "quality" setting of most JPEG compressors. The compressor starts from a built-in table that is appropriate for a medium-quality setting and increases or decreases the value of each table entry in inverse proportion to the requested quality. The complete quantization tables actually used are recorded in the compressed file so that the decompressor will know how to (approximately) reconstruct the DCT coefficients.

Selection of an appropriate quantization table is something of a black art. Most existing compressors start from a sample table developed by the ISO JPEG committee. It is likely that future research will yield better tables that provide more compression for the same perceived image quality. Implementation of improved tables should not cause any compatibility problems, because decompressors merely read the tables from the compressed file; they don't care how the table was picked.

Encode the resulting coefficients

The resulting coefficients contain a significant amount of redundant data. Huffman compression will losslessly remove the redundancies, resulting in smaller JPEG data. An optional extension to the JPEG specification allows arithmetic encoding to be used instead of Huffman for an even greater compression ratio. (See the section called "JPEG Extensions (Part 1)" below.) At this point, the JPEG data stream is ready to be transmitted across a communications channel or encapsulated inside an image file format.

JPEG Extensions (Part 1)

What we have examined thus far is only the baseline specification for JPEG. A number of extensions have been defined in Part 1 of the JPEG specification that provide progressive image buildup, improved compression ratios using arithmetic encoding, and a lossless compression scheme. These features are beyond the needs of most JPEG implementations and have therefore been defined as "not required to be supported" extensions to the JPEG standard.

Progressive image buildup

Progressive image buildup is an extension for use in applications that need to receive JPEG data streams and display them on the fly. A baseline JPEG image can be displayed only after all of the image data has been received and decoded. But some applications require that the image be displayed after only some of the data is received. Using a conventional compression method, this means displaying the first few scan lines of the image as it is decoded. In this case, even if the scan lines were interlaced, you would need at least 50 percent of the image data to get a good clue as to the content of the image. The progressive buildup extension of JPEG offers a better solution.

Progressive buildup allows an image to be sent in layers rather than scan lines. But instead of transmitting each bitplane or color channel in sequence (which wouldn't be very useful), a succession of images built up from approximations of the original image are sent. The first scan provides a low-accuracy representation of the entire image--in effect, a very low-quality JPEG compressed image. Subsequent scans gradually refine the image by increasing the effective quality factor. If the data is displayed on the fly, you would first see a crude, but recognizable, rendering of the whole image. This would appear very quickly because only a small amount of data would need to be transmitted to produce it. Each subsequent scan would improve the displayed image's quality one block at a time.
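With the Pillow library (an assumed tool choice), writing the progressive variant described above is a one-flag change at save time:

```python
import io
from PIL import Image

img = Image.new("RGB", (64, 64), (80, 120, 200))  # stand-in for a real image

baseline, progressive = io.BytesIO(), io.BytesIO()
img.save(baseline, format="JPEG", quality=75)
img.save(progressive, format="JPEG", quality=75, progressive=True)
# Both decode to the same picture; only the ordering of the scans differs,
# so a progressive-aware viewer can show a crude rendering early.
print(baseline.tell(), progressive.tell())
```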

A limitation of progressive JPEG is that each scan takes essentially a full JPEG decompression cycle to display. Therefore, with typical data transmission rates, a very fast JPEG decoder (probably specialized hardware) would be needed to make effective use of progressive transmission.


A related JPEG extension provides for hierarchical storage of the same image at multiple resolutions. For example, an image might be stored at 250x250, 500x500, 1000x1000, and 2000x2000 pixels, so that the same image file could support display on low-resolution screens, medium-resolution laser printers, and high-resolution imagesetters. The higher-resolution images are stored as differences from the lower-resolution ones, so they need less space than they would need if they were stored independently. This is not the same as a progressive series, because each image is available in its own right at the full desired quality.


Arithmetic encoding

The baseline JPEG standard defines Huffman compression as the final step in the encoding process. A JPEG extension replaces the Huffman engine with a binary arithmetic entropy encoder. The use of an arithmetic coder reduces the resulting size of the JPEG data by a further 10 percent to 15 percent over the results that would be achieved by the Huffman coder. With no change in resulting image quality, this gain could be of importance in implementations where enormous quantities of JPEG images are archived.

Arithmetic encoding has several drawbacks:

• Not all JPEG decoders support arithmetic decoding. Baseline JPEG decoders are required to support only the Huffman algorithm.

• The arithmetic algorithm is slower in both encoding and decoding than Huffman.

• The arithmetic coder used by JPEG (called a Q-coder) is owned by IBM and AT&T. (Mitsubishi also holds patents on arithmetic coding.) You must obtain a license from the appropriate vendors if their Q-coders are to be used as the back end of your JPEG implementation.

Lossless JPEG compression

A question that commonly arises is "At what Q factor does JPEG become lossless?" The answer is "never." Baseline JPEG is a lossy method of compression regardless of adjustments you may make in the parameters. In fact, DCT-based encoders are always lossy, because roundoff errors are inevitable in the color conversion and DCT steps. You can suppress deliberate information loss in the downsampling and quantization steps, but you still won't get an exact recreation of the original bits. Further, this minimum-loss setting is a very inefficient way to use lossy JPEG.

The JPEG standard does offer a separate lossless mode. This mode has nothing in common with the regular DCT-based algorithms, and it is currently implemented only in a few commercial applications. JPEG lossless is a form of Predictive Lossless Coding using a 2D Differential Pulse Code Modulation (DPCM) scheme. The basic premise is that the values of up to three neighboring pixels are combined to form a predictor value. The predictor value is then subtracted from the original pixel value. When the entire bitmap has been processed, the resulting differences are compressed using either the Huffman or the binary arithmetic entropy encoding methods described in the JPEG standard.
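The prediction step can be sketched in a few lines of Python. This uses just one of the standard's selectable predictors (the average of the left and upper neighbours) and a simplified zero value for missing edge neighbours, so it is an illustration of the idea rather than a spec-exact implementation:

```python
# Minimal sketch of the 2-D DPCM prediction step: predict each pixel as the
# average of its left and upper neighbours, then store the difference.
# (The real standard defines several selectable predictors and handles
# image edges differently; missing neighbours are treated as 0 here.)

def dpcm_differences(rows):
    diffs = []
    for y, row in enumerate(rows):
        out = []
        for x, value in enumerate(row):
            left = row[x - 1] if x > 0 else 0
            above = rows[y - 1][x] if y > 0 else 0
            prediction = (left + above) // 2
            out.append(value - prediction)   # small differences compress well
        diffs.append(out)
    return diffs

image = [[10, 10, 11],
         [10, 11, 12],
         [11, 12, 13]]
print(dpcm_differences(image))
```

Because neighbouring pixels are usually similar, the differences cluster around small values, which is exactly what Huffman or arithmetic coding compresses well.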

Lossless JPEG works on images with 2 to 16 bits per pixel, but performs best on images with 6 or more bits per pixel. For such images, the typical compression ratio achieved is 2:1. For image data with fewer bits per pixel, other compression schemes do perform better.

JPEG Extensions (Part 3)

The following JPEG extensions are described in Part 3 of the JPEG specification.

Variable quantization

Variable quantization is an enhancement available to the quantization procedure of DCT-based processes. This enhancement may be used with any of the DCT-based processes defined by JPEG with the exception of the baseline process.

The process of quantization used in JPEG quantizes each of the 64 DCT coefficients using a corresponding value from a quantization table. Quantization values may be redefined prior to the start of a scan but must not be changed within a scan of the compressed data stream.


Variable quantization allows the scaling of quantization values within the compressed data stream. At the start of each 8x8 block is a quantizer scale factor, used to scale the quantization table values within an image component and to match these values with the AC coefficients stored in the compressed data. Quantization values may then be located and changed as needed.

Variable quantization allows the characteristics of an image to be changed to control the quality of the output based on a given model. The variable quantizer can adjust constantly during encoding to provide optimal output.

The amount of output data can also be decreased or increased by raising or lowering the quantizer scale factor. By making constant adaptive adjustments, the variable quantizer can also keep the resulting JPEG file or data stream under a maximum size.

The variable quantization extension also allows JPEG to store image data originally encoded using a variable quantization scheme, such as MPEG. For MPEG data to be accurately transcoded into another format, the other format must support variable quantization to maintain a high compression ratio. This extension allows JPEG to support a data stream originally derived from a variably quantized source, such as an MPEG I-frame.
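The effect of a quantizer scale factor is easy to demonstrate with toy numbers. The coefficients and table row below are made-up examples; the point is that a larger scale produces coarser quantization steps, more zero coefficients, and therefore fewer bits:

```python
# Toy illustration of a quantizer scale factor: the same quantization table,
# scaled per block, trades quality for output size. The coefficient and
# table values are arbitrary example numbers, not from the standard.

def quantize(coeffs, table, scale):
    # Larger scale -> coarser steps -> more zeros -> fewer bits to store.
    return [round(c / (q * scale)) for c, q in zip(coeffs, table)]

coeffs = [312, -45, 27, 8, 3, 2, 1, 0]       # toy DCT coefficients
table  = [16, 11, 10, 16, 24, 40, 51, 61]    # a plausible table row

fine   = quantize(coeffs, table, 1.0)
coarse = quantize(coeffs, table, 4.0)
print(fine)
print(coarse)
```

With scale 4.0 the small high-frequency coefficients all quantize to zero, which is where the extra compression comes from.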

Selective refinement

Selective refinement is used to select a region of an image for further enhancement. This enhancement improves the resolution and detail of a region of an image. JPEG supports three types of selective refinement: hierarchical, progressive, and component. Each of these refinement processes differs in its application, effectiveness, complexity, and amount of memory required.

• Hierarchical selective refinement is used only in the hierarchical mode of operation. It allows for a region of a frame to be refined by the next differential frame of a hierarchical sequence.

• Progressive selective refinement is used only in the progressive mode. It allows greater bit resolution of zero and non-zero DCT coefficients in a coded region of a frame.

• Component selective refinement may be used in any mode of operation. It allows a region of a frame to contain fewer colors than are defined in the frame header.

Image tiling

Tiling is used to divide a single image into two or more smaller subimages. Tiling allows easier buffering of the image data in memory, quicker random access of the image data on disk, and the storage of images larger than 64Kx64K samples in size. JPEG supports three types of tiling: simple, pyramidal, and composite.

• Simple tiling divides an image into two or more fixed-size tiles. All simple tiles are coded from left to right and from top to bottom and are contiguous and non-overlapping. All tiles must have the same number of samples and component identifiers and must be encoded using the same processes. Because the image dimensions need not be a multiple of the tile size, tiles on the bottom and right edges of the image may be smaller than the designated tile size.

• Pyramidal tiling also divides the image into tiles, but each tile is also tiled using several different levels of resolution. The model of this process is the JPEG Tiled Image Pyramid (JTIP), which is a model of how to create a multi-resolution pyramidal JPEG image.

A JTIP image stores successive layers of the same image at different resolutions. The first image stored at the top of the pyramid is one-sixteenth of the defined screen size and is called a vignette. This image is used for quick displays of image contents, especially for file browsers. The next image occupies one-fourth of the screen and is called an imagette. This image is typically used when two or more images must be displayed at the same time on the screen. The next is a low-resolution, full-screen image, followed by successively higher-resolution images and ending with the original image.


Pyramidal tiling typically uses the process of "internal tiling," where each tile is encoded as part of the same JPEG data stream. Tiles may optionally use the process of "external tiling," where each tile is a separately encoded JPEG data stream. External tiling may allow quicker access of image data, easier application of image encryption, and enhanced compatibility with certain JPEG decoders.

Page 30: Video Compression Techniques

• Composite tiling allows multiple-resolution versions of images to be stored and displayed as a mosaic. Composite tiling allows overlapping tiles that may be different sizes and have different scaling factors and compression parameters. Each tile is encoded separately and may be combined with other tiles without resampling.
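Going back to simple tiling for a moment, the edge-tile arithmetic is plain ceiling division. A small sketch (the image and tile sizes are arbitrary example values):

```python
# How many fixed-size tiles cover an image, and how big the (possibly
# smaller) right-edge and bottom-edge tiles end up being.
import math

def tile_grid(width, height, tile_w, tile_h):
    cols = math.ceil(width / tile_w)
    rows = math.ceil(height / tile_h)
    edge_w = width - (cols - 1) * tile_w   # width of the rightmost column
    edge_h = height - (rows - 1) * tile_h  # height of the bottom row
    return cols, rows, edge_w, edge_h

print(tile_grid(1000, 700, 256, 256))  # (4, 3, 232, 188)
```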

SPIFF (Still Picture Interchange File Format)

SPIFF is an officially sanctioned JPEG file format that is intended to replace the de facto JFIF (JPEG File Interchange Format) format in use today. SPIFF includes all of the features of JFIF and adds quite a bit more functionality. SPIFF is designed so that properly written JFIF readers will read SPIFF-JPEG files as well.

For more information, see the article about SPIFF.

Other extensions

Other JPEG extensions include the addition of a version marker segment that stores the minimum level of functionality required to decode the JPEG data stream. Multiple version markers may be included to mark areas of the data stream that have differing minimum functionality requirements. The version marker also contains information indicating the processes and extensions used to encode the JPEG data stream.

IMAGE FORMATS

There are three major graphics formats on the web: GIF, JPEG, and PNG. Of these, PNG has the spottiest support, so that generally leaves a choice between GIF and JPEG. There are many other formats in which to save image files, but if you use them, many of your web site visitors will not be able to view your files.

JPEG

JPEG is a lossy compression technology, so some information is lost when converting a picture to JPEG. Use this format for most photographs because the images will be smaller and look better than a GIF format picture.

GIF

GIF files are better for figures with sharp contrast (such as line drawings, Gantt charts, logos, and buttons). One can also create transparent areas and animations with GIF images. A GIF image has a maximum of 256 colors, however, so images with gradations of color will not look very good.

PNG

GIF is a patented file format technology. PNG is an open standard that can be used for many of the applications of GIF images. PNG is better than GIF in most respects, providing more possible colors, alpha-channel transparency, and color-matching features. The PNG format is not as widely supported as GIF, although it is supported (to differing degrees) in version 4 and later browsers.

BMP


BMP or bitmap files are pictures from the Windows operating system. Using these on a web page can cause problems because they cannot be viewed by most browsers. Stay away from using BMP files on the web.


TIFF

TIFF images have great picture quality but also a very large file size. Most browsers cannot display TIFF images. Use TIFF on your machine to save images for printing or editing; do not use TIFFs on the web.

The GIF image format

GIF stands for Graphics Interchange Format. It is probably the most common image format used on the Web. GIFs have the advantage of usually being very small in size, which makes them fast-loading. Unlike JPEGs, GIFs use lossless compression, which means they make the file size small without losing or blurring any of the image itself.

GIFs also support transparency, which means that they can sit on top of a background image on your web page without having ugly rectangles around them.

Another cool thing that GIFs can do is animation. You can make an animated GIF by drawing each frame of the animation in a graphics package that supports the animated GIF format, then export the animation to a single GIF file. When you include this file in your Web page (with the img tag), your animation will be displayed on the page!

The major disadvantage of GIFs is that they only support up to 256 colours (this is known as 8-bit colour and is a type of indexed colour image). This means they're not good for photographs, or any other image that contains lots of different colours.

Making Fast-Loading GIFs

It's worthwhile making your GIF file sizes as small as possible, so that your Web pages load quickly. People will get very bored otherwise, and probably go to another website!

Most graphics programs let you control various settings when making a GIF image, such as palette size (number of colours in the image) and dithering. Generally speaking, use the smallest palette size you can. Usually a 32-colour palette produces acceptable results, although for low-colour images you can often get away with 16. Images with lots of colours will of course need a bigger palette - say, 128 or even 256 colours.
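Palette reduction itself is conceptually simple: pick a palette, then map every pixel to its nearest entry. A toy sketch in Python (the 4-colour palette is an arbitrary example; real graphics programs also choose the palette adaptively and may dither):

```python
# Toy sketch of palette reduction, the core of saving a small-palette GIF:
# map every pixel to its nearest palette entry by squared RGB distance.
# The palette and pixels here are made-up example values.

def nearest(colour, palette):
    return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(colour, p)))

palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]
pixels = [(250, 10, 5), (20, 20, 30), (200, 200, 210)]
print([nearest(p, palette) for p in pixels])
# [(255, 0, 0), (0, 0, 0), (255, 255, 255)]
```

A smaller palette means fewer bits per pixel, which is exactly why shrinking the palette shrinks the file.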

8-colour GIF (1292 bytes)

64-colour GIF (2940 bytes)

The JPEG Image Format


JPEG stands for Joint Photographic Experts Group, a bunch of boffins who invented this format to display full-colour photographic images in a portable format with a small file size. Like GIF images, they are also very common on the Web. Their main advantage over GIFs is that they can display true-colour images (up to 16 million colours), which makes them much better for images such as photographs and illustrations with large numbers of colours.

The main disadvantage of the JPEG format is that it is lossy. This means that you lose some of the detail of your image when you convert it to JPEG format. Boundaries between blocks of colour may appear more blurry, and areas with lots of detail will lose their sharpness. On the other hand, JPEGs do preserve the full colour range of the image, which of course is great for high-colour images such as photographs.

JPEGs also can't do transparency or animation - in these cases, you'll have to use the GIF format (or PNG format for transparency).

Making Fast-Loading JPEGs

As with GIFs, it pays to make your JPEGs as small as possible (in terms of bytes), so that your websites load quickly. The main control over file size with JPEGs is called quality, and usually varies from 0 to 100%, where 0% is low quality (but smallest file size), and 100% is highest quality (but largest file size). 0% quality JPEGs usually look noticeably blurred when compared to the original. 100% quality JPEGs are often indistinguishable from the original:
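Under the hood, the quality percentage is typically turned into a scale factor for the quantization tables. The sketch below uses the scaling convention popularized by the widely used libjpeg library, with made-up coefficients; the point is that lower quality leaves fewer non-zero coefficients to store:

```python
# Sketch of the "quality" knob: quality maps to a quantization-table scale
# factor (the mapping below follows the common libjpeg convention); lower
# quality means coarser steps and fewer surviving non-zero coefficients.
# The coefficients and table row are made-up example values.

def scale_for_quality(quality):
    quality = max(1, min(100, quality))
    return 5000 / quality if quality < 50 else 200 - 2 * quality

def surviving_coeffs(coeffs, table, quality):
    s = scale_for_quality(quality) / 100.0
    steps = [max(1, round(q * s)) for q in table]
    return sum(1 for c, step in zip(coeffs, steps) if round(c / step) != 0)

coeffs = [240, -60, 30, 14, 7, 3, 2, 1]
table  = [16, 11, 10, 16, 24, 40, 51, 61]
print(surviving_coeffs(coeffs, table, 90), surviving_coeffs(coeffs, table, 10))
```

Fewer surviving coefficients means less data for the entropy coder to store, hence the smaller file at low quality.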

Low-quality JPEG (4089 bytes)

High-quality JPEG (17465 bytes)

The PNG Image Format


PNG is a relatively new invention compared to GIF or JPEG, although it's been around for a while now. (Sadly some browsers such as IE6 still don't support them fully.) It stands for Portable Network Graphics. It was designed to be an alternative to the GIF file format, but without the licensing issues that were involved in the GIF compression method at the time.


There are two types of PNG: PNG-8 format, which holds 8 bits of colour information (comparable to GIF), and PNG-24 format, which holds 24 bits of colour (comparable to JPEG).

PNG-8 often compresses images even better than GIF, resulting in smaller file sizes. On the other hand, PNG-24 is often less effective than JPEGs at compressing true-colour images such as photos, resulting in larger file sizes than the equivalent quality JPEGs. However, unlike JPEG, PNG-24 is lossless, meaning that all of the original image's information is preserved.
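PNG's lossless guarantee comes from its use of DEFLATE, a general-purpose lossless compressor. Python's standard zlib module implements the same algorithm, so the round-trip property is easy to demonstrate:

```python
# PNG's losslessness comes from DEFLATE, a general-purpose lossless codec.
# The stdlib zlib module exposes the same algorithm, so we can show the
# round trip: decompressing returns every original byte exactly.
import zlib

pixel_data = bytes(range(256)) * 16          # 4 KB of synthetic "pixel" bytes
compressed = zlib.compress(pixel_data, level=9)
restored = zlib.decompress(compressed)
print(len(pixel_data), len(compressed), restored == pixel_data)
```

The repetitive synthetic data compresses well here; real photographic data compresses less, which is why PNG-24 files are often larger than equivalent JPEGs.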

PNG also supports transparency like GIF, but can have varying degrees of transparency for each pixel, whereas GIFs can only have transparency turned on or off for each pixel. This means that whereas transparent GIFs often have jagged edges when placed on complex or ill-matching backgrounds, transparent PNGs will have nice smooth edges.

Note that unlike GIF, PNG-8 does not support animation.

One important point about PNG: Earlier browsers don't recognise them. If you want to ensure your website is viewable by early browsers, use GIFs or JPEGs instead.

16-colour PNG-8 (6481 bytes)

Full-colour PNG-24 (34377 bytes)

Summary of image formats

This table summarises the key differences between the GIF, JPEG and PNG image formats.


GIF:

• Better for clipart and drawn graphics with few colours, or large blocks of colour
• Can only have up to 256 colours
• Lossless - contains the same amount of information as the original (but with only 256 colours)
• Can be animated
• Can have transparent areas

JPEG:

• Better for photographs with lots of colours or fine colour detail
• Can have up to 16 million colours
• Lossy - contains less information than the original
• Cannot be animated
• Cannot have transparent areas

PNG-8:

• Better for clipart and drawn graphics with few colours, or large blocks of colour
• Can only have up to 256 colours
• Lossless - contains the same amount of information as the original (but with only 256 colours)
• Cannot be animated
• Can have transparent areas

PNG-24:

• Better for photographs with lots of colours or fine colour detail
• Can have up to 16 million colours
• Lossless - contains the same amount of information as the original
• Cannot be animated
• Can have transparent areas


Image or Graphic?

Technically, neither. If you really want to be strict, computer pictures are files, the same way WORD documents or solitaire games are files. They're all a bunch of ones and zeros all in a row. But we do have to communicate with one another so let's decide.

Image. We'll use "image". That seems to cover a wide enough topic range.

I went to my reference books and there I found that "graphic" is more of an adjective, as in "graphic format." You see, we denote images on the Internet by their graphic format. GIF is not the name of the image. GIF refers to the compression scheme used to create the raster format set up by CompuServe. (More on that in a moment).

So, they're all images unless you're talking about something specific.

44 Different Graphic Formats?

It does seem like a big number, doesn't it? In reality, there are not 44 completely different graphic formats. Many of the 44 are different versions under the same compression umbrella: interlaced and non-interlaced GIF, for example.

Before getting into where we get all 44, and there are more than that even, let me back-pedal for a moment.

There actually are only two basic methods for a computer to render, or store and display, an image. When you save an image in a specific format you are creating either a raster or meta/vector graphic format. Here's the lowdown:

Raster

Raster image formats (RIFs) should be the most familiar to Internet users. A Raster format breaks the image into a series of colored dots called pixels. The number of ones and zeros (bits) used to create each pixel denotes the depth of color you can put into your images.

If your pixel is denoted with only one bit-per-pixel then that pixel must be black or white. Why? Because that pixel can only be a one or a zero, on or off, black or white.

Bump that up to 4 bits-per-pixel and you're able to set that colored dot to one of 16 colors. If you go even higher to 8 bits-per-pixel, you can save that colored dot at up to 256 different colors.

Does that number, 256, sound familiar to anyone? That's the upper color level of a GIF image. Sure, you can go with less than 256 colors, but you cannot have over 256.

That's why a GIF image doesn't work overly well for photographs and larger images. There are a whole lot more than 256 colors in the world. Images can carry millions. But if you want smaller icon images, GIFs are the way to go.

Raster image formats can also save at 16, 24, and 32 bits-per-pixel. At the two highest levels, the pixels themselves can carry up to 16,777,216 different colors. The image looks great! Bitmaps saved at 24 bits-per-pixel are great quality images, but of course they also run about a megabyte per picture. There's always a trade-off, isn't there?
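The colour counts quoted above are just powers of two, and the size of an uncompressed raster follows directly from its dimensions and bit depth:

```python
# The colour counts above are powers of two, and the raw size of an
# uncompressed raster is width x height x bits-per-pixel / 8.

def colours(bits_per_pixel):
    return 2 ** bits_per_pixel

def raw_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8

print(colours(8), colours(24))    # 256 16777216
print(raw_bytes(640, 480, 24))    # 921600 -- "about a megabyte per picture"
```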

The three main Internet formats, GIF, JPEG, and Bitmap, are all Raster formats.


Some other Raster formats include the following:


CLP Windows Clipart

DCX ZSoft Paintbrush (multi-page)

DIB Device-Independent Bitmap (Windows)

FPX Kodak's FlashPic

IMG GEM Paint format

JIF JPEG Related Image format

MAC MacPaint

MSP Microsoft Paint format

PCT Macintosh PICT format

PCX ZSoft Paintbrush

PPM Portable Pixel Map (UNIX)

PSP Paint Shop Pro format

RAW Unencoded image format

RLE Run-Length Encoding (Used to lower image bit rates)

TIFF Tagged Image File Format (Aldus)

WPG WordPerfect image format

Pixels and the Web

Since I brought up pixels, I thought now might be a pretty good time to talk about pixels and the Web. How much is too much? How many is too few?

There is a delicate balance between the crispness of a picture and the number of pixels needed to display it. Let's say you have two images, each is 5 inches across and 3 inches down. One uses 300 pixels to span that five inches, the other uses 1500. Obviously, the one with 1500 uses smaller pixels. It is also the one that offers a more crisp, detailed look. The more pixels, the more detailed the image will be. Of course, the more pixels the more bytes the image will take up.

So, how much is enough? That depends on whom you are speaking to, and right now you're speaking to me. I always go with 100 pixels per inch. That creates a ten-thousand-pixel square inch. I've found that allows for a pretty crisp image without going overboard on the bytes. It also allows some leeway to increase or decrease the size of the image and not mess it up too much.

The lowest I'd go is 72 pixels per inch, the agreed-upon low end of the image scale. In terms of pixels per square inch, it's a whale of a drop to 5184. Try that. See if you like it, but I think you'll find that lower-definition monitors really play havoc with the image.
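The per-square-inch figures above are simply the pixels-per-inch value squared, and image dimensions in pixels follow from inches times ppi:

```python
# The per-square-inch numbers above are just the ppi value squared;
# pixel dimensions are inches times ppi.

def pixels_per_square_inch(ppi):
    return ppi * ppi

def image_pixels(width_in, height_in, ppi):
    return (width_in * ppi, height_in * ppi)

print(pixels_per_square_inch(100), pixels_per_square_inch(72))  # 10000 5184
print(image_pixels(5, 3, 100))  # (500, 300)
```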

Meta/Vector Image Formats

You may not have heard of this type of image formatting, not that you had heard of Raster, either. This formatting falls into a lot of proprietary formats, formats made for specific programs. CorelDraw (CDR), Hewlett-Packard Graphics Language (HGL), and Windows Metafiles (EMF) are a few examples.


Where the Meta/Vector formats have it over Raster is that they are more than a simple grid of colored dots. They're actual vectors of data stored in mathematical formats rather than bits of colored dots. This allows for a strange shaping of colors and images that can be perfectly cropped on an arc. A squared-off map of dots cannot produce that arc as well. In addition, since the information is encoded in vectors, Meta/Vector image formats can be blown up or down (a property known as "scalability") without looking jagged or crowded (an artifact known as "pixelating").

So that I do not receive e-mail from those in the computer image know, there is a difference between Meta and Vector formats. Vector formats can contain only vector data, whereas Meta files, as is implied by the name, can contain multiple formats. This means there can be a lovely Bitmap plopped right in the middle of your Windows Meta file. You'll never know or see the difference but, there it is. I'm just trying to keep everybody happy.

What's A Bitmap?

I get that question a lot. Usually it's followed with "How come it only works on Microsoft Internet Explorer?" The second question's the easiest. Microsoft invented the Bitmap format. It would only make sense they would include it in their browser. Every time you boot up your PC, the majority of the images used in the process and on the desktop are Bitmaps.

If you're using an MSIE browser, you can view this first example. The image is St. Sophia in Istanbul. The picture is taken from the city's hippodrome. Against what I said above, Bitmaps will display on all browsers, just not in the familiar <IMG SRC="--"> format we're all used to.

I see Bitmaps used mostly as return images from PERL Common Gateway Interfaces (CGIs). A counter is a perfect example. Page counters that have that "odometer" effect are Bitmap images created by the server, rather than as an inline image. Bitmaps are perfect for this process because they're a simple series of colored dots. There's nothing fancy to building them.

It's actually a fairly simple process. In the script that runs the counter, you "build" each number for the counter to display. Note the counter is black and white. That's only a one bit-per-pixel level image. To create the number zero in the counter above, you would build a grid 7 pixels wide by 10 pixels high. The pixels you want to remain black, you would denote as zero. Those you wanted white, you'd denote as one. Here's what it looks like:

0 0 0 0 0 0 0

0 0 1 1 1 0 0

0 1 1 1 1 1 0

0 1 1 0 1 1 0

0 1 1 0 1 1 0

0 1 1 0 1 1 0

0 1 1 0 1 1 0

0 1 1 1 1 1 0

0 0 1 1 1 0 0

0 0 0 0 0 0 0

See the number zero in the graph above? I made it red so it would stand out a bit more. You create one of those patterns for the numbers 0 through 9. The PERL script then returns the Bitmap image representing the numbers and you get that neat little odometer effect. That's the concept of a Bitmap: a grid of colored points. The more bits per pixel, the fancier the Bitmap can be.

Bitmaps are good images, but they're not great. If you've played with Bitmaps versus any other image formats, you might have noticed that the Bitmap format creates images that are a little heavy on the bytes. The reason is that the Bitmap format is not very efficient at storing data. What you see is pretty much what you get, one series of bits stacked on top of another.

Compression

I said above that a Bitmap was a simple series of pixels all stacked up. But the same image saved in GIF or JPEG format uses fewer bytes to make up the file. How? Compression. "Compression" is a computer term that represents a variety of mathematical methods used to shrink an image's byte size. Let's say you have an image where the upper right-hand corner has four pixels all the same color. Why not find a way to make those four pixels into one? That would cut down the number of bytes by three-fourths, at least in the one corner. That's a compression factor.
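That four-pixels-into-one idea is exactly run-length encoding. A minimal sketch:

```python
# Minimal run-length encoder: runs of identical pixels collapse to
# [count, value] pairs, so flat areas shrink a lot and detailed areas
# hardly at all. Decoding restores the pixels exactly (lossless).

def rle_encode(pixels):
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1        # extend the current run
        else:
            runs.append([1, p])     # start a new run
    return runs

def rle_decode(runs):
    out = []
    for count, value in runs:
        out.extend([value] * count)
    return out

row = [7, 7, 7, 7, 2, 2, 9]
encoded = rle_encode(row)
print(encoded)                      # [[4, 7], [2, 2], [1, 9]]
```

Seven pixels become three pairs here; a row of random, highly detailed pixels would barely shrink at all.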


Bitmaps can be compressed to a point. The process is called "run-length encoding." Runs of pixels that are all the same color are combined into one pixel. The longer the run of pixels, the more compression. Bitmaps with little detail or color variance will really compress. Those with a great deal of detail don't offer much in the way of compression.

Bitmaps that use run-length encoding can carry either the common ".bmp" extension or ".rle". Another difference between the two files is that the common Bitmap can accept 16 million different colors per pixel. Saving the same image in run-length encoding knocks the bits-per-pixel down to 8. That locks the level of color in at no more than 256. That's even more compression of bytes to boot. Here's the same image of St. Sophia in common Bitmap and the run-length encoding format. Can you see a difference? In case you're wondering, the image was saved in Windows version run-length encoding (there's also a CompuServe version) at 256 colors. It produced quite a drop in bytes, don't you think? And to be honest -- I really don't see a whole lot of difference.

So, why not create a single pixel when all of the colors are close? You could even lower the number of colors available so that you would have a better chance of the pixels being close in color. Good idea. The people at CompuServe felt the same way.

The GIF Image Formats

So, why wasn't the Bitmap chosen as the King of all Internet Images? Because Bill Gates hadn't yet gotten into the fold when the earliest browsers started running inline images. I don't mean to be flippant either; I truly believe that.

GIF, which stands for "Graphic Interchange Format," was first standardized in 1987 by CompuServe, although the patent for the algorithm (mathematical formula) used to create GIF compression actually belongs to Unisys. The first format of GIF used on the Web was called GIF87a, representing its year and version. It saved images at 8 bits-per-pixel, capping the color level at 256. That 8-bit level allowed the image to work across multiple server styles, including CompuServe, TCP/IP, and AOL. It was a graphic for all seasons, so to speak.

CompuServe updated the GIF format in 1989 to include animation, transparency, and interlacing. They called the new format, you guessed it: GIF89a.

There's no discernible difference between a basic (known as non-interlaced) GIF in 87 and 89 formats. See for yourself. The image is of me and another gentleman playing a Turkish Sitar. Even the bytes are the same. It's the transparency, animation, and interlacing additions to GIF89a that really set it apart. Let's look at each one.

Animation

I remember when animation really came into the mainstream of Web page development. I was deluged with e-mail asking how to do it. There's been a tutorial up for a while now at http://www.htmlgoodies.com/tutors/animate.html. Stop by and see it for instruction on how to create the animations yourself. Here, we're going to quickly discuss the concepts of how it all works.

What you are seeing in that example are 12 different images, each set one "hour" farther ahead than the one before it. Animate them all in a row and you get that stopwatch effect. The concept of GIF89a animation is much the same as a picture book with small animation cells in each corner. Flip the pages and the images appear to move. Here, you have the ability to set each cell's (technically called an "animation frame") display time in 1/100ths of a second. An internal clock embedded right into the GIF keeps count and flips the image when the time comes.

The animation process has been bettered along the way by companies who have found their own methods of compressing GIFs further. As you watch an animation you might notice that very little changes from frame to frame. So, why put up a whole new GIF image if only a small section of the frame needs to be changed? That's the key to some of the newer compression factors in GIF animation. Less changing means fewer bytes.

Transparency

Again, if you'd like a how-to, I have one for you at http://www.htmlgoodies.com/tutors/transpar.html. A transparent GIF is fun but limited in that only one color of the 256-shade palette can be made transparent.


As you can see, the bytes came out the same after the image was put through the transparency filter. The process is best described as similar to the weather forecaster on your local news. Each night they stand in front of a big green (sometimes blue) screen and deliver the weather while that blue or green behind them is "keyed" out and replaced by another source. In the case of the weather forecaster, it's usually a large map with lots of Ls and Hs.


The process in television is called a "chroma key." A computer is told to hone in on a specific color; let's say it's green. Chroma key screens are usually green because it's the color least likely to be found in human skin tones. You don't want to use a blue screen and then chroma out someone's pretty blue eyes. That chroma (color) is then "erased" and replaced by another image.

Think of that in terms of a transparent GIF. There are only 256 colors available in the GIF. The computer is told to hone in on one of them. It's done by choosing a particular red/green/blue shade already found in the image and blanking it out. The color is basically dropped from the palette that makes up the image. Thus whatever is behind it shows through. The shape is still there though. Try this: Get an image with a transparent background and alter its height and width in your HTML code. You'll see what should be the transparent color seeping through.

Any color that's found in the GIF can be made transparent, not just the color in the background. If the background of the image is speckled then the transparency is going to be speckled. If you cut out the color blue in the background, and that color also appears in the middle of the image, it too will be made transparent. When I put together a transparent image, I make the image first, then copy and paste it onto a slightly larger square. That square is the most hideous green I can mix up. I'm sure it doesn't appear in the image. That way only the background around the image will become clear.

Interlaced vs. Non-Interlaced GIF

The GIF images of me playing the Turkish Sitar were non-interlaced format images. This is what is meant when someone refers to a "normal" GIF or just "GIF". When you do NOT interlace an image, you fill it in from the top to the bottom, one line after another. The following image is of two men coming onto a boat we used to cross from the European to the Asian side of Turkey.
The flowers they are carrying were sold in the manner of roses we might buy our wife here in the U.S. I bought one. Hopefully, you're on a slower-connection computer so you got the full effect of waiting for the image to come in. It can be torture sometimes.

That's where the brilliant interlaced GIF89a idea came from. Interlacing is the concept of filling in every other line of data, then going back to the top and doing it all again, filling in the lines you skipped. Your television works that way. The effect on a computer monitor is that the graphic appears blurry at first and then sharpens up as the other lines fill in. That allows your viewer to at least get an idea of what's coming up rather than waiting for the entire image, line by line. The example image below is of a spice shop in the Grand Covered Bazaar, Istanbul.

Both interlaced and non-interlaced GIFs get you to the same destination. They just do it differently. It's up to you which you feel is better.

JPEG Image Formats

JPEG is a compression algorithm developed by the people the format is named after, the Joint Photographic Experts Group. JPEG's big selling point is that its compression stores the image on the hard drive in fewer bytes than the image occupies when it is actually displayed. The Web took to the format straightaway because not only did the image store in fewer bytes, it transferred in fewer bytes. As the Internet adage goes, the pipeline isn't getting any bigger, so we need to make what is traveling through it smaller.

For a long while, GIF ruled the Internet roost. I was one of the people who didn't really like this new JPEG format when it came out. It was less grainy than GIF, but it also caused computers without a decent amount of memory to crash the browser. (JPEGs have to be "blown up" to their full size, and that takes some memory.) There was a time when people only had 8 or even 4 megs of memory in their boxes. Really. It was way back in the Dark Ages.

JPEGs are "lossy."
That's a term that means you trade off detail in the displayed picture for a smaller storage file. I always save my JPEGs at 50%, or medium, compression. Here's a look at the same image saved in normal, or what's called "sequential", encoding. That's top-to-bottom, single-line encoding, equivalent to the non-interlaced GIF format. The image is of an open-air market in Bursa. The smell was amazing. If you like olives, go to Turkey. Cucumbers, too, believe it or not.

The difference between the 1% and 50% compression is not too bad, but the drop in bytes is impressive. The numbers I am showing are storage numbers, the amount of hard drive space the image takes up.


You've probably already surmised that 50% compression means that about half of the image data is thrown away by the algorithm. If you don't put a 50% compressed image next to an exact duplicate at 1% compression, it looks pretty good. But what about that 99% compression image? It looks horrible, but it's great for teaching. Look at it again. See how it appears to be made of blocks? That's what's meant by lossy. Bytes are saved at the expense of

Page 39: Video Compression Techniques

detail. You can see where the compression algorithm found groups of pixels that all appeared to be close in color and grouped them together as one. You might be hard pressed to figure out what the image was actually showing if I didn't tell you.

Progressive JPEGs

You can almost guess what this is all about. A progressive JPEG works a lot like the interlaced GIF89a, filling in every other line, then returning to the top of the image to fill in the remainder. The example is again presented three times, at 1%, 50%, and 99% compression. The image is of the port at Istanbul from our hotel rooftop.

Obviously, here's where bumping up the compression does not pay off. Rule of thumb: if you're going to use progressive JPEG, keep the quality high, 75% or better.
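The "blocks" visible in the heavily compressed examples above can be imitated in a few lines. Real JPEG quantises frequency (DCT) coefficients rather than averaging pixels, so the sketch below is only an illustration of the idea: each n x n block of a greyscale image (represented here as a plain list of rows of integers, a made-up stand-in for a real image object) is collapsed to its average value, trading detail for fewer distinct values.

```python
# Imitate JPEG-style blockiness: treat each n x n block of pixels as one.
# This is a teaching sketch; real JPEG quantises DCT coefficients instead.

def blockify(pixels, n):
    """Replace each n x n block of a 2-D greyscale image with its mean."""
    h, w = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for by in range(0, h, n):
        for bx in range(0, w, n):
            block = [pixels[y][x]
                     for y in range(by, min(by + n, h))
                     for x in range(bx, min(bx + n, w))]
            mean = sum(block) // len(block)
            for y in range(by, min(by + n, h)):
                for x in range(bx, min(bx + n, w)):
                    out[y][x] = mean
    return out

# A checkerboard loses all its detail once a block swallows it whole:
img = [[0, 255],
       [255, 0]]
print(blockify(img, 2))  # -> [[127, 127], [127, 127]]
```

With a block size of 1 nothing is lost; the larger the block, the fewer distinct values survive, which is exactly the trade the text describes.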

JPEG (Joint Photographic Experts Group)

JPEG is a standardised image compression mechanism. JPEG is designed for compressing either full-colour (24 bit) or grey-scale digital images of "natural" (real-world) scenes.

It works well on photographs, naturalistic artwork, and similar material; not so well on lettering, simple cartoons, or black-and-white line drawings (files come out very large). JPEG handles only still images, but there is a related standard called MPEG for motion pictures.

JPEG is "lossy", meaning that the image you get out of decompression isn't quite identical to what you originally put in. The algorithm achieves much of its compression by exploiting known limitations of the human eye, notably the fact that small colour details aren't perceived as well as small details of light-and-dark. Thus, JPEG is intended for compressing images that will be looked at by humans.
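One concrete way JPEG exploits that weaker colour perception is chroma subsampling: brightness is stored at full resolution while the two colour-difference channels are stored at reduced resolution (commonly half in each direction, "4:2:0"). The sketch below is a simplification that assumes a chroma plane is a plain list of rows of integers; it halves the plane in both directions by averaging 2x2 blocks.

```python
# Downsample one chroma (colour-difference) plane 4:2:0 style: every
# 2x2 block of colour samples becomes a single averaged sample, so the
# colour channel carries a quarter of the data while brightness is kept
# at full resolution elsewhere.

def subsample_420(chroma):
    """Halve a chroma plane in both directions by averaging 2x2 blocks."""
    out = []
    for y in range(0, len(chroma) - 1, 2):
        row = []
        for x in range(0, len(chroma[0]) - 1, 2):
            total = (chroma[y][x] + chroma[y][x + 1] +
                     chroma[y + 1][x] + chroma[y + 1][x + 1])
            row.append(total // 4)
        out.append(row)
    return out

cb = [[100, 104, 200, 200],
      [100, 104, 200, 200]]
print(subsample_420(cb))  # -> [[102, 200]] : four times fewer samples
```

Because the eye barely notices the missing colour detail, this quarter-size colour channel is nearly free compression before any lossy quantisation even begins.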

A lot of people are scared off by the term "lossy compression". But when it comes to representing real-world scenes, no digital image format can retain all the information that impinges on your eyeball. By comparison with the real-world scene, JPEG loses far less information than GIF.

Quality v Compression

A useful property of JPEG is that the degree of lossiness can be varied by adjusting compression parameters. This means that the image maker can trade off file size against output image quality.

For good-quality, full-color source images, the default quality setting (Q 75) is very often the best choice. Try Q 75 first; if you see defects, then go up.

Except for experimental purposes, never go above about Q 95; using Q 100 will produce a file two or three times as large as Q 95, but of hardly any better quality. If you see a file made with Q 100, it's a pretty sure sign that the maker didn't know what he/she was doing.


If you want a very small file (say for preview or indexing purposes) and are prepared to tolerate large defects, a Q setting in the range of 5 to 10 is about right. Q 2 or so may be amusing as "op art".

Page 40: Video Compression Techniques

GIF (Graphics Interchange Format)

The Graphics Interchange Format was developed in 1987 by Compuserve, who needed a platform-independent image format suitable for transfer across slow connections. It is a lossless compressed format (it uses LZW compression) and typically compresses at a ratio of between 3:1 and 5:1.

It is an 8-bit format, which means the maximum number of colours supported by the format is 256.
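The LZW compression GIF relies on can be sketched compactly. This is only the core dictionary-growing loop; a real GIF encoder also uses variable-width codes, clear codes, and a colour-index stream, all omitted here.

```python
# Minimal LZW encoder sketch. The dictionary starts with every single
# byte, then grows one entry per longest-match emitted, so repeated
# patterns are replaced by ever-longer single codes.

def lzw_encode(data: bytes):
    """Return a list of integer codes representing `data`."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    codes, current = [], b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate        # keep extending the match
        else:
            codes.append(table[current])   # emit the longest match
            table[candidate] = next_code   # learn the new phrase
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

print(lzw_encode(b"ABABAB"))            # -> [65, 66, 256, 256]
flat = b"\x00" * 100                    # one flat single-colour row
print(len(lzw_encode(flat)))            # far fewer codes than 100 bytes
```

The flat-row example is why GIF does so well on images with large areas of one colour: runs collapse into a handful of codes.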

There are two GIF standards, 87a and 89a (developed in 1987 and 1989 respectively). The 89a standard has additional features such as the ability to define one colour to be transparent and the ability to store multiple images in one file to create a basic form of animation.
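The "one colour defined as transparent" idea can be sketched as a simple pixel substitution: wherever the key colour appears, the background shows through. The pixel layout below (lists of (r, g, b) tuples) is a made-up stand-in for a real image library, and the key colour is arbitrary.

```python
# Sketch of one-colour transparency (the same idea as a chroma key):
# any foreground pixel matching the key colour is replaced by the
# corresponding background pixel.

KEY = (0, 255, 0)  # an unmistakable green, unlikely to be in the image

def drop_key_color(foreground, background, key=KEY):
    """Composite foreground over background, keying out one colour."""
    return [
        [bg if fg == key else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]

# Tiny 2x2 example: the two green pixels let the background through.
fg = [[(10, 10, 10), KEY],
      [KEY, (200, 200, 200)]]
bg = [[(1, 2, 3), (1, 2, 3)],
      [(4, 5, 6), (4, 5, 6)]]
print(drop_key_color(fg, bg))
# -> [[(10, 10, 10), (1, 2, 3)], [(4, 5, 6), (200, 200, 200)]]
```

Note the "shape is still there" observation from earlier: the keyed pixels still occupy grid positions; only their colour is substituted.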

Both Mosaic and Netscape will display 87a and 89a GIFs, but while both support transparency and interlacing, only Netscape supports animated GIFs.
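The interlacing both standards support is, in the GIF specification, actually a four-pass scheme rather than a simple every-other-line pass: every 8th row starting at row 0, then every 8th starting at row 4, then every 4th starting at row 2, then every 2nd starting at row 1. A small sketch of that row ordering:

```python
# Row order for GIF interlacing, per the spec's four passes:
# pass 1: rows 0, 8, 16, ...   pass 2: rows 4, 12, 20, ...
# pass 3: rows 2, 6, 10, ...   pass 4: rows 1, 3, 5, ...

def gif_interlace_order(height):
    """Return the order in which an interlaced GIF stores its rows."""
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]  # (first row, step)
    order = []
    for start, step in passes:
        order.extend(range(start, height, step))
    return order

print(gif_interlace_order(8))  # -> [0, 4, 2, 6, 1, 3, 5, 7]
```

Because the first pass alone covers the whole image at one-eighth vertical resolution, a rough preview appears almost immediately, which is the "blurry then sharpens" effect described earlier.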

PNG (Portable Network Graphics format)

In January 1995 Unisys, the company holding the patent on the LZW compression technique the GIF format uses, announced that they would be enforcing that patent. This means that commercial developers whose products include the GIF encoding or decoding algorithms have to pay a licence fee to Unisys. This does not concern users of GIFs or non-commercial developers.

However, a number of people banded together and created a completely patent-free graphics format called PNG (pronounced "ping"), the Portable Network Graphics format. PNG is superior to GIF in that it has better compression and supports millions of colours. PNG files end in a .png suffix.

PNG is supported in Netscape 4.03 and above. For more information, try the PNG home page.

When should I use JPEG, and when should I stick with GIF?

JPEG is not going to displace GIF entirely. For some types of images, GIF is superior in image quality, file size, or both. One of the first things to learn about JPEG is which kinds of images to apply it to.

Generally speaking, JPEG is superior to GIF for storing full-color or grey-scale images of "realistic" scenes; that means scanned photographs and similar material. Any continuous variation in color, such as occurs in highlighted or shaded areas, will be represented more faithfully and in less space by JPEG than by GIF.

GIF does significantly better on images with only a few distinct colors, such as line drawings and simple cartoons. Not only is GIF lossless for such images, but it often compresses them more than JPEG can. For example, large areas of pixels that are all exactly the same color are compressed very efficiently indeed by GIF. JPEG can't squeeze such data as much as GIF does without introducing visible defects. (One implication of this is that large single-color borders are quite cheap in GIF files, while they are best avoided in JPEG files.)
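The point that large single-colour areas compress extremely well losslessly is easy to demonstrate. GIF's LZW isn't in Python's standard library, so the sketch below substitutes zlib's DEFLATE, a different lossless coder that exhibits the same behaviour: a flat buffer collapses to almost nothing, while noisy "photographic" data barely shrinks at all.

```python
import random
import zlib

# Lossless coders thrive on repetition. Compare a solid-colour buffer
# against random noise of the same size. (DEFLATE here stands in for
# GIF's LZW; the principle, not the format, is the point.)

flat = bytes(10_000)  # 10,000 identical "pixels" of one colour
random.seed(0)
noisy = bytes(random.randrange(256) for _ in range(10_000))

print(len(zlib.compress(flat)))    # a few dozen bytes
print(len(zlib.compress(noisy)))   # close to 10,000 bytes
```

This is also why the text advises that big single-colour borders are cheap in GIF but best avoided in JPEG: a lossless dictionary coder eats them for free, while JPEG spends bits fighting its own block artifacts along the edge.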


Computer-drawn images (ray-traced scenes, for instance) usually fall between photographs and cartoons in terms of complexity. The more complex and subtly rendered the image, the more

Page 41: Video Compression Techniques

likely that JPEG will do well on it. The same goes for semi-realistic artwork (fantasy drawings and such).

JPEG has a hard time with very sharp edges: a row of pure-black pixels adjacent to a row of pure-white pixels, for example. Sharp edges tend to come out blurred unless you use a very high quality setting. Edges this sharp are rare in scanned photographs, but are fairly common in GIF files: borders, overlaid text, etc. The blurriness is particularly objectionable with text that's only a few pixels high. If you have a GIF with a lot of small-size overlaid text, don't JPEG it.

Plain black-and-white (two level) images should never be converted to JPEG; they violate all of the conditions given above. You need at least about 16 grey levels before JPEG is useful for grey-scale images. It should also be noted that GIF is lossless for grey-scale images of up to 256 levels, while JPEG is not.


Page 42: Video Compression Techniques

www.sigmatrainers.com

SIGMA TRAINERS

Contact Person - D R LUHAR - M.Tech- Ex Professor

E-103, Jai Ambe Nagar,

Near Drive-in Cinema, Thaltej, AHMEDABAD-380 054. INDIA

Phone 079-26852427 Fax 079-26840290 Mobile 0-9824001168 E-mail [email protected] [email protected] Website www.sigmatrainers.com

Basement, Hindola Complex, Lad-Society Road, Near Vastrapur Lake, Vastrapur, AHMEDABAD-380 015. INDIA

Phone 079-26850829 E-mail [email protected] Website www.sigmatrg.com

SIGMA TRAINING INSTITUTE

DEALER