
Understanding JPEG and Applications
Department of Computer Science and Engineering
Chakchai So-In [email protected]
December 8, 2004

Introduction

In [8], data collected at the NASA Ames Internet exchange (AIX) in Mountain View, CA [AIX], as part of an NSF/NASA collaborative effort with NLANR/MOAT on the Network Traffic Analysis testbed, were used to monitor Internet traffic. The data were collected from one of four OC-3 ATM links that interconnect AIX and MAE-West in San Jose, CA. Tables 1, 2, and 3 show that most Internet traffic is TCP (91%), and in particular HTTP (57%). Moreover, [6] characterizes the Web traffic of various ISPs: Waterloo, Calgary, Saskatchewan, NASA, Clark Net, and NCSA. The HTTP document types that consume most of the Internet capacity are image and video (18%, 61%, 40%, 77%, 79%, and 42% respectively); it is therefore very beneficial to find ways to reduce the volume of this traffic.

This report describes “JPEG”, the efficient coding technique that is generally used to represent digital images. The first part describes the JPEG models and their characteristics in both lossy and lossless modes. The DCT, Quantization, and Encoding steps are then explained briefly for the lossy mode, and predictive coding (DPCM) for the lossless mode. To make the coding process easier to understand, worked examples of JPEG baseline and JPEG lossless coding are given. Next, the compression performance of different coding techniques is compared. Lastly, this report discusses JPEG/MJPEG applications.

Figure 1: A diagram showing the location of the optical splitter used to collect the data. Note that there are currently five links between NASA-Ames and MAE-West [8]

Total IP Bytes                193,692,014,407
Total IP Packets              451,971,619
Total Duration of Traces      24,345 sec
IP packets with DF set        329,241,464
Fragments of IP Datagrams     1,023,206
Fragments of UDP Datagrams    597,228
Fragments of TCP Datagrams    9,702
IP Packets with options       3,213
Non-IP packets                462,085

Table 1: Aggregate totals for all traces collected in February, 2000 [8]

Table 1 gives sample data for the month of February, 2000, aggregated over all the traces. Table 2 presents the top 10 IP layer protocols seen in these traces; TCP and UDP typically account for almost all of the traffic. Table 3 summarizes the top TCP application categories for all the traces combined.

Protocol   Number   Packets      Bytes          Average Size
TCP        6        374801201    176706563104   471
UDP        17       62456731     9842511709     157
GRE        47       7566415      5272240819     696
ICMP       1        5938044      1350011401     227
ESP        50       517353       197216792      381
IP in IP   4        265103       179257606      676
AH         51       74423        43454671       583
IPIP       94       143502       41707350       290
SKIP       57       117050       41633952       355
IGMP       2        68404        7729038        112

Table 2: Top 10 protocols seen during February, 2000 [8]

Protocol      Source   Destination   Packets      Bytes          Average Size
HTTP          80       0             140780543    100044030753   710
-             0        0             45319842     17319763013    382
NNTP          0        119           17895481     15992967942    893
HTTP          0        80            94578965     7844163850     82
FTP Data      20       0             6728097      6689611587     994
SMTP          0        25            8878925      6071084052     683
NNTP          119      0             8857217      5399672480     609
HTTP/Proxy    8080     0             2331669      2327032104     998
Napster       0        6699          3331109      1838804438     552
HTTPS         443      0             3035809      1535037132     505
Napster       6699     0             3377828      1528188686     452
FTP Data      0        20            5498097      1262294037     229
Napster       6688     0             1230335      935883810      760
-             1755     0             1182358      908626624      768
POP3          110      0             1820887      798255125      438
Hotline       5501     0             685536       787122008      1148
RTSP          554      0             754508       616087123      816
Napster       0        6688          1149382      348845629      303
SMTP          25       0             6438672      339788020      52
RealAudio     7070     0             445551       298598442      670
Shoutcast     8000     0             353545       296161291      837
Web Cache     3128     0             325447       280739543      862
HTTPS         0        443           2635048      280418901      106
NetBIOS SSN   139      0             312658       264965212      847
-             2189     0             294654       174140223      590

Table 3: Top 25 TCP application categories seen during February, 2000 [8]

Table 4: Breakdown of Document Types and Sizes for All Data Sets [6]

JPEG Still Image Data Compression Standard

JPEG stands for “Joint Photographic Experts Group”; it is a standardized image compression mechanism with both lossy and lossless modes. It is designed to compress both full-color (24-bit) and Gray-Scale (8-bit) images. It works quite well on photographs, naturalistic artwork, and similar material, but it does not perform well on lettering, cartoons, and line drawings. [15]

JPEG is normally "lossy", which means that the reconstructed image is not quite the same as the original image (the compression ratio is up to 20 or 50), but JPEG also supports lossless compression, in which there is no difference between the two images (the compression ratio may be up to 3). JPEG is designed to exploit known limitations of the human eye. For instance, small color changes are perceived less accurately than small changes in brightness. Thus, JPEG is intended for compressing images that will be looked at by humans. [15]

A useful property of JPEG is that the degree of “lossiness” can be varied by adjusting the compression parameters (mostly the Quantization step values). Thus, file size can be traded off against image quality, depending on what level of quality the viewer finds acceptable. For example, if we do not care about the quality of the image at all, the compression ratio can reach 200 times for the “lena” image (Figure 10).

1. Color Space

There are two basic ways to produce a color image: additive color and subtractive color. Additive color is used in active light-emitting systems, in which light from sources of different colors is added together to produce colors; subtractive color is used in passive systems, which absorb light at different wavelengths. For example, in a CRT the light (produced by the electron beam) is emitted in three primary colors: Red, Green, and Blue (RGB). When the three components are emitted together, white light is produced; the absence of all three gives black. Subtractive color is used mostly in the printing industry. Cyan (blue-green), Magenta, and Yellow (CMY) inks absorb ranges of light wavelengths. Where there is no ink on the paper, the result is white, and the sum of the three inks gives black. [10]

There are many representations of color; normally, RGB is used to represent pictures universally. Another color scheme describes color as it is perceived, in terms of Brightness, Hue, and Saturation. Brightness (Luminance) describes the intensity of light (white, gray, or black). Hue describes the corresponding color (Red, Green, Yellow, Blue, and so on). Finally, Saturation describes how vivid the color is (very strong, pastel, or nearly white).

To be suitable for digital image compression (to achieve a higher compression ratio), JPEG also uses a color space of this kind, in which one component carries luminance and the other two carry chrominance (hue and saturation); such color spaces are called luminance-chrominance representations. The luminance provides a Gray-Scale (monochrome) version of the image, and the chrominance provides the extra information that converts the Gray-Scale image to a color image. The “YCbCr” model (Y = Luminance and Cb/Cr = Chrominance) is used for this purpose. The Cb (blue to yellow) and Cr (red to blue-green) components can be sub-sampled (down-sampled) by up to a half before the JPEG compression process, and the human eye cannot recognize the difference, because the eye cannot follow quick spatial changes in chrominance as easily as changes in luminance. Hence, the number of chrominance samples is usually smaller than the number of luminance samples. The color space conversion equations are shown below.

Y  =  0.299R + 0.587G + 0.114B
Cb = -0.169R - 0.331G + 0.5B
Cr =  0.5R - 0.419G - 0.081B
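The conversion can be sketched in a few lines of Python. This is only an illustration that applies the three equations exactly as written above; the function name `rgb_to_ycbcr` is an assumption made for this example, and the sketch deliberately leaves Cb and Cr centered on zero (file formats commonly add an offset of 128 to keep them non-negative, which the equations above do not include).

```python
def rgb_to_ycbcr(r, g, b):
    """Apply the luminance-chrominance equations given in the text.

    r, g, b are 0-255 samples; the result is (Y, Cb, Cr) as floats.
    Cb and Cr are centered on zero here (no +128 offset is added).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.5 * b
    cr = 0.5 * r - 0.419 * g - 0.081 * b
    return y, cb, cr


if __name__ == "__main__":
    # A neutral gray carries no chrominance: Cb and Cr are (almost) zero.
    print(rgb_to_ycbcr(128, 128, 128))
    # Pure red has low luminance and a strongly positive Cr.
    print(rgb_to_ycbcr(255, 0, 0))
```

Because a gray pixel maps to Cb = Cr = 0, the Y component alone is the Gray-Scale version of the image, which is the property the sub-sampling of Cb and Cr relies on.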

2. JPEG Model

According to the CCITT recommendation [14], there are four JPEG models that can be used in the JPEG encoding-decoding process; however, the JPEG baseline process is a requirement for any hardware or software that implements the DCT-based scheme.

Baseline process (required for all DCT-Based decoders)
o DCT-based process
o Source image: 8-bit samples within each component
o Sequential
o Huffman coding: 2 AC and 2 DC Tables
o Decoder shall process scans with 1, 2, 3, and 4 components
o Interleaved and non-interleaved scans

Extended DCT-Based processes
o DCT-based process
o Source image: 8-bit or 12-bit samples within each component
o Sequential or Progressive
o Huffman or Arithmetic coding: 4 AC and 4 DC Tables
o Decoder shall process scans with 1, 2, 3, and 4 components
o Interleaved and non-interleaved scans

Lossless Encoding
o Predictive process (not DCT-based: DPCM)
o Source image: P-bit samples (2<=P<=16)
o Sequential
o Huffman coding: 4 DC Tables
o Decoder shall process scans with 1, 2, 3, and 4 components
o Interleaved and non-interleaved scans

Hierarchical Encoding
o Multiple frames (non-differential and differential) (Multi Resolution Requirement)
o Uses extended DCT-based or lossless process
o Source image: 8-bit samples within each component
o Decoder shall process scans with 1, 2, 3, and 4 components
o Interleaved and non-interleaved scans

Figure 2: Progressive versus sequential presentation [14]

3. Multiple-Component Control

This procedure controls the order in which the image data from multiple components are processed to create the compressed image, and it also ensures that the proper set of table data is applied to the proper “data units” in the image. (A data unit is a single sample for lossless processes, and an 8*8 block of samples for DCT-based processes.) [14]

Interleaving multiple components

Figure 3 shows an example of how the encoding process selects data from multiple source components. The source image in this example is composed of three components, A, B, and C, with two sets of table specifications. Normally, in sequential mode, the encoding process encodes all of component A first, then B, and finally C; this order is called “non-interleaved”. If instead the encoding process encodes some data units from A, then some from B and C, and then returns to A, B, C, and so on, the order is called “interleaved”.

Figure 3: Component-interleave and table-switching control [14]

Figure 4 shows the case in which all three image components have identical dimensions, in both interleaved and non-interleaved mode. In non-interleaved mode, A1…An are scanned completely, followed by B1…Bn and C1…Cn. In interleaved mode, because the sampling factor is 1:1:1, the scan order is A1, B1, C1, A2, B2, C2, and so on. When the dimensions differ (Figure 5), the number of data units taken from each component varies according to its sampling factor. In the example, two data units are scanned from A for every one scanned from B and C, which have half the number of horizontal samples. [14]

Figure 4: Data coding Scan (Interleave and Non-Interleave Mode) [14]

Figure 5: Interleaved order for components with different dimensions [14]
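The two scan orders can be sketched as follows. This is a simplified, one-dimensional illustration only: the function names and the `units_per_turn` parameter are assumptions made for this example, and the sketch ignores the two-dimensional grouping of data units that the figures from [14] show.

```python
def non_interleaved(components):
    """Scan each component completely before starting the next one."""
    return [unit for component in components for unit in component]


def interleaved(components, units_per_turn):
    """Visit the components in turn, taking units_per_turn[k] data units
    from component k on every visit, until all data units are consumed."""
    order = []
    positions = [0] * len(components)
    while any(pos < len(comp) for pos, comp in zip(positions, components)):
        for k, component in enumerate(components):
            take = units_per_turn[k]
            order.extend(component[positions[k]:positions[k] + take])
            positions[k] += take
    return order


if __name__ == "__main__":
    a = ["A1", "A2", "A3", "A4"]
    b = ["B1", "B2"]
    c = ["C1", "C2"]
    print(non_interleaved([a, b, c]))
    # B and C have half as many data units as A, so A contributes
    # two data units for each one taken from B and C.
    print(interleaved([a, b, c], [2, 1, 1]))
```

With equal dimensions and `units_per_turn` of `[1, 1, 1]`, the same function yields the A1, B1, C1, A2, B2, C2 order described for Figure 4.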

4. JPEG Coding (Lossy)

Figure 6: JPEG Lossy (DCT-based) Encoder

Figure 6 shows the JPEG lossy encoding process, which is composed of four sub-processes. After the sub-sampling and color translation processes are applied to the source image, the image is divided into blocks of 8*8 pixels to prepare the data samples for the DCT. The DCT-based coding step (the forward DCT, which transforms each block from the spatial domain to the frequency domain) is applied next. Then the Quantization process, the main lossy step in JPEG, reduces the coefficient values toward zero, especially the high-frequency coefficients. Finally, Entropy Coding (differential coding is applied to the DC coefficient before this step) produces the bit stream (Huffman with RunLength Coding). Additionally, some markers and headers are added; for example, header information that indicates the JPEG model, the Entropy Coding method (Huffman/Arithmetic), and the Quantization Table values. The JPEG lossy decoder performs these steps in the reverse order.

4.1 DCT-Based Coding

DCT stands for “Discrete Cosine Transform”, one of a class of mathematical operations, which also includes the Fast Fourier Transform (FFT), that transform a signal (here, the image) from the spatial domain to the frequency domain. The amplitude of the signal is the pixel value (0-255 for Gray-Scale). The transformation is reversible (FDCT<->IDCT); therefore, an ideal DCT is lossless. The pixel array is processed in 8*8 blocks: each block is transformed by the standard 2-D Forward DCT, which converts the samples to DCT coefficients (lossless). The 2-D DCT is calculated as follows:

Figure 7: FDCT and IDCT [14]
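Since the formulas survive only as a figure, a direct (unoptimized) sketch of the forward transform may help. It assumes the usual 8*8 JPEG definition, S(v,u) = 1/4 C(u) C(v) ΣΣ s(y,x) cos((2x+1)uπ/16) cos((2y+1)vπ/16) with C(0) = 1/√2 and C(k) = 1 otherwise, and it assumes that samples are level-shifted by 128 before the transform; the function names are hypothetical. The input block is the “sena” block of Table 8.

```python
import math

N = 8


def c(k):
    """Normalization factor: 1/sqrt(2) for the zero frequency, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def fdct_8x8(block, level_shift=128):
    """Forward 2-D DCT of an 8*8 block, computed straight from the definition.

    block[y][x] holds the samples; the result is indexed as out[v][u],
    with v the vertical and u the horizontal frequency.
    """
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += ((block[y][x] - level_shift)
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * c(u) * c(v) * total
    return out


# The 8*8 block from the "sena" image (Table 8).
SENA_BLOCK = [
    [118, 115, 115, 116, 116, 116, 118, 117],
    [117, 116, 116, 118, 117, 116, 117, 117],
    [118, 117, 117, 118, 119, 119, 118, 117],
    [118, 118, 118, 119, 119, 119, 119, 118],
    [120, 120, 122, 122, 122, 121, 120, 119],
    [122, 122, 124, 123, 121, 120, 119, 121],
    [122, 123, 123, 121, 122, 124, 125, 122],
    [121, 125, 125, 123, 123, 125, 124, 124],
]

if __name__ == "__main__":
    coefficients = fdct_8x8(SENA_BLOCK)
    # The DC term is 8 times the mean of the level-shifted block.
    print(round(coefficients[0][0]))
```

Under these assumptions the DC coefficient of the block rounds to -66, the value listed in the upper left-hand corner of Table 9. A production encoder would use a fast factorization instead of this four-level loop.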

4.2 Quantization

The DCT is in principle a lossless transformation; it prepares the data for the lossy process, Quantization. The output values of the DCT lie in the range between -1,024 and 1,023, which takes up to 11 bits. Quantization is the main part of JPEG lossy compression and reduces the number of bits that need to be stored. (The pixel values reconstructed after this step will not be the same as the original ones.)

In this step, most of the higher-frequency components are rounded to zero, and RunLength coding (VLC) is used to encode these values; however, some high-frequency coefficients are still large, and they are encoded as non-zero values. The formula is given below. (The larger Qvu is, the higher the compression ratio, but the larger the errors after the IDCT as well; however, errors in the high-frequency components do not severely affect the quality of the image.) [8]

$$ Sq_{vu} = \mathrm{round}(S_{vu} / Q_{vu}) $$

* Sqvu is the quantized DCT coefficient, normalized by the quantizer step size
* Svu is the DCT coefficient
* Qvu is the value of the corresponding element from the quantization Table

Luminance:
16  11  10  16  24   40   51   61
12  12  14  19  26   58   60   55
14  13  16  24  40   57   69   56
14  17  22  29  51   87   80   62
18  22  37  56  68   109  103  77
24  36  55  64  81   104  113  92
49  64  78  87  103  121  120  101
72  92  95  98  112  100  103  99

Chrominance:
17  18  24  47  99  99  99  99
18  21  26  66  99  99  99  99
24  26  56  99  99  99  99  99
47  66  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99

Table 5: Quantization Table Values of luminance and chrominance [14]
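The quantization formula can be tried on the data of the baseline example given later in this report. The sketch below divides the DCT coefficients of Table 9 by the luminance values of Table 5 and rounds to the nearest integer; the names `quantize`, `LUMINANCE_Q`, and `DCT_COEFFICIENTS` are assumptions made for this example.

```python
import math

# Luminance quantization values (Table 5).
LUMINANCE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 36, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# DCT coefficients of the "sena" block (Table 9).
DCT_COEFFICIENTS = [
    [-66, -1, -2, 0, -1, 1, 0, 0],
    [-21, -1, 1, 2, 4, 2, -1, 0],
    [1, -2, 3, 0, -1, 0, -1, 0],
    [0, 2, 1, -1, 1, 1, 1, 1],
    [0, -1, 0, 0, -1, 0, 0, 0],
    [-1, -2, 2, 2, -1, 1, -1, 0],
    [0, 1, 0, -1, 1, -1, 1, 0],
    [1, -1, -1, 2, -1, 0, 0, 1],
]


def quantize(coefficients, table):
    """Return round(S / Q) for every coefficient, rounding to nearest."""
    return [[int(math.floor(s / q + 0.5)) for s, q in zip(s_row, q_row)]
            for s_row, q_row in zip(coefficients, table)]


if __name__ == "__main__":
    for row in quantize(DCT_COEFFICIENTS, LUMINANCE_Q):
        print(row)
```

Only two labels survive: -66/16 rounds to -4 and -21/12 rounds to -2, and every other coefficient becomes zero, which agrees with Table 10.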

Figure 8: Zig-zag Structure [16]

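The next step in encoding a JPEG image is to apply the zigzag model, which reads the 8*8 block out as a one-dimensional sequence ordered from low to high frequency. The purpose of this step is to increase the length of the runs of zeros (Figure 8).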
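A short sketch of the zigzag scan follows. It builds the scan order by walking the anti-diagonals of the block in alternating directions, which is one common way to generate the pattern of Figure 8; the function names are assumptions made for this example, and the input is the quantized block of Table 10.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n*n block in zigzag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Cells on one anti-diagonal share the same r + c. Odd diagonals are
    # walked downward (row increasing), even diagonals upward.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def zigzag_scan(block):
    """Read a square block out as a flat list in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Quantizer labels of the "sena" block (Table 10).
LABELS = [[0] * 8 for _ in range(8)]
LABELS[0][0] = -4
LABELS[1][0] = -2

if __name__ == "__main__":
    print(zigzag_order()[:6])
    print(zigzag_scan(LABELS)[:5])
```

For the block of Table 10 the scan starts -4, 0, -2 and is followed by 61 zeros, so everything after the third value collapses into a single end-of-block symbol.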

Figure 9: “sena” image (original image: left) (JPEG image: right)

4.3 Entropy Coding (Huffman / Arithmetic Coding): DC and AC Coding

For JPEG Baseline, Huffman Coding is the only entropy coding method, because it is computationally simpler and not too complicated to implement. (Arithmetic Coding could give systematically better compression, by around 10%, but it is very complicated and computationally expensive.)

Figure 10: Histogram Pixel DC Value of sena image: (a) Original image, (b) After DC differencing

The DC coefficient (position (0,0), the upper left-hand corner) is proportional to the average of the input block, and it is encoded by differential encoding because adjacent blocks in an image exhibit a high degree of correlation. Thus, the difference between a DC value and the DC value of the previous block is usually a very small number. (The DC coefficient is almost an order of magnitude greater than any of the other values, and the farther the elements are from the DC coefficient, the lower their magnitude tends to be.) For example, the histogram of the DC values of the “sena” image is shown in Figure 10 (a); after DC differencing, the histogram becomes the one in Figure 10 (b). As a result of this process there are a lot of zeros, and it is much easier to encode the result with RunLength coding.

After the DC difference is taken, the DC and AC coefficients are ready to be encoded with Huffman and RunLength (VLC) coding. (The coefficients contain runs of consecutive zero values; RunLength coding counts these runs, and the result is Huffman coded.) Apart from the Huffman codes for coefficient values, there are two special Huffman code symbols: End of Block, “EOB” (“1010” is added at the end of each 8*8 block), and Zero Run Length, “ZRL”, which is used when a run contains 16 or more zeros, in order to reduce the number of encoded bits. Then the Variable-Length Integer (VLI) coding scheme is applied to encode the number of bits (size) and the amplitude: because most of the DCT outputs are small numbers, this method represents them with fewer bits and gives a higher compression ratio. (The encoding is illustrated in the Baseline Coding Example section.)

Category/SSSS/   Code Word      Code Word        Value Range/
Bit Count        Luminance DC   Chrominance DC   Amplitudes
0                00             00               0
1                010            01               -1,1
2                011            10               -3,-2,2,3
3                100            110              -7,-6..-4,4..7
4                101            1110             -15..-8,8..15
5                110            11110            -31..-16,16..31
6                1110           111110           -63..-32,32..63
7                11110          1111110          -127..-64,64..127
8                111110         11111110         -255..-128,128..255
9                1111110        111111110        -511..-256,256..511
10               11111110       1111111110       -1023..-512,512..1023
11               111111110      11111111110      -2047..-1024,1024..2047

Table 6: Huffman coding Value for luminance and chrominance (VLI) [16]

Size \ Run (Z)   0            1             ..   15
0                1010 (EOB)   -             ..   11111111001 (ZRL)
1                00           1100          ..   1111111111110101
2                01           11011         ..   1111111111110110
3                100          1111001       ..   1111111111110111
4                1011         111110110     ..   1111111111111000
5                11010        11111110110   ..   1111111111111001
...

Table 7: Example Table for obtaining the Huffman code for a given label value and run Length (VLC) [12]
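The symbols that are handed to the Huffman coder can be sketched as follows. The sketch derives, for one block in zigzag order, the DC difference and the (run, size) pairs with their VLI amplitude bits, plus the special ZRL and EOB symbols; it stops short of looking up the Huffman code words of Tables 6 and 7. The function names and the tuple layout are assumptions made for this example, and the rule used for negative amplitudes (subtract one and keep the low-order bits) is the one quoted from [14] in the predictive coding example later in this report.

```python
def vli(value):
    """Return (size, bits): the category of value and its amplitude bits."""
    size = abs(value).bit_length()
    if size == 0:
        return 0, ""
    if value < 0:
        value -= 1  # negative values: subtract one, keep the low-order bits
    return size, format(value & ((1 << size) - 1), "0{}b".format(size))


def block_symbols(zigzag, previous_dc=0):
    """List the entropy-coding symbols of one block given in zigzag order."""
    symbols = [("DC",) + vli(zigzag[0] - previous_dc)]
    last_nonzero = max((i for i, v in enumerate(zigzag) if i > 0 and v != 0),
                       default=0)
    run = 0
    for value in zigzag[1:last_nonzero + 1]:
        if value == 0:
            run += 1
            if run == 16:  # a full run of 16 zeros becomes one ZRL symbol
                symbols.append(("ZRL",))
                run = 0
            continue
        symbols.append((run,) + vli(value))
        run = 0
    if last_nonzero < len(zigzag) - 1:
        symbols.append(("EOB",))  # only zeros remain in the block
    return symbols


if __name__ == "__main__":
    # Zigzag sequence of the quantized "sena" block (Table 10).
    print(block_symbols([-4, 0, -2] + [0] * 61))
```

For the quantized “sena” block this gives a DC symbol of size 3 with bits 011, one AC symbol with run 1, size 2, and bits 01, and then EOB.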

4.4 Baseline Coding Example

In our example, the “sena” image (8-bit Gray-Scale) is encoded and decoded. The input pixel values are given in Table 8. (To keep the example specific, only a Gray-Scale picture with 8 bits per pixel (0-255) is used; for a color image, each component is encoded separately.) Table 9 shows the values after the DCT process, and Table 10 shows the values after the Quantization step (using the quantization values from Table 5). The stream of output values (8*8) is shown below:

-4 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Although there are already a lot of runs of zeros, the Zigzag Model is applied before the Entropy Encoding process to make the runs even longer. Next, using Tables 6 and 7, Huffman Coding with RunLength coding produces the bit stream. Size and magnitude are encoded with VLI coding. (Usually, JPEG adds some markers and headers before writing the JPEG binary file.)

10001111111110011010 (20 bits instead of 8*64=512 bits, so the compression ratio is about 25 times)

118  115  115  116  116  116  118  117
117  116  116  118  117  116  117  117
118  117  117  118  119  119  118  117
118  118  118  119  119  119  119  118
120  120  122  122  122  121  120  119
122  122  124  123  121  120  119  121
122  123  123  121  122  124  125  122
121  125  125  123  123  125  124  124

Table 8: Example of an 8*8 block from “sena” image

-66  -1  -2   0  -1   1   0   0
-21  -1   1   2   4   2  -1   0
  1  -2   3   0  -1   0  -1   0
  0   2   1  -1   1   1   1   1
  0  -1   0   0  -1   0   0   0
 -1  -2   2   2  -1   1  -1   0
  0   1   0  -1   1  -1   1   0
  1  -1  -1   2  -1   0   0   1

Table 9: Example of a DCT coefficients corresponding to the block of data from “sena” image

-4  0  0  0  0  0  0  0
-2  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

Table 10: Example of the quantizer labels obtained by using the quantization Table on the coefficients
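The decoding direction, which the text describes only as the inverse of the encoder, can be illustrated on the same block. The sketch below multiplies the labels of Table 10 by the luminance values of Table 5 and applies an inverse DCT; it assumes the usual 8*8 IDCT definition and a level shift of +128 after the transform, and all names are hypothetical.

```python
import math

N = 8

# Luminance quantization values (Table 5).
LUMINANCE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 36, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# Original "sena" block (Table 8), used only to measure the error.
ORIGINAL = [
    [118, 115, 115, 116, 116, 116, 118, 117],
    [117, 116, 116, 118, 117, 116, 117, 117],
    [118, 117, 117, 118, 119, 119, 118, 117],
    [118, 118, 118, 119, 119, 119, 119, 118],
    [120, 120, 122, 122, 122, 121, 120, 119],
    [122, 122, 124, 123, 121, 120, 119, 121],
    [122, 123, 123, 121, 122, 124, 125, 122],
    [121, 125, 125, 123, 123, 125, 124, 124],
]

# Quantizer labels (Table 10).
LABELS = [[0] * N for _ in range(N)]
LABELS[0][0] = -4
LABELS[1][0] = -2


def c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dequantize(labels, table):
    """Multiply every label by its quantizer step size."""
    return [[s * q for s, q in zip(s_row, q_row)]
            for s_row, q_row in zip(labels, table)]


def idct_8x8(coefficients, level_shift=128):
    """Inverse 2-D DCT from the definition, rounded and clamped to 0-255."""
    out = [[0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for v in range(N):
                for u in range(N):
                    total += (c(u) * c(v) * coefficients[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            sample = int(math.floor(0.25 * total + level_shift + 0.5))
            out[y][x] = min(255, max(0, sample))
    return out


if __name__ == "__main__":
    for row in idct_8x8(dequantize(LABELS, LUMINANCE_Q)):
        print(row)
```

Because only the DC label and one vertical-frequency label survive quantization, every reconstructed row is constant: the rows come out as 116, 116, 118, 119, 121, 122, 124, and 124, and no pixel differs from Table 8 by more than 3 gray levels. This shows how the loss introduced by Quantization stays small for a smooth block.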

4.5 JPEG Coding (Lossless-DPCM-Differential Pulse Code Modulation)

Figure 11: JPEG Lossless Diagram

There is another option among the JPEG models, “JPEG lossless”, in which the output image is exactly the same as the original image. (The compression ratio of this technique is lower than that of JPEG lossy, usually less than 2 times; however, it also depends on the Predictor values. [5]) Predictive compression (predictive coding relies on information from previous pixels) is used in this mode instead of the DCT. Predictive coding uses a predictor equation to predict the next value, and the difference between the actual and predicted values of each pixel is then encoded with Huffman coding (VLI). The JPEG standard does not fix the Huffman coding table, but generally the Huffman Table for DC coding (lossy) is employed.

4.6 Predictive Coding

C  B
A  X

Table 11: Relationship between sample and prediction samples (xi, yi position)

Figure 11 and Table 11 show the procedure of the lossless encoding process. Up to three neighboring samples, “A”, “B”, and “C”, are used to calculate the prediction for position “X”. The possible predictions are listed in Table 12, which defines up to seven linear combinations (PSV). The prediction is subtracted from the actual value at position “X”. Finally, this difference is coded by an Entropy Coding method, either Huffman (Baseline) or Arithmetic Coding.

Selection-Value   Prediction
0                 No prediction
1                 Px1 = a
2                 Px2 = b
3                 Px3 = c
4                 Px4 = a+b-c
5                 Px5 = a+((b-c)/2) *
6                 Px6 = b+((a-c)/2) *
7                 Px7 = (a+b)/2 *

* Shift right arithmetic

Table 12: Predictors (P(x,y-1)=”a”, P(x-1,y)=”b”, P(x-1,y-1)=”c”) [14]

SSSS   Huff Code (DC)   Additional Bits                  Difference Values
0      00               -                                0
1      010              0,1                              -1,1
2      011              00,01,10,11                      -3,-2,2,3
3      100              000,..,011,100,..,111            -7,-6..-4,4..7
4      101              0000,..,0111,1000,..,1111        -15..-8,8..15
5      110              00000,..,01111,10000,..,11111    -31..-16,16..31
6      1110             ..                               -63..-32,32..63
7      11110            ..                               -127..-64,64..127
8      111110           ..                               -255..-128,128..255
9      1111110          ..                               -511..-256,256..511
10     11111110         ..                               -1023..-512,512..1023
11     111111110        ..                               -2047..-1024,1024..2047
12     1111111110       ..                               -4095..-2048,2048..4095
13     11111111110      ..                               -8191..-4096,4096..8191
14     111111111110     ..                               -16383..-8192,8192..16383
15     1111111111110    ..                               -32767..-16384,16384..32767
16     11111111111110   ..                               32768

Table 13: Difference categories for lossless Huffman Coding (category i covers the range [-(2^i - 1), +(2^i - 1)] but omits the middle range [-(2^(i-1) - 1), +(2^(i-1) - 1)])

Selection-Value “0” is used only for the differential coding in the hierarchical mode.

Because each uses only one neighboring sample, Selection-Values “1, 2, and 3” are called one-dimensional predictors; “7” is called a two-dimensional predictor, and the others (4, 5, and 6) are called three-dimensional predictors.

Since the differential value can only be an integer, and to avoid overflow and underflow in Px5 and Px6, a right shift by 1 bit is needed. For example, let “A”, “B”, and “C” all be 8-bit (0-255) integer values (250, 240, 220). For Px7, to prevent overflow when “A” and “B” are summed, the result has to be divided by 2, from 490 (overflow) to 245. In the case of Px5 and Px6, the values are shifted from 260 to 130 and from 270 to 135 respectively.

In JPEG lossless, the sample precision P ranges from 2 to 16 bits; therefore, if the samples are 8-bit (Gray-Scale), the initial Predictor is 2^(8-1) = 128 (in the case Pt=0); however, if the image has a wider range of values, the point transform can be used, which shifts each input sample right by Pt bits. As a result, the initial Predictor is 2^(P-Pt-1).

For the pixel at (0, 0), the predictor 2^(P-Pt-1) is used. Px1 is used for the rest of the first row (0, *), and Px2 is used for the rest of the first column (*, 0), as in the example that follows.

4.7 Predictive Coding Example

110  100   90
 80   90  100
 70   90  100

Table 14: Gray Scale Source Image (Original Pixel)

-18  -10  -10
-30    0   10
-10   10    5

Table 15: Differential Image = (Original Pixel-Predictor, Px7)
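The whole example can be reproduced with a short sketch. It assumes 8-bit samples with Pt = 0 (initial predictor 128), uses the left neighbor on the first row, the upper neighbor on the first column, and Px7 = (a+b)/2 with a right shift everywhere else, and then encodes every difference with the Huffman codes of Table 13 followed by the additional bits. The function names are assumptions made for this example, and only the first six code words of Table 13 are included because the example needs no others.

```python
# Huffman codes for SSSS = 0..5, copied from Table 13.
HUFF_DC = {0: "00", 1: "010", 2: "011", 3: "100", 4: "101", 5: "110"}

# Gray-Scale source image (Table 14).
IMAGE = [
    [110, 100, 90],
    [80, 90, 100],
    [70, 90, 100],
]


def differences(image, precision=8):
    """Return original minus prediction for every pixel (selection value 7)."""
    result = []
    for r, row in enumerate(image):
        out_row = []
        for col, sample in enumerate(row):
            if r == 0 and col == 0:
                prediction = 1 << (precision - 1)      # 2^(P-1), with Pt = 0
            elif r == 0:
                prediction = row[col - 1]              # Px1 = a (left)
            elif col == 0:
                prediction = image[r - 1][col]         # Px2 = b (above)
            else:
                prediction = (row[col - 1] + image[r - 1][col]) >> 1  # Px7
            out_row.append(sample - prediction)
        result.append(out_row)
    return result


def additional_bits(diff):
    """Return (SSSS, bits) for one difference value."""
    size = abs(diff).bit_length()
    if size == 0:
        return 0, ""
    if diff < 0:
        diff -= 1  # negative: subtract one, then keep the SSSS low-order bits
    return size, format(diff & ((1 << size) - 1), "0{}b".format(size))


def encode(diffs):
    """Concatenate Huffman code and additional bits for every difference."""
    stream = ""
    for row in diffs:
        for diff in row:
            size, bits = additional_bits(diff)
            stream += HUFF_DC[size] + bits
    return stream


if __name__ == "__main__":
    diffs = differences(IMAGE)
    print(diffs)
    stream = encode(diffs)
    print(stream, len(stream))
```

Under these assumptions the sketch yields the differences of Table 15 and the 59-bit stream listed in Table 16.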

SSSS  Huff  Add bits    SSSS  Huff  Add bits    SSSS  Huff  Add bits
5     110   01101       4     101   0101        4     101   0101
5     110   00001       0     00    -           4     101   1010
4     101   0101        4     101   1010        3     100   101

HUFF (DC) 11001101101010110101011100000100101101010101011011010100101 (59 bits instead of 8*9=72 bits, so the compression ratio is about 1.22 times)

Table 16: Predictive Coding (Px7) and Output Bit Stream

In this example, a square pixel sample (3*3) is encoded (Table 14) with a Selection-Value of 7. Table 15 shows the result of taking the difference between each pixel and its predictor (Px7=(a+b)/2), and Table 16 shows the encoding obtained by applying Huffman Variable Length Coding (Table 13). The rule for the additional bits is as follows: if the DPCM value is positive, append the SSSS low-order bits of the difference; if the DPCM value is negative, subtract one from the difference and append the SSSS low-order bits of the result. The additional bits are appended most significant bit first. [14]

5. Performance and Comparison

In this section, some performance metrics are measured for various encoding techniques: the PSNR value for lossy coding, and the compression ratio and CPU processing time for lossless coding. The image file types found on the Internet are mostly GIF, JPEG (lossy and lossless), and PNG; therefore, the measurements in this part emphasize these formats.

5.1 Why are GIF, JPEG, and PNG good?

• All these compression techniques are standardized. JPEG is an ISO standard, and PNG is an IETF RFC and a W3C recommendation.

• GIF files are compressed around 5:1, JPEG files are compressed 10:1 or 20:1, and PNG files are compressed around 7:1. For transmitting image files over the Internet, the smaller the file, the better.

• Almost all kinds of Web browsers support them. This is 100% true for GIF and JPEG and 99% true for PNG. [2]

5.2 Measurement

Image compression performance is measured differently for lossy and for lossless coding. In lossless coding, since the output image is exactly the same as the original image, the compression ratio is the main measure of performance; however, other parameters such as processing time and memory requirement can be compared as well. In lossy coding, PSNR (Peak signal-to-noise ratio) is often used, together with the compression ratio, to measure the quality of the image. (The higher the PSNR, the better.)

Denoting the pixels of the original image by Pi and the pixels of the reconstructed image by Qi (i = 1..n), the Mean Square Error (MSE) between the two images is defined as:

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (P_i - Q_i)^2 $$

It is the average of the squares of the errors (pixel differences) between the two images. The root mean square error (RMSE) is defined as the square root of the MSE, and the PSNR is defined as follows (for a Gray-Scale image with eight bits per pixel, the numerator is 255, and for color images only the luminance component is used):

$$ PSNR = 20 \log_{10} \frac{\max_i |P_i|}{RMSE} $$
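These definitions translate directly into code. The sketch below treats an image as a flat list of pixel values; the function names and the `peak` parameter (255 by default, as for an 8-bit Gray-Scale image) are assumptions made for this example.

```python
import math


def mse(original, reconstructed):
    """Mean of the squared pixel differences between two images."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("images must be non-empty and of equal size")
    return sum((p - q) ** 2
               for p, q in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    return 20 * math.log10(peak / math.sqrt(error))


if __name__ == "__main__":
    p = [118, 115, 115, 116]
    q = [116, 116, 116, 116]
    print(mse(p, q), psnr(p, q))
```

Identical images give an MSE of zero, so the sketch reports an infinite PSNR in that case instead of dividing by zero.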

5.3 JPEG (Lossy)

In this mode, according to [11], the results of compressing the images “black1”, “earth”, “fish”, and “sanskar” with the standard (DCT-Based) algorithm are shown in Table 17.

Image     Pixel (Dimension)   Size (bytes)   PSNR
black1    184*256             9894           60.4491
Earth     152*160             7289           50.8852
Fish      96*184              1966           53.2505
Sanskar   448*600             28098          51.5771

Table 17: PSNR result of JPEG encoding [11]

In this research (Tables 18 and 19), the quantization step multiple was varied from 0.5 to 5 to measure the PSNR and the output size (compression ratio) of the image. The larger the quantization step, the fewer bytes are used and the lower the PSNR value (lower image quality). Moreover, Table 20 compares a newer technique, Wavelet, with the DCT-Based technique; the Wavelet technique does not clearly improve performance and is even worse. This may be because vector quantization was not implemented in the Wavelet technique, and because the Wavelet coefficients were not well suited to this experiment. [11]

Quantization   Memory used      PSNR      Memory used     PSNR
Multiples      (bytes, Earth)   (Earth)   (bytes, Fish)   (Fish)
0.5            9940             51.2168   2657            53.7309
1              7289             50.8852   1967            53.2505
1.5            6475             34.2413   1792            40.1448
2              5988             29.8862   1589            35.4498
2.5            4088             24.6195   995             35.2324
3              3859             24.4436   950             34.8918
3.5            3709             23.9693   912             33.611
4              3499             22.3698   868             33.107
4.5            3581             22.2017   784             28.7631
5              2674             22.0197   707             24.5972

Table 18: PSNR result of JPEG encoding with multiple quantization values [11]

Quantization   Memory used       PSNR       Memory used        PSNR
Multiples      (bytes, Black1)   (Black1)   (bytes, Sanskar)   (Sanskar)
0.5            13627             60.725     36920              52.0864
1              9895              60.4491    28099              51.5771
1.5            8901              33.6742    25190              38.8478
2              8522              28.1311    22808              34.5129
2.5            5015              28.035     17176              34.1356
3              4726              27.6936    16311              33.6424
3.5            4545              26.455     15634              32.8696
4              4446              26.1615    14900              32.0964
4.5            3883              24.972     13859              31.7498
5              3312              23.2314    12789              31.3403

Table 19: PSNR result of JPEG encoding with multiple quantization values [11]

Files     DCT-Size (bytes)   PSNR (DCT)   Wavelet-Size (bytes)   PSNR (Wavelet)
black1    9894               60.4491      10999                  38.0456
Earth     7289               50.8852      8164                   37.2693
Fish      1966               53.2505      3253                   40.6093
Sanskar   28098              51.5771      36696                  41.423

Table 20: PSNR result of JPEG encoding by DCT versus Wavelet techniques [11]

Additionally, we conducted an extra experiment to measure the JPEG coding efficiency: “Irfanview”, a freeware JPEG encoder and decoder, was used to compress image files and compare the compression ratios. Table 21 and Figure 13 show that, at the lowest image quality, the compression ratio can be up to 256 times for JPEG (lena) and 20 times for JPEG2000.

Image      Original   JPEG (1%)   Ratio       JPEG2000 (1%)   Ratio
           (Kbytes)   (Kbytes)                (Kbytes)
Lena       769        3           256.33333   15              17.08889
pepers     769        4           192.25      15              12.81667
mandrill   769        4           192.25      16              12.01563
simpson    250        2           125         6               20.83333

Table 21: one percent quality of JPEG encoding compression ratio (at most 256 times)

Furthermore, Table 24 compares the compression ratios of several image formats: “bmp”, “tiff”, “JPEG lossy” (75%, ISO recommendation), “JPEG lossless”, “JPEG2000 lossy” (75%), “JPEG2000 lossless”, “png”, and “gif”, each relative to the original image. The original image files are in “ppm” (Portable Pixel Map, 24-bit RGB color) and “pgm” (Portable Gray Map, 8-bit Gray-Scale) format, because these formats are not compressed and need only a small header. These formats are produced by the free “pbmplus” and “netpbm” utility packages, which make it very easy to convert a file from one format to another. The standard images used in the experiment are “lena”, “mandrill”, “pepers”, “simpson”, “jpeg(text)”, “jpeg(color)”, and “icon”. The results are as follows:

• “png” (Portable Network Graphics) achieves the highest compression ratio for text or non-correlated color images, and “gif” (Graphics Interchange Format) ranks second.

• “JPEG (lossy)” is the technique with the highest compression ratio (up to 20 times for “lena”).

• In fact, we cannot conclude that one compression method is better than the others, because “gif” and “png” are lossless image compression techniques (256 colors) while JPEG (2^24 colors) is lossy; however, when “JPEG/JPEG2000 lossless” is compared with “gif” and “png”, the compression ratios are much the same, except for text images.

• “png” achieves the highest compression ratio for the icon.

• The compression ratios of “JPEG (lossless)” and “JPEG2000 (lossless)” for color images are almost the same (up to 2.98 times for “simpson”).

• For “JPEG (lossy)”, the compression ratio for color images is better than the compression ratio for Gray-Scale images.

• Because “tiff” (Tagged Image File Format) and “bmp” (Windows Bitmap) are 32-bit CMYK and 24-bit uncompressed color image representations, their compression ratios are almost 1:1.

5.4 Lossless (JPEG and GIF-LZW)

In this mode, according to [5], the results of comparing the standard algorithms JPEG lossless, LZW, Gzip, and Pack are shown in Table 22.

Compression Program            Lena   Football   F-18   Flowers
Lossless JPEG                  1.45   1.54       2.29   1.26
Compress (LZW)                 0.86   1.24       2.21   0.87
Gzip (Lempel-Ziv)              1.08   1.36       3.10   1.05
Gzip -9 (optimal Lempel-Ziv)   1.08   1.36       3.13   1.05
Pack (Huffman Coding)          1.02   1.12       1.19   1.00

Table 22: Compression ratio comparison among different compression program [5]

JPEG (lossless) mostly achieves the highest compression ratio when compared with the other methods; however, this technique consumes a lot of CPU time (Table 23). In this research, all programs were compiled with “gcc -O2” and timed on a SUN SPARC 10 with 32 MB of memory using the UNIX “time” program. [5]

Figure 12 shows the relationship between the PSV value and the compression ratio. According to this research, the more information the predictor uses, the better the compression ratio tends to be. Among the three one-dimensional PSVs (1, 2, and 3), PSV 3 uses the farthest neighbor, the upper-left pixel, and gives the worst compression ratio. PSV 7, a two-dimensional PSV, gives a better ratio. Finally, the three-dimensional PSVs (4, 5, and 6) have similar prediction functions, so their ratios are quite similar; however, PSV 4 achieves the highest compression ratio. [5]

Compression Program            Encoding CPU time (sec)   Decoding CPU time (sec)
Lossless JPEG                  5.2 (±0.4)                1.9 (±0.3)
Compress (LZW)                 2.0 (±0.6)                1.1 (±0.3)
Gzip (Lempel-Ziv)              4.9 (±1.5)                0.5 (±0.1)
Gzip -9 (optimal Lempel-Ziv)   10.1 (±5.8)               0.5 (±0.1)
Pack (Huffman Coding)          0.8 (±0.1)                1.8 (±0.4)

Table 23: CPU consumption comparison among different compression programs [5]

Figure 12: The average compression ratio for the Seven PSVs [5]
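To make the PSV comparison more concrete, the following is a minimal Python sketch of the seven lossless JPEG predictors, assuming the predictor definitions of Recommendation T.81 [14], where a is the left neighbor, b is the upper neighbor, and c is the upper-left neighbor of the current pixel. The function names `predict` and `residuals`, the 8-bit precision, the simplified border handling, and the small synthetic test image are illustrative assumptions and are not taken from [5].

```python
def predict(psv, a, b, c):
    """Predict a pixel from its left (a), upper (b), and upper-left (c) neighbors."""
    if psv == 1:
        return a
    if psv == 2:
        return b
    if psv == 3:
        return c
    if psv == 4:
        return a + b - c
    if psv == 5:
        return a + ((b - c) >> 1)
    if psv == 6:
        return b + ((a - c) >> 1)
    if psv == 7:
        return (a + b) >> 1
    raise ValueError("PSV must be between 1 and 7")


def residuals(image, psv, precision=8):
    """Return prediction errors in raster order for a 2-D list of pixel values.

    Border handling (an assumption of this sketch): the first pixel is
    predicted by 2^(precision-1), the rest of the first row by the left
    neighbor, and the first column by the upper neighbor.
    """
    out = []
    for r, row in enumerate(image):
        for col, x in enumerate(row):
            if r == 0 and col == 0:
                pred = 1 << (precision - 1)
            elif r == 0:
                pred = row[col - 1]
            elif col == 0:
                pred = image[r - 1][col]
            else:
                pred = predict(psv, row[col - 1], image[r - 1][col],
                               image[r - 1][col - 1])
            out.append(x - pred)
    return out


# A synthetic 4x4 "plane" image: brightness grows linearly in both directions.
plane = [[10 * r + 3 * c for c in range(4)] for r in range(4)]

# Sum of absolute prediction errors for each PSV; smaller sums generally
# mean cheaper entropy coding of the residuals.
costs = {psv: sum(abs(e) for e in residuals(plane, psv)) for psv in range(1, 8)}
for psv, cost in costs.items():
    print(f"PSV {psv}: total absolute residual = {cost}")
```

On this synthetic plane, PSV 4 predicts every interior pixel exactly and PSV 3 leaves the largest residuals, which is in line with the trend reported in [5]; a single toy image is of course not evidence about real photographs.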

Image           Dimension  Original size  bmp     tiff    JPEG     JPEG        JPEG2000  JPEG2000    png      gif
                           (ppm/pgm, KB)                  (75%)    (lossless)  (75%)     (lossless)
JPEG(text)      200*320    63             0.9844  0.9692  5.2500   6.3000      4.5000    4.5000      21.0000  15.7500
JPEG(color)     200*320    188            1.0021  1.0005  16.7857  6.9630      7.5200    6.7143      75.2000  52.2222
lena(Gray)      512*512    257            0.9961  0.9961  7.7879   1.7020      5.5870    1.8489      1.1682   0.9809
lena            512*512    769            1.0000  0.9987  20.7838  1.6502      11.6515   1.7398      1.6362   3.7330
pepers(Gray)    512*512    257            0.9961  0.9961  7.5588   1.6369      4.5088    1.7365      1.1422   0.9735
pepers          512*512    769            1.0000  0.9987  19.7179  1.8009      16.0208   2.3445      1.8397   4.4971
mandrill(Gray)  512*512    257            0.9961  0.9961  3.7794   1.2537      2.2544    1.3112      1.0983   0.8538
mandrill        512*512    769            1.0000  0.9987  10.2533  1.2304      3.6619    1.3259      1.2524   3.2042
simpson(Gray)   340*250    84             0.9882  0.9882  7.0000   2.2703      5.2500    2.8000      2.5455   2.0000
simpson         340*250    250            1.0000  1.0000  17.8571  2.6882      10.4167   2.9762      4.0984   7.5758
icon            36*36      4              1.0000  0.8000  2.0000   2.0000      2.0000    1.3333      4.0000   2.0000

Table 24: The compression ratio of various images with different compression techniques
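The ratios in Table 24 are the original file size divided by the compressed file size, so a value below 1 means the "compressed" file is larger than the original. As a rough illustration of why lossless dictionary-based formats do very well on flat text or icon images and poorly on noisy photographic content, the sketch below applies Deflate (the algorithm underlying both png and gzip) through Python's `zlib` module to two synthetic 320*200 gray-scale buffers. The buffers, the helper name `compression_ratio`, and the use of raw `zlib` instead of real png or gif encoders are assumptions of this sketch, so the numbers it prints are not comparable to the table.

```python
import random
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Original size divided by compressed size (larger is better)."""
    return len(original) / len(compressed)


WIDTH, HEIGHT = 320, 200

# A completely flat image: every pixel is white, as in a blank page.
flat = bytes([255]) * (WIDTH * HEIGHT)

# A noise image: pixels are uncorrelated, a worst case for lossless coding.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(WIDTH * HEIGHT))

flat_ratio = compression_ratio(flat, zlib.compress(flat, 9))
noisy_ratio = compression_ratio(noisy, zlib.compress(noisy, 9))

print(f"flat image ratio:  {flat_ratio:.2f}")
print(f"noisy image ratio: {noisy_ratio:.4f}")
```

The flat buffer shrinks by a large factor, while the noisy buffer does not shrink at all, which mirrors the gap between the text/icon rows and the gray-scale photograph rows for png and gif.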

Figure 13: Three image examples with 1 percent JPEG DCT-Based quality (“lenay.jpg”, “pepers.jpg,” and “mandrill.jpg”)

Figure 14: Example image comparison (original: left, JPEG (75%): right) for “lena”, “mandrill”, “pepers”, “simpson”, “JPEG(text)”, “JPEG(color)”, and “icon”

6. JPEG2000

Recently, a new international standard for image compression, “JPEG2000”, has been released by the International Organization for Standardization (ISO). Instead of the DCT technology used in the JPEG standard, it uses wavelet technology with coefficient adjustment. It usually provides higher compression (up to 20 percent or more) and superior image quality, and it also offers a lossless mode without any distortion or loss.

JPEG2000 Advantages [6]

• Higher compression without compromising quality.
• Progressive image reconstruction, which allows the full image to be displayed even during the transmission process.
• Region of interest: important areas can be specified to be compressed at a higher quality than the rest of the image.
• Lossy and lossless compression: one can choose to lose some quality in exchange for a small file size, or to preserve all the quality while still saving a lot of space.
• Error resilience for noisy channels, which allows the transmission of JPEG 2000 images in mobile applications.
• No blocking artifacts at very high compression.
• The JP2 file format (.jp2) supports XML-based metadata.
• Easier random access to the code stream, so that data can be reorganized while it is being transmitted.

JPEG2000 is the new image encoding standard that provides essential features to many emerging imaging applications. Some of the JPEG/JPEG2000 application examples are described below.

• Mobile image applications
• Medical imaging
• Scanners
• Digital cameras and photography
• Satellite imagery
• Document and image storage

For video image processing (MJPEG)

• Internet video and imaging, and zoom image applications
• CD-ROM video distribution
• Videoconferencing
• Video capture systems
• Medical video streaming
• Video motion detection and object tracking systems

7. JPEG Application (Motion JPEG-MJPEG)

MJPEG (M-JPEG) stands for “Motion-JPEG”, an encoding technique that simply compresses each video frame independently as a JPEG image. Because each image is compressed on its own, every frame has the same guaranteed quality. (Strictly speaking, M-JPEG is not a formal standard. Different vendors apply JPEG to the individual frames of a video sequence, and each has done it differently. As a result, MJPEG files from different vendors may not be compatible.)

Figure 15: Example of a sequence of three complete MJPEG images [1]

Figure 16: Example of a sequence of three complete MPEG images [1]

In Figure 15, each independent JPEG image is sent in full detail, including the background. In contrast, Figure 16 illustrates “MPEG”, one of the best-known audio and video streaming standards, in which the images are sent as different kinds of frames: a first reference frame, followed by only the data that differs in the other frames.

The MPEG concept is to compress images across frames (inter-frame) for transmission over the network. It uses a reference frame, the I-frame (intra-coded picture, equivalent to a JPEG image), which is coded independently of the other types and inserted at regular intervals; the P-frame (predictive-coded picture, coded relative to the preceding I or P picture); and the B-frame (bidirectionally predictive-coded picture, coded relative to both the preceding and the following I or P pictures). B and P frames carry the differences between images (Figure 17). During reconstruction, the decoder builds each image from the I-frame and the difference data from the other frames. [9]
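The contrast between coding every frame on its own (the MJPEG approach) and coding a reference frame plus differences (the basic idea behind MPEG) can be sketched in a few lines of Python. This is a deliberately simplified illustration: `zlib` stands in for the JPEG/DCT stage, differences are taken byte by byte against the previous frame, and there are no motion vectors or B-frames. The function names, the 64*64 frame size, and the synthetic frames (a noisy static background with a small moving block) are all assumptions of the sketch.

```python
import random
import zlib

WIDTH, HEIGHT = 64, 64


def make_frames(count, seed=0):
    """Build frames with a fixed noisy background and a 4x4 block moving right."""
    rng = random.Random(seed)
    background = bytes(rng.getrandbits(8) for _ in range(WIDTH * HEIGHT))
    frames = []
    for t in range(count):
        frame = bytearray(background)
        for row in range(10, 14):
            start = row * WIDTH + 4 * t
            frame[start:start + 4] = b"\xff" * 4
        frames.append(bytes(frame))
    return frames


def encode_intra(frames):
    """MJPEG-style: compress every frame independently."""
    return [zlib.compress(f) for f in frames]


def encode_inter(frames):
    """MPEG-style idea: one reference frame, then differences to the previous frame."""
    chunks = [zlib.compress(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        diff = bytes((c - p) % 256 for p, c in zip(prev, cur))
        chunks.append(zlib.compress(diff))
    return chunks


def decode_inter(chunks):
    """Rebuild all frames from the reference frame and the stored differences."""
    frames = [zlib.decompress(chunks[0])]
    for chunk in chunks[1:]:
        diff = zlib.decompress(chunk)
        frames.append(bytes((p + d) % 256 for p, d in zip(frames[-1], diff)))
    return frames


frames = make_frames(5)
intra = encode_intra(frames)
inter = encode_inter(frames)

print("intra-coded bytes:", sum(len(c) for c in intra))
print("inter-coded bytes:", sum(len(c) for c in inter))
```

In this sketch each intra-coded chunk can be decoded without any other chunk, whereas every inter-coded chunk after the first depends on all the chunks before it. That dependency is what saves space when the background is static, and it is also why a lost chunk affects later frames until the next reference frame arrives.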

Figure 17: Group of pictures in intraframe coding (MPEG) [1] (frame sequence shown: I B B P B B P B B I B B P)

                       MJPEG               MPEG-1                     MPEG-2              H.263
Target bit rate        N/A*                About 1.5 Mbit/sec         2 - 15 Mbit/sec     64, 128, 192 kbit/sec up to approx 2 Mbit/sec
Supported frame rates  Camera / Video      25/30 fps                  25/30 fps           Any, up to 30 fps
                       Server dependent
Resolution             Any                 320 x 288                  320 x 288           352 x 288
                                           320 x 240                  320 x 240
                                                                      720 x 576
                                                                      720 x 480
Image quality          Low to Very good    Good                       Very good           Low
Target application     Still images        Digital video on CD (VCD)  DVD, HDTV           Tele-conference
Basic algorithm        DCT                 DCT with motion vectors (MPEG-1, MPEG-2, and H.263)
Standard               ISO/IEC 10918       ISO/IEC 11172              ISO/IEC 13818       ITU-T H.263

Table 25: A comparison of some of the most common compression methods [1]

* “Since the MJPEG and MJPEG 2000 standards are primarily for still image compression techniques, they do not have set limits on frame rate, image resolution, image quality or target bit rates. MJPEG bit rate is dependent on available bandwidth and transfer capacity of the camera or video” [1]

MJPEG Advantages

• Because there is no inter-frame compression as in MPEG, the transmission latency is quite low, which is good for live video, video motion detection, and object tracking. The image quality therefore depends mostly on the speed and quality of the codec and on the transmission capacity.
• JPEG compression is very cheap to implement in hardware because of its low complexity; low-cost coding and decoding (less computation) can therefore be achieved. As a result, the compression process is faster and simpler.
• There is no limitation on video frame size (compare MPEG1: 352*240; MPEG2: 640*480 and 720*480), and the image quality remains constant regardless of movement or image complexity.
• When packets are lost, recovery is quick and easy because the images are independent.
• Recently, a new version of MJPEG called MJPEG-2000 has been released. The DCT compression in JPEG is replaced by a wavelet compression technique; hence, the quality and compression ratio are improved.
• No licensing fees.

Figure 18: Variable Bit Rate (MPEG4 and MJPEG) [1]

MJPEG Disadvantages

• High bandwidth consumption and high storage requirements at frame rates > 5 fps (Figure 18). [1]
• M-JPEG itself does not implement video compression (only image compression), so it generates a relatively large amount of image data that must be sent across the network.
• There is not much software support.

M-JPEG Application (Example)

The study in [3] compares a selection of ultrasound studies at various levels of MPEG and MJPEG compression in order to quantify and compare the perceptible image losses. In Figure 19, “Agilent_03” is a 3-second sequence originally derived from a DICOM source file, sized 600 x 430 and black-padded to 640 x 480. The source is a typical monochrome four-chamber view with valve motion.

Figure 19: Typical image frame from Agilent_03 sequence [3]

After a test comparing M-JPEG and MPEG, the results of the study are shown in Table 26. “For a given quality factor, MPEG typically outperformed JPEG and achieve a higher compression ratio. However, in both sequences, the MPEG process was not able to achieve error rates as low as JPEG processing regardless of compression ratio.” [3]

Table 26: Compression Sweet-Spots for Acuson_03 Sequence [3]

8. Conclusion

Since images are the main component consuming Internet capacity, image compression is a very useful technique, especially now that the cost of CPU processing and memory keeps falling while the cost of transmission stays almost the same. JPEG is the image compression standard used for this purpose; it supports not only transmission but also various other kinds of applications, for example mobile imaging, medical imaging, and document and image storage. Furthermore, a sequence of JPEG images, MJPEG, can be used to implement a high-quality video stream, as the example shows.

In this report, the intention is to explain the JPEG idea through clear examples: the JPEG model, DCT, Quantization, and Entropy Coding for lossy mode, and Predictive Coding for lossless mode. Some performance comparisons among image formats have also been explored. It cannot be said that every step in JPEG is perfect or suitable for every image. Some steps could be improved, for example the Entropy Coding; by understanding the basic idea of JPEG, new image compression techniques could perhaps be developed in the future (at this time, JPEG2000 has recently been released, but there is not much software support for it). Basically, when image compression is needed, JPEG is generally the best choice, although the acceptable quality depends on the viewer's satisfaction. PNG/GIF is good for text image compression. In our experiment, JPEG2000 did not show an obvious improvement in compression performance.

9. References

1. Axis Communications Company. Digital Video Compression, Reviewing the Methodologies and Standards to Use for Video Transmission and Storage. Axis White Paper, June 2004.
2. C. Bob and Kelly. Why are GIF, JPEG, and PNG good? web-building.crispen.org/formats/, 2004.
3. F. Aaron. MPEG-vs-JPEG Compression for Medical Images: A Qualitative Comparison for WG12. VMI Medical Inc, Jan 2001.
4. H. Andy C. PVRG-JPEG CODEC 1.1. Portable Video Research Group (PVRG), Stanford University, November 1993.
5. H. Kongji. Experiments with a Lossless JPEG Codec. Master Thesis from Cornell University, Sep 1994.
6. Martin F. Arlitt and Carey L. Williamson. Web Server Workload Characterization: The Search for Invariants (Extended Version). ACM SIGMETRICS Conference, March 1996.
7. Morgan Multimedia. More about JPEG. www.morgan-multimedia.com, Nov 2004.
8. M. Sean and C. kc. Trends in Wide Area IP Traffic Patterns. Caida White Paper / ITC Specialist Seminar, Sept 2000.
9. N. Mark. The Data Compression Book 2nd. M&T Book, 1996.
10. P. William B and M. Joan L. JPEG still image data compression standard. Van Nostrand Reinhold, 1993.
11. S. Avinash. Investigation of Variations to The JPEG STILL IMAGE COMPRESSION STANDARD to improve compression ratio. Master Thesis from University of Missouri-Rolla, 2000.
12. S. David. Data Compression 3rd. Springer-Verlag, 2004.
13. S. Khalid. Introduction to Data Compression 2nd. Morgan Kaufmann Publishers, 2000.
14. Standardization of Terminal Equipment and Protocols for Telematic Services. Information Technology Digital Compression and Coding of Continuous-Tone Still Images Requirements and Guidelines. CCITT Recommendation, Recommendation T.81, Sep 1992.
15. Tom Lane. JPEG image compression FAQ. Independent JPEG Group, June 2004.
16. Wallace, G. The JPEG Still Picture Compression Standard. IEEE Transactions on Consumer Electronics, Dec 1991.
