disp.ee.ntu.edu.twdisp.ee.ntu.edu.tw/tutorial/file/tutorial_2/CH_2.doc · Web viewThe DCT-Based compression method is known as the JPEG Baseline, which is the most widely used image

Chapter 2 Existing Image Coding Standards

After introducing the basics of each component in image compression flow chart, it is time to introduce two of the most well-known image compression standards JPEG and JPEG 2000. In this section, we will introduce the basic functional blocks in these two image compression standards.

2.1The JPEG Standard

JPEG is the first international digital image compression standard for still images, both grayscale and color, and many of the later image and video compression standards are mainly based on the basic structure of JPEG. The JPEG standard defines two basic compression methods. One is the DCT-Based compression method, which is the lossy compression, and another is the Predictive compression method, which is the lossless compression. The DCT-Based compression method is known as the JPEG Baseline, which is the most widely used image compression format in the World Wide Web and digital camera. In this subsection, we will introduce both the lossy and lossless JPEG compression methods.

2.1.1 The JPEG Encoder

Fig. 2.1 shows the DCT-Based JPEG encoder. The YCbCr color transform and the chrominance downsampling is not defined in the JPEG standard, but most JPEG compression software will perform these processing because it makes the JPEG encoder to reduce the data quantity more efficiently. The downsampled YCbCr passing through the JPEG encoder will be encoded as bitstream afterward.

1

Image Compression Tutorial

Fig. 2.1 The DCT-Based encoder of JPEG compression standard

2.1.2 Transform Coding – Discrete Cosine Transform

As we know, the natural image contents change slowly across the image in a small region, so the JPEG encoder will segment the original image into several small 8×8 blocks, which are called Macroblocks. The encoder will perform DCT on these macroblocks and the frequency transform coefficients will be obtained. Because the transform are performed on the 8×8 blocks, the forward DCT formula in can be rewritten as follows

2.1.3 Quantization

The second step after DCT is quantization. As just mentioned, the human visual is more sensitive to the low frequency components than the high frequency components, so the quantization will assign large quantization step size to the high frequency components to discard the redundant information and reduce the data quantity and assign small quantization step size to the low frequency components to save the significant information. The JPEG standard defines two quantization tables, one for the luminance and another for the chrominance. The luminance quantization table and chrominance quantization table are defined in and , respectively.

2


With the quantization table, we can encode the DCT coefficients as

The quantization step side is smaller in the left top direction because the low frequency components locate. The right bottom direction components are assigned larger quantization step size because the high frequency components locate. On the other hand, it can be observed that most of the high frequency quantization step sizes for chrominance are large and the same, this is due to the characteristics of the human visual system. The human visual system is more sensitive to the luminance information than the chrominance information, so the distortion of the high frequency chrominance components is not detected by the human eyes easily. Thus, in general, we will assign large quantization step size to the high frequency components in the chrominance signal.

2.1.4 Differential Coding of the DC Components3


After quantization, we will separate the DC components from the AC components because large correlation still exists between the DC components in the neighboring macroblocks. Assume the DC component in i-th macroblock is DCi and the DC component in the previous block is DCi-1. One example is shown in Fig. 2.2.

Fig. 2.2 Differential Coding of the DC components in the neighboring blocks

We will perform differential coding on the two DC components as follows

We set DC0 = 0. DC of the current block DCi will be equal to DCi-1 + Diffi . Therefore, in the JPEG file, the first coefficient is actually the difference of DCs. Then the difference is encoded with the AC coefficients.

2.1.5 Zigzag Scanning of the AC Coefficients

However, the quantizated and differential coded coefficients are 2D signal and the structure seems not suitable for the entropy encoder. Thus, the JPEG standard perform zigzag scanning on the AC coefficients to convert it to the 1D signal and order the sequence from low frequency to high frequency. The zigzag scan order is illustrated in Fig. 2.3. It is worth noting that the zigzag scanning is performed on the 63 AC coefficients only, exclusive of the DC coefficient.

Fig. 2.3 Zigzag scanning order

4


2.1.6 Run Length Coding of the AC Coefficients

Because we assign larger quantization step size to the high frequency components, most of the high frequency components will be truncated to zero while coarse quantization (strong quantization) is applied. In order to take advantage of this property, the JPEG standard defines run length coding to convert the AC coefficients to more efficient format for the entropy encoder. The RLC step replaces the quantized values by

where RUNLENGTH is the number of zeros, and VALUE is the nonzero coefficients.

For example, assume the quantized 63 AC coefficients are as follows

Then we can perform RLC on this vector and we will have

where EOB means that the remaining coefficients are all zeros.

2.1.7 Entropy Coding

The DC and the AC coefficients after run length coding finally undergo the entropy coding. Because it takes large computation to grow the optimal Huffman tree for the input image and the dynamic Huffman tree must be included in the bitstream, the JPEG defines the two Huffman tables for the DC and AC coefficients, respectively. These Huffman coding tables save large amount of computation to encode the images and the Huffman tree is not necessarily included in the encoded bitstream because the default Huffman tree defined in the JPEG standard is fixed. The codeword table for the value of the DC and AC coefficients is shown in Table 2.1.

Table 2.1 The codeword table for VALUE of the coefficientsValues Bits for the value Size

0 0

-1,1 0,1 1

-3,-2,2,3 00,01,10,11 2

5


-7,-6,-5,-4,4,5,6,7 000,001,010,011,100,101,110,111 3

-15,...,-8,8,...,15 0000,...,0111,1000,...,1111 4

-31,...,-16,16,...31 00000,...,01111,10000,...,11111 5

-63,...,-32,32,...63 000000,...,011111,100000,...,111111 6

-127,...,-64,64,...,127 0000000,...,0111111,1000000,...,1111111 7

-255,..,-128,128,..,255 ... 8

-511,..,-256,256,..,511 ... 9

-1023,..,-512,512,..,1023 ... 10

-2047,...,-1024,1024,...,2047 ... 11

Each differential coded DC coefficients will be encoded as

where SIZE indicates how many bits are needed to represent the VALUE of the coefficient, and VALUE is the actual encoded binary DC coefficients. SIZE must be Huffman coded while VALUE is not.

For example, if we have four DC coefficients 150, 5, -6, 3, then the DC coefficients will be turned into according to Table 2.1.

After the value of SIZE is obtained, we encode them with the Huffman table defined in Table 2.2, so we can finally encode all the DC coefficients as

Table 2.2 The Huffman table for the SIZE information of the luminance DC coefficientsSize Values Code Length

0 00 2

1 010 3

2 011 3

3 100 3

4 101 3

5 110 3

6 1110 4

7 11110 5

6


8 111110 6

9 1111110 7

10 11111110 8

11 111111110 9

The AC coefficients can be encoded with similar Huffman Table. The Huffman table for the AC coefficients is shown in Table 2.3. The codeword format of the AC coefficients is defined as follows

Table 2.3 The Run/Size Huffman table for the luminance AC coefficientsRun/Size code length code word

0/0 (EOB) 4 1010

15/0 (ZRL) 11 11111111001

0/1 2 00

...

0/6 7 1111000

...

0/10 16 1111111110000011

1/1 4 1100

1/2 5 11011

...

1/10 16 1111111110001000

2/1 5 11100

...

4/5 16 1111111110011000

...

15/10 16 1111111111111110

With Table 2.1, the AC coefficients after RLC in can be encode as follows

Finally we must encode the Run/Length data in with the Huffman table defined in Table 2.3, so the final bitstream for the AC coefficients can be encoded as

7


2.1.8 The JPEG Decoder

The encoded bitstream can be converted back to the spatial image coefficients through an inverse process. The structure of the JPEG decoder is shown in Fig. 2.4. It is easy to reconstruct the image at the decoder.

Fig. 2.4 The DCT-Based decoder of JPEG compression standard

However, the quantization operation is a many-to-one mapping, so the compression in baseline JPEG is lossy operation. The reconstructed image is just an approximation of the original image. On the other hand, the JPEG segment the original image into several macroblocks and perform transform coding and quantization on each blocks. This will introduce the very serious artifact called blocking effect, which results from coarse quantization that discards most of the high frequency components of each segmented macroblock. One example of the blocking effect is shown in Fig. 2.5.

8


Fig. 2.5 (a) The original image (b) The highly compressed image

2.2The Lossless JPEG

In addition to the lossy DCT-Based compression method, the JPEG standard also defines the lossless encoder for compressing the image and achieving perfect reconstruction. The encoder of the lossless JPEG is shown in Fig. 2.6.

Fig. 2.6 The encoder of the lossless JPEG

2.2.2 Predictor of Lossless JPEG

In order to illustrate the predictor in lossless JPEG, we must refer to Fig. 2.7.

Fig. 2.7 3-Sample Prediction Neighborhood

9


The syntax X represents the pixel to be predicted, and A, B, C are its neighboring pixels. The function of the predictor is defined in Table 2.4, and we can choose one from the eight prediction modes. The predictive value X will be subtracted from the actual value of X afterward. It is worth noting that the sample X can be predicted from the left and upper samples because of the decoding order. At the decoder, we will decode the coefficients from left to right and from top to bottom. Therefore, we cannot predict the sample X from the samples in the right and lower sides, which are not being decoded yet.

Table 2.4 The function of the predictorSelection Value Prediction

0 No prediction1 A2 B3 C4 A+B-C5 A+[(B-C)/2]6 B+[(A-C)/2]7 (A+B)/2

The residue of the sample X will be encoded with the Huffman table defined in the JPEG standard afterward. Since there is no quantization operation in the lossless JPEG and the predictor functions are composed of the ADD, SUBTRACT and SHIFT(divided by 2), there is no information loss in the lossless JPEG.

2.3The JPEG 2000 Standard

With the expansion of the high requirements of multimedia and Internet applications, the JPEG cannot fulfill the demand gradually. Therefore, the new call for the new standard JPEG 2000 was launched to solve many increasing shortcomings of the existing image compression standards. The features of JPEG 2000 include: (1) Efficient lossy and lossless compression with the same coding platform. (2) Superior image quality. (3) Additional features such as spatial scalability and region of interest. The JPEG 2000 encoder and decoder are shown in Fig. 2.8 and Fig. 2.9, respectively. We will introduce each functional blocks in the rest of this subsection.

10


Fig. 2.8 The block diagram of the JPEG 2000 encoder

Fig. 2.9 The block diagram of the JPEG 2000 decoder

2.3.1 Forward Component Transform

As mentioned in the previous sections, we must perform color space transform to separate the luminance components from the chrominance components for efficient processing, and the forward component transform is aim to accomplish this work. In JPEG 2000, two types of component transform are defined: the reversible component transform (RCT) and the irreversible component transform (ICT). The RCT is reversible and integer-to-integer, while the ICT is irreversible and real-to-real. The forward RCT is defined as and we can find that it is floating-point transform. Since the RCT is irreversible, it can be used in the lossy compression only.

where , , and are the red, green, and blue planes,

respectively. , , and are the Y, Cb, and Cr planes, respectively.

The ICT is defined in to . It is obvious that the transform is integer transform, so it can be applied to both lossy and lossless transform.

11


Where , , , , , and are defined the

same as in .

2.3.2 Discrete Wavelet Transform

In JPEG 2000, the Discrete Wavelet Transform is exploited to reduce the correlation between pixels. Before DWT, each image components will be partitioned into several tiles as shown in Fig. 2.10. The tiling operation will result into the loss of the image quality, but each tile can be encoded and decoded separately. This property makes JPEG 2000 enable the applications of region of interest (ROI), which can provide higher quality in some region we are interested in. On the other hand, the arithmetic codes suffer from the noise in the channel, and the tiling operation enable the decoder to reconstruct the image even parts of the tiles cannot be recovered.

Fig. 2.10 Tiling, and DWT of each image tile

After tiling, we perform 2D DWT on each tile to split it into several frequency subbands. The DWT used in JPEG 2000 can be reversible or irreversible. The default irreversible DWT defined in JPEG 2000 is the Daubechies 9/7 filter, and the analysis and synthesis filter coefficients are listed in Table 2.5. Since the 9/7 filter coefficients are real, the forward and inverse transform are both irreversible, so it can be applied in lossy compression only. On the other hand, the reversible DWT defined in JPEG 2000 is the 5/3 filter, and the analysis and synthesis filter coefficients are listed in Table 2.6. Because the 5/3 filter in JPEG 2000 is reversible, it can be applied in both lossy and

12


lossless compression.

Table 2.5 Daubechies 9/7 analysis filter coefficients

Analysis Filter Coefficients Synthesis Filter Coefficients

nLowpass Filter Highpass Filter Lowpass Filter Highpass Filter

0 0.602949018236 1.115087052456 1.115087052456 0.6029490182363

±1

0.266864118442 -0.059127176311 0.591271763114 -0.2668641184428

±2

-0.078223266528 -0.057543526228 -0.057543526228 -0.0782232665289

±3

-0.0168641184428 0.091271763114 -0.0912717631142 0.0168641184428

±4

0.026748757410 0.0267487574108

Table 2.6 5/3 analysis filter coefficients

Analysis Filter Coefficients Synthesis Filter Coefficients

nLowpass Filter Highpass Filter Lowpass Filter Highpass Filter

0 6/8 6/8 1 6/8

±1

2/8 2/8 1/2 -2/8

±2

-1/8 -1/8 -1/8

2.3.3 Quantization

After transform coding, we must perform quantization to reduce the precision of the data. In JPEG 2000, different quantization is performed on different subbands of each tile. The largest characteristic of the JPEG 2000 quantization is that it can achieve lossless and lossy quantization by specifying different parameters. We represent the wavelet coefficient in subband b as , and we can quantize it as

13


where is the quantization step size and . is the nominal

dynamic range of subband b, which depends on the largest value in that subband. The exponent and the mantissa are the other two parameters to adjust the strength of quantization.

We can achieve lossy and lossless quantization in JPEG 2000 by means of adjusting the parameters

1. Lossless Quantization: If , , then the step side . Thus, lossless quantization can be achieved.

2. Lossy Quantization: If lossy quantization is applied, then we must specify

and in the following ranges - and .

2.3.4 Tier-1 Encoder

The entropy encoder adopted in JPEG 2000 is called Embedded Block Coding with Optimized Truncation, abbreviated EBCOT. We can divide the operation of EBCOT into two steps: Tier-1 and Tier-2. The Tier-1 encoder is shown in Fig. 2.11, and it is composed of three parts: bit-plane conversion, context formation, and the arithmetic coding. Each part of the tier-1 encoder will be introduced in the rest of this sub-section.

Fig. 2.11 The block diagram of the tier-1 encoder

1) Bit-plane Conversion

This is the first part of the tier-1 encoder, and it converts the quantized wavelet coefficients into several bit-planes. The first bit-plane is the sign plane, which is composed of the sing bit of each wavelet coefficients. The other planes are the magnitude plane, from MSB to LSB. The bit-plane conversion is shown in the left of

14


Fig. 2.12. After the bit-planes are obtained, we must determine the significance of every bit. We take the right figure of Fig. 2.13 for example. A bit is viewed as insignificant before the first nonzero bit is met from MSB to LSB, which is marked white; otherwise it is viewed as significant, which is marked gray. The bit 1 marked green is the first nonzero bit at position [x,y].We define the significance of the bit in (x,y) in bit-plane P as vP[x,y], and vP[x,y]=1 if it is significant; else vP[x,y]=0.

Fig. 2.12 Bit-plan scanning and the significant and insignificant bits

2) Fractional Bit-plane Coding

After the significance of each bit is marked, the encoder can generate the contextual information for each bit. With the contextual information, the arithmetic encoder can compress the data more efficiently. Before encoding, the each subband will be partitioned into several non-overlapping rectangles called code blocks. After the code blocks are obtained, the data in each code block can be encoded separately from the other code blocks. The order of the coding in each code block is shown in Fig. 2.13, the scanning order is from top to down, from left to right, and from the higher bit-plane to the lower bit-plane. After 64 bits are scanned in the code block, the coding position will jump to the next four bits in the vertical direction, and repeat the same scanning scheme.

Fig. 2.13 The scanning order of the bit to be coded in a code block15


After we know the scanning order of the bits, we can generate the contextual information based on the adjacent bits around it. The adjacent bits will form a context window as shown in Fig. 2.14, and the bit marked X is the current one to be coded. The contextual information is generated by four coding methods: zero coding, sign coding, magnitude refinement coding, and run length coding. Each coding will generate the Context (CX) and the Decision (D), which are used for arithmetic coding. The four coding methods needs two variables σ and φ, and both of them are initialized to 0 in each bit. σ[x,y] is 1 after the first nonzero bit at [x,y] is met, and φ[x,y] is just the sign bit at position [x,y] in the bit plane.

Fig. 2.14 The context window of JPEG 2000

a) Zero Coding: In zero coding, the decision D is equal to vP[x,y], and the context CX is determined by the context tables and what type of the subband it locates. The context table for zero coding is listed in Table 2.7. Σ H is the sum of the significance of the two horizontal adjacent bits, Σ V is the sum of the significance of the two vertical adjacent bits, and Σ D is the sum of the significance of the four diagonal adjacent bits.

Table 2.7 The context table for zero coding in JPEG 2000.

LL and LH subbands HL subbands HH subbands CX

2 x x x 2 x x ≥ 3 81 ≥ 1 x ≥ 1 1 x ≥ 1 2 71 0 ≥ 1 0 1 ≥ 1 0 0 61 0 0 0 1 0 ≥ 2 1 50 2 x 2 0 x 1 1 40 1 x 1 0 x 0 1 30 0 ≥ 2 0 0 ≥ 2 ≥ 2 0 20 0 1 0 0 1 1 0 10 0 0 0 0 0 0 0 0

Note: x in this table denotes “do not care”

16


b) Sign Coding: In sign coding, the two parameters H and V are defined as

With these two parameters H and V, we can determine context CX by looking up Table 2.8. On the other hand we can obtain a new variable from Table 2.8, and the decision D in sign coding is defined as , where denotes XOR operation.

Table 2.8 The context table for sign coding in JPEG 2000

H V CX

1 1 0 131 0 0 121 -1 0 110 1 0 100 0 0 90 -1 1 10-1 1 1 11-1 0 1 12-1 -1 1 13

c) Magnitude Refinement Coding: In magnitude refinement coding, the decision D is equal to vP[x,y]. On the other hand, the context CX is determined by σ′ and the neighboring bits around it. σ′[x,y] is initialized to 0, and it will become 1 after the first time of the magnitude refinement coding is met at [x,y]. The context table for magnitude refinement coding is defined in Table 2.9.

Table 2.9 The context table for the magnitude refinement coding

σ′[x,y]σ [x-1,y]+σ [x+1,y]+σ [x-1,y-1]+

σ[x-1,y+1]+σ [x+1,y-1]+σ [x+1,y+1]CX

1 x 16

17


0 ≥ 1 150 0 14

d) Run-Length Coding: In run length coding, the four consecutive four bits in one stripe is coded at the same time. The pair (CX,D) is (17,0) when all the four bits are 0, else (CX,D) is (17,1). In the case that at least one nonzero bit exists in the stripe, two more pairs (CX,D1) and (CX,D2) are required to indicate the first nonzero bit at position (D1D2), and the context CX is fixed to 18. For example, if the stripe is (0110), then first nonzero bit is at position (1)10 and the binary representation of the position is (01)2, so D1 is 0 and D2 is 1. The position of the next bit to be coded is at position 2, so the next bit to be coded is 1 and the other coding method will be applied.

All the coding methods can constitute three coding passes by different assemble, and the coding condition of each coding method is listed in Table 2.10. The algorithm flowchart of these three coding passes is listed in Fig. 2.15, and they can constitute the whole fractional bit-plane coding algorithm. Since the three coding passes are not applied to code a bit at the same position in a bit-plane, the whole coding algorithm is called fractional bit-plane coding.

Table 2.10 The coding condition of three coding passes

Significance Propagation PassThe current bit is insignificant but at least one of its neighbors is significant

Magnitude Refinement Pass The current bit is already significantCleanup Pass The remaining bits that are insignificant all the time

18


Fig. 2.15 fractional bit-plane coding algorithm3) Arithmetic Coding

The decision and context data generated from fractional bit-plane coding is further encoded by the arithmetic encoder. As we know, the arithmetic coding is a recursive probability interval subdivision process. Since the arithmetic encoder in JPEG 2000 is binary, there are only two sub-intervals. With each decision as the symbol, the current probability interval is subdivided into narrower sub-interval. The bi-level interval is shown in Fig. 2.16. If the value of the decision D is 1, which is called more probable symbol (MPS), so we pick out the upper interval as shown in the left side of Fig. 2.17; otherwise, the value of the decision D is 0, which is called the less probable symbol (LPS), so we pick out the lower interval as shown in the right side of Fig. 2.17. The binary arithmetic encoder adopted in JPEG 2000 is adaptive, and the probability of the input symbols will change adaptively with the context CX. The algorithm flowchart of the binary arithmetic encoder is shown in Fig. 2.18.

19


Fig. 2.16 The probability distribution of the MPS and LPS

Fig. 2.17 The interval subdivision of the MPS and the LPS

20


Fig. 2.18 The algorithm flowchart of the binary arithmetic encoder in JPEG 2000

2.3.5 Tier-2 Encode

After the source input image is entropy encoded by the tier-1 encoder, the function of the tier-2 encoder is to package the output data of the tier-1 encoder into the final bitstream. On the other hand, the other side information and parameters regarding the setting of the encoder such as the resolution scalability, the size of the image, the region of interest, and so on. The bitstream generated from the source input image is called body, and the other generated from the side information and parameters is called the header. This is the final stage of the JPEG 2000 encoder, and the final bitstream will be generated.

21

disp.ee.ntu.edu.twdisp.ee.ntu.edu.tw/tutorial/file/tutorial_2/CH_2.doc · Web viewThe DCT-Based compression method is known as the JPEG Baseline, which is the most widely used image

Documents