JPEG Image Compression

The paper analyzes JPEG image compression utilizing the discrete cosine transform algorithm, quantization tables for compression quality factors, and the run-length encoding technique.

2013, ENGI-4557 Digital Communications
Jonathan Evangelista, Brian England, Lucas Muller


Table of Contents

Introduction
Discrete Cosine Transform
    DCT in Basis Vectors
    MATLAB Functions for Discrete Cosine Transform dct2 and idct2
Zig Zag Transformation
Run-Length Encoding
Message Decoding
Quantization for a JPEG Image
Appendix A Code for DCT and Graphical User Interfaces (GUIs)
    DCT as Basis Vectors (DCTTest.m)
    JPEG Compression (used only for a sample block) (JPEGCompression.m)
    JPEG Compression for Gray Scale Image (used in Lena Image) (JPEGCompressionGrayscale.m)
    90% Compression (JPEGCompression90.m)
    10% Compression (JPEGCompression10.m)
    Homepage (Homepage.m)
    DCT Test Using Basis Vectors (DCTTest.m)
    GUI For Test Bench First Page (DCTFig.m)
    GUI For Test Bench Second Page (DCTFig2.m)
    GUI for Lena First Page (DCTLena.m)
    GUI For Lena Second Page (DCTLena2.m)
    Zig Zag Transform for Quantized, Rounded Matrix (zigzag.m)
    Run Length Encoding for Zig Zag Transformed Image (RLE_encode.m)
    Run Length Decoding for Received Encoded Image (RLE_decode.m)
    Inverse Zig Zag Function for Decoded Image (invzigzag.m)
Appendix B MATLAB Graphical User Interfaces
Works Cited

Introduction

This project analyzes the method of converting a raw image in the spatial domain, such as a bitmap file, into a compressed state. The compression method used is JPEG compression. The discrete cosine transform (DCT) is a mathematical algorithm which takes the values of an image in the spatial domain and transforms them to the frequency domain, which is required for the compression. A quantization table determines the type and amount of compression applied to the image that is to be reconstructed. Quantization reduces many of the high frequency values of the image to zero, and run-length encoding then stores the resulting runs of repeated values compactly. This is what ultimately reduces the size of the image file. When the image is to be viewed, the file is decoded and reconstructed. The final result is a compressed image.

Discrete Cosine Transform

The Discrete Cosine Transform (DCT) is a relative of the Discrete Fourier Transform (DFT). The key difference between the two is that the DCT uses only real numbers, hence only cosine terms. The forward DCT takes a signal from the spatial domain and transforms it into the frequency domain. This produces many values at or near zero in the transformed matrix, which become zeros once quantized and allow for efficient encoding such as run-length or Huffman coding (run-length is used in this project). The inverse DCT (IDCT) does the exact opposite: the signal in the frequency domain is transformed back into the spatial domain to provide a reconstructed image.

In image compression the DCT is two-dimensional, as opposed to the one-dimensional DCT which would be used in sound wave compression, for example. In JPEG compression, an original image of raw data (such as a bitmap file) is divided into blocks of 8x8 pixel values, each of which represents a colour as an unsigned integer from 0 to 255. Once the image is segmented into blocks, the DCT is applied to each block to obtain coefficients in the frequency domain. The 8x8 block size was chosen because a DCT of that size could be implemented in VLSI (very large scale integration).

The formula which describes the forward discrete cosine transform is:

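$$F(u,v) = \frac{1}{4} C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\left[\frac{(2x+1)u\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right]$$

(Eq. 1.1)

The formula which describes the inverse discrete cosine transform is:

$$f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v F(u,v) \cos\left[\frac{(2x+1)u\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right]$$

(Eq. 1.2)

In both above cases, $C_u = 1/\sqrt{2}$ for $u = 0$ and $C_u = 1$ otherwise (and likewise for $C_v$). Here $f(x,y)$ is the pixel value at column $x$ and row $y$ of the block, and $F(u,v)$ is the DCT coefficient at horizontal frequency $u$ and vertical frequency $v$.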

The best way to understand this formula is to analyze a simple mathematical example. Observe the 8x8 block of pixels below:

v
|  * * * * * * * *
|  * * * * * * * *
|  * * * * * * * *
|  * * * * * * * *
|  * * * * * * * *
|  * * * X * * * *
|  * * * * * * * *
|  * * * * * * * *
+------------------- u

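Figure 1.1

In the above 8x8 table, which represents an 8x8 block of pixels, each asterisk (*) represents a pixel with a value from 0 to 255. To get an idea of how the DCT formula works, an X has been placed in the table in the 4th column from the left and the 3rd row from the bottom (the v axis points upward), and this specific position will be analyzed in the formula.

In this case, u = 3 and v = 2, because the eight columns and rows are numbered from 0 to 7 (not 1 to 8). Since u and v are both non-zero, Cu and Cv are both equal to 1. This yields the equation below, where f(x, y) is the pixel value at column x and row y:

$$F(3,2) = \frac{1}{4} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\left[\frac{(2x+1) \cdot 3\pi}{16}\right] \cos\left[\frac{(2y+1) \cdot 2\pi}{16}\right]$$

(Eq. 1.3)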
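As a rough illustration, the single coefficient at u = 3, v = 2 can be evaluated directly as a double sum. This is a minimal sketch, assuming the standard 8x8 JPEG DCT scaling (a factor of 1/4 with Cu = Cv = 1 for non-zero indices) and the 128 level shift used in the appendix listings. The sample block is the one used later in this paper; the variable names are illustrative only.

```matlab
% Sample 8x8 block from the paper, level-shifted by 128.
P0 = [139 144 149 153 155 155 155 155;
      144 151 153 156 159 156 156 156;
      150 155 160 163 158 156 156 156;
      159 161 162 160 160 159 159 159;
      159 160 161 162 162 155 155 155;
      161 161 161 161 160 157 157 157;
      162 162 161 163 162 157 157 157;
      162 162 161 161 163 158 158 158];
P = P0 - 128;

u = 3;   % horizontal frequency (column index, 0-based)
v = 2;   % vertical frequency (row index, 0-based)

% Scale factors: 1/sqrt(2) for a zero index, 1 otherwise.
Cu = 1; if u == 0, Cu = 1/sqrt(2); end
Cv = 1; if v == 0, Cv = 1/sqrt(2); end

% Direct evaluation of the double sum over all 64 pixels.
Fuv = 0;
for y = 0:7            % row of the block
    for x = 0:7        % column of the block
        Fuv = Fuv + P(y+1, x+1) ...
            * cos((2*x + 1) * u * pi / 16) ...
            * cos((2*y + 1) * v * pi / 16);
    end
end
Fuv = Cu * Cv / 4 * Fuv;
```

Repeating this for all 64 (u, v) pairs gives the full coefficient block, which is why the matrix form described in the next section is usually preferred in practice.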

Solving the DCT formula provides the DCT coefficient at this location of the 8x8 block. Each position in the new 8x8 block of frequency-domain values corresponds to either a low frequency or a high frequency. The image below, taken from the XIL Programmer's Guide (August 1994), illustrates the frequency distribution of the DCT coefficients in any given 8x8 block.

Figure 1.2

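The inverse DCT provides the value at the corresponding location in the original raw image in the spatial domain.

DCT in Basis Vectors

An alternate way of viewing the DCT formula is in terms of basis vectors. Below is the formula for this method:

$$F = D \, P \, D^{T}$$

(Eq. 1.4)

Where $F$ is the 8x8 matrix of DCT coefficients, $D$ is the matrix of DCT basis vectors, and $D^{T}$ is its transpose. The $P$ matrix is the raw image matrix of spatial domain values.

The D matrix is represented by:

$$D_{\omega,t} = C(\omega) \cos\left[\frac{(2t+1)\,\omega\,\pi}{16}\right], \qquad \omega, t = 0, 1, \ldots, 7$$

(Eq. 1.5)

Where,

$$C(\omega) = \frac{1}{2\sqrt{2}} \quad \text{for } \omega = 0$$

And,

$$C(\omega) = \frac{1}{2} \quad \text{for } \omega \neq 0$$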

Observing these matrices, a comparison can be made to the original DCT equation (Eq. 1.1): the two factors of C(omega) together supply the 1/4 scale factor and the Cu and Cv terms.

MATLAB code may be useful in understanding the methodology of the DCT in basis vectors. The example below goes through the process of obtaining DCT coefficients from a raw image 8x8 block of values in the spatial domain:

%This is an example of an 8x8 block of spatial domain values. Each value
%represents an unsigned integer from 0 - 255 to represent a colour.
P0 = [139 144 149 153 155 155 155 155;
      144 151 153 156 159 156 156 156;
      150 155 160 163 158 156 156 156;
      159 161 162 160 160 159 159 159;
      159 160 161 162 162 155 155 155;
      161 161 161 161 160 157 157 157;
      162 162 161 163 162 157 157 157;
      162 162 161 161 163 158 158 158];

%The line below subtracts 128 from each element in the above matrix to
%make a signed integer to work with for the DCT algorithm.
P = P0 - 128;

%The two lines below are to produce the C(omega) term in Eq. 1.5.
S = eye(8)/2;
S(1,1) = 1/2/sqrt(2);

%Below, the DCT vectors are arranged and then multiplied by S to complete
%equation Eq. 1.5.
D = zeros(8,8);
for t = 0:7
    for w = 0:7 %where w is u and v
        D(w+1, t+1) = cos((2*t+1)*w*pi/16);
    end
end

%Equation Eq. 1.5 therefore is:
D = S * D;

%Perform forward DCT to obtain the 8x8 matrix of DCT coefficients.
F = D * P * D';

This program outputs the 8x8 matrix value of F (which would further be rounded to 0 decimal places):

This is one possible way to obtain DCT coefficients in MATLAB, and it serves simply as an explanation of the formula. In this project, the MATLAB functions dct2 and idct2 are used instead.
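The basis-vector form also makes the inverse transform easy to see. The sketch below is illustrative rather than part of the project code: it builds the same D matrix without loops, checks numerically that D is orthonormal (so its transpose acts as its inverse), and recovers the level-shifted block with P = D' * F * D. The names orthoError and roundTripError are hypothetical.

```matlab
% Build the 8x8 DCT basis matrix: D(w+1, t+1) = C(w) * cos((2t+1) w pi / 16).
k = 0:7;
D = cos(k' * (2*k + 1) * pi / 16) / 2;   % C(w) = 1/2 for every row...
D(1, :) = D(1, :) / sqrt(2);             % ...except w = 0, which uses 1/(2*sqrt(2))

% D is orthonormal, so D * D' should be the identity up to rounding error.
orthoError = max(max(abs(D * D' - eye(8))));

% Round trip on the sample block used in this section.
P0 = [139 144 149 153 155 155 155 155;
      144 151 153 156 159 156 156 156;
      150 155 160 163 158 156 156 156;
      159 161 162 160 160 159 159 159;
      159 160 161 162 162 155 155 155;
      161 161 161 161 160 157 157 157;
      162 162 161 163 162 157 157 157;
      162 162 161 161 163 158 158 158];
P = P0 - 128;

F = D * P * D';              % forward DCT (Eq. 1.4)
Precovered = D' * F * D;     % inverse DCT
roundTripError = max(max(abs(Precovered - P)));
```

Without quantization the round trip is exact up to floating-point error, which shows that the loss in JPEG comes from the quantization step and not from the DCT itself.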

MATLAB Functions for Discrete Cosine Transform dct2 and idct2

The DCT algorithm used in this project is the one provided by MATLAB. The MATLAB help for dct2 gives the following information, which explains the algorithm in minor detail:

Zig Zag Transformation

After the 8x8 matrix of values has been quantized by the source device, the information must be sent as a string of bits to the receiver. To do this, we first use the Zig Zag Transform to change the two-dimensional matrix into a one-dimensional array of entries.

Starting from the top-left corner, the transform scans in an alternating up-right and down-left pattern, gathering the values from the matrix and inserting them one at a time into the one-dimensional output array. The code first determines whether the scan is moving in the up-right or the down-left direction by checking the modulo 2 sum of the current X and Y coordinates: if the sum of the X and Y coordinates divided by two has a remainder of zero, the scan is moving in the up-right direction; otherwise, it is moving in the down-left direction. In the code, X is the row index and Y is the column index, each running from 1 to 8. This process is shown in Figure 2.1. Note that all entries on a diagonal have the same coordinate sum and therefore follow the same base rules as the rest of the nodes on that diagonal.

Figure 2.1

Once the basic direction at the node has been determined, additional rules are required for the Zig Zag transformation to work properly at the edges of the matrix.

For the case where the modulo 2 sum equals zero, the diagonals have even-numbered coordinate sums. When the X coordinate is at the minimum (1) and Y is at the maximum (8), the current node is the top-right corner and the scan must move down one space. If the X coordinate is at the minimum elsewhere, the scan moves one space to the right. If the Y coordinate is at the maximum and the X coordinate is anywhere but the minimum, the scan moves down one space. At any other entry with an even coordinate sum, the code reads the value and moves up-right.

For the case where the modulo 2 sum is not zero, the diagonals have odd-numbered coordinate sums. When the X coordinate is at the maximum and the Y coordinate is not at the maximum, the entry is in the bottom row and the scan moves one space to the right. If the Y coordinate is at the minimum and the X coordinate is less than the maximum, the scan moves down one space. At any other entry with an odd coordinate sum, the code reads the value and moves down-left.

Lastly, when the code reads the value with X and Y both at the maximum and copies that entry into the output array, the Zig Zag transformation is complete. At this point, the output array should contain most of the non-zero values at the beginning of the array, followed by the zero values in one large sequential group or in several smaller sequential groups divided by non-zero values. It is at this point that the output array is Run-Length Encoded to reduce the number of bytes and subsequently increase the transmission speed.

Run-Length Encoding
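Before the run-length step, the Zig Zag scan rules described in the preceding section can be sketched as follows. This is a minimal illustration written as a script, not the project's zigzag.m; it assumes x is the row index and y is the column index, and it uses a test matrix numbered 0 to 63 in row order so that the scan order is easy to read off.

```matlab
% Test matrix: entries 0..63 numbered row by row.
in = reshape(0:63, 8, 8)';

[xmax, ymax] = size(in);
xmin = 1;
ymin = 1;
x = xmin;                    % current row
y = ymin;                    % current column

output = zeros(1, xmax * ymax);
i = 1;
output(i) = in(x, y);        % start at the top-left corner

while ~(x == xmax && y == ymax)
    if mod(x + y, 2) == 0    % even coordinate sum: moving up-right
        if x == xmin
            if y == ymax
                x = x + 1;   % top-right corner: move down
            else
                y = y + 1;   % top row: move right
            end
        elseif y == ymax
            x = x + 1;       % last column: move down
        else
            x = x - 1;       % interior: move up-right
            y = y + 1;
        end
    else                     % odd coordinate sum: moving down-left
        if x == xmax
            y = y + 1;       % bottom row: move right
        elseif y == ymin
            x = x + 1;       % first column: move down
        else
            x = x + 1;       % interior: move down-left
            y = y - 1;
        end
    end
    i = i + 1;
    output(i) = in(x, y);    % copy the entry at the new position
end
```

With this numbering, the first entries of output are 0, 1, 8, 16, 9, 2, which is the usual JPEG zig zag order.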

The Run-Length Encoding receives a one-dimensional array of any size and scans through each entry. It first reads an entry and adds that value to the one-dimensional output array. Next, it checks ahead in the input array until it finds a value that is not equal to the one just read. The number of consecutive equal entries (the run length) is then added as the next value in the output array. As shown in Figure 2.2, the code reads a value of 8, which is repeated 3 times in a row; a value of 0, which occurs only once; a value of 4, which is repeated twice; and a value of 0 again, which is repeated twice this time.

Input Array                   Output Array
[8, 8, 8, 0, 4, 4, 0, 0]      [8, 3, 0, 1, 4, 2, 0, 2]

Figure 2.2

In this particular case, Run-Length Encoding has not saved any space: assuming each value uses 4 bits, both the input and output arrays have 8 entries each, for 32 bits total. However, this method is very useful with JPEG compression, since the quantized rounded matrix contains a short list of non-zero values in the upper left corner and is heavily populated by zeros elsewhere. After the Zig Zag Transformation, this results in an array containing 64 entries, mostly sequential zeros. This case can be seen in Figure 2.3.
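The Figure 2.2 example can be reproduced with a short script. This is a minimal sketch of the same (value, run length) pairing used by RLE_encode.m in Appendix A, written inline with illustrative variable names.

```matlab
% Input array from Figure 2.2.
inArray = [8 8 8 0 4 4 0 0];

encoded = [];        % output pairs: value, run length, value, run length, ...
runLength = 1;

for k = 2:numel(inArray)
    if inArray(k) == inArray(k-1)
        % Same value as the previous entry: extend the current run.
        runLength = runLength + 1;
    else
        % Value changed: emit the finished run and start a new one.
        encoded = [encoded inArray(k-1) runLength];
        runLength = 1;
    end
end

% Emit the final run (this also covers an input of length 1).
encoded = [encoded inArray(end) runLength];
```

For this input the script yields [8 3 0 1 4 2 0 2], matching the output array in Figure 2.2.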

Input Array                           Output Array
[1, 2, 3, 4, 0, 0, 0, 0, ..., 0]      [1, 1, 2, 1, 3, 1, 4, 1, 0, n]

Figure 2.3

The Run-Length Encoding clearly loses efficiency when encoding values that are not repeated, since the output contains both the value and the number of sequential entries (in this case, 1), effectively doubling the number of bits required. However, a run of any length always costs exactly two output entries, so every repetition beyond the second saves one entry. When we apply this to the Zig Zag transformed quantized rounded matrix, we obtain an output array that is significantly shorter than the 64-entry array that we started with.

At this point, it would be possible to use Huffman encoding to further reduce the size of the transmitted message, but for the scope of this project this step is complex and unnecessary. The image has been compressed and can now be transmitted to another device.

See Figure 2.4 below for the complete process:

Message Decoding

Following the transmission of the message from the source device to the receiver, the message must be converted back into a readable format so that the receiver can perform the transform operations that convert the data into the compressed JPEG image.

Firstly, the encoded array must be inflated back into a one-dimensional array containing 64 entries. To do this, the Run-Length Encoding is reversed by reading the first entry in the input array and adding that value to the output array N times, where N is the number following the value. This process is repeated until the input array has been completely read, at which point the output array should contain a total of 64 entries.

Following the reverse Run-Length Encoding, the receiver must perform reverse Zig Zag encoding to return the values to their correct locations. This is done in a very similar way to the regular Zig Zag encoding process. An 8x8 output matrix is initialized with all zeros and the decoder moves through it in exactly the same way as the encoder did before transmitting the message. The only difference is that in this case, the values from the reverse Run-Length Encoded message are read and written into the current entry of the output matrix. The result of this process is identical to the Quantized Rounded Matrix that the source device had created before the encoding and transmission steps.

From this point, the receiving device may continue to use inverse transform methods to recreate the viewable compressed JPEG image.

Quantization for a JPEG Image
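Before turning to quantization, the run-length decoding step described in the Message Decoding section can be sketched in a few lines. This is an illustrative script rather than the project's RLE_decode.m; it assumes the encoded array is a flat list of (value, run length) pairs, as in Figure 2.2.

```matlab
% Encoded array from Figure 2.2: value, run length, value, run length, ...
encoded = [8 3 0 1 4 2 0 2];

decoded = [];
for k = 1:2:numel(encoded)
    value = encoded(k);        % the entry to repeat
    count = encoded(k + 1);    % how many times it appears in a row
    decoded = [decoded repmat(value, 1, count)];
end
```

For a real 8x8 block the decoded array would hold 64 entries, ready for the reverse Zig Zag step.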

Quantization in image compression is a lossy technique in which a range of values is compressed to a single quantum value. By reducing the number of distinct symbols in a given stream, the total stream of information becomes more compressible. In the case of a JPEG image, reducing the number of colours used to represent the digital image makes it possible to reduce the file size. For JPEG images, quantization is used in conjunction with the Discrete Cosine Transform to achieve the desired compression.

The human eye is very good at perceiving small differences in brightness over a large area, but it does not do so well when it comes to the exact strength of a high frequency brightness variation. This is what allows us to reduce the amount of information required to display the image: the high frequency components can be stored coarsely or ignored, since the eye cannot see the difference anyway. This is accomplished by dividing each component in the frequency domain by a constant for that component and then rounding to the nearest integer. As a result, many of the high frequency components are rounded to zero, and many of the others become small positive or negative values.

The purpose of taking the DCT of an image is to establish the frequency values. Then, using the standard JPEG luminance and chrominance quantization matrices, one can begin the process of compressing the image. As described above, the DCT coefficient matrix is divided element by element by one of the quantization matrices and the result is rounded to the nearest integer, which gives the quantized coefficients of the DCT values. An example of this process using the standard luminance quantization matrix is shown below:

Figure 3.1 - DCT Coefficient Matrix

Fig 3.2 - Standard Luminance Quantization Matrix

Once we have the DCT coefficient matrix, we divide each element by the corresponding element in the standard luminance quantization matrix and then round to the nearest integer to get the table shown below.

Fig 3.3 - Normalized Quantized Coefficient Table

It can be seen from the normalized quantized coefficient table that there are many zeroes. These were the high frequency components of the image, which have been reduced to zero or to small values. The low frequency components of the image are all grouped in the upper left corner of the table. From here we can begin the process of encoding.

If another level of compression is desired, the standard JPEG luminance and chrominance quantization matrices can be used as a base to adjust the quality factor of the image. This is accomplished by setting the desired quality factor; depending on the set value, it will fall under one of two conditions:

The standard JPEG luminance and chrominance quantization matrices are already set to a quality factor of 50, which is why two conditions exist: one for quality factors below 50 and one for quality factors above it. Depending on which quality factor is used, the standard tables are scaled up or down. The equation below shows the formula to calculate the new table values based on the desired quality factor.

The higher the quality factor used, the more the image will retain its integrity, i.e., less compression will occur.
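The effect of the quality factor on the table can be illustrated with a short sketch. The scaling rule below is an assumption: it is the convention used by the IJG libjpeg implementation (a scale of 5000/qf below 50 and 200 - 2*qf otherwise, with entries clamped to the range 1 to 255), and it may differ in detail from the formula used in this project. Q0 is the quality factor 50 luminance table from the JPEGCompression.m listing in Appendix A; qscale and scaleTable are hypothetical helper names.

```matlab
% Standard luminance quantization table (quality factor 50).
Q0 = [16 11 10 16  24  40  51  61;
      12 12 14 19  26  58  60  55;
      14 13 16 24  40  57  69  56;
      14 17 22 29  51  87  80  62;
      18 22 37 56  68 109 103  77;
      24 35 55 64  81 104 113  92;
      49 64 78 87 103 121 120 101;
      72 92 95 98 112 100 103  99];

% Assumed IJG-style scale: two conditions, split at qf = 50.
qscale = @(qf) (qf < 50) * (5000 / qf) + (qf >= 50) * (200 - 2 * qf);

% Scale the base table, round, and clamp each entry to 1..255.
scaleTable = @(Q, qf) min(max(floor((Q * qscale(qf) + 50) / 100), 1), 255);

Q10 = scaleTable(Q0, 10);   % low quality: large divisors, more zeros
Q50 = scaleTable(Q0, 50);   % reproduces the base table
Q90 = scaleTable(Q0, 90);   % high quality: small divisors, fewer zeros
```

Under this convention a lower quality factor gives larger divisors, so more DCT coefficients round to zero and the run-length encoded output becomes shorter.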

The two quantization tables shown above have quality factors of 10 and 90. A quality factor (qf) of 10 causes the most compression because it eliminates a large portion of the high frequency components. With qf = 90, the image retains a large portion of the original values, so less compression is achieved.

With the quality factor determined, the process of finding the normalized quantized coefficient table is the same as described previously. Once this table has been calculated, encoding can begin.
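The whole quantization round trip for one block can be sketched as below. This is an illustrative script, not the project code: it uses the basis-matrix form of the DCT instead of dct2 and idct2 so that it runs without any toolbox, and it uses the sample block and the quality factor 50 luminance table from the Appendix A listings.

```matlab
% Sample 8x8 block and standard luminance quantization table.
P0 = [139 144 149 153 155 155 155 155;
      144 151 153 156 159 156 156 156;
      150 155 160 163 158 156 156 156;
      159 161 162 160 160 159 159 159;
      159 160 161 162 162 155 155 155;
      161 161 161 161 160 157 157 157;
      162 162 161 163 162 157 157 157;
      162 162 161 161 163 158 158 158];
Q0 = [16 11 10 16  24  40  51  61;
      12 12 14 19  26  58  60  55;
      14 13 16 24  40  57  69  56;
      14 17 22 29  51  87  80  62;
      18 22 37 56  68 109 103  77;
      24 35 55 64  81 104 113  92;
      49 64 78 87 103 121 120 101;
      72 92 95 98 112 100 103  99];

% DCT basis matrix (same construction as in the basis-vector section).
k = 0:7;
D = cos(k' * (2*k + 1) * pi / 16) / 2;
D(1, :) = D(1, :) / sqrt(2);

% Source side: level shift, forward DCT, quantize.
F  = D * (P0 - 128) * D';
Fq = round(F ./ Q0);            % normalized quantized coefficients

% Receiver side: de-normalize, inverse DCT, undo the level shift.
Fdq  = Fq .* Q0;
Prec = round(D' * Fdq * D + 128);

% Largest pixel difference between original and reconstructed block.
maxErr = max(abs(Prec(:) - P0(:)));
```

Only a handful of entries of Fq are non-zero for this block, which is what makes the later Zig Zag and run-length steps effective.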

Appendix A Code for DCT and Graphical User Interfaces (GUIs)

DCT as Basis Vectors (DCTTest.m)

%This is an example of an 8x8 block of spatial domain values. Each value
%represents an unsigned integer from 0 - 255 to represent a colour.
P0 = [139 144 149 153 155 155 155 155;
      144 151 153 156 159 156 156 156;
      150 155 160 163 158 156 156 156;
      159 161 162 160 160 159 159 159;
      159 160 161 162 162 155 155 155;
      161 161 161 161 160 157 157 157;
      162 162 161 163 162 157 157 157;
      162 162 161 161 163 158 158 158];

%The line below subtracts 128 from each element in the above matrix to
%make a signed integer to work with for the DCT algorithm.
P = P0 - 128;

%The two lines below are to produce the C(omega) term in Eq. 1.5.
S = eye(8)/2;
S(1,1) = 1/2/sqrt(2);

%Below, the DCT vectors are arranged and then multiplied by S to complete
%equation Eq. 1.5.
D = zeros(8,8);
for t = 0:7
    for w = 0:7 %where w is u and v
        D(w+1, t+1) = cos((2*t+1)*w*pi/16);
    end
end

%Equation Eq. 1.5 therefore is:
D = S * D;

%Perform forward DCT to obtain the 8x8 matrix of DCT coefficients.
F = D * P * D';

JPEG Compression (used only for a sample block) (JPEGCompression.m)

%JPEG Compression using 50% quantization table
P0original = [139 144 149 153 155 155 155 155;
              144 151 153 156 159 156 156 156;
              150 155 160 163 158 156 156 156;
              159 161 162 160 160 159 159 159;
              159 160 161 162 162 155 155 155;
              161 161 161 161 160 157 157 157;
              162 162 161 163 162 157 157 157;
              162 162 161 161 163 158 158 158]; %Original 8x8 matrix.

P0 = P0original - 128; %Subtracts 128 from the original 8x8 matrix (for a signed integer rather than unsigned).
P0fDCT = round(dct2(P0)); %Calculates the forward DCT of P0, rounded to integers.

Q0 = [16 11 10 16  24  40  51  61;
      12 12 14 19  26  58  60  55;
      14 13 16 24  40  57  69  56;
      14 17 22 29  51  87  80  62;
      18 22 37 56  68 109 103  77;
      24 35 55 64  81 104 113  92;
      49 64 78 87 103 121 120 101;
      72 92 95 98 112 100 103  99]; %This is the quantization matrix (for 50% compression).

%The script below determines the normalized quantized coefficients by
%dividing, element by element, P0fDCT by Q0.
Q0Norm = P0fDCT ./ Q0; %Produces the normalized quantized coefficient matrix.
Q0NormRounded = round(Q0Norm); %Rounds the normalized quantized coefficient matrix.

%-------------------------------------------------------------------------%
%Now that the quantized matrix is formed, perform the zig zag scan and
%run-length encode.
%Once the file is encoded (i.e., after run-length encoding), run the rest
%of the script to decode the file and reconstruct the image.
%-------------------------------------------------------------------------%

Q0DeNorm = Q0NormRounded .* Q0; %Produces the denormalized quantized coefficient matrix.

%Now that the denormalized quantized matrix is obtained, the image needs to
%be reconstructed to its compressed 8x8 matrix.
Q0Reconstructed = idct2(Q0DeNorm); %Inverse Discrete Cosine Transform to reconstruct the image.
Q0Reconstructed128 = Q0Reconstructed + 128; %Adds 128 to the signed integer to make it once again an unsigned 0-255 integer.
Q0ReconstructedRounded = round(Q0Reconstructed128); %Rounds the matrix values to integers.
%Now the compressed image is reconstructed.

JPEG Compression for Gray Scale Image (used in Lena Image) (JPEGCompressionGrayscale.m)

%This breaks a 2 dimensional grayscale image, "lena512.bmp", into 8x8 blocks
%and performs a compression.
n = 1; %set "n" variable to 1
m = 1; %set "m" variable to 1
I = imread('lena512.bmp'); %read in the image

%The line below breaks up the 512x512 grayscale image into 8x8 blocks of
%pixels (a 64x64 cell array of blocks).
P0original = mat2cell(I, repmat(8, 1, 64), repmat(8, 1, 64));

while (n
[...]

Zig Zag Transform for Quantized, Rounded Matrix (zigzag.m)

[...]
+Y
% |
% |
% |
% V
%
% +X

%Output element i = the starting point in(1,1)
output(i) = in(x, y);

%until the end of the matrix
while ((x
[...]
            move right
            else
                y = y + 1;
            end;
        %CASE C: last column but not max row => move down
        elseif ((y == ymax) && (x < xmax))
            x = x + 1;
        %CASE D: all other cases => move up-right
        elseif ((x > xmin) && (y < ymax))
            x = x - 1;
            y = y + 1;
        end;
    %===Downwards Direction===
    else %points where x+y != a multiple of 2
        %CASE E: bottom row => move right
        if (x == xmax)
            y = y + 1;
        %CASE F: first column, any row but first
        elseif (y == ymin && x < xmax)
            x = x + 1;
        %CASE G: all other cases => move down-left
        elseif ((x < xmax) && (y > ymin)) % all other cases
            x = x + 1;
            y = y - 1;
        end;
    end;

    %increment i to add the next element to output array
    i = i + 1;
    %Output element i = the current point in the matrix in(x,y)
    output(i) = in(x, y);

    %CASE I: bottom-right corner => break loop
    if ((x == xmax) && (y == ymax))
        output(i) = in(x, y);
        break
    end;
end;

Run Length Encoding for Zig Zag Transformed Image (RLE_encode.m)

function encoded = RLE_encode(input)
%Receives an input array of values and outputs a compressed array using run
%length encoding. Sequential entries are put into an output array in pairs
%[i, j], where i represents the numerical value of the entry and j represents
%the number of times it appears, sequentially.

%get n = the length of the input vector
n = size(input, 2);
%by default, set run_length to 1
run_length = 1;
%initialize empty array for encoded output
encoded = [];

%for the whole vector
for i = 2:n
    %look back at previous entry and check equivalence
    if input(i) == input(i-1)
        %increment run_length if they are equal
        run_length = run_length + 1;
    else
        %push value and run length into encoded output vector if they are
        %not equal
        encoded = [encoded input(i-1) run_length];
        run_length = 1;
    end
end

if n > 1
    % Add last value and run length to output
    encoded = [encoded input(n) run_length];
else
    % Special case if input is of length 1
    encoded = [input(1) 1];
end

Run Length Decoding for Received Encoded Image (RLE_decode.m)

function decoded = RLE_decode(encoded)
%Receives a run length encoded array and outputs an expanded array using
%run length decoding. For every pair of entries in the input array [i, j]
% i represents the numerical value of an entry in the output array and j
% represents the number of times it is sequentially shown.
%i.e. [8,4] is decoded to make [8, 8, 8, 8]

my_size = size(encoded);
length = my_size(2);
index = 1;
decoded = [];
% iterate through the input
while (index