JPEG Image Compression
ENGI-4557 Digital Communications
2013
Jonathan Evangelista, Brian England, Lucas Muller
This paper analyzes JPEG image compression using the discrete cosine transform (DCT), quantization tables that set the compression quality factor, and run-length encoding.
Table of Contents
Introduction
Discrete Cosine Transform
DCT in Basis Vectors
MATLAB Functions for Discrete Cosine Transform dct2 and idct2
Zig Zag Transformation
Run-Length Encoding
Message Decoding
Quantization for a JPEG Image
Appendix A: Code for DCT and Graphical User Interfaces (GUIs)
DCT as Basis Vectors (DCTTest.m)
JPEG Compression (used only for a sample block) (JPEGCompression.m)
JPEG Compression for Gray Scale Image (used in Lena Image) (JPEGCompressionGrayscale.m)
90% Compression (JPEGCompression90.m)
10% Compression (JPEGCompression10.m)
Homepage (Homepage.m)
DCT Test Using Basis Vectors (DCTTest.m)
GUI For Test Bench First Page (DCTFig.m)
GUI For Test Bench Second Page (DCTFig2.m)
GUI for Lena First Page (DCTLena.m)
GUI For Lena Second Page (DCTLena2.m)
Zig Zag Transform for Quantized, Rounded Matrix (zigzag.m)
Run Length Encoding for Zig Zag Transformed Image (RLE_encode.m)
Run Length Decoding for Received Encoded Image (RLE_decode.m)
Inverse Zig Zag Function for Decoded Image (invzigzag.m)
Appendix B: MATLAB Graphical User Interfaces
Works Cited
Introduction
This project analyzes how a raw image in the spatial domain, such as a bitmap file, is converted into a compressed state using JPEG compression. The discrete cosine transform (DCT) is a mathematical algorithm that transforms the pixel values of an image from the spatial domain into the frequency domain, which is required for the compression. A quantization table then determines the type and amount of compression applied to the image that is to be reconstructed: dividing by the table and rounding drives many redundant high-frequency values to zero. Run-length encoding stores these long runs of repeated values compactly, and this is what ultimately reduces the size of the image file. When the image is to be viewed, the file is decoded and reconstructed. The final result is a compressed image.
Discrete Cosine Transform
The Discrete Cosine Transform (DCT) is a relative of the Discrete Fourier Transform (DFT). The key difference between the two is that the DCT uses only real numbers, because it represents a signal as a sum of cosines. The forward DCT takes a signal from the spatial domain and transforms it into the frequency domain. For typical image blocks, most of the energy ends up in a few low-frequency coefficients, so many values in the transformed matrix are zero or close to zero (especially after quantization). This allows efficient encoding such as run-length or Huffman coding (run-length coding is used in this project). The inverse DCT (IDCT) does the exact opposite: the signal is transformed from the frequency domain back into the spatial domain to provide a reconstructed image.
In image compression, the DCT is two-dimensional, as opposed to the one-dimensional DCT that would be used in sound wave compression, for example. In JPEG compression, an original image of raw data (such as a bitmap file) is divided into blocks of 8x8 pixel values, each representing a colour with an unsigned integer from 0 to 255. Once the blocks are segmented, the DCT is applied to each of them to obtain coefficients in the frequency domain. The 8x8 block size was chosen in part because a DCT of this size could be implemented in VLSI (very large scale integration) hardware.
The formula that describes the forward discrete cosine transform is:
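$$F(u,v) = \frac{1}{4} C(u) C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\left[\frac{(2x+1)u\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right] \qquad \text{(Eq. 1.1)}$$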
The formula which describes the inverse discrete cosine
transform is:
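$$f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u) C(v) F(u,v) \cos\left[\frac{(2x+1)u\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right] \qquad \text{(Eq. 1.2)}$$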
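In both cases, f(x, y) is the pixel value at column x and row y of the 8x8 block, F(u, v) is the DCT coefficient at horizontal frequency u and vertical frequency v, all four indices run from 0 to 7, and
$$C(u), C(v) = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}$$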
A simple way to understand this formula is to work through an example. Observe the 8x8 block below:
* * * * * * * *
* * * * * * * *
* * * * * * * *
* * * * * * * *
* * * * * * * *
* * * X * * * *
* * * * * * * *
* * * * * * * *
Figure 1.1 (v: vertical axis, u: horizontal axis)
In Figure 1.1, the table represents an 8x8 block of pixels, and each asterisk (*) represents a pixel with a value from 0 to 255. To show how the DCT formula works, an X has been placed in the 4th column and in the 3rd row counted from the bottom of the figure, and the DCT coefficient at this position will be computed with the formula:
In this case, u = 3 and v = 2, because the eight columns and rows are numbered from 0 to 7 (not 1 to 8). Since u and v are both nonzero, C(u) and C(v) are both equal to 1. This yields the equation below:
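$$F(3,2) = \frac{1}{4} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\left[\frac{(2x+1)3\pi}{16}\right] \cos\left[\frac{(2y+1)2\pi}{16}\right] \qquad \text{(Eq. 1.3)}$$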
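To make the sum concrete, Eq. 1.1 (and therefore Eq. 1.3) can be evaluated directly for a single coefficient. The sketch below is written in Python for illustration rather than taken from the project's MATLAB code; the function name dct_coefficient and the block[y][x] storage order (row y, column x) are assumptions.

```python
import math

def dct_coefficient(block, u, v):
    """Evaluate Eq. 1.1 for a single coefficient F(u, v) of an 8x8 block.

    Assumption: block[y][x] is the pixel in row y and column x, so x pairs
    with the horizontal frequency u and y with the vertical frequency v.
    """
    cu = 1 / math.sqrt(2) if u == 0 else 1.0   # C(u)
    cv = 1 / math.sqrt(2) if v == 0 else 1.0   # C(v)
    total = 0.0
    for y in range(8):
        for x in range(8):
            total += (block[y][x]
                      * math.cos((2 * x + 1) * u * math.pi / 16)
                      * math.cos((2 * y + 1) * v * math.pi / 16))
    return 0.25 * cu * cv * total

# A hypothetical block with no variation at all.
flat = [[10] * 8 for _ in range(8)]
print(dct_coefficient(flat, 0, 0))   # all the energy sits in the DC term
print(dct_coefficient(flat, 3, 2))   # Eq. 1.3 position: zero up to rounding
```

For the flat block, F(0, 0) = (1/4)(1/√2)(1/√2)(64 × 10) = 80, while F(3, 2) vanishes because a block with no variation has no higher-frequency content.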
Evaluating this sum gives the DCT coefficient at this location of the 8x8 block. The position of each coefficient in the new 8x8 block of frequency-domain values indicates whether it is a low-frequency or a high-frequency component. Figure 1.2, taken from the XIL Programmer's Guide (August 1994), illustrates the frequency distribution of the DCT coefficients in any given 8x8 block.
Figure 1.2
The inverse DCT reverses this calculation, recovering the pixel values of the raw image in the spatial domain from the DCT coefficients.
DCT in Basis Vectors
An alternate way of viewing the DCT formula is in terms of basis vectors. The formula for this method is:
$$F = D P D^{T} \qquad \text{(Eq. 1.4)}$$
where F is the 8x8 matrix of DCT coefficients, D is the 8x8 DCT matrix and D^T is its transpose. The P matrix is the raw image matrix of spatial domain values.
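The D matrix is represented by:
$$D_{\omega t} = \frac{C(\omega)}{2} \cos\left[\frac{(2t+1)\omega\pi}{16}\right] \qquad \text{(Eq. 1.5)}$$
where
$$C(\omega) = \begin{cases} \frac{1}{\sqrt{2}} & \omega = 0 \\ 1 & \omega > 0 \end{cases}$$
and ω, t = 0, 1, ..., 7 are the row and column indices of D.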
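A useful property of this matrix is that it is orthogonal, so its inverse is simply its transpose. The Python sketch below is an illustration, not part of the project's code, and the helper names dct_matrix, matmul and transpose are hypothetical. It builds D from Eq. 1.5 and measures how far D D^T is from the identity matrix. Because D^{-1} = D^T, the inverse of Eq. 1.4 is P = D^T F D.

```python
import math

N = 8

def dct_matrix():
    """Build D from Eq. 1.5: D[w][t] = (C(w) / 2) * cos((2t + 1) * w * pi / 16)."""
    rows = []
    for w in range(N):
        c = 1 / math.sqrt(2) if w == 0 else 1.0   # C(w)
        rows.append([c / 2 * math.cos((2 * t + 1) * w * math.pi / 16)
                     for t in range(N)])
    return rows

def matmul(a, b):
    """Product of two N x N matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(a):
    """Swap the rows and columns of a matrix stored as lists of rows."""
    return [list(col) for col in zip(*a)]

D = dct_matrix()
DDt = matmul(D, transpose(D))
# Largest distance of D * D^T from the 8x8 identity matrix.
max_dev = max(abs(DDt[i][j] - (1.0 if i == j else 0.0))
              for i in range(N) for j in range(N))
print(max_dev)   # only floating-point rounding error remains
```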
Comparing these matrices with the original DCT equation (Eq. 1.1) shows that both compute the same coefficients: each row of D is a cosine basis vector scaled by C(ω)/2, so the product D P D^T supplies the factor (C(u)/2)(C(v)/2) = C(u)C(v)/4 together with the double sum of Eq. 1.1. MATLAB code may be useful in understanding the DCT in basis vectors. The example below goes through the process of obtaining DCT coefficients from a raw 8x8 block of spatial domain values:
%This is an example of an 8x8 block of spatial domain values. Each value
%represents an unsigned integer from 0 - 255 to represent a colour.
P0 = [139 144 149 153 155 155 155 155;
      144 151 153 156 159 156 156 156;
      150 155 160 163 158 156 156 156;
      159 161 162 160 160 159 159 159;
      159 160 161 162 162 155 155 155;
      161 161 161 161 160 157 157 157;
      162 162 161 163 162 157 157 157;
      162 162 161 161 163 158 158 158];
%The line below subtracts 128 from each element in the above matrix to
%make a signed integer to work with for the DCT algorithm.
P = P0 - 128;
%The two lines below produce the C(omega) term in Eq. 1.5.
S = eye(8)/2;
S(1,1) = 1/2/sqrt(2);
%Below, the DCT vectors are arranged and then multiplied by S to complete
%Eq. 1.5.
D = zeros(8,8);
for t = 0:7
    for w = 0:7 %where w is u and v
        D(w+1, t+1) = cos((2*t+1)*w*pi/16);
    end
end
%Eq. 1.5 therefore is:
D = S * D;
%Perform forward DCT to obtain the 8x8 matrix of DCT coefficients.
F = D * P * D';
This program outputs the 8x8 matrix F of DCT coefficients, which would then be rounded to the nearest integer.
This is one possible way to compute DCT coefficients in MATLAB, and it serves mainly to explain the formula. In this project, the built-in MATLAB functions dct2 and idct2 are used instead.
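The forward and inverse transforms can both be sketched in the basis-vector form, applied here to the example block P0 from the MATLAB listing. This Python code is illustrative and is not MATLAB's implementation. As far as we know, MATLAB's dct2 and idct2 use the same orthonormal scaling, so for an 8x8 block the results should agree up to floating-point error. The function names forward_dct2d and inverse_dct2d are hypothetical.

```python
import math

N = 8

# Example 8x8 block P0 from the text (unsigned 0-255 values).
P0 = [[139, 144, 149, 153, 155, 155, 155, 155],
      [144, 151, 153, 156, 159, 156, 156, 156],
      [150, 155, 160, 163, 158, 156, 156, 156],
      [159, 161, 162, 160, 160, 159, 159, 159],
      [159, 160, 161, 162, 162, 155, 155, 155],
      [161, 161, 161, 161, 160, 157, 157, 157],
      [162, 162, 161, 163, 162, 157, 157, 157],
      [162, 162, 161, 161, 163, 158, 158, 158]]

def dct_matrix():
    """D from Eq. 1.5: row w is the cosine basis vector scaled by C(w)/2."""
    return [[(1 / math.sqrt(2) if w == 0 else 1.0) / 2
             * math.cos((2 * t + 1) * w * math.pi / 16)
             for t in range(N)] for w in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def forward_dct2d(block):
    """Eq. 1.4: F = D * P * D^T."""
    d = dct_matrix()
    return matmul(matmul(d, block), transpose(d))

def inverse_dct2d(coeffs):
    """Inverse of Eq. 1.4: P = D^T * F * D, since D is orthogonal."""
    d = dct_matrix()
    return matmul(matmul(transpose(d), coeffs), d)

P = [[p - 128 for p in row] for row in P0]   # level shift to signed values
F = forward_dct2d(P)
restored = [[round(v) + 128 for v in row] for row in inverse_dct2d(F)]
print(F[0][0])          # DC coefficient of the example block
print(restored == P0)   # exact round trip when no quantization is applied
```

Without quantization the transform loses nothing: the round trip returns P0 exactly, and all of the compression comes from the quantization step described later.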
MATLAB Functions for Discrete Cosine Transform dct2 and
idct2
The DCT used in this project is computed with MATLAB's built-in functions. MATLAB's help documentation gives the following brief description of the dct2 algorithm:
Zig Zag Transformation
After the 8x8 matrix of values has been quantized by the source device, the information must be sent to the receiver as a string of bits. To do this, we first use the Zig Zag Transform to convert the two-dimensional matrix into a one-dimensional array of entries.
Starting from the top-left corner, the transform scans in an alternating up-right and down-left pattern, reading the values from the matrix and inserting them one at a time into the one-dimensional output array. The code first determines whether the scan is moving up-right or down-left by checking the sum of the current X (row) and Y (column) coordinates modulo 2: if the sum is even, the scan is moving up-right; otherwise, it is moving down-left. This process is shown in Figure 2.1. Because all entries on a diagonal have the same coordinate sum, every node on a diagonal follows the same base rule.
Figure 2.1
Once the basic direction at a node has been determined, additional rules are required for the Zig Zag transformation to handle the edges of the matrix. For the case where the modulo 2 sum equals zero, the diagonals have even coordinate sums. When the X coordinate is at the minimum (1) and Y is at the maximum (8), the current node is the top-right corner and the code must move down one space. If the X coordinate is at the minimum elsewhere, the code moves one space to the right. If the Y coordinate is at the maximum and the X coordinate is anywhere but the minimum, the code moves down one space. At any other node with an even coordinate sum, the code reads the value and moves up-right.
For cases where the modulo 2 sum is not zero, the diagonals have odd coordinate sums. When the X coordinate is at the maximum (the bottom row) and the Y coordinate is not, the code moves one space to the right. If the Y coordinate is at the minimum and the X coordinate is less than the maximum, the code moves down one space. At any other node with an odd coordinate sum, the code reads the value and moves down-left.
Lastly, when the code reads the value with X and Y at the maximum and copies that entry into the output array, the Zig Zag transformation is complete. At this point, the output array should contain most of the non-zero values at the beginning, followed by the zeros in one large sequential group or in several smaller groups separated by non-zero values. The output array is then Run-Length Encoded to reduce the number of bytes and, in turn, increase the transmission speed.
Run-Length Encoding
Run-Length Encoding receives a one-dimensional array of any size and scans through each entry. It first reads an entry and adds that value to the one-dimensional output array. Next, it checks ahead in the input array until it finds a value that differs from the one just read. The number of entries in this run is then added as the next value in the output array. As shown in Figure 2.2, the code reads a value of 8, which is repeated 3 times in a row; a value of 0, which occurs only once; a value of 4, which is repeated twice; and a value of 0 again, which is repeated twice this time.
Input Array: [8, 8, 8, 0, 4, 4, 0, 0]
Output Array: [8, 3, 0, 1, 4, 2, 0, 2]
Figure 2.2
In this particular case, Run-Length Encoding has not saved any space: if each value is assumed to use 4 bits, the input and output arrays both have 8 entries, or 32 bits each. However, this method is very useful in JPEG compression, because the quantized, rounded matrix contains a short list of non-zero values in the upper-left corner and is heavily populated by zeros elsewhere. After the Zig Zag Transformation, this results in a 64-entry array that consists mostly of sequential zeros, as shown in Figure 2.3.
Input Array: [1, 2, 3, 4, 0, 0, 0, 0, …, 0]
Output Array: [1, 1, 2, 1, 3, 1, 4, 1, 0, n], where n is the number of trailing zeros (60 for a 64-entry array)
Figure 2.3
Run-Length Encoding clearly loses efficiency when encoding values that are not repeated, since the output contains both the value and the number of sequential entries (in this case, 1), which doubles the number of bits required. However, each time a value is sequentially repeated, bits equal to one entry are saved. When we apply this to the Zig Zag transformed, quantized, rounded matrix, we obtain an output array that is significantly shorter than the 64-entry matrix we started with.
At this point, Huffman encoding could be used to further reduce the size of the transmitted message, but this step is complex and was left outside the scope of this project. The image has been compressed and can now be transmitted to another device. See Figure 2.4 below for the complete process:
Message Decoding
Following the transmission of the message from the source device to the receiver, the message must be converted back into a readable format so that the receiver can perform the transform operations that convert the data into the compressed JPEG image.
First, the encoded array must be expanded back into a one-dimensional array of 64 entries. To do this, the Run-Length Encoding is reversed: the decoder reads the first entry of the input array and adds that value to the output array N times, where N is the number that follows the value. This process is repeated until the whole input array has been read, at which point the output array should contain 64 entries.
After reversing the Run-Length Encoding, the receiver must perform the inverse Zig Zag transformation to return the values to their correct locations. This is done in much the same way as the forward Zig Zag transformation. An 8x8 output matrix is initialized with zeros, and the decoder moves through it in exactly the same order as the encoder did before transmitting the message. The only difference is that the values from the run-length decoded message are read in order and written into the current entry of the output matrix. Because neither step discards any information, the result is identical to the Quantized Rounded Matrix that the source device created before the encoding and transmission steps.
From this point, the receiving device can denormalize the matrix (multiply it by the quantization table) and apply the inverse DCT to recreate the viewable compressed JPEG image.
Quantization for a JPEG Image
Quantization in image compression is a lossy technique in which a range of values is mapped to a single quantum value. Reducing the number of distinct symbols in a stream makes the stream more compressible. In the case of a JPEG image, reducing the number of distinct values needed to represent the image makes it possible to reduce the file size. For JPEG images, quantization is used in conjunction with the Discrete Cosine Transform to achieve the desired compression.
The human eye is very good at perceiving small differences in brightness over a large area, but it is much less sensitive to the exact strength of high-frequency brightness variations. This is what allows the amount of information required to display the image to be reduced: the high-frequency components can be stored coarsely or discarded, because the eye cannot see the difference anyway. This is accomplished by dividing each component in the frequency domain by a constant for that component and then rounding to the nearest integer. As a result, many of the high-frequency components are rounded down to zero, and many others become small positive or negative values.
Taking the DCT of an image establishes its frequency values. The standard JPEG luminance and chrominance quantization matrices are then used to begin compressing the image. As described above, the DCT coefficient matrix is divided element by element by one of the quantization matrices and then rounded to the nearest integer, which gives the quantized DCT coefficients. An example of this process using the standard luminance quantization matrix is shown below:
Figure 3.1 - DCT Coefficient Matrix
Fig 3.2 - Standard Luminance Quantization Matrix
Once we have the DCT coefficient matrix, each element is divided by the corresponding element of the standard luminance quantization matrix and then rounded to the nearest integer, giving the table shown below.
Fig 3.3 - Normalized Quantized Coefficient Table
The normalized quantized coefficient table contains many zeros. These are the high-frequency components of the image, which have been reduced to zero or to small values. The low-frequency components of the image are all grouped in the upper-left corner of the table. From here, the encoding process can begin. If a different level of compression is desired, the standard JPEG luminance and chrominance quantization matrices can be used as a base and adjusted according to a quality factor. This is done by setting the desired quality factor; depending on its value, it falls under one of two conditions:
The standard JPEG luminance and chrominance quantization matrices already correspond to a quality factor of 50, which is why two conditions exist: depending on the chosen quality factor, the standard tables are scaled up (quality factors below 50) or down (quality factors above 50). The equation below gives the formula for calculating the new table values from the desired quality factor.
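As a sketch of how such a scaling can be computed, the Python below implements one common rule. The exact equation is an assumption here and may differ in detail from the report's formula. For qf ≥ 50 it multiplies the quality-50 table by (100 - qf)/50, and for qf < 50 it multiplies by 50/qf. Each result is rounded half up and never allowed below 1, which avoids division by zero at qf = 100. To our knowledge this matches the scaling used by the Independent JPEG Group's libjpeg up to integer-rounding details. The names Q50 and scale_table are hypothetical.

```python
import math

# Standard JPEG luminance quantization table (quality factor 50),
# as listed in JPEGCompression.m.
Q50 = [[16, 11, 10, 16, 24, 40, 51, 61],
       [12, 12, 14, 19, 26, 58, 60, 55],
       [14, 13, 16, 24, 40, 57, 69, 56],
       [14, 17, 22, 29, 51, 87, 80, 62],
       [18, 22, 37, 56, 68, 109, 103, 77],
       [24, 35, 55, 64, 81, 104, 113, 92],
       [49, 64, 78, 87, 103, 121, 120, 101],
       [72, 92, 95, 98, 112, 100, 103, 99]]

def scale_table(base, qf):
    """Scale a quality-50 table to quality factor qf (1-100).

    Assumed rule: qf >= 50 multiplies by (100 - qf) / 50 (smaller divisors,
    less compression); qf < 50 multiplies by 50 / qf (larger divisors, more
    compression). Entries are rounded half up and kept >= 1.
    """
    if not 1 <= qf <= 100:
        raise ValueError("quality factor must be between 1 and 100")
    factor = (100 - qf) / 50 if qf >= 50 else 50 / qf
    return [[max(1, math.floor(q * factor + 0.5)) for q in row] for row in base]

for qf in (10, 50, 90):
    print(qf, scale_table(Q50, qf)[0])   # first row of each scaled table
```

With qf = 10 the first-row divisors grow fivefold, so far more coefficients round to zero. With qf = 90 they shrink to a fifth, so more of the original values survive.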
The higher the quality factor used, the more the image will
retain its integrity, i.e., less compression will occur.
The two quantization tables shown above have quality factors of 10 and 90. A quality factor (qf) of 10 causes far more compression because it eliminates a large portion of the high-frequency components. With qf = 90, the image retains a large portion of its original values, so less compression is achieved.
With the quality factor determined, the normalized quantized coefficient table is found in the same way as described previously. Once this table has been calculated, encoding can begin.
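The element-by-element division and rounding described in this section can be sketched in Python. The sketch also includes the denormalization the receiver performs before the inverse DCT (as in JPEGCompression.m). Apart from F(0, 0) = 235.625, the DC term of the example block P0, the coefficient values are made up for illustration, and the helper names are hypothetical. Rounding halves away from zero mimics MATLAB's round; Python's built-in round would instead round halves to even.

```python
import math

# Standard JPEG luminance quantization table (quality factor 50).
Q50 = [[16, 11, 10, 16, 24, 40, 51, 61],
       [12, 12, 14, 19, 26, 58, 60, 55],
       [14, 13, 16, 24, 40, 57, 69, 56],
       [14, 17, 22, 29, 51, 87, 80, 62],
       [18, 22, 37, 56, 68, 109, 103, 77],
       [24, 35, 55, 64, 81, 104, 113, 92],
       [49, 64, 78, 87, 103, 121, 120, 101],
       [72, 92, 95, 98, 112, 100, 103, 99]]

def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def quantize(coeffs, table):
    """Normalized quantized coefficients: round(F(u, v) / Q(u, v))."""
    return [[round_half_away(f / q) for f, q in zip(frow, qrow)]
            for frow, qrow in zip(coeffs, table)]

def dequantize(levels, table):
    """Denormalize at the receiver: multiply each level by its table entry."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]

# Illustrative coefficient matrix with three non-zero entries.
F = [[0.0] * 8 for _ in range(8)]
F[0][0] = 235.625   # DC term of the example block P0
F[0][1] = -5.4      # made-up value: |F| < Q/2, so it quantizes to 0
F[1][0] = -22.6     # made-up value
levels = quantize(F, Q50)
restored = dequantize(levels, Q50)
print(levels[0][:2], levels[1][0])      # quantized levels
print(restored[0][0], restored[1][0])   # approximations of F(0,0) and F(1,0)
```

Any coefficient whose magnitude is below half of its table entry is lost entirely, which is exactly how the high-frequency components are discarded; the surviving ones come back only as multiples of their table entries.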
Appendix A: Code for DCT and Graphical User Interfaces (GUIs)
DCT as Basis Vectors (DCTTest.m)
%This is an example of an 8x8 block of spatial domain values. Each value
%represents an unsigned integer from 0 - 255 to represent a colour.
P0 = [139 144 149 153 155 155 155 155;
      144 151 153 156 159 156 156 156;
      150 155 160 163 158 156 156 156;
      159 161 162 160 160 159 159 159;
      159 160 161 162 162 155 155 155;
      161 161 161 161 160 157 157 157;
      162 162 161 163 162 157 157 157;
      162 162 161 161 163 158 158 158];
%The line below subtracts 128 from each element in the above matrix to
%make a signed integer to work with for the DCT algorithm.
P = P0 - 128;
%The two lines below produce the C(omega) term in Eq. 1.5.
S = eye(8)/2;
S(1,1) = 1/2/sqrt(2);
%Below, the DCT vectors are arranged and then multiplied by S to complete
%Eq. 1.5.
D = zeros(8,8);
for t = 0:7
    for w = 0:7 %where w is u and v
        D(w+1, t+1) = cos((2*t+1)*w*pi/16);
    end
end
%Eq. 1.5 therefore is:
D = S * D;
%Perform forward DCT to obtain the 8x8 matrix of DCT coefficients.
F = D * P * D';
JPEG Compression (used only for a sample block) (JPEGCompression.m)
%JPEG Compression using the standard (quality factor 50) quantization table
P0original = [139 144 149 153 155 155 155 155;
              144 151 153 156 159 156 156 156;
              150 155 160 163 158 156 156 156;
              159 161 162 160 160 159 159 159;
              159 160 161 162 162 155 155 155;
              161 161 161 161 160 157 157 157;
              162 162 161 163 162 157 157 157;
              162 162 161 161 163 158 158 158]; %Original 8x8 Matrix.
P0 = P0original - 128; %Subtracts 128 from the original 8x8 Matrix (signed rather than unsigned integers).
P0fDCT = round(dct2(P0)); %Calculates the forward DCT of P0, rounded to integers.
Q0 = [16 11 10 16 24 40 51 61;
      12 12 14 19 26 58 60 55;
      14 13 16 24 40 57 69 56;
      14 17 22 29 51 87 80 62;
      18 22 37 56 68 109 103 77;
      24 35 55 64 81 104 113 92;
      49 64 78 87 103 121 120 101;
      72 92 95 98 112 100 103 99]; %This is the quantization matrix (quality factor 50)
%The script below determines the normalized quantized coefficients by
%dividing, element by element, P0fDCT by Q0.
Q0Norm = P0fDCT ./ Q0; %Produces the normalized quantized coefficient matrix.
Q0NormRounded = round(Q0Norm); %Rounds the normalized quantized coefficient matrix.
%-------------------------------------------------------------------------%
%Now that the quantized matrix is formed, perform the zig zag scan and
%run-length encode.
%Once the file is encoded (i.e., after run-length encoding), run the rest
%of the script to decode the file and reconstruct the image.
%-------------------------------------------------------------------------%
Q0DeNorm = Q0NormRounded .* Q0; %Produces the denormalized quantized coefficient matrix.
%Now that the denormalized quantized matrix is obtained, the image needs to
%be reconstructed to its compressed 8x8 matrix.
Q0Reconstructed = idct2(Q0DeNorm); %Inverse Discrete Cosine Transform to reconstruct the image.
Q0Reconstructed128 = Q0Reconstructed + 128; %Adds 128 to make the values unsigned 0-255 integers again.
Q0ReconstructedRounded = round(Q0Reconstructed128); %Rounds the matrix values to integers.
%Now the compressed image is reconstructed
JPEG Compression for Gray Scale Image (used in Lena Image) (JPEGCompressionGrayscale.m)
%This breaks a two-dimensional grayscale image, "lena512.bmp", into 8x8
%blocks and performs a compression.
n = 1; %set "n" variable to 1
m = 1; %set "m" variable to 1
I = imread('lena512.bmp'); %read in the image
%The line below breaks up the 512x512 grayscale image into a 64x64 cell
%array of 8x8 blocks of pixels
P0original = mat2cell(I, 8 * ones(1, 64), 8 * ones(1, 64));
while (n
% [...]
%     +Y
% |
% |
% |
% V
%
% +X
%Output element i = the starting point in(1,1)
output(i) = in(x, y);
%until the end of the matrix
while ((x
% [...] move right
        else
            y = y + 1;
        end;
    %CASE C: last column but not max row => move down
    elseif ((y == ymax) && (x < xmax))
        x = x + 1;
    %CASE D: all other cases => move up-right
    elseif ((x > xmin) && (y < ymax))
        x = x - 1;
        y = y + 1;
    end;
%===Downwards Direction===
else %points where x+y != a multiple of 2
    %CASE E: bottom row => move right
    if (x == xmax)
        y = y + 1;
    %CASE F: first column, any row but the last => move down
    elseif (y == ymin && x < xmax)
        x = x + 1;
    %CASE G: all other cases => move down-left
    elseif ((x < xmax) && (y > ymin))
        x = x + 1;
        y = y - 1;
    end;
end;
%increment i to add the next element to output array
i = i + 1;
%Output element i = the current point in the matrix in(x,y)
output(i) = in(x, y);
%CASE I: bottom-right corner => break loop
if ((x == xmax) && (y == ymax))
    output(i) = in(x, y);
    break
end;
end;
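The scan rules described in the Zig Zag Transformation section, together with the inverse scan from Message Decoding, can be sketched in Python as follows. This is an illustrative re-implementation, not the authors' zigzag.m or invzigzag.m. It uses 0-based indices (x = row, y = column) instead of MATLAB's 1-based ones, and the function names zigzag_order, zigzag and inverse_zigzag are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions visited by the zig-zag scan."""
    x, y = 0, 0                   # x = row, y = column; start at top-left
    order = [(x, y)]
    while (x, y) != (n - 1, n - 1):
        if (x + y) % 2 == 0:      # even coordinate sum: moving up-right
            if y == n - 1:        # last column (incl. top-right): move down
                x += 1
            elif x == 0:          # first row: move right
                y += 1
            else:                 # interior: move up-right
                x -= 1
                y += 1
        else:                     # odd coordinate sum: moving down-left
            if x == n - 1:        # bottom row: move right
                y += 1
            elif y == 0:          # first column: move down
                x += 1
            else:                 # interior: move down-left
                x += 1
                y -= 1
        order.append((x, y))
    return order

def zigzag(block):
    """Flatten a square matrix into a 1-D list in zig-zag order."""
    return [block[x][y] for x, y in zigzag_order(len(block))]

def inverse_zigzag(values, n=8):
    """Refill an n x n zero matrix by walking the same zig-zag path."""
    out = [[0] * n for _ in range(n)]
    for value, (x, y) in zip(values, zigzag_order(n)):
        out[x][y] = value
    return out

block = [[8 * r + c for c in range(8)] for r in range(8)]   # entry = index
print(zigzag(block)[:10])
```

Because the decoder walks exactly the same path as the encoder, inverse_zigzag(zigzag(block)) returns the original matrix unchanged, which is why this step loses no information.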
Run Length Encoding for Zig Zag Transformed Image (RLE_encode.m)
function encoded = RLE_encode(input)
%Receives an input array of values and outputs a compressed array using run
%length encoding. Sequential entries are put into an output array in pairs
%[i, j], where i represents the numerical value of the entry and j represents
%the number of times it appears, sequentially.

%len = the number of entries in the input vector
len = numel(input);
%by default, set run_length to 1
run_length = 1;
%initialize empty array for encoded output
encoded = [];
%special case: an empty input produces an empty output
if len == 0
    return
end
%for the whole vector
for i = 2:len
    %look back at previous entry and check equivalence
    if input(i) == input(i-1)
        %increment run_length if they are equal
        run_length = run_length + 1;
    else
        %push value and run length into encoded output vector if they are
        %not equal
        encoded = [encoded input(i-1) run_length];
        run_length = 1;
    end
end
%add the last value and its run length to the output
%(this also covers an input of length 1)
encoded = [encoded input(len) run_length];
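For comparison with RLE_encode.m, the same [value, count] pair scheme and its reverse (described under Message Decoding) can be sketched in Python. The names rle_encode and rle_decode are hypothetical, and the decoder assumes a well-formed input with an even number of entries.

```python
def rle_encode(values):
    """Encode a list as [value, run_length, value, run_length, ...]."""
    encoded = []
    if not values:
        return encoded
    run_value, run_length = values[0], 1
    for v in values[1:]:
        if v == run_value:
            run_length += 1            # the current run continues
        else:
            encoded.extend([run_value, run_length])
            run_value, run_length = v, 1
    encoded.extend([run_value, run_length])   # flush the final run
    return encoded

def rle_decode(encoded):
    """Expand [value, count, ...] pairs back into the original list."""
    decoded = []
    for i in range(0, len(encoded), 2):
        value, count = encoded[i], encoded[i + 1]
        decoded.extend([value] * count)
    return decoded

print(rle_encode([8, 8, 8, 0, 4, 4, 0, 0]))   # Figure 2.2 example
print(rle_encode([1, 2, 3, 4] + [0] * 60))    # Figure 2.3 example, 64 entries
```

The second example shows the case that matters for JPEG: 64 entries shrink to 10 because the trailing zeros collapse into a single [0, 60] pair.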
Run Length Decoding for Received Encoded Image (RLE_decode.m)
function decoded = RLE_decode(encoded)
%Receives a run length encoded array and outputs an expanded array using
%run length decoding. For every pair of entries in the input array [i, j],
%i represents the numerical value of an entry in the output array and j
%represents the number of times it is sequentially shown,
%i.e. [8, 4] is decoded to make [8, 8, 8, 8].
my_size = size(encoded);
length = my_size(2);
index = 1;
decoded = [];
% iterate through the input
while (index