2. BLOCK TRUNCATION CODING FOR IMAGE COMPRESSION
2.1 DIGITAL IMAGE FUNDAMENTALS
This chapter deals with the fundamentals of digital image signal representation
and with basic Block Truncation Coding (BTC) for image compression. A frame of a
digital image can be visualized as an orderly arrangement of ‘picture elements’ (pixels)
in horizontal lines, with many such lines stacked one below the other. It may equally be
visualized as a matrix of pixels arranged in rows and columns. For example, a
‘512 x 512’ image has 512 horizontal lines in a frame, each with 512 pixels. A pixel is
the smallest visible part of an image, having its own color (hue) and brightness
(intensity of light). The brightness is referred to as ‘luminance’ (luma) and the color
as ‘chrominance’ (chroma). Any color can be represented as a mixture of the three
primary colors red, green and blue. When an image is scanned electronically, each pixel
of the image produces its own R, G, B (red, green, blue) signals corresponding to the
intensity of the primary colors in that pixel [17], [28]. In digital processors, each of the
R, G, B signals is represented by 8 bits, corresponding to 256 quantization levels
ranging from zero intensity to full intensity. It is customary to explain image
processing using a monochrome (black and white) image; the explanation can then be
extended to each of the R, G, B components of a color image separately [32].
2.2 NEED FOR IMAGE COMPRESSION
Digital images are in general stored in memories, preprocessed, transmitted and
reprocessed for final applications. The volume of binary data to be handled by an image
processor is enormous. For example, a ‘256 x 256’ frame of a monochrome image has
524288 (256 x 256 x 8) bits at the rate of 8 bits per pixel. A 5-minute video at the rate
of 25 such frames per second has 3932160000 (nearly 4 billion) bits! Obviously, it is
advantageous to reduce the number of bits before transmission, provided that an image
of acceptable quality can still be reproduced at the receiver. This process is known as
‘Lossy Image Compression’. It primarily reduces the transmission time and also the
storage memory required.
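The arithmetic behind these figures can be retraced with a few lines of Python. This is only an illustrative sketch; the constant names are chosen here and do not come from the original text.

```python
# Raw (uncompressed) size of one monochrome frame and of a short video.
WIDTH, HEIGHT = 256, 256   # pixels per line, lines per frame
BITS_PER_PIXEL = 8         # 256 quantization levels
FRAME_RATE = 25            # frames per second
DURATION_S = 5 * 60        # 5 minutes, in seconds

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
bits_per_video = bits_per_frame * FRAME_RATE * DURATION_S

print(bits_per_frame)   # 524288
print(bits_per_video)   # 3932160000
```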
Images can also be compressed without any reduction in quality by employing
suitable coding techniques. Such ‘Lossless Image Compression’ methods [9] inherently
yield less compression than ‘Lossy’ methods [30].
The ‘Compression Ratio’ (CR) and ‘Bit Rate’ (BR) are used to measure the
amount of image compression, while the ‘Peak Signal to Noise Ratio’ (PSNR) and
‘Root Mean Square Error’ (RMSE) are used to measure the resulting error of image
compression. Contrast (C) is a measure of the visual quality of an image.
Both time-domain [20] and transform-based frequency-domain [37], [38], [39]
techniques are employed in image compression. Block Truncation Coding (BTC) is an
elegant and efficient time-domain compression technique.
2.3 TRADITIONAL BLOCK TRUNCATION CODING
Block Truncation Coding (BTC) was introduced by Delp and Mitchell [10] in 1979. The
coding is based on dividing the image into non-overlapping blocks of equal size. In
digital signal processors, an image is divided into smaller blocks of ‘k x k’ pixels for
processing. For example, a ‘512 x 512’ frame may be divided into blocks of ‘8 x 8’
pixels. Sometimes microblocks of ‘2 x 2’ pixels, miniblocks of ‘4 x 4’ pixels, maxiblocks
of ‘16 x 16’ pixels and macroblocks of ‘32 x 32’ pixels are also used.
BTC replaces the original intensity value of each pixel in a block by either a
‘low mean’ intensity value ‘a’ or a ‘high mean’ intensity value ‘b’, depending on
whether the pixel lies below a threshold intensity or not. This threshold is the mean
intensity of the pixels in the block. A ‘bit plane’ is created by representing the
‘a’ value pixels by ‘0’s and the ‘b’ value pixels by ‘1’s.
$$a = \bar{x} - \sigma\sqrt{\frac{m-q}{q}} \tag{2.3.1}$$
$$b = \bar{x} + \sigma\sqrt{\frac{q}{m-q}} \tag{2.3.2}$$
Here, ‘m’ is the total number of pixels in the block, equal to $k^2$ (16 for a 4x4 block),
‘q’ is the number of ‘0’s in the bit plane,
$\bar{x}$ is the mean intensity of the ‘m’ pixels, and
‘σ’ is the standard deviation of the intensities of the ‘m’ pixels.
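$$\bar{x} = \frac{1}{m}\sum_{i=1}^{k}\sum_{j=1}^{k} x_{i,j} \tag{2.3.3}$$
$$\sigma = \left[\overline{x^2} - (\bar{x})^2\right]^{0.5} \tag{2.3.4}$$
$$\overline{x^2} = \frac{1}{m}\sum_{i=1}^{k}\sum_{j=1}^{k} x_{i,j}^2 \tag{2.3.5}$$
where $m = k^2$ and $x_{i,j}$ is the intensity value of the pixel (i,j) of the block, $\bar{x}$ is the
mean intensity, $\overline{x^2}$ is the mean of the squared intensities and σ is the standard
deviation (SD).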
The encoder transmits the ‘bit plane’ of ‘m’ bits in total, along with $\bar{x}$ and ‘σ’,
each coded with 8 bits. In the decoder, the ‘0’s and ‘1’s of the bit plane are replaced by
the 8-bit values ‘a’ and ‘b’ calculated from Eqns. 2.3.1 and 2.3.2 to reproduce the BTC
image, which is a close approximation of the original image.
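The encoder and decoder steps described in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `btc_encode` and `btc_decode` are hypothetical, the mean and standard deviation are kept as floating-point numbers rather than quantized to 8 bits, the threshold rule (pixels at or above the mean become 1s) follows the worked example in Section 2.5, and the handling of a perfectly flat block is an added safeguard that the text does not discuss.

```python
import math

def btc_encode(block):
    """Encode one k x k block (a list of rows).

    Returns (bit_plane, mean, sigma): pixels below the block mean
    are coded as 0, pixels at or above the mean as 1.
    """
    pixels = [x for row in block for x in row]
    m = len(pixels)
    mean = sum(pixels) / m
    mean_sq = sum(x * x for x in pixels) / m
    # sigma = [mean of squares - square of mean]^0.5, clamped against rounding
    sigma = math.sqrt(max(mean_sq - mean * mean, 0.0))
    bit_plane = [[1 if x >= mean else 0 for x in row] for row in block]
    return bit_plane, mean, sigma

def btc_decode(bit_plane, mean, sigma):
    """Rebuild the block: 0s become the low mean a, 1s the high mean b."""
    bits = [bit for row in bit_plane for bit in row]
    m = len(bits)
    q = bits.count(0)  # number of 0s in the bit plane
    if q == 0 or q == m:
        # Assumption: a flat block (sigma = 0) is rebuilt from the mean alone.
        a = b = mean
    else:
        a = mean - sigma * math.sqrt((m - q) / q)
        b = mean + sigma * math.sqrt(q / (m - q))
    return [[b if bit else a for bit in row] for row in bit_plane]

# Hypothetical 4x4 block, used only to exercise the two functions.
example_block = [[2, 9, 12, 15],
                 [2, 11, 11, 9],
                 [2, 3, 12, 15],
                 [3, 3, 4, 14]]
plane, block_mean, block_sigma = btc_encode(example_block)
reconstructed = btc_decode(plane, block_mean, block_sigma)
```

With these choices of ‘a’ and ‘b’, the reconstructed block has the same mean and the same standard deviation as the original block, which is the property the two-level quantizer is designed to preserve.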
2.4 ‘CR’, ‘BR’, ‘RMSE’, ‘PSNR’ & ‘C’ PARAMETERS OF COMPRESSION
As indicated earlier in this chapter, the ‘compression ratio’ (CR) and ‘bit rate’ (BR)
are used to measure the amount of image compression, while the ‘Root Mean Square
Error’ (RMSE) and the ‘Peak Signal to Noise Ratio’ (PSNR) are used to measure the
resulting error of image compression. Contrast (C) is a measure of the visual quality of
an image.
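The ‘compression ratio’ (CR) is defined as the ratio of the number of bits of the
original image to the number of bits after compression. A block of ‘m’ pixels occupies
8m bits in the original image, whereas BTC sends ‘m’ bits for the bit plane plus 16 bits
for $\bar{x}$ and σ. Hence the ‘compression ratio’ is
$$\mathrm{CR} = \frac{8m}{m+16} \tag{2.4.1}$$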
The ‘Bit Rate’ (BR) is defined as the ratio of the number of bits generated by
BTC, including the bits for $\bar{x}$ and σ, to the number of pixels in the image.
Hence ‘Bit Rate’ (BR) = (m + 16) / m bits per pixel.
It follows that (BR) x (CR) = 8 bits per pixel, the bit depth of the original image.
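As a quick numerical check of these two definitions, the sketch below evaluates CR and BR for several block sizes. The function names and the default arguments (8 bits per original pixel, 16 overhead bits per block) are chosen here for illustration; the values printed can be compared against Table 2.1.

```python
def compression_ratio(k, bits_per_pixel=8, overhead_bits=16):
    """CR for a k x k block: original bits over bits sent by BTC."""
    m = k * k
    return bits_per_pixel * m / (m + overhead_bits)

def bit_rate(k, overhead_bits=16):
    """BR for a k x k block: bits sent by BTC per pixel."""
    m = k * k
    return (m + overhead_bits) / m

for k in (2, 4, 8, 16, 32, 64):
    print(f"{k}x{k}: CR = {compression_ratio(k):.7f}, BR = {bit_rate(k):.7f}")
```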
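The ‘Root Mean Square Error’ (RMSE) is defined as
$$\mathrm{RMSE} = \left[\frac{1}{262144}\sum_{i=1}^{262144} d_i^2\right]^{0.5} \tag{2.4.2}$$
where $d_i$ is the difference between the intensity of the $i$th pixel in the original image
and in the reconstructed image, and 262144 is equal to 512 x 512.
The Peak Signal to Noise Ratio (PSNR) is defined as
$$\mathrm{PSNR} = 20\log_{10}\left(\frac{X_{max}}{\mathrm{RMSE}}\right)\ \mathrm{dB} \tag{2.4.3}$$
where $X_{max}$ is the maximum pixel intensity in the 512x512 image.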
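The two error measures can be sketched as follows for images stored as lists of rows: RMSE as the square root of the mean squared pixel difference, and PSNR as 20 log10 of the peak intensity divided by the RMSE, in dB. The function names are hypothetical, the peak intensity is taken from the original image (the text does not say which image it refers to, so this is an assumption), and returning infinity for a perfect reconstruction is a convention added here.

```python
import math

def rmse(original, reconstructed):
    """Root mean square of the pixel-wise differences between two images."""
    diffs = [o - r
             for row_o, row_r in zip(original, reconstructed)
             for o, r in zip(row_o, row_r)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def psnr_db(original, reconstructed):
    """PSNR in dB: 20 * log10(X_max / RMSE)."""
    x_max = max(x for row in original for x in row)  # assumed: peak of the original
    err = rmse(original, reconstructed)
    if err == 0:
        return math.inf  # identical images: no error to measure
    return 20 * math.log10(x_max / err)

# Tiny hypothetical images, only to show the call pattern.
orig_img = [[10, 20], [30, 40]]
rec_img = [[12, 18], [30, 40]]
print(rmse(orig_img, rec_img), psnr_db(orig_img, rec_img))
```

The same functions apply to a 512x512 image, in which case the mean is taken over 262144 pixels.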
The contrast (C) of an image is equal to the standard deviation of the intensity
values of all the pixels of the image. Based on a block-by-block approach, for a
‘512x512’ image of 4096 blocks of ‘8x8’ pixels, the contrast ‘C’ of the image is
$$C = \frac{1}{64}\sqrt{\sum_{n=1}^{4096}\sigma_n^2} \tag{2.4.4}$$
where $\sigma_n$ is the standard deviation of the $n$th ‘8x8’ block, given by
$$\sigma_n = \frac{1}{8}\sqrt{\sum_{i=1}^{64}(x_i - \bar{x})^2} \tag{2.4.5}$$
where $x_i$ is the intensity of the $i$th pixel of the $n$th ‘8x8’ block and $\bar{x}$ is the
mean intensity of the $n$th ‘8x8’ block.
The above equations for ‘C’ and $\sigma_n$ are applicable both to the original and to the
reconstructed images.
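The block-by-block contrast calculation can be sketched as below. The function names are hypothetical, and the sketch assumes that the image dimensions are exact multiples of the block size `k`. For a 512x512 image with `k = 8` there are 4096 blocks, so the square root of the average block variance is the same as (1/64) times the square root of the sum of the block variances. Note that this block-wise figure leaves out the spread between block means, so it can be smaller than the standard deviation taken over the whole image at once.

```python
import math

def block_sigma(block):
    """Standard deviation of the pixel intensities of one block."""
    pixels = [x for row in block for x in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((x - mean) ** 2 for x in pixels) / len(pixels))

def contrast(image, k=8):
    """Block-wise contrast: root of the mean of the block variances."""
    n_rows, n_cols = len(image), len(image[0])
    variances = []
    for r in range(0, n_rows, k):
        for c in range(0, n_cols, k):
            block = [row[c:c + k] for row in image[r:r + k]]
            variances.append(block_sigma(block) ** 2)
    return math.sqrt(sum(variances) / len(variances))

# Hypothetical 4x4 image split into 2x2 blocks, only to show the call.
tiny_image = [[0, 2, 5, 5],
              [0, 2, 5, 5],
              [1, 1, 0, 4],
              [1, 1, 0, 4]]
print(contrast(tiny_image, k=2))
```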
After the application of BTC, $p$ pixels of the block, represented by 0s in the
bit plane, are assigned the low-mean intensity ‘a’, and the remaining $q = k^2 - p$ pixels,
represented by 1s in the bit plane, are assigned the high-mean intensity ‘b’. (Note that
‘q’ here counts the 1s, whereas in Section 2.3 it counts the 0s.) The contrast of this
BTC block is equal to the standard deviation of the $p$ values of ‘a’ and the $q$ values
of ‘b’. Using Eqn. (2.4.5), we get
$$\sigma_n = \frac{(b-a)}{p+q}\sqrt{pq} \tag{2.4.6}$$
where
$a$ = low-mean intensity corresponding to 0s in the BTC bit plane,
$b$ = high-mean intensity corresponding to 1s in the BTC bit plane,
$p$ = the number of 0s in the bit plane, corresponding to the low mean ‘a’, and
$q$ = the number of 1s in the bit plane, corresponding to the high mean ‘b’.
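Eqn. (2.4.6) can be checked with a short derivation that uses only the definitions of $p$, $q$, $a$ and $b$ given above. The symbol $\mu$, for the mean of the reconstructed block, is introduced here purely for this illustration.

$$\mu = \frac{pa + qb}{p+q}, \qquad a - \mu = -\frac{q(b-a)}{p+q}, \qquad b - \mu = \frac{p(b-a)}{p+q}$$

$$\sigma_n^2 = \frac{p(a-\mu)^2 + q(b-\mu)^2}{p+q} = \frac{(b-a)^2\,(pq^2 + qp^2)}{(p+q)^3} = \frac{(b-a)^2\,pq}{(p+q)^2}$$

Taking the square root gives $\sigma_n = (b-a)\sqrt{pq}/(p+q)$, in agreement with Eqn. (2.4.6).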
While CR and BR depend only on the block size, PSNR, RMSE and C depend on the
intensities of the pixels. The CR and BR values are listed in Table 2.1 for various
block sizes of the image.
Table 2.1: CR and BR values for various block sizes of a 512x512 image.

Block size (pixels)   2 x 2   4 x 4   8 x 8   16 x 16     32 x 32     64 x 64
m = k^2               4       16      64      256         1024        4096
CR                    1.6     4       6.4     7.5294118   7.8769231   7.9688716
BR (bits/pixel)       5       2       1.25    1.0625      1.015625    1.0039063
The values of Table 2.1 are shown graphically in Figure 2.1.
Figure 2.1: Graph showing the variation of Compression Ratio and Bit Rate for various
block sizes.
As the block size increases, the bit rate decreases towards 1 bit per pixel and the
CR increases towards 8, as shown in Figure 2.1.
2.5 ILLUSTRATION OF BTC APPLIED TO AN ARBITRARY 4X4 BLOCK
For illustration, a 4x4 block of pixels having arbitrary gray level intensities is
shown in Figure 2.2, along with its corresponding bit plane.
Figure 2.2: (a) 4x4 Pixels Block, (b) Corresponding Bit Plane
Using equations (2.3.3) and (2.3.4), the mean ($\bar{x}$) of the 4x4 pixel block is 3
and the standard deviation (σ) is 2.64. The encoder develops a single bit plane of 4x4
size by representing all $x_{i,j} < 3$ by 0s and all $x_{i,j} \geq 3$ by 1s. This bit plane,
along with $\bar{x}$ and σ, is transmitted to the receiver.
Using equations (2.3.1) and (2.3.2), the decoder in the receiver estimates a
low-mean value ‘a’ (0.007) to replace the 0s and a high-mean value ‘b’ (5.328) to
replace the 1s in the received bit plane. The 4x4 block of the image reconstructed by
the decoder is shown in Figure 2.3.
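The quoted values of ‘a’ and ‘b’ can be retraced numerically. The pixel values of Figure 2.2 are not reproduced here, so the sketch below assumes that the bit plane contains 7 zeros and 9 ones. This count is an inference from the stated mean, standard deviation, ‘a’ and ‘b’, not a figure given in the text, and the variable names are chosen for illustration only.

```python
import math

m = 16        # pixels in the 4x4 block
mean = 3.0    # block mean quoted in the text
sigma = 2.64  # block standard deviation quoted in the text
q = 7         # ASSUMED number of 0s in the bit plane (inferred, not stated)

a = mean - sigma * math.sqrt((m - q) / q)   # low mean, replaces the 0s
b = mean + sigma * math.sqrt(q / (m - q))   # high mean, replaces the 1s
print(round(a, 3), round(b, 3))             # 0.007 5.328

# The reconstructed block keeps the original mean:
reconstructed_mean = (q * a + (m - q) * b) / m
```

Under this assumption the decoder values agree with those quoted above to three decimal places, and the mean of the reconstructed block remains 3.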