THE UNIVERSITY OF TENNESSEE ECE 572 Digital Image Processing Project #3 Steganography Implementation Instructor: Dr. Qi Student: Getao Liang December 10, 2005

THE UNIVERSITY OF TENNESSEE

ECE 572

Digital Image Processing

Project #3

Steganography Implementation

Instructor: Dr. Qi

Student: Getao Liang

December 10, 2005

Abstract: Steganography is derived from the Greek for covered writing and essentially means “to hide in plain sight”. It is the art and science of communicating in such a way that the presence of a message cannot be detected. Like other image processing techniques, steganography can be applied in both the spatial domain and the frequency domain. In this project, I discuss the theory and implementation of two simple steganography schemes: the LSB technique and the transform domain technique. By evaluating the techniques with both text and image messages, I show how the two algorithms perform under different parameters. I also test the robustness of the DCT scheme against JPEG lossy compression to check the theory and my implementation. The experimental results support my assumptions and the theoretical analysis.

I. Introduction

With the rapid integration of the world's economies and cultures, communication between people on different continents plays an important role. Since information is the most important part of communication, methods have been proposed to protect confidential content from undesired observers. The technologies for this purpose can be divided into two main categories, cryptography and steganography. The main difference between them is the appearance of the processed data. Cryptography encodes the information into seemingly random data that unauthorized receivers cannot understand. However, any receiver can tell from its strange and unreadable appearance that the received object contains hidden information, even though no one without the decoder can know the exact secret message.

Steganography is derived from the Greek for covered writing and essentially means “to hide in plain sight”. It is the art and science of communicating in such a way that the presence of a message cannot be detected. Unlike cryptography, steganography not only hides information from undesired receivers, but also prevents an enemy who might come across the processed cover, whether a passive, active or malicious receiver, from learning that the hidden message exists at all.

A few criteria are used to measure the effectiveness of a steganography technique. The first is imperceptibility, which requires that the result go unnoticed even by receivers with advanced analysis tools, such as statistical analysis and high-pass filter analysis. This is the most important criterion when facing active or malicious attackers, who may have great resources to analyze and break the cover. Security is another important factor, which concerns how difficult it is to invert the process on the cover to obtain the hidden message. Capacity describes the maximum amount of secret data a cover can hold. The last criterion is robustness against cover modification, malicious modification or lossy compression. It is impossible to satisfy all the requirements at the same time, so the designer must trade them off to find the best scheme for the intended circumstances.

People have used different steganography techniques since ancient times, in fields ranging from civilian communication to military secret information exchange. The demand for security in modern electronic communication boosts the development of steganography. There are many techniques for steganography, including substitution systems, transform domain techniques and statistical steganography. The first is the simplest method, but the message is easy to decode or destroy. The last two are both applied to the supraliminal channels of the cover, which makes the message harder to destroy without greatly distorting the cover.

II. Technical Approach

All steganography techniques can be divided into two categories based on the channel used to hide the secret message. One is the substitution system, or bit-plane tool, which uses the least significant bits (LSB) of the cover data. The other is applied to the supraliminal channels of the cover, which make up its main structure; the Discrete Cosine Transform (DCT) technique is one of them. In the following, I discuss these two techniques in detail.

To fully understand the processing mechanism of the different steganography algorithms, I used the C++ programming language to implement and simulate the LSB and DCT techniques with secret messages in the form of both image and plain text. The cover I chose is a grayscale image (lena.pgm, 512x512). To examine the efficiency of the algorithms, I used MATLAB to perform some analysis, whose results also give a better visual explanation of the differences. I used version 3 of the image library from Dr. Qi.

1. Least Significant Bit (LSB) Technique

The least significant bit approach, which applies LSB insertion and noise manipulation, is common in steganography and is relatively easy to apply to image and audio. A surprising amount of information can be hidden in the cover image with little perceptible impact on the appearance of the carrier. This technique exploits a property of the digitized form of the cover data: in a normal 8-bit value, some bits play a more important role than others. As can be seen from Figure 1, the 7-bit on the left is called the Most Significant Bit (MSB), and the 0-bit on the right is called the Least Significant Bit (LSB). Of course, the significance of the bits is relative: the 5-bit is more significant than the 4-bit, but less significant than the 6-bit. Usually the higher bits contribute most of the information, while the lower bits mostly contain noise. Thanks to this property, we can hide the message in the bits that are normally occupied by noise, so that an attacker might ignore them and fail to notice the information hidden in the cover.

MSB | 7-bit | 6-bit | 5-bit | 4-bit | 3-bit | 2-bit | 1-bit | 0-bit | LSB

Figure-1. 8-bit data
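The idea of writing message bits into the low-order, noise-like bits of a cover value can be sketched as follows. This is a minimal Python illustration, not the C++ program used for the report; the function names and the parameter n (how many low-order bits are overwritten) are assumptions made for the example.

```python
def embed_msbs(cover: int, message: int, n: int) -> int:
    """Replace the n LSBs of an 8-bit cover value with the n MSBs of an 8-bit message value."""
    if not 1 <= n <= 7:
        raise ValueError("n must be between 1 and 7")
    keep_mask = (0xFF << n) & 0xFF          # keeps the upper 8-n bits of the cover
    return (cover & keep_mask) | (message >> (8 - n))


def extract_msbs(stego: int, n: int) -> int:
    """Read the n LSBs back and move them to the MSB positions; the lower message bits are lost."""
    return (stego & ((1 << n) - 1)) << (8 - n)


cover, message = 0b10110110, 0b11010011
stego = embed_msbs(cover, message, 4)
recovered = extract_msbs(stego, 4)
print(f"{stego:08b} {recovered:08b}")       # prints 10111101 11010000
```

With n = 4 the cover value changes by less than 16 intensity levels, and the recovered value keeps only the upper four bits of the message.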

To hide more information of the secret message, we can replace several LSBs of the cover with the same number of MSBs of the secret message. For an 8-bit cover, up to seven bits can be replaced by the secret message. Figure 2 shows the cases with different numbers of bits used in the scheme. Using more bits for the message means more capacity and more robustness against cover modification, but greater damage to the appearance of the cover, and therefore less imperceptibility and security.

Cover: 7-bit 6-bit 5-bit 4-bit 3-bit 2-bit 1-bit 0-bit
Message: 7-bit 6-bit 5-bit 4-bit 3-bit 2-bit 1-bit 0-bit
LSB Coded (1 bit): cover bits 7 to 1, followed by message bit 7
LSB Coded (4 bit): cover bits 7 to 4, followed by message bits 7 to 4

Figure-2. LSB encode cases

Once you have decided how many bits to replace, you can continue to find a scheme to select the bytes used to hide the secret message. In this step, it is very important to take the probability distribution of the original data into account. The secret message is usually far shorter than the carrier. Consider a very simple selection method, which uses as many bytes from the very beginning of the cover as the secret message needs. Because of the added hidden information, the data distribution of the first portion of the processed cover differs from that of the other parts, so an attacker with sufficient analysis tools can easily detect the hidden information. One simple way to avoid this problem is to extend the secret message to the same size as the cover by inserting random bits before performing the secret writing. The distributions of the different portions then tend to be so similar that the message is not so easy to crack. But if attackers have prior knowledge of the distribution pattern of the cover, they can easily detect the difference between the original cover and the stego-cover caused by the secret information and random noise. Some more complicated techniques can help to solve this problem.

Besides statistical analysis, another method attackers often use is the Laplacian high-pass (HP) filter. Let’s take image processing as an example. An HP filter detects edges in an image by applying a second-order differential operation to neighboring pixels, as shown by the following equation.

$$\nabla^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4f(x,y)$$

For a normal image, edges are located in only a small portion of the whole image, so the intensity histogram of the HP-filtered original image should be concentrated around zero. However, for a stego-image, the substituted hidden information makes the differences between neighboring pixels larger than they were, so the selection methods mentioned in the last paragraph are not imperceptible to advanced attackers at all.

Figure-3. (a)Histogram of HPed covers, (b)Histogram of HPed stego-images
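The high-pass analysis described above can be illustrated with a small Python sketch. It applies the 4-neighbour Laplacian to the interior pixels of a tiny synthetic image and compares the histogram of the response before and after a single LSB is flipped. The 6x6 ramp image and the function names are assumptions chosen for the example, not data from the experiments.

```python
from collections import Counter


def laplacian(img):
    """4-neighbour Laplacian evaluated at the interior pixels of a 2-D list."""
    h, w = len(img), len(img[0])
    return [[img[x + 1][y] + img[x - 1][y] + img[x][y + 1] + img[x][y - 1]
             - 4 * img[x][y]
             for y in range(1, w - 1)]
            for x in range(1, h - 1)]


def histogram(values_2d):
    """Count how often each filter response occurs."""
    return Counter(v for row in values_2d for v in row)


# A smooth intensity ramp has a Laplacian of zero everywhere.
smooth = [[2 * x + 3 * y for y in range(6)] for x in range(6)]

# Flip the LSB of one pixel, as an LSB embedding might do.
stego = [row[:] for row in smooth]
stego[2][2] ^= 1

print(dict(histogram(laplacian(smooth))))
print(dict(histogram(laplacian(stego))))
```

The single flipped bit moves five of the sixteen interior responses away from zero, which is the kind of spreading of the HP histogram that an attacker would look for.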

That’s why a more sophisticated approach is introduced, called the random interval method, which uses a pseudorandom generator to spread the secret message over the entire cover in a rather random manner. To use this method, both parties of the communication should share a stego-key k used as a seed for generation, and also the equations that generate the pseudorandom sequence $k_1, \ldots, k_m$ and the corresponding intervals $j_1, \ldots, j_m$:

$$k_n = f(k_{n-1}); \qquad j_1 = k_1 \quad \text{and} \quad j_n = j_{n-1} + k_n$$

In my program, for simplicity, I used the simplest generator equations:

$$k_n = k \quad \text{and} \quad j_n = j_{n-1} + k_n$$

The performance should be the same as with the above equations.
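The simplified rule above, with a constant step k, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the report's C++ code: the cover is treated as a flat list of pixel values in raster order, positions are 0-based, one message bit is written per selected pixel, and all function names are hypothetical.

```python
def positions(key, count):
    """Pixel indices j_1 = k, j_n = j_(n-1) + k for the constant generator k_n = k."""
    idx, j = [], 0
    for _ in range(count):
        j += key
        idx.append(j)
    return idx


def encode(cover, bits, key):
    """Write one message bit into the LSB of each selected cover pixel."""
    idx = positions(key, len(bits))
    if idx and idx[-1] >= len(cover):
        raise ValueError("cover is too small for this message and key")
    stego = list(cover)
    for j, bit in zip(idx, bits):
        stego[j] = (stego[j] & 0xFE) | bit
    return stego


def decode(stego, n_bits, key):
    """Read the LSBs back from the same positions; the receiver must know key and n_bits."""
    return [stego[j] & 1 for j in positions(key, n_bits)]


cover = list(range(100, 140))
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = encode(cover, bits, key=3)
print(decode(stego, len(bits), key=3) == bits)   # prints True
```

A larger key spreads the same message over more of the cover, which is why both sides have to agree on it beforehand.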

So the encoder and decoder algorithms for the LSB technique with the random interval method are listed below, where $k_m$ and $j_m$ are the m-th pseudorandom number and interval, M(m) is the m-th message bit, and C(i,j) and S(i,j) are the intensities of the cover pixel and the stego-image pixel.

Algorithm-1: LSB Encoder

Generate random sequence k_m using k_1
Calculate random interval sequence j_m
for all cover pixels
    if (i,j) matches j_m then
        S(i,j) = (C(i,j)_LSB <- M(m))
        m++
    else
        S(i,j) = C(i,j)
    end if
end for

Algorithm-2: LSB Decoder

Generate random sequence k_m using k_1
Calculate random interval sequence j_m
for all stego-image pixels
    if (i,j) matches j_m then
        M(m) = S(i,j)_LSB
        m++
    end if
end for

2. Transform Domain Technique (TDT)

Though the LSB technique is an easy way to embed secret information in a cover image without changing the original appearance much, its fatal disadvantage is that a slight modification or processing of the stego-image, such as denoising, lossy compression or smoothing, might easily destroy the hidden message. Hiding the secret information in the frequency domain instead of the time/spatial domain can solve the problem. Unlike the LSB technique, this approach hides information in the significant areas of the transform domain to achieve higher robustness to cover modification, including compression, cropping, and some image processing. While providing great robustness, the transform domain technique also tries to cause little noticeable visual difference, which means good imperceptibility.

The Discrete Cosine Transform (DCT) is one of the transform domain methods. Its advantages over other transforms include a simple algorithm and real-number operations. These benefits make it one of the most popular techniques for image steganography.

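The forward and inverse transforms of an N x N block are

$$S(u,v) = \frac{2}{N} A(u) A(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} s(i,j) \cos\left(\frac{(2i+1)u\pi}{2N}\right) \cos\left(\frac{(2j+1)v\pi}{2N}\right)$$

$$s(i,j) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} A(u) A(v) S(u,v) \cos\left(\frac{(2i+1)u\pi}{2N}\right) \cos\left(\frac{(2j+1)v\pi}{2N}\right)$$

where

$$A(u) = \begin{cases} \dfrac{1}{\sqrt{2}} & u = 0 \\ 1 & \text{otherwise} \end{cases}$$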
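The transform pair above can be checked numerically with a direct, unoptimized Python sketch. It evaluates the double sums as written for an 8x8 block and confirms that the inverse transform recovers the block. The function names and the test block are assumptions made for the illustration; a real implementation would normally use a fast DCT.

```python
import math


def a(u):
    """Normalisation factor A(u): 1/sqrt(2) for u = 0, 1 otherwise."""
    return 1 / math.sqrt(2) if u == 0 else 1.0


def c(i, u, n):
    """Cosine basis term cos((2i+1) u pi / (2N))."""
    return math.cos((2 * i + 1) * u * math.pi / (2 * n))


def dct2(s):
    """Forward 2-D DCT of an N x N block s(i,j), giving S(u,v)."""
    n = len(s)
    return [[(2 / n) * a(u) * a(v) *
             sum(s[i][j] * c(i, u, n) * c(j, v, n)
                 for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]


def idct2(S):
    """Inverse 2-D DCT, recovering s(i,j) from S(u,v)."""
    n = len(S)
    return [[(2 / n) *
             sum(a(u) * a(v) * S[u][v] * c(i, u, n) * c(j, v, n)
                 for u in range(n) for v in range(n))
             for j in range(n)] for i in range(n)]


block = [[(7 * i + 3 * j) % 256 for j in range(8)] for i in range(8)]
coeffs = dct2(block)
restored = idct2(coeffs)
max_error = max(abs(restored[i][j] - block[i][j])
                for i in range(8) for j in range(8))
print(max_error < 1e-9)   # prints True
```

For N = 8 the DC coefficient S(0,0) equals the sum of all pixels divided by 8, which gives a quick sanity check on the scaling.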

Just like the Fourier Transform, the DCT is reversible, so we can first apply the DCT to a cover image, then use some algorithm to write secret information into the significant area of the transform domain, and finally apply the inverse transform to the modified data set. TDT with DCT is designed with the JPEG specification in mind, so the result produced by the TDT with DCT technique can survive JPEG lossy compression well. The following figures (Figure-4 and Figure-5) show the encoder and decoder procedures of one of the popular steganography techniques using the DCT and its inverse transform.

Figure-4. DCT encoder diagram: Cover -> Divide into 8x8 blocks -> DCT to selected block -> Message embed algorithm (input: Message) -> iDCT to each block -> Combine all blocks -> Stego-Image

Figure-5. DCT decoder diagram: Stego-Image -> Divide into 8x8 blocks -> DCT to selected block -> Message decode algorithm -> Message

Knowing the basic procedure of the TDT technique, we now focus on the embedding and extraction algorithms, which play the core roles of the steganography in the encoder and decoder respectively. To design algorithms that hide information in the frequency domain, we first need to understand how DCT coefficients represent the image information, and the quantization mechanism of JPEG. This knowledge tells us that swapping a few DCT coefficients, selected based on the quantization rule, will not change the appearance of the carrier much. So we can store 1 bit of secret information in one DCT-transformed image block using the relative size of two or more DCT coefficients: when embedding a secret bit, the coefficients are exchanged with each other whenever they disobey the designed rule. The encoding and decoding algorithms can be designed as follows, where $m_i$ is the i-th message bit, $b_i$ is the selected block for the i-th message bit, $B_i$ is the DCT of $b_i$, and $(u_1,v_1)$ and $(u_2,v_2)$ are the two coefficients of the designated pair.

Algorithm-3: TDT_DCT Encoder

for all message bits m_i
    choose selected block b_i
    B_i = DCT(b_i)
    if m_i = 0 then
        if B_i(u1,v1) > B_i(u2,v2) then
            swap B_i(u1,v1) and B_i(u2,v2)
        end if
    else
        if B_i(u1,v1) <= B_i(u2,v2) then
            swap B_i(u1,v1) and B_i(u2,v2)
        end if
    end if
    adjust both values so that |B_i(u1,v1) - B_i(u2,v2)| > x
    b_i' = IDCT(B_i')
end for
create stego-image out of all b_i'

Algorithm-4: TDT_DCT Decoder

for all selected blocks b_i'
    B_i' = DCT(b_i')
    if B_i'(u1,v1) > B_i'(u2,v2) then
        m_i = 1
    else
        m_i = 0
    end if
end for

The value adjustment in the encoder is used to enhance the robustness against cover modification: the higher x is, the more robust the algorithm becomes, but the less imperceptible. Based on the quantization values used in the JPEG compression scheme, I chose B(4,1) and B(3,2) as the coefficient pair used to store the hidden message, since the quantization levels of these two coefficients are the same. This selection therefore performs very well under JPEG compression.
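The coefficient-swap rule can be sketched in Python for a single 8x8 block. This is a hedged illustration, not the report's implementation: the way the two coefficients are pushed apart (symmetrically, to a gap of x + 1) is an assumption, since the text only requires the gap to exceed x, and the function names and test block are hypothetical.

```python
import math

U1, V1 = 4, 1      # coefficient pair named in the text
U2, V2 = 3, 2


def _a(u):
    return 1 / math.sqrt(2) if u == 0 else 1.0


def _c(i, u, n):
    return math.cos((2 * i + 1) * u * math.pi / (2 * n))


def dct2(s):
    n = len(s)
    return [[(2 / n) * _a(u) * _a(v) *
             sum(s[i][j] * _c(i, u, n) * _c(j, v, n)
                 for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]


def idct2(S):
    n = len(S)
    return [[(2 / n) *
             sum(_a(u) * _a(v) * S[u][v] * _c(i, u, n) * _c(j, v, n)
                 for u in range(n) for v in range(n))
             for j in range(n)] for i in range(n)]


def embed_bit(block, bit, x):
    """Encode one bit as the ordering of two DCT coefficients of the block."""
    B = dct2(block)
    p, q = B[U1][V1], B[U2][V2]
    # Rule: bit 0 means p <= q, bit 1 means p > q; swap when the rule is violated.
    if (bit == 0 and p > q) or (bit == 1 and p <= q):
        p, q = q, p
    gap = abs(p - q)
    if gap <= x:
        # Assumed adjustment: move both values apart until the gap is x + 1.
        push = (x - gap) / 2 + 0.5
        if bit == 0:
            p, q = p - push, q + push
        else:
            p, q = p + push, q - push
    B[U1][V1], B[U2][V2] = p, q
    return idct2(B)


def extract_bit(block):
    """Decode the bit from the ordering of the same two coefficients."""
    B = dct2(block)
    return 1 if B[U1][V1] > B[U2][V2] else 0


block = [[100 + (3 * i + 5 * j) % 17 for j in range(8)] for i in range(8)]
for bit in (0, 1):
    stego = embed_bit(block, bit, x=20.0)
    as_pixels = [[int(round(p)) for p in row] for row in stego]  # store as integers
    print(bit, extract_bit(as_pixels))
```

Rounding the block back to integer pixels perturbs each coefficient only slightly, so a gap of this size keeps the ordering, and hence the bit, intact.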

III. Experiments and Results

1. Least Significant Bit (LSB) Technique

A secret message can take various forms, such as plain text, image, sound or any other digitized form. In this project, I implement the LSB technique to hide both plain text and images, in order to fully test the efficiency and applicability of the algorithms. Two algorithms were created. One is a simple LSB technique that selects consecutive cover elements to hide the message, and the other is a more sophisticated scheme that applies the pseudorandom interval method. Both algorithms are tested with text and image secret messages, and the test results are shown below. In addition, I used a MATLAB program to calculate the SNR between the cover image and the stego-image, to give a numerical comparison between different schemes, or between runs of the same scheme with different parameters.

a. Short Text Message

To give a better view of the performance of the algorithm, cases with different parameters were run. To keep the report simple, only 3 significant cases are given for the text message; more detail is given in the part on image messages. A text message behaves quite differently from an image because of its properties. The secret message is “Hello World.” and the cover image I chose is a 97x128 PGM file from the internet, shown in Figure-6. The histograms of the cover image and of the high-pass filtered cover are shown in Figure-7.

Figure-6. Cover Image

Figure-7. Histogram of Cover and HPed-Cover

As mentioned above, the MSBs of an image contribute more than the LSBs in most cases, so a portion of the MSBs alone can give the viewer an approximation of the original image. This is not true for plain text: changing any bit of a character's 8-bit representation makes it decode to another character. So it is important to hide all bits of the text message in the cover. In my program, each 8-bit character is divided into single bits, and each bit is embedded in one selected cover pixel. With 8 pixels for each of the 12 characters, 96 pixels are needed as cover elements. For comparison, I also provide results for cases hiding only the 7 and 2 MSBs of each character. The program output for all cases is listed below.

Input secrect message text is: Hello World.
Case#1 – 8 bits:
Decoded text message text is: Hello World.
Decoded DEC message in DEC is: 72 101 108 108 111 32 87 111 114 108 100 46
Case#2 – 7 bits:
Decoded text message text is: Hdlln Vnrld.
Decoded DEC message in DEC is: 72 100 108 108 110 32 86 110 114 108 100 46
Case#3 – 2 bits:
Decoded text message text is: @@@@@ @@@@@
Decoded DEC message in DEC is: 64 64 64 64 64 0 64 64 64 64 64 0
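The decoded values in the output above can be reproduced by masking each character code so that only its top n bits survive. The short Python sketch below does this; the function name keep_msbs is an assumption for the example and is not taken from the report's program.

```python
def keep_msbs(text, n):
    """Return the character codes of text with only the n most significant bits kept."""
    mask = (0xFF << (8 - n)) & 0xFF
    return [ord(ch) & mask for ch in text]


message = "Hello World."
for n in (8, 7, 2):
    print(n, keep_msbs(message, n))
```

Dropping even a single bit turns "e" (101) into "d" (100) and "o" (111) into "n" (110), which is why the text message needs all 8 bits.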

(a) (b) (c)

Figure-8. Stego-image for hiding 8, 7 and 2 MSBs

The SNRs for the above 3 cases are around 45.1375 dB. One reason is that only one LSB of each selected cover pixel is replaced by a secret message bit; another is that the secret message is small, only 12 bytes. The above results show that the implemented program can successfully embed a short secret text message into an image and extract it. We expect the same to hold for a long text message, which is tested in the following section.

b. Long Text Message

The case of a long text message is similar to that of the short message. However, with a longer secret message, the difference images between cover and stego-image give a better visual impression of the algorithm. For this case, I randomly chose a sufficiently long paragraph from the New York Times, so it should be representative of ordinary readable text. One problem is that the Linux command line interprets some special symbols, such as !, / and *, so I had no choice but to skip them.

(a) (b)

Figure-9. (a). Stego-image using simple scheme; (b).Difference between cover and stego-image;

Figure-10. Histogram of Cover and HPed-Cover

(a) (b) (c)

Figure-11. Other results with different parameters. (a) Diff-image for 2000 displacement of first cover element; (b)-(c) Diff-image using random interval scheme with key=2 and key=3.

Figure-12. Histogram of Cover and HPed-Cover for case from Figure-11(c)

Though the histograms of the cover from both cases are much the same as that of the original one, the histogram of the HPed cover for case 2, shown in Figure-11(c) and Figure-12, looks closer to that of the cover than that of the other case. This is due to the effective immunity of the random interval method to Laplacian analysis, obtained at the cost of algorithm complexity and the pre-communication of keys and equations between encoder and decoder. The SNRs for the above 4 cases are 31.0206 dB, 32.6937 dB, 34.3686 dB and 35.5778 dB. You might notice that the SNR improves from case 1 to case 4. That is also an advantage of the random interval method.

c. Image Message

For better visual interpretation, I used a larger image, 512x512 lena.pgm, for cover image, and two message images, one with same size of the cover, 512x512 barbara.pgm, and one with smaller size, 97x128 fig1.pgm.

Figure-13. Cover, Message#1 and Message#2

The reason I chose to use two message images is mainly to test full 8-bit data embedding for lossless image steganography, which cannot embed a message into a cover that is less than 8 times the size of the message.
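The size constraint can be checked with a few lines of arithmetic. The Python sketch below assumes one message bit per cover pixel, as in the 1-bit LSB scheme; the function names are hypothetical.

```python
def pixels_needed(msg_w, msg_h, bits_per_msg_pixel=8):
    """Cover pixels required when every message bit occupies one cover pixel."""
    return msg_w * msg_h * bits_per_msg_pixel


def fits(cover_w, cover_h, msg_w, msg_h):
    """True if the full 8-bit message image fits into the cover."""
    return pixels_needed(msg_w, msg_h) <= cover_w * cover_h


print(pixels_needed(97, 128), 512 * 512)   # prints 99328 262144
print(fits(512, 512, 97, 128))             # prints True
print(fits(512, 512, 512, 512))            # prints False
```

So the 97x128 fig1.pgm fits into the 512x512 cover with all 8 bits, while the 512x512 barbara.pgm does not.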

First, we test the simple scheme, embedding from 1 bit to 7 bits of message information. The stego-images and the decoded message images for all cases are shown below for comparison.

Case#1: 1 bit LSB Case

Case#2: 2 bit LSB Case

Case#3: 3 bit LSB Case

Case#4: 4 bit LSB Case

Case#5: 5 bit LSB Case

Case#6: 6 bit LSB Case

Case#7: 7 bit LSB Case

Figure-14. Stego-images and Decoded Message for 7 cases

From the stego-images of all cases in Figure-14, we can see that the quality of the stego-image gets worse as the number of LSBs used increases. On the other hand, the decoded images match the original message more and more closely. This means that using more LSBs for secret writing gives more capacity and accuracy, but less imperceptibility. The designer should trade off these parameters for the purpose at hand. To illustrate the imperceptibility of different cases, the histograms of the stego-images of case 1 and case 7 are shown below. The different shape of the 4th picture tells us that the difference between neighboring pixels in the stego-image of case 7 is far greater than in case 1, which means case 7 is less imperceptible than case 1. This conclusion matches the one we derived from observing the stego-images.

Figure-15. Histogram of Cover and HPed-Cover for case1 and case7

Next, we examine the random interval method used in the LSB technique. As mentioned above, we use a smaller image as the secret message, and embed the full 8-bit data of each pixel into the same cover image. For comparison, we perform two simulations with key equal to 1 and 2.

Figure-16. Stego-image, decoded message, difference image and histogram of Cover and HPed-Cover for case of key = 1 and key = 2.

2. Transform Domain Technique

As with the LSB technique, I examine transform domain steganography with the discrete cosine transform by embedding both plain text and image messages. For the text message, I use the same one as for the LSB scheme, which is “Hello World.” However, since full data embedding is applied and each 8x8 image block is used to write one bit of the message, the cover must be at least 8x8x8 = 512 times larger than the message (8 bits per message byte, 64 pixels per bit). Since the evaluation of the algorithm is the same as for the LSB scheme, we focus on showing result images rather than explaining the causes.

a. Plain Text Message

For simplicity and better visual results, I only examined the scheme with the short text message and a small cover image, 256x256 cameraman.pgm, which is shown below with the results.

Input secrect message text is: Hello World.
Decoded text message text is: Hello World.
Decoded DEC message in DEC is: 72 101 108 108 111 32 87 111 114 108 100 46

Figure-17. Cover Image, Stego-image and difference image

From the result images, we can see that it is hard to find any noticeable change between the stego-image and the cover image. The uniform difference image again supports the conclusion that, with the transform domain technique, we can embed secret data into a cover without changing the appearance of the carrier much, while providing more robustness to other forms of modification.

b. Image Message

Next, let’s work on the scheme with image message. The message image (zoomed) I generated for the test purpose, 16x16 UT.pgm, is shown below.

Figure-18. Message Image

For this section, I use the 512x512 lena.pgm as the cover image, which has already been shown in the former sections. The recovered message image and the difference image between the cover and the stego-image are shown below.

Figure-19. Recovered message (zoomed), Stego-image and difference image

Again, we cannot tell any difference between the stego-image and the cover image, and this is supported by the uniformly black difference image between the cover and the stego-image. You might notice that the difference image in this section is black while that of the previous section is nearly white. This is caused by the auto-rescale function of the image library I used and by the quantization that occurs when the C++ program performs the DCT and IDCT. In both cases, the monochrome difference image indicates that the difference between the two images is small.

As mentioned before, the DCT scheme performs well in surviving JPEG lossy compression. I now compress the stego-image with JPEG qualities of 80%, 85%, 90% and 95%, and then try to recover the embedded secret image. The results are listed below, ordered from worst to best as the JPEG quality rises. We can still easily recognize the shape in the 3rd and 4th recovered messages, but not in the first two.

Figure-20. Recovered message (zoomed), for 80%, 85%, 90% and 95%

IV. Conclusion

In this project, I have gone through the complete steps of steganography in both the spatial domain and the frequency domain. The results indicate that the algorithms in both domains can accomplish the steganography task well, provided that no active or malicious attacker is involved. For the LSB technique, we can use a few MSBs of the message to replace the same number of LSBs of the cover, and the number of LSBs replaced determines the quality of the recovered image. This holds only for messages in the form of sound or image, not for text messages, for which all 8 bits must be embedded for successful recovery. In addition, the algorithm for selecting cover elements is very important when active attackers exist. By choosing the selection scheme carefully, we can make the distribution of the stego-image look like the original one. The transform domain technique is better than the LSB scheme, with higher imperceptibility, more security and more robustness to cover modification, but the cost of this improvement is algorithm complexity. The results show that, with the DCT scheme for secret writing, the image message can be recovered approximately, to some degree, even after the stego-image has been compressed by JPEG.
