DIGITAL IMAGE CRYPTOSYSTEM WITH ADAPTIVE STEGANOGRAPHY

JOMO KENYATTA UNIVERSITY OF AGRICULTURE AND TECHNOLOGY

INSTITUTE OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY

BSc COMPUTER TECHNOLOGY

Literature Review On

DIGITAL IMAGE CRYPTOSYSTEM WITH ADAPTIVE STEGANOGRAPHY

NAME: JOHN NJENGA KIMUHU

REG NO: CS 282-0782/2009

SUPERVISORS: DR. OKEYO, MR. J. WAINAINA

DECLARATION

I declare that all materials presented here are my own original work, or fully and specifically acknowledged wherever adapted from other sources. The work has not been submitted previously, in whole or in part, to qualify for any academic award. The content of the thesis is the result of work which has been carried out since the official approval of my proposal.

Contents
DECLARATION
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
ABSTRACT
CHAPTER ONE
INTRODUCTION
1.1 BACKGROUND
A Steganographic Framework
1.2 STATEMENT OF THE PROBLEM
1.3 OBJECTIVES OF THE STUDY
1.4.1 Specific Objectives
1.4 RESEARCH QUESTIONS
CHAPTER TWO
2.0 LITERATURE REVIEW
2.1 Spatial Domain
2.2 Transform Domain
2.3 EXISTING ATTACKS
2.3.1 Steganalysis
2.3.1.1 Targeted Attacks
2.3.1.2 Blind Attacks
2.3 STATISTICAL RESTORATION
Introduction
2.3.1 Embedding by Pixel
Algorithm Pixel Swap Embedding
2.3.1.1 Security Analysis
2.3.1.2 New Statistical Restoration Scheme
2.4 Mathematical Formulation of Proposed Scheme
2.4.2 Restoration with Minimum Restoration
2.4.3 Security Analysis
2.4.3.1 Analysis
2.5 SPATIAL DESYNCHRONIZATION
2.5.1 Introduction
2.5.2 Calibration Attack
2.6 Counter Measures to Blind Steganalysis
2.6.1 Spatial Block Desynchronization
2.7 The Proposed Algorithm
2.8 CONCLUSION AND FUTURE DIRECTIONS
2.8.1 Conclusion
2.8.2 Future Directions
2.9 PROJECT SCHEDULE
3.0 RESEARCH BUDGET
REFERENCES

LIST OF FIGURES
Figure 1.4: Frameworks for Private Key Passive Warden Steganography
Figure 2.1: Trade-off between embedding capacity, undetectability and robustness in data hiding
Figure 3.1: A generalized steganographic framework
Figure 4.1: Flipping of set cardinalities during embedding
Figure 5.1: Calibration of the stego image for cover statistics estimation
Figure 6: Block Diagram of Spatial Block Desynchronization
Figure 7: Spatial desynchronization used in the proposed J5 Algorithm
Figure 8: Research Budget

LIST OF TABLES
Table 1: p-value of Rank Sum Test for 23 DCA
Table 2: p-value of Rank Sum Test for 274 DCA
Table 3: Project Schedule

ABBREVIATIONS AND SYMBOLS

J5: The proposed algorithm that uses spatial desynchronization for low detection
QIM: Quantization Index Modulation, a high-capacity robust watermarking scheme
SDSA: Spatially Desynchronized Steganography Algorithm
YASS: Yet Another Steganographic Scheme
23 DCA: 23-Dimensional Calibration Attack

ACKNOWLEDGEMENTS

It is with great reverence that I wish to express my deep gratitude to Mr. J. Wainaina for his astute guidance, constant motivation and trust, without which this work would never have been possible. I am sincerely indebted to him for his constructive criticism and suggestions for improvement at various stages of the work. I would also like to thank Madam Ann Kibe, Research Scholar, for her guidance and invaluable suggestions, and for bearing with me during the thought-provoking discussions that made this work possible. I am also thankful to Dr. Okeyo for clearing some of my doubts through email. I am grateful to my parents and brother for their perennial inspiration. Last but not least, I would like to thank all my seniors, friends and classmates, especially Ben, Lenny and Mercy, for making my stay at JKUAT comfortable and a fruitful learning experience.

Date: August 22, 2013

Signed: John Njenga Kimuhu

Signed: ______________ (Supervisor)

Date: ______________

ABSTRACT

Steganography is defined as the science of hiding or embedding data in a transmission medium. Its objectives, namely undetectability, robustness and capacity of the hidden data, are the main factors that distinguish it from cryptography. In this paper, a study of a digital image cryptosystem with adaptive steganography is presented. The problem of data hiding is attacked from two directions. The first approach tries to overcome targeted steganalytic attacks, focusing mainly on targeted attacks based on first-order statistics; two algorithms are presented that preserve the first-order statistics of an image after embedding. The second approach aims at resisting blind steganalytic attacks, especially calibration-based blind attacks, which try to estimate a model of the cover image from the stego image. A statistical hypothesis testing framework has been developed for testing the efficiency of a blind attack, and a generic framework for JPEG steganography is proposed that disturbs the cover image model estimation performed by blind attacks. Comparison results show that the proposed algorithm can successfully resist calibration-based blind attacks as well as some non-calibration-based attacks.


CHAPTER ONE INTRODUCTION

Since the rise of the Internet, the security of information has been one of the most important concerns of information and communication technology. Every day, vast amounts of data are transferred over the Internet through e-mail, file-sharing sites and social networking sites, to name a few. As the number of Internet users rises, Internet security has also gained importance. The fiercely competitive nature of the computer industry forces web services to the market at a breakneck pace, leaving little or no time for auditing system security, while the tight labour market causes Internet project development to be staffed with less experienced personnel who may have no training in security. This combination of market pressure, low unemployment and rapid growth creates an environment rich in machines to be exploited, and in malicious users to exploit them.

Due to the fast development of communication technology, multimedia data has become easy to acquire. Unfortunately, illegal data access occurs all the time and everywhere. Hence, it is important to protect the content and the authorized use of multimedia data against attackers. Data encryption is a strategy that makes data unreadable, invisible or incomprehensible during transmission by scrambling its content. An image cryptosystem uses reliable encryption algorithms or secret keys to transform (encrypt) secret images into ciphered images, and only authorized users can decrypt the secret images from the ciphered images. According to (Bhattacharyya, Banerjee, & Sanyal, 2011), the ciphered images are meaningless and unrecognizable to any unauthorized user who intercepts them without knowing the decryption algorithm or the secret keys. Steganography's niche in security is to supplement cryptography, not to replace it: if a hidden message is also encrypted, it must still be decrypted once discovered, which provides another layer of protection.
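As a toy illustration of the image-cryptosystem idea (not a method taken from the cited work), the Python sketch below scrambles the pixel bytes of a secret image by XORing them with a keystream derived from a shared secret key. Using SHAKE-256 from the standard `hashlib` module as the keystream generator, and the function name `xor_keystream`, are assumptions made for brevity; a real image cryptosystem would use a vetted cipher with a fresh nonce per image.

```python
import hashlib

def xor_keystream(pixels: bytes, key: bytes) -> bytes:
    """Toy image 'cipher': XOR each pixel byte with a key-derived keystream.

    The keystream is SHAKE-256(key) stretched to the image length. XOR is its
    own inverse, so the same call both encrypts and decrypts. Illustration
    only: there is no nonce, so reusing a key for two images leaks information.
    """
    stream = hashlib.shake_256(key).digest(len(pixels))
    return bytes(p ^ s for p, s in zip(pixels, stream))

secret_image = bytes(range(256)) * 4          # 1024 stand-in gray-level pixels
ciphered = xor_keystream(secret_image, b"shared secret")
recovered = xor_keystream(ciphered, b"shared secret")
```

Without the key the ciphered bytes are unintelligible, while an authorized user holding the key recovers the secret image with the same call.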

In contrast, steganographic techniques embed secret data into cover data in such a way that people cannot discern the existence of the hidden data. Image steganographic methods (also called virtual image cryptosystems) hide secret images in readable but non-critical cover images and are designed to avoid attracting the notice of illegal users. Common data hiding methods can be categorized into spatial domain and transform domain methods. Information hiding is an emerging research area that encompasses applications such as copyright protection for digital media, watermarking, fingerprinting and steganography. In watermarking applications, the message contains information such as owner identification and a digital time stamp, and is usually used for copyright protection. In fingerprinting, the owner of the data set embeds a serial number that uniquely identifies the user of the data set; this supplements the copyright information and makes it possible to trace any unauthorized use of the data set back to that user. In steganography, the secret message is hidden within the host data set so that its presence is imperceptible, and it must be reliably communicated to a receiver. The host data set is purposely corrupted, but in a covert way designed to be invisible to information analysis.

1.1 BACKGROUND

A Steganographic Framework

Any steganographic system can be studied as shown in Figure 1.4. For a steganographic algorithm with a stego-key, the embedding process generates a stego image from any given cover image. The extraction process takes the stego image and, using the shared key, applies the inverse algorithm to extract the hidden message. This system can be explained using the prisoners' problem (Figure 1.4), in which Alice and Bob are two inmates who wish to communicate in order to hatch an escape plan. However, all communication between them is examined by the warden, Wendy. To send a secret message to Bob, Alice embeds the secret message m into the cover object c to obtain the stego object s, which is then sent through the public channel. In a pure steganographic framework, the embedding technique is unknown to Wendy and is shared as a secret between Alice and Bob. In private key steganography, Alice and Bob share a secret key that is used to embed the message. The secret key can, for example, be a password used to seed a pseudo-random number generator that selects the pixel locations of an image cover object in which the secret message is embedded. Wendy has no knowledge of the secret key that Alice and Bob share, although she is aware of the algorithm they could be using to embed messages. In public key steganography, Alice and Bob have private-public key pairs and know each other's public key. In this thesis we confine ourselves to private key steganography only.
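The role of the shared key can be sketched in Python as follows. This is a minimal illustration, assuming a grayscale image flattened to a list of 0-255 values and plain LSB replacement at the selected positions; the function names are hypothetical, and Python's `random` module is used only because it is deterministic for a given seed (it is not cryptographically secure).

```python
import random

def select_locations(num_pixels, num_bits, key):
    """Key-seeded choice of distinct pixel positions; Alice and Bob derive
    the same positions from the same shared password."""
    rng = random.Random(key)
    return rng.sample(range(num_pixels), num_bits)

def embed(cover, bits, key):
    stego = list(cover)
    for pos, bit in zip(select_locations(len(cover), len(bits), key), bits):
        stego[pos] = (stego[pos] & ~1) | bit   # LSB replacement at a keyed position
    return stego

def extract(stego, num_bits, key):
    return [stego[pos] & 1 for pos in select_locations(len(stego), num_bits, key)]

rng0 = random.Random(0)
cover = [rng0.randint(0, 255) for _ in range(1000)]   # stand-in for a flattened image
message = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed(cover, message, "shared password")
```

Wendy, who knows the algorithm but not the password, does not know which pixels to read.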

Figure 1.4 Frameworks for Private Key Passive Warden Steganography

Figure 1.5: Private Key Steganography

1.2 STATEMENT OF THE PROBLEM

As mentioned, steganography deals with hiding information in a cover source. Steganalysis, on the other hand, is the art and science of detecting messages hidden using steganography; it is analogous to cryptanalysis applied to cryptography. The goal of steganalysis is to identify suspected packages, determine whether or not they carry an encoded payload and, if possible, recover that payload. Hence, the major challenges of effective steganography are:

1. Security of Hidden Communication: To avoid raising the suspicions of eavesdroppers while evading the meticulous screening of algorithmic detection, the hidden contents must be invisible both perceptually and statistically.

2. Size of Payload: Unlike watermarking, which needs to embed only a small amount of copyright information, steganography aims at hidden communication and therefore usually requires sufficient embedding capacity. Requirements for higher payload and secure communication are often contradictory.

Depending on the specific application scenarios, a tradeoff has to be sought.

Figure 2.1: Trade-off between embedding capacity, undetectability and robustness in data hiding

One possible way of categorizing present steganalytic attacks is into the following two categories:

1. Visual Attacks: These methods try to detect the presence of information by visual inspection, either with the naked eye or with the help of a computer. The attack is based on guessing the embedding layer of an image (say, a bit plane) and then visually inspecting that layer for unusual modifications.

2. Statistical Attacks: These methods use first- or higher-order statistics of the image to reveal the tiny alterations in statistical behaviour caused by steganographic embedding, and can therefore detect even small amounts of embedding with very high accuracy. This class of steganalytic attacks is further classified into Targeted Attacks and Blind Attacks, as explained in detail in the following sections.
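The visual attack in item 1 can be sketched as extracting one bit plane of the image and stretching it to black and white so that it can be viewed. The helper names below are hypothetical, and the image is assumed to be a list of rows of 0-255 gray levels; in a natural image the LSB plane usually still shows some structure, whereas a plane filled with embedded message bits tends to look like uniform noise.

```python
def bit_plane(image, plane=0):
    """Return bit plane `plane` (0 = least significant) as 0/1 values."""
    return [[(p >> plane) & 1 for p in row] for row in image]

def plane_to_visible(plane_bits):
    """Map 0/1 bits to black/white gray levels for visual inspection."""
    return [[255 * b for b in row] for row in plane_bits]

tiny_image = [[0, 1], [2, 3]]
lsb_view = plane_to_visible(bit_plane(tiny_image))
```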

Figure 3.1: A generalized steganographic framework

1.3 OBJECTIVES OF THE STUDY

In order to develop a good steganography algorithm, one should have knowledge of the different steganalysis techniques. Keeping this in mind, an approach aimed at preserving the marginal statistics of a cover image was proposed. Preserving the marginal statistics helps in defeating the targeted attacks designed for specific steganographic algorithms, and thus an algorithm that inherently preserves the first-order statistics of the cover image during embedding itself was proposed.

1.4.1 Specific Objectives

This research study focuses on the following topics:

1. Designing an algorithm that inherently preserves the first-order statistics of the cover image during embedding itself.

2. Designing a low-detection algorithm for embedding data such that the stego population remains statistically close to the cover population, so that the difference between the two cannot be observed in the statistics drawn from them.

3. Designing a digital image steganography algorithm that uses global and dual histogram compensation along with matrix encoding to minimize changes.

4. Designing a steganographic algorithm that resists calibration-based blind steganalytic attacks.

1.4 RESEARCH QUESTIONS

1. Is it possible to use steganography to supplement cryptography rather than replace it? A message in ciphertext might arouse suspicion on the part of the recipient, while an invisible message created with steganographic methods will not.

2. What are the known Steganalysis (counter-Steganography) methods and how can they be avoided to have a crack-proof steganography?

3. If the main goal of good steganography is to be invisible, can we design a stealthy algorithm that resists the best-known steganalytic attacks?

CHAPTER TWO

2.0 LITERATURE REVIEW

In this chapter we provide the background required for this research area. In sections 2.1 and 2.2 we briefly discuss some existing spatial domain and transform domain steganographic techniques. In section 2.3 we present some of the steganalytic attacks proposed to date as countermeasures to these steganographic algorithms.

2.1 Spatial Domain

These techniques use the pixel gray levels and their color values directly to encode the message bits. They are among the simplest schemes in terms of embedding and extraction complexity. Their major drawback is the amount of additive noise that creeps into the image, which directly affects the Peak Signal to Noise Ratio (PSNR) and the statistical properties of the image. Moreover, these embedding algorithms are mainly applicable to lossless image formats such as TIFF; with lossy compression schemes such as JPEG, some of the message bits are lost during the compression step.

The most common algorithm in this class is Least Significant Bit (LSB) Replacement, in which the least significant bit of the binary representation of each pixel gray level is replaced by a message bit. Because each message bit already matches the pixel's LSB with probability 1/2, this embedding adds noise of 0.5p on average to the pixels of the image, where p is the embedding rate in bits/pixel. LSB Replacement also introduces an asymmetry: even values can only increase and odd values can only decrease, so the gray levels are grouped into the pairs (0, 1), (2, 3), ..., (254, 255). This asymmetry is exploited by the attacks developed for this technique, as explained further in section 2.3. To overcome it, the direction of change is randomized: if the message bit does not match the pixel's LSB, the pixel value is either increased or decreased by 1 at random. This technique is popularly known as LSB Matching. Even this kind of embedding adds noise of 0.5p on average. To further reduce the noise, (Zhang, Zhang, & Wang, 2007) suggested using a binary function of two cover pixels to embed the data bits. The embedding uses a pair of pixels as a unit: the LSB of the first pixel carries one bit of information, and a function of the two pixel values carries another. Embedding in this fashion has been shown to reduce the noise introduced into the cover signal.
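The three embedding rules above can be contrasted in a short Python sketch. It is a minimal illustration on a flat list of gray levels; the function names are hypothetical, and the pair function f(a, b) = LSB(floor(a/2) + b) is an assumption (one common choice in the LSB matching literature), since the text does not state which function is used.

```python
import random

def lsb_replace(pixels, bits):
    """LSB replacement: overwrite the LSB (even values can only go up, odd only down)."""
    out = list(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out

def lsb_match(pixels, bits, rng):
    """LSB matching: on a mismatch, add or subtract 1 at random (forced at 0 and 255)."""
    out = list(pixels)
    for i, b in enumerate(bits):
        if out[i] & 1 != b:
            if out[i] == 0:
                out[i] += 1
            elif out[i] == 255:
                out[i] -= 1
            else:
                out[i] += rng.choice((-1, 1))
    return out

def extract_lsb(pixels, n):
    return [p & 1 for p in pixels[:n]]

def f(a, b):
    # Assumed binary function of a pixel pair: LSB(floor(a / 2) + b)
    return ((a // 2) + b) & 1

def pair_embed(x1, x2, m1, m2, rng):
    """Carry m1 in LSB(x1) and m2 in f(x1, x2), changing at most one pixel by 1.
    Assumes 1 <= x1, x2 <= 254 so that no saturation handling is needed."""
    if x1 & 1 == m1:
        if f(x1, x2) != m2:
            x2 += rng.choice((-1, 1))          # either direction flips f
    else:
        # x1 - 1 and x1 + 1 both fix the LSB and give opposite values of f
        x1 = x1 - 1 if f(x1 - 1, x2) == m2 else x1 + 1
    return x1, x2

def pair_extract(y1, y2):
    return y1 & 1, f(y1, y2)

rng = random.Random(1)
cover = [rng.randint(1, 254) for _ in range(1000)]
bits = [rng.randint(0, 1) for _ in range(1000)]
```

For a mismatched pair the pair scheme still changes only one pixel by 1 while carrying two bits, which is where its noise reduction comes from.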

In (Bhattacharyya et al., 2011), a multiple-base number system is employed for embedding data bits. Human visual sensitivity is taken into account during embedding: the variance of a block of pixels is used to compute the number base used for embedding. A similar algorithm based on human visual sensitivity, named Pixel Value Differencing, has been described by (, Condell, Curran, & Kevitt, 2010). This approach embeds more data bits in high-variance regions of the image, for example near edges, by considering the difference between two neighbouring pixels. The approach has been improved further by combining it with least significant bit embedding (Budiman, 2010).
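The Pixel Value Differencing idea can be sketched in Python as follows. This is a simplified illustration, assuming a Wu-Tsai style table of six difference ranges (the text does not give a table) and hypothetical function names: the absolute difference of a pixel pair selects a range, the range width decides how many bits the pair carries (more in edge-like, high-difference pairs), and the difference is rewritten to encode those bits. Pairs whose new values would leave 0..255 are simply rejected here, whereas a full scheme skips them.

```python
# Wu-Tsai style range table: an assumption, the text does not specify one.
RANGES = [(0, 7), (8, 15), (16, 31), (32, 63), (64, 127), (128, 255)]

def _range_of(d):
    for lo, hi in RANGES:
        if lo <= d <= hi:
            return lo, hi
    raise ValueError("difference outside 0..255")

def pvd_embed_pair(p1, p2, bits):
    """Hide the leading bits of `bits` (a '0'/'1' string) in one pixel pair.
    Returns (q1, q2, bits_consumed)."""
    d = p2 - p1
    lo, hi = _range_of(abs(d))
    n = (hi - lo + 1).bit_length() - 1             # wider range (edgier pair) -> more bits
    new_abs = lo + int(bits[:n].ljust(n, "0"), 2)  # last chunk is zero-padded
    new_d = new_abs if d >= 0 else -new_abs
    m = new_d - d
    q1, q2 = p1 - (-(-m // 2)), p2 + m // 2        # p1 - ceil(m/2), p2 + floor(m/2)
    if not (0 <= q1 <= 255 and 0 <= q2 <= 255):
        raise ValueError("pair would leave 0..255; a full scheme skips such pairs")
    return q1, q2, min(n, len(bits))

def pvd_extract_pair(q1, q2):
    d = abs(q2 - q1)
    lo, hi = _range_of(d)
    n = (hi - lo + 1).bit_length() - 1
    return format(d - lo, "0{}b".format(n))

def pvd_embed(pixels, bits):
    out, pos = list(pixels), 0
    for i in range(0, len(out) - 1, 2):
        if pos >= len(bits):
            break
        out[i], out[i + 1], used = pvd_embed_pair(out[i], out[i + 1], bits[pos:])
        pos += used
    if pos < len(bits):
        raise ValueError("cover too small for the message")
    return out

def pvd_extract(pixels, num_bits):
    found = ""
    for i in range(0, len(pixels) - 1, 2):
        if len(found) >= num_bits:
            break
        found += pvd_extract_pair(pixels[i], pixels[i + 1])
    return found[:num_bits]
```

Because the new difference stays inside the same range, the receiver recovers the range, and hence the number of bits, from the stego pair alone.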

According to (Fridrich, 2012), for a given medium, a steganographic algorithm that makes fewer embedding changes or adds less additive noise will be less detectable than an algorithm that makes relatively more changes or adds more additive noise. Following the same line of thought, Crandall (Crandall, 1998) introduced the use of an error control coding technique called Matrix Encoding. In Matrix Encoding, q message bits are embedded in a group of $2^q - 1$ cover pixels while adding a noise of $1 - 2^{-q}$ per group on average. The maximum embedding rate that can be achieved is $q/(2^q - 1)$ bits/pixel. For example, 2 bits of the secret message can be embedded in a group of 3 pixels while adding a noise of $1 - 2^{-2} = 0.75$ per group on average, giving a maximum embedding rate of 2/3 = 0.67 bits/pixel. The F5 algorithm ( et al., 2010) is probably the most popular implementation of Matrix Encoding.
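Matrix Encoding can be sketched in Python with the binary Hamming code, which is the usual construction (its use here is an assumption, since the text does not name the code). The syndrome of a group of $2^q - 1$ LSBs is the XOR of the 1-based positions holding a 1; to embed a q-bit message, at most one LSB is flipped so that the syndrome equals the message. The function names are hypothetical, and LSBs are flipped with XOR for simplicity (F5, for example, instead decrements coefficient magnitudes).

```python
from itertools import product

def syndrome(lsbs):
    """XOR of the 1-based positions holding a 1 (the Hamming-code syndrome)."""
    s = 0
    for position, bit in enumerate(lsbs, start=1):
        if bit:
            s ^= position
    return s

def me_embed(group, message):
    """Embed a q-bit integer `message` into a group of 2**q - 1 pixels,
    flipping at most one LSB."""
    out = list(group)
    position = syndrome([p & 1 for p in out]) ^ message
    if position:                  # 0 means the group already carries the message
        out[position - 1] ^= 1
    return out

def me_extract(group):
    return syndrome([p & 1 for p in group])

# Exhaustive check for q = 2: every 3-bit LSB pattern with every 2-bit message.
total_changes = 0
for lsbs in product((0, 1), repeat=3):
    for m in range(4):
        stego = me_embed(list(lsbs), m)
        assert me_extract(stego) == m
        total_changes += sum(a != b for a, b in zip(lsbs, stego))
print(total_changes / 32)   # 0.75 changes per group on average, i.e. 1 - 2**-2
```

The exhaustive count reproduces the 0.75 changes per group quoted above for q = 2.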

The LSB replacement technique has also been extended to multiple bit planes. Recently, (Science & Goel, 2008) claimed that LSB replacement involving more than one least significant bit plane is less detectable than single bit plane LSB replacement, and hence the use of multiple bit planes for embedding has been encouraged. However, the direct use of three or more bit planes adds a considerable amount of noise to the cover image. (Science & Goel, 2008) also give a detailed analysis of the noise added by LSB embedding in 3 bit planes, and propose a new algorithm that combines a Single Digit Sum Function with Matrix Encoding. It has been shown analytically that the noise added by this algorithm to a pixel of the image is 0.75p, compared with 0.875p for 3-plane LSB embedding, where p is the embedding rate. Note that most of the approaches proposed so far are based on minimizing the noise that the algorithm adds to the cover. Another direction in steganographic algorithm design is preserving the image statistics that are changed by embedding; this research proposes two algorithms based on this approach. In the next section we cover some transform domain steganographic algorithms.

2.2 Transform Domain

These techniques encode message bits in the transform domain coefficients of the image. Data embedding in the transform domain is widely used for robust watermarking, and similar techniques can also realize large-capacity embedding for steganography. Candidate transforms include the discrete cosine transform (DCT), the discrete wavelet transform (DWT) and the discrete Fourier transform (DFT).

When embedded in the transform domain, the hidden data resides in more robust areas, is spread across the entire image and offers better resistance against signal processing. For example, one can perform a block DCT and, depending on payload and robustness requirements, choose one or more components in each block to form a new data group that is, in turn, pseudo-randomly scrambled and subjected to a second-layer transformation. Modification is then carried out on the double-transform-domain coefficients using various schemes. These techniques have high embedding and extraction complexity. Because of their robustness, transform domain techniques are generally more applicable to the watermarking side of data hiding, and many steganographic techniques in these domains have been inspired by their watermarking counterparts.
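A single-layer version of block DCT embedding can be sketched in Python. This is a simplified illustration, not any specific published scheme: one bit per 8 × 8 block is carried by the parity of the quantized value of one mid-frequency coefficient (a simple parity form of quantization index modulation). The coefficient position (2, 1), the step of 24 and all function names are arbitrary assumptions; the step is chosen large enough that rounding the modified block back to integer pixels does not change the extracted bit for mid-range blocks.

```python
import math

N = 8
# Orthonormal DCT-II basis: BASIS[k][n] = c(k) * cos(pi * (2n + 1) * k / 16)
BASIS = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
          * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
         for k in range(N)]

def dct2(block):
    """Forward 2-D DCT of an 8 x 8 block: Y = C X C^T."""
    return [[sum(BASIS[u][x] * BASIS[v][y] * block[x][y]
                 for x in range(N) for y in range(N)) for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse 2-D DCT: X = C^T Y C (C is orthonormal)."""
    return [[sum(BASIS[u][x] * BASIS[v][y] * coeffs[u][v]
                 for u in range(N) for v in range(N)) for y in range(N)] for x in range(N)]

def embed_bit(block, bit, pos=(2, 1), step=24):
    """Give the quantized value of one coefficient the parity of `bit`,
    then return the block as rounded 0..255 pixels."""
    coeffs = dct2(block)
    u, v = pos
    q = round(coeffs[u][v] / step)
    if q % 2 != bit:                      # move to the nearest level of the right parity
        q += 1 if coeffs[u][v] / step > q else -1
    coeffs[u][v] = q * step
    return [[min(255, max(0, round(p))) for p in row] for row in idct2(coeffs)]

def extract_bit(block, pos=(2, 1), step=24):
    u, v = pos
    return round(dct2(block)[u][v] / step) % 2

# Deterministic stand-in blocks with gray levels between 100 and 140.
blocks = [[[100 + (x * 3 + y * 5 + k) % 41 for y in range(N)] for x in range(N)]
          for k in range(4)]
```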

F5 (Westfeld & Wolf, 1998) uses the Discrete Cosine Transform coefficients of an image for embedding data bits. It embeds data in the quantized DCT coefficients, decreasing the absolute value of a coefficient by one whenever its least significant bit has to change, and it uses Matrix Encoding to reduce the noise embedded in the signal. F5 is one of the most popular embedding schemes in DCT domain steganography, although it has been successfully broken (Science & Goel, 2008).

Transform domain embedding does not necessarily mean computing the transform coefficients on blocks of size 8 × 8, as is done in JPEG compression; it is also possible to design techniques that apply the transform to the whole image. Other block-based JPEG domain and wavelet-based embedding algorithms have been proposed in (Westfeld & Wolf, 1998).

2.3 EXISTING ATTACKS

2.3.1 Steganalysis

Steganography is a game of hide and seek. While steganography aims at hiding data with maximum stealthiness, steganalysis aims to detect the presence of any hidden information in the stego media (in this thesis, it refers to JPEG images).

Steganographic methods have long avoided visual distortions in the stego images, so the majority of stego images do not reveal any visual clue as to whether they contain a hidden message. Current steganalysis therefore focuses on detecting statistical anomalies in stego images, based on features extracted from typical unmodified cover images. Unmodified cover images exhibit predictable statistical correlations, and any modification distorts those correlations. Such features include global histograms, blockiness, inter- and intra-block dependencies and other first- and second-order statistics of the image. Most steganalysis algorithms exploit these strong dependencies, which are typical of natural images.

The steganalytic attacks developed to date can be classified into visual and statistical attacks. The statistical attacks can be further classified as:

1. Targeted Attacks

2. Blind Attacks

Each of these classes of attack is covered in detail in the next two subsections, along with several examples of each category.

2.3.1.1 Targeted Attacks

These attacks are designed with a particular steganographic algorithm in mind. They are based on the image features that are modified by a particular kind of steganographic embedding: a given steganographic algorithm imposes a specific kind of behaviour on the image statistics, and this behaviour is exploited by the targeted attack. Some of the targeted attacks are as follows:

1. Histogram Analysis: The histogram analysis method exploits the asymmetry introduced by LSB replacement. The main idea is to look for statistical artifacts of embedding in the histogram of a given image. It has been observed that in natural (cover) images, the frequencies of the two gray levels in each pair of values (2i, 2i + 1) are generally unequal, whereas at higher embedding rates of LSB Replacement these frequencies tend to become equal.

Figure 4.1: Flipping of set cardinalities during embedding

Based on this artifact, a statistical attack using Chi-Square Hypothesis Testing has been developed to probabilistically suggest one of the following two hypotheses:

Null Hypothesis H0: The given image contains steganographic embedding.

Alternative Hypothesis H1: The given image does not contain steganographic embedding.

The decision to accept or reject the null hypothesis H0 is made on the basis of the observed p-value p. A more detailed discussion of Histogram Analysis can be found in ( et al., 2010).

2. Sample Pair Analysis: Sample Pair Analysis is another LSB steganalysis technique, which can detect the existence of hidden messages randomly embedded in the least significant bits of natural continuous-tone images. It can precisely measure the length of the embedded message, even when the hidden message is very short relative to the image size. The key to this method's success is the formation of four subsets of pixels (X, Y, U and V) whose cardinalities change with LSB embedding (as shown in Figure 4.1); such changes can be precisely quantified under the assumption that the embedded bits are randomly scattered. A detailed analysis of the Sample Pair technique can be found in (Petitcolas, Anderson, & Kuhn, 1999). Another attack based on the same concept, called RS Steganalysis, has been independently proposed (Kodovský & Fridrich, 2009).

3. HCF-COM based Attack: This attack, first proposed by (Harmsen & Pearlman, n.d.), is based on the Center of Mass (COM) of the Histogram Characteristic Function (HCF) of an image, and was further extended to LSB Matching by (Ker, 2007). The attack compares the COM of a cover or stego image, $C(H_C)$ or $C(H_S)$, with the COM of a calibrated version obtained by downsampling the image, $\hat{C}(H_C)$ or $\hat{C}(H_S)$. For cover images the downsampled COM closely tracks the original, while embedding lowers the COM of the full-size image more than that of its downsampled version:

$$C(H_C) \approx \hat{C}(H_C) \tag{2.1}$$

$$C(H_C) - C(H_S) > \hat{C}(H_C) - \hat{C}(H_S) \tag{2.2}$$

From Equations 2.1 and 2.2, a dimensionless discriminator for classification can be obtained as

$$\frac{C(H_S)}{\hat{C}(H_S)}$$

which is close to 1 for cover images and smaller for stego images. By comparing this ratio with a threshold chosen on the basis of training data, an image can be classified as either cover or stego. Some other targeted attacks exist in the literature that have not been covered in this survey; a detailed survey can be found in (Kodovský & Fridrich, 2009).
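Two of the targeted attacks above can be sketched in Python: the chi-square test on pairs of values from item 1 and the HCF centre-of-mass ratio from item 3. This is a minimal illustration with hypothetical function names. The chi-square p-value is computed from the series of the regularized incomplete gamma function so that only the standard library is needed (in practice a statistics library would be used). The HCF is taken as the 256-point DFT of the histogram with the COM summed over k = 1..128, and the calibration is 2 × 2 averaging with integer truncation; both choices are assumptions of this sketch.

```python
import cmath
import math

def chi2_sf(x, k):
    """P[X > x] for a chi-square variable with k degrees of freedom, from the
    series of the regularized lower incomplete gamma evaluated in log space."""
    if x <= 0:
        return 1.0
    a, z = k / 2.0, x / 2.0
    log_term = -math.log(a)                   # n = 0 term of the series: 1/a
    log_terms, peak, n = [log_term], log_term, 1
    while True:
        log_term += math.log(z) - math.log(a + n)
        log_terms.append(log_term)
        peak = max(peak, log_term)
        if n > z and log_term < peak - 40:   # terms now decreasing and negligible
            break
        n += 1
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    lower = math.exp(a * math.log(z) - z - math.lgamma(a) + log_sum)
    return max(0.0, 1.0 - lower)

def chi_square_attack(hist):
    """hist: 256-bin gray-level histogram. Returns (statistic, dof, p)."""
    stat, categories = 0.0, 0
    for i in range(128):
        expected = (hist[2 * i] + hist[2 * i + 1]) / 2.0   # value if the pair were equalized
        if expected > 0:
            stat += (hist[2 * i] - expected) ** 2 / expected
            categories += 1
    dof = max(categories - 1, 1)
    return stat, dof, chi2_sf(stat, dof)      # p close to 1 supports H0 (embedding)

def histogram(pixels):
    h = [0] * 256
    for p in pixels:
        h[p] += 1
    return h

def hcf_com(hist):
    """Centre of mass of |HCF| over k = 1..128, where HCF = 256-point DFT of hist."""
    mags = [abs(sum(hist[n] * cmath.exp(-2j * cmath.pi * n * k / 256) for n in range(256)))
            for k in range(1, 129)]
    return sum(k * m for k, m in zip(range(1, 129), mags)) / sum(mags)

def downsample(img):
    """2 x 2 averaging with integer truncation (the calibrated image)."""
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) // 4
             for c in range(len(img[0]) // 2)] for r in range(len(img) // 2)]

def ker_ratio(img):
    """C(H) / C-hat(H): close to 1 for covers, noticeably below 1 suggests embedding."""
    full = hcf_com(histogram([p for row in img for p in row]))
    small = hcf_com(histogram([p for row in downsample(img) for p in row]))
    return full / small
```

In both cases the final decision still needs a threshold, which is normally chosen from training data.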

2.3.1.2 Blind Attacks

The blind approach to steganalysis is similar to a pattern classification problem. The pattern classifier, in our case a binary classifier, is trained on a set of training data consisting of higher-order statistics of the transform domain of a set of cover and stego images; the trained classifier is then presented with images to classify as non-embedded or embedded. Many blind steganalytic techniques try to estimate the cover image statistics from the stego image by minimizing the effect of embedding in the stego image. This estimation is sometimes referred to as Cover Image Prediction. Some of the most popular blind attacks are described next.

1. Wavelet Moment Analysis (WAM): The Wavelet Moment Analyzer (WAM) is the most popular blind steganalyzer for spatial domain embedding. It was proposed in (Goljan, Fridrich, & Holotyak, 2011). WAM uses a denoising filter to remove Gaussian noise from images, under the assumption that the stego image is an additive mixture of a non-stationary Gaussian signal (the cover image) and a stationary Gaussian signal with known variance (the noise).

Figure 5.1: Calibration of the stego image for cover statistics estimation

As the filtering is performed in the wavelet domain, all the features (statistical moments) are calculated as higher-order moments of the noise residual in the wavelet domain. The detailed procedure for calculating the WAM features of a grayscale image can be found in (Goljan et al., 2011). WAM is based on a 27-dimensional feature space and uses a Fisher Linear Discriminant (FLD) as the classifier. WAM is considered a state-of-the-art steganalyzer for spatial domain embedding, and no other blind attack has been reported to perform better than WAM.

2. Calibration Based Attacks: Calibration based attacks estimate the cover image statistics by nullifying the impact of embedding in the stego image. These attacks were first proposed by (Fridrich, 2012) and are designed for JPEG domain steganographic schemes. They estimate the cover image statistics by a process termed self-calibration. Steganalysis algorithms based on this self-calibration process can detect the presence of steganographic noise with almost 100% accuracy, even for very low embedding rates ( et al., 2010). Calibration is done by decompressing the stego JPEG image to the spatial domain, cropping 4 rows from the top and 4 columns from the left, and recompressing the cropped image, as shown in Figure 5.1. The cropping and subsequent recompression produce a calibrated image whose macroscopic features are mostly similar to those of the original cover image. Cropping by 4 pixels is an important step: the 8 × 8 grid of the recompression is misaligned with the grid of the previous JPEG compression, so the resulting DCT coefficients are not influenced by the previous quantization (and embedding) in the DCT domain.

3. Farid's Wavelet Based Attack: This attack was one of the first blind attacks proposed in steganographic research (Lyu & Farid, n.d.) for JPEG domain steganography. It is based on features drawn from the wavelet coefficients of an image. The attack first performs an n-level wavelet decomposition of the image and computes four statistics, namely mean, variance, skewness and kurtosis, for each set of coefficients, yielding a total of 12(n − 1) statistics. The second set of statistics is based on the errors of an optimal linear predictor of coefficient magnitude; the same four statistics (mean, variance, skewness and kurtosis) of this error are extracted, forming a 24(n − 1)-dimensional feature vector in total. For implementation purposes, n is set to 4, i.e. a four-level decomposition of the image is performed, which gives 72 features. The source code of this attack is available at (FARID). After feature extraction, a Support Vector Machine (SVM) is used for classification.
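The first feature set of this attack can be sketched in Python. This is a simplified illustration, not Farid's implementation: an orthonormal 2-D Haar transform is swapped in for the original quadrature mirror filters, the prediction-error statistics are omitted, and the function names are hypothetical. For each of the n − 1 decomposition levels, the mean, variance, skewness and kurtosis of the three detail bands are collected, giving 12(n − 1) values.

```python
def haar2d(img):
    """One level of an orthonormal 2-D Haar transform of an even-sized image.
    Returns (approximation, detail_1, detail_2, detail_3)."""
    bands = ([], [], [], [])
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            e, f = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + e + f) / 2)   # low-pass approximation
            rows[1].append((a - b + e - f) / 2)   # difference across columns
            rows[2].append((a + b - e - f) / 2)   # difference across rows
            rows[3].append((a - b - e + f) / 2)   # diagonal difference
        for band, row in zip(bands, rows):
            band.append(row)
    return bands

def four_moments(values):
    """Mean, variance, skewness and (non-excess) kurtosis; skewness and
    kurtosis are reported as 0 when the variance is 0."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    if var == 0:
        return [mean, 0.0, 0.0, 0.0]
    skew = sum((x - mean) ** 3 for x in values) / (n * var ** 1.5)
    kurt = sum((x - mean) ** 4 for x in values) / (n * var ** 2)
    return [mean, var, skew, kurt]

def first_feature_set(img, n_levels=4):
    """12 * (n_levels - 1) values: 4 moments x 3 detail bands x (n - 1) levels."""
    features, approx = [], img
    for _ in range(n_levels - 1):
        approx, d1, d2, d3 = haar2d(approx)
        for band in (d1, d2, d3):
            features.extend(four_moments([x for row in band for x in row]))
    return features

sample = [[(x * 7 + y * 13) % 256 for y in range(32)] for x in range(32)]
features = first_feature_set(sample)      # 36 values for n = 4
```

Adding the 12(n − 1) prediction-error statistics would complete the 24(n − 1)-dimensional vector that is passed to the SVM.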

2.3 STATISTICAL RESTORATION

Statistical undetectability is one of the main requirements of any steganographic algorithm. To maintain it, steganographic techniques are designed to minimize the artifacts that embedding introduces into the cover signal, generally by minimizing the noise added by embedding while increasing the payload. This is an important consideration in the design of embedding algorithms, since the added noise affects the statistical properties of the medium. As mentioned previously, an algorithm that makes fewer embedding changes or adds less additive noise generally provides better security than one that makes relatively more changes or adds more additive noise (Kumar, 2011). From the steganalyst's point of view, attacks are designed to examine a signal and look for statistics that are distorted by embedding. These range from first- and second-order marginal statistics in the case of targeted attacks up to 9th-order statistics for blind attacks (Goljan et al., 2011). To defeat these steganalytic attacks, there has therefore been a shift away from this data hiding paradigm: algorithms have been proposed that try to restore the statistics which are distorted during embedding and used for steganalysis.

Introduction

In steganographic research, several algorithms have been proposed that preserve statistical features of the cover to achieve more security. Provos' OutGuess algorithm ( et al., 2010) was an early attempt at histogram compensation for LSB hiding, while Eggers et al. (Science & Goel, 2008) suggested a more rigorous approach to the same end using histogram-preserving data mapping (HPDM) and adaptive embedding. Solanki proposed a statistical restoration method for converting the stego image histogram into the cover histogram. This algorithm is based on a theorem proved by R. Tzschoppe, R. Bäuml and J. J. Eggers for converting one vector x into another vector y while satisfying a Minimum Mean Square Error (MMSE) criterion. The algorithm treats the stego image histogram as the source vector x and tries to convert it into the cover image histogram, i.e. the target vector y. All the bins of the source histogram are compensated by mapping the input data, taken in increasing order of value, onto the target values in increasing order. This algorithm suffers from the following limitations:

1. The algorithm assumes a Gaussian cover distribution and does not give good results for non-Gaussian cover images.

2. The algorithm avoids embedding in low-probability image regions because of erratic behavior in the low-probability tail of the distribution.

3. The algorithm has been tried specifically with the Quantization Index Modulation algorithm (Solanki, Dabeer, Madhow, Manjunath, & Chandrasekaran, 2009) and has not been tested with well-known embedding schemes such as LSB replacement and LSB matching.
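Solanki's scheme maps the input data onto the target histogram in increasing order. The following sketch illustrates that idea under one assumption: the mapping is realized as a rank-order assignment, which is the standard monotone rearrangement for one-dimensional data and minimizes the total squared error among all assignments that produce the target histogram. This is not the published implementation, and `monotone_histogram_map` is a hypothetical name.

```python
def monotone_histogram_map(x, target_counts):
    """Map samples x onto a target histogram by rank order.

    target_counts: dict {value: count} whose counts sum to len(x).
    The i-th smallest sample receives the i-th smallest target value.
    """
    if sum(target_counts.values()) != len(x):
        raise ValueError("target histogram must contain exactly len(x) samples")
    # Expand the target histogram into a sorted list of values
    targets = [v for v in sorted(target_counts) for _ in range(target_counts[v])]
    # Indices of x ordered by value (ties keep their original order)
    order = sorted(range(len(x)), key=lambda i: x[i])
    y = [None] * len(x)
    for rank, idx in enumerate(order):
        y[idx] = targets[rank]
    return y

print(monotone_histogram_map([5, 1, 4, 2, 3], {1: 2, 3: 1, 5: 2}))  # [5, 1, 5, 1, 3]
```

Because the assignment only depends on ranks, it says nothing about the shape of the cover distribution. The Gaussian assumption criticized above enters through how the published scheme chooses which data to embed in and compensate with, not through this mapping step.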

To overcome the above limitations, we propose two algorithms for preserving the cover image statistics after embedding. The first algorithm is designed to inherently preserve the first-order statistics during embedding itself. The second algorithm makes an explicit attempt to restore the cover image histogram after embedding. Both algorithms are discussed in detail in the following subsections.

2.3.1 Embedding by Pixel

The main motivation for the steganographic algorithm proposed in this section is to embed data without modifying the histogram of the image. This requires an embedding procedure that never changes pixel values in a way that alters the count of any histogram bin. We propose a simple yet effective algorithm, Pixel Swap Embedding (PSE), which embeds message bits into the cover image without any modification to the image histogram. The main idea is to select a pair of pixels whose gray values differ by no more than a fixed threshold. To embed a 0, the first pixel must be greater than the second; if it is not, the two gray values are swapped. Similarly, a 1 is embedded by making the first pixel smaller than the second. The algorithm is described formally in the next subsection.

Algorithm Pixel Swap Embedding

The algorithm is summarized below.

Algorithm: Pixel Swap Embedding (PSE)
Input: Cover Image (I)
Input Parameters: Message Stream (M), Threshold (α), Shared Pseudo Random Key (k)
Output: Stego Image (Is)
Begin
  Set i = 0 and repeat until every message bit has been embedded:
  1. (x1, x2) = Randomize(I, k)
  2. If 0 < |x1 − x2| ≤ α, go to Step 3; else go to Step 1.
  3. If M(i) = 0: if x1 < x2, then Swap(x1, x2); set i = i + 1 and go to Step 1. Otherwise, go to Step 4.
  4. If M(i) = 1: if x1 > x2, then Swap(x1, x2); set i = i + 1 and go to Step 1.
End Pixel Swap Embedding

Pairs with x1 = x2 are skipped in Step 2 because their order cannot encode a bit.
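The Python sketch below follows the steps of the PSE algorithm above. It rests on several assumptions that are not from the source:

- The image is a flat list of gray values and the message is a list of bits.
- `alpha` is the threshold.
- Randomize(I, k) is modeled as a key-seeded shuffle of pixel indices that is split into non-overlapping pairs.
- Pairs with equal gray values are skipped, since their order cannot carry a bit and extraction would otherwise be ambiguous.

The names `pixel_pairs`, `pse_embed` and `pse_extract` are hypothetical.

```python
import random

def pixel_pairs(n_pixels, key):
    """Hypothetical Randomize(I, k): key-seeded, non-overlapping pixel pairs."""
    idx = list(range(n_pixels))
    random.Random(key).shuffle(idx)
    return [(idx[i], idx[i + 1]) for i in range(0, n_pixels - 1, 2)]

def pse_embed(pixels, bits, alpha, key):
    """Embed bits by ordering pixel pairs: 0 -> x1 > x2, 1 -> x1 < x2."""
    stego = list(pixels)
    i = 0
    for a, b in pixel_pairs(len(stego), key):
        if i == len(bits):
            break
        diff = abs(stego[a] - stego[b])
        if diff == 0 or diff > alpha:
            continue  # pair cannot carry a bit under the threshold rule
        want_first_greater = bits[i] == 0
        if (stego[a] > stego[b]) != want_first_greater:
            stego[a], stego[b] = stego[b], stego[a]  # swap the two gray values
        i += 1
    if i < len(bits):
        raise ValueError(f"capacity exhausted after {i} bits")
    return stego

def pse_extract(stego, n_bits, alpha, key):
    """Inverse process: on usable pairs read 0 if x1 > x2 and 1 if x1 < x2."""
    bits = []
    for a, b in pixel_pairs(len(stego), key):
        if len(bits) == n_bits:
            break
        diff = abs(stego[a] - stego[b])  # unchanged by a swap
        if 0 < diff <= alpha:
            bits.append(0 if stego[a] > stego[b] else 1)
    return bits

rng = random.Random(7)
cover = [rng.randint(100, 110) for _ in range(4096)]
message = [rng.randint(0, 1) for _ in range(200)]
stego = pse_embed(cover, message, alpha=2, key=42)
recovered = pse_extract(stego, len(message), alpha=2, key=42)
```

Extraction works because a swap leaves |x1 − x2| unchanged, so the receiver selects exactly the same usable pairs as the sender. `sorted(stego) == sorted(cover)` holds by construction, which is the histogram-preservation property claimed for PSE.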

The function Randomize(I, k) generates random, non-overlapping pixel pairs (x1, x2) using the secret key k shared by both ends; once a pair has been used, it is never reused. The function Swap(x1, x2) interchanges the gray values of the two pixels x1 and x2. Message extraction is the simple inverse of the embedding process. The receiver regenerates the same pairs and keeps those that satisfy the threshold condition of Step 2, which a swap does not affect. It then reads a 0 where x1 > x2 and a 1 where x1 < x2. The scheme automatically preserves every histogram bin, because a swap only exchanges two gray values that already exist in the cover and introduces no new value. Hence it can resist attacks based on first-order statistics. Note that the threshold α used in Step 2 governs the trade-off between the embedding rate and the noise introduced into the cover signal: the added noise remains limited as long as α is kept small. We tested the algorithm with α = 2 and α = 5. Each swap therefore changes a pixel by at most 2 or 5 gray levels, effectively modifying only the least significant bit planes of the pixel gray levels without changing the value of any histogram bin. The achievable embedding rate is higher for images with low variance than for images with high variance, because more pixel pairs satisfy the condition in Step 2 of the PSE algorithm in the former case.
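The dependence of capacity on variance can be checked directly. Each key-selected pair with 0 < |x1 − x2| ≤ α carries one bit, so counting such pairs gives the maximum embedding rate. The sketch below makes several assumptions:

- Pairing is modeled as a key-seeded shuffle.
- The two synthetic "images" are random lists drawn from a narrow and a wide gray-level range.
- `pse_capacity` is a hypothetical name.

```python
import random

def pse_capacity(pixels, alpha, key):
    """Count key-selected pixel pairs with 0 < |x1 - x2| <= alpha (one bit each)."""
    idx = list(range(len(pixels)))
    random.Random(key).shuffle(idx)
    usable = 0
    for i in range(0, len(idx) - 1, 2):
        d = abs(pixels[idx[i]] - pixels[idx[i + 1]])
        if 0 < d <= alpha:
            usable += 1
    return usable

rng = random.Random(3)
smooth = [rng.randint(118, 122) for _ in range(10000)]  # low-variance gray levels
busy = [rng.randint(0, 255) for _ in range(10000)]      # high-variance gray levels
for alpha in (2, 5):
    print(alpha, pse_capacity(smooth, alpha, key=1), pse_capacity(busy, alpha, key=1))
```

Because the pairs are chosen pseudo-randomly rather than from neighboring pixels, what matters in this model is the spread of the gray-level distribution. A larger α admits more pairs, at the cost of larger per-swap changes.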

2.3.1.1 Security Analysis

To evaluate the security of the PSE algorithm, we conducted tests on a set of one hundred gray-scale images (Bhattacharyya et al., 2011). All images were converted to the Tagged Image File Format (TIFF) and resized to 256 × 256 pixels. PSE was tested against the Sample Pair attack proposed in (Science & Goel, 2008). As explained in Section 2.3.1.1, Sample Pair analysis is a targeted attack that exploits the distortion embedding introduces into the first-order statistics of the cover image. A similar attack, RS steganalysis, was proposed independently (Bhattacharyya et al., 2011) and exploits the same first-order statistics. We therefore tested our schemes against the Sample Pair attack only, assuming that their performance against RS steganalysis would be similar. The performance of PSE against the Sample Pair attack is shown in Figure 3.3. Data bits were hidden in the images at the maximum achievable embedding rate for a threshold of 5. The message length predicted by the Sample Pair attack is much smaller than the actual message length embedded in the image. In the next section we introduce the second algorithm, which is based on statistical preservation and explicitly tries to match the cover image histogram after embedding.

2.3.1.2 New Statistical Restoration Scheme

In this section we propose a new statistical restoration scheme that explicitly tries to convert the stego image histogram into the cover image histogram after embedding is complete. As mentioned in the introduction to this section, the restoration algorithm proposed in (Solanki et al., 2009; Sullivan, Solanki, Manjunath, Madhow, & Chandrasekaran, 2006) gives good results only when the cover image distribution is close to Gaussian. The proposed scheme tries to overcome this limitation and provides better restoration of the image histogram for non-Gaussian cover distributions as well. The histogram h(I) of a gray-scale image I with gray values in the range [0, L − 1] can be interpreted as a discrete function $h(r_k) = n_k / n$. Here $r_k$ is the kth gray level, $n_k$ is the number of pixels with gray value $r_k$, and n is the total number of pixels in the image I. The histogram can also be written as $h(I) = \{h(r_0), h(r_1), h(r_2), \ldots, h(r_{L-1})\}$ or simply $h(I) = \{h(0), h(1), h(2), \ldots, h(L-1)\}$. We denote the histogram of the stego image by $\hat{h}(I)$.
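The definition $h(r_k) = n_k / n$ translates directly into code. The sketch below computes the normalized histogram of a flat list of gray values and the per-bin gap $\hat{h}(k) - h(k)$ that restoration tries to drive to zero. The function names and the default L = 256 (8-bit images) are assumptions made for illustration.

```python
from collections import Counter

def normalized_histogram(pixels, L=256):
    """Return [h(0), ..., h(L-1)] where h(r_k) = n_k / n."""
    n = len(pixels)
    counts = Counter(pixels)
    return [counts[k] / n for k in range(L)]

def histogram_gap(cover, stego, L=256):
    """Per-bin difference h_hat(k) - h(k) between stego and cover histograms."""
    h = normalized_histogram(cover, L)
    h_hat = normalized_histogram(stego, L)
    return [b - a for a, b in zip(h, h_hat)]

print(normalized_histogram([0, 0, 1, 3], L=4))  # [0.5, 0.25, 0.0, 0.25]
```

Full restoration in the sense used here means that `histogram_gap` returns all zeros.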

(a) Maximum achievable embedding rate for threshold = 2; (b) Maximum achievable embedding rate for threshold = 5

We then divide the image pixels into two streams: the Embedding Stream and the Restoration Stream. During embedding, we record metadata about which pixels are changed and by how much. We then compensate the histogram using pixels from the Restoration Stream, guided by this metadata, so that the original cover histogram is restored. In other words, restoration tries to make $\hat{h}(I)$ equal to $h(I)$. The algorithm is formalized in the next section.

2.4 Mathematical Formulation of Proposed scheme

The proposed restoration scheme depends on the embedding scheme. The idea is that some image pixels are used for embedding and the rest are used for restoration. In general, if more than 50% of the image pixels are used for embedding, complete restoration is not possible. The converse does not always hold: having 50% or more of the pixels available for compensation does not guarantee full compensation. However, the probability of full compensation increases with the number of pixels available for compensation. A trade-off must therefore be sought between the embedding rate and the restoration percentage to obtain the best embedding procedure. To describe the algorithm, we use the following definitions. Let the cover image, the stego image (embedded but not yet compensated) and the compensated stego image (the stego image after compensation) be denoted by C, S and R respectively. Let $C_{ij}$, $S_{ij}$ and $R_{ij}$ denote the (i, j)th pixel of C, S and R respectively, where $0 \le i < m$ and $0 \le j < n$, m is the number of rows and n is the number of columns of the image matrices.

i. Embed Matrix ($\Phi$): an m × n characteristic matrix indicating whether each pixel has been used for embedding:

$$\Phi(i, j) = \begin{cases} 1 & \text{if the } (i, j)\text{th pixel is used for embedding} \\ 0 & \text{if the } (i, j)\text{th pixel is not used for embedding} \end{cases} \tag{3.1}$$

ii. Compensation Vector ($\Psi$): a one-dimensional vector of length L, where L is the number of gray levels in the cover image C. $\Psi(k) = u$ means that u pixels with gray value k can be used for restoration.

iii. Changed Matrix ($\Omega$): an L × L matrix, where L is the number of gray levels in the cover image C. $\Omega(x, y) = \beta$ means that β pixels were changed from gray value x to gray value y during embedding.
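The three definitions can be built together in one pass over the image. The sketch below makes several assumptions:

- Images are m × n lists of gray values.
- The Embed Matrix Φ is given as a 0/1 matrix of the same shape; how the key splits pixels into the two streams is not modeled.
- Only pixels whose value actually changed are recorded in Ω.
- L = 16 is used in the toy usage to keep it small.

`embedding_metadata` is a hypothetical name.

```python
def embedding_metadata(cover, stego, phi, L=256):
    """Build the Compensation Vector psi and Changed Matrix omega.

    cover, stego: m x n lists of gray values; phi: m x n 0/1 Embed Matrix.
    psi[k] counts restoration pixels with gray value k; omega[x][y] counts
    embedding pixels that were changed from gray value x to gray value y.
    """
    psi = [0] * L
    omega = [[0] * L for _ in range(L)]
    for i, row in enumerate(cover):
        for j, c in enumerate(row):
            s = stego[i][j]
            if phi[i][j]:
                if s != c:
                    omega[c][s] += 1  # record the x -> y change made by embedding
            else:
                psi[c] += 1  # untouched pixel, available for compensation
    return psi, omega

cover = [[10, 11, 12], [10, 11, 12]]
stego = [[11, 11, 13], [10, 11, 12]]  # embedding changed 10 -> 11 and 12 -> 13
phi = [[1, 1, 1], [0, 0, 0]]          # first row is the Embedding Stream
psi, omega = embedding_metadata(cover, stego, phi, L=16)
```

In this toy case, Ω records one 10 → 11 change and one 12 → 13 change. Ψ holds one restoration pixel each in bins 10, 11 and 12.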

2.4.1 Algorithm Statistical Restoration

The statistical restoration algorithm is summarized below.

Algorithm: Statistical Restoration Algorithm (SRA)
Input: Stego Image (S), i.e. the image after embedding but before compensation
Input Parameters: Compensation Vector (Ψ), Changed Matrix (Ω)
Output: Compensated Stego Image (R)
Begin
for every pair of gray values (i, j) do
  1. β = Ω(i, j)
  2. If β > 0, then β pixels were moved from bin i to bin j during embedding. If Ψ(j) ≥ β, change β pixels with gray value j from the set of pixels used for compensation to gray value i (full compensation). Otherwise, change only the Ψ(j) available pixels (partial compensation).
  3. Update the Compensation Vector Ψ to remove the pixels consumed in Step 2.
End

In the above algorithm, full compensation for a pair (i, j) is not possible when Ψ(j) < β.
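A minimal sketch of the restoration step follows. Each embedding change x → y recorded in Ω is undone by a y → x change on a restoration pixel, so bins x and y return to their cover counts. The sketch rests on these assumptions, none of which come from the source:

- Each restoration pixel is modified at most once.
- Restoration pixels are taken in raster order rather than by a key.
- Ψ is represented implicitly by per-bin lists of free pixel positions.
- The function returns the number of units that could not be compensated.

`statistical_restoration` is a hypothetical name.

```python
def statistical_restoration(stego, phi, omega):
    """Sketch of SRA: undo each x -> y embedding change with a y -> x change
    on a restoration pixel.

    stego: m x n gray values (not modified in place); phi: Embed Matrix;
    omega: L x L Changed Matrix. Returns (restored image, uncompensated units).
    """
    restored = [row[:] for row in stego]
    L = len(omega)
    # Restoration pixels grouped by gray value: len(free[k]) plays the role of psi[k]
    free = {k: [] for k in range(L)}
    for i, row in enumerate(stego):
        for j, v in enumerate(row):
            if not phi[i][j]:
                free[v].append((i, j))
    missing = 0
    for x in range(L):
        for y in range(L):
            beta = omega[x][y]
            if beta <= 0:
                continue
            usable = min(beta, len(free[y]))  # psi(y) may be smaller than beta
            for _ in range(usable):
                i, j = free[y].pop()  # consumed: never modified again
                restored[i][j] = x    # one unit moves from bin y back to bin x
            missing += beta - usable
    return restored, missing

L = 16
cover = [[10, 12, 11], [11, 13, 12]]
stego = [[11, 13, 11], [11, 13, 12]]  # embedding moved 10 -> 11 and 12 -> 13
phi = [[1, 1, 0], [0, 0, 0]]
omega = [[0] * L for _ in range(L)]
omega[10][11] = 1
omega[12][13] = 1
restored, missing = statistical_restoration(stego, phi, omega)
```

In the usage example, the restored image has exactly the cover histogram while the two embedding pixels keep their stego values. A real implementation would also choose restoration pixels in a way that does not create new detectable patterns.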

2.4.2 Restoration with Minimum Restoration

The additional noise introduced by compensation is an important issue: the restoration procedure should be designed so that this noise is kept minimal. In the SRA algorithm, the noise introduced depends on the embedding algorithm used. The total noise introduced during restoration can be estimated as follows:

Equation 1

Here $\hat{h}(i)$ and $h(i)$ are the histograms of the stego and cover images respectively, and the histogram bins are indexed from 0 to L − 1. $K_i$ ($0 \le K_i \le L - 1$) is a bin that is used to repair at least one unit of data in the ith bin, with $1 \le |i - K_i|$.
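As a complementary illustration (not Equation 1 itself), consider the smallest possible total gray-level change needed to turn the stego histogram back into the cover histogram. It can be computed as the one-dimensional earth mover's distance, which equals the sum of absolute differences between the two cumulative histograms. This corresponds to repairing each unit of data from the nearest possible bin, i.e. keeping $|i - K_i|$ as small as possible. It is a lower bound that assumes enough restoration pixels are available in every bin. The function name `min_restoration_noise` is hypothetical, and the inputs are pixel-count (not normalized) histograms.

```python
def min_restoration_noise(h_stego, h_cover):
    """Smallest total gray-level change that turns h_stego into h_cover.

    Both arguments are pixel-count histograms of equal length and equal total.
    In 1-D the earth mover's distance equals the sum of absolute differences
    between the cumulative histograms.
    """
    if len(h_stego) != len(h_cover) or sum(h_stego) != sum(h_cover):
        raise ValueError("histograms must have the same length and total")
    noise = 0
    cum_diff = 0
    for s, c in zip(h_stego, h_cover):
        cum_diff += s - c        # surplus (or deficit) carried past this bin
        noise += abs(cum_diff)   # each carried unit moves one gray level
    return noise

print(min_restoration_noise([0, 2, 0], [1, 0, 1]))  # 2
```

In the example, bin 1 holds two surplus pixels while bins 0 and 2 each lack one. Repairing them from the adjacent bin costs one gray level each, for a total of 2.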
