JOMO KENYATTA UNIVERSITY OF AGRICULTURE AND TECHNOLOGY
INSTITUTE OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
BSc COMPUTER TECHNOLOGY
Literature Review On
DIGITAL IMAGE CRYPTOSYSTEM WITH ADAPTIVE STEGANOGRAPHY
NAME: JOHN NJENGA KIMUHU
REG NO: CS 282-0782/2009
SUPERVISORS: DR. OKEYO, MR. J. WAINAINA
DECLARATION
I declare that all materials presented here are my own original
work, or fully and specifically acknowledged wherever adapted from
other sources. The work has not been submitted previously, in whole
or in part, to qualify for any academic award. The content of the
thesis is the result of work which has been carried out since the
official approval of my proposal.
Contents
DECLARATION
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
ABSTRACT
CHAPTER ONE INTRODUCTION
1.1 BACKGROUND
A Steganographic Framework
1.2 STATEMENT OF THE PROBLEM
1.3 OBJECTIVES OF THE STUDY
1.4.1 Specific Objectives
1.4 RESEARCH QUESTIONS
CHAPTER TWO
2.0 LITERATURE REVIEW
2.1 Spatial Domain
2.2 Transform Domain
2.3 EXISTING ATTACKS
2.3.1 Steganalysis
2.3.1.1 Targeted Attacks
2.3.1.2 Blind Attacks
2.3 STATISTICAL RESTORATION
Introduction
2.3.1 Embedding by Pixel
Algorithm Pixel Swap Embedding
2.3.1.1 Security Analysis
2.3.1.2 New Statistical Restoration Scheme
2.4 Mathematical Formulation of Proposed scheme
2.4.2 Restoration with Minimum Restoration
2.4.3 Security Analysis
2.4.3.1 Analysis
2.5 SPATIAL DESYNCHRONIZATION
2.5.1 Introduction
2.5.2 Calibration Attack
2.6 Counter Measures to Blind Steganalysis
2.6.1 Spatial Block Desynchronization
2.7 The Proposed Algorithm
2.8 CONCLUSION AND FUTURE DIRECTIONS
2.8.1 Conclusion
2.8.2 Future Directions
2.9 PROJECT SCHEDULE
3.0 RESEARCH BUDGET
REFERENCES
LIST OF FIGURES
Figure 1.4: Frameworks for Private Key Passive Warden Steganography
Figure 2.1: Trade-off between embedding capacity, undetectability and robustness in data hiding
Figure 3.1: A generalized steganographic framework
Figure 4.1: Flipping of set cardinalities during embedding
Figure 5.1: Calibration of the stego image for cover statistics estimation
Figure 6: Block Diagram of Spatial Block Desynchronization
Figure 7: Spatial desynchronization used in the proposed J5 Algorithm
Figure 8: Research Budget
LIST OF TABLES
Table 1: p-value of Rank Sum Test for 23 DCA
Table 2: p-value of Rank Sum Test for 274 DCA
Table 3: Project Schedule
ABBREVIATIONS AND SYMBOLS
J5: The Proposed Algorithm that uses Spatial Desynchronization for Low Detection
QIM: Quantization Index Modulation, a High-Capacity Robust Watermarking Scheme
SDSA: Spatially Desynchronized Steganography Algorithm
YASS: Yet Another Steganographic Scheme
23 DCA: 23-Dimensional Calibration Attack
ACKNOWLEDGEMENTS
It is with great reverence that I wish to express my deep
gratitude towards Mr. J. Wainaina for his astute guidance, constant
motivation and trust, without which this work would never have been
possible. I am sincerely indebted to him for his constructive
criticism and suggestions for improvement at various stages of the
work. I would also like to thank Madam Ann Kibe, Research Scholar,
for her guidance, invaluable suggestions and for bearing with me
during the thought-provoking discussions which made this work
possible. I am also thankful to Dr. Okeyo, for clearing some of my
doubts through email. I am grateful to my parents and brother for
their perennial inspiration. Last but not least, I would like
to thank all my seniors, friends and my classmates especially Ben,
Lenny and Mercy for making my stay at JKUAT comfortable and a
fruitful learning experience.
Date: August 22, 2013
Signed: John Njenga Kimuhu
Signed: ______________ (Supervisor)    Date: ______________
ABSTRACT
Steganography is defined as the science of hiding or embedding
data in a transmission medium. Its ultimate objectives, which are
undetectability, robustness and capacity of the hidden data, are
the main factors that distinguish it from Cryptography. In this
paper a study on Digital Image Cryptosystem with Adaptive
Steganography is presented. The problem of data hiding has
been attacked from two directions. The first approach tries to
overcome the Targeted Steganalytic Attacks. The work focuses mainly
on the first order statistics based targeted attacks. Two
algorithms have been presented which can preserve the first order
statistics of an image after embedding. The second approach aims at
resisting Blind Steganalytic Attacks, especially the Calibration
based Blind Attacks which try to estimate a model of the cover
image from the stego image. A Statistical Hypothesis Testing
framework has been developed for testing the efficiency of a blind
attack. A generic framework for JPEG steganography has been
proposed which disturbs the cover image model estimation of the
blind attacks. Comparison results show that the proposed algorithm
can successfully resist the calibration based blind attacks and
some non-calibration based attacks as well.
CHAPTER ONE INTRODUCTION
Since the rise of the Internet, one of the most important factors
of information technology and communication has been the security
of information. Every day, vast amounts of data are transferred over the
Internet through e-mail, file sharing sites and social networking
sites, to name a few. As the number of Internet users rises,
the concept of Internet security has also gained importance. The
fiercely competitive nature of the computer industry forces web
services to the market at a breakneck pace, leaving little or no
time for audit of system security, while the tight labour market
causes Internet project development to be staffed with less
experienced personnel, who may have no training in security. This
combination of market pressure, low unemployment, and rapid growth
creates an environment rich in machines to be exploited, and
malicious users to exploit those machines.
Due to the fast development of communication technology, it is
convenient to acquire multimedia data. Unfortunately, illegal data
access occurs constantly and everywhere. Hence, it
is important to protect the content and the authorized use of
multimedia data against attackers. Data encryption is a
strategy to make the data unreadable, invisible or incomprehensible
during transmission by scrambling the content of the data. An image
cryptosystem uses reliable encryption algorithms or secret
keys to transform (encrypt) secret images into ciphered images.
Only authorized users can decrypt the secret images from the
ciphered images. The ciphered images are meaningless and
non-recognizable to any unauthorized users who grab them without
knowing the decryption algorithms or the secret keys
(Bhattacharyya, Banerjee, & Sanyal, 2011). Steganography's niche
in security is to supplement cryptography, not replace it. If a
hidden message is encrypted, it must also be decrypted if
discovered, which provides another layer of protection.
By contrast, steganographic techniques refer to methods of
embedding secret data into cover data in such a way that people
cannot discern the existence of the hidden data. Image
steganographic methods (also called virtual image cryptosystems) are
proposed to hide secret images in readable but non-critical
cover images. They are designed to reduce the chance of being noticed
by illegal users. Common methods for data hiding can be categorized into
spatial and transform domain methods. More broadly,
information hiding is an emerging research area which encompasses
applications such as copyright protection for digital media,
watermarking, fingerprinting, and steganography. In watermarking
applications, the message contains information such as owner
identification and a digital time stamp, and is usually applied for
copyright protection. In fingerprinting, the owner of the data set embeds
a serial number that uniquely identifies the user of the data set.
This adds to the copyright information and makes it possible to trace
any unauthorized use of the data set back to the user. In steganography,
the secret message is hidden within the host data set so that its
presence is imperceptible, and it is to be reliably communicated to a
receiver. The host data set is purposely corrupted, but in a covert way,
designed to be invisible to an information analysis.
1.1 BACKGROUND
A Steganographic Framework
Any steganographic system can be studied as shown in Figure 1.4.
For a steganographic algorithm having a stego-key, given any cover
image the embedding process generates a stego image. The extraction
process takes the stego image and, using the shared key, applies the
inverse algorithm to extract the hidden message. This system can be
explained using the prisoners' problem (Figure 1.4), where Alice and
Bob are two inmates who wish to communicate in order to hatch an
escape plan. However, communication between them is examined by the
warden, Wendy. To send the secret message to Bob, Alice embeds the
secret message m into the cover object c to obtain the stego
object s. The stego object is then sent through the public channel.
In a pure steganographic framework, the technique for embedding the
message is unknown to Wendy and shared as a secret between Alice
and Bob. In private key steganography Alice and Bob share a secret
key which is used to embed the message. The secret key, for
example, can be a password used to seed a pseudo-random number
generator to select pixel locations in an image cover-object for
embedding the secret message. Wendy has no knowledge about the
secret key that Alice and Bob share, although she is aware of the
algorithm that they could be employing for embedding messages. In
public key steganography, Alice and Bob have private-public key
pairs and know each other's public key. In this thesis we confine
ourselves to private key steganography only.
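To make the private-key idea concrete, here is a minimal Python sketch of the password-seeded pixel selection mentioned above. The function name `select_pixel_locations`, the use of SHA-256 to turn the password into a seed, and Python's `random.Random` generator are illustrative assumptions, not part of any specific scheme in this review; `random.Random` is not cryptographically secure, so a real system would use a keyed cryptographic generator instead.

```python
import hashlib
import random


def select_pixel_locations(password, width, height, count):
    """Derive a key-dependent, repeatable sequence of distinct pixel positions.

    Alice and Bob call this with the same shared password, so Bob visits
    exactly the pixels Alice used. (Hypothetical helper for illustration.)
    """
    if count > width * height:
        raise ValueError("not enough pixels for the requested count")
    # Hash the password into a large integer seed (SHA-256 is an assumption).
    seed = int.from_bytes(hashlib.sha256(password.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)  # NOT cryptographically secure; sketch only
    # Sample distinct linear indices, then convert them to (row, col) pairs.
    indices = rng.sample(range(width * height), count)
    return [(i // width, i % width) for i in indices]
```

Without the password, Wendy can know the algorithm yet still not know which pixels carry the message, which is the private-key setting adopted in this thesis.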
Figure 1.4: Frameworks for Private Key Passive Warden Steganography
Figure 1.5: Private Key Steganography
1.2 STATEMENT OF THE PROBLEM
As mentioned, steganography deals with hiding information in
a cover source; steganalysis, on the other hand, is the art and
science of detecting messages hidden using steganography, and is
analogous to cryptanalysis applied to cryptography. The goal of
steganalysis is to identify suspected packages, determine whether
or not they have a payload encoded into them, and, if possible,
recover that payload. Hence, the major challenges of effective
steganography are:
1. Security of Hidden Communication: In order to
avoid raising the suspicions of eavesdroppers, while evading the
meticulous screening of algorithmic detection, the hidden contents
must be invisible both perceptually and statistically.
2. Size of Payload: Unlike watermarking, which needs to embed
only a small amount of copyright information, steganography aims at
hidden communication and therefore usually requires sufficient
embedding capacity. Requirements for higher payload and secure
communication are often contradictory.
Depending on the specific application scenarios, a tradeoff has
to be sought.
Figure 2.1: Trade-off between embedding capacity, undetectability and robustness in data hiding
One possible way of categorizing the present
steganalytic attacks is into the following two categories:
1. Visual Attacks: These methods try to detect the presence of
information by visual inspection either by the naked eye or by a
computer. The attack is based on guessing the embedding layer of an
image (say a bit plane) and then visually inspecting that layer to
look for any unusual modifications in that layer.
2. Statistical Attacks: These methods use first or higher order
statistics of the image to reveal tiny alterations in the
statistical behavior caused by steganographic embedding and hence
can successfully detect even small amounts of embedding with very
high accuracy. This class of steganalytic attacks is further
classified into Targeted Attacks and Blind Attacks, as explained in
detail in the next few sections.
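A visual attack starts by isolating the suspected embedding layer. The hypothetical helper below extracts one bit plane of an 8-bit grayscale image (held as a NumPy array) and scales it to 0/255 so that it can be viewed. In a clean image the least significant plane usually still shows some image structure, while regions that look like pure noise may hint at embedding; this is a heuristic, not a decision rule.

```python
import numpy as np


def bit_plane(image, plane=0):
    """Return bit plane `plane` (0 = LSB) of an 8-bit grayscale image, scaled to 0/255."""
    img = np.asarray(image, dtype=np.uint8)
    # Shift the wanted bit into position 0, keep it, and stretch 0/1 to 0/255.
    return (((img >> plane) & 1) * 255).astype(np.uint8)
```

Displaying `bit_plane(img, 0)` in any image viewer is the computer-assisted form of the inspection described in item 1.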
Figure 3.1: A generalized steganographic framework
1.3 OBJECTIVES OF THE STUDY
In order to develop a good steganography algorithm, one should
have knowledge about the different steganalysis techniques. Keeping
this in mind, an approach aimed at preservation of the marginal
statistics of a cover image was proposed. The preservation of
marginal statistics helps in defeating the targeted attacks
designed for specific steganographic algorithms and thus an
algorithm that inherently preserves the first order statistics of
the cover image while embedding itself was proposed.
1.4.1 Specific Objectives
This research study focuses on the following topics:
1. Designing an algorithm to inherently preserve the first order
statistics of the cover image while embedding itself.
2. Designing a low detection algorithm for embedding data such
that the stego population remains statistically closer to the cover
population and the difference between these two cannot be observed
in the statistics drawn from the two populations.
3. Designing a digital image steganography algorithm using global and
dual histogram compensation along with matrix encoding to minimize
changes.
4. Designing a steganographic algorithm to resist calibration based
blind steganalytic attacks.
1.4 RESEARCH QUESTIONS
1. Is it possible to use Steganography to supplement
Cryptography and not to replace it? A message in cipher text might
arouse suspicion on the part of the recipient while an invisible
message created with steganographic methods will not.
2. What are the known Steganalysis (counter-Steganography)
methods and how can they be avoided to have a crack-proof
steganography?
3. If the main goal of good steganography is to be invisible,
is there a way we can come up with a stealth algorithm that will
resist the best-known steganalytic attacks?
CHAPTER TWO
2.0 LITERATURE REVIEW
In this chapter we provide the necessary background required for
this research area. In sections 2.1 and 2.2 we briefly discuss some of
the existing steganographic techniques in the spatial and transform
domains. In section 2.3 we present some of the steganalytic attacks
proposed to date as countermeasures to the steganographic algorithms.
2.1 Spatial Domain
These techniques use the pixel gray levels and their color
values directly for encoding the message bits. These techniques are
some of the simplest schemes in terms of embedding and extraction
complexity. The major drawback of these methods is the amount of
additive noise that creeps into the image, which directly affects the
Peak Signal-to-Noise Ratio (PSNR) and the statistical properties of the
image. Moreover, these embedding algorithms are applicable mainly to
lossless image formats such as TIFF. For lossy
compression schemes like JPEG, some of the message bits get lost
during the compression step.
The most common algorithm belonging to this class of techniques
is the Least Significant Bit (LSB) Replacement technique in which
the least significant bit of the binary representation of the pixel
gray levels is used to represent the message bit. This kind of
embedding leads to an addition of a noise of 0.5p on average in the
pixels of the image, where p is the embedding rate in bits/pixel.
This kind of embedding also leads to an asymmetry and a grouping of
the pixel gray values into the pairs (0, 1), (2, 3), ..., (254, 255):
an even value can only be increased by 1 and an odd value can only be
decreased by 1. This asymmetry is exploited in the attacks developed
for this technique, as explained further in section 2.3. To overcome
this undesirable asymmetry, the decision of changing the least
significant bit is randomized, i.e. if the message bit does not match
the pixel's LSB, then the pixel value is randomly either increased or
decreased by 1. This technique is popularly known as LSB Matching. It
can be observed that even this kind of embedding adds a noise of 0.5p
on average.
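The two spatial-domain schemes just described can be compared with a short NumPy sketch. The helper names `lsb_replace`, `lsb_match` and `psnr` are hypothetical; the sketch embeds sequentially into the first pixels (no key-based pixel selection) and forces the +1/-1 choice at the values 0 and 255 so that LSB matching never leaves the valid range, both of which are simplifying assumptions.

```python
import numpy as np


def lsb_replace(cover, bits):
    """LSB replacement: overwrite the LSB of the first len(bits) pixels."""
    stego = cover.astype(np.int16).ravel().copy()
    n = len(bits)
    stego[:n] = (stego[:n] & ~1) | bits          # clear the LSB, then set it
    return stego.reshape(cover.shape).astype(np.uint8)


def lsb_match(cover, bits, rng):
    """LSB matching (+-1 embedding): on a mismatch, randomly add or subtract 1."""
    stego = cover.astype(np.int16).ravel().copy()
    n = len(bits)
    mismatch = (stego[:n] & 1) != bits
    step = rng.choice([-1, 1], size=n)
    step[stego[:n] == 0] = 1                      # cannot go below 0
    step[stego[:n] == 255] = -1                   # cannot go above 255
    stego[:n] += np.where(mismatch, step, 0)      # either step flips the LSB
    return stego.reshape(cover.shape).astype(np.uint8)


def psnr(cover, stego):
    """Peak Signal-to-Noise Ratio in dB for 8-bit images."""
    mse = np.mean((cover.astype(np.float64) - stego.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
```

With a random message, about half of the used pixels change under either scheme (the 0.5p figure above). The difference is that LSB matching does not confine changes to the fixed pairs (2k, 2k + 1), which is what removes the asymmetry.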
To further reduce the noise, (Zhang, Zhang, & Wang, 2007) have
suggested the use of a binary function of two cover pixels to embed
the data bits. The embedding is performed using a pair of pixels as
a unit, where the LSB of the first pixel carries one bit of
information, and a function of the two pixel values carries another
bit of information. It has been shown that embedding in this
fashion reduces the embedding noise introduced in the cover
signal.
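The text does not spell out which binary function of the two pixels is used, so the sketch below assumes the function f(a, b) = LSB(floor(a/2) + b) from the well-known "LSB matching revisited" construction. The helper names are hypothetical, and pixel values are assumed to lie in 1..254 so that a +1/-1 change never leaves the valid range.

```python
import random


def f(a, b):
    """Binary function of two pixel values: LSB(floor(a / 2) + b)."""
    return (a // 2 + b) & 1


def embed_pair(x1, x2, m1, m2, rng=random):
    """Embed bits (m1, m2) in the pixel pair (x1, x2), changing at most one pixel by 1."""
    if not (1 <= x1 <= 254 and 1 <= x2 <= 254):
        raise ValueError("sketch assumes pixel values in 1..254")
    if (x1 & 1) == m1:
        # First bit already correct: fix the second bit through x2 if needed.
        if f(x1, x2) != m2:
            x2 += rng.choice((-1, 1))   # either direction flips f
    else:
        # x1 must change; f(x1 - 1, x2) and f(x1 + 1, x2) differ, so pick the
        # direction that also makes the function carry m2.
        x1 = x1 - 1 if f(x1 - 1, x2) == m2 else x1 + 1
    return x1, x2


def extract_pair(y1, y2):
    return y1 & 1, f(y1, y2)
```

For uniformly random messages a pair is modified in 3 of the 4 message cases, i.e. 0.375 changes per embedded bit instead of the 0.5 of plain LSB matching, which is the noise reduction described above.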
In (Bhattacharyya et al., 2011), a multiple base number system
has been employed for embedding data bits. While embedding, the
human vision sensitivity has been taken care of. The variance value
for a block of pixels is used to compute the number base to be used
for embedding. A similar kind of algorithm based on human vision
sensitivity has been proposed by (, Condell, Curran, & Kevitt,
2010) under the name Pixel Value Differencing. This approach is
based on embedding more data bits in the high-variance
regions of the image, for example near the edges, by considering the
difference values of two neighboring pixels. This approach has been
improved further by combining it with least significant bit
embedding in (Budiman, 2010).
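The following sketch illustrates pixel value differencing on a single pixel pair. The range table is the one commonly used in Wu and Tsai's original PVD scheme, and the way the change is split between the two pixels is a simplification; both are assumptions. The sketch also raises an error instead of implementing PVD's rule for pairs whose new values would fall outside [0, 255], and the function names are hypothetical.

```python
# Range table assumed from Wu and Tsai's PVD; other tables exist.
RANGES = [(0, 7), (8, 15), (16, 31), (32, 63), (64, 127), (128, 255)]


def _range_of(d):
    for lo, hi in RANGES:
        if lo <= d <= hi:
            return lo, hi
    raise ValueError("difference out of range")


def pvd_embed_pair(p1, p2, bits):
    """Embed the leading bits of `bits` in one pair; returns (q1, q2, bits_used)."""
    d = p2 - p1
    lo, hi = _range_of(abs(d))
    t = (hi - lo + 1).bit_length() - 1            # log2 of the range width
    chunk = list(bits[:t]) + [0] * max(0, t - len(bits))  # pad the final pair
    b = int("".join(map(str, chunk)), 2)
    new_d = lo + b if d >= 0 else -(lo + b)       # keep the sign of the difference
    m = new_d - d
    q1, q2 = p1 - m // 2, p2 + (m - m // 2)       # so that q2 - q1 == new_d
    if not (0 <= q1 <= 255 and 0 <= q2 <= 255):
        raise ValueError("pair would leave [0, 255]; PVD's boundary rule not implemented")
    return q1, q2, t


def pvd_extract_pair(q1, q2):
    d = abs(q2 - q1)
    lo, hi = _range_of(d)
    t = (hi - lo + 1).bit_length() - 1
    return [int(c) for c in format(d - lo, "0{}b".format(t))]
```

A smooth pair (difference below 8) carries 3 bits, while a pair across an edge (difference of 128 or more) carries 7, which is how PVD hides more data in high-variance regions.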
According to (Fridrich, 2012), for a given medium, the
steganographic algorithm which makes fewer embedding changes or
adds less additive noise will be less detectable than an
algorithm which makes relatively more changes or adds higher
additive noise. Following the same line of thought, Crandall
(Crandall, 1998) introduced the use of an Error Control Coding
technique called Matrix Encoding. In Matrix Encoding, $q$ message
bits are embedded in a group of $2^q - 1$ cover pixels while adding a
noise of $1 - 2^{-q}$ per group on average. The maximum embedding
capacity that can be achieved is $q/(2^q - 1)$ bits/pixel. For example,
2 bits of secret message can be embedded in a group of 3 pixels while
adding a noise of 0.75 per group on average. The maximum embedding
capacity achievable is 2/3 = 0.67 bits/pixel. The F5 algorithm
( et al., 2010) is probably the most popular implementation of Matrix
Encoding.
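Matrix encoding with $n = 2^q - 1$ pixels can be realized with a binary Hamming code: the message is read as the syndrome of the pixels' LSBs, and at most one LSB has to change. The sketch below assumes that flipping an LSB is the allowed modification (as in LSB replacement); F5 instead decrements coefficient magnitudes. The function names are hypothetical.

```python
def syndrome(lsbs):
    """XOR of the 1-based positions whose LSB is 1 (Hamming-code syndrome)."""
    s = 0
    for pos, bit in enumerate(lsbs, start=1):
        if bit:
            s ^= pos
    return s


def matrix_embed(pixels, message, q):
    """Embed a q-bit integer `message` in 2**q - 1 pixels, flipping at most one LSB."""
    n = 2 ** q - 1
    if len(pixels) != n or not 0 <= message < 2 ** q:
        raise ValueError("need exactly 2**q - 1 pixels and a q-bit message")
    out = list(pixels)
    change = syndrome([p & 1 for p in out]) ^ message
    if change:                      # 0 means the syndrome already equals the message
        out[change - 1] ^= 1        # flipping position c XORs the syndrome with c
    return out


def matrix_extract(pixels):
    return syndrome([p & 1 for p in pixels])
```

For q = 2, checking every LSB pattern against every message gives exactly 0.75 changes per group on average, matching $1 - 2^{-q}$.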
LSB replacement technique has been extended to multiple bit
planes as well. Recently, (Science & Goel, 2008) have claimed
that LSB replacement involving more than one least significant bit
plane is less detectable than single bit plane LSB replacement.
Hence the use of multiple bit planes for embedding has been
encouraged. But the direct use of 3 or more bit planes leads to the
addition of a considerable amount of noise in the cover image.
(Science & Goel, 2008) have given a detailed analysis of the
noise added by LSB embedding in 3 bit planes. Also, a new
algorithm which uses a combination of Single Digit Sum Function and
Matrix Encoding has been proposed. It has been shown analytically
that the noise added by the proposed algorithm in a pixel of the
image is 0.75p as compared to 0.875p added by 3 plane LSB embedding
where p is the embedding rate. One point to be observed here is
that most of the approaches proposed so far are based on
minimization of the noise embedded in the cover by the algorithm.
Another direction of steganographic algorithm is preserving the
statistics of the image which get changed due to embedding. This
research paper proposes two algorithms based on this approach
itself. In the next section we cover some of the transform domain
steganographic algorithms.
2.2 Transform Domain
These techniques try to encode message bits
in the transform domain coefficients of the image. Data embedding
performed in the transform domain is widely used for robust
watermarking. Similar techniques can also realize large-capacity
embedding for steganography. Candidate transforms include the discrete
cosine transform (DCT), discrete wavelet transform (DWT), and
discrete Fourier transform (DFT).
By being embedded in the transform domain, the hidden data
resides in more robust areas, spread across the entire image, and
provides better resistance against signal processing. For example,
we can perform a block DCT and, depending on payload and
robustness requirements, choose one or more components in each
block to form a new data group that, in turn, is pseudo-randomly
scrambled and undergoes a second-layer transformation.
Modification is then carried out on the double transform domain
coefficients using various schemes. These techniques have high
embedding and extraction complexity. Because of the robustness
properties of transform domain embedding, these techniques are
generally more applicable to the Watermarking aspect of data
hiding. Many steganographic techniques in these domains have been
inspired from their watermarking counterparts.
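As a small illustration of block DCT embedding, the sketch below computes an orthonormal 8 × 8 DCT with NumPy and hides one bit in the parity of a quantized mid-frequency coefficient. The coefficient position (2, 3), the step of 16 and the helper names are arbitrary assumptions, and the modified block is left in floating point; JPEG-domain schemes avoid the extra rounding by changing the already quantized coefficients stored in the file.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix; row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def block_dct(block):
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T


def block_idct(coeffs):
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c


def embed_bit(block, bit, pos=(2, 3), step=16.0):
    """Force the parity of one quantized coefficient to equal `bit`."""
    coeffs = block_dct(np.asarray(block, dtype=np.float64))
    idx = int(np.round(coeffs[pos] / step))
    if (idx & 1) != bit:
        # Move to the nearer neighbouring index with the right parity.
        idx += 1 if coeffs[pos] / step > idx else -1
    coeffs[pos] = idx * step
    return block_idct(coeffs)


def extract_bit(block, pos=(2, 3), step=16.0):
    return int(np.round(block_dct(block)[pos] / step)) & 1
```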
F5 (Westfeld & Wolf, 1998) uses the Discrete Cosine
Transform coefficients of an image for embedding data bits. F5
embeds data in the quantized DCT coefficients: when the bit carried by
a nonzero coefficient does not match the data bit, the absolute value
of the coefficient is decremented by 1. It also uses Matrix Encoding
for reducing the embedded noise in the signal. F5 is one of the most
popular embedding schemes in DCT domain steganography, though it
has been successfully broken in (Science & Goel, 2008).
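The core coefficient update of F5 can be sketched as follows. This is only the F4-style step (F5 adds matrix encoding and permutative straddling on top), operating on a plain list of quantized AC coefficients; the function names are hypothetical. Under the F4/F5 convention a positive coefficient carries its LSB and a negative one the inverted LSB, so decrementing the absolute value always flips the carried bit. A coefficient that shrinks to zero carries nothing, so the same bit is embedded again in the next nonzero coefficient.

```python
def f4_bit(c):
    """Bit carried by a nonzero quantized coefficient (F4/F5 convention)."""
    return (c & 1) if c > 0 else 1 - ((-c) & 1)


def f4_embed(coeffs, bits):
    """Embed bits, in order, into the nonzero coefficients of `coeffs`."""
    out = list(coeffs)
    i = 0
    for b in bits:
        while True:
            while i < len(out) and out[i] == 0:
                i += 1                          # zeros carry no information
            if i == len(out):
                raise ValueError("not enough nonzero coefficients")
            if f4_bit(out[i]) != b:
                out[i] += -1 if out[i] > 0 else 1   # shrink |c| by one
            if out[i] != 0:
                i += 1                          # bit stored; move on
                break
            # Shrinkage: the coefficient became 0, so embed the same bit again.
    return out


def f4_extract(coeffs, count):
    return [f4_bit(c) for c in coeffs if c != 0][:count]
```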
The transform domain embedding does not necessarily mean
generating the transform coefficients on blocks of size 8 × 8 as done
in JPEG compression techniques. It is possible to design techniques
which apply the transform to the whole image. Other block based
JPEG domain and wavelet based embedding algorithms have been
proposed in (Westfeld & Wolf, 1998) .
2.3 EXISTING ATTACKS
2.3.1 Steganalysis
Steganography is a game of hide and seek. While steganography
aims at hiding data with maximum stealthiness, steganalysis aims to
detect the presence of any hidden information in the stego media
(in this thesis, it refers to JPEG images).
In the past, steganography avoided any visual distortions in the
stego images. Hence, the majority of stego images do not reveal any
visual clues as to whether a certain image contains any hidden
message or not. Current steganalysis focuses more on
detecting statistical anomalies in the stego images which are based
on the features extracted from typical cover images without any
modifications. Cover images without any modification or distortion
contain a predictable statistical correlation which when modified
in any form will result in distortions to that correlation. These
include global histograms, blockiness, inter and intra block
dependencies and other first and second order statistics of the
image. Most steganalysis algorithms are based on exploiting these
strong dependencies which are typical of natural images.
The steganalytic attacks developed to date can be classified
into visual and statistical attacks. The statistical attacks can
further be classified as:
1. Targeted Attacks
2. Blind Attacks
Each of these classes of attack is covered in detail in the next
two subsections along with several examples of each category.
2.3.1.1 Targeted Attacks
These attacks are designed keeping a
particular steganographic algorithm in mind. These attacks are
based on the image features which get modified by a particular kind
of steganographic embedding. A particular steganographic algorithm
imposes a specific kind of behaviour on the image features. This
specific kind of behaviour of the image statistics is exploited by
the targeted attacks. Some of the targeted attacks are as
follows:
1. Histogram Analysis: The histogram analysis method exploits
the asymmetry introduced by LSB replacement. The main idea is to
look for statistical artifacts of embedding in the histogram of a
given image. It has been observed statistically that in natural
images (cover images), the number of pixels with an even value 2k and
the number with the neighboring odd value 2k + 1 are generally not
equal. For higher embedding rates of LSB Replacement these quantities
tend to become equal, because LSB replacement only moves pixels between
the two values of each pair. So, based on this artifact, a statistical
attack based on Chi-Square Hypothesis Testing is developed to
probabilistically suggest one of the following two hypotheses:
Null Hypothesis H0: The given image contains steganographic embedding.
Alternative Hypothesis H1: The given image does not contain steganographic embedding.
The decision to accept or reject the Null Hypothesis H0 is made
on the basis of the observed p-value. A more detailed
discussion on Histogram Analysis can be found in ( et al.,
2010).
Figure 4.1: Flipping of set cardinalities during embedding
2. Sample Pair Analysis: Sample Pair Analysis is another LSB
steganalysis technique that can detect the existence of hidden
messages that are randomly embedded in the least significant bits
of natural continuous-tone images. It can precisely measure the
length of the embedded message, even when the hidden message is
very short relative to the image size. The key to this method's
success is the formation of 4 subsets of pixels (X, Y, U, and V)
whose cardinalities change with LSB embedding (as shown in Figure
4.1), and such changes can be precisely quantified under the
assumption that the embedded bits are randomly scattered. A
detailed analysis of the Sample Pair technique can be found in
(Petitcolas, Anderson, & Kuhn, 1999). Another attack called RS
Steganalysis, based on the same concept, has been independently
proposed by (Kodovský & Fridrich, 2009).
3. HCF-COM based Attack: This attack, first proposed by
(Harmsen & Pearlman, n.d.), is based on the Center of Mass (COM) of
the Histogram Characteristic Function (HCF) of an image. This attack
was further extended to LSB Matching by (Ker, 2007). This attack
observes the COM of a cover/stego image ($C(H_C)$ / $C(H_S)$) and of
its calibrated version obtained by down-sampling the image
($\hat{C}(H_C)$ / $\hat{C}(H_S)$), and relies on two observations:

$$\hat{C}(H_C) \approx C(H_C) \qquad (2.1)$$

$$C(H_C) - C(H_S) > \hat{C}(H_C) - \hat{C}(H_S) \qquad (2.2)$$

From Equations 2.1 and 2.2, a dimensionless discriminator for
classification can be obtained as $C(H_S)/\hat{C}(H_S)$. By choosing a
threshold for this ratio on the basis of training data, an image can
be classified either as cover or stego. Some other targeted attacks
also exist in the literature which have not been covered in this
survey. A detailed survey can be found in (Kodovský
& Fridrich, 2009).
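The Chi-Square test used in Histogram Analysis (item 1) can be sketched in a few lines. This version takes a 256-bin histogram, compares each count h(2k) with the pair average (h(2k) + h(2k + 1))/2, and converts the statistic to a p-value with the Wilson-Hilferty normal approximation so that only the Python standard library is needed. The approximation (used instead of an exact chi-square distribution function) and the function name are assumptions. A p-value close to 1 supports H0, i.e. the pair frequencies look equalized as full LSB replacement would leave them.

```python
import math


def chi_square_attack(hist):
    """Approximate p-value that the pairs (2k, 2k + 1) have been equalized."""
    chi2 = 0.0
    categories = 0
    for k in range(len(hist) // 2):
        expected = (hist[2 * k] + hist[2 * k + 1]) / 2.0
        if expected > 0:                      # skip empty pairs
            chi2 += (hist[2 * k] - expected) ** 2 / expected
            categories += 1
    dof = categories - 1
    if dof < 1:
        raise ValueError("need at least two non-empty value pairs")
    # Wilson-Hilferty: (X / dof) ** (1/3) is roughly normal with
    # mean 1 - 2/(9 dof) and variance 2/(9 dof).
    z = ((chi2 / dof) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * dof))) / math.sqrt(2.0 / (9.0 * dof))
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # P(X >= chi2)
```

In practice the test is usually run on growing portions of the image (for example, the first 1%, 2%, ... of the pixels in scan order), so that the point where the p-value drops reveals how far sequential embedding extends.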
2.3.1.2 Blind Attacks
The blind approach to steganalysis is similar to the pattern
classification problem. The pattern classifier, in our case a
Binary Classifier, is trained on a set of training data. The
training data comprises some high order statistics of the
transform domain of a set of cover and stego images and on the
basis of this trained dataset the classifier is presented with
images for classification as a non-embedded or an embedded image.
Many of the blind steganalytic techniques often try to estimate the
cover image statistics from stego image by trying to minimize the
effect of embedding in the stego image. This estimation is
sometimes referred to as Cover Image Prediction. Some of the most
popular blind attacks are described next.
1. Wavelet Moment Analysis (WAM): Wavelet Moment Analyzer (WAM)
is the most popular Blind Steganalyzer for Spatial Domain
Embedding. It has been proposed by (Goljan, Fridrich, &
Holotyak, 2011). WAM uses a de-noising filter to remove Gaussian
noise from images under the assumption that the stego image is an
additive mixture of a non-stationary Gaussian signal (the cover
image) and a stationary Gaussian signal with a known variance (the
noise).
Figure 5.1: Calibration of the stego image for cover statistics
estimation
As the filtering is performed in the wavelet domain, all the
features (statistical moments) are calculated as higher order
moments of the noise residual in the wavelet domain. The detailed
procedure for calculating the WAM features in a gray scale image
can be found in (Goljan et al., 2011). WAM is based on a 27-dimensional
feature space. It then uses a Fisher Linear Discriminant
(FLD) as a classifier. It must be noted that WAM is a state of the
art steganalyzer for Spatial Domain Embedding and no other blind
attack has been reported which performs better than WAM.
2. Calibration Based Attacks: The calibration based attacks
estimate the cover image statistics by nullifying the impact of
embedding in the stego image. These attacks were first proposed by
(Fridrich, 2012) and are designed for JPEG domain steganographic
schemes. They estimate the cover image statistics by a process
termed Self Calibration. The steganalysis algorithms based on
this self-calibration process can detect the presence of
steganographic noise with almost 100% accuracy even for very low
embedding rates ( et al., 2010). This calibration is done by
decompressing the stego JPEG image to the spatial domain, cropping 4
rows from the top and 4 columns from the left, and recompressing the
cropped image as shown in Figure 5.1. The cropping and subsequent
recompression produce a calibrated image with most macroscopic
features similar to the original cover image. The process of
cropping by 4 pixels is an important step because the 8 × 8 grid of
recompression does not see the previous JPEG compression and thus
the obtained DCT coefficients are not influenced by previous
quantization (and embedding) in the DCT domain.
3. Farid's Wavelet Based Attack: This attack was one of the first
blind attacks to be proposed in steganographic research (Lyu
& Farid, n.d.) for JPEG domain steganography. It is based on
the features drawn from the wavelet coefficients of an image. This
attack first makes an n-level wavelet decomposition of an image and
computes four statistics, namely mean, variance, skewness and
kurtosis, for each set of coefficients, yielding a total of $12(n-1)$
statistics. The second set of statistics is based on the errors
in an optimal linear predictor of coefficient magnitude. From
this error, additional statistics, i.e. the mean, variance,
skewness, and kurtosis, are extracted, thus forming a $24(n-1)$
dimensional feature vector. For implementation purposes, n is set
to 4 i.e. four level decomposition on the image is performed for
extraction of features. The source code of this attack is available
at (FARID). After extraction of features, a Support Vector Machine
(SVM) is used for classification.
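The self-calibration step of the calibration based attacks (item 2) can be sketched without a JPEG library by using a toy codec. Everything below is an assumption made for illustration: a single uniform quantization step replaces JPEG's quantization table, partial 8 × 8 blocks are dropped instead of padded, entropy coding is ignored, and the helper names are hypothetical. A real attack would then compare features (for example, coefficient histograms) of the stego image with those of its calibrated version.

```python
import numpy as np


def _dct_matrix(n=8):
    """Orthonormal DCT-II matrix used for the 8x8 block transform."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


C = _dct_matrix()


def compress(img, q=16.0):
    """Toy JPEG: level shift, 8x8 block DCT, uniform quantization."""
    x = np.asarray(img, dtype=np.float64) - 128.0
    h, w = x.shape[0] // 8 * 8, x.shape[1] // 8 * 8   # drop partial blocks
    out = np.zeros((h, w))
    for r in range(0, h, 8):
        for col in range(0, w, 8):
            out[r:r + 8, col:col + 8] = np.round(C @ x[r:r + 8, col:col + 8] @ C.T / q)
    return out.astype(np.int64)


def decompress(coeffs, q=16.0):
    """Dequantize, inverse DCT, round and clip back to 8-bit pixel values."""
    h, w = coeffs.shape
    x = np.zeros((h, w))
    for r in range(0, h, 8):
        for col in range(0, w, 8):
            x[r:r + 8, col:col + 8] = C.T @ (coeffs[r:r + 8, col:col + 8] * q) @ C
    return np.clip(np.round(x + 128.0), 0, 255)


def calibrate(stego_coeffs, q=16.0):
    """Decompress, crop 4 rows and 4 columns, recompress on a shifted 8x8 grid."""
    pixels = decompress(stego_coeffs, q)
    return compress(pixels[4:, 4:], q)
```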
2.3 STATISTICAL RESTORATION
Statistical undetectability is one of the main aspects of any
steganographic algorithm. To maintain statistical undetectability,
the steganographic techniques are designed with the aim of
minimizing the artifacts introduced in the cover signal by the
embedding technique. The main emphasis is generally on minimizing
the noise added by embedding while increasing the payload. This
is an important consideration in the design of embedding
algorithms, since the added noise affects the statistical
properties of a medium. As mentioned previously, the
algorithm which makes fewer embedding changes or adds less additive
noise generally provides better security than the algorithm which
makes relatively more changes or adds higher additive noise (Kumar,
2011). From the point of view of the steganalyst, the attacks are
designed to examine a signal and look for statistics which get
distorted due to embedding. These statistics range from marginal
statistics of first and second order in case of targeted attacks
and up to 9th order statistics for blind attacks (Goljan et al.,
2011). So, in order to defeat these steganalytic attacks, there has
been a shift from the above mentioned data hiding paradigm.
Algorithms have been proposed which try to restore the statistics
which get distorted during the embedding procedure and are used for
steganalysis.
Introduction
In steganographic research several algorithms have been proposed
for preserving statistical features of the cover for achieving more
security. Provos' OutGuess algorithm ( et al., 2010) was an early
attempt at histogram compensation for LSB hiding, while Eggers et
al (Science & Goel, 2008) have suggested a more rigorous
approach to the same end, using histogram-preserving data-mapping
(HPDM) and adaptive embedding respectively. Solanki proposed a
statistical restoration method for converting the stego image
histogram into the cover histogram. This algorithm is based on a
theorem proved by R. Tzschoppe, R. Bäuml and J. J. Eggers which tries
to convert one vector x into another vector y while satisfying a
Minimum Mean Square Error (MMSE) criterion. The algorithm considers
the stego image histogram as source vector x and tries to convert
it into the cover image histogram i.e. the target vector y. All the
bins of the source histogram are compensated by mapping the input
data with values in increasing order. This algorithm suffers from
the following limitations:
1. The algorithm assumes a Gaussian cover distribution and does not
give good results for non-Gaussian cover images.
2. The algorithm excludes low-probability image regions from
embedding because of erratic behavior in the low-probability tail.
3. The algorithm has been tested specifically with Quantization
Index Modulation (QIM) (Solanki, Dabeer, Madhow, Manjunath, &
Chandrasekaran, 2009) and has not been tested with well-known
embedding schemes such as LSB replacement and LSB matching.
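The MMSE mapping described above, in which input values taken in increasing order receive target values in increasing order, can be sketched as below. This is a minimal illustration of the rank-matching idea behind the theorem of Tzschoppe, Bäuml and Eggers, not Solanki's implementation; the function name `mmse_histogram_map` is hypothetical, and the sketch assumes the source and target hold the same number of samples.

```python
def mmse_histogram_map(source, target):
    """Give `source` exactly the value multiset of `target` with minimum squared change.

    Hypothetical helper: the i-th smallest source sample receives the i-th smallest
    target value (a monotone mapping), which minimises the sum of squared changes.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    # Indices of the source samples, ordered by increasing source value.
    order = sorted(range(len(source)), key=lambda t: source[t])
    mapped = [0] * len(source)
    for idx, value in zip(order, sorted(target)):
        mapped[idx] = value
    return mapped


if __name__ == "__main__":
    stego = [5, 1, 3]
    cover = [2, 9, 4]
    print(mmse_histogram_map(stego, cover))  # [9, 2, 4]
```

The output has exactly the target's histogram, and because the mapping preserves the ordering of the source samples, no other assignment of the same target values yields a smaller sum of squared changes.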
To overcome these limitations, we propose two algorithms for
preserving the cover image statistics after embedding. The first
algorithm is designed to preserve first-order statistics inherently,
during embedding itself. The second algorithm makes an explicit
attempt to restore the cover image histogram after embedding. Both
algorithms are discussed in detail in the next two sections.
2.3.1 Embedding by Pixel
The main motivation of the steganographic algorithm proposed in
this section is to embed data such that the image histogram is not
modified. This requires an embedding procedure that never changes a
pixel in a way that alters the corresponding histogram bin count. We
propose a simple yet effective algorithm, called Pixel Swap
Embedding (PSE), which embeds message bits into the cover image
without modifying the image histogram. The main idea is to take a
pair of pixels whose difference is within a fixed threshold. To
embed a 0, the first pixel must be greater than the second; if it is
not, the two gray level values are swapped. Similarly, a 1 is
embedded by making the first pixel less than the second. The
algorithm is stated formally in the next subsection.
Algorithm Pixel Swap Embedding
The algorithm is summarized below.
Algorithm: Pixel Swap Embedding (PSE)
Input: Cover Image (I)
Input Parameters: Message Stream (M), Threshold (τ), Shared Pseudo
Random Key (k)
Output: Stego Image (Is)
Begin
1. (x1, x2) = Randomize(I, k)
2. If 0 < |x1 − x2| ≤ τ then go to step 3 (a pair of equal pixels
cannot carry a bit); else go to step 1.
3. If M(i) = 0: if x1 < x2 then Swap(x1, x2); set i = i + 1 and go
to step 1. Otherwise go to step 4.
4. If M(i) = 1: if x1 > x2 then Swap(x1, x2); set i = i + 1 and go
to step 1.
End Pixel Swap Embedding
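A minimal Python sketch of PSE embedding and extraction follows, operating on a flat list of gray values. Randomize(I, k) is modelled by a key-seeded shuffle of pixel indices split into non-overlapping pairs (an assumption; any keyed pairing shared by both ends would do), the threshold (written τ here) is applied as 0 < |x1 − x2| ≤ τ so that equal pixels are skipped, and the names `randomize`, `pse_embed` and `pse_extract` are hypothetical.

```python
import random
from collections import Counter


def randomize(n_pixels, key):
    """Hypothetical Randomize(I, k): key-seeded, non-overlapping pixel-index pairs."""
    order = list(range(n_pixels))
    random.Random(key).shuffle(order)
    return [(order[t], order[t + 1]) for t in range(0, n_pixels - 1, 2)]


def usable(x1, x2, tau):
    # Step 2 (assumed form): equal pixels are skipped since swapping them encodes nothing.
    return 0 < abs(x1 - x2) <= tau


def pse_embed(pixels, bits, tau, key):
    stego = list(pixels)
    i = 0
    for a, b in randomize(len(stego), key):
        if i == len(bits):
            break
        x1, x2 = stego[a], stego[b]
        if not usable(x1, x2, tau):
            continue
        # Bit 0 requires x1 > x2, bit 1 requires x1 < x2; swap when the order is wrong.
        if (bits[i] == 0 and x1 < x2) or (bits[i] == 1 and x1 > x2):
            stego[a], stego[b] = x2, x1
        i += 1
    if i < len(bits):
        raise ValueError("not enough usable pixel pairs for the message")
    return stego


def pse_extract(stego, n_bits, tau, key):
    bits = []
    for a, b in randomize(len(stego), key):
        if len(bits) == n_bits:
            break
        x1, x2 = stego[a], stego[b]
        # A swap never changes |x1 - x2|, so the receiver finds the same usable pairs.
        if usable(x1, x2, tau):
            bits.append(0 if x1 > x2 else 1)
    return bits


if __name__ == "__main__":
    rng = random.Random(7)
    cover = [rng.randint(100, 106) for _ in range(400)]
    message = [1, 0, 1, 1, 0, 0, 1, 0] * 5
    stego = pse_embed(cover, message, tau=5, key=42)
    assert pse_extract(stego, len(message), tau=5, key=42) == message
    assert Counter(stego) == Counter(cover)  # every histogram bin keeps its count
```

Since embedding only exchanges the two values of a pair, the `Counter` of the stego list always equals that of the cover, which is the histogram-preservation property the text relies on.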
The function Randomize(I, k) generates random, non-overlapping pairs
of pixels (x1, x2) using the secret key k shared by both ends; once
a pair has been used it is never reused. The function Swap(x1, x2)
interchanges the gray values of the two pixels x1 and x2. Extraction
is the simple inverse of this process: the receiver regenerates the
same pairs from k, keeps those satisfying the condition in Step 2 (a
swap does not change |x1 − x2|, so the receiver selects exactly the
pairs the sender used), and reads a 0 if x1 > x2 and a 1 if x1 < x2.
Because the scheme only exchanges gray values that already exist in
the cover, no new value is introduced and every histogram bin keeps
its count. Hence PSE can resist attacks based on first-order
statistics. Note that the threshold τ controls the trade-off between
the embedding rate and the noise introduced into the cover signal:
the added noise stays limited as long as τ is kept small. We tested
the algorithm for τ = 2 and τ = 5, i.e. effectively modifying only
the least significant bit planes of the pixel gray levels without
changing the value of any histogram bin. The achievable embedding
rate is higher for images with low variance than for images with
high variance, because more pixel pairs satisfy the condition in
Step 2 of the PSE algorithm in the former case.
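The dependence of capacity on image variance can be illustrated by counting usable pairs, each of which carries one bit. The sketch below is hypothetical (`count_usable_pairs` is not from the source); it assumes the same kind of key-seeded pairing and the condition 0 < |x1 − x2| ≤ τ, and compares a synthetic low-variance image with a synthetic high-variance one.

```python
import random


def count_usable_pairs(pixels, tau, key=0):
    """Number of keyed, non-overlapping pairs with 0 < |x1 - x2| <= tau (one bit each)."""
    order = list(range(len(pixels)))
    random.Random(key).shuffle(order)
    count = 0
    for t in range(0, len(order) - 1, 2):
        diff = abs(pixels[order[t]] - pixels[order[t + 1]])
        if 0 < diff <= tau:
            count += 1
    return count


if __name__ == "__main__":
    rng = random.Random(1)
    low_var = [rng.randint(120, 124) for _ in range(1000)]   # narrow gray-level spread
    high_var = [rng.randint(0, 255) for _ in range(1000)]    # wide gray-level spread
    for tau in (2, 5):
        print(tau, count_usable_pairs(low_var, tau), count_usable_pairs(high_var, tau))
```

Because the pairs in this sketch are drawn at random from the whole image, it is the overall spread of gray levels that decides how many pairs fall within τ.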
2.3.1.1 Security Analysis
To check the security of the PSE algorithm we conducted tests on a
set of one hundred gray scale images (Bhattacharyya et al., 2011).
All images were converted to the Tagged Image File Format (TIFF) and
resized to 256 × 256 pixels. PSE was tested against the Sample Pair
attack proposed in (Science & Goel, 2008). As explained in 2.2.1,
Sample Pair is a targeted attack based on the first-order statistics
of the cover image; it exploits the distortion that embedding causes
in these statistics. A similar attack, RS-Steganalysis, was proposed
independently (Bhattacharyya et al., 2011) and is based on the same
concept of exploiting first-order statistics. Hence, in this work we
tested our schemes against the Sample Pair attack only, assuming
that their performance against RS-Steganalysis will be similar. The
performance of PSE against Sample Pair is shown in Figure 3.3. Data
bits were hidden in the images at the maximum possible embedding
rate for τ = 5. It can be observed that the message length predicted
by the Sample Pair attack is much smaller than the actual message
length embedded in the image. In the next section we introduce the
second algorithm, based on the idea of statistical preservation,
which explicitly tries to match the cover image histogram after
embedding.
2.3.1.2 New Statistical Restoration Scheme
In this section we propose a new statistical restoration scheme
which explicitly tries to convert the stego image histogram into the
cover image histogram once embedding is complete. As mentioned in
2.2, the restoration algorithm proposed in (Solanki et al., 2009;
Sullivan, Solanki, Manjunath, Madhow, & Chandrasekaran, 2006) gives
good results only under the assumption that the cover image is close
to a Gaussian distribution. The proposed scheme tries to overcome
this limitation and provides better restoration of the image
histogram for non-Gaussian cover distributions as well.
The histogram h(I) of a gray scale image I with gray values in the
range [0, L − 1] can be interpreted as a discrete function
$$h(r_k) = \frac{n_k}{n}, \qquad k = 0, 1, \ldots, L-1,$$
where $r_k$ is the kth gray level, $n_k$ is the number of pixels with
gray value $r_k$, and n is the total number of pixels in the image I.
The histogram can also be written as
$h(I) = \{h(r_0), h(r_1), \ldots, h(r_{L-1})\}$, or simply
$h(I) = \{h(0), h(1), \ldots, h(L-1)\}$. Likewise, we write the
histogram of the stego image as
$\hat{h}(I) = \{\hat{h}(0), \hat{h}(1), \ldots, \hat{h}(L-1)\}$.
[Figure: (a) Maximum achievable embedding rate for τ = 2; (b) Maximum achievable embedding rate for τ = 5]
We then divide the image pixels into two streams: the Embedding
Stream and the Restoration Stream. During embedding we maintain
metadata about which pixels are changed and by how much. We then
compensate the histogram using pixels from the Restoration Stream,
guided by this metadata, so that the original histogram of the cover
is restored. In other words, restoration tries to make
$\hat{h}(I)$ equal to $h(I)$. The algorithm is formalized in the
next section.
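The normalized histogram h(r_k) = n_k / n, and the goal of making the stego histogram equal to the cover histogram, can be made concrete as follows. This is an illustrative sketch: `normalized_histogram` and `histogram_l1_distance` are hypothetical names, and the L1 distance is only one convenient way (an assumption, not the metric used in the source) to measure how far restoration still has to go.

```python
from collections import Counter


def normalized_histogram(pixels, levels=256):
    """h(r_k) = n_k / n for k = 0 .. levels - 1."""
    n = len(pixels)
    counts = Counter(pixels)
    return [counts.get(k, 0) / n for k in range(levels)]


def histogram_l1_distance(cover, stego, levels=256):
    """Sum over bins of |h_stego(k) - h_cover(k)|; zero means the histograms match."""
    h = normalized_histogram(cover, levels)
    h_hat = normalized_histogram(stego, levels)
    return sum(abs(a - b) for a, b in zip(h_hat, h))


if __name__ == "__main__":
    cover = [10, 10, 11, 12]
    lsb_flipped = [11, 10, 11, 12]      # one LSB change alters two bins
    swapped = [10, 11, 10, 12]          # a pixel swap keeps every bin
    print(histogram_l1_distance(cover, lsb_flipped))  # 0.5
    print(histogram_l1_distance(cover, swapped))      # 0.0
```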
2.4 Mathematical Formulation of Proposed scheme
The proposed restoration scheme depends on the embedding scheme. The
idea is that some of the image pixels are used for embedding and the
rest are used for restoration. In general, if more than 50% of the
image pixels are used for embedding, complete restoration is not
possible. The converse does not always hold: even if 50% or more of
the pixels are available for compensation, full compensation is not
guaranteed. What can be said is that the probability of full
compensation increases with the number of pixels available for
compensation. A trade-off therefore has to be found between the
embedding rate and the restoration percentage in order to obtain the
optimum embedding procedure. To aid understanding of the algorithm,
some definitions are given next. Let the cover image, the stego
image (i.e. embedded but not yet compensated) and the compensated
stego image (the stego image after compensation) be denoted by C, S
and R respectively. Let $C_{ij}$, $S_{ij}$ and $R_{ij}$ denote the
(i, j)th pixel of C, S and R respectively ($0 \le i < m$,
$0 \le j < n$, where m is the number of rows and n is the number of
columns of the image matrices).
Embed Matrix (ψ): an m × n characteristic matrix indicating whether
each pixel has been used for embedding:
$$\psi(i, j) = \begin{cases} 1, & \text{if the } (i, j)\text{th pixel is used for embedding} \\ 0, & \text{if the } (i, j)\text{th pixel is not used for embedding} \end{cases} \qquad (3.1)$$
i. Compensation Vector (φ): a one-dimensional vector of length L,
where L is the number of existing gray levels in the cover image C.
φ(k) = u means that u pixels with gray value k can be used for
restoration.
ii. Changed Matrix (ξ): an L × L matrix, where L is the number of
existing gray levels in the cover image C. ξ(x, y) = t means that
during embedding t pixels were changed from gray value x to gray
value y.
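Given the cover C, the uncompensated stego image S and the set of pixel positions chosen for embedding, the Embed Matrix, Compensation Vector and Changed Matrix (written ψ, φ and ξ here) can be built as sketched below. The function name `build_structures`, the use of the full gray range 0 to `levels` − 1 instead of only the gray levels present in C, and storing ξ as a list of lists are assumptions for illustration.

```python
def build_structures(C, S, embed_positions, levels=256):
    """Return (psi, phi, xi) for m x n images C and S given as lists of rows.

    psi[i][j] = 1 if pixel (i, j) was used for embedding, else 0.
    phi[k]    = number of non-embedding pixels with gray value k (restoration pool).
    xi[x][y]  = number of pixels changed from gray value x to y during embedding.
    """
    m, n = len(C), len(C[0])
    psi = [[0] * n for _ in range(m)]
    phi = [0] * levels
    xi = [[0] * levels for _ in range(levels)]
    for i in range(m):
        for j in range(n):
            if (i, j) in embed_positions:
                psi[i][j] = 1
                if S[i][j] != C[i][j]:
                    xi[C[i][j]][S[i][j]] += 1
            else:
                # Restoration-stream pixels are untouched by embedding, so C equals S here.
                phi[C[i][j]] += 1
    return psi, phi, xi


if __name__ == "__main__":
    C = [[10, 11], [12, 10]]
    S = [[11, 11], [12, 10]]
    psi, phi, xi = build_structures(C, S, {(0, 0), (0, 1)}, levels=16)
    print(xi[10][11])  # 1
```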
2.4.1 Algorithm Statistical Restoration
The statistical restoration algorithm is summarized below:
Algorithm: Statistical Restoration Algorithm (SRA)
Input: Cover Image (I)
Input Parameters: Compensation Vector (φ), Changed Matrix (ξ)
Output: Stego Image (Is)
Begin
for all (i, j) with ξ(i, j) > 0 do {
1. k = ξ(i, j).
2. Since embedding changed k pixels from gray value i to gray value
j, k pixels with gray value j from the set of pixels used for
compensation are changed to gray value i, giving full compensation
for this pair.
3. Update the Compensation Vector φ (φ(j) decreases by k).
}
End
In the above algorithm we assume that if φ(j) < k, full
compensation is not possible.
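The restoration loop can be sketched on flat pixel lists as follows. This is a hypothetical illustration, not the authors' code: it undoes each embedding change i → j by turning an unused Restoration Stream pixel of value j into i, uses each restoration pixel at most once, and records a shortfall when the Compensation Vector entry (written φ(j) here) is smaller than the number of changes k. The name `statistical_restoration` and the single greedy pass are assumptions.

```python
from collections import defaultdict


def statistical_restoration(cover, stego, embed_mask):
    """Return (restored, shortfall) for flat pixel lists.

    embed_mask[p] is True for Embedding Stream pixels; the rest form the
    Restoration Stream. shortfall counts histogram units left uncompensated.
    """
    restored = list(stego)
    # Changed matrix, stored sparsely: xi[(i, j)] = pixels moved from i to j.
    xi = defaultdict(int)
    # Unused Restoration Stream positions grouped by gray value (phi = list sizes).
    pool = defaultdict(list)
    for p, used in enumerate(embed_mask):
        if used and cover[p] != stego[p]:
            xi[(cover[p], stego[p])] += 1
        elif not used:
            pool[stego[p]].append(p)
    shortfall = 0
    for (i, j), k in xi.items():
        for _ in range(k):
            if not pool[j]:
                shortfall += 1          # phi(j) < k: full compensation not possible
                continue
            p = pool[j].pop()           # consume one compensation pixel (phi(j) -= 1)
            restored[p] = i             # j -> i undoes one embedding change i -> j
    return restored, shortfall
```

Embedding Stream pixels are never touched, so the hidden bits survive restoration; only Restoration Stream pixels absorb the compensating changes.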
2.4.2 Restoration with Minimum Restoration
The additional noise introduced by compensation is an important
issue: the restoration procedure should be designed so that this
additional noise is kept minimal. In the SRA algorithm, the noise
introduced depends on the embedding algorithm used. The total noise
(η) introduced at the time of restoration can be estimated by
Equation 1, where $\hat{h}(i)$ and $h(i)$ are the histograms of the
stego and cover images respectively, L is the number of bins in the
histogram (indexed 0 to L − 1), and $K_i$ ($0 \le K_i \le L - 1$) is
a bin that is used to repair at least one unit of data in the ith
bin, with $1 \le |i - K_i|$.
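To illustrate how small the restoration noise can be made, suppose (an assumption, not the source's Equation 1) that each unit moved from bin K_i to bin i costs |i − K_i| gray levels and that enough compensation pixels are available in every bin. The minimum total cost of turning the stego histogram into the cover histogram is then the one-dimensional earth mover's distance, which equals the sum of absolute differences between the two cumulative histograms. The helper name `min_restoration_noise` is hypothetical.

```python
from itertools import accumulate


def min_restoration_noise(stego_hist, cover_hist):
    """Minimum total |i - K_i| needed to move stego_hist onto cover_hist (counts per bin).

    Assumes both histograms describe the same number of pixels and that every bin
    can donate as many pixels as needed (unlimited compensation pool).
    """
    if len(stego_hist) != len(cover_hist) or sum(stego_hist) != sum(cover_hist):
        raise ValueError("histograms must have equal length and equal total count")
    surplus = [s - c for s, c in zip(stego_hist, cover_hist)]
    # Each running surplus is the number of pixels that must cross the boundary
    # between bin k and bin k + 1; every crossing costs one gray level.
    return sum(abs(r) for r in accumulate(surplus))


if __name__ == "__main__":
    print(min_restoration_noise([0, 2, 0], [1, 1, 0]))  # 1
    print(min_restoration_noise([2, 0, 0], [0, 0, 2]))  # 4
```

When the Restoration Stream limits which bins can donate pixels, any full compensation costs at least this much under the same per-unit cost assumption, so the value serves as a lower bound.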