Steganalysis Based on Multiple Features Formed by Statistical …shi/PaperDownload/steganalysis/IH… · · 2008-03-02Steganalysis Based on Multiple Features ... a steganalysis

M. Barni et al. (Eds.): IH 2005, LNCS 3727, pp. 262 – 277, 2005. © Springer-Verlag Berlin Heidelberg 2005

Steganalysis Based on Multiple Features Formed by Statistical Moments of Wavelet Characteristic Functions

Guorong Xuan1, Yun Q. Shi2, Jianjiong Gao1, Dekun Zou2, Chengyun Yang1, Zhenping Zhang1, Peiqi Chai1, Chunhua Chen2, and Wen Chen2

1 Tongji University, Shanghai, China [email protected]

2 New Jersey Institute of Technology, Newark, NJ, USA [email protected]

Abstract. In this paper1 , a steganalysis scheme based on multiple features formed by statistical moments of wavelet characteristic functions is proposed. Our theoretical analysis has pointed out that the defined n-th statistical moment of a wavelet characteristic function is related to the n-th derivative of the corre-sponding wavelet histogram, and hence is sensitive to data embedding. The se-lection of the first three moments of the characteristic functions of wavelet sub-bands of the three-level Haar wavelet decomposition as well as the test image has resulted in total 39 features for steganalysis. The effectiveness of the pro-posed system has been demonstrated by extensive experimental investigation. The detection rate for Cox et al.’s non-blind spread spectrum (SS) data hiding method, Piva et al.’s blind SS method, Huang and Shi’s 88× block SS method, a generic LSB method (as embedding capacity being 0.3 bpp), and a generic QIM method (as embedding capacity being 0.1 bpp) are all above 90% over all of the 1096 images in the CorelDraw image database using the Bayes classifier. Furthermore, when these five typical data hiding methods are jointly considered for steganalysis, i.e., when the proposed steganalysis scheme is first trained se-quentially for each of these five methods, and is then tested blindly for stego-images generated by all of these methods, the success classification rate is 86%, thus pointing out a new promising approach to general blind steganalysis. The detection results of steganalysis on Jsteg, Outguess and F5 have further demon-strated the effectiveness of the proposed steganalysis scheme.

1 Introduction

Steganalysis is the science and art to detect if an image contains hidden message, what the data embedding method is, what the used key is, and finally, if possible, what the hidden message is. It is the opposite side of steganography, which is also sometimes referred to as data hiding, or watermarking. Therefore, steganalysis also provides an effective way to evaluate the security performance of a data hiding 1 This research is supported partly by National Natural Science Foundation of China (NSFC) on

the project “The Research of Theory and Key Technology of Lossless Data Hiding (90304017)”, and by New Jersey Commission of Science and Technology via New Jersey Center of Wireless Networking and Internet Security (NJWINS).

Steganalysis Based on Multiple Features 263

method. That is, it can be used to improve the security of a data hiding algorithm. Thus, a good data hiding method should be able to hide data imperceptibly not only to human eyes, but also to computer analysis.

Steganalysis seems a prohibitive task because of the diversity of cover images, the variety of data hiding methods and the infinite possibility of hidden messages. The basis of steganalysis is that there exists difference between the images before and after data hiding, and the difference is detectable. Normally, natural images tend to be continuous and smooth. The correlation between adjacent pixels is strong. Often, the hidden data will be independent to the cover media. The watermarking process may change the continuity because it incurs random variation. As a result, it may reduce the correlation among adjacent pixels, bit-planes and image blocks. Discovering the difference of some statistical characteristics between the cover and stego media be-comes the key issue in steganalysis.

In [1], the first four statistical moments of wavelet coefficients and their prediction errors of nine high frequency subbands are used to form a 72-dimensional (72-D) feature vector for steganalysis. However, as shown and analyzed later in this paper, the performance in terms of detection rate is not satisfactory, because the selected features are not sensitive to data hiding process. The steganalysis method based on the mass center of histogram characteristic function has shown improved effectiveness in steganalysis [2]. The performance is however still not high enough because the rather limited number of features cannot achieve high detection rate.

In this paper, the statistical moments of characteristic functions (CF’s) of wavelet subbands are proposed to form multi-dimensional (M-D) feature vector for steganaly-sis. We analyze why these features are effective to steganalysis. The substantially superior performance in steganalysis over the prior arts [1, 2] has been demonstrated by extensive experimental investigation.

The rest of this paper is organized as follows: Section 2 discusses the features pro-posed for steganalysis. In Section 3, our new effective steganalysis system is pro-posed. Experimental evaluation of the proposed steganalysis system is presented in Section 4. Finally, conclusion is drawn and discussion is made in Section 5.

2 Features Using Moments of Wavelet Characteristic Functions

In this section, we focus on the proposed M-D feature vector based on statistical mo-ments of wavelet characteristic functions.

2.1 Steganalysis as a Task of Pattern Recognition

Based on whether an image contains hidden message, images can be classified into two classes: the image with no hidden message and the corresponding stego-image (the same image with message hidden in it). Steganalysis can thus be considered as a task of pattern recognition to decide which class a test image belongs to. The key issue for steganalysis just like for pattern recognition is feature selection. The features should be sensitive to the data hiding process. In other words, the features should be rather different for the image without hidden message and for the corresponding stego-image. The larger the difference, the better the features are. The features should

264 G. Xuan et al.

be as general as possible, i.e., they are effective to all different types of images and different data hiding schemes. Often in practice it is very hard to achieve a high rec-ognition rate with a single feature when the classification such as steganalysis is com-plicated in nature. Therefore, M-D feature vectors should be used under the circum-stances. Each image is a sample point in the M-D feature space. Steganalysis has thus become a pattern classification process in the M-D feature space. It is desirable to have features in individual dimensions of the feature vector independent to one an-other. Just like for pattern recognition, in addition to feature selection, classifier de-sign is another key issue for steganalysis; and the performance of a steganalysis scheme, both feature selection and classifier design, is evaluated by its classification success or error rate.

2.2 Moments of Wavelet Characteristic Functions for Steganalysis

As introduced in Section 1, an effective feature is proposed in [2], which is the mass center of histogram characteristic function (defined as the Fourier transform of the histogram). It has been proved that after a message is embedded into an image the mass center will decrease under the assumption that the hidden data are Gaussian distributed, additive to, and independent to the cover image.

It is well-known that the histogram of a digital image or a wavelet subband is es-sentially the probability mass function (pmf), if the image grayscale values or the wavelet coefficient values are treated as a random variable. Furthermore, if each com-ponent of the histogram is multiplied by a correspondingly shifted unit impulse, we then have the probability density function (pdf). According to [3], one can consider the characteristic function and the pdf (here, histogram) are similar to a Fourier trans-form pair (with the sign in the exponential reversed). Denote histogram by h(xj), and characteristic function (CF) by H(fk), both j and k are allowed to vary from 0, 1, up to N-1. Then they form a pair of discrete Fourier transform (DFT). That is, the mass center defined in [2] is essentially the first moment of the characteristic function of the image.

On the other hand, because of the de-correlation capability of wavelet transform, the coefficients of different subbands at the same level are kind of independent to each other. Therefore, the features generated from different wavelet subbands at the same level are kind of independent to each other as well. This is suitable for stegana-lysis (a particular type of pattern recognition as discussed in Section 2.1).

Motivated by these considerations, we propose to use the statistical moments of the characteristic functions of wavelet subbands as features for steganalysis. The n-th statistical moment of a characteristic function,

nM , is defined as follows.

( ) ( )⎟⎠

⎞⎜⎝

⎛⎟⎠

⎞⎜⎝

⎛= ∑∑==

2/

0

2/

0

N

kk

N

kk

nkn fHfHfM (1)

where ( )kfH is the magnitude of the characteristic function, which is the DFT of the

histogram. According to the Fourier transform theory, since the histogram is real-valued, the

magnitude of CF, ( )kfH , is even symmetric, while the CF’s phase angle is odd


symmetric. Therefore, only a half of points need to be used in the moment calculation for steganalysis.

In most of additive data hiding schemes, the to-be-embedded data obey Gaussian-like distribution. That is, the magnitude of the DFT of the hidden data is decreasing as the frequency index changes from 1, 2, up to N/2. Clearly, the sequence of ( )kfH

is non-negative. Therefore, by using the discrete Chebyshev inequality [2, 4(pp.239-240)], it can be shown that the defined moments will decrease after the data hiding process is applied, indicating that the defined features are sensitive to data hiding.

Next an analysis is provided to show that moments in characteristic function do-main are more sensitive to data hiding than moments in histogram domain. In Table 2.1, the first three order absolute moments calculated in histogram domain and that calculated in characteristic function domain are listed for comparison, where the his-togram is assumed to obey Gaussian distribution. It is shown that the n-th moment of

CF is proportional to ( )1/nσ , while the n-th moment of histogram is proportional to

nσ . The histogram of a natural image can often be modeled as a mixture of several Gaussian distributions. In fact, the histogram of wavelet subband coefficients is gen-erally modeled as Laplace-like, which can be modeled as the mixture of two Gaussian distributions with large variance difference. Since the moment of CF is proportional to ( )1/

nσ , it is mainly determined by the distribution with the smaller variance; and

the moment of histogram is proportional to nσ , so it is mainly determined by the distribution with the larger variance. Or, we can say the moment of CF reflects the status on the peak of the histogram while the moment of histogram reflects the overall status of the histogram. In the process of data hiding, the distribution with smaller variance changes much more obviously than that with larger variance. In other words, the status on the peak of the histogram is impacted more greatly by the process of data hiding than the overall status of the histogram. So we can observe that the moment of CF is more sensitive to the process of data hiding.

Table 2.1. Moments in histogram domain and in characteristic function domain

n-th order absolute moment n=1 n=2 n=3 in histogram domain

( ){ }dxxhxn

∫∞

∞−

under Gaussian distribution ( ) 22

2

1 σ

πσ

x

exh−

=

2σπ

2σ 32 2σπ

in characteristic function domain

( )( ) ( )( )nf H f df H f df

∞ ∞

−∞ −∞∫ ∫

under characteristic function of Gaussian

distribution ( ) 2

22 f

efHσ−

=

2

σ π

2

1

σ

3

2 2

σ π

266 G. Xuan et al.

2.3 Proposed 39-D Feature Vector for Steganalysis

To facilitate the discussion, the proposed 39-D feature vector is presented here first. Some more discussions will be presented in Section 3. In our proposed steganalysis scheme we apply a three-level Haar discrete wavelet transformation (DWT) to a test image. Therefore there are 12 subbands, denoted by LLi, HLi, LHi, HHi, and i=1,2,3. The first three moments for each of these 12 subbands and the test image, denoted by LL0, result in 39 features, or, equivalently a 39-D feature vector.

2.4 N-th Moments of Wavelet Subbands Versus N-th Moments of Characteristic Functions of Wavelet Subbands

As described in Section 1, the statistical moments of wavelet subbands have been proposed as features for steganalysis in [1]. In Section 2.2 we proposed to use statisti-cal moments of characteristic functions (CF’s) of wavelet subbands as features for steganalaysis. In this subsection, we compare these two different sets of features, and shall show why the moments of CF’s of wavelet subbands are more effective.

In doing so, we use mathematical derivation in the analogue domain for the sake of format simplicity. As pointed in [3], the inverse transform of characteristic func-tion produces the pdf (here, the histogram) as follows.

( ) ( )∫∞

∞−

−= dfefHxh fxj π2 (2)

Furthermore, we can derive the n-th derivative of the histogram evaluated at the ori-gin, x=0, as follows.

( ) ( )

( ) ( )

( ) ( )( )( ) ( )( )⎪⎩

⎪⎨⎧

∈−

∈−=

−=

=

∫∫

∫

∫

∞−

∞

∞

∞−

=

∞

∞−

−

=

0

2/)1(

0

2/

0

2

0

,Im22)1(

,Re22)1(

2

oddndffHf

evenndffHf

dffHfj

dfefHdx

dxh

dx

d

nnn

nn

nn

x

fxjn

n

x

n

n

π

π

π

π

(3)

Straightforwardly, Formula (4) can be obtained from Equation (3).

( ) ( ) ( )∫∞

=

≤⎟⎟⎠

⎞⎜⎜⎝

⎛0

0

22 dffHfxhdx

d nn

x

n

n

π (4)

We then observe that the right hand of the above inequality is the moments of CF’s,

nM , multiplied by a scalar, which is dependent to the energy of an image or a

wavelet subband, from which the moment is generated. This indicates that the features we defined actually are the upper bound (up to a scalar) of the magnitude of the n-th derivative of the histogram evaluated at the origin of the histogram, i.e., x=0. Furthermore, this observation can easily be extended to the case when 0≠x by using the translation property of the Fourier transform theory. That is, the defined n-th moment of a CF is closely related to the n-th derivative of the corresponding histogram. We have already showed at the end of Section 2.2 that the n-th moments


defined will decrease after data embedding. This decrease lowers the upper bound of the magnitude of the n-th derivative of the histogram. This implies that the moments of CF’s defined in our method can sensitively catch the changes caused by data hiding.

On the one hand, as shown above, the n-th moments of wavelet CF’s have been shown closely related to the n-th derivative of the corresponding histogram. On the other hand, the n-th moments of wavelet subbands, selected features for steganalysis in [1], calculated through integration, are actually the statistical average of the n-th power of wavelet coefficients in the wavelet subbands. Under the assumption that the hidden data obey Gaussian distribution and are additive to the cover image, the histo-gram of the stego-image, the convolution of the histogram of the original image and the histogram of the hidden data (Gaussian distributed), will obviously become more flat than before data embedding. Obviously, the moments of wavelet CF’s, which are related to the n-th derivatives of the histogram, will be able to catch this change. On contrary, the moments of wavelet subbands will average the change, and, conse-quently, are less sensitive to data hiding than the moments of CF’s of wavelet sub-bands. Compared with that obtained by applying the proposed method, the experi-mental results obtained by applying the 72-D feature vector exactly the same as pro-posed in [1], shown in Section 4, have verified this analysis.

2.5 Differentiation Versus Integration

It is observed from Section 2.4 that, roughly speaking, one set of features for stegana-lysis (the moments of wavelet subbands [1]) perceives the histogram change caused by data hiding via integration, another set of features (the moments of characteristic functions of wavelet subbands) perceives the histogram change caused by data hiding via differentiation. The latter is expected to be more sensitive to the changes caused by data hiding. This has been verified by experimental works presented in Section 4.

Although it has been proved in [2] that the first moment of histogram characteris-tic function will decrease after data hiding, and it has been said in [2] that the histo-gram of the stego-image will be flatter than that of the original image, the following question has not been answered in [2] yet. That is, the histogram’s becoming flatter owing to data hiding should also be able to be measured by the statistical moments of the test image. Why do we need the moments of the histogram characteristic function to catch this change in histogram? The mathematical relation between the n-th deriva-tive of the histogram and the n-th moment of the corresponding CF shown in Section 2.4 has provided an answer.

2.6 Further Discussion and Graphical Illustration

We have assumed the noise introduced by data hiding is additive and Gaussian, and is independent to the cover image. These assumptions are valid for all of three major types of data hiding techniques, i.e., the spread spectrum (SS) method, the least sig-nificant bit-plane (LSB) method, and the quantization index modulation (QIM) method. It is well-known that the pdf of the sum of two independent random signals is the convolution of the individual pdf of these two signals. Because of the

268 G. Xuan et al.

assumptions made above, obviously, the pdf, hence the histogram, of the stego-image is expected to be more flat than that of the original image. This type of change is expected to be perceived in steganalysis. Now, according to Formula (4), the n-th moments defined and used in the proposed method is related to the magnitude of the n-th derivative of the histogram at the origin (x=0). As said, this observation can be extended to other points as 0≠x . Therefore, we expect the features defined can catch the changes in the flatness of the histogram resulted from the data embedding. To facilitate the discussion, let us consider the cases, which cover all of subbands involved in the steganalysis. Two cases are discussed separately.

Case 1. For high frequency subbands, i.e., LHi, HLi, HHi, i=1,2,3, the DWT coeffi-cients in these subbands have mean values around at x=0. The histograms are known to be Laplacian-like. As shown by Formula (4), the n-th moment of the characteristic function is the upper bound of the magnitude of the n-th derivative of the histogram at x=0 (up to a scalar). This is to say that the moments, our features, can catch the changes occurring with the peak of the histogram. We shall show this peak point is very sensitive to data embedding, thus making our steganalysis effective.

Case 2. The second case consists of LLi, i=0,1,2,3. That is, not only the test image, but also all of LL subbands in the three-level DWT decomposition are included. The LLi subbands as i=1,2,3 are essentially the low frequency pass filtered version of the test image. In Case 2, Formula (4) is still valid. That is, the magnitude of the n-th derivatives of the histogram evaluated at the origin x=0 and other points are upper bounded by the n-th moment of CF defined (up to a scalar quantity).

Now, let us take a look into the histogram from the moments of characteristic func-tions. That is, we use some graphs to illustrate what we analyzed above. Due to the space limitation, we show only the first level of four Haar wavelet subbands of a given test image. Furthermore, the graph size is rather limited. To view these graphs clearly, readers are suggested to zoom them up to 500%. Figure 2.1 (a) shows one of CorelDraw [5] images with the order No. 18093. Figure 2.1 (b) is its grayscale image obtained by using the irreversible color transformation (ICT) [6]. Figure 2.1(c) is the stego-image of the grayscale image, using Cox et al.’s SS embedding method. In Figure 2.2, the histograms of the four subbands at the first level Haar wavelet trans-form of this grayscale image are displayed. Figure 2.3 provides a magnified view of these four histograms around the small interval containing x=0. In Figure 2.4, the graphs of characteristic functions of these four subbands of the image No. 18093 are shown. In the legend field of all figures, “Orig” denotes the original image, while “Cox” denotes the watermarked image is generated with Cox et al.’s spread spectrum data hiding method [7]. The numbers in the legend field are the first moment of the characteristic function of the corresponding subbands of the image.

It is observed from Figure 2.2 and, more clearly, from Figure 2.3, that the histo-grams of the wavelet subbands of the stego-image tend to be flatter than their coun-terparts of the original image as discussed. And, from observing the first order mo-ments (listed in each graph), it appears that the first order moment of the stego-image (in this example, generated by using Cox et al.’s SS data hiding method) is smaller than the first moment of the original image. That is, after data hiding process, the upper bound of the magnitude of the first order derivative of histogram of the


0 10 20 300

1000

2000

3000LL

Orig.:101.1cox:98.4

-2 -1 0 1 20.5

1

1.5

x 104 HL

Orig.:121.6cox:115.0

-2 -1 0 1 2

0.6

0.8

1

1.2

1.4

1.6x 10

4 LH

Orig.:116.9cox:99.7

-2 -1 0 1 2

0.6

0.8

1

1.2

1.4

1.6

x 104 HH

Orig.:120.3cox:105.9

stego-image at x=0 reduces from that of the original image, which agrees with what depicted in Figure 2.3. With this simple graph illustration, we have partially verified our analysis made above. This is true in general in our experimental works with all of the 1096 images in the CorelDraw image database.

(a) Original color image (b) Grayscale image (c) Stego-image (Cox et al.’s SS)

Fig. 2.1. CorelDraw image No.18093

Fig. 2.2. Histogram of the first level wavelet subbands of image No. 18093

Fig. 2.3. Zoom in of Figure 2.2

0 100 2000

1000

2000

3000LL

Orig.:101.1cox:98.4

-100 -50 0 50 1000

0.5

1

1.5

2x 10

4 HL

Orig.:121.6cox:115.0

-100 -50 0 50 1000

0.5

1

1.5

2x 10

4 LH

Orig.:116.9cox:99.7

-100 -50 0 50 1000

0.5

1

1.5

2x 10

4 HH

Orig.:120.3cox:105.9

270 G. Xuan et al.

0 200 400 60010

0

105

LL

Orig.:101.1cox:98.4

0 200 400 60010

3

104

105

HL

Orig.:121.6cox:115.0

0 200 400 60010

3

104

105

LH

Orig.:116.9cox:99.7

0 200 400 60010

3

104

105

HH

Orig.:120.3cox:105.9

Fig. 2.4. Characteristic function of the first-level wavelet subbands of image No. 18093

3 Proposed Steganalysis Scheme

In this section, the M-D feature vector based on moments of CF’s of wavelet sub-bands of a test image, and the Bayes classifier used in our steganalysis scheme are presented.

3.1 39-D Feature Vector

As discussed above, the proposed 39-D feature vector includes, in its components, the 1st , 2nd and 3rd moments of the characteristic function of 13 subbands (the image itself, LL1, HL1, LH1, HH1, LL2, HL2, LH2, HH2, LL3, HL3, LH3, HH3).

Note that we choose to include the moments of CF’s generated from the DWT subbands, LLi, i=1,2,3, into feature vector for steganalysis as well. Our experimental works have shown that these features also make contributions towards the success of steganalysis. Readers are referred to Table 4.2 in Section 4.

We select the three-level DWT decomposition, and we use the first three order moments of the characteristic functions as features because our experimental investi-gation have shown that it does not improve performance further if we use more than three-level decomposition and/or use more than the first three order moments.

3.2 Bayes Classifier

In addition to feature selection, the design of classifier is another key element in steganalysis. It affects the classification performance in terms of classification success rate as well as computational complexity and, hence, implementation speed. There-fore, the classifier plays an important role in steganalysis.

In this paper, the Bayes classifier under the condition of Gaussian distribution is adopted to steganalyze test images, each represented by a 39-D feature vector, denote

by Xi, where i is the index of the test image. The notations of 1 2,ω ω are used to de-

note the class of original images and the class of stego-images, respectively. Assume that both image classes obey Gaussian distribution. The mean vectors and covariance


matrixes of 1ω and 2ω are denoted by 1 2,µ µ and 1 2,Σ Σ , respectively. The Bayes

classifier can be stated as follows [8]. A. Maximum posterior decision: if

1 2( / ) ( / )i iP X P Xω ω≥ , 1iX ω∈ (5)

else 2iX ω∈ (6)

where: 2

1

( ) ( / )( / ) , 1, 2

( ) ( / )

k i kk i

m i mm

P p XP X k

P p X

ω ωωω ω

=

= =∑

(7)

and ( ) ( )111 ,, Σ= µω ii XNXp , ( ) ( )222 ,, Σ= µω ii XNXp (8)

where N stands for normal (Gaussian) distribution.

B. Decision function: ( ) ( ) 2121 , ωω ∈∈≥ iiii XelseXXgXgIf (9)

where, ( ) ( ) kkkTki

T

kkikTiik XXXXg Σ−Σ−Σ+Σ−= −−− ln

2

1

2

1

2

1 111 µµµ (10)

4 Evaluation of the Proposed Steganalysis Method

To evaluate the proposed steganalysis scheme based on the multiple moments of wavelet characteristic function, we use the CorelDraw image database [5] as the ex-perimental image set. This database contains 1096 images in total, including images of various kinds, say, architecture, place, leisure, ocean, animal, food and so on. In the experiments, we randomly choose 5/6 of the 1096 CorelDraw images (specifically, 896 images in our experiments) for training purpose, following the common practice in the automatic recognition of Arabic numerals [9]. The remaining 1/6 of the 1096 images (specifically, the remaining 200 images) are used for testing purpose. The successful classification rate in steganalysis is referred to as detection rate in this paper. To be reliable, the detection rates are reported by averaging the rates obtained in multiple times (specifically 30 times) of such types of randomly conducted experiments.

In the first set of experiments, data are embedded into images by using the follow-ing five typical data embedding methods, i.e., the non-blind spread spectrum (SS) method by Cox et al. [7], the blind SS method by Piva et al. [10], the 8x8 block based SS method by Huang and Shi [11], a generic quantization index modulation (QIM) method by Chen et al. [12], and a generic least significant bit-plane method (LSB). The non-blind SS method by Cox et al. is noted for its strong robustness. The hidden data are a random number sequence obeying Gaussian distribution with zero mean and unit variance. The data are embedded into the 1000 coefficients of global discrete cosine transform (DCT) coefficients of the largest magnitudes except the DC coeffi-cient. The original cover image is needed for hidden data extraction. The SS method by Piva et al. is blind. That is, it does not need the original cover image for hidden data extraction. It embeds data into some 16,000 selected middle frequency DCT

272 G. Xuan et al.

coefficients. The block SS method by Huang Shi is also blind. Data are hidden in the low frequency block DCT coefficients. Note that the LSB is one type of methods widely used by many steganographic algorithms. A generic LSB data hiding method with embedding rate as 0.3 bpp (bit per pixel) is used in this experiment. For the QIM data hiding, some selected middle frequency of 8x8 block DCT coefficients are quantized to embed data. Here a typical JPEG quantization table is used. The quanti-zation step size used in the QIM scheme is 5. The data embedding capacity is set to be 0.1 bpp.

The consideration that various data hiding methods, in particular the SS methods, are included in our experimental investigation is justified as follows. Although it may not carry as many information bits as the LSB methods in general, the SS methods can still serve for the covert communication purpose. For example, a terrorist com-mand may need only to send a ‘GO’ command to his cell members for an attack. By the way, some newly developed SS methods can hide a large amount of data. For instance, a data embedding rate from 0.5 bpp (bits per pixel) to 0.75 bpp can now be easily achieved [e.g., 13]. In addition, the SS methods are known more robust than the LSB. Therefore, it is necessary to consider the SS methods for steganalysis.

In the second set of experiments, data are embedded into color images by using some steganographic tools, i.e., Jsteg [14], Outguess [15] and F5 [16], respectively.

4.1 Experimental Results with Five Typical Data Hiding Methods

For each of these five data hiding methods, 1096 stego-images are generated from the 1096 CorelDraw images. For each method, now we have 1096 pairs of images, one is the original image, another is the stego-image generated by the data hiding method. Then the 39-D feature vector as defined above is extracted from each of these 1096 pairs of images. The detection rate is reported by averaging over 30 times randomly conducted experiments. The test results are shown in the right-most column of Table 4.1. There TP stands for true positive, FP for false positive, and the average is the arithmetic average of TP and TN (true negative).

Table 4.1. Detection rate in the unit of % (averaged over 30 times experiments)

Harmsen’s [2] Farid’s [1] Proposed Data hiding methods

TP FP average TP FP average TP FP average Cox et al.’s SS 54.1 15.6 69.2 77.6 47.9 64.9 95.7 5.4 95.1 Piva et. al’s SS 91.8 45.9 73.0 86.5 10.9 87.8 96.1 10.8 92.6 Huang and Shi’ block SS

96.7 33.6 81.5 92.0 39.7 76.1 98.3 7.0 95.7

Generic QIM (0.1 bpp) 90.2 46.6 71.8 99.5 0.00 99.7 98.9 2.8 98.0 Generic LSB (0.3 bpp) 79.7 56.9 61.4 89.9 46.1 71.9 94.4 6.2 94.1 5 methods combined 85.9 62.1 77.9 67.6 20.4 69.6 84.5 8.4 85.7

To compare the performance of the proposed method with Farid’s method [1], we use the exactly same 72-D feature vector as proposed in [1], the same Bayes classifier used above, and the 1096 CorelDraw images to conduct the similar steganalysis ex-periments. The test results are shown in the middle column of Table 4.1.


We also use the features proposed in [2], the Bayes classifier introduced above, and the 1096 CorelDraw images to conduct the same experiments as described above. The corresponding results are shown in the left column of Table 4.1.

It is obvious from Table 4.1 that the proposed steganalysis scheme outperforms both of the prior arts proposed in [1,2].

By the combination of the five methods, it is meant that all the stego-images asso-ciated with the five methods and the original images are used together in experiments. Concretely, we now have 1096 6-tuple images with each 6-tuple having one original CorelDraw image, and five stego-images generated by these five data hiding methods, respectively. Again, 896 6-tuples are randomly selected for training and the remain-ing 200 6-tuples are used for testing. The purpose of this experiment is to examine if our proposed method can successfully detect stego-images from original images when all of these five data hiding methods are jointly considered. From Table 4.1, we can see the average detection rate is 86%. It is reasonable to see the combined detection rate is somehow lower than that obtained for each individual data hiding method. However, the detection rate of 86% indicates that our proposed scheme has some blind steganalysis capability. In other words, the proposed method has made a signifi-cant step towards establishment of a blind and powerful steganalysis system.

Table 4.2 contains the average detection rates obtained by applying each individual statistical moment alone for the non-blind spread spectrum (SS) data hiding method by Cox et al. [7]. The moments in the right-most three columns (referred to as the right side below) are the moments of CF’s of wavelet subbands (our proposed method), while the left side columns (from the 2nd to 4th columns in Table 4.2) are moments of wavelet subbands, which, excluding LLi, i=0,1,2,3, are proposed for steganalysis in [1]. It is clearly that each individual detection rate in the right side is higher than its counterpart in the left side, indicating that the proposed wavelet CF’s moments are more effective to steganalsysi than the moments of wavelet subbands. Furthermore, as pointed out in Section 3.1, the utilization of the moments of wavelet CF’s, specifically, LLi, i=0,1,2,3, has been justified. It is clearly observed that these moments do make relatively strong contribution to steganalysis.

Table 4.2. Average detection rate in unit of % by applying each feature alone

1st moment

of histogram 2nd moment of histogram

3rd moment of histogram

1st moment of CF

2nd mo-ment of CF

3rd moment of CF

LL0 50.4 50.1 50.3 63.4 65.8 54.0 LL1 50.5 50.3 50.2 63.8 62.9 54.9 LH1 50.1 50.0 50.0 54.7 54.9 54.7 HL1 50.1 50.0 50.0 54.4 54.9 54.6 HH1 50.2 50.0 50.0 55.2 54.9 55.2 LL2 50.5 50.4 50.5 64.2 56.2 53.6 LH2 50.1 50.0 50.0 55.5 55.4 55.3 HL2 50.2 50.0 50.0 55.8 55.6 55.6 HH2 50.1 50.0 50.1 51.7 52.3 53.0 LL3 50.5 50.5 50.5 56.6 52.9 54.2 LH3 50.5 50.1 50.0 62.3 61.6 59.9 HL3 50.3 50.1 50.0 61.5 59.2 55.9 HH3 50.1 50.0 50.0 51.6 52.1 51.3

274 G. Xuan et al.

4.2 Experimental Results in the Reduced Feature Space

In order to facilitate the visualization of steganalysis, we apply the Bhattacharyya distance technique [17] developed and utilized in the pattern recognition field to re-duce the M-D (M=39 in our case) feature vectors to r-D (r=3 in our case) feature vectors. According to [17], the matrix A in the dimensionality reduction is obtained by minimizing the upper bound of the detection error rate in the M-D space,

i.e., ( )AA

m εε min= , where ( ) rmAA ×= is the dimensionality reduction matrix.

With respect to all of the 1096 CorelDraw images, and the corresponding 1096 stego-images generated by applying the generic LSB data hiding method (data em-bedding rate is 0.3 bpp as described above), we apply the steganalysis methods in [1], [2] and the proposed method, respectively, to produce feature vectors according to [1], [2], and this paper. Then, all of the 2192 features vectors of 72-D [1] and 39-D (our proposed) are reduced to 3-D by using the above-mentioned Bhattacharyya dis-tance technique. Note that the 2192 feature vectors generated by [2] are 3-D vectors already. Figures 4.1 (a),(b),(c) display, respectively, the distribution of these 3-D feature vectors. There the red points denote the feature vectors of the original images, while the blue pints the feature vectors of the stego-images.

As shown in Figure 4.1, the detection rate of the proposed steganalsyis method with the 39-D feature vectors is 94.0%. When applying the Bhattacharyya distance technique to reduce the 39-D feature vectors to the 3-D feature vectors, the detection rate is 87.0%, indicating the detection rate does not lower much. With the steganalysis method in [2] is applied, the 3-D feature vectors are produced, the detection rate is 54.7%. With the steganalysis method in [1], the detection rate is 71.8% for 72-D fea-ture vectors, and is 50.1% for the reduced 3-D feature vectors.

It is observed from Figure 4.1 that the distribution of the 3-D feature vectors be-tween the original and the stego-images with the proposed steganalysis method are most clearly separable among these three steganalysis methods. This agrees with the difference among the detection rates reported in Table 4.1.

02000

40006000

800010000

0

500

1000

15000

200

400

600

800

1000

0100

200300

0100

200300

4000

100

200

300

400

500

01

23

4

x 104

01

23

4

x 104

0

0.5

1

1.5

2

2.5

x 104

(a) proposed (b) Harmsen (c) Farid

(39-D 94.1%, 3-D 87.0%) (3-D 61.4%) (72-D 71.9%, 3-D 50.1%)

Fig. 4.1. Distribution of 3-D feature vectors (CorelDraw image database, LSB data hiding)


4.3 Experimental Results with Three Staganographic Algorithms

Jsteg, OutGuess and F5 algorithms have been, respectively, applied to each of the 1096 color CorelDraw images to generate stego-images. Similar to [1], the central portions of some randomly selected CorelDraw images with sizes of

8080,4040,2020 ××× are embedded. For both original and stego color images, the

ICT has been applied to produce corresponding gray-scale images. Features are then generated from the gray-scale images for steganalysis. Bayes classifier has been used as classifier. The test results are shown in Table 4.3. Note that OutGuess sometimes cannot be applied to some color images to generate stego-images. As a result, only about half of 1096 CorelDraw images can hide a central portion of color image of size

8080× . Therefore, there are no test results of 8080× data hiding for OutGuess. Note that though the Bayes classifier is optimum when the priori probabilities obey

Gaussian distribution, non-linear classifiers such as neural network and SVM can generally provide better performance in pattern classification [8]. In addition, it is noted that detection rates can be improved significantly by collecting statistics from within and across all three color components [1]. These tasks are on our agenda of future work. Here by using the same Bayes classifier and the same procedure to col-lect features from converted gray-scale images, it is desired to compare the effective-ness of different feature sets in steganalysis. Table 4.3 indicates that our proposed feature set outperforms that proposed in [1] in general.

Table 4.3. Test results on several steganographic algorithms

Method JSteg F5 OutGuess Payload 10x10 20x20 40x40 80x80 10x10 20x20 40x40 80x80 10x10 20x20 40x40

Farid 51.9% 58.8% 80.3% 99.4% 49.7% 50.5% 51.1% 68.7% 59.8% 60.0% 75.4% Ours 54.6% 64.0% 75.5% 87.9% 50.1% 50.8% 56.1% 74.3% 77.1% 78.2% 82.7%

5 Conclusion

In this paper, we have proposed to use statistical moments of wavelet characteristic functions as features for steganalysis. In theoretical analysis and in extensive experi-ments, the superiority of the proposed features over statistical moments of wavelet subbands, which is discussed in [1], has been shown. Specifically, we show that the n-th moments of wavelet characteristic function are related to the magnitude of the n-th derivative of the histogram at different values, x, in the histogram. Note that when the x=0, the peak points of histograms of high frequency subbands are considered. There-fore, the proposed features are sensitive to the changes of the histogram of wavelet subbands caused by data hiding. Equivalently, the differentiation of histogram is more effective than integration of histogram for steganalysis. Graphs and experiment results support this observation.

We have also shown that, owing to the de-correlation property of wavelet decom-position, the wavelet based feature vector, i.e., adding the statistical moments of char-acteristic function of wavelet subbands is much more effective than the features ex-tracted from image in spatial domain alone as proposed in [2].

276 G. Xuan et al.

39-D feature vectors are proposed for steganalysis. It includes the 1st , 2nd and 3rd moments of characteristic function of the subbands with the 3-level Haar wavelet decomposition. Bayes classifier is adopted to classify the testing images.

Extensive experimental works have demonstrated that the proposed steganalysis system based on the proposed M-D feature vector is rather effective. For the non-blind spread spectrum data hiding method by Cox et al. which is the tough method for steganalysis, the detection rate reaches 95%, while the steganalysis schemes in [1] and in [2] implemented by us can only reach 65% and 69%, respectively.

Besides, a fit-in-for-all system is tested with the stego-images generated by all the five typical data hiding methods. The average correct classification rate is 86%. This promising result has pointed out a new and practical way towards blind and powerful steganalysis for future research. The test results on Jsteg, OutGuess and F5 have fur-ther demonstrated the effectiveness of the proposed steganalysis scheme.

In addition, all of these experiments are conducted over a set of images with a large size, which is considered necessary for steganalysis.

References

1. S. Lyu and H. Farid: Detecting hidden messages using higher-order statistics and support vector machines. Proc. of 5th International Workshop on Information Hiding, Noordwi-jkerhout, The Netherlands, 2002.

2. J. J. Harmsen: Steganalysis of Additive Noise Modelable Information Hiding. Master The-sis of Rensselaer Polytechnic Institute, Troy, New York , advised by Professor W. A. Pearlman, (2003)

3. A. Leon-Garcia: Probability and Random Processes for Electrical Engineering. 2nd Ed., Addison-Wesley (1994)

4. D. S. Mitrinovic, J. E. Pecaric and A. M. Fink: Classical and New Inequalities in Analysis. The Netherlands. Kluwer Academic Publishers (1993)

5. CorelDraw Software, www.corel.com.

6. C. Christopoulos, A. Skodras, and T. Ebrahimi: The JPEG2000 Still Image Coding Sysyem: An Overview. IEEE Transactions on Consumer Electronics, vol. 46. (Nov. 2000) 1103-1127

7. I. J. Cox, J. Kilian, T. Leighton and T. Shamoon: Secure Spread Spectrum Watermarking for Multimedia. IEEE Trans. on Image Processing, Vol.6 (1997) 1673-1687

8. K. Fukunaga: Introduction to Statistical Pattern Recognition, 2nd Edition, Academic Press Inc.. Boston. (1990)

9. The MNIST DATABASE of handwritten digits. Yann LeCun, NEC Research Institute. http://yann.lecun.com/exdb/mnist/

10. A. Piva, M. Barni, E Bartolini, V. Cappellini: DCT-based Watermark Recovering without Resorting to the Uncorrupted Original Image. Proc. of the 1997 International Conference on Image Processing vol. 1 (1997) 520

11. J. Huang and Y. Q. Shi: An adaptive image watermarking scheme based on visual masking. IEE Electronic Letters, vol. 34, (1998)748-750

12. B. Chen and G. W. Wornell: Digital watermarking and information embedding using dither modulation. Proc. of IEEE Second Workshop of Multimedia Signal Processing. Los Ange-les, CA. (Dec. 1998) 273-278


13. G. Xuan, Y. Q. Shi, Z. Ni, “Reversible data hiding using integer wavelet transform and companding technique,” IWDW04, Korea, October 2004.

14. Jsteg V4, by Derek Upham, is available at ftp.funet.fi 15. OutGuess, by Niels Provos, is available at www.outguess.org 16. F5, by A. Westfeld, is available at wwwrn.inf.tu-dresden.de/~westfeld/f5.html. 17. G. Xuan, P. Chai, M. Wu, “Bhattacharyya distance feature selection,” Proceedings of the

13th International Conference on Pattern Recognition, pp. 195-199, Aug. 25-29, 1996, Vienna, Austria.

Steganalysis Based on Multiple Features Formed by Statistical …shi/PaperDownload/steganalysis/IH… · · 2008-03-02Steganalysis Based on Multiple Features ... a steganalysis

Documents