    A Markov Process Based Approach to Effective Attacking JPEG Steganography

    Yun Q. Shi, Chunhua Chen, Wen Chen

    New Jersey Institute of Technology

    Newark, NJ USA 07102 {shi,cc86}@njit.edu

    Abstract. In this paper, a new steganalysis scheme is presented to effectively detect advanced JPEG steganography. For this purpose, we first choose to work on JPEG 2-D arrays formed from the magnitudes of JPEG quantized block DCT coefficients. Difference JPEG 2-D arrays along the horizontal, vertical, and diagonal directions are then used to enhance the changes caused by JPEG steganography. A Markov process is applied to model these difference JPEG 2-D arrays so as to utilize second-order statistics for steganalysis. In addition to the use of difference JPEG 2-D arrays, a thresholding technique is developed to greatly reduce the dimensionality of the transition probability matrices, i.e., the dimensionality of the feature vectors, thus keeping the computational complexity of the proposed scheme manageable. Experimental results demonstrate that the proposed scheme outperforms existing steganalyzers in attacking OutGuess, F5, and MB1.

    1 Introduction

    The Internet has been an important communication channel since the 1990s, through which e-mails, speech, images, and videos are easily transmitted and shared. With image steganography, covert communication can also be conducted over the Internet.

    Steganography is the art and science of "invisible" communication, which aims to conceal the very existence of hidden messages. Images have many attributes that make them suitable for steganography. They can carry large messages; for instance, some steganographic methods can achieve an embedding capacity exceeding 13% of the image file size [1]. Because images are non-stationary, image steganography is hard to attack. Moreover, as digital images are exchanged so frequently nowadays, image steganography has become increasingly promising.

    Recently, research on JPEG (Joint Photographic Experts Group) steganography has become active because JPEG images are so widely used. Many steganographic techniques operating on JPEG images have been published and made publicly available. Most techniques in this category modify the LSB (least significant bit) of the block discrete cosine transform (BDCT) coefficients, which are the outcomes of a block-wise two-dimensional (2-D) DCT followed by quantization with a JPEG quantization table.

    In this paper we examine three recently published and highly advanced steganographic methods, i.e., OutGuess [2], F5 [1], and model-based steganography (MB) [3].

    OutGuess provides a universal steganographic framework that embeds hidden data using the redundancy of a cover image. For JPEG images, OutGuess preserves the statistics of the BDCT coefficient histogram. Two measures are taken to reduce the changes that data embedding introduces into the cover image. Before embedding, OutGuess identifies the redundant BDCT coefficients, which have the least effect on the cover image and will be modified if necessary during embedding. During embedding, it also adjusts the untouched coefficients so that the original histogram of the BDCT coefficients is preserved.

    F5 was developed from Jsteg, F3, and F4, and works only with JPEG images. F5 takes two main measures to increase its security against steganalysis attacks: straddling and matrix coding. Straddling scatters the message as uniformly as possible over the cover image to equalize the change density. With matrix coding, F5 improves the embedding efficiency, defined as the number of bits embedded per change of a BDCT coefficient. Generally speaking, the smaller the message, the higher the embedding efficiency of F5.
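    To make "embedding efficiency" concrete, here is a minimal Python sketch of (1, 2^k - 1, k) Hamming-code matrix embedding, the kind of matrix coding F5 builds on: k message bits are hidden in the LSBs of n = 2^k - 1 coefficients by changing at most one of them. It works on a plain list of LSBs only; F5's straddling permutation, its handling of coefficients that shrink to zero, and its choice of usable coefficients are omitted, and all function names are hypothetical.

```python
def syndrome(lsbs):
    """k-bit syndrome of n = 2**k - 1 bits: XOR of the 1-based positions holding a 1."""
    s = 0
    for position, bit in enumerate(lsbs, start=1):
        if bit:
            s ^= position
    return s


def embed(lsbs, message):
    """Change at most one bit so that syndrome(result) == message (0 <= message <= n)."""
    out = list(lsbs)
    flip = syndrome(out) ^ message  # 1-based position to flip; 0 means no change needed
    if flip:
        out[flip - 1] ^= 1
    return out


def embedding_efficiency(k):
    """Expected bits per change for random messages: k bits cost 1 - 2**-k changes on average."""
    return k / (1 - 2 ** -k)


cover = [1, 0, 1, 1, 0, 0, 1]  # n = 7 LSBs carry k = 3 message bits
stego = embed(cover, 0b101)
print(syndrome(stego), sum(a != b for a, b in zip(cover, stego)))  # prints: 5 1
```

    With k = 1 the efficiency is 2 bits per change, while k = 3 gives about 3.43; larger k packs fewer message bits into more coefficients, which is why shorter messages allow higher efficiency.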

    In general, hidden data may be uncorrelated with the cover image, a property that many steganalysis algorithms exploit to attack data hiding schemes. MB embedding tries to make the embedded data correlated with the cover image. This is achieved by splitting the cover image into two parts, modeling the parameters of the distribution of the second part given the first part, encoding the second part using the model and the message to be embedded, and then combining the two parts to form the stego image. In MB1 [3], the embedding method that operates on JPEG images, a Cauchy distribution is used to model the JPEG BDCT mode histograms. The embedding procedure keeps a lower-precision version of the BDCT mode histograms unchanged.

    To attack steganography, a number of steganalysis schemes have been proposed. They fall into two categories, i.e., specific steganalysis and universal steganalysis [4]. Specific steganalysis concentrates on detecting a particular steganographic tool and, if well designed, performs well against that tool. Universal steganalysis, by contrast, tries to detect any steganographic tool, whether known or unknown in advance.

    Farid proposed a universal steganalyzer based on high-order statistics of images [5]. Quadrature mirror filters are used to decompose the image into wavelet subbands, and high-order statistics are then calculated for each high-frequency subband. A second set of statistics is calculated for the errors of an optimal linear predictor of the coefficient magnitudes. Both sets of statistical moments are used as features for steganalysis. This steganalyzer generally achieves detection rates better than random guessing against a variety of steganographic methods.

    In [6], Shi et al. presented a universal steganalysis system. The statistical moments of the characteristic functions of the image, its prediction-error image, and their discrete wavelet transform (DWT) subbands are selected as features. All of the low-low wavelet subbands are also used in the system. This steganalyzer generally performs better than [5].

    In [7], Fridrich proposed a set of distinguishing features from the BDCT domain and the spatial domain, aimed at detecting information embedded in JPEG images. The statistics of the original image are estimated by decompressing the JPEG image, cropping four rows and four columns at the boundary, and then recompressing the cropped image to JPEG format with the original quantization table. The author claimed that the resulting image has statistical properties very similar to those of the cover image. Features for steganalysis are generated from the statistics of the JPEG image and its estimated version. Designed specifically for detecting JPEG steganography, this scheme performs better than [5, 6] in attacking the JPEG steganographic methods [1, 2, 3].

    Recently, a specific steganalysis scheme for detecting data hidden with spread-spectrum methods has been proposed, in which inter-pixel dependencies are used and a Markov chain model is adopted [8]. The empirical transition matrix of a given test image is formed. For a grayscale image with a bit depth of 8, this matrix has a dimensionality of 256×256, i.e., 65,536 elements. Obviously, these elements cannot be used directly as features. The authors select several of the largest probabilities along the main diagonal together with their neighbors, plus some randomly selected probabilities along the main diagonal, as features. As a result, some information loss is inevitable because of the random fashion of feature selection. Furthermore, this method uses the Markov chain only along the horizontal direction, which cannot reflect the 2-D nature of digital images.

    In this paper, a new steganalysis scheme is presented to effectively detect advanced JPEG steganography. First, we choose to work on JPEG 2-D arrays to formulate features for steganalysis. Difference JPEG 2-D arrays along the horizontal, vertical, and diagonal directions are then used to enhance the changes caused by JPEG steganography. A Markov process is applied to model these difference JPEG 2-D arrays so as to utilize second-order statistics for steganalysis. In addition to the use of difference JPEG 2-D arrays, a thresholding technique is developed to greatly reduce the dimensionality of the transition probability matrices, i.e., the dimensionality of the feature vectors, thus keeping the computational complexity of the proposed scheme manageable. Experimental results demonstrate that the proposed scheme outperforms the state of the art in attacking OutGuess, F5, and MB1.

    The rest of this paper is organized as follows. The feature construction procedure is described in Section 2. Section 3 introduces the support vector machine, the classifier used in our investigation. Experimental results are given in Section 4, followed by further discussion in Section 5. Finally, conclusions are drawn in Section 6.

    2 Feature Construction

    In this paper, steganalysis is treated as a two-class pattern recognition task. That is, a given test image needs to be classified as either a stego image (with hidden data) or a non-stego image (without hidden data). Therefore, feature construction is a key step in steganalysis.

    As mentioned in Section 1, modern steganographic methods such as OutGuess and MB make great efforts to keep the changes to BDCT coefficients caused by data hiding as small as possible. In particular, they attempt to keep the changes to the histogram of JPEG coefficients as small as possible. Under these circumstances, we propose to use second-order statistics as features for steganalysis to detect these JPEG steganographic methods.

    In this section, we first define the JPEG 2-D array and then introduce the difference JPEG 2-D arrays along different directions. We then propose to model the difference JPEG 2-D arrays with a Markov random process. According to the theory of random processes, a Markov process can be characterized by its transition probability matrix, and our proposed features are derived from this matrix. To achieve an appropriate balance between steganalysis capability and computational complexity, we use the so-called one-step transition probability matrix in this work. To reduce the computational cost further by reducing the dimensionality of the feature vectors, we resort to a thresholding technique.

    2.1 JPEG 2-D Array

    Generating features directly from the 8×8 block discrete cosine transform (BDCT) domain to attack steganographic algorithms that operate on JPEG images is natural and reasonable. For this purpose, it is necessary first to study the properties of JPEG BDCT coefficients.

    Fig. 1. A sketch of the JPEG coefficient 2-D array (horizontal size S_u, vertical size S_v)

    For a given image, consider a 2-D array consisting of all the 8×8 block DCT coefficients that have been quantized with a JPEG quantization table but not yet zig-zag scanned, run-length coded, or Huffman coded. That is, this 2-D array has the same size as the given image, with each 8×8 block filled with the corresponding JPEG-quantized 8×8 block DCT coefficients. Furthermore, we take the absolute value of each DCT coefficient, resulting in the 2-D array sketched in Figure 1. In this paper we call the resulting 2-D array the JPEG 2-D array. The features proposed in this scheme are formed from the JPEG 2-D array.
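    As an illustration, the following NumPy sketch approximates the JPEG 2-D array of a grayscale image by redoing the encoder's steps: level shift by 128, orthonormal 8×8 DCT-II, and rounding against a quantization table, followed by the absolute value. It is a stand-in under stated assumptions: in practice the quantized coefficients would be read directly from the JPEG file (not done here), and the table below is the example luminance table from Annex K of the JPEG standard, not a table taken from the paper's image set. Function names are hypothetical.

```python
import numpy as np

# Example luminance quantization table from Annex K of the JPEG standard (assumed here).
Q_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=float)


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that C @ B @ C.T is the 2-D DCT of block B."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def jpeg_2d_array(image, qtable=Q_LUMA):
    """|quantized 8x8 block DCT coefficients|, kept in place (no zig-zag, RLE or Huffman)."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    if rows % 8 or cols % 8:
        raise ValueError("image dimensions must be multiples of 8")
    c = dct_matrix(8)
    out = np.empty((rows, cols), dtype=np.int64)
    for r in range(0, rows, 8):
        for s in range(0, cols, 8):
            block = img[r:r + 8, s:s + 8] - 128.0             # JPEG level shift
            quantized = np.rint((c @ block @ c.T) / qtable)  # 2-D DCT, then quantize
            out[r:r + 8, s:s + 8] = np.abs(quantized)        # keep magnitudes only
    return out
```

    The result has the same size as the image, as the text requires; it coincides with the true JPEG 2-D array only if the encoder used the same table and rounding.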

    The reason for taking absolute values is as follows. JPEG-quantized BDCT coefficients can be positive, negative, or zero. It is known that the BDCT decorrelates the coefficients effectively. However, since BDCT coefficients in general do not obey a Gaussian distribution, being uncorrelated does not make them statistically independent of each other. It is also well known that the power of an 8×8 block of DCT coefficients is highly concentrated in the DC (direct current) and low-frequency AC (alternating current) coefficients. JPEG quantization, after which the majority of high-frequency BDCT AC coefficients may become zero, further enhances this disparity in power distribution among the quantized BDCT coefficients. If small ups and downs are ignored, the power of the BDCT coefficients in each block tends to be non-increasing along the zig-zag scan order. This is consistent with the fact that zig-zag scanning makes run-length coding efficient [9]. Combining these observations, we can state that the magnitudes of the non-zero BDCT coefficients are correlated with each other along the zig-zag scan order. Hence, correlation also exists among the absolute values of the BDCT coefficients along the horizontal, vertical, and diagonal directions. This observation is further supported by Figure 3: the differences between the absolute values of two immediately neighboring BDCT coefficients (horizontal neighbors in Figure 3) are highly concentrated around 0 and have a Laplacian-like distribution. The same is true along the vertical and diagonal directions.

    In addition, the steganographic methods operating on JPEG images neither touch the DCT DC coefficients nor change the signs of the DCT AC coefficients during data embedding [2, 3] (note that a DCT coefficient with a non-zero magnitude changing to zero is not a sign change). Further discussion in this regard is given in Section 5.1, which shows that taking absolute values generally results in higher detection rates and lower computational complexity.

    2.2 Difference JPEG 2-D Array

    According to [6], the disturbance caused by data embedding manifests itself more clearly in the prediction-error image than in the original test image. Hence, we expect that the disturbance caused by steganographic methods in JPEG images can be enlarged by examining the difference between an element and one of its neighbors in the JPEG 2-D array. For this purpose, we consider the following four difference JPEG 2-D arrays.

    Denote the JPEG 2-D array generated from a given test image by $F(u,v)$, $u \in [1, S_u]$, $v \in [1, S_v]$, where $S_u$ is the size of the JPEG 2-D array in the horizontal direction and $S_v$ its size in the vertical direction. Then, as shown in Figure 2, the difference arrays are generated by the following formulae:

    $$F_h(u,v) = F(u,v) - F(u+1,v), \tag{1}$$

    $$F_v(u,v) = F(u,v) - F(u,v+1), \tag{2}$$

    $$F_d(u,v) = F(u,v) - F(u+1,v+1), \tag{3}$$

    $$F_{md}(u,v) = F(u+1,v) - F(u,v+1), \tag{4}$$

    where $u \in [1, S_u - 1]$, $v \in [1, S_v - 1]$, and $F_h(u,v)$, $F_v(u,v)$, $F_d(u,v)$, $F_{md}(u,v)$ denote the difference arrays in the horizontal, vertical, main diagonal, and minor diagonal directions, respectively.

    [Figure: in each panel, JPEG 2-D Array − (shifted JPEG 2-D Array) = (a) Horizontal, (b) Vertical, (c) Main Diagonal, (d) Minor Diagonal Difference Array]

    Fig. 2. The generation of four difference JPEG 2-D arrays. Parts (a), (b), (c), and (d) correspond to the horizontal, vertical, main diagonal, and minor diagonal difference JPEG 2-D arrays, respectively
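    Eqs. (1)-(4) translate directly into array slicing. The NumPy sketch below assumes the JPEG 2-D array is stored as `F[v, u]` (row index v, column index u, so that u runs horizontally as in the text) and crops all four outputs to the common range $u \in [1, S_u - 1]$, $v \in [1, S_v - 1]$ stated after Eq. (4). The function name is hypothetical.

```python
import numpy as np


def difference_arrays(F):
    """Horizontal, vertical, main- and minor-diagonal difference arrays, Eqs. (1)-(4).

    F is indexed as F[v, u]; every output has shape (S_v - 1, S_u - 1).
    """
    F = np.asarray(F, dtype=np.int64)
    base = F[:-1, :-1]               # F(u, v)
    fh = base - F[:-1, 1:]           # F(u, v) - F(u+1, v)      Eq. (1)
    fv = base - F[1:, :-1]           # F(u, v) - F(u, v+1)      Eq. (2)
    fd = base - F[1:, 1:]            # F(u, v) - F(u+1, v+1)    Eq. (3)
    fmd = F[:-1, 1:] - F[1:, :-1]    # F(u+1, v) - F(u, v+1)    Eq. (4)
    return fh, fv, fd, fmd
```

    Casting to a signed integer type matters here: the JPEG 2-D array holds non-negative magnitudes, but the differences can be negative.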

    It is observed that the distribution of the elements of the above-described difference arrays is Laplacian-like: most of the difference values are close to zero. In the experimental work reported in this paper, an image set consisting of 7560 JPEG images with quality factors ranging from 70 to 90 is used. The arithmetic average of the histograms of the horizontal difference JPEG 2-D arrays generated from this image set, and the histogram of the horizontal difference JPEG 2-D array of an image randomly selected from the set, are shown in Figure 3 (a) and (b), respectively. Most elements of the horizontal difference JPEG 2-D arrays fall into the interval [-T, T] as long as T is large enough. Table 1 lists the mean and standard deviation, over the image set, of the percentage of elements of the horizontal difference JPEG 2-D arrays falling into [-T, T] for T = 1, 2, 3, 4, 5, 6, 7. Both Figure 3 and Table 1 support the claim that the elements of the horizontal difference JPEG 2-D arrays have a Laplacian-like distribution. The same is true for the difference JPEG 2-D arrays along the other three directions.

    Table 1. Mean and standard deviation of the percentage of elements of horizontal difference JPEG 2-D arrays falling within [-T, T] for T = 1, 2, ..., 7

    Interval                [-1,1]  [-2,2]  [-3,3]  [-4,4]*  [-5,5]  [-6,6]  [-7,7]
    Mean (%)                84.72   88.58   90.66   91.99    92.92   93.60   94.12
    Standard deviation (%)  5.657   4.243   3.464   2.836    2.421   2.104   1.850

    * On average, 91.99% of all elements of the horizontal difference arrays generated from the image set fall into the range [-4, 4]. The standard deviation is 2.836%.
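    The statistics in Table 1 amount to computing, for each image, the share of difference values with |x| ≤ T, and then taking the mean and standard deviation of that share over the image set. A small NumPy sketch of this bookkeeping follows; it assumes the population standard deviation (NumPy's default `ddof=0`), since the text does not say which estimator was used, and the function names are hypothetical.

```python
import numpy as np


def fraction_within(diff, T):
    """Percentage of elements of one difference array whose value lies in [-T, T]."""
    diff = np.asarray(diff)
    return 100.0 * np.count_nonzero(np.abs(diff) <= T) / diff.size


def summarize(diff_arrays, thresholds=range(1, 8)):
    """Map T to (mean, std) of the within-[-T, T] percentage across the given arrays."""
    table = {}
    for T in thresholds:
        pct = np.array([fraction_within(d, T) for d in diff_arrays])
        table[T] = (float(pct.mean()), float(pct.std()))  # population std (ddof=0)
    return table
```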

    Fig. 3. Histogram plots. Part (a) displays the statistical average of the histograms of horizontal difference arrays generated from the image set of 7560 JPEG images with quality factors ranging from 70 to 90. Part (b) shows the histogram of the horizontal difference array of a randomly selected image from the set

    2.3 Transition Probability Matrix

    As mentioned before, modern steganographic methods such as OutGuess and MB make great efforts to keep the changes to the histogram of JPEG BDCT coefficients as small as possible during data embedding. Therefore, we propose to use higher-order statistics for steganalyzing JPEG steganography. In this work, second-order statistics are used so as not to increase the computational complexity dramatically.

    We propose to model the above-defined difference JPEG 2-D arrays with a Markov random process. According to the theory of random processes, a Markov process can be characterized by its transition probability matrix. One distinguishes the so-called one-step transition probability matrix and the n-step transition probability matrix [10]. Roughly speaking, the former refers to the transition probabilities between two immediately neighboring elements of a difference JPEG 2-D array, while the latter refers to the transition probabilities between two elements separated by (n-1) elements. To strike a suitable balance between high steganalysis capability and manageable computational complexity, we use only the one-step transition probability matrix in this work, as shown in Figure 4.

    Fig. 4. The formation of the transition probability matrices

    To reduce the computational complexity further, we resort to a thresholding technique. That is, we select a threshold value T and consider only elements of the difference JPEG 2-D arrays whose values fall into {-T, -T+1, ..., -1, 0, 1, ..., T-1, T}. An element whose value is larger than T or smaller than -T is represented by T or -T, respectively. This procedure results in a transition probability matrix of dimensionality (2T+1)×(2T+1). The elements of the four matrices associated with the horizontal, vertical, main diagonal, and minor diagonal difference JPEG 2-D arrays are given by

    $$p\{F_h(u+1,v)=n \mid F_h(u,v)=m\} = \frac{\sum_{v=1}^{S_v-1}\sum_{u=1}^{S_u-2}\delta\big(F_h(u,v)=m,\ F_h(u+1,v)=n\big)}{\sum_{v=1}^{S_v-1}\sum_{u=1}^{S_u-2}\delta\big(F_h(u,v)=m\big)}, \tag{5}$$

    $$p\{F_v(u,v+1)=n \mid F_v(u,v)=m\} = \frac{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-1}\delta\big(F_v(u,v)=m,\ F_v(u,v+1)=n\big)}{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-1}\delta\big(F_v(u,v)=m\big)}, \tag{6}$$

    $$p\{F_d(u+1,v+1)=n \mid F_d(u,v)=m\} = \frac{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2}\delta\big(F_d(u,v)=m,\ F_d(u+1,v+1)=n\big)}{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2}\delta\big(F_d(u,v)=m\big)}, \tag{7}$$

    $$p\{F_{md}(u,v+1)=n \mid F_{md}(u+1,v)=m\} = \frac{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2}\delta\big(F_{md}(u+1,v)=m,\ F_{md}(u,v+1)=n\big)}{\sum_{v=1}^{S_v-2}\sum_{u=1}^{S_u-2}\delta\big(F_{md}(u+1,v)=m\big)}, \tag{8}$$

    where $m, n \in \{-T, -T+1, \ldots, 0, \ldots, T-1, T\}$, and

    $$\delta\big(F(u,v)=m,\ F(u,v+1)=n\big) = \begin{cases} 1, & \text{if } F(u,v)=m \text{ and } F(u,v+1)=n,\\ 0, & \text{otherwise}, \end{cases} \tag{9}$$

    with the single-condition $\delta(\cdot)$ in the denominators defined analogously.
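    Computationally, the thresholding step and Eqs. (5)-(9) amount to clipping values to [-T, T] and normalizing a 2-D histogram of (current, next) element pairs row by row. A minimal NumPy sketch with a hypothetical function name; rows for a state m that never occurs are left as zeros, which is an assumption, since the text does not say how an empty denominator is handled.

```python
import numpy as np


def transition_matrix(current, following, T=4):
    """One-step transition probability matrix P[m + T, n + T] = p{following = n | current = m}.

    `current` and `following` are equally shaped arrays of paired elements. Values
    are clipped to [-T, T] first (the thresholding step). Rows whose state never
    occurs stay zero (assumption).
    """
    cur = np.clip(np.asarray(current), -T, T).ravel() + T   # states mapped to 0..2T
    nxt = np.clip(np.asarray(following), -T, T).ravel() + T
    size = 2 * T + 1
    counts = np.zeros((size, size))
    np.add.at(counts, (cur, nxt), 1.0)                      # numerators of Eqs. (5)-(8)
    totals = counts.sum(axis=1, keepdims=True)              # denominators
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```

    For the horizontal matrix of Eq. (5), with the difference array stored as `fh[v, u]`, the pairs are `transition_matrix(fh[:, :-1], fh[:, 1:], T=4)`; the other directions only change the two slices.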

    In summary, each of the four transition probability matrices has (2T+1)×(2T+1) elements, giving 4×(2T+1)×(2T+1) elements in total. All of them serve as features for steganalysis; in other words, we use 4×(2T+1)×(2T+1)-D feature vectors for steganalysis. Clearly, T should be chosen to give good steganalysis capability at a manageable computational complexity.

    For this reason, in our experimental work we set the threshold T equal to 4, according to the statistical study shown in Figure 3 and Table 1. Hence, if an element has an absolute value larger than 4, it is reassigned the absolute value 4 without a change of sign. The resulting transition probability matrix is of size 9×9 for each of the four difference JPEG 2-D arrays. That is, there are 9×9 = 81 elements in each of these four transition probability matrices, or equivalently 81×4 = 324 elements in total. The feature construction procedure is summarized in Figure 5.

    [Block diagram: Given Image → JPEG Coefficient Array → | | → Horizontal, Vertical, Main Diagonal, and Minor Diagonal Difference Arrays → one Transition Probability Matrix each → (2T+1)×(2T+1) Feature Components each → 4×(2T+1)×(2T+1)-D Feature Vector]

    Fig. 5. The block diagram of the feature formation procedure
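    The whole pipeline of Fig. 5 fits in a short function. The sketch below maps a JPEG 2-D array, stored as `F[v, u]`, to the 4×(2T+1)×(2T+1) feature vector, i.e., 324 features for T = 4. To stay self-contained it re-implements the difference arrays of Eqs. (1)-(4) and the row-normalized transition counts of Eqs. (5)-(8). The helper names, the concatenation order (horizontal, vertical, main diagonal, minor diagonal), and the zero-row convention for unseen states are assumptions.

```python
import numpy as np


def _tpm(current, following, T):
    """Row-normalized (2T+1)x(2T+1) counts of (current, following) pairs after clipping."""
    cur = np.clip(current, -T, T).ravel() + T
    nxt = np.clip(following, -T, T).ravel() + T
    counts = np.zeros((2 * T + 1, 2 * T + 1))
    np.add.at(counts, (cur, nxt), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def markov_features(F, T=4):
    """4 * (2T+1)**2 features from a JPEG 2-D array indexed as F[v, u]."""
    F = np.asarray(F, dtype=np.int64)
    base = F[:-1, :-1]                         # F(u, v)
    fh = base - F[:-1, 1:]                     # Eq. (1)
    fv = base - F[1:, :-1]                     # Eq. (2)
    fd = base - F[1:, 1:]                      # Eq. (3)
    fmd = F[:-1, 1:] - F[1:, :-1]              # Eq. (4)
    matrices = [
        _tpm(fh[:, :-1], fh[:, 1:], T),        # Eq. (5): F_h(u,v) -> F_h(u+1,v)
        _tpm(fv[:-1, :], fv[1:, :], T),        # Eq. (6): F_v(u,v) -> F_v(u,v+1)
        _tpm(fd[:-1, :-1], fd[1:, 1:], T),     # Eq. (7): F_d(u,v) -> F_d(u+1,v+1)
        _tpm(fmd[:-1, 1:], fmd[1:, :-1], T),   # Eq. (8): F_md(u+1,v) -> F_md(u,v+1)
    ]
    return np.concatenate([m.ravel() for m in matrices])
```

    With this ordering, slicing the 324-element vector into consecutive 81-element pieces recovers the single-direction feature sets used in Section 4.4.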

    3 Support Vector Machine

    The support vector machine (SVM) is a widely used classifier for pattern recognition. It is easier to use than a neural network (NN), while its performance is comparable to that of an NN.

    SVM is based on the idea of a hyperplane classifier. It uses Lagrange multipliers to find the optimal separating hyperplane, which distinguishes positive patterns from negative patterns. If the feature vectors are one-dimensional (1-D), the separating hyperplane reduces to a point on the number axis.

    SVM can handle both linearly separable and non-linearly separable cases. Denote the training data pairs by $\{\mathbf{y}_i, \omega_i\}$, $i = 1, \ldots, l$, where $\mathbf{y}_i \in \mathbb{R}^N$ is the feature vector, $N$ is the dimensionality of the feature vectors, and $\omega_i = \pm 1$ for the positive/negative pattern class. In the steganalysis context, an image with hidden data (stego image) is considered a positive pattern, while an image without hidden data is considered a negative pattern. The linear support vector algorithm looks for a hyperplane $H: \mathbf{w}^T\mathbf{y} + b = 0$ and two hyperplanes $H_1: \mathbf{w}^T\mathbf{y} + b = -1$ and $H_2: \mathbf{w}^T\mathbf{y} + b = +1$, parallel to and equidistant from $H$, such that there are no data points between $H_1$ and $H_2$ and the distance between $H_1$ and $H_2$ is maximized, where $\mathbf{w}$ and $b$ are the parameters to be optimized. Once the SVM has been trained, a novel exemplar $\mathbf{z}$ from the testing data can be classified using $\mathbf{w}$ and $b$.
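    Once $\mathbf{w}$ and $b$ are known, classifying $\mathbf{z}$ and measuring the margin are one-liners. A minimal NumPy sketch with hypothetical function names; it uses the labeling above (+1 for stego, -1 for non-stego) and breaks the tie $\mathbf{w}^T\mathbf{z} + b = 0$ toward the stego class, a choice the text does not specify. Training, i.e., solving for $\mathbf{w}$ and $b$, is not shown.

```python
import numpy as np


def classify(w, b, z):
    """Return +1 (stego) or -1 (non-stego) according to the side of H: w^T y + b = 0."""
    return 1 if float(np.dot(w, z)) + b >= 0 else -1


def margin(w):
    """Distance between the parallel hyperplanes w^T y + b = -1 and w^T y + b = +1."""
    return 2.0 / float(np.linalg.norm(w))
```

    The margin formula shows why maximizing the distance between $H_1$ and $H_2$ is equivalent to minimizing $\|\mathbf{w}\|$.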

    In the non-linear case, the learning machine uses a kernel function to map the input feature vectors to a higher-dimensional space, in which a linear separating hyperplane is located. There are three basic kernels: polynomial, radial basis function, and sigmoid. For more detailed information about SVM, readers are referred to [11]. In our investigation, the polynomial kernel is used [12].
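    With a kernel, the trained machine no longer uses an explicit $\mathbf{w}$; the decision value is a weighted sum of kernel evaluations against the support vectors, $f(\mathbf{z}) = \sum_i \alpha_i \omega_i K(\mathbf{y}_i, \mathbf{z}) + b$. The sketch below evaluates this with a polynomial kernel $K(\mathbf{x}, \mathbf{y}) = (\gamma\,\mathbf{x}^T\mathbf{y} + c_0)^d$. The parameter names and defaults (`degree`, `gamma`, `coef0`) are illustrative assumptions, as the paper does not report its kernel parameters, and the multipliers `alphas` and bias `b` would come from training, which is not shown.

```python
import numpy as np


def poly_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, y) = (gamma * x.y + coef0) ** degree."""
    return (gamma * float(np.dot(x, y)) + coef0) ** degree


def decision(z, support_vectors, labels, alphas, b, **kernel_params):
    """f(z) = sum_i alpha_i * omega_i * K(y_i, z) + b; sign(f) gives the class."""
    return sum(
        alpha * omega * poly_kernel(sv, z, **kernel_params)
        for sv, omega, alpha in zip(support_vectors, labels, alphas)
    ) + b
```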

    4 Experiments and Results

    4.1 Image set

    As mentioned in Section 2, an image set consisting of 7560 JPEG images with quality factors ranging from 70 to 90 is used in our experimental work. Of these 7560 images, 2500 were taken by members of our research group at different places and times with different digital cameras; the other 5060 images were downloaded from the Internet. Each image was cropped (central portion) to a size of either 768×512 or 512×768. Some sample images are given in Figure 6.

    The images shown in Figure 6 are color images. In our experiments, the chrominance components are set to zero, while the luminance coefficients are left untouched before data embedding.

    Fig. 6. Some sample images used in this experimental work

    4.2 Stego image generation

    Our experiments focus on attacking the OutGuess, F5, and MB1 steganographic methods. The code for these algorithms is publicly available [13, 14, 15].

    As mentioned before, there are quite a few zero coefficients in the JPEG 2-D array, and the number of zero coefficients varies from image to image. Therefore, the absolute embedding capacity also varies from image to image. A reasonable way to define the embedding rate is as the ratio of the message length to the number of non-zero elements in the JPEG 2-D array. This ratio is often measured in bpc, i.e., bits per non-zero BDCT AC coefficient (after quantization). In our experimental work, the embedding rates considered for OutGuess are 0.05, 0.1, and 0.2 bpc, for which 7498, 7452, and 7215 stego images were generated, respectively. For F5 and MB1, we consider four embedding rates, 0.05, 0.1, 0.2, and 0.4 bpc, with 7560 stego images for each rate. Note that we set the step size to two when implementing MB1.
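    The bpc convention can be made concrete: count the non-zero quantized AC coefficients, then multiply by the target rate to obtain a message length. A minimal NumPy sketch, assuming the quantized coefficients are laid out in 8×8 blocks as in the JPEG 2-D array (so the DC terms sit at every eighth row and column) and that fractional bit counts are rounded down; both the rounding and the function names are assumptions.

```python
import numpy as np


def nonzero_ac_count(qcoeffs):
    """Number of non-zero quantized AC coefficients in an array made of 8x8 blocks."""
    nonzero = np.asarray(qcoeffs) != 0
    nonzero[::8, ::8] = False  # drop the DC term of every block
    return int(np.count_nonzero(nonzero))


def message_length_bits(qcoeffs, bpc):
    """Message length (bits) for a target rate in bits per non-zero AC coefficient."""
    return int(bpc * nonzero_ac_count(qcoeffs))  # floor, an assumption
```

    For example, an image with 20,000 non-zero AC coefficients would carry 1,000 message bits at 0.05 bpc.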

    4.3 Experimental results obtained by using SVM with polynomial kernel

    We randomly select half of the non-stego/stego image pairs to train the SVM classifier and use the remaining half to test the trained classifier. We use the features of Farid's [5], Shi et al.'s [6], and Fridrich's [7] steganalyzers, as well as those of our proposed steganalyzer, to detect OutGuess, F5, and MB1. The test results shown in Table 2 are the arithmetic averages of 20 random experiments with the polynomial kernel.

    Table 2. Performance comparison using the polynomial kernel (in %; TN stands for true negative rate, TP for true positive rate, and AR for accuracy)

                      Farid's             Shi et al.'s        Fridrich's          Our proposed
    Method    bpc     TN    TP    AR      TN    TP    AR      TN    TP    AR      TN    TP    AR
    OutGuess  0.05    59.0  57.6  58.3    55.6  58.5  57.0    49.8  75.4  62.6    87.6  90.1  88.9
    OutGuess  0.1     70.0  63.5  66.8    61.4  66.3  63.9    68.9  83.3  76.1    94.6  96.5  95.5
    OutGuess  0.2     81.9  75.3  78.6    72.4  77.5  75.0    90.0  93.6  91.8    97.2  98.3  97.8
    F5        0.05    55.6  45.9  50.8    57.9  45.0  51.5    46.1  61.0  53.6    58.6  57.0  57.8
    F5        0.1     55.5  48.4  52.0    54.6  54.6  54.6    58.4  63.3  60.8    68.1  70.2  69.1
    F5        0.2     55.7  55.3  55.5    59.5  63.3  61.4    77.4  77.2  77.3    85.8  88.3  87.0
    F5        0.4     62.7  65.0  63.9    71.5  77.1  74.3    92.6  93.0  92.8    95.9  97.6  96.8
    MB1       0.05    48.5  53.2  50.8    57.0  49.2  53.1    39.7  66.9  53.3    79.4  82.0  80.7
    MB1       0.1     51.9  52.3  52.1    57.6  56.6  57.1    45.6  70.1  57.9    91.2  93.3  92.3
    MB1       0.2     52.3  56.7  54.5    63.2  66.7  65.0    58.3  77.5  67.9    96.7  97.8  97.3
    MB1       0.4     55.3  63.6  59.4    74.2  80.0  77.1    82.9  86.8  84.8    98.8  99.4  99.1

    Our proposed steganalyzer outperforms the prior art by a significant margin. At the same embedding rate, the detection rate for F5 is lower than that for MB1; this is discussed in the next section.
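    The quantities in Tables 2 and 3 can be computed as follows. Because the test set contains both the cover and the stego version of every selected pair, the two classes are equally sized, so AR equals the mean of TN and TP (for example, (59.0 + 57.6)/2 = 58.3 in the first row of Table 2). The function names, the ±1 labels, and the use of NumPy's `default_rng` for the random half split are assumptions of this sketch.

```python
import numpy as np


def detection_rates(y_true, y_pred):
    """TN rate, TP rate and accuracy in %, with +1 = stego and -1 = non-stego."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tn = 100.0 * np.mean(y_pred[y_true == -1] == -1)  # non-stego correctly rejected
    tp = 100.0 * np.mean(y_pred[y_true == 1] == 1)    # stego correctly detected
    ar = 100.0 * np.mean(y_pred == y_true)            # overall accuracy
    return tn, tp, ar


def split_pairs(n_pairs, rng):
    """Randomly assign half of the cover/stego pairs to training, the rest to testing."""
    order = rng.permutation(n_pairs)
    return order[: n_pairs // 2], order[n_pairs // 2 :]
```

    Repeating the split with different random generators and averaging the three rates mirrors the 20-run averaging described above.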

    4.4 Experimental results with features from one direction at a time

    We also conducted experiments with reduced feature dimensionality in order to examine the contribution of the features along each direction. To this end, we use the features from only one direction at a time. The results shown in Table 3 are the arithmetic averages of 20 random experiments with the polynomial kernel.
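Restricting the classifier to one direction amounts to keeping one of the four transition probability matrices. With the threshold T = 4 used for the 324-D feature vectors, each direction contributes (2T+1)^2 = 81 features. The sketch below assumes, hypothetically, that the four flattened matrices are concatenated in the order horizontal, vertical, main diagonal, minor diagonal; the ordering in the authors' implementation is not stated here, and `direction_features` is an illustrative name.

```python
DIRECTIONS = ("horizontal", "vertical", "main diagonal", "minor diagonal")


def direction_features(features, direction, T=4):
    """Return the (2T+1)**2 features belonging to one direction.

    Hypothetical layout: the four flattened transition probability
    matrices are assumed to be concatenated in DIRECTIONS order.
    """
    block = (2 * T + 1) ** 2             # 81 features per direction when T = 4
    expected = len(DIRECTIONS) * block   # 324 features in total when T = 4
    if len(features) != expected:
        raise ValueError(f"expected {expected} features, got {len(features)}")
    k = DIRECTIONS.index(direction)
    return features[k * block:(k + 1) * block]
```

For a 324-D vector `feature_vector` (a hypothetical name), `direction_features(feature_vector, "horizontal")` yields the 81-D input used for the "Horizontal" columns of Table 3 under this assumed layout.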

    It is observed that the horizontal and vertical directions contribute more than the main diagonal and minor diagonal directions, and that the main diagonal contributes more than the minor diagonal. Comparing Table 2 with Table 3 shows that combining all four directions improves the detection rate in attacking JPEG steganography; for example, for F5 at 0.4 bpc the accuracy rises from at most 93.5% with a single direction to 96.8% with all four.

    Table 3. Detection rate with reduced feature space, using features from one direction at a time (in %)

                        Horizontal          Vertical            Main diagonal       Minor diagonal
    Scheme   bpc    TN    TP    AR      TN    TP    AR      TN    TP    AR      TN    TP    AR
    OutGuess 0.05   77.7  82.6  80.1    78.9  83.1  81.0    75.9  79.0  77.5    73.8  77.4  75.6
    OutGuess 0.1    89.1  95.0  92.0    90.5  95.4  93.0    88.8  93.1  90.9    86.6  92.3  89.4
    OutGuess 0.2    95.4  98.3  96.8    95.8  98.2  97.0    95.3  97.9  96.6    93.8  97.5  95.6
    F5       0.05   55.8  53.7  54.7    56.7  52.4  54.6    51.6  56.3  54.0    51.3  52.9  52.1
    F5       0.1    61.6  62.3  62.0    61.7  62.3  62.0    57.4  62.8  60.1    54.2  56.9  55.5
    F5       0.2    75.0  79.8  77.4    75.8  80.2  78.0    71.8  76.2  74.0    61.4  65.7  63.6
    F5       0.4    91.5  95.6  93.5    91.3  95.7  93.5    89.1  92.5  90.8    77.4  82.7  80.1
    MB1      0.05   69.9  72.4  71.1    70.6  72.8  71.7    67.6  69.6  68.6    66.1  67.4  66.7
    MB1      0.1    82.5  87.9  85.2    83.7  87.7  85.7    81.2  84.4  82.8    78.1  82.5  80.3
    MB1      0.2    92.5  96.4  94.4    94.1  96.8  95.5    92.8  95.6  94.2    90.1  93.9  92.0
    MB1      0.4    97.6  98.9  98.2    98.2  99.4  98.8    97.9  99.1  98.5    96.5  98.7  97.6

    5 Discussion

    This section provides some further discussion.

    5.1 Taking absolute values in JPEG 2-D array: advantages

    In Section 2, we indicated that the magnitudes (i.e., absolute values) of neighboring BDCT coefficients are correlated with each other and that the known JPEG steganographic algorithms do not change the signs of coefficients. These observations motivated us to take the absolute values of the JPEG coefficients in forming the JPEG 2-D array. We now continue this discussion.
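To make the role of the absolute values concrete, here is a minimal pure-Python sketch of one feature group: the horizontal transition probability matrix of the thresholded difference array built from coefficient magnitudes. It assumes the horizontal difference F_h(u, v) = F(u, v) - F(u+1, v) taken along each row, clipping of the differences to [-T, T], and transition probabilities P(F_h(u+1, v) = n | F_h(u, v) = m); the function names are hypothetical, and the exact indexing is the one defined in the earlier sections of the paper, not necessarily this one.

```python
def clip(x, T):
    """Threshold a difference value to the range [-T, T]."""
    return max(-T, min(T, x))


def horizontal_tpm(F, T):
    """(2T+1) x (2T+1) horizontal transition probability matrix (sketch).

    F is a 2-D list of quantized BDCT coefficients (rows of equal length).
    Entry [m + T][n + T] estimates P(next difference = n | current difference = m).
    """
    A = [[abs(x) for x in row] for row in F]  # JPEG 2-D array of magnitudes
    D = [[clip(row[c] - row[c + 1], T) for c in range(len(row) - 1)] for row in A]
    size = 2 * T + 1
    counts = [[0] * size for _ in range(size)]
    for row in D:
        for c in range(len(row) - 1):
            counts[row[c] + T][row[c + 1] + T] += 1
    # Normalize each row; a value m that never occurs gives an all-zero row.
    return [[n / sum(r) if sum(r) else 0.0 for n in r] for r in counts]
```

Because magnitudes are taken first, flipping the sign of any coefficient leaves the matrix unchanged, and with T = 4 each matrix holds 9 x 9 = 81 values.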

    We shall show that if absolute values are not taken, the steganalysis performance degrades and the computational complexity increases.

    Let us consider forming the JPEG 2-D array without taking absolute values. The difference 2-D arrays then have a larger dynamic range, so a larger threshold $T$ has to be used to build the four Markov transition probability matrices. Suppose we set a new threshold $T' = 8$, which results in four transition probability matrices of size $17 \times 17$ each. The feature dimensionality then increases to $17 \times 17 \times 4 = 1156$, which raises the computational cost significantly. Table 4 compares the performance of the 324-D feature vectors ($T = 4$, with absolute values) and the 1156-D feature vectors ($T' = 8$, without absolute values) in attacking MB1 at an embedding rate of 0.2 bpc. Our experiments indicate that this performance reduction also holds for the other embedding rates, and for OutGuess and F5 as well. Thus, taking absolute values yields both higher detection rates and lower computational complexity.
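As a worked check of the dimensionality figures above: with four directions and threshold $T$, each transition probability matrix is $(2T+1) \times (2T+1)$, so the feature dimensionality is

```latex
d(T) = 4\,(2T+1)^2, \qquad
d(4) = 4 \times 9^2 = 324, \qquad
d(8) = 4 \times 17^2 = 1156, \qquad
\frac{d(8)}{d(4)} = \frac{1156}{324} \approx 3.57
```

Dropping the absolute values therefore multiplies the feature length by roughly 3.6 while, per Table 4, lowering accuracy.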

    Table 4. Performance comparison with and without taking absolute values, MB1 at 0.2 bpc (in %; with: T = 4, 324-D features; without: T' = 8, 1156-D features)

                        With                Without
    Scheme   bpc    TN    TP    AR      TN    TP    AR
    MB1      0.2    96.7  97.8  97.3    93.9  94.2  94.1

    5.2 Detection rates for F5

    A close look at Table 2 shows that the detection rates achieved by our proposed steganalyzer for MB1 are higher than those for F5 at the same embedding rates. This appears to contradict what was reported in [7, 16]. In what follows, we discuss this issue from two angles.

    The first angle is theoretical. We can show that a steganographic method that always decreases the magnitude of a non-zero DCT AC coefficient by one to embed a bit (F5 belongs to this category) has a higher probability of leaving the elements of the difference JPEG 2-D array unchanged after data embedding than a method that, to embed one bit, increases or decreases the magnitude of a non-zero DCT AC coefficient by one with equal probability.
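A toy calculation illustrates this argument; it is a simplified model of our own, not the derivation referred to above. Take two neighboring non-zero coefficients whose magnitudes give one element of the difference array, d = |a| - |b|, and assume each is modified independently with probability p. Under F5-like embedding the magnitude always drops by one, so d is unchanged when neither or both coefficients change: (1-p)^2 + p^2. Under ±1 embedding, d is unchanged only when neither changes or both move in the same direction: (1-p)^2 + p^2/2. The hypothetical function below enumerates these cases exactly with rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def prob_difference_unchanged(p, moves):
    """Probability that d = |a| - |b| is unchanged when each magnitude is
    independently modified with probability p by a step drawn uniformly
    from `moves` (toy model; ignores coefficients that would become zero)."""
    # Distribution of the change applied to a single magnitude.
    dist = {0: 1 - p}
    for m in moves:
        dist[m] = dist.get(m, 0) + p / len(moves)
    total = Fraction(0)
    for (da, pa), (db, pb) in product(dist.items(), repeat=2):
        if da == db:  # equal changes leave the difference d unchanged
            total += pa * pb
    return total


p = Fraction(1, 4)
f5_like = prob_difference_unchanged(p, (-1,))       # magnitude always decreases
plus_minus = prob_difference_unchanged(p, (-1, 1))  # +1 or -1 with equal probability
```

In this model the F5-like method keeps d unchanged more often by exactly p^2/2, a gap that grows with the fraction of modified coefficients.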

    The second angle is experimental and is based on the image set used in our experiments, i.e., 7560 JPEG images with quality factors ranging from 70 to 90. We computed the mean embedding efficiency (defined in Section 1) of MB1 and F5 at four embedding rates, 0.05, 0.1, 0.2, and 0.4 bpc, and list the results in Table 5. These means show that at low rates (0.05 and 0.1 bpc) F5 changes fewer DCT coefficients than MB1 does, while the opposite is true at high rates (0.2 and 0.4 bpc). These statistics provide some insight that can partially explain the phenomenon. Further investigation in this regard is needed and is left for future work.

    Table 5. Mean embedding efficiency of F5 and MB1

    bpc      0.05    0.1     0.2     0.4
    F5       2.8695  2.4586  2.0606  1.7484
    MB1      2.1141  2.1139  2.1142  2.1141
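Table 5 can be turned into an estimate of how many coefficients each method modifies. Assuming that embedding efficiency means bits embedded per changed coefficient and that bpc counts bits per non-zero AC coefficient, embedding bpc × N bits into an image with N non-zero AC coefficients requires about bpc × N / efficiency changes. The sketch below applies this to N = 1000; `EFFICIENCY` and `changes_per_1000` are hypothetical names.

```python
# Mean embedding efficiency from Table 5 (assumed: bits embedded per changed coefficient).
EFFICIENCY = {
    "F5": {0.05: 2.8695, 0.1: 2.4586, 0.2: 2.0606, 0.4: 1.7484},
    "MB1": {0.05: 2.1141, 0.1: 2.1139, 0.2: 2.1142, 0.4: 2.1141},
}


def changes_per_1000(method, bpc):
    """Expected number of changed coefficients per 1000 non-zero AC coefficients."""
    bits = bpc * 1000  # assumes bpc = bits per non-zero AC coefficient
    return bits / EFFICIENCY[method][bpc]


for bpc in (0.05, 0.1, 0.2, 0.4):
    f5, mb1 = changes_per_1000("F5", bpc), changes_per_1000("MB1", bpc)
    print(f"{bpc:>4} bpc   F5: {f5:6.1f}   MB1: {mb1:6.1f}")
```

Run as written, it shows F5 needing fewer changes than MB1 at 0.05 and 0.1 bpc and more at 0.2 and 0.4 bpc, matching the observation above.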

    6 Conclusion

    We have presented an effective steganalysis scheme that outperforms the state of the art in detecting modern steganographic methods for JPEG images, namely OutGuess, F5, and MB1. Its success can be attributed to the following measures taken in the new scheme.

    (1) Taking absolute values in forming JPEG 2-D arrays both improves steganalysis capability and reduces computational complexity.

    (2) Difference JPEG 2-D arrays along the horizontal, vertical, main diagonal, and minor diagonal directions enhance the changes caused by steganographic methods.

    (3) The thresholding technique applied to the transition probability matrices greatly reduces the dimensionality of the feature vectors, to a manageable extent.

    (4) By modeling the difference JPEG 2-D arrays with a Markov process and using all elements of the transition probability matrices as features, the proposed steganalyzer exploits second-order statistics.

    References

    1. A. Westfeld, F5 - a steganographic algorithm: High capacity despite better steganalysis, 4th International Workshop on Information Hiding, Pittsburgh, PA, USA, 2001

    2. N. Provos, Defending against statistical steganalysis, 10th USENIX Security Symposium, Washington DC, USA, 2001

    3. P. Sallee, Model-based methods for steganography and steganalysis, International Journal of Image and Graphics, 5(1): 167-190, 2005

    4. M. Kharrazi, H. T. Sencar, and N. Memon, Image Steganography: Concepts and Practice, Lecture Note Series, Institute for Mathematical Sciences, National University of Singapore, 2004

    5. H. Farid, Detecting hidden messages using higher-order statistical models, International Conference on Image Processing, Rochester, NY, USA, 2002

    6. Y. Q. Shi, G. Xuan, D. Zou, J. Gao, C. Yang, Z. Zhang, P. Chai, W. Chen, and C. Chen, Steganalysis based on moments of characteristic functions using wavelet decomposition, prediction-error image, and neural network, International Conference on Multimedia and Expo, Amsterdam, Netherlands, 2005

    7. J. Fridrich, Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes, 6th Information Hiding Workshop, Toronto, ON, Canada, 2004

    8. K. Sullivan, U. Madhow, S. Chandrasekaran, and B. S. Manjunath, Steganalysis of Spread Spectrum Data Hiding Exploiting Cover Memory, the International Society for Optical Engineering, Electronic Imaging, San Jose, CA, USA, 2005

    9. Y. Q. Shi and H. Sun, Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms and Standards, CRC Press, 1999

    10. A. Leon-Garcia, Probability and Random Processes for Electrical Engineering, 2nd Edition, Addison-Wesley Publishing Company, 1994

    11. C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 2(2): 121-167, 1998

    12. C. C. Chang and C. J. Lin, LIBSVM: a library for support vector machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm

    13. http://www.outguess.org/

    14. http://wwwrn.inf.tu-dresden.de/~westfeld/f5.html

    15. http://redwood.ucdavis.edu/phil/papers/iwdw03.htm

    16. M. Kharrazi, H. T. Sencar, and N. D. Memon, Benchmarking steganographic and steganalysis techniques, Security, Steganography, and Watermarking of Multimedia Contents 2005, San Jose, CA, USA, 2005