JPEG steganography detection with Benford's Law


Contents lists available at SciVerse ScienceDirect

Digital Investigation

journal homepage: www.elsevier.com/locate/diin

Digital Investigation 9 (2013) 246–257

Panagiotis Andriotis*, George Oikonomou, Theo Tryfonas

Crypto Group, University of Bristol, Faculty of Engineering, Merchant Venturers Building, Woodland Road, Bristol BS8 1UB, UK

* Corresponding author. Tel.: +44 117 33 15740; fax: +44 117 33 15719.
E-mail addresses: [email protected] (P. Andriotis), g.oikonomou@bristol.ac.uk (G. Oikonomou), [email protected] (T. Tryfonas).

Article info

Article history:
Received 29 October 2012
Received in revised form 23 January 2013
Accepted 25 January 2013

Keywords:
Steganalysis
Generalized Benford’s Law
Steganography detection
Data hiding
Quantized DCT coefficients

1742-2876/$ – see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.diin.2013.01.005

Abstract

In this paper we present a novel approach to the problem of steganography detection in JPEG images by applying a statistical attack. The method is based on the empirical Benford’s Law and, more specifically, on its generalized form. We prove and extend the validity of the logarithmic rule in colour images and introduce a blind steganalytic method which can flag a file as a suspicious stego-carrier. The proposed method achieves very high accuracy and speed and is based on the distributions of the first digits of the quantized Discrete Cosine Transform coefficients present in JPEGs. In order to validate and evaluate our algorithm, we developed steganalytic tools which are able to analyse image files and we subsequently applied them to the popular Uncompressed Colour Image Database. Furthermore, we demonstrate that not only can our method detect steganography but, if certain criteria are met, it can also reveal which steganographic algorithm was used to embed data in a JPEG file.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

The use of several means of covert communication is appealing among individuals or groups that are interested in securing the content of an exchange and concealing the act of their interactions. Steganography is one of the methods which have been introduced in order to hide information and covertly spread hidden data through public channels without causing suspicion. JPEG images constitute a widely used medium of secret communication, partially thanks to the fact that they can be produced by any camera, smartphone or image processing tool and can be easily exchanged between a variety of applications (McBride et al., 2005).

Steganography aims to transport a message in a hidden fashion by embedding it in a transport medium called a carrier (Fridrich et al., 2001). The grouping of the carrier with the secret message is known as a stego medium or stego cover. The detection of steganographic algorithms and techniques can be a hard task, even more so if the secret data are encrypted with a stego key. Steganalysis is the process of attacking and breaking steganographic methods, either by simply detecting the presence of a secret message or by extracting and potentially destroying it (Chandramouli et al., 2004). The success of a steganalytic method can be quantified either by the accuracy of the prediction of a secret message’s presence in a stego object or by the extraction of the hidden information. Steganalysis methods can be further classified into two broad categories: targeted and blind (or universal). In targeted steganalysis the attack is mounted against an already known embedding technique. Blind steganalysis methods aim to determine whether an object is carrying a hidden message, without any a priori knowledge.

When the stego carrier is a JPEG image, steganalysis is prominently based on two approaches: visual and statistical attacks (Westfeld and Pfitzmann, 2000; Jolion, 2001). Visual attacks demand long training steps and a significant amount of resources. Statistical attacks are more resource-efficient and, as a result, several can be found in the literature (Chandramouli and Subbalakshmi, 2004). These are based on the fact that the images’ histograms or high order statistics get modified after the steganographic techniques take place. Modern blind steganalytic schemes engage supervised learning to differentiate between the plain media and stego images and also distinguish the data hiding algorithm used for steganography (Solanki et al., 2007).

Benford’s empirical law of anomalous numbers (Benford, 1938) has been successfully used in the past for fraud detection in the accountancy sector. It has also been demonstrated that the law in a generalized form can be employed to perform a series of forensic tasks on JPEG images, such as the detection of double compression (Fu et al., 2007). However, that work was limited to grey scale images. The generalized Benford’s Law has been employed for steganalysis elsewhere (Zaharis et al., 2011), but there it was applied to raw byte values and not from an image analysis perspective.

In this context, this paper’s contribution is two-fold:

• We adopt the generalized Benford’s Law as the basis of a novel statistical attack for blind steganalysis and we provide evidence of its applicability to colour JPEG images.

• We demonstrate that the attack can perform steganalysis very quickly and achieves a satisfactory detection rate.

The proposed attack is based on an analysis of the quantized coefficients of a large number of colour images. Our analysis indicates that it is possible to predict the behaviour of the distributions of their significant digits, so any disturbance of these distributions can be considered an indication of the presence of steganography. By studying the deviations of these distributions, we propose a decision-making model based on our findings related to the behaviour of digit 2. Moreover, we developed a set of automated tools which implement the attack and can be used to conduct blind steganalysis and thus help forensic analysts to identify suspicious colour JPEG images. In order to validate the method and assess its performance, we used it to analyse files taken from a widely-used database of approximately 1340 images, enriched by our own set created with a smartphone. Our analysis includes a comparative evaluation with the open source steganalysis software Stegdetect.

The rest of this paper is organized as follows. In Section 2, we highlight our main motivations and discuss some theoretical background. In Section 3 we present our new detection algorithm. The experimental results are provided in Section 4 and the discussion of the results can be found in Section 5. Finally, in Section 6 we present results from testing our method in various steganalytic tasks. The conclusion is drawn in Section 7.

2. Theoretical background and motivation

The term JPEG comes from the consortium that created the standard (Joint Photographic Experts Group). It is one of the most common image formats and it is widely used by manufacturers of digital consumer products such as digital cameras. It arose from the need to exchange images across different platforms and applications. The main goal of JPEG compression is to discard information which is imperceptible to the human eye while leaving the aesthetic details of the image unchanged. Simultaneously, JPEG compression reduces image data size. A detailed presentation of the procedure followed in order to compress a data stream with the JPEG standard can be found in Wallace (1992). Usually the Discrete Cosine Transform (DCT) encoding procedure consists of six basic steps: conversion of the representation of colours from RGB (Red, Green, Blue) to YCbCr, downsampling of the chrominance values (usually by a factor of two), transformation of values to frequencies (using 8 × 8 pixel blocks), quantization, zigzag ordering, and lossless compression using a variant of Huffman encoding.

In more detail, an image consists of pixels and each pixel usually has three bytes that represent its three basic colour components: Red, Green and Blue. The first step of the JPEG encoding procedure is to convert these pixel values from RGB to YCbCr, another colour space that also has three components. Y represents the brightness of an image and is called luminance, while Cb and Cr represent colours and are called chrominance. It is known that the human eye can recognize differences in the luminance of an image more easily than differences in its chrominance (Lee et al., 2006). The type II DCT provides the input to the quantization process. It is a mathematical transformation (based on cosine functions) that converts the pixel values of 8 × 8 blocks to blocks of 64 frequency coefficients. These numbers are critical for our method.
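The transformation step can be illustrated with a direct, unoptimised implementation of the orthonormal type II DCT on a single 8 × 8 block. This is only a sketch for intuition: the function name `dct2_block` is hypothetical, real encoders use fast factorised transforms, and the level shift and quantization steps of JPEG are omitted here.

```python
import math

N = 8  # JPEG works on 8 x 8 blocks


def dct2_block(block):
    """Return the type II 2-D DCT of an 8 x 8 block (list of 8 rows)."""
    def alpha(k):
        # Orthonormal scaling: the zero-frequency term uses a smaller factor.
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    def basis(pos, freq):
        # Cosine basis function sampled at pixel position `pos`.
        return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = sum(block[x][y] * basis(x, u) * basis(y, v)
                        for x in range(N) for y in range(N))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


# A perfectly flat block has energy only in the DC (upper-left) coefficient.
flat_block = [[10] * N for _ in range(N)]
coeffs = dct2_block(flat_block)
```

For the flat block, only `coeffs[0][0]` is non-zero (up to floating-point error), which is why busy image regions, with many non-zero AC coefficients, matter for the statistics discussed later.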

A digital image, and especially a JPEG image, can be a perfect cover medium because it usually has large amounts of space where one can embed information. Numerous factors contribute to a successful embedding procedure, such as the embedding technique and the cover image characteristics (McBride et al., 2005). A general assumption is that the image should be busy, meaning that it should lack large areas of similarity. Popular techniques used to hide information in images are Least Significant Bit (LSB) and DCT encoding. Embedding techniques focus on the quantized DCT coefficients and they usually embed data by applying LSB encoding to those coefficients that are not equal to zero. McBride et al. (2005) list tools that use the quantized DCT coefficients to embed data in JPEG images. They rely on the fact that the procedures which follow the quantization phase are lossless, so the hidden information can later be recovered. Indicative algorithms from this category are Jsteg, Outguess, JPHide and F5. These techniques introduce irregularities in the statistics of the quantized DCT coefficients of a colour JPEG image. Our goal is to reliably detect such irregularities.
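To make the idea of LSB embedding in quantized coefficients concrete, the following is a minimal sketch of plain LSB replacement. It is not the algorithm of any specific tool named above: the function names are hypothetical, coefficients equal to 0 or 1 are skipped (an assumed convention that keeps the set of usable coefficients identical before and after embedding), the DC coefficient is not treated specially, and no encryption or pseudo-random ordering is applied.

```python
def embed_bits(coeffs, bits):
    """Write message bits into the LSBs of usable coefficients (not 0 or 1)."""
    out = list(coeffs)
    pending = iter(bits)
    for i, c in enumerate(out):
        if c in (0, 1):
            continue  # skipped so that extraction sees the same usable set
        try:
            bit = next(pending)
        except StopIteration:
            break  # whole message embedded
        out[i] = (c & ~1) | bit  # clear the LSB, then set it to the message bit
    return out


def extract_bits(coeffs, count):
    """Read back `count` bits from the usable coefficients."""
    usable = [c for c in coeffs if c not in (0, 1)]
    return [c & 1 for c in usable[:count]]


cover = [154, 32, 1, 19, 2, 0, 0, 0]
message = [1, 0, 1, 1]
stego = embed_bits(cover, message)
recovered = extract_bits(stego, len(message))
```

In this toy example the coefficient 2 becomes 3, so its first digit changes. Small coefficients are the most numerous, which gives some intuition for why embedding can disturb the first-digit statistics studied in this paper.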

Statistical attacks aim to determine whether the examined data comply with specific statistical rules that normal image files would follow. A very popular attack is the Chi-squared test, which compares the statistical behaviour of a suspected image with the theoretically expected properties of its carrier (Westfeld and Pfitzmann, 2000). Histogram attacks, which can also be classified as statistical, depict disturbances in the distribution of the frequencies of DCT coefficients of a JPEG image. These figures can reveal the existence of a steganographic attempt. A comprehensive and well informed work on steganalysis trends was published by Chandramouli and Subbalakshmi (2004).

¹ The upper left coefficient of each block.

Nowadays, machine learning techniques are common in the field of steganalysis. These techniques are based on image features which get altered during the embedding process, and machine learning is the de facto standard procedure for dealing with them, utilising support vector machines (SVM) and, lately, ensemble classifiers (Zong et al., 2012; Kodovsky et al., 2012). The features constitute a model for natural, pure images which can be used against the suspected stego carriers. However, despite their accuracy, these techniques are time consuming, they introduce extensive training steps and their complexity is high. For this reason we implement a new model based on Benford’s Law and introduce a method to identify stego carriers in a fast, simple and efficient manner.

2.1. Benford’s empirical law of anomalous numbers

The first attempt to decode the behaviour of the first digits in a set of natural numbers was conducted towards the end of the 19th century by Newcomb (1881). His note presents a table which lists the probabilities of occurrence of the first digits of a set of natural numbers. Numbers cannot be zero and they have more than one digit. More than fifty years later Benford (1938) rediscovered and restated the law. He investigated large groups of natural numbers and observed that, in all selected groups, the probability of the first digit x of a number being 1 is higher than that of the first digit being 9. Furthermore, the distribution of the appearance of the first (or significant) digits in a set of natural numbers follows a logarithmic rule. Therefore:

$$P(x = 1) > P(x = 2) > \dots > P(x = 9).$$

The mathematical equation which describes the first digits law is presented in Equation (1):

$$p(n) = \log_{10}\left(1 + \frac{1}{n}\right), \quad n = 1, \dots, 9. \tag{1}$$

p(n) represents the probability of n being the first digit of a number in a set of natural numbers. Sets should contain as many numbers as possible in a random fashion. This empirical law is applicable to different groups of natural numbers such as population, addresses, drainage and death rates.
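As a quick numerical check of Equation (1), the short sketch below tabulates the nine Benford probabilities. The function name `benford` is simply chosen for illustration.

```python
import math


def benford(n):
    """Probability that n (1..9) is the first digit, per Equation (1)."""
    return math.log10(1 + 1 / n)


probs = {n: benford(n) for n in range(1, 10)}

for digit, p in probs.items():
    print(f"digit {digit}: {p:.4f}")
```

The values decrease monotonically from about 0.301 for digit 1 to about 0.046 for digit 9, and they sum to 1 because the product of the terms (1 + 1/n) for n = 1 to 9 telescopes to 10.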

According to this empirical view we are able to predict that, in a set of natural numbers, it is more probable to find numbers whose significant digit is 1 than 8 or 9. This law appears to defy common sense, but it is now widely used in the area of expenses and accounting fraud detection and has also been applied in various social settings. For instance, Schäfer et al. detected fraud and fake survey results using Benford’s Law (Schäfer et al., 2004). The basic principle behind all examples is that natural data generally follow the first digit law quite well, in contrast to maliciously changed or randomly guessed data. Some attempts to utilize the findings of the logarithmic law can also be encountered in literature related to image processing and digital forensics (Jolion, 2001; Fu et al., 2007; Pérez-González et al., 2007).

2.2. Generalized Benford’s Law

In 2007, Fu et al. presented a new approach to image forensic analysis using the law of anomalous numbers and studied in depth the behaviour of the JPEG image block coefficients (Fu et al., 2007). That work draws some conclusions about the validity of Benford’s Law for the most significant digits of the DCT coefficients (before quantization) of the 8 × 8 pixel blocks of any grey scale JPEG image; the DC coefficients¹ are excluded from the research. Experiments were conducted considering only 8 bit grey scale pictures, using as main reference a widely used dataset of TIFF images called the Uncompressed Colour Image Database (UCID) (Schaefer and Stich, 2004). The use of such a database guaranteed that those images had never before been JPEG compressed. The authors also examined the distribution of the first digits of the quantized DCT coefficients that emerge after the quantization process. After completing the calculation of the appearance of significant digits of the DCT coefficients in this set of images, their mean distribution was obtained. The significant digits of DCT coefficients conform quite well to Benford’s Law, with goodness of fit confirmed using the χ² divergence.

By conducting thorough experiments on the same set of images, the authors also calculated the mean distribution of the first digits of the quantized DCT coefficients under different quality compression factors (QF = 100, 90, 80, 70, 60, 50). The results show that the distributions of those coefficients also follow a logarithmic trend. A comparison between the mean distributions that they obtained for each compression quality and the expected Benford’s Law distributions revealed that the quantized coefficients do not follow Equation (1) as strictly as the unquantized DCT coefficients do. However, there is also a logarithmic law behind the distribution of the first digits of the quantized DCT coefficients. The model they proposed is described by the following Equation (2):

$$p(n) = N \cdot \log_{10}\left(1 + \frac{1}{s + n^{q}}\right), \quad n = 1, 2, \dots, 9. \tag{2}$$

N, s and q are parameters which describe precisely those distributions under different compression quality factors. In the special case where N = 1, s = 0 and q = 1, Equation (2), which is called the generalized Benford’s Law (gBL) (Fu et al., 2007), reduces to Benford’s Law, Equation (1).
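Equation (2) and its special case can be checked numerically with a few lines of Python. The function name `gbl` is chosen for illustration; the parameter values in the second call are those listed for quality factor 75 in Table 1 of this paper.

```python
import math


def gbl(n, N, s, q):
    """Generalized Benford's Law, Equation (2), for first digit n (1..9)."""
    return N * math.log10(1 + 1 / (s + n ** q))


# Special case N = 1, s = 0, q = 1: Equation (2) reduces to Equation (1).
classic = [gbl(n, N=1, s=0, q=1) for n in range(1, 10)]

# Expected first-digit distribution for QF = 75 (parameters from Table 1).
qf75 = [gbl(n, N=1.396, s=-0.3549, q=1.731) for n in range(1, 10)]
```

With the QF = 75 parameters, digit 1 is expected in roughly 57% of the non-zero AC coefficients and digit 9 in a little over 1%, a much steeper decay than the classic law.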

3. Method and algorithm

Our method focuses on the distributions of the significant digits which can be extracted from the quantized coefficients of colour JPEG images. The decompression of a JPEG image is exactly the inverse of the process we presented in Section 2. In Section 2.2 we underlined that the gBL was proposed by Fu et al. (2007) by investigating 8 bit grey scale images only. Thus, only the luminance of each image was taken into consideration. For this reason we decided to investigate the behaviour of the quantized DCT coefficients of all the components of a JPEG image, both luminance and chrominance. The investigation contributes to previous work by extending the results and by creating a new reference as a basis to describe the expected distributions of the quantized DCT coefficients of a colour JPEG image. Knowledge of the compression quality is critical at this phase. The compression quality factor can be revealed by looking at the image’s metadata. In our experiments we used the standard luminance and chrominance quantization tables, provided by the Independent JPEG Group (IJG).
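How a quality factor relates to a quantization table can be illustrated with the scaling rule used by the IJG reference implementation (libjpeg). The sketch below is an illustration rather than the authors' tool: it scales the standard luminance table from Annex K of the JPEG standard and recovers the quality factor by brute-force comparison. The function names are hypothetical, and the approach only applies to files written with these standard tables.

```python
# Standard luminance quantization table (JPEG standard, Annex K), row by row.
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def ijg_scaled_table(quality, base=BASE_LUMA):
    """Scale a base table to a quality factor (1..100), libjpeg style."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Entries are rounded and clamped to the baseline range 1..255.
    return [max(1, min(255, (b * scale + 50) // 100)) for b in base]


def estimate_quality(table, base=BASE_LUMA):
    """Return the quality factor whose scaled table matches, else None."""
    for quality in range(1, 101):
        if ijg_scaled_table(quality, base) == list(table):
            return quality
    return None


table_75 = ijg_scaled_table(75)
recovered_qf = estimate_quality(table_75)
```

At quality 50 the scaled table equals the base table, and at quality 100 every entry becomes 1, so the coefficients are barely quantized at all.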

The basic steps of our method include the calculation of the appearance of the significant digits of the quantized DCT coefficients of all the components of a colour JPEG image. For example, if the first row of an 8 × 8 block of coefficients is [154 32 1 19 2 0 0 0], the first digits are [x 3 1 1 2 x x x] (154 is the DC coefficient and is excluded, and the zeros are not taken into consideration). Then we estimate their expected distribution (given by Equation (2)) and finally compare the deviations between the expected and the calculated distributions. We use this information to decide if the image is suspicious or not. In some cases, the same data can be used to determine exactly which steganography algorithm was used to embed the hidden object. We analysed the behaviour of the digits using specific quality factor compressions: QF = 100, 90, 80, 75, 70, 60, 50.
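The worked example in this paragraph can be reproduced with a small helper, reading the example row as 154, 32, 1, 19, 2, 0, 0, 0. The function names are hypothetical, and taking the absolute value of negative coefficients is an assumption that the text does not spell out.

```python
def first_digit(value):
    """Most significant decimal digit of a non-zero integer (sign ignored)."""
    value = abs(int(value))
    while value >= 10:
        value //= 10
    return value


def first_digits_of_row(row, has_dc=True):
    """First digits of the non-zero AC coefficients in a row of a block."""
    ac = row[1:] if has_dc else row  # drop the DC term in the first row
    return [first_digit(c) for c in ac if c != 0]


row = [154, 32, 1, 19, 2, 0, 0, 0]
digits = first_digits_of_row(row)  # DC coefficient 154 and zeros are skipped
```

Only the first row of a block contains the DC coefficient, so the other seven rows would be processed with `has_dc=False`.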

In order to achieve this, we need a model to represent the distributions of the quantized DCT coefficients of any colour image. This is feasible if we prove that Equation (2) is still a reliable model that describes the probability of appearance of the first digits of the quantized DCT coefficients of a JPEG image, even if these were collected from all the components of the image, luminance as well as chrominance. We used the second version of the UCID for this experiment, which contains 1338 uncompressed TIFF images. A Matlab script was written to compress them with different quality factors. The script used Matlab’s functions imread and imwrite and compressed the images within seconds. As a result, we accumulated seven groups of 1338 JPEG images that had never been compressed before. This step was vital for the accuracy of our work because we were able to know the compression history of each image. Secondly, we calculated the distributions of the first digits of the quantized DCT coefficients. After this step the mean distributions for each digit were calculated by Matlab. The algorithm that was used can be described by the following pseudo code.
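The counting and averaging steps described in this paragraph can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' Matlab code: each image is assumed to be available as a flat list of quantized AC coefficients (DC terms already removed, all components pooled), and the function names are hypothetical.

```python
from collections import Counter


def first_digit(value):
    """Most significant decimal digit of a non-zero integer (sign ignored)."""
    value = abs(int(value))
    while value >= 10:
        value //= 10
    return value


def digit_distribution(ac_coeffs):
    """Relative frequency of first digits 1..9 among non-zero coefficients."""
    counts = Counter(first_digit(c) for c in ac_coeffs if c != 0)
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in range(1, 10)]


def mean_distribution(images):
    """Average the per-image distributions digit by digit."""
    dists = [digit_distribution(coeffs) for coeffs in images]
    return [sum(d[i] for d in dists) / len(dists) for i in range(9)]


# Toy data standing in for two images of the same quality factor.
image_a = [32, 1, 19, 2, 0, 0]
image_b = [1, 1, -3, 25, 0]
mean_dist = mean_distribution([image_a, image_b])
```

Averaging per-image distributions, rather than pooling all coefficients into one histogram, gives every image the same weight regardless of how many non-zero coefficients it has. Which of the two the original script did is not stated, so this choice is an assumption.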

We estimated the goodness-of-fit of the generalized Benford’s Law using Matlab’s Curve Fitting Toolbox. To avoid the calculation of any complex values by Matlab we had to define the boundaries of parameters N, s, q. The use of the Curve Fitting Toolbox for all quality factors led to the conclusion that the gBL can describe the distributions of the first digits of the quantized DCT coefficients of a colour JPEG image in a very satisfactory manner. As a matter of fact, the statistics that Matlab provides to assess the fitting results show that the gBL describes the mean distributions almost perfectly (reported R-square = 1, Adjusted R-square = 1). Table 1 presents the values of parameters N, q, s for each quality factor. There is also a column which reports the Sum of Squares Due to Error (SSE). SSE is another fitting statistic that Matlab provides, and Table 1 shows that in our case this error is negligible (of the order of 10⁻⁶ or smaller).
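For reference, the two fit statistics quoted above have the following standard definitions, written here for the nine first digits. The symbols are introduced only for this note: y_n is the observed mean frequency of digit n, ŷ_n is the gBL prediction from Equation (2), and ȳ is the mean of the nine observed values.

```latex
\mathrm{SSE} = \sum_{n=1}^{9} \left( y_n - \hat{y}_n \right)^2 ,
\qquad
R^2 = 1 - \frac{\mathrm{SSE}}{\sum_{n=1}^{9} \left( y_n - \bar{y} \right)^2 }
```

With an SSE of the order of 10⁻⁶, the ratio in the second expression is tiny, which is consistent with an R-square that is reported as 1 after rounding.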

We are now able to calculate the expected distributions of the first digits of the quantized DCT coefficients. The idea behind this concept is that, given the quantization table of the luminance of a JPEG image, we can obtain the compression quality that was used during encoding. Afterwards, we can calculate the observed distributions of the first digits of the coefficients and compare them with the expected distributions. We are then able to estimate the deviations between the distributions (observed and expected) and decide if the JPEG image is suspicious or not. In our research we used the percentage of the deviations because it makes the comparison between first digit distributions more reasonable. For example, digit 9’s distribution is always between 1 and 2% and digit 1’s distribution can vary from 55 to 60%. Their deviations should be measurable and comparable, and this is why we use the percentage of deviations as a common measurement system.
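One plausible reading of "percentage of the deviations" is the absolute difference between observed and expected frequency, expressed relative to the expected frequency. The text does not give the exact formula, so the definition below is an assumption, and the function name and the sample frequencies are purely illustrative.

```python
def percent_deviations(observed, expected):
    """Relative deviation (%) of each observed frequency from its expectation."""
    return [abs(o - e) / e * 100 for o, e in zip(observed, expected)]


# Illustrative frequencies for digits 1 and 9 only (not measured data).
expected = [0.5676, 0.0135]
observed = [0.5800, 0.0150]
deviations = percent_deviations(observed, expected)
```

With this definition a shift of 0.0015 in the rare digit 9 counts for more (about 11%) than a shift of 0.0124 in the frequent digit 1 (about 2%), which is the kind of comparability across digits that the paragraph asks for.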

Subsequently, we measured the impact of steganography on these distributions. We chose random images from our seven sets (each set contained 1338 JPEG files) and we embedded data with JPHide, Outguess and Vsl. JPHide and Outguess hide data in the quantized DCT coefficients. We then calculated the deviations of the distributions of the first digits of the quantized DCT coefficients for each potential stego carrier. By doing this, we gained a clear picture of the effects that these algorithms have on the distributions of the first digits. Table 2 shows the % deviations of an image before and after we embedded a text file with JPHSWIN, and their (absolute) difference.

Table 1
Goodness of fit for the gBL model for luminance and chrominance.

| Quality factor | N | q | s | Goodness-of-fit (SSE) |
|---|---|---|---|---|
| 100 | 1.608 | 1.605 | 0.0702 | 5.129e-06 |
| 90 | 1.25 | 1.585 | -0.405 | 7.235e-07 |
| 80 | 1.344 | 1.685 | -0.376 | 3.007e-06 |
| 75 | 1.396 | 1.731 | -0.3549 | 3.986e-06 |
| 70 | 1.434 | 1.766 | -0.339 | 4.455e-06 |
| 60 | 1.514 | 1.843 | -0.3114 | 5.464e-06 |
| 50 | 1.584 | 1.909 | -0.2875 | 5.119e-06 |

At this stage we tried to find a reliable indicator that could safely reveal a suspicious image. We focused our interest on the deviations of the distributions of the first digits of the quantized coefficients of pure images and stego carriers. The stego carriers were created from the same pure images but they also contained messages (in .txt format) which were embedded by JPHSWIN. We carefully examined about 480 images compressed with different quality factors. Fig. 1 illustrates deviations of first digit distributions for images compressed with quality factor 75. The solid line indicates deviations that emerged from the inspection of pure images and the dashed line indicates the deviations for the same images after applying steganography to them. The horizontal axis of each panel represents the images we examined and the vertical axis gives the percentage (%) of the deviations of the distributions of the examined digit. The picture we obtained from the figures for images compressed with other quality factors was similar to what we can see in Fig. 1.

The analysis of the difference in deviations of the distributions of pure images and their respective stego carriers reveals that the differences are in most cases larger than 5%. Furthermore, we observe that differences in deviations are more extreme for digits 2, 4, 6 and 8. Fig. 1b further reveals a characteristic of digit 2 that no other digit seems to have. When we examined pure images the deviations of digit 2 were very stable: they usually varied from 0 to 3 or 4%. The deviations of digit 2 after the embedding of a message in the same images were similarly stable, but this time they exceeded the 4% threshold. We cannot see the same behaviour for digit 1, for example (Fig. 1a). Here, the deviations are within a small range, but the two lines are not separated as clearly as the two lines of Fig. 1b. In Fig. 1b we can see that the solid line is almost always below the dashed line. Thus, in most cases, we expect that an image which contains a hidden message will present deviations which are higher than a certain threshold T. In the specific example, a suitable threshold would be T = 3. Notably, the thresholds for all examined quality factors vary between 3 and 4. Taking these findings into consideration we concluded that the most stable and reliable indicator of a suspicious image is the deviation of digit 2. If this deviation exceeds a specific threshold, which depends on the quality factor of the compression of the examined image, we can conclude that the image is suspicious. We approximated these values statistically for each compression quality and present them in Table 3.
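The decision rule described in this paragraph reduces to a single comparison on the deviation of digit 2. The sketch below uses T = 3, the example threshold mentioned in the text, as a default. The per-quality thresholds of Table 3 are not reproduced here, so the caller is assumed to supply the appropriate value, and the function name is hypothetical. The two sample deviations are the digit 2 entries of Table 2.

```python
def is_suspicious(digit2_deviation, threshold=3.0):
    """Flag an image when the % deviation of digit 2 exceeds the threshold."""
    return digit2_deviation > threshold


pure_dev = 0.678150    # digit 2, pure image (Table 2)
stego_dev = 9.001642   # digit 2, same image after JPHSWIN embedding (Table 2)

pure_flag = is_suspicious(pure_dev)
stego_flag = is_suspicious(stego_dev)
```

For this image the pure version stays well below the threshold and the stego version well above it, and the outcome would be the same for any threshold in the stated range of 3 to 4.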

Table 2
Difference between deviations of distributions in a pure image and a stego carrier.

| First digit | Deviations (pure) | Deviations (stego) | \|Difference\| |
|---|---|---|---|
| 1 | 5.117569 | 0.947102 | 4.170467 |
| 2 | 0.678150 | 9.001642 | 8.323492 |
| 3 | 8.373005 | 10.988395 | 2.61539 |
| 4 | 9.832138 | 1.039585 | 8.792553 |
| 5 | 3.874760 | 4.447051 | 0.572291 |
| 6 | 9.937180 | 1.376626 | 8.560554 |
| 7 | 14.700152 | 17.820417 | 3.120265 |
| 8 | 8.818516 | 1.664183 | 7.154333 |
| 9 | 14.713667 | 13.687573 | 1.026094 |

We should underline at this point that we did not manage to verify the previous results on images that were compressed with QF = 100. As a matter of fact, both deviations and differences between the pure images and stego carriers do not seem to follow any rule that complies with our findings for the other quality factors. This phenomenon occurs because, when compressing with quality factor 100, the quantization tables have a very weak effect on the first digits.

We repeated the same tests on JPEG images using Outguess and Vsl as the embedding algorithms. We analysed the data using the same methodology and discovered that the impact of steganography on the distributions of the first digits was significant. Often the difference between the expected and the observed deviations was more than 70%. We also confirmed that the deviations of digit 2 were smooth and the thresholds of Table 3 were sufficient to detect a suspicious JPEG image. A closer look at the effects of the application of steganography with Outguess and Vsl revealed that both algorithms change the image quantization tables when they embed a message. Outguess always uses the quality factor of 75 and Vsl always quantizes with QF = 100. Consequently, the expected distributions of the inspected images are significantly different from the observed ones. Our research revealed that Outguess leaves the quantization table of quality 75 as a fingerprint or signature. The same goes for Vsl, which turns the quality factor of the stego carriers to 100. In other words, the metadata of a stego carrier created by Outguess or Vsl will always indicate that the quality factor used to quantize the block coefficients is QF = 75 or QF = 100, respectively. We used these fingerprints when we built the decision making module of our programs. If we investigate an image which has a quality factor of 75 or 100 and the deviation of digit 2 is really large, we can deduce that Outguess or Vsl was used, respectively.
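The fingerprint logic of this paragraph can be sketched as a small decision function. This is a hypothetical illustration, not the decision module of the authors' programs: the text only says the deviation must be "really large", so the `large_deviation` cut-off is an assumed parameter, as are the function name and the returned labels.

```python
def classify(quality_factor, digit2_deviation, threshold, large_deviation):
    """Combine the digit 2 test with the quality-factor fingerprints."""
    if digit2_deviation <= threshold:
        return "not flagged"
    if digit2_deviation >= large_deviation:
        # A very large deviation at these quality factors matches a fingerprint.
        if quality_factor == 75:
            return "suspicious (Outguess fingerprint)"
        if quality_factor == 100:
            return "suspicious (Vsl fingerprint)"
    return "suspicious"


# Illustrative calls; threshold and cut-off values are placeholders.
verdict_a = classify(75, 45.0, threshold=3.0, large_deviation=20.0)
verdict_b = classify(100, 45.0, threshold=3.0, large_deviation=20.0)
verdict_c = classify(90, 5.0, threshold=3.5, large_deviation=20.0)
```

A file flagged at QF = 75 with only a moderate deviation stays in the generic "suspicious" class, since many ordinary encoders also use quality 75 and the fingerprint alone is not conclusive.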

The research on the behaviour of the first digits of the quantized DCT coefficients of colour JPEG images and the analysis of the data we gathered from their distributions and deviations resulted in the development of a new universal steganalytic tool which we called StegBennie. This tool uses the characteristics of the distributions of digit 2 and is a new approach to the problem of steganalysis of colour JPEG images. StegBennie applies a statistical attack on a JPEG image using the generalized Benford’s Law and estimates whether it is a suspicious image or not. It is an extension of the first tool we developed, compBennie, which was responsible for collecting data from the images and calculating the distributions and their deviations from the expected values. The next section discusses the results we obtained when we tested the new steganalytic method using the UCID and also using a new dataset created by a smartphone.
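To make the kind of computation performed by such a tool concrete, the following Python sketch derives the first-digit distribution of a list of quantized coefficients and compares it with the generalized Benford’s Law in the form given by Fu et al. (2007), p(x) = N log10(1 + 1/(s + x^q)). It is a minimal sketch, not the code of compBennie or StegBennie. The function names are hypothetical, the model parameters N, s and q must be supplied by the caller, and the deviation is assumed here to be the relative difference in percent.

```python
import math
from collections import Counter


def first_digit(value: int) -> int:
    """Return the most significant decimal digit of a non-zero integer."""
    value = abs(value)
    while value >= 10:
        value //= 10
    return value


def observed_distribution(coefficients):
    """Relative frequency of first digits 1..9 over non-zero coefficients."""
    digits = [first_digit(c) for c in coefficients if c != 0]
    if not digits:
        raise ValueError("no non-zero coefficients")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def generalized_benford(n: float, s: float, q: float):
    """Expected first-digit probabilities under the generalized law."""
    return {d: n * math.log10(1 + 1 / (s + d ** q)) for d in range(1, 10)}


def deviations_percent(observed, expected):
    """Assumed deviation measure: relative difference in percent."""
    return {d: abs(observed[d] - expected[d]) / expected[d] * 100
            for d in range(1, 10)}
```

With N = 1, s = 0 and q = 1 the generalized law reduces to the classic Benford’s Law, which gives a quick sanity check of the implementation.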

4. Experiments and results

We evaluated the accuracy of our method in three stages. First, we calculated the algorithm’s false positive rate (FPR). Then we tested the validity of the method on the training set and finally on a set of images taken by a smartphone. Furthermore, we tested the efficiency of our steganalytic tool (StegBennie), which utilizes the proposed method, against a popular steganography detection tool (Stegdetect).

[Figure 1: nine panels, (a) to (i), one for each first digit from 1 to 9. Each panel plots the deviations (%) of that digit (vertical axis) for JPEG images 1 to 33 (horizontal axis), with one series for pure images and one for JPHSWIN stego carriers.]

Fig. 1. Deviations of first digits for quality factor 75.

P. Andriotis et al. / Digital Investigation 9 (2013) 246–257 251

4.1. False positive rate

During the first phase we calculated the percentage of pure images that will be erroneously identified as suspicious (False Positive Rate, FPR). To achieve this goal we analysed all the pure images that were available to us (7 folders containing 1338 colour images each). Table 4 gathers the results from this procedure and also displays the time our tools needed to analyse each folder. Fig. 2 presents the results of Table 4 in a chart. The vertical axis represents the percentage of false estimations for each quality factor. Images assessed as suspicious are presented in black. Evidently, about one image out of three or four will be considered suspicious despite being clear.

Table 3
Threshold for quality factors.

QF    Threshold
50    4.00
60    4.00
70    4.35
75    3.00
80    3.11
90    2.90
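The way the thresholds of Table 3 could drive the decision can be sketched as follows. This is an illustration under assumptions rather than the tool's actual logic: it assumes that the thresholds apply to the deviation (%) of digit 2 and that an image is flagged when its deviation exceeds the threshold for its quality factor. The function name is hypothetical.

```python
# Thresholds per quality factor, copied from Table 3.
THRESHOLDS = {50: 4.00, 60: 4.00, 70: 4.35, 75: 3.00, 80: 3.11, 90: 2.90}


def is_suspicious(quality_factor: int, digit2_deviation: float) -> bool:
    """Flag an image whose digit-2 deviation exceeds the QF threshold.

    Assumption: the comparison is 'deviation greater than threshold'.
    """
    if quality_factor not in THRESHOLDS:
        raise ValueError(f"no threshold for QF = {quality_factor}")
    return digit2_deviation > THRESHOLDS[quality_factor]
```

Under these assumptions a deviation of 3.5 would be flagged at QF = 75 (threshold 3.00) but not at QF = 70 (threshold 4.35).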

4.2. Hit rates

The next step of the evaluation of the steganalytic ability of our method was to examine the percentage of malicious images successfully identified. For this task we used the images referred to in Section 3. We had six folders with 204 randomly picked pure images and six folders with the same images containing hidden data, embedded with JPHSWIN, Outguess or Vsl. Table 5 aggregates the findings of this experiment. The second column of Table 5 shows the false positive rate and the other columns show the hit rate of the method. In other words, columns 3, 4 and 5 give the percentage of images carrying hidden data that were flagged as malicious. Fig. 3 provides a graphical depiction of the contents of Table 5. The vertical axis shows the percentage of stego carriers that were recognized. Lastly, Table 6 presents the percentage of images that were identified as suspicious and declared as maliciously altered by either Outguess or Vsl.

Table 4
The false positive rate (FPR) of our method.

QF    Suspicious    FPR       Processing time
50    444           33.18%    13 s
60    378           28.25%    18 s
70    243           18.16%    19 s
75    398           29.75%    20 s
80    323           24.14%    21 s
90    473           35.35%    23 s

[Figure 2: stacked bar chart. Vertical axis: identified as suspicious (0% to 100%). Horizontal axis: sets of pure JPEG images (q50, q60, q70, q75, q80, q90). Series: Suspicious, Clear.]

Fig. 2. False positive rate for pure images.

[Figure 3: bar chart. Vertical axis: identified (%) as suspicious (0 to 100). Horizontal axis: sets of suspicious JPEG images (QF50, QF60, QF70, QF75, QF80, QF90). Series: JPHSWIN, Outguess, Vsl.]

Fig. 3. Comparison of the hit rates for JPHSWIN, Outguess and Vsl.

Table 6
Effectiveness of algorithm identification.

QF    Identified Outguess (%)    Identified Vsl (%)
50    100                        100
60    100                        100
70    0                          100
75    0                          60
80    0                          60
90    0                          100

In the final stage of our experiments we used a smartphone (HTC Desire). The device uses the same quantization tables we used previously in our research, the standardized IJG tables. Its camera offers three compression qualities: ‘Fine’ stands for QF = 90, ‘High’ for QF = 80 and ‘Normal’ for QF = 70. It also lets the user choose the resolution and the format (‘widescreen’ or ‘standard’) of the image. For this experiment we used about 150 images of different resolutions. The ‘small’ resolution was similar to the resolution of the UCID images. The characteristics of the tested images at each resolution can be seen in Table 7. Note that we also tested the accuracy of the method on a set of JPEG images with a format other than the standard one (‘widescreen’).

4.3. Tests with real data

Here we used the same approach as described in the previous steps. First, we measured the false positive rate. Table 8 presents the results and Fig. 4 shows the findings graphically. The vertical axis represents the percentage of false positive results.

Table 5
Hit rates.

QF    FPR (%)    JPHSWIN (%)    Outguess (%)    Vsl (%)
50    24.47      76.47          100             100
60    24.47      73.53          100             100
70    2.94       82.35          80              100
75    20.59      85.29          20              100
80    29.41      73.53          100             80
90    11.76      67.65          100             100

After acquiring good results for the false positive rate of the method, we embedded data in the images we took with the smartphone. We again used the same algorithms for this task: JPHSWIN, Outguess and Vsl. The results of this experiment are displayed in Table 9. Fig. 5 visualizes the results of Table 9. Again, the vertical axis stands for the percentage of successfully recognized stego carriers. Table 10 shows in which cases the method was able to identify the embedding algorithm.

Lastly, we used the same image sets to compare the speed of our steganalytic tool and its accuracy in identifying suspicious images against the popular JPEG steganalyzer Stegdetect (see footnote 2). We analysed 151 pure smartphone photos with both tools and their performance can be seen in Table 11 (time is measured in seconds).

For the second experiment we used the photos described in Table 9 (a subset of the aforementioned 151). After grouping the ‘small’ and ‘1Mp’ photos into nine sets of stego carriers, we ran the two tools to evaluate their steganalytic ability. The ‘small’ and ‘1Mp’ photo sets were chosen because they have the same size properties as the images in the training set. The results of the latter experiment are collected in Table 12 and presented in Fig. 6.

2 http://www.outguess.org/detection.php.


Table 7
Different formats of real data.

Folders    Pixels
small      640 × 480
1Mp        1280 × 960
3Mp        2048 × 1536
5Mp        2592 × 1952
wide1Mp    1280 × 768

[Figure 4: bar chart. Vertical axis: identified (%) as suspicious (0 to 100). Horizontal axis: sets of pure JPEG images from the smartphone (small, 1Mp, 3Mp, 5Mp, wide1Mp). Series: Normal, High, Fine.]

Fig. 4. Comparison of false positive rate among the different quality factors.


5. Discussion of results

5.1. The false positive rate results

Looking back at Subsection 4.1, we conclude from the results of Table 4 and Fig. 2 that the false positive rate (FPR) of our method is acceptable. We believe that the current FPR is a satisfactory percentage that could reduce the workload of a forensic examiner who performs steganalysis on JPEG images. Furthermore, if the inspected images are compressed with a quality factor of 70, the fault rate of the method is lower than 20%. The last column of Table 4 shows the approximate time in seconds that our steganalytic tool StegBennie needed to analyse the whole folder under examination. Each folder contained 1338 JPEG images at a 512 × 318 resolution. The steganalytic tool is fast regardless of the compression quality factor. Consequently, we showed that a combination of the method and a fast program like StegBennie can perform a trustworthy steganalysis of a folder containing JPEG images in less than half a minute.
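As a worked check of the figures discussed here, the short Python sketch below recomputes the false positive rates of Table 4 from the reported counts of suspicious images (out of 1338 per folder) and derives the processing throughput. The variable and function names are illustrative only.

```python
IMAGES_PER_FOLDER = 1338

# QF -> (images flagged as suspicious, processing time in s, reported FPR %),
# all values copied from Table 4.
TABLE_4 = {
    50: (444, 13, 33.18),
    60: (378, 18, 28.25),
    70: (243, 19, 18.16),
    75: (398, 20, 29.75),
    80: (323, 21, 24.14),
    90: (473, 23, 35.35),
}


def false_positive_rate(suspicious: int, total: int) -> float:
    """Share of pure images wrongly flagged, in percent."""
    return suspicious / total * 100


def images_per_second(total: int, seconds: float) -> float:
    """Average number of images analysed per second."""
    return total / seconds


for qf, (suspicious, seconds, reported) in TABLE_4.items():
    fpr = false_positive_rate(suspicious, IMAGES_PER_FOLDER)
    rate = images_per_second(IMAGES_PER_FOLDER, seconds)
    print(f"QF {qf}: FPR {fpr:.2f}% (reported {reported}%), "
          f"{rate:.0f} images/s")
```

The recomputed rates agree with the reported ones to two decimals, and even the slowest folder (QF = 90, 23 s) is processed at more than 50 images per second.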

5.2. Analysis of successful detection on the UCID set

After embedding text and doc files in various images, we presented Table 5 in Subsection 4.2. The conclusions that arise from examining this table and Fig. 3 are quite satisfactory. The ability of our method to detect a suspicious JPEG image could be characterized as fairly strong. Moreover, the fault rate of our tool does not exceed the limits we saw in Table 4. The hit rates for JPHSWIN confirm that about 4 out of 5 malicious images will be successfully detected. At this point we should state that the results for Outguess and Vsl come from the examination of a small sample of images. The unsatisfactory hit rate for images compressed with QF = 75 comes from the fact that Outguess always uses this specific quality factor during the embedding of data into JPEG images.

Table 8
FPR on real data.

QF                Resolution    Examined images    FPR (%)
Normal QF = 70    small         9                  11.11
                  1Mp           10                 0
                  3Mp           9                  11.11
                  5Mp           8                  12.5
                  wide1Mp       9                  0
High QF = 80      small         10                 10.0
                  1Mp           9                  11.11
                  3Mp           10                 20.0
                  5Mp           8                  37.5
                  wide1Mp       10                 0
Fine QF = 90      small         10                 30
                  1Mp           19                 15.79
                  3Mp           9                  44.44
                  5Mp           10                 30.0
                  wide1Mp       11                 27.27

However, in Table 6 we can see that the percentage of images that were identified as suspicious and declared as maliciously altered by either Outguess or Vsl is at a very good level for images that were altered using QF = 50 or 60. Also, it seems that in most cases we are able to identify algorithms such as Vsl. From this table we deduce that for images with a QF of 70 or higher the identification of Outguess is unlikely to happen. Overall, considering the graphs and tables presented, the steganalytic ability of the method on the pseudo training image set is at a very good level.

5.3. Results for the smartphone images

In the final phase of our tests we examined a set of images captured by a smartphone (Subsection 4.3). Taking a closer look at Table 8, we can conclude that, generally, our initial assumption that the method will correctly identify more than two thirds of clear images holds. When examining images compressed with normal and high quality, the false positive rate was very low. Furthermore, there are cases where the accuracy of the method was exceptional (e.g. at normal quality, when we examined images with a resolution of 1Mp, the false positive rate was 0% in both standard and widescreen format). However, the false positive rate becomes larger when the quality factor of compression is 90. The results also demonstrate that the size of the image does not dramatically affect the validity of the method. More experiments must be conducted to form a clear picture of the false positive rate of the method for images with a format other than the standard one (e.g. ‘widescreen’). An initial attempt to examine the disturbances on 1Mp images shows that the outcome will not be different if the image has a widescreen format. Fig. 4 visualizes the findings of Table 8. Our tool can detect quite precisely clear images which are compressed with normal quality (the false positive rate does not exceed 12.5%). The efficiency of our method for the set of images in fine quality is not as good as for normal or high quality. However, for small sizes the false detection rate is fairly satisfactory.

Table 9
Hit rates for real data.

QF                Resolution    JPHSWIN (%)    Outguess (%)    Vsl (%)
Normal QF = 70    small         88.89          77.78           100.0
                  1Mp           90.0           75.0            100.0
                  3Mp           55.55          75.0            100.0
                  5Mp           33.33          87.5            100.0
                  wide1Mp       66.67          50.0            100.0
High QF = 80      small         100.0          100.0           100.0
                  1Mp           66.67          100.0           100.0
                  3Mp           50.0           100.0           100.0
                  5Mp           50.0           71.43           100.0
                  wide1Mp       60.0           100.0           100.0
Fine QF = 90      small         100.0          100.0           100.0
                  1Mp           66.67          90.0            100.0
                  3Mp           55.55          87.5            100.0
                  5Mp           40.0           100.0           100.0
                  wide1Mp       72.73          90.0            100.0

[Figure 5: three bar charts. In each, the vertical axis is identified (%) as suspicious (0 to 100) and the horizontal axis lists the sets of suspicious images (small, 1Mp, 3Mp, 5Mp, w1Mp). Series: Normal, High, Fine.]

Fig. 5. Success of detection of a stego carrier.

The hit rates presented in Table 9 and Fig. 5 demonstrate the algorithm’s correctness and efficiency. The first inference concerns the accuracy of detecting a stego carrier made by Vsl. This algorithm always causes critical deviations of digit 2, making detection by our tool a trivial task. StegBennie was able to correctly identify the use of Vsl in almost every case (Table 10).

We can also conclude that our method can adequately identify suspicious images up to 1Mp. For larger images containing secret data embedded by JPHSWIN, the probability of correct detection drops. However, we were able to identify larger malicious images with a good hit rate when these images were manipulated by Outguess or Vsl. Recall that, in order to embed secret information, these two algorithms change the quantization tables of the original image. Thus, high detection rates are also very likely to be achieved if we examine stego carriers created by other algorithms that use the same technique (alteration of quantization tables).

The last remark concerns the resolution of the images. Fig. 5a shows that hit rates decrease as the resolution of the image rises. However, if the format of an image changes (in the current experiment from standard to widescreen), the detection ability does not change dramatically. More experiments should be conducted to confirm this assumption.

Table 11 confirms that our steganalytic tool, which uses the proposed method, is about twice as fast as Stegdetect. In this phase we tested the tools with pure images produced by a smartphone. StegBennie’s false positive rate is larger than Stegdetect’s, but we can still advocate for the capability of the method to efficiently distinguish 4 out of 5 pure images and thus reduce the investigator’s workload. Table 12 confirms that we can perform faster analysis of a folder which contains JPEG photos using StegBennie. Moreover, Fig. 6 shows that our steganalytic tool was more accurate than Stegdetect when performing steganalysis on the same sets of stego carriers. Stegdetect was not able to identify Vsl because the latter steganographic tool is newer. Furthermore, Stegdetect did not manage to identify any stego carrier compressed with QF80 or QF90, although it detected about 30% of the stego carriers compressed with QF70. On the other hand, our steganalytic tool was able to accurately identify stego carriers produced by Outguess and Vsl.

To conclude, we illustrated that our method proved to be reliable when it was tested with real data. The results we gathered are similar to those we saw at the training step in Section 4, and the comparison between our tool and the popular Stegdetect showed that StegBennie is faster and achieves better results.

Table 10
Identification (%) of embedding algorithm.

            QF70                        QF80                        QF90
            s    1Mp  3Mp  5Mp  w       s    1Mp  3Mp  5Mp  w       s    1Mp  3Mp  5Mp  w
Outguess    0    0    0    0    0       0    0    0    0    0       10   0    0    0    0
Vsl         100  100  89   100  89      100  89   88   90   63      100  100  100  100  100

Table 11
Comparison of steganalysis elapsed time and false positive rates between StegBennie and Stegdetect (smartphone photos).

Performance    StegBennie    Stegdetect
Time           19.037 s      38.222 s
FPR            17.22%        7.95%

Table 12
Steganography detection elapsed time for StegBennie and Stegdetect. Each cell represents the total time taken to process the entire category.

Stego carriers        StegBennie    Stegdetect
JPH70 (19 photos)     0.892 s       1.712 s
JPH80 (19 photos)     0.868 s       1.48 s
JPH90 (19 photos)     0.912 s       1.192 s
OUT70 (17 photos)     0.724 s       1.392 s
OUT80 (19 photos)     0.852 s       1.7 s
OUT90 (20 photos)     0.908 s       1.756 s
VSL70 (19 photos)     1.108 s       2.12 s
VSL80 (19 photos)     1.128 s       2.156 s
VSL90 (21 photos)     1.188 s       2.316 s

Table 13
Detection rates for modified JPEG images.

Process                          Images    Detection rate
Double JPEG, QF70 → QF70         20        5%
Double JPEG, QF70 → QF90         20        100%
Double JPEG, QF80 → QF80         20        10%
Double JPEG, QF80 → QF90         20        100%
Crop images                      10        10%
Modify images                    10        20%
JPEG compress PNG, BMP images    12        BMP: 83.3%, PNG: 33.3%
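The speed comparison can be quantified with a small Python sketch that derives speed-up ratios from the timings reported in Tables 11 and 12. The names are illustrative, and the ratios are simple quotients of the published times rather than new measurements.

```python
# Elapsed times in seconds, copied from Table 11 (151 pure photos).
TABLE_11_TIME = {"StegBennie": 19.037, "Stegdetect": 38.222}

# Stego set -> (StegBennie time, Stegdetect time), copied from Table 12.
TABLE_12_TIME = {
    "JPH70": (0.892, 1.712), "JPH80": (0.868, 1.48), "JPH90": (0.912, 1.192),
    "OUT70": (0.724, 1.392), "OUT80": (0.852, 1.7), "OUT90": (0.908, 1.756),
    "VSL70": (1.108, 2.12), "VSL80": (1.128, 2.156), "VSL90": (1.188, 2.316),
}


def speedup(stegbennie_seconds: float, stegdetect_seconds: float) -> float:
    """How many times faster StegBennie was than Stegdetect."""
    return stegdetect_seconds / stegbennie_seconds


overall = speedup(TABLE_11_TIME["StegBennie"], TABLE_11_TIME["Stegdetect"])
per_set = {name: speedup(b, d) for name, (b, d) in TABLE_12_TIME.items()}
total_bennie = sum(b for b, _ in TABLE_12_TIME.values())
total_detect = sum(d for _, d in TABLE_12_TIME.values())

print(f"pure photos: {overall:.2f}x")
print(f"all stego sets: {speedup(total_bennie, total_detect):.2f}x")
print(f"smallest gain: {min(per_set, key=per_set.get)}")
```

On the pure photos the ratio is very close to two. On the stego sets StegBennie is faster in every category, although the margin varies from set to set.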

6. Other steganalytic tasks

In this section we discuss the results obtained by testing the ability of our method to perform various steganalytic tasks. Forensic examiners are interested not only in detecting images with hidden data, but also in knowing whether an image has been double compressed, cropped, blurred or, more generally, modified. For this experiment we used the image set presented in Subsection 4.3. These are JPEG images captured by a smartphone. We used the ‘small’ and ‘1Mp’ folders which contained images compressed with quality factor 70 and 80. We made this choice because of the low false positive rate of these images (Table 8). We used the open source image processing tool GIMP 2.0 for Windows to JPEG compress the images a second time. GIMP also helped us crop and blur the images and change their settings and colours. Finally, we JPEG compressed a small set of BMP and PNG images taken from the internet, to see if this procedure can reveal the compression history of an image. The results of this experiment can be seen in Table 13.

Table 13 shows that it was not possible to detect the double compression of a JPEG image when both compression passes used the same quality factor. However, we can flag as suspicious every JPEG image that was double compressed with different quality factors. In contrast, our method fails to recognize images that were cropped or modified by a program like GIMP. We must underline that for this task we compressed the modified images with the initial quality factor. For example, after sharpening an image which had been initially compressed with quality factor 70, we saved the modified image with the same quality factor (QF = 70). Finally, StegBennie is very likely to detect the compression of a BMP to a JPEG. However, the detection rate is lower for PNG images.

In any case, the most useful conclusion from this final experiment is that the method is able to identify a double JPEG compressed image when the quality factor used for the second compression is not the same as the initial one.

[Figure 6: bar chart. Vertical axis: successful detection (%) (0 to 100). Horizontal axis: sets of stego carriers compressed under various QF (JPH70, JPH80, JPH90, OUT70, OUT80, OUT90, VSL70, VSL80, VSL90). Series: StegBennie, StegDetect.]

Fig. 6. Steganography detection ability of StegBennie compared to Stegdetect.


7. Conclusions and future work

In this paper we extended and further validated previous work of Fu et al. (2007) and applied it to colour JPEG images. We used this extension to develop a new blind steganalytic method which utilizes Benford’s Law, introducing a new approach to the statistical steganalysis of JPEG images. Thus, the outcome of our work is twofold: first, we studied the behaviour of the quantized DCT coefficients of colour images and confirmed the applicability of the generalized Benford’s Law (gBL); second, we introduced a novel method which can assist with various steganalytic tasks and developed two fast and accurate new tools implementing it. By using those tools, a forensic expert can reveal the fingerprints left on compressed images by Outguess and Vsl. Results obtained by examining a large image set demonstrate the method’s validity and its ability to detect suspicious images. False positive rates are fair and hit rates are satisfactory. Lastly, the tools can identify the use of Vsl in most cases.

A careful study of the deviations of the distributions of digit 2 revealed a stable, well defined and reliable behaviour compared to the disturbances that various steganographic techniques caused to the distributions of the other first digits of the quantized DCT coefficients. However, an improvement of the decision making module of our method could provide even better results than those we have already seen. The potential of mathematical algorithms that take into consideration the overall disturbances of all digits should be explored. Such an equation should weigh the impact of each digit based on information from previous investigations. In other words, machine learning techniques and neural network theory could be very helpful in optimizing decision making. Also, research should be done to compare the efficiency of the method against modern, well known, state of the art techniques.
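One simple form such a weighting could take is a weighted average of the per-digit deviations, sketched below in Python. This is purely illustrative and not part of the presented method: the function name is hypothetical and the weights would have to be learned or tuned from earlier investigations.

```python
def weighted_disturbance(deviations, weights):
    """Weighted average of the deviations (%) of first digits 1..9.

    Both arguments map each digit 1..9 to a number. The weights are
    hypothetical and would come from previous investigations.
    """
    total_weight = sum(weights[d] for d in range(1, 10))
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[d] * deviations[d] for d in range(1, 10))
    return weighted_sum / total_weight
```

Putting all the weight on digit 2 reproduces a decision based on that digit alone, while uniform weights give the plain mean over all nine digits.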

We would also like to study the effect that the size of an image has on the distributions of the first digits in order to provide more accurate parameters, based on the quality factor, size and resolution of the colour JPEG image. In Subsection 5.3 we saw that the format of an image (standard or widescreen) does not have a significant impact on the outcome of our method. On the other hand, the number of pixels is an important factor and influences the validity of the results.

In addition, we have to consider the effect of the size of the embedded data and measure its impact on the overall validity of the method. We conducted some experiments with a restricted number of images (taken from the ‘small’ folder) to measure the effect of the embedded message’s size on the steganalysis process. The size and resolution of those images are almost equal to those of the training set. Unfortunately, the embedding capacity of those images is very small and we cannot reach a safe conclusion about the impact of the message’s payload size on our method. However, it seems that when we are trying to detect stego carriers created by Outguess or Vsl, the size of the hidden message does not critically affect the steganalytic ability of the method. This finding can be explained by the fact that the aforementioned algorithms change the image quantization tables. In the future we will have to test the steganalytic algorithm with larger images that will give us the chance to embed messages of various sizes and evaluate how they affect its validity.

Thus, further investigation and more thorough experiments must be conducted in order to form a suitable model which considers the quality factor, the number of components, the resolution of the JPEG image and the size of the embedded data.

In addition to the gBL, Fu et al. (2007) have also investigated the distributions of the first digits of the coefficients of the blocks of JPEG images before the quantization step (during the compression of the image). They demonstrate that these distributions adhere to the original Benford’s Law quite well. We have not yet applied this observation to our steganalytic method. Future development should consider this fact because it will probably provide the opportunity to ascertain both the deviations of the distributions of the first digits of the block coefficients (before quantization) and the deviations of the distributions of the quantized DCT coefficients of the JPEG image.

An encouraging conclusion about the application of our blind steganalytic method in different contexts is the success rate we achieved when we tried to identify images that were JPEG compressed for a second time. Table 13 showed that the method fails to detect an image which has been double compressed using the same compression quality factor. In contrast, if the JPEG image has been double compressed and the second quality factor is different from the first, then our technique was 100% accurate in our tests. This finding might allow us to estimate the compression history of a double compressed JPEG image. In other words, we could use the deviations that our tools provide to estimate the initial quality factor of a JPEG image. We can then simply compare the set of distributions that an image provides with the sets of the expected distributions of all the known quality factors. If we find a set for which the deviations between the expected and the observed distributions are minor for all digits, then we can draw conclusions about the initial quality factor of the image.
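The comparison outlined in this paragraph could be sketched in Python as follows. This is a hypothetical illustration of the idea, not an implemented or evaluated feature of the tools: the function names are invented, the expected distributions per quality factor must be supplied by the caller, and "minor deviations of all digits" is approximated here by the smallest summed relative deviation.

```python
def total_deviation(observed, expected):
    """Sum over digits 1..9 of the relative deviation in percent."""
    return sum(abs(observed[d] - expected[d]) / expected[d] * 100
               for d in range(1, 10))


def estimate_initial_qf(observed, expected_by_qf):
    """Pick the quality factor whose expected distribution fits best.

    expected_by_qf maps each known quality factor to its expected
    first-digit distribution (a dict over digits 1..9).
    """
    if not expected_by_qf:
        raise ValueError("no expected distributions supplied")
    return min(expected_by_qf,
               key=lambda qf: total_deviation(observed, expected_by_qf[qf]))
```

A practical version would also need a cut-off on the best total deviation, so that an image matching none of the known quality factors is reported as undetermined instead of being assigned the least bad candidate.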

Acknowledgement

This work has been supported by the European Union’s Prevention of and Fight against Crime Programme “Illegal Use of Internet” – ISEC 2010 Action Grants, grant ref. HOME/2010/ISEC/AG/INT-002.

References

Benford F. The law of anomalous numbers. In: Proceedings of the American Philosophical Society; 1938. p. 551–72.

Chandramouli R, Kharrazi M, Memon N. Image steganography and steganalysis concepts and practice. In: Kalker T, editor. 2nd International Workshop on Digital Watermarking (IWDW 2003). In: Cox I, Ro Y, editors. Lecture Notes in Computer Science, Vol. 2939; 2004. p. 35–49.

Chandramouli R, Subbalakshmi K. Current trends in steganalysis: a critical survey. In: Proc. 8th International Conference on Control, Automation, Robotics and Vision (ICARCV 2004), Vols. 1–3; 2004. p. 964–7.

Fridrich J, Goljan M, Du R. Steganalysis based on JPEG compatibility. In: Proc. conference on multimedia systems and applications IV. Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE); 2001. p. 275–80.

Fu D, Shi YQ, Wei S. A generalized Benford’s law for JPEG coefficients and its applications in image forensics. In: Delp E, editor. Proc. 9th conference on security, steganography, and watermarking of multimedia contents. In: Wong P, editor. Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE), Vol. 6505; 2007. pp. 65051L–65051L–11.

Jolion J. Images and Benford’s law. Journal of Mathematical Imaging and Vision 2001;14(1):73–81.

Kodovsky J, Fridrich J, Holub V. Ensemble classifiers for steganalysis of digital media. IEEE Transactions on Information Forensics and Security 2012;7(2):432–44.

Lee K, Westfeld A, Lee. Category attack for LSB steganalysis of JPEG images. In: Digital watermarking (5th international workshop) IWDW 2006 Jeju Island, Korea, November 8–10, 2006. LNCS, Vol. 4283. Springer-Verlag; 2006. p. 35–48. Revised Papers.

McBride B, Peterson G, Gustafson S. A new blind method for detecting novel steganography. Digital Investigation 2005;2(1):50–70.

Newcomb S. Note on the frequency of use of the different digits in natural numbers. American Journal of Mathematics 1881;4(1):39–40.

Pérez-González F, Heileman GL, Abdallah CT. Benford’s law in image processing. In: Proc. IEEE International Conference on Image Processing (ICIP 2007); 2007. p. 405–8.

Schaefer G, Stich M. UCID – an uncompressed colour image database. In: Yeung M, editor. Proc. conference on storage and retrieval methods and applications for multimedia. In: Lienhart R, Li C, editors. Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE), Vol. 5307; 2004. p. 472–80.

Schäfer C, Schräpler JP, Müller KR, Wagner GG. Automatic identification of faked and fraudulent interviews in surveys by two different methods. DIW-Diskussionspapiere 441. Deutsches Institut für Wirtschaftsforschung (DIW); 2004.

Solanki K, Sarkar A, Manjunath BS. YASS: yet another steganographic scheme that resists blind steganalysis. In: Furon T, editor. Information hiding. Lecture Notes in Computer Science, Vol. 4567; 2007. p. 16–31.

Wallace G. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 1992;38(1):R18–34.

Westfeld A, Pfitzmann A. Attacks on steganographic systems – breaking the steganographic utilities EzStego, Jsteg, Steganos, and S-Tools – and some lessons learned. In: Pfitzmann A, editor. Proc. 3rd international workshop on Information Hiding (IH 99). Lecture Notes in Computer Science, Vol. 1768. Springer Berlin/Heidelberg; 2000. p. 61–76.

Zaharis A, Martini A, Tryfonas T, Illioudis C, Pangalos G. Lightweight steganalysis based on image reconstruction and lead digit distribution analysis. International Journal of Digital Crime and Forensics 2011;3(4):29–41.

Zong H, Liu FL, Luo XY. Blind image steganalysis based on wavelet coefficient correlation. Digital Investigation 2012;9(1):58–68.