IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 4, NO. 3, SEPTEMBER 2009 359

Temporal Derivative-Based Spectrum and Mel-Cepstrum Audio Steganalysis

Qingzhong Liu, Andrew H. Sung, and Mengyu Qiao

Abstract—To improve a recently developed mel-cepstrum audio steganalysis method, we present in this paper a method based on Fourier spectrum statistics and mel-cepstrum coefficients, derived from the second-order derivative of the audio signal. Specifically, the statistics of the high-frequency spectrum and the mel-cepstrum coefficients of the second-order derivative are extracted for use in detecting audio steganography. We also design a wavelet-based spectrum and mel-cepstrum audio steganalysis method. By applying support vector machines to these features, unadulterated carrier signals (without hidden data) and steganograms (carrying covert data) are successfully discriminated. Experimental results show that the proposed derivative-based and wavelet-based approaches remarkably improve the detection accuracy. Of the two new methods, the derivative-based approach generally delivers better performance.

Index Terms—Audio, mel-cepstrum, second-order derivative, spectrum, steganalysis, support vector machine (SVM), wavelet.

I. INTRODUCTION

STEGANOGRAPHY is the art and science of hiding data in digital media such as image, audio, and video files. In contrast, steganalysis is the art and science of detecting information hiding in digital media.

In recent years, many steganalysis methods have been designed for detecting information hiding in multiple steganography systems. Most of these methods focus on detecting digital image steganography. For example, one of the well-known detectors, histogram characteristic function center of mass (HCFCOM), was successful in detecting noise-adding steganography [1]. Another well-known method is to construct a high-order moment statistical model in the multiscale decomposition using a wavelet-like transform and then to apply a learning classifier to the high-order feature set [2]. Shi et al. proposed a Markov-process-based approach to detect information hiding in JPEG images [3]. Based on the Markov approach, Liu et al. expanded the Markov features to

Manuscript received December 04, 2008; revised May 04, 2009. First published June 10, 2009; current version published August 14, 2009. This work was supported by the Institute for Complex Additive Systems Analysis (ICASA), a research division of New Mexico Tech. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jessica J. Fridrich.

Q. Liu and A. H. Sung are with the Department of Computer Science and Engineering and the Institute for Complex Additive Systems Analysis, New Mexico Tech, Socorro, NM 87801 USA (e-mail: [email protected]; [email protected]).

M. Qiao is with the Department of Computer Science and Engineering, New Mexico Tech, Socorro, NM 87801 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIFS.2009.2024718

the interbands of the discrete cosine transform (DCT) domains and combined the expanded features and the polynomial fitting of the histogram of the DCT coefficients, and successfully improved the steganalysis performance in multiple JPEG images [4]. Other works on image steganalysis have been done by Fridrich [5], Pevny and Fridrich [6], Lyu and Farid [7], Liu and Sung [8], and Liu et al. [9]–[11].

Due to the different characteristics of audio signals and images, methods developed for image steganalysis are not directly suitable for detecting information hiding in audio streams, and many research groups have investigated audio steganalysis. Ru et al. presented a method that measures the features between the signal under detection and a self-generated reference signal via linear predictive coding [12], [13], but the detection performance is poor. Avcibas designed a feature set of content-independent distortion measures for classifier design [14]. Ozer et al. constructed a detector based on the characteristics of the denoised residuals of the audio file [15]. Johnson et al. set up a statistical model by building a linear basis that captures certain statistical properties of audio signals [16]. Craver et al. employed cepstral analysis to estimate a stego-signal’s probability density function in audio signals [17]. Kraetzer and Dittmann recently proposed a mel-cepstrum-based analysis to perform detection of embedded hidden messages [18], [19]. By expanding the Markov approach proposed by Shi et al. for image steganalysis [3], Liu et al. designed expanded Markov features for audio steganalysis [20]. Additionally, Zeng et al. presented new algorithms to detect phase coding steganography based on analysis of the phase discontinuities [21] and to detect echo steganography based on statistical moments of peak frequency [22]. Among all these methods, Kraetzer and Dittmann’s mel-cepstrum audio analysis is particularly notable because it is the first to use mel-frequency cepstral coefficients (MFCCs), which are widely used in speech recognition, for audio steganalysis.

In this paper, we propose an audio steganalysis method based on spectrum analysis and mel-cepstrum analysis of the second-order derivative of the audio signal. In spectrum analysis, the statistics of the high-frequency spectrum of the second-order derivative are extracted as spectrum features. To improve Kraetzer and Dittmann’s work [18], we design mel-cepstrum coefficient features that are derived from the second-order derivative. Additionally, for comparison with the second-order derivative-based approach, a wavelet-based spectrum and mel-cepstrum method is also designed. Support vector machines (SVMs) with radial basis function (RBF) kernels [35] are employed to differentiate steganograms from innocent signals. Results show that our derivative-based and wavelet-based methods are

1556-6013/$26.00 © 2009 IEEE

Authorized licensed use limited to: NEW MEXICO INST OF MINING & TECH. Downloaded on August 31, 2009 at 18:45 from IEEE Xplore. Restrictions apply.


Fig. 1. Example of edge detection using derivatives [23].

very promising and possess a remarkable advantage over Kraetzer and Dittmann’s work.

The rest of the paper is organized as follows: Section II presents the second-order derivative for audio steganalysis and the Fourier analysis; Section III introduces Kraetzer and Dittmann’s mel-cepstrum analysis and describes improved mel-cepstrum methods; Section IV presents experiments, followed by discussion in Section V and conclusion in Section VI.

II. TEMPORAL DERIVATIVE AND SPECTRUM ANALYSIS

In image processing, the second-order derivative is widely employed for detecting isolated points, edges, etc. [23]. Fig. 1 shows an example of edge detection by using the second-order derivative. With this in mind, we developed a scheme based on the second-order derivative for audio steganalysis, the details of which are described as follows.

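A digital audio signal is denoted as $f(t)$, $t = 0, 1, 2, \ldots, N-1$. The second derivative of $f(t)$ is $D_f^2(\cdot)$, defined as

$$D_f^2(t) \equiv \frac{d^2 f}{dt^2} = f(t+2) - 2f(t+1) + f(t), \quad t = 0, 1, 2, \ldots, N-3. \tag{1}$$

The stego-signal is denoted $s(t)$, which is modeled by adding a noise or error signal $e(t)$ into the original signal $f(t)$:

$$s(t) = f(t) + e(t). \tag{2}$$

The second-order derivatives of the error term $e(t)$ and the signal $s(t)$ are denoted as $D_e^2(\cdot)$ and $D_s^2(\cdot)$, respectively. Thus,

$$D_s^2(\cdot) = D_f^2(\cdot) + D_e^2(\cdot). \tag{3}$$

The discrete Fourier transforms of $D_s^2(\cdot)$, $D_f^2(\cdot)$, and $D_e^2(\cdot)$ are denoted as $F_k^s$, $F_k^f$, and $F_k^e$, respectively,

$$F_k^s = \sum_{t=0}^{M-1} D_s^2(t)\, e^{-j 2\pi k t / M} \tag{4}$$

$$F_k^f = \sum_{t=0}^{M-1} D_f^2(t)\, e^{-j 2\pi k t / M} \tag{5}$$

$$F_k^e = \sum_{t=0}^{M-1} D_e^2(t)\, e^{-j 2\pi k t / M} \tag{6}$$

where $k = 0, 1, 2, \ldots, M-1$ and $M$ is the number of samples of the derivatives. We have

$$F_k^s = F_k^f + F_k^e. \tag{7}$$

Assume that $\theta$ is the angle between the vectors $F_k^f$ and $F_k^e$; then

$$|F_k^s|^2 = |F_k^f|^2 + |F_k^e|^2 + 2\,|F_k^f|\,|F_k^e| \cos\theta. \tag{8}$$

Since $\theta$ is arbitrary, the expected value of $|F_k^s|^2$ is calculated as follows:

$$E\left(|F_k^s|^2\right) = |F_k^f|^2 + |F_k^e|^2. \tag{9}$$

Divide both sides by $|F_k^f|^2$:

$$\frac{E\left(|F_k^s|^2\right)}{|F_k^f|^2} = 1 + \frac{|F_k^e|^2}{|F_k^f|^2}. \tag{10}$$

Generally speaking, $|F_k^e|^2$ is far smaller than $|F_k^f|^2$ at low-frequency and middle-frequency components, where the modification caused by the addition of hidden data is negligible. However, the situation changes at high-frequency components. Digital audio signals are generally band-limited: the power spectral density is zero or very close to zero above a certain finite frequency. On the other hand, the error term $e(t)$ is assumed to be broadband; in such cases, the modification caused by the addition of hidden data is not negligible in high-frequency components.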

Assume the error $e(t)$ to be a random signal with an expected value of zero. The spectrum of its second-order derivative is approximately depicted by a Gaussian-like distribution [24]. The power is zero at the lowest frequency; as the frequency increases, the spectrum increases. Fig. 2(a) shows a simulated error signal in which 25% of the samples are 1, 50% are 0, and 25% are −1. In this example, we assume the sampling rate is 1000 Hz. Fig. 2(b) shows the spectrum of its second-order derivative (only half the values are plotted because of data symmetry). It demonstrates that the energy of the derivative is concentrated at high frequencies.
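The simulation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the number of samples (1000, i.e., one second at the 1000-Hz rate), the random seed, and the helper name `second_derivative` are hypothetical, and the second-order derivative is taken as the discrete second difference.

```python
import numpy as np

def second_derivative(x):
    """Discrete second-order derivative: x[t+2] - 2*x[t+1] + x[t]."""
    x = np.asarray(x, dtype=float)
    return x[2:] - 2.0 * x[1:-1] + x[:-2]

rng = np.random.default_rng(0)   # fixed seed, for reproducibility only
n_samples = 1000                 # assumed: one second at a 1000-Hz sampling rate

# Error values +1 (25%), 0 (50%), and -1 (25%), as in the simulated signal.
error = rng.choice([1.0, 0.0, -1.0], size=n_samples, p=[0.25, 0.5, 0.25])

d2 = second_derivative(error)
magnitude = np.abs(np.fft.rfft(d2))   # one-sided spectrum; the other half mirrors it

# Compare the energy in the lower and upper halves of the band.
half = magnitude.size // 2
low_energy = float(np.sum(magnitude[:half] ** 2))
high_energy = float(np.sum(magnitude[half:] ** 2))
high_fraction = high_energy / (low_energy + high_energy)
print(f"fraction of energy in the upper half band: {high_fraction:.2f}")
```

For a white error signal, the second difference acts as a high-pass filter, so most of the energy should land in the upper half of the band, which is the behavior the figure is meant to illustrate.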

Regarding the second-order derivative, at the low- and middle-frequency components, the power spectrum of an audio signal is normally much stronger than the power spectrum of the error term caused by data hiding; in other words, the ratio $|F_k^e|^2 / |F_k^f|^2$ of error power to cover power in (10) is almost equal to zero. The difference of the spectrum between a cover and the stego-signal is therefore suppressed at low and

1 Available: http://mathworld.wolfram.com/FourierTransformGaussian.html


Fig. 2. (a) Random error signal in which 25% of the samples are 1, 50% are 0, and 25% are −1; (b) spectrum of the second-order derivative.

Fig. 3. Spectra of the second derivatives of a cover signal (left) and the stego-signal (right).

middle frequency components. However, the situation is very different at the high-frequency components. As frequency increases, the error spectrum $|F_k^e|^2$ increases while the cover spectrum $|F_k^f|^2$ is limited above a certain frequency, so the increase of the spectrum that results from hidden data is no longer negligible; hence, the statistics extracted from the high-frequency components give a clue for detecting the information-hiding behavior.

Fig. 3 shows the spectrum distribution of the second derivative of a 44.1-kHz audio cover and that of the second derivative of the stego-signal generated by embedding some data into the cover. It clearly shows that the high-frequency spectrum of the second-order derivative of the stego-signal has higher magnitude values than that of the cover.

To compare the signal spectrum with the derivative spectrum, Fig. 4 shows the spectra of the same cover and stego-signals without extraction of the second derivatives. Here, too, the addition of hidden data increases the magnitude at high frequencies, although the energy is concentrated at low frequencies.

It is worth noting that a comparison between Figs. 3 and 4 shows that the second derivative amplifies the energy at high frequencies; in particular, it amplifies the energy contributed by the addition of the hidden signal. Therefore, a preprocessing step that extracts the second-order derivative could be more effective for detecting the hidden signal.

Next, we present the following procedure to extract the statistical characteristics of the spectrum.


Fig. 4. Spectra of the same cover and stego-signals that are used in Fig. 3.

1) Obtain the Fourier spectrum of the second-order derivative of the audio signal under test.

2) Calculate statistics, including the mean value, standard deviation, and skewness, of the different frequency zones over the spectrum. In our experiments, we equally divide the entire frequency range into $Z$ zones or parts ($Z$ is set to 80 in Section IV-B), from the lowest to the highest frequency. The mean value, standard deviation, and skewness of the $i$th zone are denoted $m_i$, $\sigma_i$, and $\gamma_i$, respectively.

3) Choose the values $m_i$, $\sigma_i$, and $\gamma_i$ that are extracted from the high-frequency spectrum as the features.
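The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the zone count of 80 follows Section IV-B, and the number of high-frequency zones kept (`n_high=20`) is an assumed value chosen only for the example.

```python
import numpy as np

def second_derivative(x):
    """Discrete second-order derivative: x[t+2] - 2*x[t+1] + x[t]."""
    x = np.asarray(x, dtype=float)
    return x[2:] - 2.0 * x[1:-1] + x[:-2]

def skewness(values):
    """Third standardized moment; returns 0.0 for a constant zone."""
    std = values.std()
    if std == 0:
        return 0.0
    return float(np.mean((values - values.mean()) ** 3) / std ** 3)

def zone_statistics(signal, n_zones=80):
    """Steps 1 and 2: spectrum of the derivative, then per-zone statistics."""
    magnitude = np.abs(np.fft.rfft(second_derivative(signal)))
    zones = np.array_split(magnitude, n_zones)   # ordered from low to high frequency
    means = np.array([z.mean() for z in zones])
    stds = np.array([z.std() for z in zones])
    skews = np.array([skewness(z) for z in zones])
    return means, stds, skews

def high_frequency_features(signal, n_zones=80, n_high=20):
    """Step 3: keep the statistics of the n_high highest-frequency zones.

    n_high is a hypothetical choice made for this sketch.
    """
    means, stds, skews = zone_statistics(signal, n_zones)
    return np.concatenate([means[-n_high:], stds[-n_high:], skews[-n_high:]])

# Usage on a synthetic one-second signal (random noise standing in for audio).
rng = np.random.default_rng(1)
audio = rng.standard_normal(44100)
features = high_frequency_features(audio)
print(features.shape)   # (60,)
```

`np.array_split` is used so that the zones stay nearly equal in size even when the number of spectrum bins is not a multiple of the zone count.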

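The expected value of $|F_k^s|^2$ is given in (9). The expected value of the variance is obtained by using the following, with $\theta$ the angle between $F_k^f$ and $F_k^e$ as in (8):

$$E\left[\left(|F_k^s|^2 - E\left(|F_k^s|^2\right)\right)^2\right] = E\left[\left(2\,|F_k^f|\,|F_k^e| \cos\theta\right)^2\right] = 2\,|F_k^f|^2\,|F_k^e|^2. \tag{11}$$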
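The expectation and variance can be checked numerically. The sketch below assumes that $\theta$ is uniformly distributed on $[0, 2\pi)$, in which case the power $a^2 + b^2 + 2ab\cos\theta$ has mean $a^2 + b^2$ and variance $2a^2b^2$; the magnitudes `a` and `b` are hypothetical values standing in for the cover and error spectrum magnitudes at one frequency bin.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 3.0, 0.5   # hypothetical magnitudes of the cover and error spectra
theta = rng.uniform(0.0, 2.0 * np.pi, size=1_000_000)

# Power of the sum of two vectors with angle theta between them.
power = a**2 + b**2 + 2.0 * a * b * np.cos(theta)

empirical_mean = float(power.mean())
empirical_var = float(power.var())
expected_mean = a**2 + b**2          # the cosine term averages to zero
expected_var = 2.0 * a**2 * b**2     # E[cos^2(theta)] = 1/2

print(empirical_mean, expected_mean)
print(empirical_var, expected_var)
```

The variance grows with the product of the two powers, so it rises wherever the error spectrum is strong relative to the cover.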

According to (11), the rate of power change in different spectrum bands of the stego-audio differs from that of the original cover. Generally, the cepstrum may be interpreted as information about the power change; it was defined by Bogert, Healy, and Tukey in [25]. Reynolds and McEachern showed a modified cepstrum called mel-cepstrum for speech recognition [26], [27]. Recently, Kraetzer and Dittmann proposed a signal-based mel-cepstrum audio steganalysis [18]. Based on (11), we design a second derivative-based mel-cepstrum audio steganalysis to improve Kraetzer and Dittmann’s work. The details are described in Section III.

III. IMPROVED MEL-CEPSTRUM AUDIO STEGANALYSIS

In speech processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. To convert $f$ Hz into $m$ mel, use the following:

$$m = 1127.01048 \, \ln\left(1 + \frac{f}{700}\right). \tag{12}$$
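As a quick illustration of the conversion, the sketch below uses one widely used form of the Hz-to-mel mapping, $m = 1127.01048 \ln(1 + f/700)$. The constant and the function names are assumptions of this example rather than values confirmed by the text.

```python
import math

MEL_SCALE = 1127.01048   # assumed constant of the natural-log form of the mel mapping

def hz_to_mel(frequency_hz):
    """Convert a frequency in Hz to mel."""
    return MEL_SCALE * math.log(1.0 + frequency_hz / 700.0)

def mel_to_hz(mel):
    """Inverse mapping, from mel back to Hz."""
    return 700.0 * (math.exp(mel / MEL_SCALE) - 1.0)

for hz in (0.0, 200.0, 1000.0, 6819.59, 22050.0):
    print(f"{hz:9.2f} Hz -> {hz_to_mel(hz):8.2f} mel")
```

With this form, 1000 Hz maps to roughly 1000 mel, which is the usual anchoring of the scale.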

MFCCs are coefficients that collectively make up an MFC. MFCCs are commonly derived from the following processes [28]:

1) take the Fourier transform of (a windowed excerpt of) a signal;

2) map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows;

3) take the logs of the powers at each of the mel frequencies;

4) take the DCT of the list of mel log powers, as if it were a signal;

5) the MFCCs are the amplitudes of the resulting spectrum.

Fig. 5 (available at http://www.ee.bilkent.edu.tr/~onaran/SP-4.pdf) shows a fast Fourier transform (FFT)-based


Fig. 5. FFT-based mel-cepstrum procedure (available at http://www.ee.bilkent.edu.tr/~onaran/SP-4.pdf).

mel-cepstrum computation; more technical details are given in [29].
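The five MFCC steps listed earlier can be sketched for a single frame as follows. This is an illustrative sketch under assumptions, not the procedure of Fig. 5 or of [29]: the Hamming window, the number of filters (26), the frame length, and the natural-log mel mapping are all choices made for this example, and the DCT is written out directly to avoid extra dependencies.

```python
import numpy as np

def hz_to_mel(f):
    return 1127.01048 * np.log(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (np.exp(np.asarray(m, dtype=float) / 1127.01048) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular overlapping filters spaced evenly on the mel scale."""
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        left, center, right = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct_ii(values):
    """Unnormalized DCT-II of a 1-D array."""
    n = values.size
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * t + 1) / (2 * n)) @ values

def mfcc(frame, sample_rate=44100, n_filters=26):
    """MFCCs of one frame; n_filters=26 is a hypothetical default."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(frame.size)            # step 1: window, then FFT
    power = np.abs(np.fft.rfft(windowed)) ** 2
    bank = mel_filterbank(n_filters, frame.size, sample_rate)
    mel_power = bank @ power                              # step 2: map onto the mel scale
    log_mel = np.log(mel_power + 1e-12)                   # step 3: log (offset avoids log(0))
    return dct_ii(log_mel)                                # steps 4 and 5: DCT amplitudes

# Usage on a synthetic frame of 1024 samples.
rng = np.random.default_rng(0)
coeffs = mfcc(rng.standard_normal(1024))
print(coeffs.shape)   # (26,)
```

Passing the second-order derivative of a frame instead of the frame itself gives a derivative-based variant of the same computation.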

Mel-cepstrum is commonly used for representing the human voice and musical signals. Inspired by its success in speech recognition, Kraetzer and Dittmann proposed mel-cepstrum-based speech steganalysis, including the following two types of mel-cepstrum coefficients [18]:

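1) MFCCs, $sf_{mel_1}, sf_{mel_2}, \ldots, sf_{mel_C}$, where $C$ is the number of MFCCs for a signal with a sampling rate of 44.1 kHz, calculated by the following equation, where MT indicates the mel-scale transformation:

$$\mathrm{MelCepstrum} = \mathrm{FT}\big(\mathrm{MT}(\mathrm{FT}(f))\big) = \{sf_{mel_1}, sf_{mel_2}, \ldots, sf_{mel_C}\}. \tag{13}$$

2) Filtered mel-frequency cepstral coefficients (FMFCCs), $sf_{melf_1}, sf_{melf_2}, \ldots, sf_{melf_C}$, where $C$ is the number of FMFCCs, calculated by the following equation:

$$\mathrm{FilteredMelCepstrum} = \mathrm{FT}\big(\mathrm{SpeechBandFiltering}(\mathrm{MT}(\mathrm{FT}(f)))\big) = \{sf_{melf_1}, sf_{melf_2}, \ldots, sf_{melf_C}\}. \tag{14}$$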

In (14), the role of speech band filtering is to remove the speech-relevant bands (the spectrum components between 200 and 6819.59 Hz) [18].

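To improve mel-cepstrum-based audio steganalysis, following Kraetzer and Dittmann’s work, we design the second-order derivative-based MFCCs and FMFCCs, obtained by replacing the signal $f$ in (13) and (14) with the second-order derivative $D_f^2$; the calculation is listed as follows:

$$\mathrm{MelCepstrum} = \mathrm{FT}\big(\mathrm{MT}(\mathrm{FT}(D_f^2))\big) \tag{15}$$

and

$$\mathrm{FilteredMelCepstrum} = \mathrm{FT}\big(\mathrm{SpeechBandFiltering}(\mathrm{MT}(\mathrm{FT}(D_f^2)))\big). \tag{16}$$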

Temporal derivative-based high-frequency spectrum statistics, described in Section II, and derivative-based mel-cepstrum coefficients, derived from (15) and (16), form the feature vector for detecting information hiding in digital audio signals.

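To improve the original mel-cepstrum audio steganalysis, a wavelet-based mel-cepstrum approach is also designed. We apply a wavelet transform to the signal $f$ and get an approximation sub-band and a detail sub-band. Let $W_f$ denote the detail coefficient sub-band; we replace $f$ in (13) and (14) with $W_f$ and obtain the MFCCs and FMFCCs as follows:

$$\mathrm{MelCepstrum} = \mathrm{FT}\big(\mathrm{MT}(\mathrm{FT}(W_f))\big) \tag{17}$$

and

$$\mathrm{FilteredMelCepstrum} = \mathrm{FT}\big(\mathrm{SpeechBandFiltering}(\mathrm{MT}(\mathrm{FT}(W_f)))\big). \tag{18}$$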
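The split into an approximation sub-band and a detail sub-band can be illustrated with a single-level Haar transform written in plain NumPy. Note the swap: the experiments use the Daubechies wavelet “db8,” whereas this sketch uses the much simpler Haar wavelet so that it stays self-contained; the function name is hypothetical.

```python
import numpy as np

def haar_dwt(signal):
    """Single-level Haar transform: returns (approximation, detail).

    Haar is a stand-in chosen for brevity; it is NOT the "db8" wavelet
    used in the experiments.
    """
    x = np.asarray(signal, dtype=float)
    if x.size % 2:            # drop the last sample if the length is odd
        x = x[:-1]
    even, odd = x[0::2], x[1::2]
    approximation = (even + odd) / np.sqrt(2.0)   # low-pass branch
    detail = (even - odd) / np.sqrt(2.0)          # high-pass branch
    return approximation, detail

# Usage: a slow sine plus small broadband noise.
rng = np.random.default_rng(0)
t = np.arange(4096)
signal = np.sin(2.0 * np.pi * t / 512.0) + 0.01 * rng.standard_normal(t.size)
approx, detail = haar_dwt(signal)
print(approx.shape, detail.shape)   # (2048,) (2048,)
```

The detail sub-band would then take the place of the signal when computing the mel-cepstrum coefficients. A wavelet library such as PyWavelets offers `pywt.dwt(x, 'db8')` for the wavelet actually named in the text.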

IV. EXPERIMENTS

A. Setup

We obtained 6000 mono and 6000 stereo 44.1-kHz, 16-bit, uncompressed PCM-coded WAV audio files, covering different types such as digital speech, online broadcasts in different languages (for instance, English, Chinese, Japanese, Korean, and Spanish), and music (jazz, rock, blues). Each audio file has a duration of 19 s. We produced the same number of stego-audio signals by hiding different messages in these audio signals. The hiding tools/algorithms include Hide4PGP V4.0,² Invisible Secrets,³ LSB matching [30], and Steghide [31]. The hidden data include voice, video, image, text, and executable code, which were encrypted before embedding by using different keys. We also produced audio steganograms by hiding random bits. The covert data in any two audio streams are different. All the covers and steganograms are available at http://www.cs.nmt.edu/~IA/steganalysis.html.

2 Available: http://www.heinz-repp.onlinehome.de/Hide4PGP.htm

3 Available: http://www.invisiblesecrets.com/


Fig. 6. P-values in one-way ANOVA analysis for the spectrum features over the whole frequency band. The x-axis corresponds to the low- to high-frequency component, from left to right. The y-axis shows the p-value.

B. Statistical Significance of Spectrum Features

To extract the spectrum features, we set the number of zones $Z$ to 80 and extract the mean values, standard deviations, and skewness statistics, for a total of 240 features over the whole frequency range of the derivatives. Fig. 6 lists the p-values in the one-way analysis of variance (ANOVA) [32], [33] of these features, extracted from Hide4PGP, Invisible Secrets, LSB matching, and Steghide steganograms as well as original covers. It clearly indicates that the features extracted from the high-frequency components, which correspond to the small p-values, have better statistical significance than the features from the other frequency components. The ANOVA results are consistent with the analysis of Section II.
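For a single feature, the one-way ANOVA compares the variation between groups (covers and each type of steganogram) with the variation within groups. The sketch below computes only the F statistic with NumPy; the function name and the toy data are hypothetical. The p-value follows from the F distribution with $(k-1, n-k)$ degrees of freedom, which a statistics library such as SciPy (`scipy.stats.f_oneway`) can supply.

```python
import numpy as np

def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for one feature measured in several groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size
    # Between-group and within-group sums of squares.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy example: one feature measured on three small groups.
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
print(f_stat)
```

A large F statistic (small p-value) indicates that the feature separates the groups well, which is how the p-values in the figure are read.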

C. Signal-Based, Derivative-Based, and Wavelet-Based Mel-Cepstrum Audio Steganalysis

We compare the signal-based mel-cepstrum audio steganalysis with the derivative-based and wavelet-based mel-cepstrum approaches. Since Daubechies wavelets are widely used for signal processing and decomposition [34], we apply a Daubechies wavelet, “db8,” to the signal for decomposition. Let MC, D-MC, and W-MC stand for the signal-based, derivative-based, and wavelet-based mel-cepstrum steganalysis methods, respectively. Fig. 7 shows the receiver operating characteristic (ROC) curves obtained by performing a cross validation in detecting the audio steganograms by using SVMs with RBF kernels [35].

The experimental results show that the wavelet-based and derivative-based mel-cepstrum audio steganalysis methods markedly improve on the detection performance of the original mel-cepstrum method. In comparison with the wavelet-based approach, the derivative-based mel-cepstrum audio steganalysis generally delivers better performance.

Since a single cross-validation run is not statistically conclusive, to compare these three methods we perform 100 runs for each method on each type of audio steganogram with a given information-hiding ratio. In each run, 50% of the audio samples are randomly assigned to the training set; the remaining 50% are used for testing. The mean testing accuracy values and the standard errors are listed in Table I. The hiding ratio, the ratio of the size of the hidden data to the maximum capacity, is used to measure the embedding strength.
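The repeated random-split protocol can be sketched as follows. Two things here are assumptions of the example: a nearest-centroid classifier stands in for the SVM with an RBF kernel (to keep the sketch dependency-free), and the standard error is taken as the sample standard deviation of the accuracies divided by the square root of the number of runs. The feature data are synthetic.

```python
import numpy as np

def nearest_centroid_accuracy(x_train, y_train, x_test, y_test):
    """Stand-in classifier (NOT the RBF-kernel SVM used in the experiments)."""
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in (0, 1)])
    distances = np.linalg.norm(x_test[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(distances.argmin(axis=1) == y_test))

def repeated_holdout(features, labels, n_runs=100, seed=0):
    """Mean test accuracy and standard error over random 50/50 splits."""
    rng = np.random.default_rng(seed)
    n = labels.size
    accuracies = []
    for _ in range(n_runs):
        order = rng.permutation(n)
        train, test = order[: n // 2], order[n // 2 :]
        accuracies.append(nearest_centroid_accuracy(
            features[train], labels[train], features[test], labels[test]))
    accuracies = np.array(accuracies)
    return float(accuracies.mean()), float(accuracies.std(ddof=1) / np.sqrt(n_runs))

# Synthetic features: label 0 for covers, label 1 for steganograms.
rng = np.random.default_rng(1)
covers = rng.normal(0.0, 1.0, size=(200, 5))
stegos = rng.normal(1.5, 1.0, size=(200, 5))
features = np.vstack([covers, stegos])
labels = np.array([0] * 200 + [1] * 200)

mean_accuracy, standard_error = repeated_holdout(features, labels)
print(f"accuracy = {mean_accuracy:.3f} +/- {standard_error:.3f}")
```

Swapping in an actual SVM only requires replacing the stand-in function with one that trains and scores the chosen classifier on the same splits.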

As seen in Table I, the wavelet-based and derivative-based mel-cepstrum methods are superior to the original signal-based method. Most remarkably, derivative-based mel-cepstrum audio steganalysis delivers the best result, greatly improving the detection performance over the mel-cepstrum method. For instance, it improves the detection accuracy by about 17% for the detection of Invisible Secrets steganograms with maximum hiding, and by about 18% for the detection of LSB matching steganograms with maximum hiding.

D. Derivative-Based and Wavelet-Based Spectrum and Mel-Cepstrum Audio Steganalysis versus AMSL Audio Steganalysis Toolset (AAST)

The feature sets of the wavelet-based and derivative-based methods include two types: spectrum statistics and mel-cepstrum coefficients. The mel-cepstrum coefficients consist of


Fig. 7. ROC curves by using signal-based, derivative-based, and wavelet-based mel-cepstrum audio steganalysis, with the legends MC, D-MC, and W-MC, respectively.

TABLE I
AVERAGE TESTING ACCURACY VALUES AND STANDARD ERRORS OF ORIGINAL MEL-CEPSTRUM (MC), WAVELET-BASED MEL-CEPSTRUM (W-MC) BY USING “db8,” AND DERIVATIVE-BASED MEL-CEPSTRUM (D-MC) METHODS

MFCCs and FMFCCs, given by (15) and (16) for the derivative-based approach, and by (17) and (18) for the wavelet-based approach. In the wavelet-based method, we adopt the same procedure that is used in the derivative-based approach for spectrum feature extraction, except that we first replace the second derivative with the wavelet detail subband. In both cases, spectrum features consist of mean values, standard deviations, and skewness values from high-frequency components, given by

(19)

We abbreviate the methods as DSMC (derivative-based spectrum and mel-cepstrum) and WSMC (wavelet-based spectrum and mel-cepstrum). In Kraetzer and Dittmann’s work, signal-based mel-cepstrum coefficients and other statistical features constitute a detector for steganalysis of speech audio signals, called AAST [18]. Fig. 8 shows ROC curves obtained by performing a cross validation with these three methods. Table II lists mean testing accuracy values and standard errors of 100 runs. Experimental results show that DSMC and WSMC outperform AAST. On average, DSMC is superior to WSMC.

V. DISCUSSION

Derivative-based and wavelet-based methods are superior to the original solution proposed by Kraetzer and Dittmann for audio steganalysis. Our explanation is that audio signals are generally band-limited while the embedded hidden data is likely broadband. Consequently, both derivative-based and wavelet-based methods are more accurate since they first obtain the high-frequency information from audio signals.

Our experimental results show that the derivative-based method outperforms the wavelet-based method, especially when only mel-cepstrum features are employed for detection. We believe this results from the different spectrum characteristics of the second-order derivative and wavelet filtering. Fig. 9(a) shows the spectrum of the second derivative of the hidden data in an audio steganogram; Fig. 9(b) plots the spectrum of the detail wavelet sub-band of the same hidden data, filtered by using “db8.” There is a large difference between these two spectra. The hidden data filtered by the wavelet may be treated as a white noise signal; its spectrum is almost equally distributed over the whole frequency band, as shown in Fig. 9(b). However, the second derivative suppresses the energy in low


Fig. 8. ROC curves by using DSMC, WSMC, and AAST, respectively.

TABLE IIAVERAGE TESTING ACCURACY VALUES AND STANDARD ERRORS BY USING

AAST, WSMC, AND DSMC

frequency and amplifies the energy in high frequency; the spec-trum shape is approximately simulated by a half of Gaussiandistribution, not equally distributed over the whole frequencyband, as shown in Fig. 9(a). Based on (11) in Section II, as thespectrum increases, the expected value of the variance ofthe audio steganogram will prominently increase, that is, therate of power change in different spectrum bands will dramat-ically change. Since the mel-cepstrum coefficients are usedto capture the information for power change, the advantageof derivative-based mel-cepstrum approach becomes easilynoticeable.
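The low-frequency suppression and high-frequency amplification described here can be sketched with a short derivation. It assumes that the second-order derivative is computed as the discrete second difference x[n+1] - 2x[n] + x[n-1] and that the hidden data e[n] is modeled as zero-mean white noise with variance sigma_e squared; both are assumptions made for illustration, not a restatement of (11).

```latex
\begin{align}
H(e^{j\omega}) &= e^{j\omega} - 2 + e^{-j\omega}
  = 2\cos\omega - 2
  = -4\sin^{2}\!\left(\frac{\omega}{2}\right), \\
\left|H(e^{j\omega})\right|^{2} &= 16\sin^{4}\!\left(\frac{\omega}{2}\right), \\
S_{D^{2}e}(\omega) &= \sigma_e^{2}\left|H(e^{j\omega})\right|^{2}
  = 16\,\sigma_e^{2}\sin^{4}\!\left(\frac{\omega}{2}\right), \\
\operatorname{Var}\!\left(D^{2}e\right) &= \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{D^{2}e}(\omega)\,d\omega
  = \left(1^{2} + 2^{2} + 1^{2}\right)\sigma_e^{2}
  = 6\,\sigma_e^{2}.
\end{align}
```

Under these assumptions the gain is zero at zero frequency and grows monotonically to its maximum at the Nyquist frequency, which is consistent with the rising, non-flat spectrum shape described for Fig. 9(a).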

With the addition of spectrum features, the wavelet-based method gains a greater improvement than the derivative-based method and thereby narrows the performance gap between the two methods. However, in both methods, mel-cepstrum features contribute more than spectrum features to detecting information-hiding; in other words, mel-cepstrum features play key roles in audio steganalysis.

We note that different information-hiding systems/algorithms are sensitive to different features, which can be observed by comparing the detection results obtained by using the MC and AAST feature sets, shown in Tables I and II, respectively. The testing results obtained by using the AAST feature set have much higher standard errors. We surmise that some statistical features in the AAST feature set may not be very significant, as a statistical analysis of each individual feature would indicate. To address this problem, we can discard some insignificant features and choose an optimal feature set from all AAST features. The feature selection problem in steganalysis has been studied in our previous work [10]. The feature optimization for audio steganalysis is currently being conducted.
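One simple way to carry out a per-feature statistical analysis of this kind is sketched below. It ranks features by the absolute Welch t-statistic between cover and steganogram samples and keeps the top k. The choice of the t-statistic, the function names, and the synthetic data are assumptions made for illustration; this is not the feature selection procedure of [10].

```python
import numpy as np

def welch_t_scores(cover_feats, stego_feats):
    """Absolute Welch t-statistic for each feature column (rows are samples)."""
    c = np.asarray(cover_feats, dtype=float)
    s = np.asarray(stego_feats, dtype=float)
    mean_diff = c.mean(axis=0) - s.mean(axis=0)
    var_term = c.var(axis=0, ddof=1) / c.shape[0] + s.var(axis=0, ddof=1) / s.shape[0]
    # Small constant guards against division by zero for constant features.
    return np.abs(mean_diff) / np.sqrt(var_term + 1e-12)

def select_top_features(cover_feats, stego_feats, k):
    """Indices of the k features with the largest scores, best first."""
    scores = welch_t_scores(cover_feats, stego_feats)
    return np.argsort(scores)[::-1][:k]

# Synthetic illustration: six features, of which only 2 and 4 differ between classes.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 6
cover = rng.normal(0.0, 1.0, size=(n_samples, n_features))
stego = rng.normal(0.0, 1.0, size=(n_samples, n_features))
stego[:, 2] += 1.0
stego[:, 4] += 0.8

selected = select_top_features(cover, stego, k=2)
print("selected feature indices:", sorted(selected.tolist()))
```

The reduced feature set would then be passed to the classifier in place of the full set; a univariate ranking like this ignores interactions between features, so it is only a first filter.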

As described and explained in our previous work on image steganalysis [11], besides the information-hiding ratio, image complexity is an important parameter in evaluating steganalysis performance, as the detection performance is necessarily lower for steganalysis of images with high complexity. Similarly, in audio steganalysis, all methods tested in our experiments will not be very effective for steganalysis of audio signals with high complexity, which generally have high magnitude in high frequency. This issue is also currently under our study.

VI. CONCLUSION

In this paper, we proposed spectrum analysis and improved mel-cepstrum methods for audio steganalysis, derived from the second-order derivative and from the detail wavelet sub-band. Experimental results show that, in steganalysis of audio files produced by Hide4PGP, Invisible Secrets, Steghide, and LSB matching algorithms/tools, our proposed methods deliver good performance and gain significant advantage over a recently designed signal-based mel-cepstrum method. In a comparison of the two new methods, on average, the derivative-based solution is superior to the wavelet-based method.

Fig. 9. (a) Spectrum of the second-order derivative of the hidden data in a 44.1-kHz audio steganogram, and (b) spectrum of the detail wavelet sub-band of the same hidden data, filtered by using “db8.” The frequency shown in the x-axis of (b) is reduced due to down-sampling of wavelet decomposition to obtain the detail sub-band.

ACKNOWLEDGMENT

The authors would like to thank Dr. D. Ellis of Columbia University for his insightful comments and great suggestions, Dr. M. Slaney of Yahoo Research and Prof. H. Ai for their helpful discussions, and Dr. J. Dittmann and C. Kraetzer for kindly providing them with the AAST document. Special thanks go to the anonymous reviewers for their insightful comments that helped improve the presentation.

REFERENCES

[1] J. Harmsen and W. Pearlman, “Steganalysis of additive noise modelable information hiding,” Proc. SPIE, Electronic Imaging, Security, Steganography, and Watermarking of Multimedia Contents, vol. 5020, pp. 131–142, 2003.

[2] S. Lyu and H. Farid, “How realistic is photorealistic?,” IEEE Trans. Signal Process., vol. 53, no. 2, pt. 2, pp. 845–850, Feb. 2005.

[3] Y. Shi, C. Chen, and W. Chen, “A Markov process based approach to effective attacking JPEG steganography,” in Lecture Notes Comput. Sci., 2007, vol. 437, pp. 249–264.

[4] Q. Liu, A. Sung, B. Ribeiro, and R. Ferreira, “Steganalysis of multi-class JPEG images based on expanded Markov features and polynomial fitting,” in Proc. 21st Int. Joint Conf. Neural Networks, 2008, pp. 3351–3356.

[5] J. Fridrich, “Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes,” in Lecture Notes Comput. Sci., 2004, vol. 3200, pp. 67–81.

[6] T. Pevny and J. Fridrich, “Merging Markov and DCT features for multi-class JPEG steganalysis,” Proc. SPIE Electronic Imaging, pp. 03–04, Jan. 2007.

[7] S. Lyu and H. Farid, “Steganalysis using high-order image statistics,” IEEE Trans. Inf. Forensics Security, vol. 1, no. 1, pp. 111–119, Mar. 2006.

[8] Q. Liu and A. Sung, “Feature mining and neuro-fuzzy inference system for steganalysis of LSB matching steganography in grayscale images,” in Proc. 20th Int. Joint Conf. Artificial Intelligence, 2007, pp. 2808–2813.

[9] Q. Liu, A. Sung, J. Xu, and B. Ribeiro, “Image complexity and feature extraction for steganalysis of LSB matching steganography,” in Proc. 18th Int. Conf. Pattern Recognition (ICPR), 2006, no. 1, pp. 1208–1211.

[10] Q. Liu, A. Sung, Z. Chen, and J. Xu, “Feature mining and pattern classification for steganalysis of LSB matching steganography in grayscale images,” Pattern Recognit., vol. 41, no. 1, pp. 56–66, 2008.

[11] Q. Liu, A. Sung, B. Ribeiro, M. Wei, Z. Chen, and J. Xu, “Image complexity and feature mining for steganalysis of least significant bit matching steganography,” Inf. Sci., vol. 178, no. 1, pp. 21–36, 2008.

[12] X. Ru, H. Zhang, and X. Huang, “Steganalysis of audio: Attacking the steghide,” in Proc. 4th Int. Conf. Machine Learning and Cybernetics, 2005, pp. 3937–3942.

[13] X. Ru, Y. Zhang, and F. Wu, “Audio steganalysis based on negative resonance phenomenon caused by steganographic tools,” J. Zhejiang Univ. SCIENCE A, vol. 7, no. 4, pp. 577–583, 2006.

[14] I. Avcibas, “Audio steganalysis with content-independent distortion measures,” IEEE Signal Process. Lett., vol. 13, no. 2, pp. 92–95, Feb. 2006.

[15] H. Ozer, B. Sankur, N. Memon, and I. Avcibas, “Detection of audio covert channels using statistical footprints of hidden messages,” Digital Signal Process., vol. 16, no. 4, pp. 389–401, 2006.

[16] M. Johnson, S. Lyu, and H. Farid, “Steganalysis of recorded speech,” Proc. SPIE, vol. 5681, pp. 664–672, 2005.

[17] S. Craver, B. Liu, and W. Wolf, “Histo-cepstral analysis for reverse-engineering watermarks,” in Proc. 38th Conf. Information Sciences and Systems (CISS’04), Princeton, NJ, Mar. 2004, pp. 824–826.

[18] C. Kraetzer and J. Dittmann, “Mel-cepstrum based steganalysis for VoIP-steganography,” Proc. SPIE, vol. 6505, p. 650505, 2007.

[19] C. Kraetzer and J. Dittmann, “Pros and cons of mel-cepstrum based audio steganalysis using SVM classification,” in Lecture Notes Comput. Sci., 2008, vol. 4567, pp. 359–377.

[20] Q. Liu, A. Sung, and M. Qiao, “Detecting information-hiding in WAV audios,” in Proc. 19th Int. Conf. Pattern Recognition, 2008, pp. 1–4.

[21] W. Zeng, H. Ai, and R. Hu, “A novel steganalysis algorithm of phase coding in audio signal,” in Proc. 6th Int. Conf. Advanced Language Processing and Web Information Technology (ALPIT), 2007, pp. 261–264.

[22] W. Zeng, H. Ai, and R. Hu, “An algorithm of echo steganalysis based on power cepstrum and pattern classification,” in Proc. Int. Conf. Information and Automation (ICIA), 2008, pp. 1667–1670.

[23] R. Gonzalez and R. Woods, Digital Image Processing, 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 2008, ISBN: 9780131687288.

[24] A. Oppenheim, R. Schafer, and J. Buck, Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1999.

[25] B. Bogert, M. J. R. Healy, and J. W. Tukey, “The frequency analysis of time series for echoes: Cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking,” in Proc. Symp. Time Series Analysis, M. Rosenblatt, Ed. New York: Wiley, 1963, ch. 15, pp. 209–243.

[26] D. Reynolds, “A Gaussian Mixture Modeling Approach to Text-Independent Speaker Identification,” Ph.D. thesis, Department of Electrical Engineering, Georgia Institute of Technology, Atlanta, 1992.


[27] R. H. McEachern, “Hearing it like it is: Audio signal processing the way the ear does it,” DSP Applications, vol. 3, no. 2, pp. 35–47, Feb. 1994.

[28] M. Xu, L. Duan, J. Cai, L. Chia, C. Xu, and Q. Tian, “HMM-based audio keyword generation,” in Proc. 5th Pacific Rim Conf. Multimedia, Part III, Lecture Notes in Computer Science, Nov. 30–Dec. 3, 2004, vol. 3333, pp. 566–574.

[29] S. Molau, M. Pitz, R. Schlüter, and H. Ney, “Computing mel-frequency cepstral coefficients on the power spectrum,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Salt Lake City, UT, May 2001, vol. I, pp. 73–76.

[30] T. Sharp, “An implementation of key-based digital signal steganography,” in Proc. Information Hiding Workshop (LNCS), 2001, vol. 2137, pp. 13–26.

[31] S. Hetzl and P. Mutzel, “A graph-theoretic approach to steganography,” in LNCS, 2005, vol. 3677, pp. 119–128.

[32] T. Hill and P. Lewicki, Statistics: Methods and Applications. Tulsa, OK: StatSoft, Inc., 2005, ISBN: 1884233597.

[33] R. Agostino, L. Sullivan, and A. Beiser, Introductory Applied Biostatistics. Pacific Grove, CA: Brooks/Cole, 2005.

[34] I. Daubechies, Ten Lectures on Wavelets. Philadelphia, PA: SIAM, 1992.

[35] V. Vapnik, Statistical Learning Theory. Hoboken, NJ: Wiley, 1998.

Qingzhong Liu received the B.Eng. degree from Northwestern Polytechnical University and the M.Eng. degree from Sichuan University in China, and the Ph.D. degree in Computer Science from New Mexico Tech, in 2007.

Dr. Liu is currently a Senior Research Scientist and Adjunct Faculty of New Mexico Tech, Socorro, NM. His research interests include data mining, pattern recognition, bioinformatics, multimedia computing, information security, and digital forensic analysis.

Andrew H. Sung received the Ph.D. degree from the State University of New York at Stony Brook, in 1984.

Dr. Sung is a Professor and the Chairman of the Computer Science and Engineering Department of New Mexico Tech, Socorro, NM. His current research interests include information security and digital forensic analysis, bioinformatics, application algorithms, and soft computing and its engineering applications.

Mengyu Qiao is currently working toward the Ph.D. degree in the Computer Science and Engineering Department, New Mexico Tech, Socorro, NM. He received the B.Eng. degree in software engineering from Beijing University of Posts and Telecommunications, China, in 2006, and the M.S. degree in computer science from New Mexico Tech, in 2009.

His research interests include information security, bioinformatics, data mining, image/signal processing, and software engineering.
