Least Significant Bit Steganography Detection with Machine Learning Techniques

Shen Ge¹, Yang Gao¹, Ruili Wang²

¹ State Key Laboratory for Novel Software Technology, Nanjing University

² Institute of Information Sciences and Technology, Massey University (Turitea)

We are sorry that we are unable to attend in person because of a visa problem.

Outline

• Introduction
• Motivation
• Conventional Methods
• Our Point of View
• Our Framework
• Experiment Results
• Conclusions and Future Work

Introduction (1)

• Steganography
  – The goal
    • to ensure that secret messages are transferred covertly
    • to make the transferred secret messages undetectable
  – It is the art of invisible communication, and it provides plausible deniability for secret communication.

• Steganalysis
  – The goal
    • to detect the existence of steganography
    • to estimate the length of the hidden message
    • or to extract the hidden information
  – Steganalysis algorithms achieve these goals by exploiting the differences between media files before and after embedding.

Introduction (2)

• An illustrative example

[Figure: the hidden message "We will attack at 5:00 AM tomorrow" goes through Embedding (steganography) and Extracting, which recovers the hidden message; Detecting (steganalysis) reports "There is an embedded message!"]
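The embed/extract round trip in the illustration can be sketched for sequential LSB replacement. This is a minimal Python sketch, not the authors' tool: it assumes the cover is a flat list of 8-bit sample values and the message is UTF-8 text written most-significant-bit first. The helper names (`message_to_bits`, `bits_to_message`, `lsb_embed`, `lsb_extract`) and the cover values are hypothetical.

```python
def message_to_bits(message: str) -> list[int]:
    """Turn a text message into bits, most significant bit of each byte first."""
    bits = []
    for byte in message.encode("utf-8"):
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits


def bits_to_message(bits: list[int]) -> str:
    """Pack bits (MSB first) back into bytes and decode them as UTF-8."""
    data = bytearray()
    for start in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8")


def lsb_embed(pixels: list[int], message: str) -> list[int]:
    """Sequential LSB replacement: the i-th message bit overwrites the LSB of pixel i."""
    bits = message_to_bits(message)
    if len(bits) > len(pixels):
        raise ValueError("message does not fit into this cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then write the message bit
    return stego


def lsb_extract(pixels: list[int], n_bytes: int) -> str:
    """Read n_bytes of message from the LSBs of the first 8 * n_bytes pixels."""
    return bits_to_message([p & 1 for p in pixels[: 8 * n_bytes]])


cover = [(37 * i) % 256 for i in range(400)]  # hypothetical 8-bit samples
secret = "We will attack at 5:00 AM tomorrow"
stego = lsb_embed(cover, secret)
print(lsb_extract(stego, len(secret.encode("utf-8"))))  # We will attack at 5:00 AM tomorrow
print(max(abs(a - b) for a, b in zip(cover, stego)))    # 1
```

Each sample changes by at most one. The conventional detectors in this deck look for the statistical traces left by exactly this kind of change.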

Motivation (1)

• The use of steganography can cause security problems
  – Suppose an employee sends out an image embedded with commercial secrets. Current network firewalls cannot block such communications.
  – Steganalysis is therefore needed to analyze suitable cover media and flag possibly embedded ones for further processing.

Motivation (2)

• In this paper we focus on detecting LSB-hidden information with machine learning techniques
  – LSB embedding is the most popular steganography method.
  – The detector can be used in real applications.

• We need new methods to detect the existence of an LSB-embedded message
  – Universal steganalysis tends to be too generic and therefore has low accuracy, while conventional LSB steganalysis tends to be too specific.
  – Conventional LSB steganalysis was designed to estimate the length of the hidden message, so focusing on the detection problem requires additional work.

Conventional Methods (1)

• Our classification of steganography methods
  – Operating on bitmaps: LSB
  – Embedding in transform-domain images (such as JPEG images): OutGuess, F5

• Our classification of steganalysis methods
  – Instance based
    • use training sets and involve a classifier construction process
    • IQM based, high-order DWT, calibrated features
  – Non-instance based
    • exploit the statistics of the image through an implicit parametric model; classification is done by heuristics
    • most conventional methods
    • χ², RS, SPA, JPEG compatibility


• We focus on three main conventional steganalysis methods for LSB embedding

• The χ² Method [Pfitzmann and Westfeld]
  – Idea: find the statistical evidence left by the embedding process
  – The LSB embedding process can be viewed as a flip operation. If a pixel's LSB already equals the bit we want to embed, the pixel is left untouched. Otherwise it is flipped, so pixel value 2j becomes 2j+1 and 2j+1 becomes 2j.


• The χ² Method
  – We combine pixel values 2j and 2j+1 into a pair, which we call a PoV (Pair of Values).
  – Experiments show that before embedding, the frequencies of the two values in a given pair vary independently and are often unequal. After embedding they become nearly equal, because the pixels' LSBs are replaced by message bits whose 0s and 1s are usually uniformly distributed.
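The PoV effect described above can be reproduced on a toy example. This is a minimal Python sketch under two assumptions. First, the cover is hypothetical and has one heavily skewed PoV, (20, 21). Second, the message is perfectly balanced and fills every pixel. Real messages are only approximately balanced, so real stego counts are only approximately equal. The names `lsb_replace` and `pov_counts` are hypothetical.

```python
from collections import Counter


def lsb_replace(pixels, bits):
    """Overwrite the LSB of the first len(bits) pixels; values only move within their PoV."""
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out


def pov_counts(pixels, j):
    """Return the frequencies (h_2j, h_2j+1) of the Pair of Values (2j, 2j+1)."""
    hist = Counter(pixels)
    return hist[2 * j], hist[2 * j + 1]


cover = [20] * 900 + [21] * 100              # hypothetical, strongly skewed PoV (20, 21)
message = [i % 2 for i in range(1000)]       # balanced message: 500 zeros, 500 ones
stego = lsb_replace(cover, message)

print(pov_counts(cover, 10))  # (900, 100)
print(pov_counts(stego, 10))  # (500, 500)
```

Embedding only moves pixels within a PoV, so h_2j + h_2j+1 is unchanged. This is why (h_2j + h_2j+1)/2 can serve as the expected frequency in the χ² test.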


• The χ² Method
  – We can therefore derive an equation for the probability that an image has been embedded (for sequential embedding, this also indicates the message length).
  – Let $h_{2i}$ and $h_{2i+1}$ be the frequencies of the two values in a pair. To tell whether the two values are distributed significantly differently, we use a χ² test. The χ² statistic is computed with the expected frequency $h_{2i}^{*} = (h_{2i} + h_{2i+1})/2$.
  – Finally, we calculate p, the probability of embedding. Here k is the number of PoV categories, that is, the total number of possible i.
  – In practice, the accuracy of this method drops sharply as the number of pixels increases, so we often split the image into groups and analyse each group separately.

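$$\chi^2_{k-1} = \sum_{i=1}^{k} \frac{\left(h_{2i} - h_{2i}^{*}\right)^2}{h_{2i}^{*}}$$

$$p = 1 - \frac{1}{2^{\frac{k-1}{2}}\,\Gamma\!\left(\frac{k-1}{2}\right)} \int_{0}^{\chi^2_{k-1}} e^{-\frac{x}{2}}\, x^{\frac{k-1}{2}-1}\, dx$$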
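The χ² statistic and p can be computed directly from a histogram. This is a minimal Python sketch that assumes a flat sequence of 8-bit samples. `chi2_attack` and `chi2_cdf` are hypothetical names. Empty PoVs are simply skipped; a more careful implementation might also pool pairs with very small expected counts. The integral term in the formula is the chi-square CDF with k − 1 degrees of freedom. Here it is evaluated with the standard series for the regularized lower incomplete gamma function, so the block needs no third-party library.

```python
import math
from collections import Counter


def chi2_cdf(x: float, dof: int) -> float:
    """Chi-square CDF, i.e. the regularized lower incomplete gamma P(dof/2, x/2), by series."""
    if x <= 0:
        return 0.0
    s, z = dof / 2.0, x / 2.0
    total, n = 0.0, 0
    while True:
        # n-th series term e^{-z} z^{s+n} / Gamma(s+n+1), evaluated in log space
        term = math.exp(-z + (s + n) * math.log(z) - math.lgamma(s + n + 1))
        total += term
        if n > z and term < 1e-16 * total:
            break
        n += 1
    return min(total, 1.0)


def chi2_attack(pixels: list[int]) -> float:
    """Return p, the probability of embedding, for a sequence of 8-bit samples."""
    hist = Counter(pixels)
    chi2, k = 0.0, 0
    for i in range(128):  # PoVs (0,1), (2,3), ..., (254,255)
        h_even, h_odd = hist[2 * i], hist[2 * i + 1]
        expected = (h_even + h_odd) / 2.0  # h*_{2i}
        if expected == 0:
            continue  # assumption: empty PoVs are left out of the test
        chi2 += (h_even - expected) ** 2 / expected
        k += 1
    if k < 2:
        raise ValueError("need at least two non-empty PoVs")
    return 1.0 - chi2_cdf(chi2, k - 1)
```

A p near 1 means the PoV frequencies are nearly equal, as expected after embedding. Under the threshold rule used for comparison in the experiments, a sample would be flagged as embedded when p exceeds 0.95 or 0.99. To follow the group-wise analysis mentioned above, `chi2_attack` would be applied to each group of pixels separately.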


• The RS Method [J. Fridrich et al.]
  – Idea: use experimentally justified hypotheses to estimate the message length
  – Since the theory behind this method is fairly involved, we do not discuss it here; we only describe the procedure.
  – Terms
    • Group: $G = (x_1, x_2, \ldots, x_n)$
    • Discrimination function: $f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n-1} \left|x_{i+1} - x_i\right|$
    • Flip functions:
      – $F_1$: $0 \leftrightarrow 1,\ 2 \leftrightarrow 3,\ \ldots,\ 254 \leftrightarrow 255$
      – $F_{-1}$: $-1 \leftrightarrow 0,\ 1 \leftrightarrow 2,\ \ldots,\ 255 \leftrightarrow 256$
      – $F_0$: $F_0(x) = x$
    • Mask: $M(i) \in \{-1, 0, 1\}$
    • Flip: every group is flipped by F under some mask M: $F_M(G) = \left(F_{M(1)}(x_1), F_{M(2)}(x_2), \ldots, F_{M(n)}(x_n)\right)$

(1) 1 (2) ( )( ) ( ( ), ( 2), , ( ))M M M n nF G F x F x F x= K


• The RS Method
  – The group types
    • Regular: $G \in R \Leftrightarrow f(F(G)) > f(G)$
    • Singular: $G \in S \Leftrightarrow f(F(G)) < f(G)$
    • Unusable: $G \in U \Leftrightarrow f(F(G)) = f(G)$
  – We denote the relative numbers of R and S groups under a mask M with values in {0, 1} as $R_M$, $S_M$. Under $-M$ (values in {-1, 0}) they are denoted $R_{-M}$, $S_{-M}$.
  – The hypotheses
    • For a typical cover image without embedding, $R_M \cong R_{-M}$ and $S_M \cong S_{-M}$.
    • Straight lines: through $R_{-M}(p/2)$ and $R_{-M}(1-p/2)$, and through $S_{-M}(p/2)$ and $S_{-M}(1-p/2)$.
    • Parabolas: through $R_M(p/2)$, $R_M(1/2)$, $R_M(1-p/2)$, and through $S_M(p/2)$, $S_M(1/2)$, $S_M(1-p/2)$.
    • In the RS diagram (see next page), the intersection point of $R_M$ and $R_{-M}$ has the same x coordinate as the intersection point of $S_M$ and $S_{-M}$.
    • $R_M(1/2) = S_M(1/2)$
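The RS definitions translate into a small amount of code. This is a minimal Python sketch under stated assumptions. The input is a flat list of 8-bit samples, split into non-overlapping groups of four consecutive samples. The mask is assumed to be M = (0, 1, 1, 0), which is often used with RS. `flip`, `discrimination` and `rs_fractions` are hypothetical names.

```python
def flip(x: int, m: int) -> int:
    """Apply F_m to one sample: F_1 swaps 2j<->2j+1, F_-1 swaps 2j-1<->2j, F_0 is the identity."""
    if m == 1:
        return x ^ 1
    if m == -1:
        return ((x + 1) ^ 1) - 1  # F_-1(x) = F_1(x + 1) - 1
    return x


def discrimination(group: list[int]) -> int:
    """f(x_1, ..., x_n) = sum of |x_{i+1} - x_i|: small for smooth groups, large for noisy ones."""
    return sum(abs(b - a) for a, b in zip(group, group[1:]))


def rs_fractions(pixels: list[int], mask=(0, 1, 1, 0)):
    """Return (R_M, S_M, R_-M, S_-M) over non-overlapping groups of len(mask) samples."""
    n = len(mask)
    neg_mask = tuple(-m for m in mask)
    groups = [pixels[i:i + n] for i in range(0, len(pixels) - n + 1, n)]
    r_m = s_m = r_neg = s_neg = 0
    for g in groups:
        f_g = discrimination(g)
        f_m = discrimination([flip(x, m) for x, m in zip(g, mask)])
        f_neg = discrimination([flip(x, m) for x, m in zip(g, neg_mask)])
        # regular if flipping increases f, singular if it decreases f, unusable otherwise
        r_m += f_m > f_g
        s_m += f_m < f_g
        r_neg += f_neg > f_g
        s_neg += f_neg < f_g
    total = len(groups)
    return r_m / total, s_m / total, r_neg / total, s_neg / total
```

Note that $F_{-1}$ can map 0 to -1 and 255 to 256. Those values are only used inside the discrimination function, which matches the definition above.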


• The RS Method
  – The RS diagram
    • On the x axis, 50% corresponds to 100% embedding.
    • The values at p/2 are obtained from the statistics of the image itself. The values at 1 - p/2 are obtained from the image with all LSBs flipped.
    • Using the hypotheses stated before, we can calculate the final message length p.
  – The final formula for calculating p

$$2(d_1 + d_0)\,x^2 + (d_{-0} - d_{-1} - d_1 - 3d_0)\,x + d_0 - d_{-0} = 0$$

where
$d_0 = R_M(p/2) - S_M(p/2)$, $d_1 = R_M(1-p/2) - S_M(1-p/2)$,
$d_{-0} = R_{-M}(p/2) - S_{-M}(p/2)$, $d_{-1} = R_{-M}(1-p/2) - S_{-M}(1-p/2)$.

    • x is the root of this equation with the smaller absolute value.
    • $p = x/(x - 1/2)$
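The quadratic can be solved directly once two sets of fractions are known. The fractions measured on the image give the p/2 values. Those measured on a copy with every LSB flipped give the 1 - p/2 values. This is a minimal Python sketch. `rs_estimate` and `rs_message_length` are hypothetical names, and the fractions are assumed to be supplied as tuples ordered (R_M, S_M, R_-M, S_-M).

```python
import math


def rs_estimate(d0: float, d1: float, dm0: float, dm1: float) -> float:
    """Solve 2(d1+d0)x^2 + (d-0 - d-1 - d1 - 3 d0)x + d0 - d-0 = 0 and return p = x / (x - 1/2)."""
    a = 2.0 * (d1 + d0)
    b = dm0 - dm1 - d1 - 3.0 * d0
    c = d0 - dm0
    if abs(a) < 1e-12:
        x = -c / b  # degenerate (linear) case; assumes b != 0
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            raise ValueError("no real root: the RS hypotheses do not hold for this image")
        sq = math.sqrt(disc)
        x = min((-b + sq) / (2 * a), (-b - sq) / (2 * a), key=abs)  # smaller |root|
    return x / (x - 0.5)


def rs_message_length(stats, flipped_stats) -> float:
    """stats / flipped_stats: (R_M, S_M, R_-M, S_-M) of the image / of its LSB-flipped copy."""
    rm, sm, rnm, snm = stats
    rm1, sm1, rnm1, snm1 = flipped_stats
    return rs_estimate(rm - sm, rm1 - sm1, rnm - snm, rnm1 - snm1)
```

When $d_0 = d_{-0}$, which is the clean-image hypothesis $R_M \cong R_{-M}$, $S_M \cong S_{-M}$, the constant term vanishes. The smaller root is then 0, so the estimated p is 0.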


• The SPA Method [Dumitrescu et al.]
  – Idea: classify sample pairs into groups, study the transitions between them, and use these to estimate the message length
  – Terms
    • We assume the image is represented by successive samples $s_1, s_2, \ldots, s_N$ (N is the total number of samples).
    • A sample pair is a tuple $(s_i, s_j)$, $1 \le i, j \le N$.
    • We group the tuples into four types and two unions of them ($0 \le m \le 2^{b-1} - 1$, where b is the number of bits per pixel):
      – $X_{2m+1}$: $(2k-2m-1, 2k)$ or $(2k, 2k-2m-1)$
      – $Y_{2m+1}$: $(2k-2m, 2k+1)$ or $(2k+1, 2k-2m)$
      – $X_{2m}$: $(2k-2m, 2k)$ or $(2k+1, 2k-2m+1)$
      – $Y_{2m}$: $(2k-2m+1, 2k+1)$ or $(2k, 2k-2m)$
      – $C_m$: the union of $X_{2m-1}$, $X_{2m}$, $Y_{2m}$ and $Y_{2m+1}$; it is closed under embedding
      – $D_m$: the union of $X_{2m}$ and $Y_{2m}$
    • Considering the flip of each pixel in a pair, we obtain a finite state machine that helps calculate the message length p.

Conventional Methods (6)

• The SPA Method
  – The finite state transition machine
  – The final equation for calculating p:

$$\frac{2|C_0| - |C_{j+1}|}{4}\,p^2 - \frac{2|D'_0| - |D'_{2j+2}| + 2\sum_{m=0}^{j}\left(|Y'_{2m+1}| - |X'_{2m+1}|\right)}{2}\,p + \sum_{m=0}^{j}\left(|Y'_{2m+1}| - |X'_{2m+1}|\right) = 0$$

where $j = 2^{b-1} - 2$, and primed sets are those observed in the image under analysis.

Our Point of View

• The detection problem itself can be treated as a standard classification problem. The input is the image, and the output is the label we assign to indicate whether the image has been embedded or not.

• Conventional methods do not focus on detection. To use them directly for detection, we have to apply a threshold.

• We use machine learning techniques to wrap conventional algorithms, so as to obtain better performance and generalization ability.

• We extract features from the image based on conventional methods.

Our Framework

• The framework

• We obtain a series of methods by using different classifiers.

• We use features based on conventional LSB steganalysis methods because we want to ensure good performance on LSB detection
  – Sequential case: the features are χ² coefficients
  – Non-sequential case: the features are values derived from RS
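The framework's core step is to hand a feature vector to a trained classifier instead of comparing one statistic with a threshold. This can be sketched with a tiny k-nearest-neighbour classifier. The experiments use WEKA's classifiers; this hand-written kNN in plain Python is only a stand-in for illustration. The name `knn_predict` is hypothetical, and the training vectors are made-up numbers, not data from the paper.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training vectors closest (Euclidean) to the query."""
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda item: math.dist(item[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up two-feature vectors (not data from the paper), labelled cover/stego.
train_X = [(0.01, -0.02), (-0.02, 0.01), (0.00, 0.02),
           (-0.30, 0.35), (-0.25, 0.28), (-0.40, 0.45)]
train_y = ["cover", "cover", "cover", "stego", "stego", "stego"]

print(knn_predict(train_X, train_y, (-0.28, 0.30)))  # stego
print(knn_predict(train_X, train_y, (0.01, 0.00)))   # cover
```

Swapping in another classifier, such as a decision tree or an SVM, changes only the model, not the feature extraction. This is what lets the framework produce a series of methods from one feature set.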

Experiment Results (1)

• Dataset
  – 24-bit color images from the public domain are used. We embedded messages of different lengths into the images and extracted features using different methods for the sequential and non-sequential cases. Every image yields one instance in the dataset, represented by a set of features.

• Classifiers
  – We build our experimental platform on WEKA.
  – We choose Naive Bayes, Bayes Net, the J48 decision tree, kNN, SVM (SMO algorithm with an RBF kernel) and a BP (back-propagation) neural network to build classifiers for a benchmark, using the default parameters provided by WEKA.

• Other issues
  – We use the entropy of the H (hue) value in HSV color space as the measure of an image's internal complexity. We divide the image set into five complexity levels and use four embed rates: 10%, 20%, 50% and 100%. We group the data by internal complexity and embed rate to simulate real situations.
  – This grouping is intended to investigate how the data properties influence the final results.
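The hue-entropy complexity measure can be sketched with Python's standard `colorsys` module. This is a minimal sketch that assumes RGB pixels given as 0-255 tuples, 256 hue bins and entropy measured in bits. The slides specify neither the binning nor the thresholds for the five complexity levels, so those are not reproduced. `hue_entropy` is a hypothetical name, and `colorsys` assigns hue 0 to gray pixels.

```python
import colorsys
import math
from collections import Counter


def hue_entropy(rgb_pixels, bins=256):
    """Shannon entropy (bits) of the quantized H channel of HSV, used as a complexity score."""
    hues = Counter()
    for r, g, b in rgb_pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)  # h in [0, 1)
        hues[min(int(h * bins), bins - 1)] += 1
    n = sum(hues.values())
    return -sum((c / n) * math.log2(c / n) for c in hues.values())


print(hue_entropy([(255, 0, 0)] * 10))          # single hue: zero entropy
print(hue_entropy([(255, 0, 0), (0, 255, 0)]))  # 1.0: two equally frequent hues
```

Images with many distinct hues score higher. This score is the "internal complexity" that the results then break down by.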

Experiment Results (2)

• We find that there are two cases in LSB embedding

• Sequential case
  – The bits are embedded successively, so "clusters" of embedded bits can be found. These clusters produce abrupt changes in the bit statistics, which makes detection easier.
  – We test the wrapped methods on the pov3 algorithm (a variation of χ²). Since the features are based on χ², threshold-based χ² with thresholds of 95% and 99% is used for comparison with the 10-fold cross-validation results of the ML methods.
  – We split each RGB LSB plane into 100 segments, and each segment's χ² coefficient is a feature. With three planes, we have 300 features for each image.

• Non-sequential case
  – The embedded bits are scattered randomly in the data.
  – We test the wrapped methods on features derived from RS, and use threshold-based RS for comparison.
  – The features are derived from RS; we have 2 features for each image:

$$\vec{F} = \{F_1, F_2\} = \left\{ \frac{R_M(p/2) - R_{-M}(p/2)}{\min\left(R_M(p/2),\, R_{-M}(p/2)\right)},\ \frac{S_M(p/2) - S_{-M}(p/2)}{\min\left(S_M(p/2),\, S_{-M}(p/2)\right)} \right\}$$

Experiment Results (3)

• Sequential case
  – The results show that precision decreases as image complexity increases, and increases as the embed rate increases.
  – Except for Naïve Bayes and Bayes Net, the other classifiers, such as kNN, J48 and SVM, achieve the same or better accuracy than the simple χ² method.


• Sequential case
  – Because most of the pictures are at the high complexity levels 2-4, ML-based methods generally perform better than simple χ². We conclude that applying machine learning to χ² can effectively improve accuracy. Classifier-wrapped conventional steganalysis may therefore be a good solution for detecting sequential LSB steganography.


• Non-sequential case
  – The table shows that J48 performs best in the mixed embed rate case and achieves at least 94% precision at every embed rate. Although it uses only two features, this result is comparable to the χ²-based results in the sequential case. In the mixed case it is also better than threshold-based RS.
  – The best precision of the threshold-based SPA method is 93.20%, while the precision of our J48-based method is 94.44%.

Precision (RMS)

Method         Embed 0.1      Embed 0.2      Embed 0.5      Embed 1.0      All Mixed
Naïve Bayes    50.90% (0.59)  53.80% (0.57)  54.90% (0.45)  94.56% (0.31)  80.48% (0.37)
Bayes Net      89.21% (0.27)  95.85% (0.17)  99.35% (0.07)  99.45% (0.07)  95.44% (0.18)
kNN            92.16% (0.28)  97.55% (0.16)  99.45% (0.07)  99.70% (0.05)  96.38% (0.19)
J48            94.11% (0.27)  98.05% (0.14)  99.40% (0.08)  99.65% (0.06)  97.56% (0.14)
SMO            59.39% (0.64)  75.32% (0.50)  93.21% (0.26)  96.70% (0.18)  80.90% (0.44)
BP             53.10% (0.50)  54.10% (0.50)  52.70% (0.50)  56.80% (0.50)  80.00% (0.40)
Threshold RS   95.30%         98.75%         99.70%         87.11%         97.38%

Conclusions and Future Work

• In this paper, we have developed a general framework for applying machine learning to steganalysis for LSB hidden-information detection. Experiments in both the sequential and the non-sequential LSB case demonstrate its superiority.

• Possible future work
  – seeking a more theoretical explanation for the effectiveness of our framework
  – using feature selection and nonlinear mapping to construct more effective features
  – extending our framework to non-LSB steganalysis (especially JPEG steganalysis)
  – incorporating more effective learning techniques, such as cost-sensitive learning and class-imbalance learning, into our framework for more effective classifiers
  – In any case, applying machine learning to steganalysis needs further discussion and more research.

Main References

• A. Westfeld and A. Pfitzmann. Attacks on steganographic systems. In Information Hiding, Third International Workshop, IH'99, Proceedings, Lecture Notes in Computer Science vol. 1768, pages 61-76, 1999.

• J. J. Fridrich and M. Goljan. Practical steganalysis of digital images: state of the art. In Security and Watermarking of Multimedia Contents, SPIE vol. 4675, pages 1-13, 2002.

• S. Dumitrescu, X. Wu, and Z. Wang. Detection of LSB steganography via sample pair analysis. In Information Hiding, 5th International Workshop, IH 2002, Revised Papers, Lecture Notes in Computer Science vol. 2578, pages 355-372, 2003.

• A. D. Ker. Improved detection of LSB steganography in grayscale images. In Information Hiding, 6th International Workshop, IH 2004, Revised Selected Papers, Lecture Notes in Computer Science vol. 3200, pages 97-115, 2004.

Thank you!

