Review of tools & techniques for steganalysis

A Project Report

On

“Review of tools & techniques for steganalysis”

SUBMITTED TO ASIAN SCHOOL OF CYBER LAW, PUNE,

In partial fulfillment for the award of the degree of

CYBER FORENSICS ANALYST

Submitted by

SACHIN LAWANDE

Under the guidance of

Prof. ROHAS NAGPAL

ASIAN SCHOOL OF CYBER LAW

Department of Cyber Forensics

2015-2016

ASCL

Acknowledgement

Every ASCL student looks toward the CFA project as an opportunity to apply the skills nurtured over the year through hard work and dedication. The milestone of completing the project would have been unattainable without the help of a few people who need to be acknowledged.

We owe this moment of satisfaction, with a deep sense of gratitude, to our internal guide Prof. Rohas Nagpal, who guided us at every stage and whose technical support and helpful attitude gave us high moral support.

We also take this opportunity to thank all our colleagues who backed our interest by giving useful suggestions and all possible help. Last but not least, we are thankful to our friends, colleagues and all the people directly or indirectly concerned with this project.

Sachin Lawande

CFA (ASCL)

Pune


ABSTRACT

Steganography deals with confidentiality and covert communication, and today the techniques for countering it in the context of computer forensics have somewhat fallen behind. This report discusses how steganography works in data hiding and the different methods and techniques used to investigate hidden data. It covers recovering encoded data, the tools used for both steganography and steganalysis, and techniques that reveal the form of hidden data. These methods will help forensic analysts recover hidden data. We need to stay one step ahead of cyber criminals.


Contents

ACKNOWLEDGEMENT
ABBREVIATIONS
ABSTRACT

1 INTRODUCTION
1.1 Introduction
1.2 Basic Concepts related to Project
1.2.1 What is Steganography?
1.2.2 What is Steganalysis?

2 Report Perception and Study
2.1 Problem of Definition
2.2 Objective

3 Techniques of Steganalysis
3.1 Signature Steganalysis
3.2 BMP-LSB Technique
3.3 Specific statistical analysis
3.4 Text Based Steganalysis
3.5 Audio Based Steganalysis
3.6 Video Based Steganalysis

4 Tools for Steganalysis
4.1 Stegdetect
4.2 DIIT
4.3 Stegspy
4.4 VSL Application
4.5 Ben 4-D Steganalysis

CONCLUSION
REFERENCES



Chapter 1

INTRODUCTION

1.1 Introduction

Rapidly growing computer and networking technology, coupled with a dramatic expansion in communications and information exchange capability within government, public and private corporations, and our own homes, has made our world smaller. As a society, we are substantially more invested in information technologies than ever before. Use of the Internet and multimedia for communication has become commonplace and an integral part of both business and social activity. This has changed how citizens across the world operate.

The rapid evolution of the Internet and technology has also been somewhat of a “double-edged sword.” Not only has it delivered a medium for exchanging vast amounts of information and knowledge for the benefit of humankind, it has also provided a new medium for conducting activities harmful to it. No longer restricted to the bounds of physical space, criminals and terrorists have discovered a digital world where they can take advantage of the vast expanse of cyberspace to conceal their activities from the prying eyes of law enforcement and the intelligence community. In the pre-Internet era, criminals often operated under the cloak of darkness. Now they operate at all times under the “cloak of cyberspace”, with little concern for being detected, arrested, prosecuted and convicted because, by and large, much criminal activity goes unreported. Even when it is reported, law enforcement is already so overwhelmed with CP investigations that it does not have the time or resources to investigate other cyber-crimes. This fact is not lost on those who would use the Internet for criminal or otherwise malicious purposes.

ASCL 2015 [CFA] Page 4


To make matters worse, criminals are adapting to evolving law enforcement technologies in the field of cyber forensics by finding new ways to hide their criminal and illegal activities. Law enforcement forensic experts are starting to discover data hiding applications on seized media that have been used to avoid detection by popular computer forensic tools by hiding a digital file inside another digital multimedia file. This method is called digital steganography.

The rising opportunities of modern communications call for special means of security, especially on computer networks. Network data security is becoming more important as the amount of data exchanged on the Internet increases. Therefore, confidentiality and data integrity are required to protect against unauthorized access and use. This has resulted in explosive growth in the field of information hiding.

In watermarking applications, the message contains information such as an owner identifier and a digital time stamp, and is usually applied for copyright protection.

In fingerprinting, the owner of the data set embeds a serial number that uniquely identifies the user of the data set. This adds to the copyright information and makes it possible to trace any unauthorized use of the data set back to the user.

1.2 Basic Concepts related to Project

1.2.1 What is Steganography?

Steganography is the art of hiding private or sensitive information within something that appears to be nothing out of the ordinary. Steganography is often confused with cryptography because both are used to protect important information. The difference between the two is that steganography hides information so that it appears that no information is hidden at all. A person who views the digital media object in which the information is hidden will have no idea that it contains hidden information, and therefore will not attempt to decode it.

What steganography basically does is exploit human perception: human senses are not trained to look for files that have data hidden inside them, although software is available that can do what is called steganalysis. The most common use of steganography is to hide a file inside another digital media file.

1.2.2 What is Steganalysis?

Steganalysis is the art of detecting messages hidden using steganography; it is analogous to cryptanalysis applied to cryptography.

The goal of steganalysis is to identify suspected packages, determine whether or not they have a payload encoded into them, and, if possible, recover that payload.

Unlike cryptanalysis, where it is obvious that captured data contains a message (though that message is encrypted), steganalysis generally starts with a pile of suspect data files, but little information about which of the files, if any, contain a payload. The steganalyst is usually something of a forensic statistician, and must start by reducing this set of data files (which is often quite large; in many cases, it may be the entire set of files on a computer) to the subset most likely to have been altered.

Detecting Steganography:

The art of detecting Steganography is referred to as Steganalysis.

To put it simply, steganalysis involves identifying the use of steganography inside a file. In this narrow sense, steganalysis does not deal with decoding the hidden data inside a file, just discovering it.

There are many methods and techniques that can be used to detect steganography, such as:

Viewing the file and comparing it to another copy of the file found on the Internet (picture file). There are usually multiple copies of images on the Internet, so you may want to look for several of them and try to match the suspect file to them. For example, if you download a JPEG and your suspect file is also a JPEG, and the two files look almost the same apart from the fact that one is larger than the other, it is most probable that your suspect file has hidden data inside it.
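The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names `file_digest` and `compare_files` are hypothetical, and the sketch assumes both files are available locally. It reports the size difference and whether the two files are byte-for-byte identical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 digest of the whole file, used as a byte-for-byte fingerprint."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compare_files(suspect, reference):
    """Compare a suspect file with a reference copy of the same picture."""
    suspect, reference = Path(suspect), Path(reference)
    return {
        # A positive value means the suspect file is larger than the reference.
        "size_difference": suspect.stat().st_size - reference.stat().st_size,
        "identical": file_digest(suspect) == file_digest(reference),
    }
```

A larger suspect file with a different digest is only a hint, not proof: copies of the same picture found online are often re-compressed or resized, which also changes both values.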

1.2.3 What is the Short-Time Fourier Transform (STFT)?

The short-time Fourier transform (STFT), also called the short-term Fourier transform, is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. In practice, the procedure for computing STFTs is to divide a longer time signal into shorter segments of equal length and then compute the Fourier transform separately on each shorter segment. This reveals the Fourier spectrum on each shorter segment. One then usually plots the changing spectra as a function of time.

Fig 1.1 An STFT being used to analyze an audio signal across time
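The segment-by-segment procedure can be sketched with the Python standard library alone. This is a minimal illustration, assuming non-overlapping segments and no window function; the names `dft`, `stft_magnitudes` and `peak_bins` are hypothetical, and practical implementations normally use overlapping, windowed segments and a fast Fourier transform.

```python
import cmath
import math


def dft(segment):
    """Plain discrete Fourier transform of one segment (O(n^2), fine for a demo)."""
    n = len(segment)
    return [
        sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def stft_magnitudes(signal, segment_length):
    """Split the signal into equal, non-overlapping segments and transform each."""
    frames = []
    for start in range(0, len(signal) - segment_length + 1, segment_length):
        spectrum = dft(signal[start:start + segment_length])
        # Keep the non-negative frequency bins; the rest mirror them for real input.
        frames.append([abs(c) for c in spectrum[: segment_length // 2 + 1]])
    return frames


def peak_bins(frames):
    """Index of the strongest frequency bin in every segment."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in frames]


# Test tone that jumps from frequency bin 2 to bin 5 halfway through the signal.
tone = [math.sin(2 * math.pi * (2 if t < 64 else 5) * t / 16) for t in range(128)]
print(peak_bins(stft_magnitudes(tone, 16)))
```

For this test tone the peak bin is 2 in the first four segments and 5 in the last four, which is the change over time that a spectrogram plot makes visible.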

Application-

Fundamental frequency estimation from spectral peaks; cross-synthesis; spectral envelope extraction by cepstral smoothing; spectral envelope extraction by linear prediction; sinusoidal modeling of audio signals; sines+noise modeling; sines+noise+transients modeling; chirplet modeling; time-scale modification; frequency scaling; FFT filter banks.




1.2.4 What is Linear Regression?

Linear regression is the most basic and commonly used predictive analysis. Regression estimates are used to describe data and to explain the relationship between one dependent variable and one or more independent variables.

At the center of regression analysis is the task of fitting a single line through a scatter plot. The simplest form, with one dependent and one independent variable, is defined by the formula y = c + b*x, where y = estimated dependent variable, c = constant, b = regression coefficient, and x = independent variable.

Fig 1.2.4 simple linear regression, which has one independent variable
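As a small worked example of the formula y = c + b*x, the following sketch fits the line by ordinary least squares. The function name `fit_line` and the sample data are illustrative assumptions, not part of the original report.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = c + b*x; returns (c, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line always passes through the point of means.
    c = mean_y - b * mean_x
    return c, b


# Points lying exactly on y = 1 + 2*x, so the fit should recover c = 1, b = 2.
c, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(c, b)
```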

Application – (1) Causal analysis (2) Forecasting an effect (3) Trend forecasting

Unlike correlation analysis, which focuses on the strength of the relationship between two or more variables, regression analysis assumes a dependence or causal relationship between one or more independent variables and one dependent variable.




1.2.5 Support Vector Machine?

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.

Application-

SVMs can be used to solve various real world problems:

SVMs are helpful in text and hypertext categorization as their application can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings.

Classification of images can also be performed using SVMs. Experimental results show that SVMs achieve significantly higher search accuracy than traditional query refinement schemes after just three to four rounds of relevance feedback

SVMs are also useful in medical science to classify proteins with up to 90% of the compounds classified correctly.

Hand-written characters can be recognized using SVM.
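To make the idea of a separating hyperplane concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the hinge loss. It is an illustration only: the function names, the learning rate, the regularisation constant and the toy two-feature data are all assumptions, and real steganalysis work would normally rely on an established SVM library.

```python
def train_linear_svm(samples, labels, epochs=500, learning_rate=0.01, reg=0.01):
    """Learn weights w and bias b of the hyperplane w.x + b = 0 (labels are +1/-1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is inside the margin or misclassified: move the hyperplane.
                w = [wi + learning_rate * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Sample is fine: only apply the regularisation shrinkage.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Classify a new example by the side of the hyperplane it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical two-feature training set: -1 = clean file, +1 = stego file.
samples = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = [-1, -1, -1, 1, 1, 1]
weights, bias = train_linear_svm(samples, labels)
print([predict(weights, bias, x) for x in samples])
```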



Chapter 2

REPORT PERCEPTION AND STUDY

2.1 Problem of Definition

Steganography, or “covered writing,” is a means of hidden communication that covers a variety of techniques used to embed data within a cover medium in such a manner that the very existence of the embedded information is undetectable.

Thousands of steganography applications are readily available on the Internet, most of them as freeware or shareware, for use by hackers and criminals. Computer security, law enforcement, and intelligence experts need the ability both to detect the use of digital steganography applications to conceal information and to extract the hidden information. Accordingly, there is much current interest in steganalysis, the detection and extraction of information hidden with digital steganography applications.

2.2 Objectives

This report will focus on steganography of graphical or image files. It will describe some technical features of steganography used by various tools that are specific to certain types of image files. Following that, steganalysis techniques used to detect the existence of hidden data from the forensic analyst's point of view will be discussed. Finally, the limitations of steganalysis will be presented along with an evaluation of some steganalysis tools.



Chapter 3

TECHNIQUES FOR STEGANALYSIS

3.1 Signature Steganalysis

Steganography methods hide secret data by manipulating images and other digital media in ways that remain invisible to the human eye. But hiding data within any digital medium using steganography requires modifying the medium's properties, which may cause some form of degradation or unusual features and patterns. These patterns and features may act as signatures that reveal the presence of an embedded message. Signature-based attacks use them to detect the existence of hidden messages. It is reported that Jpegx, a data-insertion steganography tool, inserts the secret data after the end-of-file marker of JPEG files and adds a fixed signature of the program before the secret data. The signature is the following hex code: 5B 3B 31 53 00. The presence of this signature automatically implies that the image contains secret data embedded using Jpegx.
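A signature check of this kind can be sketched as a simple byte search. The sketch below looks for the hex signature quoted above and reports whether a hit directly follows the JPEG end-of-image marker (FF D9). The function names are hypothetical, and treating "directly after FF D9" as the expected position is an assumption made for illustration.

```python
JPEG_EOI = b"\xff\xd9"                       # JPEG end-of-image marker
SIGNATURE = bytes.fromhex("5B 3B 31 53 00")  # signature bytes quoted in the text


def find_signature(data, signature=SIGNATURE):
    """Return every offset at which the signature occurs in the file contents."""
    offsets = []
    position = data.find(signature)
    while position != -1:
        offsets.append(position)
        position = data.find(signature, position + 1)
    return offsets


def follows_eoi(data, offset):
    """True if the bytes immediately before the offset are the end-of-image marker."""
    return offset >= 2 and data[offset - 2:offset] == JPEG_EOI


# Synthetic stand-ins for a clean JPEG and one with appended data.
clean = b"\xff\xd8" + b"\x00" * 10 + JPEG_EOI
suspect = clean + SIGNATURE + b"secret"
for hit in find_signature(suspect):
    print(hit, follows_eoi(suspect, hit))
```

In practice the `data` argument would be the result of reading the suspect file in binary mode.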

Example

Fig.3.1.1 Image with Hidden Data



Img.3.1.2 Hex editor view of Image Signature denoted by the red box in the diagram below

These particular steganography applications also encode additional data used for decoding the hidden information. This data includes signature bytes that the steganography application uses to determine that the hidden information was encoded by itself (indicated by the red box in the diagram below), and a hash value representation of the user-defined password (indicated by the green box).



3.2 BMP – The Least Significant Bit Technique

A commonly used steganography technique that can be applied to BMP graphic files is the Least Significant Bit (LSB) method. As its name indicates, the LSB method changes the least significant bit in the data bytes of the image to encode the hidden data. These bit changes do not cause significant quality loss in the image, especially for 24-bit BMP files. Sometimes, a steganography application uses the two least significant bits of each byte to encode the hidden data.

Detection

To illustrate the LSB steganographic method, consider the given BMP image: house.bmp

The LSB steganography method encodes messages in the least significant bit of every byte in an image. By doing so, the value of each pixel is altered slightly, but not enough to make major visual changes to the image, even when compared to the original. Comparing the original carrier file with the same file after the LSB method has been applied, in a hex editor, shows a modification in some byte values. Notice in Img.3.2.2 and Img.3.2.3 below that the highlighted byte values differ by one.

Example

Img.3.2.1 Data.bmp



Hex editor comparison

Img.3.2.2 Data.jpg (without steganography)

Img.3.2.3 Data.jpg (with steganography)
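The hex editor comparison can also be automated. The sketch below lists the offsets at which two byte sequences differ and checks whether every difference is confined to the least significant bit. The function names and the sample bytes are illustrative assumptions.

```python
def diff_bytes(original, modified):
    """Return (offset, original_byte, modified_byte) for every differing position."""
    return [
        (offset, a, b)
        for offset, (a, b) in enumerate(zip(original, modified))
        if a != b
    ]


def only_lsb_changes(differences):
    """True if there are differences and each one flips only the lowest bit."""
    return bool(differences) and all((a ^ b) == 1 for _, a, b in differences)


# Hypothetical carrier bytes before and after LSB embedding.
carrier = bytes([0x10, 0x21, 0x32, 0x43])
altered = bytes([0x11, 0x21, 0x33, 0x43])
changes = diff_bytes(carrier, altered)
print(changes, only_lsb_changes(changes))
```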

Extracting hidden information that has been embedded using the LSB method involves determining the number of bits used for encoding. After the encoding bits have been extracted, they must be reassembled to recreate the hidden information. Some steganography applications employ various randomization techniques when placing the encoded bits, which complicates reassembly. For straightforward embedding, simply reassemble each group of eight bits into one byte of the hidden data.
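For the straightforward case, embedding and reassembly can be sketched as follows. The sketch assumes one hidden bit per carrier byte, sequential placement and most-significant-bit-first ordering; these choices and the function names are assumptions for illustration, since real tools differ. A real BMP file would also need its header skipped before the pixel data is processed.

```python
def embed_lsb(carrier, message):
    """Write the message bits into the least significant bit of successive bytes."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for message")
    stego = bytearray(carrier)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego, message_length):
    """Reassemble message_length bytes from the LSBs, eight bits per byte."""
    message = bytearray()
    for index in range(message_length):
        value = 0
        for byte in stego[index * 8:(index + 1) * 8]:
            value = (value << 1) | (byte & 1)
        message.append(value)
    return bytes(message)


pixel_data = bytes(range(64))          # stand-in for BMP pixel bytes
hidden = embed_lsb(pixel_data, b"hi")
print(extract_lsb(hidden, 2))
```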



3.3 Specific Statistical steganalysis

Steganography embeds secret messages in images, and this causes changes in the statistics of an image. Statistical steganalysis, as the name implies, examines the underlying statistics of an image to detect the secret embedded data. Statistical steganalysis is considered more powerful than signature steganalysis because mathematical methods are more sensitive than visual perception.

Specific statistical steganalysis can be categorized by data hiding technique, i.e., in the spatial domain and in the transform domain.

i) Spatial domain steganalysis

a) Chi-square Attack

The first statistical steganalysis was proposed by Westfeld and Pfitzmann. This method is specific to LSB embedding and is based on powerful first-order statistical analysis rather than visual inspection. The technique identifies Pairs of Values (POVs), which consist of pixel values, quantized DCT coefficients or palette indices that get mapped to one another on LSB flipping. After message embedding, the occurrence counts of the two members of each POV become nearly equal, while their total remains the same. This pairwise dependency is exploited to design a statistical Chi-square test to detect hidden messages. The reported results show that this method reliably detects sequentially embedded messages. Later, the method was generalized to detect randomly scattered messages.
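The pairs-of-values idea can be illustrated with a short sketch that computes the Chi-square statistic over the pairs (2k, 2k+1) of 8-bit sample values. This is a simplified illustration, not the published test: it omits the conversion of the statistic into a probability, and the synthetic cover, the helper names and the seeds are assumptions made for the demonstration.

```python
import random


def pov_chi_square(values):
    """Chi-square statistic over the pairs of values (2k, 2k+1) of 8-bit samples."""
    histogram = [0] * 256
    for value in values:
        histogram[value] += 1
    statistic = 0.0
    for even in range(0, 256, 2):
        # Under full LSB embedding both members of a pair are expected to be equally common.
        expected = (histogram[even] + histogram[even + 1]) / 2
        if expected > 0:
            statistic += (histogram[even] - expected) ** 2 / expected
    return statistic


def randomize_lsbs(values, seed=0):
    """Simulate LSB embedding of a random-looking message in every sample."""
    rng = random.Random(seed)
    return [(value & 0xFE) | rng.getrandbits(1) for value in values]


def synthetic_cover(count=10000, seed=1):
    """Toy cover in which even values are about four times as common as odd ones."""
    rng = random.Random(seed)
    return [2 * rng.randrange(128) + (1 if rng.random() < 0.2 else 0) for _ in range(count)]


cover = synthetic_cover()
stego = randomize_lsbs(cover)
print(pov_chi_square(cover), pov_chi_square(stego))
```

A small statistic means the two members of each pair occur almost equally often, which is the footprint of LSB embedding; the toy cover produces a much larger value than its embedded version.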

b) RQP (Raw Quick Pair)

Another specific steganalysis technique for detecting LSB embedding in 24-bit color images, the Raw Quick Pair (RQP) method, was proposed by Fridrich. The method is based on analyzing close pairs of colors created by LSB embedding. It has been shown that the ratio of close colors to the total number of unique colors increases significantly when a message of a selected length is embedded in a cover image rather than in a stego image. It is this difference that makes it possible to discriminate between cover images and stego images in the case of LSB steganography. The method works reliably as long as the number of unique colors in the cover image is less than 30% of the number of pixels. As reported, the method has a higher detection rate than the technique given by Westfeld and Pfitzmann, but it cannot be applied to grayscale images.
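The close-color ratio can be sketched as follows. Two assumptions are made here that the text above does not spell out: two colors are treated as "close" when each channel differs by at most one (squared distance of 3 or less), and the count of close pairs is normalised by the number of all pairs of unique colors. The function names and the synthetic pixel data are likewise illustrative.

```python
import random
from itertools import combinations


def close_pair_ratio(pixels):
    """Ratio of close color pairs to all pairs of unique colors in an RGB pixel list."""
    unique = sorted(set(pixels))
    if len(unique) < 2:
        return 0.0
    close = sum(
        1
        for first, second in combinations(unique, 2)
        if sum((a - b) ** 2 for a, b in zip(first, second)) <= 3
    )
    return close / (len(unique) * (len(unique) - 1) / 2)


def randomize_lsbs(pixels, seed=0):
    """Simulate LSB embedding by randomising the lowest bit of every channel."""
    rng = random.Random(seed)
    return [tuple((c & 0xFE) | rng.getrandbits(1) for c in pixel) for pixel in pixels]


# Synthetic cover with 40 well-separated colors, so it has no close pairs at all.
cover_pixels = [(10 * (i % 8), 10 * (i % 5), 100) for i in range(200)]
stego_pixels = randomize_lsbs(cover_pixels)
print(close_pair_ratio(cover_pixels), close_pair_ratio(stego_pixels))
```

In use, the ratio of the image under test is compared with the ratio obtained after embedding a test message in it: a large increase suggests a clean cover, while little change suggests the image already carries a message.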

c) RS Steganalysis

A more sophisticated method, RS steganalysis (Regular and Singular groups), was proposed by Fridrich et al. for detecting LSB embedding in color and grayscale images. The method utilizes sensitive dual statistics derived from spatial correlations in images. The image is divided into disjoint groups of fixed shape. Within each group, noise is measured by the mean absolute value of the differences between adjacent pixels. Each group is classified as “regular” or “singular” depending on whether the pixel noise within the group increases or decreases after flipping the LSBs of a fixed set of pixels within the group using a “mask”. The classification is repeated for a dual type of flipping. Theoretical analysis and experimentation show that the proportions of regular and singular groups form curves that are quadratic in the amount of information embedded by the LSB method. RS steganalysis is more reliable than the Chi-square method.
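The group classification step can be sketched on a one-dimensional run of grayscale pixels. This is a simplified illustration under stated assumptions: groups of four consecutive pixels, the mask (0, 1, 1, 0), a synthetic piecewise-constant cover, and hypothetical function names. It only counts regular and singular groups and does not estimate the message length from the quadratic curves.

```python
import random


def noise(group):
    """Sum of absolute differences between adjacent pixels in a group."""
    return sum(abs(group[i + 1] - group[i]) for i in range(len(group) - 1))


def flip_positive(pixel):
    """Ordinary LSB flipping: 0<->1, 2<->3, ..."""
    return pixel ^ 1


def flip_negative(pixel):
    """Dual (shifted) flipping: -1<->0, 1<->2, ..."""
    return ((pixel + 1) ^ 1) - 1


def rs_counts(pixels, mask=(0, 1, 1, 0), negative=False):
    """Count regular (noise increases) and singular (noise decreases) groups."""
    flip = flip_negative if negative else flip_positive
    size = len(mask)
    regular = singular = 0
    for start in range(0, len(pixels) - size + 1, size):
        group = pixels[start:start + size]
        flipped = [flip(p) if m else p for p, m in zip(group, mask)]
        before, after = noise(group), noise(flipped)
        if after > before:
            regular += 1
        elif after < before:
            singular += 1
    return regular, singular


# Synthetic smooth cover: runs of 40 identical pixels, then fully randomised LSBs.
smooth = [100 + i // 40 for i in range(4000)]
rng = random.Random(1)
embedded = [(p & 0xFE) | rng.getrandbits(1) for p in smooth]
print(rs_counts(smooth), rs_counts(embedded))
```

On the smooth cover almost every group is regular, whereas after full LSB randomisation the regular and singular counts for ordinary flipping move close together, which is the effect the method measures.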

d) B-Spline fitting

Shunquan Tan and Bin Li claim that no targeted steganalysis of EALSBMR (edge-adaptive image steganography based on LSB matching revisited) had previously existed. They proposed that a B-spline can be used to fit the histogram in order to eliminate the pulse distortion caused by the readjusting phase of EALSBMR. This method can accurately estimate the threshold used in the secret data embedding procedure and separate stego images with unit block size from those with block sizes greater than 1.

ii) Transform domain steganalysis

a) Chi-square statistics

Zhang and Ping have proposed an attack on sequential JSteg and random JSteg for JPEG images. The method is based on the statistical model of DCT coefficients. It is observed that the quantized DCT coefficients of JPEG images are distributed symmetrically around zero in clean images. These distributions are changed by message embedding, whether sequential or random. Chi-square statistics of the stego image are calculated and an inequality is used to judge the presence of a hidden message. The embedding ratio is also estimated. The technique is simple and very effective.

b) Histogram Analysis Attack

The histogram analysis attack works on JPEG sequential and pseudo-random embedding stegosystems, such as JSteg and OutGuess 0.1. It can effectively estimate the length of the embedded message, and it is based on the loss of histogram symmetry after embedding. X. Yu et al. proposed a powerful steganalysis method specific to JSteg steganography in the JPEG file format. In this technique the cover image histogram of DCT coefficients is estimated from the stego image histogram. This estimate is more accurate than Fridrich's cropping method.

c) Calibration Technique

Fridrich proposed a feature-based steganalytic method combined with the concept of calibration for JPEG images. First- and second-order features are analyzed in both the DCT and spatial domains, such as the global DCT coefficient histogram, dual histograms, blockiness, and the co-occurrence matrix. To estimate the cover image, we take into account how JPEG works. JPEG images have a block structure of 8x8 blocks and are formed by quantized DCT coefficients, which tend to be robust to small distortions such as compression and embedding. Thus, by decompressing an image and recompressing it with a shifted block structure, we can estimate the cover image. This is done by applying the following calibration steps to the stego image:

(1) Decompress the stego image using its quantization table.

(2) Crop the decompressed stego image by 4 pixels, either column-wise or row-wise or at the edges.

(3) Compress the cropped image using the same quantization table.



3.4 Text Based steganalysis

The use of text media as a covert channel for secret communication has drawn increasing attention, which in turn creates growing interest in text steganalysis. At present, it is harder to find secret messages in texts than in other types of multimedia files, such as image, video and audio. In general, text steganalysis exploits the fact that embedding information usually changes some statistical properties of stego texts; it is therefore vital to perceive the modifications in stego texts. Previous work on text steganalysis can be roughly classified into three categories: format-based, invisible character-based and linguistic. Unlike the former two categories, linguistic steganalysis attempts to detect secret messages in natural language texts. In linguistic steganography, lexical, syntactic or semantic properties of texts are manipulated to conceal information while their meanings are preserved as much as possible. Due to the diversity of syntax and the polysemy of semantics in natural language, it is hard to observe the changes in stego texts.

Many linguistic steganalysis methods have been proposed so far. In these methods, special features are designed to expose semantic or syntactic changes in stego texts. For example, Z.L. Chen et al. designed the N-window mutual information matrix as the detection feature for detecting semantic steganography algorithms. Furthermore, they used word entropy and the change of word location as semantic features, which improved the detection rates of their methods. Similarly, C.M. Taskiran et al. used a probabilistic context-free grammar to design special features for attacking syntactic steganography algorithms. In the work mentioned above, the designed features strongly affect the final performance, and they can only reveal local properties of texts. Accordingly, when a text is large enough, the differences between natural texts (NTs) and stego texts (STs) are evident, and the detection performance of these methods is acceptable. When texts become small, however, the detection rates decrease dramatically and are not satisfactory for applications. In addition, some steganographic tools have been improved in their semantic and syntactic handling for better camouflage. Therefore, linguistic steganalysis still needs further research to resolve these problems. Some further work on text steganalysis is discussed below.



A. Linguistic Steganalysis Based on Meta Features and Immune Mechanism

Linguistic steganalysis depends on effective detection features because of the diversity of syntax and the polysemy of semantics in natural language processing. The work summarized here presents a novel linguistic steganalysis approach based on meta features and an immune clone mechanism. First, meta features are used to represent texts. Then the immune clone mechanism is exploited to select appropriate features so as to constitute effective detectors. The approach employs meta features as detection features, which is the opposite view from the past literature. Moreover, the immune training process consists of two phases, which can identify two kinds of stego texts respectively. The constituted detectors have the capability of blind steganalysis to a certain extent. Experiments show that the proposed approach achieves better performance than typical existing methods, especially in detecting short texts. When text sizes are limited to 3 kB, detection accuracies exceed 95%.

B. Research on Steganalysis for Text Steganography Based on Font Format

In the research area of text steganography, algorithms based on font format have the advantages of great capacity, good imperceptibility and a wide range of use. However, little work on steganalysis of such algorithms has been reported in the literature. Based on the fact that the statistical features of font format are changed by font-format-based steganographic algorithms, the authors present a novel Support Vector Machine-based steganalysis algorithm to detect whether hidden information exists. This algorithm can not only efficiently detect the existence of hidden information, but also estimate the hidden information length according to differences in font attribute values. As shown by experimental results, the detection accuracy of the algorithm reaches as high as 99.3 percent when the hidden data length is at least 16 bits.



3.5 Audio steganalysis Algorithms

Audio steganalysis is very difficult due to the existence of advanced audio steganography schemes, and the very nature of audio signals as high-capacity data streams necessitates scientifically challenging statistical analysis.

A. Phase and Echo Steganalysis

Zeng et al. proposed steganalysis algorithms to detect phase coding steganography based on the analysis of phase discontinuities and to detect echo steganography based on the statistical moments of peak frequency. The phase steganalysis algorithm exploits the fact that phase coding corrupts the extrinsic continuities of unwrapped phase in each audio segment, causing changes in the phase difference. A statistical analysis of the phase difference in each audio segment can be used to monitor the change and train classifiers to differentiate an embedded audio signal from a clean audio signal. The echo steganalysis algorithm statistically analyzes the peak frequency using short window extraction and then calculates the eighth high-order central moments of peak frequency as feature vectors that are fed to a support vector machine, which is used as a classifier to distinguish between audio signals with and without hidden data.

B. Universal Steganalysis based on Recorded Speech

Johnson et al. proposed a generic universal steganalysis algorithm that bases its study on the
statistical regularities of recorded speech. Their statistical model decomposes an audio signal
(i.e., recorded speech) using basis functions localized in both the time and frequency domains in
the form of the Short Time Fourier Transform (STFT). The spectrograms collected from this
decomposition are analyzed using non-linear support vector machines to differentiate between
cover and stego audio signals. This approach is likely to work only for high bit-rate audio
steganography and will not be effective for detecting low bit-rate embeddings.



C. Use of Statistical Distance Measures for Audio Steganalysis

H. Ozer et al. investigated the distribution of various statistical distance measures on cover
audio signals and stego audio signals vis-à-vis their versions without noise and observed them to
be statistically different. The authors employed audio quality metrics to capture the anomalies
in the signal introduced by the embedded data. They designed an audio steganalyzer that relied
on the choice of audio quality measures, which were tested depending on their perceptual or
non-perceptual nature. The selection of the proper features and quality measures was made using
the
(i) ANOVA test to determine whether there are any statistically significant differences between
available conditions and the
(ii) SFS (Sequential Floating Search) algorithm that considers the inter-correlation between the
test features in ensemble.
Subsequently, two classifiers, one based on linear regression and the other based on support
vector machines, were used and simultaneously evaluated for their capability to detect stego
messages embedded in the audio signals. The features selected using the SFS test and evaluated
using the support vector machines produced the best outcome. The perceptual-domain measures
considered are: Bark Spectral Distortion, Modified Bark Spectral Distortion, Enhanced Modified
Bark Spectral Distortion, Perceptual Speech Quality Measure and Perceptual Audio Quality Measure.
The non-perceptual time-domain measures considered are: Signal-to-Noise Ratio, Segmental
Signal-to-Noise Ratio and Czekanowski Distance. The non-perceptual frequency-domain measures
considered are: Log-Likelihood Ratio, Log-Area Ratio, Itakura-Saito Distance, Cepstral
Distance, Short Time Fourier-Radon Transform Distance, Spectral Phase Distortion and Spectral
Phase Magnitude Distortion.
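Two of the non-perceptual time-domain measures listed above, the Signal-to-Noise Ratio and the Segmental Signal-to-Noise Ratio, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the sample values and the rule of skipping segments with an undefined ratio are choices made here and do not reproduce the implementation of Ozer et al.

```python
import math


def snr_db(reference, test):
    """Signal-to-noise ratio in dB between a reference signal and a test signal."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise == 0:
        return math.inf   # identical signals: no distortion at all
    if signal == 0:
        return -math.inf  # silent reference but non-zero distortion
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(reference, test, segment_len):
    """Average of the per-segment SNR values over non-overlapping segments.

    Segments whose SNR is infinite are skipped (an assumption of this sketch).
    """
    values = []
    for start in range(0, len(reference) - segment_len + 1, segment_len):
        value = snr_db(reference[start:start + segment_len],
                       test[start:start + segment_len])
        if math.isfinite(value):
            values.append(value)
    return sum(values) / len(values) if values else math.inf


# Hypothetical samples: the test signal differs from the reference in one sample.
reference = [10, 10, 10, 10]
test = [11, 10, 10, 10]
print(snr_db(reference, test), segmental_snr_db(reference, test, 2))
```

In a steganalysis setting the values of such measures would be collected as features for both cover and stego signals.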

D. Audio Steganalysis based on Hausdorff Distance

The audio steganalysis algorithm proposed by Liu et al. uses the Hausdorff distance to
measure the distortion between a cover audio signal and a stego audio signal. The algorithm takes
as input a potentially stego audio signal x and its de-noised version x' as an estimate of the
cover signal. Both x and x' are then subjected to appropriate segmentation and wavelet
decomposition to generate wavelet coefficients at different levels of resolution. The Hausdorff
distance values between the wavelet coefficients of the audio signals and their de-noised versions
are measured. The statistical moments of the Hausdorff distance measures are used to train a
classifier on the difference between cover audio signals and stego audio signals with different
content loadings.
However, the above approach of creating a reference signal from the signal's own de-noised version
causes content-dependent distortion. This can lead to a situation where the differences in the
signal content itself prevent the classifier from detecting the distortions induced during data
hiding. Avcibas proposed an audio steganalysis technique based on content-independent
distortion measures. The technique uses a single reference signal that is common to all the
signals to be tested.
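The distance step of the Hausdorff-based approach can be sketched as below. The sketch covers only the Hausdorff distance between two sets of scalar coefficients. The segmentation, wavelet decomposition and de-noising steps are not shown, and the coefficient values are hypothetical.

```python
def directed_hausdorff(a, b):
    """Largest distance from any element of `a` to its nearest element in `b`."""
    return max(min(abs(x - y) for y in b) for x in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty sets of scalars."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical wavelet coefficients of a signal x and of its de-noised version x'.
coeffs_x = [0.0, 1.0, 2.0]
coeffs_x_denoised = [0.0, 1.0, 5.0]
print(hausdorff(coeffs_x, coeffs_x_denoised))
```

One distance value would be obtained per segment and resolution level, and the statistical moments of these values would then serve as classifier features.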



3.6 Video steganalysis Methodology

A. Video Steganalysis Exploring the Temporal Correlation between Frames

Budhia et al. proposed a technique for video steganalysis that uses the redundant information
present in the temporal domain as a deterrent against secret messages embedded by spread
spectrum steganography. Their study, based on linear collusion approaches, is successful in
identifying hidden watermarks bearing low energy with good precision. The simulation results
also demonstrate the advantage of temporal-based methods over purely spatial methods in detecting
the secret message.
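The idea of exploiting temporal redundancy can be illustrated with a small sketch. It assumes that one simple form of linear collusion is to average the temporal neighbours of a frame and to treat that average as an estimate of the cover frame. Frames are modelled as flat lists of pixel values, and the function names and sample values are hypothetical rather than taken from Budhia et al.

```python
def collude(frames, index, radius=1):
    """Estimate the cover of frames[index] by averaging its temporal neighbours."""
    lo = max(0, index - radius)
    hi = min(len(frames), index + radius + 1)
    neighbours = [frames[i] for i in range(lo, hi) if i != index]
    if not neighbours:
        raise ValueError("at least one neighbouring frame is required")
    return [sum(pixels) / len(neighbours) for pixels in zip(*neighbours)]


def residual_energy(frame, estimate):
    """Mean squared difference between a frame and its cover estimate."""
    return sum((p - e) ** 2 for p, e in zip(frame, estimate)) / len(frame)


# Three tiny hypothetical frames; the middle one deviates from its neighbours.
frames = [[10, 10], [12, 8], [10, 10]]
estimate = collude(frames, 1)
print(estimate, residual_energy(frames[1], estimate))
```

A residual that is unusually large compared with those of other frames may point to an embedded signal, although scene motion produces the same effect and would have to be compensated in practice.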

B. Video Steganalysis based on Asymptotic Relative Efficiency (ARE)

Jainsky et al. proposed a video steganalysis algorithm that incorporates asymptotic relative
efficiency-based detection. This algorithm is more appropriate for applications in which only a
subset of the video frames, and not all of them, is watermarked with the secret message. The
stego video signal is assumed to consist of a sequence of correlated image frames and to obey a
Gauss-Markov temporal correlation model. Steganalysis consists of a signal processing phase
followed by a detection phase. The signal processing phase emphasizes the presence of hidden
information in the sequence of frames using a motion estimation scheme. The detection phase is
based on asymptotic relative efficiency (ARE), wherein both the cover video and the
watermarked secret message are considered to be random variables. The ARE-based detector is
memoryless in nature and uses an adaptive threshold for the video characteristics that are used to
differentiate a cover video from a stego video. The video characteristics considered (e.g. size,
standard deviation and correlation coefficient) are those that vary from one sequence of frames
to another. The number of frames in a sequence to be analyzed at each pass through the detector
was also considered as a parameter for detection.

C. Video Steganalysis based on Mode Detection

Su et al. propose a video steganalysis algorithm that targets the Moscow State University (MSU)
stego video software, which is one of the very few existing video steganographic tools that can
embed any file in AVI (Audio Video Interleave) format and whose embedded messages can be
extracted correctly even after the stego videos are compressed. The steganalysis algorithm uses
the correlation between adjacent frames and detects a special distribution mode across the frames.
The embedding unit is a 32 x 32 pixel block, and the four 16 x 16 blocks within a unit form a
chessboard-like distribution pattern. After correlation analysis between adjacent frames, if the
ratio of the number of 32 x 32 pixel blocks with a specific distribution mode to the total number
of 32 x 32 pixel blocks in a video sequence is determined to be above a threshold value, then the
video signal is predicted to carry an embedded message.
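The final decision rule described above can be sketched as follows. The sketch assumes that the correlation analysis has already reduced every 32 x 32 unit to a 2 x 2 grid of binary indicators, one per 16 x 16 sub-block. That representation, the function names and the threshold value are hypothetical and are used only to show the ratio test.

```python
def is_chessboard(unit):
    """Return True if the 2 x 2 indicator grid alternates like a chessboard."""
    (a, b), (c, d) = unit
    return a == d and b == c and a != b


def carries_message(units, threshold):
    """Flag a sequence when the share of chessboard-like units exceeds the threshold."""
    if not units:
        return False
    ratio = sum(1 for unit in units if is_chessboard(unit)) / len(units)
    return ratio > threshold


# Four hypothetical units: two show the chessboard mode, two do not.
units = [((1, 0), (0, 1)), ((0, 1), (1, 0)), ((1, 1), (1, 1)), ((0, 0), (0, 0))]
print(carries_message(units, threshold=0.4))
```

The threshold would have to be calibrated on known clean and stego videos.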

D. Video Steganalysis based on Spatial and Temporal Prediction

Pankajakshan and Ho propose a video steganalysis scheme for the MPEG video coding standard,
in which a given frame is predicted from its neighboring reference frames using motion
compensation. The MPEG coding system supports two types of predicted frames: the P-frames
(which use a single past frame as the reference frame) and the B-frames (which use a past frame
and a future frame as reference frames). The prediction-error frames (PEFs) corresponding to the
P- and B-frames are then coded using a transform coding method. The PEFs exhibit spatiotemporal
correlation between adjacent frames. The PEFs of a test video signal are decomposed using the
3-level DWT (Discrete Wavelet Transform) method, and the first three moments of the
characteristic functions (CFs) in each of the sub-bands are computed. The resulting feature
vectors are used to train a pattern classifier to distinguish between stego and non-stego videos.



Chapter 4

TOOLS FOR STEGANALYSIS

4.1 Stegdetect

Steganography Detection with Stegdetect

Stegdetect is an automated tool for detecting steganographic content in images. It is capable of
detecting several different steganographic methods used to embed hidden information in JPEG
images. Currently the detectable schemes are

JSteg,

JPHide (Unix and Windows),

Invisible Secrets,

OutGuess 0.13b,

F5 (header analysis),

appendX and Camouflage.

Stegbreak is used to launch dictionary attacks against JSteg-Shell, JPHide and OutGuess 0.13b.

Stegdetect and Stegbreak were created by Niels Provos.

Stegdetect 0.6 supports linear discriminant analysis. Given a set of normal images and a set of
images that carry content hidden by a new steganographic application, Stegdetect can
automatically determine a linear detection function that can be applied to as yet unclassified
images. Linear discriminant analysis computes a separating hyperplane that separates the non-stego
images from the stego images. The hyperplane is represented as a linear function. The learned
function can be saved for later use on new images.
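How a linear detection function can be learned from two sets of images is illustrated by the following minimal sketch of Fisher's linear discriminant. It is not Stegdetect's code. It assumes that every image has already been reduced to two numeric features, and the feature values and function names are hypothetical.

```python
def mean_vector(rows):
    """Mean of a list of 2-D feature vectors."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def scatter(rows, m):
    """Entries (sxx, sxy, syy) of the scatter matrix of `rows` around mean `m`."""
    sxx = sum((r[0] - m[0]) ** 2 for r in rows)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    syy = sum((r[1] - m[1]) ** 2 for r in rows)
    return sxx, sxy, syy


def train_lda(clean, stego):
    """Return (weights, bias) of the hyperplane separating the two classes."""
    m0, m1 = mean_vector(clean), mean_vector(stego)
    a0, b0, d0 = scatter(clean, m0)
    a1, b1, d1 = scatter(stego, m1)
    a, b, d = a0 + a1, b0 + b1, d0 + d1  # within-class scatter [[a, b], [b, d]]
    det = a * d - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # weights = inverse(scatter) * (m1 - m0), written out for the 2 x 2 case
    w = [(d * dx - b * dy) / det, (a * dy - b * dx) / det]
    # place the decision boundary halfway between the two class means
    bias = -(w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1])) / 2
    return w, bias


def classify(model, x):
    """Return True if feature vector `x` falls on the stego side of the hyperplane."""
    w, bias = model
    return w[0] * x[0] + w[1] * x[1] + bias > 0


# Hypothetical training features for clean and stego images.
clean = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (1.0, 1.0)]
stego = [(4.0, 5.0), (5.0, 4.0), (4.5, 4.5), (5.0, 5.0)]
model = train_lda(clean, stego)
print(classify(model, (4.8, 4.9)))
```

The pair of weights and bias plays the role of the learned function that can be stored and applied later to new images.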


http://www.citi.umich.edu/u/provos/


Stegdetect supports several different feature vectors and automatically computes the receiver
operating characteristic, which can be used to evaluate the quality of the automatically learned
detection function.

Example

Fig.4.1.1: Sample Image

Fig.4.1.2: Stegdetect Stego Image Snapshot

4.2 Digital Invisible Ink Toolkit (DIIT)

DIIT is provided as a JAR package on the website, and Chi-Square is provided in EXE
form, compiled to run on Windows OS. However, it was possible to use Wine to run the EXE
file in the Linux environment.
The following is a screenshot (figure 4.2) of the GUI of the tool. As can be
seen, it also offers an option to enter a password for encryption. As in the previous processes,
steganographic images were created with and without passwords as “cfa2.bmp”
and “diit_np_newbmp.bmp” respectively.

Example

Fig.4.2.1: DIIT Analysis Screenshot



Fig.4.2.2: DIIT Analysis Result Screenshot

After each process, the MD5 hash was created for the original and the steganographic images.
Table 4.2.3 shows the MD5 hashes for all the images involved in the Steghide,
Outguess and DIIT steganographic processes.

MD5 hash                          File
f34cc0ae3fb2a1c9be2faa674a2812d0  cfa2.bmp
4c2a9fb3860b299460a4be912a806437  steghide_np_cfa2.bmp
cdac07608cdf45f1e62ab96086dc362e  steghide_wp_cfa2.bmp
e7cd6d440badb0404db9e02f1c2dd9c6  outguess_np_cfa2.bmp
bbd68076246b513669e94180ee02ee5b  outguess_wp_cfa2.bmp
9b190be2345100aebad2493e0d915522  newbmp.bmp
54b03c5c48e374f697abb2809c4d3222  steghide_np_newbmp.bmp
a463186d7cbc0bfc1f1af13f2117c016  steghide_wp_newbmp.bmp
01f4acd389266a01a07acba0153108a6  diit_np_newbmp.bmp
fc914686e06b46e5672a1fdaea72c235  diit_wp_newbmp.bmp

Table.4.2.3: MD5 hashes
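Hashes such as those listed in the table can be computed with a few lines of Python using the standard `hashlib` module. The sketch below is only an illustration: the helper names are chosen here, and the file names in the commented usage line are taken from the table and assumed to exist in the working directory.

```python
import hashlib


def md5_of_bytes(data):
    """Return the MD5 digest of a bytes object as 32 hexadecimal digits."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differs_from_original(original_path, suspect_path):
    """Return True if the two files do not have the same MD5 hash."""
    return md5_of_file(original_path) != md5_of_file(suspect_path)


# Example usage, assuming both files are present:
# print(differs_from_original("cfa2.bmp", "steghide_np_cfa2.bmp"))
```

A differing hash only shows that a file was modified. It does not by itself prove that steganography was used, and it requires access to the original file.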



4.3 StegSpy

StegSpy is a program under continuous development. The latest version allows identification of a
“steganized” file. StegSpy detects steganography and the program used to hide the message.
The latest version also identifies the location of the hidden content. StegSpy currently
identifies the following programs

Hiderman

Masker

Invisible Secrets

JPEGx

StegSpy is a software tool designed to detect the presence of data that has been hidden using steganography. Steganography is a method used to embed hidden data within another file. The file containing the data, or carrier file, serves as an innocuous medium used to covertly transport the underlying data, or payload. When combined, these two form the stegoed file. The process of detecting steganography is called steganalysis. StegSpy conducts steganalysis by locating specific hexadecimal byte patterns within the raw data of suspected stegoed files to determine whether those files contain hidden data.

Example

StegSpy’s main interface consists of an Information window and a Run button, as depicted in Table 4.3.2.



Table.4.3.2: User Interface

Table.4.3.3: Steganalysis Result



After StegSpy inspects a file, the results of the analysis appear in the Information window.
If StegSpy identifies the inspected file as a stegoed file, the program used to embed the hidden
data is named in the Information window along with the offset location of the detected
signature within the stegoed file. If StegSpy recognizes the examined file as a clean file, the
path of the file appears in the Information window along with the message “Sorry, no Steg
found.” Figures 4.3.2 and 4.3.3 depict the results of a positive analysis and a negative analysis
respectively.
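The signature-based approach described in this section can be sketched in a few lines. The byte patterns and tool names below are placeholders invented for illustration. They are not StegSpy's real signatures, which are not given in this report.

```python
# Hypothetical signatures: tool name -> byte pattern left in the carrier file.
SIGNATURES = {
    "ExampleTool A": bytes.fromhex("deadbeef"),
    "ExampleTool B": b"STEGO-MARK",
}


def scan_bytes(data, signatures=SIGNATURES):
    """Return (tool, offset) pairs for every signature found in `data`."""
    hits = []
    for tool, pattern in signatures.items():
        offset = data.find(pattern)  # -1 means the pattern does not occur
        if offset != -1:
            hits.append((tool, offset))
    return hits


def scan_file(path, signatures=SIGNATURES):
    """Read a file in binary mode and scan its raw bytes for known signatures."""
    with open(path, "rb") as handle:
        return scan_bytes(handle.read(), signatures)


sample = b"\x00\x01" + bytes.fromhex("deadbeef") + b"tail"
print(scan_bytes(sample) or "Sorry, no Steg found.")
```

An empty result corresponds to the negative case, while a hit reports both the embedding program and the offset of its signature.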



4.4 VSL application

Many applications dedicated to steganography are simply command line tools, which limits
their usage. Also, most of them implement only one technique, commonly some variation of LSB.
A similar situation applies to steganalysis applications.
VSL, on the other hand, provides an easy-to-use yet powerful framework for using many methods at
the same time. Since VSL is a graphical block diagramming tool, it allows complex processing
that can be performed in both batch and parallel form (see screenshots). It can also be operated
even by relatively inexperienced users, as it provides a simple graphical user interface
(based on drag-and-drop technology).

Besides its GUI, the application delivers several ready-to-use steganographic and steganalysis
techniques. Data can be hidden with the basic Least Significant Bit (LSB) method, with the more
advanced Karhunen-Loeve Transform (KLT) method or with the F5 algorithm, which uses the DCT
transformation in JPEG files. For steganalysis, two advanced techniques can be used. The first is
RS-Analysis, an efficient steganalysis for LSB methods. The second is the Binary Similarity
Measures (BSM) method with a Support Vector Machine (SVM) classifier, a blind (universal)
steganalysis technique that can be used to detect any kind of steganography.
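The basic LSB method mentioned above can be illustrated with a short sketch. It is written in Python for brevity and is not VSL's Java implementation. The cover is modelled as a plain sequence of 8-bit samples, and the function names are chosen here for illustration.

```python
def embed_lsb(samples, payload):
    """Hide `payload` in the least significant bits of `samples` (a bytes object)."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("payload too large for this cover")
    stego = bytearray(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the lowest bit, then set it
    return bytes(stego)


def extract_lsb(samples, n_bytes):
    """Read `n_bytes` of payload back from the least significant bits."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for sample in samples[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)


cover = bytes(range(100, 132))  # 32 hypothetical pixel values
stego = embed_lsb(cover, b"Hi!")
print(extract_lsb(stego, 3))
```

Each cover sample changes by at most one intensity level, which is why such embedding is hard to see and why statistical methods such as RS-Analysis are needed to detect it.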

VSL also includes many other modules, among them several distortion techniques that can be used
to test the resistance of a steganographic technique. The program has built-in modules that help
with research, reports, file handling, image analysis etc.

Free and open source software

The application is licensed under the GNU GPLv3 license, which is included in the distribution
package. Anyone is free to use, modify and share this software free of charge, as long as the
license is not violated.



Platform independent

Virtual Steganographic Laboratory is written in Java, so it is cross-platform software and can be
run on any operating system that has Java (version 1.5 or later is required).

Example

Fig 4.4.1 VSL Tool Snapshot







4.5 Ben-4D Steganalysis

Ben-4D provides quick and accurate identification of stego-carrier files within a set of files. A
generalisation of the basic principles of the Benford’s Law distribution is applied to the
suspicious file in order to decide whether the file is a stego-carrier.
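The underlying idea can be sketched with the standard form of Benford's Law, in which the leading digit d occurs with probability log10(1 + 1/d). The sketch is a simplification made under stated assumptions: Ben-4D applies a generalised form of the law, whereas the code below uses the plain law, hypothetical input values and a chi-square style divergence chosen for illustration.

```python
import math
from collections import Counter

# Expected first-digit probabilities under the standard Benford's Law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Return the leading decimal digit of the integer part of `value`."""
    value = abs(int(value))
    while value >= 10:
        value //= 10
    return value


def benford_divergence(values):
    """Chi-square style distance between observed first digits and Benford's Law."""
    digits = [first_digit(v) for v in values if int(v) != 0]
    if not digits:
        raise ValueError("no non-zero values to analyse")
    counts = Counter(digits)
    n = len(digits)
    return sum((counts.get(d, 0) / n - p) ** 2 / p for d, p in BENFORD.items())


# Hypothetical inputs: powers of two follow Benford's Law closely,
# whereas evenly spread leading digits do not.
natural_like = [2 ** k for k in range(1, 200)]
uniform_like = list(range(1, 10)) * 20
print(benford_divergence(natural_like) < benford_divergence(uniform_like))
```

A file whose coefficients diverge strongly from the expected distribution would be reported as a possible stego-carrier, with the decision threshold calibrated on known clean files.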

Features

Detects JPHSWin, LSB

Detects Invisible Secrets v4.0, Fuse

Detects Camouflage v1.2.1, Fuse

Uses JPEGSnoop, http://www.impulseadventure.com/photo/jpeg-snoop.html

Scans individual files or folders

Generates a report

Figure 4.5 Hit Rates Comparison among ‘Ben-4D’ and other tools.

Ben-4D is a tool to be used when images have been recovered through carving and the metadata of the pictures in question may be missing or unreliable for whatever reason, e.g. when examining partially recovered images for steganography.



Figure 4.5.2 Ben-4D’s hit rates and false positives (hidden data: 1Kb)

Before ‘Ben-4D’ was tested, stego-carrier files were separated into three different groups of five hundred original files each. For each group, JPHSWin, Camouflage and Invisible Secrets were used to embed the smallest possible file, which was an ASCII txt file (1Kb).



Illustration

Example 1-

James was arrested for violating the child pornography laws of the USA. He uses a steganography technique to hide child pornography in greeting cards. He sends these images to his clients using Yahoo Mail.

Image-

Encrypted Image with child pornography contents



How to use Steganalysis to decode data?

Solution-

Step 1 - Use the Stegdetect tool to analyze whether data is hidden inside the image.

- Open the tool

- Open Christmas.png

- Analyze

Figure 5.1 Stegdetect analysis of hidden data

Figure 5.1 shows that the image is a stego-image, which means that data is hidden inside the image. Now we can move to the next step.



Step 2 - Use the Digital Invisible Ink Toolkit (DIIT) to get the encoded data.

- Run the Digital Invisible Ink Toolkit

- Select the Decode tab

- Click Get Image and select the encoded image Hidden.png

Figure 5.2- DIIT Open Encoded Image



Step 3 – Try different steganography algorithms to decode the data.

- Try all algorithms one by one

- Click OK

Figure 5.3 Select and try steganography algorithms one by one



Step 4 – Final step: save the hidden data

- Select a save location on your computer

- Enter a file name and save

Figure 5.4 Save Hidden data file

Decoded Child pornographic content-



Example 2 -

Al Qaeda terrorists used the internet in public places and sent messages via public e-mail. Secret communication about their activities is often carried out using steganography. They hide their messages inside ordinary images.

Figure 5.5 Image with hidden communication data

Solution-

Step 1- Analyze the image using the DIIT tool for steganalysis.

- Open DIIT

- Select the Analysis tab

- Click on steganalysis

- Go

Figure 5.6 Analysis of whether data is hidden in the image

Step 2 – Start decoding the image

Figure 5.7 Select algorithm for Decode



Figure 5.8 Save Hidden File

Figure 5.8 Decoded Message file



CONCLUSION

From the information presented in this report, it would be hard to come to a firm
conclusion concerning the state of steganalysis tools. Since this was not an extensive study with
large data sets, any such conclusion would be debatable. However, it can
be said that steganalysis is not as straightforward or convenient as steganography. This translates
into a great advantage for those who hide secrets using steganography, and a huge difficulty for
forensic analysts, who face the challenge of detecting and recovering the hidden messages
without destroying them. Furthermore, it is also apparent that steganalysis fails when tools are
applied to steganographic techniques they were not intended to detect. It has also been observed
that false positives are likely when generic techniques are used to detect factors such as the
randomness of the LSBs. Perhaps with more data and research, these tools can be enhanced to be more
effective and accurate. As steganography techniques are easily available in many varieties to
anyone who intends to keep or communicate secrets, and with emerging signs of their use in
various arenas, forensic analysts face new challenges in their investigations. Criminals would
indeed exploit every chance available to ensure the success of their plans. This could involve mass
circulation of terror plans over the Internet or even more covert means of transmitting and
storing illicit content on portable storage devices.



REFERENCES

[1] N. F. Johnson, S. Jajodia, Exploring steganography: seeing the unseen, IEEE Computer,
Feb. 1998, Page(s): 26–34.

[2] Steganography software tools,
http://members.tripod.com/steganography/stego/software.html [Accessed on 12 Jan 2013].

[3] A. Westfeld, F5-A steganographic algorithm: high capacity despite better steganalysis,
Proceedings of Fourth International Workshop on Information Hiding, April 2001, Page(s):
289–302.

[4] W.-N. Lie and L.-C. Chang, Data hiding in images with adaptive numbers of least
significant bits based on human visual system, in Proc. IEEE Int. Conf. Image Processing, 1999,
Page(s): 286–290.

[5] Y. K. Lee and L. H. Chen, High capacity image steganographic model, Proc. Inst. Elect.
Eng., Vis. Image Signal Processing, vol. 147, no. 3, 2000, Page(s): 288–294.

[6] S. Katzenbeisser and F. A. P. Petitcolas, Information Hiding Techniques for
Steganography and Digital Watermarking. Norwood, MA: Artech House, 2000.

[7] L. M. Marvel, C. G. Boncelet Jr., and C. T. Retter, Spread spectrum image steganography,
IEEE Trans. Image Process., vol. 8, no. 8, Aug. 1999, Page(s): 1075–1083.

[8] F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn, Information hiding - A survey, Proc. IEEE,
vol. 87, no. 7, Jul. 1999, Page(s): 1062–1078.

[9] N. F. Johnson, S. Jajodia, Steganalysis of images created using current steganography
software, in: Lecture Notes in

[10] R. Chandramouli, Li Grace, Nasir Memon, Adaptive steganography, in: Proc. SPIE,
Security of Multimedia Contents IV, San Jose, CA, vol. 4675, 2002, Page(s): 69–78.



I. GLOSSARY

Term Definition

Least significant bit (LSB)
In computing, the least significant bit (LSB) is the bit position in a binary integer giving the units value, that is, determining whether the number is even or odd.

Most significant bit (MSB)
In computing, the most significant bit (MSB, also called the high-order bit) is the bit position in a binary number having the greatest value. The MSB is sometimes referred to as the left-most bit due to the convention in positional notation of writing more significant digits further to the left.

Semantic Business Vocabulary and Rules (SBVR)
The SBVR defines the vocabulary and rules for documenting the semantics of business vocabularies, business facts, and business rules, as well as an XMI schema for the interchange of business vocabularies and business rules among organizations and between software tools.

Class Model
A class diagram is a type of static structural diagram that describes the structure of a system by showing the real-time entities in the business, their attributes, operations (or methods), and the relationships among the classes.

Use case Model
A use case model or diagram is also a static structural diagram; it represents the interaction between the end user (Actor) and the system under consideration.

Software Requirement Specification
A software requirements specification (SRS) is a complete description of the behavior of a system to be developed, which includes all the requirements necessary for system development.

XML Metadata Interchange (XMI)
The XML Metadata Interchange (XMI) is a standard for exchanging metadata information via Extensible Markup Language (XML). The most common use of XMI is as an interchange format for UML models, although it can also be used for serialization of models of other languages (meta models).

MD5
The MD5 message-digest algorithm is a widely used cryptographic hash function producing a 128-bit (16-byte) hash value, typically expressed in text format as a 32-digit hexadecimal number. MD5 has been utilized in a wide variety of cryptographic applications, and is also commonly used to verify data integrity.

II. ABBREVIATIONS

Acronym Definition

LSB least significant bit

MSB Most significant bit

RQP Raw Quick Pair

SBVR Semantic Business Vocabulary and Rules

SRS Software Requirement Specification

TC Test Cases

UML Unified Modeling Language

XMI XML Metadata Interchange

MD5 Message-Digest algorithm 5



III. LIST OF FIGURES

FIGURE NO. DESCRIPTION OF THE FIGURE

1.1 Software Analysis Process

3.1 Image with Hidden Data

3.2 Hex Editor view

3.3 Data.bmp

3.4 Data.bmp without steganography

3.5 Data.bmp with steganography

4.1 Stego Image Snapshot

4.2 DIIT Snapshot

4.3 Importing the Text File as Input to System

4.4 VSL Tool Snapshot 1

4.5 VSL Tool Snapshot 2



IV. LIST OF TABLES

TABLE NO. DESCRIPTION OF TABLE

4.1 MD5 HASH TABLE

4.2 Keywords and Phrases for Logical Formulations

