  • 7/25/2019 Cryptography - Detecting Steganographic Content on the Internet


    CITI Technical Report 01-11

    Detecting Steganographic Content on the Internet

    Niels Provos Peter Honeyman

    {provos,honey}@citi.umich.edu

    Abstract

    Steganography is used to hide the occurrence of communication. Recent suggestions in US newspapers indicate that terrorists use steganography to communicate in secret with their accomplices. In particular, images on the Internet were mentioned as the communication medium. While the newspaper articles sounded very dire, none substantiated these rumors.

    To determine whether there is steganographic content on the Internet, this paper presents a detection framework that includes tools to retrieve images from the world wide web and automatically detect whether they might contain steganographic content. To ascertain that hidden messages exist in images, the detection framework includes a distributed computing framework for launching dictionary attacks hosted on a cluster of loosely coupled workstations. We have analyzed two million images downloaded from eBay auctions but have not been able to find a single hidden message.

    August 31, 2001

    Center for Information Technology Integration
    University of Michigan

    535 West William Street
    Ann Arbor, MI 48103-4943



    1 Introduction

    Steganography is the art and science of hiding the fact that communication is taking place. Steganographic systems can hide messages inside images or other digital objects. To a casual observer inspecting these images, the messages are invisible.

    In February 2000, USA Today reported that terrorists are using steganography to hide their communication from law enforcement [4]. According to them, messages are being hidden in images posted to Internet auction sites like eBay or Amazon. The article lacked any technical information that would allow a reader to verify these claims. Nonetheless, the article was echoed by a number of other news sources.¹

    To assess the claim that steganographic content is regularly posted to the Internet, we need a way to detect steganographic content in images automatically. This paper presents a steganography detection framework that begins with a web crawler that downloads JPEG images from the Internet. Statistical analysis then identifies a subset of images likely to contain steganographic content. Because this analysis is only statistical, there is no guarantee that an identified image really contains a hidden message, so we also describe a distributed computing framework that launches a dictionary attack hosted on a cluster of loosely coupled workstations to reveal any hidden content.
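The two-stage structure described above (a statistical filter, then a dictionary attack spread over several machines) can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's actual code: `find_candidates`, `attack_share`, `distributed_attack` and the `try_extract` callback are hypothetical names, each image's score is assumed to be a probability of embedding already produced by the statistical test, and a thread pool stands in for the cluster of loosely coupled workstations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence

# Hypothetical callback: returns the hidden message if `password` unlocks
# content in `image`, otherwise None (stands in for a real extraction tool).
Extractor = Callable[[str, str], Optional[bytes]]


def find_candidates(scores: dict[str, float], threshold: float) -> list[str]:
    """Stage 1: keep only images whose statistical score reaches the threshold."""
    return sorted(name for name, p in scores.items() if p >= threshold)


def attack_share(image: str, words: Sequence[str],
                 try_extract: Extractor) -> Optional[tuple[str, bytes]]:
    """Try every password in one share of the dictionary against one image."""
    for word in words:
        message = try_extract(image, word)
        if message is not None:
            return word, message
    return None


def distributed_attack(image: str, words: Sequence[str], try_extract: Extractor,
                       workers: int = 4) -> Optional[tuple[str, bytes]]:
    """Stage 2: split the dictionary into `workers` shares and try them in parallel."""
    shares = [words[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(lambda share: attack_share(image, share, try_extract), shares):
            if result is not None:
                return result
    return None
```

Splitting the word list with a stride (`words[i::workers]`) gives every worker an equal share without any coordination; a real deployment across workstations would also need job distribution and result collection over the network, which this sketch leaves out.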

    We discuss the results from analyzing two million images downloaded from eBay auctions. So far we have not been able to find a single message.

    The remainder of this paper is organized as follows. In Section 2, we give a brief background on steganography in general. Section 3 explains how to hide information in JPEG [15] images. Section 4 presents a statistical test capable of detecting steganographic content. In Section 5, we give an overview of existing steganographic systems and describe how to detect them. The detection framework is presented in Section 6. We discuss our results and related work in Sections 7 and 8. We conclude in Section 9.

    This research was supported in part by DARPA grant number F30602-99-1-0527.

    ¹ Due to an editing error, we indicate that eBay and Amazon were identified in the USA Today article. In fact, that information came from an article in Wired News [17]. We regret the error. [Added October 9, 2001]

    2 Steganography Background

    The term Information Hiding relates to both watermarking and steganography. Watermarking usually refers to methods that hide information in a data object so that the information is robust to modifications. That is, it should be impossible to remove a watermark without degrading the quality of the data object.

    On the other hand, steganography refers to hidden information that is fragile. Modifications to the cover medium may destroy it.

    Watermarking and steganography differ in another important way: while steganographic information must never be apparent to a viewer unaware of its presence, this feature is optional for a watermark.

    The security of a classical steganographic system relies on the secrecy of the encoding system. Once the encoding system is known, the steganographic system is defeated. A famous example of a classical system is that of a Roman general who shaved the head of a slave and tattooed a hidden message on it. After the hair had grown back, the slave was sent to deliver the message [3]. While such a system might work once, the moment that it is known, it is simple to shave the heads of all people passing by to check for hidden messages.

    Other encoding systems might use the last word in every sentence of a letter or the least significant bits in an image.
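As an illustration of the last kind of classical encoding, the sketch below hides a message in the least significant bits of a list of 8-bit grayscale pixel values. It assumes a flat list of integers in the range 0..255 and most-significant-bit-first ordering of message bits; the function names are hypothetical. No key is involved: anyone who knows the scheme can read the message, which is exactly the weakness described above.

```python
def embed_lsb(pixels: list[int], message: bytes) -> list[int]:
    """Overwrite the least significant bit of successive pixels with message bits."""
    bits = [(byte >> (7 - k)) & 1 for byte in message for k in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear bit 0, then set it to the message bit
    return stego


def extract_lsb(pixels: list[int], length: int) -> bytes:
    """Read `length` bytes back from the pixels' least significant bits."""
    bits = [p & 1 for p in pixels[: length * 8]]
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i : i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

Each pixel changes by at most one intensity level, which is why the change is invisible, but the scheme offers no protection once it is known.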

    However, modern steganography should be detectable only if secret information is known, namely, a secret key. This is very similar to Kerckhoffs


    Figure 1: The image on the left is the unmodified original, but the image on the right has the first chapter of The Hunting of the Snark embedded into it. There are no differences visible to the human eye.

    [Figure 2 panels: "Modified image" and "Original image", each plotting coefficient frequency against DCT coefficient value (−40 to 40); "Histogram difference", plotting the difference in percent against DCT coefficient value.]

    Figure 2: Embedding a hidden message causes noticeable changes to the histogram of DCT coefficients.
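The change in Figure 2 can be reproduced in miniature. When message bits overwrite the least significant bits, each coefficient belonging to a pair of values (2i, 2i+1) ends up as either member with roughly equal probability, so the two frequencies drift toward each other. The sketch below assumes a toy list of coefficient values and uses seeded pseudo-random bits as a stand-in for an encrypted message; it illustrates the effect and is not the embedding code of any particular tool.

```python
import random
from collections import Counter


def overwrite_lsbs(values: list[int], rng: random.Random) -> list[int]:
    """Replace every value's least significant bit with a pseudo-random message bit."""
    return [(v & ~1) | rng.getrandbits(1) for v in values]


rng = random.Random(2001)          # fixed seed so the run is repeatable
cover = [2] * 600 + [3] * 200      # skewed pair: n_2 = 600, n_3 = 200
stego = overwrite_lsbs(cover, rng)

before, after = Counter(cover), Counter(stego)
print(before[2], before[3])        # 600 200
print(after[2], after[3])          # roughly equal; exact counts depend on the seed
```

This equalization of pair frequencies is the distortion that the χ² comparison of observed and expected frequencies is designed to pick up.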

    we can take the arithmetic mean,

    $$y_i^* = \frac{n_{2i} + n_{2i+1}}{2},$$

    to determine the expected distribution. The expected distribution is compared against the observed distribution

    $$y_i = n_{2i}.$$

    The $\chi^2$ value for the difference between the distributions is given as

    $$\chi^2 = \sum_{i=1}^{\nu+1} \frac{(y_i - y_i^*)^2}{y_i^*},$$

    where $\nu$ is the number of degrees of freedom, that is, the number of different categories in the histogram minus one.
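A direct transcription of these formulas might look like the sketch below. It assumes the quantized DCT coefficients are already available as a flat list of integers and treats every pair (2i, 2i+1) that occurs in the data as one category. It does not merge sparse categories or exclude pairs that a given tool never modifies; a practical detector would probably need both refinements. The function name is hypothetical.

```python
from collections import Counter


def chi_square_statistic(coefficients: list[int]) -> tuple[float, int]:
    """Return (chi^2, nu) comparing observed n_{2i} with the pair mean y*_i."""
    if not coefficients:
        raise ValueError("no coefficients")
    hist = Counter(coefficients)
    # Even member 2i of each occurring pair; `c & ~1` also handles negatives (-3 -> -4).
    evens = sorted({c & ~1 for c in coefficients})
    chi2 = 0.0
    for even in evens:
        observed = hist[even]                         # y_i  = n_{2i}
        expected = (hist[even] + hist[even + 1]) / 2  # y*_i = (n_{2i} + n_{2i+1}) / 2, never 0 here
        chi2 += (observed - expected) ** 2 / expected
    nu = len(evens) - 1                               # number of categories minus one
    return chi2, nu
```

A small χ² (observed frequencies close to the pair means) is what full least-significant-bit embedding tends to produce, whereas an unmodified image usually shows larger pairwise differences.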

    The probability of embedding $p$ is then given by the complement of the cumulative distribution function,

    $$p = 1 - \int_0^{\chi^2} \frac{t^{(\nu-2)/2}\, e^{-t/2}}{2^{\nu/2}\, \Gamma(\nu/2)}\, dt,$$

    where $\Gamma$ is the Euler Gamma function.
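The integral is the cumulative distribution function of the χ² distribution with ν degrees of freedom, so p is its upper tail, which equals the regularized upper incomplete gamma function Q(ν/2, χ²/2). The sketch below evaluates it with only the Python standard library, using the usual power series for small arguments and a modified Lentz continued fraction otherwise (the approach popularized by Numerical Recipes). This numerical method is swapped in for illustration and is not code from the paper; a library routine such as SciPy's `scipy.stats.chi2.sf` computes the same quantity.

```python
import math


def _prefactor(a: float, x: float) -> float:
    """x^a e^{-x} / Gamma(a), computed in log space to avoid overflow."""
    return math.exp(a * math.log(x) - x - math.lgamma(a))


def _lower_series(a: float, x: float, eps: float = 1e-14, max_iter: int = 10_000) -> float:
    """Regularized lower incomplete gamma P(a, x); converges quickly for x < a + 1."""
    term = total = 1.0 / a
    denom = a
    for _ in range(max_iter):
        denom += 1.0
        term *= x / denom
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * _prefactor(a, x)


def _upper_contfrac(a: float, x: float, eps: float = 1e-14, max_iter: int = 10_000) -> float:
    """Regularized upper incomplete gamma Q(a, x) via modified Lentz; for x >= a + 1."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * _prefactor(a, x)


def probability_of_embedding(chi2: float, nu: int) -> float:
    """p = 1 - CDF of the chi^2 distribution with nu degrees of freedom, evaluated at chi2."""
    if nu < 1:
        raise ValueError("need at least one degree of freedom")
    if chi2 <= 0.0:
        return 1.0  # the CDF is 0 at the origin
    a, x = nu / 2.0, chi2 / 2.0
    if x < a + 1.0:
        p = 1.0 - _lower_series(a, x)
    else:
        p = _upper_contfrac(a, x)
    return min(1.0, max(0.0, p))  # clamp rounding error into [0, 1]
```

For ν = 2 the tail has the closed form e^(−χ²/2), and for ν = 4 it is e^(−χ²/2)(1 + χ²/2), which make convenient sanity checks.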

    [Figure 3 plots: probability of embedding in percent versus analysed position in image in percent, for misc/dcsf0001-no.jpg (upper) and misc/dcsf0001.jpg (lower).]

    Figure 3: The probability of embedding calculated for different areas of an image. The upper graph shows the results for an unmodified image; the lower graph shows the results for an image with steganographic content.

    We can compute the probability of embedding for different parts of an image. The selection depends on which steganographic system we are trying to detect. For an image that does not contain any hidden information, we expect the probability of embedding to be zero everywhere. Figure 3 shows the embedding probability for an image without steganographic content and for an image that has content hidden in it.
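One way to produce curves like those in Figure 3 is to evaluate the probability of embedding on growing prefixes of the coefficient sequence, which suits tools that embed sequentially from the start of the image. The sketch below assumes that strategy (other systems may call for a different sample selection, as noted above) and takes the estimator as a parameter, so any function mapping a list of coefficients to a probability, for instance one built from the χ² formulas given earlier, can be plugged in. The names are hypothetical.

```python
from typing import Callable, Sequence

Estimator = Callable[[Sequence[int]], float]


def embedding_profile(coefficients: Sequence[int], estimate: Estimator,
                      steps: int = 10) -> list[tuple[int, float]]:
    """Evaluate `estimate` on the first 1/steps, 2/steps, ... of the coefficients.

    Returns (percent analysed, estimate) pairs, i.e. the points of a Figure 3-style curve.
    """
    if not coefficients:
        raise ValueError("no coefficients to analyse")
    n = len(coefficients)
    profile = []
    for k in range(1, steps + 1):
        end = max(1, n * k // steps)  # length of the prefix analysed in this step
        profile.append((100 * k // steps, estimate(coefficients[:end])))
    return profile
```

For an unmodified image the resulting curve would be expected to stay near zero, matching the expectation stated above.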

    5 Steganographic Systems in Use

    In this section, we present several steganographic systems that embed hidden messages into JPEG images. We show that the statistical distortions depend on the steganographic system that inserted the message into the image. Because the distortions are characteristic of each system, we develop signatures that allow us to identify which system has been used.

    There are three popular steganographic systems available on the Internet that hide information in JPEG images:

    JSteg, JSteg-Shell

    JPHide

    OutGuess

    All of these systems use some form of least-significant-bit embedding, and all except the latest release of OutGuess [9] are detectable by statistical analysis. In the following, we present the specific characteristics of these systems and show how to detect them.

    5.1 JSteg and JSteg-Shell

    JSteg is an addition by Derek Upham to the Independent JPEG Group's JPEG software library. The DCT coefficients are modified sequentially, starting from the beginning of the image. JSteg does not support encryption and has no random bit selection.
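The sequential, key-less embedding described here can be sketched as follows. The sketch follows the way JSteg is commonly described: it walks the quantized DCT coefficients in order and replaces the least significant bit of each usable one, skipping coefficients equal to 0 or 1. The function names are hypothetical, the coefficients are assumed to be a flat list of integers, and entropy decoding and re-encoding of the JPEG file are left out.

```python
from typing import Iterable


def jsteg_like_embed(coefficients: list[int], bits: Iterable[int]) -> list[int]:
    """Write message bits into the LSBs of coefficients, in order, skipping 0 and 1."""
    stego = list(coefficients)
    bit_iter = iter(bits)
    pending = next(bit_iter, None)
    for idx, coef in enumerate(stego):
        if pending is None:
            break                      # all bits embedded; the rest of the image is untouched
        if coef in (0, 1):
            continue                   # these values are never used for embedding
        stego[idx] = (coef & ~1) | pending
        pending = next(bit_iter, None)
    if pending is not None:
        raise ValueError("not enough usable coefficients for the message")
    return stego


def jsteg_like_extract(coefficients: list[int], n_bits: int) -> list[int]:
    """Read the first n_bits LSBs from coefficients other than 0 and 1."""
    bits = [coef & 1 for coef in coefficients if coef not in (0, 1)]
    if len(bits) < n_bits:
        raise ValueError("image holds fewer bits than requested")
    return bits[:n_bits]
```

Because a usable value can only move to its pair partner (2 and 3, −2 and −1, and so on), it never becomes 0 or 1, so the extractor visits exactly the positions the embedder used. With no key and no random selection, anyone can read the bits back this way.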

    The message data is prepended with a variable-size header. The first five bits of the header express the size of the length field in bits. The following bits contain the length field, which expresses the size of the embedded content.
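The header layout just described can be made concrete with a small encoder and decoder. The text does not specify the bit order or the unit of the length field, so this sketch assumes most-significant-bit-first order and a length counted in bytes; the five-bit width field limits the length field to at most 31 bits. Function names are hypothetical.

```python
def to_bits(value: int, width: int) -> list[int]:
    """MSB-first bit list of `value` in exactly `width` bits (assumed bit order)."""
    if value < 0 or value >= 1 << width:
        raise ValueError("value does not fit in width")
    return [(value >> (width - 1 - k)) & 1 for k in range(width)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits: fold an MSB-first bit list into an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def make_header(content_length: int) -> list[int]:
    """5-bit size of the length field, followed by the length field itself."""
    width = max(1, content_length.bit_length())
    if width > 31:
        raise ValueError("length field would need more than 31 bits")
    return to_bits(width, 5) + to_bits(content_length, width)


def parse_header(bits: list[int]) -> tuple[int, int]:
    """Return (content length, number of header bits consumed)."""
    width = from_bits(bits[:5])
    return from_bits(bits[5 : 5 + width]), 5 + width
```

A header built this way costs between 6 and 36 bits, a small overhead compared with the message itself.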

    Figure 4 shows the result of the $\chi^2$-test for an image that contains information hidden with JSteg. In this case, the first chapter of The Hunting of the Snark

    [Figure 4 plot: probability of embedding in percent versus analysed position in image in percent.]