Pranav Steganography Report

ABSTRACT

Steganography is a useful tool that allows covert transmission of information over an overt communications channel. Combining covert channel exploitation with the encryption methods of substitution ciphers and/or one-time pad cryptography, steganography enables the user to transmit information masked inside of a file in plain view. The hidden data is both difficult to detect and, when combined with known encryption algorithms, equally difficult to decipher.

This paper provides a general overview of the following subject areas: historical cases and examples of steganography; how steganography works; what steganography software is commercially available and what data types are supported; what methods and automated tools are available to aid computer forensic investigators and information security professionals in detecting the use of steganography; whether, after detection has occurred, the embedded message can be reliably extracted; whether the embedded data can be separated from the carrier, revealing the original file; and finally, what methods can defeat the use of steganography even if it cannot be reliably detected.

1. INTRODUCTION

Within the field of computer forensics, investigators should be aware that steganography can be an effective means of transferring concealed data inside seemingly innocuous carrier files. Knowing what software applications are commonly available and how they work gives forensic investigators a greater probability of detecting, recovering, and eventually denying access to the data that mischievous individuals and programs are openly concealing.

Generally speaking, steganography brings science to the art of hiding information. The purpose of steganography is to convey a message inside a conduit of misrepresentation such that the existence of the message is hidden and the message is difficult to recover even when discovered. The word steganography comes from two roots in the Greek language: “steganos,” meaning covered or hidden, and “graphia,” meaning writing.

Similar in nature to the sleight of hand used in traditional magic, steganography uses the illusion of normality to mask the existence of covert activity. The illusion can take a myriad of forms, including written documents, photographs, paintings, music, sounds, physical items, and even the human body. Two things are required to accomplish the objective: successfully masking the message, and keeping secret the key to its location and/or decipherment.

When categorized within one of the two fundamental security mechanisms of computer science (cryptographic protocols and maintaining control of the CPU's instruction pointer), steganography clearly fits within cryptography. It closely mirrors common cryptographic protocols in that the embedded information is revealed in much the same manner as with substitution or Bacon cipher mechanisms.

This paper will highlight some historical examples, discuss the basic principles of steganography showing how most instances work, identify software that can be used for this purpose, and finally provide an overview of current methods employed to detect and defeat it.

2. HISTORICAL EXAMPLES

Hiding messages by masking their existence is nothing new. Classical examples include a Roman general who shaved the head of a slave and tattooed a message on his scalp. When the slave’s hair grew back, the general dispatched the slave to deliver the hidden message to its intended recipient.

Ancient Greeks wrote on wax-covered tablets. The tablets were wooden slabs over which a layer of melted wax was poured and allowed to harden. A hidden message could be carved into the wood before the slab was covered. The wax concealed the message until the recipient melted it and poured it off the tablet.

From the 1st century through World War II, invisible inks were often used to conceal messages. At first, the inks were organic substances that oxidized when heated; the heat reaction revealed the hidden message. As time passed, compounds and substances were chosen based on desirable chemical reactions. When the recipient treated the invisible writing with a reactive agent, the resulting chemical reaction revealed the hidden data. Today, some commonly used compounds are visible when placed under ultraviolet light.

In another form, while Paris was under siege in 1870, messages were sent by carrier pigeon. A Parisian photographer used a microfilm technique to enable each pigeon to carry a higher volume of data. The miniaturization of information also served to deter detection and was a precursor to the invention of the microdot.

A microdot is a document or photograph reduced in size until it is as small as a pencil dot (about the size of the period at the end of this sentence). Between World War I and World War II, Germany used microdots for steganographic messaging, and later many countries passed microdot messages through insecure postal channels.

With any type of hidden communication, the security of the message often lies in the secrecy of its existence and/or the secrecy of how to decode it. Cryptography often takes a worst-case approach, assuming only one of these two conditions holds.

3. The basics of embedding

Three different aspects of information-hiding systems contend with each other: capacity, security, and robustness [4]. Capacity refers to the amount of information that can be hidden in the cover medium, security to an eavesdropper’s inability to detect hidden information, and robustness to the amount of modification the stego medium can withstand before an adversary can destroy the hidden information.

Information hiding generally relates to both watermarking and steganography. A watermarking system’s primary goal is to achieve a high level of robustness; that is, it should be impossible to remove a watermark without degrading the data object’s quality. Steganography, on the other hand, strives for high security and capacity, which often means that the hidden information is fragile: even trivial modifications to the stego medium can destroy it.

A classical steganographic system’s security relies on the encoding system’s secrecy. An example of this type of system is a Roman general who shaved a slave’s head and tattooed a message on it. After the hair grew back, the slave was sent to deliver the now-hidden message [5]. Although such a system might work for a time, once it is known, it is simple enough to shave the heads of all the people passing by to check for hidden messages; ultimately, such a steganographic system fails.

Modern steganography attempts to be detectable only if secret information is known, namely a secret key. This is similar to Kerckhoffs’ Principle in cryptography, which holds that a cryptographic system’s security should rely solely on the key material [6]. For steganography to remain undetected, the unmodified cover medium must be kept secret, because if it is exposed, a comparison between the cover and stego media immediately reveals the changes.

Information theory allows us to be even more specific about what it means for a system to be perfectly secure. Christian Cachin proposed an information-theoretic model for steganography that considers the security of steganographic systems against passive eavesdroppers [7]. In this model, the adversary is assumed to have complete knowledge of the encoding system but not of the secret key. His or her task is to devise a model for the probability distribution P_C of all possible cover media and P_S of all possible stego media. The adversary can then use detection theory to decide between hypothesis C (that a message contains no hidden information) and hypothesis S (that a message carries hidden content). A system is perfectly secure if no decision rule exists that can perform better than random guessing.

Essentially, the senders and receivers in steganographic communication agree on a steganographic system and a shared secret key that determines how a message is encoded in the cover medium. To send a hidden message, for example, Alice creates a new image with a digital camera. Alice supplies the steganographic system with her shared secret and her message. The steganographic system uses the shared secret to determine how the hidden message should be encoded in the redundant bits. The result is a stego image that Alice sends to Bob. When Bob receives the image, he uses the shared secret and the agreed-on steganographic system to retrieve the hidden message. Figure 1 shows an overview of the encoding step; as mentioned earlier, statistical analysis can reveal the presence of hidden content.
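The notion of perfect security in Cachin's model can be written compactly. What follows is a sketch of the standard formulation in terms of the relative entropy between the cover distribution and the stego distribution; the symbol ε and the notation D(·‖·) do not appear in the text above and are introduced here only for illustration.

```latex
D(P_C \,\|\, P_S) \;=\; \sum_{x} P_C(x)\,\log\frac{P_C(x)}{P_S(x)} \;\le\; \varepsilon
```

Under this reading, a system is called ε-secure against passive adversaries when the bound holds, and perfectly secure when ε = 0. In that case the two distributions are identical, so no decision rule can separate hypothesis C from hypothesis S better than guessing.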

4. Hide and seek

Although steganography is applicable to all data objects that contain redundancy, in this article we consider JPEG images only (although the techniques and methods for steganography and steganalysis that we present here apply to other data formats as well). People often transmit digital pictures over email and other Internet communication, and JPEG is one of the most common formats for images. Moreover, steganographic systems for the JPEG format seem more interesting because the systems operate in a transform space and are not affected by visual attacks. (A visual attack means that you can see steganographic messages on the low bit planes of an image because they overwrite visual structures; this usually happens in BMP images.) Neil F. Johnson and Sushil Jajodia, for example, showed that steganographic systems for palette-based images leave easily detected distortions.

Let’s look at some representative steganographic systems and see how their encoding algorithms change an image in a detectable way. We’ll compare the different systems and contrast their relative effectiveness.

5. Steganography detection on the Internet

How can we use these steganalytic methods in a real-world setting, for example, to assess claims that steganographic content is regularly posted to the Internet? To find out if such claims are true, we created a steganography detection framework [23] that gets JPEG images off the Internet and uses steganalysis to identify subsets of the images likely to contain steganographic content.

Steganographic systems in use

To test our framework on the Internet, we started by searching the Web and Usenet for three popular steganographic systems that can hide information in JPEG images: JSteg (and JSteg-Shell), JPHide, and OutGuess. All these systems use some form of least-significant-bit embedding and are detectable with statistical analysis.

JSteg-Shell is a Windows user interface to JSteg first developed by John Korejwa. It supports content encryption and compression before JSteg embeds the data. JSteg-Shell uses the RC4 stream cipher for encryption (but the RC4 key space is restricted to 40 bits).

JPHide is a steganographic system first developed by Allan Latham that uses Blowfish as a PRNG [24, 25]. Version 0.5 (there is also a version 0.3) supports additional compression of the hidden message, so it uses slightly different headers to store embedding information. Before the content is embedded, it is Blowfish-encrypted with a user-supplied pass phrase.

Detection framework

Stegdetect is an automated utility that can analyze JPEG images that have content hidden with JSteg, JPHide, and OutGuess 0.13b. Stegdetect’s output lists the steganographic systems it finds in each image or writes “negative” if it couldn’t detect any.

We calibrated Stegdetect’s detection sensitivity against a set of 500 non-stego images (of different sizes) and stego images (from different steganographic systems). On a 1,200-MHz Pentium III processor, Stegdetect can keep up with a Web crawler on a 10-Mbit/s network.

Stegdetect’s false-negative rate depends on the steganographic system and the embedded message’s size. The smaller the message, the harder it is to detect by statistical means. Stegdetect is very reliable in finding images that have content embedded with JSteg. For JPHide, detection also depends on the size and the compression quality of the JPEG images. Furthermore, JPHide 0.5 reduces the hidden message size by employing compression. Figure 11 shows the results of detecting JPHide and JSteg.

For JSteg, we cannot detect messages smaller than 50 bytes; the false-negative rate in such cases is almost 100 percent. However, once the message size is larger than 150 bytes, our false-negative rate is less than 10 percent. For JPHide, the detection rate is independent of the message size, and the false-negative rate is at least 20 percent in all cases. Although the false-negative rate for OutGuess is around 60 percent, a high false-negative rate is preferable to a high false-positive rate, as we explain later.
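To make "detectable with statistical analysis" more concrete, here is a minimal sketch of one well-known idea, the chi-square test on pairs of values (usually attributed to Westfeld and Pfitzmann). It is not Stegdetect's actual code, and the sample values are invented for illustration. The observation is that overwriting least-significant bits with message bits tends to equalize the counts of each value pair (2k, 2k+1), so an unusually small statistic is a hint that something has been embedded.

```python
import random
from collections import Counter


def pair_chi_square(values):
    """Chi-square statistic over value pairs (2k, 2k+1).

    For each pair, the expected count is the pair's average. A small
    result means the two counts in each pair are nearly equal.
    """
    hist = Counter(values)
    stat = 0.0
    for even in {v - (v % 2) for v in hist}:
        expected = (hist[even] + hist[even + 1]) / 2
        stat += (hist[even] - expected) ** 2 / expected
    return stat


def lsb_overwrite(values, rng):
    """Simulate full-capacity embedding: every LSB becomes a random bit."""
    return [(v & ~1) | rng.getrandbits(1) for v in values]


# Hypothetical "coefficient" values with a skewed histogram, as in a clean image.
rng = random.Random(0)
cover = [10] * 900 + [11] * 100 + [20] * 300 + [21] * 700
stego = lsb_overwrite(cover, rng)

cover_stat = pair_chi_square(cover)
stego_stat = pair_chi_square(stego)
print("cover:", cover_stat, "stego:", stego_stat)
```

A real detector would turn the statistic into a probability and slide the test window across the image. It would also have a harder time with short messages, which leave most pairs untouched, in line with the false-negative behavior reported above.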

Finding images

To exercise our ability to test for steganographic content automatically, we needed images that might contain hidden messages. We picked images from eBay auctions (due to various news reports) [20, 21] and discussion groups in the Usenet archive for analysis.

To get images from eBay auctions, a Web crawler that could find JPEG images was the obvious choice. Unfortunately, there were no open-source, image-capable Web crawlers available when we started our research. To get around this problem, we developed Crawl, a simple, efficient Web crawler that makes a local copy of any JPEG images it encounters on a Web page. Crawl performs a depth-first search and has two key features:

• Images and Web pages can be matched against regular expressions; a match can be used to include or exclude Web pages in the search.
• Minimum and maximum image size can be specified, which lets us exclude images that are too small to contain hidden messages. We restricted our search to images larger than 20 Kbytes but smaller than 400 Kbytes.

We downloaded more than two million images linked to eBay auctions. To automate detection, Crawl uses stdout to report successfully retrieved images to Stegdetect. After processing the two million images with Stegdetect, we found that over 1 percent of all images seemed to contain hidden content. JPHide was detected most often (see Table 2).

We augmented our study by analyzing an additional one million images from a Usenet archive. Most of the detections are likely to be false positives. Stefan Axelsson applied the base-rate fallacy to intrusion detection systems and showed that a high percentage of false positives had a significant effect on such a system’s efficiency [27]. The situation is very similar for Stegdetect.

We can calculate the true-positive rate (the probability that an image detected by Stegdetect really has steganographic content) as follows:

$$P(S \mid D) = \frac{P(S)\,P(D \mid S)}{P(S)\,P(D \mid S) + P(\neg S)\,P(D \mid \neg S)}$$

where P(S) is the probability of steganographic content in images, and P(¬S) is its complement. P(D|S) is the probability that we’ll detect an image that has steganographic content, and P(D|¬S) is the false-positive rate. Conversely, P(¬D|S) = 1 – P(D|S) is the false-negative rate.

To improve the true-positive rate, we must increase the numerator or decrease the denominator. For a given detection system, increasing the detection rate is not possible without increasing the false-positive rate and vice versa. We assume that P(S), the probability that an image contains steganographic content, is extremely low compared to P(¬S), the probability that an image contains no hidden message. As a result, the false-positive rate P(D|¬S) is the dominating term in the equation; reducing it is thus the best way to increase the true-positive rate. Given these assumptions, the false-positive rate also dominates the computational costs of verifying hidden content. For a detection system to be practical, keeping the false-positive rate as low as possible is important.
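The base-rate argument is easier to see with numbers. The sketch below applies Bayes' rule to the quantities named above: P(S), the detection rate P(D|S), and the false-positive rate P(D|¬S). All three input values are invented for illustration and are not measurements from this study.

```python
def true_positive_rate(p_s, p_d_given_s, p_d_given_not_s):
    """Probability that a flagged image really contains hidden content.

    Bayes' rule: P(S|D) = P(S)P(D|S) / (P(S)P(D|S) + P(not S)P(D|not S)).
    """
    p_not_s = 1.0 - p_s
    numerator = p_s * p_d_given_s
    denominator = numerator + p_not_s * p_d_given_not_s
    return numerator / denominator


# Hypothetical inputs: 1 image in 10,000 carries a message.
baseline = true_positive_rate(p_s=1e-4, p_d_given_s=0.8, p_d_given_not_s=0.01)
# Raising the detection rate from 80% to 90% barely helps ...
higher_detection = true_positive_rate(p_s=1e-4, p_d_given_s=0.9, p_d_given_not_s=0.01)
# ... while cutting the false-positive rate tenfold helps a great deal.
lower_false_pos = true_positive_rate(p_s=1e-4, p_d_given_s=0.8, p_d_given_not_s=0.001)

print(f"baseline:         {baseline:.4%}")
print(f"higher detection: {higher_detection:.4%}")
print(f"lower false-pos:  {lower_false_pos:.4%}")
```

With these assumed inputs, fewer than 1 in 100 flagged images would actually contain hidden content at baseline, which is why the false-positive rate matters far more than the detection rate.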

Verifying hidden content

The statistical tests we used to find steganographic content in images indicate nothing more than a likelihood that content is embedded. Because of that, Stegdetect cannot guarantee a hidden message’s existence.

To verify that the detected images have hidden content, Stegbreak must launch a dictionary attack against the JPEG files. JSteg-Shell, JPHide, and OutGuess all hide content based on a user-supplied password, so an attacker can try to guess the password by taking a large dictionary and trying every word in it to retrieve the hidden message. In addition to message data, the three systems also embed header information, so attackers can verify a guessed password using header information such as message length. For a dictionary attack [28] to work, the steganographic system’s user must select a weak password (one from a small subset of the full password space).

Ultimate success, though, depends on the dictionary’s quality. For the eBay images, we used a dictionary with roughly 850,000 words from several languages. For the Usenet images, we improved the dictionary by including four-digit PINs and short pass phrases. We created these short pass phrases by taking three- to five-letter words from a list of the 2,000 most common English words and concatenating them. The resulting dictionary contains 1.8 million words.

We measured Stegbreak’s performance on a 1,200-MHz Pentium III by running a dictionary attack against one image and then against a set of 50 images (see Table 3). The speed improvement for 50 images is due to key schedule caching. For JPHide, we checked about 8,700 words per second; a test run with 300 images and a dictionary of roughly 577,000 words took 10 days to check for both versions of JPHide. Blowfish is designed to make key schedule computation expensive, which slowed down Stegbreak. When checking for JPHide 0.5, the Blowfish key schedule must be recomputed for almost every image.

Stegbreak was faster for OutGuess: about 34,000 words per second. However, due to limited header information, a large dictionary can produce many candidate passwords. For JSteg-Shell, Stegbreak checked about 47,000 words per second, which was fast enough to run a dictionary attack on a single computer. JSteg-Shell restricts the key space to 40 bits, but if passwords consist of only 7-bit characters, the effective key space is reduced to 35 bits. We could search that key space in about eight days.
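The "about eight days" figure can be checked with simple arithmetic. The sketch below assumes an exhaustive search at the JSteg-Shell rate quoted above (47,000 guesses per second) on a single machine; the function name is ours and purely illustrative.

```python
SECONDS_PER_DAY = 86_400


def exhaustive_search_days(key_bits, guesses_per_second):
    """Days needed to try every key in a key space of 2**key_bits."""
    return (2 ** key_bits) / guesses_per_second / SECONDS_PER_DAY


# Effective 35-bit key space (7-bit characters) versus the full 40 bits.
days_35 = exhaustive_search_days(35, 47_000)
days_40 = exhaustive_search_days(40, 47_000)

print(f"35-bit key space: {days_35:.1f} days")
print(f"40-bit key space: {days_40:.1f} days")
```

The 35-bit case comes out at roughly eight and a half days, consistent with the estimate in the text. The five extra bits of the full 40-bit space multiply the time by 32.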

6. A Detailed Look at Steganography

In this section we will discuss Steganography at length. We will start by looking at the different types of Steganography generally used in practice today, along with some of the other principles that are used in Steganography. We will then look at some of the steganographic techniques in use today. This is where we will look at the nuts and bolts of Steganography and the different ways we can use this technology. We will then close by going over Steganalysis. Steganalysis concentrates on the art and science of finding and/or destroying secret messages that have been produced using any of the various steganographic techniques we will cover in this paper.

To start, let's look at what a theoretically perfect secret communication (Steganography) would consist of. To illustrate this concept, we will use three fictitious characters named Amy, Bret and Crystal. Amy wants to send a secret message (M) to Bret using a random (R) harmless message to create a cover (C) which can be sent to Bret without raising suspicion. Amy then changes the cover message (C) to a stego-object (S) by embedding the secret message (M) into the cover message (C) by using a stego-key (K). Amy should then be able to send the stego-object (S) to Bret without being detected by Crystal. Bret will then be able to read the secret message (M) because he knows the stego-key (K) used to embed it into the cover message (C).

As Fabien A.P. Petitcolas points out, "in a 'perfect' system, a normal cover should not be distinguishable from a stego-object, neither by a human nor by a computer looking for statistical patterns." In practice, however, this is not always the case. In order to embed secret data into a cover message, the cover must contain a sufficient amount of redundant data or noise. This is because the embedding process Steganography uses actually replaces this redundant data with the secret message. This limits the types of data that we can use with Steganography.

In practice, there are basically three types of steganographic protocols used. They are Pure Steganography, Secret Key Steganography and Public Key Steganography.

Pure Steganography is defined as a steganographic system that does not require the prior exchange of secret information such as a stego-key. This method of Steganography is the least secure means by which to communicate secretly, because the sender and receiver can rely only upon the presumption that no other parties are aware of the secret message. Using open systems such as the Internet, we know this is not the case at all.

Secret Key Steganography is defined as a steganographic system that requires the exchange of a secret key (stego-key) prior to communication. Secret Key Steganography takes a cover message and embeds the secret message inside of it by using a secret key (stego-key). Only the parties who know the secret key can reverse the process and read the secret message. Unlike Pure Steganography, where a perceived invisible communication channel is present, Secret Key Steganography exchanges a stego-key, which makes it more susceptible to interception. The benefit of Secret Key Steganography is that even if the communication is intercepted, only parties who know the secret key can extract the secret message.

Public Key Steganography takes the concepts from Public Key Cryptography as explained below. Public Key Steganography is defined as a steganographic system that uses a public key and a private key to secure the communication between the parties wanting to communicate secretly. The sender will use the public key during the encoding process, and only the private key, which has a direct mathematical relationship with the public key, can decipher the secret message. Public Key Steganography provides a more robust way of implementing a steganographic system because it can draw on a much more robust and well-researched technology in Public Key Cryptography. It also has multiple levels of security, in that unwanted parties must first suspect the use of steganography and then find a way to crack the algorithm used by the public key system before they could intercept the secret message.
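The Secret Key Steganography protocol described above can be illustrated with a small sketch. It is only one possible construction, not a description of any particular tool. Here the stego-key (K) seeds a pseudo-random choice of which cover bytes carry message bits, so Bret can find them again and Crystal, without K, does not know where to look. The function names, the SHA-256 seeding, and the use of Python's non-cryptographic `random` module are all assumptions made for illustration.

```python
import hashlib
import random


def _positions(stego_key, cover_len, count):
    """Derive `count` distinct byte positions from the stego-key (K)."""
    digest = hashlib.sha256(stego_key.encode("utf-8")).digest()
    seed = int.from_bytes(digest, "big")
    # Not a cryptographically secure generator; fine for a sketch only.
    return random.Random(seed).sample(range(cover_len), count)


def embed(cover, message, stego_key):
    """Turn cover (C) into stego-object (S) by hiding message (M) under key (K)."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for pos, bit in zip(_positions(stego_key, len(cover), len(bits)), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit  # overwrite the lowest bit only
    return bytes(stego)


def extract(stego, message_len, stego_key):
    """Recover message_len bytes of M from S using the same key K."""
    positions = _positions(stego_key, len(stego), message_len * 8)
    out = bytearray()
    for i in range(0, len(positions), 8):
        byte = 0
        for pos in positions[i:i + 8]:
            byte = (byte << 1) | (stego[pos] & 1)
        out.append(byte)
    return bytes(out)


# Amy embeds; Bret extracts with the shared key.
cover = bytes(range(256)) * 4            # stand-in for redundant cover data
secret = b"meet at noon"
stego_object = embed(cover, secret, "shared stego-key")
recovered = extract(stego_object, len(secret), "shared stego-key")
print(recovered)
```

In this sketch the receiver must also know the message length; a practical system would hide a small header for that purpose, and would normally encrypt M before embedding it.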

A. Encoding Secret Messages in Text

Encoding secret messages in text can be a very challenging task, because text files have a very small amount of redundant data to replace with a secret message. Another drawback is the ease with which text-based Steganography can be altered by an unwanted party, who need only change the text itself or reformat it to some other form (from .TXT to .PDF, etc.). There are numerous methods by which to accomplish text-based Steganography. I will introduce a few of the more popular encoding methods below.

Line-shift encoding involves shifting each line of text vertically up or down by a very small amount, as little as 1/300 of an inch. Whether a line sits above or below its unshifted position equates to a value that can be encoded into a secret message.

Word-shift encoding works in much the same way as line-shift encoding, only we use the horizontal spaces between words to equate a value for the hidden message. This method of encoding is less visible than line-shift encoding but requires that the text format support variable spacing.

Feature-specific encoding involves encoding secret messages into formatted text by changing certain text attributes, such as the vertical or horizontal length of letters such as b, d, T, etc. This is by far the hardest text encoding method to intercept, as each type of formatted text has a large number of features that can be used for encoding the secret message.

All three of these text-based encoding methods require either the original file or knowledge of the original file's formatting to be able to decode the secret message.
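The spirit of word-shift encoding can be shown with a deliberately simplified plain-text analogue. Instead of sub-character horizontal shifts in a formatted document, the sketch below encodes one bit per gap between words, using one space for 0 and two spaces for 1. This is a swapped-in, cruder technique chosen so that the example runs on ordinary strings; unlike the methods above it needs no reference copy to decode, and the function names are hypothetical.

```python
import re


def embed_in_spacing(cover_text, bits):
    """Hide bits in the gaps between words: ' ' encodes 0, '  ' encodes 1."""
    words = cover_text.split()
    if not words or len(bits) > len(words) - 1:
        raise ValueError("cover text has too few word gaps")
    pieces = [words[0]]
    for i, word in enumerate(words[1:]):
        gap = "  " if i < len(bits) and bits[i] == 1 else " "
        pieces.append(gap + word)
    return "".join(pieces)


def extract_from_spacing(stego_text, bit_count):
    """Read back the first bit_count gaps as bits."""
    gaps = re.findall(r" +", stego_text.strip())
    return [1 if len(gap) == 2 else 0 for gap in gaps[:bit_count]]


cover_text = "the quick brown fox jumps over the lazy dog"
hidden_bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego_text = embed_in_spacing(cover_text, hidden_bits)
print(repr(stego_text))
print(extract_from_spacing(stego_text, len(hidden_bits)))
```

The example also shows the fragility noted earlier: any reformatting step that normalizes whitespace silently destroys the message, and the capacity is only one bit per word gap.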

B. Encoding Secret Messages in Images

Coding secret messages in digital images is by far the most widely used of all methods in the digital world of today. This is because it can take advantage of the limited power of the human visual system (HVS). Almost any plain text, cipher text, image, or other media that can be encoded into a bit stream can be hidden in a digital image. With the continued growth of strong graphics power in computers and the research being put into image-based Steganography, this field will continue to grow at a very rapid pace.

Before diving into coding techniques for digital images, a brief explanation of digital image architecture and digital image compression techniques is in order. As Duncan Sellars explains, "To a computer, an image is an array of numbers that represent light intensities at various points, or pixels. These pixels make up the image's raster data." When dealing with digital images for use with Steganography, 8-bit and 24-bit per pixel image files are typical. Both have advantages and disadvantages, as we will explain below.

8-bit images are a great format to use because of their relatively small size. The drawback is that only 256 possible colors can be used, which can be a potential problem during encoding. Usually a grayscale color palette is used when dealing with 8-bit images such as .GIF, because its gradual change in color makes changes harder to detect after the image has been encoded with the secret message.

24-bit images offer much more flexibility when used for Steganography. The large number of colors (over 16 million) that can be used goes well beyond what the human visual system (HVS) can distinguish, which makes a secret message very hard to detect once it has been encoded. The other benefit is that a much larger amount of hidden data can be encoded into a 24-bit digital image than into an 8-bit digital image. The one major drawback to 24-bit digital images is that their large size (usually in MB) makes them more suspect than the much smaller 8-bit digital images (usually in KB) when sent over an open system such as the Internet.

Digital image compression is a good solution for large digital images such as the 24-bit images mentioned earlier. There are two types of compression used in digital images, lossy and lossless. Lossy compression such as .JPEG greatly reduces the size of a digital image by removing excess image data and calculating a close approximation of the original image. Lossy compression is usually used with 24-bit digital images to reduce their size, but it does carry one major drawback: it increases the possibility that the embedded secret message will lose parts of its contents, because lossy compression removes what it sees as excess image data. Lossless compression techniques, as the name suggests, keep the original digital image intact without the chance of loss. It is for this reason that lossless compression is the technique of choice for steganographic uses. Examples of formats that preserve the image data exactly are .GIF (which uses lossless compression) and .BMP (which is typically stored uncompressed). The only drawback to lossless image compression is that it doesn't do a very good job of reducing the size of the image data.

We will now discuss a couple of the more popular digital image encoding techniques used today: least significant bit (LSB) encoding, and masking and filtering techniques.

Least significant bit (LSB) encoding is by far the most popular of the coding techniques used for digital images. By using the LSB of each byte (8 bits) in an image for a secret message, you can store 3 bits of data in each pixel for 24-bit images and 1 bit in each pixel for 8-bit images. As you can see, much more information can be stored in a 24-bit image file. Depending on the color palette used for the cover image (e.g., all gray), it is possible to take the 2 LSBs from one byte without the human visual system (HVS) being able to tell the difference. The only problem with this technique is that it is very vulnerable to attacks such as image changes and formatting (e.g., changing from .GIF to .JPEG).

Masking and filtering techniques for digital image encoding, such as Digital Watermarking (e.g., integrating a company's logo into its web content), are more popular with lossy compression techniques such as .JPEG. This technique actually extends an image's data by masking the secret data over the original data, as opposed to hiding information inside of the data. Some experts argue that this is definitely a form of Information Hiding, but not technically Steganography. The beauty of masking and filtering techniques is that they are far more resistant to image manipulation, which makes their possible uses very robust.

There are also techniques that use complex algorithms, image transformation techniques, and image encryption techniques. These are still relatively new, but show promise as more secure and robust ways to use digital images in Steganography.
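The LSB encoding described above is simple enough to sketch in a few lines. The example below works on a 24-bit image represented as a plain list of (red, green, blue) tuples, so each pixel offers 3 bits of capacity. The representation and function names are assumptions chosen to keep the sketch self-contained; a real tool would read and write an actual image file.

```python
def to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits):
    """Pack a list of bits (a multiple of 8 long) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed_lsb(pixels, message):
    """Hide message in the lowest bit of each 8-bit color channel."""
    bits = to_bits(message)
    capacity = 3 * len(pixels)           # 3 bits per 24-bit pixel
    if len(bits) > capacity:
        raise ValueError(f"message needs {len(bits)} bits, cover holds {capacity}")
    channels = [value for pixel in pixels for value in pixel]
    for i, bit in enumerate(bits):
        channels[i] = (channels[i] & 0xFE) | bit
    return [tuple(channels[i:i + 3]) for i in range(0, len(channels), 3)]


def extract_lsb(pixels, message_len):
    """Read message_len bytes back out of the channel LSBs."""
    channels = [value for pixel in pixels for value in pixel]
    return from_bits([value & 1 for value in channels[:message_len * 8]])


cover_pixels = [(200, 100, 50)] * 32     # hypothetical flat-colored cover
stego_pixels = embed_lsb(cover_pixels, b"hi")
print(stego_pixels[0], extract_lsb(stego_pixels, 2))
```

No channel value changes by more than 1, which is why the result is visually indistinguishable from the cover. The same property makes the message fragile: re-saving the image with lossy compression rewrites exactly those low-order bits.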

7. Conclusion

Computer forensic professionals need to be aware of the difficulties in identifying the use of steganography in any investigation. As with many digital-age technologies, steganographic techniques are becoming increasingly sophisticated and difficult to detect reliably. Once their use is detected or discovered, recovering the embedded content is becoming difficult as well. Acquiring knowledge of current steganographic techniques, along with their associated data types, can provide a critical advantage to an investigator by adding valuable tools to their forensic toolkit.

Finally, because relatively simple techniques can deny the exploitation of a covert steganographic channel, companies may wish to take precautionary measures. By enacting the measures discussed in this paper, they can help ensure their proprietary and trade secret information is not being shoplifted inside the daily podcast, shared in family photos, or distributed via the latest YouTube video.

REFERENCES

[1] K. Ahsan and D. Kundur, “Practical Internet Steganography: Data Hiding in IP,” found online at http://www.ece.tamu.edu/~deepa/pdf/txsecwrksh03.pdf

[2] R.J. Anderson and F.A.P. Petitcolas, “On the Limits of Steganography,” J. Selected Areas in Comm., vol. 16, no. 4, 1998, pp. 474–481.

[3] K. Curran and K. Bailey, “An Evaluation of Image-Based Steganography Methods,” International Journal of Digital Evidence, Fall 2003.
