DELHI TECHNOLOGICAL UNIVERSITY
SHAHBAD DAULTAPUR, BAWANA RD., DELHI-42
DEPARTMENT OF COMPUTER ENGINEERING
SELF-STUDY SEMINAR REPORT ON
STEGANOGRAPHY
PRESENTED BY
CERTIFICATE
I, Abhishek Singh, hereby solemnly affirm that the Self-Study Seminar Report entitled
“Steganography”, being submitted by me in partial fulfillment of the requirements for the
award of the degree of Bachelor of Technology in Computer Engineering to the Delhi
Technological University, is a record of bona fide work carried out by me under the
guidance of Ms. Indu Singh and Mr. Vinod Kumar.
The work reported in this report in full or in part has not been submitted to any university
or Institute for the award of any other degree or diploma.
Place: DTU, Bawana Road, Delhi-11092
Date:
ACKNOWLEDGEMENT
I would like to express my greatest gratitude to the people who have helped and
supported me throughout my Self-Study Seminar Report.
I am grateful to my guides, Ms. Indu Singh and Mr. Vinod Kumar, for
their continuous support for the project, from initial advice and contacts in the early stages
of conceptual inception, through ongoing advice and encouragement to this day.
I wish to thank them for their undivided support and interest, which inspired and
encouraged me to go the right way.
Last but not least, I want to thank my friends, who appreciated my work and
motivated me, and finally God, who made all things possible.
ABSTRACT
Steganography is the art and science of writing hidden messages in such a way
that no one apart from the sender and intended recipient even realizes there is a hidden
message.
This report is primarily a reference resource on Steganography. The material here is
designed to allow a student to learn and understand what Steganography is and how it
works. I discuss a number of practical steganographic methods for achieving
covert communication, but I also focus on how secret communication using
Steganography can be detected, and on how Steganography can be used for purposes
other than hiding data. The report also compares Steganography with
the better-known method of securing data, Cryptography. If you really want to
understand what Steganography is and what makes it work, you have come to the right
place. If all you want is simple instructions on how to hide information in media files, this
probably isn’t the guide for you: it is a study of Steganography in depth.
IMPORTANT KEYWORDS
Steganography
Cryptography
Cover Media
Plain Text
Digital Watermarking
Steganalysis
Stego-Key
Stego-Media
TABLE OF CONTENTS
1. INTRODUCTION
WHAT EXACTLY IS STEGANOGRAPHY?
WHY IS STEGANOGRAPHY IMPORTANT?
2. LITERATURE SURVEY
STEGANOGRAPHY IN TEXT FILES
STEGANOGRAPHY IN AUDIO FILES
STEGANOGRAPHY IN IMAGES
COVERT CHANNELS
STEGANOGRAPHY OVER NETWORKS
3. DISCUSSION
STEGANALYSIS
APPLICATIONS
STEGANOGRAPHY VS. CRYPTOGRAPHY
FUTURE SCOPE OF STEGANOGRAPHY
4. CONCLUSION
5. REFERENCES
CHAPTER 1
INTRODUCTION
WHAT EXACTLY IS STEGANOGRAPHY?
The word Steganography is of Greek origin and means "covered, or hidden
writing".
Steganography is the art and science of communicating in a way which hides the
existence of the communication. It hides information without altering or recasting the
information itself.
Steganography has been widely used throughout history and up to the present
day, from messages on slaves’ shaven heads and patterns sewn into quilts in earlier times, to
crossword puzzles and invisible ink in modern times, to relay messages which were
not meant for others to read.
Today, in the 21st century, Steganography is most often associated with data
hidden within other data in an electronic file, including the concealment of information
within computer files such as a document, an image, a program or even a protocol.
WHY IS STEGANOGRAPHY IMPORTANT?
The primary goal of Steganography is to hide messages inside other harmless
messages in a way that does not allow any adversary to even detect that there is a second,
secret message present. Steganography thus makes possible the secret communication
that individuals, organizations and governments
require.
Another purpose Steganography serves is to protect the intellectual property rights of
copyright owners over their digital media. Information about the owner, recipient and
access level can be embedded within the media through steganographic methods. As long
as the “watermark” hidden in the media is present, no one else can claim rights over
the copyrighted media. Steganography also makes it difficult to remove the
“watermark” without destroying the media itself.
CHAPTER 2
LITERATURE SURVEY
The purpose of this literature survey is to state and explain how
Steganography helps in strengthening the security of sensitive information. It
looks at how technology and encoding systems have developed to
embed data in different types of files such that the embedded data is not
perceptible, either at first look or when the file undergoes scrutiny.
It also covers the use of media that were never intended for communication
to transmit stego-media, and explains how
present-day networking supports Steganography.
Steganography involves hiding data in an overt message and doing it in such a way
that it is difficult for an adversary to detect and difficult for an adversary to remove. Based
on this goal, the basic Steganographic Model is shown below:
Carrier Media (cover-media) is the original, unaltered message into which the
message meant to be hidden is to be embedded.
Encoding process/Steganographic Algorithm is the process in which the sender hides a
message by embedding it into the cover-media, usually using a key, to obtain the
stego-media.
The Payload (or the Secret Message) is taken along with the Carrier Media, and both are
processed by the Steganographic Technique/Algorithm to embed the payload into the
carrier.
The unique key used in the Encoding System/Algorithm, known as the Stego-Key, decides
how the carrier media is to be altered in order to
accommodate the payload without revealing the payload to others.
The final media obtained from the Algorithm is called the
Stego-Media. It is this media which holds the secret message and is ready to be
transmitted without being detected as holding any covert information.
Recovering process is the process in which the receiver extracts, using only the key, the
hidden message from the stego-media.
Security requirement is that a third person watching such a communication should
not be able to find out whether, or when, the sender has been active, in the sense
of having actually embedded a message in the carrier media. In other words,
stego-media should be indistinguishable from carrier media.
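To make the roles of carrier, payload and stego-key in the model above concrete, the following is a minimal sketch in Python. It is only an illustration under stated assumptions, not a method taken from the sources of this report: the function names `embed` and `recover` are hypothetical, the payload bits are written into least significant bits, and the key merely seeds Python's non-cryptographic `random` generator to choose which carrier bytes are altered.

```python
import random


def _positions(key: str, cover_len: int, n_bits: int) -> list[int]:
    """Derive n_bits distinct carrier positions from the stego-key.

    Assumption: random.Random is used only for illustration; it is not
    cryptographically secure.
    """
    if n_bits > cover_len:
        raise ValueError("payload too large for this carrier")
    return random.Random(key).sample(range(cover_len), n_bits)


def embed(cover: bytes, payload: bytes, key: str) -> bytes:
    """Encoding process: carrier media + payload + stego-key -> stego-media."""
    # Payload bits, most significant bit of each byte first.
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    stego = bytearray(cover)
    for pos, bit in zip(_positions(key, len(cover), len(bits)), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit  # overwrite the lowest bit only
    return bytes(stego)


def recover(stego: bytes, n_bytes: int, key: str) -> bytes:
    """Recovering process: read the same key-selected positions back."""
    bits = [stego[pos] & 1 for pos in _positions(key, len(stego), n_bytes * 8)]
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

In this sketch the receiver must know the key and the payload length; a receiver with a different key would read a different set of positions and would generally not obtain the message.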
Depending upon the type of carrier media, a suitable technique can be applied to achieve
the maximum encoding density, i.e. the degree to which the carrier media can be altered to
encode the payload.
Steganography can be classified into image, text and audio steganography, depending on
the cover media used to embed the secret data.
STEGANOGRAPHY IN TEXT FILES
Text steganography can involve anything from changing the formatting of an
existing text, to changing words within a text, to generating random character sequences or
using context-free grammars to generate readable texts. Text steganography is believed to
be the trickiest kind because text lacks the redundant information that is present in an
image, audio or video file. However, in text documents we can hide information by introducing
changes in the structure of the document without making a noticeable change in the
output. The payload can be embedded in a text through three established techniques:
1. Line-Shift Coding
In this method, the secret message is hidden by vertically shifting the text lines of the
document by a small amount.
The user can choose an arbitrary number between 0 and 127, which is converted into
an 8-bit sequence. According to this bit sequence, every second line is shifted one
pixel down if the bit is 1, whereas a 0 keeps the line in its normal position.
2. Word-Shift Coding
In this method, the secret message is hidden by shifting words horizontally, i.e. left
or right to represent bit 0 or 1 respectively. Word shifts can only be detected using a
correlation method. The scheme is based on the fact that most documents use
variable word spacing to distribute white space when justifying a document. If the
spacing between adjacent words were always the same, word-shift coding would
be easy to notice and would not make sense at all.
3. Feature Coding
In feature coding, the secret message is hidden by altering one or more features of the
text. A parser examines a document and picks out all the features that it can use to
hide the information.
For example, the dots of the letters i and j can be displaced, the length of the stroke in
the letters f and t can be changed, or the height of the letters b, d, h, etc. can be
extended or shortened.
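As an illustration of line-shift coding (technique 1 above), the following Python sketch turns a number between 0 and 127 into an 8-bit sequence and derives a vertical offset for each line. It is a hypothetical sketch: the function names are invented here, lines are numbered from 0, and "every second line" is assumed to mean lines 1, 3, 5 and so on. Rendering the shifted lines is left out.

```python
BITS = 8  # the chosen number is written as an 8-bit sequence


def line_shifts(number: int, n_lines: int) -> list[int]:
    """Return the downward shift in pixels (0 or 1) for each text line."""
    if not 0 <= number <= 127:
        raise ValueError("number must be between 0 and 127")
    if n_lines < 2 * BITS:
        raise ValueError("need at least 16 lines to carry 8 bits")
    shifts = [0] * n_lines
    # Every second line (indices 1, 3, 5, ...) carries one bit of the number.
    for i, bit in enumerate(format(number, "08b")):
        shifts[2 * i + 1] = int(bit)
    return shifts


def read_number(shifts: list[int]) -> int:
    """Recover the number from the measured line shifts."""
    bits = "".join(str(shifts[2 * i + 1]) for i in range(BITS))
    return int(bits, 2)
```

The unshifted lines in between act as reference positions, so a reader of the document can measure whether a neighbouring line has been moved.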
STEGANOGRAPHY IN AUDIO FILES
Today most audio files are available in digital form. Digital audio is a
discrete rather than a continuous signal, as found in analog audio. A discrete signal is
created by sampling a continuous analog signal at a specified rate. Digital audio is stored
in a computer as a sequence of 0's and 1's. With the right tools, it is possible to change the
individual bits that make up a digital audio file. Such precise control allows changes to be
made to the binary sequence that are not discernible to the human ear. There are
many techniques for hiding information or messages in audio in such a manner that the
alterations made to the audio file are perceptually indiscernible. Some of the
established and prominent methods of audio steganography are:
1. Least Significant Bit Coding
A very popular method is the LSB (Least Significant Bit) algorithm, which
replaces the least significant bit in some bytes of the cover file to hide a sequence
of bytes containing the hidden data. It is usually an effective technique in cases
where the LSB substitution doesn't cause significant quality degradation. Note that
there is a fifty percent chance that the bit being replaced is the same as its
replacement; in other words, half the time the bit doesn't change, which helps to
minimize quality degradation.
The illustration shows how the message 'HEY' is encoded in 16-bit CD-quality
samples using the LSB method. Here the secret information is ‘HEY’ and the cover
file is an audio file. First the secret
information ‘HEY’ and the audio file are converted into bit streams. The least
significant bit of each audio sample is then replaced by the next bit of the secret
information ‘HEY’. The resulting file, with the secret information ‘HEY’ embedded, is
called the stego-file.
2. Echo Hiding
The echo hiding technique embeds secret information in a sound file by introducing an
echo into the discrete signal. Echo hiding has the advantages of a high
encoding density and superior robustness when compared to other methods. Only
one bit of secret information can be encoded if only one echo is produced from
the original signal. Hence, before the encoding process begins, the original signal is
broken down into blocks, each of which carries one bit. Once the encoding process is
done, the blocks are
concatenated back together to create the final signal [5, 20]. Echo Hiding is shown
in the illustration:
3. Phase Coding
The phase coding technique works by replacing the phase of an initial audio
segment with a reference phase that represents the secret information. The
phase of each remaining segment is adjusted in order to preserve the relative phase
between segments.
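The LSB coding described in item 1 above can be sketched in a few lines of Python. This is a minimal illustration, assuming the audio has already been decoded into a list of signed 16-bit integer samples (for example from a WAV file); the function names are hypothetical, and one message bit is stored per sample starting at the first sample.

```python
def embed_lsb(samples: list[int], message: bytes) -> list[int]:
    """Replace the least significant bit of successive samples with message bits."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("message does not fit in the cover audio")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        # This also works for negative (two's complement) sample values.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(stego: list[int], n_bytes: int) -> bytes:
    """Read n_bytes back from the least significant bits of the samples."""
    out = bytearray()
    for start in range(0, n_bytes * 8, 8):
        value = 0
        for sample in stego[start:start + 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)
```

Hiding 'HEY' this way needs 24 samples, and each sample changes by at most one quantization step, which is why the alteration is hard to hear.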
STEGANOGRAPHY IN IMAGES
To a computer, an image is an array of numbers that represent light intensities at
various points, or pixels. 8-bit color images are generally preferred for hiding
information. In 8-bit color images, each pixel is represented as a single byte. Each pixel
merely points to a color index table, or palette, with 256 possible colors. The pixel's value,
then, is between 0 and 255.
The challenge of using steganography in computer images is to hide as much data as
possible with the least noticeable difference in the image. Many steganography experts
recommend using images featuring 256 shades of gray as the palette.
Gray-scale images are preferred because the shades change very
gradually between palette entries, which increases the image's ability to hide information.
Images having large areas of solid color should not be used as carrier files,
as even a minute change in the carrier would be visually noticeable.
Information can be hidden in images in many different ways. Some of the prominent
encoding techniques of image steganography are:
Information in Images can be hidden many different ways. Some of the prominent
Encoding techniques of Image Steganography are:
1. Least Significant Bit Encoding a.k.a. Insertion Technique
One of the most common techniques used in steganography today is called least
significant bit (LSB) insertion. This method is exactly what it sounds like: the least
significant bits of the cover-image are altered so that they form the embedded
information. The following example shows how the letter A can be hidden in the
first eight bytes of three pixels in a 24-bit image:
Pixels: (00100111 11101001 11001000)
(00100111 11001000 11101001)
(11001000 00100111 11101001)
The 8-bit ASCII code for the letter ‘A’, which is to be hidden: 01000001
After the insertion technique:
Resultant Pixels: (00100110 11101001 11001000)
(00100110 11001000 11101000)
(11001000 00100111 11101001)
Only the least significant bits of the first, fourth and sixth bytes were
actually altered; the other bytes already ended in the required bit. LSB insertion
requires on average that only half the least significant bits in an
image be changed. Since the 8-bit letter A requires only eight bytes to hide it in, the
ninth byte of the three pixels can be used to begin hiding the next character of the hidden
message.
Two or more least significant bits per byte can be used to embed more payload,
increasing the encoding density; however, the cover image then degrades sharply, which
makes it easier to detect that it holds hidden information.
2. Masking and Filtering
Masking and filtering techniques hide information by marking an image in a
manner similar to paper watermarks. Because watermarking techniques are more
integrated into the image, they may be applied without fear of image destruction
from “lossy” compression. By covering, or masking, a faint but perceptible signal
with another to make the first non-perceptible, we exploit the fact that the human
visual system cannot detect slight changes in certain parts of the
image.
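The trade-off in LSB insertion (technique 1 above) between encoding density and image degradation can be explored with a small Python sketch. It is a hypothetical illustration: the function names and the parameter `k` (number of least significant bits used per byte) are introduced here, and the cover is treated as a flat sequence of color bytes rather than a real image file.

```python
def embed_bits(cover: bytes, bits: str, k: int = 1) -> bytes:
    """Overwrite the k least significant bits of successive cover bytes."""
    if not 1 <= k <= 8:
        raise ValueError("k must be between 1 and 8")
    if len(bits) % k:
        raise ValueError("pad the payload to a multiple of k bits")
    if len(bits) // k > len(cover):
        raise ValueError("payload does not fit in the cover")
    mask = (1 << k) - 1
    stego = bytearray(cover)
    for i in range(0, len(bits), k):
        chunk = int(bits[i:i + k], 2)
        index = i // k
        stego[index] = (stego[index] & ~mask & 0xFF) | chunk
    return bytes(stego)


def extract_bits(stego: bytes, n_bits: int, k: int = 1) -> str:
    """Read n_bits back, k bits from each byte."""
    mask = (1 << k) - 1
    return "".join(format(b & mask, f"0{k}b") for b in stego[: n_bits // k])


# The nine color bytes of the three example pixels.
cover = bytes([0b00100111, 0b11101001, 0b11001000,
               0b00100111, 0b11001000, 0b11101001,
               0b11001000, 0b00100111, 0b11101001])
payload = format(ord("A"), "08b")  # the letter A as eight bits
assert extract_bits(embed_bits(cover, payload, k=1), 8, k=1) == payload
assert extract_bits(embed_bits(cover, payload, k=2), 8, k=2) == payload
```

With `k = 1` a byte changes by at most 1; with `k = 2` the same payload fits in half as many bytes, but a byte can change by up to 3, and in general by up to 2^k - 1, which is the degradation the text warns about.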
COVERT CHANNELS
Covert channels are communication paths that were neither designed nor intended to
transfer information at all, but are used that way, using entities that were not intended for
such use.
The most prominent example of a covert channel is the TCP/IP headers.
The IP header contains the key information that is needed for packets of data to be routed
properly. When using stego you want to find fields that you can overwrite or change
without affecting the host communication. One field in the IP header that you can
change without any such effect is the IP identification number. Usually the ID is
incremented by 1 for each packet that is sent out, but any number can be used and the
protocol will function properly. This ability to make a change without damaging functionality
makes this field an ideal candidate for hiding stego.
In TCP, we can use the sequence and acknowledgement number fields to hide
data. During the initial handshake the values of the sequence and acknowledgement numbers
are chosen, typically at random, by the sender and receiver. Therefore, the first packet
that is sent can contain data hidden in those fields, because the initial values don’t have any
other purpose. Once communication has been established, however, these fields can no longer
be used to hide data.
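The use of the IP identification field as a covert channel can be sketched without touching the network at all. The Python example below only packs and unpacks 20-byte IPv4 headers in memory with the standard `struct` module; the function names are hypothetical, the addresses are documentation addresses chosen for illustration, and the header checksum is left at zero for brevity, so these headers are not ready to be sent as they are.

```python
import struct

# IPv4 header without options: version/IHL, TOS, total length, identification,
# flags/fragment offset, TTL, protocol, checksum, source, destination.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")

SRC = bytes([192, 0, 2, 1])      # illustrative address (assumption)
DST = bytes([198, 51, 100, 2])   # illustrative address (assumption)


def build_header(ident: int, src: bytes = SRC, dst: bytes = DST) -> bytes:
    """Pack a minimal IPv4 header whose ID field carries 16 hidden bits."""
    version_ihl = (4 << 4) | 5   # IPv4, header length of 5 32-bit words
    return IPV4_HEADER.pack(version_ihl, 0, 20, ident, 0, 64, 6, 0, src, dst)


def headers_for(payload: bytes) -> list[bytes]:
    """Spread the payload over successive headers, two bytes per ID field."""
    if len(payload) % 2:
        payload += b"\x00"       # pad to a whole number of 16-bit values
    return [build_header(int.from_bytes(payload[i:i + 2], "big"))
            for i in range(0, len(payload), 2)]


def payload_from(headers: list[bytes]) -> bytes:
    """Receiver side: collect the ID field of each header."""
    return b"".join(IPV4_HEADER.unpack(h)[3].to_bytes(2, "big") for h in headers)
```

Each packet carries only two hidden bytes, so this channel has a low capacity; the ID values also stop looking like a counter that increases by 1, which is something an observer could check for.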
STEGANOGRAPHY OVER NETWORK
On a computer network you can use steganographic techniques to hide files in traffic,
transmit viruses, or hide the trail of somebody moving around online. The three methods to
implement Steganography over a network are:
1. Hiding in an Attachment
Hiding data in an attachment is the most basic form of using a network to transmit
stego from one person to another. The three most popular ways to do this are by
email, by file transfer such as FTP, or by posting a file on a website.
2. Hiding in a Transmission
Here the data is hidden in the transmission itself rather than in a file attached to it,
for example in the TCP/IP header fields described under Covert Channels.
3. Hiding in Overt Protocol
Essentially, this is what is called data camouflaging, where you make data look
like something else. With this technique you take data, put it in normal network
traffic, and modify the data in such a way that it looks like the overt protocol.
For example, most networks carry large amounts of HTTP or Web traffic. You
could send data over port 80, and it would look like Web traffic. The problem is
that, if someone examined the payload, it would not look like normal Web traffic,
which usually contains HTML. What if you added symbols such as < >, < / > to the
data? Because these are the types of characters that HTML contains, the traffic
would look like Web traffic and would probably slip by the casual observer.
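A very simple form of this camouflage can be sketched in Python. The example below is an assumption-laden illustration rather than a technique described in the sources: it Base64-encodes the data and places it inside an HTML comment of an otherwise ordinary-looking page; the page text and the function names `camouflage` and `uncover` are hypothetical.

```python
import base64
import re


def camouflage(data: bytes) -> str:
    """Wrap arbitrary data in HTML so the payload resembles a web page."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"<html><body><p>Welcome</p><!-- {encoded} --></body></html>"


def uncover(page: str) -> bytes:
    """Pull the Base64 text back out of the HTML comment and decode it."""
    match = re.search(r"<!-- ([A-Za-z0-9+/=]+) -->", page)
    if match is None:
        raise ValueError("no hidden data found")
    return base64.b64decode(match.group(1))
```

This would only get past a casual observer: a long block of Base64 in a comment is itself unusual, so anyone who inspects the page source closely could notice it.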
CHAPTER 3
DISCUSSION
This section states and explains where exactly
Steganography stands at the present time, and why, despite the presence of other methods
of secure communication such as Cryptography, Steganography is increasingly
preferred. It also explains that, just as there are ways to hide information in
any media, there are ways to catch stego-media carrying covert information
while they are being transmitted like any other carrier media.
STEGANALYSIS
Since Steganography provides a medium for covert communication, it can be
abused by anti-social elements. They can use any
media to send their messages through the Internet without being detected by anyone. Hence
it is necessary to analyze any suspected media for a hidden message. This is where
Steganalysis comes in.
Steganalysis is the study of detecting messages hidden using steganography. The goal of
Steganalysis is to identify suspected packages, determine whether or not they have a
payload encoded into them, and, if possible, recover that payload. Steganalysis generally
starts with a pile of suspect data files, but little information about which of the files, if any,
contain a payload.
Basic Technique: A set of unmodified files of the same type, and ideally from the same
source as the set being inspected, is analyzed for various statistics. Some of these are as
simple as spectrum analysis; others look for inconsistencies in the way the data
has been compressed. One case where detection of suspect files is straightforward is when
the original, unmodified carrier is available for comparison. Comparing the package
against the original file will yield the differences caused by encoding the payload, and
thus the payload can be extracted.
Another Technique: Most programs which are employed to detect stego-media use the
concept of randomness.
With most files that do not contain random data you would get a histogram that has peaks
and valleys where certain characters appear often and others appear only infrequently.
Below is a histogram for unencrypted data.
On the other hand, with encrypted data, because it is random, you acquire a much flatter
histogram, as you can see in Figure 4.2. You can see that, compared to Figure 4.1, Figure
4.2 looks extremely flat. Every character appears with equal frequency.
Hence the degree of randomness shown by the histogram can be used to select media
for inspection even if an unmodified file from the same source is not available for comparison.
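The histogram test described above can be approximated in Python by counting how often each byte value occurs and summarizing the flatness of the histogram as Shannon entropy in bits per byte. This is a minimal sketch with hypothetical function names; any threshold for calling data "suspicious" would be an assumption and is deliberately left out.

```python
import math
from collections import Counter


def byte_histogram(data: bytes) -> list[int]:
    """Count how many times each of the 256 possible byte values occurs."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte distribution: near 0 for highly
    repetitive data, up to 8 for a perfectly flat histogram."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(data).values())
```

Ordinary text gives a histogram with peaks and valleys and a comparatively low entropy, whereas encrypted data approaches the maximum of 8 bits per byte. Compressed files also score high, so a high value is only a reason to inspect a file further, not proof of a hidden payload.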
APPLICATIONS
Data hiding has been the major application of Steganography since its inception.
However, by the 21st century, Steganography has found many more applications which are
not meant for any kind of secret communication. Some of the prominent applications of
Steganography are explained below:
1. Digital Watermarking
A digital watermark is a kind of marker covertly embedded in a noise-tolerant
signal such as audio or image data. A popular application of watermarking
techniques is to provide proof of ownership of digital data by embedding
copyright statements into video or image products; the embedded statement need not
bear any relation to the content of the carrier media. Like traditional watermarks,
digital watermarks
can be made perceptible under certain conditions, i.e. after applying some algorithm,
and imperceptible at all other times, provided that the digital watermark does
not distort the carrier signal.
Both steganography and digital watermarking employ steganographic techniques to
embed data covertly in noisy signals. Steganography aims for imperceptibility to
human senses, whereas digital watermarking treats robustness as the top
priority, even if the watermark becomes perceptible.
Digital watermarking may be used for a wide range of applications, such as:
Copyright protection
Source tracking (different recipients get differently watermarked content)
Broadcast monitoring (television news often contains watermarked video
from international agencies)
Authentication of the Media
2. Tamper Proofing of Files
When a file containing sensitive information is being transferred or
transmitted, there is a risk of it being altered by some person or organization, which
could prove fatal. The principle of Steganography can help here. Data, just like a
watermark, can be embedded into the file through steganographic methods. If the
file is changed by any unauthorized person, the embedded data will
also undergo some change. Hence the embedded data can be checked to see whether
the file has been tampered with or not. Ideally, it should not be possible to remove the
embedded data without destroying the file itself. The presence of the
embedded data in the file therefore acts as a means of making the file tamper-evident.
3. Steganography in Modern Printers
Steganography is used by some modern printers, including HP and Xerox brand
color laser printers. Tiny yellow dots are added to each page. The dots are barely
visible and contain encoded printer serial numbers, as well as date and time stamps.
4. Steganography in DNA
Even biological data, stored on DNA, may be a candidate for hiding messages, as
biotech companies seek to prevent unauthorized use of their genetically engineered
material. The technology is already in place for this: three New York researchers
successfully hid a secret message in a DNA sequence and sent it across the country.
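The tamper-proofing idea from item 2 above can be illustrated with a small Python sketch. It is a hypothetical example, not a scheme from the sources: a SHA-256 digest of the file content is hidden in the least significant bits of the first 256 bytes, and a later check recomputes the digest and compares it with the hidden copy. The function names are invented here, and because the hash is unkeyed, anyone who knows the scheme could re-seal a modified file; a practical design would need a secret key (for example an HMAC).

```python
import hashlib

DIGEST_BITS = 256  # SHA-256 produces 256 bits, one per carrier byte


def _digest_bits(data: bytes) -> list[int]:
    """Hash the content with the first 256 LSBs masked out, as a bit list."""
    masked = bytes(b & 0xFE for b in data[:DIGEST_BITS]) + data[DIGEST_BITS:]
    digest = hashlib.sha256(masked).digest()
    return [(byte >> i) & 1 for byte in digest for i in range(7, -1, -1)]


def seal(data: bytes) -> bytes:
    """Embed the digest bits into the LSBs of the first 256 bytes."""
    if len(data) < DIGEST_BITS:
        raise ValueError("need at least 256 bytes of carrier data")
    sealed = bytearray(data)
    for i, bit in enumerate(_digest_bits(data)):
        sealed[i] = (sealed[i] & 0xFE) | bit
    return bytes(sealed)


def is_intact(sealed: bytes) -> bool:
    """Recompute the digest and compare it with the embedded copy."""
    if len(sealed) < DIGEST_BITS:
        return False
    return [b & 1 for b in sealed[:DIGEST_BITS]] == _digest_bits(sealed)
```

Changing any byte after sealing, whether in the content or in the embedded bits, makes `is_intact` return `False`, which is the behaviour the tamper-proofing item describes.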
STEGANOGRAPHY VS CRYPTOGRAPHY
Cryptography and Steganography are well-known and widely used techniques that
manipulate information in order to encipher it or to hide its existence, respectively.
Steganography is the art and science of communicating in a way which hides the existence
of the communication: Cryptography scrambles a message so it cannot be
understood, while Steganography hides the message so it cannot be seen. Even though both
methods provide security, there are some significant differences between the two.
For covert communication it is Steganography which comes out on top, providing a
better sense of security than Cryptography.
The advantage of steganography over cryptography alone is that messages do not
attract attention to themselves, to messengers, or to recipients. Whereas the goal of
cryptography is to make data unreadable by a third party, the goal of steganography
is to hide the data from a third party.
Steganography
Only the sender and receiver know of the existence of the message.
Prevents discovery of the very existence of the communication.
Does not alter the structure and content of the message.
Cryptography
The existence of the message is known to all.
Prevents an unauthorized party from discovering the contents of the
communication.
Alters the structure of the message.
But still, steganography and cryptography can be used together to ensure more
robust and foolproof security of the covert message.
FUTURE SCOPE OF STEGANOGRAPHY
One thing that you can certainly say about the future of Steganography: Change and
the introduction of new approaches will continue to occur on a frequent basis. The first
area where we can appreciate Stego making strides is in the technology used to produce
and break it.
Improved Resistance to Analysis
As Stego-media gets more sophisticated, its resistance to being analyzed, or even
recognized, will improve. In the current state of Stego technology, if you suspect
Stego-media is being used, it is relatively easy to detect it. Once you detect it, you
can probably retrieve the contents which would then be protected only by the
strength of the encryption applied to it, if any. In the future, efforts will be made to
make stego undetectable and irretrievable except by those for whom it is intended. The
ability to hide data in a document, print out a hard copy, rescan it, and still
be able to retrieve the hidden data would be an intriguing scenario.
Higher Encoding Density
The ability to hide huge amounts of data with stego is another logical area for
improvement. Currently stego can use only a certain amount of data bits in a host
file without degrading the file to the point where it’s obvious that stego is being
used. As stego is used in crimes such as corporate espionage, there will be more
demand to hide larger amounts of data. Large-scale stego, where you can perform
compression on huge amounts of data on the fly and store it in small files, is one
possible future.
CONCLUSION
Steganography is a really interesting subject, outside of the mainstream
cryptography and system administration that most of us deal with day after day. “You
never know if a message is hidden”: this is the dilemma that empowers steganography.
The main emphasis of Steganography is on providing covert
communication in an overt environment, but it is not limited to data hiding.
Steganography is now also finding ground in copyright protection, privacy protection,
surveillance, and authenticity checking. I believe that steganography will continue to
grow in importance as a protection mechanism.
Despite its advantages over Cryptography, Steganography is not intended to
replace cryptography but rather to supplement it. If a message is encrypted and then hidden
with a steganographic method, this provides an additional layer of protection and reduces
the chance of the hidden message being detected. The opposite order is also possible: the
secret message is first embedded into the carrier file, and the carrier file
then undergoes encryption.
Hence Steganography and Cryptography together can unlock a new world of hiding
data and reach new heights.
However, with the advancement of data hiding and encryption, it is to be expected that
Steganography will be abused for greedy and illicit purposes. With continuous
advancements in technology, it is also expected that in the near future more efficient and
advanced techniques in Steganalysis will emerge that will help law enforcement to better
detect illicit materials transmitted through the Internet.
REFERENCES
Site: http://en.wikipedia.org/wiki/Steganography
Site: http://www.sarc-wv.com/news.aspx
Site: http://www.strangehorizons.com/2001/20011008/steganography.shtml
Site: http://www.jjtc.com/stegdoc
Eric Cole, “Hiding in Plain Sight: Steganography and the Art of Covert Communication”, Wiley Publishing Inc., 2003
S. Katzenbeisser, F. Petitcolas, “Information Hiding: Techniques for
Steganography and Digital Watermarking”