CHAPTER 1
INTRODUCTION
With advancements in digital communication technology and
the growth of computer power and storage, ensuring individuals’
privacy has become increasingly challenging. The degree to
which individuals value privacy differs from one person to
another. Various methods have been investigated and developed to
protect personal privacy. Encryption is probably the most obvious
one, followed by steganography.
Steganography is an old art that has been practiced since
time immemorial. Steganography, from the Greek, means covered or
secret writing and is thus the art of hiding messages inside
innocuous cover carriers, e.g. images, audio, video, text, or any other
digitally represented code or transmission, in such a manner that the
existence of the embedded messages is undetectable. The hidden
message may be plaintext, ciphertext, or anything that can be
represented as a bit stream. Encrypted data resembles noise and is
therefore generally noticeable, whereas a steganographic message is
meant to be unobservable. Steganography and cryptography, though
closely related, are not the same. The former aims to hide the
existence of the message, whereas the latter scrambles a message
until it is illegible.
The goal of steganography is to avoid drawing suspicion to the
transmission of a hidden message. It hides messages inside other
harmless messages in a way that does not allow an adversary even to
detect that a second, secret message is present. If suspicion is
raised, then this goal is defeated. Discovering and rendering useless
such covert messages is another art form known as steganalysis.
This information-hiding approach has recently become
important in a number of application areas. Digital audio, video, and
pictures are increasingly furnished with distinguishing but
imperceptible marks, which may contain a hidden copyright notice or
serial number or even help to prevent unauthorized copying directly.
Military communications systems make increasing use of traffic
security techniques which, rather than merely concealing the content
of a message using encryption, seek to conceal its sender, its
receiver or its very existence. Similar techniques are used in some
mobile phone systems and in schemes proposed for digital elections.
1.1 Steganography
Steganography is the art and science of writing hidden messages in
such a way that no one, apart from the sender and intended recipient,
suspects the existence of the message, a form of security through
obscurity.
Figure: The different embodiment disciplines of Information Hiding.
The arrow indicates an extension and bold face indicates the focus of
this study.
This work uses some nomenclature commonly used by the
steganography and watermarking communities.
The term “cover image” is used throughout this thesis to describe the
image designated to carry the embedded bits. An image carrying
embedded data (the payload) is called a “stego-image”, while
“steganalysis” or “attacks” refer to the different image processing and
statistical analysis approaches that aim to break steganography
algorithms. People often confuse steganography with cryptography,
but the two are distinct.
Steganography and cryptography, though closely related, are
altogether different. The former hides the existence of the
message, while the latter scrambles a message so that it cannot be
understood (Sellars, 1999). But the two techniques must not be
perceived as mutually exclusive; used together, they can prove more
powerful. As noted above for steganography, the embedded data is
not necessarily encrypted; the hidden message may be plaintext,
ciphertext, or anything that can be represented as a bit stream.
Embedding an encrypted message could be more secure and effective.
Figure 1: General scheme of steganography
1.2 Steganography vs. Cryptography
The purpose of both cryptography and steganography is to
provide secret communication. However, steganography is not the
same as cryptography. Cryptography hides the contents of a secret
message from malicious parties, whereas steganography conceals
even the existence of the message. Steganography must not be
confused with cryptography, where the message is transformed so as
to make its meaning obscure to a malicious party who intercepts it.
Therefore, the definition of breaking the system is different [6]. In
cryptography, the system is broken when the attacker can read the
secret message. Breaking a steganographic system requires the
attacker to detect that steganography has been used and to be able to
read the embedded message.
In cryptography, the structure of a message is scrambled to
make it meaningless and unintelligible unless the decryption key is
available. It makes no attempt to disguise or hide the encoded
message. Basically, cryptography offers the ability of transmitting
information between persons in a way that prevents a third party from
reading it. Cryptography can also provide authentication for verifying
the identity of someone or something.
In contrast, steganography does not alter the structure of the
secret message, but hides it inside a cover image so that it cannot be
seen. A message in ciphertext, for instance, might arouse suspicion
on the part of the recipient, while an “invisible” message created with
steganographic methods will not. In other words, steganography
prevents an unintended recipient from suspecting that the data exists.
In addition, the security of a classical steganography system relies on
the secrecy of the data encoding system. Once the encoding system is
known, the steganography system is defeated.
It is possible to combine the techniques by encrypting the
message using cryptography and then hiding the encrypted message
using steganography. The resulting stego-image can be transmitted
without revealing that secret information is being exchanged.
Furthermore, even if an attacker were to defeat the steganographic
technique and detect the message in the stego-object, he would
still require the cryptographic decoding key to decipher the encrypted
message.
The table below shows a comparison between the three
techniques.
| Criterion/Method | Steganography | Watermarking | Cryptography |
|---|---|---|---|
| Carrier | any digital media | mostly image/audio files | usually text based, with some extensions to image files |
| Secret data | payload; no changes to the structure | watermark; no changes to the structure | plain text; changes the structure |
| Key | optional | optional | necessary |
| Detection | blind | usually informative, i.e., original cover or watermark is needed for recovery | blind |
| Authentication | full retrieval of data | usually achieved by cross correlation | full retrieval of data |
| Objective | secret communication | copyright preserving | data protection |
| Result | stego-file | watermarked-file | cipher-text |
| Concern | detectability/capacity | robustness | robustness |
| Type of attacks | steganalysis | image processing | cryptanalysis |
| Visibility | never | sometimes | always |
| Fails when | it is detected | it is removed/replaced | de-ciphered |
| Relation to cover | not necessarily related to the cover. The message is more important than the cover. | usually becomes an attribute of the cover image. The cover is more important than the message. | N/A |
| Flexibility | free to choose any suitable cover | cover choice is restricted | N/A |
| History | very ancient except its digital version | modern era | modern era |
Figure 2: Different steganography fields
Our work falls under data hiding (protection against detection). We
use a digital image as the cover object and a text file as the secret
data to be embedded.
CHAPTER 2
DIGITAL IMAGE STEGANOGRAPHY
Steganography can also be classified on the basis of the carrier
media. The most commonly used media are text, image, audio and
video. In this work, digital images are used as the carrier media.
2.1 DIGITAL IMAGES
A digital image is defined for the purposes of this document as
a raster based, 2-dimensional, rectangular array of static data
elements called pixels, intended for display on a computer monitor or
for transformation into another format, such as a printed page. To a
computer, an image is an array of numbers that represent light
intensities at various points, or pixels. These pixels make up the
image's raster data. Digital images are typically stored in 32-, 24- or
8-bit per pixel files. In 8-bit color images (such as GIF files), each
pixel is represented as a single byte. A typical 32-bit picture of
width n pixels and height m pixels can be represented by an m x n
matrix of pixels.
Figure 3: Matrix and bits representation of an image file.
The three 8-bit components, red (R), green (G) and blue (B),
together make up 24 bits, so a true-color pixel occupies 24 bits. A
32-bit image additionally has an "alpha channel". An alpha channel is
like an extra color, although instead of being displayed as a color, it
controls how translucently (see-through) the pixel is rendered
against the background.
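
The matrix and bit layout described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names (pack_rgba, unpack_rgba, blank_image) are hypothetical, and the channel order R, G, B, A within the 32 bits is an assumption, since real file formats and libraries order the channels differently.

```python
# Sketch: a 32-bit pixel as four 8-bit channels.
# The R, G, B, A ordering within the integer is an assumption for illustration.

def pack_rgba(r, g, b, a=255):
    """Pack four 8-bit channel values into one 32-bit integer."""
    for value in (r, g, b, a):
        if not 0 <= value <= 255:
            raise ValueError("channel values must fit in 8 bits")
    return (r << 24) | (g << 16) | (b << 8) | a

def unpack_rgba(pixel):
    """Split a 32-bit integer back into its (R, G, B, A) channels."""
    return ((pixel >> 24) & 0xFF, (pixel >> 16) & 0xFF,
            (pixel >> 8) & 0xFF, pixel & 0xFF)

def blank_image(m, n, color=(0, 0, 0, 255)):
    """Build an m x n matrix of pixels: m rows (height), n columns (width)."""
    return [[pack_rgba(*color) for _ in range(n)] for _ in range(m)]

image = blank_image(2, 3)                   # height 2, width 3
image[0][1] = pack_rgba(255, 128, 0, 64)    # a semi-transparent orange pixel
print(format(image[0][1], "032b"))          # the 32 bits of that pixel
print(unpack_rgba(image[0][1]))
```

Dropping the alpha argument and the lowest 8 bits gives the 24-bit case.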
IMAGE FORMATS
There are several image formats in use nowadays. Since raw
image files are quite large, some suitable compression technique is
applied to reduce the size. Based on the kind of compression
employed a given image format can be classified as lossy or lossless.
Lossy compression is used mostly with JPEG files and may not
maintain the original image's integrity, although it provides high
compression. It would therefore corrupt any data embedded in the
image. Lossless compression does maintain the original image data
exactly but does not offer such high compression rates as lossy
compression. PNG, BMP, TIFF and GIF are examples of lossless
formats.
Some commonly used formats are JPEG, BMP, TIFF, GIF and
PNG; the last two types of images are also called palette images. We
discuss here all these formats briefly:
1. TIFF - The Tagged Image File Format (TIFF), which was
developed by the Aldus Corp. in the 1980's, stores many
different types of images ranging from monochrome to true
color. It is a lossless format using LZW (Lempel-Ziv-Welch)
compression, a dictionary-based coding scheme. It is not
lossless when utilizing the newer JPEG tag that allows for JPEG
compression. There is no major advantage over JPEG, though the
quality of the original image is retained. It is not as
user-controllable as claimed.
2. BMP - This is a system standard graphics file format for
Microsoft Windows and hence proprietary and platform
dependent. It is capable of storing truecolor bitmap images and
is used in MS Paint, Windows wallpapers, etc. Being an
uncompressed file format, it requires a large amount of storage.
3. GIF - The Graphics Interchange Format (GIF) is a lossless
format that uses the LZW algorithm, modified slightly
for image scan line packets (line grouping of pixels). UNISYS
Corp. and CompuServe introduced this format for transmitting
graphical images over phone lines via modems. It is limited to
8-bit (256) color images and is suitable for images with few
distinctive colors (e.g., graphics drawings). The GIF format is
also used for nonphotographic images, e.g. buttons, borders,
etc. It supports simple animation.
4. JPEG - A creation of the Joint Photographic Experts Group, it
was approved as an international standard in 1992. It takes
advantage of limitations in the human vision system (HVS) to
achieve high rates of compression. It is a lossy format which
allows the user to set the desired level of quality/compression.
By far one of the most common image formats, it is primarily used
for photographs. JPEGs are extremely popular since they
compress into a small file size while retaining good image
quality.
5. PNG - Portable Network Graphics is a lossless image format,
properly pronounced "ping". The PNG format was created in
December 1994 and was endorsed by The World Wide Web
Consortium (W3C) for its faster loading and enhanced-quality,
platform-independent Web graphics. It was designed to
replace the older and simpler GIF format. Like GIF, it supports
transparent images for buttons and icons, but it does not
support animation. The compression is asymmetric; reading is
faster than writing.
We have chosen the PNG image file format as our carrier media
because of the following advantages:
1. PNG is the most flexible image format for the web because it
can save images in 8-bit, 24-bit and 32-bit colour, which is not
possible with the GIF and JPEG file formats. For example, GIF
can store only 8-bit or lower bit depths. Similarly, JPEGs must
be stored in 24-bit and no lower, while PNGs can be stored in
8-bit, 24-bit, or 32-bit.
2. PNG uses a lossless compression method, which means that
an image can be compressed and decompressed without any
loss of image quality. A PNG is compressed after applying any
of a number of pre-compression filters and is then decompressed
when viewed, similar to the JPEG format, except that the PNG
format is lossless. PNG's compression engine typically compresses
images 5-25% better than GIF.
3. PNG can store a variable transparency value known as alpha
channel transparency. This allows an image to have up to 256
different levels of partial transparency. JPEG, by contrast, does
not support transparency. PNG can also store the gamma value of
an image for the platform it was created on, which enables a
display system to present the image with its correct gamma
value, if it has been specified. A correct gamma value enables a
picture to display properly on different platforms without losing
its quality during transformation.
4. PNG supports metadata for searching and indexing: keywords and
other text strings (compressed or otherwise) can be incorporated
to enable search engines to locate the image on the web.
2.2 STEGANOGRAPHY TECHNIQUES
The following restrictions and features should be kept in mind
during the embedding process:
- It is important that the embedding occur without significant
degradation or loss of perceptual quality of the cover.
- For data consistency, original data of the cover rather than
header or wrapper must be used for embedding.
- Intelligent attacks or anticipated manipulations such as filtering
and resampling should not mutilate the embedded data.
Four main factors characterize the data hiding techniques in
steganography:
- Hiding Capacity: the size of information that can be hidden
relative to the size of the cover.
- Perceptual Transparency: the embedding should occur without
significant degradation or loss of perceptual quality of the
cover.
- Robustness: the ability of embedded data to remain intact if the
stego-image undergoes transformations.
- Tamper Resistance: the difficulty for an attacker to
alter or forge a message once it has been embedded.
Digital data can be embedded in images in many ways,
e.g. sequential, random, non-random (looking for "noisy" areas of the
image that will attract less attention), redundant, etc. Each one of
these has its own merits and demerits. The most common techniques
of data hiding in images are:
1. Appending data bytes at the end of carrier:
The secret data bytes are appended at the end of the carrier
media, such as an image, and the carrier media is then
compressed to its original size to reduce suspicion that it
contains secret data.
The advantage is that it is very easy to implement. The
disadvantage is that it is very easy to detect and to read the
message.
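
As a rough illustration of the appending approach, the following Python sketch attaches a payload after the carrier bytes. The function names and the 4-byte length trailer are assumptions made for this example and are not taken from any image standard; the step of compressing the carrier back to its original size is omitted.

```python
# Sketch: hide data by appending it after the end of the carrier bytes.
# The 4-byte big-endian length trailer is a hypothetical convention
# chosen here so the receiver knows where the payload starts.

def append_payload(carrier: bytes, secret: bytes) -> bytes:
    """Return carrier + secret + 4-byte length of the secret."""
    return carrier + secret + len(secret).to_bytes(4, "big")

def extract_payload(stego: bytes) -> bytes:
    """Read the length trailer, then slice out the payload before it."""
    if len(stego) < 4:
        raise ValueError("input too short to contain a length trailer")
    length = int.from_bytes(stego[-4:], "big")
    if length > len(stego) - 4:
        raise ValueError("length trailer is inconsistent with the data")
    return stego[len(stego) - 4 - length:len(stego) - 4]

carrier = b"\x89PNG\r\n\x1a\n" + bytes(32)   # stand-in bytes, not a valid image
stego = append_payload(carrier, b"meet at noon")
print(extract_payload(stego))
print(len(stego) - len(carrier))             # the extra bytes give the method away
```

The size difference printed at the end shows why this method is easy to detect: the stego file is simply longer than the carrier.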
2. Least significant bit (LSB) insertion:
LSB techniques embed the message bits directly into the
least-significant bit plane of the cover image in a deterministic
sequence. This results in a change whose amplitude is too low
to be perceptible to humans. LSB embedding is simple and
popular, and many techniques build on it. Its main problem is
its vulnerability to image manipulation.
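
A minimal Python sketch of sequential LSB insertion follows. It works on a flat list of 8-bit channel values rather than on a real image file, and the function names are hypothetical; reading and writing actual PNG pixel data is left out.

```python
# Sketch: sequential LSB embedding over a flat list of 8-bit values
# (for example, the R, G, B bytes of an image read in scan order).

def message_to_bits(message: bytes):
    """Yield the bits of the message, most significant bit first."""
    for byte in message:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1

def embed_lsb(samples, message: bytes):
    """Overwrite the least significant bit of successive samples."""
    bits = list(message_to_bits(message))
    if len(bits) > len(samples):
        raise ValueError("message too large for this cover")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit   # clear the LSB, then set it
    return stego

def extract_lsb(samples, n_bytes: int) -> bytes:
    """Rebuild n_bytes from the LSBs of the first 8 * n_bytes samples."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for sample in samples[8 * i:8 * i + 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)

cover = [200, 201, 202, 203, 204, 205, 206, 207] * 4   # 32 stand-in values
stego = embed_lsb(cover, b"Hi")
print(extract_lsb(stego, 2))
print(max(abs(a - b) for a, b in zip(cover, stego)))   # no value moves by more than 1
```

With one bit per sample, a cover of N samples can carry at most N / 8 bytes, which is the hiding capacity of this simple scheme.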
3. Public Key Steganography
This method requires the pre-existence of a shared secret key
to designate the pixels which should be tweaked. Thus both the
sender and the receiver must have this secret. The idea of a
private/public key pair doesn't work, since the eavesdropper
can use the public key to sabotage the whole affair.
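
One way to picture a shared key designating the pixels to be tweaked is to seed a pseudo-random generator with the key and let it pick the positions. The Python sketch below does this under loud assumptions: the function names are hypothetical, the key is a plain integer, and Python's random module is not cryptographically secure, so this shows the idea only and is not a secure scheme.

```python
import random

# Sketch: a shared key seeds a PRNG that selects which samples carry bits.
# random.Random is NOT cryptographically secure; it is used here only to
# illustrate key-dependent position selection.

def keyed_positions(key: int, cover_len: int, count: int):
    """Return `count` distinct positions determined by the shared key."""
    return random.Random(key).sample(range(cover_len), count)

def embed_keyed(samples, bits, key: int):
    """Write each bit into the LSB of a key-selected sample."""
    stego = list(samples)
    positions = keyed_positions(key, len(stego), len(bits))
    for pos, bit in zip(positions, bits):
        stego[pos] = (stego[pos] & 0xFE) | bit
    return stego

def extract_keyed(samples, n_bits: int, key: int):
    """Regenerate the same positions from the key and read the LSBs."""
    return [samples[pos] & 1
            for pos in keyed_positions(key, len(samples), n_bits)]

cover = list(range(100, 164))            # 64 stand-in 8-bit values
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_keyed(cover, bits, key=2024)
print(extract_keyed(stego, len(bits), key=2024) == bits)
```

Both sides must hold the same key and know the number of embedded bits; without the key, a receiver does not know which samples to read or in what order.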
4. Transform domain based embedding:
Transform Embedding Techniques embed the data by
modulating coefficients in a transform domain, such as