My Journey to Cracking Steganography Mission 15 at HackThisSite by Ivan Ivanov Petrov (Keeper) FIRST EDITION


Nov 25, 2015



A detailed description/walkthrough of a famous challenge at HackThisSite.
  • My Journey to Cracking

    Steganography Mission 15

    at HackThisSite

    by Ivan Ivanov Petrov (Keeper)


  • About the underground

    The website was founded by Jeremy Hammond in late 2003. For a long time, it has been
    subject to attempts by many different organizations to gain control over it and destroy the general


    In November 2004 the (now defunct) HackThisSite-based HowDark Security Group notified the phpBB
    Group, makers of the phpBB bulletin software, of a serious vulnerability in the product. The
    vulnerability was kept under wraps while it was brought to the attention of the phpBB admins, who
    after reviewing, proceeded to downplay its risks. Unhappy with the Groups' failure to take action,
    HowDark then published the bug on the bugtraq mailing-list. Malicious users found and exploited the
    vulnerability which led to the takedown of several phpBB-based bulletin boards and websites. Only
    then did the admins take notice and release a fix. Slowness to patch the vulnerability by end-users
    led to an implementation of the exploit in the Perl/Santy worm (read full article) which defaced
    upwards of 40,000 websites and bulletin boards within a few hours of its release.
    - Wikipedia, the free encyclopedia

    The community is dedicated to facilitating an open learning environment by providing a series
    of hacking challenges, articles, resources, and discussion of the latest happenings in hacker
    culture. It describes itself as an online movement of artists, activists, hackers and anarchists
    who are organizing to create new worlds.

    Considering that several of the hacking challenges are simulated web defacements, the
    question of the ethics of hacking is repeatedly brought up. They consider hacking itself to be a
    tool, a skill which in itself is neutral, a means without end. It can be used for good (for the
    benefit of all) or bad (mindless destruction or theft). They do not encourage negative use of the
    information they provide. They are more concerned with the greater risks of not distributing this
    information and are ready to accept the consequences.

  • About Steganography Mission 15

    The mission originally had a fairly simple solution, until a follow-up update of the entire
    challenge altered the concept entirely. The mission drew attention because many steganographers,
    famous and not so well known, have tried to figure out the idea behind it, and none of them is
    known to have succeeded so far. Since 2008, the challenge has been solved by only eighteen people
    worldwide (whose origin is still unknown). Some claim that a few of those were the very
    administrators of the website, who have access to the answer to every challenge submitted to the
    board. Others are inclined to believe that the solves are the result of extensive
    exhaustive-search attacks (a.k.a. brute-force attacks).

    My involvement in this mission started back in 2012, when I was first introduced to
    steganography. At first, I thought there wasn't anything special about it, but soon after I took
    it more seriously and was unable to solve it, I found out that it was an underground competition.

    Before we go any further, let us lay out the most important details, beginning with the image
    itself.

    The steganographic image (.PNG) has a divided IDAT structure of 12 blocks (the last one slightly
    smaller). The data seems to have been concealed by altering the enhanced LSB values, that is, by
    eliminating the high-level bits of each pixel and keeping only the least significant bit. All
    bytes then become 0 or 1, and since 0 or 1 on a range of 256 values gives no visible color, the
    values are stretched: a 0 stays at 0, and a 1 becomes the maximum value, 255. Initial analyses of
    the image did not show anything specific or odd beyond the utter lack of one value in any of the
    three color values (RGB) and the heightened presence of another value in one third of the color
    values. Studying these and replacing bytes gave me nothing, however, and I was at a loss as to
    whether this avenue was even worth pursuing at all.
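    The enhancement step described above can be sketched in a few lines of Python. This is only a
    minimal illustration under stated assumptions: the pixel data is assumed to be already decoded
    into a flat sequence of channel bytes (R, G, B, R, G, B, ...), and the function name and sample
    values are hypothetical, not taken from the mission.

```python
def enhance_lsb(channel_bytes: bytes) -> bytes:
    """Map every channel byte to 0 or 255 depending on its least significant bit."""
    return bytes(255 if b & 1 else 0 for b in channel_bytes)


# Three hypothetical RGB pixels, flattened as R, G, B, R, G, B, ...
pixels = bytes([200, 13, 254, 7, 128, 255, 0, 1, 66])
print(list(enhance_lsb(pixels)))  # [0, 255, 0, 255, 0, 255, 0, 255, 0]
```

    Note that the mapping is lossy: the seven upper bits of every byte are discarded, so the
    original colors cannot be recovered from the enhanced image alone.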

    Hence, I looked into developing a script in either Python, PHP or C/C++ that would reverse the
    process and 'restore' the enhanced LSBs. Automating the process improves the chances of success,
    since a number of different analyses can be carried out in a matter of seconds, whereas it would
    take a single person quite a while to conduct these experiments by hand. After converting the
    image to a 24-bit .BMP and tracking the red curve from a chi-square steganalysis, it is clear
    that there is steganographic data within the file, so nothing has been or will be in vain.

    First, there are a little more than 8 vertical zones, which means that the hidden data is a
    little more than 8 kB in size. One pixel can be used to hide three bits (one in the LSB of each
    RGB color tone), so we can hide (98x225)x3 bits. To get the number of kilobytes, we divide by 8
    and by 1024: ((98x225)x3)/(8x1024), which is about 8.07 kilobytes. However, that is not the
    case here.
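    The capacity arithmetic above can be checked with a short calculation. The sketch assumes, as
    the text does, one hidden bit per RGB channel and 98x225 pixels; the function name is
    hypothetical.

```python
def lsb_capacity_kib(width: int, height: int, channels: int = 3) -> float:
    """Upper bound on LSB payload in kilobytes (1 kB = 1024 bytes), one bit per channel."""
    bits = width * height * channels
    return bits / 8 / 1024


print(round(lsb_capacity_kib(98, 225), 2))  # 8.07
```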

    [Figures: Chi-square analysis (Java module); Chi-square analysis (Batch module)]

    The analysis of the APP0 and APP1 markers of a .JPG conversion of the file also gave some
    awkward outputs:

    Start Offset: 0x00000000
    *** Marker: SOI (xFFD8) ***
      OFFSET: 0x00000000
    *** Marker: APP0 (xFFE0) ***
      OFFSET: 0x00000002
      length = 16
      identifier = [JFIF]
      version = [1.1]
      density = 96 x 96 DPI (dots per inch)
      thumbnail = 0 x 0
    *** Marker: APP1 (xFFE1) ***
      OFFSET: 0x00000014
      length = 58
      Identifier = [Exif]
      Identifier TIFF = x[4D 4D 00 2A 00 00 00 08 ]
      Endian = Motorola (big)
      TAG Mark x002A = x[002A]
      EXIF IFD0 @ Absolute x[00000026]
      Dir Length = x[0003]
      [IFD0.x5110 ] =
      [IFD0.x5111 ] = 0
      [IFD0.x5112 ] = 0
      Offset to Next IFD = [00000000]
    *** Marker: DQT (xFFDB) ***
      Define a Quantization Table.
      OFFSET: 0x00000050
      Table length = 67
      Precision=8 bits
      Destination ID=0 (Luminance)
      DQT, Row #0: 2 1 1 2 3 5 6 7
      DQT, Row #1: 1 1 2 2 3 7 7 7
      DQT, Row #2: 2 2 2 3 5 7 8 7
      DQT, Row #3: 2 2 3 3 6 10 10 7
      DQT, Row #4: 2 3 4 7 8 13 12 9
      DQT, Row #5: 3 4 7 8 10 12 14 11
      DQT, Row #6: 6 8 9 10 12 15 14 12
      DQT, Row #7: 9 11 11 12 13 12 12 12
      Approx quality factor = 94.02 (scaling=11.97 variance=1.37)
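    The offsets in this dump follow from the JPEG segment layout: after the two-byte SOI marker,
    each segment is a two-byte marker followed by a two-byte big-endian length that counts itself
    but not the marker. A minimal sketch of such a walk is given below. It is not the tool that
    produced the dump above; the function name and the synthetic test bytes are assumptions, and it
    deliberately stops at the start of the scan data and ignores fill bytes.

```python
import struct

MARKER_NAMES = {0xD8: "SOI", 0xE0: "APP0", 0xE1: "APP1",
                0xDB: "DQT", 0xDA: "SOS", 0xD9: "EOI"}


def list_segments(data: bytes):
    """Return (offset, marker name, length) for each header segment of a JPEG stream."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI)")
    segments = [(0, "SOI", 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos:#x}")
        code = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((pos, MARKER_NAMES.get(code, f"0x{code:02X}"), length))
        if code == 0xDA:  # entropy-coded scan data follows; stop here
            break
        pos += 2 + length  # marker (2 bytes) + segment (length includes its own 2 bytes)
    return segments


# Synthetic header with the same segment lengths as the dump (payloads zeroed).
fake = (b"\xff\xd8"
        + b"\xff\xe0" + struct.pack(">H", 16) + bytes(14)
        + b"\xff\xe1" + struct.pack(">H", 58) + bytes(56)
        + b"\xff\xdb" + struct.pack(">H", 67) + bytes(65))
for offset, name, length in list_segments(fake):
    print(f"{name:5s} offset=0x{offset:08X} length={length}")
```

    With lengths 16 and 58, APP1 lands at 0x14 and DQT at 0x50, which matches the offsets reported
    above.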

    Being nearly convinced that no encryption algorithm was applied, and therefore that no key is
    involved in the concealment, my idea was to code a script that would shift the LSB values and
    return the originals. The file was run through several structure analyses, statistical attacks,
    BPCS and a few others.

    The histogram of the image shows a specific color with an unusual spike. I manipulated it as
    best I could to try to view any hidden data, but to no avail. Those are the histograms of the
    RGB values as

    Then there are the multiple IDAT chunks. I put together a similar image by assigning random
    color values to each pixel location, and I too wound up with several of these. Unfortunately,
    very little was found inside them. Even more interesting is the way color values are repeated
    in the image. It seems as though the frequency of reused colors could hold some clue, yet I did
    not fully understand that relationship, if any exists at all. Additionally, there is only a
    single column and a single row of pixels that do not have a full value of 255 in their alpha
    channel. I even interpreted the X, Y, A, R, G, and B values of every pixel in the image as
    ASCII, but wound up with nothing legible. Even the green curve of the average of LSBs cannot
    tell us anything: there is no evident break. Several other histograms show the same weird curve
    of the blue value from the RGB.
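    The per-channel counts behind such histograms are easy to reproduce. The sketch below assumes
    the pixels are already available as flat R, G, B bytes (decoding the PNG is left out), and the
    function names and sample data are hypothetical. It reports which values never occur in a
    channel and which value occurs most often, the two oddities mentioned earlier.

```python
from collections import Counter


def channel_histograms(rgb: bytes):
    """Count value frequencies separately for the R, G and B channels."""
    return [Counter(rgb[offset::3]) for offset in range(3)]


def missing_and_peak(hist: Counter):
    """Return the values that never occur, plus the most frequent value and its count."""
    missing = [v for v in range(256) if v not in hist]
    peak_value, peak_count = hist.most_common(1)[0]
    return missing, peak_value, peak_count


# Three hypothetical pixels: (1, 2, 3), (1, 5, 3), (4, 2, 3)
sample = bytes([1, 2, 3, 1, 5, 3, 4, 2, 3])
for name, hist in zip("RGB", channel_histograms(sample)):
    missing, value, count = missing_and_peak(hist)
    print(name, "peak value", value, "seen", count, "times;", len(missing), "values absent")
```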

    The red curve shows some difference. It can see something that we cannot spot (yet).
    Statistical detection is more sensitive than our eyes, and I guess that was my final point.
    However, there is also a sort of latency in the red curve. Even without hidden data, it starts
    at maximum and stays there for some time, which is close to a false positive. It looks like the
    LSBs in the image are very close to random, and the algorithm needs a large population (keep in
    mind that the analysis was carried out on a steadily growing population of pixels) before
    reaching a threshold where it can decide whether the red curve has to go down or up, depending
    on the state of the pixels (which are never randomized). The same sort of latency occurs when
    data is hidden. You hide 1 kB or 2 kB of data, but the red curve does not react and does not
    change direction right after that amount of data. It waits a little (in our situation, until
    around 1.3 kB and 2.6 kB respectively). Here is a representation of the data types from a hex
    editor:

    Here's another spectrum to confirm the behavior of the blue (RGB) value. Notice the sudden
    curve at the beginning.

    As mentioned above, there is no evident clue to the original values of the RGB and alpha
    channels. They are set to either 255 or 0 depending on their least significant bit. The other
    option on my mind at that moment was that the mission was intended to implement a protocol for
    quantum steganography. Matlab and a few other steganalysis tools seem tempting, but only to a
    certain degree. The only steganalysis attack that can reveal whether anything is concealed with
    the eLSB technique is the chi-square test. As for Matlab, the tools it offers are of no great
    use, since they are restricted to the information the user supplies, and we currently have none
    that is valid. In particular, I could easily reverse the process by pulling the least
    significant bit from every pixel channel, grouping the bits into words of 8 and converting them
    back to text. However, that is only if I knew the key or variable used for the layer encryption.
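    For the simplest case, the extraction just described can be sketched as follows. This is only
    an illustration under strong assumptions that do not necessarily hold for the mission image:
    the bits are stored sequentially in every channel byte, most significant bit first, with no key
    or permutation. The embedding helper exists only to build a test input, and all names are
    hypothetical.

```python
def extract_lsb_bytes(channel_bytes: bytes) -> bytes:
    """Collect the LSB of every channel byte and pack the bits, MSB first, into 8-bit words."""
    out = bytearray()
    for i in range(0, len(channel_bytes) - 7, 8):
        word = 0
        for b in channel_bytes[i:i + 8]:
            word = (word << 1) | (b & 1)
        out.append(word)
    return bytes(out)


def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Demo helper: overwrite the LSBs of the first cover bytes with the message bits."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for this message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


cover = bytes(range(100, 140))        # 40 hypothetical channel bytes
stego = embed_lsb(cover, b"hi")
print(extract_lsb_bytes(stego)[:2])   # b'hi'
```

    If the embedding used a key to choose or reorder pixels, this sequential read would return
    noise, which is consistent with the results reported here.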

    Protocols such as those for hiding quantum information in a codeword of a quantum
    error-correcting code passing through a channel are more likely to be the case. That means an
    eavesdropper with the power to monitor the channel, but without the secret key, cannot
    distinguish the message from the channel noise. In other words, there must be something else at
    play here that I have not yet found. Also, noise would not only refer to the visual
    representation of the file. It could be related to a hex dump or whatnot: any unreadable or
    corrupted data as a whole.

    The idea behind eLSB shifting is that each pixel is replaced with a different value, which
    makes the image totally unrecognizable. It is called enhanced because we eliminate the
    high-level bits of every pixel except for the least significant one. This is the case where we
    can most often evaluate the layer by looking into the structure of an image: following, let us
    say, an IDAT of 9 blocks, the last will be either smaller than or equal to the previous ones
    (rarely equal, in fact), which means that the previous ones have been altered and there is
    literally no room left for the last LSB.

    One of the few techniques that can be used to detect eLSB steganography (and actually
    differentiate it from quantum approaches) is statistical analysis. The chi-square module
    presents its data as described below.

    The program will output a graph with two curves. The first one in red is the result of the
    chi-square test. If it's close to one, then the probability for a random embedded message is
    high. [The second curve, in green, is the average of the LSBs.] So, if there is a random
    message embedded, this green curve will stay around 0.5. On the graph, every vertical blue line
    represents 1 kilobyte of embedded data.
    - Somnium, a.k.a. Guillermito

    This is a sample representation of how the LSBs are enhanced and set to either 255 or 0.
    Basically, the noise level depends on how much data we want to hide and, of course, on the size
    of the image, the color capacity, etc.

    Now let us say there is some sound or other audio file involved. If we are good enough with
    steganography, we could mix both eLSB and an audio rendering of an image and come up with an
    incredibly secure layer. Consider a file called fuke.wav which is somewhat altered and has some
    data within it. One way to check for anything specific is to put the file under a frequency
    analysis and see whether there is something worth pursuing. First let's see a temporal analysis
    alongside a TFFT. The difference between an FFT and a TFFT is that the TFFT studies both the
    time and the frequency of the signal, while the FFT studies only the frequency content of the
    signal as a whole (in other words, we need to define a spectrum in order to see the temporal
    frequencies).
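    The distinction can be made concrete with a deliberately naive short-time transform: the signal
    is cut into overlapping frames and each frame gets its own spectrum, so we can see when a
    frequency appears and not just that it appears. This is only a sketch; the frame size, hop and
    test tone are arbitrary assumptions, no window function is applied, and a real tool would use a
    proper FFT rather than this slow direct DFT.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the lower half of the DFT bins of one frame (direct O(n^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
            for k in range(n // 2)]


def stft(signal, window=64, hop=32):
    """One magnitude spectrum per overlapping frame: time along one axis, frequency along the other."""
    return [dft_magnitudes(signal[start:start + window])
            for start in range(0, len(signal) - window + 1, hop)]


# Hypothetical test tone: 4 cycles per frame in the first half, 12 cycles per frame in the second.
tone = ([math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
        + [math.sin(2 * math.pi * 12 * t / 64) for t in range(256)])
frames = stft(tone)
peaks = [max(range(len(f)), key=f.__getitem__) for f in frames]
print(peaks[0], peaks[-1])  # 4 12
```

    A single transform over all 512 samples would show both tones at once but say nothing about
    their order, which is exactly the information a spectrogram adds.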

    If that does not suit us, we can use sox on Linux boxes and generate a similar spectrum. Note
    that sox is most reliable with .wav files (support for some other formats depends on how it was
    built), and .wav is the format most software expects anyway. To output a spectrum we do the
    following:

  • Code:
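    The original command is not preserved in this copy. As a stand-in, the sketch below builds a
    typical invocation of the sox `spectrogram` effect (`sox input.wav -n spectrogram -o
    output.png`) and a plain ffmpeg conversion to .wav, wrapped in Python so that they can be
    scripted. The file names are assumptions, and the commands only run if the tools and the input
    file are actually present.

```python
import os
import shutil
import subprocess


def sox_spectrogram_cmd(wav_path: str, png_path: str) -> list:
    """sox command that renders a spectrogram PNG; -n means no audio output file is written."""
    return ["sox", wav_path, "-n", "spectrogram", "-o", png_path]


def ffmpeg_to_wav_cmd(src_path: str, wav_path: str) -> list:
    """ffmpeg command that converts an arbitrary input file to .wav."""
    return ["ffmpeg", "-i", src_path, wav_path]


cmd = sox_spectrogram_cmd("fuke.wav", "spectrogram.png")
print(" ".join(cmd))
# Only execute when sox is installed and the (hypothetical) input file exists.
if shutil.which("sox") and os.path.exists("fuke.wav"):
    subprocess.run(cmd, check=True)
```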

    We may have to use a converter like ffmpeg or similar to change the format if we previously
    generated something other than .wav. And so we end up with the following:

    Similar to that spectrogram are the following two. The first uses a dBV^2 scale with a
    1024-sample window at roughly 85%, and the second a linear scale and a 2048-sample window at
    90% or more with log bins. They are noticeably clearer, as we can plainly see. The same would
    apply if we managed to scale the sox spectrogram and manipulate it as best we can, but I
    frankly do not think sox offers such possibilities.

    Frankly speaking, there is a lot of software for embedding and extracting data, but none is
    actually efficient when it comes to reversing the process. In this case the only possible way
    to reverse it would be to pull the least significant bit from every pixel channel, group the
    bits into words of 8 and convert them back to text, but that would only be possible if we had
    any clue as to which pixels have been altered (which we do not, as mentioned earlier). Matlab,
    however, is not the only possibility we are left with. There are numerous software
    distributions for that purpose, though many people who are capable of reaching this point will
    be experienced enough to code their own script for it (even if it is less optimized and
    functional).

    That being said, the ultimate mission remains a mystery, and my efforts so far have been to no
    avail. The avenues one could pursue throughout this challenge are literally more than an
    experienced steganographer can imagine.