A project completed as part of the requirements for the BSc (Hons) of Science of Computing entitled
Towards Linguistic Steganography: A Systematic Investigation of
Approaches, Systems, and Issues
by Richard Bergmair
Towards Linguistic Steganography: A Systematic Investigation of Approaches, Systems, and Issues – p.1/44
Motivation
Why Linguistic Steganography?
• Cryptosystems can protect sensitive data from unauthorized access by using a representation that makes a cryptogram impossible to interpret, but
• they do not conceal the very fact that a cryptogram has been exchanged.
Motivation
Why Linguistic Steganography?
• This is not a problem as long as cryptography is perceived, on a broad (legal?) basis, as a legitimate way of protecting one's privacy, but
• it is a problem if it is seen as a tool useful primarily to potential terrorists.
In order to protect the individual's freedom of opinion and expression, we will have to deal with “Wendy the warden” trying to detect and penalize unwanted communication.
Motivation
Why Linguistic Steganography?
• Stegosystems can protect sensitive data from being detected by using a representation that makes steganograms appear as covers (a holiday image, a newspaper article, ...).
• The more covers an arbitrator needs to analyze when trying to detect a steganogram, the more difficult detection becomes.
Motivation
Why Linguistic Steganography?
• The vast masses of data coded in natural language make for a good haystack to hide a needle in. Steganalytic efforts concentrating on digital images exchanged over the web might still be tractable, but it will hardly be possible to arbitrate all communication that takes place in natural language.
• Natural language messages can easily be transmitted over almost any medium.
Steganographic Security
• Alice and Bob want to exchange messages m, chosen from a message-space M, over an insecure channel. They assume that data submitted over this channel is intercepted by Eve.
• Alice and Bob have a key-distribution facility, which equips them with keys k chosen from a key-space K. They can safely assume this channel to be secure, in the sense that they trust it not to expose the keys to Eve.
• Alice and Bob want to make the insecure channel secure by making the security of the messages depend on the security of the keys.
Steganographic Security
In the cryptographic setting,
• Alice encrypts the message m by choosing a cryptogram e in accordance with the key k: E(m, k) = e.
• Bob decrypts the cryptogram e, i.e. reconstructs the message m from e using k: D(e, k) = m. This is possible because ∀m, k : D(E(m, k), k) = m.
• Eve tries to break the cryptogram. This is impossible because it involves solving a difficult problem.
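The functions E and D above can be made concrete with a one-time pad. This cipher is a stand-in chosen here for illustration (the slides do not name a specific cryptosystem): both E and D are bytewise XOR with a key as long as the message, so D(E(m, k), k) = m holds by construction. The function names `encrypt` and `decrypt` are hypothetical.

```python
import secrets

def encrypt(m: bytes, k: bytes) -> bytes:
    """E(m, k) = e: XOR each message byte with the matching key byte."""
    if len(k) != len(m):
        raise ValueError("one-time pad key must be as long as the message")
    return bytes(x ^ y for x, y in zip(m, k))

def decrypt(e: bytes, k: bytes) -> bytes:
    """D(e, k) = m: XOR is its own inverse, so decryption reuses encryption."""
    return encrypt(e, k)

m = b"meet at noon"
k = secrets.token_bytes(len(m))  # fresh random key drawn from the key-space K
e = encrypt(m, k)
assert decrypt(e, k) == m        # D(E(m, k), k) = m
```

Only Eve lacks k; everything else about E and D may be public.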
Steganographic Security
In the steganographic setting,
• Alice embeds the message m into a cover c by choosing a steganogram e in accordance with the key k: E(c, m, k) = e.
• Bob extracts the message from the steganogram e using k: D(e, k) = m. This is possible because ∀c, m, k : D(E(c, m, k), k) = m.
• Eve tries to detect the steganogram. This is impossible because there is a cover c′ such that the difference between e and c′ is imperceptible to humans, and machines trying to detect it face a difficult problem in the cryptographic sense.
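The three-argument interface E(c, m, k) and the correctness condition ∀c, m, k : D(E(c, m, k), k) = m can be checked exhaustively on a toy. This is a minimal sketch under assumptions not taken from the slides: the public dictionary `SLOTS`, the two covers in `COVERS` and the XOR of message bits with key bits are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical public dictionary: two interchangeable words per slot; a word's
# position (0 or 1) within its pair is the bit it carries.
SLOTS = [("fine", "great"), ("city", "town")]
WORD_BIT = {word: bit for pair in SLOTS for bit, word in enumerate(pair)}

def embed(cover: str, m: list, k: list) -> str:
    """E(c, m, k) = e: fill the cover's slots with words carrying m XOR k."""
    bits = [mi ^ ki for mi, ki in zip(m, k)]
    return cover.format(*(pair[b] for pair, b in zip(SLOTS, bits)))

def extract(e: str, k: list) -> list:
    """D(e, k) = m: read the carried bits back and undo the XOR with k."""
    bits = [WORD_BIT[w] for w in e.rstrip(".").split() if w in WORD_BIT]
    return [b ^ ki for b, ki in zip(bits, k)]

# Two hypothetical covers c with two slots each.
COVERS = ["Midshire is a {} little {}.", "A {} day in a quiet {}."]

# Check D(E(c, m, k), k) = m for every cover, message and key.
for c in COVERS:
    for m in product((0, 1), repeat=2):
        for k in product((0, 1), repeat=2):
            assert extract(embed(c, list(m), list(k)), list(k)) == list(m)
```

The sketch only demonstrates correctness; whether Eve can tell e from an innocent cover c′ depends on how natural the word choices are, which the XOR does not address.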
Steganographic Security
A difficult problem in the cryptographic sense can, for example, be
• factoring the product of two large primes (numeric crypto, complexity-theoretic analysis);
• guessing a key chosen from a key-space which is as large as the message-space (information-theoretic analysis);
• solving a problem where the AI community agrees that it can easily be solved by intelligent humans, but that it cannot be solved within any known formal model (HIPs, human interactive proofs).
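The information-theoretic case can be illustrated with a one-byte one-time pad, where the key-space and the message-space both contain 256 values. This is a sketch assuming XOR as the cipher and a made-up intercepted value `e`; it shows that trying every key yields every possible message exactly once, so exhausting the key-space gives Eve no reason to prefer the true one.

```python
# Hypothetical 1-byte cryptogram intercepted by Eve.
e = 0x5A

# Eve tries every key in the key-space K = {0, ..., 255}. Under XOR,
# D(e, k) = e ^ k, so each key yields one candidate message.
candidates = [e ^ k for k in range(256)]

# Because |K| = |M|, every message in M = {0, ..., 255} appears exactly once.
assert sorted(candidates) == list(range(256))
```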
Steganographic Security
[Figure: information-theoretic model relating the random variables K, E, C, M and X, with mappings Q, R and S and the conditional entropies H(M|X), H(K|E), H(M|E) and H(C|E).]
Steganographic Security
[Figure: probability distributions over the variables X, M, E and C through the stages formalization, compression, encryption and mimicry (mappings P, Q, R, S and T), together with interpretation.]
Lexical Steganography
C = { Midshire is a nice little city,
Midshire is a fine little town,
Midshire is a great little town,
Midshire is a decent little town,
Midshire is a wonderful little town }
M = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
Lexical Steganography
Midshire is a {wonderful | decent | fine | great | nice} little {city | town}.
Lexical Steganography
Midshire is a {00 wonderful | 01 decent | 10 fine | 11 great | ?? nice} little {0 city | 1 town}.
$10 \,|\, 1 = 101_2 = 5_{10}$
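The fixed-width binary coding above can be sketched as follows. This assumes each slot's words are numbered in the order shown and that "nice" is simply left unused, since five words cannot all get distinct 2-bit codes; the function names are hypothetical.

```python
# Word lists from the slide; "nice" is dropped because five choices cannot
# all receive distinct 2-bit codes (the slide marks it "??").
ADJECTIVES = ["wonderful", "decent", "fine", "great"]  # codes 00, 01, 10, 11
NOUNS = ["city", "town"]                                # codes 0, 1

def embed(bits: str) -> str:
    """Map 3 bits to a cover: 2 bits pick the adjective, 1 bit the noun."""
    if len(bits) != 3 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly three bits")
    adjective = ADJECTIVES[int(bits[:2], 2)]
    noun = NOUNS[int(bits[2], 2)]
    return f"Midshire is a {adjective} little {noun}."

def extract(sentence: str) -> str:
    """Recover the bits by looking each word's index back up."""
    words = sentence.rstrip(".").split()
    adjective, noun = words[3], words[5]
    return format(ADJECTIVES.index(adjective), "02b") + format(NOUNS.index(noun), "b")
```

For example, `embed("101")` yields "Midshire is a fine little town.", matching $101_2 = 5_{10}$. Only $2^3 = 8$ of the ten covers are ever produced, which is the waste that the mixed-radix coding on the following slide avoids.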
Lexical Steganography
Midshire is a {0 wonderful | 1 decent | 2 fine | 3 great | 4 nice} little {0 city | 1 town}.
$\begin{bmatrix} 2 & 1 \\ 5 & 2 \end{bmatrix} = 2 \cdot 2 + 1 = 5$
(the digits 2 and 1, read in the mixed radices 5 and 2)
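The mixed-radix coding above uses all five adjectives, so the ten covers correspond one-to-one to the ten messages in M = {0, ..., 9}. This minimal sketch assumes the word numbering shown on the slide; the function names are hypothetical.

```python
ADJECTIVES = ["wonderful", "decent", "fine", "great", "nice"]  # digits 0..4 (radix 5)
NOUNS = ["city", "town"]                                        # digits 0..1 (radix 2)

def embed(m: int) -> str:
    """Write m in mixed radix [5, 2] and use the digits as word indices."""
    if not 0 <= m < len(ADJECTIVES) * len(NOUNS):
        raise ValueError("message must be in M = {0, ..., 9}")
    adjective_digit, noun_digit = divmod(m, len(NOUNS))
    return f"Midshire is a {ADJECTIVES[adjective_digit]} little {NOUNS[noun_digit]}."

def extract(sentence: str) -> int:
    """Read the digits back and evaluate them: d1 * 2 + d2."""
    words = sentence.rstrip(".").split()
    return ADJECTIVES.index(words[3]) * len(NOUNS) + NOUNS.index(words[5])
```

`embed(5)` gives "Midshire is a fine little town.", i.e. the digits [2, 1] and $2 \cdot 2 + 1 = 5$.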
Lexical Steganography
Midshire is a {0 wonderful (.5) | 10 decent (.25) | 110 fine (.125) | 1110 great (.0625) | 1111 nice (.0625)} little {0 city (.5) | 1 town (.5)}.
$10 \,|\, 1 = 101_2 = 5_{10}$
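With the prefix codes above, embedding amounts to decoding the message bits: each slot consumes the unique codeword at the front of the remaining bits. If the bits are uniformly random (for example, already encrypted), a codeword of length L is selected with probability $2^{-L}$, so the words appear with the probabilities listed on the slide (.5, .25, .125, .0625, .0625). This is a sketch under those assumptions, with hypothetical function names, not a reproduction of any particular system.

```python
# Prefix codes from the slide (codeword -> word).
ADJECTIVE_CODES = {"0": "wonderful", "10": "decent", "110": "fine",
                   "1110": "great", "1111": "nice"}
NOUN_CODES = {"0": "city", "1": "town"}

def read_word(bits: str, pos: int, codes: dict) -> tuple:
    """Consume the codeword at bits[pos:]; the code is prefix-free, so the first match is it."""
    for end in range(pos + 1, len(bits) + 1):
        if bits[pos:end] in codes:
            return codes[bits[pos:end]], end
    raise ValueError("message bits ran out in the middle of a codeword")

def embed(bits: str) -> str:
    adjective, pos = read_word(bits, 0, ADJECTIVE_CODES)
    noun, pos = read_word(bits, pos, NOUN_CODES)
    if pos != len(bits):
        raise ValueError("message is longer than this cover can carry")
    return f"Midshire is a {adjective} little {noun}."

def extract(sentence: str) -> str:
    words = sentence.rstrip(".").split()
    adjective_code = {w: c for c, w in ADJECTIVE_CODES.items()}[words[3]]
    noun_code = {w: c for c, w in NOUN_CODES.items()}[words[5]]
    return adjective_code + noun_code
```

`embed("101")` reads "10" as "decent" and "1" as "town", giving "Midshire is a decent little town.".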
Lexical Steganography
All approaches we have seen so far have one basic idea in common: transforming a sequence of symbols
$s_1, s_2, s_3, \ldots, s_n$
into a sequence
$T(s_1) \mid T(s_2) \mid T(s_3) \mid \ldots \mid T(s_n)$,
which has a “dual” interpretation, one with regard to the cover-channel, one with regard to a secret message.
Context-Free Mimicry
A more sophisticated linguistic model can be achieved by treating the symbols as grammatical productions, transforming a derivation
$S \Rightarrow \alpha_1,\ \alpha_1 \Rightarrow \alpha_2,\ \alpha_2 \Rightarrow \alpha_3,\ \ldots,\ \alpha_{m-1} \Rightarrow e$
into a sequence
$T(S \Rightarrow \alpha_1) \mid T(\alpha_1 \Rightarrow \alpha_2) \mid T(\alpha_2 \Rightarrow \alpha_3) \mid \ldots \mid T(\alpha_{m-1} \Rightarrow e)$,
which has a “dual” interpretation, one with regard to the cover-channel, one with regard to a secret message.
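Context-free mimicry can be sketched by letting each message bit choose between two productions of the nonterminal being expanded, and recovering the bits by parsing. The grammar below is a hypothetical toy, built so that every nonterminal has exactly two alternatives (one bit each) and every alternative starts with a distinct terminal (so one-token lookahead suffices to parse). Real systems use far larger grammars and must cope with ambiguity.

```python
# Hypothetical toy grammar: two alternatives per nonterminal, each beginning
# with a distinct terminal, so one token of lookahead identifies the choice.
GRAMMAR = {
    "S": [["the", "N", "V"], ["a", "N", "V"]],
    "N": [["pitcher", "ADV"], ["batter", "ADV"]],
    "ADV": [["now"], ["again"]],
    "V": [["swings"], ["spits"]],
}
BITS_PER_SENTENCE = 4  # one choice each for S, N, ADV and V

def embed(bits: str) -> str:
    """Derive one sentence per 4 bits; each nonterminal expansion consumes one bit."""
    if len(bits) % BITS_PER_SENTENCE:
        raise ValueError("pad the message to a multiple of 4 bits")
    queue, words = list(bits), []

    def expand(symbol):
        if symbol not in GRAMMAR:           # terminal: emit it
            words.append(symbol)
            return
        choice = int(queue.pop(0))          # next message bit picks the production
        for sym in GRAMMAR[symbol][choice]:
            expand(sym)

    while queue:
        expand("S")
    return " ".join(words)

def extract(text: str) -> str:
    """Parse with the same grammar and record which production was used."""
    tokens, pos, bits = text.split(), 0, []

    def parse(symbol):
        nonlocal pos
        if symbol not in GRAMMAR:
            if tokens[pos] != symbol:
                raise ValueError(f"unexpected token {tokens[pos]!r}")
            pos += 1
            return
        for choice, alternative in enumerate(GRAMMAR[symbol]):
            if alternative[0] == tokens[pos]:
                bits.append(str(choice))
                for sym in alternative:
                    parse(sym)
                return
        raise ValueError(f"no production of {symbol} starts with {tokens[pos]!r}")

    while pos < len(tokens):
        parse("S")
    return "".join(bits)
```

For instance, `embed("0110")` produces "the batter again swings", and `extract` returns "0110" for it.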
Chapman’s system
The Doe and the Lion A DOE hard fixed by robbers taught refuge in a slave tinkling
to a Lion. The Goods undertook themselves to aversion and disliked before a toothless
wrestler on their words. The Sheep, much past his will, married her backward and forward
for a long time, and at last said, If you had defended a dog in this wood, you would have
had your straits from his sharp teeth. One day he ruined to see a Fellow, whose had
smeared for its provision, resigning along a fool and warning advisedly. said the Horse,
if you really word me to be in good occasion, you could groom me less, and proceed me
more. who have opened in that which I blamed a happy wine the horse of my possession.
[...]
Wayner’s system
It’s time for another game between the Whappers and the Blogs in scenic downtown
Blovonia . I’ve just got to say that the Blog fans have come to support their team and rant
and rave . Play Ball ! Time for another inning . The Whappers will be leading off . Baseball
and Apple Pie . The pitcher spits. Herbert Herbertson swings the bat to get ready and
enters the batter’s box . Here’s the fastball . He tries to bunt, and Robby Rawhide grabs it
and tosses it to first . Hey, one down, two to go. Here we go. Prince Albert von Carmicheal
swings the baseball bat to stretch and enters the batter’s box . Okay. Here’s the pitch It’s
a spitter . High and outside . Ball . No contact in Mudsville ! Nothing on that one . Nice hit
into short left field for a dangerous double and the throw is into the umpire’s head ! [...]
Winstein’s system
“Risky E-Vote System to Expand” Wired News (01/26/04); Zetter, Kim [...] She promises that the workplace computers people use to vote on SERVE will be fortified(1) with firewalls and other intrusion countermeasures, and adds that election officials will recommend that home users install antivirus software on their PCs and run virus checks prior to election day. Rubin counters that antivirus software can only identify known viruses, and thus is ineffective against new e-voting malware; moreover(1), attacks could go undetected because SERVE lacks elector(0) verifiability.
Rubin and the three(1) other researchers who furnished the report were part of a 10-member expert panel enlisted by the Federal Voting Assistance Program (FVAP) to assess SERVE. Paquette reports that of the six remaining FVAP panel members, five recommended that the SERVE trial proceed, and one made no comment. [...]
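Lexical replacement in the spirit of the sample above can be sketched with disjoint interchangeability sets, where a replaceable word's position in its set is the bit it carries, mirroring the (0)/(1) annotations. The sets, the cover sentence and the function names are hypothetical, the sketch matches lowercase words only, and it does not reproduce Winstein's actual dictionary or code.

```python
import re

# Hypothetical disjoint interchangeability sets (lowercase only); a word's
# position in its set is the bit it carries.
SETS = [("strengthened", "fortified"), ("furthermore", "moreover"), ("voter", "elector")]
LOOKUP = {word: (i, bit) for i, pair in enumerate(SETS) for bit, word in enumerate(pair)}
WORD = re.compile(r"[A-Za-z]+")

def embed(cover: str, bits: str) -> str:
    """Replace replaceable words left to right so each carries the next message bit."""
    pending = iter(bits)

    def swap(match):
        word = match.group()
        if word not in LOOKUP:
            return word
        bit = next(pending, None)
        if bit is None:                     # message exhausted: keep the cover's own word
            return word
        return SETS[LOOKUP[word][0]][int(bit)]

    stego = WORD.sub(swap, cover)
    if next(pending, None) is not None:
        raise ValueError("cover has too few replaceable words for this message")
    return stego

def extract(stego: str, n_bits: int) -> str:
    """Read the bit carried by each replaceable word and keep the first n_bits."""
    bits = [str(LOOKUP[w][1]) for w in WORD.findall(stego) if w in LOOKUP]
    return "".join(bits[:n_bits])

COVER = "The computers will be strengthened with firewalls; furthermore, each voter can check."
```

`embed(COVER, "110")` returns "The computers will be fortified with firewalls; moreover, each voter can check.". Anyone who holds `SETS` can run `extract`, which is one of the weaknesses raised in the evaluation.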
Evaluation
For a number of reasons, I believe that the basic approach that is most promising for building a secure and robust natural language steganography system in the near future is the lexical replacement system, similar in principle to Winstein's.
Evaluation
The state of the art in computational linguistics and artificial intelligence is a significant limiting factor!
• Do ontological semantics scale?
• Even if they did, we do not have a reliable common-sense ontology yet.
• Context-free grammars alone do not adequately characterize natural languages (cf. the non-context-free language $a^n b^n c^n$).
• Style-templates were never meant to fool sophisticated linguistic models or humans.
Evaluation
• Lexical models do scale!
• And we even have large-scale resources available that cover all of everyday written language (WordNet, for instance).
• Lexical models do not dig very deep into the semantic realm, but usually this will not be a problem if
• we use an embedding-approach instead of a generation-approach. This rather conservative approach follows the policy: “Use human language-competence as much as possible, and rely on formal models only when necessary!”
Evaluation
Current systems
• do not mimic cover-statistics adequately: they do not mimic word-choice probabilities. A system similar in principle to Winstein's, but following Wayner's coding strategy, should be used instead.
• do not encrypt messages adequately: anyone who has the correct dictionary or grammar can extract the messages from the steganograms. (Shouldn't linguistic knowledge be assumed to be public wisdom? Language is, by definition, something public!) Messages should instead be encrypted with keys from a key-distribution system!
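One way to make extraction depend on a key rather than on the public dictionary is to encrypt the message bits before they are embedded. In this sketch the keystream is derived from a shared key with HMAC-SHA256 in counter mode; that construction is a stand-in chosen here, not something the slides specify, and it is not a vetted cipher. A real system would also need a per-message nonce so that a keystream is never reused.

```python
import hashlib
import hmac

def keystream_bits(key: bytes, n: int) -> list:
    """Stand-in keystream: HMAC-SHA256(key, counter) blocks, unpacked MSB first."""
    bits, counter = [], 0
    while len(bits) < n:
        block = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        bits.extend((byte >> i) & 1 for byte in block for i in range(7, -1, -1))
        counter += 1
    return bits[:n]

def encrypt_bits(message: str, key: bytes) -> str:
    """XOR message bits with the keystream; applying it twice decrypts."""
    stream = keystream_bits(key, len(message))
    return "".join(str(int(b) ^ s) for b, s in zip(message, stream))
```

Without the key, a reader who extracts the bits from the cover obtains only a pseudorandom-looking string. Uniform-looking bits are also what a probability-matching coding strategy needs as input.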
Evaluation
Current systems
• lack robustness: some kind of error-correction should be applied.
• employ linguistically inadequate models: they use disjunct interchangeability sets. Statistical word-sense disambiguation systems should be used instead.
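The slides do not say which error-correcting code to use. As the simplest possible stand-in, a repetition code with majority voting survives one flipped bit per block, for example one word changed by an editor or an active warden. It does so at the cost of tripling the number of bits the cover must carry; stronger codes trade capacity more efficiently.

```python
def encode(bits: str, r: int = 3) -> str:
    """Repetition code: send every message bit r times (r odd)."""
    return "".join(b * r for b in bits)

def decode(received: str, r: int = 3) -> str:
    """Majority vote over each block of r received bits."""
    blocks = [received[i:i + r] for i in range(0, len(received), r)]
    return "".join("1" if block.count("1") > r // 2 else "0" for block in blocks)
```

`encode("101")` is "111000111"; if the fifth bit is flipped in transit ("111010111"), `decode` still returns "101".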
Lexical Ambiguity and Coding
[Figure: the words move, movement, motion, test, work, go, run, impress and strike, grouped (a) into disjunct synsets and (b) into natural synsets.]
Lexical Ambiguity and Coding
[Figure: the words move, movement, motion, test, work, go, run, impress and strike, illustrating (c) “forward ambiguity” and (d) “backward ambiguity”.]
Lexical Ambiguity and Coding
[Figure: (e) lexical semantics, illustrated by the contexts “... go ...”, “... run ...”, “... work ...” and “... move ...”; (f) “contextual” semantics, illustrated by fragments mentioning “Austria's”, “national color”, “favourite colors”, “copying-paper”, “blood” and “colored”.]
Lexical Ambiguity and Coding
Uncle Joe turned out to be a brilliant player of the electric guitar.
Lexical Ambiguity and Coding
$$\mathrm{rep}(o) = \mathrm{dis}(L(o), C(o)).$$
$$r \in \mathrm{rep}(o) \Rightarrow r \in \begin{cases} \mathrm{rep}_A(o), & \text{if } \mathrm{rep}(o) = \mathrm{rep}(r) \\ \mathrm{rep}_B(o), & \text{otherwise.} \end{cases}$$
Lexical Ambiguity and Coding
• type-A words o, where $\mathrm{rep}_B(o) = \emptyset$. Here we can be sure that a replacement of word o will always be reversible automatically.
• type-B words o, where $\mathrm{rep}_A(o) = \emptyset$. Here we can be sure that a replacement of word o will never be reversible automatically.
• type-C words o, where $\mathrm{rep}_A(o) \neq \emptyset \land \mathrm{rep}_B(o) \neq \emptyset$. Here, whether a replacement will be reversible depends on the actual replacement that is made.
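The split of rep(o) into rep_A(o) and rep_B(o), and the resulting word types, can be sketched on a toy inventory built from words of the Uncle Joe example. Several assumptions are made here that the slides do not state. rep(o) is given directly as a context-free table (`REP`, hypothetical) rather than computed as dis(L(o), C(o)). It includes o itself, and only genuine replacements r ≠ o are classified. Words with no replacement at all are left unclassified.

```python
# Hypothetical context-free toy: rep(o) is looked up directly and includes o itself.
REP = {
    "brilliant": {"brilliant", "excellent"},
    "excellent": {"brilliant", "excellent"},
    "player": {"player", "musician"},
    "musician": {"musician", "player", "performer"},
    "electric": {"electric", "electrical", "electrifying"},
    "electrical": {"electric", "electrical", "electrifying"},
    "electrifying": {"electrifying", "electric", "thrilling"},
}

def rep(word: str) -> set:
    """Words with no entry can only be replaced by themselves."""
    return REP.get(word, {word})

def split_rep(o: str) -> tuple:
    """Partition the replacements r != o into rep_A(o) and rep_B(o)."""
    candidates = rep(o) - {o}
    rep_a = {r for r in candidates if rep(r) == rep(o)}
    return rep_a, candidates - rep_a

def word_type(o: str):
    """Return 'A', 'B' or 'C' as on the slide, or None if o has no replacement."""
    rep_a, rep_b = split_rep(o)
    if rep_a and rep_b:
        return "C"
    if rep_a:
        return "A"
    if rep_b:
        return "B"
    return None
```

With this table, "brilliant" is type A, "player" is type B (replacing it by "musician" cannot be undone, since rep("musician") differs), and "electric" is type C: "electrical" is reversible, "electrifying" is not.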
Secure and Robust Coding
[Figure: cover words grouped into interchangeability sets of sizes [2], [3] and [6], such as {k, l}, {a, b}, {x, y}, {p, q, r} and {m, n, o, p, q, r}.]
Secure and Robust Coding
[Figure: interchangeability sets of sizes [2], [3] and [6], together with combined elements of size [12].]
Secure and Robust Coding
[Figure: “physical” elements, i.e. interchangeability sets of sizes [4], [6], [8], [8], [6], [4], [4] and [4] (such as {a, b, c, d}, {m, n, o, p, q, r} and {s, t, u, v, w, x, y, z}), are “split” into their prime factors, giving “virtual” atomar elements of sizes [2] and [3] to base the coding on.]
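One reading of the “split into prime factors” idea is that a physical set of size n is treated as several virtual elements whose sizes are the prime factors of n. Choosing a word then amounts to choosing one digit per virtual element, in a mixed radix. The sketch below assumes that reading; the function names are hypothetical. For the eight physical sets of sizes 4, 6, 8, 8, 6, 4, 4, 4 it yields sixteen binary and two ternary atoms.

```python
def prime_factors(n: int) -> list:
    """Prime factorization by trial division, smallest factor first."""
    factors, p = [], 2
    while n > 1:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    return factors

def split(index: int, size: int) -> list:
    """Physical element index -> one digit per prime-sized virtual element."""
    digits = []
    for radix in reversed(prime_factors(size)):
        index, digit = divmod(index, radix)
        digits.append(digit)
    return digits[::-1]

def join(digits: list, size: int) -> int:
    """Inverse of split: evaluate the digits in the mixed radix of the factors."""
    value = 0
    for digit, radix in zip(digits, prime_factors(size)):
        value = value * radix + digit
    return value

SIZES = [4, 6, 8, 8, 6, 4, 4, 4]  # physical set sizes listed on the slide
atoms = [p for size in SIZES for p in prime_factors(size)]
```

For a set of size 6, the sixth word (index 5) becomes the digits [1, 2] over virtual elements of sizes 2 and 3, and `join([1, 2], 6)` maps it back to 5.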
Secure and Robust Coding
[Figure: two coding methods over interchangeability sets of sizes [2], [3] and [6]: a bit string (0 1 1 0 1 0 0 0 1 0) to be used with Method I, and a sequence of pseudorandom numbers generated from a seed, to be used with Method II.]
Secure and Robust Coding
[Figure: six-step worked example (1. to 6.) comparing Methods I and II, in which interchangeability sets s are assigned values v(s) and combined to encode a secret.]
Towards Linguistic Steganography: A Systematic Investigation of Approaches, Systems, and Issues
A project conducted Oct-03 – Apr-04 by Richard Bergmair
at University of Derby in Austria
under the supervision of Stefan Katzenbeisser.
This slide-set is not to be seen as a self-contained document. Please consult the project-report instead. In particular, note that sources were not properly cited in this slide-set. See the citations given in the project-report for reference on sources.