Linguistic Steganography: Information Hiding in Text Stephen Clark and Ching-Yun (Frannie) Chang University of Cambridge Computer Laboratory Edinburgh, May 2012
Linguistic Steganography: Information Hiding in Textsc609/talks/ed12stego.pdf · Linguistic Steganography: Information Hiding in Text Stephen Clark and Ching-Yun (Frannie) Chang University

Feb 04, 2018

Linguistic Steganography: Information Hiding in Text

Stephen Clark and Ching-Yun (Frannie) Chang
University of Cambridge Computer Laboratory

Edinburgh, May 2012

Information Hiding

My friend Bob, until yesterday I was using binoculars for stargazing. Today, I decided to try my new telescope. The galaxies in Leo and Ursa Major were unbelievable! Next, I plan to check out some nebulas and then prepare to take a few snapshots of the new comet. Although I am satisfied with the telescope, I think I need to purchase light pollution filters to block the xenon lights from a nearby highway to improve the quality of my pictures. Cheers, Alice.

Stephen Clark Linguistic Steganography Edinburgh, May 2012

Information Hiding [Fridrich, 2010]

My friend Bob, until yesterday I was using binoculars for stargazing. Today, I decided to try my new telescope. The galaxies in Leo and Ursa Major were unbelievable! Next, I plan to check out some nebulas and then prepare to take a few snapshots of the new comet. Although I am satisfied with the telescope, I think I need to purchase light pollution filters to block the xenon lights from a nearby highway to improve the quality of my pictures. Cheers, Alice.

mfbuyiwubfstidttmnttgilaumwuniptcosnatpttafsotncaiaswttitintplpftbtxlfanhtitqompca

π = 3.141592653589793 . . .

buubdlupnpsspx

attack tomorrow
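The decoding steps on this slide can be reproduced in a few lines of Python. This is a minimal sketch with several assumptions read off the slide's intermediate strings, not stated in the talk:

- The first 14 digits of π (one per hidden letter) give the distance from one selected initial letter to the next.
- The last step shifts each letter back by one place in the alphabet.

The function names are our own.

```python
import string

LETTER = (
    "My friend Bob, until yesterday I was using binoculars for stargazing. "
    "Today, I decided to try my new telescope. The galaxies in Leo and "
    "Ursa Major were unbelievable! Next, I plan to check out some nebulas "
    "and then prepare to take a few snapshots of the new comet. Although "
    "I am satisfied with the telescope, I think I need to purchase light "
    "pollution filters to block the xenon lights from a nearby highway to "
    "improve the quality of my pictures. Cheers, Alice."
)
PI_DIGITS = "31415926535897"  # first 14 digits of pi, one per hidden letter


def initials(text: str) -> str:
    """Lower-cased first letter of every whitespace-separated word."""
    return "".join(word[0].lower() for word in text.split())


def select_by_digits(letters: str, digits: str) -> str:
    """Walk through `letters`, stepping forward by each digit in turn."""
    pos = -1  # so that a first digit of 3 selects the 3rd letter
    picked = []
    for d in digits:
        pos += int(d)
        picked.append(letters[pos])
    return "".join(picked)


def shift_back(text: str, k: int = 1) -> str:
    """Caesar-shift each lower-case letter back by k places."""
    abc = string.ascii_lowercase
    return "".join(abc[(abc.index(c) - k) % 26] for c in text)


acrostic = initials(LETTER)
selected = select_by_digits(acrostic, PI_DIGITS)
message = shift_back(selected)
print(acrostic)
print(selected)  # buubdlupnpsspx
print(message)   # attacktomorrow
```

Each layer on its own is trivial. The point of the slide is that an observer who does not know the layering sees only an innocuous letter.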

Steganography

• Steganography is a branch of security concerned with hiding information in some cover medium

• Use of images for hiding information has been extensively studied

• Make changes to an image so that the changes are imperceptible to an observer

• The resulting image encodes the message

• A related area is watermarking, which is concerned with hiding information for the purposes of identification (e.g. copyright)

• or e.g. identifying Google translations

The Cover Medium

• Advantages of images

• local changes can maintain global properties of the image
• easy to make changes which are imperceptible to a human

• Disadvantages of images

• sender needs an image
• sender needs to transmit the image to the receiver

• Text is everywhere - why not conceal information in a cover text?

Example using Lexical Substitution

• Cover text:

Which is why, some would say, it’s slightly odd that when no less an authority than the chairman of the Financial Services Authority, Lord Turner, questions the social utility of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s curious that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his chops and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the chairman of the Financial Services Authority, Lord Turner, questions the social utility of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s curious that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his chops and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social utility of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s curious that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his chops and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s curious that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his chops and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s strange that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his chops and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s strange that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his lips and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s strange that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his lips and demand a piece of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Example using Lexical Substitution

• Stego Text:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s strange that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his lips and demand a piece of that.

• Secret bitstring: 0 1 1 0 0 0 1 0
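The embedding and extraction steps can be sketched as follows. This is a minimal illustration, not the authors' system, and it rests on these assumptions:

- A hypothetical 1-bit codebook, in which each synonym pair is interchangeable in every context and each word belongs to exactly one pair.
- The first member of a pair encodes bit 0 and the second encodes bit 1.
- The receiver knows the payload length, because every codebook word in the stego text is read as a bit.

The cover sentence and codebook are invented for illustration. They do not reproduce the bit assignment used in the example above.

```python
# Hypothetical 1-bit codebook: (word for bit 0, word for bit 1).
SYNONYM_PAIRS = [
    ("slightly", "fairly"),
    ("chairman", "president"),
    ("utility", "usefulness"),
]
# word -> (index of its pair, bit it encodes)
CODE = {w: (k, bit) for k, pair in enumerate(SYNONYM_PAIRS) for bit, w in enumerate(pair)}


def embed(cover_tokens, bits):
    """Replace each codebook word with the pair member that encodes the next bit."""
    remaining = list(bits)
    stego = []
    for tok in cover_tokens:
        if tok in CODE and remaining:
            pair_index, _ = CODE[tok]
            stego.append(SYNONYM_PAIRS[pair_index][remaining.pop(0)])
        else:
            stego.append(tok)
    if remaining:
        raise ValueError("cover text has too few substitutable words for the payload")
    return stego


def extract(stego_tokens, n_bits):
    """Read bits back using only the codebook; the cover text is not needed."""
    bits = [CODE[tok][1] for tok in stego_tokens if tok in CODE]
    return bits[:n_bits]


cover = "it is slightly odd that the chairman questions the utility of this".split()
stego = embed(cover, [0, 1, 1])
print(" ".join(stego))
print(extract(stego, 3))
```

A realistic system would also check that each substitute fits its context before using it. This sketch treats every codebook word as safe to substitute.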

This Talk

• Joint work with Frannie Chang

• Outline:

• more introduction to linguistic steganography
• a stegosystem based on lexical substitution

• using some simple NLP techniques

• a secret sharing scheme based on adjective deletion

• Purpose of this talk:

• introduce Linguistic Steganography as a new task at the intersection of NLP and Computer Security

• an interesting task for lexical substitution
• a new task for natural language generation/regeneration?

Linguistic Steganography

• Some existing work, but very little compared to images

• Concerned with linguistic transformations, rather than superficial properties of the text (e.g. white spaces)

• Difficulty is that local changes can lead to inconsistencies:

• ungrammatical or unnatural sentences
• grammatical, natural sentences which lack coherence with respect to the rest of the document (or the world)

Linguistic Steganography Framework

• Assume an existing cover text which will be modified (rather than generated from scratch)

Linguistic Steganography Framework

• Note that the receiver does not need a copy of the cover text (just the code dictionary for lexical substitution)

Linguistic Steganography Framework

• Note that this is not a cryptography problem

Linguistic Steganography Framework

• Trade-off between imperceptibility and payload

Possible Linguistic Transformations

• Lexical (e.g. synonym substitution)

• Syntactic (e.g. passive/active transformation)

• Semantic/pragmatic

• Can the transformations be applied reliably and often?

Simple Lexical Stegosystem (Winstein, 98)

Sense Ambiguity Problem

• Decoding ambiguity ⇒ use vertex colour coding (later in talk)

Security Simplifications

• Ignoring Kerckhoffs’ principle, that the adversary knows the steganographic channel

• in practice, get around this with a shared secret key

• Assuming that the adversary is not a computer (i.e. ignoring the possibility of steganalysis)

• Assuming that the adversary is passive rather than active

• Ignoring the source of the cover text

Lexical Substitution Problem

The idea is a powerful one → The idea is a potent one

This computer is powerful → This computer is potent

• Some synonyms are not acceptable in context ⇒ need to check whether a synonym is applicable in a given context (to ensure imperceptibility)

Checking Synonym Applicability

• Use the Google n-gram corpus to see if the synonym in context has been used before (and frequently)

• Now a fairly standard technique which has been used for many similar lexical disambiguation tasks (Shane Bergsma)

The Google n-gram Corpus

the part that you were 103

the part that you will 198

the part that you wish 171

the part that you would 867

the part that your read 45

the part the Riverside County 51

the part the United States 72

the part the detective was 63

the part the next day 95

Contextual Check

He was bright and independent and proud → He was clever and independent and proud

f2 = 302,492 = “was clever” (40,726) + “clever and” (261,766)

f3 = 8,072 = “He was clever” (1,798) + “was clever and” (6,188) + “clever and independent” (86)

f4 = 343 = “He was clever and” (343) + “was clever and independent” (0) + “clever and independent and” (0)

f5 = 0 = “He was clever and independent” (0) + “was clever and independent and” (0) + “clever and independent and proud” (0)

Contextual Check

He was bright and independent and proud → He was clever and independent and proud

$$\mathrm{Count}(w) = \sum_{n=2}^{5} \log(f_n)$$

max is the highest n-gram Count for any synonym

$$\mathrm{Score}(w) = \frac{\mathrm{Count}(w)}{\max}$$

If Score(w) ≥ threshold, w passes the contextual check

$$\mathrm{Count}(\text{clever}) = \log(f_2) + \log(f_3) + \log(f_4) + \log(f_5) = 28, \qquad \mathrm{Score}(\text{clever}) = 28/\max = 0.9$$
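The contextual check can be sketched in Python. This is a minimal version with these assumptions:

- The n-gram counts come from a plain dictionary standing in for Google n-gram corpus lookups.
- Orders 2 to 5 are used, as on the slide.
- A zero f_n is skipped, because log 0 is undefined and the slide does not say how that case is handled.

With natural logarithms, the "clever" example gives a Count of about 27.45. The slide reports 28, and the talk does not state the log base or rounding it used. The function names are our own.

```python
import math


def ngrams_containing(tokens, i, n):
    """All length-n windows of `tokens` that include position i, as strings."""
    first = max(0, i - n + 1)
    last = min(i, len(tokens) - n)
    return [" ".join(tokens[s:s + n]) for s in range(first, last + 1)]


def ngram_frequencies(tokens, i, counts, orders=(2, 3, 4, 5)):
    """f_n: summed corpus counts of the n-grams of each order that contain tokens[i]."""
    return {n: sum(counts.get(g, 0) for g in ngrams_containing(tokens, i, n)) for n in orders}


def count_score(freqs):
    """Count(w) = sum_n log f_n; here an f_n of 0 is skipped (contributes 0)."""
    return sum(math.log(f) for f in freqs.values() if f > 0)


def passes_check(counts_by_word, threshold):
    """Score(w) = Count(w) / max over the candidates; keep words with Score >= threshold."""
    best = max(counts_by_word.values())
    if best <= 0:
        return {w: False for w in counts_by_word}
    return {w: c / best >= threshold for w, c in counts_by_word.items()}


# n-gram counts for the "clever" example as listed on the slides (absent n-grams count 0)
COUNTS = {
    "was clever": 40726, "clever and": 261766,
    "He was clever": 1798, "was clever and": 6188, "clever and independent": 86,
    "He was clever and": 343,
}
tokens = "He was clever and independent and proud".split()
freqs = ngram_frequencies(tokens, tokens.index("clever"), COUNTS)
clever_count = count_score(freqs)
print(freqs)                   # {2: 302492, 3: 8072, 4: 343, 5: 0}
print(round(clever_count, 2))  # 27.45 with natural logarithms
```

The threshold passed to `passes_check` is the knob for the payload/imperceptibility trade-off. A higher threshold admits fewer substitutes, so fewer bits can be embedded.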

Extensions to the Contextual Check

• Weight some n-grams more heavily than others

• Use wild-cards for unknown words

• . . .

• ⇒ difficult to beat the basic system

Evaluation

• Automatic evaluation using data from the Lexical Substitution Task (McCarthy and Navigli, SemEval 2007)

• Manual human evaluation of naturalness of the modified text

• more direct evaluation of imperceptibility for the steganography application

• We use WordNet as the source of possible substitutes

Human Evaluation

• Evaluate imperceptibility by asking humans to rate the naturalness of sentences (1–4), in 3 conditions:

• sentence unchanged
• sentence changed by our system (with threshold of 0.95)
• sentence changed by random choice of target word and random choice of substitute from the target word’s synsets (baseline)

• Sentences are from Robert Peston’s BBC blog

• On average around 2 changes are made per sentence

Example Sentences

ORIG: Apart from anything else, big companies have the size and muscle to derive gains by forcing their suppliers to cut prices (as shown by the furore highlighted in yesterday’s Telegraph over Serco’s demand - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don’t have that opportunity.

SYSTEM: Apart from anything else, large companies have the size and muscle to derive gains by pushing their suppliers to cut prices (as evidenced by the furore highlighted in yesterday’s Telegraph over Serco’s need - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don’t have that opportunity.

Example Sentences

ORIG: Apart from anything else, big companies have the size and muscle to derive gains by forcing their suppliers to cut prices (as shown by the furore highlighted in yesterday’s Telegraph over Serco’s demand - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don’t have that opportunity.

RANDOM: Apart from anything else, self-aggrandising companies have the size and muscle to derive gains by forcing their suppliers to foreshorten prices (as shown by the furore highlighted in yesterday’s Telegraph over Serco’s demand - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don’t birth that chance.

Experimental Design

• 60 sentences

• 30 judges

• Latin square design with 3 groups of 10 judges

• People in the same group receive the 60 sentences under the same set of conditions

• Each judge sees all 60 sentences, but sees each sentence only once, in one of the three conditions
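One way to realise this Latin square assignment is to rotate the three conditions across sentences, offsetting each group by one condition. The sketch assumes that rotation; the slides do not give the exact assignment. It then checks the design properties listed above:

- Each judge rates all 60 sentences, 20 in each condition.
- Judges in the same group share the same assignment.
- For every sentence, the three groups see all three conditions.

```python
from collections import Counter

CONDITIONS = ("unchanged", "system", "random")


def latin_square_plan(n_sentences=60, n_groups=3, judges_per_group=10):
    """Map each judge to the condition they see for every sentence (hypothetical rotation)."""
    plan = {}
    for judge in range(n_groups * judges_per_group):
        group = judge // judges_per_group
        plan[judge] = [CONDITIONS[(s + group) % len(CONDITIONS)] for s in range(n_sentences)]
    return plan


plan = latin_square_plan()
# Every judge rates all 60 sentences, 20 in each condition.
print(Counter(plan[0]))
# For every sentence, the three groups (judges 0, 10, 20) see three different conditions.
print(all({plan[0][s], plan[10][s], plan[20][s]} == set(CONDITIONS) for s in range(60)))
```

Because each judge sees every condition equally often and never sees a sentence twice, differences between conditions are not confounded with individual judges or particular sentences.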

Results

• Average score for the original sentences is 3.67 (scale of 1–4)

• Average score for the sentences modified by our system is 3.33

• Average score for the randomly changed sentences is 2.82

• Differences between the systems are highly significant (Wilcoxon signed-rank test)

• Payload is around 4 bits per sentence for this level of imperceptibility

• Threshold controls tradeoff between payload and imperceptibility

Sense Ambiguity Problem

• Assigning different codewords to the different senses of a word such as “composition” leads to a decoding ambiguity
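The ambiguity shows up in a few lines of code. The two synsets below are hypothetical, loosely modelled on two senses of "composition". Codes are assigned naively by position within each synset. As a result the same surface word ends up with two different codes, and a receiver who sees it cannot tell which one the sender meant.

```python
# Hypothetical synsets for two senses of "composition".
SYNSETS = [
    ["composition", "constitution", "makeup"],  # what something is made of
    ["opus", "piece", "composition"],           # a musical work
]


def naive_codes(synsets):
    """Give each word the index of its position in every synset it belongs to."""
    codes = {}
    for synset in synsets:
        for code, word in enumerate(synset):
            codes.setdefault(word, set()).add(code)
    return codes


ambiguous = {w: c for w, c in naive_codes(SYNSETS).items() if len(c) > 1}
print(ambiguous)  # {'composition': {0, 2}}
```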

Sense Ambiguity Problem

• Represent synonymy relation in a graph

• words are nodes in the graph
• edges represent membership of the same synset

Vertex Colour Coding

• Vertex Colouring: a labelling of the graph’s nodes with colours (codes) so that no two adjacent nodes share the same colour

Vertex Colour Coding Algorithm

• Assume synsets have no more than 4 words

• 99.6% of synsets have fewer than 8 words, so almost every synset can supply at most 2 bits (4 codewords)

• Task is to maximise the number of nodes (words) in the graph whilst assigning exactly one codeword to each node

• We propose a greedy algorithm to perform the colouring: add edges and codes, assuming some ordering of the words, so that no two adjacent nodes share the same code
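A greedy colouring of this kind can be sketched as follows. This is our own minimal version, not necessarily the authors' exact algorithm, and it works in four steps:

1. Build the synonym graph from a list of synsets.
2. Visit the words in a given order.
3. Give each word the lowest of the four 2-bit codes not already used by a coloured neighbour.
4. If all four codes are taken, leave the word uncoloured, so it is never used for embedding.

The synsets are hypothetical examples for two senses of "composition".

```python
from itertools import combinations


def synonym_graph(synsets):
    """Words are nodes; two words are adjacent if they share a synset."""
    adj = {}
    for synset in synsets:
        for word in synset:
            adj.setdefault(word, set())
        for a, b in combinations(synset, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_colour(adj, order, n_codes=4):
    """Assign each word the lowest code (0..n_codes-1) unused by its coloured neighbours.

    Words for which no code is free are left out of the colouring."""
    colour = {}
    for word in order:
        used = {colour[v] for v in adj[word] if v in colour}
        free = [c for c in range(n_codes) if c not in used]
        if free:
            colour[word] = free[0]
    return colour


def is_proper(adj, colour):
    """True if no two adjacent coloured words share a code."""
    return all(colour[a] != colour[b] for a in colour for b in adj[a] if b in colour)


# Hypothetical synsets: two senses of "composition".
synsets = [["composition", "constitution", "makeup"], ["opus", "piece", "composition"]]
adj = synonym_graph(synsets)
order = list(dict.fromkeys(w for s in synsets for w in s))  # first-appearance order
colour = greedy_colour(adj, order)
print(colour)
print(is_proper(adj, colour))
```

Every word now has a single code, so a receiver can decode it without knowing which sense the sender had in mind. The visiting order matters: a different order can leave different words uncoloured.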

Vertex Colour Coding Algorithm

The Stego Lexical Substitution System

Deletion as the Transformation

• Words can often be deleted without affecting the meaning

• especially adjectives

“Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.

Deletion as the Transformation

• How can the receiver detect deleted words in the stego text?

• One possibility is to have more than one stego text, with different words deleted in each

• More than one stego text leads to the idea of secret sharing


Secret Sharing

• There are two receivers, each receiving a different version of the cover text

• Only when the receivers compare texts can the secret message be revealed


A Secret Sharing Scheme

Secret bits: 101

Text: “Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.


A Secret Sharing Scheme

Embed 1st bit: 1

Target adj: mysterious

Share 0: “Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.

Share 1: “Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.


A Secret Sharing Scheme

Embed 2nd bit: 0

Target adj: terrible

Share 0: “Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.

Share 1: “Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.


A Secret Sharing Scheme

Embed 3rd bit: 1

Target adj: single

Share 0: “Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the word “Yes” – and when it did come it was in a husky, unnatural tone.

Share 1: “Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.
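The embedding steps above can be sketched in a few lines. The bit convention is read off the worked example: for a 1 the target adjective is deleted from Share 0, for a 0 it is deleted from Share 1. This is a minimal sketch that assumes the text is already tokenised and that the positions of deletable target adjectives are given (in the talk they would come from a deletion classifier). The function name `embed` and the shortened example text are hypothetical.

```python
def embed(tokens, target_positions, bits):
    """Create two shares of a tokenised cover text, hiding one bit per
    target adjective (given as token indices, in text order).

    Convention from the worked example: bit 1 -> delete the adjective
    from share 0; bit 0 -> delete it from share 1.
    """
    if len(bits) > len(target_positions):
        raise ValueError("not enough deletable adjectives for the message")
    drop0, drop1 = set(), set()
    for bit, pos in zip(bits, target_positions):
        (drop0 if bit == "1" else drop1).add(pos)
    share0 = [t for i, t in enumerate(tokens) if i not in drop0]
    share1 = [t for i, t in enumerate(tokens) if i not in drop1]
    return share0, share1


# Shortened version of the cover text; targets are mysterious, terrible, single.
tokens = "the mysterious death . A terrible change . the single word".split()
share0, share1 = embed(tokens, [1, 5, 9], "101")
print(" ".join(share0))  # the death . A terrible change . the word
print(" ".join(share1))  # the mysterious death . A change . the single word
```

Each adjective is kept in exactly one share, so neither share on its own shows which words are missing.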


A Secret Sharing Scheme

Read off bits: 101

Share 0: “Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the word “Yes” – and when it did come it was in a husky, unnatural tone.

Share 1: “Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.
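Reading off the bits amounts to aligning the two shares word by word. The sketch below uses Python's standard `difflib.SequenceMatcher` for the alignment, which is our choice for illustration rather than anything specified in the talk. A word found only in Share 1 was deleted from Share 0 (bit 1); a word found only in Share 0 was deleted from Share 1 (bit 0). It assumes the shares differ only by deletions and that target adjectives are never adjacent, so a `replace` alignment is treated as an error. The function name `read_off` is hypothetical.

```python
from difflib import SequenceMatcher


def read_off(share0, share1):
    """Recover the secret bits by aligning two tokenised shares.

    A word present only in share 1 was deleted from share 0 (bit 1);
    a word present only in share 0 was deleted from share 1 (bit 0).
    """
    bits = []
    opcodes = SequenceMatcher(None, share0, share1).get_opcodes()
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "insert":      # words only in share 1
            bits.extend("1" * (j2 - j1))
        elif tag == "delete":    # words only in share 0
            bits.extend("0" * (i2 - i1))
        elif tag == "replace":   # assumed not to happen (no adjacent targets)
            raise ValueError("ambiguous alignment between shares")
    return "".join(bits)


share0 = "the death . A terrible change . the word".split()
share1 = "the mysterious death . A change . the single word".split()
print(read_off(share0, share1))  # 101
```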


Adjective Deletion Data

• Pleonasm data for pilot study

• free gift, cold ice, final end, . . .

• Full study used human-annotated data

• 1,200 sentences from the BNC marked for naturalness (yes/no)


Example Judgements (YES)

Judgement Example sentence

Deletable He was putting on his heavy overcoat, asked again casually if he could have a look at the glass.

Deletable We are seeking to find out what local people want, because they must own the work themselves.

Deletable We are just at the beginning of the worldwide epidemic and the situation is still very unstable.


Example Judgements (NO)

Judgement Example sentence

Undeletable He asserted that a modern artist should be in tune with his times, careful to avoid hackneyed subjects.

Undeletable With various groups suggesting police complicity in township violence, many blacks will find little security in a larger police force.

Undeletable There can be little doubt that such examples represent the tip of an iceberg.


Data Collection

• 30 native English speakers

• 1,200 sentences with 300 annotated by 3 judges; the rest annotated by one

• Fleiss kappa was 0.49 (moderate agreement)

• 700 training; 200 development; 300 test

• Ratio of deletable:undeletable was roughly 2:1
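Fleiss' kappa measures agreement among a fixed number of raters per item, corrected for the agreement expected by chance. The slide reports only the value 0.49, presumably computed on the 300 sentences that had three judges each. The sketch below shows the standard computation on made-up toy counts, not the study's data. The function name `fleiss_kappa` is hypothetical, and the formula is undefined when every judgement falls into a single category (chance agreement of 1).

```python
def fleiss_kappa(counts):
    """Fleiss' kappa. counts[i][j] = number of raters who put item i in
    category j; every item must have the same number of raters n."""
    N = len(counts)        # number of items
    n = sum(counts[0])     # raters per item
    k = len(counts[0])     # number of categories
    # Per-item agreement: fraction of rater pairs that agree.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # Overall category proportions give the chance agreement P_e.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)


# Toy data: 4 sentences, 3 judges, columns = [deletable, undeletable].
toy = [[3, 0], [0, 3], [2, 1], [3, 0]]
print(round(fleiss_kappa(toy), 3))  # 0.625
```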


Deletion Classifier

• SVM classifier with a variety of features, e.g.:

• Google n-gram count ratios before and after deletion

• Lexical association measures between noun and adjective, e.g. PMI

• Noun and adjective entropy measures

• . . .

• Full feature set gave best performance in development tests
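The slide names the feature families but not their exact definitions. The following sketch shows one plausible way to compute each from raw corpus counts. The direction of the count ratio (after over before), the add-one smoothing, and all function names are our assumptions. In practice the counts would come from a resource such as the Google n-gram corpus, and the resulting feature vectors would be passed to an SVM.

```python
import math


def count_ratio(count_after, count_before):
    """Assumed definition: n-gram count with the adjective deleted divided by
    the count with it kept. Add-one smoothing avoids division by zero."""
    return (count_after + 1) / (count_before + 1)


def pmi(count_adj_noun, count_adj, count_noun, total):
    """Pointwise mutual information (bits) between adjective and noun:
    log2( P(adj, noun) / (P(adj) * P(noun)) )."""
    p_xy = count_adj_noun / total
    p_x = count_adj / total
    p_y = count_noun / total
    return math.log2(p_xy / (p_x * p_y))


def entropy(counts):
    """Shannon entropy (bits) of a distribution given raw counts, e.g. the
    adjectives observed modifying a given noun."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


# Hypothetical counts, for illustration only.
features = [count_ratio(9, 99), pmi(10, 100, 100, 10_000), entropy([1, 1])]
print(features)
```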


Full Classifier Results on Test Set

Threshold      0.69  0.70  0.71  0.72  0.73  0.74  0.75  0.76  0.77  0.78
Precision (%)  70.1  69.8  70.7  72.0  70.8  71.1  74.8  85.0  90.9  100
Recall (%)     74.5  73.4  72.9  70.8  65.6  58.9  41.7  26.6  15.6  5.2
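Each column of the table corresponds to a decision threshold on the classifier's score: an adjective is treated as deletable only if its score is at least the threshold, so raising the threshold trades recall for precision. The sketch below computes precision and recall for the deletable class at a given threshold. The scores and labels are made-up toy values, not the study's outputs, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall (%) for the 'deletable' class, predicting
    deletable whenever score >= threshold. labels: True = deletable."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    precision = 100 * tp / (tp + fp) if tp + fp else float("nan")
    recall = 100 * tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Toy scores and gold labels, for illustration only.
scores = [0.9, 0.8, 0.75, 0.7, 0.6]
labels = [True, True, False, True, False]
for t in (0.7, 0.8):
    print(t, precision_recall(scores, labels, t))
```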


Spot the Changes!

• If you eat a dodgy chicken tikka masala bought from one of our best-known supermarket strings, and you feel violently sick afterwards, do you blame the supermarket - or will your ire be placed at the anonymous manufacturer of the product which caused the tainted meal?

• Most of us would, I think, hold the supermarket accountable, if its name was on the package - although it was another company altogether that had the sloppy hygiene measures which caused the poisoning.

• BP also concedes that its own employees made errors, particularly when it came to interpreting the results of pressure tests.

• But I think it is in BP’s recommendations for change that many will find the real story, because there BP makes clear that it wants to exercise far better oversight of those who work for it when trying to extract oil from deepwater fields.

