Linguistic Steganography: Information Hiding in Text Stephen Clark with Ching-Yun (Frannie) Chang University of Cambridge Computer Laboratory Luxembourg, September 2013
Linguistic Steganography: Information Hiding in Text

Page 1: Linguistic Steganography: Information Hiding in Text

Linguistic Steganography: Information Hiding in Text

Stephen Clark with Ching-Yun (Frannie) Chang

University of Cambridge Computer Laboratory

Luxembourg, September 2013

Page 2: Linguistic Steganography: Information Hiding in Text

Intro Ling Steg Lex Sub 2

Information Hiding

My friend Bob, until yesterday I was using binoculars for stargazing. Today, I decided to try my new telescope. The galaxies in Leo and Ursa Major were unbelievable! Next, I plan to check out some nebulas and then prepare to take a few snapshots of the new comet. Although I am satisfied with the telescope, I think I need to purchase light pollution filters to block the xenon lights from a nearby highway to improve the quality of my pictures. Cheers, Alice.

Linguistic Steganography

Page 5: Linguistic Steganography: Information Hiding in Text

Information Hiding [Fridrich, 2010]

My friend Bob, until yesterday I was using binoculars for stargazing. Today, I decided to try my new telescope. The galaxies in Leo and Ursa Major were unbelievable! Next, I plan to check out some nebulas and then prepare to take a few snapshots of the new comet. Although I am satisfied with the telescope, I think I need to purchase light pollution filters to block the xenon lights from a nearby highway to improve the quality of my pictures. Cheers, Alice.

mfbuyiwubfstidttmnttgilaumwuniptcosnatpttafsotncaiaswttitintplpftbtxlfanhtitqompca

π = 3.141592653589793 . . .

buubdlupnpsspx

attack tomorrow
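The slide leaves the decoding steps implicit. One reading that is consistent with the strings shown: take the first letter of every word, use the digits of π as skip counts to pick letters, then shift each picked letter back by one place in the alphabet. A minimal Python sketch of that reading follows; the function name `decode` and the step breakdown are illustrative assumptions, not taken from the talk.

```python
import re

COVER = (
    "My friend Bob, until yesterday I was using binoculars for stargazing. "
    "Today, I decided to try my new telescope. The galaxies in Leo and "
    "Ursa Major were unbelievable! Next, I plan to check out some nebulas "
    "and then prepare to take a few snapshots of the new comet. Although "
    "I am satisfied with the telescope, I think I need to purchase light "
    "pollution filters to block the xenon lights from a nearby highway to "
    "improve the quality of my pictures. Cheers, Alice."
)
PI_DIGITS = "31415926535897"  # leading digits of pi, used as skip counts


def decode(cover, digits):
    # Step 1: first letter of every word, lower-cased.
    initials = "".join(w[0].lower() for w in re.findall(r"[A-Za-z]+", cover))
    # Step 2: walk along the initials, advancing by each digit in turn.
    picked, pos = [], 0
    for d in digits:
        pos += int(d)
        picked.append(initials[pos - 1])
    # Step 3: shift every picked letter back by one (b -> a, u -> t, ...).
    plain = "".join(chr((ord(c) - ord("a") - 1) % 26 + ord("a")) for c in picked)
    return initials, "".join(picked), plain


initials, picked, plain = decode(COVER, PI_DIGITS)
print(picked)  # buubdlupnpsspx
print(plain)   # attacktomorrow
```

The word break in "attack tomorrow" is not encoded; the reader supplies it.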

Page 6: Linguistic Steganography: Information Hiding in Text

Steganography

• Steganography is a branch of security concerned with hiding information in some cover medium

• Use of images for hiding information has been extensively studied

• Make changes to an image so that the changes are imperceptible to an observer

• The resulting image encodes the message

• A related area is watermarking, which is concerned with hiding information for the purposes of identification (e.g. copyright)

• or e.g. identifying Google translations

Page 8: Linguistic Steganography: Information Hiding in Text

The Cover Medium

• Advantages of images

  • local changes can maintain global properties of the image
  • easy to make changes which are imperceptible to a human

• Disadvantages of images

  • sender needs an image
  • sender needs to transmit image to the receiver

• Text is everywhere - why not conceal information in a cover text?

Page 9: Linguistic Steganography: Information Hiding in Text

Example using Lexical Substitution

• Cover text:

Which is why, some would say, it’s slightly odd that when no less an authority than the chairman of the Financial Services Authority, Lord Turner, questions the social utility of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s curious that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his chops and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Page 10: Linguistic Steganography: Information Hiding in Text

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the chairman of the Financial Services Authority, Lord Turner, questions the social utility of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s curious that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his chops and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Page 11: Linguistic Steganography: Information Hiding in Text

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social utility of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s curious that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his chops and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Page 12: Linguistic Steganography: Information Hiding in Text

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s curious that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his chops and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Page 14: Linguistic Steganography: Information Hiding in Text

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s strange that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his chops and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Page 15: Linguistic Steganography: Information Hiding in Text

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s strange that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his lips and demand a bit of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Page 16: Linguistic Steganography: Information Hiding in Text

Example using Lexical Substitution

• Data Embedding:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s strange that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his lips and demand a piece of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Page 17: Linguistic Steganography: Information Hiding in Text

Example using Lexical Substitution

• Stego Text:

Which is why, some would say, it’s fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it’s strange that the chancellor of the exchequer (who could use a bob or two) doesn’t lick his lips and demand a piece of that.

• Secret bitstring: 0 1 1 0 0 0 1 0

Page 18: Linguistic Steganography: Information Hiding in Text

This Talk

• Joint work with Frannie Chang

• Outline:

  • more introduction to linguistic steganography
  • a stegosystem based on lexical substitution
  • a secret sharing scheme based on adjective deletion
  • online demo

• Motivation:

  • can simple NLP methods deliver a practical steganography system?
  • interesting research area at the intersection of Natural Language Processing and Computer Security

Page 20: Linguistic Steganography: Information Hiding in Text

Linguistic Steganography

• Some existing work, but very little compared to images

• Concerned with linguistic transformations, rather than superficial properties of the text (e.g. white spaces)

• Difficulty is that local changes can lead to inconsistencies:

  • ungrammatical or unnatural sentences
  • grammatical, natural sentences which lack coherence with respect to the rest of the document (or the world)

Page 21: Linguistic Steganography: Information Hiding in Text

Linguistic Steganography Framework

• Assume an existing cover text which will be modified (rather than generated from scratch)

Page 23: Linguistic Steganography: Information Hiding in Text

Linguistic Steganography Framework

• Note that the receiver does not need a copy of the cover text (just the code dictionary for lexical substitution)
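To make the role of the code dictionary concrete, here is a toy Python sketch of embedding and extraction. Everything in it is an illustrative assumption: the three synonym sets, the one-bit codes and the names `SYNSETS`, `embed` and `extract` are made up and do not reproduce the dictionary behind the example in the talk. The point it shows is that extraction reads only the stego text and the dictionary.

```python
# Hypothetical code dictionary: a word's position inside its synonym set is
# its codeword. Real sets and codes would come from WordNet and the coding
# scheme; these entries are invented for the example.
SYNSETS = [
    ["slightly", "fairly"],
    ["chairman", "president"],
    ["curious", "strange"],
]
CODE = {w: str(i) for s in SYNSETS for i, w in enumerate(s)}
SET_OF = {w: s for s in SYNSETS for w in s}


def embed(cover_words, bits):
    """Replace each substitutable word by the synonym whose code is the next bit."""
    out, pending = [], list(bits)
    for w in cover_words:
        if w in SET_OF and pending:
            w = SET_OF[w][int(pending.pop(0))]
        out.append(w)
    return out


def extract(stego_words):
    """Read the codes of all dictionary words; the cover text is not needed."""
    return "".join(CODE[w] for w in stego_words if w in CODE)


cover = "it is slightly odd that the chairman finds it curious".split()
stego = embed(cover, "110")
print(" ".join(stego))
print(extract(stego))
```

This sketch assumes the message has exactly as many bits as there are substitutable words; a real system would also need a way to mark where the message ends.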

Page 24: Linguistic Steganography: Information Hiding in Text

Linguistic Steganography Framework

• Trade-off between imperceptibility and payload

Page 25: Linguistic Steganography: Information Hiding in Text

Possible Linguistic Transformations

• Lexical (e.g. synonym substitution)

• Syntactic (e.g. passive/active transformation)

• Semantic/pragmatic

• Can the transformations be applied reliably and often?

Page 27: Linguistic Steganography: Information Hiding in Text

Simple Lexical Stegosystem (Winstein, 98)

Page 28: Linguistic Steganography: Information Hiding in Text

Sense Ambiguity Problem

• Decoding ambiguity ⇒ use a novel form of vertex coding (later in talk)

Page 29: Linguistic Steganography: Information Hiding in Text

Security Simplifications

• Assuming that the adversary is not a computer (i.e. ignoring the possibility of steganalysis)

• Assuming that the adversary is passive rather than active

• Ignoring the source of the cover text

• Assuming that the adversary does not know the steganographic channel (contrary to Kerckhoffs's principle)

• but opportunities for secret shared keys

Page 30: Linguistic Steganography: Information Hiding in Text

Lexical Substitution Problem

The idea is a powerful one → The idea is a potent one

This computer is powerful → This computer is potent

• Some synonyms are not acceptable in context ⇒ need to check whether a synonym is applicable in a given context (to ensure imperceptibility)

Page 31: Linguistic Steganography: Information Hiding in Text

Checking Synonym Applicability

• Use the Google n-gram corpus to see if the synonym in context has been used before (and frequently)

• Now a fairly standard NLP technique which has been used for many similar lexical disambiguation tasks

Page 32: Linguistic Steganography: Information Hiding in Text

Paradigm Shift in NLP

• 30 years ago statistical, corpus-based methods began to appear

• Now the dominant approach for all NLP problems (e.g. Google Translate)

Page 33: Linguistic Steganography: Information Hiding in Text

The Google n-gram Corpus

the part that you were 103

the part that you will 198

the part that you wish 171

the part that you would 867

the part that your read 45

the part the Riverside County 51

the part the United States 72

the part the detective was 63

the part the next day 95

Page 34: Linguistic Steganography: Information Hiding in Text

Contextual Check

He was bright and independent and proud → He was clever and independent and proud

f2 = 302,492: "was clever" 40,726; "clever and" 261,766

f3 = 8,072: "He was clever" 1,798; "was clever and" 6,188; "clever and independent" 86

f4 = 343: "He was clever and" 343; "was clever and independent" 0; "clever and independent and" 0

f5 = 0: "He was clever and independent" 0; "was clever and independent and" 0; "clever and independent and proud" 0
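Each f_n on this slide is the sum of the corpus counts of all n-grams of length n that contain the candidate synonym. A short Python sketch of that bookkeeping follows, with the slide's counts hard-coded in a dictionary standing in for a Google n-gram lookup; the names `NGRAM_COUNTS`, `ngrams_containing` and `f` are illustrative assumptions.

```python
# Counts copied from the slide; n-grams not listed here count as 0.
NGRAM_COUNTS = {
    "was clever": 40726,
    "clever and": 261766,
    "He was clever": 1798,
    "was clever and": 6188,
    "clever and independent": 86,
    "He was clever and": 343,
}


def ngrams_containing(words, i, n):
    """All n-grams of `words` (as strings) that include position i."""
    first = max(0, i - n + 1)
    last = min(i, len(words) - n)
    return [" ".join(words[s:s + n]) for s in range(first, last + 1)]


def f(words, i, n, counts=NGRAM_COUNTS):
    """Summed frequency of the n-grams of length n around position i."""
    return sum(counts.get(g, 0) for g in ngrams_containing(words, i, n))


words = "He was clever and independent and proud".split()
i = words.index("clever")
freqs = {n: f(words, i, n) for n in range(2, 6)}
print(freqs)  # {2: 302492, 3: 8072, 4: 343, 5: 0}
```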

Page 35: Linguistic Steganography: Information Hiding in Text

Contextual Check

He was bright and independent and proud → He was clever and independent and proud

$\mathrm{Count}(w) = \sum_{n} \log(f_n)$

max is the highest n-gram Count for any synonym

$\mathrm{Score}(w) = \mathrm{Count}(w) / \mathrm{max}$

If $\mathrm{Score}(w) \geq \mathrm{threshold}$, w passes the contextual check

$\mathrm{Count}(\mathit{clever}) = \log(f_2) + \log(f_3) + \log(f_4) + \log(f_5) = 28$

$\mathrm{Score}(\mathit{clever}) = 28 / \mathrm{max} = 0.9$
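The scoring rule can be sketched in a few lines of Python. Two points are assumptions rather than facts from the slide: natural logarithms are used, and n-grams with zero frequency are skipped (the slide does not say how log 0 is handled), so the numbers need not match the slide's 28 and 0.9 exactly. The rival synonym "smart" and its counts are made up purely to give `max` a value.

```python
import math


def count(freqs):
    # Assumption: zero frequencies contribute nothing to the sum.
    return sum(math.log(f) for f in freqs if f > 0)


def score(word, freqs_by_word):
    """Count(word) divided by the highest Count among all candidate synonyms."""
    counts = {w: count(fs) for w, fs in freqs_by_word.items()}
    best = max(counts.values())
    return counts[word] / best if best > 0 else 0.0


def passes(word, freqs_by_word, threshold):
    return score(word, freqs_by_word) >= threshold


candidates = {
    "clever": [302492, 8072, 343, 0],  # f2..f5 as given for "clever"
    "smart": [500000, 20000, 900, 3],  # made-up rival synonym
}
print(round(count(candidates["clever"]), 2))
print(round(score("clever", candidates), 2))
print(passes("clever", candidates, threshold=0.95))
```

Raising the threshold rejects more substitutions, which lowers the payload but makes the changes that remain harder to notice.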

Page 36: Linguistic Steganography: Information Hiding in Text

Extensions to the Contextual Check

• Weight some n-grams more heavily than others

• Use wild-cards for unknown words

• . . .

• ⇒ difficult to beat the basic system

Page 37: Linguistic Steganography: Information Hiding in Text

Evaluation

• Automatic evaluation using data from Lexical Substitution Task (McCarthy and Navigli, SemEval 2007)

• Manual human evaluation of naturalness of the modified text

• more direct evaluation of imperceptibility for the steganography application

• We use WordNet as the source of possible substitutes

Page 38: Linguistic Steganography: Information Hiding in Text

WordNet

WordNet Search - 3.1 (http://wordnetweb.princeton.edu/perl/webwn?s=newspaper&...)

Word to search for: newspaper

Noun

S: (n) newspaper, paper (a daily or weekly publication on folded sheets; contains news and articles and advertisements) "he read his newspaper at breakfast"

S: (n) newspaper, paper, newspaper publisher (a business firm that publishes newspapers) "Murdoch owns many newspapers"

S: (n) newspaper, paper (the physical object that is the product of a newspaper publisher) "when it began to rain he covered his head with a newspaper"

S: (n) newspaper, newsprint (cheap paper made from wood pulp and used for printing newspapers) "they used bales of newspaper every day"

Page 39: Linguistic Steganography: Information Hiding in Text

Human Evaluation

• Evaluate imperceptibility by asking humans to rate naturalness of sentences (1–4), in 3 conditions:

  • sentence unchanged
  • sentence changed by our system (with threshold of 0.95)
  • sentence changed by random choice of target word and random choice of substitute from target word’s synsets (baseline)

• Sentences are from Robert Peston’s BBC blog

• On average around 2 changes are made per sentence

Page 40: Linguistic Steganography: Information Hiding in Text

Example Sentences

ORIG: Apart from anything else, big companies have the size and muscle to derive gains by forcing their suppliers to cut prices (as shown by the furore highlighted in yesterday’s Telegraph over Serco’s demand - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don’t have that opportunity.

SYSTEM: Apart from anything else, large companies have the size and muscle to derive gains by pushing their suppliers to cut prices (as evidenced by the furore highlighted in yesterday’s Telegraph over Serco’s need - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don’t have that opportunity.

Page 41: Linguistic Steganography: Information Hiding in Text

Example Sentences

ORIG: Apart from anything else, big companies have the size and muscle to derive gains by forcing their suppliers to cut prices (as shown by the furore highlighted in yesterday’s Telegraph over Serco’s demand - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don’t have that opportunity.

RANDOM: Apart from anything else, self-aggrandising companies have the size and muscle to derive gains by forcing their suppliers to foreshorten prices (as shown by the furore highlighted in yesterday’s Telegraph over Serco’s demand - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don’t birth that chance.

Page 42: Linguistic Steganography: Information Hiding in Text

Experimental Design

• 60 sentences

• 30 judges

• Latin square design with 3 groups of 10 judges

• People in the same group receive the 60 sentences under the same set of conditions

• Each judge sees all 60 sentences, but sees each sentence only once in one of the three conditions
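One simple way to realise such a Latin square is to rotate the condition order by group, so that every sentence is seen in all three conditions across the groups while each judge sees it only once. The Python sketch below illustrates that idea only; the rotation rule and the names `CONDITIONS`, `condition_for` and `assignment` are assumptions, not the allocation actually used in the study.

```python
from collections import Counter

CONDITIONS = ["original", "system", "random"]


def condition_for(group, sentence):
    # Rotating by group number guarantees that, for any one sentence,
    # the three groups see three different conditions.
    return CONDITIONS[(sentence + group) % len(CONDITIONS)]


def assignment(n_sentences=60, n_groups=3):
    """Map each group to the list of conditions it sees, one per sentence."""
    return {
        g: [condition_for(g, s) for s in range(n_sentences)]
        for g in range(n_groups)
    }


plan = assignment()
print(Counter(plan[0]))
print([plan[g][0] for g in range(3)])
```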

Page 43: Linguistic Steganography: Information Hiding in Text

Annotation Guidelines

Page 44: Linguistic Steganography: Information Hiding in Text

Annotation Example

Page 45: Linguistic Steganography: Information Hiding in Text

Results

• Average score for the original sentences is 3.67 (scale of 1–4)

• Average score for the sentences modified by our system is 3.33

• Average score for the randomly changed sentences is 2.82

• Differences between the systems are highly significant (Wilcoxon Signed-Ranks Test)

• Payload is a few bits per sentence for this level of imperceptibility

• Threshold controls trade-off between payload and imperceptibility

Page 47: Linguistic Steganography: Information Hiding in Text

Ambiguity Sharing Deletion 42

Sense Ambiguity Problem

• Different codewords assigned to different senses of composition leads to a decoding ambiguity

Page 48: Linguistic Steganography: Information Hiding in Text

Sense Ambiguity Problem

• Represent synonymy relation in a graph

  • words are nodes in the graph
  • edges represent membership of the same synset

Page 49: Linguistic Steganography: Information Hiding in Text

Vertex Colour Coding

• Vertex Colouring: a labelling of the graph’s nodes with colours (codes) so that no two adjacent nodes share the same colour

Page 50: Linguistic Steganography: Information Hiding in Text

Vertex Colour Coding Algorithm

• Assume synsets have no more than 4 words

• 99.6% of synsets have fewer than 8 words

• Task is to maximise the number of nodes (words) in the graph whilst assigning a unique codeword to each node

• We propose a greedy algorithm to perform the colouring – add edges and codes assuming some ordering of the words so that no two adjacent nodes share the same code
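A simplified greedy colouring in Python gives a feel for the idea. It is a sketch under stated assumptions and not the algorithm from the talk: words are visited in alphabetical order, each takes the lowest code not already used by a neighbour, and a word with no free code is simply left out of the graph. The synsets are toy data loosely modelled on the senses of "composition".

```python
from itertools import combinations


def synonym_graph(synsets):
    """Words are nodes; an edge joins two words that share a synset."""
    adj = {}
    for s in synsets:
        for w in s:
            adj.setdefault(w, set())
        for a, b in combinations(s, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_codes(synsets, n_codes=4):
    """Give each word a binary code that differs from all its neighbours' codes."""
    adj = synonym_graph(synsets)
    codes = {}
    for w in sorted(adj):  # assumed ordering: alphabetical
        used = {codes[v] for v in adj[w] if v in codes}
        free = [c for c in range(n_codes) if c not in used]
        if free:
            codes[w] = free[0]  # lowest code not taken by a neighbour
        # otherwise the word stays uncoded, i.e. it is dropped from the graph
    width = max(1, (n_codes - 1).bit_length())
    return {w: format(c, f"0{width}b") for w, c in codes.items()}


toy_synsets = [
    ["composition", "paper", "report", "theme"],
    ["composition", "constitution", "makeup"],
    ["composition", "opus", "piece"],
]
codes = greedy_codes(toy_synsets)
for word in sorted(codes):
    print(word, codes[word])
```

Because every word ends up with a single code and no two words in the same synset share one, the receiver can decode a word without knowing which sense was intended.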

Page 51: Linguistic Steganography: Information Hiding in Text

Vertex (Colour) Coding Algorithm

Page 52: Linguistic Steganography: Information Hiding in Text

Vertex Coding Algorithm

Page 53: Linguistic Steganography: Information Hiding in Text

The Stego Lexical Substitution System

Page 54: Linguistic Steganography: Information Hiding in Text

Deletion as the Transformation

• Words can often be deleted without affecting the meaning (especially adjectives)

“Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.

Page 55: Linguistic Steganography: Information Hiding in Text

Deletion as the Transformation

• How can the receiver detect deleted words in the stego text?

• One possibility is to have more than one stego text, with different words deleted in each

• More than one stego text leads to the idea of secret sharing

Page 56: Linguistic Steganography: Information Hiding in Text

Secret Sharing

• There are two receivers, each receiving a different version of the cover text

• Only when the receivers compare texts can the secret message be revealed

Page 57: Linguistic Steganography: Information Hiding in Text

A Secret Sharing Scheme

Secret bits: 101

Text: “Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.

Page 58: Linguistic Steganography: Information Hiding in Text

A Secret Sharing Scheme

Embed 1st bit: 1

Target adj: mysterious

Share0: “Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.

Share1: “Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.

Page 59: Linguistic Steganography: Information Hiding in Text

A Secret Sharing Scheme

Embed 2nd bit: 0

Target adj: terrible

Share0: “Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.

Share1: “Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.

Page 60: Linguistic Steganography: Information Hiding in Text

A Secret Sharing Scheme

Embed 3rd bit: 1

Target adj: single

Share0: “Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the word “Yes” – and when it did come it was in a husky, unnatural tone.

Share1: “Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.

Page 61: Linguistic Steganography: Information Hiding in Text

A Secret Sharing Scheme

Read off bits: 101

Share0: “Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A terrible change came over the woman’s face as I asked the question. It was some seconds before she could get out the word “Yes” – and when it did come it was in a husky, unnatural tone.

Share1: “Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland?” A change came over the woman’s face as I asked the question. It was some seconds before she could get out the single word “Yes” – and when it did come it was in a husky, unnatural tone.
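The walkthrough suggests a simple rule: for each target adjective, bit 1 deletes it from Share0 and bit 0 deletes it from Share1, and the bits are read back by comparing the two shares. The Python sketch below follows that reading. Using `difflib` to align the shares is a choice made for this illustration, the function names are hypothetical, and the sketch assumes that no two target adjectives are adjacent.

```python
from difflib import SequenceMatcher

COVER = (
    '"Have you heard of the mysterious death of your late boarder '
    'Mr. Enoch J. Drebber, of Cleveland?" A terrible change came over '
    "the woman's face as I asked the question. It was some seconds "
    'before she could get out the single word "Yes" - and when it did '
    "come it was in a husky, unnatural tone."
)


def make_shares(words, target_positions, bits):
    """Delete each target from share 0 if its bit is 1, from share 1 if it is 0."""
    drop0 = {p for p, b in zip(target_positions, bits) if b == "1"}
    drop1 = {p for p, b in zip(target_positions, bits) if b == "0"}
    share0 = [w for i, w in enumerate(words) if i not in drop0]
    share1 = [w for i, w in enumerate(words) if i not in drop1]
    return share0, share1


def read_bits(share0, share1):
    """Recover the bits by aligning the two shares word by word."""
    bits = []
    matcher = SequenceMatcher(None, share0, share1, autojunk=False)
    for tag, *_ in matcher.get_opcodes():
        if tag == "insert":    # word only in share 1: it was deleted from share 0
            bits.append("1")
        elif tag == "delete":  # word only in share 0: it was deleted from share 1
            bits.append("0")
    return "".join(bits)


words = COVER.split()
targets = [words.index(a) for a in ("mysterious", "terrible", "single")]
share0, share1 = make_shares(words, targets, "101")
print(read_bits(share0, share1))  # 101
```

Neither share reveals the bits on its own, since a reader holding one share cannot tell which adjectives were removed from the other.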

Page 62: Linguistic Steganography: Information Hiding in Text

Adjective Deletion Data

• Pleonasm data for pilot study

• free gift, cold ice, final end, . . .

• Full study used human annotated data

• 1,200 sentences from the BNC marked for naturalness (yes/no)

Page 63: Linguistic Steganography: Information Hiding in Text

Example Judgements (YES)

Judgement: Example sentence

Deletable: He was putting on his heavy overcoat, asked again casually if he could have a look at the glass.

Deletable: We are seeking to find out what local people want, because they must own the work themselves.

Deletable: We are just at the beginning of the worldwide epidemic and the situation is still very unstable.

Page 64: Linguistic Steganography: Information Hiding in Text

Example Judgements (NO)

Judgement: Example sentence

Undeletable: He asserted that a modern artist should be in tune with his times, careful to avoid hackneyed subjects.

Undeletable: With various groups suggesting police complicity in township violence, many blacks will find little security in a larger police force.

Undeletable: There can be little doubt that such examples represent the tip of an iceberg.

Page 65: Linguistic Steganography: Information Hiding in Text

Data Collection

• 30 native English speakers

• 1,200 sentences with 300 annotated by 3 judges; the rest annotated by one

• Fleiss' kappa was 0.49 (moderate agreement)

• 700 training; 200 development; 300 test

• Ratio of deletable:undeletable was roughly 2:1

Page 66: Linguistic Steganography: Information Hiding in Text

Deletion Classifier

• SVM classifier with a variety of features, e.g.:

  • Google n-gram count ratios before and after deletion
  • lexical association measures between noun and adjective, e.g. PMI
  • Noun and adjective entropy measures
  • . . .
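As an illustration of one such feature, pointwise mutual information (PMI) between an adjective and a noun can be computed from co-occurrence counts using the standard definition. The Python sketch below uses made-up counts; how the talk's classifier obtains its counts or scales the feature is not shown on the slide, and the function name `pmi` is an assumption.

```python
import math


def pmi(pair_count, adj_count, noun_count, total):
    """Pointwise mutual information (base 2) of an adjective-noun pair."""
    p_pair = pair_count / total
    p_adj = adj_count / total
    p_noun = noun_count / total
    return math.log2(p_pair / (p_adj * p_noun))


# Made-up counts: the pair occurs 8 times, the adjective 16 times and the
# noun 32 times in a sample of 1024 adjective-noun pairs.
print(pmi(8, 16, 32, 1024))  # 4.0
```

A high PMI means the adjective and noun co-occur far more often than chance would predict, as in a fixed expression.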

Page 67: Linguistic Steganography: Information Hiding in Text

Full Classifier Results on Test Set

Threshold       0.69  0.70  0.71  0.72  0.73  0.74  0.75  0.76  0.77  0.78
Precision (%)   70.1  69.8  70.7  72.0  70.8  71.1  74.8  85.0  90.9  100
Recall (%)      74.5  73.4  72.9  70.8  65.6  58.9  41.7  26.6  15.6   5.2

Page 68: Linguistic Steganography: Information Hiding in Text

References

• Practical Linguistic Steganography using Contextual Synonym Substitution and a Novel Vertex Coding Method. Ching-Yun Chang and Stephen Clark. To appear in Computational Linguistics

• Adjective Deletion for Linguistic Steganography and Secret Sharing. Ching-Yun Chang and Stephen Clark. Proceedings of the 24th International Conference on Computational Linguistics (COLING-12), Mumbai, India, 2012

• The Secret’s in the Word Order: Text-to-Text Generation for Linguistic Steganography. Ching-Yun Chang and Stephen Clark. Proceedings of the 24th International Conference on Computational Linguistics (COLING-12), Mumbai, India, 2012

• Linguistic Steganography using Automatically Generated Paraphrases. Ching-Yun Chang and Stephen Clark. Proceedings of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT-10), Los Angeles, 2010