Linguistic Steganography: Information Hiding in Text Stephen Clark with Ching-Yun (Frannie) Chang University of Cambridge Computer Laboratory Luxembourg, September 2013

Jan 02, 2017

  • Linguistic Steganography: Information Hiding in Text

    Stephen Clark with Ching-Yun (Frannie) Chang

    University of Cambridge Computer Laboratory

    Luxembourg, September 2013

  • Information Hiding [Fridrich, 2010]

    My friend Bob, until yesterday I was using binoculars for stargazing. Today, I decided to try my new telescope. The galaxies in Leo and Ursa Major were unbelievable! Next, I plan to check out some nebulas and then prepare to take a few snapshots of the new comet. Although I am satisfied with the telescope, I think I need to purchase light pollution filters to block the xenon lights from a nearby highway to improve the quality of my pictures. Cheers, Alice.

    mfbuyiwubfstidttmnttgilaumwuniptcosnatpttafsotncaiaswttitintplpftbtxlfanhtitqompca

    π = 3.141592653589793 . . .

    buubdlupnpsspx

    attack tomorrow
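    The layers on this slide can be peeled off in a few lines of Python. This is a sketch of one reading of the slide, assuming that the first layer is the first letter of every word, that successive digits of π give the step between selected letters, and that each selected letter is then shifted back one place in the alphabet. The function names are illustrative, and the 14-digit cut-off assumes the receiver knows the message length.

```python
import re

COVER = (
    "My friend Bob, until yesterday I was using binoculars for stargazing. "
    "Today, I decided to try my new telescope. The galaxies in Leo and Ursa "
    "Major were unbelievable! Next, I plan to check out some nebulas and then "
    "prepare to take a few snapshots of the new comet. Although I am satisfied "
    "with the telescope, I think I need to purchase light pollution filters to "
    "block the xenon lights from a nearby highway to improve the quality of my "
    "pictures. Cheers, Alice."
)
PI_DIGITS = "31415926535897"  # first 14 digits of pi, one per hidden letter


def first_letters(text: str) -> str:
    """Layer 1: the first letter of every word, lower-cased."""
    return "".join(word[0].lower() for word in re.findall(r"[A-Za-z]+", text))


def pick_by_digits(letters: str, digits: str) -> str:
    """Layer 2: step forward through `letters` by each digit in turn."""
    picked, pos = [], 0
    for d in digits:
        pos += int(d)                    # cumulative, 1-based position
        picked.append(letters[pos - 1])
    return "".join(picked)


def shift_back(s: str, k: int = 1) -> str:
    """Layer 3: Caesar-shift each lower-case letter back by k places."""
    return "".join(chr((ord(c) - ord("a") - k) % 26 + ord("a")) for c in s)


acrostic = first_letters(COVER)
picked = pick_by_digits(acrostic, PI_DIGITS)
print(acrostic)
print(picked, "->", shift_back(picked))
```

    Under these assumptions the selected letters are buubdlupnpsspx, which shift back to attacktomorrow.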

  • Steganography

    Steganography is a branch of security concerned with hiding information in some cover medium

    Use of images for hiding information has been extensively studied

    Make changes to an image so that the changes are imperceptible to an observer

    The resulting image encodes the message

    A related area is watermarking, which is concerned with hiding information for the purposes of identification (e.g. copyright, or identifying Google translations)

  • The Cover Medium

    Advantages of images

    local changes can maintain global properties of the image

    easy to make changes which are imperceptible to a human

    Disadvantages of images

    sender needs an image

    sender needs to transmit the image to the receiver

    Text is everywhere - why not conceal information in a cover text?

  • Example using Lexical Substitution

    Cover text:

    Which is why, some would say, it's slightly odd that when no less an authority than the chairman of the Financial Services Authority, Lord Turner, questions the social utility of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it's curious that the chancellor of the exchequer (who could use a bob or two) doesn't lick his chops and demand a bit of that.

    Secret bitstring: 0 1 1 0 0 0 1 0

  • Example using Lexical Substitution

    Data Embedding:

    Which is why, some would say, it's fairly odd that when no less an authority than the chairman of the Financial Services Authority, Lord Turner, questions the social utility of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it's curious that the chancellor of the exchequer (who could use a bob or two) doesn't lick his chops and demand a bit of that.

    Secret bitstring: 0 1 1 0 0 0 1 0

  • Example using Lexical Substitution

    Data Embedding:

    Which is why, some would say, it's fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social utility of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it's curious that the chancellor of the exchequer (who could use a bob or two) doesn't lick his chops and demand a bit of that.

    Secret bitstring: 0 1 1 0 0 0 1 0

  • Example using Lexical Substitution

    Data Embedding:

    Which is why, some would say, it's fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it's curious that the chancellor of the exchequer (who could use a bob or two) doesn't lick his chops and demand a bit of that.

    Secret bitstring: 0 1 1 0 0 0 1 0

  • Example using Lexical Substitution

    Data Embedding:

    Which is why, some would say, it's fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it's strange that the chancellor of the exchequer (who could use a bob or two) doesn't lick his chops and demand a bit of that.

    Secret bitstring: 0 1 1 0 0 0 1 0

  • Example using Lexical Substitution

    Data Embedding:

    Which is why, some would say, it's fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it's strange that the chancellor of the exchequer (who could use a bob or two) doesn't lick his lips and demand a bit of that.

    Secret bitstring: 0 1 1 0 0 0 1 0

  • Example using Lexical Substitution

    Data Embedding:

    Which is why, some would say, it's fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it's strange that the chancellor of the exchequer (who could use a bob or two) doesn't lick his lips and demand a piece of that.

    Secret bitstring: 0 1 1 0 0 0 1 0

  • Example using Lexical Substitution

    Stego Text:

    Which is why, some would say, it's fairly odd that when no less an authority than the president of the Financial Services Authority, Lord Turner, questions the social usefulness of much activity in financial markets, and also suggests that it might be no bad thing to levy a tiny Tobin tax on all this frenetic trading in electrons, well it's strange that the chancellor of the exchequer (who could use a bob or two) doesn't lick his lips and demand a piece of that.

    Secret bitstring: 0 1 1 0 0 0 1 0

  • This Talk

    Joint work with Frannie Chang

    Outline:

    more introduction to linguistic steganography

    a stegosystem based on lexical substitution

    a secret sharing scheme based on adjective deletion

    online demo

    Motivation:

    can simple NLP methods deliver a practical steganography system?

    interesting research area at the intersection of Natural Language Processing and Computer Security

  • Linguistic Steganography

    Some existing work, but very little compared to images

    Concerned with linguistic transformations, rather than superficial properties of the text (e.g. white spaces)

    Difficulty is that local changes can lead to inconsistencies:

    ungrammatical or unnatural sentences

    grammatical, natural sentences which lack coherence with respect to the rest of the document (or the world)

  • Linguistic Steganography Framework

    Assume an existing cover text which will be modified (rather than generated from scratch)

  • Linguistic Steganography Framework

    Note that the receiver does not need a copy of the cover text (just the code dictionary for lexical substitution)
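    A minimal sketch of this framework, assuming a toy code dictionary in which each synonym set is an ordered pair and a word's position in its pair is the bit it encodes (one bit per substitution slot). The word lists and the names `embed` and `extract` are hypothetical, and each word is assumed to belong to exactly one set; words with several senses cause the decoding ambiguity discussed later.

```python
# Hypothetical toy code dictionary: each synonym set is an ordered pair and a
# word's position in its pair (0 or 1) is the bit it encodes.
SYNONYM_SETS = [("big", "large"), ("problem", "issue"), ("hard", "difficult"), ("task", "job")]
CODE = {word: (set_id, bit)
        for set_id, pair in enumerate(SYNONYM_SETS)
        for bit, word in enumerate(pair)}


def embed(cover: str, bits: str) -> str:
    """Sender: replace each dictionary word by the member of its set encoding the next bit."""
    tokens = cover.split()
    slots = [i for i, tok in enumerate(tokens) if tok in CODE]
    if len(bits) > len(slots):
        raise ValueError(f"cover text has only {len(slots)} substitution slots")
    for i, bit in zip(slots, bits):
        set_id, _ = CODE[tokens[i]]
        tokens[i] = SYNONYM_SETS[set_id][int(bit)]
    return " ".join(tokens)


def extract(stego: str, n_bits: int) -> str:
    """Receiver: read the bit of each dictionary word; only the dictionary is needed."""
    bits = [str(CODE[tok][1]) for tok in stego.split() if tok in CODE]
    return "".join(bits[:n_bits])


stego = embed("it is a big problem and a hard task", "0110")
print(stego)
print(extract(stego, 4))
```

    The receiver needs the shared dictionary and the message length, but never the cover text. The length matters because dictionary words after the last embedded bit still decode to some bit.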

  • Linguistic Steganography Framework

    Trade-off between imperceptibility and payload

  • Possible Linguistic Transformations

    Lexical (e.g. synonym substitution)

    Syntactic (e.g. passive/active transformation)

    Semantic/pragmatic

    Can the transformations be applied reliably and often?

  • Simple Lexical Stegosystem (Winstein, 98)

  • Sense Ambiguity Problem

    Decoding ambiguity: addressed with a novel form of vertex coding (later in the talk)

  • Security Simplifications

    Assuming that the adversary is not a computer (i.e. ignoring the possibility of steganalysis)

    Assuming that the adversary is passive rather than active

    Ignoring the source of the cover text

    Assuming that the adversary does not know the steganographic channel (contrary to Kerckhoffs' principle)

    but opportunities for secret shared keys

  • Lexical Substitution Problem

    The idea is a powerful one → The idea is a potent one

    This computer is powerful → This computer is potent

    Some synonyms are not acceptable in context

    need to check whether a synonym is applicable in a given context (to ensure imperceptibility)

  • Checking Synonym Applicability

    Use the Google n-gram corpus to see if the synonym in context has been used before (and frequently)

    Now a fairly standard NLP technique which has been used for many similar lexical disambiguation tasks

  • Paradigm Shift in NLP

    30 years ago statistical, corpus-based methods began to appear

    Now the dominant approach for all NLP problems (e.g. Google Translate)

  • The Google n-gram Corpus

    the part that you were 103

    the part that you will 198

    the part that you wish 171

    the part that you would 867

    the part that your read 45

    the part the Riverside County 51

    the part the United States 72

    the part the detective was 63

    the part the next day 95

  • Contextual Check

    He was bright and independent and proud → He was clever and independent and proud

    f2 = 302,492: "was clever" 40,726; "clever and" 261,766

    f3 = 8,072: "He was clever" 1,798; "was clever and" 6,188; "clever and independent" 86

    f4 = 343: "He was clever and" 343; "was clever and independent" 0; "clever and independent and" 0

    f5 = 0: "He was clever and independent" 0; "was clever and independent and" 0; "clever and independent and proud" 0

  • Contextual Check

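    He was bright and independent and proud → He was clever and independent and proud

    $$\mathrm{Count}(w) = \sum_{n=2}^{5} \log f_n$$

    where f_n is the total count of the n-grams of length n that contain w, and max is the highest Count for any synonym

    $$\mathrm{Score}(w) = \frac{\mathrm{Count}(w)}{\max}$$

    If Score(w) ≥ threshold, w passes the contextual check

    $$\mathrm{Count}(\textit{clever}) = \log f_2 + \log f_3 + \log f_4 + \log f_5 = 28, \qquad \mathrm{Score}(\textit{clever}) = 28/\max = 0.9$$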
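    The contextual check can be sketched in Python, using the n-gram counts from the "He was clever ..." example in place of the Google n-gram corpus. Assumptions: natural logarithms, n running from 2 to 5, and a zero f_n contributing nothing to the sum (log 0 is otherwise undefined). The names `f_n`, `count_w` and `passes` are illustrative.

```python
import math

# n-gram counts copied from the "He was clever ..." example; any n-gram
# not listed is treated as unseen (count 0).
NGRAM_COUNTS = {
    "was clever": 40_726,
    "clever and": 261_766,
    "He was clever": 1_798,
    "was clever and": 6_188,
    "clever and independent": 86,
    "He was clever and": 343,
}


def f_n(tokens, i, counts, n):
    """Sum of the counts of every n-gram in the sentence that covers position i."""
    total = 0
    for start in range(i - n + 1, i + 1):
        if start >= 0 and start + n <= len(tokens):
            total += counts.get(" ".join(tokens[start:start + n]), 0)
    return total


def count_w(tokens, i, counts, ns=range(2, 6)):
    """Count(w) = sum over n of log f_n (natural log; zero counts add nothing)."""
    freqs = (f_n(tokens, i, counts, n) for n in ns)
    return sum(math.log(f) for f in freqs if f > 0)


def passes(count_by_candidate, w, threshold):
    """Score(w) = Count(w) / max Count over the candidates; pass if >= threshold."""
    return count_by_candidate[w] / max(count_by_candidate.values()) >= threshold


tokens = "He was clever and independent and proud".split()
print([f_n(tokens, 2, NGRAM_COUNTS, n) for n in range(2, 6)])
print(round(count_w(tokens, 2, NGRAM_COUNTS), 2))
```

    This reproduces f2 to f5 as 302,492, 8,072, 343 and 0, and gives Count(clever) of about 27.45 with natural logarithms; the slide's 28 appears to be a rounded figure. `passes` expects the Count of every candidate in the synonym set, since max is taken over that set.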

  • Extensions to the Contextual Check

    Weight some n-grams more heavily than others

    Use wild-cards for unknown words

    . . .

    difficult to beat the basic system

  • Evaluation

    Automatic evaluation using data from the Lexical Substitution Task (McCarthy and Navigli, SemEval 2007)

    Manual human evaluation of naturalness of the modified text

    more direct evaluation of imperceptibility for the steganography application

    We use WordNet as the source of possible substitutes

  • WordNet

    WordNet 3.1 search results for "newspaper" (Noun):

    S: (n) newspaper, paper (a daily or weekly publication on folded sheets; contains news and articles and advertisements) "he read his newspaper at breakfast"

    S: (n) newspaper, paper, newspaper publisher (a business firm that publishes newspapers) "Murdoch owns many newspapers"

    S: (n) newspaper, paper (the physical object that is the product of a newspaper publisher) "when it began to rain he covered his head with a newspaper"

    S: (n) newspaper, newsprint (cheap paper made from wood pulp and used for printing newspapers) "they used bales of newspaper every day"

    Source: http://wordnetweb.princeton.edu/perl/webwn?s=newspaper&...

  • Human Evaluation

    Evaluate imperceptibility by asking humans to rate the naturalness of sentences (on a scale of 1-4), in 3 conditions:

    sentence unchanged

    sentence changed by our system (with a threshold of 0.95)

    sentence changed by random choice of target word and random choice of substitute from the target word's synsets (baseline)

    Sentences are from Robert Peston's BBC blog

    On average around 2 changes are made per sentence

  • Example Sentences

    ORIG: Apart from anything else, big companies have the size and muscle to derive gains by forcing their suppliers to cut prices (as shown by the furore highlighted in yesterday's Telegraph over Serco's demand - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don't have that opportunity.

    SYSTEM: Apart from anything else, large companies have the size and muscle to derive gains by pushing their suppliers to cut prices (as evidenced by the furore highlighted in yesterday's Telegraph over Serco's need - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don't have that opportunity.

  • Example Sentences

    ORIG: Apart from anything else, big companies have the size and muscle to derive gains by forcing their suppliers to cut prices (as shown by the furore highlighted in yesterday's Telegraph over Serco's demand - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don't have that opportunity.

    RANDOM: Apart from anything else, self-aggrandising companies have the size and muscle to derive gains by forcing their suppliers to foreshorten prices (as shown by the furore highlighted in yesterday's Telegraph over Serco's demand - now withdrawn - for a 2.5% rebate from its suppliers); smaller businesses lower down the food chain simply don't birth that chance.

  • Experimental Design

    60 sentences

    30 judges

    Latin square design with 3 groups of 10 judges

    People in the same group receive the 60 sentences under the same set of conditions

    Each judge sees all 60 sentences, but sees each sentence only once, in one of the three conditions
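    One standard way to realise such a design is a cyclic Latin square. The sketch below assumes the three conditions are simply rotated by one step from group to group; the condition labels and the rotation rule are illustrative, not necessarily the exact assignment used in the experiment.

```python
from collections import Counter

N_SENTENCES = 60
N_GROUPS = 3                                   # 3 groups of 10 judges each
CONDITIONS = ("original", "system", "random")  # hypothetical labels


def condition(group: int, sentence: int) -> str:
    """Cyclic Latin square: shift the condition by one step for each group."""
    return CONDITIONS[(sentence + group) % len(CONDITIONS)]


# Every judge in a group sees the same (sentence, condition) pairs.
for g in range(N_GROUPS):
    shown = Counter(condition(g, s) for s in range(N_SENTENCES))
    print(f"group {g}: {dict(shown)}")
```

    Each group then rates 20 sentences in each condition, and across the three groups every sentence appears once in every condition, so differences between sentences are balanced across conditions.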

  • Annotation Guidelines

  • Annotation Example

  • Results

    Average score for the original sentences is 3.67 (on a scale of 1-4)

    Average score for the sentences modified by our system is 3.33

    Average score for the randomly changed sentences is 2.82

    Differences between the systems are highly significant (Wilcoxon signed-rank test)

    Payload is a few bits per sentence for this level of imperceptibility

    Threshold controls tradeoff between payload and imperceptibility

  • Sense Ambiguity Problem

    Assigning different codewords to different senses of "composition" leads to a decoding ambiguity

  • Sense Ambiguity Problem

    Represent synonymy relation in a graph

    words are nodes in the graph

    edges represent membership of the same synset

  • Vertex Colour Coding

    Vertex Colouring: a labelling of the graph's nodes with colours (codes) so that no two adjacent nodes share the same colour

  • Vertex Colour Coding Algorithm

    Assume synsets have no more than 4 words

    99.6% of synsets have fewer than 8 words

    Task is to maximise the number of nodes (words) in the graph whilst assigning a unique codeword to each node

    We propose a greedy algorithm to perform the colouring: add edges and codes, assuming some ordering of the words, so that no two adjacent nodes share the same code
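    The colouring step can be illustrated with a textbook first-fit greedy colouring. This is a sketch, not necessarily the authors' algorithm: words are nodes, two words are adjacent when they share a synset, and each word, taken in first-seen order, receives the smallest code not already used by a neighbour. The budget of 4 codes is an assumption tied to the 4-word limit above, a word that cannot be coloured is simply dropped, and the input is the four WordNet noun senses of "newspaper" shown earlier in the talk.

```python
from collections import defaultdict

# The four WordNet 3.1 noun synsets for "newspaper", as shown on the WordNet slide.
SYNSETS = [
    ["newspaper", "paper"],
    ["newspaper", "paper", "newspaper publisher"],
    ["newspaper", "paper"],
    ["newspaper", "newsprint"],
]


def greedy_colour(synsets, n_colours=4):
    """Give each word the smallest code not used by any word sharing a synset with it.

    Words that cannot be coloured within n_colours are left out (not used for hiding).
    """
    neighbours = defaultdict(set)
    order = []  # words in first-seen order
    for synset in synsets:
        for w in synset:
            if w not in neighbours:
                order.append(w)
            neighbours[w].update(v for v in synset if v != w)
    colours = {}
    for w in order:
        taken = {colours[v] for v in neighbours[w] if v in colours}
        free = [c for c in range(n_colours) if c not in taken]
        if free:
            colours[w] = free[0]
    return colours


print(greedy_colour(SYNSETS))
```

    On this input the codes are newspaper 0, paper 1, newspaper publisher 2 and newsprint 1. Each word carries a single code whatever its sense, so the receiver can decode without deciding which synset was meant, while codes within any one synset stay distinct, so the sender can still choose among them.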

  • Vertex (Colour) Coding Algorithm

  • Vertex Coding Algorithm

  • The Stego Lexical Substitution System

  • Deletion as the Transformation

    Words can often be deleted without affecting the meaning (especially adjectives)

    Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland? A terrible change came over the woman's face as I asked the question. It was some seconds before she could get out the single word Yes and when it did come it was in a husky, unnatural tone.

  • Deletion as the Transformation

    How can the receiver detect deleted words in the stego text?

    One possibility is to have more than one stego text, with different words deleted in each

    More than one stego text leads to the idea of secret sharing

  • Secret Sharing

    There are two receivers, each receiving a different version of the cover text

    Only when the receivers compare texts can the secret message be revealed

  • A Secret Sharing Scheme

    Secret bits: 101

    Text: Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland? A terrible change came over the woman's face as I asked the question. It was some seconds before she could get out the single word Yes and when it did come it was in a husky, unnatural tone.

  • A Secret Sharing Scheme

    Embed 1st bit: 1

    Target adj: mysterious

    Share0: Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland? A terrible change came over the woman's face as I asked the question. It was some seconds before she could get out the single word Yes and when it did come it was in a husky, unnatural tone.

    Share1: Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland? A terrible change came over the woman's face as I asked the question. It was some seconds before she could get out the single word Yes and when it did come it was in a husky, unnatural tone.

  • A Secret Sharing Scheme

    Embed 2nd bit: 0

    Target adj: terrible

    Share0: Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland? A terrible change came over the woman's face as I asked the question. It was some seconds before she could get out the single word Yes and when it did come it was in a husky, unnatural tone.

    Share1: Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland? A change came over the woman's face as I asked the question. It was some seconds before she could get out the single word Yes and when it did come it was in a husky, unnatural tone.

  • A Secret Sharing Scheme

    Embed 3rd bit: 1

    Target adj: single

    Share0: Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland? A terrible change came over the woman's face as I asked the question. It was some seconds before she could get out the word Yes and when it did come it was in a husky, unnatural tone.

    Share1: Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland? A change came over the woman's face as I asked the question. It was some seconds before she could get out the single word Yes and when it did come it was in a husky, unnatural tone.

  • A Secret Sharing Scheme

    Read off bits: 101

    Share0: Have you heard of the death of your late boarder Mr. Enoch J. Drebber, of Cleveland? A terrible change came over the woman's face as I asked the question. It was some seconds before she could get out the word Yes and when it did come it was in a husky, unnatural tone.

    Share1: Have you heard of the mysterious death of your late boarder Mr. Enoch J. Drebber, of Cleveland? A change came over the woman's face as I asked the question. It was some seconds before she could get out the single word Yes and when it did come it was in a husky, unnatural tone.
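    The scheme can be sketched in Python, assuming whitespace tokenisation, target adjectives given in text order and never adjacent to one another, and Python's `difflib.SequenceMatcher` to align the two shares when reading the bits off. `make_shares` and `reveal` are hypothetical names: bit 1 deletes the target from Share0, bit 0 deletes it from Share1.

```python
import difflib

# The cover passage from the slides (apostrophe restored in "woman's").
TEXT = (
    "Have you heard of the mysterious death of your late boarder Mr. Enoch J. "
    "Drebber, of Cleveland? A terrible change came over the woman's face as I "
    "asked the question. It was some seconds before she could get out the single "
    "word Yes and when it did come it was in a husky, unnatural tone."
)


def make_shares(text, targets, bits):
    """Bit 1 deletes the target adjective from share0; bit 0 deletes it from share1."""
    if len(targets) != len(bits) or set(bits) - {"0", "1"}:
        raise ValueError("need one target adjective per bit, and bits must be 0/1")
    tokens = text.split()
    drop0, drop1 = set(), set()  # token positions removed from share0 / share1
    for adjective, bit in zip(targets, bits):
        pos = tokens.index(adjective)  # first occurrence of the adjective
        (drop0 if bit == "1" else drop1).add(pos)
    share0 = " ".join(t for k, t in enumerate(tokens) if k not in drop0)
    share1 = " ".join(t for k, t in enumerate(tokens) if k not in drop1)
    return share0, share1


def reveal(share0, share1):
    """Align the shares: a word missing from share0 reads as 1, from share1 as 0."""
    a, b = share0.split(), share1.split()
    bits = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":      # only in share1, so it was deleted from share0
            bits.append("1" * (j2 - j1))
        elif tag == "delete":    # only in share0, so it was deleted from share1
            bits.append("0" * (i2 - i1))
        elif tag == "replace":
            raise ValueError("ambiguous alignment: targets should not be adjacent")
    return "".join(bits)


s0, s1 = make_shares(TEXT, ["mysterious", "terrible", "single"], "101")
print(s0)
print(s1)
print(reveal(s0, s1))
```

    With the targets mysterious, terrible and single and the bits 101, the two shares match the ones above and `reveal` returns 101. Neither share on its own shows which adjectives were removed; the bits only appear when the two are compared.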

  • Adjective Deletion Data

    Pleonasm data for pilot study

    free gift, cold ice, final end, . . .

    Full study used human annotated data

    1,200 sentences from the BNC marked for naturalness (yes/no)

  • Example Judgements (YES)

    Judgement     Example sentence

    Deletable     He was putting on his heavy overcoat, asked again casually if he could have a look at the glass.

    Deletable     We are seeking to find out what local people want, because they must own the work themselves.

    Deletable     We are just at the beginning of the worldwide epidemic and the situation is still very unstable.

  • Example Judgements (NO)

    Judgement     Example sentence

    Undeletable   He asserted that a modern artist should be in tune with his times, careful to avoid hackneyed subjects.

    Undeletable   With various groups suggesting police complicity in township violence, many blacks will find little security in a larger police force.

    Undeletable   There can be little doubt that such examples represent the tip of an iceberg.

  • Data Collection

    30 native English speakers

    1,200 sentences, with 300 annotated by 3 judges and the rest annotated by one

    Fleiss' kappa was 0.49 (moderate agreement)

    700 training; 200 development; 300 test

    Ratio of deletable:undeletable was roughly 2:1
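    Fleiss' kappa compares the observed agreement between judges with the agreement expected by chance. The function below is a minimal sketch of the standard formula; the four-item table is made up for illustration and is not the annotated data.

```python
def fleiss_kappa(table):
    """Fleiss' kappa; table[i][j] = number of judges who put item i in category j.

    Every item must be rated by the same number of judges (at least two).
    """
    n_items, n_cats = len(table), len(table[0])
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # Observed agreement: fraction of agreeing judge pairs per item, averaged.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in table
    ) / n_items
    # Chance agreement from the overall proportion of each category.
    p_cat = [sum(row[j] for row in table) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 4 sentences, 3 judges, columns = (deletable, undeletable).
TOY = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(fleiss_kappa(TOY))
```

    For the toy table (two unanimous items and two 2-1 splits, with both categories equally frequent) the result is 1/3. The 300 triply-annotated sentences would form a table of the same shape, with 300 rows.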

  • Deletion Classifier

    SVM classifier with a variety of features, e.g.:

    Google n-gram count ratios before and after deletion

    lexical association measures between noun and adjective, e.g. PMI

    noun and adjective entropy measures

    . . .
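    Two of these features can be sketched directly. PMI compares how often an adjective and a noun occur together with how often they would if they were independent. The count-ratio feature is shown in one plausible, add-one-smoothed form (count with the adjective deleted over count with it kept); this definition and all the counts below are hypothetical, and the classifier's exact features may differ.

```python
import math


def pmi(pair_count: int, adj_count: int, noun_count: int, total: int) -> float:
    """Pointwise mutual information log2[p(adj, noun) / (p(adj) p(noun))],
    with each probability estimated as count / total."""
    return math.log2(pair_count * total / (adj_count * noun_count))


def deletion_count_ratio(count_after: int, count_before: int) -> float:
    """Assumed form of the n-gram ratio feature: count of the n-gram with the
    adjective deleted over the count with it kept, add-one smoothed."""
    return (count_after + 1) / (count_before + 1)


# Hypothetical counts, for illustration only.
print(pmi(pair_count=10, adj_count=100, noun_count=50, total=10_000))
print(deletion_count_ratio(count_after=99, count_before=199))
```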

  • Full Classifier Results on Test Set

    Threshold   Precision (%)   Recall (%)
    0.69        70.1            74.5
    0.70        69.8            73.4
    0.71        70.7            72.9
    0.72        72.0            70.8
    0.73        70.8            65.6
    0.74        71.1            58.9
    0.75        74.8            41.7
    0.76        85.0            26.6
    0.77        90.9            15.6
    0.78        100             5.2
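    A higher threshold buys precision (fewer unnatural deletions) at the cost of recall (fewer deletion slots, hence lower payload). One way to pick an operating point from the table is to take the highest-recall threshold that meets a required precision. A small sketch; `best_threshold` and the 85% target are illustrative choices.

```python
# (threshold, precision %, recall %) copied from the test-set table above
RESULTS = [
    (0.69, 70.1, 74.5), (0.70, 69.8, 73.4), (0.71, 70.7, 72.9), (0.72, 72.0, 70.8),
    (0.73, 70.8, 65.6), (0.74, 71.1, 58.9), (0.75, 74.8, 41.7), (0.76, 85.0, 26.6),
    (0.77, 90.9, 15.6), (0.78, 100.0, 5.2),
]


def best_threshold(results, min_precision):
    """Return the (threshold, precision, recall) row with the highest recall
    among rows whose precision is at least min_precision, or None."""
    eligible = [row for row in results if row[1] >= min_precision]
    return max(eligible, key=lambda row: row[2]) if eligible else None


print(best_threshold(RESULTS, 85.0))
```

    With a required precision of 85% this picks threshold 0.76 (precision 85.0%, recall 26.6%).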

  • References

    Ching-Yun Chang and Stephen Clark. Practical Linguistic Steganography using Contextual Synonym Substitution and a Novel Vertex Coding Method. To appear in Computational Linguistics.

    Ching-Yun Chang and Stephen Clark. Adjective Deletion for Linguistic Steganography and Secret Sharing. Proceedings of the 24th International Conference on Computational Linguistics (COLING-12), Mumbai, India, 2012.

    Ching-Yun Chang and Stephen Clark. The Secrets in the Word Order: Text-to-Text Generation for Linguistic Steganography. Proceedings of the 24th International Conference on Computational Linguistics (COLING-12), Mumbai, India, 2012.

    Ching-Yun Chang and Stephen Clark. Linguistic Steganography using Automatically Generated Paraphrases. Proceedings of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT-10), Los Angeles, 2010.
