Three Novel Algorithms for Hiding Data in PDF Files Based on Incremental Updates Li Lei

Dec 29, 2015


Three Novel Algorithms for Hiding Data in PDF Files Based on Incremental Updates

Li Lei

School of Information Science and Technology

Sun Yat-Sen University

Contents

1 Introduction

2 The Structure of PDF Files

3 Incremental Updates

4 Proposed Algorithms

5 Experimental Results

6 Future Work


• Introduction

PDF (Portable Document Format)

A widely used electronic document format

High printing quality

Cross-platform applicability

Device-independence

• Introduction

Hiding information in PDF files

Secret message transmission

Marking the source and transmission path

• Introduction

Existing algorithms

First category

Slightly varying the line, word, or character spacing, or certain other display attributes. [2,3,4,5,6,7]

Obvious defects: the page display is disturbed, and information security is relatively low.

Second category

Adding or changing the content of PDF file streams. [8,9,10]

Disadvantages: to some degree, these methods cannot guarantee large capacity, high security, and robustness.

• The structure of PDF file

File structure (Physical structure)

Header

Body

Cross-reference table

Trailer

The file structure includes the header, the body (which contains many objects), the cross-reference table (which contains information about the indirect objects in the file), and the trailer.

It determines how the objects are stored in a PDF file.
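The four parts listed above can be made concrete with a small sketch. The Python below assembles a toy file (two hand-written objects that are assumptions for illustration, not taken from the slides) and then reads the cross-reference table back to check that each offset points at its object. The helper names `build_minimal_pdf` and `read_xref` are hypothetical, and the reader assumes a single cross-reference subsection.

```python
import re

def build_minimal_pdf() -> bytes:
    """Assemble header, body, cross-reference table and trailer for a toy file."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [] /Count 0 >>",
    ]
    out = bytearray(b"%PDF-1.4\n")                      # header
    offsets = []
    for num, content in enumerate(objects, start=1):    # body: indirect objects
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (num, content)
    xref_pos = len(out)                                 # cross-reference table
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"                     # entry 0 is always free
    for off in offsets:
        out += b"%010d 00000 n \n" % off                # 10-digit byte offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos       # trailer points at xref
    return bytes(out)

def read_xref(pdf: bytes) -> dict:
    """Return {object number: byte offset} for the in-use entries."""
    start = int(re.search(rb"startxref\s+(\d+)\s+%%EOF", pdf).group(1))
    lines = pdf[start:].split(b"\n")
    first, count = map(int, lines[1].split())           # subsection header
    table = {}
    for i in range(count):
        offset, _gen, kind = lines[2 + i].split()
        if kind == b"n":                                # skip free entries
            table[first + i] = int(offset)
    return table

pdf = build_minimal_pdf()
table = read_xref(pdf)
```

Each offset in `table` is the byte position of the corresponding `N 0 obj` line, which is how a reader locates indirect objects without scanning the whole body.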

• The structure of PDF file

Document structure (Logical structure)

A PDF document can be regarded as a hierarchy of objects contained in the body section of a PDF file.

The document structure of a PDF file is organized as an object tree with the Catalog at the top and five subtrees beneath it: Page tree, Outline hierarchy, Article threads, Named destinations, and Interactive form.

• The structure of PDF file

Object

An object is the basic element in PDF files. PDF supports eight basic types of objects: Boolean Object, Numeric Object, String Object, Name Object, Array Object, Dictionary Object, Stream Object and Null Object.

Objects may be labeled so that they can be referred to by other objects. A labeled object is called an indirect object.

• The structure of PDF file

Content stream

The content streams, which belong to the Page tree, contain almost all of the information about a PDF file's text contents and display attributes. Each page's contents are divided into blocks and saved in dictionary objects named Contents objects.

Each Contents object contains text objects and a text state. A text object describes the text contents, and the text state is a collection of page display attributes.

• Incremental updates

Layout of an incrementally updated PDF file:

Initial structure of PDF file: Header, Original body, Original cross-reference section, Original trailer

Incremental update 1: Updated body 1, Cross-reference section 1, Updated trailer 1

...

Incremental update n: Updated body n, Cross-reference section n, Updated trailer n

The contents of a PDF file can be updated incrementally without rewriting the entire file. Changes are appended to the end of the file, leaving its original contents intact.

In an incremental update, any new or changed objects are appended to the file and constitute the updated body. A cross-reference section and a new trailer are then appended after it.
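A minimal sketch of this append-only behavior, in Python, is given here. The helpers `make_base_pdf` and `append_update` are hypothetical names, the toy file contents are assumptions for illustration, and the `/Root 1 0 R` reference is hard-coded to match that toy file. The new trailer carries a `/Prev` key holding the byte offset of the previous cross-reference section, which is how readers chain back to the original.

```python
import re

def make_base_pdf() -> bytes:
    """Toy original file: header, two objects, xref table, trailer."""
    objs = [b"<< /Type /Catalog /Pages 2 0 R >>",
            b"<< /Type /Pages /Kids [] /Count 0 >>"]
    out, offsets = bytearray(b"%PDF-1.4\n"), []
    for num, content in enumerate(objs, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (num, content)
    xref_pos = len(out)
    out += b"xref\n0 3\n0000000000 65535 f \n"
    out += b"".join(b"%010d 00000 n \n" % off for off in offsets)
    out += b"trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

def append_update(pdf: bytes, obj_num: int, content: bytes) -> bytes:
    """Append one new or changed object as an incremental update."""
    prev_xref = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    old_size = int(re.findall(rb"/Size\s+(\d+)", pdf)[-1])
    out = bytearray(pdf)                       # original bytes are kept as-is
    obj_offset = len(out)
    out += b"%d 0 obj\n%s\nendobj\n" % (obj_num, content)            # updated body
    xref_pos = len(out)
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)   # new xref section
    out += b"trailer\n<< /Size %d /Root 1 0 R /Prev %d >>\n" % (
        max(old_size, obj_num + 1), prev_xref)                       # new trailer
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

base = make_base_pdf()
# Object 2 is redefined; the earlier definition stays in the file untouched.
updated = append_update(base, 2, b"<< /Type /Pages /Kids [] /Count 0 /Note (v2) >>")
```

Because the earlier definition of object 2 remains in the file while readers use the latest one, the original bytes are still recoverable. The proposed algorithms rely on this property.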

• Incremental updates

When do incremental updates occur?

Right-clicking the file and modifying its properties

Saving editing operations with "Save"

• Proposed algorithms

1. A compensated version of modifying display attributes

The text state in a Contents object indicates the attributes of text display. Every attribute is marked by an operator keyword, such as Char Space: Tc, Word Space: Tw, Scale: Tz, Leading: TL, Font size: Tf, Render: Tr, Rise: Ts, etc. The text state values set through these operators in the content stream can be modified to hide information.

• Proposed algorithms

1. A compensated version of modifying display attributes

However, algorithms that modify these attributes affect the display of the PDF file.

• Proposed algorithms

1. A compensated version of modifying display attributes

We can compensate for the effect of data hiding by using incremental updates of PDF files: after the text states of the Contents objects are altered to embed information, the original Contents objects are written into the updated body.

Embedding flow (inputs: PDF file, embedded data, stego-key, chaotic sequence):

1) Read the file stream.

2) Get all Contents objects and decode them.

3) Check whether the text state space is sufficient. If not, report "Embedding failed".

4) If yes, modify the parity of the lowest-order digit of the text state values to carry the embedded data.

5) Rewrite the original Contents objects by incremental update.

6) Output the stego-file and the stego-key.
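The parity step can be illustrated with a short Python sketch. It is not the author's implementation: it assumes the content stream is already decoded to text, uses only the Tc and Tw operands as carriers, assumes operands are written with a decimal point, and leaves out the stego-key and chaotic sequence. The names `embed` and `extract` and the sample stream are hypothetical.

```python
import re

# Matches a decimal operand followed by the Tc or Tw operator keyword.
OPERAND = re.compile(r"(-?\d+\.\d+)(\s+(?:Tc|Tw))\b")

def embed(stream: str, bits: list) -> str:
    """Force the parity of each operand's last digit to equal one payload bit."""
    if len(OPERAND.findall(stream)) < len(bits):
        raise ValueError("Embedding failed: text state space is not enough")
    pending = iter(bits)

    def replace(match):
        bit = next(pending, None)
        if bit is None:                      # payload exhausted, keep as-is
            return match.group(0)
        number = match.group(1)
        last = (int(number[-1]) & ~1) | bit  # clear the low bit, then set it
        return number[:-1] + str(last) + match.group(2)

    return OPERAND.sub(replace, stream)

def extract(stream: str, count: int) -> list:
    """Read back the parity of the last digit of the first `count` operands."""
    return [int(m.group(1)[-1]) & 1 for m in OPERAND.finditer(stream)][:count]

SAMPLE = "BT /F1 12 Tf 0.120 Tc 1.500 Tw (Hello) Tj 0.250 Tc (World) Tj ET"
stego = embed(SAMPLE, [1, 0, 1])
recovered = extract(stego, 3)
```

Each operand changes by at most one unit in its last decimal place. In the scheme described on the slide, the original Contents object is then appended by incremental update, so a viewer renders the unmodified values while the altered ones remain in the earlier body.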

• Proposed algorithms

2. Algorithms based on new body and cross-reference section

① In the updated body, the actual embedding carrier is the indirect object. Considering the complexity of inserting objects, content security, capacity and other factors, we select the stream object as the embedding carrier.

② Select the new cross-reference section as the covert information carrier. We can embed information by controlling the 10-digit byte offset in each entry of the cross-reference section, using the difference between the offsets of adjacent entries to represent the covert information.
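The offset-difference idea in ② can be sketched in Python as follows. This is an illustration under stated assumptions, not the method from the slides: `MIN_GAP` is a hypothetical fixed size reserved for each added object, the payload is split into equal-width binary segments, the bit string length is assumed to be a multiple of that width, and no scrambling with a stego-key is applied.

```python
MIN_GAP = 64  # hypothetical minimum number of bytes per added object

def bits_to_segments(bits: str, width: int) -> list:
    """Split a bit string into decimal segment values of `width` bits each."""
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]

def offsets_for(segments: list, first_offset: int) -> list:
    """n segments need n + 1 offsets: each difference carries one segment."""
    offsets = [first_offset]
    for value in segments:
        offsets.append(offsets[-1] + MIN_GAP + value)
    return offsets

def xref_entries(offsets: list) -> list:
    """Format 20-byte in-use entries: 10-digit offset, 5-digit generation."""
    return ["%010d 00000 n \n" % off for off in offsets]

def decode(entries: list) -> list:
    """Recover segment values from differences of adjacent entry offsets."""
    offs = [int(entry[:10]) for entry in entries]
    return [b - a - MIN_GAP for a, b in zip(offs, offs[1:])]

segments = bits_to_segments("1011001110001111", 4)
entries = xref_entries(offsets_for(segments, first_offset=1000))
recovered = decode(entries)
```

In a real file each difference must equal the actual byte length of the object written between the two offsets, so the added objects would be padded to `MIN_GAP + value` bytes.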

• Proposed algorithms

2. Algorithms based on new body and cross-reference section

Embedding flow (inputs: PDF file, embedded data, stego-key):

1) Read the file stream.

2) Apply chaotic scrambling to the ordered binary segment sequence.

3) With stream objects as the carrier: write new stream objects, where each stream content is a compressed embedded segment. With the cross-reference section as the carrier: add n+1 stream objects with the specified lengths, based on the ordered decimal sequence.

4) Add the new cross-reference section.

5) Output the stego-PDF file and the stego-key.


•The experimental results and analysis

Data Embedding Capacity

User interface:

• The experimental results and analysis

Perceptual transparency property

As the effects chart shows, embedding data caused no change in the display of the cover file.

• The experimental results and analysis

The robustness to reading and editing operations

1. Robustness to annotating and marking operations

We used Adobe Acrobat 9 Pro to annotate and mark the embedded PDF file in various ways, and then tried to extract the covert information from it. The experimental result shows that the extraction accuracy is 100%.

• The experimental results and analysis

The robustness to reading and editing operations

2. Robustness to interactive form editing

(a) is the stego file without any editing, and (b) is the file obtained after writing some content into (a). We tried to extract the covert information from (b), and the experimental result shows that the extraction accuracy is 100%.

• The experimental results and analysis

Increase in the size of carrier file

1. Algorithm 1 (Embed 128 bits)

No.   File size   Page number   Size after embedding   Size increase percentage
1     149 KB      4             153 KB                 2.7%
2     237 KB      4             245 KB                 3.4%
3     271 KB      4             272 KB                 0.4%
4     298 KB      4             306 KB                 2.7%
5     303 KB      6             304 KB                 0.3%
6     349 KB      7             350 KB                 0.3%
7     413 KB      2             415 KB                 0.5%
8     543 KB      5             544 KB                 0.2%
9     663 KB      4             664 KB                 0.15%
10    801 KB      10            803 KB                 0.2%

Rewriting a Contents object by incremental update increases the size of the original file by 1 to 8 KB, depending on the size of the original Contents object. The experimental results show that the average file size increase is around 1%.
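The percentages and the roughly 1% average can be rechecked with a few lines of Python. The file sizes are copied from the table for Algorithm 1; the variable names are illustrative only. Because the sizes are rounded to whole kilobytes, the recomputed percentages can differ slightly from the ones reported.

```python
# (original size in KB, size after embedding in KB) for the ten test files
sizes_kb = [
    (149, 153), (237, 245), (271, 272), (298, 306), (303, 304),
    (349, 350), (413, 415), (543, 544), (663, 664), (801, 803),
]

# Relative size increase of each file, in percent
increases = [100.0 * (after - before) / before for before, after in sizes_kb]

# Absolute growth in KB and the mean relative increase
growth_kb = [after - before for before, after in sizes_kb]
average_increase = sum(increases) / len(increases)

for index, pct in enumerate(increases, start=1):
    print(f"file {index}: {pct:.2f}%")
print(f"average: {average_increase:.2f}%")
```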

• The experimental results and analysis

Increase in the size of carrier file

2. Algorithms 2 and 3 (Embed 128 bits)

The size increase caused by algorithm 2 is independent of the original file. Using 4 objects to embed 128 bits adds no more than 1 KB to the original PDF file (about 0.5% for a 200 KB file).

The size increase caused by algorithm 3 is also independent of the original file. Using 22 cross-reference entries (which requires adding 22 new objects) to embed 128 bits, the maximal size increase is around 4 to 5 KB (about 2.5% for a 200 KB file).

• The experimental results and analysis

Performance Comparison

Methods compared: (A) incremental updates methods, (B) wbStego 4.3, (C) methods based on varying display attributes, (D) methods based on changing entries' order.

Performance               (A)            (B)                 (C)                 (D)
Perceptual transparency   Not changed    Not changed         Slightly changed    Not changed
Embedding capacity        Large enough   Small               Based on file       Based on file
Security                  High           Relatively low      Relatively low      High
Robustness                Strong         Relatively strong   Relatively strong   Medium

• Future work

Different versions of PDF files are in use at present. Some higher versions of PDF use cross-reference streams to store the information about indirect objects.

Improving compatibility across different PDF versions is the focus of our next step.

• References

1. Adobe Systems Incorporated. PDF Reference, fifth edition, version 1.6. http://www.adobe.com/devnet/pdf/pdfs/PDFReference16.pdf, 2006.

2. S. H. Low and N. F. Maxemchuk. Performance comparison of two text marking methods. IEEE Journal on Selected Areas in Communications, Vol. 16, No. 4, 1998, pp. 561-572.

3. J. T. Brassil, et al. Electronic marking and identification techniques to discourage document copying. IEEE Journal on Selected Areas in Communications, Vol. 13, No. 8, 1995, pp. 1495-1504.

4. Shangping Zhong, Tierui Chen. Information Steganography Algorithm Based on PDF Documents. Computer Engineering, Vol. 32, No. 3, Feb. 2006, pp. 161-163.

5. S. H. Low, et al. Document marking and identification using both line and word shifting. In Proceedings INFOCOM'95, Boston, MA, Apr. 1995, pp. 853-860.

6. N. F. Maxemchuk and S. H. Low. Marking text documents. In Proceedings, International Conference on Image Processing, Santa Barbara, CA, Oct. 1997, pp. 13-17.

7. E. Franz and A. Pfitzmann. Steganography secure against Cover-Stego-Attacks. 3rd International Workshop, Information Hiding 1999, 2000, pp. 29-46.

8. wbStego Studio. The steganography tool wbStego4. http://www.wbailer.com/wbstego, 2007.

9. Youji Liu, Xingming Sun, Gang Luo. A Novel Information Hiding Algorithm Based on Structure of PDF Document. Computer Engineering, Vol. 32, No. 17, Sep. 2006, pp. 230-232.

10. Xingtong Liu, Quan Zhang, Chaojing Tang, Jingjing Zhao and Jian Liu. A Steganographic Algorithm for Hiding Data in PDF Files Based on Equivalent Transformation. In Information Processing (ISIP), 2008 International Symposiums on, 23-25 May 2008, pp. 417-421.

That's all.

Thanks