International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 1, January-February (2013), pp. 284-300, © IAEME: www.iaeme.com/ijcet.asp, Journal Impact Factor (2012): 3.9580 (Calculated by GISI), www.jifactor.com

A ZERO TEXT WATERMARKING ALGORITHM BASED ON THE PROBABILISTIC PATTERNS FOR CONTENT AUTHENTICATION OF TEXT DOCUMENTS

Fahd N. Al-Wesabi 1, Adnan Z. Alsakaf 2, Kulkarni U. Vasantrao 3

1 PhD Candidate, Faculty of Engineering, SRTM University, Nanded, INDIA
2 Professor, Department of IS, Faculty of Computing and IT, UST, Sana’a, Yemen
3 Professor, Department of Comp. Sci. and Engg., SGGS Institute of Engg. and Tech., Maharashtra, INDIA

ABSTRACT

In the study of content authentication and tamper detection of digital text documents, very few techniques based on digital watermarking are available. This paper proposes a novel intelligent text zero-watermarking approach based on probabilistic patterns for content authentication and tamper detection of text documents. Algorithms for watermark generation and detection are designed on the basis of a Markov model for English text analysis. In the proposed approach, a letter-based Markov model of order one is constructed for content authentication and tamper detection of English text documents. The probabilistic pattern features of the text contents are used to generate the watermark. The watermark can later be extracted by the extraction and detection algorithm to determine whether the text document is authentic or tampered. The proposed approach is implemented in the PHP programming language. Its effectiveness and feasibility are demonstrated with experiments on six datasets of varying lengths. The tampering detection accuracy is compared with other recent approaches under random insertion, deletion and reorder attacks at multiple random locations of the experimental datasets. Results show that the proposed approach is more secure, as it always detects tampering attacks that occur randomly on the text, whether the tampering volume is low or high.

Keywords: Digital watermarking, Markov model, order one, letter level, probabilistic patterns, information hiding, content authentication, tamper detection.

I. INTRODUCTION

With the increasing use of the internet, e-commerce, and other efficient communication technologies, the copyright protection and authentication of digital contents have gained great importance. Most of these digital contents are in text form, such as email, websites, chats, e-commerce, eBooks, news, and short messaging systems/services (SMS) [1].

These text documents may be tampered with by malicious attackers, and the modified data can lead to fatally wrong decisions and transaction disputes [2].

Content authentication and tamper detection of digital image, audio, and video have been of great interest to researchers. Recently, copyright protection, content authentication, and tamper detection of text documents have also attracted their interest. During the last decade, however, research on text watermarking schemes has focused mainly on copyright protection and has given less attention to content authentication, integrity verification, and tamper detection [4].

Various techniques have been proposed for copyright protection, authentication, and tamper detection of digital text documents. Digital Watermarking (DWM) techniques are considered the most powerful solutions to most of these problems. Digital watermarking is a technology in which information such as an image, plain text, audio, video, or a combination of these can be embedded as a watermark in digital content for applications such as copyright protection, owner identification, content authentication, tamper detection, access control, and many others [2].

Traditional text watermarking techniques, such as format-based, content-based, and image-based techniques, require some transformation or modification of the contents of the text document to embed watermark information within the text. A newer technique, known as zero-watermarking, has been proposed for text documents. The main idea of zero-watermarking is that it does not change the contents of the original text document, but uses the contents of the text itself to generate the watermark information [13].

In this paper, the authors present a new zero-watermarking technique for digital text documents. This technique utilizes the probabilistic nature of the natural languages, mainly the first order Markov model.

The paper is organized as follows. Section 2 provides an overview of the previous work done on text watermarking. The proposed generation and detection algorithms are described in detail in section 3. Section 4 presents the experimental results for the various tampering attacks such as insertion, deletion and reordering. Performance of the proposed approach is evaluated by multiple text datasets. The last section concludes the paper along with directions for future work.

II. PREVIOUS WORK

Text watermarking techniques have been proposed and classified in the literature according to several features and embedding modes. We have briefly examined some traditional classifications of digital watermarking from the literature. These techniques include text-image based, content based, format based, feature based, synonym-substitution based, syntactic-structure based, acronym based, noun-verb based, and many other text watermarking algorithms that depend on various viewpoints [1][3][4].

A. Format-based Techniques

Text watermarking techniques based on format are layout dependent. Three embedding methods for text documents are proposed in [5]: line-shift coding, word-shift coding, and feature coding. In the line-shift coding technique, each even line is shifted up or down depending on the corresponding bit of the watermark. Typically, the line is shifted up if the bit is one and down otherwise. The odd lines serve as control lines and are used at decoding. Similarly, in the word-shift coding technique, words are shifted so that the inter-word spaces are modified to embed the watermark bits. Finally, in the feature coding technique, certain text features, such as the pixels of characters or the length of the end lines of characters, are altered in a specific way to encode the zeros and ones of the watermark bits. Watermark detection is performed by comparing the original and watermarked documents.

B. Content-based Techniques

Text watermarking techniques based on content are structure-based and natural-language dependent [4]. In [6][14], a syntactic approach is proposed that uses the syntactic structure of the cover text to embed watermark bits by applying syntactic transformations to the syntactic tree diagram, while preserving the natural properties of the text during the embedding process. In [18], synonym substitution is proposed, which embeds the watermark by replacing certain words with their synonyms without changing the sense and context of the text.

C. Binary Image-based Techniques

Text watermarking techniques for binary image documents depend on traditional image watermarking techniques that are based on the space domain and the transform domain, such as Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Least Significant Bit (LSB) [5]. Several text watermarking methods have been proposed that embed the watermark in a text image by shifting words and sentences right or left, or by shifting lines up or down, as described above in the section on format-based techniques [5][7].

D. Zero-based Techniques

Text watermarking techniques based on zero-watermarking depend on content features. Several such approaches designed for text documents have been proposed in the literature and are reviewed here [1][19][20][21].

The first algorithm, proposed in [19], performs tamper detection in plain text documents based on the length of words, using digital watermarking and certifying authority techniques. The second algorithm, proposed in [20], improves text authenticity by using the contents of the text to generate a watermark, which is later extracted to prove the authenticity of the text document. The third algorithm, proposed in [1], targets copyright protection of text contents based on the occurrence frequency of non-vowel ASCII characters and words. The last algorithm, proposed in [21], protects open textual digital contents from counterfeiting by inserting the watermark image logically in the text and extracting it later to prove ownership. In [22], a Chinese text zero-watermark approach is proposed based on a space model that uses the two-dimensional model coordinates at the word level and the sentence weights at the sentence level.

E. Combined-based Techniques

Text is dissimilar to an image: language has a distinct syntactical nature that makes image-oriented techniques more difficult to apply. Text should therefore be treated as text rather than as an image, and the watermarking process should be performed differently. In [23], a combined method is proposed for copyright protection that combines the best of image-based text watermarking and language-based watermarking techniques.

The above-mentioned text watermarking approaches are not appropriate for all types of text documents with respect to document size, type, and random tampering attacks, and the mechanisms they use to embed and extract the watermark may be discovered easily by attackers. Moreover, these approaches are not designed specifically to solve the problem of authentication and tamper detection of text documents; they rely on modifying the original text document to embed external information, which can be used later for purposes such as content authentication, integrity verification, tamper detection, or copyright protection. This paper proposes a novel intelligent approach for content authentication and tamper detection of English text documents in which watermark embedding and extraction are performed logically: the text is analysed and the features of its contents are extracted using a Markov model, so the original text document is not altered to embed the watermark.

III. THE PROPOSED APPROACH

This paper proposes a novel intelligent approach based on the zero-watermarking methodology, in which the original text document is not altered to embed the watermark; that is, the watermark embedding process is performed logically. The proposed approach uses a Markov model of natural language: Markov chains are used to analyse the English text and extract the probabilistic features of the contents, which are used to generate a watermark key that is stored in a watermark database. This watermark key can later be matched with the watermark generated from the attacked document to identify any tampering that may have happened to the document and to authenticate its content. This process is illustrated in figure 1.

Fig. 1. Watermark generation and detection processes

Before we explain the watermark generation and extraction processes, the next subsection presents a preliminary mathematical description of Markov models for natural language text analysis.

A. Markov Models for Text Analysis

In this subsection, we explain how to model text using a Markov chain, which is defined as a stochastic (random) model describing the way a process moves from state to state. For example, suppose that we want to analyse the sentence:

“Ahmed was beginning to get very tired of sitting by his brother on the bank, and of having nothing to do”

When we use a Markov model of order one, each character is a state by itself, and the Markov process transitions from state to state as the text is read. For instance, as the above sample text is processed, the system makes the following transitions:

"A" -> "h" -> "m" -> "e" -> "d" -> " " -> "w" -> "a" -> "s" -> " " -> "b" -> "e" -> "g" -> "i" -> "n" -> "n" -> "i" -> "n" -> "g" …

Analysing the given sentence with a first order Markov model yields figure 2, which gives each present state and all of its possible transitions.

Fig. 2. Sample text transitions.

Now if we consider state "a" (with capital letters converted to small letters), the next-state transitions are "h", "n", "n", "s", and "v". We observe that state "n" occurs twice.
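The next-state transitions of a single state can be checked with a few lines of Python. This is only an illustrative sketch, not the authors' PHP implementation; the function name `next_states` is hypothetical, and the text is lower-cased so that "A" and "a" count as the same state.

```python
from collections import Counter

SENTENCE = ("Ahmed was beginning to get very tired of sitting by his "
            "brother on the bank, and of having nothing to do")

def next_states(text, state):
    """Count the characters that directly follow `state` in `text`.

    The text is lower-cased first (assumption: 'A' and 'a' are one state).
    """
    text = text.lower()
    return Counter(nxt for cur, nxt in zip(text, text[1:]) if cur == state)

followers = next_states(SENTENCE, "a")
print(sorted(followers.items()))  # [('h', 1), ('n', 2), ('s', 1), ('v', 1)]
```

The output agrees with the observation above: state "a" is followed by "h", "s" and "v" once each and by "n" twice.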

Next we present a simple method to build the states and the Markov transition matrix M[i, j], which is the most basic part of text analysis using a Markov model.

In the proposed approach, the text considered is not limited to alphabetic characters, but includes spaces, numbers, and special characters such as [, . ; : - ? ! ]. The total number of states is 61: [English letters = 26, space = 1, integers from 0 to 9 = 10, specific symbols . ' " , ; : ? ! / \ @ $ & % * + - = > < ( ) [ ] = 24]. The entry M[i, j] keeps track of the number of times that the i-th state (character) is followed by the j-th state (character) in the text. For i = 1 to L - 1, where L is the length of the text document, let x be the i-th character in the text and y be the (i+1)-st character in the text; then increment M[x, y]. The matrix M now contains the counts of all transitions. Next we turn these counts into probabilities as follows: for each i from 1 to 61, sum the entries on the i-th row, i.e., let counter[i] = M[i,1] + M[i,2] + M[i,3] + ... + M[i,61].

Now define P[i,j] = M[i,j] / counter[i] for all pairs i, j with counter[i] > 0. This gives a matrix of probabilities: P[i,j] is the probability of making a transition from character i to character j. Hence a matrix of probabilities that describes a Markov model of order one for the given text is obtained.
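The construction of M and P can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the names `ALPHABET`, `INDEX`, `transition_counts` and `transition_probabilities` are hypothetical, characters outside the 61-state alphabet are simply dropped, and rows with no outgoing transition are left as zeros to avoid dividing by zero.

```python
import string

# 26 letters + 1 space + 10 digits + 24 symbols = 61 states, as listed in the text.
SYMBOLS = ".'\",;:?!/\\@$&%*+-=><()[]"
ALPHABET = string.ascii_lowercase + " " + string.digits + SYMBOLS
INDEX = {ch: k for k, ch in enumerate(ALPHABET)}

def transition_counts(text):
    """Return the count matrix M, where M[x][y] is how often state x is followed by y."""
    n = len(ALPHABET)
    M = [[0] * n for _ in range(n)]
    # Assumption: lower-case the text and ignore characters that are not states.
    chars = [c for c in text.lower() if c in INDEX]
    for x, y in zip(chars, chars[1:]):
        M[INDEX[x]][INDEX[y]] += 1
    return M

def transition_probabilities(M):
    """Return P with P[i][j] = M[i][j] / counter[i]; empty rows stay all zero."""
    P = []
    for row in M:
        counter = sum(row)
        P.append([c / counter if counter else 0.0 for c in row])
    return P

M = transition_counts("banana")
P = transition_probabilities(M)
print(M[INDEX["a"]][INDEX["n"]], P[INDEX["a"]][INDEX["n"]])  # 2 1.0
```

For the toy text "banana", "a" is followed by "n" twice and by nothing else, so the count is 2 and the transition probability is 1.0.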

B. Watermark Generation and Embedding Algorithm

The watermark generation and embedding algorithm takes the original text document as input. As a pre-processing step, capital letters are converted to small letters and all spaces within the text document are removed. A watermark pattern is generated as the output of this algorithm. This watermark is then stored in the watermark database along with the original text document, document identity, author name, current date and time.

This stage includes two main processes: watermark generation and watermark embedding. The embedding algorithm generates the watermark from the original text document and embeds it logically.

In the proposed watermark generation algorithm, the original text document (T) is provided by the author. Text analysis is then performed using a Markov model to compute the number of occurrences of the next-state transitions for every present state; this approach uses a Markov model of order one. A matrix of transition probabilities, derived from the number of occurrences of transitions from one state to another, is constructed according to the procedure explained in subsection A above and is given by equation (1).

MarkovMatrix[ps][ns] = P[i][j], for i, j = 1, 2, ..., n    (1)

where
• n is the total number of states,
• i refers to PS, "the present state",
• j refers to NS, "the next state",
• P[i,j] is the probability of making a transition from character i to character j.

After performing the text analysis and extracting the probability features, the watermark is obtained by identifying all the nonzero values in the above matrix. These nonzero values are sequentially concatenated to generate a watermark pattern, denoted by WMPO, as given by equation (2) and presented in figure 3.

Fig. 3. Watermark generation processes

WMPO &= MarkovMatrix[ps][ns], for all ps, ns with a nonzero value in the Markov matrix    (2)

This watermark is then stored in a watermark database along with the original text document, document identity, author name, current date and time. After the watermark is generated as sequential patterns, an MD5 message digest is computed to obtain a secure and compact form of the watermark, as given by equation (3) and presented in figure 4.

Fig. 4. Watermark before and after MD5 digesting

DWM = MD5(WMPO)    (3)

The proposed watermark generation and embedding algorithm, using a first order Markov model, is presented formally in figure 5.
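A compact Python sketch of the generation step is given below. It is an illustration under explicit assumptions, not the authors' code: the function names are hypothetical, the concatenated values are transition counts (as in the detection pseudocode of figure 7) rather than probabilities, and sorting the (present state, next state) pairs stands in for a row-by-row scan of the Markov matrix. The digest uses `hashlib.md5` from the Python standard library.

```python
import hashlib
from collections import Counter

def watermark_pattern(text):
    """Concatenate the nonzero first-order transition counts into one string.

    Assumptions: counts (not probabilities) are concatenated, and pairs are
    visited in sorted (present state, next state) order.
    """
    text = text.lower()
    counts = Counter(zip(text, text[1:]))  # only nonzero transitions are stored
    return "".join(str(counts[pair]) for pair in sorted(counts))

def watermark_digest(pattern):
    """Return the MD5 digest of the pattern as 32 hexadecimal characters."""
    return hashlib.md5(pattern.encode("utf-8")).hexdigest()

wmpo = watermark_pattern("banana")
dwm = watermark_digest(wmpo)
print(wmpo)      # 212
print(len(dwm))  # 32
```

For "banana" the nonzero transitions are a->n (2), b->a (1) and n->a (2), which gives the pattern "212". Any change in the text that alters a transition count changes the pattern and therefore the digest.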

Fig. 5: Watermark generation and embedding algorithm

C. Watermark Extraction and Detection Algorithm

The watermark detection algorithm is based on zero-watermarking, so before detection can be performed on the attacked text document TDA, the proposed algorithm still needs to generate the attacked watermark pattern. Once this pattern is obtained, the pattern matching rate and the watermark distortion are calculated in order to determine tampering and authenticate the content.

This stage includes two main processes: watermark extraction and detection. The detection algorithm extracts the watermark from the received attacked text document and matches it with the original watermark.

The proposed watermark extraction algorithm takes the attacked text document and performs the same watermark generation algorithm to obtain the watermark pattern for the attacked text document.

After extracting the attacked watermark pattern, the watermark detection is performed in three steps:

• Primary matching is performed on the whole watermark pattern of the original document WMPO, and the attacked document WMPA. If these two patterns are found the same, then the text document will be called authentic text without tampering. If the primary matching is unsuccessful, the text document will be called not authentic and tampering occurred, then we proceed to the next step.

• Secondary matching is performed by comparing the components associated with each state of the overall pattern: the extracted watermark pattern for each state is compared with the equivalent transitions of the original watermark pattern. This process is described by equations (4) and (5).

PMRT(i, j) = | WMPO[i][j] − | WMPO[i][j] − WMPA[i][j] | | / WMPO[i][j], for all i, j    (4)

PMRS(i) = ( Σ_{j=1..n} PMRT(i, j) ) / TotalStatePatternCount(i), for all i    (5)

This process is illustrated in figure 6.

Fig. 6: Watermark extraction process

Finally, the PMR is calculated by equation (6); it represents the pattern matching rate between the original and attacked text documents.

PMR = ( Σ_{i=1..N} PMRS(i) ) / N    (6)

where
• N is the number of non-zero elements in the Markov matrix.

The watermark distortion rate refers to the amount of tampering caused by attacks on the contents of the attacked text document. It is denoted by WDR and obtained by equation (7):

WDR = 1 − PMR    (7)

The detection algorithm is illustrated in figure 7.
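The two matching stages and the resulting PMR and WDR can be sketched in Python as below. This is a simplified variant under loudly stated assumptions, not the authors' algorithm: the transition score is taken as a clipped relative difference, max(0, 1 − |o − a| / o), the state rate divides by the number of distinct patterns seen for that state (including transitions that appear only in the attacked text), and PMR is the plain average of the state rates. The normalisation used in equations (5) and (6) may differ; all function names are hypothetical.

```python
from collections import Counter, defaultdict

def transitions(text):
    """Nonzero first-order transition counts of the lower-cased text."""
    text = text.lower()
    return Counter(zip(text, text[1:]))

def pattern_matching_rate(original, attacked):
    """Simplified PMR in [0, 1]; 1.0 means the two patterns are identical."""
    wo, wa = transitions(original), transitions(attacked)
    if wo == wa:
        return 1.0  # primary matching: whole patterns agree
    score = defaultdict(float)  # per-state sum of transition scores
    count = defaultdict(int)    # per-state number of patterns considered
    for (ps, ns), o in wo.items():
        a = wa.get((ps, ns), 0)
        score[ps] += max(0.0, 1.0 - abs(o - a) / o)  # assumed transition score
        count[ps] += 1
    for ps, _ns in wa.keys() - wo.keys():
        count[ps] += 1  # a transition that exists only in the attacked text
    state_rates = [score[ps] / count[ps] for ps in count]
    return sum(state_rates) / len(state_rates)

def watermark_distortion_rate(original, attacked):
    """WDR = 1 - PMR, as in equation (7)."""
    return 1.0 - pattern_matching_rate(original, attacked)

print(pattern_matching_rate("banana", "banana"))       # 1.0
print(pattern_matching_rate("banana", "bandana"))      # 0.5625
print(watermark_distortion_rate("banana", "bandana"))  # 0.4375
```

In the second call, inserting "d" halves the n->a count and adds the new transitions n->d and d->a, so the state rates are 1, 1, 0.25 and 0, giving an average of 0.5625.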

Fig. 7: watermark extraction and detection algorithm

IV. EXPERIMENTAL SETUP, RESULTS AND DISCUSSION

A. Experimental Setup

In order to test the proposed approach and compare it with another approach, we conducted a series of simulation experiments. The experimental environment is as follows: CPU: Intel Core™ i5 M480/2.67 GHz, RAM: 8.0 GB, Windows 7; programming language: PHP with NetBeans IDE 7.0. The data sets used are six samples from the data sets designed in [24]. These samples were categorized into three classes according to their size, namely Small Size Text (SST), Medium Size Text (MST), and Large Size Text (LST).

DWM Detection Algorithm (A Zero Text DWM based on Probabilistic patterns)

- Input: Original Text Document, Attacked Text Document
- Output: WMPO, WMA, WMPA, PMR, WDR, Attacked States and Transitions Matrix [61][61]

1. Read WMPO or the original text (TDO) and the attacked text (TDA) documents and pre-process them.
2. Loop ps = 1 to 61   // build the states of MarkovMatrix
     Loop ns = 1 to 61   // build the transitions for each state of MarkovMatrix
       aMarkovMatrix[ps][ns] = total number of transitions [ps][ns]   // compute the total frequencies of transitions for every state
3. Loop i = 1 to 61   // extract the embedded watermark
     Loop j = 1 to 61
       IF aMarkovMatrix[i][j] != 0   // states that have transitions
         WMPA &= aMarkovMatrix[i][j]
4. Output WMPO, WMPA
5. IF WMPA = WMPO
     Print "Document is authentic and no tampering occurred"
     PMR = 1
   Else
     Print "Document is not authentic and tampering occurred"
     For i = 1 to 61   // extract transition patterns and match each of them with the original transition patterns
       For j = 1 to 61
         IF WMPO[i][j] != 0
           patternCount += 1
           PMRT[i][j] = | WMPO[i][j] − | WMPO[i][j] − WMPA[i][j] | | / WMPO[i][j]
           transPMRTotal += PMRT
         Else
           IF WMPA[i][j] != 0
             patternCount += WMPA[i][j]
       statePMR[i] = transPMRTotal[i][j] / patternCount[i]
       PMRS += statePMR[i]
       Totalpattern += patternCount
6. PMR = PMRS / Totalpattern
7. WDR = 1 − PMR

WMPO: Original watermark, WMPA: Attacked watermark, aMarkovMatrix: Attacked states and transitions matrix, TDA: Attacked text document array, ps: the present state, ns: the next state, PMRT: Transition patterns matching rate, PMRS: State patterns matching rate, PMR: Watermark patterns matching rate, WDR: Watermark distortion rate.

Next, we define the types of attacks and their percentages: insertion, deletion and reorder attacks are performed randomly at multiple locations of these datasets.

The dataset volumes and attack percentages used are shown in table I. They are similar to those used in [19] for comparison purposes; note that we additionally perform the reorder attack on the datasets, which is not included in that paper.

Table I. Original and attacked text samples with insertion, deletion and reorder percentages

Sample Text No   Original Text Word Count   Insertion   Deletion   Reorder   Attacked Text Word Count
[SST2]           421                        26%         25%        16%       425
[SST4]           179                        44%         54%        5%        161
[MST2]           559                        49%         25%        6%        696
[MST4]           2018                       14%         12%        2%        2048
[MST5]           469                        57%         53%        10%       491
[LST1]           7993                       9%          6%         1%        8259

To measure the performance of our approach and compare it with others, the Tampering Accuracy, which is a measure of the watermark robustness, is used. The PMR value gives the Tampering Accuracy of the given text document. The watermark distortion rate WDR is also measured and compared with other approaches. The values of both PMR and WDR range between 0 and 1. A larger PMR value, and correspondingly a lower WDR value, means more robustness, while a lower PMR value and a larger WDR value mean less robustness.

The desirable value is close to 0 for PMR and close to 1 for WDR. We categorize tamper detection states into three classes based on PMR threshold values: High when PMR is greater than 0.70, Mid when PMR is between 0.40 and 0.70, and Low when PMR is less than 0.40.
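The three threshold classes can be written as a small Python helper. This is an illustrative sketch; the function name `detection_class` is hypothetical, and treating the boundary values 0.40 and 0.70 as part of the Mid class is an assumption, since the text does not say which class the boundaries belong to.

```python
def detection_class(pmr):
    """Map a PMR value in [0, 1] to the class High, Mid or Low."""
    if not 0.0 <= pmr <= 1.0:
        raise ValueError("PMR must lie between 0 and 1")
    if pmr > 0.70:
        return "High"
    if pmr >= 0.40:  # assumption: both boundary values count as Mid
        return "Mid"
    return "Low"

print(detection_class(0.7022))  # High
print(detection_class(0.651))   # Mid
print(detection_class(0.25))    # Low
```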

To evaluate the accuracy of the proposed approach, a series of experiments was conducted with all the well-known attacks, such as random insertion, deletion and reordering of words and sentences, on each sample of the datasets. These attacks were applied at multiple locations in the datasets. The experiments were conducted first with individual attacks and then with all attacks at the same time, and the results of the proposed approach were compared with those of a recent similar approach.
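Random word-level attacks of this kind can be simulated with a short Python routine. The sketch below is only one possible reading of the attack model and is not taken from the paper: the function name `attack`, the filler word, the interpretation of each rate as a fraction of the original word count, and the use of pairwise swaps for the reorder attack are all assumptions. A seeded `random.Random` keeps runs reproducible.

```python
import random

def attack(words, insert_rate=0.0, delete_rate=0.0, reorder_rate=0.0,
           seed=0, filler="xyz"):
    """Apply random deletion, insertion and reorder attacks to a list of words.

    Each rate is a fraction of the original word count (assumption).
    """
    rng = random.Random(seed)
    out = list(words)
    n = len(out)
    for _ in range(round(n * delete_rate)):   # delete words at random positions
        if out:
            del out[rng.randrange(len(out))]
    for _ in range(round(n * insert_rate)):   # insert filler words at random positions
        out.insert(rng.randrange(len(out) + 1), filler)
    for _ in range(round(n * reorder_rate)):  # swap two randomly chosen words
        if len(out) >= 2:
            i, j = rng.sample(range(len(out)), 2)
            out[i], out[j] = out[j], out[i]
    return out

words = "the quick brown fox jumps over the lazy dog again".split()
print(len(attack(words, insert_rate=0.2)))  # 12
print(len(attack(words, delete_rate=0.3)))  # 7
```

The attacked word list can then be joined back into text and passed to the detection stage to measure how the matching rate changes with the attack volume.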

B. Experiments with all Attacks

In order to compare the performance of the proposed approach with a recently published text watermarking approach, the word length zero-watermarking algorithm (WLZW) proposed by Z. Jalil et al. [19], in this part we limited our character set to the letters 'a' to 'z' and the space, as in [19]. In these experiments, random multiple insertion and deletion attacks were performed at the same time on each sample of the datasets, with the attack rates shown in table I. The ratios of successfully detected watermark of the proposed algorithm as compared with WLZW are shown in table II and graphically represented in figure 8.

Fig. 8: Comparative performance accuracy of the proposed algorithm with the WLZW algorithm

Table II

COMPARISON OF THE PROPOSED ALGORITHM WITH WLZW

Sample        Attacks Rates         Ratio of watermark accuracy to tamper detection
Text No       IA rate   DA rate     WLZW      The proposed algorithm
1: [SST2]     26%       25%         0.5671    0.7022
2: [SST4]     44%       54%         0.8634    0.6607
3: [MST2]     49%       25%         0.7941    0.651
4: [MST4]     14%       12%         0.8335    0.8486
5: [MST5]     57%       53%         0.6332    0.5767
6: [LST1]     9%        6%          0.8548    0.903

Table II shows the comparative results of the ratio of watermark accuracy to tamper detection for both the proposed approach and the WLZW approach. It can be seen from the results that the proposed approach performs better on the datasets attacked at small rates, while at higher attack rates the WLZW approach performs better.

In the next section we consider an enlarged character set for the text document and significantly improve the performance of our approach.

C. Experiments with the proposed approach

In this section, we evaluate the performance of the proposed approach. The character set is extended to cover all English letters, space, numbers, and special symbols. The experiments were conducted with the various kinds of attacks applied individually, with the rates of these attacks selected randomly between 1 and 40%. The performance results of this approach under all the mentioned attacks are presented in tabular form in Table IV, and graphically in Figures 9, 10, and 11 for the insertion attack, deletion attack, and reorder attack respectively. These results are discussed below.
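The individual-attack setup of this section, with a rate drawn at random between 1% and 40%, can be sketched for the reorder attack as below. The names are hypothetical, and two details are assumptions for illustration only: that the extended character set is ASCII letters, digits, space, and punctuation, and that reordering operates on whole words.

```python
import random
import string

# Assumed extended character set: letters, digits, space, and ASCII punctuation.
EXTENDED_ALPHABET = string.ascii_letters + string.digits + " " + string.punctuation

def random_attack_rate(rng, low=0.01, high=0.40):
    """Draw an attack rate uniformly between 1% and 40%."""
    return rng.uniform(low, high)

def reorder_attack(text, rate, rng):
    """Shuffle round(rate * word_count) randomly chosen words among themselves."""
    words = text.split(" ")
    k = round(rate * len(words))
    positions = rng.sample(range(len(words)), k)
    moved = [words[p] for p in positions]
    rng.shuffle(moved)
    for p, w in zip(positions, moved):
        words[p] = w
    return " ".join(words)
```

A reorder attack of this kind keeps the multiset of words and the text length unchanged, so only the order of characters, and therefore the transitions between them, is affected.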

Fig. 9: Watermark accuracy under various rates of insertion attacks

Fig. 10: Watermark accuracy under various rates of deletion attacks

Fig. 11: Watermark accuracy under various rates of reorder attacks


Table IV

IMPROVED PATTERN MATCHING OF EXTRACTED WATERMARK UNDER INDIVIDUAL ATTACKS

Sample      Insertion Attack (IA)    Deletion Attack (DA)    Re-order Attack (RA)
Text No     IA rate   PMR            DA rate   PMR           RA rate   PMR
[SST2]      17%       0.8077         12%       0.879         16%       0.9797
[SST4]      40%       0.6206         26%       0.7554        5%        0.9891
[MST2]      11%       0.9081         10%       0.8842        6%        0.9951
[MST4]      3%        0.9546         5%        0.9442        2%        0.9996
[MST5]      16%       0.7805         12%       0.8835        10%       0.9916
[LST1]      1%        0.9856         6%        0.9296        1%        0.9999

For every dataset in Table IV and Figures 9, 10, and 11, and for every kind of attack, tampering of the text is always detected based on the PMR threshold values, whether the tampering volume is low, medium, or high. This shows that the approach is sensitive to any modification of the text made by attackers, and that the accuracy of the watermark is affected even when the tampering volume is low.

As observed from the graphs, the PMR is above 70% in all cases and under all types of attack, with one exception: for the dataset [SST4] under an insertion attack at a rate of 40%, it falls below 70% but remains above 60%. This shows that the proposed approach is very effective in detecting insertion, deletion, and reorder tampering.
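The threshold statement above can be checked directly against the Table IV values. The snippet below is only a bookkeeping sketch: the dictionary transcribes the table, and the helper name `below_threshold` is introduced here for illustration.

```python
# (attack rate, PMR) pairs transcribed from Table IV, keyed by sample and attack type.
TABLE_IV = {
    "SST2": {"IA": (0.17, 0.8077), "DA": (0.12, 0.8790), "RA": (0.16, 0.9797)},
    "SST4": {"IA": (0.40, 0.6206), "DA": (0.26, 0.7554), "RA": (0.05, 0.9891)},
    "MST2": {"IA": (0.11, 0.9081), "DA": (0.10, 0.8842), "RA": (0.06, 0.9951)},
    "MST4": {"IA": (0.03, 0.9546), "DA": (0.05, 0.9442), "RA": (0.02, 0.9996)},
    "MST5": {"IA": (0.16, 0.7805), "DA": (0.12, 0.8835), "RA": (0.10, 0.9916)},
    "LST1": {"IA": (0.01, 0.9856), "DA": (0.06, 0.9296), "RA": (0.01, 0.9999)},
}

def below_threshold(table, threshold):
    """List every (sample, attack, PMR) entry whose PMR is under the threshold."""
    return [
        (sample, attack, pmr)
        for sample, row in table.items()
        for attack, (_rate, pmr) in row.items()
        if pmr < threshold
    ]

def lowest_pmr(table, attack):
    """Return the smallest PMR recorded for one attack type across all samples."""
    return min(row[attack][1] for row in table.values())
```

With these values, the only entry under 0.70 is [SST4] under the insertion attack, and no entry falls under 0.60. The reorder attack leaves the PMR highest throughout, with a minimum of 0.9797.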

Further, the performance of the proposed approach was evaluated with all the attacks applied at the same time. Table V shows the experimental results when insertion, deletion, and reorder attacks occur simultaneously.

Table V

IMPROVED PATTERNS MATCHING RATE UNDER INSERTION AND DELETION ATTACKS

Sample      Attacks Rates
Text No     IA rate   DA rate   RA rate    PMR       WDR
[SST2]      26%       25%       11%        0.6599    0.3401
[SST4]      44%       54%       18%        0.5288    0.4712
[MST2]      49%       25%       11%        0.5861    0.4139
[MST4]      14%       12%       4%         0.8307    0.1693
[MST5]      57%       53%       12%        0.5087    0.4913
[LST1]      9%        6%        1.5%       0.8586    0.1414

As can be seen from Table V and the graphical representation in Figure 12, the proposed approach remains efficient when all attacks are applied simultaneously at small attack rates, as for [MST4], [LST1], and [SST2]. When the attack rates are higher, the proposed approach performs moderately and is still effective. The results are also presented graphically in comparison with the WLZW approach and with the proposed approach using the small character set. The proposed approach performs comparably well even though its character set is much larger than that of the other approaches.
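The comparison drawn above can be tabulated from the reported numbers. The sketch below transcribes the accuracy columns of Table II and the PMR and WDR columns of Table V; the helper names are hypothetical. Note that the Table V runs also include a reorder attack, so the extended-set figures are not measured under exactly the same conditions as the Table II figures.

```python
# Per sample: WLZW, proposed (small character set), proposed (extended set) PMR, WDR.
RESULTS = {
    "SST2": (0.5671, 0.7022, 0.6599, 0.3401),
    "SST4": (0.8634, 0.6607, 0.5288, 0.4712),
    "MST2": (0.7941, 0.6510, 0.5861, 0.4139),
    "MST4": (0.8335, 0.8486, 0.8307, 0.1693),
    "MST5": (0.6332, 0.5767, 0.5087, 0.4913),
    "LST1": (0.8548, 0.9030, 0.8586, 0.1414),
}

def small_set_winner(sample):
    """Name the approach with the higher Table II accuracy for one sample."""
    wlzw, small, _extended, _wdr = RESULTS[sample]
    return "proposed" if small > wlzw else "WLZW"

def wdr_is_complement(sample, tol=1e-9):
    """Check that the reported WDR equals 1 - PMR for one Table V row."""
    _wlzw, _small, pmr, wdr = RESULTS[sample]
    return abs((1.0 - pmr) - wdr) < tol

def gap_to_wlzw(sample):
    """Extended-set PMR minus WLZW accuracy (positive means the extended set is ahead)."""
    wlzw, _small, extended, _wdr = RESULTS[sample]
    return round(extended - wlzw, 4)
```

On these figures the small-set proposed approach is ahead for [SST2], [MST4], and [LST1], and WLZW is ahead for [SST4], [MST2], and [MST5]. Every WDR value in Table V equals 1 minus the corresponding PMR.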

Fig. 12: Comparative performance accuracy of the proposed algorithm before and after improvement, compared with the WLZW algorithm

V. CONCLUSION

Based on a letter-based Markov model of order one, the authors have designed a text zero-watermark approach that relies on text analysis. The algorithm uses the text features as probabilistic patterns of states and transitions in order to generate and detect the watermark. The proposed approach is implemented using the PHP programming language. The experiments show that the proposed approach is more secure and has good accuracy of tampering detection. Compared with the traditional watermark approach named WLZW under random insertion, deletion, and reorder attacks in localized and dispersed form on 6 text datasets of variable size, the results show that the tampering detection accuracy is improved in the proposed approach. In addition, the proposed approach always detects insertion, deletion, and reorder tampering attacks occurring randomly on text documents of different sizes, even when the tampering volume is very low, medium, or high. The results also show that the proposed approach is not limited to alphabetic characters, but includes spaces, numbers, and special characters.

This work can be extended further to include higher-order Markov models for natural language processing, to analyze the contents of the text document, and to utilize its features to generate a better watermark.

REFERENCES

1. Z. Jalil, A. Hamza, S. Shahid, M. Arif, A. Mirza, (2010), “A Zero Text Watermarking Algorithm based on Non-Vowel ASCII Characters”, International Conference on Educational and Information Technology (ICET 2010), IEEE.

2. Suhail M. A., (2008), “Digital Watermarking for Protection of Intellectual Property”, A Book Published by University of Bradford, UK.

3. L. Robert, C. Science, C. Government Arts, (2009), “A Study on Digital Watermarking Techniques”. International Journal of Recent Trends in Engineering, Vol. 1, No. 2, pp. 223-225.

4. X. Zhou, S. Wang, S. Xiong, (2009), “Security Theory and Attack Analysis for Text Watermarking”. International Conference on E-Business and Information System Security, IEEE, pp. 1-6.

5. T. Brassil, S Low, and N. F. Maxemchuk, (1999), “Copyright Protection for the Electronic Distribution of Text Documents”. Proceedings of the IEEE, vol. 87, no. 7, pp. 1181-1196.

6. M. Atallah, V. Raskin, M. C. Crogan, C. F. Hempelmann, F. Kerschbaum, D. Mohamed, and S. Naik, (2001), “Natural language watermarking: Design, analysis, and implementation”. Proceedings of the Fourth Hiding Workshop, vol. LNCS 2137, 25-27.


7. N. F. Maxemchuk and S Low, (1997), “Marking Text Documents”. Proceedings of the IEEE International Conference on Image Processing, Washington, DC, pp. 13-16.

8. D. Huang, H. Yan, (2001), “Interword distance changes represented by sine waves for watermarking text images”. IEEE Trans. Circuits and Systems for Video Technology, Vol. 11, No. 12, pp. 1237-1245.

9. N. Maxemchuk, S. Low, (1998), “Performance Comparison of Two Text Marking Methods”. IEEE Journal of Selected Areas in Communications (JSAC), vol. 16, no. 4, pp. 561-572.

10. S. Low, N. Maxemchuk, (2000), “Capacity of Text Marking Channel”. IEEE Signal Processing Letters, vol. 7, no. 12, pp. 345-347.

11. M. Kim, (2008), “Text Watermarking by Syntactic Analysis”. 12th WSEAS International Conference on Computers, Heraklion, Greece.

12. H. Meral, B. Sankur, A. Sumru, T. Güngör, E. Sevinç, (2009), “Natural language watermarking via morphosyntactic alterations”. Computer Speech and Language, 23, pp. 107-125.

13. Z. Jalil, A. Mirza, (2009), “A Review of Digital Watermarking Techniques for Text Documents”. International Conference on Information and Multimedia Technology, pp. 230-234, IEEE.

14. M. Atallah, C. McDonough, S. Nirenburg, V. Raskin, (2000), “Natural Language Processing for Information Assurance and Security: An Overview and Implementations”. Proceedings 9th ACM/SIGSAC New Security Paradigms Workshop, pp. 51-65.

15. H. Meral, E. Sevinc, E. Unkar, B. Sankur, A. Ozsoy, T. Gungor, (2007), “Syntactic tools for text watermarking”. In Proc. of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents, pp. 65050X-65050X-12.

16. O. Vybornova, B. Macq, (2007), “Natural Language Watermarking and Robust Hashing Based on Presuppositional Analysis”. IEEE International Conference on Information Reuse and Integration, IEEE.

17. M. Atallah, V. Raskin, C. Hempelmann, et al., (2002), “Natural language watermarking and tamperproofing”. Proc. of 5th International Information Hiding Workshop, Noordwijkerhout, Netherlands, pp. 196-212.

18. U. Topkara, M. Topkara, M. J. Atallah, (2006), “The Hiding Virtues of Ambiguity: Quantifiably Resilient Watermarking of Natural Language Text through Synonym Substitutions”. In Proceedings of ACM Multimedia and Security Conference, Geneva.

19. Z. Jalil, A. Mirza, H. Jabeen, (2010), “Word Length Based Zero-Watermarking Algorithm for Tamper Detection in Text Documents”. 2nd International Conference on Computer Engineering and Technology, pp. 378-382, IEEE.

20. Z. Jalil, A. Mirza, M. Sabir, (2010), “Content based Zero-Watermarking Algorithm for Authentication of Text Documents”. (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 2.

21. Z. Jalil, A. Mirza, T. Iqbal, (2010), “A Zero-Watermarking Algorithm for Text Documents based on Structural Components”. pp. 1-5, IEEE.

22. M. Yingjie, G. Liming, W. Xianlong, G. Tao, (2011), “Chinese Text Zero-Watermark Based on Space Model”. In Proceedings of 3rd International Workshop on Intelligent Systems and Applications, pp. 1-5, IEEE.

23. S. Ranganathan, A. Johnsha, K. Kathirvel, M. Kumar, (2010), “Combined Text Watermarking”. International Journal of Computer Science and Information Technologies, Vol. 1 (5), pp. 414-416.

24. Fahd N. Al-Wesabi, Adnan Alshakaf, Kulkarni U. Vasantrao, (2012), “A Zero Text Watermarking Algorithm based on the Probabilistic weights for Content Authentication of Text Documents”, in Proc. On International Journal of Computer Applications (IJCA), U.S.A, pp. 388-393.

25. M. Vasim Babu and Dr. A. V. Ramprasad, “Energy Aware Adaptive Monte Carlo Localization Algorithm For WSN Based On Antithetic Markov Chain (AMCAM)”, International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 180-190, Published by IAEME.

26. Karimella Vikram, Dr. V. Murali Krishna, Dr. Shaik Abdul Muzeer and Mr. K. Narasimha, “Invisible Water Marking Within Media Files Using State-Of-The-Art Technology”, International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 3, 2012, pp. 1-8, Published by IAEME.


Fahd N. Al-Wesabi was born in Thammar, Yemen, on 05 April 1980. He received his B.Sc. degree in Computer Science from the University of Science and Technology, Sana’a, Yemen, in 2006. He later received his M.Sc. degree in Computer Information Systems in 2009 from the Arab Academy for Banking and Financial Sciences, Jordan, Yemen branch. He is an Assistant Teacher in the Department of IT, Faculty of Computing and IT, University of Science and Technology, Sana’a, Yemen. He is currently pursuing his Ph.D. research at the Department of Computer Science, Engineering College, SRTM University, Nanded, India. His research interests include text watermarking, information security, content authentication, and soft computing tools.

Adnan Z. Alsakaf is currently pursuing his Ph.D. research at the Department of Computer Science, IIT School, Delhi, India. His research interests include information security, cryptography, watermarking, and soft computing tools. He is a Professor in the Department of IS, Faculty of Computing and IT, University of Science and Technology, Sana’a, Yemen.

Kulkarni U. Vasantrao received his B.Sc. degree in Electronics, his M.Sc. degree in Systems Software, and his Ph.D. degree in Electronics and Computer Science Engineering, on Fuzzy Neural Networks and their applications in Pattern Recognition. He is Professor and Head of the Dept. of Computer Science, S.G.G.S. Engineering College, SRTM University, Nanded. His areas of specialization include Artificial Neural Networks, Distributed Systems, and Microprocessors.