
Secure Network Communication Based on Text-to-Image Encryption

Ahmad Abusukhon1, Mohamad Talib2, Issa Ottoum3

1 IT Faculty, Computer Network Department
Al-Zaytoonah University of Jordan
Amman, JORDAN
[email protected]

2 Department of Computer Science
University of Botswana
Gaborone, BOTSWANA
[email protected]

3 IT Faculty, Computer Network Department
Al-Zaytoonah University of Jordan
Amman, JORDAN
[email protected]

ABSTRACT

Security becomes an important issue when sensitive information is sent over a network in which all computers are connected together. In such a network a computer is recognized by its IP address. Unfortunately, IP addresses can be attacked: one host claims to have the IP address of another host and thus sends packets to a certain machine, causing it to take some sort of action. Cryptography is used to overcome this problem. In a cryptographic application, the data are first encrypted at the source machine using an encryption key, and the encrypted data are then sent to the destination machine. This way an attacker, lacking the encryption key required to recover the original data, can do nothing with the session. In this paper, we propose a novel method for data encryption. Our method is based on private-key encryption. We call our method Text-to-Image Encryption (TTIE).

KEYWORDS

Network; Secured Communication; Text-to-Image Encryption; Algorithm; Decryption; Private key; Encoding.

1 INTRODUCTION

Information security is one of the most important issues to consider when describing computer networks. Many applications on the Internet, for example e-commerce (selling and buying through the Internet), depend on network security. In addition, the success of sending and receiving sensitive data over wireless networks depends on the existence of a secure communication channel (the Virtual Private Network, VPN) [11]. One of the methods used to provide secure communication is cryptography. Cryptography (sometimes referred to as encipherment) converts plain text into an encoded, unreadable form [9]. An encryption method uses what is known as an encryption key to hide the contents of a plain text (to make it unintelligible); without knowing the decryption key it is difficult to determine what the plain text is. In computer networks, sensitive data are encrypted on the sender side in order to keep them hidden and protected from


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 263-271

The Society of Digital Information and Wireless Communications (SDIWC) 2012 (ISSN: 2305-0012)


unauthorized access, and then sent via the network. When the data are received they are decrypted using an algorithm and zero or more encryption keys, as described in "Fig. 1". Decryption is the process of converting data from encrypted format back to their original format [3].

Data encryption becomes an important issue when sensitive data are to be sent through a network where unauthorized users may attack it. Such attacks include IP spoofing, in which intruders create packets with false IP addresses and exploit applications that use authentication based on IP, and packet sniffing, in which hackers read transmitted information. One of the applications attacked by hackers is e-mail. Many companies provide e-mail service, such as Gmail, Hotmail and Yahoo Mail. These companies need to provide the user with a certain data capacity and access speed, as well as a certain level of security. Security is an important issue to consider when choosing a Web mail provider [14].

Some of the techniques used to verify a user's identity (i.e. to verify that the user sending a message is the one he claims to be) are the digital signature and the digital certificate [5]. Digital signatures and digital certificates are not the focus of this research.

There are some standard methods used in cryptography, such as private-key (also known as symmetric, conventional, or secret-key), public-key (also known as asymmetric), digital signatures, and hash functions [17]. In private-key cryptography, a single key is used for both encryption and decryption. This requires that each individual possess a copy of the key, and the key must be passed to the other individual over a secure channel [15]. Private-key algorithms are very fast and easily implemented in hardware, and are therefore commonly used for bulk data encryption.

There are mainly two types of private-key encryption: stream ciphers and block ciphers [1].

Figure 1 Encryption and Decryption methods with a secure channel for key exchange. (The figure shows a plaintext message encrypted into cipher text with the encryption key, transmitted, and decrypted back to plaintext with the decryption key on the receiver side; the keys are exchanged over a secure channel.)


In stream ciphers a given text is encrypted one byte or one bit at a time, whereas in block ciphers a given text is divided into chunks, and the chunks are then encrypted using an encryption algorithm. Examples of stream ciphers are RC4 and the one-time pad; examples of block ciphers are DES and AES [15].
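To make the byte-at-a-time idea concrete, here is a minimal, hypothetical stream-cipher sketch of our own (not RC4 or a true one-time pad): each byte of the text is XORed with the next byte of a keystream, so applying the same keystream again recovers the plaintext.

```python
import random

def xor_stream(data: bytes, seed: int) -> bytes:
    """Encrypt or decrypt `data` one byte at a time with a seeded keystream."""
    keystream = random.Random(seed)  # toy keystream; not cryptographically secure
    return bytes(b ^ keystream.randrange(256) for b in data)

cipher = xor_stream(b"attack at dawn", seed=7)
plain = xor_stream(cipher, seed=7)  # XOR with the same keystream is self-inverse
```

Because XOR is its own inverse, the same function serves for both encryption and decryption; a real stream cipher would replace `random.Random` with a cryptographically secure keystream generator.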

Data encryption is performed serially or in parallel; it is performed in parallel in order to speed up cryptographic transformations. In block-cipher algorithms such as DES, some modes of operation, like CBC and CFB, execute serially, while others, like ECB and OFB, can execute in parallel [10]. Parallel encryption is not the focus of this research. In this research we focus on stream ciphers rather than block ciphers.

The main components of symmetric encryption are the plaintext, the encryption algorithm, the secret key, the cipher text and the decryption algorithm. The plaintext is the text before the encryption algorithm is applied; it is one of the inputs to the encryption algorithm. The encryption algorithm is the algorithm used to transform the data from plaintext to cipher text. The secret key is a value independent of the encryption algorithm and of the plaintext, and it is another input to the encryption algorithm. The cipher text is the scrambled text produced as output. The decryption algorithm is the encryption algorithm run in reverse [16, 3, 14].

Public-key encryption uses two distinct but mathematically related keys: a public key and a private key. The public key is the non-secret key that is made available to anyone you choose (it is often distributed through a digital certificate). The private key is kept in a secure location and used only by its owner. When data are sent, they are protected with secret-key encryption, and the secret key itself is encrypted with the public key. The encrypted secret key is then transmitted to the recipient along with the encrypted data. The recipient uses the private key to decrypt the secret key, and the secret key is then used to decrypt the message itself. This way the data can be sent over insecure communication channels [16]. Examples of public-key encryption are Pretty Good Privacy (PGP) and RSA. PGP is one of the most popular public-key encryption methods. RSA [12] is based on the product of two very large prime numbers (greater than 10^100). The idea behind the RSA algorithm is that it is difficult to determine the prime factors of such large numbers. There are other algorithms used to create public keys, such as ElGamal and Rabin, but these algorithms are not as common as RSA [9].
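As a toy illustration of the RSA idea (textbook-sized primes only; real RSA uses primes far larger than 10^100, as noted above):

```python
# Toy RSA key generation, encryption and decryption with tiny primes.
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)    # Euler's totient of n
e = 17                     # public exponent, chosen coprime to phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

m = 65                     # message, must be less than n
c = pow(m, e, n)           # encrypt with the public key (e, n)
recovered = pow(c, d, n)   # decrypt with the private key (d, n)
```

The security rests on the difficulty of factoring n back into p and q; with tiny primes like these the scheme is trivially breakable, which is why real keys use enormous primes.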

In this paper, we propose a new data encryption algorithm based on the symmetric encryption technique. We propose to encrypt a given text into an image.

2 RELATED WORK

Bh. P., et al. [2] proposed encoding and decoding a text message in an implementation of Elliptic Curve Cryptography, a public-key cryptosystem, using Koblitz's method [7, 8]. In their work, each point on the curve represents one character of the text message. When the message is parsed, each character is encoded by its ASCII code, and the ASCII value is then encoded as one point on the curve, and so on. Our work differs from theirs: they used a public-key technique, whereas we use a private-key technique, and they encoded each character by its ASCII value, whereas we encode each character by one pixel (three integer values: R for Red, G for Green and B for Blue).

Singh and Gilhorta [15] proposed encrypting a word of text as a floating-point number lying in the range from 0 to 1. The floating-point number is converted into a binary number, and a one-time key is then used to encrypt this binary number. In this paper, we encode each character by one pixel (three integer values: R, G and B).

Kiran et al. [6] proposed a new method for data encryption. In their method the original text (plain text) is arranged into a two-directional circular queue in a matrix A of a given size, say m x n. Data encryption relies on matrix disordering: transformation operations are performed on the rows or the columns of matrix A a number of times. They proposed three types of transformation operations on A, encoded as follows: 0 for circular left shift, 1 for circular right shift, and 2 for the reverse operation. The matrix disordering is carried out by generating a positive random number, say R, and converting it to a binary number. The decision on whether to perform a row or a column transformation is based on the values of the individual bits in the binary number: if a bit is 0 then a row transformation is performed, otherwise (if the bit is 1) a column transformation is performed. To determine which transformation operation should be carried out, another random number is generated and divided by 3; the remainder of the division (0, 1, or 2) identifies the transformation operation. In the case of a row transformation, two distinct rows are selected randomly by generating two distinct random numbers, say R1 and R2, and another two distinct random numbers, c1 and c2, are generated to represent two distinct columns; the two columns c1 and c2 determine the range of rows over which the transformation is performed. After the completion of each transformation, a sub-key is generated and stored in a key file. The key file is then sent to the receiver to be used as the decryption key. The sub-key format is (T, Op, R1, R2, Min, Max), where:

T: whether the transformation applies to a row or a column.

Op: the operation type, coded as 0, 1, or 2, i.e., shift array contents left, shift array contents right, or reverse array contents.

R1 and R2: the two randomly selected rows or columns.

Min, Max: the minimum and maximum values of the range for the selected R1 and R2.
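The three coded operations can be sketched as follows (an illustrative reconstruction of ours, not Kiran et al.'s actual code); each operation acts on one row or column, represented here as a plain list:

```python
from collections import deque

def apply_op(values, op):
    """Apply one of the coded operations: 0 = circular left shift,
    1 = circular right shift, 2 = reverse."""
    d = deque(values)
    if op == 0:
        d.rotate(-1)   # circular left shift by one position
    elif op == 1:
        d.rotate(1)    # circular right shift by one position
    elif op == 2:
        d.reverse()    # reverse the contents
    return list(d)
```

For example, `apply_op([1, 2, 3, 4], 0)` yields `[2, 3, 4, 1]`, while operation 2 yields `[4, 3, 2, 1]`.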

3 OUR ALGORITHM

Here we describe the main features of our proposed algorithm, TTIE. The algorithm includes two main phases, namely the TTIE phase (where our contribution lies) and the ISE (Image-Shuffle Encryption) phase. In the TTIE phase the plain text is transformed (encrypted) into an image: the plain text is concatenated into one string, and this string is stored in an array of characters, say C. For each


character in C, one pixel of the resulting image is generated. Each pixel consists of three integers, created randomly in advance, before the transformation (encryption) begins (see Fig. 3-A, key 1). Each of the three integer values represents one color, and each color value is in the range from 0 to 255. The result of this phase is a matrix, say M, in which each three contiguous columns in a given row represent one character of the original text (plain text). This is done in order to make it difficult for hackers to guess what the plain text is. To the best of our knowledge, no previous work has attempted to transform a text file into an image.
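A minimal sketch of this phase (our own illustration; the names `make_key1`, `text_to_matrix` and the row width are assumptions, not the paper's code): each character is assigned one randomly pre-generated pixel, and the pixels are laid out so that every three contiguous columns of the matrix M hold one character.

```python
import random

def make_key1(alphabet, seed=None):
    """Key 1: one random (R, G, B) pixel, each value 0-255, per character."""
    rng = random.Random(seed)
    return {ch: tuple(rng.randint(0, 255) for _ in range(3)) for ch in alphabet}

def text_to_matrix(text, key1, chars_per_row=4):
    """Flatten the per-character pixels into rows of the matrix M,
    three contiguous columns per character."""
    flat = [value for ch in text for value in key1[ch]]
    width = chars_per_row * 3
    return [flat[i:i + width] for i in range(0, len(flat), width)]

key1 = make_key1("abcd", seed=1)
M = text_to_matrix("abcdabcd", key1)
```

Inverting the mapping (pixel back to character) only requires the receiver to hold the same key 1, which is why the key must travel over a secure channel.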

The second phase is the ISE phase. The work in this phase is based on previous work carried out by Kiran et al. [6]. In the ISE phase the matrix M is shuffled a number of times. The shuffle process includes row swapping and column swapping: in row swapping, two rows are selected randomly and then swapped; in column swapping, two columns are selected randomly and then swapped. This matrix disordering makes it difficult for hackers to guess the original order of the matrix M. The shuffle key (key 2) is shown in Fig. 3-B. These two phases (the TTIE and the ISE) are carried out on the sender machine (in this paper, the server machine), as described in Fig. 2.
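The shuffle can be sketched as follows (an illustrative reconstruction under our own naming, not the authors' code): each randomly chosen row or column swap is recorded, and the recorded list plays the role of key 2, letting the receiver undo the swaps in reverse order.

```python
import random

def shuffle_matrix(matrix, n_swaps, seed=None):
    """Shuffle by swapping random row pairs and column pairs; return the
    shuffled matrix and the list of swaps performed (key 2)."""
    rng = random.Random(seed)
    m = [row[:] for row in matrix]          # work on a copy
    key2 = []
    rows, cols = len(m), len(m[0])
    for _ in range(n_swaps):
        if rng.random() < 0.5:              # swap two random rows
            r1, r2 = rng.sample(range(rows), 2)
            m[r1], m[r2] = m[r2], m[r1]
            key2.append(("row", r1, r2))
        else:                               # swap two random columns
            c1, c2 = rng.sample(range(cols), 2)
            for row in m:
                row[c1], row[c2] = row[c2], row[c1]
            key2.append(("col", c1, c2))
    return m, key2

def unshuffle_matrix(matrix, key2):
    """Undo the recorded swaps in reverse order (each swap is self-inverse)."""
    m = [row[:] for row in matrix]
    for kind, a, b in reversed(key2):
        if kind == "row":
            m[a], m[b] = m[b], m[a]
        else:
            for row in m:
                row[a], row[b] = row[b], row[a]
    return m
```

Because every swap is its own inverse, replaying key 2 backwards restores M exactly.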

The encrypted message is then sent to the client machine, where it is decrypted using key 2 and key 1, respectively.

Figure 2 The main steps of the Text-to-Image Encryption (TTIE) algorithm.

0#5#5#12#13#17#20#25#25#30#32#32#37#41#37#47#52#53#55#56#60#68#69#68#78#74#79#88#82#86#9

(A) Part of Key 1

5736834348:644:34:3641834:868:4348:644,34:364,438:1643,34::6413:316:33::6:4:38:364:138136::8313463:

(B) Part of Key 2

4 OUR EXPERIMENT

Java NetBeans was used as the vehicle for our experiments. We built the client and server programs on different machines and then tested sending and receiving data on both sides. We used the following text message in our experiments:

"encryption is the conversion of data into a form called a cipher text that cannot be easily understood by unauthorized people. decryption is the process of converting encrypted data back into its original form so it can be understood. the use of encryption decryption is as old as the art of communication. in wartime, a cipher often incorrectly called a code can be employed to keep the enemy from obtaining the contents of transmissions. technically a code is a means of representing a signal without the intent of keeping it secret. examples are morse code and ascii. simple ciphers include the substitution of letters for numbers, the rotation of letters in the alphabet and the scrambling of voice signals by inverting the sideband frequencies" [13].

"Fig. 3" shows part of the generated keys, namely "Key 1" and "Key 2". "Fig. 3" (A) shows the format of "Key 1": each value is delimited by the # symbol. The first three values (0, 5, 5) represent one pixel in the resulting image; in this pixel, R (the red color value) = 0, G (the green color value) = 5, and B (the blue color value) = 5. In order to guarantee that distinct letters have unique colors, i.e. unique RGB values, we create 26 different ranges, one for each of the 26 letters of the alphabet; these ranges are disjoint subsets of the main range from 0 to 255. For example, the letter A may be represented by RGB values in the range from 0 to 9, the letter B by values in the range from 10 to 19, and so on. The pixel (0, 5, 5) therefore represents the letter A; the next three values (12, 13, 17) form another pixel, which represents the letter B, and so on.
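The unique-range idea can be sketched as below. One assumption to flag: the example ranges in the text are 10 values wide, but 26 ranges of width 10 would run past 255, so this sketch uses disjoint ranges of width 9 to keep every value within 0-255; decoding then recovers a letter from any one component of its pixel.

```python
import random

WIDTH = 9  # range width; 26 * 9 - 1 = 233, so all values stay within 0-255

def key1_from_ranges(seed=None):
    """Key 1: each letter's R, G and B are drawn from that letter's own range."""
    rng = random.Random(seed)
    key = {}
    for i in range(26):
        lo = i * WIDTH                       # disjoint range for letter i
        key[chr(ord("a") + i)] = tuple(
            rng.randint(lo, lo + WIDTH - 1) for _ in range(3)
        )
    return key

def pixel_to_letter(pixel):
    """Any single component identifies the range, hence the letter."""
    return chr(ord("a") + pixel[0] // WIDTH)

key1 = key1_from_ranges(seed=3)
assert all(pixel_to_letter(key1[ch]) == ch for ch in key1)
```

Because the ranges do not overlap, no two letters can ever share an RGB value, which is exactly the uniqueness property the range scheme is meant to guarantee.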

Figure 3 The format of Key1 and Key2

Figure 4 Cipher text – the output of Text-to-Image-Encryption


"Fig. 3" (B) shows the format of "Key 2". Each two contiguous values represent two columns in the matrix M. The first pair in Key 2 is 375:364, which means that column number 375 is swapped with column number 364, and so on.

"Fig. 4" shows the cipher text (the text after it has been encrypted into an image). The image in Fig. 4 has been magnified many times to make it clear. The pixels in this image are created randomly, so they do not form a recognizable shape such as a tree, a fish or a mobile phone. The image shown in "Fig. 4" is sent to the client, and on the client side we decrypt the cipher text shown in "Fig. 4" to finally recover the original text message (i.e. the plain text).

5 ANALYSIS

In our algorithm each letter is represented by a random pixel, i.e., three random values, namely R, G and B. To attack the data, hackers need to guess the following:

1. That each three contiguous values represent one letter. Since we send the data as integer values, it is hard to guess that each three contiguous values represent one letter.

2. If a hacker is able to guess point 1, then he needs to guess which random numbers represent the letters A, B, C, etc. In other words, a hacker needs to guess the value of key 1 ("Fig. 3"). Note that guessing the value of key 1 is difficult since we shuffle (scramble) the matrix using key 2 (key 2 is based on the algorithm described in [6]). For example, suppose that the message we want to send is "abcd". Using key 1 ("Fig. 3" (A)), the random numbers generated for "a", "b", "c" and "d" are (0,5,5), (12,13,17), (20,25,25), and (30,32,32) respectively. The matrix before shuffling is described in Table 1. Table 2 describes the matrix after shuffling (a simple swap operation in which column 1 is swapped with column 2).

Table 1 Pixels before shuffling: each three contiguous integers in a row represent one pixel, or one letter.

Letter  R-value  G-value  B-value
A       0        5        5
B       12       13       17
C       20       25       25
D       30       32       32

Table 2 Pixels after column 1 is swapped with column 2.

Letter  R-value  G-value  B-value
?       5        0        5
?       13       12       17
?       25       20       25
?       32       30       32

Using statistical analysis, hackers might guess the letters from Table 1. However, it is very difficult to guess the letters from Table 2, because the order of the RGB values has been changed. In other words, the three contiguous RGB values in Table 1 that represent one letter are now distributed randomly in Table 2, making it difficult to guess that letter even with statistical analysis (a method involving a statistical breakdown of byte patterns, such as the number of times any particular value appears in the encrypted output, which would quickly reveal whether any potential patterns exist). Similarly, it is hard for "letter A follows letter B" analysis to decrypt the cipher text.

With a simple calculation, the number of possible ways to encrypt the 26 letters is

((256)^3)^26    (1)

Since each pixel consists of three values and each of these values is in the range from 0 to 255, choosing three values gives (256)^3 possibilities. We have 26 letters, so the number of possibilities for 26 letters is ((256)^3)^26, which equals 2^624, approximately 6.96 x 10^187. The individual keys, key 1 and key 2, are generated afresh each time a new message is sent. This is done in order to avoid regularity in the resulting cipher text.
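The count in equation (1) can be checked directly with exact integer arithmetic (the check below is ours, not from the paper):

```python
# Each letter maps to one of 256**3 possible pixels; with 26 letters the
# number of possible Key 1 assignments is (256**3)**26 = 2**624.
keyspace = (256 ** 3) ** 26
assert keyspace == 2 ** 624        # 256**3 == 2**24 and 24 * 26 == 624
assert len(str(keyspace)) == 188   # i.e. keyspace is on the order of 10**187
```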

6 CONCLUSION AND FUTURE WORK

In this paper, we add another level of data security on top of the data security system proposed by Kiran et al. [6]. In our method of encryption we first encrypt the text into an image (a matrix of pixels); then, based on the work done by Kiran et al. [6], we scramble the matrix into a new one, making it more difficult for hackers to guess the original text message. Our algorithm is suitable for text encryption in a network system as well as on individual offline machines. It is also useful for e-mail security, since all messages stored in the mailbox are displayed as images, and thus even if someone leaves the e-mail page open it is difficult for others to guess the meaning (the original text) of these images. In future work, we propose to investigate dividing the text into blocks and transforming each block into an image, creating an individual key for each block. This will make it difficult for hackers to use a statistical approach to guess the color of each letter, since different colors will be assigned to the same letter when it appears in different blocks. In addition, we will investigate the efficiency of our proposed algorithm (the TTIE) on large-scale data collections (multiple gigabytes).

ACKNOWLEDGMENT

I would like to acknowledge and extend my heartfelt gratitude to Al-Zaytoonah University for their financial support in carrying out this work successfully.

REFERENCES

[1] Bellare, M., Kilian, J., and Rogaway, P.: The security of cipher block chaining. In Proceedings of the Conference on Advances in Cryptology (CRYPTO '94). Lecture Notes in Computer Science, vol. 839 (1994).

[2] Bh, P., Chandravathi, D., Roja, P.: Encoding and decoding of a message in the implementation of Elliptic Curve cryptography using Koblitz's method. International Journal of Computer Science and Engineering, 2(5) (2010).

[3] Chan, A.: A security framework for privacy-preserving data aggregation in wireless sensor networks. ACM Transactions on Sensor Networks 7(4) (2011).

[4] Chomsiri, T.: A comparative study of security level of Hotmail, Gmail and Yahoo Mail by using session hijacking hacking test. International Journal of Computer Science and Network Security (IJCSNS), 8(5) (2008).

[5] Goldwasser, S., Micali, S., Rivest, R.L.: A digital signature scheme secure against adaptive chosen-message attacks. SIAM Journal on Computing 17(2), pp. 281-308 (1988).

[6] Kiran Kumar, M., Mukthyar Azam, S., and Rasool, S.: Efficient digital encryption algorithm based on matrix scrambling technique. International Journal of Network Security and its Applications (IJNSA), 2(4) (2010).

[7] Koblitz, N.: Elliptic curve cryptosystems. Mathematics of Computation, 48, pp. 203-209 (1987).

[8] Koblitz, N.: A Course in Number Theory and Cryptography. 2nd edition. Springer-Verlag (1994).

[9] Lakhtaria, K.: Protecting computer network with encryption technique: a study. International Journal of u- and e-Service, Science and Technology 4(2) (2011).

[10] Pieprzyk, J. and Pointcheval, D.: Parallel authentication and public-key encryption. The Eighth Australasian Conference on Information Security and Privacy (ACISP '03), Wollongong, Australia. R. Safavi-Naini (Ed.), Springer-Verlag, LNCS (2003).

[11] Ramaraj, E., and Karthikeyan, S.: A new type of network security protocol using hybrid encryption in Virtual Private Networking. Journal of Computer Science 2(9) (2006).

[12] Rivest, R.L., Shamir, A. and Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2) (1978).

[13] SearchSecurity: definition of encryption [online]. Available at: http://searchsecurity.techtarget.com/definition/encryption. Accessed on 13-06-2012.

[14] Shannon, C.E.: Communication theory of secrecy systems. Bell System Technical Journal (1949).

[15] Singh, A., Gilhorta, R.: Data security using private key encryption system based on arithmetic coding. International Journal of Network Security and its Applications (IJNSA), 3(3) (2011).

[16] Stallings, W.: Cryptography and Network Security: Principles and Practices, 4th edition. Prentice Hall. [online] Available at: http://www.filecrop.com/cryptography-and-network-security-4th-edition.html. Accessed on 1-Oct-2011.

[17] Zaidan, B., Zaidan, A., Al-Frajat, A., Jalab, H.: On the differences between hiding information and cryptography techniques: an overview. Journal of Applied Sciences 10(15) (2010).


Department of Information SystemsUniversity of Cape Town

South [email protected], {adrie.stander, jacques.ophoff}@uct.ac.za

Index Terms—Mobile-cellular network, Base station, Cell-phone, Location, Information accuracy

I. INTRODUCTION

It is well known that the location of a cellphone, and thus thelocation of its user, can be determined with a certain degreeof accuracy. This information can be used to offer variouslocation-based services and creates the opportunity to buildnew information services that can be useful to both cellphoneusers and companies. In addition, location information can beused in other scenarios, such as providing law enforcementagencies with tracking data [1]. One example is that of amurder suspect being found by police after inserting his SIMcard into the cellphone of a murder victim [2].

Location information can be used to aid police in trackingmovements during investigations and locating suspects. How-ever, it can also be valuable in tracing people for humanitarianreasons, such as search-and-rescue teams defining search areasfor locating missing persons. By increasing the accuracy oflocation information the process of finding the cellphone andits user can be made faster, simpler, and cheaper. In borderlinecases it can be the difference between finding someone in needof medical attention in time, or catching a suspect who wouldhave otherwise escaped.

Many of the most feasible methods for estimating the loca-tion of a cellphone within a mobile-cellular network dependson using the location of network base stations as known refer-ence points from which to calculate the estimated position ofthe cellphone. The benefit of such network-based approaches

is that no modifications to the handset or network are required.However, by using network, handset, or hybrid approaches theaccuracy of location information can be improved [1].

This study investigates the accuracy with which the lo-cations of network base stations are known, as inaccuracycan impair the ability of many of the most feasible methodsto provide accurate cellphone location estimates. It starts byproviding background information on current techniques fordetermining the location of a cellphone within a mobile-cellular network. Thereafter the research methodology fol-lowed in the investigation is discussed, followed by a reportof the data collected. Finally, the findings are presented andthe implications are highlighted.

II. BACKGROUND

Many handset and network techniques for determininglocation exist. The most widely known, using the internalhardware of the cellphone, is satellite positioning using GPSbut WiFi, Bluetooth, and augmented sensor networks can alsobe employed [3], [4], [5]. The accuracy of these techniquescan vary depending on the technology, line-of-sight, and sensornetwork coverage [6]. An improvement is to use such hardwarein combination with mobile-cellular network information, suchas in the case of Assisted-GPS (A-GPS) which uses networkresources in the case of poor signal reception.

In addition new algorithms have greatly improved the ac-curacy and efficiency with which a cellphone can calculateits position [7], [8]. However, major obstacles including highenergy usage and non-availability of features in older cell-phones remain. Thus using location methods based primarilyon mobile-cellular network information is widespread.

Global System for Mobile Communications (GSM) networks were not originally designed to calculate locations for the cellphones which access and make use of the network. Many methods have been proposed and developed to be retrofitted to existing networks [9]. There is a range of accuracies and costs associated with the various methods. The following are the most feasible methods, in order of increasing potential accuracy.

• Cell identification (Cell ID) is the simplest location estimation method available, but also the least accurate. The estimated area is at best a wedge-shaped area, comprising roughly a third of the cell (for three-sectored sites), but

An Analysis of Base Station Location Accuracy within Mobile-Cellular Networks


Liam Smit, Adrie Stander and Jacques Ophoff

Abstract—An important feature within a mobile-cellular network is that the location of a cellphone can be determined. As long as the cellphone is powered on, its location can always be traced to at least the cell from which it is receiving, or last received, signal from the cellular network. Such network-based methods of estimating the location of a cellphone are useful in cases where the cellphone user is unable or unwilling to reveal his or her location, and have practical value in digital forensic investigations. This study investigates the accuracy of using mobile-cellular network base station information for estimating the location of cellphones. Through quantitative analysis of mobile-cellular network base station data, large variations between the best and worst accuracy of recorded location information are exposed. Thus, depending on the requirements, base station locations may or may not be accurate enough for a particular application.

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 272-279. The Society of Digital Information and Wireless Communications (SDIWC) 2012 (ISSN: 2305-0012)


can include the entire circular area for sites using omnidirectional antennas in low-density single-sector cells [10].

• Round Trip Time (RTT) is merely a measure of distance from the base station, calculated from the time taken by a radio signal to travel from the base station to the cellphone and back. It provides a drastic reduction in the estimated location area compared to the Cell ID method for the same site.

• Cell ID and RTT combines the aforementioned methods to provide an estimated location for the cellphone where these areas overlap [11].

• Observed Time Difference of Arrival (OTDOA) uses hyperbolic arcs from three (or more) base stations to estimate the location of a cellphone. These arcs are determined by the distance that the radio signals travel in the measured time (i.e. the difference) [12].

• Angle of Arrival (AOA) is a seemingly practical solution due to its straightforward method of calculating an estimated location from the intersection of the bearings to the cellphone provided by each base station. In practice this method requires expensive antenna arrays, which limit its feasibility despite its potential for high accuracy [10].

It is important to bear in mind that all of the above methods estimate the location of the cellphone, and thus its user, relative to the location of the base station. Next follows a discussion of factors impacting accuracy and ways of negating these factors.
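The distance measure underlying the RTT method above can be illustrated with a short sketch. This is not from the paper; the function name and the example timing value are illustrative assumptions.

```python
# Illustrative sketch: converting a measured round-trip time (RTT)
# into a distance from the base station. Radio signals travel at the
# speed of light; the one-way distance is half the round trip.

SPEED_OF_LIGHT_M_S = 299_792_458.0  # metres per second

def rtt_to_distance_m(rtt_seconds: float) -> float:
    """Estimated distance (metres) = (signal speed * RTT) / 2."""
    return SPEED_OF_LIGHT_M_S * rtt_seconds / 2.0

# A 10-microsecond round trip corresponds to roughly 1.5 km.
distance = rtt_to_distance_m(10e-6)
```

In practice the timing resolution of the network equipment limits the precision of this estimate, which helps explain why RTT yields a circular band rather than an exact circle.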

A. Factors that negatively impact accuracy

There are a number of well-recognized challenges to accurately determining the location of cellphones. In addition to degrading accuracy, these challenges can also increase the cost of estimating location. These challenges include non-line-of-sight and multi-path propagation of radio waves, the near-far effect in Code Division Multiple Access (CDMA) based third generation networks [12], base station density (or lack thereof) and accuracy of base station locations [13], optimisations for network capacity, and the unsynchronised nature of Universal Mobile Telecommunications System (UMTS) type networks [14].

There are varying levels of accuracy inherent to the methods and combinations thereof, as well as the enhancements which have been implemented for a particular method. In order of increasing accuracy: Cell ID (the whole area of a circular cell), Cell ID and sector (the area of the wedge), Cell ID and RTT (circular band), Cell ID and the intersection of multiple RTT-determined hyperbolic arcs, and A-GPS (outdoor only, and which requires GPS functionality to be available in the cellphone) [15]. The pilot correlation method (PCM) has been left out of the list as it can be made as accurate as the fidelity of the spacing of the measurement sites [16].

Certain base stations with low utilisation, in small towns for example, will not be sectored and there will only be one site. It will be possible to obtain a circular band from RTT calculations, but achieving a more precise location will require adding another measurement technique such as PCM or probabilistic fingerprinting [17].

B. Methods of improving accuracy

To address these challenges there are various solutions and enhancements to methods for estimating location that can be employed. Less accurate measurements can be identified and then discarded, re-weighted or adjusted. It is feasible to use more than the minimum number of required data points, to use other methods which are not impacted by inaccurate measurements, and to improve the precision of data by employing high-fidelity measurements and oversampling [15]. It is also possible to employ techniques such as forced soft handover, and to minimise problems by using methods which are not negatively affected by challenges such as non-line-of-sight or multi-path radio wave propagation.

The methods of estimating location can be organised into two groups. The first group consists of those methods which do not depend on base station location and are thus unaffected by the accuracy with which these locations are known. These methods include A-GPS, PCM [16], probabilistic fingerprinting [17], bulk map-matching, and the centroid algorithm [18].

The second group consists of methods which estimate the location of the cellphone and its user relative to the location of the base station, and which are therefore dependent on the accuracy with which these network base station locations are known. These include the Cell ID based methods of Cell ID, Cell ID and RTT, enhanced Cell ID and RTT, as well as cell polygons and RTT [15]. The Time of Arrival (TOA) and OTDOA methods, as well as their enhancements, such as cumulative virtual blanking, are affected in a similar fashion, although this may have more of an impact as these methods are meant to deliver greater accuracy than the Cell ID based methods [14]. While not very widespread in implementation, the methods of AOA and the TOA to Time Difference of Arrival algorithm are also negatively impacted [12].

There is a range of direct and indirect costs that can be attributed to most methods. The greater the work involved in network configuration, the larger the amount of additional hardware, and the more involved the deployment, the higher the cost. Some methods require more human intervention to set up, such as PCM and probabilistic fingerprint matching, whilst others might require additional hardware, such as OTDOA requiring location measurement units. There is also the possibility that certain methods will reduce network capacity. Thus it is vitally important to the network operator that existing infrastructure information (i.e. network base station locations) is as accurate as possible, to minimise and manage the further costs of improving accuracy.

In summary, it can be seen that there are many methods of determining the location of a cellphone within a mobile-cellular network. While some of these are not dependent on base station location, the majority of network-based methods are. The accuracy of such data is thus the main focus of this study.


III. RESEARCH METHODOLOGY

A quantitative analysis of base station information in a Southern African mobile-cellular network was performed. The population consisted of all active base stations that form part of the network. Any base station that was operational on the network (including those that had recently gone live or were scheduled to be replaced) was included, due to the possibility that such a base station could participate in estimating the location of cellphones.

To evaluate the accuracy of base station locations, their recorded locations were compared to observations of their actual locations. For each base station a GPS location in a valid number format was stored in the network database. The method used to measure the base station's actual observed location, in order to compare it to the stored value, also served to validate the stored value.

As this is a time-consuming process it was not performed for all base station sites. Instead, the entire population consisting of all available recorded base station locations was sampled. All sub-populations needed to be represented in the sample in order to be able to compare their results for commonalities or differences. Each of the ten regions which comprise the Southern African network was individually queried to find a list of sites that contain operational base stations. The sampling interval was determined by taking the number of sites and dividing it by the desired minimum sample size of thirty base stations for each region. The sampling interval was then rounded down in order to provide some spare sample base station locations in the event of being unable to locate one or more of the selected base stations and having to select another. A sampling method of a random starting number followed by periodic sampling was employed.
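The sampling procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function and parameter names are assumptions.

```python
import math
import random

def systematic_sample(site_ids, min_sample=30, seed=None):
    """Random-start periodic (systematic) sampling, as described above:
    the interval is the number of sites divided by the desired minimum
    sample size, rounded down to leave some spare sample locations."""
    interval = max(1, math.floor(len(site_ids) / min_sample))
    start = random.Random(seed).randrange(interval)
    return site_ids[start::interval]

# A region with 100 operational sites gives an interval of 3, so the
# sample contains at least the minimum of 30 base stations.
region_sample = systematic_sample(list(range(100)), min_sample=30, seed=1)
```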

For each sample the latitude and longitude was entered into Google Maps [19] with maximum zoom enabled, together with the 'Satellite' and 'Show labels' options selected. The resulting aerial photograph was examined to identify the presence of a base station. If the base station could be identified then its position was measured using a set procedure:

• The map was centred on the base of the sampled base station using the 'Right-Click' and 'Center map here' function.

• The latitude and longitude of the map centred on the base station was copied via the 'Link' function.

For each base station that was found by the above process, the following additional information was captured in a spreadsheet, alongside the original recorded base station location:

• The base station’s location was categorized as serving either: 1) a population centre (city, town, suburb, village, township, commercial or industrial area), or 2) an area outside of a population centre (mountains, road, farms or mines).

• Categorising information was captured for each base station location: 1) technology generation (second and/or third), and 2) equipment vendor.

Fig. 1. Aerial view of palm tree

Fig. 2. ‘Street View’ of palm tree

• The GPS coordinates of the recorded and measured locations were then used to calculate the difference in metres between the two using the 'Great Circle' method: 1) employ the law of cosines, 2) convert to radians, and 3) multiply by the radius of the Earth.
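The 'Great Circle' calculation above can be sketched as below, using the spherical law of cosines. The coordinates and the Earth-radius constant are illustrative assumptions, not values from the study.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed value)

def great_circle_distance_m(lat1, lon1, lat2, lon2):
    """Spherical law of cosines: convert degrees to radians, compute
    the central angle between the two points, and multiply by the
    Earth's radius to obtain the distance in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp for floating-point safety before acos.
    central_angle = math.acos(min(1.0, max(-1.0, cos_angle)))
    return EARTH_RADIUS_M * central_angle

# Deviation between a hypothetical recorded and measured location:
deviation = great_circle_distance_m(-33.9249, 18.4241, -33.9260, 18.4250)
```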

If a base station could not be identified from the aerial photograph then the Google Maps Street View function was used to assist with identifying the base station location. If the base station still could not be detected then it was discarded, and the next base station was selected and the identification and measuring process repeated. Reasons for not being able to identify a base station included unclear satellite photographs, the use of camouflage, and multiple base stations in close proximity to each other. An example of the difficulty in identifying structures is illustrated in Figures 1 and 2, which show an aerial and 'Street View' of a base station camouflaged as a palm tree.

The first stage of analysis consisted of categorising the collected data into various categories, such as geographic region, technology type, vendor, site owner, and whether or not the base station serves a population centre. This was followed by finding the minimum (best accuracy), maximum (worst accuracy), median, average and standard deviation values for the location accuracy data in each category. Accuracy results for base stations were placed into categories of various intervals of accuracy to better allow for evaluation in terms of desired levels of accuracy of the base station locations for


TABLE I
SUMMARY OF ENTIRE SAMPLE

Interval Spacing | STDV   | Worst | Best | AVG   | Median | Sample Size
5                | 152.38 | 1634  | 0.52 | 77.04 | 25.38  | 369

varying applications.

The preceding steps allowed for comparisons between different categories to see if there were differences or similarities in terms of accuracy. By identifying the base station sites for which the recorded location accuracy was far worse and categorising them as outliers, these sites could be revisited in an attempt to find out why they differed so markedly from the rest of the base station locations in the category.

IV. DATA ANALYSIS

Due to the nature of how the network database was constructed, the location data was both complete and in a valid number format. Accuracy was examined for the entire sample as well as for the various categories of base stations. The best, worst, average (AVG) and median accuracies, together with the standard deviation (STDV), were calculated and are shown in Table I.
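The summary statistics in Table I can be reproduced for any list of deviations using the standard library; the sample values below are illustrative, not the study's data.

```python
import statistics

def accuracy_summary(deviations_m):
    """Best (minimum), worst (maximum), average, median and sample
    standard deviation of location deviations, as in Table I."""
    return {
        "best": min(deviations_m),
        "worst": max(deviations_m),
        "avg": statistics.mean(deviations_m),
        "median": statistics.median(deviations_m),
        "stdv": statistics.stdev(deviations_m),
        "n": len(deviations_m),
    }

summary = accuracy_summary([3, 12, 24, 26, 48, 55, 77, 95, 140, 420])
```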

By starting with a high-level overview of all sampled base station locations it is possible to gain an understanding of the range of accuracies for the overall sample population. The data is represented in Figure 3 as a cumulative percentage of the base stations for a given level of accuracy. For example, 66.67 percent of base stations have a recorded location that is accurate to within 50 metres of the measured location, while 80 percent of recorded base station locations are accurate to within 100 metres of their measured locations.
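The cumulative view in Figure 3 amounts to computing, for each accuracy threshold, the percentage of base stations whose deviation falls within it. A minimal sketch, with illustrative deviations rather than the study's data:

```python
def cumulative_percent(deviations_m, thresholds_m):
    """Percentage of base stations accurate to within each threshold."""
    n = len(deviations_m)
    return {t: 100.0 * sum(d <= t for d in deviations_m) / n
            for t in thresholds_m}

deviations = [3, 12, 24, 26, 48, 55, 77, 95, 140, 420]
print(cumulative_percent(deviations, [25, 50, 100]))
# {25: 30.0, 50: 50.0, 100: 80.0}
```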

In a near ideal situation 100 percent of the base station locations would be accurate to less than two and a half metres (rounded down), with zero deviation remaining the ultimate prize. This would result in a vertical line at zero metres from zero to 100 percent (of base stations), after which it would then make a ninety-degree turn to the right, indicating that all base station locations are accurate to within the distances given on the X axis.

Fig. 3. Entire Sample

Fig. 4. Map of South Africa [20]

Fig. 5. Distribution per region

A. Regions

The base stations that comprise the sample are situated in ten regions. These regions are Central (CEN), Eastern (EAS), KwaZulu Natal (KZN), Lesotho (LES), Limpopo (LIM), Mpumalanga (MPU), as well as Northern (NGA), Central (SGC) and Southern Gauteng (SGS), and lastly Western (WES). These regions correspond in area to the provinces of South Africa, which are illustrated in Figure 4 for reference. Figure 5 shows the distribution graph for these regions.

The KwaZulu Natal region stands out markedly as having the best average and median accuracy values. It also has the lowest worst-accuracy figure, which, all told, results in it having the lowest standard deviation.

The Lesotho region has an extremely large worst-accuracy figure, which results in it having the worst average and the highest standard deviation of all the regions.

The Central Gauteng region stands out for having the highest median value, despite not having a large worst value. The accuracy of the Central Gauteng region is lower than that of the


Fig. 6. Vendors

Lesotho and Southern Gauteng regions for the cumulative most accurate 80 percent of base stations portrayed in Figure 5. It lags the other regions until the 160-metre accuracy level is reached, where it then begins to rapidly surpass the cumulative percentage of the other regions. In addition to the Central Gauteng and Lesotho regions, the Southern Gauteng and Northern Gauteng regions also lag behind the accuracy of the more accurate regions.

B. Vendors

The sampled base stations can also be categorised by the network equipment vendors that supply them. These base station vendors are Alcatel, Huawei, Motorola and Siemens. As before, the highest (worst) numbers have been marked in bold and the lowest (best) numbers have been italicised in addition to being marked in bold.

Looking at Table II it is clear that Siemens offers the best overall accuracy of the vendors and Huawei the worst, with Alcatel and Motorola falling in between these two extremes.

However, when analysing Figure 6, it is apparent that Alcatel offers the best accuracy for the most accurate cumulative 85 percent of its base stations that were measured (up to 110 metres difference between recorded and measured locations). Only when the last 15 percent of the base stations, with accuracies worse than 110 metres, are included is it overtaken by Siemens. The accuracy of the base station location information for Huawei is confirmed as the lowest of the four vendors, with Motorola assuming a position between it and the two more

TABLE II
BASE STATION DATA CATEGORISED BY VENDORS

Vendor   | STDV   | Worst  | Best | AVG   | Median | Sample Size
Alcatel  | 141.77 | 879.32 | 0.52 | 68.14 | 19.98  | 121
Huawei   | 133.76 | 849.44 | 1.73 | 86.8  | 36.59  | 94
Motorola | 170.9  | 1634   | 1    | 77.12 | 25.27  | 150
Siemens  | 62.05  | 296.55 | 1.99 | 47.52 | 19.35  | 94

Fig. 7. Technology generation

accurate vendors.

C. Technology generation

When categorising base station locations by technology generation (for example second or third) there are three categories. This is due to the co-location of base stations of different generations on the same sites. It is however not a simple 'one for one' correlation, but rather a case where a site which has a second generation base station on it may also have a third generation base station on it, but the converse is not necessarily true. This results in three categories of sites:

1) Those with only second generation base stations (2nd Only).

2) Those with both third and second generation base stations (3rd & 2nd).

3) Those with second generation base stations which will possibly, but not necessarily, also include third generation base stations (2nd (incl. 3rd)).

In comparing the sites in Figure 7 it becomes clear that the locations of those sites that contain third (and second) generation base stations are known with better accuracy than those containing only second generation base stations.

Sites that contain second generation base stations, and possibly include third generation base stations, tend to fall in the middle. Unfortunately there is no set of sites that contain only third generation base stations, which would enable the comparison of sites that contain only second generation base stations to those that contain only third generation base stations.

D. Site owner

Base station sites are not necessarily used exclusively by the owner of the sites. This leads to a situation where some base stations are installed on sites that belong to another network operator. The “Own” network sites constitute the vast majority of the sampled base station locations. As such, it was necessary to combine the sites from the other vendors into a single category, “Other”, in order to achieve a meaningful sample size.


According to Table III, despite the “Own” category containing a very large worst-accuracy figure and being only slightly worse for best accuracy, it offers better overall accuracy as shown by all other metrics.

When reviewing Figure 8, for any cumulative percentage, the “Own” category has a lower (better) accuracy measure for base station locations than the “Other” category for at least the first cumulative 95 percent of most accurate recorded locations.

E. Population centres

Base station locations contain base stations that either serve centres of population or the areas in between them. Base stations serving population centres have a higher median value than those serving the areas between population centres. However, Figure 9 shows that base stations in population centres only have better accuracy once the last (most inaccurate) 15 percent of the base station locations are included.

F. Outliers

Outliers were defined as the ten percent of the total sample with the worst accuracy. Notably, this category covers all regions except for the KwaZulu Natal region, and with only one base station location for the Western region. In Table IV the results for the ten percent least accurate base station locations are presented. Even looking past the 'Worst' accuracy figure and instead at the average, median or even the 'Best' figures, the outlier locations are clearly very inaccurate.
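Selecting the least accurate ten percent of the sample, per the outlier definition above, can be sketched as follows (illustrative values, not the study's data):

```python
def worst_fraction(deviations_m, fraction=0.10):
    """Return the largest deviations making up the given fraction of
    the sample, i.e. the ten percent with the worst accuracy."""
    k = max(1, int(len(deviations_m) * fraction))
    return sorted(deviations_m, reverse=True)[:k]

# With ten values and a fraction of 0.10, only the single worst remains.
outliers = worst_fraction([3, 12, 24, 26, 48, 55, 77, 95, 140, 420])
```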

To gain an understanding of why outliers occur and how their accuracies can be so poor, examples of outliers were selected to illustrate the difference in recorded versus measured accuracy.

TABLE III
BASE STATION DATA CATEGORISED BY SITE OWNER

Site owner | STDV   | Worst  | Best | AVG    | Median | Sample Size
Own        | 151.05 | 1634   | 1    | 73.07  | 25.07  | 318
Other      | 161.93 | 879.32 | 0.52 | 105.61 | 49.14  | 49

Fig. 8. Site owner

TABLE IV
BASE STATION OUTLIERS

Interval Spacing | STDV   | Worst | Best   | AVG    | Median | Sample Size
25               | 297.67 | 1634  | 178.65 | 410.15 | 303.92 | 38

In Figure 10, the location of the access road (marked with a red 'A') which is used to reach the base station has been recorded instead of the location of the base station itself (marked with six red dots). This Northern Gauteng region base station serves a population centre, but its location is off by 324 metres.

The Pretoria University building (tagged with a green arrow) in Figure 11 has been recorded instead of the actual location of the base station (indicated by six red dots) on the grounds. This base station serves a population centre in the Northern Gauteng region. It has a difference of 178.5 metres between its recorded and measured locations.

Figure 12 shows that while the recorded location (marked by the red 'A') is atop the same mountain in the Central region, it does not follow the track all the way to the base station (circled with red dots). This results in a deviation of 879 metres from the measured location of the base station, which serves a

Fig. 9. Population centres

Fig. 10. Watloo Despatch


Fig. 11. Pretoria University

Fig. 12. Carnarvon

population centre at the foot of the mountain.

From the above data several points need to be considered.

Firstly, there are large outliers and standard deviations for all vendors, technology generations, site owners, and almost all regions. The KwaZulu Natal region was a notable exception to this pattern, proving by example that good accuracy is entirely possible. Secondly, one category could be cumulatively more accurate for the majority of its (more accurate) base station locations, but when its least accurate base stations were included, these were so inaccurate that its overall accuracy would drop below that of another category. Lastly, the extent of the inaccuracy for the outliers was so great that it warranted further assessment. This revealed the ease with which highly inaccurate locations could be recorded.

V. CONCLUSIONS

This paper builds on previous research, emphasising the importance of accurately knowing base station locations for cellphone localisation [12], [21]. The nature of this study allows it to be replicated in any country and for any technology type or other category of base station site. The resulting data shows that, depending on the requirements, base station locations may or may not be accurate enough for a particular

application. This could have serious implications when the data is used for security-related incidents.

Base station accuracies ranged from less than one metre to more than 1600 metres. Fifty percent of base stations were accurate to 25 metres (rounded) and 80 percent were accurate to 100 metres (rounded). However, to include 90 percent of base stations it would be necessary to accept base station locations that were off by 180 metres (rounded). The deviation of the least accurate ten percent of base station locations ranged from 179 to 1634 metres. The significance of these inaccuracies and their impact would depend on the particular application and its requirement for accuracy. When investigating outliers a discernible pattern emerged, revealing that the recorded locations were often actually the access point, or access road, to the base station rather than the base station itself.

Network operators can improve the accuracy of the estimated locations that they are able to provide by increasing the accuracy of recorded base station locations. This can be done by analysing and measuring aerial photographs, or through taking more accurate measurements when performing routine maintenance, upgrades or equipment swap-outs of base stations.

REFERENCES

[1] I.A. Junglas and R.T. Watson, “Location-based services,” Commun. ACM, vol. 51, no. 3, pp. 65–69, 2008.

[2] J. Warner, “Murder Suspect Caught,” Weekend Argus (Sept. 11), p. 4, 2010.

[3] V. Zeimpekis, G.M. Giaglis, and G. Lekakos, “A Taxonomy of Indoor and Outdoor Positioning Techniques for Mobile Location Services,” SIGecom Exch., vol. 3, no. 4, pp. 19–27, 2003.

[4] M. Hazas, J. Scott, and J. Krumm, “Location-Aware Computing Comes of Age,” Comput., vol. 37, no. 2, pp. 95–97, 2004.

[5] A. Kupper, Location-Based Services: Fundamentals and Operation. Chichester: Wiley, 2005.

[6] S. von Watzdorf and F. Michahelles, “Accuracy of Positioning Data on Smartphones,” in Proc. 3rd Int. Workshop on Location and the Web, Tokyo, Japan, 2010, pp. 1–4.

[7] M. Ibrahim and M. Youssef, “A Hidden Markov Model for Localization Using Low-End GSM Cell Phones,” in Proc. 2011 IEEE Int. Conf. on Communications (ICC), Cairo, Egypt, 2011, pp. 1–5.

[8] J. Paek, K. Kim, J.P. Singh, and R. Govindan, “Energy-Efficient Positioning for Smartphones using Cell-ID Sequence Matching,” in Proc. 9th Int. Conf. on Mobile Systems, Applications, and Services, Maryland, USA, 2011, pp. 293–306.

[9] W. Buchanan, J. Munoz, R. Manson, and K. Raja, “Analysis and Migration of Location-Finding Methods for GSM and 3G Networks,” in Proc. 5th IEE Int. Conf. on 3G Mobile Communication Technologies, Edinburgh, United Kingdom, 2004, pp. 352–358.

[10] J. Borkowski, “Performance of Cell ID+RTT Hybrid Positioning Method for UMTS,” M.Sc. thesis, Tampere University of Technology, Finland, 2004.

[11] J. Niemela and J. Borkowski. (2004) Topology planning considerations for capacity and location techniques in WCDMA radio networks. [Online]. Available: http://www.cs.tut.fi/tlt/RNG/publications/abstracts/topoplanning.shtml

[12] J.J. Caffery and G.L. Stuber, “Overview of Radiolocation in CDMA Cellular Systems,” IEEE Commun. Mag., vol. 36, no. 4, pp. 38–45, 1998.

[13] M. Mohr, C. Edwards, and B. McCarthy, “A study of LBS accuracy in the UK and a novel approach to inferring the positioning technology employed,” Comput. Commun., vol. 31, no. 6, pp. 1148–1159, 2008.

[14] P.J. Duffett-Smith and M.D. Macnaughtan, “Precise UE Positioning in UMTS using Cumulative Virtual Blanking,” in Proc. 3rd Int. Conf. on 3G Mobile Communication Technologies, London, United Kingdom, 2002, pp. 355–359.


[15] J. Borkowski, J. Niemela, and J. Lempiainen. (2004) Location Techniques for UMTS Radio Networks. [Online]. Available: http://www.cs.tut.fi/tlt/RNG/publications/abstracts/UMTSlocation.shtml

[16] J. Borkowski and J. Lempiainen, “Pilot correlation positioning method for urban UMTS networks,” in Proc. 11th European Next Generation Wireless and Mobile Communications and Services Conf., Tampere, Finland, 2005, pp. 1–5.

[17] M. Ibrahim and M. Youssef, “CellSense: A Probabilistic RSSI-Based GSM Positioning System,” in Proc. 2010 IEEE Global Telecommunications Conf., Cairo, Egypt, 2010, pp. 1–5.

[18] A. Varshavsky, M.Y. Chen, E. de Lara, J. Froehlich, D. Haehnel, J. Hightower, A. LaMarca, F. Potter, T. Sohn, K. Tang, and I. Smith, “Are GSM phones THE solution for localization?” in Proc. 7th IEEE Workshop on Mobile Computing Systems and Applications, Washington, USA, 2006, pp. 20–28.

[19] Google. (2012) Google Maps. [Online]. Available: https://maps.google.com/

[20] Htonl. (2010) Map of South Africa (via Wikimedia Commons). [Online]. Available: http://commons.wikimedia.org/wiki/File:Map_of_South_Africa_with_English_labels.svg

[21] J. Yang, A. Varshavsky, H. Liu, Y. Chen, and M. Gruteser, “Accuracy Characterization of Cell Tower Localization,” in Proc. 12th ACM Int. Conf. on Ubiquitous Computing, Copenhagen, Denmark, 2010, pp. 223–226.


Technical Security Metrics Model in Compliance with ISO/IEC 27001 Standard

M.P. Azuwa, Rabiah Ahmad, Shahrin Sahib and Solahuddin Shamsuddin

[email protected], {rabiah,shahrin}@utem.edu.my

[email protected]

ABSTRACT

Technical security metrics provide measurements in ensuring the effectiveness of technical security controls or technology devices/objects that are used in protecting the information systems. However, lack of understanding and method to develop the technical security metrics may lead to unachievable security control objectives and inefficient implementation. This paper proposes a model of technical security metrics to measure the effectiveness of network security management. The measurement is based on the security performance for (1) network security controls such as firewall, Intrusion Detection Prevention System (IDPS), switch, wireless access point and network architecture; and (2) network services such as Hypertext Transfer Protocol Secure (HTTPS) and virtual private network (VPN). The methodology used is the Plan-Do-Check-Act process model. The proposed technical security metrics provide guidance for organizations in complying with requirements of the ISO/IEC 27001 Information Security Management System (ISMS) standard. The proposed model should also be able to provide a comprehensive measurement and guide to use the ISO/IEC 27004 ISMS Measurement standard.

KEYWORDS

Information security metrics, technical security metrics model, measurement, vulnerability assessment, ISO/IEC 27001:2005, ISO/IEC 27004:2009, Critical National Information Infrastructure.

1 INTRODUCTION

The phenomenon of instantly growing and increasing numbers of cyber attacks has urged organizations to adopt security standards and guidelines. The International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC) have developed the ISO/IEC 27000 series of standards, which have been specifically reserved for information security matters. Through ISO/IEC 27001 Information Security Management System (ISMS) – Requirements [1], an organization may comply and obtain certification, increasing the level of protection for its information and information systems.

Information security metrics can be ineffective tools if organizations do not have data to measure, procedures or processes to follow, indicators to make good protection decisions, and people to develop and report to the management. To be useful, measurement of information security effectiveness should be comparable. Comparisons are usually made on the basis of quantifiable measurement of a common characteristic. The main problems in information security metrics development are identified as: (i) lack of clarity in defining quantitatively effective security metrics against the security standards and guidelines; (ii) lack of a method to guide organizations in choosing security objectives, metrics and

280

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 280-288The Society of Digital Information and Wireless Communications (SDIWC), 2012 (ISSN: 2305-0012)

Page 19: 2012 (Vol. 1, No. 4)

measurements for mitigating current

cyber attacks [2][3].

Hulitt and Vaughn [4] report a lack of clarity in a standard quantitative metric to describe an information system's level of compliance with the FISMA standard, even when a thorough and repeatable compliance assessment is conducted using the Risk Management Framework (RMF). Bellovin [5] remarks that defining metrics is hard, and perhaps infeasible, because an attacker's effort is often linear even where exponential security work is needed. Those pursuing the development of a security metrics program should think of themselves as pioneers and be prepared to adjust strategies as experience dictates [6]. It is also known that ISO/IEC 27001 provides only generic guidance on developing security objectives and metrics, and still lacks a method to guide organizations [2][3].

1.1 Information Security Metrics

In understanding the meaning of information security metrics, security practitioners and researchers have offered simplified definitions of information security metrics and measures (as described in Table 1).

Table 1: Definitions of Information Security Metrics and Measures

Stoddard et al. [7]: A metric is a measurement that is compared to a scale or benchmark to produce a meaningful result. Metrics are a key component of risk management.

Savola [8]: A security metric is a quantitative and objective basis for security assurance. It eases the making of business and engineering decisions concerning information security. Metrics are derived from comparing two or more measurements, taken over time, with a predetermined baseline.

Brotby [9]: A metric is a term used to denote a measure based on a reference; it involves at least two points, the measure and the reference. Security is the protection from, or absence of, danger. Security metrics are categorized by what they measure; the measures include process, performance, outcomes, quality, trends, conformance to standards and probabilities.

Masera et al. [10]: "Security metrics are indicators, and not measurements of security. Security metrics highly depend on the point of reference taken for the measurement, and shouldn't be considered as absolute values with respect to an external scale."

Hallberg et al. [11]: "A security metric contains three main parts: a magnitude, a scale and an interpretation. The security values of systems are measured according to a specified magnitude and related to a scale. The interpretation prescribes the meaning of obtained security values."

Lundholm et al. [12]: A measurement quantifies only a single dimension of the object of measurement and does not hold value (facilitate decision making) in itself. A metric is derived from two or more measurements to demonstrate an important correlation that can aid a decision.

From these definitions, we propose the following definition: information security metrics are a measurement standard for information security controls that can be quantified and reviewed to meet the security objectives. They facilitate the relevant actions for improvement, support decision making and guide compliance with security standards.

Information security measurement is the process of measuring or assessing the effectiveness of information security controls; it can be described by the relevant measurement methods used to quantify data, and its results are comparable and reproducible. Hence, information security measurement is a subset of information security metrics.

1.2 Technical Security Metrics and Measurement

We found that research activity on technical security metrics is very limited. There is also a lack of specific research on technical security metrics to measure the technical security controls among the total of 133 security controls in the ISO/IEC 27001 standard.

Vaughn et al. [13] define a Technical Target of Assessment (TTOA) as measuring how far a technical object, system or product is capable of providing assurance in terms of protection, detection and response. According to Stoddard et al. [7], technical security metrics are used to assess technical objects, particularly products or systems [8], against standards; to compare such objects; or to assess the risks inherent in such objects. Additionally, technical security metrics should be able to evaluate strength in resistance and response to attacks and weaknesses (in terms of threats, vulnerabilities, risks and anticipated losses in the face of attack) [13]. At the same time, they indicate the security readiness with respect to a possible set of attack scenarios [10].

1.3 Effective Measurement Requirement from the ISO/IEC 27001 Standard

Information security measurement is a mandatory requirement of the ISO/IEC 27001 standard, indicated in several clauses: 4.2.2(d) "Define how to measure the effectiveness of the selected controls or groups of controls and specify how these measurements are to be used to assess control effectiveness to produce comparable and reproducible results"; 4.2.3(c) "Measure the effectiveness of controls to verify that security requirements have been met"; 4.3.1(g) "documented procedures needed by the organization to ensure the effective planning, operation and control of its information security processes and describe how to measure the effectiveness of controls"; 7.2(f) "results from effectiveness measurements"; and 7.3(e) "Improvement to how the effectiveness of controls is being measured". The importance of information security measurement is well defined in these clauses.

2 SECURITY METRICS DEVELOPMENT APPROACH

The development of the technical security metrics model (TSMM) is derived from the following approach:

(1) The requirements of technical security controls are based on the ISO/IEC 27002 ISMS – Code of Practice standard [14].
(2) Identify relevant security requirements.
(3) Achieve security performance objectives.
(4) Align to the risk assessment value.
(5) The technical security metrics should not be an extensive list, but should focus on the critical security controls that provide high impact to the organization. According to Lennon [15], "the metrics must be prioritized to ensure that the final set selected for initial implementation facilitates improvement of high priority security control implementation. Based on current priorities, no more than 10 to 20 metrics at a time should be used. This ensures that an IT security metrics program will be manageable."
(6) Ensure ease of measurement.
(7) Provide the process to obtain data/evidence, and the method and formula to assess the security measurement.
(8) Address resistance and response to known and unknown attacks.
(9) Provide threshold values to determine the level of protection.
(10) Provide actions for improvement.
(11) Comply with the ISO/IEC 27001 standard.
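As an illustration only, one entry of such a prioritized metrics set could be recorded as a small structure tying a control to its data source, formula and threshold, covering the approach's points on ease of measurement, data/evidence and threshold values. Every field and class name here is a hypothetical assumption for illustration, not a structure defined in the paper.

```python
# Hypothetical record for one technical security metric (names are assumed,
# not taken from the paper or from ISO/IEC 27004).
from dataclasses import dataclass

@dataclass
class TechnicalSecurityMetric:
    name: str             # e.g. "Patch compliance"
    control: str          # e.g. "A.12.6.1 Control of technical vulnerabilities"
    data_source: str      # where the evidence is obtained, e.g. an audit report
    threshold_pct: float  # level of protection considered adequate

    def evaluate(self, observed_pct: float) -> bool:
        # True when the observed measure meets the protection threshold.
        return observed_pct >= self.threshold_pct
```

A metric such as "patch compliance above 95%" would then be evaluated by comparing the audited percentage against `threshold_pct`.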

3 TECHNICAL SECURITY METRICS MODEL (TSMM)

The development of the TSMM is based on the Plan-Do-Check-Act (PDCA) model and is described in Figure 1.

3.1 PLAN Phase: (Selection of Controls and Definition)

The focus is on the technical security controls extracted from the total of 133 security controls stated in Annex A of the ISO/IEC 27001 standard.

We define technical security metrics as a measurement standard for addressing the performance of security countermeasures within the technical security controls and for fulfilling the security requirements. The technical security measures are based on information security performance objectives that can be accomplished by quantifying the implementation, efficiency and effectiveness of security controls.

ISO/IEC 27002 [14] provides best-practice guidance for initiating, implementing or maintaining security controls in the ISMS. The standard notes that "not all of the controls and guidance in this code of practice may be applicable and additional controls and guidelines not included in this standard may be required".

Federal Information Processing Standards 200 (FIPS 200) [16] defines technical controls as "the security controls (i.e., safeguards or countermeasures) for an information system that are primarily implemented and executed by the information system through mechanisms contained in the hardware, software, or firmware components of the system". This is the basis of our definition of technical security controls.

Based on the NIST SP800-53 guidelines [17], the technical security controls comprise:

(1) Access Control (AC: 19 controls)
(2) Audit and Accountability (AU: 14 controls)
(3) Identification and Authentication (IA: 8 controls)
(4) System and Communications Protection (SC: 34 controls)

The total number of technical security controls in the NIST SP800-53 guidelines is seventy-five (75). In Appendix H of [18], the technical security controls are extracted from Table H-2, which maps the security controls in ISO/IEC 27001 (Annex A) to NIST Special Publication 800-53. We extracted and analyzed these technical security controls and discovered that:

(1) The controls fall within three (3) main domains of ISO/IEC 27001 (Annex A): A.10 Communications and operations management; A.11 Access control; and A.12 Information systems acquisition, development and maintenance.
(2) The initial total of technical security controls is forty-five (45).
(3) Some of the identified technical security controls require only a process or policy implementation and are not related to technical implementation, such as A.11.1.1 Access control policy, A.11.4.1 Policy on use of network services, A.11.5.1 Secure log-on procedures, A.11.6.2 Sensitive system isolation, A.11.7.2 Teleworking, A.12.3.1 Policy on the use of cryptographic controls and A.12.6.1 Control of technical vulnerabilities.
(4) There are relationships with other security controls in the NIST SP800-53 document, including:
• Management controls: Security Assessment and Authorization (CA), Planning (PL), System and Services Acquisition (SA)
• Operational controls: Configuration Management (CM), Maintenance (MA), Media Protection (MP), Physical and Environmental Protection (PE), Personnel Security (PS), System and Information Integrity (SI)

Figure 1: Technical Security Metrics Model (TSMM)

The technical security controls should be practical, customized and measured according to the organization's business requirements and environment. A risk management approach will be used to identify the relevant security controls. A threat and vulnerability assessment will be carried out, and both impact and risk exposure will be identified to determine the prioritization of security controls.

Cyber-Risk Index: A cyber-risk index is used to evaluate the vulnerability and threat probabilities related to the success of current and future attacks. The Attack-Vulnerability-Damage (AVD) model [19] and the Common Vulnerability Scoring System (CVSS) Base Metric [20] are used to determine this weighted index. We will extend it to include the criticality or impact of loss to the organization. The CVSS base score is calculated using the information provided by the U.S. National Vulnerability Database (NVD) Common Vulnerability Scoring System Support v2 [21] and other relevant Computer Emergency Response Team (CERT) advisories and reports.
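Since the cyber-risk index leans on CVSS base scores, it may help to recall how such a score is produced. The sketch below follows the published CVSS v2 base equations [20]; it is an illustration of the public formula, not code from the paper, and the single-letter metric keys are a shorthand assumption.

```python
# CVSS v2 base-score equations, with the standard metric values from the
# public CVSS v2 specification.
AV = {"L": 0.395, "A": 0.646, "N": 1.0}    # Access Vector
AC = {"H": 0.35, "M": 0.61, "L": 0.71}     # Access Complexity
AU = {"M": 0.45, "S": 0.56, "N": 0.704}    # Authentication
CIA = {"N": 0.0, "P": 0.275, "C": 0.660}   # C/I/A impact values

def cvss2_base(av, ac, au, c, i, a):
    # Impact and Exploitability sub-scores, then the base equation.
    impact = 10.41 * (1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a]))
    exploitability = 20 * AV[av] * AC[ac] * AU[au]
    f_impact = 0.0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)
```

For example, a network-accessible, low-complexity, no-authentication flaw with complete C/I/A impact scores `cvss2_base("N", "L", "N", "C", "C", "C")` → 10.0, and the same flaw with only partial impacts scores 7.5.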

3.2 DO Phase: (Effective Measurement)

The security requirements describe the actual security functions of the technical security controls in protecting the information systems. The security functions include identification and authentication, access control, configurations/algorithms, architecture and communication. A set of performance objectives is developed for each security requirement.

Vulnerability Assessment (VA) Index: The VA index can be derived by conducting a security or vulnerability assessment of the information systems through a simulation assessment, vulnerability scanning or penetration testing. It is based on the current assessment of potential attacks and is weighted using the numeric CVSS scores: "Low" severity (CVSS base score 0.0-3.9), "Medium" severity (4.0-6.9) and "High" severity (7.0-10.0). The VA index can also be derived from the Vulnerability-Exploitability-Attackability (VEA-bility) metric [22]. VEA-bility measures the security of a network as influenced by the severity of existing vulnerabilities, the distribution of services, the connectivity of hosts and the possible attack paths. These factors are modeled into three network dimensions: Vulnerability, Exploitability and Attackability. The overall VEA-bility score, a numeric value in the range [0, 10], is a function of these three dimensions.
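The severity bands above translate directly into code. The following sketch buckets CVSS base scores into the cited Low/Medium/High bands and derives a simple weighted VA index over a list of findings; the 1/2/3 severity weights and the scaling of the index to [0, 1] are illustrative assumptions, not part of the paper or of the VEA-bility metric.

```python
# Bucket a CVSS v2 base score into the severity bands cited in the text.
def severity(cvss_score):
    if cvss_score <= 3.9:
        return "Low"
    if cvss_score <= 6.9:
        return "Medium"
    return "High"

# Assumed weighting scheme: average severity weight, scaled to [0, 1].
def va_index(scores):
    weights = {"Low": 1, "Medium": 2, "High": 3}
    if not scores:
        return 0.0
    return sum(weights[severity(s)] for s in scores) / (3 * len(scores))
```

A scan yielding one High (9.8), one Medium (5.0) and one Low (2.1) finding would give a VA index of (3 + 2 + 1) / 9 ≈ 0.67 under these assumed weights.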

At this phase, the data collection must be easily obtainable and the measurements must not be complicated. The measurement should be able to cater for both current attacks (through audit reports and evidence of events) and future attacks.

3.3 CHECK Phase: (Security Indicators and Corrective Action)

In verifying the effectiveness of controls, we measure how much a control decreases the probability that the described risks are realized. The attributes must be significant in determining the increase or decrease of risk. The expected measure function can be derived as the percentage of successful or failed occurrences; for example, the number of patches successfully installed on information systems (> 95%), or the number of security incidents caused by attacks from the network (< 3%). The determination of the percentage should consider that even though the security controls are implemented, attacks can still occur. The percentage therefore depicts the strength of the existing security controls in mitigating the risks.
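The measure function described here can be sketched as a percentage-of-occurrences check against a target. The function names and the `higher_is_better` flag are illustrative assumptions; the two example targets (patches > 95%, network-caused incidents < 3%) come from the text.

```python
# Percentage of successful (or failed) occurrences for a measure.
def measure_pct(occurrences, total):
    return 100.0 * occurrences / total

# Compare a measure to its target; some measures are "higher is better"
# (patch installation rate), others "lower is better" (incident rate).
def meets_target(pct, target, higher_is_better=True):
    return pct >= target if higher_is_better else pct <= target
```

With 97 of 100 patches installed, `meets_target(measure_pct(97, 100), 95.0)` holds, while a 2% network-incident rate satisfies `meets_target(2.0, 3.0, higher_is_better=False)`.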

Security Indicator Index: If the measure is equal to or below the recommendation, the risk is adequately controlled, which demonstrates the effectiveness of the security controls. The proposed indicators are the trends of the derived measures, and they must be within the same measurement scale in order to establish that the risk is adequately controlled [23]. This indicator index can also act as a compliance index for the ISO/IEC 27001 standard. It is an algorithm or calculation combining one or more base and/or derived measures with associated decision criteria, for example: 0-60% Red; 60-90% Yellow; 90-100% Green.

Decision Criteria: Thresholds, targets or patterns used to determine the need for action or further investigation, or to describe the level of confidence in a given result. For example: Red means intervention is required, and a causation analysis must be conducted to determine the reasons for non-compliance and poor performance; Yellow means the indicator should be watched closely for possible slippage to Red; Green means no action is required.
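The decision criteria can be expressed as a small traffic-light function. The band boundaries follow the 0-60% / 60-90% / 90-100% example in the text; treating exactly 60% and 90% as the start of the next band is an assumption, since the bands as written overlap at their endpoints.

```python
# Traffic-light indicator over a compliance percentage, per the example bands.
def indicator(compliance_pct):
    if compliance_pct < 60:      # assumed: 60% itself falls in Yellow
        return "Red"             # intervention and causation analysis required
    if compliance_pct < 90:      # assumed: 90% itself falls in Green
        return "Yellow"          # watch closely for slippage to Red
    return "Green"               # no action required
```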

Corrective actions provide the range of potential changes for improving the efficiency and effectiveness of the security controls. They can be prioritized based on overall risk mitigation goals and selected based on cost-benefit analysis.

3.4 ACT Phase:

The developed technical security metrics and measurements will be validated by the respective organizations. The metrics must comply with the ISO/IEC 27001 standard requirements, and their development will be based on the information security measurement model in the ISO/IEC 27004 standard. The measurement results should be reported to management to ensure the continuity and improvement of information security in the organization.

4 CONCLUSIONS AND FUTURE WORK

The Malaysian government has recognized the importance of Critical National Information Infrastructure (CNII) organizations protecting their critical information systems. In 2010, the government mandated that their systems be ISO/IEC 27001 ISMS certified within three years [24].

ISO 27001 certification is one of the most widely used corporate best practices for IT security standards, addressing management requirements as well as identifying specific control areas for information security. It provides a comprehensive framework for designing and implementing a risk-based Information Security Management System. The requirements and guidance cover the policies and actions that are necessary across the whole range of information security vulnerabilities and threats. By customizing the security requirements from ISO/IEC 27002 and other relevant security standards and guidelines, CNII organizations can implement the necessary security controls in compliance with the ISO/IEC 27001 ISMS standard.

The proposed TSMM provides guidance for CNII organizations in measuring the effectiveness of network security controls in compliance with the ISO/IEC 27001 standard. The relevant types of information security measurement and metrics are interrelated and worth using in alignment with business risk management. We also intend to explore the usability of the ISO/IEC 27004 standard and to conduct a case study at several CNII organizations.

ACKNOWLEDGMENT

The authors wish to acknowledge and thank the members of the research teams of the Long Term Fundamental Research Grant Scheme (LRGS) number LRGS/TD/2011/UKM/ICT/02/03 for this work. The research scheme is supported by the Ministry of Higher Education (MOHE) under the Malaysian R&D National Funding Agency Programme.

5 REFERENCES

1. International Organization for Standardization and International Electrotechnical Commission, "Information technology – Security techniques – Information security management systems – Requirements," ISO/IEC 27001:2005, 2005.
2. R. Barabanov, S. Kowalski, and L. Yngström, "Information Security Metrics: Research Directions," FOI Swedish Defence Research Agency, 2011.
3. C. Fruehwirth, S. Biffl, M. Tabatabai, and E. Weippl, "Addressing misalignment between information security metrics and business-driven security objectives," in Proceedings of the 6th International Workshop on Security Measurements and Metrics (MetriSec '10), p. 1, 2010.
4. E. Hulitt and R. B. Vaughn, "Information system security compliance to FISMA standard: A quantitative measure," in 2008 International Multiconference on Computer Science and Information Technology, no. 4, pp. 799–806, Oct. 2008.
5. S. M. Bellovin, "On the Brittleness of Software and the Infeasibility of Security Metrics," IEEE Security & Privacy, vol. 4, no. 4, p. 96, Jul. 2006.
6. K. Stouffer, J. Falco, and K. Scarfone, "Guide to Industrial Control Systems (ICS) Security," National Institute of Standards and Technology, NIST Special Publication 800-82, Jun. 2011.
7. M. Stoddard, D. Bodeau, R. Carlson, C. Glantz, Y. Haimes, C. Lian, J. Santos, and J. Shaw, "Process Control System Security Metrics – State of Practice," Institute for Information Infrastructure Protection (I3P), Research Report, Aug. 2005.
8. R. Savola, "Towards a Security Metrics Taxonomy for the Information and Communication Technology Industry," in International Conference on Software Engineering Advances, 2007.
9. W. K. Brotby, Information Security Management Metrics: A Definitive Guide to Effective Security Monitoring and Measurement. Auerbach Publications, 2009.
10. M. Masera and I. N. Fovino, "Security metrics for cyber security assessment and testing," Joint Research Centre of the European Commission, ESCORTS D4, pp. 1–26, Aug. 2010.
11. J. Hallberg, M. Eriksson, H. Granlund, S. Kowalski, K. Lundholm, Y. Monfelt, S. Pilemalm, T. Wätterstam, and L. Yngström, "Controlled Information Security: Results and conclusions from the research project," FOI Swedish Defence Research Agency, pp. 1–42, 2011.
12. K. Lundholm, J. Hallberg, and H. Granlund, "Design and Use of Information Security Metrics," FOI Swedish Defence Research Agency, ISSN 1650-1942, 2011.
13. R. B. Vaughn, Jr., R. Henning, and A. Siraj, "Information Assurance Measures and Metrics – State of Practice and Proposed Taxonomy," in Proceedings of the 36th Hawaii International Conference on System Sciences, 2003.
14. International Organization for Standardization and International Electrotechnical Commission, "Information technology – Security techniques – Code of practice for information security management," ISO/IEC 27002:2005, 2005.
15. E. B. Lennon, M. Swanson, J. Sabato, J. Hash, and L. Graffo, "IT Security Metrics," ITL Bulletin, National Institute of Standards and Technology, Aug. 2003.
16. "Federal Information Processing Standards 200 – Minimum Security Requirements for Federal Information and Information Systems," National Institute of Standards and Technology, Mar. 2006.
17. Computer Security Division and Information Technology Laboratory, "Recommended Security Controls for Federal Information Systems and Organizations," National Institute of Standards and Technology, NIST Special Publication 800-53, Revision 3, 2010.
18. Computer Security Division and Information Technology Laboratory, "Security and Privacy Controls for Federal Information Systems and Organizations," National Institute of Standards and Technology, NIST Special Publication 800-53, Revision 4, Feb. 2012.
19. T. Fleury, H. Khurana, and V. Welch, "Towards a Taxonomy of Attacks Against Energy Control Systems," in Proceedings of the IFIP International Conference on Critical Infrastructure Protection, 2008.
20. P. Mell, K. Scarfone, and S. Romanosky, "A Complete Guide to the Common Vulnerability Scoring System," Forum of Incident Response and Security Teams (FIRST), pp. 1–23, 2007.
21. "NVD Common Vulnerability Scoring System Support v2," NIST National Vulnerability Database (NVD), http://nvd.nist.gov/cvss.cfm?version=2.
22. M. Tupper and A. N. Zincir-Heywood, "VEA-bility Security Metric: A Network Security Analysis Tool," in 2008 Third International Conference on Availability, Reliability and Security, pp. 950–957, Mar. 2008.
23. M. H. S. Peláez, "Measuring effectiveness in Information Security Controls," SANS Institute InfoSec Reading Room, 2010. http://www.sans.org/reading_room/whitepapers/basics/measuring-effectiveness-information-security-controls_33398
24. Unit Pemodenan Tadbiran dan Perancangan Pengurusan Malaysia (MAMPU), "Pelaksanaan Pensijilan MS ISO/IEC 27001:2007 Dalam Sektor Awam," vol. MAMPU.BPIC, p. 1, 2010.

Trusted Document Signing based on use of biometric (Face) keys

Ahmed B. Elmadani
Department of Computer Science, Faculty of Science, Sebha University, Sebha, Libya
www.sebhau.edu.ly, [email protected]

ABSTRACT

Online secure document exchange, secure bank transactions and other e-commerce requirements need protection as the commercial environment grows, and the digital signature (DS) is the principal means of achieving this. This paper introduces a prototype online algorithm for signing and verifying a document digitally. The document's hash value is calculated and protected using keys derived from face characteristics. The paper presents a method of signing documents that differs from traditional systems based on passwords, smartcards or directly accessed biometrics: it utilizes a wirelessly accessed biometric to provide (1) untampered biometrics in digital signatures and (2) proof of true identity. It also investigates an existing digital signature system based on smart cards. The obtained results are expressed in terms of speed and security enhancement, which are in high demand in the e-commerce society.

KEYWORDS

Digital Signature, Smart card, Hash, True identity, Biometric (face).

1. INTRODUCTION

A mathematical scheme for demonstrating the authenticity of a digital message or document is known as a Digital Signature (DS) [1]. A DS convinces a recipient that a document was created by a known sender. DSs are commonly used for software distribution, financial transactions and other cases where forgery and tampering must be avoided [2]. Digitally signed messages may be anything that can be represented as a bit string; examples include electronic mail, contracts, or a message sent via some other cryptographic protocol [3]. A hash function is used in creating and verifying a DS: it is an algorithm that creates a digital representation of a document. A few hashing algorithms, such as the Secure Hash Algorithm (SHA-1) and Message Digest version 5 (MD5), have been developed for use in e-commerce [4]. SHA-1 produces a 160-bit hash value; it was designed by NIST and the NSA in 1993, revised in 1995, and is the US standard for use with the Digital Signature Algorithm (DSA) signature scheme. SHA-256, SHA-384 and SHA-512 were designed for compatibility with the increased security provided by the Advanced Encryption Standard (AES) cipher [3].

In traditional DS, a smart card is normally used to perform signatures because the cryptographic keys used are stored inside the card [6]. However, most existing DS systems provide signatures without proving true identity [5], because they rely on keys that anyone can use [7]. Therefore, documents have to be signed in a way that proves the true identity, to avoid the many attacks reported in [8][11]. This can be done only by using a user's personal characteristics such as the fingerprint, iris or face [7]. In automated security, faces are more secure than passwords, because of the fine differentiation between seemingly identical faces, and because a face cannot be forgotten or stolen [9]. Faces are also more secure than fingerprints, because a fingerprint can be spoofed using jelly [10].

A face image, like any digital image, always needs to be enhanced to bring out its features clearly, because of the low quality of images captured by camera devices. Once an image is captured and resized, it is filtered using one of the known filtering methods such as the Linear, Wiener, Median or Gaussian filter [9]. The image is filtered, using one or more filtering algorithms, several times until it becomes clear; then its information can be extracted [12] and stored for future comparison. The face structure comprises the eyes, the mouth and their positions, which differ from one person to another; together they form a unique characterization of the face [9]. There are further factors that can make recognition easier or more difficult; they are listed in the FERET dataset [15]. Several face recognition algorithms have been introduced in recent years. One of them measures the triangle formed between the eyes and mouth, but this is subject to change, so measurements should be taken at age intervals [16]. The first mention of eigenfaces in image processing, a technique that would become the dominant approach in the following years, was made by L. Sirovich and M. Kirby in 1986; it is based on principal component analysis (PCA) [16]. It became the basis for developing many new face algorithms, such as the measurement of the importance of certain intuitive features and geometric measures between eye distances with length ratios [17].

This work is considered an improvement of the research done by Costas et al. (2008), who performed face-based digital signatures for retrieving video segments using pre-extracted faces for detection and recognition [14]. They use signatures for retrieval, while in this work we use segments of a document to retrieve their signatures for verification. In our proposed DS system, we introduce a system that uses keys derived from the user's face to help assure true identity; the face factors mentioned in [15] are outside our concern. In our security analysis, we consider only secure signature-generation systems that use SMCs to protect the DS from the attacks mentioned in [8]. We then improve the use of biometrics in order to prove true user identity, as in [13], and to protect the DS, while avoiding biometric systems that can be tampered with, such as fingerprint-based ones [14]. In the proposed system, we construct keys from the face and protect them using Ron's Code version 5 (RC5), a variable-key-size encryption algorithm that is fast and suitable for protecting SMC keys [6]. Other solutions exist, but they are out of the scope of this paper.
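As a minimal illustration of the hashing step described in the introduction (standard library only, not the paper's implementation), a document's digest can be computed with SHA-1 and later recomputed for verification:

```python
# Compute a document's fingerprint: the fixed-length digest that a digital
# signature scheme would then protect with the signer's key.
import hashlib

def document_fingerprint(document_bytes):
    return hashlib.sha1(document_bytes).hexdigest()
```

For example, `document_fingerprint(b"abc")` yields the well-known SHA-1 test vector "a9993e364706816aba3e25717850c26c9cd0d89d"; any change to the document changes the digest, which is what verification detects.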

2. METHODOLOGY AND DISCUSSIONS

The following paragraphs will discuss proposed algorithm, experiment and obtained results. 2.1 PROPOSED ALGORITHM Sequence of DS in the proposed system for any given document shown in Figure 1 are performed in five steps described as following:

• Enhancement: face image adjustment and filtering.

• Feature extraction: information extraction and key construction.

• Document signing: obtaining the document fingerprint.

• Signature protection: document and key protection.

• Signing authenticity: signature matching.

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 289-296. The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

Figure 1. Sequence of processes in the proposed system

2.2 FACE IMAGE ENHANCEMENT

At each sign-point there is a fixed webcam that is used to capture the face image.

Figure 2. Face image enhancement and noise removal using the Wiener filter

The selected area surrounds the eyes, nose and mouth, within a dimension of 200x200 pixels. Figure 2A shows an original image, while B presents the histogram of A, which shows that the information is not well distributed, so the image has to be filtered. In C the face image is shown after removing noise with the fast Fourier transform (FFT) based Wiener filter; the filter was applied several times to bring out the features. The histogram of the well-distributed information resulting from the filtering process is shown in D. The face image is then cropped to 150x150 pixels in an area rich in information, containing the eyes, nose and mouth, for use in feature extraction, as shown in Figure 3.

Figure 3. Selected face image area that is rich in information.
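The repeated adaptive smoothing described above can be sketched with a minimal local-mean Wiener-type filter. This is a generic illustration, not the authors' implementation: the 3x3 window and the global noise estimate (average of the local variances) are assumptions.

```python
def wiener_filter(img, k=3):
    """Adaptive local-mean (Wiener-type) filter on a 2D list of floats.

    For each pixel: out = m + max(v - noise, 0) / max(v, noise) * (x - m),
    where m and v are the local mean and variance in a k x k window and
    noise is estimated as the average of all local variances.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    means, variances = [], []
    for i in range(h):
        mrow, vrow = [], []
        for j in range(w):
            # Window clipped at the image borders.
            win = [img[a][b]
                   for a in range(max(0, i - r), min(h, i + r + 1))
                   for b in range(max(0, j - r), min(w, j + r + 1))]
            m = sum(win) / len(win)
            v = sum((p - m) ** 2 for p in win) / len(win)
            mrow.append(m)
            vrow.append(v)
        means.append(mrow)
        variances.append(vrow)
    noise = sum(sum(row) for row in variances) / (h * w)
    return [[means[i][j] + max(variances[i][j] - noise, 0.0)
             / max(variances[i][j], noise, 1e-12) * (img[i][j] - means[i][j])
             for j in range(w)] for i in range(h)]
```

Where the local variance is close to the noise estimate, the pixel is pulled toward the local mean (smoothing); where it is much larger (an edge), the pixel is left mostly unchanged, which is why the filter preserves facial features while suppressing noise.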

2.3 INFORMATION EXTRACTION

The cropped face image prepared in Section 2.2 is used to extract features from which the user keys are calculated (skey as the sender's key and rkey as the receiver's key), using equation (1):

skey = \sum_{i=0}^{n} \sum_{j=0}^{m} x_1(i, j), \quad rkey = \sum_{i=0}^{n} \sum_{j=0}^{m} x_2(i, j)    (1)

where x_1 and x_2 are the sender's and receiver's cropped face images (the 150x150-pixel crops of Section 2.2) and i = 0, ..., n; j = 0, ..., m. The user keys obtained by applying equation (1) are unique. Table 1 shows the obtained user keys; it confirms that users can be distinguished from each other.
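Equation (1) is simply a sum of pixel intensities over the cropped image. A minimal sketch, with hypothetical 3x3 "images" standing in for the 150x150 crops:

```python
def user_key(face):
    """Equation (1): sum all pixel intensities of the cropped face image."""
    return sum(sum(row) for row in face)

# Hypothetical tiny "cropped face images" for the sender and the receiver.
x1 = [[10, 20, 30],
      [40, 50, 60],
      [70, 80, 90]]
x2 = [[ 9, 18, 27],
      [36, 45, 54],
      [63, 72, 81]]

skey = user_key(x1)  # 450
rkey = user_key(x2)  # 405
```

Two identical images always yield the same key, so the uniqueness reported in Table 1 rests on the captured face images themselves differing between users.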

291

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 289-296The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

Page 30: 2012 (Vol. 1, No. 4)

As a requirement of the signing process, the user needs another key (srkey), which is constructed after selecting a target user as the receiver of a document.

Table 1. User keys

User No.    User key (skey or rkey)
6           581497
7           7533018
8           668856
9           627684
18          632414

The key is constructed by combining the two keys (skey and rkey) using equation (2); the constructed key (srkey) is used in the encryption process. As Table 2 shows, the combination is a digit-wise concatenation of the two keys:

srkey(k) = skey(k) for k = 1, ..., n;    srkey(k) = rkey(k - n) for k = n + 1, ..., n + m    (2)

where n and m are the numbers of digits in skey and rkey respectively.

The constructed srkey is used on both sides for encryption and decryption, protecting an outgoing document on the sender's side and an incoming document on the receiver's side. The third column of Table 2 shows the results of applying equation (2) to construct the key (srkey) used in the encryption process.

Table 2. Constructed key srkey between sender and receiver

Sender's key (skey)    Receiver's key (rkey)    Combined key (srkey)
581497                 7533018                  5814977533018
7533018                581497                   7533018581497
668856                 668856                   668856668856
627684                 632414                   627684632414
632414                 627684                   632414627684
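Reading equation (2) as digit concatenation, consistent with the third column of Table 2, a sketch:

```python
def combined_key(skey, rkey):
    """Equation (2): the combined key srkey is the digit-wise concatenation
    of the sender's key followed by the receiver's key."""
    return int(str(skey) + str(rkey))

# Rows of Table 2.
print(combined_key(581497, 7533018))  # 5814977533018
print(combined_key(7533018, 581497))  # 7533018581497
print(combined_key(668856, 668856))   # 668856668856
```

Note that the operation is order-sensitive: swapping sender and receiver yields a different srkey, as the first two rows of Table 2 show.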

2.4 SIGNING PROCESS

A user who intends to sign a document (Doc) first selects or prepares the document, then calculates its fingerprint using equation (3). SHA-1, the Secure Hash Algorithm, was chosen to calculate the document's fingerprint. The sender then invokes the RC5 algorithm with the constructed key (srkey) to encrypt the calculated fingerprint, as shown in equation (4):

Fingerprint = SHA-1(Doc)    (3)

Encrypted-fingerprint = RC5_srkey(Fingerprint)    (4)

The sender prepares a message that contains the document, its encrypted fingerprint and the sender's key, and sends it to the receiver according to equation (5), as shown in Figure 4:

Message = (Encrypted-fingerprint, Doc, skey)    (5)
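Equation (3) maps a document of any length to a fixed 160-bit fingerprint; with Python's standard hashlib this is a one-liner (this only illustrates the hashing step, not the paper's full pipeline):

```python
import hashlib

def doc_fingerprint(doc: bytes) -> str:
    """Equation (3): the SHA-1 fingerprint of a document, as 40 hex digits."""
    return hashlib.sha1(doc).hexdigest()

print(doc_fingerprint(b"abc"))
# a9993e364706816aba3e25717850c26c9cd0d89d
```

Any change to the document, even a single bit, produces a completely different fingerprint, which is what makes the comparison in the verification step meaningful.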

Figure 4. Sequence of processes in document signing and message encryption

2.5 SECURE SIGNATURES

To prevent unauthorized use of the document and of the keys used in signatures, the RC5 cryptographic algorithm is used to protect them. The message containing the document (Doc), the fingerprint and the keys is prepared by the sender and protected using the formed key (srkey), so that only the target


receiver can decrypt, according to equation (6):

Encrypted-Message = RC5_srkey(Message)    (6)

2.6 AUTHENTICITY OF SIGNATURES

The verification process is performed on the receiver's side. Once the receiver has received an encrypted message, he decrypts it using his key (rkey) to obtain the original document, the sender's key (skey) and the encrypted fingerprint. Two processes are then run: one calculates a new fingerprint and the other constructs the combined key (srkey), as discussed in Section 2.3. That key is used to decrypt the received encrypted fingerprint. The signature is authenticated by comparing the two fingerprints: the document is declared authentic, and sent by a trusted person, if the fingerprints are equal.

Figure 5. Received message decryption and signing authentication process

Figure 5 illustrates the verification process. It starts by decrypting the received message with the receiver's key to obtain the sender's key (skey). The skey is used to construct the combined key (srkey) needed to decrypt the received fingerprint. The receiver calculates the fingerprint of the received document using SHA-1 and compares the two fingerprints to see whether they match.

2.7 TESTING THE ALGORITHM
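The decrypt-and-compare step can be sketched as a round trip. Python's standard library has no RC5, so a keyed XOR stream stands in for it here purely for illustration; a real implementation would substitute an actual RC5 (or modern) cipher.

```python
import hashlib
import random

def toy_cipher(data: bytes, key: int) -> bytes:
    """Keyed XOR stream as an illustrative stand-in for RC5.
    Applying it twice with the same key recovers the input."""
    rng = random.Random(key)
    return bytes(b ^ rng.randrange(256) for b in data)

def verify(encrypted_fp: bytes, doc: bytes, srkey: int) -> bool:
    """Section 2.6: decrypt the received fingerprint and compare it with a
    freshly computed SHA-1 fingerprint of the received document."""
    return toy_cipher(encrypted_fp, srkey) == hashlib.sha1(doc).digest()

# Round trip: "sign" on the sender's side, verify on the receiver's side.
doc, srkey = b"contract text", 5814977533018
enc_fp = toy_cipher(hashlib.sha1(doc).digest(), srkey)
print(verify(enc_fp, doc, srkey))          # True: document authentic
print(verify(enc_fp, b"tampered", srkey))  # False: altered document detected
```

Because srkey is derived from both parties' face keys, only the intended receiver can reconstruct it and hence pass this check.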

Two signature points, configured on two connected computers each equipped with a webcam, were used to test the proposed algorithm: one for document signing and the other for signature verification. The system was tested for acceptance and rejection of the signature-verification process, in order to discover incorrect decisions by the system. Use was made of 1030 matching trials (MT) and three security levels; Table 3 shows the intensity level used for each of the three security levels. Group (1) uses 30 low-intensity face images, group (2) uses 400 medium-intensity face images, and group (3) uses 600 high-intensity face images.

Table 3. Number of recognized and rejected users by the proposed system

Group      Description  Number of Users  Recognized  Rejected  Recognized Rate (%)  Error Rate (%)
Group (1)  Low          30               28          2         93.33                6.67
Group (2)  Medium       400              393         7         98.25                1.75
Group (3)  High         600              592         8         98.67                1.33
Total                   1030             1013        17        98.35                1.65

In the matching trials, for group 1, 28 out of 30 low-intensity images were recognized (93.33%), while 2 images were rejected (6.67%), as demonstrated in Figure 6.


Figure 6. Accepted users by the proposed system

In group 2, which contains the medium-intensity face images, 393 out of 400 images were recognized (98.25%) and only 7 were rejected (1.75%), as shown in Figure 7.

Figure 7. Rejected users by the proposed system

In group 3, 600 high-intensity face images were used; 592 were recognized (98.67%) and 8 were rejected (1.33%), as shown in Figure 8. In summary, of the 1030 face images of differing intensity, 1013 were recognized (98.35%) and only 17 were rejected (1.65%), which demonstrates the success of the proposed system.

Figure 8. Accepted and rejected users by the proposed system

Table 4 shows the tests of the proposed system, run only for known users: the false acceptance rate (FAR) was zero in all groups. The false rejection rate (FRR) decreases across the groups, which suggests that configuring the system with a larger number of users translates into fewer rejections, as the results for low and all intensities show.

Table 4. FAR and FRR ranges

No.  Description        FAR  FRR
1.   Low intensity      0    6.67
2.   Medium intensity   0    1.75
3.   High intensity     0    1.33
4.   All intensities    0    1.65
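The rates in Tables 3 and 4 follow directly from the counts: the recognized rate is recognized/total and the error rate, which here equals the FRR since FAR = 0, is rejected/total. A quick check:

```python
groups = {  # (recognized, rejected) per intensity group, from Table 3
    "low": (28, 2),
    "medium": (393, 7),
    "high": (592, 8),
}

def rates(recognized, rejected):
    """Return (recognized rate %, error rate %) rounded to two decimals."""
    total = recognized + rejected
    return round(100 * recognized / total, 2), round(100 * rejected / total, 2)

for name, (rec, rej) in groups.items():
    print(name, rates(rec, rej))
# low (93.33, 6.67), medium (98.25, 1.75), high (98.67, 1.33)

total_rec = sum(r for r, _ in groups.values())  # 1013
total_rej = sum(j for _, j in groups.values())  # 17
print(rates(total_rec, total_rej))              # (98.35, 1.65)
```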

2.8 THIS ALGORITHM AGAINST EXISTING ALGORITHMS

In recent years a few algorithms have been developed to sign documents digitally, but they fail to cover a number of issues, which the proposed algorithm addresses as described below. Most DS systems, as in Sufreenmohd et al. (2002) and Elmadani et al. (2005), use a smart card to store keys and therefore suffer from forgery or tampering. The proposed algorithm solves this problem by authenticating users with their faces, which cannot be stolen, forgotten or tampered with, and the user has nothing to carry in hand. The existing DS algorithms such as Sirovich and


Kirby (1987) are based on template selection for extracting features, while the algorithms of Givens et al. (2003) and Yang (2010) are based on calculating values from an image to compare later with stored ones; such processes are time consuming. In the proposed system, by contrast, the features directly form keys, which are numbers that are processed immediately and need not be stored, which protects against the attacks mentioned by Langweg (2006). The proposed algorithm uses simple mathematical functions for key calculation, unlike the algorithms used by Costas et al. (2008) or by Kirby and Sirovich (1990); our system is fast because it is based on calculating numbers, and it requires less processing and less memory space compared to them.

3. CONCLUSION

A model for signing and verifying a document signature and protecting it was presented, together with an investigation of existing digital signatures and their drawbacks. The proposed algorithm uses a personal biometric characteristic (the face), which cannot be stolen, forged or tampered with. It provides a method that is easy to use and requires the user to carry nothing. Our results show that the face can be strongly recommended for online document signing.

4. REFERENCES

1. Nentwich F, Kirda E and Kruegel C. Practical Security Aspects of Digital Signature Systems. Technical University Vienna. Technical. 2006.

2. Introduction to digital signature. www.e-signature.gov.eg/ ElectronicSignature_Mechanizm_Arabic. 2010.

3. Robshaw M. MD2, MD5, SHA and Other Hash Functions. RSA Laboratories Technical Report TR-101. 1995.

4. Wang X, Feng D, Lai X and Yu H. Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD. Proceedings of the 24th Annual International Cryptology Conference (Crypto '04), Santa Barbara, CA. 2004.

5. Elmadani. A. B. Digital Signature forming and keys protection based on person’s Characteristics. Proceedings of the IEEE International Conference on Information Technology and e-services (ICITeS’2012). Souse, Tunisia. 2012.

6. Elmadani A. B, Prakash V and Ramli A. R. Application of Smartcard & Secure Coprocessor, BICET conference. Brunei.2001.

7. Elmadani A. B. Human Authentication Using a FingerIris Algorithm Based on a Statistical Approach. Proceedings of the 2nd International Conference on Network Digital Technologies (NDT '10), Prague, Czech Republic, pp. 288-296. 2010.

8. Spalka A, Cremers A and Langweg H. Protecting the Creation of Digital Signatures with Trusted Computing Platform Technology Against Attacks by Trojan Horses. In IFIP Security Conference. 2001.

9. Fang, Y. Wang Y and Tan T. Combining Color, Contour and Region for Face Detection. ACCV2002: The 5th Asian Conference on Computer Vision, Melbourne, Australia. 2002.

10. Elmadani A. B, Prakash V, Ali, B. M, Ramli A. R and Jumari K. Fingerprint Access Control with Anti-spoofing Protection, Brunei Darussalam Journal of Technology and Commerce. Brunei. 2005.

11. Langweg H. Malware Attacks on Electronic Signatures Revisited. In Sicherheit 3rd Jahrestagug Fachbereich Sicherheit der Gesellschaft fuer Informatik. 2006.

12. Zhao W, Chellappa R, Phillips P. J and Rosenfeld A. Face Recognition: A Literature Survey. ACM Computing Survey. Vol. 35, no. 4. PP. 399–458. 2003.

13. Yang J. Biometrics Verification Techniques Combing with Digital Signature for Multimodal Biometrics Payment System. Proceedings of Fourth International Conference on Management of e-Commerce and e-Government (ICMeCG), pp. 405-420. China.2010.

14. Costas C, Nikolaidis N and Ioannis P. Face-based Digital Signatures for Video Retrieval. IEEE Transactions on Circuits and Video Technology, Vol. 18. No. 4. Pp. 549-553. 2008.

15. Givens G, Beveridge J, Bruce A, Draper B and Bolme D. A Statistical Assessment of Subject Factors in the PCA Recognition of Human Faces. Proceedings of Computer Vision and Pattern Recognition Workshop (CVPRW’03). Wisconsin USA. 2003.


16. Sirovich L and Kirby M. Low-dimensional Procedure for the Characterization of Human Faces. Journal of the Optical Society of America A: Optics, Image Science and Vision, Vol. 4, No. 3, pp. 519-524. 1987.

17. Kirby M and Sirovich L. Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 1, pp. 103-108. 1990.

Ahmed B. Elmadani was born in Libya in 1956. He received his Ph.D. degree from UPM University, Malaysia, in 2003. He worked in the Computer Science Department, Faculty of Science, Sebha University (Libya) from 1997 to 1999 as an assistant lecturer and head of the Department of Computer Science, and from 2003 to 2008 as a lecturer in the same department; since 2009 he has been an assistant professor and Vice Dean at the same faculty. His main research interests include cryptography, information security, imaging, digital signatures and biometric fingerprints.


A Comparative Study of the Perceptions of End Users in the Eastern, Western, Central, Southern and Northern Regions of Saudi Arabia about Email SPAM and Dealing with it

Hasan Alkahtani*,** and Robert Goodwin**

* Computer Science Department, College of Computer Science and Information Technology, King Faisal University, P.O. Box 400, Al-Hassa 31982, Kingdom of Saudi Arabia
[email protected]

** School of Computer Science, Engineering and Mathematics, Faculty of Science and Engineering, Flinders University, GPO Box 2100, Adelaide SA 5001, Australia
[email protected], [email protected]

ABSTRACT

This paper presents the results of a survey of email users in different regions of Saudi Arabia about email SPAM. The survey investigated the nature of email SPAM, how email users in the eastern, western, central, southern and northern regions dealt with it, and the efforts made to combat it. It also investigated the effectiveness of existing Anti-SPAM filters in detecting Arabic and English email SPAM. 1,500 participants located in the eastern, western, central, southern and northern regions of Saudi Arabia were surveyed, and completed surveys were collected from 1,020 of the participants.

The results showed that email users in Saudi Arabia defined email SPAM in different ways, and that the participants in the central and western regions were more aware of SPAM than the participants in other regions. The volume of email SPAM differed from one region to another: the volume of SPAM received by the participants in the northern and central regions was larger than that received in other regions. The majority of email SPAM received by the participants in the different regions was written in English. The most common type of email SPAM received in Arabic was emails related to forums; in English it was phishing and fraud, and business advertisements.

The results also showed that a few participants in all regions responded to SPAM, and that the proportion of participants who responded to SPAM was larger in the southern region than in the other regions. Most of the participants were not aware of Anti-SPAM programs; the participants in the central region were more aware of Anti-SPAM programs than those in other regions. The participants in all regions estimated that the existing Anti-SPAM programs were more effective in detecting English SPAM than Arabic SPAM.

Most of the participants in all regions were not aware of the government efforts to combat SPAM; the participants in the central region were more aware of the government efforts than the participants in other regions. Likewise, most of the participants in all regions were not aware of the ISPs' efforts to combat SPAM; the participants in the central and western regions were more aware of the ISPs' efforts than the participants in other regions.

KEYWORDS: SPAM, email, Arabic, users, English, Saudi.

1. INTRODUCTION

Email is an important tool for many people, who consider it a necessary part of their daily lives. Email enables people to communicate with each other in a short time at low cost. Although email benefits the people who use it, some people, called spammers, have exploited email for their personal purposes. They send so-called SPAM to a large number of recipients. They can use programs known as spam-bots to harvest email addresses on the internet, or they can buy email addresses from individuals and organizations to send email SPAM to these addresses [11]. They

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 297-310The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

Page 36: 2012 (Vol. 1, No. 4)

also use many methods to bypass SPAM filters, such as tokenization and obfuscation [27].

Email SPAM is defined as "unsolicited, unwanted email that is sent indiscriminately, directly or indirectly, by a sender having no current relationship with the recipient" [12], [13]. It is also defined as Unsolicited Bulk Email (UBE): email sent to a large number of recipients who were not asked whether they wanted to receive it [4], [14], [18]. Some studies [6], [7], [25] define email SPAM as Unsolicited Commercial Email (UCE), containing business advertisements sent to a large number of recipients.

There are legal and technical methods [2] to combat SPAM. Legally, some countries have enacted laws against SPAM; examples include the United States of America [26], the European Union countries and Australia [5]. However, there are no laws in Saudi Arabia to combat SPAM, although research and projects have been conducted to assess the problem of SPAM in the country.

Technically, many filters exist to combat SPAM. Examples include content-based filters such as Bayesian filters [24], keyword filters [11] and genetic algorithms [15], and origin-based filters such as black lists [11], white lists [22], origin diversity analysis [16] and challenge-response systems [21]. However, some of these techniques need to be updated to detect new types of email SPAM, because spammers keep developing ways to bypass them.
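As a minimal illustration of the keyword-based content filtering mentioned above (the blacklist and threshold here are invented examples, not taken from the study):

```python
# Hypothetical blacklist; real filters use much larger, curated lists.
SPAM_KEYWORDS = {"winner", "free offer", "viagra", "lottery"}

def is_spam(email_text: str, threshold: int = 1) -> bool:
    """Flag an email as SPAM if it contains at least `threshold`
    blacklisted keywords (case-insensitive substring match)."""
    text = email_text.lower()
    hits = sum(1 for kw in SPAM_KEYWORDS if kw in text)
    return hits >= threshold

print(is_spam("You are a lottery WINNER, claim your free offer now"))  # True
print(is_spam("Meeting agenda for Monday"))                            # False
```

The weakness the paper alludes to is visible here: obfuscation such as "l0ttery" or "w i n n e r" defeats plain substring matching, which is why such lists need constant updating.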

This study aimed to gain an understanding of:

a. The nature of email SPAM: its definition based on email users' opinions, its volume and its types in different regions of Saudi Arabia.

b. Differences between Arabic SPAM and English SPAM received by the participants in different regions of Saudi Arabia.

c. The effects of email SPAM on email users in different regions of Saudi Arabia.

d. How email users in the eastern, western, central, southern and northern regions deal with email SPAM.

e. The efforts of the government to combat email SPAM.

f. The efforts of ISPs to combat email SPAM.

g. The email users' perception, in different regions of Saudi Arabia, of the effectiveness of Anti-SPAM filters in detecting Arabic and English email SPAM.

2. METHODOLOGY

2.1. Measures

It was decided that the best way to answer the research questions was through a questionnaire. Therefore, a questionnaire was distributed to the participants in different regions of Saudi Arabia and the responses were analyzed. Initially a pilot questionnaire was prepared and distributed to a few participants to get their comments about the questions. Then all the participants completed the 10-page questionnaire, which included both yes/no answers and open-ended answers. The questionnaire consisted of three main parts, as follows.

2.1.1. General information questions

In this part, the participants were asked for the following information: gender, age, nationality, spoken language, highest level of education, major area of study, work status and the nature of their work. These questions helped in understanding and comparing the level of awareness of users about email SPAM. Examples of the questions from the first part of the survey can be seen in Figure 1.

1. Gender:  O Male  O Female
2. What is your age?
3. Nationality:  O Saudi  O Other
4. What is your current work status?  O Student  O Employed  O Self-employed

Figure 1: Examples of questions from the first part of the survey

2.1.2. Email SPAM questions

At the beginning of this part, the participants were asked for a definition of email SPAM in their own words, in order to understand the definition of email SPAM based on their opinions. The study then defined email SPAM as "an unsolicited, unwanted, commercial or non-commercial email that is sent indiscriminately, directly or indirectly, to a large number of recipients without their permission and there is no relationship between the recipients and sender". This definition was in the survey and was used to provide a reference point for the remainder of the questions. Care was taken to ensure that the respondents did not see the study


supplied definition until after they had supplied their own definition of email SPAM, to prevent introducing a strong bias. The variety of responses to the question of what constitutes SPAM is evidence that this approach was successful. Some examples of email SPAM, and of keywords and phrases used in email SPAM, were given in the survey.

The participants were asked whether they knew about email SPAM prior to reading the survey, and what the sources of their knowledge were. The participants were also asked whether they received email SPAM and how many SPAM emails they received on average weekly. They were also asked about the languages of the email SPAM they received and the types of Arabic and English email SPAM. The study focused on English and Arabic email SPAM because English is the main language in the world and Arabic is the native language in Saudi Arabia.

The participants were asked what they did when they received email SPAM (i.e. the actions of email users in dealing with SPAM). The actions described in the survey were as follows: reading the entire SPAM email, deleting the SPAM email without reading it, and contacting the ISP to notify it about the SPAM email. The participants were asked to choose one of the following options to describe their action in dealing with SPAM: never, sometimes or always. Figure 2 shows an example of the questions put to email users in Saudi Arabia about their actions in dealing with email SPAM.

Note: the following question will ask you to choose the appropriate option for your way of dealing with email SPAM. For example, if I never read SPAM email at all, I will circle the option "Never" in the scale in the following table. If I sometimes read SPAM, I will circle the option "Sometimes".

                         Never    Sometimes    Always
Read the entire email

Figure 2: An example of the questions put to email users in Saudi Arabia about their actions in dealing with email SPAM

The participants were asked whether they had purposely responded to an offer made by a SPAM email and what benefits they had derived from email SPAM. They were also asked whether they were affected by email SPAM and what the effects of email SPAM on them were.

The participants were asked whether they were aware of Anti-SPAM filters to block email SPAM, what the sources of their knowledge about these filters were, and how effective these filters were in detecting Arabic and English email SPAM. Examples of the questions from the second part of the survey can be seen in Figure 3.

1. Everyone defines SPAM differently; in your own words, how would you define email SPAM?
2. Did you know about SPAM emails prior to reading this survey?  O Yes  O No
3. Have you received SPAM emails?  O Yes  O No
4. What is the language of the SPAM email you receive on average weekly? The percentages should add up to 100%.
   O English ___%  O Arabic ___%  O Other language ___%  O Languages I do not recognize ___%
5. Are you aware of Anti-SPAM programs?  O Yes  O No
6. If you have used Anti-SPAM programs, please rate their effectiveness in detecting English and Arabic email SPAM:

Current programs \ Percentage                                         0%  25%  50%  75%  100%
The effectiveness of current programs in detecting Arabic email SPAM
The effectiveness of current programs in detecting English email SPAM

Figure 3: Examples of questions from the second part of the survey

2.1.3. Questions about the efforts of the government and ISPs to combat email SPAM

In this part, the participants were asked whether they were aware of government efforts to combat SPAM, and which efforts they were aware of. The participants were also asked whether they were aware of ISPs' efforts to combat SPAM, and which efforts they were aware of. Examples of the questions from the third part of the survey can be seen in Figure 4.


1. Are you aware of efforts by the government in Saudi Arabia to combat email SPAM?  O Yes  O No
2. Are you aware of efforts by ISPs in Saudi Arabia to combat email SPAM?  O Yes  O No

Figure 4: Examples of questions from the third part of the survey

2.2. Participants

The questionnaire was designed and distributed to 1,500 participants in the central, eastern, western, southern and northern regions of Saudi Arabia. Completed questionnaires were received from 1,020 participants. 34% of the participants were from the central region, 20% from the eastern region, 20% from the western region, 13% from the southern region and 13% from the northern region. Table 1 shows general information about the participants located in the eastern, western, central, southern and northern regions of Saudi Arabia.
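The response rate and regional split above are simple arithmetic; as a quick check (the per-region counts are approximations derived from the rounded percentages, not figures reported by the survey):

```python
distributed, completed = 1500, 1020
response_rate = 100 * completed / distributed
print(f"response rate: {response_rate:.0f}%")  # 68%

# Regional split of the 1,020 respondents, in percent.
regions = {"central": 34, "eastern": 20, "western": 20,
           "southern": 13, "northern": 13}
assert sum(regions.values()) == 100

# Approximate respondent counts per region implied by the percentages.
counts = {r: round(completed * p / 100) for r, p in regions.items()}
print(counts)
```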

Table 1: General information about the participants in the Eastern (E), Western (W), Central (C), Southern (S) and Northern (N) regions of Saudi Arabia

Part 1: General Information                       N     S     C     W     E
Gender:          Male                            61%   64%   57%   59%   62%
                 Female                          39%   36%   43%   41%   38%
Age:             15-25                           35%   37%   35%   63%   58%
                 26-35                           47%   38%   41%   26%   25%
                 36-45                           12%   21%   17%   10%   14%
                 46-55                            6%    2%    6%    1%    2%
                 56 and more                      0%    2%    1%    0%    1%
Nationality:     Saudi                           86%   75%   81%   88%   90%
                 Other                           14%   25%   19%   12%   10%
Spoken language: Arabic                          99%   99%   99%  100%   99%
                 English                         75%   73%   63%   81%   62%
                 Other                            1%    3%    3%    2%    2%
Highest level of education:
                 High school                     12%   15%   11%   17%   17%
                 Diploma                          8%    5%    7%    2%    2%
                 Bachelor                        52%   49%   54%   70%   61%
                 Master                          19%   17%   16%    7%   12%
                 PhD                              9%   14%   12%    4%    8%
Major area of study (participants with a diploma, bachelor, master or PhD):
                 Education and teaching          26%   16%   20%   13%   17%
                 Computer science and IT         26%   31%   34%   40%   31%
                 Social sciences                 15%   20%   12%    5%    4%
                 Physical and biological sciences 6%    5%   11%    7%   21%
                 Health sciences and medicine    10%   12%    8%    7%   16%
                 Other                           17%   16%   15%   28%   11%
Work status:     Student                         45%   41%   29%   61%   58%
                 Employed                        51%   59%   70%   37%   42%
                 Self-employed                    4%    0%    1%    2%    0%
Nature of work (employed participants):
                 Educational                     58%   47%   48%   55%   44%
                 Medical                         16%    8%    8%    8%   17%
                 Technical                        9%   16%   18%   20%   14%
                 Management                       3%   24%   19%   16%   21%
                 Other                           14%    5%    7%    1%    4%

3. RESULTS

This section describes the responses of the participants in the eastern, western, central, southern and northern regions of Saudi Arabia to the email users' survey.

3.1. Respondents' Definition and Awareness of Email SPAM

Email users were asked for a definition of email SPAM based on their own opinions. Only 428 of the 1,020 participants in the different regions of Saudi Arabia answered this question.

42% of the participants who answered this question defined email SPAM as an email that was sent randomly to numerous recipients and contained spyware, files, links, images or text aiming to hack the computer or steal confidential information such as email passwords, credit card numbers and bank account numbers.

39% defined email SPAM as an email that did not contain an email address, or that was sent randomly, directly or indirectly, by unknown senders or sources to a large number of recipients without their permission to receive it.

33% said that email SPAM was an email that was sent randomly and contained malicious programs such as viruses, Trojans or worms, or contained hidden links, strange contents and untrusted attachments that aimed to damage the computer, its software and hardware, or to delete important information on a computer.

29% defined email SPAM as Unsolicited Commercial Email (UCE): email sent to a large number of recipients to promote commercial advertisements, containing attractive words used to encourage the recipient to buy medical, technical and sexual products.

9% said that email SPAM was annoying and unimportant email sent from friends, but not in person, containing jokes, greetings, invitations to subscribe to forums, invitations to friendship on social networks such as Facebook, competitions, puzzles, political and religious reviews, news, and scandals of famous people in the world.

7% defined email SPAM as junk email, or as unwanted, Unsolicited Bulk Email (UBE) sent randomly to a large number of recipients.


1% defined email SPAM as an email that was not related to the recipients' work or interests.

From the definitions described above, it can clearly be seen that email users had no single, specific definition of email SPAM. The most common definition was "an email that was sent randomly to numerous recipients and contained spyware, files, links, images or text that aims to hack the computer or steal confidential information such as email passwords, credit card numbers and bank account numbers". The definitions above also indicate that some users' definitions in Saudi Arabia agreed with the international definitions of email SPAM, namely Unsolicited Commercial Email (UCE) and Unsolicited Bulk Email (UBE).

The differences in the definition of email SPAM could cause problems in enacting laws to combat SPAM in Saudi Arabia and in developing Anti-SPAM filters for languages such as Arabic. This suggests that there is scope to specify an agreed definition of email SPAM, which could be used for enacting anti-SPAM laws and developing Anti-SPAM techniques in Saudi Arabia.

When the participants were asked if they knew about email SPAM prior to reading the survey, the results revealed that approximately a third of email users in Saudi Arabia did not know about email SPAM, which represents a significant risk for Saudi society. Most of the participants, however, indicated prior awareness of SPAM, and the survey itself acted as a means of educating the remaining participants about SPAM and its impact. This suggests that a broader survey or information campaign about SPAM would have a further positive impact in different regions of Saudi Arabia. It also suggests that conducting research related to SPAM, and funding researchers who work in this field, could help increase the awareness of email users in all regions and hence reduce the impact of email SPAM in Saudi Arabia.

As seen in Table 2, the results revealed that the participants in the central and western regions were more aware of SPAM than the participants in other regions of Saudi Arabia. This could be explained by the major area of study: the percentages of participants who studied computer science and information technology were higher in the western and central regions than in the other regions. It could also be explained by the nature of their work, since more participants worked in technical positions in the central and western regions than in the other regions. The results suggest that there should be a focus on awareness programs about SPAM for users in different regions of Saudi Arabia, especially the eastern, southern and northern regions. These awareness programs could be executed by government or private sectors.

The results, as shown in Table 2, revealed that most of the participants in all regions knew about SPAM through self-education on the internet and forums, and through friends and relatives. The results showed prominent efforts by school and university education in informing users about SPAM in all regions compared to other public and private sectors, with the educational sectors in the southern region showing the highest percentage.

The results also revealed a deficiency in government efforts to raise email users' awareness of SPAM in all regions; government efforts to inform users about SPAM were stronger in the northern region than in the other regions, and there were no such efforts in the western region. The results likewise revealed a deficiency in the ISPs' efforts to raise users' awareness of SPAM, although they are among the sectors responsible for controlling internet service in Saudi Arabia.

This suggests that the government should focus on raising users' awareness of SPAM in all regions, especially the western region. The awareness programs could be executed by educational sectors such as universities, by broadcast media such as magazines and newspapers, and by the sectors responsible for providing and controlling internet services in Saudi Arabia.


Table 2: Responses of the participants in the Eastern, Western, Central, Southern and Northern regions about their knowledge about email SPAM

Question / Response                                      N     S     C     W     E
Did you know about SPAM emails prior to reading the survey?
  Yes                                                   37%   56%   72%   70%   57%
  No                                                    63%   44%   28%   30%   43%
How do you know about SPAM emails?
  Internet Service Providers (ISPs)                     13%   13%    6%    7%    9%
  The internet and forums                               50%   51%   59%   76%   67%
  Broadcast media such as radio, TV, newspapers
  and magazines                                          8%   11%   13%   21%   10%
  Friends and relatives                                 44%   48%   39%   56%   45%
  Government ministries and commissions                  8%    4%    4%    0%    6%
  Through my school or university education             40%   44%   41%   29%   38%
  Other                                                  6%    7%    5%    3%    4%

3.2. Volume and Nature of Email SPAM in Saudi Arabia

When the participants were asked if they received email SPAM, the results showed that most of the participants in Saudi Arabia received it. Email users estimated that they received an average of 108 SPAM emails per week.

Another study, conducted by [17], showed that its participants received an average of 94.5 SPAM emails per week. Comparing the volume of SPAM received in Saudi Arabia to the volume in that study [17], the volume of SPAM in Saudi Arabia was broadly similar.

The results shown in Table 3 revealed that the highest percentage of participants who received SPAM was in the southern region. The results indicated that the average number of SPAM emails received weekly by the participants differed from one region to another: 77 SPAM emails in the eastern region, 104 in the western region, 126 in the central region, 95 in the southern region and 129 in the northern region. This indicates that more SPAM was received in the northern and central regions than in the other regions.
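As a rough consistency check, the regional weekly averages above can be combined into an overall weighted mean. The sketch below assumes the per-region sample sizes are the n values reported in the legends of Figures 5 and 6 (the text does not restate the respondent counts at this point); under that assumption the weighted mean lands close to the overall figure of 108 SPAM emails per week reported earlier.

```python
# Weighted mean of the per-region weekly SPAM averages.
# The sample sizes (n) are taken from the legends of Figures 5 and 6;
# using them as weights here is an assumption, since the text does not
# restate the per-region respondent counts at this point.
regions = {
    "Eastern":  (77, 203),   # (average SPAM emails/week, n)
    "Western":  (104, 201),
    "Central":  (126, 352),
    "Southern": (95, 134),
    "Northern": (129, 130),
}

total_spam = sum(avg * n for avg, n in regions.values())
total_participants = sum(n for _, n in regions.values())
overall_average = total_spam / total_participants

print(round(overall_average))  # 108, matching the reported overall average
```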

When the participants were asked about the language of the email SPAM they received, the results showed that most of it (59%) was in English, 34% was in Arabic, 4% was in an unrecognized language and 3% was in other languages.

A study conducted in Bahrain indicated that 64% of the respondents said that they received English SPAM, 18% said that they received Arabic SPAM and 18% said that they received both Arabic and English SPAM [1]. These results indicate that the volume of English SPAM received in Bahrain was similar to that received in Saudi Arabia, while the volume of Arabic SPAM received in Bahrain was less than that received in Saudi Arabia.

As seen in Table 3, the results revealed that the volume of English SPAM received was larger in the northern region than in the other regions, while the volume of Arabic SPAM was larger in the western region. The amount of unrecognized SPAM was larger in the southern and northern regions than in the other regions. The results also showed that the participants in the southern region received more SPAM in other languages, such as Chinese, Japanese, Russian, Turkish, French, Brazilian Portuguese, Spanish, Persian, German, Italian, Hindi, Urdu and Hebrew, than the participants in other regions.

Table 3: Responses of the participants in the Eastern, Western, Central, Southern and Northern regions about the languages of email SPAM

Question / Response                                      N     S     C     W     E
Have you received SPAM emails?
  Yes                                                   65%   83%   73%   75%   70%
  No                                                    35%   17%   27%   25%   30%
What is the language of SPAM email you receive on average weekly?
  English                                               65%   61%   61%   51%   60%
  Arabic                                                29%   30%   33%   43%   33%
  Not recognized                                         5%    5%    4%    3%    4%
  Other language                                         1%    4%    2%    3%    3%

When the participants were asked about the types of Arabic and English SPAM emails they received, the results showed that there were many types of both, and that these types differed between Arabic and English SPAM. The types of Arabic and English SPAM and the differences between them can be seen in Table 4.


Table 4: The differences between Arabic and English email SPAM received by end users in Saudi Arabia

Types of email SPAM              AR (%)   EN (%)
Business                           31       30
Religious and Political Party       5        2
Pornographic                       10       24
Forums                             36        3
Products and services              11       12
Phishing and Fraud                  6       28
Other                               1        1
Total                             100      100

As described in Table 4, the volume of business advertisements, emails from religious and political parties, and forum-related emails was larger in Arabic SPAM than in English SPAM. The percentages indicate a significant difference in composition between Arabic and English SPAM; for example, the volume of forum emails was far higher in Arabic SPAM than in English SPAM.

Conversely, the results showed that the volume of pornographic emails, products and services emails, and phishing and fraud emails was larger in English SPAM than in Arabic SPAM. The percentages indicate a significant difference between Arabic and English SPAM in the volume of pornographic emails and of phishing and fraud emails, which were far more common in English SPAM (see Table 4).

The results also revealed types of Arabic SPAM that did not exist in English SPAM. These included news, training, consultation, jokes, scandals of famous people, puzzles, greetings, competitions, and invitations from social network websites such as Facebook.

A study conducted by the Communication and Information Technology Commission (CITC) in Saudi Arabia in 2007 showed that 64% of the email SPAM received in Saudi Arabia was direct marketing, 25% was sexual email, 5% was religious email, and 5% was of other types [20]. However, that study did not specify whether the email SPAM received was written in Arabic or English. The results of the CITC study indicated that the volumes of religious emails, pornographic emails and other types of email SPAM were similar to the volumes of the same types in this study.

The results, seen in Table 4, showed that the volume of pornographic emails in both Arabic and English SPAM was low compared to the same type in other countries such as Bahrain. A study conducted in Bahrain by [1] revealed that 76% of the participants received pornographic emails while 24% did not; that study did not specify whether the volume of pornographic emails was larger in English or in Arabic. The lower volume of pornographic emails in Saudi Arabia could be because public access to pornographic websites is not allowed in Saudi Arabia, which may have contributed to reducing the volume of SPAM sent from such websites.

Table 5 shows the averages of Arabic email SPAM received by the participants in the eastern, western, central, southern and northern regions of Saudi Arabia. The results revealed that the participants in the southern region received more business advertisements than the participants in other regions. The volume of religious and political emails received in the eastern region was higher than that of the same type in other regions, while the volume of pornographic emails received in the western and central regions was larger than in other regions.

In addition, the results revealed that the participants in the northern region received more forum emails than the participants in other regions. The volume of products and services emails was larger in the eastern and western regions, and the volume of phishing and fraud emails was larger in the western region, than in other regions. The percentages also showed that the volume of other types of Arabic SPAM was larger in the eastern, central and southern regions than in the other regions (see Table 5).

Table 5 also shows the averages of English email SPAM received by the participants in the five regions. The results showed that the volume of business advertisements was larger in the northern region than in other regions, and the volume of religious and political emails received in the western and southern regions was larger than that of the same type in other regions. The results revealed that the participants in the eastern region received more pornographic emails than other regions, while the volume of forums, products and services, and other types of English SPAM was larger in the western region than in other regions.


The results also showed that the volume of phishing and fraud emails was larger in the southern region than in the other regions.

Table 5: Averages of Arabic and English email SPAM received by the participants in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia

                                    E          W          C          S          N
Types of email SPAM             AR%  EN%   AR%  EN%   AR%  EN%   AR%  EN%   AR%  EN%
Business                         31   27    29   28    32   31    34   30    31   32
Religious and Political Parties   6    2     5    3     5    2     4    3     5    2
Pornographic                      9   27    11   22    11   24     6   23     9   26
Forums                           35    3    30    6    36    2    39    3    42    2
Products and Services            13    9    13   17    10   13    11    9     8   10
Phishing and Fraud                5   31    12   22     5   28     5   32     5   27
Other                             1    1     0    2     1    0     1    0     0    1

A study conducted by [3] described some keywords and phrases used in Arabic and English email SPAM in Saudi Arabia. These keywords and phrases were collected from different ISPs in Saudi Arabia.
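ISP-collected keyword lists of this kind are the raw material for simple rule-based filters. The sketch below shows how such a list might be applied; the particular keywords, scoring and threshold are illustrative assumptions, not part of the cited study [3].

```python
# Minimal rule-based SPAM check: flag a message when it contains at least
# `threshold` phrases from a keyword list. The keywords below are a small
# illustrative subset; real ISP lists are far larger and cover both
# Arabic and English phrases.
SPAM_KEYWORDS = {
    "viagra",
    "you have won",
    "verify your account",
    "bank loans",
    "winning promotion",
}

def looks_like_spam(message: str, threshold: int = 1) -> bool:
    """Return True if the message matches enough SPAM keywords."""
    text = message.lower()
    hits = sum(1 for keyword in SPAM_KEYWORDS if keyword in text)
    return hits >= threshold

print(looks_like_spam("Congratulations! You have won - verify your account now"))  # True
print(looks_like_spam("Meeting moved to 3 pm tomorrow"))  # False
```

Real filters combine such keyword rules with statistical scoring, since keyword matching alone is easy for spammers to evade.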

Examples of Arabic SPAM keywords and phrases (translated here into English; the survey listed them in Arabic) include: "Viagra", "games", "medicines", "diet", "chance to win", "congratulations, you have won", "competition", "card", "join us", "education", "win a million Saudi Riyals", "marriage", "exclusive", "fashion", "Green Card to travel to America", "sex", "life partner", "18 and above", "romance", "donations", "training", "programs", "surprises", "scandal", "subscribe to the forum", "participate and win", "offer", "prize", "revolution", "lowest prices", "pornography", "gift", "private clips", "outfits", "course", "money", "good news", "stocks", "for men only", "work from home" and "funny".

Examples of English SPAM keywords and phrases are: "sex", "Cialis", "gift", "Dollar", "discount", "bonus", "girls", "Viagra", "Loto winner", "Investment", "Forex", "Green", "Visa and Master", "reactivate your email account", "Incomplete personal information", "Verify your account", "Account not updated", "Financial Information Missing", "$USD", "You have won", "fund", "money", "winning promotion", "transferring", "Training", "South Africa", "Partnership", "Bank loans", and "work and live in USA".

3.3. Actions of Email Users in Dealing with SPAM

The participants were asked about the

appropriate action for dealing with email SPAM. In the survey, the participants were given three possible actions for dealing with SPAM: first, reading the entire SPAM email; second, deleting the SPAM email without reading it; and third, contacting the ISP and notifying it about the SPAM email. To answer this question, the participants were asked to evaluate their behaviour by choosing one of the following options for each action: never, sometimes or always.

First, when the participants were asked if they read the entire SPAM email, the results revealed that most of them said that they sometimes do. The participants in the eastern and central regions performed better than those in other regions, in that the proportion who said that they never read the entire SPAM email was larger in the eastern and central regions than in the other regions (see Table 6).

Second, when the participants were asked if they delete SPAM email without reading it, most of them said that they sometimes do. The results, as shown in Table 6, revealed that the participants in the central and eastern regions performed better than those in other regions, in that the proportion who said that they always delete SPAM email without reading it was larger in the central and eastern regions than in the other regions.

Third, when the participants were asked if they contact their ISP and notify it about SPAM email, most of them said that they never do (see Table 6). The participants in the southern and northern regions performed better than those in other regions, in that the proportion who said that they always contact their ISP about SPAM was larger in the southern and northern regions than in the other regions.

The results of a study conducted by [17] showed that 11.7% of its participants said that they contacted their ISPs when they received email SPAM. Comparing the two studies, it is clear that most email users in both did not contact their ISPs regarding SPAM problems.

The results shown above regarding users' actions in dealing with email SPAM clearly suggest that the ISPs in Saudi Arabia should inform users about email SPAM, its impacts, the ISPs' technical and legal efforts to combat SPAM, and the procedures users should follow when they receive SPAM.

When the participants were asked if they had responded to an offer made by a SPAM email, the results showed that most of the participants in all regions had not (see Table 6). The results revealed that the participants in the southern region responded to offers made by SPAM email more than the participants in other regions of Saudi Arabia.

The results indicated that the participants in the western and southern regions enjoyed the fun emails contained in SPAM more than the participants in other regions, while the participants in the eastern and northern regions made more use of the purchasing and selling offers contained in SPAM email. The participants in the central, southern and northern regions used SPAM as a learning tool more than the participants in other regions, and the participants in the northern region derived other benefits from SPAM, such as friendship requests, more than the participants in other regions (see Table 6).

The results indicated that as long as some users respond to SPAM offers, email SPAM is likely to increase and cause problems for other users unless it is combated. This suggests that laws against SPAM in Saudi Arabia could reduce its incidence by greatly reducing the ability of spammers to make sales without fear of penalties.

Table 6: Actions of users in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia in dealing with email SPAM

Question / Response                                      N     S     C     W     E
What do you do when you receive SPAM email?
  1. Read the entire email
     Never                                              29%   28%   37%   33%   40%
     Sometimes                                          65%   62%   53%   62%   48%
     Always                                              6%   10%   10%    5%   12%
  2. Delete the email without reading it
     Never                                               5%   13%    7%    6%   11%
     Sometimes                                          62%   52%   50%   59%   49%
     Always                                             33%   35%   43%   35%   40%
  3. Contact the ISP and notify it about the email SPAM
     Never                                              86%   73%   83%   87%   77%
     Sometimes                                           6%   15%   14%   12%   19%
     Always                                              8%   12%    3%    1%    4%
Have you ever purposely responded to an offer made by a SPAM email?
  Yes                                                   20%   34%   20%   15%   19%
  No                                                    80%   66%   80%   85%   81%
What benefits did you derive from SPAM emails?
  Purchasing and selling                                23%   16%   18%   10%   23%
  Learning                                              46%   47%   47%   39%   33%
  Fun                                                   50%   71%   54%   71%   56%
  Other                                                  4%    0%    0%    3%    3%

3.4. Effects of Email SPAM on End Users

When the participants were asked if they had been affected negatively by email SPAM, the results revealed that approximately half of the participants in all regions had been affected (see Table 7).

The results showed that the participants in the southern and northern regions were affected by email SPAM more than the participants in other regions. This could be because most of the participants in these regions were not aware of SPAM and of effective ways of dealing with it. It could also be because of how they dealt with offers made by SPAM email: the results revealed that the participants in the southern and northern regions responded to SPAM emails more than the participants in other regions (see Table 7).

The results revealed that the main impact of SPAM on users was filling their inboxes, with the participants in the southern region more affected by this than the participants in other regions. The second main impact was the infection of computers by a Virus, Worm or other malicious program, with the participants in the northern and central regions more affected than the participants in other regions (see Table 7).

The results showed that the participants in the western region were affected by SPAM through lost time and reduced productivity more than the participants in other regions. The participants in the eastern, southern and western regions were more affected through the theft of personal information such as user names, passwords and credit card numbers, while the participants in the eastern, western and central regions felt less confidence in using email more than the participants in other regions. The participants in the central region were more affected by other effects of email SPAM, such as annoyance, than the participants in other regions (see Table 7).


Table 7: Effects of email SPAM on users in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia

Question / Response                                      N     S     C     W     E
Have you been affected negatively by email SPAM?
  Yes                                                   52%   51%   46%   37%   43%
  No                                                    48%   49%   54%   63%   57%
What was the impact of email SPAM?
  Stealing personal information such as user name,
  password and credit card numbers                      16%   23%   18%   22%   23%
  Losing time and reducing productivity                 35%   36%   44%   51%   45%
  Less confidence in using the email                    15%    7%   22%   23%   25%
  Filling email inbox                                   56%   71%   65%   66%   52%
  Computer was infected by a Virus, Worm or other
  malicious program                                     59%   43%   58%   51%   55%
  Other impacts                                          3%    3%    4%    3%    2%

3.5. Awareness of Anti-SPAM Filters and the Effectiveness of Anti-SPAM Filters in Detecting Arabic and English SPAM

When the participants were asked if they were aware of Anti-SPAM programs, the results revealed that most of the participants in all regions were not. The results indicated that the participants in the central region were more aware of Anti-SPAM programs than the participants in other regions (see Table 8).

A study conducted in Bahrain [1] revealed that 26% of the participants knew about Anti-SPAM programs while 74% did not. Comparing the Bahraini study to this study, Saudi society was more aware of Anti-SPAM programs than Bahraini society, but most of Saudi society was still not aware of them.

When the participants were asked how they knew about Anti-SPAM programs, the results showed that the majority in all regions knew about them through the internet and forums, and through school and university education. The results also revealed a deficiency in government and ISP efforts to inform users about Anti-SPAM programs and how they work; as seen in Table 8, there were no government efforts to inform users about Anti-SPAM programs in the western and southern regions. This suggests that there should be coordination between the government and the internet service providers in Saudi Arabia to inform users in all regions, especially the western and southern regions, about Anti-SPAM programs and how they detect SPAM. It also suggests that distributing free Anti-SPAM programs to email users, whether by the government or by the internet service providers, could reduce the effects of SPAM in Saudi Arabia.

Table 8: Responses of the participants in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia about their knowledge about Anti-SPAM programs

Question / Response                                      N     S     C     W     E
Are you aware of Anti-SPAM programs?
  Yes                                                   28%   31%   44%   38%   38%
  No                                                    72%   69%   56%   62%   62%
How did you know about Anti-SPAM programs?
  Internet Service Providers (ISPs)                      8%   10%    6%    8%    4%
  The internet and forums                               67%   52%   62%   79%   67%
  Broadcast media such as radio, TV, newspapers
  and magazines                                          3%    5%    8%    3%    6%
  Friends and relatives                                 14%   48%   28%   25%   32%
  Government ministries and commissions                 11%    0%    3%    0%    6%
  Through my school or university education             36%   52%   47%   27%   33%
  Other                                                  6%    5%    5%    5%    1%

When the participants were asked to rate the effectiveness of Anti-SPAM programs in detecting Arabic and English SPAM, the results revealed that the existing Anti-SPAM programs were not completely effective for either language, suggesting that existing Anti-SPAM filters need to be developed to detect SPAM in different languages such as Arabic and English. The participants in all regions estimated that the existing Anti-SPAM programs were more effective in detecting English SPAM than Arabic SPAM. This suggests that there should be a focus on producing and developing techniques to detect email SPAM in the Arabic language.
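One practical reason Arabic SPAM is harder to filter is orthographic variation: the same word can be written with or without diacritics, with elongation (tatweel) and with several alef forms. The sketch below shows the kind of normalization step commonly applied before keyword matching or statistical filtering when adapting filters to Arabic; the exact rules vary between systems, and these are typical illustrative examples, not the approach of any study cited here.

```python
import re

# Typical Arabic normalization rules applied before keyword matching or
# statistical filtering. Rule choices vary between systems; these are
# common illustrative examples.
def normalize_arabic(text: str) -> str:
    text = re.sub(r"[\u064B-\u0652]", "", text)             # strip diacritics (tashkeel)
    text = text.replace("\u0640", "")                        # drop tatweel (elongation)
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)   # unify alef variants
    text = text.replace("\u0649", "\u064A")                  # alef maqsura -> ya
    return text

# An elongated spelling of "فياقرا" (Viagra) now matches the plain form:
print(normalize_arabic("فياقــرا") == "فياقرا")  # True
```

Without such a step, a keyword filter that knows only one spelling of a term misses trivially obfuscated variants, which is consistent with the weaker Arabic detection reported above.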

The participants' evaluation in all regions of the effectiveness of Anti-SPAM programs in detecting Arabic and English SPAM can be seen in Figure 5 and Figure 6.


Figure 5: The effectiveness of Anti-SPAM filters in detecting Arabic email SPAM based on the evaluation of the participants in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia

Figure 6: The effectiveness of Anti-SPAM filters in detecting English email SPAM based on the evaluation of the participants in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia

3.6. Efforts of Government and ISPs to Combat SPAM

When the participants were asked if they were aware of government efforts to combat SPAM, the results showed that only a few participants were. The results revealed that users in the central region were more aware of the government's efforts to combat SPAM than users in other regions (see Table 9). This suggests that the government should inform users about its efforts to combat SPAM and should provide awareness programs about SPAM, its impacts and methods of combating it for users in all regions of Saudi Arabia. This could help reduce the effects of SPAM on email users in Saudi Arabia.

Table 9: The awareness of the participants in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia about the government and ISPs efforts

Question / Response                                      N     S     C     W     E
Are you aware of efforts by the government in Saudi Arabia to combat email SPAM?
  Yes                                                   23%   20%   30%   22%   20%
  No                                                    77%   80%   70%   78%   80%
Are you aware of efforts by ISPs in Saudi Arabia to combat email SPAM?
  Yes                                                   10%   13%   16%   15%   11%
  No                                                    90%   87%   84%   85%   89%

The participants who were aware of government efforts to combat SPAM were asked which efforts they were aware of. Most of them (62%) said that the government's efforts could be observed through King Abdulaziz City for Science and Technology (KACST). They said that KACST blocks unsecured websites and websites that send SPAM, informs people about dangerous security attacks and their impacts, and conducts and funds research related to information technology [19].

24% of the participants said that the government recommended that each government and private sector organization in Saudi Arabia apply a security policy. The policy should include: providing the organization with the software and hardware necessary to avoid security attacks such as Viruses and SPAM; raising employees' and customers' awareness of security attacks and methods of combating them; conducting research on security attacks and their countermeasures; conducting training and workshops on security issues for employees; employing people qualified in network security to deal with security attacks; providing a financial budget to develop the security policy; and reviewing the security policy regularly to identify its strengths and weaknesses.

22% said that the government established and funded centres to deal with information security issues. Examples of these centres are the Centre of Excellence in Information Assurance (COEIA) [8], the Computer Emergency Response Team (CERT) [10] and the Prince Muqrin Chair for Information Security Technologies (PMC IT SECURITY) [23]. They said the aims of these centres were to inform people about security attacks such as Viruses and SPAM and their

[Figure 5 data: effectiveness of Anti-SPAM filters in detecting Arabic email SPAM, rated 63%, 61%, 61%, 60% and 51% for the Northern (n=130), Southern (n=134), Central (n=352), Eastern (n=203) and Western (n=201) regions respectively, following the order shown in the chart legend.]

[Figure 6 data: effectiveness of Anti-SPAM filters in detecting English email SPAM, rated 85%, 83%, 80%, 79% and 74% for the Southern (n=134), Eastern (n=203), Central (n=352), Northern (n=130) and Western (n=201) regions respectively, following the order shown in the chart legend.]


impacts, to conduct and fund research related to security issues, and to hold conferences and workshops on security attacks.

19% of the participants said that the government's efforts could be observed through the Communication and Information Technology Commission (CITC). They said that the CITC funded the Saudi National Anti-SPAM Program project and created a public website for it that includes information about SPAM and methods of combating it. They also said that this project informed people about SPAM by publishing brochures and by subscribing people to the CITC mailing list so that they could follow new developments in SPAM. The participants also said that the project conducted research on SPAM problems and published the results publicly, and that the CITC received complaints from people regarding SPAM problems and processed them together with the other responsible government sectors [9].

18% said that some universities in Saudi Arabia established information security centres which provide the following services. First, these centres raise public awareness of security attacks. Second, they conduct workshops, conferences and ongoing training on security issues and methods of combating them. Third, they publish valuable research on security issues for the public and for libraries across Saudi Arabia.

18% of the participants said that the government had enacted a law for combating electronic crimes in Saudi Arabia but that there were no specific laws for SPAM. They said that the government sector responsible for enforcing the electronic crime law is the Communication and Information Technology Commission (CITC), in coordination with other legal sectors.

When the participants were asked whether they were aware of the ISPs' efforts to combat SPAM, the results revealed that only a few participants were. The results indicated that users in the central and western regions were more aware of ISPs' efforts than users in other regions (see Table 9). This suggests that the ISPs should provide awareness programs about SPAM, its impact and their efforts to combat it for users in all regions of Saudi Arabia, which could help reduce the effects of SPAM on email users.

The participants who were aware of the ISPs' efforts to combat SPAM were asked which efforts they were aware of. 42% of the participants said that the ISPs used advanced Anti-SPAM filters to block email SPAM before it reaches end users' inboxes.

26% said that the ISPs blocked websites or forums that send email SPAM to recipients and put them on blacklists.

13% of the participants said that the ISPs

informed people about email SPAM and

methods of combating it by email, brochures,

and Short Message Service (SMS).

13% said that the ISPs warned customers not to send SPAM, received customers' complaints regarding SPAM, and took legal actions against people who sent email SPAM, such as disconnecting the internet service and cancelling the contract.

4. CONCLUSION AND FUTURE WORK

This paper presented the results of a survey of

email users in the eastern, western, central,

southern and northern regions of Saudi Arabia

about email SPAM and how they deal with it.

The results showed that there was no specific

definition for email SPAM and the most

common definition for email SPAM was that

“an email that was sent randomly to numerous recipients and contained Spyware, files, links, images or text that aimed to hack the computer or steal confidential information such as email passwords, credit card numbers and bank account numbers”.

The results revealed that approximately a third of users in Saudi Arabia did not know about email SPAM, which is a significant risk for Saudi society. The results showed that the level of the participants' awareness of SPAM differed from one region to another, and the participants in the central and western regions were more aware of SPAM than the participants in other regions.

The results showed that the volume of email SPAM was high in Saudi Arabia compared to other countries. The results revealed that the volume of email SPAM differed from one region to another, and the volume of SPAM received by the participants was larger in the northern and central regions than in other regions. The results showed that most of the email SPAM received in all regions was written in English


and the volume of English SPAM differed from one region to another.

The results also showed that there were many types of Arabic and English SPAM received by the participants in all regions. The results showed that the most common type of Arabic SPAM was forum emails, while for English it was business advertisements and phishing and fraud emails, and the volume of these types for both Arabic and English differed from one region to another.

The results showed that few participants in all regions responded to SPAM, and the proportion of participants who responded to SPAM was larger in the southern region than in other regions. The results revealed that approximately half of the participants in all regions were affected negatively by email SPAM, and the proportion of participants affected negatively by SPAM was larger in the southern and northern regions than in other regions.

The results showed that most of the

participants in all regions were not aware of

Anti-SPAM programs and the participants in the

central region were more aware of Anti-SPAM

programs than the participants in other regions.

The results showed that the participants in all

regions estimated that the existing Anti-SPAM

programs were more effective in detecting

English than Arabic SPAM.

The results showed that most of the

participants in all regions were not aware of the

government efforts to combat SPAM and the

participants in the central region were more

aware of the government efforts than the

participants in other regions.

Finally, the results showed that most of the participants in all regions were not aware of the ISPs' efforts to combat SPAM, and the participants in the central and western regions were more aware of the ISPs' efforts than the participants in other regions.

Future work could include investigating government efforts to combat SPAM in order to find more effective methods.

Laws to combat SPAM in Saudi Arabia could also be investigated. This could be achieved by drawing on the experiences of developed countries in combating SPAM, which could help in enacting a new, clear anti-SPAM law in Saudi Arabia. The legal and technical efforts of ISPs in Saudi Arabia to combat email SPAM, and ways to encourage ISPs to collaborate with each other and with private sectors, government sectors and customers, could be investigated.

Effective awareness programs to inform users

in all regions of Saudi Arabia, private sectors

and government sectors about SPAM, its effects

and methods of combating it could be

investigated.

Improving the performance of existing Anti-

SPAM filters in detecting Arabic and English

email SPAM could be investigated. This could

be achieved by testing the effectiveness of

existing Anti-SPAM filters in detecting Arabic

and English SPAM email and this could help in

creating and developing effective filters to detect

new types of Arabic and English SPAM.

A listing of keywords and phrases used in Arabic email SPAM was compiled in this research, which could help in designing and producing special Anti-SPAM filters for Arabic SPAM.
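A keyword listing of that kind could feed a simple filter. Below is a minimal sketch of the mechanism only; the keywords and threshold are invented placeholders, not the list compiled in this research, and a real Arabic filter would also need normalization of Arabic orthographic variants:

```python
# Hypothetical keyword-based filter: flag a message as SPAM when enough
# known SPAM keywords appear. Keywords and threshold are illustrative only.
SPAM_KEYWORDS = {"prize", "winner", "free", "click", "account", "verify"}
THRESHOLD = 2

def is_spam(message: str) -> bool:
    # Tokenize naively on whitespace and count keyword hits.
    words = set(message.lower().split())
    return len(words & SPAM_KEYWORDS) >= THRESHOLD

assert is_spam("Click here to verify your account")
assert not is_spam("Meeting moved to 3pm tomorrow")
```

In practice such hand-built keyword lists are usually a starting point for statistical filters such as the naive Bayesian approaches cited in the references.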

5. REFERENCES

1. Al-A'ali, M.: A Study of Email Spam and How to

Effectively Combat It.

http://www.webology.org/2007/v4n1/a37.html,

Webology (2007).

2. Alkahtani, H. S., Gardner-Stephen, P., Goodwin, R.: A

taxonomy of email SPAM filters. In: Proc. The 12th

International Arab Conference on Information

Technology (ACIT), pp. 351--356, Riyadh, Saudi

Arabia (2011).

3. Alkahtani, H. S., Goodwin, R., Gardner-Stephen, P.:

Email SPAM related issues and methods of controlling

used by ISPs in Saudi Arabia. In: Proc. The 12th

International Arab Conference on Information

Technology (ACIT), pp. 344--351, Riyadh, Saudi

Arabia (2011).

4. Androutsopoulos, I., Koutsias, J., Chandrinos, K. V.,

Spyropoulos, C. D.: An experimental comparison of

naive Bayesian and keyword-based anti-spam filtering

with personal e-mail messages. In: Proc of the 23rd

annual international ACM SIGIR conference on

Research and development in information retrieval, pp.

160--167, Athens, Greece (2000).

5. Australian Communications & Media Authority

(ACMA),

http://www.efa.org.au/Issues/Privacy/spam.html#acts

6. Boykin, O., Roychowdhury, V.: Personal Email

networks: an effective anti-spam tool. Condensed

Matter cond-mat 0402143, pp. 1--10 (2004).

7. Carreras, X., Marquez, L.: Boosting Trees for Anti-

Spam Email Filtering. In: Proc. of RANLP, 4th

International Conference on Recent Advances in

Natural Language Processing, pp. 1--7, Tzigov Chark,

BG (2001).

8. Centre of Excellence in Information Assurance

(COEIA), http://coeia.edu.sa/index.php/en/about-

coeia/strategic-plan.html

9. Communication and Information Technology

Commission (CITC),

http://www.spam.gov.sa/eng_main.htm


10. Computer Emergency Response Team (CERT),

http://www.cert.gov.sa/index.php?option=com_conten

t&task=view&id=69&Itemid=116

11. Cook, D., Hartnett, J., Manderson, K., Scanlan, J.:

Catching spam before it arrives: domain specific

dynamic blacklists. In: Proc. of the 2006 Australasian

workshops on Grid computing and e-research, pp.

193--202, Hobart, Tasmania, Australia (2006).

12. Cormack, G., Lynam, T.: Spam corpus creation for

TREC. In: Proc. of Second Conference on Email and

Anti-Spam (CEAS), pp. 1--2 (2005).

13. Cormack, G. V., Kolcz, A.: Spam filter evaluation

with imprecise ground truth. In: Proc. of the 32nd

international ACM SIGIR conference on Research

and development in information retrieval, pp. 604--

611, Boston, MA, USA (2009).

14. Damiani, E., Vimercati, S. D. C. d., Paraboschi, S.,

Samarati, P.: An Open Digest-based Technique for

Spam Detection. pp. 1--6, San Francisco, CA, USA

(2004).

15. Garcia, F. D., Hoepman, J.-H., Nieuwenhuizen, J. V.:

SPAM FILTER ANALYSIS. SEC, pp. 395--410

(2004).

16. Gardner-Stephen, P.: A Biologically Inspired Method

of SPAM Detection. 20th International Workshop,

pp. 53--56, DEXA (2009).

17. Grimes, G. A., Hough, M. G., Signorella, M. L.:

Email end users and spam: relations of gender and age

group to attitudes and actions. Computers in Human

Behavior 23, 1, 318--332 (2007).

18. Hovold, J.: Naive Bayes Spam Filtering Using Word-

Position-Based Attributes. In: Proc. Of Conference on

Email and Anti-Spam, pp. 1--8 (2005).

19. King Abdulaziz City for Science and Technology,

http://www.kacst.edu.sa/en/about/Pages/default.aspx

20. National Saudi Anti-SPAM Program,

http://www.spam.gov.sa/eng_stat2.htm

21. O'Brien, C., Vogel, C.: Spam filters: bayes vs. chi-

squared; letters vs. words. In: Proc. of the 1st

international symposium on Information and

communication technologies, pp. 291--296, Dublin,

Ireland (2003).

22. Pfleeger, S. L., Bloom, G.: Canning Spam: Proposed

Solutions to Unwanted Email. IEEE Security and

Privacy 3, 2, pp. 40--47 (2005).

23. Prince Muqrin Chair for Information Security

Technologies (PMC IT SECURITY),

http://pmc.ksu.edu.sa/AboutPMC.aspx

24. Sahami, M., Dumais, S., Heckerman, D., Horvitz, E.:

A Bayesian Approach to Filtering Junk E-Mail:

Learning for Text Categorization. Papers from the

1998 Workshop, pp. 1--8, Madison, Wisconsin

(1998).

25. Sakkis, G., Androutsopoulos, I., Paliouras, G.,

Karkaletsis, V., Spyropoulos, C. D., Stamatopoulos,

P.: A Memory-Based Approach to Anti-Spam

Filtering for Mailing Lists. Information Retrieval 6,

1, 49--73 (2003).

26. Sorkin, D. E.: SPAM LAWS. The Center for

Information Technology and Privacy Law,

http://www.spamlaws.com/ (2009).

27. Wittel, G. L., Wu, S. F.: On Attacking Statistical

Spam Filters. In: Proc. of the Conference on Email

and Anti-Spam (CEAS), pp. 1--7, Mountain View,

CA, USA (2004).


A Survey on Privacy Issues in Digital Forensics

Asou Aminnezhad

Faculty of Computer Science

and Information Technology

University Putra Malaysia

[email protected]

Ali Dehghantanha

Faculty of Computer Science

and Information Technology

University Putra Malaysia

[email protected]

Mohd Taufik Abdullah

Faculty of Computer Science

and Information Technology

University Putra Malaysia

[email protected]

ABSTRACT

Privacy issues have always been a major concern in computer forensics and security, and they arise in any investigation, whether it pertains to a computer or not. Privacy in the physical world can be protected by legislation, but in the digital world, where rapidly growing technology and the increasing use of digital devices generate huge amounts of private data, it is impossible to provide a fully protected space while data are transferred, stored and collected. Since the field's introduction, forensics investigators and developers have faced the challenge of finding a balance between retrieving key evidence and infringing user privacy. This paper looks into developmental trends in computer forensics and security in various aspects of achieving such a balance. In addition, the paper analyses each scenario to determine the trend of solutions in these aspects and evaluates their effectiveness in resolving the aforementioned issues.

KEYWORDS

Privacy, Computer Forensics, Digital Forensics,

Security.

1 INTRODUCTION

Computer forensics has always been a field which grows alongside technology. As networks become more widely available and data transfer through them gets faster, the risks involved get higher. Malicious software, tools and methodologies are designed and implemented every day to exploit networks and their associated data storage in order to extract private information that can be used in various crimes.

This is where computer forensics and security come in. The field applies scientific techniques and tools to collect, preserve, and recover latent evidence from crime scenes.

Computer forensics is the science of identifying, analyzing, preserving, documenting and presenting evidence and information from digital and electronic devices, and it is meant to protect the privacy of users from being exploited. Forensic specialists have a duty to their clients to pay careful attention to the data being extracted, since it may become evidence in a digital investigation and guide possible litigation.

However, the process of extracting data evidence itself opens up avenues for forensic investigators to infringe user privacy themselves. The private data that computer forensics can disclose includes images, encryption keys and user passwords, and such knowledge can be used beyond the aim of the investigation. In order to prevent such potential abuses and protect forensics investigators as well as users, research and analysis have been done in various fields to provide solutions to the problem.

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 311-323. The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

This paper comprises 5 sections and is organized as follows: Section 2 determines the limitations of the study, collects data from research publications and reviews related work in the field of privacy application in various fields and their solutions. Section 3 analyses these solutions and determines whether privacy can be preserved from both the user's and the forensic investigator's perspective. Section 4 identifies the privacy issues overlooked by current developmental trends of privacy preservation and their potential setbacks. Section 5 concludes the paper and summarizes the overall development of technology in privacy preservation.

1.1 Limitations of the study

This paper focuses on statistical analysis based on trends from 2006 onwards. Due to the technical specificity of each paper's research field, it is not possible to rely solely on the results to reflect a holistic picture of the real trend in privacy issues in forensics investigations. It is also difficult to fully explain the development trends of privacy issues, as they are delicate in each research specimen. The research natures and scenarios used cannot be fully depended upon, as they are not necessarily applicable in other, similar scenarios. The number of specimens provided is also too small to sustain very significant research value. In this case, where most of the papers reviewed are too specific in their corresponding research field and purpose, it is difficult to generalize the specimens into statistical data with high accuracy. We also realize that most specimens are from the Elsevier journal platform, and thus acknowledge this as a limitation on the availability of related research publications from other sources.

We also note another limitation in the lack of graphical statistical data, as most of the papers reviewed are not statistically based research. It is not practical to add statistical assumptions to such graphical data, as doing so could distort the accurate picture of the research.

1.2 Data Collection

In this research, a stringent data collection procedure was set up. Such a procedure is required because the resources available to achieve high-level research results are scarce; hence no important data can be risked being overlooked. We consider 3 very important analyses: research nature analysis, keyword analysis and an individual analytic platform. A total of 21 documents were analyzed based on these 3 approaches.

Table 1 signifies the shift of research focus in privacy preservation. It is rather evident that the current focus of forensics and security solutions is now more towards databases and networking with the rise of dependency on cloud computing technology, with 8 papers focusing on that area. More data are stored in third-party databases than 5 years ago, and they have become a tempting source of valuable private information. A shift of focus from software and systems to databases and networking is inevitable under such circumstances, where it is harder to gain access to information without network access and to maintain that access for further exploitation. Methodologies and frameworks still receive adequate focus, as these are the foundation of many solutions to be proposed in the future.

The keyword analysis signifies the focus of each specific specimen analyzed. As shown in Table 2, the keywords used do not necessarily bear the same signature as published in these specimens, but are grouped based on their representation. For example, a computer forensics publication with a digital forensics representation will be grouped together with it, as they represent a similar research nature. Keyword analysis provides a picture of the techniques and theories being emphasized within the timeframe of this research.

The clear focus of researchers on privacy and digital forensics issues marks the importance of balancing privacy and forensics. Excluding the specific related issues, general privacy and digital forensics topics achieved a total of 24 keyword matches across the 21 papers. To quantify, that would mean there are at least 3 papers that draw a comparison between both issues


in finding a balance as a major purpose of research. The other important trend is the diversity of the research. Only 11 of the 53 representable keywords identified bear more than 2 keyword matches. This means that more focus is given to individually specified research subjects than to a holistic picture of the privacy-forensics balance.

The individual analytic platform is conducted as a final data collection step. This is done by taking a summary of each paper and giving a brief explanation of what the paper is trying to prove and the possible benefits of the publication.

Before a forensics investigator or computer security designer works on finding evidence or setting up detection systems, the first step is always to gather information and plan. The problem with the Standard Operating Procedures (SOP) [1] of forensics investigations is that there are many instances where forensics investigators step into information that is not necessarily related to a particular crime.

The Fourth Amendment of the Constitution of the United States of America is no stranger to digital forensics investigators.

Table 1. Research Nature Analysis

[Chart residue: paper counts (scale 0-9) by research nature: methodologies and framework; software and systems; database and networking; education and networking.]

[Chart residue: keyword match counts (scale 0-16) for: Cybercrime; Computer Prevention; Computer/Digital Forensics; Fraud; Netflow; Network Forensics; Cryptography; Identity Based Encryption; Privacy; Privacy Preserving Semantics; Statistical Database; Onion Routing; Antiforensics; Anonymizers; Security; Log Files Analysis; Traffic Analysis; Network Intelligence; Privacy Enhancing Technologies; Transparency/Reliability; Legal Issues; Warrants; Forensics Readiness Capability; Information Privacy Incidents; Forensics Images; Privacy Protection; Forensics Computing Education; Forensics/Digital Investigation; Sequence Release Privacy Accurate; Privacy Preserving Object; Compound Document Format; Document Security; Electronic Document Information Leakage; Portable Document Format; Information Forensics; Private Browsing; Incognito; In-Private; Privacy Preserving Forensics; Encrypted Data Searching; Homomorphic Encryption; Commutative Encryption; Data Privacy/Protection; Sensor Web; Distributed Information System; Forensics Database; Relational Database; Suspicious Database Queries …; Phisher Network; Forensics Framework; PIS.]

Table 2. Keyword Analysis


2 CURRENT TRENDS OF PRIVACY IN

DIGITAL FORENSICS

The Amendment protects people from unreasonable searches and seizures, and warrants that allow such seizures have to be specific to their cause. For example, if a warrant is issued against an individual to search for evidence of drugs, any findings that turn out to be child pornography will not be eligible to be used against that individual. The Amendment also stretches to the interception of communication networks, including wiretapping [2].

However, the Amendment only limits what type of information may be searched and seized, not the protocols for how it is to be searched and seized. On this ground, [2] proposed that an audit trail of the methodologies used by forensics investigators would be enough to verify whether the investigation protocols exceeded court authorization.

Apart from a general audit, much related research has also produced different models for forensics investigations in recent years. The authors of [3] proposed a framework by which enterprises can achieve forensics readiness for approaching privacy-related violations. It consists of a series of business processes and forensics approaches, executed in hierarchical order, such that enterprises can conduct quality privacy-related forensics investigations of information privacy incidents.

Two later models were proposed in 2010. Firstly, [4] proposed a cryptographic model to be incorporated into the current digital investigation framework, in which the data owner first encrypts his digital data with a key and an index of the storage image is built. Investigators then use the encryption key to extract data only from the image sectors that match their search keywords. Image sectors without the keywords are never revealed to the forensics investigators, guaranteeing privacy.
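A minimal sketch of this selective-disclosure idea follows. It is not the authors' actual protocol: the sector granularity, the toy XOR-based cipher and all names are illustrative stand-ins for a real encryption scheme.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy XOR keystream derived from SHA-256; a real scheme would use AES.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_sector(key: bytes, sector_id: int, data: bytes) -> bytes:
    ks = keystream(key, sector_id.to_bytes(8, "big"), len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

decrypt_sector = encrypt_sector  # XOR is its own inverse

def build_index(sectors):
    # Owner-side: map each keyword to the sector ids that contain it.
    index = {}
    for sid, text in sectors.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(sid)
    return index

def investigate(keyword, index, encrypted, key):
    # Investigator-side: only sectors matching the keyword are decrypted.
    hits = index.get(keyword.lower(), set())
    return {sid: decrypt_sector(key, sid, encrypted[sid]).decode() for sid in hits}

key = b"owner-secret"
sectors = {0: b"invoice fraud records", 1: b"family photos", 2: b"fraud emails"}
index = build_index({sid: d.decode() for sid, d in sectors.items()})
encrypted = {sid: encrypt_sector(key, sid, d) for sid, d in sectors.items()}

revealed = investigate("fraud", index, encrypted, key)
assert set(revealed) == {0, 2}  # sector 1 (unrelated data) stays hidden
```

The design point is that the index, not the investigator, decides which sectors are ever decrypted, so data outside the warranted keywords is never exposed.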

The next model, proposed by [5], introduces a layering system on data in order to protect users' privacy from being violated and the forensics investigators themselves from infringing privacy. It allows forensics investigators to first obtain information in the layer not related to the individual before moving towards the next layer. As each layer of information is justified and obtained, the layers get deeper and closer in relation to the individual, until the final layer, where the information is needed for the forensics investigation and is directly linked to the person.

In [6], PPINA (Protect Private Information Not

Abuser) is proposed, an embedded framework in

Privacy Enhancing Technologies (PET), a

technology designed to preserve user anonymity

while accessing the internet. The framework

allows users to continue being anonymous unless

the server has enough evidence to prove that the

user is attacking the server, hence requesting a

forensics investigation entity to reveal user

identity. The framework is designed to achieve a

balance between user privacy and digital forensics,

where both goals can be achieved with a

harmonious combination of network forensics and

PET.
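The conditional-reveal step of PPINA can be sketched as follows. This is a hypothetical illustration, not the paper's protocol: the evidence threshold, class names and escrow mechanism are invented for the example.

```python
# Hypothetical sketch of the PPINA idea: a forensics entity escrows the real
# identity and reveals it only when the server presents enough attack evidence.
EVIDENCE_THRESHOLD = 3   # illustrative policy, not from the paper

class ForensicsEntity:
    def __init__(self):
        self._identities = {}        # pseudonym -> real identity

    def register(self, pseudonym: str, identity: str):
        self._identities[pseudonym] = identity

    def request_reveal(self, pseudonym: str, evidence: list):
        # The user stays anonymous unless the evidence is sufficient.
        if len(evidence) < EVIDENCE_THRESHOLD:
            raise PermissionError("insufficient evidence; anonymity preserved")
        return self._identities[pseudonym]

entity = ForensicsEntity()
entity.register("anon-17", "alice@example.org")

try:
    entity.request_reveal("anon-17", ["one suspicious login"])
except PermissionError:
    pass  # not enough evidence: the user remains anonymous

attacker_logs = ["port scan", "SQL injection", "DoS burst"]
assert entity.request_reveal("anon-17", attacker_logs) == "alice@example.org"
```

The balance the framework aims for is visible here: anonymity is the default, and identity disclosure is gated behind an evidence check performed by a party other than the server.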

The development of digital forensics and security at the software level also raises many privacy-related issues. This includes information systems and related tools.

The first software type to look into is the counter-forensics privacy tool. A review was done in 2005 on this software type, which prevents forensics investigators from accessing private information by wiping out data such as caches, temporary files and registry values when executed. In [7], the researchers evaluated 6 tools in this category and found that while the tools eliminate the vast majority of targeted data, each either partially or fully failed in the 6 evaluation sections in which they claim to function: complete wiping of unallocated space, erasing targeted user and system files, registry usage records, recoverable registry archives from system restore points, recoverable data from special file system structures, and disclosure of the tool's own activity records. The authors suggested that encryption, such as the Encrypting File System, might be a better alternative to these tools.


A similar analysis of Privacy-Invasive Software (PIS) by [8], software such as spyware and advertising software (adware) that collects user information without the user's knowledge, found that current tools designed to combat it (anti-spyware tools) failed to identify it fast enough, or at all, and had problems classifying PIS properly. The research concluded that these tools, which run on the same signature-identification algorithms used against viruses and malware, do not work well on PIS because it exists in a grey area between business-facilitating and malicious software. In experiments, manual forensics methods provided better results.

Browsers also raise privacy-related issues, as they are used to perform many activities, such as trading online, that require private information to be transferred. The authors of [9] published an analysis of three widely used browsers in terms of their private browsing effectiveness. Private browsing is a feature that prevents browsing history from being stored in the computer's data storage. The authors concluded that while none of the three browsers displays visible evidence in private browsing mode, related data can still be extracted with the proper forensics tools and methodology. From the user's viewpoint, the authors also concluded that Google Chrome and Mozilla Firefox are better private browsing solutions than Internet Explorer.

The Portable Document Format (PDF), invented by Adobe, is credited with better security than other document formats. In [10], the researchers released a review of this format, suggesting that PDF is liable to leak information due to several of its interactive features, including flagging content as "deleted" instead of really deleting it and allowing the IP addresses of its distribution to be traced, and that it is very susceptible to hackers collecting this information while using PDF to conduct malicious cyber-attacks. The authors demonstrated this with several tools and attacks, and suggested a few administrator-level solutions for dealing with PDFs, such as checking the nature of received PDF files and using systems like EES (Elsevier Editorial System) to monitor PDF files.

In [11], the author discussed the concept of Onion Routing, pointing out that the evolution of the concept in preserving privacy raises difficulties during investigations. Onion Routing is created to prevent traffic analysis by third parties by encrypting socket connections and acting as a proxy. Only the adjacent routers along the anonymous connection can "unpeel" a layer of the encryption as the packets approach their destination, preventing hijacking and man-in-the-middle attacks. However, the author argued that the same technology can be used by criminals to prevent traffic analysis by forensics investigators and to bypass censorship, or can be combined with other techniques to perform malicious attacks on networks. Such a concept makes it very difficult for forensics investigators to collect evidence, as there are too few avenues for third parties to access the information packets, unless access is gained from inside the chain of the connection or by tracing the last router's communication with the destination, which is the weakest protection in the chain.
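The layered "unpeeling" can be sketched as follows. This is an illustration of the layering idea only, not the Tor/Onion Routing protocol: the XOR cipher and key handling are toy stand-ins for real per-hop ciphers.

```python
import hashlib

def xor_layer(key: bytes, data: bytes) -> bytes:
    # Toy XOR stream derived from SHA-256; real onion routing uses proper ciphers.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def wrap_onion(message: bytes, hop_keys):
    # Sender applies layers innermost-first: the last hop's layer goes on first.
    for key in reversed(hop_keys):
        message = xor_layer(key, message)
    return message

def route(onion: bytes, hop_keys):
    # Each router along the path peels exactly one layer with its own key.
    for key in hop_keys:
        onion = xor_layer(key, onion)
    return onion

keys = [b"router-A", b"router-B", b"router-C"]
onion = wrap_onion(b"meet at noon", keys)
assert route(onion, keys) == b"meet at noon"
# After only the first hop, the payload is still unreadable:
assert xor_layer(keys[0], onion) != b"meet at noon"
```

This is exactly why investigators struggle: any single router, or any outside observer, sees only ciphertext; the plaintext appears only once every hop in the chain has peeled its layer.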

In [12], the researchers published their findings on preserving privacy in forensics DNA databases. Such databases are designed to be centralized and usable by forensics investigators globally to identify criminals based on DNA matches. To address the risk that such information may be leaked to parties for non-investigative purposes, the authors proposed a framework that reworks the database access controls to accept only legitimate forensics queries, such as queries on blood samples and cell tissues found at crime scenes.
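Such query gating can be sketched minimally as follows. The source labels, class names and audit policy here are hypothetical illustrations, not the authors' schema.

```python
# Hypothetical gate: only queries whose sample provenance is a crime scene
# are forwarded to the DNA database; everything else is rejected but audited.
ALLOWED_SOURCES = {"crime_scene_blood", "crime_scene_tissue"}

class ForensicDnaGate:
    def __init__(self, database):
        self.database = database          # maps DNA profile -> identity
        self.audit_log = []

    def query(self, profile: str, source: str, case_id: str):
        self.audit_log.append((case_id, source, profile))
        if source not in ALLOWED_SOURCES:
            raise PermissionError(f"{source!r} is not a legitimate forensic query")
        return self.database.get(profile)

gate = ForensicDnaGate({"AGCT-42": "suspect #7"})
assert gate.query("AGCT-42", "crime_scene_blood", "case-001") == "suspect #7"
try:
    gate.query("AGCT-42", "insurance_screening", "case-002")
except PermissionError:
    pass  # non-investigative query refused, but still recorded for audit
assert len(gate.audit_log) == 2
```

Keeping the audit log for rejected queries matters as much as the rejection itself: it makes attempted non-investigative access visible after the fact.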

In [13], the researcher outlined his research on privacy issues raised by sensor webs and distributed information systems, an active field since the 9/11 attacks. Distributed information systems are information-collecting systems with huge data repositories, including private information such as financial and communications records. Sensor webs use small, independent sensors to collect and share information about their environment wirelessly. The author proposed several policies to maintain privacy in distributed information systems and sensor webs,

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 311-323, The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

including fundamental security primitives such as low-level encryption and authentication, human interfaces to limit queries, selective revelation of data, strong audits and better querying technologies, together with policy experimentation, security and legal analysis, and masking strategies to obtain results.

Another networking issue arises in shared and remote servers, servers that store data for users as a form of third-party data storage. Essentially there are two problems here. Firstly, these servers are owned by third-party service providers, so gaining access without revealing to them what the investigators are looking for is difficult because of permission grants (privacy preservation). Secondly, the remote nature of the servers also makes it difficult to trace evidence across a large number of shared and distributed storage systems using the traditional forensic method of imaging (cloning) the storage devices. The usual privacy issue of tampering with irrelevant data also exists. To solve these problems, [14] proposed two schemes, homomorphic and commutative encryption. Homomorphic encryption is a scheme in which both the administrator of the remote servers and the investigators encrypt their data and queries. The administrator then uses the encrypted queries with the investigator's key to search the server for relevant data, and the investigator then decrypts the data with the administrator's key. Commutative encryption introduces a Trusted Third Party (TTP) that supervises the administrator to prevent unfair play. The details are similar to homomorphic encryption, with another layer of commutative-law-based encryption applied by the TTP before the search of the data storage is conducted. Both schemes allow investigators to obtain the information they need without exposing it to the administrators of the remote servers.
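The commutative property such schemes rely on can be sketched with a toy SRA-style cipher, where encryption is modular exponentiation, so two parties' layers can be applied and removed in either order. The prime, exponents and message below are illustrative assumptions, not the concrete scheme from [14].

```python
# Toy commutative cipher: E_k(m) = m^k mod p, keys chosen coprime to p - 1.
p = 2**127 - 1  # a Mersenne prime; a real scheme uses a vetted large prime

def make_key(e):
    """Return (encrypt, decrypt) exponents; e must be coprime to p - 1."""
    return e, pow(e, -1, p - 1)

def crypt(m, k):
    return pow(m, k, p)

inv_e, inv_d = make_key(65537)  # investigator's key pair
adm_e, adm_d = make_key(5)      # administrator's key pair

m = int.from_bytes(b"evidence-4711", "big")  # must be smaller than p

# Commutativity: applying the two keys in either order gives the same result.
assert crypt(crypt(m, inv_e), adm_e) == crypt(crypt(m, adm_e), inv_e)

# Either party can peel off its own layer first; the other layer still decrypts.
double = crypt(crypt(m, inv_e), adm_e)
assert crypt(crypt(double, inv_d), adm_d) == m
```

The double-encrypted value is what gets matched against the database, so neither side sees the other's plaintext until both layers are removed.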

In [15], the researchers presented an approach to detect, through queries, the parties who accessed leaked information from a relational database. In this approach, the authors argued that a query can be determined to be suspicious if and only if the disclosed secret information could be inferred from its answers. To do this, a series of optimization steps involving the concepts of replaceable tuples, certificates and database instances is formalized in relational mathematics. An algorithm is then constructed from these optimization steps to determine whether a query is suspicious with respect to a secret and a database instance.

In [16], a framework proposed in 2011 preserves privacy while handling network flow records. Network flow recording collects information about network traffic sessions. This information can contain very private data, including network user information, their activities on the network, the amount of data transferred and the services used. The authors proposed a framework of integrated tools and concepts to prevent such data from falling into the wrong hands. The framework is divided into three sections: data collection and traffic flow recording; combined encryption with Identity-Based Encryption and the Advanced Encryption Standard; and statistical database modelling and inference controls. The framework preserves privacy in two phases, covering the encryption and decryption of the collected data and the construction of statistical reports such that inference controls are applied to prevent responses to suspicious queries.
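One classic inference control of the kind mentioned above is refusing statistical queries whose result set is too small, since an aggregate over a handful of flow records can reveal a single user's activity. A minimal sketch follows; the record layout, field names and threshold are illustrative assumptions, not the framework from [16].

```python
MIN_QUERY_SET_SIZE = 5  # smallest result set whose aggregate may be released

flows = [
    {"user": "alice", "dst_port": 443, "bytes": 120_000},
    {"user": "alice", "dst_port": 22,  "bytes": 4_000},
    {"user": "bob",   "dst_port": 443, "bytes": 80_000},
    {"user": "carol", "dst_port": 443, "bytes": 65_000},
    {"user": "dave",  "dst_port": 443, "bytes": 70_000},
    {"user": "erin",  "dst_port": 443, "bytes": 91_000},
]

def total_bytes(records, predicate):
    """Answer an aggregate query only if enough records match."""
    matched = [r for r in records if predicate(r)]
    if len(matched) < MIN_QUERY_SET_SIZE:
        raise PermissionError("query refused: result set too small to release")
    return sum(r["bytes"] for r in matched)

# A broad query over HTTPS traffic is answered...
print(total_bytes(flows, lambda r: r["dst_port"] == 443))

# ...but a query that would single out one user's SSH traffic is refused.
try:
    total_bytes(flows, lambda r: r["dst_port"] == 22)
except PermissionError as e:
    print(e)
```

Real statistical databases combine such thresholds with overlap and tracker-attack defenses, since repeated broad queries can still be differenced to isolate one record.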

To combat phishing, which often leads to identity theft, [17] proposed a framework in 2008. The framework counter-phishes phishers using a fake service (a phoneypot) with traceable credential data (phoneytokens). When a phisher is identified, he or she is directed to the phoneypot and transacts with it, transferring phoneytokens into the phisher's collection server. This allows investigators to trace and profile the identity of the phisher through these tokens. The authors argued that even if the counter-phishing attempt is discovered, it would have caused enough problems for the phisher to avoid the target in the future, protecting the user from further exploitation by phishing attacks.

In general, database systems are supposed to store and handle data in a proper manner. In [18], the researchers published findings in 2007 that proved this wrong. They concluded that database systems do not necessarily remove stored data securely after deletion, whereby remnant data and operations can be found in


allocated storage. Database systems also make redundant copies of data items that can be found in file systems. These data present a strong threat to privacy: not only may investigators find themselves dealing with unwarranted data, but criminals may also access the data for malicious purposes. To avoid this, the authors designed a set of transparency principles to ensure the secure deletion of data, and modified the internals of a database system (MySQL) to encrypt the expunction log with minimal performance impact, avoiding the overhead that usually accompanies overwriting and encryption.
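The secure-deletion idea can be sketched as overwriting a record's bytes in place before the slot is reused, rather than merely marking it free. The fixed-width record file below is an illustrative stand-in for real database storage, not the MySQL modification from [18].

```python
import os

RECORD_SIZE = 32  # fixed-width records for the sake of the sketch

def secure_delete(path: str, index: int) -> None:
    """Overwrite the record's bytes in place so no remnant data survives."""
    with open(path, "r+b") as f:
        f.seek(index * RECORD_SIZE)
        f.write(b"\x00" * RECORD_SIZE)
        f.flush()
        os.fsync(f.fileno())

# Two records; many databases would merely flip a "deleted" flag on the
# first, leaving "alice" recoverable from the raw file.
with open("table.dat", "wb") as f:
    f.write(b"alice | +1-555-0100".ljust(RECORD_SIZE))
    f.write(b"bob   | +1-555-0199".ljust(RECORD_SIZE))

secure_delete("table.dat", 0)

data = open("table.dat", "rb").read()
assert b"alice" not in data  # remnant of the deleted record is gone
assert b"bob" in data        # the other record is untouched
os.remove("table.dat")
```

Note that overwriting the primary file is not sufficient on its own: the redundant copies in logs and file-system caches mentioned above must be handled as well, which is why [18] targets the expunction log.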

In 2008, [19] published a paper explaining the importance of practicing computer forensics in today's networked organizations. It outlined the key questions, including the definition of computer forensics, its importance, the legal aspects involved and the online resources available for organizations to understand computer forensics in a nutshell.

In [20], the author addressed a rising problem of professionalism when digital forensics is applied in other fields. The author pointed out that in many scenarios where InfoSec professionals are deployed to work on digital crime investigations, their duties are limited by laws and legal systems and lack the intersection with business requirements from enterprises and government. He argued that coordination between different departments is essential to achieve investigation goals, and hence proposed a GRC-InfoSec compliance effort. Suggestions put forth include a legal research database creating a cross-referencing table of regulatory actions and legal case citations to IT-specific laws and guidelines, and the presentation of resulting costs and business disruption. (GRC stands for Governance, Risk management and Compliance.)

As for education, [21] published a system that produces file system images for forensic computing training courses. The system, known as forensig and developed with Python and Qemu, allows instructors to set constraints on certain user behavior, such as deleting and copying files, in a script which is then executed to produce an image that can be analyzed by the students. The results can then be matched against the input script. The system solves the problem of instructors using second-hand hard disks for analysis practice, which oftentimes contain private data.

Besides that, [22] tackled cybercrime-related issues; privacy as a fundamental right and a comparison of legal issues between countries were discussed in the workshop. In addition, there were a few works on privacy issues that may arise during malware analysis [23,24], in the analysis of cloud and virtualized environments [25-27], and in pervasive and ubiquitous systems [28-32]. With the growing usage of mobile devices and the Voice over IP (VoIP) protocol, several researchers tried to provide privacy-sound models for investigation in these environments [33-36]. Finally, there were models for forensic log protection that consider user privacy when logs are accessed [37,38].

3 DISCUSSION AND ANALYSIS OF RESULTS

We believe that the development of solutions and frameworks to contain privacy issues in various fields is not synchronized. Our analysis is done field by field, with comparisons to related fields and their effects as a whole on privacy preservation. We found that while research in one field contributed compelling solutions that might be a long-term answer to privacy preservation, this is not necessarily the case in another field. To analyze the development of each field, we split the stakeholders in each section between the users' and the forensic investigators' perspectives.

3.1 Privacy Preservation from User’s Perspective

We found that in the case of a user, the major problem in preserving privacy is the lack of knowledge and understanding. General users do not know the technicalities of how networks and data storage are managed, or their rights over the personal and private information being used by organizations. Hence, research and development of frameworks and systems that preserve the privacy of users' data are focused more towards passive preservation, without users knowing how the framework and system preserve their data.

We found this to be very effective, yet deceiving at the same time. In instances where frameworks are applied to networks and databases, for example the inference controls and encryption framework implemented for network flow recording and traffic analysis, onion routing, the cryptographic approach to DNA forensic databases, homomorphic and commutative encryption, and the sensor web protection framework, the solutions provided are usually effective in tackling situational crises in data privacy, and users usually do not know such solutions are implemented to protect their data from being exploited. However, the review of counter-forensic privacy tools and the analysis of how database systems delete data, plus the problems in the Portable Document Format when it "deletes" data, revealed a deceptive picture of whether these tools and systems can live up to expectations, or a false impression that they deliver on their tasks. Because users generally do not know whether these tools work exactly as they expect, and assume that they do work, private data is constantly under threat of being exploited by malicious parties, with no warning posed to users to make them aware of the situation of their private data.

We also found that privacy preservation can never be achieved in full. The proposed frameworks and models, with their encryption and technologies implemented, share a similar issue: it is particularly hard to design a fully protected system, so constraints and assumptions are added to the calculus primarily to prove that the frameworks and models can function under those constraints. Mentions of "future work" or manual audits appear in particularly general models, including sensor webs and distributed information systems, database systems, relational database query controls and counter-phishing. This presents another issue: not all users are aware of the type of scenario in which their data would most likely be exploited, or of the type of scenario their current data storage is in. This contributes to a more general problem; when user privacy is breached, bringing in the different kinds of professionalism needed to handle the investigation becomes difficult due to the lack of standardization and understanding of the scenarios and the status quo.

Throughout these flaws, we understand that while research and development to better preserve user privacy are making progress, the idea of a fully protected framework or model will not materialize in the near future. It is important for users to understand the need to secure their private information in their own best interest, particularly when cloud computing technology is on the rise and more remote and shared data storage is made available to users. Users must know their responsibility for their own personal information, and use combinations of several developed privacy-preserving solutions to protect their data well while networking. Picking the right browser for private browsing and using the services of trusted organizations with proven, functioning privacy-preservation policies and technologies are a few of the decisions and combinations of models and frameworks that secure private data better.

We also think that users must always have the awareness and understanding that their private data might be leaked. Such awareness is needed while the status quo shows that privacy preservation is still in its developmental stages, redefining its borders and the extent of the protection it should provide. Users must always be prepared to face such scenarios and seek solutions when leaks happen, and to know how forensic investigators perform investigations without further threatening their privacy in this regard.

To conclude this subsection, we believe that users need a general understanding of and knowledge about how technologies aid privacy preservation while they are storing data on networks and using tools and services, and whether these technologies deliver their functions. We also believe that users must understand that technologies can only help privacy preservation so much, and that it takes a collective effort combining technologies with the professionalism and expertise of other fields to better preserve privacy. It is also important that users are prepared to deal with situations in which their privacy has been breached and to seek the best solutions available, including forensic investigations. It is also evident to us that the development of privacy-preservation techniques and tools leans more towards technical solutions than a holistic approach, desynchronizing the focus of efforts to tackle the problem.

3.2 Privacy Preservation from Forensics Investigators’ Perspective

The job of forensic investigators is to collect, preserve and analyze information, then reconstruct the events of a crime. We found that privacy preservation from the forensic investigators' perspective is always a dilemma, strongly linked with user privacy and legal systems, as pointed out by many related works.

We concur that forensic investigators' procedural methodologies for collecting, preserving and analyzing information present potential avenues of user privacy infringement. Our agreement on this point is based on the general assumption that forensic investigators have a vested interest in this information; either it is important for proving a court case or a crime, or it is important for personal use, which oftentimes involves malicious purposes.

We found that the related research and proposed solutions have both positive and negative effects on forensic investigations. We argue that the limitations and constraints implemented in these systems and models do help protect forensic investigators from infringing privacy but, on the other hand, limit them from conducting forensic investigations in a more direct and effective manner.

We want to explain this at both levels. On the positive side, the constraints applied in various frameworks, such as homomorphic and commutative encryption, onion routing, inference controls, DNA blood and tissue samples from the crime scene as key queries, sequential data release based on relational levels and the network flow recording framework, all demonstrate a broad use of constraints to protect unrelated data from being exposed to forensic investigators during investigations. We believe that sequential data release based on relational levels is particularly critical in addressing privacy issues and balancing user privacy against the legal need to access such private data, as it provides a direct avenue to gain access to private information through a specific process, rather than something as general as organized queries and encryption. We believe that integrating these technologies can contribute more positively to aiding forensic investigators. Using sequential release of information based on relational levels as a framework to implement and shape organized queries is one example of integrating both techniques while conducting forensic investigations.

However, there are negative sides as well. The issues here concern the non-technical part of dealing with privacy. We found that the most obvious impact is that the proposed frameworks, such as cross-referencing encrypted queries with data, onion routing and strong audits, directly limit the avenues forensic investigators can take to approach their investigations. We need to consider that all crime investigations are time-sensitive, and the constraints placed by these frameworks may prolong the already time-consuming investigation process, as investigators now have to plan more technical and direct investigation methods in order to extract the right evidence. Besides that, the possibility of extracting wrong or irrelevant evidence still exists regardless of whether these frameworks are in place. Tracing private information based only on keywords, without really knowing the content, does not necessarily reflect the nature of the data collected; the data might not be useful to the investigation, while the possibility of exposing private information remains.

Finally, we found that ambiguity always exists in privacy issues where forensic investigators are concerned. We argue that a forensic investigator is an individual equipped with decent knowledge of computer security. We believe that if an individual's purpose in obtaining private information is malicious, the data will still be leaked into the wrong hands anyway. The idea is that regardless of how far technology has gone in preserving privacy, data still run the risk of being leaked and exposed, considering their possible use and management by a person other than the user him/herself. While such technologies deter forensic investigators and push them to use the extracted information properly, they still do not guarantee that the information will not be misused in the hands of forensic investigators, whether intentionally or unintentionally.

To conclude, we believe that the proposed frameworks, introduced technologies and implemented models and tools believed to keep forensic investigators from infringing user privacy while conducting investigations might not be as one-sided as they seem. We believe that the rationale and professionalism of forensic investigators are important when handling private data, as their expertise in computer security is at a level sufficient to understand how these technologies protect private data. We also believe that such technologies still need to remain in place to deter forensic investigators from drifting away from their professionalism, but essentially the negative impacts of such deterrence might jeopardize privacy even further, given the possibility of irrelevant information leaking out anyway and the prolonging of the forensic investigation process. We conclude that it is important that forensic investigators know the sensitivity of the data they are going to handle in each investigation and understand that their professionalism is important in preserving privacy.

3.3 Privacy Preservation from Technologies’ Perspective

We found that, from a technology perspective, the current development of cyber security and digital forensics in preserving privacy may have reached a bottleneck, with the latest developments too constrained to a very few general security measures. This in turn brings little positive improvement to the field and returns negative effects as well.

We analyzed some of the reviews and would like to highlight several examples to support our findings. The first problem with current technologies is the similarity of techniques. We found that in almost all the security measures taken in the various frameworks and models, be it database systems, remote servers, relational databases or network flow recording, the frameworks look similar in terms of their algorithms, which include encryption, data deletion and controls. We concur that some of the combinations, such as onion routing and sequential data release, are effective in keeping private data from being exposed to unrelated parties. However, in general scenarios, similarity in security frameworks often means faster workarounds being developed by malicious hackers, as these frameworks share a common structure and provide more examples for malicious parties to work their way around the security system. We also noticed that in some of the proposed frameworks, the authors stated assumptions that would otherwise jeopardize the system and offered contingency solutions. However, in one such scenario, onion routing, the author mentioned how the framework would also harm investigators should it be used against them. As onion routing renders traffic analysis by third parties impossible, it would be extremely difficult to trace or extract information for tracking and profiling purposes from such a routing method when it is used by malicious users. This is a typical example of how technologies, even in the cyber security field, can reverse wanted results and have an unexpected and undesired effect when used by the wrong party.

The same happens in the commutative encryption example. The framework can only work properly under the assumption that the administrator provides all database information in encrypted form. Should this not be the case, not only does the information extracted by the forensic investigators risk being irrelevant, it also jeopardizes the investigation process, as the forensic investigators would likely miss important evidence when reconstructing the sequence of events of the crime.


To conclude, the development of technologies in cyber security and digital forensics is very much predicated on technicalities alone, and does not necessarily improve privacy preservation as much as has been expected. The similarity of the proposed frameworks and models, plus the possibility of technologies ending up in the wrong hands, are all issues that have to be solved at the grassroots level to ensure that privacy preservation succeeds. We believe that, apart from technical development, technology will need to take into consideration the other aspects that influence digital forensics and cyber security, including education, business requirements and professionalism from other related fields, and work together with them so that a more holistic level of improvement in preserving privacy can be achieved. We also argue that technologies in digital forensics and security can backfire and become dangerous if used in reverse by malicious users with intent to harm and to infringe user privacy.

4 CRITICALLY OVERLOOKED ISSUES

As mentioned in the analysis section, we believe that privacy issues stem from intention and are made possible by the use of technology. However, technology has advanced to a level at which it is applicable to almost every industry; a good example is how database technology is used to store DNA samples of criminals, which can extend into medical forensics for a start. Research focus should now be placed more on solving the issue at its root rather than on introducing more technical countermeasures to the field, which many of the publications reviewed in this research have shown to be applicable to both privacy preservation and exploitation.

We also note that the focus on education and on awareness of the intention to protect and preserve privacy in the professional forensics field is not adequate to strike a balance between preserving privacy and completing investigations at a quality level. We find this particularly detrimental, as the technologies continuously rolled out to the commercial market cannot be utilized at a satisfactory level by professional forensic investigators without proper training and awareness. This opens up more possibilities of abuse without consent, or abuse without a motive, by investigators. Awareness is also not emphasized on the user's side, and this exposes users to a higher risk of being abused under the same paradigm. Simply put, even with the latest technologies and frameworks in place to preserve privacy, they would be rendered useless should the parties that use them be unaware of their potential, leaving those parties at risk of being abused through such technologies instead.

5 CONCLUSION

This paper has identified various privacy issues in cyber security and digital forensics, including the issues that arise in protecting the privacy of data in forensic investigations: how forensic investigators may infringe user privacy while conducting investigations, and how user privacy is always under threat without proper protection. It has also reviewed the current shift in development trends in this industry, why such a trend could have happened and what drives it.

The paper has reviewed various fields and their development of the technicalities and technologies to address this problem. It describes each field in a nutshell, explaining how these technologies work and their approaches to solving the problem of preserving privacy. The reviews are split into three sections, each with its corresponding fields of review and explanation.

The paper then analyses these reviews from the user's and the forensic investigator's perspectives, asking whether such developments in cyber security and digital forensics actually improve efforts to preserve privacy. The paper concludes that while every development has its positive approach and solves what its authors set out to solve, the issue of privacy preservation still exists, with non-technical aspects of professionalism in practice and the ambiguity of scenarios causing some approaches to be counterproductive. The paper also analyses how, at a technical level, advanced technologies in digital forensics and security face a bottleneck in development and could bring as much harm as benefit to the current efforts to preserve privacy.

6 REFERENCES

[1] I-Long Lin, Yun-Sheng Yen, Annie Chang: “A

Study on Digital Forensics Standard Operation Procedure

for Wireless Cybercrime,” International Journal of Computer

Engineering Science (IJCES), Volume 2 Issue 3, 2012.

[2] C. W. Adams, “Legal issues pertaining to the

development of digital forensic tools,” Third International

Workshop on Systematic Approaches to Digital Forensic

Engineering, pp. 123-132, 2008.

[3] K. Reddy and H. Venter, “A Forensic Framework

for Handling Information Privacy Incidents,” Advances in

Digital Forensics, volume V, pp. 143-155, 2009.

[4] Frank Y.W. Law et al, “Protecting Digital Data

Privacy in Computer Forensic Examination,” Systematic

Approaches to Digital Forensic Engineering (SADFE), 2011.

[5] N. J. Croft, M.S. Olivier, “Sequenced release of

privacy-accurate information in a forensic investigation,”

Digital Investigation, volume 7, pp. 1-7, 2010.

[6] G. Antoniou, C. Wilson, and D. Geneiatakis, “PPINA – A Forensic Investigation Protocol for Privacy Enhancing Technologies,” Proceedings of the 10th IFIP on Communication and Multimedia Security, pp. 185-195, 2006.

[7] M. Geiger and L. F. Cranor, “Counter-forensic

privacy tools,” Privacy in the Electronic Society, 2005.

[8] M. Boldt and B. Carlsson, “Analysing

countermeasures against privacy-invasive software,” in

ICSEA, 2006.

[9] H. Said, N. Al Mutawa, I. Al Awadhi and M.

Guimaraes, “Forensic analysis of private browsing

artifacts,” in International Conference on Innovations in

Information Technology, 2011.

[10] A. Castiglione, A. De Santis and C. Soriente, “Security and privacy issues in the Portable Document Format,” The Journal of Systems and Software, volume 83, pp. 1813–1822, 2010.

[11] D. Forte, “Advances in Onion Routing: Description

and backtracing/investigation problems,” Digital

Investigation, volume 3, pp. 85-88, 2006.

[12] P. Bohannon, M. Jakobsson and S. Srikwan,

“Cryptographic Approaches to Privacy in Forensic DNA

Databases,” Lecture Notes in Computer Science Volume

1751, pp 373-390, 2000.

[13] J.D. Tygar, “Privacy in sensor webs and

distributed information systems,” Software Security, pp. 84-

95, 2003.

[14] Y. M. Lai, Xueling Zheng, K. P. Chow, Lucas Chi

Kwong Hui, Siu-Ming Yiu, “Privacy preserving confidential

forensic investigation for shared or remote servers,” in

International Conference on Intelligent Information Hiding

and Multimedia Signal Processing, pp.378-383, 2011.

[15] S. Böttcher, R. Hartel and M. Kirschner,

“Detecting suspicious relational database queries,” in The

Third International Conference on Availability, Reliability

and Security, 2008.

[16] B. Shebaro and J. R. Crandall, “Privacy-

preserving network flow recording,” Digital Investigation,

volume 8, pp. 90-100, 2011.

[17] S. Gajek, and A. Sadeghi, “A forensic framework

for tracing phishers,” volume 6102 of LNCS, pages 19-33.

Springer, 2008.

[18] P. Stahlberg, G. Miklau, and B. N. Levine,

“Threats to privacy in the forensic analysis of database

systems,” ACM Intl Conf. on Management of Data

(SIGMOD/PODS), 2007.

[19] US-CERT, Computer Forensics, 2008.

[20] S. M. Giordano, “Applying Information Security and Privacy Principles to Governance, Risk Management & Compliance,” 2010.

[21] C. Moch and F. C. Freiling, “The forensic image

generator generator,” in Fifth International Conference on

IT Security Incident Management and IT Forensics, 2009.

[22] J. R. Agustina and F. Insa, “Challenges before crime in a digital era: Outsmarting cybercrime offenders,” Workshop on Cybercrime, Computer Crime Prevention and the Surveillance Society, volume 27, pp. 211-212, 2011.

[23] F. Daryabar, A. Dehghantanha, HG. Broujerdi, “Investigation of Malware Defence and Detection Techniques,” International Journal of Digital Information and Wireless Communications (IJDIWC), volume 1, issue 3, pp. 645-650, 2012.

[24] F. Daryabar, A. Dehghantanha, NI. Udzir,

“Investigation of bypassing malware defences and malware

detections,” Conference on Information Assurance and

Security (IAS), pp. 173-178, 2011.

[25] M. Damshenas, A. Dehghantanha, R. Mahmoud, S.

Bin Shamsuddin, “Forensics investigation challenges in

cloud computing environments,” Cyber Warfare and Digital

Forensics (CyberSec), pp. 190-194, 2012.

[26] F. Daryabar, A. Dehghantanha, F. Norouzi, F

Mahmoodi, “Analysis of virtual honeynet and VLAN-based

virtual networks,” Science & Engineering Research

(SHUSER), pp.73-70, 2011.

[27] S. H. Mohtasebi, A. Dehghantanha, “Defusing the

Hazards of Social Network Services,” International Journal

of Digital Information, pp. 504-515, 2012.

[28] A. Dehghantanha, R. Mahmod, N. I Udzir, Z.A.

Zulkarnain, “User-centered Privacy and Trust Model in

Cloud Computing Systems,” Computer And Network

Technology, pp. 326-332, 2009.

[29] A. Dehghantanha, “Xml-Based Privacy Model in

Pervasive Computing,” Master thesis- University Putra

Malaysia 2008.

[30] C. Sagaran, A. Dehghantanha, R Ramli, “A User-

Centered Context-sensitive Privacy Model in Pervasive

322

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 311-323The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

Page 61: 2012 (Vol. 1, No. 4)

Systems,” Communication Software and Networks, pp. 78-

82, 2010.

[31] A. Dehghantanha, N. Udzir, R. Mahmod, “Evaluating user-centered privacy model (UPM) in pervasive computing systems,” Computational Intelligence in Security for Information Systems, pp. 272-284, 2011.
[32] A. Dehghantanha, R. Mahmod, “UPM: User-Centered Privacy Model in Pervasive Computing Systems,” Future Computer and Communication, pp. 65-70, 2009.
[33] S. Parvez, A. Dehghantanha, H. G. Broujerdi, “Framework of digital forensics for the Samsung Star Series phone,” Electronics Computer Technology (ICECT), volume 2, pp. 264-267, 2011.
[34] S. H. Mohtasebi, A. Dehghantanha, H. G. Broujerdi, “Smartphone Forensics: A Case Study with Nokia E5-00 Mobile Phone,” International Journal of Digital Information and Wireless Communications (IJDIWC), volume 1, issue 3, pp. 651-655, 2012.
[35] F. N. Dezfouli, A. Dehghantanha, R. Mahmoud, “Volatile memory acquisition using backup for forensic investigation,” Cyber Warfare and Digital Forensic, pp. 186-189, 2012.
[36] M. Ibrahim, M. T. Abdullah, A. Dehghantanha, “VoIP evidence model: A new forensic method for investigating VoIP malicious attacks,” Cyber Security, Cyber Warfare and Digital Forensic, pp. 201-206, 2012.
[37] Y. TzeTzuen, A. Dehghantanha, A. Seddon, “Greening Digital Forensics: Opportunities and Challenges,” Signal Processing and Information Technology, pp. 114-119, 2012.
[38] N. Borhan, R. Mahmod, A. Dehghantanha, “A Framework of TPM, SVM and Boot Control for Securing Forensic Logs,” International Journal of Computer Application, volume 50, Issue 13, pp. 65-70, 2009.


Modelling Based Approach for Reconstructing Evidence of VoIP

Malicious Attacks

Mohammed Ibrahim, Mohd Taufik Abdullah and Ali Dehghantanha

Faculty of Computer Science and Information Technology

Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia

[email protected], {mtaufik, alid}@fsktm.upm.edu.my

ABSTRACT

Voice over Internet Protocol (VoIP) is a

new communication technology that uses

internet protocol in providing phone

services. VoIP provides various forms of

benefits such as low monthly fees and cheaper rates for long-distance and

international calls. However, VoIP is

accompanied by novel security threats.

Criminals often take advantages of such

security threats and commit illicit activities.

These activities require digital forensic

experts to acquire, analyse, reconstruct and

provide digital evidence. Meanwhile, there

are various methodologies and models

proposed for detecting, analysing and providing digital evidence in VoIP forensics.

However, at the time of writing this paper,

there is no model formalized for the

reconstruction of VoIP malicious attacks.

Reconstruction of the attack scenario is an important technique for exposing unknown criminal acts. Hence, this paper

will strive in addressing that gap. We

propose a model for reconstructing VoIP

malicious attacks. To achieve that, a formal

logic approach called Secure Temporal

Logic of Action (S-TLA+) was adopted in

rebuilding the attack scenario. The expected

result of this model is to generate additional related evidence whose consistency with the existing evidence can be determined by means of the S-TLA+ model checker.

KEYWORDS

Voice over IP, S-TLA+, Reconstruction,

malicious attack, Investigation, SIP,

Evidence Generation, attack scenario

1 INTRODUCTION

Voice-over Internet Protocols (VoIP) phone

services are prevalent in modern

telecommunication settings and demonstrate

the potential to become the next-generation

telephone system. This novel

telecommunication system provides a set of platforms that differs from the restricted, closed environment offered by conventional public switched telephone network (PSTN)

service providers [1]. The exploitation of

VoIP applications has drastically changed

the universal communication patterns by

dynamically combining video and audio

(Voice) data to traverse together with the

usual data packets within a network system

[2]. The advantages of using VoIP services

include cheaper call costs for long-distance, local and international calls.

Users make telephone calls with soft phones

or IP phones (such as Skype) and send

instant messages to their friends or loved

ones via their computer systems [3].

The development of VoIP has brought a

significant amount of benefits and

satisfactory services to its subscribers [2].

However, VoIP services are exposed to

various security threats derived from the

Internet Protocol (IP) [4]. Threats related to

this new technology are denial of service,

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 324-340, The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

host and protocol vulnerability exploits,

surveillance of calls, hijacking of calls,

identity theft of users, eavesdropping and

the insertion, deletion and modification of

audio streams [5]. Criminals take advantage

of such security threats and commit illicit

activities such as VoIP malicious attacks.

These require the acquisition, analysis and reconstruction of digital evidence.

However, detecting and analysing evidence

of attacks related to converged network

applications is a highly complicated task.

Moreover, the complex settings of its

service infrastructure such as DHCP

servers, AAA server, routers, SIP registrar,

SIP proxies, DNS server, and wireless and

wired network devices also complicate the

process of analysing digital evidence. As a

result, reconstructing the root cause of the

incident or crime scenario would be

difficult without a specific model guiding

the process.

1.1 Related Work

In recent times, researchers have developed

new models to assist forensic analysis by

providing comprehensive methodologies

and sound proving techniques.

Palmer [6] first proposed a framework with

the following steps: identification,

preservation, collection, examination,

analysis, presentation as well as decision

steps. The framework was presented at the

proceeding of the first Digital Forensic

Workshop (DFRW) and served as the first

attempt to apply forensic science into

network system. The framework was later

consolidated and produced an abstract

digital forensic model with the addition of

preparation and approach strategy phases;

the decision phase was replaced by

returning evidence. However, the model

works independently of the system technology

or digital crime [7].

Similarly, the work of Mandia and Prosise developed a simple and accurate incident-response methodology: the initial response phase aims at determining the incident, after which a response strategy phase is formulated and added [8]. On the other hand, Casey and

Palmer [9] proposed an investigative

process model that ensures appropriate

handling of evidence and decreases the chances

of mistakes through a comprehensive

systematic investigation. In another paper, Carrier and Spafford [10] adopted the process of physical investigation and proposed an integrated digital forensic process. In

another approach, [11] combined existing digital forensic models and came up with an extended model for investigating cybercrime that represents the flow of information and executes a full investigation.

Baryamureeba and Tushabe reorganized

different phases of the work of Carrier and

Spafford and enhanced the digital investigation process by adding two new phases (i.e. traceback and dynamite) [12].

Other frameworks include the work of

Beebe and Clark, which is hierarchical and objectives-based for the digital investigation process [22]. However, all the

aforementioned models are applied to

digital investigation in a generalized form.

Meanwhile, Ren and Jin [14] were the first

to introduce a general model for network

forensics that involves the following steps:

capture, copy, transfer, analysis,

investigation and presentation. After surveying the existing models, the authors in [15] suggest a new generic network forensics model built from the aforementioned ones. This model consists of preparation,

detection, collection, preservation,

examination, analysis, investigation and

presentation.


Furthermore, many authors have proposed event reconstruction attack models. For instance, Stephenson [16] analysed the root cause of digital incidents and applied colored Petri Nets to model the occurred events.

Gladyshev and Patel [17] developed event

reconstruction in which potential attack

scenarios are constructed based on finite state machines (FSM), neglecting scenarios that deviate from the available evidence. The author in [18] uses a

computation model based on finite state

machine together with computer history and

came up with a model that supports the

existing investigation. Rekhis and Boudriga

proposed in [19], [20] and [21] a formal

logic entitled Investigation-based Temporal

Logic of Action (I-TLA) which can be used

to prove the existence or non-existence of

potential attack scenario for reconstruction

and investigation of network malicious

attacks. On the other hand, Pelaez and Fernandez [22], in an effort to analyse and reconstruct evidence of attacks in converged networks, proposed log correlation and normalization techniques. However, such

techniques are effective if the data in the

file or forensic logs are not altered.

The existing models stated above are generic rather than specific to a particular kind of attack. Reconstructing the evidence of malicious attacks against VoIP is therefore highly needed because it plays an important role in revealing the unknown attack scenario. As a

result, the reliability and integrity of

evidence analysis in VoIP digital forensics would be improved, enhancing its admissibility in a court of law. In view of

that, the work in this paper is focused on

reconstruction of Session Initiation Protocol

(SIP) server malicious attacks. Hence, the

VoIP evidence reconstruction model

(VoIPERM) is proposed, which categorizes the previous model in [23] into main components and subcomponents. The model describes the VoIP system as a state machine through which information can be aggregated from various components of the system and formulated into hypotheses that enable the investigator to model the attack scenario. Following the reconstruction of

attack scenario, actions that contradict the

desirable properties of the system state

machine are considered to be malicious

[23]. Consequently, the collection of both legitimate and malicious actions enables the reconstruction of an attack scenario that will uncover more new evidence. To determine the consistency of additional evidence with respect to the existing evidence, a state space representation was adopted that depicts the relationships between sets of evidence graphically. The graphical representation enables investigators to understand whether generated evidence can support the existing evidence.

Hence, it reduces the accumulation of

unnecessary data during the process of

investigation [23]. Additionally, the model

is capable of reconstructing actions

executed during the attack that moves the

system from the initial state to the unsafe

state. Thus, all activities of the attacker are

conceptualized to determine what, where

and how such an attack occurred for proper

analysis of evidence [23]. To handle

ambiguities in the reconstruction of attack

scenario, S-TLA+ is to be applied.

Essentially, the application of S-TLA+ into

computer security technology is efficient

and generic. On the other hand, S-TLA+ is

built on a logic formalism that accumulates forward hypotheses when there are insufficient details to comprehend the compromised system [19].

In addition there were several works on

malware investigation [24,25], analysis of

cloud and virtualized environments [26-28],

privacy issues that may arise during

forensics investigation [29-34], mobile


device investigation [35-37] and greening

digital forensics process [38].

The main contribution of this paper is to

propose a novel model in VoIP digital

forensic analysis that can integrate digital

evidences from various components of

VoIP system and reconstruct the attack

scenario. Our objective is to reconstruct

VoIP malicious attacks to generate more

additional evidences from the existing

evidence. The remainder of the paper is arranged as follows: Section 2 discusses VoIP malicious attacks; Section 3 discusses VoIP digital forensic investigation; Section 4 introduces the new model; Section 5 discusses the S-TLC model checker; Section 6 presents a case study; and Section 7 concludes.

2 VoIP MALICIOUS ATTACKS

In general, software built purposely to negatively affect a computer system without the consent of the user is called malware [39], and the increased number of malicious activities during the last decade has caused most of the failures in computer systems

[40]. Nevertheless, Voice over IP is prone

to those malware attacks by exploiting its

related vulnerabilities. Having access to

VoIP network devices, intruders can disrupt

media services by flooding traffic, and can capture and control confidential information by illicit interception of call content or call signalling.

Through impersonating servers, intruders

can hijack and make fake calls by spoofing

identities [3]. Consequently, the

confidentiality, integrity and availability of

the users are negatively affected. Also VoIP

services are utilized by spammers to deliver

instant messages, spam calls, or presence

information. However, these spam calls are

more problematic than the usual email spam

since they are hard to filter [3]. Similarly,

attacks can traverse gateways to an

integrated network system like traditional

telephony and mobile system. Meanwhile,

compromised VoIP applications constitute a link for breaking through security mechanisms and attacking internal networks [39]. Also,

attackers make use of malformed SIP

messages to attack embedded web servers

through database injection vectors or cross-site scripting attacks [39].

2.1 SIP Malicious Attack

As previously explained, this paper

considers SIP Server attacks. Several

attacks are related to the SIP server, but the threat of most concern within the research community is VoIP spam. Generally, spam

is an unwanted bulk email or call,

deliberately sent for advertising or social engineering.

The author in [3] discusses that “Spam

wastes network bandwidth and system

resources. It exists in the form of instant

message (IM), Voice and presence Spam

within a VoIP setting” [3]. It affects the

availability of network resources to

legitimate users, which can result in a denial

of service (DoS) attack. Spam originates

from the collection of session initiation in

an effort to set up a video or an audio

communications session. If the user accepts, the attacker proceeds to transmit

a message over the real-time media. This

kind of spam is referred to as classic telemarketer spam; it applies to the SIP protocol and is well known as Spam over IP Telephony (SPIT). However, spam is

categorized into instant Message (IM spam)

and presence Spam (SPPP). The former is

like email spam, but is a bulky and unwelcome set of instant messages

encapsulated with the message that the

attacker wishes to send. IM spam is

delivered using SIP message request with

bulky subject headers, or SIP message with

text or HTML bodies. The latter, is like the

former, but it is placed on presence request

(that is, SIP subscribes requests) in an effort


to obtain the "white list" of users to transmit

them an instant message or set off another

kind of communication [3].

3 VoIP DIGITAL FORENSIC

INVESTIGATION

Lin and Yen [41] define digital forensic science as “to preserve, identify, extract, record as well as interpret the computer and network system evidence and analyse through complete and perfect methods and procedures.” On the other hand, forensic

computing is a particularly important interdisciplinary research area founded in

computer science and drawing on

telecommunications and network

engineering, law, justice studies, and social

science [42]. However, to cope with the security challenges, various organizations developed numerous models and methodologies that satisfy their organizational security policies. Presently, hundreds of digital forensic procedures have been developed globally [43]. Also,

the increasing number of security challenges in VoIP has persuaded researchers to develop several models. On the other hand, in VoIP

digital forensic a standard operating

procedure called VoIP Digital Evidence

Forensic Standard Operating Procedure

(VoIP DEFSOP) is established [41].

Moreover, a previous study noted that there was no established research agenda in digital forensics; to resolve that, six additional research areas were proposed at the 42nd Hawaii International Conference, including Evidence Modelling, in which the investigation procedure is replicated for practitioners and cases are modelled for various categories of crimes [44]. However, the increasing number of crimes associated with computers over the last decade has pushed products and companies to support understanding of what, who, where and how such attacks happened [45]. To

fulfil this current development, the model proposed in this paper can support investigation and analysis of evidence by reconstructing attack scenarios related to VoIP malicious attacks. The reconstruction of a potential attack scenario will assist investigators to conceptualize what, where, and how the attack happened in the VoIP system.

4 VoIP EVIDENCE

RECONSTRUCTION MODEL

(VoIPERM)

The idea proposed in [43] is to assist

investigators in finding and tracing out the

origin of attacks through the formulation of

hypotheses. Our proposed model considers the VoIP system as a state machine (observing the system properties in a given state) and is built up from four main components, as shown below.

Figure 1. VoIP evidence reconstruction model

The explanation of each component is as

follows:

4.1 Terminal State/Available Evidence

This component observes the final state of

the system at the prevalence of the crime; it


is the primary source of evidence and is

characterized by the undesirable system

behavior. The terminal state provides

available evidence and gives an insight into the kind of action performed on the compromised system [23]. Other properties

of system compromise described by [21]

include any of the following:

- Undesirable safety property of some system components
- Unexpected temporal property

Let $S$ be the set of all reachable states in the VoIP system and $P$ the collection of all desirable properties in a given state. If the properties holding in a final state $s_f \in S$ do not belong to $P$, then the final state of the system is said to be unsafe, represented as $s_f$. For an action $a \in A$, where $A$ is the sequence of actions associated with each reachable state, an action that drives the system into $s_f$ is said to be a malicious action; $s_f$ thus signifies one of the pieces of available evidence [23].
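The terminal-state check can be sketched in a few lines of Python. All names here are illustrative (the paper's symbols were lost in extraction): a state is modelled as the set of properties that currently hold, and a final state is unsafe when it violates any desirable property.

```python
# Sketch of the terminal-state check in Section 4.1. All names are
# illustrative: a state is the set of properties currently holding
# in the VoIP system.
DESIRABLE = {"registrar_reachable", "auth_enforced", "media_encrypted"}

def is_unsafe(state_properties):
    """A final state is unsafe if any desirable property is violated."""
    return not DESIRABLE.issubset(state_properties)

# The terminal state of the compromised system is the primary evidence source.
terminal_state = {"registrar_reachable"}        # auth and encryption lost
assert is_unsafe(terminal_state)                # constitutes available evidence
assert not is_unsafe(DESIRABLE)                 # all desirable properties hold
```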

4.2 Information Gathering

This component aims to collect and

gather information that gives details about

VoIP system state. It requires the following

subcomponents.

VoIP components: these components provide services such as voice mail

access, user interaction media control,

protocol conversion, and call set up,

and so on. The components can be

proxy servers, call processing servers,

media gateways and so on, depending

on the type of protocol in use [23].

Moreover, software and hardware

behaviours are observed to assist the

investigator with some clue about

VoIP system state. VoIP system states

are defined as the valuation of

component variables that change as a

result of actions acted upon them.

Let $v_1, \dots, v_n$ be component variables that change when an action executes in a given state; these are referred to as flexible variables. For any action that transforms a state $s$ into a state $t$, the unprimed variable $v$ and the primed variable $v'$ denote its values in the old and new state respectively. The properties of $s$ and $t$ are then observed to decide whether they belong to the system's desirable properties [23].
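As a rough illustration of flexible variables (a hypothetical encoding, not the paper's S-TLA+ notation), a component state can be modelled as a mapping of variables that an action transforms from an old state into a new state:

```python
# Hypothetical encoding of flexible variables: a VoIP component state is a
# mapping of variables, and an action transforms an old state s into a new
# state t by updating some of them.
def apply_action(state, updates):
    """Return the new state t produced by executing an action on state s."""
    t = dict(state)           # the old state s is left unchanged
    t.update(updates)
    return t

s = {"registered_users": 10, "active_calls": 2, "auth_enabled": True}
t = apply_action(s, {"auth_enabled": False})    # e.g. attacker disables auth

# Observe the old and new valuations to decide whether the desirable
# properties still hold after the transition.
def desirable(state):
    return state["auth_enabled"] is True

assert desirable(s) and not desirable(t)
```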

VoIP vulnerabilities: These refer to any

faults an adversary can abuse to commit a crime. Vulnerabilities make a system more prone to being attacked by a threat, or permit some degree of chance for an attack to be

successful [46]. In VoIP systems,

vulnerabilities include weaknesses of the

operating systems and network

infrastructures. Some weaknesses stem from poorly designed and implemented security mechanisms and mis-configured settings of network devices. The VoIP protocol stack is also associated with weaknesses that an attacker exploits to access text-based credentials and other private information.

4.3 Evidence Generation

In this component, hypotheses are

formulated based on information gathered

in the previous stage. The formulated

hypotheses are used in the process of

finding and generation of additional

evidence. The formal logic of digital

investigation is applied to consider available

evidence collected from different sources

and handle incompleteness in them by

generating a series of crime scenario

according to the formulated hypotheses.

This stage involves the following

subcomponents:

Hypothesis formulation: To overcome the lack of system details encountered

during the investigation, hypotheses

are formulated based on intruder’s

anticipated knowledge about the


system and the details of information

captured from VoIP components. The

basis of hypothesis formulation is to

predict the unknown VoIP malicious

attack. In this case, there is a need to

have specific variables attached to

hypotheses and VoIP components

respectively and make an assumption

to establish a relationship between the

variables. This determines what the effect of such a hypothesis would be if applied to the VoIP components. To achieve this,

three main requirements are set out:

Hypotheses should establish a

relationship between system

states (that is, VoIP component

states in this regard), to avoid

violating the original properties

(Type Invariant) of the system

under investigation.

All hypotheses found to be

contradictory are eliminated to

avoid adding deceptive

hypotheses within a generated

attack scenario.

To efficiently select and

minimize the number of

hypotheses through which a

node is reached, the relationship

among the hypotheses should be

clearly expressed [19].
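The second requirement, eliminating contradictory hypotheses before they are aggregated into a scenario, can be sketched as follows (the hypothesis names are invented for illustration):

```python
# Sketch of the contradiction check used when aggregating hypotheses
# (hypothesis names are invented): a hypothesis assigns values to
# constrained variables.
hypotheses = [
    {"attacker_knows_password": True},
    {"attacker_knows_password": False, "used_dictionary_attack": True},
    {"spoofed_sip_identity": True},
]

def contradicts(h1, h2):
    """Two hypotheses contradict if they assign different values to a shared variable."""
    return any(h1[v] != h2[v] for v in set(h1) & set(h2))

# Contradictory hypotheses are eliminated rather than aggregated into a scenario.
assert contradicts(hypotheses[0], hypotheses[1])
assert not contradicts(hypotheses[0], hypotheses[2])
```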

Moreover, the process of investigation

relies on the formulation of hypotheses to

describe the occurrence of the crime. At the

lowest levels of investigation, hypotheses

are used to reconstruct events and to

abstract data into files and complex storage

types. While at higher levels of

investigation, hypotheses are used to

explain user actions and sequences of

events [45]. An investigation is a process

that applies scientific techniques to

formulate and test hypotheses. At this point, VoIP variables are designated as indigenous variables, while variables formed by hypotheses are denoted as exogenous variables. Consequently, this describes how

VoIP components are expected to behave if

formulated hypotheses are executed.

Assumptions are made based on the expected knowledge of the attacker about the system. The sets of hypotheses are variables signifying the attacker's expected knowledge about the system, which differ from the flexible variables mentioned above; all the variables derived from hypothesis formulation are referred to as constrained variables, denoted $h_1, \dots, h_n$. While hypotheses are aggregated, care should be taken to avoid adding an ambiguous hypothesis that can prevent the system from moving to the next state; in S-TLA+ this signifies inconsistency and is denoted by a dedicated symbol [19].

Modelling of attack scenario: Digital forensic practice demands the generation of temporal analyses that logically reconstruct the crime [26].

Also according to [47], in crime

investigation it is possible to reason about crime scenarios: explanations of states, and of the events that change those states, that may have occurred in the

real world. However, given the complexity of understanding attack scenarios, it is vital to develop a model that simplifies their description and representation within a collection of information and allows new attacks to be regenerated from the existing ones [19]. For this

reason, it is essential to model VoIP

malicious attacks to enable

investigators to understand the attack scenario and describe how and where

to acquire digital evidence. In this

regard, instead of modelling both the

system and witness statement as a

finite automata as in [40], S-TLA+ is used to model the attack scenario since it supports logic formulation under uncertainty. In addition, evidence can easily be identified with S-TLA+ using a state predicate that evaluates relevant system variables [19].

Moreover, S-TLA+ is an advancement over the Temporal Logic of Actions (TLA). A system is specified in TLA by a formula of the form $\exists x : Init \wedge \Box[Next]_v \wedge L$, relating the set of all its authorised behaviours: the initial state satisfies $Init$, every step satisfies the next-state relation $Next$ or leaves the tuple $v$ of specification variables unchanged, and the infinite behaviour of the system is constrained by the liveness property $L$ (written as a conjunction of weak and strong fairness conditions on actions). In this regard, TLA can be used in S-TLA+ to illustrate a system's progress from one state to another upon the execution of an action under a given hypothesis [11]. Meanwhile, in S-TLA+ a constrained variable whose hypothesis is not yet expressed assumes a fictive value [19].
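A finite behaviour can be checked against a TLA-style formula Init ∧ □[Next]_v in a few lines. This is a simplified sketch of standard TLA semantics (stuttering steps leave the variables in v unchanged), not the full S-TLA+ machinery:

```python
# Simplified sketch of TLA semantics: a finite behaviour satisfies
# Init /\ [][Next]_v when its first state satisfies Init and every step
# either satisfies Next or is a stuttering step leaving v unchanged.
def satisfies(behaviour, init, next_rel, v):
    """behaviour: list of states (dicts); v: tuple of specification variables."""
    if not init(behaviour[0]):
        return False
    for s, t in zip(behaviour, behaviour[1:]):
        stutter = all(s[x] == t[x] for x in v)
        if not (next_rel(s, t) or stutter):
            return False
    return True

init = lambda s: s["n"] == 0                 # a toy counter specification
next_rel = lambda s, t: t["n"] == s["n"] + 1

assert satisfies([{"n": 0}, {"n": 1}, {"n": 1}], init, next_rel, ("n",))
assert not satisfies([{"n": 0}, {"n": 2}], init, next_rel, ("n",))
```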

An action $A$ is a Boolean-valued function on a pair of states: $A(s, t)$ is true if the formula obtained by replacing each unprimed variable with its value in state $s$, and each primed variable $v'$ with its value in state $t$, evaluates to true [19]. Likewise, each non-assumed constrained variable in state $s$ is replaced with the assumed constrained variable in state $t$. If executing an action preserves the desirable properties of the reached state, the set of actions is said to be legitimate; if it violates them, the set of actions is said to be malicious [23]. Attack scenario fragments are the collections of both legitimate and malicious actions that move the system to an unsafe state, and the attack scenario is defined over these fragments [23].
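Modelling a state as the set of properties that hold (a hypothetical encoding, with invented action names), classifying an action by the state it produces might look like:

```python
# Hypothetical classification of actions: an action is malicious if
# executing it drives the system into a state that violates the
# desirable properties.
DESIRABLE = {"auth_enforced", "registrar_reachable"}

def classify(action, state):
    new_state = action(set(state))
    return "legitimate" if DESIRABLE <= new_state else "malicious"

register = lambda st: st | {"user_registered"}     # benign SIP registration
strip_auth = lambda st: st - {"auth_enforced"}     # disables authentication

initial = {"auth_enforced", "registrar_reachable"}
assert classify(register, initial) == "legitimate"
assert classify(strip_auth, initial) == "malicious"
```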

Testing attack scenario: the purpose of testing a generated attack scenario is to ascertain its reliability with respect to the system behaviours. The properties of the system at a given state are examined: the investigator should compare the properties of the generated attack scenario with the system's final state. If any of the scenarios satisfies the properties of the final state, the investigator should generate and print digital evidence; otherwise the hypotheses should be reformulated [23]. Let $\Sigma$ be the set of generated attack scenarios and $S$ the set of VoIP system states. If a scenario $\sigma \in \Sigma$ reaches a state whose properties match those of the system's final state, then $\sigma$ satisfies the properties of the final state [23].
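The scenario test can be sketched as replaying the generated actions from the initial state and comparing the resulting properties with the observed terminal state (names invented for illustration):

```python
# Sketch of the scenario test: replay the generated actions from the
# initial state and compare the result with the observed terminal state.
def run_scenario(initial_state, actions):
    state = set(initial_state)
    for act in actions:
        state = act(state)
    return state

terminal_properties = {"registrar_reachable"}      # observed compromised state

scenario = [lambda st: st - {"auth_enforced"},
            lambda st: st - {"media_encrypted"}]
final = run_scenario({"registrar_reachable", "auth_enforced", "media_encrypted"},
                     scenario)

# The scenario reproduces the terminal state, so digital evidence is printed;
# otherwise the hypotheses would be reformulated.
assert final == terminal_properties
```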

4.4 Print Generated evidence

Evidence can be generated from the attack scenario using the forward and backward chaining phases adopted from inferring scenarios with S-TLC [19]. After being logically proved with S-TLA+, the proposed model is expected to reconstruct the malicious attack scenario in the form of specifications that can be verified using the S-TLA+ model checker called S-TLC. S-TLC builds a directed graph founded on a state space representation that verifies the logical flow of specifications written in the S-TLA+ formal language. Therefore, absolute

reconstructions of attack scenario fragments

are represented and the logical relationships

between them are illustrated on a directed

graph [23]. At this point, the investigator is

likely to realize what, how, where and why

such an incident was accomplished in the

VoIP system. Also the resulting outcome of

the graph is to generate new evidence that

matches the existing evidence. For all generated attack scenarios, the flexible variables and the constrained variables of each state are evaluated; the valuation of all non-constrained variables is called the node core, and the valuation of all constrained variables is called the node label. Each reachable state can then be represented on the directed graph $G$ by its node core and node label.

5 S-TLC MODEL CHECKER, STATE SPACE REPRESENTATION

A state can be represented on the generated

graph as a valuation of all its variables

including the constrained ones. It involves

two notions:

Node core: the valuation of all the non-constrained variables.

Node label: the valuation of all the constrained variables under a given hypothesis.

Given a state t, tn is used to denote its equivalent node core, tc to describe its resulting environment (a set of hypotheses), and Label(G, t) to refer to its label in graph G.
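As an illustration only, the node core / node label bookkeeping can be sketched in Python. The data layout and names below are our own assumptions, not part of S-TLC itself:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    core: tuple        # tn: valuation of the non-constrained variables
    env: frozenset     # tc: environment, i.e. the set of hypotheses

def label(G: dict, t: State) -> set:
    """Label(G, t): the set of environments stored at t's node core."""
    return G.get(t.core, set())

# A node core reached under the hypothesis that the default password
# was never changed (hypothetical example values).
G = {("privilege", "admin"): {frozenset({"default_password_unchanged"})}}
t = State(core=("privilege", "admin"),
          env=frozenset({"default_password_unchanged"}))
print(t.env in label(G, t))   # True
```

A state whose core is absent from G simply has an empty label, which is why `label` falls back to the empty set.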

The S-TLC algorithm is built on three data structures: G, UF and UB. G refers to the reachable directed graph under construction. UF and UB are FIFO (first-in, first-out) queues containing states whose successors are not yet computed, during the forward and backward chaining phases respectively. The S-TLC model checker works in three phases [19].

5.1 Initialization Phase

The initialization phase is the first stage of the S-TLC algorithm and involves the following steps:

1. G, UF and UB are created and initialized to the empty set and empty sequences respectively. At this step, each state satisfying the initial predicate is computed and then checked against the invariant predicate Invariant (a state predicate to be satisfied by each reachable state).

2. On satisfying the predicate Invariant, the state is appended to graph G with a pointer to the null state and a label equal to the set of hypotheses relative to the current state; otherwise, an error is generated. If the state does not satisfy the evidence predicate EvidenceState (a predicate characterizing the system terminal states that represent digital evidence), it is attached to UF; otherwise it is considered a terminal state and appended to UB, from which it can be retrieved in the backward chaining phase [19].
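The two steps above can be sketched as follows; this is a minimal illustration in which the predicates Init, Invariant and EvidenceState are passed in as stand-in functions, not the paper's actual specification:

```python
from collections import deque

def initialize(init_states, invariant, is_evidence):
    """Sketch of S-TLC initialization: build G and fill the UF/UB queues."""
    G, UF, UB = {}, deque(), deque()
    for s, hyps in init_states:            # states satisfying Init
        if not invariant(s):
            raise ValueError(f"initial state {s} violates Invariant")
        # label = hypotheses of the state; predecessor = the null state
        G[s] = {"label": {frozenset(hyps)}, "preds": {None}}
        # terminal (evidence) states go to UB, the rest to UF
        (UB if is_evidence(s) else UF).append(s)
    return G, UF, UB

G, UF, UB = initialize([("s0", []), ("s1", [])],
                       invariant=lambda s: True,
                       is_evidence=lambda s: s == "s1")
print(list(UF), list(UB))   # ['s0'] ['s1']
```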

5.2 Forward Chaining Phase

In this phase, all the scenarios that originate from the set of initial system states are inferred in forward chaining. This involves the generation of new sets of hypotheses and evidence that are consequent to these scenarios. During this phase, and until the queue becomes empty, a state s is retrieved from the tail of UF and its successor states are computed. For every successor state t satisfying the predicate Constraint (specified to assert a bound on the set of reachable


states), if the predicate Invariant is not satisfied, an error is generated and the algorithm terminates; otherwise state t is appended to G as follows:

1. If the node core tn does not exist in G, a new node (set to tn) is appended to the graph with a label equal to tc and a predecessor equal to sn. State t is appended to UB if it satisfies the predicate EvidenceState; otherwise it is attached to UF.

2. If there exists a node x in G that is

equal to tn and whose label includes tc,

then a conclusion could be made

stating that node t was added

previously to G. In that case, a pointer

is simply added from x to the

predecessor state sn.

3. If there exists a node x in G that is equal to tn, but whose label does not include tc, then the node label is updated as follows:

tc is added to Label(G, x).

Any environment in Label(G, x) which is a superset of some other element of the label is deleted, to ensure hypotheses minimality.

If tc is still in Label(G, x), then x is pointed to the predecessor state sn, and node t is appended to UB if it satisfies the predicate EvidenceState; otherwise it is attached to UF [19].

The resulting graph is a set of scenarios that end in any state satisfying the predicate EvidenceState and/or Constraint.
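The three node-update cases, including the superset pruning that keeps hypotheses minimal, can be sketched as below. This is our own illustrative reading of the text, not the S-TLC implementation; `append_forward` and its arguments are assumed names:

```python
def append_forward(G, sn, t_n, t_c):
    """Apply the three forward-chaining cases for successor (t_n, t_c) of sn."""
    t_c = frozenset(t_c)
    if t_n not in G:                         # case 1: unseen node core
        G[t_n] = {"label": {t_c}, "preds": {sn}}
    elif t_c in G[t_n]["label"]:             # case 2: node added previously
        G[t_n]["preds"].add(sn)
    else:                                    # case 3: update the node label
        lbl = G[t_n]["label"] | {t_c}
        # delete any environment that is a strict superset of another,
        # to ensure hypotheses minimality
        lbl = {e for e in lbl if not any(f < e for f in lbl)}
        G[t_n]["label"] = lbl
        if t_c in lbl:
            G[t_n]["preds"].add(sn)

G = {}
append_forward(G, "s0", "t", {"h1", "h2"})
append_forward(G, "s1", "t", {"h1"})     # subset: the superset {h1, h2} is pruned
print(sorted(map(sorted, G["t"]["label"])))   # [['h1']]
```

Reaching the same node core under a weaker set of hypotheses thus replaces the stronger (superset) environments, so the graph only retains the least-committal explanations.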

5.3 Backward Chaining Phase

All the scenarios that could produce states satisfying the predicate EvidenceState, generated in forward chaining, are constructed. During this phase, and until the queue becomes empty, the tail of UB, described by state t, is retrieved, and its predecessor states (i.e. the set of states si such that (si, t) satisfies the action Next) which are not terminal states and satisfy the predicates Invariant and Constraint are computed (states that do not satisfy the predicate Invariant are discarded, because this step aims simply to generate additional explanations). Each computed state s is appended to G as follows:

1. If sn is not in G, a new node (set to sn) is

appended to G with a label equal to the

environment sc. Then a pointer is added

from node tn to sn and state s is

appended to UB.

2. If there exists a node x in G that is equal to sn, and whose label includes sc, then node s had been added previously to G. In that case a pointer is simply added from tn to the predecessor state sn and s is appended to UB.

3. If there exists a node x in G that is equal to sn, but whose label does not include sc, then Label(G, x) is updated as follows:

sc is added to Label(G, x).

Any environment in Label(G, x) which is a superset of some other element of the label is deleted, to ensure hypotheses minimality.

If sc is still contained in the label of node x, then node t is pointed to the predecessor state x and the node is appended to UB.

The outcome of the three phases is a graph

G containing the set of possible causes

relative to the collected evidence. It

embodies different initial system states

apart from those described by the

specification [19].
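The backward-chaining loop can be sketched as follows; `predecessors`, `invariant`, `constraint` and `is_terminal` are our own stand-ins for the specification's Next action and the Invariant, Constraint and EvidenceState predicates:

```python
from collections import deque

def backward_chain(terminals, predecessors, invariant, constraint, is_terminal):
    """Explain each terminal state by computing its admissible predecessors."""
    G, UB = {}, deque(terminals)
    while UB:
        t = UB.popleft()
        for s in predecessors(t):
            # discard terminal states and states failing Invariant/Constraint:
            # this step only aims to generate additional explanations
            if is_terminal(s) or not invariant(s) or not constraint(s):
                continue
            if s not in G:
                G[s] = {"succs": {t}}
                UB.append(s)               # explore s's predecessors in turn
            else:
                G[s]["succs"].add(t)
    return G

# Toy predecessor relation: evidence <- attack <- init
preds = {"evidence": ["attack"], "attack": ["init"], "init": []}
G = backward_chain(["evidence"], lambda x: preds.get(x, []),
                   lambda s: True, lambda s: True,
                   lambda s: s == "evidence")
print(sorted(G))   # ['attack', 'init']
```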

6 CASE STUDY

To investigate a VoIP malicious attack using the proposed model, the following case study on the reconstruction of a Spam over Internet Telephony (SPIT) attack is proposed, to investigate the denial of service experienced by some VoIP users as a result of VoIP spam. A direct


investigation shows that the network bandwidth and other resources have been exhausted by the server, as it was busy receiving and sending audio message requests to SIP URIs (Uniform Resource Identifiers).

According to the VoIP evidence reconstruction model, the first stage emphasizes the identification of the terminal state and the available evidence of the attack.

6.1 Terminal State/Available Evidence

Exhaustion of bandwidth and other resources / sending audio message requests to SIP URIs.

6.2 Information Gathering

This includes the following:

VoIP components: these comprise both the signalling and the media infrastructure. The former is based on the Session Initiation Protocol (SIP) in particular, and includes the SIP stack (SS), which is responsible for sending, receiving, manufacturing and parsing SIP messages, and SIP addressing (SA), which is based on the URI. The latter considers the Real-time Transport Protocol (RTP) stacks, which code and decode, compress and expand, and encapsulate and demultiplex media flows.

VoIP vulnerabilities: these can arise from the following:

a. Unchanged default passwords leave deployed VoIP platforms strongly vulnerable to remote brute-force attacks.

b. Many of the services that expose data also interact as web services with the VoIP system, and these are open to common vulnerabilities such as cross-site request forgeries and cross-site scripting.

c. Many phones expose services that allow administrators to gather statistics, information and remote configuration settings. These ports open the door for information disclosure that attackers can use to gain more insight into a network and identify the VoIP phones.

d. Wrongly configured access devices that broadcast messages enable an attacker to sniff messages in the VoIP domain.

e. The initial version of SIP allows plain-text credentials to pass through access devices.

6.3 Evidence Generation

This stage involves the following:

Hypothesis formulation:

A hypothesis stating that a VoIP service running on a default password can grant access to an intruder after a remote brute-force attack.

A hypothesis stating that service ports on VoIP phones expose data and also interact as web services; an intruder that has access to the VoIP service can exploit such a vulnerability, in the form of cross-site scripting, to gain administrator access.

A hypothesis stating that some phones expose a service that allows administrators to gather statistics, information and remote configuration; such phones can grant an intruder direct access to administrative responsibility.

a. A hypothesis stating that there is a wrongly configured access device which broadcasts SIP messages. This enables the attacker to intercept SIP messages.


b. A hypothesis stating that the messages are running on the initial version of SIP, which has a vulnerability that sends SIP messages in plain text. An intruder that intercepts the messages can extract user information from them.

c. An intruder who is equipped with administrator functions can create, decode and send a request message.

d. An intruder can extract SIP extensions/URIs by sending an OPTION message request, after scanning all ports running on 5060 in the SIP domain, in order to send a SIP message.

e. A hypothesis stating that credentials encrypted as cipher text require an encryption engine to enable the intruder to digest the SIP message header and obtain other information.

Modelling of the attack scenario: in this case we use S-TLA+. The specification describes the available evidence with a predicate which uses the function request to state that the machine is busy sending invite audio messages. In this segment we represent the hacking scenario fragments in the form of hypothetical actions, as described below.

a. A hypothesis states that there is a vulnerability whereby VoIP runs a service on a default password; an intruder can easily brute-force it, gain access, and raise his privilege from no access to access level on the VoIP network by performing a brute-force attack on the VoIP default password.

b. Using the hypothesis stating that the service ports on VoIP have some vulnerabilities which, if exploited, can raise the accessibility level of an attacker from access to administrator access by exploring a service port vulnerability.

c. A hypothesis stating that some VoIP phones expose a service that allows administrators to gather information for remote configuration. Such a vulnerability, if exploited by exploring the phone vulnerability, can grant direct access from no access to administrator access.

d. A hypothesis stating that there are wrongly configured access devices which allow messages to be broadcast, and that SIP has vulnerabilities that send messages with plain-text credentials. If exploited, an intruder can intercept SIP messages and eavesdrop.

e. A user with administrative access can manufacture, decode and encapsulate SIP messages using the SIP stack (SS).

f. The user requires SIP extensions or URIs to send invite messages; being equipped with administrative access, the intruder sends an OPTION message request to extract SIP URIs, provided that the


service is running on port 5060.

g. The intruder takes advantage of the vulnerability that the device has an encryption engine; it enables him to digest the cipher text in the SIP message header field value to extract other information related to the SIP message credentials.

h. The intruder with administrative access and a manufactured SIP message then sends an invite audio message to the server as a message request.

i. The user then logs out from the VoIP domain.
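As a purely illustrative aid (ours, not the paper's S-TLA+ module), the scenario fragments above can be encoded as privilege-level transitions and replayed to check that each action's precondition is met by the level reached so far; all action names and levels below are hypothetical labels:

```python
# (action, required_level, resulting_level) for one path through fragments a-i
SCENARIO = [
    ("brute_force_default_password", "no_access", "access"),
    ("exploit_service_port",         "access",    "admin"),
    ("intercept_sip_messages",       "admin",     "admin"),
    ("manufacture_sip_message",      "admin",     "admin"),
    ("extract_sip_uris",             "admin",     "admin"),
    ("digest_sip_header",            "admin",     "admin"),
    ("send_invite_audio_message",    "admin",     "admin"),
    ("logout",                       "admin",     "no_access"),
]

def replay(scenario, level="no_access"):
    """Fail if any action is not enabled at the level reached so far."""
    for action, pre, post in scenario:
        if level != pre:
            raise RuntimeError(f"{action} is not enabled at level {level}")
        level = post
    return level

print(replay(SCENARIO))   # no_access
```

The alternative path through the phone vulnerability (fragment c) would replace the second transition while leaving the rest of the chain unchanged.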

The S-TLA+ attack scenario fragment module is depicted in the figure below.

Figure 2. Generated attack scenario fragment using

S-TLA+

Testing the generated scenario: given a set of generated attack scenarios, if any of the scenarios satisfies the terminal state of the system under investigation, then digital evidence is generated and printed; otherwise the hypothesis is reformulated. In the case study presented above, an action in the generated scenarios satisfied the available evidence of the terminal state of the system.

Printing the generated evidence: to generate evidence from the attack scenario fragment presented in Figure 2, we used the forward and backward chaining phases as explained above. This has been adopted from inferring scenarios with S-TLC [19].

Figure 3. Forward chaining phase VoIP attack

scenario

The graph of Figure 3 shows the main

possible attack scenario on VoIP. Initially,

there is no user accessing the VoIP system.

The default password was not changed

during implementation of the system. An


intruder exploits this vulnerability by performing an action and gains access to the VoIP service; the intruder further exploits a vulnerability in the service ports with an action and gains administrator access, or exploits a VoIP phone vulnerability with an action that grants access to administrative functions and obtains administrator access. The hacker can intercept all the incoming messages into the server by executing an action, as a result of exploiting a vulnerability in which messages are sent as plain text based on the initial version of SIP. With administrative power, the intruder accesses SIP URIs from the intercepted messages after executing an action, and sends audio invite messages to the collected URIs by performing an action, without any hypothesis having been established for the last two actions.

Therefore the node labels remain the same; the intruder then logs out and leaves evidence within the system. The underlined text in the generated graph is the available evidence, while the rest is new evidence generated during the investigation.

The generated attack scenario stopped inconsistency from occurring: the action is not part of the generated scenario because it contradicts another action.

The generated graph after execution of the forward and backward chaining phases is shown in Figure 4. It shows a newly generated scenario. It follows the same pattern as the forward chaining phase, but in this case the VoIP system is holding information on received messages that are not accessible to the intruder. The intruder performs the same actions as in the forward chaining phase and is granted administrator access. Thereafter, the intruder manufactures SIP invite messages by executing an action. The intruder accesses SIP URIs and sends a SIP invite audio message to the collected URIs by performing the corresponding actions. No hypotheses were established for these actions to be executed; the intruder then logs out from the system after executing an action, leaving digital evidence. The underlined text in the generated graph is the available evidence, while the other text is new evidence generated during reconstruction of the attack scenario.

Figure 4. Backward chaining phase, scenario attacks

on VoIP

7 CONCLUSIONS

In this paper, we proposed a model for reconstructing Voice over IP (VoIP) malicious attacks. This model generates more specific evidence that matches the existing evidence through the reconstruction of potential attack scenarios.


Consequently, it provides significant information on what, where, why and how a particular attack happens in a VoIP system. To complement our study, there is a need for the reconstruction of anonymous and peer-to-peer SIP malicious attacks.

REFERENCES

1. Yun-Sheng Yen, I-Long Lin, Bo-Lin Wu: A Study on the Mechanisms of VoIP Attacks: Analysis and Digital Evidence. Digital Investigation 8, 56-67, ScienceDirect (2011).

2. Juan C. Pelaez: Using Misuse Patterns for VoIP

Steganalysis. 20th International Workshop on

Database and Expert Systems

Application (2009).

3. Patrick Park: Voice over IP Security. Cisco Press, ISBN: 1587054698 (2009).

4. Hsien-Ming Hsu, Yeali S. Sun, Meng Chang

Chen. Collaborative Forensic Framework for

VoIP Services in Multi-network Environments.

In: Proc. 2008 IEEE International workshops on

intelligence and security informatics, pp. 260-

271 Springer-Verlag Berlin Heidelberg (2008)

5. Jill Slay and Mathew Simon: Voice over IP:

Privacy and Forensic Implication. International

Journal of Digital Crime and Forensics (IJDCF)

IGI Global (2009).

6. Palmer G. : A road map for digital forensic

research. In: First digital forensic research

workshop. DFRWS Technical Report New York

(2001).

7. Mark Reith, Clint Carr and Gregg Gunsch: An

Examination of Digital Forensic Models.

International Journal of Digital Evidence, Vol. 1, Issue 3, Fall (2002).

8. Mandia K, Procise C.: Incident Response and

Computer Forensics. In: Emmanuel S. Pilli, R.C.

Joshi, Rajdeep Niyogi: Network Forensic

Frameworks: Survey and Research Challenges.

Digital Investigation pp.1-14, Elsevier(2010).

9. Casey E, Palmer G.: The investigative process.

In: Emmanuel S. Pilli, R.C. Joshi, Rajdeep Niyogi:

Network Forensic Frameworks: Survey and Research Challenges. Digital Investigation pp.1-14,

Elsevier(2010).

10. Brian Carrier, Eugene Spafford: Getting

Physical with the Digital Investigation Process.

International Journal of Digital Evidence, Vol.2

Issue 2. Fall(2003).

11. Ó Ciardhuáin, S.: An Extended Model of

Cybercrime Investigation. International Journal

of Digital Evidence, Vol.3 Issue1.

Summer(2004).

12. Baryamureeba V. Tushabe F.: The Enhanced

Digital Investigation Process Model. In :

Proceedings of the fourth digital forensic

research workshop (DFRWS); (2004).

www.makerere.ac.ug/ics

13. Beebe NL, Clark JG: A Hierarchical,

Objectives-Based Framework For the Digital

Investigations Process. Digital Investigation

2(2) pp146-66. Elsevier(2005)

14. Ren W , Jin H. : Modeling the Network Forensic

Behavior. In: Security and Privacy for Emerging

Areas in Communication Networks, 2005.

Workshop of the 1st International Conference

pp 1-8 IEEE(2005)

15. Emmanuel S. Pilli, R.C. Joshi, Rajdeep Niyogi: Network Forensic Frameworks: Survey and Research

Challenges. Digital Investigation pp.1-14,

Elsevier(2010).

16. Peter Stephenson.: Modeling of Post-incident

Root Cause Analysis. International Journal

of Digital Evidence 2, pp. 1-16 (2003).

17. Pavel Gladyshev and Ahmed Patel: Finite State

Machine Approach to Digital Event

Reconstructions, International Journal of Digital

Forensic & Incident, ACM pages 130-

149,(2004)

18. Brian D. Carrier and Eugene H. Spafford: An

Event-Based Digital Forensic Investigation

Framework. In: Proc. 2004 DFRWS 2004, pp.

1-12 (2004).

19. Slim Rekhis: Theoretical Aspects of Digital

Investigation of Security Incidents. PhD thesis,

Communication Network and Security (CN&S)

research Laboratory (2008).

20. Slim Rekhis and Noureddine Boudriga: Logic

Based approach for digital forensic

investigation in communication Networks.

Computers & Security pp 1-21, Elsevier (2011).

21. Slim Rekhis and Noureddine Boudriga: A

Formal Logic- Based Language and an

Automated Verification Tool for Computer

Forensic Investigation in communication

Networks. 2005 ACM symposium on Applied

Computing pp. 287-289 (2005)

22. Juan C. Pelaez and Eduardo B. Fernandez:

Network Forensic Models for Converged

Architectures. International Journal on

Advances in security, Vol 3 no 1 & 2 (2010).

23. Mohammed Ibrahim, Mohd Taufik Abdullah,

Ali Dehghantanha: VoIP Evidence Model : A

New Forensic Method For Investigating VoIP

Malicious Attacks. Cyber Security, Cyber

Warfare and Digital Forensic (CyberSec), IEEE

International Conference, Malaysia (2012).


24. F. Daryabar, A. Dehghantanha, HG. Broujerdi, “Investigation of Malware Defence and Detection Techniques,” International Journal of

Digital Information and Wireless

Communications(IJDIWC), volume 1, issue 3,

pp. 645-650, 2012.

25. F. Daryabar, A. Dehghantanha, NI. Udzir,

“Investigation of bypassing malware defences

and malware detections,” Conference on

Information Assurance and Security (IAS), pp.

173-178, 2011.

26. M. Damshenas, A. Dehghantanha, R.

Mahmoud, S. Bin Shamsuddin, “Forensics

investigation challenges in cloud computing

environments,” Cyber Warfare and Digital

Forensics (CyberSec), pp. 190-194, 2012.

27. F. Daryabar, A. Dehghantanha, F. Norouzi, F

Mahmoodi, “Analysis of virtual honeynet and

VLAN-based virtual networks,” Science &

Engineering Research (SHUSER), pp.73-70,

2011.

28. S. H. Mohtasebi, A. Dehghantanha, “Defusing

the Hazards of Social Network Services,”

International Journal of Digital Information,

pp. 504-515, 2012.

29. A. Dehghantanha, R. Mahmod, N. I Udzir,

Z.A. Zulkarnain, “User-centered Privacy and

Trust Model in Cloud Computing Systems,”

Computer And Network Technology, pp. 326-

332, 2009.

30. A. Dehghantanha, “Xml-Based Privacy Model

in Pervasive Computing,” Master thesis-

University Putra Malaysia 2008.

31. C. Sagaran, A. Dehghantanha, R Ramli, “A

User-Centered Context-sensitive Privacy

Model in Pervasive Systems,” Communication

Software and Networks, pp. 78-82, 2010.

32. A. Dehghantanha, N. Udzir, R. Mahmod,

“Evaluating user-centered privacy model

(UPM) in pervasive computing systems,”

Computational Intelligence in Security for

Information Systems, pp. 272-284, 2011.

33. A. Dehghantanha, R. Mahmod, “UPM: User-

Centered Privacy Model in Pervasive

Computing Systems,” Future Computer and

Communication, pp. 65-70, 2009.

34. A. Aminnezhad, A. Dehghantanha, M. T. Abdullah, “A Survey on Privacy Issues in Digital

Forensics,” International Journal of Cyber-

Security and Digital Forensics (IJCSDF)- Vol

1, Issue 4, pp. 311-323, 2013.

35. S. Parvez, A. Dehghantanha, HG. Broujerdi,

“Framework of digital forensics for the

Samsung Star Series phone,” Electronics

Computer Technology (ICECT), Volume 2, pp.

264-267, 2011.

36. S. H. Mohtasebi, A. Dehghantanha, H. G.

Broujerdi, “Smartphone Forensics: A Case

Study with Nokia E5-00 Mobile Phone,”

International Journal of Digital Information

and Wireless Communications

(IJDIWC),volume 1, issue 3, pp. 651-655,

2012.

37. FN. Dezfouli, A. Dehghantanha, R. Mahmoud

,”Volatile memory acquisition using backup for

forensic investigation,” Cyber Warfare and

Digital Forensic, pp. 186-189, 2012.

38. Y. TzeTzuen, A. Dehghantanha, A. Seddon,

“Greening Digital Forensics: Opportunities and

Challenges,” Signal Processing and Information

Technology, pp. 114-119, 2012.

39. Mohammed Nassar, Radu State, Olivier

Festor: VoIP Malware: Attack Tool & Attack

Scenarios In: 2009 IEEE International

Conference on Communications (2009).

40. Mouna Jouini, Anis Ben Aissa, Latifa Ben Arfa Rabai, Ali Mili: Towards Quantitative Measures of Information Security: A Cloud Computing Case Study. International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(3): 248-262. The Society of Digital Information and Wireless Communications (ISSN: 2305-0012) (2012).

41. I-Long Lin, Yun-Sheng Yen: VoIP Digital

Evidence Standard Operating Procedure.

International Journal of Research and Reviews

in Computer Science 2, pp. 173 (2011).

42. Jill Slay and Mathew Simon: Voice over IP

forensics. In: e-Forensics 08 Proceedings of the

1st international conference on Forensic

applications and techniques in

telecommunications, information, and

multimedia workshop. Adelaide, Australia

(2008).

43. Siti Rahayu Selamat, Robiah Yusof, Shaharin

Sahib, Nor Hafeizah Hassan, Mohd Faizal

Abdollah, Zaheera Zainal Abidin. Traceability

in Digital Forensic Investigation Process. In:

2011 IEEE Conference on Open Systems, pp.

101-106 (2011).

44. Kara Nance, Brian Hay, Matt Bishop: Digital

Forensic: Defining a Research Agenda Incident

Response. In: Proc. 42nd

Hawaii International

Conference on system science (2009).

45. Karen Kent, Suzanne Chevalier, Tim Grance,

Hung Dang. Integrating Forensic Techniques

into Incident Response. A white paper submitted

by Guidance Software Inc. UK (2006).

46. Tamjidyamcholo A., Dawoud R. A.: Genetic Algorithm for Risk Reduction of Information Security. International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(1): 59-66, The Society of Digital Information and Wireless Communications (ISSN: 2305-0012) (2012).

47. Jeroen Keppens and John Zeleznikow: A

Model Based Approach for Generating Plausible

Crime Scenarios from Evidence. In: Proc. of the

9th International Conference on Artificial

intelligence and Law (2003).
