

International Journal of Computer Science Issues

Volume 7, Issue 4, No 7, July 2010 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814

© IJCSI PUBLICATION www.IJCSI.org



IJCSI Publicity Board 2010

Dr. Borislav D Dimitrov, Department of General Practice, Royal College of Surgeons in Ireland, Dublin, Ireland
Dr. Vishal Goyal, Department of Computer Science, Punjabi University, Patiala, India
Mr. Nehinbe Joshua, University of Essex, Colchester, Essex, UK
Mr. Vassilis Papataxiarhis, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece

EDITORIAL

In this fourth edition of 2010, we bring forward issues from various dynamic computer science areas ranging from system performance, computer vision, artificial intelligence, ontologies, software engineering, multimedia, pattern recognition, information retrieval, databases, security and networking among others.

Considering the growing interest of academics worldwide to publish in IJCSI, we invite universities and institutions to partner with us to further encourage open-access publications.

As always we thank all our reviewers for providing constructive comments on papers sent to them for review. This helps enormously in improving the quality of papers published in this issue.

Apart from availability of the full-texts from the journal website, all published papers are deposited in open-access repositories to make access easier and ensure continuous availability of its proceedings.

We are pleased to present IJCSI Volume 7, Issue 4, July 2010, split in nine numbers (IJCSI Vol. 7, Issue 4, No. 7). Out of the 179 paper submissions, 57 papers were retained for publication. The acceptance rate for this issue is 31.84%.

We wish you a happy reading!

IJCSI Editorial Board
July 2010 Issue
ISSN (Print): 1694-0814
ISSN (Online): 1694-0784
© IJCSI Publications
www.IJCSI.org

IJCSI Editorial Board 2010

Dr Tristan Vanrullen, Chief Editor, LPL, Laboratoire Parole et Langage - CNRS - Aix en Provence, France; LABRI, Laboratoire Bordelais de Recherche en Informatique - INRIA - Bordeaux, France; LEEE, Laboratoire d'Esthétique et Expérimentations de l'Espace - Université d'Auvergne, France
Dr Constantino Malagón, Associate Professor, Nebrija University, Spain
Dr Lamia Fourati Chaari, Associate Professor, Multimedia and Informatics Higher Institute in SFAX, Tunisia
Dr Mokhtar Beldjehem, Professor, Sainte-Anne University, Halifax, NS, Canada
Dr Pascal Chatonnay, Assistant Professor (Maître de Conférences), Laboratoire d'Informatique de l'Université de Franche-Comté, Université de Franche-Comté, France
Dr Karim Mohammed Rezaul, Centre for Applied Internet Research (CAIR), Glyndwr University, Wrexham, United Kingdom
Dr Yee-Ming Chen, Professor, Department of Industrial Engineering and Management, Yuan Ze University, Taiwan
Dr Vishal Goyal, Assistant Professor, Department of Computer Science, Punjabi University, Patiala, India
Dr Dalbir Singh, Faculty of Information Science And Technology, National University of Malaysia, Malaysia
Dr Natarajan Meghanathan, Assistant Professor, REU Program Director, Department of Computer Science, Jackson State University, Jackson, USA
Dr Deepak Laxmi Narasimha, Department of Software Engineering, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Dr Navneet Agrawal, Assistant Professor, Department of ECE, College of Technology & Engineering, MPUAT, Udaipur 313001, Rajasthan, India
Dr T. V. Prasad, Professor, Department of Computer Science and Engineering, Lingaya's University, Faridabad, Haryana, India
Prof N. Jaisankar, Assistant Professor, School of Computing Sciences, VIT University, Vellore, Tamilnadu, India

IJCSI Reviewers Committee 2010 Mr. Markus Schatten, University of Zagreb, Faculty of Organization and Informatics, Croatia Mr. Vassilis Papataxiarhis, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece Dr Modestos Stavrakis, University of the Aegean, Greece Dr Fadi KHALIL, LAAS -- CNRS Laboratory, France Dr Dimitar Trajanov, Faculty of Electrical Engineering and Information technologies, ss. Cyril and Methodius Univesity - Skopje, Macedonia Dr Jinping Yuan, College of Information System and Management,National Univ. of Defense Tech., China Dr Alexis Lazanas, Ministry of Education, Greece Dr Stavroula Mougiakakou, University of Bern, ARTORG Center for Biomedical Engineering Research, Switzerland Dr Cyril de Runz, CReSTIC-SIC, IUT de Reims, University of Reims, France Mr. Pramodkumar P. Gupta, Dept of Bioinformatics, Dr D Y Patil University, India Dr Alireza Fereidunian, School of ECE, University of Tehran, Iran Mr. Fred Viezens, Otto-Von-Guericke-University Magdeburg, Germany Dr. Richard G. Bush, Lawrence Technological University, United States Dr. Ola Osunkoya, Information Security Architect, USA Mr. Kotsokostas N.Antonios, TEI Piraeus, Hellas Prof Steven Totosy de Zepetnek, U of Halle-Wittenberg & Purdue U & National Sun Yat-sen U, Germany, USA, Taiwan Mr. M Arif Siddiqui, Najran University, Saudi Arabia Ms. Ilknur Icke, The Graduate Center, City University of New York, USA Prof Miroslav Baca, Faculty of Organization and Informatics, University of Zagreb, Croatia Dr. Elvia Ruiz Beltrán, Instituto Tecnológico de Aguascalientes, Mexico Mr. Moustafa Banbouk, Engineer du Telecom, UAE Mr. Kevin P. Monaghan, Wayne State University, Detroit, Michigan, USA Ms. Moira Stephens, University of Sydney, Australia Ms. Maryam Feily, National Advanced IPv6 Centre of Excellence (NAV6) , Universiti Sains Malaysia (USM), Malaysia Dr. 
Constantine YIALOURIS, Informatics Laboratory Agricultural University of Athens, Greece Mrs. Angeles Abella, U. de Montreal, Canada Dr. Patrizio Arrigo, CNR ISMAC, italy Mr. Anirban Mukhopadhyay, B.P.Poddar Institute of Management & Technology, India Mr. Dinesh Kumar, DAV Institute of Engineering & Technology, India Mr. Jorge L. Hernandez-Ardieta, INDRA SISTEMAS / University Carlos III of Madrid, Spain Mr. AliReza Shahrestani, University of Malaya (UM), National Advanced IPv6 Centre of Excellence (NAv6), Malaysia Mr. Blagoj Ristevski, Faculty of Administration and Information Systems Management - Bitola, Republic of Macedonia Mr. Mauricio Egidio Cantão, Department of Computer Science / University of São Paulo, Brazil Mr. Jules Ruis, Fractal Consultancy, The Netherlands

Mr. Mohammad Iftekhar Husain, University at Buffalo, USA Dr. Deepak Laxmi Narasimha, Department of Software Engineering, Faculty of Computer Science and Information Technology, University of Malaya, Malaysia Dr. Paola Di Maio, DMEM University of Strathclyde, UK Dr. Bhanu Pratap Singh, Institute of Instrumentation Engineering, Kurukshetra University Kurukshetra, India Mr. Sana Ullah, Inha University, South Korea Mr. Cornelis Pieter Pieters, Condast, The Netherlands Dr. Amogh Kavimandan, The MathWorks Inc., USA Dr. Zhinan Zhou, Samsung Telecommunications America, USA Mr. Alberto de Santos Sierra, Universidad Politécnica de Madrid, Spain Dr. Md. Atiqur Rahman Ahad, Department of Applied Physics, Electronics & Communication Engineering (APECE), University of Dhaka, Bangladesh Dr. Charalampos Bratsas, Lab of Medical Informatics, Medical Faculty, Aristotle University, Thessaloniki, Greece Ms. Alexia Dini Kounoudes, Cyprus University of Technology, Cyprus Mr. Anthony Gesase, University of Dar es salaam Computing Centre, Tanzania Dr. Jorge A. Ruiz-Vanoye, Universidad Juárez Autónoma de Tabasco, Mexico Dr. Alejandro Fuentes Penna, Universidad Popular Autónoma del Estado de Puebla, México Dr. Ocotlán Díaz-Parra, Universidad Juárez Autónoma de Tabasco, México Mrs. Nantia Iakovidou, Aristotle University of Thessaloniki, Greece Mr. Vinay Chopra, DAV Institute of Engineering & Technology, Jalandhar Ms. Carmen Lastres, Universidad Politécnica de Madrid - Centre for Smart Environments, Spain Dr. Sanja Lazarova-Molnar, United Arab Emirates University, UAE Mr. Srikrishna Nudurumati, Imaging & Printing Group R&D Hub, Hewlett-Packard, India Dr. Olivier Nocent, CReSTIC/SIC, University of Reims, France Mr. Burak Cizmeci, Isik University, Turkey Dr. Carlos Jaime Barrios Hernandez, LIG (Laboratory Of Informatics of Grenoble), France Mr. Md. Rabiul Islam, Rajshahi university of Engineering & Technology (RUET), Bangladesh Dr. 
LAKHOUA Mohamed Najeh, ISSAT - Laboratory of Analysis and Control of Systems, Tunisia Dr. Alessandro Lavacchi, Department of Chemistry - University of Firenze, Italy Mr. Mungwe, University of Oldenburg, Germany Mr. Somnath Tagore, Dr D Y Patil University, India Ms. Xueqin Wang, ATCS, USA Dr. Borislav D Dimitrov, Department of General Practice, Royal College of Surgeons in Ireland, Dublin, Ireland Dr. Fondjo Fotou Franklin, Langston University, USA Dr. Vishal Goyal, Department of Computer Science, Punjabi University, Patiala, India Mr. Thomas J. Clancy, ACM, United States Dr. Ahmed Nabih Zaki Rashed, Dr. in Electronic Engineering, Faculty of Electronic Engineering, menouf 32951, Electronics and Electrical Communication Engineering Department, Menoufia university, EGYPT, EGYPT Dr. Rushed Kanawati, LIPN, France Mr. Koteshwar Rao, K G Reddy College Of ENGG.&TECH,CHILKUR, RR DIST.,AP, India

Mr. M. Nagesh Kumar, Department of Electronics and Communication, J.S.S. research foundation, Mysore University, Mysore-6, India Dr. Ibrahim Noha, Grenoble Informatics Laboratory, France Mr. Muhammad Yasir Qadri, University of Essex, UK Mr. Annadurai .P, KMCPGS, Lawspet, Pondicherry, India, (Aff. Pondicherry Univeristy, India Mr. E Munivel , CEDTI (Govt. of India), India Dr. Chitra Ganesh Desai, University of Pune, India Mr. Syed, Analytical Services & Materials, Inc., USA Dr. Mashud Kabir, Department of Computer Science, University of Tuebingen, Germany Mrs. Payal N. Raj, Veer South Gujarat University, India Mrs. Priti Maheshwary, Maulana Azad National Institute of Technology, Bhopal, India Mr. Mahesh Goyani, S.P. University, India, India Mr. Vinay Verma, Defence Avionics Research Establishment, DRDO, India Dr. George A. Papakostas, Democritus University of Thrace, Greece Mr. Abhijit Sanjiv Kulkarni, DARE, DRDO, India Mr. Kavi Kumar Khedo, University of Mauritius, Mauritius Dr. B. Sivaselvan, Indian Institute of Information Technology, Design & Manufacturing, Kancheepuram, IIT Madras Campus, India Dr. Partha Pratim Bhattacharya, Greater Kolkata College of Engineering and Management, West Bengal University of Technology, India Mr. Manish Maheshwari, Makhanlal C University of Journalism & Communication, India Dr. Siddhartha Kumar Khaitan, Iowa State University, USA Dr. Mandhapati Raju, General Motors Inc, USA Dr. M.Iqbal Saripan, Universiti Putra Malaysia, Malaysia Mr. Ahmad Shukri Mohd Noor, University Malaysia Terengganu, Malaysia Mr. Selvakuberan K, TATA Consultancy Services, India Dr. Smita Rajpal, Institute of Technology and Management, Gurgaon, India Mr. Rakesh Kachroo, Tata Consultancy Services, India Mr. Raman Kumar, National Institute of Technology, Jalandhar, Punjab., India Mr. Nitesh Sureja, S.P.University, India Dr. M. Emre Celebi, Louisiana State University, Shreveport, USA Dr. Aung Kyaw Oo, Defence Services Academy, Myanmar Mr. Sanjay P. 
Patel, Sankalchand Patel College of Engineering, Visnagar, Gujarat, India Dr. Pascal Fallavollita, Queens University, Canada Mr. Jitendra Agrawal, Rajiv Gandhi Technological University, Bhopal, MP, India Mr. Ismael Rafael Ponce Medellín, Cenidet (Centro Nacional de Investigación y Desarrollo Tecnológico), Mexico Mr. Supheakmungkol SARIN, Waseda University, Japan Mr. Shoukat Ullah, Govt. Post Graduate College Bannu, Pakistan Dr. Vivian Augustine, Telecom Zimbabwe, Zimbabwe Mrs. Mutalli Vatila, Offshore Business Philipines, Philipines Dr. Emanuele Goldoni, University of Pavia, Dept. of Electronics, TLC & Networking Lab, Italy Mr. Pankaj Kumar, SAMA, India Dr. Himanshu Aggarwal, Punjabi University,Patiala, India Dr. Vauvert Guillaume, Europages, France

Prof Yee Ming Chen, Department of Industrial Engineering and Management, Yuan Ze University, Taiwan Dr. Constantino Malagón, Nebrija University, Spain Prof Kanwalvir Singh Dhindsa, B.B.S.B.Engg.College, Fatehgarh Sahib (Punjab), India Mr. Angkoon Phinyomark, Prince of Singkla University, Thailand Ms. Nital H. Mistry, Veer Narmad South Gujarat University, Surat, India Dr. M.R.Sumalatha, Anna University, India Mr. Somesh Kumar Dewangan, Disha Institute of Management and Technology, India Mr. Raman Maini, Punjabi University, Patiala(Punjab)-147002, India Dr. Abdelkader Outtagarts, Alcatel-Lucent Bell-Labs, France Prof Dr. Abdul Wahid, AKG Engg. College, Ghaziabad, India Mr. Prabu Mohandas, Anna University/Adhiyamaan College of Engineering, india Dr. Manish Kumar Jindal, Panjab University Regional Centre, Muktsar, India Prof Mydhili K Nair, M S Ramaiah Institute of Technnology, Bangalore, India Dr. C. Suresh Gnana Dhas, VelTech MultiTech Dr.Rangarajan Dr.Sagunthala Engineering College,Chennai,Tamilnadu, India Prof Akash Rajak, Krishna Institute of Engineering and Technology, Ghaziabad, India Mr. Ajay Kumar Shrivastava, Krishna Institute of Engineering & Technology, Ghaziabad, India Mr. Deo Prakash, SMVD University, Kakryal(J&K), India Dr. Vu Thanh Nguyen, University of Information Technology HoChiMinh City, VietNam Prof Deo Prakash, SMVD University (A Technical University open on I.I.T. Pattern) Kakryal (J&K), India Dr. Navneet Agrawal, Dept. of ECE, College of Technology & Engineering, MPUAT, Udaipur 313001 Rajasthan, India Mr. Sufal Das, Sikkim Manipal Institute of Technology, India Mr. Anil Kumar, Sikkim Manipal Institute of Technology, India Dr. B. Prasanalakshmi, King Saud University, Saudi Arabia. Dr. K D Verma, S.V. (P.G.) College, Aligarh, India Mr. Mohd Nazri Ismail, System and Networking Department, University of Kuala Lumpur (UniKL), Malaysia Dr. Nguyen Tuan Dang, University of Information Technology, Vietnam National University Ho Chi Minh city, Vietnam Dr. 
Abdul Aziz, University of Central Punjab, Pakistan Dr. P. Vasudeva Reddy, Andhra University, India Mrs. Savvas A. Chatzichristofis, Democritus University of Thrace, Greece Mr. Marcio Dorn, Federal University of Rio Grande do Sul - UFRGS Institute of Informatics, Brazil Mr. Luca Mazzola, University of Lugano, Switzerland Mr. Nadeem Mahmood, Department of Computer Science, University of Karachi, Pakistan Mr. Hafeez Ullah Amin, Kohat University of Science & Technology, Pakistan Dr. Professor Vikram Singh, Ch. Devi Lal University, Sirsa (Haryana), India Mr. M. Azath, Calicut/Mets School of Enginerring, India Dr. J. Hanumanthappa, DoS in CS, University of Mysore, India Dr. Shahanawaj Ahamad, Department of Computer Science, King Saud University, Saudi Arabia Dr. K. Duraiswamy, K. S. Rangasamy College of Technology, India Prof. Dr Mazlina Esa, Universiti Teknologi Malaysia, Malaysia

Dr. P. Vasant, Power Control Optimization (Global), Malaysia Dr. Taner Tuncer, Firat University, Turkey Dr. Norrozila Sulaiman, University Malaysia Pahang, Malaysia Prof. S K Gupta, BCET, Guradspur, India Dr. Latha Parameswaran, Amrita Vishwa Vidyapeetham, India Mr. M. Azath, Anna University, India Dr. P. Suresh Varma, Adikavi Nannaya University, India Prof. V. N. Kamalesh, JSS Academy of Technical Education, India Dr. D Gunaseelan, Ibri College of Technology, Oman Mr. Sanjay Kumar Anand, CDAC, India Mr. Akshat Verma, CDAC, India Mrs. Fazeela Tunnisa, Najran University, Kingdom of Saudi Arabia Mr. Hasan Asil, Islamic Azad University Tabriz Branch (Azarshahr), Iran Prof. Dr Sajal Kabiraj, Fr. C Rodrigues Institute of Management Studies (Affiliated to University of Mumbai, India), India Mr. Syed Fawad Mustafa, GAC Center, Shandong University, China Dr. Natarajan Meghanathan, Jackson State University, Jackson, MS, USA Prof. Selvakani Kandeeban, Francis Xavier Engineering College, India Mr. Tohid Sedghi, Urmia University, Iran Dr. S. Sasikumar, PSNA College of Engg and Tech, Dindigul, India Dr. Anupam Shukla, Indian Institute of Information Technology and Management Gwalior, India Mr. Rahul Kala, Indian Institute of Inforamtion Technology and Management Gwalior, India Dr. A V Nikolov, National University of Lesotho, Lesotho Mr. Kamal Sarkar, Department of Computer Science and Engineering, Jadavpur University, India Dr. Mokhled S. AlTarawneh, Computer Engineering Dept., Faculty of Engineering, Mutah University, Jordan, Jordan Prof. Sattar J Aboud, Iraqi Council of Representatives, Iraq-Baghdad Dr. Prasant Kumar Pattnaik, Department of CSE, KIST, India Dr. Mohammed Amoon, King Saud University, Saudi Arabia Dr. Tsvetanka Georgieva, Department of Information Technologies, St. Cyril and St. Methodius University of Veliko Tarnovo, Bulgaria Dr. Eva Volna, University of Ostrava, Czech Republic Mr. Ujjal Marjit, University of Kalyani, West-Bengal, India Dr. 
Prasant Kumar Pattnaik, KIST,Bhubaneswar,India, India Dr. Guezouri Mustapha, Department of Electronics, Faculty of Electrical Engineering, University of Science and Technology (USTO), Oran, Algeria Mr. Maniyar Shiraz Ahmed, Najran University, Najran, Saudi Arabia Dr. Sreedhar Reddy, JNTU, SSIETW, Hyderabad, India Mr. Bala Dhandayuthapani Veerasamy, Mekelle University, Ethiopa Mr. Arash Habibi Lashkari, University of Malaya (UM), Malaysia Mr. Rajesh Prasad, LDC Institute of Technical Studies, Allahabad, India Ms. Habib Izadkhah, Tabriz University, Iran Dr. Lokesh Kumar Sharma, Chhattisgarh Swami Vivekanand Technical University Bhilai, India Mr. Kuldeep Yadav, IIIT Delhi, India Dr. Naoufel Kraiem, Institut Superieur d'Informatique, Tunisia

Prof. Frank Ortmeier, Otto-von-Guericke-Universitaet Magdeburg, Germany Mr. Ashraf Aljammal, USM, Malaysia Mrs. Amandeep Kaur, Department of Computer Science, Punjabi University, Patiala, Punjab, India Mr. Babak Basharirad, University Technology of Malaysia, Malaysia Mr. Avinash singh, Kiet Ghaziabad, India Dr. Miguel Vargas-Lombardo, Technological University of Panama, Panama Dr. Tuncay Sevindik, Firat University, Turkey Ms. Pavai Kandavelu, Anna University Chennai, India Mr. Ravish Khichar, Global Institute of Technology, India Mr Aos Alaa Zaidan Ansaef, Multimedia University, Cyberjaya, Malaysia Dr. Awadhesh Kumar Sharma, Dept. of CSE, MMM Engg College, Gorakhpur-273010, UP, India Mr. Qasim Siddique, FUIEMS, Pakistan Dr. Le Hoang Thai, University of Science, Vietnam National University - Ho Chi Minh City, Vietnam Dr. Saravanan C, NIT, Durgapur, India Dr. Vijay Kumar Mago, DAV College, Jalandhar, India Dr. Do Van Nhon, University of Information Technology, Vietnam Mr. Georgios Kioumourtzis, University of Patras, Greece Mr. Amol D.Potgantwar, SITRC Nasik, India Mr. Lesedi Melton Masisi, Council for Scientific and Industrial Research, South Africa Dr. Karthik.S, Department of Computer Science & Engineering, SNS College of Technology, India Mr. Nafiz Imtiaz Bin Hamid, Department of Electrical and Electronic Engineering, Islamic University of Technology (IUT), Bangladesh Mr. Muhammad Imran Khan, Universiti Teknologi PETRONAS, Malaysia Dr. Abdul Kareem M. Radhi, Information Engineering - Nahrin University, Iraq Dr. Mohd Nazri Ismail, University of Kuala Lumpur, Malaysia Dr. Manuj Darbari, BBDNITM, Institute of Technology, A-649, Indira Nagar, Lucknow 226016, India Ms. Izerrouken, INP-IRIT, France Mr. Nitin Ashokrao Naik, Dept. of Computer Science, Yeshwant Mahavidyalaya, Nanded, India Mr. Nikhil Raj, National Institute of Technology, Kurukshetra, India Prof. Maher Ben Jemaa, National School of Engineers of Sfax, Tunisia Prof. 
Rajeshwar Singh, BRCM College of Engineering and Technology, Bahal Bhiwani, Haryana, India Mr. Gaurav Kumar, Department of Computer Applications, Chitkara Institute of Engineering and Technology, Rajpura, Punjab, India Mr. Ajeet Kumar Pandey, Indian Institute of Technology, Kharagpur, India Mr. Rajiv Phougat, IBM Corporation, USA Mrs. Aysha V, College of Applied Science Pattuvam affiliated with Kannur University, India Dr. Debotosh Bhattacharjee, Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, India Dr. Neelam Srivastava, Institute of engineering & Technology, Lucknow, India Prof. Sweta Verma, Galgotia's College of Engineering & Technology, Greater Noida, India Mr. Harminder Singh BIndra, MIMIT, INDIA Dr. Lokesh Kumar Sharma, Chhattisgarh Swami Vivekanand Technical University, Bhilai, India Mr. Tarun Kumar, U.P. Technical University/Radha Govinend Engg. College, India Mr. Tirthraj Rai, Jawahar Lal Nehru University, New Delhi, India

Mr. Akhilesh Tiwari, Madhav Institute of Technology & Science, India Mr. Dakshina Ranjan Kisku, Dr. B. C. Roy Engineering College, WBUT, India Ms. Anu Suneja, Maharshi Markandeshwar University, Mullana, Haryana, India Mr. Munish Kumar Jindal, Punjabi University Regional Centre, Jaito (Faridkot), India Dr. Ashraf Bany Mohammed, Management Information Systems Department, Faculty of Administrative and Financial Sciences, Petra University, Jordan Mrs. Jyoti Jain, R.G.P.V. Bhopal, India Dr. Lamia Chaari, SFAX University, Tunisia Mr. Akhter Raza Syed, Department of Computer Science, University of Karachi, Pakistan Prof. Khubaib Ahmed Qureshi, Information Technology Department, HIMS, Hamdard University, Pakistan Prof. Boubker Sbihi, Ecole des Sciences de L'Information, Morocco Dr. S. M. Riazul Islam, Inha University, South Korea Prof. Lokhande S.N., S.R.T.M.University, Nanded (MH), India Dr. Vijay H Mankar, Dept. of Electronics, Govt. Polytechnic, Nagpur, India Dr. M. Sreedhar Reddy, JNTU, Hyderabad, SSIETW, India Mr. Ojesanmi Olusegun, Ajayi Crowther University, Oyo, Nigeria Ms. Mamta Juneja, RBIEBT, PTU, India Dr. Ekta Walia Bhullar, Maharishi Markandeshwar University, Mullana Ambala (Haryana), India Prof. Chandra Mohan, John Bosco Engineering College, India Mr. Nitin A. Naik, Yeshwant Mahavidyalaya, Nanded, India Mr. Sunil Kashibarao Nayak, Bahirji Smarak Mahavidyalaya, Basmathnagar Dist-Hingoli., India Prof. Rakesh.L, Vijetha Institute of Technology, Bangalore, India Mr B. M. Patil, Indian Institute of Technology, Roorkee, Uttarakhand, India Mr. Thipendra Pal Singh, Sharda University, K.P. III, Greater Noida, Uttar Pradesh, India Prof. Chandra Mohan, John Bosco Engg College, India Mr. Hadi Saboohi, University of Malaya - Faculty of Computer Science and Information Technology, Malaysia Dr. R. Baskaran, Anna University, India Dr. Wichian Sittiprapaporn, Mahasarakham University College of Music, Thailand Mr. Lai Khin Wee, Universiti Teknologi Malaysia, Malaysia Dr. 
Kamaljit I. Lakhtaria, Atmiya Institute of Technology, India Mrs. Inderpreet Kaur, PTU, Jalandhar, India Mr. Iqbaldeep Kaur, PTU / RBIEBT, India Mrs. Vasudha Bahl, Maharaja Agrasen Institute of Technology, Delhi, India Prof. Vinay Uttamrao Kale, P.R.M. Institute of Technology & Research, Badnera, Amravati, Maharashtra, India Mr. Suhas J Manangi, Microsoft, India Ms. Anna Kuzio, Adam Mickiewicz University, School of English, Poland Dr. Debojyoti Mitra, Sir Padampat Singhania University, India Prof. Rachit Garg, Department of Computer Science, L K College, India Mrs. Manjula K A, Kannur University, India Mr. Rakesh Kumar, Indian Institute of Technology Roorkee, India

TABLE OF CONTENTS

1. Fundamental Frequency Estimation of Carnatic Music Songs Based on the Principle of Mutation (Pg 1-10)
   Rajeswari Sridhar, Karthiga S and Geetha T V

2. Modified Uniform Triangular Array for Online Full Azimuthal Coverage via JADE-MUSIC Algorithm over MIMO-CDMA Channel (Pg 11-18)
   Sami Ghnimi and Ali Gharsallah

3. An Efficient Software Engineering Ontology Tool for Knowledge Sharing (Pg 19-27)
   Polala Niranjan Reddy and Kukatlapalli Pradeep Kumar

4. HLAODV – A Cross Layer Routing Protocol for Pervasive Heterogeneous Wireless Sensor Networks Based On Location (Pg 28-37)
   Jasmine Norman and J. Paulraj Joseph

5. Frequent Pattern Mining Using Record Filter Approach (Pg 38-43)
   D. N. Goswami, Anshu Chaturvedi and C. S. Raghuvanshi

IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 4, No 7, July 2010 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814


Fundamental Frequency Estimation of Carnatic Music Songs Based on the Principle of Mutation

Rajeswari Sridhar, Karthiga S and Geetha T V

Department of Computer Science and Engineering

Anna University, Chennai, India

Abstract

Fundamental frequency estimation is essential in Carnatic music signal processing because it is the basic component needed to determine the melody string of the signal once the other frequency components have been estimated. In this work, a new algorithm to estimate the fundamental frequency of Carnatic music songs, and of film songs based on Carnatic music, is proposed and implemented. The algorithm is based on the biological theory of mutation, adopting the concept of neutral mutations, and is implemented using the characteristics of Carnatic music. The underlying principle is that the characteristics of a signal do not change if it is mutated with another signal having the same frequency components. To determine the fundamental frequency, three features of the original signal are estimated: MFCC, spectral flux, and centroid. The mutating signal is derived in a manner similar to the way musicians adjust their singing frequency range for a particular song. A pre-recorded 'S', 'P', 'S' sequence is used to mutate the input signal at three positions: the beginning, the middle and the end. The same set of features (MFCC, spectral flux, and centroid) is then extracted from the mutated signal. By comparing the features of the original signal with those of the mutated signal, the mutating signal whose result matches the features of the original most closely at all three positions is identified, and the frequency corresponding to the lower 'S' of that mutating signal is taken as the fundamental frequency of the input signal. The algorithm was evaluated using the measures of harmonic error, absolute difference between mean pitches, and absolute difference in standard deviation. For the inputs considered, the proposed algorithm yielded better results than the existing algorithms for estimating fundamental frequency.

Keywords: Fundamental frequency, Music signal processing, Carnatic music.

1. Introduction

Fundamental frequency estimation of audio signals is a classical problem in signal processing [1], and it has been a research topic for many years in both speech and music signal processing. Fundamental frequency is the physical term for pitch [2]. Pitch is the perceptual attribute of sound, defined as the frequency of a sine wave that is matched to the target sound in a psychophysical experiment. In speech signal processing, fundamental frequency is essential for determining the speaker in speaker verification or recognition systems. In music signal processing, it is essential for determining the pitch pattern and the range of pitch frequencies, and for music transcription and music representation systems [1] [3]. In Carnatic music, estimating the fundamental frequency is the foundation for determining the important characteristic of Carnatic music, the Raga [2]. The concept of fundamental frequency in Carnatic music is quite different from that used in speech and Western music: it refers to the frequency of the middle octave ‘S’, which is analogous to the note C on a keyboard in Western music. In this work, we therefore propose an algorithm for identifying the fundamental frequency of Carnatic music songs based on the biological theory of mutation. The characteristics of Carnatic music have been adopted for the implementation of this mutation-based algorithm.

This paper is organized as follows: Section 2 reviews existing work on fundamental frequency estimation, Section 3 discusses the characteristics of Carnatic music, Section 4 presents the proposed mutation-based algorithm, Section 5 describes the experimental setup and analyses the results, and Section 6 concludes the paper.

2. Existing Work

Fundamental frequency is defined as the lowest frequency at which a system vibrates freely, so estimating it requires determining the lowest frequency component of the input signal. It is also defined as the reciprocal of the time period between two successive lowest peak points of a given signal, so it can also be determined from the time-domain representation of the signal by locating successive lowest peak points.
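The time-domain definition above can be illustrated with a minimal sketch. It assumes a clean, strictly periodic input with a single peak per period, and it uses local maxima (local minima would work in the same way). The function name `estimate_f0_from_peaks` and the synthetic 240 Hz test tone are illustrative assumptions, not part of any algorithm surveyed in this section.

```python
import math


def estimate_f0_from_peaks(samples, sample_rate):
    """Estimate F0 as the reciprocal of the mean spacing between peaks.

    Assumption: the signal is clean and has exactly one local maximum
    per period; real music signals would need smoothing first.
    """
    # Indices of local maxima in the time-domain signal.
    peaks = [
        i
        for i in range(1, len(samples) - 1)
        if samples[i - 1] < samples[i] >= samples[i + 1]
    ]
    if len(peaks) < 2:
        return None  # not enough peaks to measure a period
    # Mean period in samples, averaged over all successive peak pairs.
    mean_period = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return sample_rate / mean_period


# Synthetic test tone: a pure 240 Hz sine sampled at 8 kHz for 0.1 s.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 240 * n / sample_rate) for n in range(800)]
f0 = estimate_f0_from_peaks(tone, sample_rate)
print(round(f0, 1))  # close to 240 Hz for this synthetic tone
```

On real recordings, a peak-based estimate of this kind is sensitive to noise and to waveforms with several peaks per period, which is one reason the approaches reviewed in this section rely on spectral or correlation-based features.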
The signal-based features used to determine fundamental frequency can be classified as time-domain features, spectral features, cepstral features, and features motivated by auditory theory. Many of the algorithms available for fundamental frequency estimation of speech and music are based on frequency-domain features and auditory-motivated features.

In the algorithm developed by Arturo Camacho and John Harris [1], the fundamental frequency of speech and music signals is estimated based on spectral comparisons. The average peak-to-valley distance of the frequency representation of the signal is estimated at harmonic locations. This value is computed in several regions of the input signal, and the distance between successive average peak-to-valley values is estimated. The fundamental frequency is then determined as the least distance among the average peak-to-valley values. This algorithm was implemented as a combined approach for speech and music. Its worst-case time complexity is very high, since the distance measure must be computed between successive segments for all possible combinations in the input signal.

Another algorithm, developed by Alain de Cheveigné and Hideki Kawahara [3], is also a generalized algorithm for speech and music. It is based on the well-known autocorrelation method, which is in turn based on a model of auditory processing. The steps are: computing the autocorrelation value; correcting errors in the computed value by computing the difference function between the autocorrelation values; normalizing the difference function by estimating its mean value; and iterating on this correlation value to determine the fundamental frequency of the input signal. Because it is a generic algorithm for speech and music, the time taken to correct the errors is very high.
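The difference-function and mean-normalization steps described for [3] can be sketched as follows. This is a simplified illustration in the spirit of that method, not the authors' implementation: the threshold of 0.1, the 80 Hz lower bound, the function names and the synthetic test tone are all assumptions, and refinements such as interpolation between lags are omitted.

```python
import math


def difference_function(x, max_lag):
    """Squared difference between the signal and its lagged copy."""
    n = len(x) - max_lag  # fixed window so every lag uses the same length
    d = [0.0] * (max_lag + 1)
    for tau in range(1, max_lag + 1):
        d[tau] = sum((x[j] - x[j + tau]) ** 2 for j in range(n))
    return d


def cumulative_mean_normalize(d):
    """Divide each d[tau] by the mean of d[1..tau]; the value at lag 0 is 1."""
    out = [1.0] * len(d)
    running = 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out[tau] = d[tau] * tau / running if running > 0 else 1.0
    return out


def estimate_f0(x, sample_rate, f_min=80.0, threshold=0.1):
    """Pick the first dip below the threshold and follow it to its minimum."""
    max_lag = int(sample_rate / f_min)
    cmnd = cumulative_mean_normalize(difference_function(x, max_lag))
    tau = 2
    while tau < max_lag:
        if cmnd[tau] < threshold:
            # Walk down to the local minimum of this dip.
            while tau + 1 <= max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
        tau += 1
    # Fallback (assumption): use the global minimum if no dip is found.
    best = min(range(1, max_lag + 1), key=lambda t: cmnd[t])
    return sample_rate / best


# Synthetic test tone: a pure 240 Hz sine sampled at 8 kHz for 0.1 s.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 240 * n / sample_rate) for n in range(800)]
f0_estimate = estimate_f0(tone, sample_rate)
print(round(f0_estimate, 1))  # within a few Hz of 240 (integer-lag resolution)
```

Even in this reduced form, the nested loop over lags and samples shows where the computational cost of correlation-based estimators comes from.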
Another algorithm, developed by Robert C. Maher and James Beauchamp [4], uses a two-way mismatch procedure to estimate the fundamental frequency of music signals. It is based on estimating a quasi-harmonic value, which requires computing the inverse square root of the fluctuating matrix and identifying the lowest value. The fundamental frequency is determined by computing this quasi-harmonic value for short-time spectra of the input signal. The same value is determined in the neighbouring spectra, and the fundamental frequency is then estimated as the least value over the input segment considered.

Many algorithms have also been developed for estimating the multiple fundamental frequencies present in an input signal [5] [6]. In the algorithm developed by Chunghsin Yeh, Axel Röbel and Xavier Rodet [5], a quasi-harmonic model is developed to determine the harmonicity and spectral smoothness components. A score is then assigned to the computed harmonicity and spectral smoothness values, and the fundamental frequency is estimated from this score. The algorithm developed by A. P. Klapuri [6] is also based on harmonicity, spectral smoothness and synchronous amplitude evolution within a single source. It takes an iterative approach: the fundamental frequency of the most prominent sound is computed and that sound is subtracted from the mixture, and this process of computation and subtraction is repeated to determine the fundamental frequencies in the signal. The author uses spectral components such as the spectral envelope and spectral smoothness.

In another algorithm, developed by Boris Doval and Xavier Rodet [7], the fundamental frequency of an audio signal is estimated from the evolution of the signal by assigning a probabilistic value to the pseudo-periodic signal. The algorithm builds an HMM on the estimated spectral features to identify the fundamental frequency, and hence requires a lot of training to capture the evolution of the signal. In another work, proposed by Yoshifumi et al. [8], peak amplitude values are identified in each segment of the frequency domain. The identified peaks are then represented as a sequence of pulses, and the autocorrelation function is applied to determine the pitch of a speech signal. This algorithm was applied only to speech signals.

All of these algorithms were targeted at Western music and speech in general, and hence use the signal characteristics of music and of human speech to determine the fundamental frequency. They are also all based on the premise that the lowest frequency present in the input signal is the fundamental frequency. In addition, these algorithms have high computational complexity.
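The iterative "estimate the most prominent sound, subtract it, repeat" idea described for [6] can be sketched on a toy spectrum. This is a deliberately simplified illustration and not the method of [6]: the spectrum is assumed to be a small table of partial frequencies and amplitudes, candidate fundamentals are restricted to observed partials, and a shared partial is removed entirely instead of being partially subtracted using spectral smoothness. All names and values are hypothetical.

```python
def harmonic_salience(f0, partials, n_harmonics=5, tol_hz=3.0):
    """Sum the amplitudes of partials lying near integer multiples of f0."""
    total = 0.0
    for h in range(1, n_harmonics + 1):
        for freq, amp in partials.items():
            if abs(freq - h * f0) <= tol_hz:
                total += amp
    return total


def iterative_f0s(partials, n_sources, n_harmonics=5, tol_hz=3.0):
    """Repeatedly pick the most salient candidate and cancel its harmonics."""
    remaining = dict(partials)
    found = []
    for _ in range(n_sources):
        if not remaining:
            break
        # Candidate F0s are limited to observed partials (an assumption that
        # sidesteps sub-harmonic errors in this toy setting).
        best = max(
            remaining,
            key=lambda f: harmonic_salience(f, remaining, n_harmonics, tol_hz),
        )
        found.append(best)
        # "Subtract" the detected sound by dropping all of its harmonics.
        remaining = {
            f: a
            for f, a in remaining.items()
            if not any(
                abs(f - h * best) <= tol_hz for h in range(1, n_harmonics + 1)
            )
        }
    return found


# Toy mixture: one source at 200 Hz and one at 300 Hz sharing the 600 Hz partial.
mixture = {200: 1.0, 400: 0.5, 600: 0.6, 300: 0.8, 900: 0.3}
detected = iterative_f0s(mixture, n_sources=2)
print(detected)
```

Note that both detected values are simply the strongest harmonic series in the mixture; nothing in this scheme identifies which frequency a singer treats as the reference note, which is the gap the following discussion addresses.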
Since these algorithms determine the lowest frequency component, they are not suited to Carnatic music signal processing. In Carnatic music a singer sings over two octaves, ranging from the mid-value of the lower octave to the mid-value of the higher octave and including the whole of the middle octave [2]. The lowest frequency therefore does not necessarily correspond to the middle octave S and cannot be assumed to be the fundamental frequency. The algorithm proposed by A. P. Klapuri [6] motivated us to use a spectral comparison based approach, and we turned to the biological theory of mutation to implement the spectral comparison. In the algorithm of [6], spectral smoothness and harmonicity are used as features and spectral comparisons are made between segments of the same file. In our algorithm we determine features such as spectral flux, spectral centroid and MFCC and compare the input signal’s features with those of the mutated input signal. The octave interval characteristics of Carnatic music are used for mutating the input signal.

3. Carnatic Music Characteristics

The algorithm for fundamental frequency estimation has been implemented based on the characteristics of Carnatic music, and hence it is necessary to explore some of these basic characteristics. Carnatic music and Hindustani music are traditional Indian music systems and are very different from the traditional Western system of music. Carnatic music is a just tempered system, whereas Western music is an even tempered system. A just tempered system gives the singer the flexibility to start a particular song at any frequency as the fundamental frequency. An additional important difference is that in Carnatic music an octave has 22 intervals, as against the 12 intervals of an octave in Western music and Hindustani music [2]. The fundamental frequency normally refers to the frequency of the middle octave ‘S’, which is at 240 Hz. This S corresponds to the C in Western music. In Carnatic music the singer normally starts at a frequency higher than 240 Hz and refers to this starting frequency as the ‘S’. In addition, a Carnatic music song is sung in two octaves. The two octaves comprise the second half of the lower octave, the whole of the middle octave and the first half of the higher octave. Hence, in order to span a frequency range of two octaves, it is very important that the singer chooses the fundamental frequency with care.
Hence, the frequency range of singing depends on the fundamental frequency ‘f’ and extends from ‘3f/4’ to ‘3f’, thus covering two octaves. The need for fundamental frequency determination in Carnatic music arises because it is necessary to identify the Raga of a Carnatic music piece. In Carnatic music, a Raga is defined as a sequential arrangement of swaras. There are essentially seven swaras in Carnatic music, called S, R, G, M, P, D, N, which correspond to C, D, E, F, G, A, B in Western music. The ascending arrangement of the swaras is called Arohanam and the descending arrangement is called Avarohanam, as given in [2]. Ragas can be classified into Parent Ragas and Child Ragas. A parent raga is one in which all the seven swaras S, R, G, M, P, D, N are present in both the Arohanam and the Avarohanam. Choosing among the available variants of these swaras results in 72 combinations, and hence in 72 parent Ragas [2] [9] [10]. Therefore, to determine the raga of a particular Carnatic music song it is mandatory to know the swaras present in the song, which in turn depend very much on the fundamental frequency. Depending on the starting frequency, which is referred to as the frequency of the middle octave ‘S’, the other frequencies slide according to the ratios given in Table 1. Hence it becomes essential to determine the fundamental frequency of the song in order to determine the various swara patterns.
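The count of 72 can be checked with a short enumeration. The sketch below is illustrative only: the text gives the number of variants per swara (three each for R, G, D and N, two for M) and the total, while the pairing rule used here (the variant index of G may not be lower than that of R, and likewise for N and D) is an assumption taken from the standard melakarta scheme. The function names are hypothetical.

```python
from itertools import product


def ordered_pairs(first, second):
    """Variant pairs such as (R1, G2) where the second index is >= the first.

    Assumption: this pairing rule follows the standard melakarta scheme;
    the paper itself only states the resulting total of 72.
    """
    return [(f"{first}{i}", f"{second}{j}")
            for i in range(1, 4) for j in range(i, 4)]


def parent_ragas():
    """Enumerate every parent raga as a tuple of seven swaras."""
    ragas = []
    for (r, g), m, (d, n) in product(ordered_pairs("R", "G"),
                                     ("M1", "M2"),
                                     ordered_pairs("D", "N")):
        # S and P have a single variant each.
        ragas.append(("S", r, g, m, "P", d, n))
    return ragas


if __name__ == "__main__":
    print(len(parent_ragas()))
```

Six R/G pairs, two M variants and six D/N pairs give 6 x 2 x 6 = 72 combinations under this assumed rule.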

Table 1: Swaras and their ratios with the middle octave S

Swara   Ratio      Swara   Ratio
S       1          M2      27/20
R1      32/31      P       3/2
R2      16/15      D1      128/81
R3      10/9       D2      8/5
G1      32/27      D3      5/3
G2      6/5        N1      16/9
G3      5/4        N2      9/5
M1      4/3        N3      15/8
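As an illustration of how the ratios of Table 1 translate into frequencies, the following sketch scales each ratio by a chosen middle octave S and also evaluates the 3f/4 to 3f singing range described earlier. The ratios are copied from Table 1 as printed; the function names are hypothetical.

```python
from fractions import Fraction

# Ratios copied from Table 1 (relative to the middle octave S).
SWARA_RATIOS = {
    "S": Fraction(1), "R1": Fraction(32, 31), "R2": Fraction(16, 15),
    "R3": Fraction(10, 9), "G1": Fraction(32, 27), "G2": Fraction(6, 5),
    "G3": Fraction(5, 4), "M1": Fraction(4, 3), "M2": Fraction(27, 20),
    "P": Fraction(3, 2), "D1": Fraction(128, 81), "D2": Fraction(8, 5),
    "D3": Fraction(5, 3), "N1": Fraction(16, 9), "N2": Fraction(9, 5),
    "N3": Fraction(15, 8),
}


def swara_frequencies(fundamental_hz):
    """Frequency in Hz of every swara for a given middle octave S."""
    base = Fraction(fundamental_hz)
    return {name: float(ratio * base) for name, ratio in SWARA_RATIOS.items()}


def singing_range(fundamental_hz):
    """Two-octave singing range (3f/4, 3f) for fundamental frequency f."""
    return (0.75 * fundamental_hz, 3.0 * fundamental_hz)


if __name__ == "__main__":
    for name, hz in swara_frequencies(240).items():
        print(f"{name}: {hz:.1f} Hz")
    print(singing_range(240))
```

With S at 240 Hz, P falls at 360 Hz and the singing range runs from 180 Hz to 720 Hz; choosing a different S slides every swara by the same ratios.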

A further concept in Carnatic music is that R, G, D and N can each take three frequencies, M can take two and P only one, and a singer can choose any one of these finer frequency steps as the fundamental frequency for singing. In addition, the frequency of a Carnatic music swara is not discrete but continuous. The frequency range between 240 Hz and 256.4 Hz is therefore identified as R1, the range between 256.4 Hz and 260 Hz as R2, and so on. Hence the singer can choose any one of the 22 available frequency components in an octave as the fundamental frequency of a particular song.

To identify the fundamental frequency of the input song it is sufficient to analyse its first few seconds, as this segment contains the Kalpana Swara. The Kalpana Swara is the prefix tune that is sung before the beginning of the song as an aalapana or along with the Pallavi of the song [2]. It is the ornamentation rendered to the song to emphasize its raga. The Raga of the song is therefore conveyed by the time the singer finishes the Pallavi, where Pallavi refers to the opening two lines of a song [2]. In addition, Kalpana Swaras should be sung according to the fundamental frequency of the chosen song [2]. Hence it is sufficient to consider the first few seconds of the song to identify its fundamental frequency. In our algorithm, three or four samples of 5 seconds duration are taken from the first 30 seconds of the input song and used to determine the fundamental frequency with our mutation based algorithm.
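The sampling step can be sketched as follows. The text specifies only the segment length (5 seconds), the number of segments (three or four) and the window (the first 30 seconds); spacing the segments evenly across the window is an assumption made here, and the helper name `sample_segments` is hypothetical.

```python
import numpy as np


def sample_segments(signal, sample_rate, num_segments=3,
                    segment_seconds=5.0, window_seconds=30.0):
    """Cut equally spaced segments out of the opening window of a song.

    Assumption: segments are spread evenly over the window; the paper does
    not say how the segment positions are chosen.
    """
    signal = np.asarray(signal)
    seg_len = int(round(segment_seconds * sample_rate))
    window_len = min(len(signal), int(round(window_seconds * sample_rate)))
    if seg_len > window_len:
        raise ValueError("signal is shorter than one segment")
    # First segment starts at 0, last segment ends at the window boundary.
    starts = np.linspace(0, window_len - seg_len, num_segments).astype(int)
    return [signal[s:s + seg_len] for s in starts]


if __name__ == "__main__":
    rate = 44100
    song = np.zeros(40 * rate)          # placeholder for a decoded song
    segments = sample_segments(song, rate)
    print(len(segments), len(segments[0]) / rate)
```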



4. Fundamental Frequency estimation

4.1 Basic idea of the algorithm – Mutation

Mutation is a well known concept used in many computer applications and in particular in signal processing applications [11] [12] [13]. Mutation is a phenomenon normally identified in a DNA molecule as a change in the DNA’s sequence caused by radiation, viruses or exposure of the body to a different environment [14]. A mutation that changes the DNA sequence can result in an abnormality in the exposed cell. Some mutations are harmful and others are beneficial. In addition, there is the concept of neutral mutation, which has neither a beneficial nor a harmful effect: it changes the DNA’s sequence without affecting the overall structure of the DNA. The DNA is changed, but the change has no impact because the changed pattern is itself one of the possible combinations of the existing DNA sequence.

In the work of Cristian et al. [11], the concept of mutation is used in genetic algorithm coding to design IIR filters. The authors use mutation operators such as uniform mutation and non-uniform mutation to select a gene from the available gene pool. After a gene pool is created, Principal Component Analysis is performed on it, which is also based on the concept of mutation: “mutation tends to homogenize the components to avoid having few principal components and neglecting the others”. The coefficients of the IIR filters are then determined using the proposed mutation technique. The authors claimed that the resulting IIR filters were better than those obtained with the Newton based strategy.
In the work of David Lu [12], a mutation strategy is used to decide the notes for transcribing a piece of music. The author creates a gene pool of possible transcriptions for a particular piece of music and then uses mutation theory, together with a fitness value, to select the transcription from all the available possibilities. The mutation operators used to determine the transcription sequences are irradiate, nudge, lengthen, split, reclassify and assimilate. In another algorithm, by Gustavo Reis et al. [13], genetic algorithms were also used for music transcription. This algorithm is similar to the one proposed in [12] in that the gene pool is iterated to determine the transcription sequence.

These algorithms for music transcription and IIR filter design motivated us to explore mutation theory for determining the fundamental frequency. In our algorithm we exploit neutral mutation. The signal’s frequency components play the role of the DNA’s sequence. In a neutral mutation, the structure of the DNA sequence is retained. Similarly, in our algorithm, if the mutating signal is imbibed into the input signal and the mutated signal’s frequency characteristics remain the same as those of the original input signal, then the mutated signal and the input signal have the same set of frequency components. Hence, if the signal characteristics after mutation are identical to those of the original signal, the fundamental frequency of the original signal is the same as the fundamental frequency of the mutating signal.

4.2 Algorithm and System Architecture

The pseudo code of the basic algorithm is given below.

Algorithm_Mutation_FundamentalFrequency(Input Signal, Local Oscillator)
    Features = Feature Extraction(Input Signal .wav)    // MFCC, Spectral Flux, Spectral centroid
    Q1 = Q2 = Q3 = ∞
    I = 1
    While (the Local Oscillator database has an Ith entry)
    {
        Mutate the original signal at beginning, middle and end with the Ith SPS` from the local oscillator database
        ModFeatures1 = Feature Extraction(Input Signal mutated at beginning)
        ModFeatures2 = Feature Extraction(Input Signal mutated at middle)
        ModFeatures3 = Feature Extraction(Input Signal mutated at end)
        Leastval1 = Compare ModFeatures1 with Features
        Leastval2 = Compare ModFeatures2 with Features
        Leastval3 = Compare ModFeatures3 with Features
        If (Leastval1 < Q1 & Leastval2 < Q2 & Leastval3 < Q3) Then
            Q1 = Leastval1
            Q2 = Leastval2
            Q3 = Leastval3
            FundamentalFrequency = Frequency of the Ith S
        I = I + 1
    }

The proposed system architecture is shown in Figure 1.
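The search loop of the pseudo code can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: "mutation" is modelled here as additive mixing of the SPS clip into the input (the paper only says the clip is imbibed into the signal), and a unit-norm magnitude spectrum stands in for the MFCC, flux and centroid features. All function names are hypothetical.

```python
import numpy as np

POSITIONS = ("beginning", "middle", "end")


def spectral_features(signal):
    """Stand-in feature vector: unit-norm magnitude spectrum (not MFCC)."""
    magnitude = np.abs(np.fft.rfft(signal))
    norm = np.linalg.norm(magnitude)
    return magnitude / norm if norm > 0 else magnitude


def mutate(signal, sps, position):
    """Assumed mutation: add the SPS clip to the signal at one position."""
    mutated = np.array(signal, dtype=float)
    n, m = len(mutated), len(sps)
    start = {"beginning": 0, "middle": (n - m) // 2, "end": n - m}[position]
    mutated[start:start + m] += sps
    return mutated


def estimate_fundamental(signal, oscillator_bank,
                         extract_features=spectral_features):
    """Return the S frequency whose SPS changes the features the least.

    oscillator_bank is a sequence of (s_frequency, sps_clip) pairs, with
    every clip no longer than the signal.
    """
    reference = extract_features(signal)
    best = [np.inf, np.inf, np.inf]          # Q1, Q2, Q3 of the pseudo code
    fundamental = None
    for s_frequency, sps in oscillator_bank:
        distances = [
            float(np.linalg.norm(
                extract_features(mutate(signal, sps, p)) - reference))
            for p in POSITIONS
        ]
        # Accept only a candidate that improves on all three positions.
        if all(d < q for d, q in zip(distances, best)):
            best = distances
            fundamental = s_frequency
    return fundamental
```

A candidate that merely reinforces frequencies already present in the input changes the normalised spectrum less than one that introduces new components, which is the neutral mutation idea in miniature.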



Figure 1 Architecture diagram

The components of the architecture are the local oscillator database, the feature extractor and the comparison block. The local oscillator database consists of the pre-recorded signals of ‘S’ ‘P’ ‘S`’ for all the 22 intervals of Carnatic music. The feature extractor extracts features such as MFCC, spectral flux and spectral centroid. The comparison block computes the distance between the extracted features of the original signal and those of the modified signal. The proposed algorithm runs for a predetermined number of iterations, equal to the number of intervals in an octave of Carnatic music, which is 22; in this sense it is a constant time algorithm.
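For experimentation without recordings, a local oscillator bank can be approximated with synthetic tones. The sketch below is an assumption-laden stand-in: the paper uses pre-recorded SPS signals, whereas here each entry is three consecutive sine tones at S, P (3/2 of S, as in Table 1) and the upper S (twice S). The list of S frequencies is left to the caller because the exact 22 interval frequencies are not listed in the text; the function names are hypothetical.

```python
import numpy as np


def make_sps(s_frequency, sample_rate=44100, tone_seconds=0.5):
    """Synthetic S, P, upper S sequence as three consecutive sine tones."""
    t = np.arange(int(round(tone_seconds * sample_rate))) / sample_rate
    tones = [np.sin(2 * np.pi * f * t)
             for f in (s_frequency, 1.5 * s_frequency, 2.0 * s_frequency)]
    return np.concatenate(tones)


def make_oscillator_bank(s_frequencies, sample_rate=44100, tone_seconds=0.5):
    """List of (S frequency, SPS clip) pairs, one per candidate interval."""
    return [(f, make_sps(f, sample_rate, tone_seconds))
            for f in s_frequencies]


if __name__ == "__main__":
    # Hypothetical candidate grid between 240 Hz and 480 Hz.
    bank = make_oscillator_bank([240.0, 256.0, 270.0, 288.0, 300.0])
    print(len(bank), len(bank[0][1]))
```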

The components of the system architecture are explained in detail below.

4.2.1 Local Oscillator Database

The local oscillator database stores the pre-recorded ‘S’ ‘P’ ‘S’ of all the intervals of Carnatic music. In Carnatic music there are 22 intervals in an octave [2], and a Carnatic singer can tune a song to any one of these 22 intervals. The choice depends on the song being sung and on the fundamental frequency of the singer. If a song lies more in the lower octave than in the higher octave, it is rendered better by a singer with a lower fundamental frequency, and vice versa. Hence singers whose fundamental frequency is not high do not choose songs that require singing in the higher octave rather than the middle octave, and singers whose fundamental frequency is high do not choose songs that require singing in the lower octave. For the same reason, a singer whose fundamental frequency is nearly 300 Hz cannot start singing at 400 Hz and still cover two octaves. We have therefore recorded samples of SPS starting from 240 Hz up to the next octave at 480 Hz. This signal is called the mutating signal and is imbibed into the original signal before the features are computed. The original signal is mutated at three positions: the beginning, the middle and the end. Three positions are needed because the characteristics of ‘S’ ‘P’ ‘S’ can occur at the beginning, in the middle or at the end of the signal. The three mutated signals are given individually to the feature extraction block, and the three resulting sets of features are later compared with the features of the original signal.

4.2.2 Feature Extraction

There are two feature extraction blocks in the figure, and both extract the same set of features. The first block extracts the features from the input signal and the other extracts them from the mutated signal. The input signal is mutated at the beginning, middle and end with the first SPS from the local oscillator database, as explained in the algorithm above, thereby generating three mutated signals. A set of features is then extracted from each mutated signal and compared with the features of the original signal. The following features are extracted.

Mel Frequency Cepstral Coefficients (MFCC)

[Figure 1 block labels: Input; Feature Extraction (MFCC, Spectral Centroid, Spectral Flux); Mixer – Local Oscillator having SPS signal of all the intervals of Carnatic music; Modulated with all possible signals at the beginning, at the middle and at the end; Feature Extraction of the modulated signals at all positions (MFCC, Spectral Centroid, Spectral Flux); Similarity check of the original signal with all the modulated components; Least value for all the three cases – Frequency of S is output]



MFCC are based on the discrete cosine transform (DCT) [15]. These coefficients are defined as the log power of the amplitude after converting the given spectrum to a cepstrum. The mel-spaced filter banks are modelled on the perception of hearing, and hence the filter banks are linear up to 1000 Hz and logarithmic for frequencies above 1000 Hz. The mel frequency cepstral coefficients C_n, n = 1, ..., L, are defined as a cosine transform of the log filter bank energies, where K is the number of subbands and L is the desired length of the cepstrum. The value of L is kept much smaller than K for dimension reduction, and S_k, k = 1, ..., K, are the filter bank energies after passing through the kth triangular band-pass filter. The input signal is converted to the frequency domain using the FFT and then mapped to the mel-frequency scale. The frequency bands are then decided on the mel-frequency scale, and the resulting band values form S.

Spectral Centroid

Spectral centroid is defined as the centre of gravity of the spectrum, that is, its power-weighted mean frequency [15]. To determine the spectral centroid, we divide the frequency band (i.e. 0 to Fs/2, where Fs is the sampling frequency in Hz) into a fixed number of subbands and compute the centroid C_m of each subband m from the power spectrum of the music signal, weighted by the subband filter shape w_m(f) between the lower and upper subband edges l_m and h_m.

Here P(f) is the power spectrum and γ is a constant controlling the dynamic range of the power spectrum. By setting γ < 1, the dynamic range of the power spectrum can be reduced.
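A discrete version of the subband centroid can be sketched as follows. It assumes equal-width subbands and a rectangular filter shape (w_m(f) = 1 inside the subband), neither of which is stated in the text, and it replaces the integrals by sums over FFT bins. The function name and default parameter values are hypothetical.

```python
import numpy as np


def subband_centroids(signal, sample_rate, num_subbands=4, gamma=0.5):
    """Centroid of each subband using the power spectrum raised to gamma.

    Assumptions: equal-width subbands over 0..Fs/2 and a rectangular
    filter shape within each subband.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    edges = np.linspace(0.0, sample_rate / 2.0, num_subbands + 1)
    centroids = []
    for m, (low, high) in enumerate(zip(edges[:-1], edges[1:])):
        if m == num_subbands - 1:
            in_band = (freqs >= low) & (freqs <= high)   # include Fs/2
        else:
            in_band = (freqs >= low) & (freqs < high)
        weights = power[in_band] ** gamma
        total = weights.sum()
        if total > 0:
            centroids.append(float((freqs[in_band] * weights).sum() / total))
        else:
            centroids.append(float((low + high) / 2.0))  # empty band fallback
    return centroids
```

Choosing gamma below 1 compresses strong peaks, so weaker components within a subband pull the centroid more than they would with the raw power spectrum.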

Spectral flux

Spectral flux is a measure of how quickly the power spectrum of a signal changes [15]. It is calculated by comparing the power spectrum of one frame with the power spectrum of the previous frame. More precisely, it is usually calculated as the 2-norm (also known as the Euclidean distance) between the two normalized spectra. Spectral flux calculated in this manner does not depend on the overall power or on phase, since only the normalized magnitudes are compared. After the features are estimated, the input signal is ready for comparison with the modified signal to determine its fundamental frequency.
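Two of the feature computations described in this section can be sketched compactly. The cepstral step below assumes the usual DCT form C_n = sqrt(2/K) * sum over k of log(S_k) * cos(n (k - 1/2) pi / K) applied to already computed mel filter bank energies (the mel filter bank itself is not shown), and the flux is the Euclidean distance between unit-norm magnitude spectra of consecutive frames. Function names are hypothetical.

```python
import numpy as np


def cepstral_coefficients(filterbank_energies, num_coefficients):
    """First L cepstral coefficients from K positive filter bank energies."""
    s = np.asarray(filterbank_energies, dtype=float)
    num_bands = len(s)
    k = np.arange(1, num_bands + 1)                      # k = 1..K
    n = np.arange(1, num_coefficients + 1)[:, None]      # n = 1..L
    basis = np.cos(n * (k - 0.5) * np.pi / num_bands)
    return np.sqrt(2.0 / num_bands) * basis @ np.log(s)


def spectral_flux(frame, previous_frame):
    """2-norm of the difference between normalised magnitude spectra."""
    def unit_magnitude(x):
        magnitude = np.abs(np.fft.rfft(x))
        norm = np.linalg.norm(magnitude)
        return magnitude / norm if norm > 0 else magnitude

    return float(np.linalg.norm(
        unit_magnitude(frame) - unit_magnitude(previous_frame)))
```

Because both spectra are normalised before the difference is taken, scaling a frame by a constant leaves the flux unchanged, which matches the remark that the measure does not depend on overall power.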

4.2.3 Comparison Check

The features are estimated and the algorithm given in the pseudo-code above is executed. Initially the features MFCC, spectral flux and spectral centroid are determined from the input signal and the mutated signal. The Euclidean distance between the original signal and the mutated signal is determined first with the MFCC feature. If the fundamental frequency cannot be determined from MFCC, either because more than one ‘S’ ‘P’ ‘S’ has the least distance value or because the distance is not the least at all three positions, then the spectral flux and centroid are used as features for determining the Euclidean distance between the original signal and the mutated signal.

5. Experimental Set up and Result Analysis

The input signal is sampled at 44.1 kHz. We estimated the fundamental frequency of nearly 100 songs and validated the results against the typical fundamental frequency of each singer’s singing range and against the values determined by musicologists. Tamil film music songs and classical Carnatic music songs sung by singers such as Balamuralikrishna, Ilayaraja, M. S. Subbulakshmi and Nithyasree were chosen for the purpose. Two or three samples of nearly 5 seconds were chosen from each song. The input signal was passed through the feature extractor to extract features such as MFCC, spectral flux and spectral centroid. The input signal was then mutated at the beginning, middle and end with the signals of the local oscillator database, as explained in the algorithm. The mutation is done at three positions because the input signal considered is of very short duration. S, P, S can appear anywhere within the 5 second signal, and hence if only one position of the input signal is considered for mutation, the frequencies S, P, S of the mutating signal may not coincide with those of the input signal. After mutating the signal, we determine the features and run a Euclidean distance based comparison to obtain the distance between the features of the original signal and those of the mutated signals at the three positions, for all values of S, P, S from the local oscillator database. The results are tabulated and the fundamental frequency values observed for the singers for different songs are plotted.
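The fallback between features described in the comparison check can be sketched as below. The rule implemented here is one reading of the text: a candidate is accepted for a feature only if it alone has the least distance at all three positions; otherwise the next feature is tried. The data layout, the feature order after MFCC and the function name are assumptions.

```python
def select_candidate(distances, feature_order=("mfcc", "flux", "centroid")):
    """Pick the S frequency that uniquely minimises all three distances.

    distances[feature][candidate] = (d_beginning, d_middle, d_end).
    Returns (candidate, feature_used), or (None, None) if no feature
    yields an unambiguous winner.
    """
    for feature in feature_order:
        table = distances[feature]
        winners = []
        for position in range(3):
            least = min(d[position] for d in table.values())
            winners.append(
                {c for c, d in table.items() if d[position] == least})
        common = set.intersection(*winners)
        # Unique least value at every position, and the same candidate each time.
        if all(len(w) == 1 for w in winners) and len(common) == 1:
            return common.pop(), feature
    return None, None


if __name__ == "__main__":
    example = {
        "mfcc": {240: (0.2, 0.2, 0.5), 320: (0.2, 0.3, 0.1)},
        "flux": {240: (0.4, 0.5, 0.6), 320: (0.1, 0.2, 0.3)},
        "centroid": {240: (0.3, 0.3, 0.3), 320: (0.2, 0.2, 0.2)},
    }
    print(select_candidate(example))
```

In the example the MFCC distances tie at the first position, so the decision falls through to the flux distances.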

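Feature definitions of Section 4.2.2:

Mel frequency cepstral coefficients:

$$C_n = \sqrt{\frac{2}{K}} \sum_{k=1}^{K} (\log S_k) \cos\left( n \left( k - \frac{1}{2} \right) \frac{\pi}{K} \right), \quad n = 1, \ldots, L$$

Spectral centroid of subband m:

$$C_m = \frac{\int_{l_m}^{h_m} f \, w_m(f) \, P^{\gamma}(f) \, df}{\int_{l_m}^{h_m} w_m(f) \, P^{\gamma}(f) \, df}$$

Spectral flux:

$$\mathrm{Flux} = \left\| \, |X_i| - |X_{i-1}| \, \right\|$$

Mel-scale band values:

$$S = s'_k, \quad 0 \le k < K$$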



Figure 2 The input signal’s various features and the mutated signal’s MFCC values for the SPS that matched for Balamuralikrishna

Figure 3 The input signal’s various features and the mutated signal’s MFCC values and flux for the SPS that matched for Ilayaraja

Figure 2 shows the features extracted from the input signal and the MFCC feature of the corresponding mutated signals for the singer Balamuralikrishna. In this case the distance between the MFCC coefficients of the mutated signal and those of the original signal gave a least value, and hence the fundamental frequency was determined using this measure. In Figure 3, the MFCC coefficients for Ilayaraja’s input were calculated, but a conclusion regarding the fundamental frequency could not be reached from them. Therefore another feature, spectral flux, was used to determine the fundamental frequency.

Figure 4 Fundamental Frequency Comparison of Dr. M. Balamuralikrishna

Figure 5 Fundamental Frequency Comparison of Dr. Ilayaraja

Figure 6 Fundamental Frequency Comparison of Dr. M. S. Subbulakshmi

Figure 7 Fundamental Frequency Comparison of Ms. Nithyasree Mahadevan

The fundamental frequencies of the different singers were determined for the different songs sung by them. The results we obtained were also in the range between 320 and 400 Hz. The computed results were given to musicologists, who identified the fundamental frequency of the individual songs. The same set of songs was also tested using the YIN algorithm given in [3]. The results for the different songs as determined by the musicologists, by the mutation algorithm and by the YIN algorithm are plotted in Figure 4 to Figure 7. As can be seen from the figures, there was a difference of at most 5 to 10 Hz between the value identified by the musicologists and the value determined by the algorithm. The normal singing frequency of Nithyasree Mahadevan is nearly 400 Hz, as suggested by the musicologists. The same observations were made for Dr. Ilayaraja, Dr. M. Balamuralikrishna and Dr. M. S. Subbulakshmi, whose normal fundamental frequencies of singing are 240 Hz, 320 Hz and 330 Hz respectively. The sudden peaks observed in their fundamental frequencies arise because the chosen sample started at the higher octave rather than at the middle octave. Another sample from the same song yielded a different fundamental frequency. This ambiguity was resolved by computing the distance measures on multiple samples; the sample that gave the least distance for a given ‘S’ ‘P’ ‘S’ identifies the fundamental frequency. However, when the output of the YIN algorithm was compared with that of the mutation algorithm, it was observed that the YIN algorithm always returned the lowest frequency component available in the input, or a harmonic of the lowest frequency of the input song. This observation justifies a separate algorithm for Carnatic music fundamental frequency determination. The output of the YIN algorithm actually corresponds to the lower octave D or N rather than the middle octave S.

5.1 Algorithm Evaluation

The mutation based algorithm and the YIN algorithm were implemented to determine the fundamental frequency of four singers, and the algorithms were evaluated on the following parameters, based on the evaluation suggested in [16] by Bojan Kotnik et al. Those authors used parameters already proposed by Joseph Martino et al. [17] and Ying et al. [18], such as gross error high, gross error low, voiced errors, unvoiced errors, average mean difference in pitch and average difference in standard deviation.
All these parameters estimate the percentage difference between the actual frequency and the computed frequency by treating the speech signal as voiced and unvoiced segments. Another work [19] uses precision, recall and F-measure to evaluate fundamental frequency estimation. All these measures estimate how often the correct frequency, rather than a wrong frequency, is identified as the fundamental. This motivated us to introduce a new evaluation parameter based on the observed results, which we term the harmonic frequency estimation error. Some of the algorithms available for speech and Western music mostly gave the harmonic of the lowest frequency, and hence we used this as one more evaluation parameter. In addition we used the average mean difference in pitch and the average difference in standard deviation. The parameters are discussed below.

5.1.1 Harmonic frequency Estimation Error (HE)

In Western music or speech in general, the lowest frequency component is termed the fundamental frequency or pitch. In Carnatic music, however, the lowest frequency component is not the fundamental frequency, as already explained. When the YIN and mutation based algorithms were compared, it was observed that YIN returned a harmonic of the lowest frequency in more situations than the mutation based algorithm. This is because voiced and unvoiced components are both present in the input music piece. When the fundamental frequency lies in an unvoiced segment, that frequency is skipped and the algorithm identifies a harmonic of the fundamental frequency [16]. The harmonic frequency estimation error is defined as the proportion of estimates in which a harmonic of the fundamental is reported instead of the fundamental frequency itself. Determining the harmonic error is important for Carnatic music signal processing, since the fundamental frequency is needed to identify the swara pattern and thereby the Raga. If a harmonic is identified as the fundamental frequency, the resulting swara pattern is wrong. In addition, the fundamental frequency indicates the singing range of the singer. For example, if the harmonic at 500 Hz is reported instead of the fundamental at 250 Hz, the singing range, which spans 3f/4 to 3f, is taken to be 375 Hz to 1500 Hz instead of 187.5 Hz to 750 Hz. Determining this error is therefore essential for establishing the singing range of the singer and for correct Raga identification. The harmonic error is estimated for the four singers from the fundamental frequencies shown in Figures 4 to 7. The results are tabulated and the performance chart is given in Figure 8.

As can be seen in the figure, the mutation based algorithm and the musicologists have a low rate of reporting a harmonic in place of the actual fundamental frequency when compared with YIN. For the singer Ilayaraja the singing range lies at a low fundamental frequency, between 200 Hz and 240 Hz, and hence the mutation algorithm also identified a harmonic of the fundamental frequency. In the YIN algorithm, where the lowest frequency is taken as the fundamental frequency, the probability of a harmonic being identified in place of the actual fundamental frequency is high.
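One way to compute a harmonic frequency estimation error is sketched below. It is an interpretation, since the text defines the measure only verbally: an estimate counts as a harmonic error when it lies within a relative tolerance of 2, 3 or 4 times the reference fundamental. The tolerance, the highest harmonic considered and the function name are all assumptions.

```python
def harmonic_estimation_error(estimates_hz, reference_hz,
                              tolerance=0.03, max_harmonic=4):
    """Fraction of estimates lying near an integer multiple (>= 2) of the
    reference fundamental frequency.

    tolerance is relative to the harmonic number; both it and max_harmonic
    are illustrative choices, not values from the paper.
    """
    if not estimates_hz:
        raise ValueError("at least one estimate is required")
    harmonic_hits = 0
    for estimate in estimates_hz:
        ratio = estimate / reference_hz
        nearest = round(ratio)
        if (2 <= nearest <= max_harmonic
                and abs(ratio - nearest) <= tolerance * nearest):
            harmonic_hits += 1
    return harmonic_hits / len(estimates_hz)


if __name__ == "__main__":
    # Hypothetical estimates for a singer with a 250 Hz reference.
    print(harmonic_estimation_error([250, 500, 255, 750], 250))
```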



Figure 8 Harmonic Frequency Estimation Error

5.1.2 Absolute difference between mean values (ABDM)

The absolute difference (in Hz) between the mean value of the reference fundamental frequency, which is the normal singing range of the singer, and the mean value of the estimated fundamental frequency is computed as given in [16]:

ABDM[Hz] = abs{ MeanRefPitch[Hz] – MeanEstPitch[Hz] }.

The average fundamental frequencies estimated by the mutation algorithm, the YIN algorithm and the musicologists were determined, and the reference pitch was chosen as 400 Hz, 320 Hz, 300 Hz and 250 Hz for Nithyasree, M. S. Subbulakshmi, Balamuralikrishna and Ilayaraja respectively. The reference pitch was chosen by observing each singer’s normal range of singing. The resulting absolute difference between the mean values is given in Figure 9.

Figure 9 Absolute difference between mean values

It was observed that for the singers Nithyasree and M. S. Subbulakshmi the estimates of the YIN algorithm and the mutation algorithm were comparable, while for the singer Balamuralikrishna the YIN algorithm gave a larger difference between the computed value and the reference value.

5.1.3 Absolute difference between standard deviations (AbsStdDiff)

The absolute difference (in Hz) between the standard deviation of the reference fundamental frequency and that of the estimated fundamental frequency is computed as given in [16]:

AbsStdDiff[Hz] = abs{ StdRef[Hz] – StdEst[Hz] }.
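The two difference measures of Sections 5.1.2 and 5.1.3 translate directly into code. The sketch below uses NumPy's default population standard deviation; whether the original evaluation used the population or the sample form is not stated, so that choice is an assumption, as are the function names.

```python
import numpy as np


def abdm(reference_pitch_hz, estimated_pitch_hz):
    """Absolute difference between mean reference and mean estimated pitch."""
    return float(abs(np.mean(reference_pitch_hz)
                     - np.mean(estimated_pitch_hz)))


def abs_std_diff(reference_pitch_hz, estimated_pitch_hz):
    """Absolute difference between the two standard deviations.

    Assumption: population standard deviation (NumPy default, ddof=0).
    """
    return float(abs(np.std(reference_pitch_hz)
                     - np.std(estimated_pitch_hz)))


if __name__ == "__main__":
    # Hypothetical values: a 400 Hz reference and three estimates.
    reference = [400.0, 400.0, 400.0]
    estimated = [390.0, 410.0, 415.0]
    print(abdm(reference, estimated), abs_std_diff(reference, estimated))
```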

Figure 10 Absolute differences between standard deviation

The mean values and standard deviations mentioned above are computed on the whole of the reference and estimated F0 data respectively. The absolute difference between the standard deviations is plotted in Figure 10. It is observed that the YIN algorithm deviated to a greater extent than the other two, since it returned a harmonic of the lowest frequency in most situations for all the singers. The values given by the mutation algorithm and by the musicologists were almost the same for all the singers.

6. Conclusion and Future Work

In this work a new system for identifying the fundamental frequency of Carnatic music songs, and of film music songs based on Carnatic music, has been implemented and tested. The algorithm was compared with the existing YIN algorithm. It was observed that the YIN algorithm always took the lowest frequency component of the input signal as the fundamental frequency, which is not always the case in Carnatic music. Identification of the fundamental frequency is an essential module for signal processing applications in speech and music. It is even more important for Carnatic music processing because of the just tempered behaviour of the system. In a just tempered system the frequencies belonging to a Raga pattern depend on the fundamental frequency of the song, which depends on the time of rendering the song. Hence, once the fundamental frequency has been estimated, this module can be integrated with our existing algorithms for Raga identification and Singer identification. In our earlier system, the fundamental frequency was assumed to be 240 Hz and used to determine the raga of a given song. With the fundamental frequency module integrated, the fundamental frequency can be computed on the fly and hence the raga can be determined dynamically. Our earlier Singer identification system used a set of coefficients called CICC [9], which were based on the fundamental frequency. These coefficients can now be computed from the identified fundamental frequency, making the Singer identification system more robust.

References



[1] Arturo Camacho and John G. Harris, “A Sawtooth waveform inspired pitch estimator for speech and music”, Journal of the Acoustical Society of America, 2008.
[2] Prof. P. Sambamurthy, “South Indian Music”, Vol. 1–6, The Indian Music Publishing House, India.
[3] Alain de Cheveigne and Hideki Kawahara, “YIN, a fundamental frequency estimator for speech and music”, Journal of the Acoustical Society of America, 2002.
[4] Robert C. Maher and James Beauchamp, “Fundamental frequency of music signals using a two way mismatch procedure”, Journal of the Acoustical Society of America, 1994.
[5] Chunghsin Yeh, Axel Robel and Xavier Rodet, “Multiple fundamental frequency estimation of polyphonic music signals”, ICASSP 2005.
[6] A. P. Klapuri, “Multiple fundamental frequency estimation based on harmonicity and spectral smoothness”, IEEE Transactions on Speech and Audio Processing, 2003.
[7] Boris Doval and Xavier Rodet, “Fundamental frequency estimation and tracking using maximum likelihood harmonic matching and HMM”, ICASSP 1993.
[8] Yoshifumi Hara, Mitsuo Matsumoto and Kazunori Miyoshi, “Method for estimating pitch independently from power spectrum envelope for speech and musical signal”, J. Temporal Des. Arch. Environ. 9(1), December 2009.
[9] Rajeswari Sridhar and T.V. Geetha, “Raga Identification of Carnatic music for Carnatic music Information Retrieval”, International Journal of Recent Trends in Engineering, 2009.
[10] Rajeswari Sridhar and T.V. Geetha, “Music information retrieval of Carnatic songs based on Carnatic music singer identification”, IEEE International Conference on Computer and Electrical Engineering, 2008.
[11] Cristian Munteanu and Vasile Lazerescu, “Improving mutation capabilities in a real-coded genetic algorithm”, First European Workshop on Evolutionary Computation in Image Analysis and Signal Processing, EvoIASP '99.
[12] David Lu, “Automatic Music Transcription using Genetic algorithms and Electronic synthesis”, Thesis of David Lu.
[13] Gustavo Reis, Nuno Fonseca, Francisco Fernandez de Vega, Anibal Ferreira, “Hybrid Genetic algorithm based on gene fragment competition for polyphonic music transcription”, Springer LNCS, December 2008.
[14] Howard Ochman, “Neutral Mutations and Neural substitution in Bacterial Genomes”, Society for Molecular Biology and Evolution, 2003.
[15] Rabiner and Juang, “Fundamentals of Speech Recognition”, Prentice Hall Signal Processing Series, 1993.
[16] Bojan Kotnik, Harald Höge, Zdravko Kacic, “Evaluation of Pitch detection algorithms in adverse conditions”, Proc. 3rd International Conference on Speech Prosody, Dresden, Germany, pp. 149-152, 2006.
[17] Martino, J., Yves, L., “An Efficient F0 Determination Algorithm Based on the Implicit Calculation of the Autocorrelation of the Temporal Excitation Signal”, Proc. EUROSPEECH'99, Budapest, Hungary.
[18] Ying, G., Jamieson, H., Mitchell, C., “A Probabilistic Approach to AMDF Pitch Detection”, Proc. ICSLP 1996, Philadelphia, PA.
[19] Mert Bay, Andreas F. Ehmann, J. Stephen Downie, “Evaluation of Multiple-F0 Estimation and Tracking Systems”, 10th International Society for Music Information Retrieval 2009, Kobe, Japan.

Rajeswari Sridhar is a Sr. Lecturer in the Department of Computer Science and Engineering at Anna University, Chennai, India. She is currently doing her Ph.D. in the area of Carnatic music signal processing. She has nearly 10 publications to her credit and is interested in the areas of Signal Processing, Theoretical Computer Science and Language Technologies.

S. Karthiga is a PG student of the Department of Computer Science and Engineering at Anna University, Chennai. Her areas of interest include Data Structures, Algorithms and Information Retrieval.

Dr. T. V. Geetha is a Professor in the Department of Computer Science and Engineering at Anna University, Chennai, India. She has more than 20 years of teaching experience, has produced 10 Ph.D. students so far and is guiding 11 students. She has more than 180 publications to her credit and is interested in the area of Tamil Computing. She is currently working on language technologies and managing sponsored and funded projects.



Modified Uniform Triangular Array for Online Full Azimuthal Coverage via JADE-MUSIC Algorithm over MIMO-CDMA Channel

Sami GHNIMI and Ali GHARSALLAH
Physics laboratory of the soft matter, Research unit: Circuits and Electronic systems HF, Tunis Elmanar University, 2092 Tunis, Tunisia

Abstract
This paper investigates a Modified Uniform Triangular Array (MUTA) to support online space-time MIMO-CDMA location-based services with full azimuthal coverage via the JADE-MUSIC algorithm. A new space-time lifting preprocessing (STLP) scheme is introduced to decorrelate coherent signals arriving through the dense/NLOS multipath MIMO channel before the JADE-MUSIC estimator is applied. Uniform-H-Array (UHA) and Uniform-X-Array (UXA) geometries are established for performance comparison with the proposed MUTA. Computer simulations in the Matlab environment illustrate the performance of online joint angle/delay estimation with a MUTA-MIMO base station applying JADE-MUSIC in conjunction with the STLP scheme over the 360° azimuth region.

Keywords: MIMO-CDMA, fading multipath propagation, NLOS, array processing, coherent case, decorrelating scheme, JADE, MUSIC.

1. Introduction

Multiple-Signal-Classification (MUSIC) [1-2] is a popular super-resolution algorithm for location-based services. Its good near-far resistance and its high resolution capability, which is theoretically independent of the power of the multiple-access interference (MAI) [3], intersymbol interference (ISI) and noise effects, are important advantages over conventional estimation techniques. Its major drawbacks, on the other hand, are the high computational complexity stemming from the eigenvalue decomposition and its limited capability in Non-Line-of-Sight (NLOS), highly scattered multipath propagation conditions. Such a situation is usually encountered in the multiuser Multiple-Input Multiple-Output Code-Division Multiple-Access (MIMO-CDMA) channel [4-5]. The received multipath signals are always highly correlated (coherent). The space-time covariance matrix (STCM) of the incoming signals is therefore singular and nondiagonal. Furthermore, the large number of observations that the MUSIC algorithm demands from the antenna array makes it unattractive for real-time tracking of space-time channel parameters. For this reason, several preprocessing algorithms that aim at decorrelating the coherent signals have been proposed to support the MUSIC algorithm. The “Bi-directivity smoothing scheme” introduced by Marius Pesavento in [6], the “Spatial-Smoothing scheme” [7-8], and the “Modified Spatial-Smoothing scheme” [9] are the most recent examples. Initially, these algorithms were proposed for direction-of-arrival estimation techniques in the case of a uniform linear array (ULA) [10]. Thereafter, they were combined with some estimation algorithms for jointly estimating the AOAs and delays of coherent signals [11-14]. Unfortunately, because of the high spatial and temporal ambiguities arising from the coherent case over the MIMO-CDMA communication channel, these algorithms do not achieve good estimation accuracy in the space and time domains. Furthermore, they are restricted to the ULA geometry, which limits their capability for full azimuthal coverage [15].

In this paper, we present a new MUTA that achieves full coverage of the 360° azimuth region. The proposed MUTA is then used in conjunction with a new STLP decorrelating scheme to support online location-based services via the JADE-MUSIC (joint angle and delay estimation with MUSIC) algorithm. The proposed STLP scheme shows a good capability in resolving the spatial and temporal ambiguity stemming from the coherent case through the MIMO-CDMA communication channel. Thus, it achieves a high decorrelation capability based on few data observation snapshots.

The rest of the paper is organized as follows. The MIMO-CDMA system model is presented in Section 2. Section 3 presents the proposed JADE-MUSIC estimation method and gives details of the STLP scheme and the MUTA geometry. Computer simulations are described in Section 4 to illustrate the performance of joint AOA/delay estimation with the MUSIC algorithm in conjunction with the STLP scheme and MUTA to support online full azimuthal coverage. Finally, some conclusions are drawn in Section 5.

2. MIMO-CDMA System Model

Let us consider the uplink of an $M$-user asynchronous (16_QAM) MIMO-CDMA communication system operating in a multipath propagation environment. A Symbol-Rate-Maximization-Scheme (SRMS) is assumed to be employed through the MIMO channel. With this scheme, each user employs $\bar{N}_i$ transmit antennas, whereas the base station has an array of $N$ antennas. The transmitted signal from the jth antenna element of the ith user is assumed to arrive at the receiver via $K_{ij}$ multipaths. The kth path due to the jth transmit antenna of the ith user departs in the direction having azimuth and elevation angles $(\theta_{ij}, \varphi_{ij})$ and arrives at the base station receiver from the azimuth direction $\theta_{ijk}$, with channel propagation parameters $\beta_{ijk}$ and $\tau_{ijk}$ representing the fading coefficient and the path delay, respectively [16]. The model in reception is simplified and only the one-dimensional case $(\theta, 0)$, i.e. zero elevation, is taken into account.

The overall continuous-time baseband received signal-vector $X(t)$ due to the $M$ users can hence be formulated as

$$X(t)=\sum_{i=1}^{M}\sum_{j=1}^{\bar{N}_i}\sum_{k=1}^{K_{ij}} S(\theta_{ijk})\,\operatorname{diag}(\beta_{ijk})\,m_{ij}(t-\tau_{ijk})+N(t) \tag{1}$$

For notational convenience, the received signal can be rewritten in a more compact form as

$$X(t)=\sum_{i=1}^{M}\mathbf{S}_i\,\mathbf{B}_i\,\mathbf{M}_i(t)+N(t)\in\mathbb{C}^{N} \tag{2}$$

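where, for $i = 1, 2, \ldots, M$,

$$\begin{aligned}
\mathbf{S}_i &= \big[\mathbf{S}_{i1},\ \mathbf{S}_{i2},\ \ldots,\ \mathbf{S}_{i\bar{N}_i}\big]\in\mathbb{C}^{N\times K_i}\\
\mathbf{B}_i &= \big[\operatorname{diag}(\beta_{i1})^T,\ \operatorname{diag}(\beta_{i2})^T,\ \ldots,\ \operatorname{diag}(\beta_{i\bar{N}_i})^T\big]^T\in\mathbb{C}^{K_i\times K_i}\\
\mathbf{M}_i(t) &= \big[\mathbf{M}_{i1}(t)^T,\ \mathbf{M}_{i2}(t)^T,\ \ldots,\ \mathbf{M}_{i\bar{N}_i}(t)^T\big]^T\in\mathbb{C}^{K_i}\\
K_i &= \sum_{j=1}^{\bar{N}_i} K_{ij}
\end{aligned} \tag{3}$$

and, for $j = 1, 2, \ldots, \bar{N}_i$,

$$\begin{aligned}
\mathbf{S}_{ij} &= \big[S(\theta_{ij1}),\ S(\theta_{ij2}),\ \ldots,\ S(\theta_{ijK_{ij}})\big]\in\mathbb{C}^{N\times K_{ij}}\\
\beta_{ij} &= \big[\beta_{ij1},\ \beta_{ij2},\ \ldots,\ \beta_{ijK_{ij}}\big]^T\in\mathbb{C}^{K_{ij}}\\
\mathbf{M}_{ij}(t) &= \big[m_{ij}(t-\tau_{ij1}),\ m_{ij}(t-\tau_{ij2}),\ \ldots,\ m_{ij}(t-\tau_{ijK_{ij}})\big]^T\in\mathbb{C}^{K_{ij}}
\end{aligned} \tag{4}$$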

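with

$$m_{ij}(t)=S_i(\theta_{ij},\varphi_{ij})\sum_{n} I_{ij}[n]\;c_{PN,i}(t-nT_{cs}) \tag{5}$$

where $\{I_{ij}[n]=\pm(2m-1)\pm j(2m-1),\ n\in\mathbb{Z}\}$, with $m = 1, 2, \ldots$ running over the amplitude levels of the constellation, denotes the ith user's sequence of M_QAM channel symbols transmitted by its jth antenna element during the nth channel symbol period $T_{cs}$, and

$$\begin{aligned}
S_i(\theta_{ij},\varphi_{ij}) &= \exp\!\big(-j\,\mathbf{r}_i^{T}\,k_{ij}\big)\\
S(\theta_{ijk}) &= \exp\!\big(-j\,\mathbf{r}^{T}\,k_{ijk}\big)
\end{aligned} \tag{6}$$

represent, respectively, the space array steering vector of the transmitting mobile terminal associated with the ith user and the space array steering vector associated with the kth path from the jth transmit antenna of the ith user at the receiving base station.

$\mathbf{r}_i=[r_{i1}, r_{i2}, \ldots, r_{i\bar{N}_i}]=[r_{xi}, r_{yi}, r_{zi}]^T\in\mathbb{R}^{3\times\bar{N}_i}$ and $k_{ij}=\frac{2\pi F_c}{c}\,[\cos\theta_{ij}\cos\varphi_{ij},\ \sin\theta_{ij}\cos\varphi_{ij},\ \sin\varphi_{ij}]^T$ are the transmit sensor location matrix and the wave number vector pointing towards the direction of departure (DOD) with azimuth and elevation angles $(\theta_{ij},\varphi_{ij})$.

$\mathbf{r}=[r_1, r_2, \ldots, r_N]=[r_x, r_y, r_z]^T\in\mathbb{R}^{3\times N}$ and $k_{ijk}=\frac{2\pi F_c}{c}\,[\cos\theta_{ijk},\ \sin\theta_{ijk},\ 0]^T$ are the receiver sensor location matrix and the wave number vector pointing towards the AOA with azimuth direction $\theta_{ijk}$.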

$c_{PN,i}(t)$ denotes one period of the PN spreading waveform associated with the ith user and applied across all its transmitting antenna elements.

The noise vector $N(t)$ consists of $N$ independent zero-mean complex Gaussian components with

$$E\{N(t)\,N(t)^H\}=\sigma_n^2\,\mathbf{I}_N \tag{7}$$

where $\sigma_n^2$ is the power of the narrow-band noise.
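As an illustration of the receive-side steering vector and wave number vector defined above, the following minimal sketch evaluates $S(\theta)=\exp(-j\,\mathbf{r}^T k(\theta))$ for a small array. The negative exponent sign, the free-space propagation velocity, the 2.4 GHz carrier and the three-element half-wavelength line array are assumptions chosen for the example; the function names are hypothetical and not taken from the paper.

```python
import cmath
import math

C_LIGHT = 3.0e8  # propagation velocity c in m/s (assumed free-space value)


def wavenumber_vector(azimuth_deg, carrier_hz):
    """Wave number vector for an azimuth-only arrival (elevation fixed to 0)."""
    theta = math.radians(azimuth_deg)
    scale = 2.0 * math.pi * carrier_hz / C_LIGHT
    return (scale * math.cos(theta), scale * math.sin(theta), 0.0)


def steering_vector(sensor_xyz, azimuth_deg, carrier_hz):
    """S(theta) = exp(-j r^T k(theta)); one complex entry per sensor.

    The minus sign is an assumed convention.
    """
    k = wavenumber_vector(azimuth_deg, carrier_hz)
    return [cmath.exp(-1j * sum(r_c * k_c for r_c, k_c in zip(r, k)))
            for r in sensor_xyz]


# Hypothetical 3-element line array along x with half-wavelength spacing.
carrier = 2.4e9
half_wavelength = C_LIGHT / carrier / 2.0
sensors = [(n * half_wavelength, 0.0, 0.0) for n in range(3)]

s_broadside = steering_vector(sensors, 90.0, carrier)  # arrival normal to the line
s_endfire = steering_vector(sensors, 0.0, carrier)     # arrival along the line
```

For the broadside arrival all sensors see the same phase, while the end-fire arrival produces a phase step of $\pi$ between neighbouring sensors, which is the expected behaviour for half-wavelength spacing.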

The signal component of the received signal-vector $X(t)$ at the kth receiving antenna element, $k = 1, 2, \ldots, N$, is sampled at a constant sampling rate $F_s = 1/T_c$ and then passed through a Tapped-Delay-Line (TDL) of length $2N_c$ time slots. In total, a bank of $N$ TDLs is available at the front-end of the receiving antenna array [17]. A chip-matched filter may be employed at the input of each receiving antenna element. The $2N_c$-dimensional discretised output frame of the kth TDL at the $n_o$-th observation period is denoted by $x_k[n_o]$, and the total discretised signal is represented by the complex matrix $\mathbf{X}[n]$,

$$\begin{aligned}
\mathbf{X}[n] &= \big[x_1[n]^T,\ x_2[n]^T,\ \ldots,\ x_N[n]^T\big]^T\in\mathbb{C}^{2NN_c\times L_{obs}}\\
x_k[n] &= \big[x_k[1],\ x_k[2],\ \ldots,\ x_k[L_{obs}]\big]\in\mathbb{C}^{2N_c\times L_{obs}}\\
&\text{with } k = 1, 2, \ldots, N,\quad n_o = 1, 2, \ldots, L_{obs}
\end{aligned} \tag{8}$$

$L_{obs}$ denotes the observation length, i.e. the number of snapshots. Using matrix notation, the discretised received signal can be expressed as a linear combination of the space-time array steering matrix $\mathbf{H}_i$, the matrix of fading coefficients $\mathbf{B}_i$, the message signal matrix $\mathbf{M}_i$ associated with the ith user, and the noise matrix, as follows:

$$\mathbf{X}[n]=\sum_{i=1}^{M}\mathbf{H}_i\,\mathbf{B}_i\,\mathbf{M}_i[n]+\mathbf{N}[n]\in\mathbb{C}^{2NN_c\times L_{obs}} \tag{9}$$

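with

$$\begin{aligned}
\mathbf{H}_i &= \big[\mathbf{h}_{i1},\ \mathbf{h}_{i2},\ \ldots,\ \mathbf{h}_{i\bar{N}_i}\big]\in\mathbb{C}^{2NN_c\times K_i}\\
\mathbf{h}_{ij} &= \big[S(\theta_{ij1})\otimes\mathbf{J}^{l_{ij1}}c_i,\ S(\theta_{ij2})\otimes\mathbf{J}^{l_{ij2}}c_i,\ \ldots,\ S(\theta_{ijK_{ij}})\otimes\mathbf{J}^{l_{ijK_{ij}}}c_i\big]\in\mathbb{C}^{2NN_c\times K_{ij}}\\
E\{\mathbf{N}[n]\,\mathbf{N}[n]^H\} &= \sigma_n^2\,\mathbf{I}_{2NN_c}
\end{aligned} \tag{10}$$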

$l_{ijk}$ is the discrete version of the path delay $\tau_{ijk}=\rho_{ijk}/c$, i.e.

$$l_{ijk}=\left(\frac{\tau_{ijk}}{T_c}\right)\bmod N_c \tag{11}$$

$c_i$ represents one period of the PN-sequence of the ith user padded with $N_c$ zeros at the end. $\rho_{ijk}$ is the total length of the kth path due to the jth transmit antenna of the ith user and $c$ is the propagation velocity. $S(\theta_{ijk})$ can be considered the space array steering vector, or spatial signature, and $\mathbf{J}^{l_{ijk}}c_i$ the time steering vector, or temporal signature, of the kth path due to the jth transmit antenna of the ith user. $\mathbf{J}$ is a $2N_c\times 2N_c$ shift operator matrix. These quantities have the following expressions:

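$$\begin{aligned}
c_i &= \big[\alpha_i[0],\ \alpha_i[1],\ \ldots,\ \alpha_i[N_c-1],\ \mathbf{0}_{N_c}^T\big]^T\in\mathbb{C}^{2N_c}\\
\mathbf{J} &= \begin{bmatrix}\mathbf{0}_{2N_c-1}^T & 0\\ \mathbf{I}_{2N_c-1} & \mathbf{0}_{2N_c-1}\end{bmatrix}
=\begin{bmatrix}
0 & 0 & \cdots & 0 & 0\\
1 & 0 & \cdots & 0 & 0\\
0 & 1 & \cdots & 0 & 0\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & 1 & 0
\end{bmatrix}\in\mathbb{C}^{2N_c\times 2N_c}
\end{aligned} \tag{12}$$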
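The temporal signature $\mathbf{J}^{l}c_i$ can be sketched in a few lines: the code period is padded with $N_c$ zeros and the down-shift operator is applied $l$ times. The three-chip code below is a toy sequence chosen for readability (it is not a Gold sequence), and the helper names are hypothetical.

```python
def pad_code(code, n_c):
    """Append N_c zeros to one period of the PN code (length becomes 2*N_c)."""
    assert len(code) == n_c
    return list(code) + [0] * n_c


def shift_matrix(size):
    """Down-shift operator J: ones on the first sub-diagonal, zeros elsewhere."""
    return [[1 if row == col + 1 else 0 for col in range(size)]
            for row in range(size)]


def mat_vec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def delayed_code(code, n_c, delay):
    """Temporal signature J^l c, obtained by applying J `delay` times."""
    vec = pad_code(code, n_c)
    j = shift_matrix(2 * n_c)
    for _ in range(delay):
        vec = mat_vec(j, vec)
    return vec


toy_code = [1, 1, -1]                     # hypothetical 3-chip code
signature = delayed_code(toy_code, 3, 2)  # path delayed by l = 2 chips
```

Each application of $\mathbf{J}$ moves the code one slot down the tapped delay line and feeds a zero in at the top, so a delay of two chips simply places the code in slots 3 to 5 of the six-slot frame.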

3. Proposed JADE-MUSIC Estimation Method

3.1 Proposed Space-time Lifting Preprocessing (STLP) Scheme

Applying the maximum-likelihood estimation technique [18], we obtain the second-order statistics of $\mathbf{X}[n]$, referred to as the practical covariance matrix, for a finite observation interval of $L_{obs}$ snapshots as follows:

$$\hat{\mathbf{R}}_{XX}=\frac{1}{L_{obs}}\sum_{n=n_0}^{n_0+L_{obs}-1}\mathbf{X}[n]\,\mathbf{X}[n]^H \tag{13}$$
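The practical covariance matrix is a plain average of snapshot outer products. The sketch below shows this for a list of snapshot vectors; the two-element snapshots are made-up numbers for illustration, and `sample_covariance` is a hypothetical helper name.

```python
def sample_covariance(snapshots):
    """Return (1/L) * sum_n x[n] x[n]^H for equal-length complex snapshots."""
    dim = len(snapshots[0])
    count = len(snapshots)
    cov = [[0j] * dim for _ in range(dim)]
    for x in snapshots:
        for r in range(dim):
            for c in range(dim):
                # Outer product entry x[r] * conj(x[c]), averaged over snapshots.
                cov[r][c] += x[r] * x[c].conjugate() / count
    return cov


# Two made-up snapshots of a 2-element observation vector.
snaps = [[1 + 1j, 2j], [1 - 1j, 1 + 0j]]
r_hat = sample_covariance(snaps)
```

The result is Hermitian by construction, which is the property the later eigenvalue decomposition relies on.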

Before the proposed iterative space-time lifting decorrelating scheme is applied, a preprocessing step is carried out. It simply moves the temporal dimension of $\hat{\mathbf{R}}_{XX}$ to the frequency domain. This transformation leads to an equivalent Vandermonde structure that is exploited in the space-time lifting scheme. The preprocessed covariance matrix is defined as follows:

$$\hat{\mathbf{R}}_P=\mathbf{O}_P\,\hat{\mathbf{R}}_{XX}\,\mathbf{O}_P^H=\tilde{\mathbf{H}}\,\mathbf{B}\,\mathbf{R}_{MM}\,\mathbf{B}^H\,\tilde{\mathbf{H}}^H+\sigma_n^2\,\mathbf{I}_{2NN_c} \tag{14}$$

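with

$$\begin{aligned}
\tilde{\mathbf{H}} &= \big[S(\theta_{1j1})\otimes\Phi^{l_{1j1}},\ \ldots,\ S(\theta_{1jK_{1j}})\otimes\Phi^{l_{1jK_{1j}}},\ \ldots,\ S(\theta_{Mj1})\otimes\Phi^{l_{Mj1}},\ \ldots,\ S(\theta_{MjK_{Mj}})\otimes\Phi^{l_{MjK_{Mj}}}\big]\\
\mathbf{R}_{MM} &= E\{\mathbf{M}[n]\,\mathbf{M}[n]^H\}\in\mathbb{C}^{K\times K},\qquad K=\sum_{i=1}^{M}K_i
\end{aligned} \tag{15}$$

where $\Phi^{l_{ijk}}$ denotes the temporal signature of the path with discrete delay $l_{ijk}$ after preprocessing.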

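$\mathbf{O}_P$ is the preprocessing operator, such that

$$\mathbf{O}_P=\mathbf{I}_N\otimes\big(\operatorname{diag}(\mathbf{F}\,c_d)^{-1}\,\mathbf{F}\big) \tag{16}$$

$\mathbf{F}$ is the equivalent Vandermonde structure of the Fourier transformation matrix,

$$\mathbf{F}=\begin{bmatrix}
\phi^{0} & \phi^{0} & \phi^{0} & \cdots & \phi^{0}\\
\phi^{0} & \phi^{1} & \phi^{2} & \cdots & \phi^{2N_c-1}\\
\phi^{0} & \phi^{2} & \phi^{4} & \cdots & \phi^{2(2N_c-1)}\\
\vdots & & & \ddots & \vdots\\
\phi^{0} & \phi^{2N_c-1} & \phi^{2(2N_c-1)} & \cdots & \phi^{(2N_c-1)(2N_c-1)}
\end{bmatrix},\qquad\text{with }\phi=e^{-j\pi/N_c} \tag{17}$$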
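The effect of the Fourier preprocessing on a single temporal signature can be checked numerically. A minimal sketch, assuming $\phi=e^{-j2\pi/(2N_c)}$ (the sign of the exponent is an assumed convention) and a toy three-chip code whose spectrum has no zeros: dividing the transform of the delayed code by the transform of the undelayed code leaves the Vandermonde vector $[1, \phi^{l}, \phi^{2l}, \ldots]$. The function names are hypothetical.

```python
import cmath


def dft_matrix(size):
    """Fourier matrix with entries phi^(row*col), phi = exp(-2j*pi/size)."""
    phi = cmath.exp(-2j * cmath.pi / size)
    return [[phi ** (row * col) for col in range(size)] for row in range(size)]


def mat_vec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def preprocess(signature, padded_code):
    """Apply diag(F c)^-1 F to a temporal signature (element-wise division)."""
    f = dft_matrix(len(padded_code))
    code_spectrum = mat_vec(f, padded_code)
    signature_spectrum = mat_vec(f, signature)
    return [s / c for s, c in zip(signature_spectrum, code_spectrum)]


n_c = 3
padded_code = [1, 1, -1, 0, 0, 0]  # toy code padded with N_c zeros
delay = 2
# Down-shifting by `delay` slots, equivalent to applying the shift operator twice.
signature = [0] * delay + padded_code[:len(padded_code) - delay]

lifted = preprocess(signature, padded_code)
phi = cmath.exp(-2j * cmath.pi / (2 * n_c))
vandermonde = [phi ** (delay * m) for m in range(2 * n_c)]
```

The identity holds because the zero padding makes a shift of up to $N_c$ slots behave like a circular shift, which the Fourier matrix turns into a pure phase ramp. It breaks down if any entry of $\mathbf{F}c$ is zero, which is why the toy code was chosen with a zero-free spectrum.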

Once the preprocessed covariance matrix is obtained, the proposed STL scheme can be carried out.

Let us consider a ULA-MIMO base station with $N$ receiving antennas, and assume that a bank of $N$ TDLs, each of length $2N_c$ time slots, is available at the front-end of the ULA.

The main goal of the proposed space-time lifting (STL) scheme is to track the space-time channel parameters of coherent signals within the spatial and temporal domains related to the multiple antennas of the ULA and the multiple time slots within the TDLs. The basic idea is to exploit the spatial and temporal redundancies arising from multiple antennas and multiple slots (by inter-sensors/slots and/or intra-sensors/slots tracking (ISST)) to isolate and identify the varying space-time channel parameters through the preprocessed signal covariance matrix.

As in the “Spatial-Smoothing scheme”, the $N$-element ULA is divided into $N_S$ overlapping subarrays, each of size $L_S$: sensors $\{1, 2, \ldots, L_S\}$ form the first subarray, sensors $\{2, 3, \ldots, L_S+1\}$ form the second subarray, and sensors $\{N_S, N_S+1, \ldots, N\}$ form the $N_S$-th subarray. The TDL associated with the jth receiving antenna element of the sth subarray ($s = 1, 2, \ldots, N_S$ and $j = 1, 2, \ldots, L_S$) is divided into $N_T$ sub-TDLs, each of size $L_T$ time slots: the first sub-TDL stores the signal samples $\{1, 2, \ldots, L_T\}$, the second stores the samples $\{2, 3, \ldots, L_T+1\}$, and the $N_T$-th sub-TDL stores the samples $\{N_T, N_T+1, \ldots, 2N_c\}$. $L_S$ and $L_T$ are considered the numbers of spatial and temporal shifts, respectively. As a result, $N_T$ temporally lifted preprocessed covariance sub-matrices are obtained within the $L_S^2$ submatrices associated with the sth subarray formed by the receiving antenna elements $\{s, s+1, \ldots, s+L_S-1\}$, with $s = 1, 2, \ldots, N_S$. Finally, the overall spatial-temporal lifted preprocessed covariance matrix is obtained as follows:

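$$\begin{aligned}
\hat{\mathbf{R}}_{STLP} &= \frac{1}{N_S N_T}\sum_{s=1}^{N_S}\sum_{t=1}^{N_T}\hat{\mathbf{R}}_{STL,s,t}\\
&= \tilde{\mathbf{H}}_{STL,1,1}\left[\frac{1}{N_S N_T}\sum_{s=1}^{N_S}\sum_{t=1}^{N_T}\mathbf{D}_{S,s}\,\mathbf{D}_{T,t}\,\mathbf{B}\,\mathbf{R}_{MM}\,\mathbf{B}^H\,\mathbf{D}_{T,t}^H\,\mathbf{D}_{S,s}^H\right]\tilde{\mathbf{H}}_{STL,1,1}^H+\sigma_n^2\,\mathbf{I}_{L_S L_T}
\end{aligned} \tag{18}$$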

with the diagonal spatial and temporal lifting matrices

$$\mathbf{D}_{S,s}=\operatorname{diag}(\,\cdot\,),\qquad \mathbf{D}_{T,t}=\operatorname{diag}(\,\cdot\,) \tag{19}$$

whose diagonal entries are built, for all paths from $(1j1)$ to $(MjK_{Mj})$, from the subarray spatial signatures $S_{S,s}(\theta_{ijk})$ and from the sub-TDL temporal signatures $\Phi^{l_{ijk}}_{T,t}$, respectively.

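where

$$\begin{aligned}
\tilde{\mathbf{H}}_{STL,s,t} &= \big[S_{S,s}(\theta_{1j1})\otimes\Phi^{l_{1j1}}_{T,t},\ \ldots,\ S_{S,s}(\theta_{1jK_{1j}})\otimes\Phi^{l_{1jK_{1j}}}_{T,t},\ \ldots,\ S_{S,s}(\theta_{Mj1})\otimes\Phi^{l_{Mj1}}_{T,t},\ \ldots,\ S_{S,s}(\theta_{MjK_{Mj}})\otimes\Phi^{l_{MjK_{Mj}}}_{T,t}\big]\\
S_{S,s}(\theta_{ijk}) &= \exp\!\big(-j\,[r_s,\ r_{s+1},\ \ldots,\ r_{s+L_S-1}]^T\,k_{ijk}\big)\\
\Phi^{l_{ijk}}_{T,t} &= \big[\phi^{\,t\,l_{ijk}},\ \phi^{\,(t+1)\,l_{ijk}},\ \ldots,\ \phi^{\,(t+L_T-1)\,l_{ijk}}\big]^T,\qquad \phi=e^{-j\pi/N_c}
\end{aligned} \tag{20}$$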
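Why averaging over shifted subarrays decorrelates coherent paths can be seen on a deliberately small case. The sketch below covers only the spatial half of the lifting (classical forward spatial smoothing, with no temporal sub-TDLs), for two perfectly coherent paths on a hypothetical three-sensor half-wavelength ULA with noise ignored; the angles, the relative fading coefficient of 0.7 and the helper names are assumptions made for the example.

```python
import cmath
import math


def ula_response(num_sensors, azimuth_deg):
    """Half-wavelength ULA response; the sign convention is an assumption."""
    phase = math.pi * math.cos(math.radians(azimuth_deg))
    return [cmath.exp(-1j * phase * n) for n in range(num_sensors)]


def outer(vec):
    """Outer product vec vec^H as a nested list."""
    return [[a * b.conjugate() for b in vec] for a in vec]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def smoothed_covariance(snapshot, sub_len):
    """Average the covariance of all overlapping subarrays of size sub_len."""
    num_sub = len(snapshot) - sub_len + 1
    acc = [[0j] * sub_len for _ in range(sub_len)]
    for s in range(num_sub):
        block = outer(snapshot[s:s + sub_len])
        for r in range(sub_len):
            for c in range(sub_len):
                acc[r][c] += block[r][c] / num_sub
    return acc


# Two coherent paths: the second is a scaled copy of the same waveform.
a1 = ula_response(3, 40.0)
a2 = ula_response(3, 110.0)
snapshot = [p + 0.7 * q for p, q in zip(a1, a2)]

single = outer(snapshot[:2])                # one subarray only: rank one
smoothed = smoothed_covariance(snapshot, 2) # average over two shifted subarrays

det_single = abs(det2(single))
det_smoothed = abs(det2(smoothed))
```

A single subarray yields a rank-one covariance (zero determinant), so the two paths cannot be separated. After averaging over the two shifted subarrays the determinant is clearly non-zero, meaning the signal subspace has regained dimension two.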

3.2 Proposed JADE-MUSIC Algorithm

Referring to Fig. 1, the proposed JADE-MUSIC algorithm is based on the eigenvalue decomposition of the spatial-temporal lifted preprocessed covariance matrix $\hat{\mathbf{R}}_{STLP}$ provided by the space-time lifting preprocessing scheme. The eigenvectors of $\hat{\mathbf{R}}_{STLP}$ are separated into two orthogonal subspaces, called the signal subspace $E^{(Signal)}_{STLP}$ and the noise subspace $E^{(Noise)}_{STLP}$. If the eigenvectors that belong to the noise subspace $E^{(Noise)}_{STLP}$ are collected in the matrix $\hat{\mathbf{V}}_{noise}$, then the joint AOAs/delays of the incoming multipath signals can be estimated by locating the peaks of the JADE-MUSIC spatial-temporal spectrum given by

$$\xi_{MUSIC,d}(\theta,l)=\frac{1}{w_{d,STLMF}(\theta,l)^H\,\hat{\mathbf{V}}_{noise}\,\hat{\mathbf{V}}_{noise}^H\,w_{d,STLMF}(\theta,l)} \tag{21}$$

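Here $w_{d,STLMF}(\theta,l)$ is the space-time lifted matched filter (STLMF) beamformer associated with the desired user and used for scanning the spatial and temporal uncertainty regions. It is defined by the following expression:

$$\begin{aligned}
w_{d,STLMF}(\theta,l) &= S_{L_S}(\theta)\otimes\Phi^{l}_{L_T}\\
S_{L_S}(\theta) &= \exp\!\big(-j\,[r_1,\ r_2,\ \ldots,\ r_{L_S}]^T\,k(\theta,0)\big)\\
\Phi^{l}_{L_T} &= \big[\phi^{\,l},\ \phi^{\,2l},\ \ldots,\ \phi^{\,L_T l}\big]^T,\qquad \phi=e^{-j\pi/N_c}
\end{aligned} \tag{22}$$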
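The peak-search principle behind the MUSIC spectrum can be sketched without an eigenvalue solver in the simplest possible case. The sketch below is spatial-only and assumes a single noise-free path on a hypothetical four-sensor half-wavelength ULA, so that the noise-subspace projector $\hat{\mathbf{V}}_{noise}\hat{\mathbf{V}}_{noise}^H$ can be written directly as $\mathbf{I}-a_0a_0^H/N$ instead of being obtained from an eigendecomposition. This is a shortcut for illustration, not the procedure of the paper; all names are hypothetical.

```python
import cmath
import math


def ula_response(num_sensors, azimuth_deg):
    """Half-wavelength ULA response; the sign convention is an assumption."""
    phase = math.pi * math.cos(math.radians(azimuth_deg))
    return [cmath.exp(-1j * phase * n) for n in range(num_sensors)]


def music_spectrum(noise_projector, scan_vector):
    """Evaluate 1 / (w^H P w), where P is the noise-subspace projector."""
    n = len(scan_vector)
    quad = sum(scan_vector[r].conjugate() * noise_projector[r][c] * scan_vector[c]
               for r in range(n) for c in range(n))
    # Floor the denominator so an exact match does not divide by zero.
    return 1.0 / max(quad.real, 1e-12)


num_sensors = 4
true_azimuth = 60
a0 = ula_response(num_sensors, true_azimuth)

# Shortcut: with one noise-free path the signal subspace is span{a0}, so the
# noise projector is I - a0 a0^H / N (no eigendecomposition needed here).
noise_projector = [[(1.0 if r == c else 0.0) - a0[r] * a0[c].conjugate() / num_sensors
                    for c in range(num_sensors)] for r in range(num_sensors)]

scan = list(range(0, 121))  # one 120-degree scanning region, 1-degree steps
spectrum = [music_spectrum(noise_projector, ula_response(num_sensors, az))
            for az in scan]
peak_azimuth = scan[spectrum.index(max(spectrum))]
```

The scanning vector is orthogonal to the noise subspace only at the true direction, so the inverse of the projected energy peaks there. The joint angle/delay version scans the two-dimensional grid $(\theta, l)$ in the same way.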



Fig. 1 Proposed JADE-MUSIC Algorithm.

3.3 JADE-MUSIC Algorithm with Proposed MUTA

The geometries of the proposed MUTA as well as those of the UXA and UHA are depicted in Fig. 2.

Fig. 2 (a): Proposed MUTA, (b): UXA, (c):UHA.

The JADE-MUSIC estimation algorithm using the proposed MUTA is carried out in several steps.

Step 1:
- Data acquisition and sampling for ULA 1 to get $\mathbf{X}^{1}[n]$.
- Data acquisition and sampling for ULA 2 to get $\mathbf{X}^{2}[n]$.
- Data acquisition and sampling for ULA 3 to get $\mathbf{X}^{3}[n]$.

Step 2:
- Formation of the practical covariance matrix $\hat{\mathbf{R}}^{1}_{XX}$, and likewise $\hat{\mathbf{R}}^{2}_{XX}$ and $\hat{\mathbf{R}}^{3}_{XX}$ for ULAs 2 and 3.

Step 3:
- Space-time lifting preprocessing scheme for $\hat{\mathbf{R}}^{1}_{XX}$ to provide $\hat{\mathbf{R}}^{1}_{STLP}$.
- Space-time lifting preprocessing scheme for $\hat{\mathbf{R}}^{2}_{XX}$ to provide $\hat{\mathbf{R}}^{2}_{STLP}$.
- Space-time lifting preprocessing scheme for $\hat{\mathbf{R}}^{3}_{XX}$ to provide $\hat{\mathbf{R}}^{3}_{STLP}$.
- Carry out the STLMF beamformer $w_{d,STLMF}(\theta,l)$ to perform space-time scanning.

We proceeded with $N_S = 2$ overlapping subarrays, each of size $L_S = 3$ antenna elements, for spatial lifting and $N_T = 7$ sub-TDLs, each of size $L_T = 56$ time slots, for temporal lifting.

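Step 4:
- Compute the eigenvalue decomposition of $\hat{\mathbf{R}}^{1}_{STLP}$ to get the array eigenvalues $\hat\lambda^{1}_{1}, \hat\lambda^{1}_{2}, \hat\lambda^{1}_{3}, \ldots, \hat\lambda^{1}_{k}$ and deduce $\hat{\mathbf{V}}^{1}_{noise}$.
- Compute the eigenvalue decomposition of $\hat{\mathbf{R}}^{2}_{STLP}$ to get the array eigenvalues $\hat\lambda^{2}_{1}, \hat\lambda^{2}_{2}, \hat\lambda^{2}_{3}, \ldots, \hat\lambda^{2}_{k}$ and deduce $\hat{\mathbf{V}}^{2}_{noise}$.
- Compute the eigenvalue decomposition of $\hat{\mathbf{R}}^{3}_{STLP}$ to get the array eigenvalues $\hat\lambda^{3}_{1}, \hat\lambda^{3}_{2}, \hat\lambda^{3}_{3}, \ldots, \hat\lambda^{3}_{k}$ and deduce $\hat{\mathbf{V}}^{3}_{noise}$.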


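Step 5:
- Compute the JADE-MUSIC spatial-temporal spectrum $\xi^{1}_{MUSIC,d}(\theta,l)=\operatorname{inv}\!\big(w_{d,STLMF}(\theta,l)^H\,\hat{\mathbf{V}}^{1}_{noise}\,\hat{\mathbf{V}}^{1\,H}_{noise}\,w_{d,STLMF}(\theta,l)\big)$ for the space and time scanning regions $I^{1}_{S\,scan}=\{0, 1, \ldots, 120\}$ and $I_{T\,scan}=\{0\cdot T_c : 1\cdot T_c : (N_c-1)\cdot T_c\}$.
- Compute the JADE-MUSIC spatial-temporal spectrum $\xi^{2}_{MUSIC,d}(\theta,l)=\operatorname{inv}\!\big(w_{d,STLMF}(\theta,l)^H\,\hat{\mathbf{V}}^{2}_{noise}\,\hat{\mathbf{V}}^{2\,H}_{noise}\,w_{d,STLMF}(\theta,l)\big)$ for the space and time scanning regions $I^{2}_{S\,scan}=\{121, 122, \ldots, 240\}$ and $I_{T\,scan}=\{0\cdot T_c : 1\cdot T_c : (N_c-1)\cdot T_c\}$.
- Compute the JADE-MUSIC spatial-temporal spectrum $\xi^{3}_{MUSIC,d}(\theta,l)=\operatorname{inv}\!\big(w_{d,STLMF}(\theta,l)^H\,\hat{\mathbf{V}}^{3}_{noise}\,\hat{\mathbf{V}}^{3\,H}_{noise}\,w_{d,STLMF}(\theta,l)\big)$ for the space and time scanning regions $I^{3}_{S\,scan}=\{241, 242, \ldots, 360\}$ and $I_{T\,scan}=\{0\cdot T_c : 1\cdot T_c : (N_c-1)\cdot T_c\}$.
- Compute the total JADE-MUSIC spatial-temporal spectrum for the proposed MUTA, $\xi^{MUTA}_{MUSIC,d}(\theta,l)=\big[\xi^{1}_{MUSIC,d}(\theta,l),\ \xi^{2}_{MUSIC,d}(\theta,l),\ \xi^{3}_{MUSIC,d}(\theta,l)\big]$.
- Plot $\xi^{MUTA}_{MUSIC,d}(\theta,l)$ for $x=\{0, 1, \ldots, 360\}$ and $y=\{0\cdot T_c : 1\cdot T_c : (N_c-1)\cdot T_c\}$.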
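The last part of Step 5, in which the three per-ULA spectra are joined into one full-azimuth spectrum, amounts to a concatenation over the scanning regions listed above. A minimal sketch for a single delay bin, with made-up spectrum values and a hypothetical `combine_sectors` helper:

```python
# Scanning regions of ULA 1, 2 and 3 in degrees, as listed in Step 5.
SECTORS = [(0, 120), (121, 240), (241, 360)]


def combine_sectors(sector_spectra):
    """Join per-ULA azimuth spectra into one map {azimuth_deg: value}."""
    combined = {}
    for (start, stop), spectrum in zip(SECTORS, sector_spectra):
        assert len(spectrum) == stop - start + 1
        for offset, value in enumerate(spectrum):
            combined[start + offset] = value
    return combined


# Made-up spectra for one delay bin: a flat floor with one peak in
# sector 1 (azimuth 45) and a stronger one in sector 2 (azimuth 200).
s1 = [1.0] * 121
s1[45] = 50.0
s2 = [1.0] * 120
s2[200 - 121] = 80.0
s3 = [1.0] * 120

full = combine_sectors([s1, s2, s3])
strongest_azimuth = max(full, key=full.get)
```

In the joint angle/delay case the same concatenation is repeated for every delay bin of the temporal scanning region.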

Similarly to JADE-MUSIC with MUTA, the total JADE-MUSIC spatial-temporal spectrum for the UXA is computed as

$$\xi^{UXA}_{MUSIC,d}(\theta,l)=\Big[\xi^{X,I^{1}_{S,X}}_{MUSIC,d},\ \xi^{X,I^{2}_{S,X}}_{MUSIC,d},\ \xi^{X,I^{3}_{S,X}}_{MUSIC,d},\ \xi^{X,I^{4}_{S,X}}_{MUSIC,d}\Big]$$

with $I^{1}_{S,X}=\{0, \ldots, 90\}$, $I^{2}_{S,X}=\{91, \ldots, 180\}$, $I^{3}_{S,X}=\{181, \ldots, 270\}$ and $I^{4}_{S,X}=\{271, \ldots, 360\}$.

Furthermore, the total JADE-MUSIC spatial-temporal spectrum for the UHA is computed as

$$\xi^{UHA}_{MUSIC,d}(\theta,l)=\Big[\xi^{H,I^{1}_{S,H}}_{MUSIC,d},\ \xi^{H,I^{2}_{S,H}}_{MUSIC,d}\Big]$$

with $I^{1}_{S,H}=\{0, \ldots, 180\}$ and $I^{2}_{S,H}=\{181, \ldots, 360\}$.

For both the UXA and the UHA, the STLP scheme is carried out with $N_S = 2$ overlapping subarrays, each of size $L_S = 4$ antenna elements, for spatial lifting and $N_T = 5$ sub-TDLs, each of size $L_T = 58$ time slots, for temporal lifting.

4. Simulation Results

Computer simulations in the Matlab environment have been conducted to evaluate the joint AOA/delay estimation performance of the proposed JADE-MUSIC method using MUTA. The parameters used in the simulation are summarized in Table I.

TABLE I. MIMO-CDMA SYSTEM SIMULATION PARAMETERS

| System parameter | Notation | Value |
|---|---|---|
| Number of users | $M$ | 6 |
| Number of data symbols/user | $N_d$ | 500 |
| System modulation | Mod | 16_QAM |
| PN Gold-sequence length | $N_c$ | 31 |
| Chip period | $T_c$ | 200 ns |
| Chip rate | $1/T_c$ | 5 Mchips/s |
| Oversampling factor | $q$ | 1 |
| Sampling period | $T_s = T_c/q$ | 200 ns |
| Carrier frequency | $F_c$ | 2.4 GHz |
| Number of transmit antennas | $\bar{N}_i$ | 2 |
| Number of receive antennas | $N$ | 9 or 10 |

The space-time channel parameters for the desired user are set

to ccccc T.,,T.,,T.,,T.,,T., 251211018376041193203 for the 5 incoming multipaths associated with the first desired transmitted wave. For the second desired transmitted wave, they are

set to ccccc T.,,T.,,T.,,T.,,T., 24902520310300182451590 .

Fig. 3 displays the estimated joint azimuth-AOAs/delays of the 10 multipaths associated with the desired user, obtained by applying the proposed JADE-MUSIC algorithm via MUTA. The proposed STLP scheme in conjunction with MUTA makes it possible to free the desired user completely from MAI, ISI and noise effects. Thus, the desired azimuth-AOAs and delays are accurately estimated. All peaks are very narrow and coincide exactly with the real azimuth-AOA/delay values, even in the case of co-delayed, co-directional, close-delayed and close-directional coherent signals. This reveals the high capability and super-resolution of the proposed JADE-MUSIC estimation method, which make it suitable for online tracking of the space-time channel parameters of perfectly coherent multipath signals over the MIMO-CDMA channel.

We note that all the simulation results are obtained with only $L_{obs} = 10$ snapshots when forming the practical covariance matrix.



Fig. 3 Spatial-temporal 3D (upper plot) and 2D (lower plot) output pseudo spectrums of JADE-MUSIC algorithm with proposed MUTA.

Fig. 4 displays the output of the JADE-MUSIC algorithm applying UXA. The 10 desired multipaths are well resolved; however, two secondary peaks appear, which are marked by black crosses on the 2D pseudo-spectrum. The dimension of the desired signal subspace is estimated to be smaller than the real one. The projection of any space-time steering vector onto the noise subspace then results in ISI and MAI, which explains the appearance of these additional peaks.

The estimation results with the JADE-MUSIC algorithm via UHA are illustrated in Fig. 5. Only five peaks are clearly seen. The space-time channel parameters of some coherent signals were not resolved. Although we kept the same interference environment as in the previous simulations depicted in Fig. 3 and Fig. 4, the peaks provided by JADE-MUSIC via UHA are relatively broad and lower compared to those of JADE-MUSIC with UXA and the proposed MUTA.

Fig. 4 Spatial-temporal 3D (upper plot) and 2D (lower plot) output pseudo spectrums of JADE-MUSIC algorithm with UXA.

Fig. 5 Spatial-temporal 3D (upper plot) and 2D (lower plot) output pseudo spectrums of JADE-MUSIC algorithm with proposed UHA.

Fig. 6 displays the azimuth-AOA/delay root mean square error (RMSE) performance versus the Signal-to-Noise-Ratio (SNR) for the proposed JADE-MUSIC estimation method with different array geometries. It is clearly seen that the JADE-MUSIC algorithm with the proposed MUTA has high spatial and temporal resolution compared to JADE-MUSIC via UHA and ULA. The performance of UXA is comparable to that of MUTA under all SNR conditions, and the two are very close, especially for the delay parameters. Although the number of antenna elements in UHA and ULA is greater than that of MUTA and UXA, they do not provide the space-time channel parameters of all desired incoming multipaths. This reveals that the UHA and ULA are not suitable for full azimuthal tracking.
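For reference, the RMSE figure of merit used in this comparison can be computed as in the following minimal sketch. The true and estimated azimuth values are made-up numbers, not results from the paper, and the helper does not handle wrap-around of angles near 0°/360°.

```python
import math


def rmse(estimates, truths):
    """Root mean square error between paired estimates and true values."""
    assert estimates and len(estimates) == len(truths)
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up azimuth values in degrees for three paths.
true_azimuths = [30.0, 95.0, 210.0]
estimated_azimuths = [30.5, 94.0, 210.5]
azimuth_rmse = rmse(estimated_azimuths, true_azimuths)
```

The same function applies to the delay estimates, expressed for instance in units of the chip period.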



Fig. 6 Azimuth-AOA (upper plot) and delay (lower plot) RMSE estimation performance versus SNR for different array geometries.

5. Conclusions

A modified uniform triangular array (MUTA) in conjunction with a new STLP decorrelating scheme is proposed in this paper to support JADE-MUSIC online full azimuthal tracking of coherent signals over the MIMO-CDMA channel. With this STLP scheme and the MUTA geometry, the proposed JADE-MUSIC estimation method can achieve a high spatial and temporal decorrelating capability and ensure full coverage of the 360° azimuth region. Computer simulations show that the JADE-MUSIC algorithm with the proposed MUTA outperforms that with the UXA, UHA and ULA geometries.

References

[1] R. O. Schmidt, “Multiple emitter location and signal parameter estimation”, IEEE Trans. Antennas and Propagation, vol. AP-34, 1986, pp. 278–279.
[2] Jen-Der Lin, Wen-Hsien Fang and Jiunn-Tsair Chen, “Constrained TST MUSIC for joint spatio-temporal channel parameter estimation in DS/CDMA systems,” Wireless Communications and Mobile Computing, vol. 5, February 2005, pp. 57–67.
[3] Upamanyu Madhow, “Blind Adaptive Interference Suppression for Near-Far Resistant Acquisition and Demodulation of Direct-Sequence CDMA Signals”, IEEE Transactions on Signal Processing, vol. 45, 1997, pp. 287–297.
[4] Papakonstantinou and K. Slock, “Direct Location Estimation for MIMO Systems in Multipath Environments”, IEEE GLOBECOM 2008, 2008, pp. 1–5.
[5] Lionel Sacramento and Hamouda Walaa, “Multiuser Decorrelator Detectors in MIMO CDMA Systems over Nakagami Fading Channels,” IEEE Transactions on Wireless Communications, vol. 8, 2009, pp. 1944–1952.
[6] Marius Pesavento, Alex B. Gershman, Martin Haardt, “Unitary Root-MUSIC with a Real-Valued Eigendecomposition: A Theoretical and Experimental Performance Study”, IEEE Transactions on Signal Processing, vol. 4, 2000, pp. 1306–1314.
[7] Nikolakopoulos, K.V. Anagnostou, D. Christodoulou and C.G. Chryssomallis, “Estimation of direction of arrival for coherent signals in wireless communication systems,” IEEE Antennas and Propagation, vol. 1, June 2004, pp. 419–422.
[8] John S. Thompson, Peter M. Grant, Bernard Mulgrew, “Performance of Spatial Smoothing Algorithms for Correlated Sources”, IEEE Transactions on Signal Processing, vol. 4, 1996, pp. 1040–1046.
[9] K. V. S. Hari and B. V. Ramakrishnan, “Performance analysis of a modified spatial smoothing technique for direction estimation”, Signal Processing, vol. 79, November 1999, pp. 73–85.
[10] Jisheng Dai and Zhongfu Ye, “On Spatial Smoothing for DOA Estimation of Coherent Signals in the Presence of Unknown Mutual Coupling”, IET Signal Processing, vol. 3, December 2009, pp. 1–8.
[11] M. C. Vanderveen, C. B. Papadias and A. Paulraj, “Joint angle and delay estimation (JADE) for multipath signals arriving at an antenna array,” IEEE Communications, vol. 1, 1997, pp. 12–14.
[12] F. L. Liu, J. K. Wang, R. Y. Du and G. Yu, “Joint DOA-delay estimation based on space-time matrix method in wireless channel,” Proceeding of ISCIT2005, 2005, pp. 354–357.
[13] M. A. Hernandez, L. Genis and R. Calders, “Subspace based estimation of parameters and linear space-time multiuser detection for WCDMA systems,” Proc. IEEE Symp. on Spread Spectrum Tech. and Appli., 2000, pp. 249–253.
[14] M. Chenu-Tournier, P. Chevalier and J. . Barbot, “A parametric spatio-temporal channel estimation technique for FDD UMTS uplink,” Proc. IEEE Sensor Array and Multichannel Signal Processing Workshop, 2000, pp. 12–16.
[15] K. Maheswara Reddy and V. U. Reddy, “Analysis of Spatial Smoothing with Uniform Circular Arrays,” IEEE Transactions on Signal Processing, vol. 47, June 1999, pp. 1726–1730.
[16] A. Manikas, Differential Geometry in Array Processing. Imperial College Press, 2004.
[17] Neil J. Bershad, José Carlos M. Bermudez and Jean-Yves Tourneret, “Stochastic Analysis of the LMS Algorithm for System Identification With Subspace Inputs”, IEEE Transactions on Signal Processing, vol. 56, March 2008, pp. 1018–1027.
[18] M.H. Li and Y.L. Lu, “Improving the Performance of GA-ML DOA Estimator with a Resampling Scheme”, Elsevier-Signal Processing, vol. 84, October 2004, pp. 1813–1822.

Sami GHNIMI received the degree in electronic and instrumentation engineering in 2003 and the M.S. degree in electronics device from the Faculty of Sciences of Tunis, Tunisia, in 2005; He is currently working toward the Ph.D. degree in electrical and electronic engineering at the Faculty of Sciences of Tunis. Since 2006, he is a teacher assistant at the High School of Medical Technology of Tunis (HSMTT), Tunis ELMANAR University. His current research interests include MIMO system and multi-carrier technologies for the fourth generation of wireless communication. Ali GHARSALLAH received the degree in radio-electrical engineering from the Higher School of Telecommunication of Tunis in 1986 and the Ph.D. degree in 1994 from the National School Engineers of Tunis. Since 1991, he was with the Department of Physics at the Faculty of Sciences, Tunis. Since 2005, he is the Director of Engineers studies with the Ministry for the Higher Education, Scientific Research and Technology, Tunisian Republic. His current research interests include antennas, array signal processing, multilayered structures and microwave integrated circuits.

IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 4, No 7, July 2010 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 19

An Efficient Software Engineering Ontology Tool for Knowledge Sharing

Polala Niranjan Reddy, Kukatlapalli Pradeep Kumar HOD, Dept.of CSE, Kakatiya Institute of Technology and Science, Warangal, A.P, India, 506015

Asst. Professor, Dept. of ECM, Narayana Engineering College, Nellore, A.P, India, 524004

Abstract

Ontology is an important concept in Computer Science for formally representing knowledge in a way that software can process the knowledge and reason about it. The software engineering ontology assists in defining information for the exchange of semantic project information. It intends to clear up the ambiguities that occur in knowledge sharing between software engineers. This paper presents the basic ontological representations for a given software project that is developed using rudimentary software engineering principles. It aims at an ontology model of software engineering to represent its knowledge. This paper also discusses the analysis of the SE Ontology and its advantages/applications with example scenarios. Finally, a practical implementation of the in-depth ontological representation is presented at the end of the paper with appropriate illustrations.

Keywords: Software Engineering, Ontology Development, Multisite Software Development, Knowledge Sharing, Knowledge Engineering.

1. Introduction

With the advent of the Internet, software development in various fields has become easier and more convenient. Realizing the advantages of multisite software development, major MNCs and corporate sectors have moved their business to countries where employees work for lower salaries. Software development has increasingly focused on the Internet, which enables a multisite environment in which multiple teams residing across cities, regions, or countries work together in a networked, distributed fashion to develop the software. However, effective communication and coordination across multiple sites is extremely important for global software development. Team members, team leaders and managers, who respectively carry out, control and manage different tasks and activities, may not be located at the same site in a multisite environment.

Consider a scenario in which the team members work at one site and their team leader, who controls them and collects and integrates the completed modules for further enhancement of the software project, is at a different site. As the teams complete their respective modules, they send them to the team leader. They draft their conclusions on the completed module in their own form of representation, shaped by the culture, customs and traditions they follow in their day-to-day life. Because they work online, they may never have met face to face. So strict software engineering principles should be followed to achieve better communication among the teams and the team members. Incongruity in analysis, design, documentation, presentation and diagrams could prevent proper access by other stakeholders in a particular software project. Issues of this kind seldom stay hidden.

In reference to the problems discussed above, software engineering has a commonly understood body of knowledge and is an easily learnt subject that includes some of the latest technology and methodology, which are easily adopted. As the teams at different sites refer to various texts in the same software engineering domain, each individual has a personal guide, and when they communicate with each other their terminology could be quite startling and unusual. This leads to inconsistency and equivocation among the teams. Communication is a real challenge that everyone faces in daily life, and effective communication is an important part of a successful business. Ontology is an important part of developing a shared understanding across a project to lessen these problems.

This paper is organized in five sections. The next section describes software engineering and knowledge engineering as related work. Section 3 presents the proposed work. Section 4 deals with the experimentation and results of a case study. Finally, Section 5 presents conclusions and the future scope of the proposed work.

2. Related Work

Software engineering is the “application of a systematic, disciplined, and quantifiable approach to the development, operation, and maintenance of software”. Although the claim of software development being an engineering discipline is subject to ongoing discussions, there is no doubt that it has undergone fundamental changes during the last three decades. This assertion holds true both for the emergence of new technology and for the sophistication of methodology. In order to cope with the complexity of software, there has been a constant drive to raise the level of abstraction through modeling and higher-level programming languages. For example, the paradigm of model-driven development proposes that the modeling artifacts are “executable”, i.e. through automated validation and code generation, as addressed by the OMG Model Driven Architecture (MDA). However, many problems have only partially been solved, including component reuse, composition, validation, information and application integration, software testing and quality.
Such fundamental issues are the motivation for new approaches affecting every single aspect of Software Engineering.

The engineering of knowledge-based systems is a discipline closely related to Software Engineering. The term Knowledge Engineering is often associated with the development of expert systems, involving methodologies as well as knowledge representation techniques. Since its early days the notion of “ontology” in computer science has emerged from that discipline, giving rise to Ontology Engineering, which we focus on in this paper. In computer science, the concept “ontology” is interpreted in many different ways, and concrete ontologies can vary in several dimensions, such as degree of formality, authoritativeness or quality. As proposed by Oberle, different kinds of ontologies can be classified according to purpose, specificity and expressiveness. The first dimension ranges from application ontologies to reference ontologies, which are primarily used to reduce terminological ambiguity among members of a community. In the specificity dimension, Oberle distinguishes generic (upper-level), core and domain ontologies. Domain ontologies are specific to a universe of discourse, whereas generic and core ontologies meet a higher level of generality.

Due to the emergence of the “semantic web” vision, ontologies have been attracting much attention recently. Along with this vision, new technologies and tools have been developed for ontology representation, machine processing and ontology sharing. This makes their adoption in real-world applications much easier, and ontologies are about to enter the mainstream. We therefore try to alleviate some of the confusion by providing a framework for categorizing potential uses of ontologies in Software Engineering.

2.1 Ontology in Software Engineering

Ontology is the philosophical study of the nature of being, existence or reality in general, as well as the basic categories of being and their relations. Traditionally listed as part of the major branch of philosophy known as metaphysics, ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences. In computer science and information science, an ontology plays a key role in the formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts.


Figure 1: The abstract view of Knowledge Engineering

It is used to reason about the properties of that domain, and may be used to describe the domain. An ontology provides a shared vocabulary that can be used to model a domain, that is, the types of objects and/or concepts that exist, and their properties and relations. An ontology in the field of Artificial Intelligence (AI) is an “Explicit Specification of a Conceptualization” [1][2]. Ontologies are used in artificial intelligence, the Semantic Web, systems engineering, software engineering, biomedical informatics, library science, enterprise bookmarking, and information architecture as a form of knowledge representation about the world or some part of it. The basic abstract view of knowledge engineering with all its outcomes is shown in Fig. 1. The creation of domain ontologies is also fundamental to the definition and use of an enterprise architecture framework.

The actual content and the domain are represented in Fig. 2 with the semantic and the pragmatic representations respectively. The content in the semantic area (actual meanings) can be Stuff, Things, and Relationships. The domains in the pragmatic area (dealing or concerned with facts or actual occurrences) can be the Knowledge domain, the Applications domain, and the Functional domain. Combining the content and the domain knowledge forms the basis for the ontology. A simple and very regular ontological representation is the standard library of a programming language environment, whose methods, attributes, classes and packages answer the preliminary question of “What exists?” in that programming language. However, some representations may be poor due to a lack of quality in design, implementation and so forth. So a more specialized schema must be created to make the information useful, and for this we utilize ontology.

Figure 2: Ontology with its ‘Content’ and the ‘Domain’ Concepts

An abstract view of representing software engineering knowledge is shown in Fig. 3. The whole set of software engineering concepts, representing software engineering domain knowledge, is captured in the ontology. Depending on the problem domain, a project or a particular software development effort probably uses only part of the whole set of software engineering concepts. The specific software engineering concepts used for a particular software development project, representing software engineering sub-domain knowledge, are also captured in the ontology. Ontology in the area of computer science represents the effort to formulate an exhaustive and rigorous conceptual schema within a given domain [3]. The generic software engineering knowledge represents all software engineering concepts, while the specific software engineering knowledge represents those concepts of software engineering that apply to the particular problem domain. If a project uses a purely object-oriented methodology, then the concept of a data flow diagram may not be included in the specific concepts. Instead, they include concepts like class diagram, activity diagram, and so on. However, for each project in the developmental domain there exists project information or actual data, including project agreements and project understanding. The project information meets a particular project need and is required, together with the software engineering knowledge, to define instance knowledge in the ontology.
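The three layers described above (generic domain knowledge, project-specific sub-domain knowledge, and instance knowledge) can be sketched as a small data structure. This is only an illustration under stated assumptions: the paper does not define a data model, so the class `ProjectOntology`, its methods, the concept list and the project name `billing` are all hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative subset of generic software engineering concepts (domain knowledge).
GENERIC_CONCEPTS = frozenset(
    {"class diagram", "activity diagram", "data flow diagram", "use case diagram"}
)


@dataclass
class ProjectOntology:
    """Sub-domain and instance knowledge for one project (hypothetical model)."""

    project: str
    specific_concepts: set = field(default_factory=set)
    instances: dict = field(default_factory=dict)

    def select_concepts(self, concepts):
        """Record the part of the generic concept set that this project uses."""
        unknown = set(concepts) - GENERIC_CONCEPTS
        if unknown:
            raise ValueError(f"unknown concepts: {sorted(unknown)}")
        self.specific_concepts |= set(concepts)

    def add_instance(self, concept, name):
        """Attach project data (instance knowledge) to a selected concept."""
        if concept not in self.specific_concepts:
            raise KeyError(concept)
        self.instances.setdefault(concept, []).append(name)


# A purely object-oriented project leaves out the data flow diagram concept.
onto = ProjectOntology("billing")
onto.select_concepts(["class diagram", "activity diagram"])
onto.add_instance("class diagram", "Invoice")
```

Keeping the generic set immutable and the per-project selection separate mirrors the distinction the text draws between definite domain knowledge and project-particular instance knowledge.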


Figure 3: Schematic overview of the Software Engineering Ontology

Note that the domain knowledge is separate from the instance knowledge. The domain knowledge is quite definite, while the instance knowledge is particular to the problem domain and developmental domain of a project. Once all domain knowledge, sub-domain knowledge and instance knowledge are captured in the ontology, it is available for sharing among software engineers through the Internet.

The main purpose of the software engineering ontology is to enable communication between computer systems or software engineers in order to understand common software engineering knowledge and to perform certain types of computations; it also enables knowledge sharing and reuse. The key ingredients that make up the software engineering ontology are a vocabulary of basic software engineering terms and a precise specification of what those terms mean. For software engineers or computer systems, different interpretations in different contexts can make the meaning of terms confusing and ambiguous, but a coherent terminology adds clarity and facilitates a better understanding. The software engineering ontology has specific instances for the corresponding software engineering concepts.

2.2 Developing Ontology

In the knowledge engineering methodology for developing an ontology, there are some fundamental rules of ontology design. These rules may seem rather dogmatic. However, they can often help in making design decisions.

- There is no one correct way to model a domain. There are always viable alternatives. The best solution almost always depends on the application that you have in mind and the extensions that you anticipate.
- Ontology development is necessarily an iterative process.
- Concepts in the ontology should be close to objects (physical or logical) and relationships in your domain of interest. These are most likely to be nouns (objects) or verbs (relationships) in sentences that describe your domain.

Deciding what we are going to use the ontology for, and how detailed or general the ontology is going to be, will guide many of the modeling decisions down the road.

Pros of developing ontologies:

- Share a common understanding of information among people or agents
- Reuse domain knowledge
- Make domain assumptions explicit
- Separate domain knowledge from operational knowledge
- Analyze domain knowledge

After we define an initial version of the ontology, we can evaluate and debug it by using it in applications or problem-solving methods, by discussing it with experts in the field, or both. As a result, we will almost certainly need to revise the initial ontology. Then we can create a knowledge base by defining individual instances of these classes, filling in specific slot value information and additional slot restrictions.

The concept of ‘ontology’ exists in each and every domain and in all the phases of the software development process. As a ‘class’ represents a real-world entity, everything is explained with classes and their relationships. Software engineering ontology modeling is illustrated in the next sections with a case study for deeper evaluation.

3. Proposed Work

Many different ontology modeling approaches have been developed. The most used are the Knowledge Interchange Format (KIF) [4] and knowledge representation languages derived from KL-ONE [5]. However, these representations have had little success outside AI research laboratories and require a steep learning curve. KIF provides a Lisp-like syntax to express sentences of first-order predicate logic, and descendants of KL-ONE include description logics or terminological logics that provide a formal characterization of the representation. Traditionally, AI knowledge representation has a linear syntax. Recent papers in the literature propose using the Unified Modeling Language (UML) for ontology modeling [6][7][8]. In UML, ontology information is modeled in class diagrams and the Object Constraint Language (OCL) [8]. However, there is controversy regarding whether or not ontology goes beyond standard UML modeling. Standard UML cannot express advanced ontology features such as constraints or restrictions. Therefore, additional notations need to be defined in order to increase the expressiveness of the ontology model. Note that the models underlying an ontology should be distinguished from its use in software development to model the application domain. This kind of agile modeling method for ontology design has some benefits derived from using the same paradigm for modeling ontology and knowledge. In this paper, graphical notations for modeling the software engineering ontology are presented. The aim is not only to create a graphical representation that is easier to understand, but also for the model to capture the semantic richness of the defined software engineering ontology.

4. Experimentation and Results

This section deals with a case study of a practical implementation of the ontology basics. It is illustrated with screenshots explaining each module. The main application works as follows.
This project is developed as a Windows application in Visual Studio, framework version 3.5. It concentrates on the pictorial representation of applications developed in the same domain. The application takes other software projects as input, which the user selects by browsing from the current location. Only a compiled .exe file should be selected as input. After the .exe file is selected, the corresponding text boxes show the selected folder details, .exe files, etc. Loading the file then provides the namespaces and the numbers of classes, methods and parameters that were used in writing the code during the implementation phase of the project. The classes, methods and parameters are shown separately in their respective fields.
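The extraction step described above can be illustrated with a rough analogue. The tool in the paper inspects compiled .NET .exe files; the sketch below instead uses Python's standard `inspect` module on a Python module object, so it shows the idea (listing classes, methods and parameters and counting them) rather than the authors' implementation. The function names and the `DemoDrawer` class are hypothetical; only the namespace and parameter names echo those reported later in the paper.

```python
import inspect
import types


def summarize_module(module):
    """Return {class name: {method name: [parameter names]}} for one module."""
    classes = {}
    for cls_name, cls in inspect.getmembers(module, inspect.isclass):
        if cls.__module__ != module.__name__:
            continue  # skip classes that were only imported into the module
        methods = {}
        for meth_name, func in inspect.getmembers(cls, inspect.isfunction):
            params = [p for p in inspect.signature(func).parameters if p != "self"]
            methods[meth_name] = params
        classes[cls_name] = methods
    return classes


def count_concepts(classes):
    """Counts comparable to the tool's summary information box."""
    methods = [m for cls in classes.values() for m in cls.values()]
    return {
        "classes": len(classes),
        "methods": len(methods),
        "parameters": sum(len(params) for params in methods),
    }


# Hypothetical stand-in for a loaded project; the class and its methods are made up.
demo = types.ModuleType("project_diagrams")
exec(
    "class DemoDrawer:\n"
    "    def load(self, strFilePath):\n"
    "        pass\n"
    "    def draw(self, nClassId, value):\n"
    "        pass\n",
    demo.__dict__,
)
summary = summarize_module(demo)
```

Because the summary is built from the loaded artifact rather than from source text, the concepts can be listed without reading the implementation code, which is the benefit the paper claims for its tool.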

The central module draws the same concepts in a pictorial representation. It gives the hierarchical structure of all the concepts used in the project and depicts the relations among the classes. Finally, the schema representing the ontology can be saved as a JPEG, GIF, BMP or any other image format.

This paves the way for the basics of ontological representation in software engineering. The main use of this application is that there is no need to walk through thousands of lines of code to analyze the project.

Project input:

Figure 4: Project input module with the folder details and contents

It is always preferable to examine and explore concepts in a diagrammatic representation, and the current project does so. Screenshots of the application are shown in the following sections.


Fig. 4 shows the project input module, in which the user is asked to provide the input to the application. The input may be any other application or project (the debug file). The project should possess a ‘.exe’ file, which means that it should be a complete project or application that is in use. After the input is given, three text boxes are displayed, namely ‘Folder contents’, ‘.exe files’ and ‘Folder details’. The ‘Folder details’ field contains the creation time, full name, last access time and last write time of the folder where the actual application is installed. This forms the basis of the ontology for software projects: the first step of the ontological representation of software engineering concepts starts here. The project/application selection process then proceeds as shown in Fig. 5.

Figure 5: Application/ project selection process

It is developed with the help of the ‘TreeView’ control in Visual Studio C# .NET. The Windows Forms TreeView control displays a hierarchy of nodes that can be used to represent an organization structure, a file system or any other system with a hierarchical representation.

For each node in the hierarchy, the user can add a child node, or a sibling node provided that the selected node has a parent, as depicted in Fig. 5.
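The child and sibling rule stated above can be sketched independently of any GUI toolkit. The `Node` class and `render` helper below are hypothetical and do not use the Windows Forms API; the labels reuse the namespace and two class names that the paper reports for its sample project.

```python
class Node:
    """A node in a hierarchy such as namespace -> class -> method."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []

    def add_child(self, label):
        """Any node may receive a child."""
        child = Node(label, parent=self)
        self.children.append(child)
        return child

    def add_sibling(self, label):
        """A sibling can only be added when the selected node has a parent."""
        if self.parent is None:
            raise ValueError("the root node has no parent, so it cannot have a sibling")
        return self.parent.add_child(label)


def render(node, depth=0):
    """Return the hierarchy as indented text lines, one per node."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


root = Node("project_diagrams")
first = root.add_child("classdiagram")
first.add_sibling("form2")
```

Here `add_sibling` is implemented as "add a child to my parent", which is why it fails for the root node.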

As this project is a Windows application, the project/application selection is done by searching the local drives. The three buttons, Select, Exit and Cancel, let the user navigate through the application. The same concept can be scaled to a web application or deployed on a network (a typical LAN).

Project classification:

On loading the corresponding application, the following are displayed with respect to the data available in the project, as shown in Fig. 6: Namespaces, Class Names, Methods and Parameters. The namespace used here is ‘project_diagrams’. Some of the classes are ‘Jclassview’, ‘form2’, ‘classdiagram’, ‘AssociationDrawer’ and ‘ClassDrawercontainerpanel’. Some of the parameters used are ‘nClassId’, ‘drData’, ‘value’ and ‘strFilePath’. Some of the methods are ‘ExtractDllMethod’, ‘RetriveMethodandParameterInfo’, ‘RetiveClassInfo’, ‘MakeDataSet’, ‘CheckIsProprtyMethod’, ‘CollectPropertyMethod’ and ‘GetClassInfo_DataRowIndex’.


Figure 6: Detecting the classes, methods and parameters

The respective counts, i.e., the numbers of namespaces, class names, methods and parameters, are also shown in the summary information text box. This module is developed using the ‘Grid View’ control in Visual Studio with the C# language. These concepts refer directly to the definition of ontology as an “Explicit Specification of a Conceptualization”: the concepts in the application are explicitly specified here without going back to the implementation phase (coding) of the software development process.

Ontology Type 1:

Fig. 7 shows a window with menu options such as ‘File’ and ‘Settings’, which in turn contain ‘open file’, ‘draw diagram’, ‘select the root node’ and ‘exit’. This module, developed using the ‘TreeView’ control, shows all methods, classes and parameters in a hierarchical, pictorial representation of the concepts. It also shows the relationships, mainly inheritance, between the classes it contains. The concepts within a class are expanded and shown when the user wishes to view them. The diagram can be saved as JPEG, GIF, BMP or TIFF for reference, as desired by the user.

Fig. 7 gives a clear picture of the concepts used to develop the project and makes them easy to analyze; this reflects the basics of ontology in the multisite software development process. It is intended to let the concepts of the project be examined before it is delivered to the customer or client.

Figure 7: Pictorial contents of the concepts used in developing the actual application

Fig. 8 shows the actual ontological representation of the various concepts used in developing the application. Their relations and hierarchy are also shown clearly. The representation can be viewed and saved for further enhancement and information processing of that software application in the same domain.

Ontology Type 2:

Fig. 9 shows another type of ontology representation. As explained above, this module also takes a .dll or .exe file as input. All the contents of the application are shown in a tree format, in the manner of super classes and sub classes, at the side of the window. After the root node is selected, a dependency-style diagram is shown with respect to the namespaces, classes and methods.


If the namespace root node is selected, the corresponding classes contained in it are depicted.

Figure 8: Class diagram with the relations among the classes

Figure 9: Class diagram with the relations among the classes

When a class is selected as the root node, the methods contained in it are represented as a dependency diagram. Thus the concept of ontologies in the software engineering domain can be illustrated. This application can act as an efficient software engineering ontology tool for common knowledge sharing, especially in multisite software development.

5. Conclusions

In this paper, we have analyzed the characteristics of software engineering ontology. An alternative formalism has been defined, i.e., graphical notations for modeling the software engineering ontology. The modeling notations are used to design the software engineering ontology. When the knowledge of the software engineering domain is represented in a declarative formalism, the set of software engineering concepts, their relations and their constraints are reflected in the representation of that knowledge. Thus, the software engineering ontology can be defined using a set of software engineering representational terms. The software engineering ontology is organized by concepts, not words, in order to recognize and avoid potential logical ambiguities. A new software engineering ontology has been developed for communication purposes. A case study with a practical implementation was implemented and deployed.

References

[1] T.R. Gruber, “A Translation Approach to Portable Ontology Specification,” Knowledge Acquisition, 1993.
[2] T.R. Gruber, “Toward Principles for the Design of Ontologies Used for Knowledge Sharing,” Proc. Int’l Workshop Formal Ontology in Conceptual Analysis and Knowledge Representation, 1993.
[3] Wikipedia, “Ontology (Computer Science) from Wikipedia, the Free Encyclopedia,” http://en.wikipedia.org/wiki/Ontology_%28computer_science%29, June 2006.
[4] M.R. Genesereth, “Knowledge Interchange Format - Draft Proposed American National Standard,” http://logic.stanford.edu/kif/dpans.html, 1998.
[5] R.J. Brachman and J.G. Schmolze, “An Overview of the KL-ONE Knowledge Representation System,” Cognitive Science, pp. 171-216, 1985.
[6] D. Duric, “MDA-Based Ontology Infrastructure,” Computer Science and Information Systems, vol. 1, no. 1, 2004.
[7] P. Kogut et al., “UML for Ontology Development,” The Knowledge Eng. Rev., vol. 17, no. 1, pp. 61-64, 2002.
[8] J. Evermann, “A UML and OWL Description of Bunge’s Upper-Level Ontology Model,” Software and Systems Modeling, vol. 8, no. 2, pp. 235-249, Apr. 2009.

P. Niranjan Reddy received his B.E. (Computer Technology) from Nagpur University in 1992 and his M.Tech (Computer Science and Engineering) from NIT, Warangal in 2001. He has been working as a faculty member in the department of CSE of KITS, Warangal, since 1996. Presently he heads the department of CSE. He is also a research scholar pursuing his research in CSE at K.U., Warangal. He has authored two text books in the field of Computer Science, Theory of Computation and Computer Graphics. He has published 5 papers in international journals and 6 papers in international conferences. He is a member of the ISTE and CSI.

Pradeep Kumar, born in India in 1985, obtained his M.Tech in Software Engineering in 2010 from Kakatiya Institute of Technology and Sciences, Warangal. He received his B.Tech degree in Electronics and Computer Engineering in 2007 from Narayana Engineering College, Nellore. He was one of the toppers in his university in M.Tech and is currently working as an Assistant Professor at Narayana Engineering College, Nellore. His research interests include Software Engineering Ontology, Knowledge Sharing, Knowledge Management, and Genetic Algorithms.


HLAODV – A Cross Layer Routing Protocol for

Pervasive Heterogeneous Wireless Sensor Networks Based On Location

Jasmine Norman1 , J.Paulraj Joseph2

1Vellore Institute of Technology, Chennai – 14, India

2 Manonmaniam Sundaranar University, Tirunelveli-12, India

Abstract

A pervasive network consists of heterogeneous devices with different computing, storage, mobility and connectivity properties working together to solve real-world problems. The emergence of wireless sensor networks has enabled new classes of applications in the pervasive world that benefit a large number of fields. Routing in wireless sensor networks is a demanding task. This demand has led to a number of routing protocols which efficiently utilize the limited resources available at the sensor nodes. Most of these protocols support either stationary sensor networks or mobile networks. This paper proposes an energy-efficient routing protocol for heterogeneous sensor networks with the goal of finding the nearest base station or sink node. Hence the problem of routing is reduced to the problem of finding the nearest base station in heterogeneous networks. The protocol HLAODV is energy efficient when compared with the popular routing protocols AODV and DSR. The mathematical model of the proposed system and its properties are also studied.

Keywords: Pervasive, Sensor, Heterogeneous, Routing, Location

1. Introduction

Pervasive Computing is a technology that pervades the users’ environment by making use of multiple independent information devices (both fixed and mobile, homogeneous or heterogeneous) interconnected seamlessly through wireless or wired computer communication networks, which aim to provide a class of computing / sensory / communication services to a class of users, preferably transparently, and can provide personalized services while ensuring a fair degree of privacy / non-intrusiveness. The goal of pervasive computing is to create ambient intelligence, reliable connectivity, and secure and ubiquitous services in order to adapt to the associated context and activity. To make this vision a reality, various interconnected sensor networks have to be set up to collect context information, providing context-aware pervasive computing with the capacity to adapt to a dynamically changing environment. Wireless sensor networks (WSN) can help people to be aware of a lot of particular and reliable information anytime, anywhere by monitoring, sensing, collecting and processing the information of various environments and scattered objects [24]. The flexibility, fault tolerance, high sensing fidelity, self-organization, low cost and rapid deployment characteristics of sensor networks are ideal for many new and exciting application areas such as military, environment monitoring, intelligent control, traffic management, medical treatment, manufacturing industry, antiterrorism and so on [18,23]. Therefore, recent years have witnessed the rapid development of WSNs. In this paper, we address the issue of cross-layer networking for pervasive networks, where the physical and MAC layer knowledge of the wireless medium is shared with the network layer in order to provide an efficient routing scheme that prolongs the network lifetime.

Unique characteristics of a WSN include limited power, ability to withstand harsh environmental conditions, ability to cope with node failures, mobility of nodes, dynamic network topology, communication failures, heterogeneity of nodes, large scale of deployment and unattended operation. The challenges of WSNs have been studied by Yao K [29]. The key challenge in wireless sensor networks is maximizing network lifetime. Routing for WSNs is one of the most active research areas. Energy efficiency and network capacity are perhaps two of the most important issues in wireless ad hoc networks and sensor networks. The many-to-one communication paradigm is widely used in sensor networks, since sensor nodes send their data to a common sink for processing. This many-to-one paradigm also results in non-uniform energy drainage in the network.

Sensor networks can be divided into two classes, event-driven and continuous dissemination networks, according to the periodicity of communication. In event-driven networks, data is sent whenever an event occurs. In continuous dissemination networks, every node periodically sends data to the sink. Routing protocols are usually implemented to support one class of network in order to save energy. Almost all the research on routing is concerned with sending the sensed data to a control center or to a fixed destination. This paper argues that the problem of routing can be reduced to sending the data to the nearest base station, as the base station has the capacity to deliver the data directly to the control center to which the sensor is attached. This not only reduces the time delay but is also energy efficient.
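The reduction argued above, routing as a search for the nearest base station, can be sketched as follows. This is a minimal sketch under assumptions not stated at this point in the paper: every node is assumed to know 2-D coordinates for itself and for the base stations, and "nearest" is taken to mean smallest Euclidean distance. The function name and station labels are hypothetical.

```python
import math


def nearest_base_station(sensor_pos, base_stations):
    """Return the name of the base station closest to sensor_pos.

    base_stations maps a station name to its (x, y) position.
    """
    if not base_stations:
        raise ValueError("at least one base station is required")
    return min(
        base_stations,
        key=lambda name: math.dist(sensor_pos, base_stations[name]),
    )


stations = {"BS1": (0.0, 0.0), "BS2": (10.0, 0.0)}
target = nearest_base_station((7.0, 1.0), stations)
```

A real protocol would also have to account for link quality and residual energy along the path; the sketch only captures the choice of destination.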

The assumption of homogeneous nodes does not always hold in practice, since even devices of the same type may have slightly different maximal transmission power. There also exist heterogeneous wireless networks in which devices have dramatically different capabilities, for instance the communication network in the Future Combat System, which involves wireless devices on soldiers, vehicles and UAVs. A traditional static wireless sensor network consists of a large number of small sensor nodes with low computational, storage and communication capabilities; such limitations no longer apply in a mobile sensor network. The use of vehicles as sensors in a “vehicular sensor network”, a new network paradigm that is critical for gathering valuable information in urban environments, is studied in [27]. However, existing routing protocols for WSNs are built on a network architecture (called flat architecture) in which all sensor nodes are homogeneous and send their data to a single sink node over multiple hops [3,5,15,21]. Such a flat architecture is inapplicable to many real applications with large-scale and heterogeneous sensor nodes. A typical network configuration consists of sensors working unattended and transmitting their observation values to some processing or control center, the so-called sink node, which serves as a user interface. Due to the limited transmission range, sensors that are far away from the sink deliver their data through multihop communications, i.e., using intermediate nodes as relays. The proposed scheme is based on probabilities. The probability of being selected as a relay node is high for a base station, medium for mobile sensors and very low for stationary sensors. Thus the stationary sensors are less likely to be selected as a hop for relaying information. Deterministic choices based on heavy collection of information into the message are replaced by probabilistic choices using classical optimization heuristics. We also model the heterogeneous network as a random geometric graph and study its properties.
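The probabilistic relay choice described above can be illustrated with weighted random selection. The paper gives only a qualitative ordering at this point (high for base stations, medium for mobile sensors, very low for stationary sensors), so the numeric weights below are assumptions chosen purely for illustration, and `choose_relay` is a hypothetical helper rather than the protocol's actual rule.

```python
import random

# Assumed relative weights; only their ordering comes from the text.
RELAY_WEIGHT = {"base_station": 0.70, "mobile": 0.25, "stationary": 0.05}


def choose_relay(neighbors, rng=random):
    """Pick one neighbor as next hop, favoring more capable node types.

    neighbors is a list of (node_id, node_type) pairs; returns a node_id,
    or None when there is no neighbor in range.
    """
    if not neighbors:
        return None
    weights = [RELAY_WEIGHT[node_type] for _, node_type in neighbors]
    node_id, _ = rng.choices(neighbors, weights=weights, k=1)[0]
    return node_id


neighbors = [("n1", "stationary"), ("n2", "mobile"), ("n3", "base_station")]
rng = random.Random(0)  # seeded only to make the demonstration repeatable
picks = [choose_relay(neighbors, rng) for _ in range(2000)]
```

Over many draws the base station is chosen most often and the stationary sensor least often, which is how a probabilistic rule can spare the energy of the weakest nodes without collecting global state.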

In this paper, we present a new event driven routing protocol for the pervasive heterogeneous networks which prolongs the life time of the network by considering type of nodes. Simulation results show that our protocol outperforms the traditional routing approaches in terms of network lifetime and latency and is more suitable for real world applications. The remainder of the paper is organized as follows. Section II provides a brief overview of the related work. Section III explains the operation of the new routing protocol. Section IV gives the mathematical model of the system. Section V compares the performance of HLAODV and the protocols used in traditional schemes. Section VI provides the conclusion of the work and discusses future directions. 2. Related Work Pervasive Computing promises a world where computational artifacts embedded in the environment will continuously sense our activities and provide services based on what is sensed. Sensor networks enable to accomplish the goal of pervasive computing partially. Sensor networks introduce new challenges that need to be dealt with as a result of their special characteristics. Their new requirements need optimized solutions at all layers of the protocol


stack in an attempt to optimize the use of their scarce resources. In particular, the routing problem has received a great deal of interest from the research community, with a great number of proposals being made. In [8], L. Chen et al. studied a cross-layer design for routing in ad hoc wireless networks. Existing protocols fit into one of two major categories: on-demand, such as AODV [21] and DSR [15], and proactive, such as DSDV [22] and OLSR [9]. A review of these protocols is found in [4, 14]. Ad hoc on-demand distance vector (AODV) routing [21] adopts both a modified on-demand broadcast route discovery approach used in DSR [15] and the concept of destination sequence numbers adopted from destination-sequenced distance-vector routing (DSDV) [22]. Directed diffusion [13] is a good candidate for robust multi-hop, multi-path routing and delivery. It achieves energy savings by selecting empirically good paths and by caching and processing data in-network (e.g., data aggregation). The authors in [2, 10] analyzed the performance of the popular protocols after classification. The common belief is that a multi-hop configuration with a rather small per-hop distance is the only viable energy-efficient option for wireless sensor networks. The works in [3,5,25] studied the various options for energy-efficient wireless sensor networks. Location-based algorithms [16,17,31] rely on node position information to find and forward data towards a destination in a specific network region. Position information is usually obtained from GPS (Global Positioning System) equipment. These algorithms usually enable the best route to be selected, reduce energy consumption and optimize the whole network. In [18] Ye Ming Luz et al. proposed a location-based energy-efficient protocol. Na Wang et al. [19] studied the performance of probabilistic multi-path geographic routing protocols.
In [32], position-based routing protocols are surveyed and classified into four categories: flooding-based, curve-based, grid-based and ant algorithm-based.

Very little research has been done on heterogeneous sensor networks. The integration of different wireless access technologies, combined with the huge diversity in the characteristics of supported services in next-generation wireless systems, creates a truly heterogeneous network. The authors in [12] proposed a secure routing protocol for heterogeneous sensor networks. In [1] the authors proposed a generic practical framework that optimizes media streaming in heterogeneous systems by taking advantage of the cost and resource diversity of the integrated access technologies and the buffering capability of streaming applications. In [20, 30] the authors proposed localized topology control algorithms for heterogeneous wireless multi-hop networks. In [30] each node selects a set of neighbours based on locally collected information. Random graphs are typically used to represent sensor networks. The authors in [6, 7, 11] studied the application of random geometric graphs to wireless sensor networks. Chen Avin [7] investigated properties of random geometric graphs that have implications for routing and topology control in sensor networks. The goal was to construct a special subgraph, the Restricted Delaunay Graph, that permits efficient routing based only on local information. In [6,11] the authors studied the topology and connectivity properties of random geometric graphs. In this paper we propose an energy-efficient, location-based routing protocol called HLAODV for heterogeneous sensor networks. The model is represented mathematically as a random geometric graph and its properties are studied.

3. System Model

Fig. 1 Heterogeneous Sensor Networks (legend: stationary node, mobile node, base station)


In the real world, at a given time, stationary nodes, mobile nodes and powerful base stations may exist together in a region. Assuming all the nodes know their destination ID, when an event occurs or when requested by the base station, they try to forward the data to their base station. The topology changes continuously due to the mobility of the nodes. Most of the time it is practically impossible to forward the data directly to the base station due to the nature of radio signals. Hence the problem is to find a neighbour (hop) towards the destination. This is done repeatedly until the destination is reached. In a heterogeneous setup there may be a few base stations in a region. So we argue that for a given node to forward the data, it is enough to find the nearest base station even if the node’s base station is different. Also, only high-energy nodes get selected as relay nodes, sparing the low-energy stationary nodes and thus prolonging the network lifetime.

When a node senses an event, it sends a request packet which contains the Node ID, Destination ID, Time and Location. A node i which receives the request packet computes the probability of a link between itself and the source. The factors taken into consideration are the distance between the source and the node, the energy level of the node, the type of the node and the type of the node’s neighbours. The initial probabilities are set based on the type of the node. If the node is a base station or a sink node (Value: 2), the probability p(i) is set to 0.75. If it is a high-energy rechargeable node (Value: 1), p(i) is set to 0.5, and for a low-energy static node (Value: 0), p(i) is set to 0.1. The probability may be increased or decreased after receiving a request packet. If the probability is greater than 0.5, a reply packet is sent to the source node. Otherwise the request packet is discarded. The reply packet consists of Neighbour ID, Location, Type, Time and the Probability. When a node receives a reply packet, it updates its routing table with Neighbour ID, Location, Time, Type and the Probability. Finally the node picks the best neighbour from the routing table by applying the A* search algorithm. All the nodes maintain a table of recent request/reply packets. When a request packet arrives, the node checks whether any recent reply packet has been sent to any node in the region, not necessarily to the source node. This is because, when an event occurs, all the nodes in the region (within a specified radius) sense the same event. After ‘t’ seconds the recent packets are automatically deleted from the table. This policy helps to avoid congestion and redundancy and is highly energy efficient.
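The request-handling rule described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the authors' Parsec implementation: the class name `Node`, the `region_id` key, the 5-second retention window and the use of `>= 0.5` as the reply condition (following the algorithm listing) are assumptions, and the adjustment of the probability by distance, energy and neighbour type is left out.

```python
from dataclasses import dataclass, field

# Node types and initial probabilities as stated in the text.
STATIC, HIGH_ENERGY, BASE_STATION = 0, 1, 2
INITIAL_PROBABILITY = {BASE_STATION: 0.75, HIGH_ENERGY: 0.5, STATIC: 0.1}

REPLY_THRESHOLD = 0.5   # assumption: reply when P >= 0.5
RECENT_WINDOW_S = 5.0   # hypothetical value for the 't' seconds retention period


@dataclass
class Node:
    node_id: int
    node_type: int
    # region id -> time of the last reply sent for that region (hypothetical keying)
    recent: dict = field(default_factory=dict)

    def handle_request(self, region_id, now):
        """Return a reply as a dict, or None when the request is discarded."""
        # Entries older than the retention window are dropped from the table.
        self.recent = {r: t for r, t in self.recent.items()
                       if now - t < RECENT_WINDOW_S}
        if region_id in self.recent:
            return None  # a reply for this region was sent recently
        p = INITIAL_PROBABILITY[self.node_type]
        if p < REPLY_THRESHOLD:
            return None  # low-probability nodes stay silent to save energy
        self.recent[region_id] = now
        return {"neighbour_id": self.node_id, "type": self.node_type,
                "time": now, "prob": p}
```

With these assumptions a base station answers the first request from a region, ignores repeats inside the retention window, and a low-energy static node never replies.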

Table 1: REQ Packet

Node ID | Dest ID | Location | Time

Table 2: REP Packet

Node ID | Dest ID | Prob | Type | Location | Seq no | Time

Table 3: Route Table Fields

Node ID | Neighbour ID | Prob | Type | Location | Seq No | Time

3.1 A* Algorithm to find the best neighbour

The problem is to find a minimum cost path from a source to a destination. The optimum path in a wireless sensor network is the path that consumes the least energy. The algorithm works based on the type of node. The probabilities are set assuming high-energy base stations and high-bandwidth mobile nodes which can be recharged. The probability differs for each request. Static nodes with a low energy level do not participate in routing, in order to save energy. The A* algorithm is applied to pick the best neighbour from the routing table of a node. The cost function is the distance between the source and the destination. Assuming intermediate base stations or sink nodes that have the capacity to route the packet directly to the destination, we reduce the problem to that of finding the nearest base station. The heuristic function computes the link quality by combining the probability, type, time and the direction of the destination. As probabilities are self-computed, when a reply packet arrives, the node does not simply pick the highest-probability node as the nextHop but also checks the time stamp and the type. If there is a node with a slightly lower probability whose reply arrived more recently, the node prefers it as the hop to forward the data rather than the high-probability one. This is because of the mobility of the nodes.

C(i) = dist(i,j), the distance between the source and the destination.

H(i) = f(p(j), T(j), L(j)), where p(j) is the probability of node j, T(j) is the time the reply packet is sent from j, and L(j) is the location of j.
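The text does not give the form of f, so the following Python sketch only illustrates how a recent reply with a slightly lower probability can be preferred over an older one with a higher probability. The exponential freshness discount, the constant `tau`, the division by distance and the names `RouteEntry`, `link_quality` and `best_neighbour` are all assumptions made for illustration. The sketch is a one-step greedy choice, not a full A* search.

```python
import math
from dataclasses import dataclass


@dataclass
class RouteEntry:
    neighbour_id: int
    prob: float       # p(j), self-computed by the neighbour
    time: float       # T(j), time the reply packet was sent
    location: tuple   # L(j), (x, y) position of the neighbour


def link_quality(entry, now, destination, tau=2.0):
    """Hypothetical f(p, T, L): probability discounted by the age of the
    reply and by the neighbour's distance to the destination."""
    freshness = math.exp(-(now - entry.time) / tau)
    dist = math.dist(entry.location, destination)
    return entry.prob * freshness / (1.0 + dist)


def best_neighbour(route_table, now, destination):
    """Return the entry with the highest link quality, or None if empty."""
    if not route_table:
        return None
    return max(route_table, key=lambda e: link_quality(e, now, destination))
```

For two neighbours at the same position, an entry with probability 0.7 received just now outranks one with probability 0.75 received three seconds earlier, which mirrors the preference for recent replies under mobility.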


Fig. 2 Schematic Representation of the Model

If we form the convex hull of the nodes within a neighbourhood, say of radius r, then only one node is allowed to transmit at a given time. This avoids traffic congestion and redundancy.

Algorithm

1. Source sends REQ packet
2. Node receives REQ packet
3. Node checks recent REQ/REP list
4. If (! Recent)
   a. Node self-computes probability P
   b. If P >= 0.5, node sends a REP packet
   c. Else discard it; Exit;
5. Else discard it; Exit;
6. Source receives a REP packet
7. Source updates the Route Table
8. Apply A* Algorithm to pick the best neighbour
9. Forward data to the next Hop
10. If the next Hop is the Destination, Exit;
11. Else If the next Hop is a base station, Exit;
12. Else Forward; Go to 1;
13. Return;

4. Mathematical Model

Let there be n nodes within a radius r. The problem is to find an optimal path from a source to a destination. Random Geometric Graphs (RGG) have been a very influential and well-studied model of large networks, such as sensor networks, where the network agents are represented by the vertices of the RGG, and the direct connectivity between agents is represented by the edges. Informally, given a radius r, a random geometric graph results from placing a set of n vertices uniformly and independently at random on the unit torus $[0, 1]^2$ and connecting two vertices if and only if their distance is at most r, where the distance depends on the chosen metric. That is, two vertices i and j are connected if and only if $d(i,j) \le r$. Several probabilistic results are known about the number of components in the graph as a function of the threshold r and the number of vertices n. In our model, an edge appears iff d(i,j) is less than r and the probability computed from the distance between i and j, the type of j, the neighbours of j and the energy level of j reaches the threshold value (0.5). Let R(i,j) be the directed random geometric graph for the sensor model under study. Then,

$$R(i,j) = \begin{cases} 1 & \text{if } p(i,j) \ge 0.5 \\ 0 & \text{otherwise} \end{cases}$$

where $p(i,j) = f(d(i,j), e(j), t(j), n(j))$ and

d(i,j) – distance between i and j

e(j) – remaining energy level of j, $e(j) = E_j - \sum_{k=0}^{j-1} e_k$

t(j) = 0 for a low-energy static node, 1 for a high-energy node, 2 for a base station/sink node

n(j) = 1 if the neighbour is a base station or the neighbour is close to a base station
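As a rough illustration of the construction above, the Python sketch below generates a random geometric graph on the unit torus and then keeps only the directed edges whose probability reaches the threshold. The function names are assumptions, and the probability function `p` is supplied by the caller as a stand-in for f, whose exact form is not specified here.

```python
import random


def torus_distance(a, b):
    """Euclidean distance on the unit torus [0, 1]^2 (coordinates wrap around)."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    dx = min(dx, 1.0 - dx)
    dy = min(dy, 1.0 - dy)
    return (dx * dx + dy * dy) ** 0.5


def random_geometric_graph(n, r, seed=None):
    """Place n vertices uniformly at random and link pairs within distance r."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if torus_distance(points[i], points[j]) <= r}
    return points, edges


def directed_edges(edges, p, threshold=0.5):
    """Keep direction i -> j of an undirected edge only when p(i, j) >= threshold.
    p is a caller-supplied stand-in for f(d, e, t, n)."""
    result = set()
    for i, j in edges:
        if p(i, j) >= threshold:
            result.add((i, j))
        if p(j, i) >= threshold:
            result.add((j, i))
    return result
```

Because p(i,j) and p(j,i) are evaluated separately, an undirected edge can survive in one direction only, so the resulting graph need not be symmetric.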

Fig. 3 RGG with selected path (legend: base station/sink node, static low-energy node, high-energy mobile node)

We denote by s(i) the set of all nodes in φ(i) whose distance to node i is smaller than the predefined radius r. Decisions at node i are based on the following variables:

1. An estimate of the available energy at neighbouring nodes, $\{E_{ij},\ j \in s(i)\}$.

2. The distance to each neighbouring node, $\{\min d(i,j) < r\}$.

3. The neighbour's type and closeness to a base station, $\{t(j) = 2 \text{ or } 1,\ \exists\, n(j) \text{ where } t(n(j)) = 2\}$.

The following operations are possible in the graph.

1. Adding an edge – When a node receives a reply packet with probability greater than 0.5, an edge is added.

2. Deleting an edge – Since the nodes may be mobile, the probability of an edge may drop after a specific time period. In this case the edge is deleted.

Assuming that most energy consumption is caused by transmissions, the energy estimate is updated as

$$E(i,j)_{k+1} = E(i,j)_k - m(j)_k E_T \qquad (1)$$

where $m(j)_k$ is the number of messages transmitted by node j at time k and $E_T$ is the energy consumed per transmission. Note that our model assumes that the energy consumption is the same for each transmission (which is a reasonable approximation if information is sent in packets of equal size), and that node i ’listens’ to all transmissions made by its neighbor j. All these variables are grouped into an observation vector x. Each node with a message to transmit makes its decision by solving a hypothesis testing problem with two hypotheses, T = 0 or T = 1, where:

• T = 1 if at least one neighbor will forward the message.
• T = 0 if no neighbor will forward the message, in which case the message will be discarded.

Depending on its belief about the value of T, node i makes decision D1 (the message is transmitted) or D0 (the message is not transmitted). To do so, we define the cost

$$C(i,T) = \begin{cases} 1 & \text{if } \exists\, j,\ p(i,j) > 0.5 \\ 0 & \text{otherwise} \end{cases}$$

The optimal path can be obtained if all the nodes are reachable from a sink node or a base station in one or two hops. Otherwise the model reduces to AODV. The topology can be reconstructed to prolong the network lifetime. From the definition of the graph it follows that the graph is not symmetric, i.e., in general $R(i,j) \neq R(j,i)$. Proof: Assume i is not in the proximity of a base station and j is closer to a base station. So j’s



computed probability is high and the link from i to j exists. On the contrary, the probability computed by i will be low, either because of its type or because of its distance from a base station. So j will not select i as the next hop to reach its destination, and there is no edge from j to i. There may be isolated vertices in this model, as nodes with a low energy level are less likely to participate in routing. So the graph is not necessarily connected. Only one edge within the radius is selected for transmission and so the order of the algorithm is O(1).

5. Performance Analysis

We simulate this protocol on GloMoSim [26], a scalable discrete-event simulator developed by

UCLA. This software provides a high-fidelity simulation for wireless communication with detailed propagation, radio and MAC layers. We compare the proposed routing protocol, HLAODV, with two popular sensor network routing protocols, AODV and DSR.

5.1 Simulation Model

The GloMoSim library [26] is used for protocol development in sensor networks. The library is a scalable simulation environment for wireless network systems using the parallel discrete event simulation language PARSEC. The distributed coordination function (DCF) of IEEE 802.11 is used as the MAC layer in our experiments. It uses Request-To-Send (RTS) and Clear-To-Send (CTS) control packets to provide virtual carrier

Table 4: Assumed Parameters

Parameters | Value
Transmission range | 250 m
Simulation Time | 5M
Topology Size | 2000m x 2000m
Number of sensors | 55
Number of sinks | 16
Mobility | Trace File
Traffic type | Constant bit rate
Packet rate | 8 packets/s
Packet size | 512 bytes
Radio Type | Standard
Packet Reception | SNR
Radio range | 350m
MAC layer | IEEE 802.11
Bandwidth | 2Mb/s
Node Placement | Node File
Initial energy in batteries | 10 Joules
Signal Strength Threshold | -80 dbm
Energy Threshold | 0.001mJ

sensing for unicast data packets to overcome the well-known hidden terminal problem. Table 4 lists the initial parameter values assumed in the simulation. Intel Research Berkeley Sensor Network Data and WiFi CMU data from Select Lab [28] are used to obtain the node positions. The experiment is repeated for a varying number of nodes. CBR traffic is assumed in the model. A trace file is used for mobility. The new protocol is written in Parsec and hooked into GloMoSim. All three protocols are simulated in GloMoSim to enable

comparisons among them. When a packet is generated, the corresponding routing algorithm is invoked.

5.2 Performance Metrics

The following metrics have been chosen for the evaluation of the protocols. Each metric is evaluated as a function of the topology size, the number of nodes deployed, location and the data load of the network.

Latency – This is a measure of execution time. It is the total time taken by each protocol to complete the given CBR traffic within the simulation time.

Energy Spent – This is measured in terms of signals received and transmitted. The energy spent at each node is directly proportional to the number of signals received and transmitted. A smaller number indicates energy conservation.

Congestion – The parameters for congestion evaluation are the number of collisions and the number of timeout packets generated. More collisions and timeout packets indicate congestion in the traffic.

Load Balance – The number of nodes used in the transmission. This is also an indication of energy conservation at each node.

5.3 Simulation Results

Figure 4 shows the execution time of the three protocols for different sets of nodes and traffic. The execution time increases as the traffic increases. Due to the control packet overhead in route discovery and maintenance, AODV and DSR have a higher execution time than the proposed protocol. Neither AODV nor DSR differentiates between node types. When there are no base stations, HLAODV tends to take more time than AODV and DSR.

Fig 4. Packet Delivery Time (execution time versus CBR traffic for HLAODV, AODV and DSR)

Figures 5 and 6 show the number of signals received and transmitted by the nodes. Equal energy is spent in the receiving phase and in the transmission phase. The number of signals received under the new protocol differs sharply from that of the other protocols. In terms of signals transmitted, only a few nodes are affected in HLAODV. The graphs indicate that less energy is spent in HLAODV than in AODV and DSR, which shows the energy efficiency of the HLAODV protocol.

Fig 5. Total Number of Signals Received (signals per node for HLAODV, AODV and DSR)

Fig 6. Total number of Signals Transmitted (signals per node for HLAODV, AODV and DSR)

Figures 7 and 8 show the congestion behaviour of the protocols in terms of the number of collisions and timeout packets. The proposed protocol has far fewer collisions than the other protocols. Moreover, fewer timeout packets are generated in HLAODV. The reason is that within a specific region, only one node is allowed to transmit for a period of t seconds. This not only avoids congestion but also suppresses redundancy. It also spares the energy that the nodes would otherwise spend transmitting redundant data.


Fig 7. Number of Collisions (collisions per node for HLAODV, AODV and DSR)

Fig 8. Time out Packets Generated (CTS timeout packets, in hundreds, per node for HLAODV, AODV and DSR)

6. Conclusion

Wireless sensor networks and radio frequency identification (RFID) devices are quickly becoming a vital part of our infrastructure with applications ranging from supply-chain management to home automation and healthcare. These tiny, pervasive computing devices have extremely limited power resources and computational capabilities. On the other side there also exist heterogeneous wireless networks in which devices have dramatically different capabilities. In this paper we proposed an energy efficient routing protocol for heterogeneous pervasive networks based on location. Simulation results show that our protocol HLAODV outperforms AODV and DSR in energy efficiency, latency, load balancing, redundancy suppression and congestion control. The model is a cross layer design as the link parameters determine the routing scheme. Our next goal is to identify the minimum number of base stations required to get an optimal path and to study a secure routing scheme for heterogeneous networks.

7. References

[1]. Ahmed H. Zahran, Cormac J. Sreenan, “Threshold-Based Media Streaming Optimization for Heterogeneous Wireless Networks” , IEEE Transactions On Mobile Computing, Vol. 9, No. 6, 2010 ,753

[2]. A. H. Azni, Madihah Mohd Saudi, Azreen Azman, and Ariff Syah Johari D , “Performance Analysis of Routing Protocol for WSN Using Data Centric Approach” , World Academy of Science, Engineering and Technology , 2009 , 53

[3]. Bandyopadhyay S and Coyle E , “ An energy efficient hierarchical clustering algorithm for wireless sensor networks” , IEEE Infocom , 2003, pp 1713-23

[4]. Bharat Kumar Addagada, Vineeth Kisara and Kiran Desai , “A Survey: Routing Metrics for Wireless Mesh Networks” , 2009

[5]. Bhardwaj M, Garnett T and Chandrakasan A P , “ Upper bounds on lifetime of sensor networks”, IEEE International Conference on Communications (Helsinki) , 2001, pp 785-790

[6]. Bhupendra Gupta , Srikanth K Iyer , D Manjunath , “Topological Properties Of The One Dimensional Exponential Random Geometric Graph”, Random Structures & Algorithms , Volume 32 , Issue 2 , 2008, pp: 181-204

[7]. Chen Avin , “Random Geometric Graphs: An Algorithmic Perspective” , Ph.D. dissertation, University of California , Los Angeles , 2006

[8]. L. Chen, S. H. Low, M. Chiang, J. C. Doyle , “Cross-Layer Congestion Control, Routing and Scheduling Design in Ad Hoc Wireless Networks”, IEEE International Conference on Computer Communications. Proceedings In INFOCOM , 2006, pp. 1-13.

[9]. T. Clausen, Ed., P. Jacquet, “ Optimized Link State Routing Protocol (OLSR)” , Network Working Group, Request for Comments: 3626

[10]. S. Das, R. Castaneda, and J. Yan, , "Simulation-Based Performance Evaluation of Routing Protocols for Mobile Ad Hoc Networks", Mobile Networks and Applications, Vol. 5, No. 3, 2000, pp 179-189

[11]. J. Díaz, D. Mitsche, X. Pérez-Giménez , “On the Connectivity of Dynamic Random Geometric Graphs” , Proceedings of the nineteenth annual ACM-SIAM Symposium on Discrete Algorithms , 2008, pp 601-610

[12]. Feilong TANG, Minyi GUO, Minglu LI, Cho-Li WANG , Mianxiong Dong, “Secure Routing for Wireless Mesh Sensor Networks in Pervasive Environments”, International Journal Of Intelligent Control And Systems , VOL. 12, NO. 4, 2007, pp 293-306

[13]. C. Intanagonwiwat, R. Govindan, and Estrin, “Directed diffusion: A scalable and robust communication paradigm for sensor networks”, in Proc. of ACM MobiCom’00, Boston, MA, USA, 2000, pp. 56–67

[14]. Jamal N. Al-Karaki Ahmed E. Kamal , “Routing Techniques in Wireless Sensor Networks: A Survey” , 2004

[15]. D. B. Johnson, D. A. Maltz, and Y-C Hu., “ The Dynamic Source Routing Protocol for Mobile Ad Hoc Networks (DSR)” , IETF Mobile Ad Hoc Networks Working Group, Internet Draft , 2003

[16]. Karp, B.; Kung, H. T. , “GPSR: Greedy perimeter stateless routing for wireless networks”, In Proceedings of the Sixth Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), Boston, USA, 2000, pp. 243–254.

[17]. Ko, Y. B.; Vaidya, N. H. , “Location-Aided Routing (LAR) in mobile ad hoc networks”, Wireless Networks , 6, 2000, 307–321

[18]. F. L. LEWIS , “Wireless Sensor Networks - Smart Environments: Technologies, Protocols, and Applications” , ed. D.J. Cook and S.K. Das, John Wiley , 2004

[19]. Na Wang and Chorng Hwa Chang , “Performance analysis of probabilistic multi-path geographic routing in wireless sensor networks” , International Journal of Communication Networks and Distributed Systems , Vol 2 , 2009, pp 16 – 39

[20]. Ning Li , Jennifer C. Hou , “Topology Control in Heterogeneous Wireless Networks: Problems and Solutions”, IEEE/ACM Transactions on Networking (TON) , Volume 13 , Issue 6 , 2005, pp 1313 - 1324

[21]. C. E. Perkins, E. M. Royer, and S. R. Das, “Ad Hoc On-Demand Distance Vector (AODV) Routing” , IETF Mobile Ad Hoc Networks Working Group, IETF RFC 3561

[22]. C. E. Perkins and P. Bhagwat, , “Highly dynamic destination-sequenced distance-vector routing (DSDV) for mobile computers” . In Proceedings of the ACM Special Interest Group on Data Communications (SIGCOMM), 1994, pp 234-244

[23]. J. Polastre, R. Szewcyk, A. Mainwaring, D. Culler, J. Anderson, “Analysis of Wireless Sensor Networks for Habitat Monitoring in Wireless Sensor Networks” , Kluwer Academic Publishers (NY), 2004, pp. 399-423.

[24]. Robert Grimm, Tom Anderson, Brian Bershad, and David Wetherall , “A System Architecture for Pervasive Computing”, ACM SIGOPS European Workshop, Proceedings of the 9th workshop on ACM SIGOPS European workshop: beyond the PC: new challenges for the operating system , 2000, pp 177 - 182

[25]. M. Singh and V.K. Prasanna, “Optimal energy-balanced algorithm for selection in a single-hop sensor network”, IEEE international workshop on SNPA ICC, 2003

[26]. M. Takai, L. Bajaj, R. Ahuja, R. Bagrodia and M. Gerla, “GloMoSim: A Scalable Network Simulation Environment”, Technical report 990027, UCLA , 1999

[27]. Uichin Lee, Eugenio Magistretti, Biao Zhou, Mario Gerla, Paolo Bellavista, Antonio Corradi, “Efficient Data Harvesting in Mobile Sensor Platforms” , percomw, 2006, pp.352-356, Fourth IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOMW'06)

[28]. Wireless Sensors Location Data : http://www.select.cs.cmu.edu/data/index.html

[29]. Xiang-Yang Li , Wen-Zhan Song , Yu Wang , “Localized topology control for heterogeneous wireless sensor networks” , ACM Transactions on Sensor Networks (TOSN) , Volume 2 , Issue 1 , 2006, pp 129 - 153

[30]. YAO Kung , “Sensor Networking: Concepts, Applications, and Challenges”, ACTA Automatica Sinica , Vol. 32, No. 6 , 2006

[31]. Ye, F.; Zhong, G.; Lu, S.; Zhang, L. , “GRAdient broadcast: a robust data delivery protocol for large scale sensor networks” , Wireless Networks, 11(3), 2005 , pp 285-298.

[32]. Zhang Jin , Yu Jian-Ping , Zhou Si-Wang , Lin Ya-Ping, Li Guang , “A Survey on Position-Based Routing Algorithms in Wireless Sensor Networks” , Algorithms , 2, 2009, pp 158-182



Frequent Pattern Mining Using Record Filter Approach

D. N. Goswami1, Anshu Chaturvedi2 and C.S. Raghuvanshi3

1 SOS in Comp. Appl., Jiwaji University Gwalior, M.P. 474001, India

2 Dept. of Comp. Appl., MITS College of Engineering Gwalior, M.P. 474001, India

3 Dept. of Comp. Appl., GICTS College of Professional Education Gwalior, M.P. 474001, India

Abstract

In today’s emerging world, the role of data mining is increasing day by day with new aspects of business. Data mining has proved to be a basic tool in knowledge discovery and decision making. Data mining technologies are used in a wide variety of applications. Frequent itemsets play an essential role in many data mining tasks that try to find interesting patterns in databases, such as association rules, correlations, sequences, episodes, classifiers and clusters. Frequent patterns are itemsets that appear in database transactions at least a user-defined number of times, known as the support threshold. A number of algorithms have been proposed in the literature to enhance the performance of the Apriori algorithm for determining frequent patterns. The main issue for any such algorithm is to reduce the processing time. This paper proposes a new record filtering based approach which takes much less time to perform computations during the mining process. Experiments have been performed on synthetic datasets and the results are presented. The results show that the proposed approach performs well in terms of execution time and ultimately enhances efficiency compared to the traditional Apriori approach.

Keywords: Association Rule, Apriori, Frequent Patterns, Record Filtering

1. Introduction

Data mining is the process of finding interesting trends or patterns in large datasets to steer decisions about future activities. It is the analysis of a dataset to find unsuspected relationships and to summarize the data in new ways which are both understandable and useful. Evolutionary progress in digital data acquisition and storage technology has resulted in huge and voluminous databases. Data is often noisy and incomplete, and therefore it is likely that many interesting patterns will be missed and the reliability of detected patterns will be low. This is where Knowledge Discovery in Databases (KDD) and Data Mining (DM) help to extract useful information from raw data. Frequent patterns are those that occur at least a user-given number of times (referred to as the minimum support threshold) in the dataset. Frequent itemsets play an essential role in many data mining tasks that try to find interesting patterns in databases, such as association rules, correlations, sequences, episodes, classifiers and clusters. Frequent pattern mining is one of the most important and well-researched techniques of data mining. The mining of association rules is one of the most popular research domains. The original motivation for searching for association rules came from the need to analyze so-called supermarket transaction data, that is, to examine customer behavior in terms of the purchased products. Association rules describe how often items are purchased together. Such rules can be useful for decisions concerning product pricing, promotions, store layout and many others.

2. Problem

The problem of mining association rules is to generate all rules that have support and confidence greater than or equal to some user-specified minimum support and minimum confidence thresholds, respectively. A formal statement of the association rule problem is given in [1], [9], [10], [11]. Let $I = \{i_1, i_2, i_3, i_4, \ldots, i_m\}$ be a set of m distinct literals called items, and let D be a set of variable-length transactions over I. Each transaction contains a set of items $\{i_1, i_2, i_3, i_4, \ldots, i_k\} \subseteq I$. Each transaction is associated with



an identifier, called TID. An association rule is an implication of the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$. Here X is called the antecedent and Y is called the consequent of the rule. The rule $X \Rightarrow Y$ holds in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y. The rule $X \Rightarrow Y$ has support S in the transaction set D if S% of the transactions in D contain $X \cup Y$. The selection of association rules is based on these two values (some additional constraints may also apply). These are two important measures of rule interestingness. They respectively reflect the usefulness and certainty of a discovered rule. They can be described by the following equations:

$$\text{Support}(X \Rightarrow Y) = \frac{\text{Frequency}(X \cup Y)}{|D|}$$

$$\text{Confidence}(X \Rightarrow Y) = \frac{\text{Frequency}(X \cup Y)}{\text{Frequency}(X)}$$

where $|D|$ represents the total number of transactions (tuples) in D. A frequent itemset is an itemset whose number of occurrences is above a minimum support threshold. An itemset of length k is called a k-itemset and a frequent itemset of length k a frequent k-itemset. An association rule is considered strong if it satisfies a minimum support threshold and a minimum confidence threshold.

3. Classical Frequent Pattern Mining Algorithms

There are different algorithms and approaches for frequent pattern discovery. Techniques to discover associations among data, such as AIS [1], SETM [2], and Apriori [1][3], have been widely studied. Apriori, first proposed by Agrawal et al., is a great achievement in the history of association rule mining. AIS is a straightforward approach that requires many passes over the database, generating many candidate itemsets and storing counters for each candidate, while most of them turn out not to be frequent. Apriori is more efficient during the candidate generation process for two reasons: it employs a different candidate generation method and a new pruning technique.
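As a worked illustration of the support and confidence measures defined in Section 2, the following Python sketch evaluates both for a single rule. The transaction data and the function names are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def frequency(transactions, itemset):
    """Number of transactions that contain every item of itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, x, y):
    """Support(X => Y) = Frequency(X u Y) / |D|."""
    return frequency(transactions, set(x) | set(y)) / len(transactions)


def confidence(transactions, x, y):
    """Confidence(X => Y) = Frequency(X u Y) / Frequency(X).
    Assumes X occurs in at least one transaction."""
    return frequency(transactions, set(x) | set(y)) / frequency(transactions, x)


# Hypothetical supermarket transactions.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(D, {"bread"}, {"milk"})        # 2 of 4 transactions
rule_confidence = confidence(D, {"bread"}, {"milk"})  # 2 of the 3 with bread
```

Here bread and milk occur together in two of the four transactions, so the support is 0.5, and bread occurs in three transactions, so the confidence is 2/3.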
There are two steps in finding all the large itemsets in the database with the Apriori algorithm. First the candidate itemsets are generated, then the database is scanned to check the actual support count of the corresponding itemsets. During the first scan of the database the support count of each item is calculated, and the large 1-itemsets are generated by pruning those itemsets whose supports are below the predefined threshold. In each pass only those candidate itemsets that include the same specified number of items are generated and checked. The candidate k-itemsets are generated after the (k-1)th pass over the database by joining the frequent (k-1)-itemsets. Each candidate k-itemset is then checked against its (k-1)-subsets; if any (k-1)-subset is not frequent, the candidate is pruned, because by the Apriori property it cannot be frequent. The Apriori property says that every (k-1)-subset of a frequent k-itemset must be frequent. An analysis of the Apriori algorithm has led the authors to identify the following limitations:

I. The first issue in Apriori is that it generates a large number of candidate itemsets.

II. The second lacuna is that it takes a large number of database scans in order to discover frequent patterns.
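The join and prune steps described in this section can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name apriori_gen, the representation of itemsets as sorted tuples, and the sample itemsets over items A to D are assumptions made for the example.

```python
from itertools import combinations


def apriori_gen(frequent_prev):
    """Join frequent (k-1)-itemsets into candidate k-itemsets, then prune."""
    frequent_prev = {tuple(sorted(s)) for s in frequent_prev}
    if not frequent_prev:
        return set()
    k = len(next(iter(frequent_prev))) + 1
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            # Join step: both itemsets share their first k-2 items.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidate = a + (b[-1],)
                # Prune step: every (k-1)-item subset must itself be frequent.
                if all(sub in frequent_prev
                       for sub in combinations(candidate, k - 1)):
                    candidates.add(candidate)
    return candidates


# Hypothetical frequent 2-itemsets over items A..D.
L2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
C3 = apriori_gen(L2)
print(sorted(C3))
```

In this example the join produces (A, B, C) and (B, C, D); the second is pruned because its subset (C, D) is not among the frequent 2-itemsets, so only (A, B, C) has to be counted in the next database scan.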

4. The Apriori Algorithm

The Apriori algorithm [4],[5],[6],[7],[8] is also called the level-wise algorithm and was proposed by Agrawal and Srikant in 1994. It is the most popular algorithm for finding all the frequent sets, and it relies on the downward closure property. The advantage of the algorithm is that, before reading the database at every level, it prunes many of the sets that cannot be frequent by using the Apriori property, which states that all nonempty subsets of a frequent set must also be frequent. This property belongs to a special category of properties called anti-monotone, in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well. Using the downward closure property and the Apriori property, the algorithm works as follows. The first pass of the algorithm counts the number of occurrences of each single item to determine $L_1$, the frequent 1-itemsets. Each subsequent pass, k, consists of two phases. First, the frequent itemsets $L_{k-1}$ found in the (k-1)th pass are used to generate the candidate itemsets $C_k$, using the Apriori candidate generation algorithm. Next, the database is scanned and the support of the candidates in $C_k$ is counted to determine which of them are frequent.
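The anti-monotone property used for pruning can be written compactly with the Frequency notation introduced earlier. The symbol minsup for the minimum support count is an assumed shorthand that does not appear in the original text.

```latex
\[
  X \subseteq Y \;\Longrightarrow\; \mathrm{Frequency}(Y) \le \mathrm{Frequency}(X),
\]
\[
  \mathrm{Frequency}(X) < \mathit{minsup}
  \;\Longrightarrow\;
  \mathrm{Frequency}(Y) < \mathit{minsup}
  \quad \text{for every } Y \supseteq X .
\]
```

The first line holds because every transaction that contains $Y$ also contains each subset $X$ of $Y$; the second line is the pruning rule that follows from it.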

4.1 Steps of Algorithm

Initialize: k := 1; C_1 := all the 1-itemsets;
read the database to count the support of C_1 to determine L_1;
L_1 := {frequent 1-itemsets};
k := 2; // k represents the pass number
while (L_{k-1} ≠ ∅) do
begin
    C_k := gen_candidate_itemsets with the given L_{k-1};
    prune(C_k);
    for all transactions t ∈ T do
        increment the count of all candidates in C_k that are contained in t;
    L_k := all candidates in C_k with minimum support;
    k := k + 1;
end
Answer := ∪_k L_k;
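The steps above can be turned into a short runnable sketch. The Python function below is one possible reading of the pseudocode, not the authors' Java implementation; the function name apriori, the tuple representation of itemsets, and the four-transaction database are assumptions chosen to keep the example small.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items to obtain L1.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {c: n for c, n in counts.items() if n >= min_count}
    answer = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Generate Ck by joining L(k-1) with itself, then prune candidates
        # that have an infrequent (k-1)-item subset.
        candidates = set()
        for a in prev:
            for b in prev:
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    c = a + (b[-1],)
                    if all(s in prev for s in combinations(c, k - 1)):
                        candidates.add(c)
        # Scan every transaction and count the candidates it contains.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        answer.update(level)
        k += 1
    return answer


# Hypothetical four-transaction database, minimum support count 2.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
frequent = apriori(db, 2)
for itemset in sorted(frequent, key=lambda s: (len(s), s)):
    print(itemset, frequent[itemset])
```

Note that every pass of this sketch reads all transactions, which is the behaviour the pseudocode prescribes.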

4.2 Working Example of Apriori

To understand how the classical Apriori algorithm works, we consider a database of 15 transactions over an item set I = {I1, I2, I3, I4, I5} of five items.

Table 1: Database (D)

TID Items

T1 I1, I3, I5

T2 I1, I4

T3 I4, I5

T4 I2, I3, I4

T5 I1, I2, I3

T6 I2, I4, I5

T7 I2, I5

T8 I2, I3, I4, I5

T9 I4

T10 I2, I3, I4, I5

T11 I3, I4

T12 I1

T13 I2, I4, I5

T14 I4, I5

T15 I1, I2, I3, I4, I5

Before starting Apriori we assume an absolute minimum support count of 3. In the first step of classical Apriori we take the candidate set of one item and scan the database to count the support of each member of the candidate set.

Table 2: Candidate set and frequent set of 1 item

Candidate set of 1 item (scan D to count the support of each candidate)

Itemset    Sup. count
I1         5
I2         8
I3         7
I4         11
I5         9

Frequent set of 1 item (compare candidate support with minimum support count to get frequent set)

Itemset    Sup. count
I1         5
I2         8
I3         7
I4         11
I5         9
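The first pass can be reproduced with a few lines of Python. The snippet below transcribes Table 1 (writing 1 for I1, 2 for I2, and so on) and counts the support of every item; the variable names are illustrative assumptions.

```python
from collections import Counter

# Table 1, transcribed as item numbers (1 stands for I1, and so on).
D = {
    "T1": {1, 3, 5}, "T2": {1, 4}, "T3": {4, 5}, "T4": {2, 3, 4},
    "T5": {1, 2, 3}, "T6": {2, 4, 5}, "T7": {2, 5}, "T8": {2, 3, 4, 5},
    "T9": {4}, "T10": {2, 3, 4, 5}, "T11": {3, 4}, "T12": {1},
    "T13": {2, 4, 5}, "T14": {4, 5}, "T15": {1, 2, 3, 4, 5},
}
MIN_COUNT = 3

# One scan of D: every item of every transaction adds one to its counter.
support = Counter(item for items in D.values() for item in items)
L1 = {item: n for item, n in support.items() if n >= MIN_COUNT}
for item in sorted(L1):
    print(f"I{item}: {L1[item]}")
```

All five items reach the minimum support count of 3, so the frequent set of 1 item is identical to the candidate set.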

After determining the frequent set of 1 item, we generate the candidate set of 2 items by merging the members of the frequent set of 1 item. We then scan the database D again to count the support of each element of the candidate set, and generate the frequent set of 2 items by comparing each support count with the minimum support count.

Table 3: Candidate set and frequent set of 2 items

Candidate set of 2 items (scan D to count the support of each candidate)

Itemset    Sup. count
I1, I2     2
I1, I3     3
I1, I4     2
I1, I5     2
I2, I3     5
I2, I4     6
I2, I5     6
I3, I4     5
I3, I5     4
I4, I5     7

Frequent set of 2 items (compare candidate support with minimum support count to get frequent set)

Itemset    Sup. count
I1, I3     3
I2, I3     5
I2, I4     6
I2, I5     6
I3, I4     5
I3, I5     4
I4, I5     7
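The support counts of the 2-item candidates can be checked directly against Table 1. In the sketch below every transaction contributes one count to each pair of items it contains; because all five single items are frequent, these pairs are exactly the ten candidates. Variable names are assumptions for the example.

```python
from collections import Counter
from itertools import combinations

# Table 1 as a list of transactions (1 stands for I1, and so on).
D = [
    {1, 3, 5}, {1, 4}, {4, 5}, {2, 3, 4}, {1, 2, 3},
    {2, 4, 5}, {2, 5}, {2, 3, 4, 5}, {4}, {2, 3, 4, 5},
    {3, 4}, {1}, {2, 4, 5}, {4, 5}, {1, 2, 3, 4, 5},
]
MIN_COUNT = 3

pair_support = Counter()
for items in D:
    # Each transaction adds one to every pair of items it contains.
    pair_support.update(combinations(sorted(items), 2))

L2 = {pair: n for pair, n in pair_support.items() if n >= MIN_COUNT}
for pair in sorted(pair_support):
    status = "frequent" if pair in L2 else "pruned"
    print(pair, pair_support[pair], status)
```

Counting from Table 1 in this way gives 6 for {I2, I5} and 4 for {I3, I5}, and leaves seven frequent 2-itemsets.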

Further, we generate the candidate set of 3 items by using the frequent 2-itemsets and the pruning technique. We then scan all the transactions in database D again to count the support of each element of the candidate set, and obtain the frequent set by comparing the counts with the minimum support count.

Table 4: Candidate set and frequent set of 3 items

Candidate set of 3 items (scan D to count the support of each candidate)

Itemset       Sup. count
I2, I3, I4    4
I2, I3, I5    3
I2, I4, I5    5
I3, I4, I5    3

Frequent set of 3 items (compare candidate support with minimum support count to get frequent set)

Itemset       Sup. count
I2, I3, I4    4
I2, I3, I5    3
I2, I4, I5    5
I3, I4, I5    3


In the next step we generate the candidate set of 4 items by using the frequent 3-itemsets and the pruning technique, and determine the support of the candidate set by scanning all the transactions in the database in order to get the frequent set of 4 items.

Table 5: Candidate set and frequent set of 4 items

Candidate set of 4 items (scan D to count the support of each candidate)

Itemset           Sup. count
I2, I3, I4, I5    3

Frequent set of 4 items (compare candidate support with minimum support count to get frequent set)

Itemset           Sup. count
I2, I3, I4, I5    3
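The support and confidence measures defined earlier can also be evaluated on this database. The sketch below does so for one rule, {I2, I4} ⇒ {I5}; the rule is a hypothetical choice made only for illustration, since the working example itself stops at frequent itemsets, and the function names are assumptions.

```python
# Table 1 as a list of transactions (1 stands for I1, and so on).
D = [
    {1, 3, 5}, {1, 4}, {4, 5}, {2, 3, 4}, {1, 2, 3},
    {2, 4, 5}, {2, 5}, {2, 3, 4, 5}, {4}, {2, 3, 4, 5},
    {3, 4}, {1}, {2, 4, 5}, {4, 5}, {1, 2, 3, 4, 5},
]


def frequency(itemset):
    """Number of transactions in D that contain every item of itemset."""
    return sum(1 for t in D if itemset <= t)


def support(x, y):
    """Support(X => Y) = Frequency(X u Y) / |D|."""
    return frequency(x | y) / len(D)


def confidence(x, y):
    """Confidence(X => Y) = Frequency(X u Y) / Frequency(X)."""
    return frequency(x | y) / frequency(x)


X, Y = {2, 4}, {5}  # hypothetical rule {I2, I4} => {I5}
print(f"support    = {support(X, Y):.3f}")
print(f"confidence = {confidence(X, Y):.3f}")
```

Here Frequency({I2, I4, I5}) is 5 and Frequency({I2, I4}) is 6, so the rule has support 5/15 and confidence 5/6.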


In this way classical Apriori discovers all frequent itemsets by scanning all the transactions in each pass, and thus takes a lot of time.

5. Proposed Record Filter Approach

The authors have critically analyzed the Apriori algorithm and observed that the support of itemsets has to be counted many times during the mining process. Since counting the occurrences of itemsets is time-consuming, the present paper proposes a novel approach for mining frequent patterns that takes less time than the Apriori algorithm. When the Apriori algorithm counts the support of a candidate set of length k, it checks for its occurrence in every transaction, whether the length of the transaction is greater than, less than or equal to k. In the proposed approach, the support of a candidate set is counted only in the transaction records whose length is greater than or equal to the length of the candidate set, because a candidate set of length k cannot exist in a transaction record of length k-1 or less; it may exist only in a transaction of length greater than or equal to k.
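The reason the filter does not change any support count can be stated in one line, using the Frequency notation from the definitions above, with $c$ a candidate of length $k$ and $t$ a transaction in $D$:

```latex
\[
  c \subseteq t \;\Longrightarrow\; |t| \ge |c| = k ,
\]
\[
  \mathrm{Frequency}(c)
    = \bigl|\{\, t \in D : c \subseteq t \,\}\bigr|
    = \bigl|\{\, t \in D : |t| \ge k,\ c \subseteq t \,\}\bigr| .
\]
```

Transactions shorter than $k$ contribute nothing to either set, so skipping them yields the same frequent itemsets as the classical scan.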

5.1 Steps of Proposed Algorithm

Initialize: k := 1; C_1 := all the 1-itemsets;
read the database to count the support of C_1 to determine L_1;
L_1 := {frequent 1-itemsets};
k := 2; // k represents the pass number
while (L_{k-1} ≠ ∅) do
begin
    C_k := gen_candidate_itemsets with the given L_{k-1};
    prune(C_k);
    for all transactions t ∈ T whose length is greater than or equal to k do
        increment the count of all candidates in C_k that are contained in t;
    L_k := all candidates in C_k with minimum support;
    k := k + 1;
end
Answer := ∪_k L_k;
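A runnable sketch of the record filter is given below. It is an illustration under stated assumptions rather than the authors' Java code: the function name mine, the record_filter flag, the records_read counter (a simple proxy for work done, not a time measurement) and the six-transaction database are all hypothetical.

```python
from itertools import combinations


def mine(transactions, min_count, record_filter):
    """Level-wise mining; returns (frequent itemsets, number of records read)."""
    transactions = [frozenset(t) for t in transactions]
    records_read = len(transactions)  # pass 1 reads every record
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {c: n for c, n in counts.items() if n >= min_count}
    answer = dict(level)
    k = 2
    while level:
        prev = set(level)
        joined = {
            a + (b[-1],)
            for a in prev for b in prev
            if a[:-1] == b[:-1] and a[-1] < b[-1]
        }
        candidates = {
            c for c in joined
            if all(s in prev for s in combinations(c, k - 1))
        }
        if not candidates:
            break
        counts = {c: 0 for c in candidates}
        for t in transactions:
            # Record filter: a record shorter than k cannot contain a k-itemset.
            if record_filter and len(t) < k:
                continue
            records_read += 1
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        answer.update(level)
        k += 1
    return answer, records_read


# Hypothetical database with transactions of length 1, 2 and 3.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}, {"b"}]
plain, n_plain = mine(db, 2, record_filter=False)
filtered, n_filtered = mine(db, 2, record_filter=True)
print(plain == filtered, n_plain, n_filtered)
```

Both variants return the same frequent itemsets; on this small database the filtered variant reads 11 records over three passes instead of 18.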

5.2 Working Example

To illustrate the working of the proposed approach, we use the transactional database D shown in Table 1, which contains 15 transactions over an item set I = {I1, I2, I3, I4, I5} of five items, and we consider the same minimum support count of 3. Initially we consider the candidate set of size one and determine the support count as shown below.

Table 6: Candidate set and frequent set of 1 item

Candidate set of 1 item (scan D to count the support of each candidate)

Itemset    Sup. count
I1         5
I2         8
I3         7
I4         11
I5         9

Frequent set of 1 item (compare candidate support with minimum support count to get frequent set)

Itemset    Sup. count
I1         5
I2         8
I3         7
I4         11
I5         9

Next we generate the candidate set of size two and determine the support count only in the transactions that contain at least two items. Hence the transactions T9 and T12, each of which contains a single item, are not considered during this step.

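Table 7: Candidate set and frequent set of 2 items

Candidate set of 2 items (scan the transactions of length at least 2 to count the support of each candidate)

Itemset    Sup. count
I1, I2     2
I1, I3     3
I1, I4     2
I1, I5     2
I2, I3     5
I2, I4     6
I2, I5     6
I3, I4     5
I3, I5     4
I4, I5     7

Frequent set of 2 items (compare candidate support with minimum support count to get frequent set)

Itemset    Sup. count
I1, I3     3
I2, I3     5
I2, I4     6
I2, I5     6
I3, I4     5
I3, I5     4
I4, I5     7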


Further, we generate the candidate set of size 3 and determine the support count by considering only those transactions which contain at least 3 items. Hence the transactions containing only one or two items (i.e. T2, T3, T7, T9, T11, T12, T14) are not scanned.

[Figure: chart comparing Apriori and RecordFilteringBasedApproach; axis ticks 0 to 400 and 500 to 2000]

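Table 8: Candidate set and frequent set of 3 items

Candidate set of 3 items (scan the transactions of length at least 3 to count the support of each candidate)

Itemset       Sup. count
I2, I3, I4    4
I2, I3, I5    3
I2, I4, I5    5
I3, I4, I5    3

Frequent set of 3 items (compare candidate support with minimum support count to get frequent set)

Itemset       Sup. count
I2, I3, I4    4
I2, I3, I5    3
I2, I4, I5    5
I3, I4, I5    3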


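In the next step, we generate the candidate set of size 4 and determine the support count by considering only those transactions which contain at least 4 items. In this process we ignore those transactions that contain 1, 2 or 3 items.

Table 9: Candidate set and frequent set of 4 items

Candidate set of 4 items (scan the transactions of length at least 4 to count the support of each candidate)

Itemset           Sup. count
I2, I3, I4, I5    3

Frequent set of 4 items (compare candidate support with minimum support count to get frequent set)

Itemset           Sup. count
I2, I3, I4, I5    3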
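The effect of the filter on the example database can be tallied pass by pass. The sketch below counts, for candidate sizes 1 to 4, how many transactions of Table 1 are long enough to be scanned and which ones are skipped. The record count is only an illustrative proxy for the work saved, not a measurement reported in the paper, and the names used are assumptions.

```python
# Table 1, transcribed as item numbers (1 stands for I1, and so on).
D = {
    "T1": {1, 3, 5}, "T2": {1, 4}, "T3": {4, 5}, "T4": {2, 3, 4},
    "T5": {1, 2, 3}, "T6": {2, 4, 5}, "T7": {2, 5}, "T8": {2, 3, 4, 5},
    "T9": {4}, "T10": {2, 3, 4, 5}, "T11": {3, 4}, "T12": {1},
    "T13": {2, 4, 5}, "T14": {4, 5}, "T15": {1, 2, 3, 4, 5},
}
PASSES = (1, 2, 3, 4)  # candidate sizes counted in the working example


def tid_number(tid):
    """Numeric part of a transaction identifier such as 'T12'."""
    return int(tid[1:])


# Transactions that are long enough to contain a candidate of size k.
scanned = {k: sum(1 for items in D.values() if len(items) >= k) for k in PASSES}
# Transactions that the record filter skips in pass k.
skipped = {
    k: sorted((tid for tid, items in D.items() if len(items) < k), key=tid_number)
    for k in PASSES
}
classical_total = len(D) * len(PASSES)
filtered_total = sum(scanned.values())

for k in PASSES:
    print(k, scanned[k], skipped[k])
print(classical_total, filtered_total)
```

For this database the four passes read 15 + 13 + 8 + 3 = 39 records with the filter, against 4 × 15 = 60 without it; the second pass skips the two single-item transactions T9 and T12.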


In this way the proposed approach discovers the frequent itemsets of all sizes while skipping the transaction records that are too short to contain a candidate, which saves processing time.

6. Performance Evaluation

To explore the performance of the proposed algorithm, a synthetic dataset is used, and all the experiments are performed on a Pentium IV 2.93 GHz PC with 512 MB RAM, running Microsoft Windows 2000. The algorithm is implemented in Java and uses a hash set to calculate the candidate itemsets. All the reported runtimes include both CPU time and I/O time. For the comparative study of classical Apriori and the proposed approach, we have taken a database of 5000 transactions containing 50 unique items. During this analysis we first considered 1000 transactions to generate the frequent patterns with a minimum support of 10%, and the process was repeated while gradually increasing the number of transactions. Table 10 shows the execution time corresponding to different transaction sizes.
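The execution times reported in Table 10 can be converted into relative running times with a short calculation. The figures in the snippet are copied from Table 10; the variable names are assumptions for the example.

```python
# Execution times in seconds from Table 10:
# transaction size -> (Apriori, Record Filtering Based Approach).
times = {500: (42, 37), 1000: (92, 82), 1500: (167, 149), 2000: (392, 348)}

# Fraction of the Apriori running time needed by the record filtering approach.
ratios = {size: filtered / apriori for size, (apriori, filtered) in times.items()}

for size, ratio in ratios.items():
    print(f"{size}: {ratio:.1%} of the Apriori time, {1 - ratio:.1%} saved")
```

For all four transaction sizes the ratio lies between 88% and 90%, which corresponds to a saving of roughly 10 to 12% of the execution time.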

Table 10: Execution time in seconds for different transaction sizes

Transaction size    Apriori    Record Filtering Based Approach
500                 42         37
1000                92         82
1500                167        149
2000                392        348

Finally, as a result of this analysis, we can see that the proposed approach (Record Filtering Based Approach) takes only about 90% of the time taken by classical Apriori. Hence, the proposed approach saves approximately 10% of the execution time.

7. Conclusion

The present paper proposes a new record filter based algorithm, a variation of the Apriori algorithm, that scans fewer transaction records than Apriori in each pass by using only transactions of sufficient length for the generation of frequent itemsets. As observed by many researchers, counting the occurrences of itemsets is a time-consuming activity. This paper therefore introduces a strategy of checking a candidate set only against those transactions whose length is greater than or equal to the length of the candidate set, because a candidate set of length k cannot exist in a transaction record of length k-1 or less; it may exist only in a transaction of length greater than or equal to k. Due to this, the proposed approach takes less time for the computations performed during the mining process. Experiments have been performed on synthetic datasets and the results have been presented. The results show that the proposed approach performs well in terms of execution time and ultimately enhances efficiency as compared to the traditional Apriori approach.



References

[1] Agrawal, R., Imielinski, T., and Swami, A. N. 1993. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 207-216.
[2] Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, 487-499.
[3] Agarwal, R., Aggarwal, C. and Prasad, V. A tree projection algorithm for generation of frequent itemsets. In J. Parallel and Distributed Computing, 2000.
[4] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. IBM Research Report RJ9839, IBM Almaden Research Center, San Jose, California, June 1994.
[5] A. Amir, R. Feldman, and R. Kashi. A new and versatile method for association generation. Information Systems, 2:333–347, 1997.
[6] R.J. Bayardo, Jr. Efficiently mining long patterns from databases. In L.M. Haas and A. Tiwary, editors, Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, volume 27(2) of SIGMOD Record, pages 85–93. ACM Press, 1998.
[7] S. Parthasarathy, M. J. Zaki, M. Ogihara, S. Dwarkadas. Incremental and interactive sequence mining. Int'l Conf. on Information and Knowledge Management, 1999.
[8] Helen Pinto, Jiawei Han, Jian Pei, Ke Wang, Qiming Chen, Umeshwar Dayal. Multi-Dimensional Sequential Pattern Mining. Int'l Conf. on Information and Knowledge Management, 2001.
[9] Richard Relue, Xindong Wu, Hao Huang. Efficient runtime generation of association rules. Int'l Conf. on Information and Knowledge Management, October 2001.
[10] Assaf Schuster, Ran Wolff, and Dan Trock. Distributed Algorithm for Mining Association Rules. IEEE Int'l Conf. on Data Mining, November 2003.
[11] Wei-Guang Teng, Ming-Syan Chen, and Philip S. Yu. Resource-Aware Mining with Variable Granularities in Data Streams. SIAM Int'l Conf. on Data Mining, 2004.

Dr. D.N. Goswami

D.N. Goswami is Professor and Head of the School of Studies in Computer Science, Jiwaji University, Gwalior. He holds a Master in Computer Applications and a Ph.D. in Computer Science from Jiwaji University. His research interests include Software Quality and Reliability Analysis, Ad hoc Networks, Relational Database Management Systems and Data Mining. He has guided Ph.D. theses in Computer Science and Applications.

Dr. Anshu Chaturvedi

Anshu Chaturvedi is currently working as a Lecturer in the Department of Computer Applications at Madhav Institute of Technology and Sciences, Gwalior. She obtained her Ph.D. in 2009. Her research interests include Ad hoc Networks, Data Mining, Operating Systems and Security. She is a life member of the Computer Society of India. She has seven years of experience in the academic field. She also won the Young Scientist Award in 2009.

Mr. C.S. Raghuvanshi

C.S. Raghuvanshi is pursuing a Ph.D. in Computer Science at Jiwaji University, Gwalior, under the guidance of Dr. D.N. Goswami, and is also working as a lecturer at GICTS College of Professional Education, Gwalior. He holds an M.Sc. (IT) from Jiwaji University, Gwalior, and an M.Tech. (IT) from AAI Deemed University, Allahabad. His research interests include Data Mining, Ad hoc Networks and Software Engineering.

IJCSI CALL FOR PAPERS JANUARY 2011 ISSUE

Volume 8, Issue 1

The topics suggested by this issue can be discussed in terms of concepts, surveys, state of the art, research, standards, implementations, running experiments, applications, and industrial case studies. Authors are invited to submit complete unpublished papers, which are not under review in any other conference or journal, in the following, but not limited to, topic areas. See authors guide for manuscript preparation and submission guidelines. Accepted papers will be published online and indexed by Google Scholar, Cornell’s University Library, DBLP, ScientificCommons, CiteSeerX, Bielefeld Academic Search Engine (BASE), SCIRUS, EBSCO, ProQuest and more.

Deadline: 05th December 2010
Notification: 10th January 2011
Revision: 20th January 2011
Online Publication: 31st January 2011

Evolutionary computation; Industrial systems; Autonomic and autonomous systems; Bio-technologies; Knowledge data systems; Mobile and distance education; Intelligent techniques, logics, and systems; Knowledge processing; Information technologies; Internet and web technologies; Digital information processing; Cognitive science and knowledge agent-based systems; Mobility and multimedia systems; Systems performance; Networking and telecommunications; Software development and deployment; Knowledge virtualization; Systems and networks on the chip; Context-aware systems; Networking technologies; Security in network, systems, and applications; Knowledge for global defense; Information Systems [IS]; IPv6 Today - Technology and deployment; Modeling; Optimization; Complexity; Natural Language Processing; Speech Synthesis; Data Mining

For more topics, please see http://www.ijcsi.org/call-for-papers.php

All submitted papers will be judged based on their quality by the technical committee and reviewers. Papers that describe on-going research and experimentation are encouraged. All paper submissions will be handled electronically and detailed instructions on submission procedure are available on IJCSI website (www.IJCSI.org). For more information, please visit the journal website (www.IJCSI.org)


The International Journal of Computer Science Issues (IJCSI) is a well-established and notable venue for publishing high quality research papers as recognized by various universities and international professional bodies. IJCSI is a refereed open access international journal for publishing scientific papers in all areas of computer science research. The purpose of establishing IJCSI is to provide assistance in the development of science, fast operative publication and storage of materials and results of scientific researches and representation of the scientific conception of the society.

It also provides a venue for researchers, students and professionals to submit ongoing research and developments in these areas. Authors are encouraged to contribute to the journal by submitting articles that illustrate new research results, projects, surveying works and industrial experiences that describe significant advances in field of computer science.

Indexing of IJCSI
1. Google Scholar
2. Bielefeld Academic Search Engine (BASE)
3. CiteSeerX
4. SCIRUS
5. Docstoc
6. Scribd
7. Cornell's University Library
8. SciRate
9. ScientificCommons
10. DBLP
11. EBSCO
12. ProQuest
